Advanced R

Chapter 25: Rewriting R code in C++

Daryn Ramsden

1 / 22

Sad reality: sometimes R won't cut it

When can C++ help the R programmer?

  • Loops that can’t be easily vectorised because subsequent iterations depend on previous ones.

  • Recursive functions, or problems which involve calling functions millions of times. The overhead of calling a function in C++ is much lower than in R.

  • Problems that require advanced data structures and algorithms that R doesn’t provide.

2 / 22

Rcpp: the best way to use C++ from R

library(Rcpp)

Key functions in Rcpp:

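  • cppFunction(): compiles a C++ function supplied as a string in your R code

  • sourceCpp(): compiles a standalone C++ file and exports its marked functions to R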

3 / 22

Detour: If you really want an Rcpp tutorial

Dirk Eddelbuettel's Rcpp tutorial at useR! 2020 (h/t: Pavitra)

4 / 22

Using cppFunction

Put the whole C++ function in a string and pass it to cppFunction(). The function is compiled and becomes callable from R under the same name:

cppFunction('int add(int x, int y, int z) {
  int sum = x + y + z;
  return sum;
}')
add(1729, 99, 14)
## [1] 1842
5 / 22

An example R function vs C++ implementation

fibonacci_r <- function(n) {
  if (n < 2) return(n)
  return(fibonacci_r(n - 1) + fibonacci_r(n - 2))
}
cppFunction("int fibonacci_cpp(int n) {
  if (n < 2) return n;
  return fibonacci_cpp(n - 1) + fibonacci_cpp(n - 2);
}")
bench::mark(fibonacci_r(10), fibonacci_cpp(10))
## # A tibble: 2 x 6
## expression min median `itr/sec` mem_alloc `gc/sec`
## <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
## 1 fibonacci_r(10) 60.36µs 66.55µs 14535. 3.31MB 23.2
## 2 fibonacci_cpp(10) 1.63µs 3.26µs 313664. 2.49KB 0
6 / 22

Using sourceCpp

sourceCpp() compiles a standalone C++ file and exports the marked functions for use in R.

To use sourceCpp, start your standalone C++ file with:

#include <Rcpp.h>
using namespace Rcpp;

And before each function you want to export:

// [[Rcpp::export]]
7 / 22

Really useful: the vector classes

Rcpp provides C++ vector types corresponding to the main vector types in R

  • IntegerVector
  • NumericVector
  • LogicalVector
  • CharacterVector
8 / 22

Example using NumericVector

#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
double meanC(NumericVector x) {
  int n = x.size();
  double total = 0;
  for (int i = 0; i < n; ++i) {
    total += x[i];
  }
  return total / n;
}
/*** R
x <- runif(1e5)
bench::mark(
  mean(x),
  meanC(x)
)
*/
9 / 22

Benchmarking against mean

x <- runif(1e5)
bench::mark(
  mean(x),
  meanC(x)
)
## # A tibble: 2 x 6
## expression min median `itr/sec` mem_alloc `gc/sec`
## <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
## 1 mean(x) 170µs 171µs 5388. 0B 0
## 2 meanC(x) 113µs 114µs 8291. 2.49KB 0
10 / 22

Other types

Rcpp also has classes for the following R entities:

  • List

  • DataFrame

  • Function

11 / 22

Attributes

The attributes of an R object can be queried and modified using the .attr() method of the corresponding Rcpp object. Names have their own accessor, .names().

An example of using .attr() and .names():

#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector attribs() {
  NumericVector out = NumericVector::create(1, 2, 3);
  out.names() = CharacterVector::create("a", "b", "c");
  out.attr("my-attr") = "my-value";
  out.attr("class") = "my-class";
  return out;
}
12 / 22

Missing values: scalars

NA corresponds to a different C++ constant for each underlying scalar type:

cppFunction("
List scalar_missings() {
  int int_s = NA_INTEGER;
  String chr_s = NA_STRING;
  bool lgl_s = NA_LOGICAL;
  double num_s = NA_REAL;
  return List::create(int_s, chr_s, lgl_s, num_s);
}
")
str(scalar_missings())
## List of 4
## $ : int NA
## $ : chr NA
## $ : logi TRUE
## $ : num NA

Looks mostly good, except that the logical NA came back as TRUE. Section 25.4 covers this and other pesky details (I'm going to pretend they don't exist).
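Why does the logical come back as TRUE? Below is a minimal plain-C++ sketch. It assumes, as R's headers do, that NA_LOGICAL is the integer INT_MIN. A C++ bool can only hold true or false, and any non-zero int converts to true. The constant na_logical_stand_in and the helpers to_bool and is_na_logical are hypothetical stand-ins so the code compiles without R.

```cpp
#include <climits>

// Assumption: R defines NA_LOGICAL (like NA_INTEGER) as the int INT_MIN.
// This hypothetical constant stands in for it so the sketch compiles
// without R headers.
const int na_logical_stand_in = INT_MIN;

// A C++ bool only records zero versus non-zero. INT_MIN is non-zero,
// so the missing value silently turns into true.
bool to_bool(int value) {
  bool b = value;  // implicit int -> bool conversion
  return b;
}

// Keeping the logical in an int preserves the missing value.
bool is_na_logical(int value) {
  return value == na_logical_stand_in;
}
```

In Rcpp, keeping a logical scalar in an int (or a LogicalVector) is what lets the NA survive.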

13 / 22

Missing values: vectors

With vectors, use the missing value that matches the vector's type: NA_REAL, NA_INTEGER, NA_LOGICAL or NA_STRING. This time the logical NA survives:

cppFunction("List missing_sampler() {
  return List::create(
    NumericVector::create(NA_REAL),
    IntegerVector::create(NA_INTEGER),
    LogicalVector::create(NA_LOGICAL),
    CharacterVector::create(NA_STRING)
  );
}")
str(missing_sampler())
## List of 4
## $ : num NA
## $ : int NA
## $ : logi NA
## $ : chr NA
14 / 22

Standard Template Library

The STL:

  • a really extensive C++ software library

  • has 4 components:

    • algorithms

    • containers

    • functions

    • iterators

Note: this is not exactly the same thing as the C++ Standard Library
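Here is a minimal plain-C++ sketch that touches all four components at once. The container is std::vector. The iterators begin() and end() mark the range. The algorithm is std::sort. The function object is std::greater, which reverses the order. The helper name sort_descending is hypothetical.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Hypothetical helper using all four STL components:
//  - container:       std::vector<int>
//  - iterators:       v.begin() and v.end() delimit the range to sort
//  - algorithm:       std::sort
//  - function object: std::greater<int>, which makes the sort descending
std::vector<int> sort_descending(std::vector<int> v) {
  std::sort(v.begin(), v.end(), std::greater<int>());
  return v;
}
```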

15 / 22

Detour: if you really want to learn the STL

You can learn the STL from STL (Stephan T. Lavavej), in his video series (video 1 of n):

You will truly get the STL if you get to the end of this series.

16 / 22

Iterators

C++ has many iterator types.

Key features of iterators are:

  • Advance with ++.

  • Get the value they refer to, or dereference, with *.

  • Compare with == (or, more often in loops, !=).
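The three iterator operations can be seen in this small plain-C++ sketch over a std::vector. It uses ++ to advance, * to dereference, and a comparison with end() to stop. The helper find_position is hypothetical and returns a 0-based position, or -1 if the value is absent.

```cpp
#include <vector>

// Hypothetical helper: 0-based position of the first element equal to
// target, or -1 if there is none. ++ advances the iterator, * reads the
// value it points to, and comparing with end() (written != here, the
// negation of ==) tells the loop when to stop.
long find_position(const std::vector<double>& x, double target) {
  long pos = 0;
  for (std::vector<double>::const_iterator it = x.begin(); it != x.end();
       ++it, ++pos) {
    if (*it == target) return pos;
  }
  return -1;
}
```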

17 / 22

Example using iterator features

#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
double sum3(NumericVector x) {
  double total = 0;
  NumericVector::iterator it;
  for (it = x.begin(); it != x.end(); ++it) {
    total += *it;
  }
  return total;
}
sum3(c(1, 12, 201, 2001))
## [1] 2215
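Because iterators give a uniform way to walk a range, the explicit loop can be handed to an STL algorithm. Here is a plain-C++ sketch using std::accumulate from <numeric>, with std::vector in place of NumericVector. The name sum4 just continues the numbering and is an assumption.

```cpp
#include <numeric>
#include <vector>

// Same sum as sum3(), but std::accumulate does the iterator loop.
// The initial value 0.0 is a double, so the running total is a double;
// an int 0 would truncate the running total to an int at each step.
double sum4(const std::vector<double>& x) {
  return std::accumulate(x.begin(), x.end(), 0.0);
}
```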
18 / 22

Algorithms

The STL also has a lot of efficiently implemented algorithms.

The following code uses the std::upper_bound algorithm from the STL to create a function that takes two arguments, a vector of values x and a sorted vector of breaks, and locates the bin that each value of x falls into.

#include <algorithm>
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
IntegerVector findInterval2(NumericVector x, NumericVector breaks) {
  IntegerVector out(x.size());
  NumericVector::iterator it, pos;
  IntegerVector::iterator out_it;
  for (it = x.begin(), out_it = out.begin(); it != x.end();
       ++it, ++out_it) {
    pos = std::upper_bound(breaks.begin(), breaks.end(), *it);
    *out_it = std::distance(breaks.begin(), pos);
  }
  return out;
}
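Here is the same idea in plain C++, with std::vector instead of Rcpp's vector classes so it compiles on its own. A range-based for loop replaces the paired iterators. The name find_interval is hypothetical. std::upper_bound returns the first break strictly greater than the value, which requires breaks to be sorted.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Plain-C++ sketch of findInterval2(). For each value, std::upper_bound
// finds the first break strictly greater than it (breaks must be sorted),
// and its distance from breaks.begin() is the bin index.
std::vector<int> find_interval(const std::vector<double>& x,
                               const std::vector<double>& breaks) {
  std::vector<int> out;
  out.reserve(x.size());
  for (double value : x) {
    auto pos = std::upper_bound(breaks.begin(), breaks.end(), value);
    out.push_back(static_cast<int>(std::distance(breaks.begin(), pos)));
  }
  return out;
}
```

For x = c(-1, 0.5, 1, 3) and breaks = c(0, 1, 2), this gives 0, 1, 2, 3, the same as R's findInterval() for those inputs.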
19 / 22

Particularly useful STL data structures

  • vector: similar to an R vector, but templated (it can hold any element type) and efficient to grow.

  • set: maintains a set of unique values. Good when you need to check whether you have already seen a particular value.

    • std::set (ordered)

    • std::unordered_set (hash table; often faster when order doesn't matter)

  • map: a container of (key, value) pairs, a.k.a. a dictionary (std::map, std::unordered_map).
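Here is a minimal plain-C++ sketch of the set and map bullets, assuming integer input. The names duplicated_cpp and table_cpp are hypothetical, chosen to echo R's duplicated() and table(); they are not Rcpp functions.

```cpp
#include <map>
#include <unordered_set>
#include <vector>

// Like R's duplicated(): true if the value has been seen earlier.
// insert() returns a pair whose .second is false when the value was
// already present in the set.
std::vector<bool> duplicated_cpp(const std::vector<int>& x) {
  std::unordered_set<int> seen;
  std::vector<bool> out;
  out.reserve(x.size());
  for (int value : x) {
    out.push_back(!seen.insert(value).second);
  }
  return out;
}

// Like R's table() for integers: counts per value, keys kept sorted.
std::map<int, int> table_cpp(const std::vector<int>& x) {
  std::map<int, int> counts;
  for (int value : x) {
    ++counts[value];  // operator[] inserts a missing key with count 0
  }
  return counts;
}
```

An unordered_set is enough for duplicated_cpp because only membership matters. std::map is used for table_cpp so the counts come out sorted by key, as in R.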

20 / 22

Using Rcpp in a package

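Two simple steps:

  • In DESCRIPTION, add:
    LinkingTo: Rcpp
    Imports: Rcpp

  • Make sure your NAMESPACE includes the following (importing anything from Rcpp ensures Rcpp's internal code is loaded):
    useDynLib(mypackage)
    importFrom(Rcpp, sourceCpp)

The only thing left is the actual code, which goes in the package's src/ directory.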

21 / 22

The end

I have tried to read this book on multiple occasions. And failed.

So I am glad that I saw this tweet:

22 / 22
