Rewriting R code in C++ (and Rust)

Interfaces to other software are part of R

— Chambers JM (2016). Extending R.

Learning objectives:

  • Why rewrite R code in C++/Rust
  • Compare C++/Rust function and package development
  • Run C++/Rust in R
  • Create and call C++/Rust functions in R
  • Evaluate function performance for R/C/C++/Rust
  • Compare compiler error messages between C++ and Rust
  • Identify advanced tooling for C++/Rust
library(Rcpp)
#library(rextendr) # Rust benchmarks shown below were run on macOS 16.7.3 (Intel i7)

Even optimized R code may still not be fast enough

  • R’s for loops are much slower compared to C++

    • Loops can’t be vectorized in R when later iterations depend on earlier ones
  • R function calls are much slower compared to C++

    • Recursive functions involving millions of function calls
  • R offers fewer built-in data structures than C++

    • C++ Standard Template Library (STL) for efficiency, correctness, and maintainability
  • R cannot exploit parallelism as effectively as Rust

    • Rust’s ownership model and borrow checker rule out data races at compile time, so it needs no global interpreter lock (GIL) or other runtime-wide lock
  • R object memory is only borrowed when accessed via C/C++, unlike in Rust

    • Rust code owns its objects and compiles directly to machine code
  • Debugging R or C code is more challenging than debugging Rust

    • Safe Rust prevents most memory bugs at compile time without a garbage collector (GC), and its error messages are comparable in quality to those of the {tidyverse}
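
A loop of the kind described in the first bullet, where each iteration needs the result of the previous one, can be sketched in plain Rust. This is an illustration only: the function name ewma and the smoothing parameter alpha are assumptions made for this example and do not come from the original material.

```rust
/// Exponentially weighted moving average.
/// Each output value depends on the previous output value, so the loop
/// cannot be replaced by a single element-wise (vectorized) expression.
fn ewma(x: &[f64], alpha: f64) -> Vec<f64> {
    let mut out = Vec::with_capacity(x.len());

    // Start from the first observation; an empty input gives an empty output.
    let mut prev = match x.first() {
        Some(&v) => v,
        None => return out,
    };
    out.push(prev);

    for &v in &x[1..] {
        // The new value mixes the current observation with the previous result.
        prev = alpha * v + (1.0 - alpha) * prev;
        out.push(prev);
    }
    out
}

fn main() {
    let smoothed = ewma(&[1.0, 2.0, 3.0, 4.0], 0.5);
    println!("{:?}", smoothed); // [1.0, 1.5, 2.25, 3.125]
}
```

In R the same recurrence needs an explicit for loop (or Reduce()), which is where per-iteration interpreter overhead becomes visible.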

C++ and Rust code is used via packages, with similar implementation approaches

Rcpp::evalCpp('NA_INTEGER + 1')
#> [1] -2147483647
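
The value printed above follows from how R stores a missing integer: NA_integer_ is the smallest 32-bit signed integer, and compiled code that ignores this simply does arithmetic on it. A minimal sketch in plain Rust (no R involved); the constant NA_INTEGER and the helper add_na_aware are hypothetical names defined only for this illustration.

```rust
// R represents NA_integer_ as the minimum 32-bit signed integer.
// (Constant defined here for illustration; it is not imported from R.)
const NA_INTEGER: i32 = i32::MIN;

/// Naive addition that treats the sentinel as an ordinary number.
fn na_plus_one() -> i32 {
    NA_INTEGER + 1
}

/// Hypothetical NA-aware addition: propagate the sentinel, and also
/// return NA when the sum overflows, as R does for integer overflow.
fn add_na_aware(x: i32, y: i32) -> i32 {
    if x == NA_INTEGER || y == NA_INTEGER {
        NA_INTEGER
    } else {
        x.checked_add(y).unwrap_or(NA_INTEGER)
    }
}

fn main() {
    println!("{}", na_plus_one()); // -2147483647
    println!("{}", add_na_aware(NA_INTEGER, 1) == NA_INTEGER); // true
    println!("{}", add_na_aware(2, 3)); // 5
}
```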
Rcpp::cppFunction('int add(int x, int y, int z) {
  int sum = x + y + z;
  return sum;
}')
add
#> function (x, y, z) 
#> .Call(<pointer: 0x7f60860b8530>, x, y, z)
add(1,2,3)
#> [1] 6
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
double meanC(NumericVector x) {
  int n = x.size();
  double total = 0;

  for(int i = 0; i < n; ++i) {
    total += x[i];
  }
  return total / n;
}
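
For comparison with meanC, the same computation can be written as a plain Rust function. This is a sketch that runs without R or the extendr wrapper; the name mean_rs is an assumption made for this example.

```rust
/// Arithmetic mean of a slice, written as an explicit loop to mirror meanC.
/// An empty slice gives NaN (0.0 / 0.0), matching the C++ version's behavior.
fn mean_rs(x: &[f64]) -> f64 {
    let n = x.len();
    let mut total = 0.0;

    for &v in x {
        total += v;
    }
    total / n as f64
}

fn main() {
    println!("{}", mean_rs(&[1.0, 2.0, 3.0, 4.0])); // 2.5
}
```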
  • For packages, usethis::use_rcpp() creates the infrastructure for using C++ within an R package
rextendr::rust_function("fn add(a:f64, b:f64) -> f64 { a + b }")
add(2.5, 4.7)
# 7.2
code <- '
#[extendr]
fn main() {
  rprintln!("Hello, world!"); // rprintln! writes to the R console rather than the process stdout
}
'

rextendr::rust_source(
  code = code
)

main()
# Hello, world!
# create a new R package
usethis::create_package("helloextendr")

# Use extendr
rextendr::use_extendr()

# Document and build the package
rextendr::document()

# run hello_world()
hello_world()
#> [1] "Hello, world!"
  • A recent Rust toolchain and a recent R version are required
  • VS Code or Positron with the rust-analyzer extension is strongly recommended

Reading C++ and Rust code is similar to reading R code

signR <- function(x) {
  if (x > 0) {
    1
  } else if (x == 0) {
    0
  } else {
    -1
  }
}
Rcpp::cppFunction('int signC(double x) {
  if (x > 0) {
    return 1;
  } else if (x == 0) {
    return 0;
  } else {
    return -1;
  }
}') |> system.time()
#>    user  system elapsed 
#>   1.908   0.408   1.987
rextendr::rust_function("
fn signRust(x: f64) -> i32 {
    if x > 0.0 {
        1
    } else if x == 0.0 {
        0
    } else {
        -1
    }
}
")
sign
#> function (x)  .Primitive("sign")
signR <- function(x) {
  as.integer(vapply(x,sign,double(1)))
}
Rcpp::cppFunction("
NumericVector signC(NumericVector x) {
  int n = x.size();
  NumericVector out(n);

  for (int i = 0; i < n; i++) {
    if (x[i] > 0.0) {
      out[i] = 1;
    } else if (x[i] == 0.0) {
      out[i] = 0;
    } else {
      out[i] = -1;
    }
  }

  return out;
}") |> system.time()
#>    user  system elapsed 
#>   1.969   0.422   2.063
Rcpp::cppFunction("
NumericVector signCfaster(NumericVector x) {
  int n = x.size();
  NumericVector out(n);

  for (int i = 0; i < n; ++i) {
    double v = x[i];
    out[i] = (v > 0) ? 1.0 : (v == 0 ? 0.0 : -1.0);
  }

  return out;

}") |> system.time()
#>    user  system elapsed 
#>   1.937   0.370   2.062
rextendr::rust_function("
fn signRust(x: Vec<f64>) -> Vec<Rint> {
    x.into_iter()
        .map(|v| {
            if v > 0.0 {
                Rint::from(1)
            } else if v == 0.0 {
                Rint::from(0)
            } else {
                Rint::from(-1)
            }
        })
        .collect()
}
")

code <- '

use rayon::prelude::*;

#[extendr]
fn signRustPar(x: Vec<f64>) -> Vec<Rint> {
    x.into_par_iter()
        .map(|v| {
            if v > 0.0 {
                Rint::from(1)
            } else if v == 0.0 {
                Rint::from(0)
            } else {
                Rint::from(-1)
            }
        })
        .collect()
}
'

rextendr::rust_source(
  code = code,
  dependencies = list(rayon = "1")
)

The C-optimized sign() is fastest, the C++ re-implementations come next, and Rust allocates the least memory

vec <- rnorm(1e7,sd = 50)
bench::mark(
  signR(vec),
  sign(vec),
  signCfaster(vec),
  signC(vec),
  signRustPar(vec),
  signRust(vec),
  relative = TRUE
)
# A tibble: 6 × 13
#   expression         min median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
#   <bch:expr>       <dbl>  <dbl>     <dbl>     <dbl>    <dbl> <int> <dbl>   <bch:tm>
# 1 signR(vec)       98.3   83.6       1         3.00      Inf     1    34      3.84s
# 2 sign(vec)         1      1        58.0       2.00      Inf     8     3    529.3ms
# 3 signCfaster(vec)  1.70   1.86     44.7       2.00      Inf     6     1   514.77ms
# 4 signC(vec)        3.48   2.97     26.5       2.00      Inf     4     1   578.39ms
# 5 signRustPar(vec) 10.1    8.62      9.70      1         NaN     2     0   791.64ms
# 6 signRust(vec)    12.7   13.3       6.29      1         NaN     2     0      1.22s

Rust has more readable error messages than C++

Rcpp::cppFunction("
NumericVector signCfaster(NumericVector x) {
  int n = x.size();
  NumericVector out(n);

  for (int i = 0; i < n; ++i) {
    out[i] = (v > 0) ? 1.0 : (v == 0 ? 0.0 : -1.0);
  }

  return out;

}")
#> using C++ compiler: ‘g++ (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0’
#> g++ -std=gnu++17 -I"/opt/R/4.5.2/lib/R/include" -DNDEBUG   -I"/home/runner/work/_temp/Library/Rcpp/include" -I"/tmp/RtmpqdbWeh/sourceCpp-x86_64-pc-linux-gnu-1.1.1" -I/usr/local/include    -fpic  -g -O2   -c file1de4260813f2.cpp -o file1de4260813f2.o
#> file1de4260813f2.cpp: In function ‘Rcpp::NumericVector signCfaster(Rcpp::NumericVector)’:
#> file1de4260813f2.cpp:12:15: error: ‘v’ was not declared in this scope
#>    12 |     out[i] = (v > 0) ? 1.0 : (v == 0 ? 0.0 : -1.0);
#>       |               ^
#> make: *** [/opt/R/4.5.2/lib/R/etc/Makeconf:211: file1de4260813f2.o] Error 1
#> Error in `sourceCpp()`:
#> ! Error 1 occurred building shared library.
rextendr::rust_function('
fn signRustSlice(x: Robj) -> Robj {
    // Borrow numeric slice from R (no copy)
    let slice = x.as_real_slice().expect("Expected numeric vector");

    let n = slice.len();
    let mut out = Doubles::new(n);

    // Unsafe block requests a direct mutable slice (indexing it below is still bounds-checked)
    let out_slice = unsafe { out.as_mut_slice() };

    for i in 0..n {
        let v = slice[i];
        out_slice[i] = if v.is_nan() {
            f64::NAN
        } else if v > 0.0 {
            1.0
        } else if v == 0.0 {
            0.0
        } else {
            -1.0
        };
    }

    out.into()
}
')
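
The borrow-and-fill pattern used in the function above (read from a borrowed input slice, write into a mutable output slice, propagate NaN) can be tried outside R with plain slices. This is a sketch under that assumption; sign_into is a hypothetical name and no extendr types are involved.

```rust
/// Write the sign of each input element into `out`.
/// `input` is only borrowed (no copy); NaN is propagated so that missing
/// values stay missing instead of silently becoming -1.
fn sign_into(input: &[f64], out: &mut [f64]) {
    assert_eq!(input.len(), out.len(), "input and output lengths must match");

    for (o, &v) in out.iter_mut().zip(input) {
        *o = if v.is_nan() {
            f64::NAN
        } else if v > 0.0 {
            1.0
        } else if v == 0.0 {
            0.0
        } else {
            -1.0
        };
    }
}

fn main() {
    let x = [-2.5, 0.0, 3.0, f64::NAN];
    let mut out = vec![0.0; x.len()];
    sign_into(&x, &mut out);
    println!("{:?}", out);
}
```

Iterating with zip avoids manual indexing, so no unsafe block is needed to skip bounds checks.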

C++ and Rust have optimized and robust standard libraries

  • Rust Standard Library for algorithms and data structures
  • Primitives, modules, macros and keywords included in Rust
  • For example, multithreading with native OS threads, where a move closure transfers ownership of the captured values into the new thread:
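use std::thread;

fn main() {
    let handle = thread::spawn(move || {
        // some work here
    });
    // Wait for the spawned thread to finish before main returns
    handle.join().unwrap();
}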
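
A slightly fuller sketch of the same idea, using only the standard library: each worker thread takes ownership of its own chunk of data through a move closure, and the results are collected by joining the handles. The function name parallel_sum and the chunking scheme are assumptions made for this illustration, not something taken from the original material.

```rust
use std::thread;

/// Sum a vector by splitting it into chunks and summing each chunk
/// on its own OS thread.
fn parallel_sum(data: Vec<f64>, n_threads: usize) -> f64 {
    let n_threads = n_threads.max(1);
    // Ceiling division, with a minimum chunk length of 1 so chunks() never panics.
    let chunk_len = ((data.len() + n_threads - 1) / n_threads).max(1);

    let handles: Vec<thread::JoinHandle<f64>> = data
        .chunks(chunk_len)
        .map(|chunk| {
            // Copy the chunk so the thread owns its data outright.
            let owned: Vec<f64> = chunk.to_vec();
            // `move` transfers ownership of `owned` into the new thread.
            thread::spawn(move || owned.iter().sum::<f64>())
        })
        .collect();

    // join() blocks until each worker finishes and returns its partial sum.
    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .sum()
}

fn main() {
    let data: Vec<f64> = (1..=100u32).map(f64::from).collect();
    println!("{}", parallel_sum(data, 4)); // 5050
}
```

Because every thread owns its chunk, the compiler can verify that no two threads touch the same memory, which is the property the ownership model is meant to guarantee.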

References