Calling C from R

This document provide instruction for calling C from R using two old-fashion interfaces: .C and .Call. Rcpp is a relatively newer mechanism for calling C/C++ from R, which provides easier and cleaner interface. This chapter from Advanced R provides a wonderful and comprehensive description of Rcpp, so I don’t want to retype what they described.

I did some comparison for using Rcpp, .C, and .Call. I found that in general Rcpp has the best performance, .Call is comparable to Rcpp, and .C is the slowest by a rather large margin. This page shows some examples and codes to compare these interfaces.

Prerequisite

Need to have a system supports C compiler (gcc, clang, etc). If you want to have C code within R package, supports for make, perl are necessary.
- Linux system has those by default.
- On Mac OS, a compiler system like Xcode needs to be installed.
- On Windows, needs to install a compiler system such as Cygwin.
Of course, you need to know how to program C.
Other resources :
- Writting R extension, the official R document, covers some information on this topic. But it’s lengthy.
- R’s C interface, a chapter in Advanced R book.
- Debugging C/C++, provided on Bioconductor.
- Debugging an R Package with C++, for debugging. Most contents are the same for debugging C (not C++) with R.

Procedure

To create a C function that can be called from R,

Write the C function.
Compile the C function using R CMD SHLIB.
Call the C function in R, using .C or .Call.

.Call is more flexible but more complicated. We will focus on .C fist. A few points for .C:

It only takes basic R data type (numeric, character, etc.). List is not supported.
From .C call, everything passed in C from R is a pointer (even a scalar), which is the address for the memory storing the input. So on the C side, the input values need to be specified as pointers.

A very simple `Hello world` example

Here is the C code named hello.c. It just prints out a Hello world message. There’s no input/output.

#include <stdio.h>
void hello() {
    printf("Hello world! \n")
}

Compile the C code by doing following outside R in a command window:

R CMD SHLIB hello.c

After this step, two new files will be created: hello.o and hello.so. The .so file is the compiled library that can be loaded in R.

Go into R, do the following

dyn.load("hello.so")
a = .C("hello")

The first line loads in the shared library created from C, and the second line calls the function.

A slightly more complicated one

Now we will write a function to compute the sum of a vector. In this example, C will take inputs from R (still no return value).

Here is the C code named mySum.c. It takes a vector of integers from R, compute the sum, and print out the results.

#include <stdio.h>
void mySum(int *x, int *nx) {
    int i, result=0;
    for(i=0; i<nx[0]; i++)
        result = result + x[i];
    printf ("%d\n", result);
}

Compile the C code by doing following outside R in a command window:

R CMD SHLIB mySum.c

Run the code

dyn.load("mySum.so")
.C("mySum", as.integer(1:5), as.integer(5))

There are a few very important points in this example:

The variables in the C function are all pointers.
When passing in a vector, C doesn’t know the length of the vector. So one needs to specify that. In the example, *nx specifies the length.
In R, the default numeric type is double. On the C side, if the input parameters are integer (int), the inputs on the R side need to be converted into int. That’s why I used as.integer in .C call.
The return object from .C call is a list containing all inputs.

Obtain results from C

Now we want to get some results from the C code. The easiest way from using .C is:

Initialize the result in R.
Pass the result variable (as a pointer) to C.
In C, modify the value of the result variable.

Note there is still no return object from C (so the C function is still void). Here is the C code named mySum2.c. It takes following variables (all pointers) from R:

a vector of integers
length of the input vector
a variable for the result

#include <stdio.h>
void mySum2(int *x, int *nx, int *result) {
    int i;
    for(i=0; i<nx[0]; i++)
        result[0] = result[0] + x[i];
}

Now we compile it, and run the code

dyn.load("mySum2.so")
result = .C("mySum2", as.integer(1:5), as.integer(5), as.integer(0))
result[[3]]

Remember the return object from .C is a list containing all inputs. Here result[[3]], the third input variable, is the sum computed from C. In this example, the value will be 15, which is sum(1:5).

Matrix in C

R matrix is internally a vector, saved by column. For example, a 3x3 matrix is saved as a 9 element vector, where the 4th element is the [1,2] item in the matrix.

The function below (myRowSum.c) computes the row-wise sums of a matrix. Pay attention to the indexing.

#include <stdio.h>
void myRowSum(int *x, int *nrowx, int *ncolx, int *result) {
    int i, j;
    for(i=0; i<nrowx[0]; i++)
        for(j=0; j<ncolx[0]; j++)
            result[i] = result[i] + x[j*nrowx[0]+i];
}

Now we compile it, and run the code

dyn.load("myRowSum.so")
X = matrix(1:6, ncol=2)
result = .C("myRowSum", as.integer(X), as.integer(nrow(X)), 
            as.integer(ncol(X)), as.integer(rep(0,nrow(X))) )
result[[4]] ## this gives 5 7 9

Using .Call

.Call is a more flexible (and more complicated) way to interface C.
From .Call, all R objects are stored in SEXP class, which stands for S-EXPression. Remember from .C, all R objects are pointers.
SEXP can be any objects from R, thus list is supported.
Information for an SEXP object can be extracted on the C side. For example, when we pass a vector into C, we don’t need to provide the length.
Functions to be used with .Call accepts and returns SEXP. It’s important to note that one cannot have a void function for .Call. If we don’t want to return anything, use return R_NilValue.

Using .Call to compute sum of a vector of integers

Below is the C code (mySum_call.c):

#include <stdio.h>
#include <R.h>
#include <Rinternals.h>

SEXP mySum_call(SEXP Rx) {
    int i, nx;
    nx = length(Rx);

    /* allocate and initialize memory for result */
    SEXP result = PROTECT( allocVector(INTSXP, 1) );
    int *result_ptr = INTEGER(result);
    result_ptr[0] = 0;

    /* take out the values from input */
    int *x = INTEGER(Rx);

    /* compute the sum */
    for(i=0; i<nx; i++)
        result_ptr[0] = result_ptr[0] + x[i];

    UNPROTECT(1);
    return(result);
}

Now we compile it, and run the code

dyn.load("mySum_call.so")
result = .Call("mySum_call", as.integer(1:5)) 
result ## this gives 15

A few important points in this example:

R.h and Rinternals.h need to be included.
All inputs/output are of type SEXP.
I use INTEGER function to extract the pointer for the input vector from the input SEXP object.
The length of the vector can be obtain by length, which operates on SEXP.
The result SEXP need to be declared, and memory for the results need to be allocated. In memory allocation, it is recommended to use PROTECT. Then at the end of the C code before return, it needs to be UNPROTECT, so that the memory can be garbage collected.

A little more about SEXP

SEXP is a general type including many subtypes for different R objects. For a complete list for types of SEXP, see https://colinfay.me/r-internals/r-internal-structures.html. Important subtypes include:
- REALSXP: numeric (double precision) vector
- INTSXP: integer vector
- STRSXP: character vector
- VECSXP: list
It’s important to know the R object passed into C, so that one can properly extract content from the object. For example, if the input is a numeric vector but one tries to use INTEGER function to get content, there will be error. For our mySum_call example above, there will be an error message if the input is not an integer vector.

dyn.load("mySum_call.so")
result = .Call("mySum_call", rnorm(5))

## Error in eval(expr, envir, enclos): INTEGER() can only be applied to a 'integer', not a 'double'

Functions to extract content from the object:
- INTEGER: for INTSXP
- REAL: for REALSXP
- Strings and lists are more complicated because they are vectors of SEXPs. I will not cover them here.

`SEXP` attributes

R objects often have some “attributes” (like dimension of matrix, names of a list, etc.). In R, these attributes can be obtained or set using attributes function.

When R objects passed into C are represented as SEXP. Their attributes can be can be obtained or set in C code, using getAttrib and setAttrib functions.

A list of attributes for SEXP is defined in the header file Rinternal.h. Some important ones are:

R_DimSymbol: dim
R_DimNamesSymbol: dimnames
R_NamesSymbol: names
R_RowNamesSymbol: rownames

Debug standalone C code

Here I will introduce how to debug a standalone C code (ones can be loaded into R by dyn.load). Debugging C code within an R package is slightly different, which will be introduced later.

Below I use myRowSum function as example.

The C code needs to be compiled to be debuggable. To do that, compile the C with following command: MAKEFLAGS="CFLAGS=-g" R CMD SHLIB myRowSum.c. Here -g flag means we want to debug the code. (This doesn’t seem to be necessary now. Maybe R make -g default?)
Run R within a GNU debugger.
- On linux, run R -d gdb.
- On newer versions of Mac OSX, gdb isn’t available anymore. But we can use lldb: run R -d lldb.
Once we are in the debugger, start R by doing run. Then we will be in the R console.
Debugging steps.
- Load in the recompiled library with -g flag: dyn.load("myRowSum.so").
- Press Control-C to switch back to lldb environment.
- Set a breakpoint, say, at line 9: b myRowSum.c:9.
- After setting breakpoint, go back to R by continue. Now we come back to R.
- Run the following codes.
```
X = matrix(1:6, ncol=2)
result = .C("myRowSum", as.integer(X), as.integer(nrow(X)), 
          as.integer(ncol(X)), as.integer(rep(0,nrow(X))) )
```
- Once the breakpoint in myRowSum is hit, it will go to lldb environment. We can then do all sorts of debugging, such as run line by line, inspect the value of variables, etc.
For detailed commands for gdb and lldb, see https://lldb.llvm.org/use/map.html.

C code with R package

To have C code in an R package, do following:

Create a directory src, and put in C codes
Add a line useDynLib(pkgname) in the DESCRIPTION file. Replace pkgname by the name of your package The useDynLib command specifies that some C functions will be dynamically.

The package can be built and installed using typical procedure. One can build a “binary” package, which precompiles the C code so that user without a compiler can install. Do following to build binary package:

R CMD INSTALL --build testpkg

Practice: build a package with the following R function and mySum_call.c:

mySum <- function(X) {
    .Call("mySum_call", as.integer(X))
}

Debug C code with R package

Start R with with debugger R -d lldb
Run R in lldb
Load in library
Break back to lldb (Ctl-C), set breakpoint in C code.
Back to R by `continue’, call R function. Once the breakpoint is hit, it will go to lldb.

It’s recommended to set compilation flags in the ~/.R/Makevars file by adding following line. This will make sure the C is compiled as debuggable. I found that this doesn’t seem to be necessary at least on my laptop (Mac).

CXXFLAGS = -g -O0

Calling R functions from C

There are ways to call R from C. This doesn’t seem to be particularly useful though.

Some resources on this:

http://pabercrombie.com/wordpress/2014/05/how-to-call-an-r-function-from-c/
Evaluating R expressions from C, from Writing R Extensions.

Use BLAS/LAPACK functions

BLAS and LAPACK are high-performance libraries for linear algebra computation.
- BLAS (Basic Linear Algebra Subprograms) is a library for basic linear algebra operations, such as matrix transpose and multiplication.
- LAPACK (Linear Algebra Package) is a library for more advanced linear algebra computation, such as matrix inversion, solving system linear equations,linear regression, LU/QR/Cholesky decomposition, etc.
These packages are the foundation for linear algebraic computation in many (if not all) languages (R, Matlab, python, etc.).
We can directly use the functions in these packages to speed up the computation.

Below is an example to multiply two matrices, using the dgemm function in BLAS

Here is the C function matmult.c:

#include <R.h>
#include <Rinternals.h>
#include <R_ext/BLAS.h>

SEXP matmult(SEXP X, SEXP Y) {
    int nrowX, ncolX, ncolY;
    double *x_ptr, *y_ptr, *result_ptr;

    /* extract the "dim" attribute */
    SEXP dim;
    dim = getAttrib( X, R_DimSymbol ) ;
    nrowX = INTEGER(dim)[0];
    ncolX = INTEGER(dim)[1];
    dim = getAttrib( Y, R_DimSymbol ) ;
    ncolY = INTEGER(dim)[1];

    /* allocate memory for the result matrix */
    SEXP result = PROTECT( allocVector(REALSXP, nrowX*ncolY) );

    /* get pointers for the input/output */
    x_ptr = REAL(X);
    y_ptr = REAL(Y);
    result_ptr = REAL(result);

    /* call dgemm function to multiple two matrices */
    double one = 1.0;
    double zero = 0.0;
    F77_CALL(dgemm)("n", "n", &nrowX, &ncolY, &ncolX, &one, x_ptr, &nrowX, y_ptr, &ncolX,
                    &zero, result_ptr, &nrowX);

    /* manipulate the output to make it a matrix */
    dim = PROTECT( allocVector(INTSXP, 2) );
    INTEGER(dim)[0] = nrowX;
    INTEGER(dim)[1] = ncolY;
    setAttrib( result, R_DimSymbol, dim ) ;

    UNPROTECT(2);
    return(result);
}

Here is the R code for testing the function

dyn.load("matmult.so")
X = matrix(rnorm(10), nrow=5)
Y = matrix(rnorm(8), nrow=2)
result = .Call("matmult", X, Y)
X %*% Y ## gives the same result

Calling C from R

Prerequisite

Procedure

A very simple Hello world example

A slightly more complicated one

Obtain results from C

Matrix in C

Using .Call

Using .Call to compute sum of a vector of integers

A little more about SEXP

SEXP attributes

Debug standalone C code

C code with R package

Debug C code with R package

Calling R functions from C

Use BLAS/LAPACK functions

A very simple `Hello world` example

`SEXP` attributes