This document provide instruction for calling C from R using two old-fashion interfaces: .C and .Call. Rcpp is a relatively newer mechanism for calling C/C++ from R, which provides easier and cleaner interface. This chapter from Advanced R provides a wonderful and comprehensive description of Rcpp, so I don’t want to retype what they described.
I did some comparison for using Rcpp, .C, and .Call. I found that in general Rcpp has the best performance, .Call is comparable to Rcpp, and .C is the slowest by a rather large margin. This page shows some examples and codes to compare these interfaces.
gcc, clang, etc). If you want to have C code within R package, supports for make, perl are necessary.
Of course, you need to know how to program C.
To create a C function that can be called from R,
R CMD SHLIB..C or .Call..Call is more flexible but more complicated. We will focus on .C fist. A few points for .C:
It only takes basic R data type (numeric, character, etc.). List is not supported.
From .C call, everything passed in C from R is a pointer (even a scalar), which is the address for the memory storing the input. So on the C side, the input values need to be specified as pointers.
Hello world exampleHello world message. There’s no input/output.#include <stdio.h>
void hello() {
printf("Hello world! \n")
}
R CMD SHLIB hello.c
After this step, two new files will be created: hello.o and hello.so. The .so file is the compiled library that can be loaded in R.
dyn.load("hello.so")
a = .C("hello")
The first line loads in the shared library created from C, and the second line calls the function.
Now we will write a function to compute the sum of a vector. In this example, C will take inputs from R (still no return value).
mySum.c. It takes a vector of integers from R, compute the sum, and print out the results.#include <stdio.h>
void mySum(int *x, int *nx) {
int i, result=0;
for(i=0; i<nx[0]; i++)
result = result + x[i];
printf ("%d\n", result);
}
R CMD SHLIB mySum.c
dyn.load("mySum.so")
.C("mySum", as.integer(1:5), as.integer(5))
There are a few very important points in this example:
The variables in the C function are all pointers.
When passing in a vector, C doesn’t know the length of the vector. So one needs to specify that. In the example, *nx specifies the length.
In R, the default numeric type is double. On the C side, if the input parameters are integer (int), the inputs on the R side need to be converted into int. That’s why I used as.integer in .C call.
The return object from .C call is a list containing all inputs.
Now we want to get some results from the C code. The easiest way from using .C is:
Note there is still no return object from C (so the C function is still void). Here is the C code named mySum2.c. It takes following variables (all pointers) from R:
#include <stdio.h>
void mySum2(int *x, int *nx, int *result) {
int i;
for(i=0; i<nx[0]; i++)
result[0] = result[0] + x[i];
}
Now we compile it, and run the code
dyn.load("mySum2.so")
result = .C("mySum2", as.integer(1:5), as.integer(5), as.integer(0))
result[[3]]
Remember the return object from .C is a list containing all inputs. Here result[[3]], the third input variable, is the sum computed from C. In this example, the value will be 15, which is sum(1:5).
R matrix is internally a vector, saved by column. For example, a 3x3 matrix is saved as a 9 element vector, where the 4th element is the [1,2] item in the matrix.
The function below (myRowSum.c) computes the row-wise sums of a matrix. Pay attention to the indexing.
#include <stdio.h>
void myRowSum(int *x, int *nrowx, int *ncolx, int *result) {
int i, j;
for(i=0; i<nrowx[0]; i++)
for(j=0; j<ncolx[0]; j++)
result[i] = result[i] + x[j*nrowx[0]+i];
}
Now we compile it, and run the code
dyn.load("myRowSum.so")
X = matrix(1:6, ncol=2)
result = .C("myRowSum", as.integer(X), as.integer(nrow(X)),
as.integer(ncol(X)), as.integer(rep(0,nrow(X))) )
result[[4]] ## this gives 5 7 9
.Call is a more flexible (and more complicated) way to interface C.
From .Call, all R objects are stored in SEXP class, which stands for S-EXPression. Remember from .C, all R objects are pointers.
SEXP can be any objects from R, thus list is supported.
Information for an SEXP object can be extracted on the C side. For example, when we pass a vector into C, we don’t need to provide the length.
Functions to be used with .Call accepts and returns SEXP. It’s important to note that one cannot have a void function for .Call. If we don’t want to return anything, use return R_NilValue.
Below is the C code (mySum_call.c):
#include <stdio.h>
#include <R.h>
#include <Rinternals.h>
SEXP mySum_call(SEXP Rx) {
int i, nx;
nx = length(Rx);
/* allocate and initialize memory for result */
SEXP result = PROTECT( allocVector(INTSXP, 1) );
int *result_ptr = INTEGER(result);
result_ptr[0] = 0;
/* take out the values from input */
int *x = INTEGER(Rx);
/* compute the sum */
for(i=0; i<nx; i++)
result_ptr[0] = result_ptr[0] + x[i];
UNPROTECT(1);
return(result);
}
Now we compile it, and run the code
dyn.load("mySum_call.so")
result = .Call("mySum_call", as.integer(1:5))
result ## this gives 15
A few important points in this example:
R.h and Rinternals.h need to be included.INTEGER function to extract the pointer for the input vector from the input SEXP object.length, which operates on SEXP.PROTECT. Then at the end of the C code before return, it needs to be UNPROTECT, so that the memory can be garbage collected.SEXP is a general type including many subtypes for different R objects. For a complete list for types of SEXP, see https://colinfay.me/r-internals/r-internal-structures.html. Important subtypes include:
REALSXP: numeric (double precision) vectorINTSXP: integer vectorSTRSXP: character vectorVECSXP: listINTEGER function to get content, there will be error. For our mySum_call example above, there will be an error message if the input is not an integer vector.dyn.load("mySum_call.so")
result = .Call("mySum_call", rnorm(5))
## Error in eval(expr, envir, enclos): INTEGER() can only be applied to a 'integer', not a 'double'
INTEGER: for INTSXPREAL: for REALSXPSEXP attributesR objects often have some “attributes” (like dimension of matrix, names of a list, etc.). In R, these attributes can be obtained or set using attributes function.
When R objects passed into C are represented as SEXP. Their attributes can be can be obtained or set in C code, using getAttrib and setAttrib functions.
A list of attributes for SEXP is defined in the header file Rinternal.h. Some important ones are:
R_DimSymbol: dimR_DimNamesSymbol: dimnamesR_NamesSymbol: namesR_RowNamesSymbol: rownamesHere I will introduce how to debug a standalone C code (ones can be loaded into R by dyn.load). Debugging C code within an R package is slightly different, which will be introduced later.
Below I use myRowSum function as example.
The C code needs to be compiled to be debuggable. To do that, compile the C with following command: MAKEFLAGS="CFLAGS=-g" R CMD SHLIB myRowSum.c. Here -g flag means we want to debug the code. (This doesn’t seem to be necessary now. Maybe R make -g default?)
R -d gdb.gdb isn’t available anymore. But we can use lldb: run R -d lldb.Once we are in the debugger, start R by doing run. Then we will be in the R console.
-g flag: dyn.load("myRowSum.so").Control-C to switch back to lldb environment.b myRowSum.c:9.continue. Now we come back to R.X = matrix(1:6, ncol=2)
result = .C("myRowSum", as.integer(X), as.integer(nrow(X)),
as.integer(ncol(X)), as.integer(rep(0,nrow(X))) )
myRowSum is hit, it will go to lldb environment. We can then do all sorts of debugging, such as run line by line, inspect the value of variables, etc.For detailed commands for gdb and lldb, see https://lldb.llvm.org/use/map.html.
To have C code in an R package, do following:
src, and put in C codesuseDynLib(pkgname) in the DESCRIPTION file. Replace pkgname by the name of your package The useDynLib command specifies that some C functions will be dynamically.The package can be built and installed using typical procedure. One can build a “binary” package, which precompiles the C code so that user without a compiler can install. Do following to build binary package:
R CMD INSTALL --build testpkg
Practice: build a package with the following R function and mySum_call.c:
mySum <- function(X) {
.Call("mySum_call", as.integer(X))
}
R -d lldblldblldb (Ctl-C), set breakpoint in C code.It’s recommended to set compilation flags in the ~/.R/Makevars file by adding following line. This will make sure the C is compiled as debuggable. I found that this doesn’t seem to be necessary at least on my laptop (Mac).
CXXFLAGS = -g -O0
There are ways to call R from C. This doesn’t seem to be particularly useful though.
Some resources on this:
http://pabercrombie.com/wordpress/2014/05/how-to-call-an-r-function-from-c/
Evaluating R expressions from C, from Writing R Extensions.
These packages are the foundation for linear algebraic computation in many (if not all) languages (R, Matlab, python, etc.).
We can directly use the functions in these packages to speed up the computation.
Below is an example to multiply two matrices, using the dgemm function in BLAS
Here is the C function matmult.c:
#include <R.h>
#include <Rinternals.h>
#include <R_ext/BLAS.h>
SEXP matmult(SEXP X, SEXP Y) {
int nrowX, ncolX, ncolY;
double *x_ptr, *y_ptr, *result_ptr;
/* extract the "dim" attribute */
SEXP dim;
dim = getAttrib( X, R_DimSymbol ) ;
nrowX = INTEGER(dim)[0];
ncolX = INTEGER(dim)[1];
dim = getAttrib( Y, R_DimSymbol ) ;
ncolY = INTEGER(dim)[1];
/* allocate memory for the result matrix */
SEXP result = PROTECT( allocVector(REALSXP, nrowX*ncolY) );
/* get pointers for the input/output */
x_ptr = REAL(X);
y_ptr = REAL(Y);
result_ptr = REAL(result);
/* call dgemm function to multiple two matrices */
double one = 1.0;
double zero = 0.0;
F77_CALL(dgemm)("n", "n", &nrowX, &ncolY, &ncolX, &one, x_ptr, &nrowX, y_ptr, &ncolX,
&zero, result_ptr, &nrowX);
/* manipulate the output to make it a matrix */
dim = PROTECT( allocVector(INTSXP, 2) );
INTEGER(dim)[0] = nrowX;
INTEGER(dim)[1] = ncolY;
setAttrib( result, R_DimSymbol, dim ) ;
UNPROTECT(2);
return(result);
}
Here is the R code for testing the function
dyn.load("matmult.so")
X = matrix(rnorm(10), nrow=5)
Y = matrix(rnorm(8), nrow=2)
result = .Call("matmult", X, Y)
X %*% Y ## gives the same result