This document provide instruction for calling C from R using two old-fashion interfaces: .C
and .Call
. Rcpp
is a relatively newer mechanism for calling C/C++ from R, which provides easier and cleaner interface. This chapter from Advanced R provides a wonderful and comprehensive description of Rcpp, so I don’t want to retype what they described.
I did some comparison for using Rcpp
, .C
, and .Call
. I found that in general Rcpp
has the best performance, .Call
is comparable to Rcpp
, and .C
is the slowest by a rather large margin. This page shows some examples and codes to compare these interfaces.
gcc
, clang
, etc). If you want to have C code within R package, supports for make
, perl
are necessary.
Of course, you need to know how to program C.
To create a C function that can be called from R,
R CMD SHLIB
..C
or .Call
..Call
is more flexible but more complicated. We will focus on .C
fist. A few points for .C
:
It only takes basic R data type (numeric, character, etc.). List is not supported.
From .C
call, everything passed in C from R is a pointer (even a scalar), which is the address for the memory storing the input. So on the C side, the input values need to be specified as pointers.
Hello world
exampleHello world
message. There’s no input/output.#include <stdio.h>
void hello() {
printf("Hello world! \n")
}
R CMD SHLIB hello.c
After this step, two new files will be created: hello.o
and hello.so
. The .so
file is the compiled library that can be loaded in R.
dyn.load("hello.so")
a = .C("hello")
The first line loads in the shared library created from C, and the second line calls the function.
Now we will write a function to compute the sum of a vector. In this example, C will take inputs from R (still no return value).
mySum.c
. It takes a vector of integers from R, compute the sum, and print out the results.#include <stdio.h>
void mySum(int *x, int *nx) {
int i, result=0;
for(i=0; i<nx[0]; i++)
result = result + x[i];
printf ("%d\n", result);
}
R CMD SHLIB mySum.c
dyn.load("mySum.so")
.C("mySum", as.integer(1:5), as.integer(5))
There are a few very important points in this example:
The variables in the C function are all pointers.
When passing in a vector, C doesn’t know the length of the vector. So one needs to specify that. In the example, *nx
specifies the length.
In R, the default numeric type is double
. On the C side, if the input parameters are integer (int
), the inputs on the R side need to be converted into int. That’s why I used as.integer
in .C
call.
The return object from .C
call is a list containing all inputs.
Now we want to get some results from the C code. The easiest way from using .C
is:
Note there is still no return object from C (so the C function is still void
). Here is the C code named mySum2.c
. It takes following variables (all pointers) from R:
#include <stdio.h>
void mySum2(int *x, int *nx, int *result) {
int i;
for(i=0; i<nx[0]; i++)
result[0] = result[0] + x[i];
}
Now we compile it, and run the code
dyn.load("mySum2.so")
result = .C("mySum2", as.integer(1:5), as.integer(5), as.integer(0))
result[[3]]
Remember the return object from .C
is a list containing all inputs. Here result[[3]]
, the third input variable, is the sum computed from C. In this example, the value will be 15, which is sum(1:5)
.
R matrix is internally a vector, saved by column. For example, a 3x3 matrix is saved as a 9 element vector, where the 4th element is the [1,2] item in the matrix.
The function below (myRowSum.c) computes the row-wise sums of a matrix. Pay attention to the indexing.
#include <stdio.h>
void myRowSum(int *x, int *nrowx, int *ncolx, int *result) {
int i, j;
for(i=0; i<nrowx[0]; i++)
for(j=0; j<ncolx[0]; j++)
result[i] = result[i] + x[j*nrowx[0]+i];
}
Now we compile it, and run the code
dyn.load("myRowSum.so")
X = matrix(1:6, ncol=2)
result = .C("myRowSum", as.integer(X), as.integer(nrow(X)),
as.integer(ncol(X)), as.integer(rep(0,nrow(X))) )
result[[4]] ## this gives 5 7 9
.Call
is a more flexible (and more complicated) way to interface C.
From .Call
, all R objects are stored in SEXP
class, which stands for S-EXPression
. Remember from .C
, all R objects are pointers.
SEXP
can be any objects from R, thus list is supported.
Information for an SEXP
object can be extracted on the C side. For example, when we pass a vector into C, we don’t need to provide the length.
Functions to be used with .Call
accepts and returns SEXP. It’s important to note that one cannot have a void
function for .Call
. If we don’t want to return anything, use return R_NilValue
.
Below is the C code (mySum_call.c):
#include <stdio.h>
#include <R.h>
#include <Rinternals.h>
SEXP mySum_call(SEXP Rx) {
int i, nx;
nx = length(Rx);
/* allocate and initialize memory for result */
SEXP result = PROTECT( allocVector(INTSXP, 1) );
int *result_ptr = INTEGER(result);
result_ptr[0] = 0;
/* take out the values from input */
int *x = INTEGER(Rx);
/* compute the sum */
for(i=0; i<nx; i++)
result_ptr[0] = result_ptr[0] + x[i];
UNPROTECT(1);
return(result);
}
Now we compile it, and run the code
dyn.load("mySum_call.so")
result = .Call("mySum_call", as.integer(1:5))
result ## this gives 15
A few important points in this example:
R.h
and Rinternals.h
need to be included.INTEGER
function to extract the pointer for the input vector from the input SEXP object.length
, which operates on SEXP.PROTECT
. Then at the end of the C code before return, it needs to be UNPROTECT
, so that the memory can be garbage collected.SEXP
is a general type including many subtypes for different R objects. For a complete list for types of SEXP, see https://colinfay.me/r-internals/r-internal-structures.html. Important subtypes include:
REALSXP
: numeric (double precision) vectorINTSXP
: integer vectorSTRSXP
: character vectorVECSXP
: listINTEGER
function to get content, there will be error. For our mySum_call
example above, there will be an error message if the input is not an integer vector.dyn.load("mySum_call.so")
result = .Call("mySum_call", rnorm(5))
## Error in eval(expr, envir, enclos): INTEGER() can only be applied to a 'integer', not a 'double'
INTEGER
: for INTSXP
REAL
: for REALSXP
SEXP
attributesR objects often have some “attributes” (like dimension of matrix, names of a list, etc.). In R, these attributes can be obtained or set using attributes
function.
When R objects passed into C are represented as SEXP
. Their attributes can be can be obtained or set in C code, using getAttrib
and setAttrib
functions.
A list of attributes for SEXP
is defined in the header file Rinternal.h. Some important ones are:
R_DimSymbol
: dimR_DimNamesSymbol
: dimnamesR_NamesSymbol
: namesR_RowNamesSymbol
: rownamesHere I will introduce how to debug a standalone C code (ones can be loaded into R by dyn.load
). Debugging C code within an R package is slightly different, which will be introduced later.
Below I use myRowSum
function as example.
The C code needs to be compiled to be debuggable. To do that, compile the C with following command: MAKEFLAGS="CFLAGS=-g" R CMD SHLIB myRowSum.c
. Here -g
flag means we want to debug the code. (This doesn’t seem to be necessary now. Maybe R make -g default?)
R -d gdb
.gdb
isn’t available anymore. But we can use lldb
: run R -d lldb
.Once we are in the debugger, start R by doing run
. Then we will be in the R console.
-g
flag: dyn.load("myRowSum.so")
.Control-C
to switch back to lldb
environment.b myRowSum.c:9
.continue
. Now we come back to R.X = matrix(1:6, ncol=2)
result = .C("myRowSum", as.integer(X), as.integer(nrow(X)),
as.integer(ncol(X)), as.integer(rep(0,nrow(X))) )
myRowSum
is hit, it will go to lldb
environment. We can then do all sorts of debugging, such as run line by line, inspect the value of variables, etc.For detailed commands for gdb
and lldb
, see https://lldb.llvm.org/use/map.html.
To have C code in an R package, do following:
src
, and put in C codesuseDynLib(pkgname)
in the DESCRIPTION
file. Replace pkgname
by the name of your package The useDynLib
command specifies that some C functions will be dynamically.The package can be built and installed using typical procedure. One can build a “binary” package, which precompiles the C code so that user without a compiler can install. Do following to build binary package:
R CMD INSTALL --build testpkg
Practice: build a package with the following R function and mySum_call.c
:
mySum <- function(X) {
.Call("mySum_call", as.integer(X))
}
R -d lldb
lldb
lldb
(Ctl-C), set breakpoint in C code.It’s recommended to set compilation flags in the ~/.R/Makevars
file by adding following line. This will make sure the C is compiled as debuggable. I found that this doesn’t seem to be necessary at least on my laptop (Mac).
CXXFLAGS = -g -O0
There are ways to call R from C. This doesn’t seem to be particularly useful though.
Some resources on this:
http://pabercrombie.com/wordpress/2014/05/how-to-call-an-r-function-from-c/
Evaluating R expressions from C, from Writing R Extensions
.
These packages are the foundation for linear algebraic computation in many (if not all) languages (R, Matlab, python, etc.).
We can directly use the functions in these packages to speed up the computation.
Below is an example to multiply two matrices, using the dgemm
function in BLAS
Here is the C function matmult.c:
#include <R.h>
#include <Rinternals.h>
#include <R_ext/BLAS.h>
SEXP matmult(SEXP X, SEXP Y) {
int nrowX, ncolX, ncolY;
double *x_ptr, *y_ptr, *result_ptr;
/* extract the "dim" attribute */
SEXP dim;
dim = getAttrib( X, R_DimSymbol ) ;
nrowX = INTEGER(dim)[0];
ncolX = INTEGER(dim)[1];
dim = getAttrib( Y, R_DimSymbol ) ;
ncolY = INTEGER(dim)[1];
/* allocate memory for the result matrix */
SEXP result = PROTECT( allocVector(REALSXP, nrowX*ncolY) );
/* get pointers for the input/output */
x_ptr = REAL(X);
y_ptr = REAL(Y);
result_ptr = REAL(result);
/* call dgemm function to multiple two matrices */
double one = 1.0;
double zero = 0.0;
F77_CALL(dgemm)("n", "n", &nrowX, &ncolY, &ncolX, &one, x_ptr, &nrowX, y_ptr, &ncolX,
&zero, result_ptr, &nrowX);
/* manipulate the output to make it a matrix */
dim = PROTECT( allocVector(INTSXP, 2) );
INTEGER(dim)[0] = nrowX;
INTEGER(dim)[1] = ncolY;
setAttrib( result, R_DimSymbol, dim ) ;
UNPROTECT(2);
return(result);
}
Here is the R code for testing the function
dyn.load("matmult.so")
X = matrix(rnorm(10), nrow=5)
Y = matrix(rnorm(8), nrow=2)
result = .Call("matmult", X, Y)
X %*% Y ## gives the same result