This document provide instruction for calling C from R using two old-fashion interfaces: .C and .Call. Rcpp is a relatively newer mechanism for calling C/C++ from R, which provides easier and cleaner interface. This chapter from Advanced R provides a wonderful and comprehensive description of Rcpp, so I don’t want to retype what they described.

I did some comparison for using Rcpp, .C, and .Call. I found that in general Rcpp has the best performance, .Call is comparable to Rcpp, and .C is the slowest by a rather large margin. This page shows some examples and codes to compare these interfaces.


Prerequisite


Procedure

To create a C function that can be called from R,

  1. Write the C function.
  2. Compile the C function using R CMD SHLIB.
  3. Call the C function in R, using .C or .Call.

.Call is more flexible but more complicated. We will focus on .C fist. A few points for .C:

  1. It only takes basic R data type (numeric, character, etc.). List is not supported.

  2. From .C call, everything passed in C from R is a pointer (even a scalar), which is the address for the memory storing the input. So on the C side, the input values need to be specified as pointers.


A very simple Hello world example

  1. Here is the C code named hello.c. It just prints out a Hello world message. There’s no input/output.
#include <stdio.h>
void hello() {
    printf("Hello world! \n")
}
  1. Compile the C code by doing following outside R in a command window:
R CMD SHLIB hello.c

After this step, two new files will be created: hello.o and hello.so. The .so file is the compiled library that can be loaded in R.

  1. Go into R, do the following
dyn.load("hello.so")
a = .C("hello")

The first line loads in the shared library created from C, and the second line calls the function.


A slightly more complicated one

Now we will write a function to compute the sum of a vector. In this example, C will take inputs from R (still no return value).

  1. Here is the C code named mySum.c. It takes a vector of integers from R, compute the sum, and print out the results.
#include <stdio.h>
void mySum(int *x, int *nx) {
    int i, result=0;
    for(i=0; i<nx[0]; i++)
        result = result + x[i];
    printf ("%d\n", result);
}
  1. Compile the C code by doing following outside R in a command window:
R CMD SHLIB mySum.c
  1. Run the code
dyn.load("mySum.so")
.C("mySum", as.integer(1:5), as.integer(5))

There are a few very important points in this example:

  1. The variables in the C function are all pointers.

  2. When passing in a vector, C doesn’t know the length of the vector. So one needs to specify that. In the example, *nx specifies the length.

  3. In R, the default numeric type is double. On the C side, if the input parameters are integer (int), the inputs on the R side need to be converted into int. That’s why I used as.integer in .C call.

  4. The return object from .C call is a list containing all inputs.


Obtain results from C

Now we want to get some results from the C code. The easiest way from using .C is:

  1. Initialize the result in R.
  2. Pass the result variable (as a pointer) to C.
  3. In C, modify the value of the result variable.

Note there is still no return object from C (so the C function is still void). Here is the C code named mySum2.c. It takes following variables (all pointers) from R:

#include <stdio.h>
void mySum2(int *x, int *nx, int *result) {
    int i;
    for(i=0; i<nx[0]; i++)
        result[0] = result[0] + x[i];
}

Now we compile it, and run the code

dyn.load("mySum2.so")
result = .C("mySum2", as.integer(1:5), as.integer(5), as.integer(0))
result[[3]]

Remember the return object from .C is a list containing all inputs. Here result[[3]], the third input variable, is the sum computed from C. In this example, the value will be 15, which is sum(1:5).


Matrix in C

R matrix is internally a vector, saved by column. For example, a 3x3 matrix is saved as a 9 element vector, where the 4th element is the [1,2] item in the matrix.

The function below (myRowSum.c) computes the row-wise sums of a matrix. Pay attention to the indexing.

#include <stdio.h>
void myRowSum(int *x, int *nrowx, int *ncolx, int *result) {
    int i, j;
    for(i=0; i<nrowx[0]; i++)
        for(j=0; j<ncolx[0]; j++)
            result[i] = result[i] + x[j*nrowx[0]+i];
}

Now we compile it, and run the code

dyn.load("myRowSum.so")
X = matrix(1:6, ncol=2)
result = .C("myRowSum", as.integer(X), as.integer(nrow(X)), 
            as.integer(ncol(X)), as.integer(rep(0,nrow(X))) )
result[[4]] ## this gives 5 7 9

Using .Call


Using .Call to compute sum of a vector of integers

Below is the C code (mySum_call.c):

#include <stdio.h>
#include <R.h>
#include <Rinternals.h>

SEXP mySum_call(SEXP Rx) {
    int i, nx;
    nx = length(Rx);

    /* allocate and initialize memory for result */
    SEXP result = PROTECT( allocVector(INTSXP, 1) );
    int *result_ptr = INTEGER(result);
    result_ptr[0] = 0;

    /* take out the values from input */
    int *x = INTEGER(Rx);

    /* compute the sum */
    for(i=0; i<nx; i++)
        result_ptr[0] = result_ptr[0] + x[i];

    UNPROTECT(1);
    return(result);
}

Now we compile it, and run the code

dyn.load("mySum_call.so")
result = .Call("mySum_call", as.integer(1:5)) 
result ## this gives 15

A few important points in this example:

  1. R.h and Rinternals.h need to be included.
  2. All inputs/output are of type SEXP.
  3. I use INTEGER function to extract the pointer for the input vector from the input SEXP object.
  4. The length of the vector can be obtain by length, which operates on SEXP.
  5. The result SEXP need to be declared, and memory for the results need to be allocated. In memory allocation, it is recommended to use PROTECT. Then at the end of the C code before return, it needs to be UNPROTECT, so that the memory can be garbage collected.

A little more about SEXP

dyn.load("mySum_call.so")
result = .Call("mySum_call", rnorm(5))
## Error in eval(expr, envir, enclos): INTEGER() can only be applied to a 'integer', not a 'double'

SEXP attributes

R objects often have some “attributes” (like dimension of matrix, names of a list, etc.). In R, these attributes can be obtained or set using attributes function.

When R objects passed into C are represented as SEXP. Their attributes can be can be obtained or set in C code, using getAttrib and setAttrib functions.

A list of attributes for SEXP is defined in the header file Rinternal.h. Some important ones are:


Debug standalone C code

Here I will introduce how to debug a standalone C code (ones can be loaded into R by dyn.load). Debugging C code within an R package is slightly different, which will be introduced later.

Below I use myRowSum function as example.


C code with R package

To have C code in an R package, do following:

  1. Create a directory src, and put in C codes
  2. Add a line useDynLib(pkgname) in the DESCRIPTION file. Replace pkgname by the name of your package The useDynLib command specifies that some C functions will be dynamically.

The package can be built and installed using typical procedure. One can build a “binary” package, which precompiles the C code so that user without a compiler can install. Do following to build binary package:

R CMD INSTALL --build testpkg

Practice: build a package with the following R function and mySum_call.c:

mySum <- function(X) {
    .Call("mySum_call", as.integer(X))
}

Debug C code with R package

  1. Start R with with debugger R -d lldb
  2. Run R in lldb
  3. Load in library
  4. Break back to lldb (Ctl-C), set breakpoint in C code.
  5. Back to R by `continue’, call R function. Once the breakpoint is hit, it will go to lldb.

It’s recommended to set compilation flags in the ~/.R/Makevars file by adding following line. This will make sure the C is compiled as debuggable. I found that this doesn’t seem to be necessary at least on my laptop (Mac).

CXXFLAGS = -g -O0

Calling R functions from C

There are ways to call R from C. This doesn’t seem to be particularly useful though.

Some resources on this:


Use BLAS/LAPACK functions

Below is an example to multiply two matrices, using the dgemm function in BLAS

Here is the C function matmult.c:

#include <R.h>
#include <Rinternals.h>
#include <R_ext/BLAS.h>

SEXP matmult(SEXP X, SEXP Y) {
    int nrowX, ncolX, ncolY;
    double *x_ptr, *y_ptr, *result_ptr;

    /* extract the "dim" attribute */
    SEXP dim;
    dim = getAttrib( X, R_DimSymbol ) ;
    nrowX = INTEGER(dim)[0];
    ncolX = INTEGER(dim)[1];
    dim = getAttrib( Y, R_DimSymbol ) ;
    ncolY = INTEGER(dim)[1];

    /* allocate memory for the result matrix */
    SEXP result = PROTECT( allocVector(REALSXP, nrowX*ncolY) );

    /* get pointers for the input/output */
    x_ptr = REAL(X);
    y_ptr = REAL(Y);
    result_ptr = REAL(result);

    /* call dgemm function to multiple two matrices */
    double one = 1.0;
    double zero = 0.0;
    F77_CALL(dgemm)("n", "n", &nrowX, &ncolY, &ncolX, &one, x_ptr, &nrowX, y_ptr, &ncolX,
                    &zero, result_ptr, &nrowX);

    /* manipulate the output to make it a matrix */
    dim = PROTECT( allocVector(INTSXP, 2) );
    INTEGER(dim)[0] = nrowX;
    INTEGER(dim)[1] = ncolY;
    setAttrib( result, R_DimSymbol, dim ) ;

    UNPROTECT(2);
    return(result);
}

Here is the R code for testing the function

dyn.load("matmult.so")
X = matrix(rnorm(10), nrow=5)
Y = matrix(rnorm(8), nrow=2)
result = .Call("matmult", X, Y)
X %*% Y ## gives the same result