Tuesday, February 24, 2015

RCall: Running an embedded R in Julia

I have used R (and S before it) for a couple of decades. In the last few years most of my coding has been in Julia, a language for technical computing that can provide remarkable performance for a dynamically typed language via Just-In-Time (JIT) compilation of functions and via multiple dispatch.

Nonetheless there are facilities in R that I would like to have access to from Julia. I created the RCall package for Julia to do exactly that. This IJulia notebook provides an introduction to RCall.

This is not a novel idea by any means. Julia already has PyCall and JavaCall packages that provide access to Python and to Java. These packages are used extensively and are much more sophisticated than RCall, at present. Many other languages have facilities to run an embedded instance of R. In fact, Python has several such interfaces.

The things I plan to do using RCall is to access datasets from R and R packages, to fit models that are not currently implemented in Julia and to use R graphics, especially the ggplot2 and lattice packages. Unfortunately I am not currently able to start a graphics device from the embedded R but I expect that to be fixed soon.

I can tell you the most remarkable aspect of RCall although it may not mean much if you haven't tried to do this kind of thing. It is written entirely in Julia. There is absolutely no "glue" code written in a compiled language like C or C++. As I said, this may not mean much to you unless you have tried to do something like this, in which case it is astonishing.

Monday, May 21, 2012

The Simple Gibbs example in Julia

The Gibbs sampler discussed on Darren Wilkinson's blog and also on Dirk Eddelbuettel's blog has been implemented in several languages, the first of which was R.

In preparation for a session at useR!2012 on "What other languages should R users know about?", Dirk, Chris Fonnesbeck and I have considered implementations of this simple sampler in other languages.  I describe a Julia implementation below.  Full details on all of the implementations are available at Chris's github repository.

The task is to create a Gibbs sampler for the unscaled density
 f(x,y) = x x^2 \exp(-xy^2 - y^2 + 2y - 4x)
using the conditional distributions
 x|y \sim Gamma(3, y^2 +4)
 y|x \sim Normal(\frac{1}{1+x}, \frac{1}{2(1+x)})
Dirk's version of Darren's original R function is
Rgibbs <- function(N,thin) {
    mat <- matrix(0,ncol=2,nrow=N)
    x <- 0
    y <- 0
    for (i in 1:N) {
        for (j in 1:thin) {
            x <- rgamma(1,3,y*y+4)
            y <- rnorm(1,1/(x+1),1/sqrt(2*(x+1)))
        mat[i,] <- c(x,y)
Dirk also shows the use of the R byte compiler on this function
RCgibbs <- cmpfun(Rgibbs)
In the examples directory of the Rcpp package, Dirk provides an R script using the inline, Rcpp and RcppGSL packages to implement this sampler in C++ code callable from R and time the results. On my desktop computer, timing 10 replications of Rgibbs(20000, 200) and the other versions produces
               test replications elapsed  relative user.self sys.self
4  GSLGibbs(N, thn)           10   8.338  1.000000     8.336    0.000
3 RcppGibbs(N, thn)           10  13.285  1.593308    13.285    0.000
2   RCgibbs(N, thn)           10 369.843 44.356320   369.327    0.032
1    Rgibbs(N, thn)           10 473.511 56.789518   472.754    0.044
A naive translation of Rgibbs to Julia can use the same samplers for the gamma and normal distributions as does R. The C code for R's d-p-q-r functions for probability densities, cumulative distribution, quantile and random sampling can be compiled into a separate Rmath library. These sources are included with the Julia sources and Julia functions with similar calling sequences are available as "extras/Rmath.jl"

function JGibbs1(N::Int, thin::Int)
    mat = Array(Float64, (N, 2))
    x   = 0.
    y   = 0.
    for i = 1:N
        for j = 1:thin
            x = rgamma(1,3,(y*y + 4))[1]
            y = rnorm(1, 1/(x+1),1/sqrt(2(x + 1)))[1]
        mat[i,:] = [x,y]
You can see that JGibbs1 is essentially the same code as Rgibbs with minor adjustments for syntax. A similar timing on the same computer gives
julia> sum([@elapsed JGibbs1(20000, 200) for i=1:10])
julia> sum([@elapsed JGibbs1(20000, 200) for i=1:10])
which is 17 times faster than Rgibbs and 13 times faster than RCgibbs. It's actually within a factor of 2 of the compiled code in RcppGibbs.
One of the big differences between this function and the compiled C++ function, RcppGibbs, is that the compiled function calls the underlying C code for the samplers directly, avoiding the overhead of creating a vector of length 1 and indexing to get the first element. As these operations are done in the inner loop of JGibbs1 their overhead mounts up.
Fortunately, Julia allows for calling a C function directly. You need the symbol from the dynamically loaded library, the signature of the function and the arguments.
It looks like
function JGibbs2(N::Int, thin::Int)
    mat = Array(Float64, (N, 2))
    x   = 0.
    y   = 0.
    for i = 1:N
        for j = 1:thin
            x = ccall(dlsym(_jl_libRmath, :rgamma),
                      Float64, (Float64, Float64), 3., (y*y + 4))
            y = ccall(dlsym(_jl_libRmath, :rnorm),
                      Float64, (Float64, Float64), 1/(x+1), 1/sqrt(2(x + 1)))
        mat[i,:] = [x,y]
The timings are considerably faster, essentially the same as RcppGibbs
julia> sum([@elapsed JGibbs2(20000, 200) for i=1:10])
julia> sum([@elapsed JGibbs2(20000, 200) for i=1:10])
If we switch to the native Julia random samplers for the gamma and normal distribution, the function becomes
function JGibbs3(N::Int, thin::Int)
    mat = Array(Float64, (N, 2))
    x   = 0.
    y   = 0.
    for i = 1:N
        for j = 1:thin
            x = randg(3) * (y*y + 4)
            y = 1/(x + 1) + randn()/sqrt(2(x + 1))
        mat[i,:] = [x,y]
and the timings are
julia> sum([@elapsed JGibbs3(20000, 200) for i=1:10])
julia> sum([@elapsed JGibbs3(20000, 200) for i=1:10])
So now we are beating the compiled code from both RcppGibbs (which is using slower samplers) and GSLGibbs (faster samplers but not as fast as those in Julia) while writing code that looks very much like the original R function.
But wait, there's more!
This computer has a 4-core processor (AMD Athlon(tm) II X4 635 Processor @ 2.9 GHz) and Julia can take advantage of that. When starting Julia we specify the number of processes
julia -p 4
and use Julia's tools for parallel execution. An appealing abstraction in Julia is that of "distributed arrays". A distributed array is declared like an array with an element type and dimensions plus two additional arguments: the dimension on which to distribute the array to the different processes and a function that states how each section should be constructed. Usually this is an anonymous function. We will show two versions of the distributed sampler: the first, dJGibbs3a, leaves the result as a distributed array and the second, dJGibbs3b, converts the result to a single array in the parent process.
## Distributed versions - keeping the results as a distributed array
dJGibbs3a(N::Int, thin::Int) = darray((T,d,da)->JGibbs3(d[1],thin), Float64, (N, 2), 1)
## Converting the results to an array controlled by the parent process
dJGibbs3b(N::Int, thin::Int) = convert(Array{Float64,2}, dJGibbs3a(N, thin))
Notice that these are one-liners. In Julia, a function consisting of a single expression can be written by giving the signature and that expression, as shown. The anonymous function in dJGibbs3a is declared with the right-pointing arrow construction ->. Its arguments are the type of the array, T, the dimensions of the local chunk, d, and the dimension on which the array is distributed, da. Here we only use the number of rows, d[1], of the chunk to be generated.
The timings,
julia> sum([@elapsed dJGibbs3a(20000, 200) for i=1:10])
julia> sum([@elapsed dJGibbs3a(20000, 200) for i=1:10])
julia> sum([@elapsed dJGibbs3b(20000, 200) for i=1:10])
julia> sum([@elapsed dJGibbs3b(20000, 200) for i=1:10])
are remarkable. The speed-up with 4 processes leaving the results as a distributed array, which would be the recommended approach if we were going to do further processing, is essentially 4x. This is because there is almost no communication overhead. When converting the results to a (non-distributed) array, the speed-up is 3x.
If you haven't looked into Julia before now, you owe it to yourself to do so.

Saturday, April 7, 2012

An R programmer looks at Julia

In January of this year I first saw mention of the Julia language in the release notes for LLVM. I mentioned this to Dirk Eddelbuettel and later we got in contact with Viral Shah regarding a Debian package for Julia.

There are many aspects of Julia that are quite intriguing to an R programmer. I am interested in programming languages for "Computing with Data", in John Chambers' term, or "Technical Computing", as the authors of Julia classify it. I believe that learning a programming language is somewhat like learning a natural language in that you need to live with it and use it for a while before you feel comfortable with it and with the culture surrounding it.

A common complaint for those learning R is finding the name of the function to perform a particular task. In writing a bit of Julia code for fitting generalized linear models, as described below, I found myself in exactly the same position of having to search through documentation to find how to do something that I felt should be simple. The experience is frustrating but I don't know of a way of avoiding it. One word of advice for R programmers looking at Julia, the names of most functions correspond to the Matlab/octave names, not the R names. One exception is the d-p-q-r functions for distributions, as I described in an earlier posting.

Evaluating a new language

There is a temptation to approach a new programming language with the idea that it should be exactly like the language I am currently using but better, faster, easier, ... This is, of course, ridiculous. If you want R syntax, semantics, function names, and packages then use R. If you consider another language you should accept that many tasks will be done differently. Some will be done better than in your current language and some will not be done as well. And the majority will just be done differently.

Two primary Julia developers, Jeff Bezanson and Stefan Karpinski, recently gave a presentation at the lang.NEXT conference. The slides and video can help you get a flavor of the language.

So, what does Julia provide that R doesn't?

  • A JIT. The first time that a function is called with a particular argument signature it is compiled to machine code using the LLVM compiler backend. This, naturally, provides a speed boost. It can also affect programming style. In R one instinctively writes vectorized code to avoid a performance hit. In Julia non-vectorized code can have similar and, under some circumstances, better performance than vectorized code.

  • A richer type system. Scalars, vectors/arrays, tuples, composite types and several others can be defined in Julia. In R the atomic types are actually atomic vectors and the composite types are either lists (which is actually a type of vector in R, not a lisp-like list), vectors with attributes (called structures in R but not to be confused with C's structs). The class systems, S3 and S4, are built on top of these structures.

    Currently it is difficult to work with 64-bit integers in R but not in Julia

  • Multiple dispatch. In Julia all functions provide multiple dispatch to handle different signatures of arguments. In some sense, functions are just names used to index into method tables. In R terminology all functions are generic functions with S4-like dispatch, not S3-like which only dispatches on the first argument.

  • Parallelism. A multiple process model and distributed arrays are a basic part of the Julia language.

  • A cleaner model for accessing libraries of compiled code. As Jeff and Stefan point out, languages like R and Matlab/octave typically implement the high-level operations in "glue" code that, in turn, accesses library code. In Julia one can short-circuit this process and call the library code from the high-level language. See the earlier blog posting on accessing the Rmath library from Julia. This may not seem like a big deal unless you have written thousands of lines of glue code, as I have.

There are some parts of R not present in Julia that programmers may miss.

  • Default arguments of functions and named actual arguments in function calls. The process of matching actual arguments to formal arguments in R function calls is richer. All matching in Julia is by position within function signature. Function design with many arguments requires more thought and care in Julia than in R.

  • The R package system and CRAN. One could also throw in namespaces and function documentation standards as part of the package system. These are important parts of R that are not yet available. However, that does not preclude simular facilities being developed for Julia. The R package system did not spring fully grown from the brow of Zeus.

  • Graphics. One of the big advantages of R over similar languages is the sophisticated graphics available in ggplot2 or lattice. Work is underway on graphics models for Julia but, again, it is early days still.

An example: Generalized Linear Models (GLMs)

Most R users are familiar with the basic model-fitting functions like lm() and glm(). These functions call model.frame and model.matrix to form the numerical representation of the model from the formula and data specification, taking into account optional arguments such as subset, na.action and contrasts. After that they call the numerical workhorses, lm.fit and glm.fit. What I will describe is a function like glm.fit (the name in Julia cannot include a period and I call it glmFit) and structures like the glm "family" in R.

A glm family is a collection of functions that describe the distribution of the response, given the mean, and the link between the linear predictor and the mean response. The link function takes the mean response, mu, to the "linear predictor", eta, of the form

X %*% beta  # in R


X * beta    # in Julia

where X is the model matrix and beta is the coefficient vector. In practice we more frequently evaluate the inverse link, from eta to mu, and its derivative.

We define a composite type

type Link
    name::String          # name of the link
    linkFun::Function     # link function  mu -> eta
    linkInv::Function     # inverse link  eta -> mu
    muEta::Function       # derivative    eta -> d mu/d eta

with specific instances

logitLink =
        mu  -> log(mu ./ (1. - mu)),
        eta -> 1. ./ (1. + exp(-eta)),
        eta -> (e = exp(-abs(eta)); f = 1. + e; e ./ (f .* f)))

logLink =
        mu  -> log(mu),
        eta -> exp(eta),
        eta -> exp(eta))

identityLink =
        mu  -> mu,
        eta -> eta,
        eta -> ones(eltype(eta), size(eta)))

inverseLink =
        mu  ->  1. ./ mu,
        eta ->  1. ./ eta,
        eta -> -1. ./ (eta .* eta))

for the common links. Here I have used the short-cut syntax

mu -> log(mu ./ (1. - mu))

to create anonymous functions. The muEta function for the logitLink shows the use of multiple statements within the body of the anonymous function. Like R, the last expression evaluated in a function is the value of a Julia function.

I also ensure that the functions in the Link object are safe for vector arguments by using the componentwise-forms of multiplicative operators ("./" and ".*"). These forms apply to scalars as well as vector. I use explicit floating-point constants, such as 1., to give the compiler a hint that the result should be a floating-point scalar or vector.

A distribution characterizes the variance as a function of the mean, and the "deviance residuals" used to evaluate the model-fitting criterion. For some distributions, including the Gaussian distribution the model-fitting criterion (residual sum of squares) is different from the deviance itself but minimizing this simpler criterion provides the maximum-likelihood (or minimum deviance) estimates of the coefficients.

When we eventually fit the model we want the value of the deviance itself, which may not be the same as the sum of squared deviance residuals. Other functions are used to check for valid mean vectors or linear predictors. Also each distribution in what is called the "exponential family", which includes many common distributions like Bernoulli, binomial, gamma, Gaussian and Poisson, has a canonical link function derived from the distribution itself.

The composite type representing the distribution is

type Dist
    name::String             # the name of the distribution
    canonical::Link          # the canonical link for the distribution
    variance::Function       # variance function mu -> var
    devResid::Function       # vector of squared deviance residuals
    deviance::Function       # the scalar deviance
    mustart::Function        # derive a starting estimate for mu
    validmu::Function        # check validity of the mu vector
    valideta::Function       # check validity of the eta vector

with specific distributions defined as

## utilities used in some distributions
logN0(x::Number) = x == 0 ? x : log(x)
logN0{T<:Number}(x::AbstractArray{T}) = reshape([ logN0(x[i]) | i=1:numel(x) ], size(x))
y_log_y(y, mu) = y .* logN0(y ./ mu)    # provides correct limit at y == 0

BernoulliDist =
         mu  -> max(eps(Float64), mu .* (1. - mu)),
         (y, mu, wt)-> 2 * wt .* (y_log_y(y, mu) +  y_log_y(1. - y, 1. - mu)),
         (y, mu, wt)-> -2. * sum(y .* log(mu) + (1. - y) .* log(1. - mu)),
         (y, wt)-> (wt .* y + 0.5) ./ (wt + 1.),
         mu  -> all((0 < mu) & (mu < 1)),
         eta -> true)
gammaDist =
        mu -> mu .* mu,
        (y, mu, wt)-> -2 * wt .* (logN0(y ./ mu) - (y - mu) ./ mu),
        (y, mu, wt)-> (n=sum(wt); disp=sum(-2 * wt .* (logN0(y ./ mu) - (y - mu) ./ mu))/n; invdisp(1/disp); sum(wt .* dgamma(y, invdisp, mu * disp, true))),
        (y, wt)-> all(y > 0) ? y : error("non-positive response values not allowed for gammaDist"),
        mu  -> all(mu > 0.),
        eta -> all(eta > 0.))

GaussianDist =
         mu  -> ones(typeof(mu), size(mu)),
         (y, mu, wt)-> (r = y - mu; wt .* r .* r),
         (y, mu, wt)-> (n = length(mu); r = y - mu; n * (log(2*pi*sum(wt .* r .* r)/n) + 1) + 2 - sum(log(wt))),
         (y, wt)-> y,
         mu  -> true,
         eta -> true)

PoissonDist =
         mu  -> mu,
         (y, mu, wt)-> 2 * wt .* (y .* logN0(y ./ mu) - (y - mu)),
         (y, mu, wt)-> -2 * sum(dpois(y, mu, true) * wt),
         (y, mu)-> y + 0.1,
         mu  -> all(mu > 0),
         eta -> true)

Finally the GlmResp type consists of the distribution, the link, the response, the mean and linear predictor. Occasionally we want to add an offset to the linear predictor expression or apply prior weights to the responses and these are included.

type GlmResp                            # response in a glm model
    eta::Vector{Float64}        # linear predictor
    mu::Vector{Float64}         # mean response
    offset::Vector{Float64}     # offset added to linear predictor (usually 0)
    wts::Vector{Float64}        # prior weights
    y::Vector{Float64}          # response

With just this definition we would need to specify the offset, wts, etc. every time we construct such an object. By defining an outer constructor (meaning a constructor that is defined outside the type definition) we can provide defaults

## outer constructor - the most common way of creating the object
function GlmResp(dist::Dist, link::Link, y::Vector{Float64})
    n  = length(y)
    wt = ones(Float64, (n,))
    mu = dist.mustart(y, wt)
    GlmResp(dist, link, link.linkFun(mu), mu, zeros(Float64, (n,)), wt, y)

## another outer constructor using the canonical link for the distribution
GlmResp(dist::Dist, y::Vector{Float64}) = GlmResp(dist, dist.canonical, y)

The second outer constructor allows us to use the canonical link without needing to specify it.

There are several functions that we wish to apply to the glmResp object.

deviance( r::GlmResp) = r.dist.deviance(r.y, r.mu, r.wts)
devResid( r::GlmResp) = r.dist.devResid(r.y, r.mu, r.wts)
drsum(    r::GlmResp) = sum(devResid(r))
muEta(    r::GlmResp) = r.link.muEta(r.eta)
sqrtWrkWt(r::GlmResp) = muEta(r) .* sqrt(r.wts ./ variance(r))
variance( r::GlmResp) = r.dist.variance(r.mu)
wrkResid( r::GlmResp) = (r.y - r.mu) ./ r.link.muEta(r.eta)
wrkResp(  r::GlmResp) = (r.eta - r.offset) + wrkResid(r)

Our initial definitions of these functions are actually method definitions because we declare that the argument r must be a GlmResp. However, defining a method also defines the function if it has not already been defined. In R we must separately specify S3 generics and methods. An S4 generic is automatically created from a method specification but the interactions of S4 generics and namespaces can become tricky.

Note that the period ('.') is used to access components or members of a composite type and cannot be used in a Julia identifier.

The function to update the linear predictor requires both a GlmResp and the linear predictor value to assign. Here we use an abstract type inheriting from Number to allow a more general specification

updateMu{T<:Number}(r::GlmResp, linPr::AbstractArray{T}) = (r.eta = linPr + r.offset; r.mu = r.link.linkInv(r.eta); drsum(r))

Updating the linear predictor also updates the mean response then returns the sum of the square deviance residuals. Note that this is not functional semantics in that the object r being passed as an argument is updated in place. Arguments to Julia functions are passed by reference. Those accustomed to the functional semantics of R (meaning that you can't change the value of an argument by passing it to a function) should beware.

Finally we create a composite type for a predictor

type predD                              # predictor with dense X
    X::Matrix{Float64}                  # model matrix
    beta0::Vector{Float64}              # base coefficient vector
    delb::Vector{Float64}               # increment

and an outer constructor

## outer constructor
predD(X::Matrix{Float64}) = (zz = zeros((size(X, 2),)); predD(X, zz, zz))

Given the current state of the predictor we create the weighted X'X matrix and the product of the weighted model matrix and the weighted working residuals in the accumulate function

function accumulate(r::GlmResp, p::predD)
    w   = sqrtWrkWt(r)
    wX  = diagmm(w, p.X)
    (wX' * (w .* wrkResp(r))), (wX' * wX)

This function doesn't need to be written separately but I hope that doing so will make it easier to apply these operations to distributed arrays, which, to me, seem like a natural way of applying parallel computing to many statistical computing tasks.

A function to calculate and apply the increment is

increment(r::GlmResp, p::predD) = ((wXz, wXtwX) = accumulate(r, p); bb = wXtwX \ wXz; p.delb = bb - p.beta0; updateMu(r, p.X * bb))

and finally we get to the glmFit function

function glmFit(p::predD, r::GlmResp, maxIter::Uint, minStepFac::Float64, convTol::Float64)
    if (maxIter < 1) error("maxIter must be positive") end
    if (minStepFac < 0 || 1 < minStepFac) error("minStepFac must be in (0, 1)") end
    cvg = false

    devold = typemax(Float64)           # Float64 version of Inf
    for i=1:maxIter
        dev = increment(r, p)
        if (dev < devold)
            p.beta0 = p.beta0 + p.delb
            error("code needed to handle the step-factor case")
        if abs((devold - dev)/dev) < convTol
            cvg = true
        devold = dev
    if !cvg
        error("failure to converge in $maxIter iterations")

glmFit(p::predD, r::GlmResp) = glmFit(p, r, uint(30), 0.001, 1.e-6)

Defining two methods allows for default values for some of the arguments. Here we have a kind of all-or-none approach to defaults. It is possible to create other method signatures for defaults applied to only some of the arguments. In retrospect I think I was being too cute in declaring the maxIter argument as an unsigned int. It only makes sense for this to be positive but it might make life simpler to allow it to be an integer as positivity is checked anyway.

Note the use of string interpolation in the last error message. The numeric value of maxIter will be substituted in the error message.

Checking that it works

The script


## generate a Bernoulli response

n    = 10000
X    = hcat(fill(1, (n,)), 3. * randn((n, 2)))
beta = randn((3,))
eta  = X * beta
mu   = logitLink.linkInv(eta)
y    = float64(rand(n) < mu)
println("Coefficient vector for simulation: $(beta)")

## establish the glmResp object

rr   = GlmResp(BernoulliDist, y)
pp   = predD(X)

glmFit(pp, rr)
println("Converged to coefficient vector: $(pp.beta0 + pp.delb)")
println("Deviance at estimates: $(deviance(rr))")

## reinitialize objects for timing
rr   = GlmResp(BernoulliDist, y)
pp   = predD(X)
println("Elapsed time for fitting binary response with X $(size(X,1)) by $(size(X,2)): $(@elapsed glmFit(pp, rr)) seconds")


Coefficient vector for simulation: [-0.54105, 0.308874, -1.02009]
Converged to coefficient vector: [-0.551101, 0.301489, -1.02574]
Deviance at estimates: 6858.466827205941
Elapsed time for fitting binary response with X 10000 by 3: 0.09874677658081055 seconds

Thursday, March 29, 2012

PK models in R and in Julia

1.1 Pharmacokinetic models with an analytic solution

Pharmacokinetics is the study of the absorption and elimination of drugs and their metabolites in the body. As described in the wikipedia article there are several parameters, such rate constants, volumes of distribution and clearances, that we wish to estimate from pharmacokinetic data, which usually consists of measurements of blood serum concentrations following certain doses at certain times. My particular interest is in analysis of population pharmacokinetic data using nonlinear mixed-effects models and in the design of experiments for population pharmacokinetic studies.

For many of the models based on linear elimination kinetics the concentration at time t after a dose d has an analytic solution. As part of the PFIM software developed in France Mentre's lab at Inserm, Julie Bertrand created a catalogue of these expressions for 1, 2 or 3 compartment models after oral or bolus or infusion administration of a single dose or a fixed number of doses or in steady state after multiple doses. Julie and I created the PKPDmodels package for R to provide these expressions as functions that also evaluate the gradient with respect to the parameters and, optionally, the Hessian.

At the recent PODE: Population Optimum Design of Experiments conference organized by France, the wonderful Emmanuelle Comets mentioned that in her saemix package for R she only needs the predicted concentration values. She asked if it was slowing things down to evaluate the gradient at every function evaluation. I decided to benchmark.

1.2 Model function and benchmarks in R

One of the (regrettably few) published examples of population pharmacokinetic data is from a study of Theophylline kinetics. These data are available as the Theoph data set in R. The experimental subjects were given a single oral dose of the drug. Assuming a 1-compartment model with linear elimination kinetics the expression for the model is

> library(PKPDmodels)
> PKexpr("oral", "sd")
~(dose/V) * (ka/(ka - k)) * (exp(-k * t) - exp(-ka * t))

in terms of the parameters V, the volume of distribution, ka, the absorption rate constant, and k, the elimination rate constant.

It turns out that the combination of V and k is not a particularly good choice of parameters as their estimates usually end up being highly correlated. A better choice is V and Cl, the clearance, which is defined as Cl = kV. We can achieve this parameterization by providing a list of transformation expressions from the new parameters to the old. Nothing sophisticated is going on, just a substitution of expressions in another expression. While we are at it we could switch to the logarithms of pharmacokinetic parameters which provide a log-likelihood function that is closer to being quadratic.

> PKexpr("oral", "sd", list(ka ~ exp(lka), V ~ exp(lV), k ~ exp(lCl)/V))
~(dose/exp(lV)) * (exp(lka)/(exp(lka) - exp(lCl)/V)) * (exp(-(exp(lCl)/V) * 
    t) - exp(-exp(lka) * t))

A call to PKmod with the same arguments produces a byte-compiled function that evaluates both the function and the gradient.

> (mfg <- PKmod("oral", "sd", list(ka ~ exp(lka), V ~ exp(lV), k ~ exp(lCl - lV))))
function (dose, t, lV, lka, lCl) 
    .expr1 <- exp(lV)
    .expr2 <- dose/.expr1
    .expr3 <- exp(lka)
    .expr5 <- exp(lCl - lV)
    .expr6 <- .expr3 - .expr5
    .expr7 <- .expr3/.expr6
    .expr8 <- .expr2 * .expr7
    .expr11 <- exp(-.expr5 * t)
    .expr14 <- exp(-.expr3 * t)
    .expr15 <- .expr11 - .expr14
    .expr19 <- .expr8 * (.expr11 * (.expr5 * t))
    .expr21 <- .expr6^2
    .expr23 <- .expr2 * (.expr3 * .expr5/.expr21)
    .value <- .expr8 * .expr15
    .grad <- array(0, c(length(.value), 3L), list(NULL, c("lV", 
        "lka", "lCl")))
    .grad[, "lV"] <- .expr19 - (.expr23 + dose * .expr1/.expr1^2 * 
        .expr7) * .expr15
    .grad[, "lka"] <- .expr2 * (.expr7 - .expr3 * .expr3/.expr21) * 
        .expr15 + .expr8 * (.expr14 * (.expr3 * t))
    .grad[, "lCl"] <- .expr23 * .expr15 - .expr19
    attr(.value, "gradient") <- .grad
<bytecode: 0x2bd7c88>

Because the symbolic differentiation in R to produce the gradient also performs common subexpression elimination this function produces a cleaner evaluation of the model function and a comparatively compact expression of the gradient.

For comparison I use two types of evaluations of the model function itself, one with a nested function call and one with the transformations performed internally.

> mf0 <- cmpfun(function(dose, t, V, ka, k) (dose/V) * (ka/(ka - k)) * (exp(-k * t) - exp(-ka * t)))
> mf1 <- cmpfun(function(dose, t, lV, lka, lCl) mf0(dose, t, exp(lV), exp(lka), exp(lCl - lV)))
> mf2 <- cmpfun(function(dose, t, lV, lka, lCl) {V <- exp(lV); ka <- exp(lka); k <- exp(lCl - lV); (dose/V) * (ka/(ka - k)) * (exp(-k * t) - exp(-ka * t))})

Now would be a good time to check that I haven't made transcription mistakes

> Dose <- Theoph$Dose
> str(Dose)
 num [1:132] 4.02 4.02 4.02 4.02 4.02 4.02 4.02 4.02 4.02 4.02 ...
> Time <- Theoph$Time
> lV <- rep.int(-1, length(Dose))
> lka <- rep.int(0.6, length(Dose))
> lCl <- rep.int(-4, length(Dose))
> all.equal(mf1(Dose, Time, lV, lka, lCl), mf2(Dose, Time, lV, lka, lCl))
[1] TRUE
> all.equal(mf1(Dose, Time, lV, lka, lCl), as.vector(mfg(Dose, Time, lV, lka, lCl)))
[1] TRUE

Finally we benchmark

> library(rbenchmark)
> cols <- c("test", "elapsed", "relative", "user.self", "sys.self")
> benchmark(mf1(Dose, Time, lV, lka, lCl), mf2(Dose, Time, lV, lka, lCl), mfg(Dose, Time, lV, lka, lCl), replications=1000L, columns=cols, order="elapsed")
                           test elapsed relative user.self sys.self
2 mf2(Dose, Time, lV, lka, lCl)   0.081 1.000000     0.080        0
1 mf1(Dose, Time, lV, lka, lCl)   0.084 1.037037     0.084        0
3 mfg(Dose, Time, lV, lka, lCl)   0.219 2.703704     0.220        0

So the conclusion is that there is very little difference between the nested function calls and the explicit assignment of transformed values and that it takes 2.7 times as long to evaluate both the fitted values and the gradient than does evaluation of only the fitted values. However, the actual execution times for 1000 function evaluations are sufficiently small that this operation should not be a bottleneck.

1.3 Translation into Julia

I decided to translate this code into Julia, partially to gain experience with the language. I have said that the most valuable character trait for a programmer is unbounded pessimism because you spend your working time thinking, "now what can go wrong here". Because of this tendency I was careful to distinguish scalar and vector arguments and to test for correct sizes of vector arguments, etc. A bit of testing showed me that this was unnecessary. Write it the simple way and let the functions that you call check for numeric arguments and vector-scalar operations and correct vector-vector operations, which they will do.

The only changes I needed to make were to replace '*' by '.*' and '/' by './'. The "dot" forms of these operators perform componentwise vector-vector operations. Because they are also defined for scalar-scalar operations and scalar-vector operations there is no harm in using them throughout.

The result is anticlimactic. The function definitions are

mf0(dose, t, V, ka, k) = (dose ./ V) .* (ka ./ (ka - k)) .* (exp(-k .* t) - exp(-ka .* t))
mf1(dose, t, lV, lka, lCl) = mf0(dose, t, exp(lV), exp(lka), exp(lCl - lV))

function mfg(dose, t, lV, lka, lCl)
    V      = exp(lV)
    expr2  = dose ./ V
    ka     = exp(lka)
    Cl     = exp(lCl)
    k      = Cl ./ V
    expr6  = ka - k
    expr7  = ka ./ expr6
    expr8  = expr2 .* expr7
    expr11 = exp(-k .* t)
    expr14 = exp(-ka .* t)
    expr15 = expr11 - expr14
    expr18 = V .* V
    expr19 = Cl .* V ./ expr18
    expr24 = expr6 .* expr6
    expr8 .* expr15, hcat(expr8 .* (expr11 .* (expr19 .* t)) - (expr2 .* (ka .* expr19 ./ expr24) + dose .* V ./ expr18 .* expr7) .* expr15,
                          expr2 .* (expr7 - ka .* ka ./ expr24) .* expr15 + expr8 .* (expr14 .* (ka .* t)),
                          expr2 .* (ka .* k ./ expr24) .* expr15 - expr8 .* (expr11 .* (k .* t)))

and timing the same calculations on the same machine produced

elapsed time: 0.08809614181518555 seconds
elapsed time: 0.17201519012451172 seconds

which are very close to the timings for the byte-compiled functions in R.

Monday, March 26, 2012

Julia functions for the Rmath library

1.1 Signatures of the d-p-q-r functions in libRmath

Users of R are familiar with the functions for evaluating properties of and for sampling from probability distributions, the so-called d-p-q-r functions. The designation d-p-q-r reflects the fact that, for each distribution, there are up to 4 such functions, each beginning with one of these letters and followed by an abbreviated name of the distribution. The prefixes indicate:

the density of a continuous random variable or the probability mass function of a discrete random variable
the cumulative probability function, also called the cumulative distribution function or c.d.f.
the quantile function, which is the inverse of the c.d.f. defined on the interval (0, 1)
random sampling from the distribution

Members of R-core, notably Martin Maechler, have devoted considerable effort to ensuring that these functions are both reliable and general. Because they are part of an Open Source system, they can be used in other Open Source projects. In fact, there is the capability to collect these functions in a stand-alone library. One of the R packages for Debian Linux and distributions that are derived from it, such as Ubuntu, is r-mathlib which contains this stand-alone library. The header file for this library is $RHOME/include/Rmath.h on most systems (/usr/share/R/include/Rmath.h on Debian and derived distributions because of the rules for Debian packaging).

The names and arguments of these functions follow well-defined patterns. For example, those for the χ2 distribution have signatures

double  dchisq(double, double, int);
double  pchisq(double, double, int, int);
double  qchisq(double, double, int, int);
double  rchisq(double);

The first argument of dchisq is x, the abscissa, of pchisq is q, the quantile, and of qchisq is p, the probability. The next argument of the d-p-q functions, and the only argument of rchisq, is the distribution parameter, df, which is the degrees of freedom of the χ2. The last argument in the d-p-q functions controls whether the probability is on the logarithm scale. It is named give_log for the density function and log_p for the probability and quantile functions. (At first glance this argument may seem unnecessary as the probability density, for example, could be calculated on the probability scale then transformed to the logarithm scale. However, operations that are mathematically equivalent do not always produce the same result in floating point arithmetic. There are good reasons for these cases.)

The second last argument in the p and q functions determines whether probability is accumulated from the left, the lower tail, or from the right, the upper tail. The default is the lower tail. Again, there is an apparent redundancy because one can always subtract the probability in the lower tail from 1 to get the probability in the upper tail. However when x is extremely small but positive, 1 - x rounds to 1 and small upper-tail probabilities truncate too quickly. (Remember Kernighan and Plauger's aphorism that "10 times 0.1 is hardly ever 1" - that's what you live with in floating point computations.)

1.2 Calling these functions from Julia

It is surprisingly easy from within Julia to call C functions like these having simple signatures. First you call the Julia function dlopen to access the shared object file and save the handle

julia> _jl_libRmath = dlopen("libRmath")
Ptr{Void} @0x0000000003461050

I use an awkward, patterned name for this variable to avoid potential name conflicts. At present the Julia namespace is flat (but that is likely to change as namespaces are already being discussed).

Then a call to, say, pchisq needs only

julia> ccall(dlsym(_jl_libRmath,:pchisq),Float64,(Float64,Float64,Int32,Int32), 1.96^2, 1, true, false)

The first argument to the ccall function is the symbol extracted with dlsym, followed by the specific type of the function value, the argument signatures and the actual arguments. The actual arguments are converted to the signature shown before being passed to the C function. Thus the true is converted to 1 and the false to 0.

The value of this function call, 0.95000, is what we would expect, because a χ2 on 1 degree of freedom is the square of a standard normal distribution and the interval [-1.960, 1.960] contains approximately 95% of the probability of the standard normal.

1.3 Creating a Julia function for scalar arguments

Obviously, calling such a C function from the Rmath library is simpler if we wrap the call in a Julia function. We can define such a function as a one-liner (well, two lines but only because it would be a very long single line)

pchisq(q::Number, df::Number, lower_tail::Bool, log_p::Bool) =
   ccall(dlsym(_jl_libRmath,:pchisq),Float64,(Float64,Float64,Int32,Int32), q, df, lower_tail, log_p)

This illustrates one of the methods of defining a function in Julia,

fname(arg1, arg2, ...) = expression

This form, which reads much like the mathematical description f(x) = value, is useful when the function body is a single expression.

This code defines a particular method, with a reasonably general signature, for pchisq. The C function pchisq requires double precision values (called Float64 in Julia) for the first two arguments and integers for the last two. However, these integers are treated as Boolean values, so we require the arguments passed to the Julia function to be Boolean. The Number type in Julia is an abstract type incorporating all the integer and real number types (including complex and rational numbers, as it turns out). The Julia function, pchisq, can have methods of many different signatures. In a sense a function in Julia is just a name used to index into a method table.

We could have allowed unrestricted arguments, as in

pchisq(q, df, lower_tail, log_p) = <something>

but then we would need to define checks on the argument types, to ensure that they are appropriate and to check for vector versus scalar arguments, in the function body. It is better to use the method signature to check for general forms of arguments and to distinguish the scalar and vector cases. (Vector methods are defined in the next section).

This is one of the organizing principles of Julia; it is built around functions but actually all functions are methods chosen via multiple dispatch. So thinking in terms of signatures right off the bat is a good idea.

1.4 Default argument values

In R we can match arguments by names in a function call and we can define default values for arguments that are not given a value. At present neither of these capabilities is available in Julia. However, we can provide defaults for trailing arguments by creating methods with reduced signatures

pchisq(q::Number, df::Number, lower_tail::Bool) = pchisq(q, df, lower_tail, false)
pchisq(q::Number, df::Number) = pchisq(q, df, true, false)

The fact that the signature is recognized by position and not by using named actual arguments means that whenever log_p is true we must specify lower_tail, even if its value is the default value, true.

In the case of pchisq the distribution parameter, df, does not have a default value. For many other distributions there are default values of distribution parameters. For example, the distribution parameters for the logistic distribution are location and scale with defaults 0 and 1. Having defined the 5-argument scalar version of plogis

plogis(q::Number, l::Number, s::Number, lo::Bool, lg::Bool) = 
    ccall(dlsym(_jl_libRmath,:plogis),Float64,(Float64,Float64,Float64,Int32,Int32), q, l, s, lo, lg)

(for brevity we have used the names l, s, lo and lg for the last four arguments) we can define the methods with default values as

plogis(q::Number, l::Number, lo::Bool, lg::Bool) = plogis(q, l, 1, lo, lg)
plogis(q::Number, lo::Bool, lg::Bool) = plogis(q, 0, 1, true, false)
plogis(q::Number, l::Number, s::Number, lo::Bool) = plogis(q, l, s, lo, false)
plogis(q::Number, l::Number, lo::Bool) = plogis(q, l, 1, lo, false)
plogis(q::Number, lo::Bool) = plogis(q, 0, 1, lo, false)
plogis(q::Number, l::Number, s::Number) = plogis(q, l, s, true, false)
plogis(q::Number, l::Number) = plogis(q, l, 1, true, false)
plogis(q::Number) = plogis(q, 0, 1, true, false)

Because a Bool is not a Number, a signature such as (Number, Bool) can be distinguished from (Number, Number).

These method definitions allow for some combinations of default values but not all combinations. Having named arguments, possibly with default values, is still a more flexible system but these method signatures do handle the most important cases.

Defining all those methods can get tedious (not to mention providing the possibility of many transcription errors) so it is worthwhile scripting the process using the macro language for Julia. See the file extras/Rmath.jl in the Julia distribution for the macros that generate both these methods and the vectorized methods described in the next section.

1.5 Vectorizing the function

In R there are no scalars, only vectors of length 1, so, in that sense, all functions are defined for vector arguments. In addition, many functions, including the d-p-q functions, are vectorized in the sense that they create a result of the appropriate form from vector arguments whose length is greater than 1. Thus providing a vector of quantiles to pchisq will return a vector of probabilities. We would want the same to be true for the Julia functions.

In Julia scalars are distinct from vectors and we use the method dispatch to distinguish scalar and vector cases. I think a general programming paradigm for Julia is first to define a function for scalar arguments and then build the vectorized version from that. However, I don't yet have enough experience to say if this really is a general paradigm.

We can write methods for a vector q as

pchisq(q::Vector{Number}, df::Number, lower_tail::Bool, log_p::Bool) = 
    [ pchisq(q[i], df, lower_tail, log_p) | i=1:length(q) ]
pchisq(q::Vector{Number}, df::Number, lower_tail::Bool) = 
    [ pchisq(q[i], df, lower_tail, false) | i=1:length(q) ]
pchisq(q::Vector{Number}, df::Number) = 
    [ pchisq(q[i], df, true, false) | i=1:length(q) ]

These methods use a "comprehension" to specify the loop over the individual elements of the array producing another array. They could, of course, be written in a more conventional looping notation but the comprehension notation is powerful and compact, similar to the "apply" family of functions in R, so we use it here. Like the one-line function definition, the comprehension reads much like a mathematical expression to create an array of values of a certain form for i = 1,…,n.

By the way, experienced R programmers, who know not to write 1:length(q) because of unexpected results when length(q) is zero, need not worry. The sequence notation, a:b doesn't count down in Julia when a is greater than b so 1:0 has length zero. That is, 1:length(q) in Julia always produces the same answer as seq_along(q) does in R.

We could also vectorize the df argument which leads us to consider the case of vector df and scalar q and the case of vector df and vector q, etc. At this point we realize that we are seeing many variations on a theme and decide that it is best to script this vectorization. Fortunately the file base/operations.jl has macros _jl_vectorize_1arg and _jl_vectorize_2arg which we can adapt for this purpose.

1.6 The result

As of this writing, loading the file extras/Rmath.jl

julia> load("../extras/Rmath.jl")


julia> whos()
R_pow                         Function
_jl_libRmath                  Ptr{None}
dbeta                         Function
dbinom                        Function
dcauchy                       Function
dchisq                        Function
dexp                          Function
df                            Function
dgamma                        Function
dgeom                         Function
dlnorm                        Function
dlogis                        Function
dnbinom                       Function
dnchisq                       Function
dnorm                         Function
dpois                         Function
dsignrank                     Function
dt                            Function
dunif                         Function
dweibull                      Function
dwilcox                       Function
pbeta                         Function
pbinom                        Function
pcauchy                       Function
pchisq                        Function
pexp                          Function
pf                            Function
pgamma                        Function
pgeom                         Function
plnorm                        Function
plogis                        Function
pnbinom                       Function
pnchisq                       Function
pnorm                         Function
ppois                         Function
psignrank                     Function
pt                            Function
punif                         Function
pweibull                      Function
pwilcox                       Function
qbeta                         Function
qbinom                        Function
qcauchy                       Function
qchisq                        Function
qexp                          Function
qf                            Function
qgamma                        Function
qgeom                         Function
qlnorm                        Function
qlogis                        Function
qnbinom                       Function
qnchisq                       Function
qnorm                         Function
qpois                         Function
qsignrank                     Function
qt                            Function
qunif                         Function
qweibull                      Function
qwilcox                       Function
rchisq                        Function
rgeom                         Function
rpois                         Function
rsignrank                     Function
rt                            Function
set_seed                      Function

and a function like plogis provides many method signatures

julia> plogis
Methods for generic function plogis
plogis(Number,Number,Number,Bool,Bool) at /home/bates/build/julia/extras/../extras/Rmath.jl:275
plogis(Number,Number,Bool,Bool) at /home/bates/build/julia/extras/../extras/Rmath.jl:281
plogis(Number,Number,Number,Bool) at /home/bates/build/julia/extras/../extras/Rmath.jl:277
plogis(Number,Bool,Bool) at /home/bates/build/julia/extras/../extras/Rmath.jl:287
plogis(Number,Number,Bool) at /home/bates/build/julia/extras/../extras/Rmath.jl:283
plogis(Number,Number,Number) at /home/bates/build/julia/extras/../extras/Rmath.jl:279
plogis(Number,Bool) at /home/bates/build/julia/extras/../extras/Rmath.jl:289
plogis(Number,Number) at /home/bates/build/julia/extras/../extras/Rmath.jl:285
plogis(Number,) at /home/bates/build/julia/extras/../extras/Rmath.jl:291
plogis{T1<:Number,T2<:Number}(T1<:Number,AbstractArray{T2<:Number,N}) at operators.jl:162
plogis{T1<:Number,T2<:Number}(AbstractArray{T1<:Number,N},T2<:Number) at operators.jl:165
plogis{T1<:Number,T2<:Number}(AbstractArray{T1<:Number,N},AbstractArray{T2<:Number,N}) at operators.jl:169
plogis{T1<:Number,T2<:Number,T3<:Number}(AbstractArray{T1<:Number,N},T2<:Number,T3<:Number) at /home/bates/build/julia/extras/../extras/Rmath.jl:7
plogis{T1<:Number,T2<:Number,T3<:Number}(T1<:Number,AbstractArray{T2<:Number,N},T3<:Number) at /home/bates/build/julia/extras/../extras/Rmath.jl:10
plogis{T1<:Number,T2<:Number,T3<:Number}(T1<:Number,T2<:Number,AbstractArray{T3<:Number,N}) at /home/bates/build/julia/extras/../extras/Rmath.jl:13
plogis{T1<:Number,T2<:Number,T3<:Number}(AbstractArray{T1<:Number,N},AbstractArray{T2<:Number,N},T3<:Number) at /home/bates/build/julia/extras/../extras/Rmath.jl:16
plogis{T1<:Number,T2<:Number,T3<:Number}(T1<:Number,AbstractArray{T2<:Number,N},AbstractArray{T3<:Number,N}) at /home/bates/build/julia/extras/../extras/Rmath.jl:20
plogis{T1<:Number,T2<:Number,T3<:Number}(AbstractArray{T1<:Number,N},T2<:Number,AbstractArray{T3<:Number,N}) at /home/bates/build/julia/extras/../extras/Rmath.jl:24

You will see that the vectorized methods have slightly more general signatures involving AbstractArrays of Numbers, which also handles cases such as matrix arguments.

The methods allow for calling plogis in a way that is natural to R programmers

julia> plogis(-1:0.2:1, 0)
[0.268941, 0.310026, 0.354344, 0.401312, 0.450166, 0.5, 0.549834, 0.598688, 0.645656, 0.689974, 0.731059]

julia> plogis(-1:0.2:1, 0, 2)
[0.377541, 0.401312, 0.425557, 0.450166, 0.475021, 0.5, 0.524979, 0.549834, 0.574443, 0.598688, 0.622459]

(In running that example I realized that I had omitted some cases from my macros. That will be repaired "in the fullness of time" as Bill Venables is wont to say.)

Monday, March 12, 2012

A Julia version of the multinomial sampler

In the previous post on RcppEigen I described an example of sampling from collection of multinomial distributions represented by a matrix of probabilities.  In the timing example the matrix was 100000 by 5 with each of the 100000 rows summing to 1.  The objective is to create a vector of 100000 1-based indices representing a sample from the probabilities in each row.

For each row we take the cumulative sums and, for safety, normalize by dividing by the last element then compare these values to a random draw from a standard uniform distribution.  The number of elements in the cumulative sums that are less than the uniform draw is the 0-based index of the result.  We add 1 to convert to a 1-based index.

I have been experimenting a bit with a very interesting new language called Julia and decided to write a similar function in it. The version shown here has been updated according to suggestions from Jeff Bezanson, Stefan Karpinski and Viral Shah on the julia-dev  list at Google Groups

function samp1(x::Array{Float64, 2},)
    cs = cumsum(reshape(x, length(x)))
    sum(cs/cs[end] < rand()) + 1

function samp(X::Array{Float64, 2},)
    if any(X < 0)
        error("Negative probabilities not allowed")
    [samp1(X[i,:]) | i = 1:size(X,1)]
This version is about 10 times as fast as the pure R version but about 9 times slower than the RcppEigen version.

In the thread on the julia-dev list about this example Stefan Karpinski showed that in Julia you can enhance performance by de-vectorizing your code and came up with the following version which is much faster than the RcppEigen version
function SKsamp(X::Matrix{Float64})
  if any(X < 0)
    error("Negative probabilities not allowed")
  s = Array(Int, size(X,1))
  for i = 1:size(X,1)
    r = rand()
    for j = 1:size(X,2)
      r -= X[i,j]
      if r <= 0.0
        s[i] = j
  return s
To a longtime R/S programmer the concept of de-vectorizing your code seems heretical but I can understand that code created by a JIT will be happier with the looping and break style.

In any case, I think this example shows that R programmers should take a look at Julia. Two immediate applications I can imagine are McMC methods and large scale iterative fits such as Generalized linear models

Sunday, March 11, 2012

An RcppEigen example

R is an Open Source project providing an interactive language and environment for statistical computing.  It has become the lingua franca for research in statistical methods.  Because R is an interpreted language it is comparatively easy to develop applications (called packages) for it.  However, the resulting code can sometimes be slow, which is a problem in compute-bound methods like Markov chain Monte Carlo

From its inception R, and its predecessor S, have allowed calls to compiled code written in C or Fortran but such programming is not for the faint-hearted.  There are many ways in which you can trip yourself up.  Over the past several years Dirk Eddelbuettel and Romain Francois (there should be a cedilla under the c but I don't know how to create one) developed a package called Rcpp that provides C++ classes and methods for encapsulating R objects.  I recently started using these in the lme4 package for mixed-effects models that I develop with Martin Maechler and Ben Bolker.

High-performance numerical linear algebra is particularly important in the mixed-effects models calculations which use both dense and sparse matrices.  I ran across a wonderful linear algebra system called Eigen that is a templated library of C++ classes and methods and, with Dirk and Romain, wrote the RcppEigen package to interface with R.

This posting is to show an example of code that can be made to run much faster using RcppEigen than in the original R code.

The example, from Dongjun Chung, requires sampling from a collection of multinomial random variables as part of an iterative estimation method for parameter estimation in his R package for the analysis of ChIP-sequencing data.  There are generally a small number of categories - 5 is typical - and a relatively large number (say 10,000) instances that are available as a 10000 by 5 matrix of non-negative elements whose rows sum to 1.  At each iteration a sample of size 10000 consisting of indices in the range 1 to 5 is to be generated from the current matrix of probabilities.  Dongjun wrote an R function for this

Rsamp <- function(X) {
  stopifnot(is.numeric(X <- as.matrix(X)),
            (nc <- ncol(X)) > 1L,
            all(X >= 0))
  apply(X, 1L, function(x) sample(nc, size=1L, replace=FALSE, prob=x+1e-10))
which is careful R code (e.g. using apply instead of running a loop) but, even so, this function is the bottleneck in the method.

A method using RcppEigen requires a similar R function

RcppSamp <- function(X) {
  stopifnot(is.numeric(X <- as.matrix(X)),
            (nc <- ncol(X)) > 1L,
            all(X >= 0))
  .Call(CppSamp, X)

and a C++ function

SEXP CppSamp(SEXP X_) {
  typedef Eigen::ArrayXd   Ar1;
  typedef Eigen::ArrayXXd  Ar2;
  typedef Eigen::Map<Ar2> MAr2;

  const MAr2 X(as<MAr2>(X_));
  int m(X.rows()), n(X.cols()), nm1(n - 1);
  Ar1     samp(m);
  RNGScope sc; // Handle GetRNGstate()/PutRNGstate()
  for (int i=0; i < m; ++i) {
    Ar1 ri(X.row(i));
    std::partial_sum(ri.data(), ri.data() + n, ri.data())
    ri /= ri[nm1];    // normalize to sum to 1
    samp[i] = (ri < ::unif_rand()).count() + 1;
  return wrap(samp);

The general idea is that the Eigen::ArrayXd class is a one-dimensional array of doubles and the Eigen::ArrayXXd class is a two-dimensional array of doubles.  Operations on array classes are component-wise operations or reductions.  There are corresponding classes Eigen::VectorXd and Eigen::MatrixXd that provide linear algebra operations.  A Eigen::Map of another structure has the corresponding structure but takes a pointer to the storage instead of allocating its own storage.  One of the idioms of writing with RcppEigen is to create const Eigen::Map<klass> objects from the input arguments, thus avoiding unnecessary copying.  The as templated function and the wrap function are part of RcppEigen that generalizes the methods in Rcpp for converting back and forth between the R objects and the C++ classes.

Here the approach is to find the cumulative sums in each row, normalize these sums by dividing by the last element and comparing them to a draw from the standard uniform distribution.  The number of elements less than the uniform variate is the 0-based index for the result and we add 1 to get the 1-based index.

Creating the RcppEigen code is more work than the pure R code but not orders of magnitude more work.  You can operate on entire arrays as shown here and, best of all, you don't need to worry about protecting storage from the garbage collector.  And the RcppEigen code is much faster

> set.seed(1234321)
> X <- matrix(runif(100000 * 5), ncol=5)
> benchmark(Rsamp(X), RcppSamp(X), replications=5,
+           columns=c("test", "elapsed", "relative", "user.self"))
         test elapsed relative user.self
2 RcppSamp(X)   0.058        1      0.06
1    Rsamp(X)   5.162       89      5.12

Update: I have corrected the spelling of Dongjun Chung's name.  My apologies for mis-spelling it.

Update: The next posting discusses a Julia implementation of this function.  An initial version was somewhat slower than the RcppEigen version but then Stefan Karpinsky got to work and created an implementation that was over twice as fast as the RcppEigen version.  In what may seem like a bizarre approach to an R programmer, the trick is to de-vectorize the code.