{rspm}: easy access to RSPM binary packages with automatic management of system requirements

There are many community projects out there that provide binary R packages for various distributions. You may know Michael Rutter’s legendary c2d4u.team/c2d4u4.0+ PPA, but this situation has been greatly improved more recently with Detlef Steuer’s autoCRAN OBS repo for OpenSUSE, my iucar/cran Copr repo for Fedora, and Dirk Eddelbuettel’s r2u repo, again, for Ubuntu. These have obvious advantages that come with the system package management layer, such as lightning-fast installations and updates, with automatic dependency management, reversibility, and multitenancy (several users sharing the same set of packages), among others (see this paper for further details). Moreover, the {bspm} package adds the integration layer that we were lacking for all these years, enabling a bridge to the system package manager that doesn’t require admin rights or for you to leave your beloved R console (i.e. the Windows experience on Linux).

However, CentOS/RHEL has been the great forgotten one here, even though quite a lot of users are tied to this distro for different reasons. Moreover, such reasons usually imply that they don’t have admin rights at all, not even for the setup.

So here I announce the {rspm} package, which, as its name indicates, enables easy access to RStudio Public Package Manager (i.e., it does the repo setup for you), and also monitors and scans every installation to automatically detect, download, install and configure any missing system requirements. Most importantly, this is done in full user mode (i.e., system requirements are installed into the user home) and dynamically (no need to restart, no need to manage environment variables). It is definitely not as fast as the other projects, but it is complementary in the sense that it may be compatible with {renv}/{pak}-based workflows (I haven’t tried this yet, but it should require minor adjustments, if any).

I made this primarily targeted at CentOS Stream 8, but for nearly the same price I added support for Ubuntu (bionic, focal, jammy) too (although note that this requires the installation of the apt-file utility). Please give it a spin if you feel like it, and let me know how it goes. Here’s a demo of the installation of {sf}, which, as you may know, has quite a number of system requirements:
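
As a rough sketch of what such a session might look like, the snippet below wraps an installation between rspm::enable() and rspm::disable(). It assumes that {rspm} is already installed on a supported distribution and that these two functions behave as described in the package documentation; install_with_rspm() is a hypothetical helper name introduced only for illustration.

```r
# Sketch only: assumes the {rspm} package is installed on a supported
# distribution. install_with_rspm() is a hypothetical helper, not part of {rspm}.
install_with_rspm <- function(pkgs) {
  if (!requireNamespace("rspm", quietly = TRUE))
    stop("the 'rspm' package is required for this sketch")
  rspm::enable()            # set up the repository and start monitoring installs
  on.exit(rspm::disable())  # undo the integration when the function exits
  install.packages(pkgs)
}

# Usage (not run here, since it downloads and installs software):
# install_with_rspm("sf")
```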

Finally, I would like to thank RStudio for their investment in providing this extremely useful resource for the Linux R community.

Least squares as springs, the Shiny app

Three weeks ago, I saw a nice mechanical recreation of a PCA (or total least squares) on Twitter: just by pulling some strings attached to a straw.

Some days ago, Joshua Loftus published Least squares as springs, where he presents some nice visualisations and explains that the cost function is the same (except for a constant) as the potential energy of springs attaching the data points to the regression line. As a result, if we take any line attached in this way to a point cloud (with the required constraints in place for the springs: vertical movement for regular regression; no constraints for PCA, where the springs slide freely), then the system will oscillate until it reaches the state of minimum energy (i.e., until it meets the regression line).
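
The equivalence can be checked numerically. The following is a minimal sketch in base R, using simulated data with arbitrary values: it minimises the potential energy of ideal springs (stiffness k, zero rest length) and compares the result with lm() for vertical springs and with prcomp() for perpendicular ones. The function names spring_energy() and tls_energy() are made up for this example.

```r
# Simulated data (arbitrary values, for illustration only)
set.seed(42)
x <- rnorm(50)
y <- 1 + 2 * x + rnorm(50, sd = 0.5)

# Vertical springs: energy of a line y = a + b * x, with par = c(a, b)
spring_energy <- function(par, k = 1) {
  stretch <- y - (par[1] + par[2] * x)
  0.5 * k * sum(stretch^2)
}
fit_energy <- optim(c(0, 0), spring_energy, method = "BFGS")$par
fit_ols <- unname(coef(lm(y ~ x)))
rbind(energy = fit_energy, ols = fit_ols)

# Perpendicular springs: energy of a line through the center of mass
# forming an angle theta with the x axis
xc <- x - mean(x)
yc <- y - mean(y)
tls_energy <- function(theta, k = 1) {
  stretch <- -sin(theta) * xc + cos(theta) * yc
  0.5 * k * sum(stretch^2)
}
# The simulated cloud has a positive slope, so we search in (0, pi/2)
angle_energy <- optimize(tls_energy, c(0, pi / 2))$minimum
rot <- prcomp(cbind(x, y))$rotation
angle_pca <- atan(rot[2, 1] / rot[1, 1])
c(energy = angle_energy, pca = angle_pca)
```

In both cases the minimum-energy configuration should agree with the statistical fit up to the tolerance of the optimiser.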

Inspired by this and based on the Matlab code for the animation in this excellent StackExchange answer, I created a Shiny app that allows us to play with different parameters:

  • linear regression vs. PCA;
  • covariance matrix for data generation;
  • number of samples;
  • initial angle and shift of the center of mass;
  • velocity loss and inertia (which determines the damping ratio).

The resulting (and oddly satisfying) movement is simulated as a composition of a translation and a rotation. You can play with a plotly-powered JavaScript animation or download it as a gganimate-powered GIF:
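
To get a feeling for how velocity loss and inertia shape the movement, here is a simplified sketch for the regular regression case. It is not the app's code: instead of a translation plus a rotation, it treats the intercept and slope as the coordinates of a particle pulled by the springs, and integrates the motion with a semi-implicit Euler scheme. The function simulate_springs() and its arguments (inertia, loss, dt, steps) are hypothetical names chosen for this example.

```r
# Simulated data (arbitrary values, for illustration only)
set.seed(1)
x <- rnorm(100)
y <- 0.5 + 1.5 * x + rnorm(100, sd = 0.3)
X <- cbind(1, x)

# Hypothetical helper: damped motion of the line parameters c(a, b)
simulate_springs <- function(par0, inertia = 50, loss = 0.05,
                             dt = 0.1, steps = 2000) {
  par <- par0
  vel <- c(0, 0)
  path <- matrix(NA_real_, nrow = steps, ncol = 2,
                 dimnames = list(NULL, c("intercept", "slope")))
  for (s in seq_len(steps)) {
    # total spring force: minus the gradient of 0.5 * sum(residuals^2)
    force <- drop(crossprod(X, y - X %*% par))
    # accelerate, then lose a fraction of the velocity
    vel <- (1 - loss) * (vel + dt * force / inertia)
    par <- par + dt * vel
    path[s, ] <- par
  }
  path
}

path <- simulate_springs(par0 = c(-2, -1))
rbind(simulated = path[nrow(path), ], ols = coef(lm(y ~ x)))
```

With these settings the parameters oscillate around the least squares solution and settle on it. A larger loss damps the oscillation faster, while a larger inertia makes it slower.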

constants: Update to 2018 CODATA values

The constants package contains CODATA internationally recommended values of the fundamental physical constants, provided as symbols for direct use within the R language. Optionally, the values with uncertainties and/or units are also provided if the errors, units and/or quantities packages are installed. The Committee on Data for Science and Technology (CODATA) is an interdisciplinary committee of the International Council for Science which periodically provides the internationally accepted set of values of the fundamental physical constants. This release contains the “2018 CODATA” version, published in May 2019 [E. Tiesinga, P. J. Mohr, D. B. Newell, and B. N. Taylor (2020) http://physics.nist.gov/constants].

This version contains some breaking changes that are necessary to streamline future updates and provide a stable symbol table:

  • The codata table includes the absolute uncertainty instead of the relative one. Thus, the rel_uncertainty column has been dropped in favour of the new uncertainty. Also, columns have been slightly reordered.
  • Symbol names for constants have changed. The old ones were hand-crafted and thus unmanageable. This release adopts the ASCII symbols defined by NIST on their webpage, except for those that collide with some base R function. In particular, there are two cases: c, the speed of light, has been renamed as c0; sigma, the Stefan-Boltzmann constant, has been renamed as sigma0.
  • Constant types, or categories, (column codata$type) adopt the names defined by NIST on the webpage too. Some constants belong to more than one category (separated by comma); some others belong to no category (missing type).
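
Regarding the first item, a relative uncertainty can still be recovered from the absolute one by dividing by the magnitude of the value. A small worked example in base R, typing in the 2018 CODATA value of the Newtonian constant of gravitation by hand (the variable names are arbitrary and the package is not needed):

```r
# 2018 CODATA Newtonian constant of gravitation, in m^3 kg^-1 s^-2
G_value <- 6.67430e-11
G_uncertainty <- 0.00015e-11  # absolute standard uncertainty

# relative standard uncertainty
rel_uncertainty <- G_uncertainty / abs(G_value)
signif(rel_uncertainty, 2)
#> [1] 2.2e-05
```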

There are some new features too:

  • In addition to the codata data frame, this release includes codata.cor, a correlation matrix for all the constants.
  • In addition to syms_with_errors and syms_with_units, there is a new list of symbols called syms_with_quantities (available if the optional quantities package is installed), which provides constant values with uncertainty and units.
  • Experimental support for correlated values in syms_with_errors and syms_with_quantities is provided (disabled by default; see details in help(syms) for activation instructions).
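
To illustrate why a correlation matrix matters, the sketch below propagates uncertainty through a product of two quantities, with and without correlation. It uses made-up numbers for two hypothetical quantities a and b (not real constants) and plain base R with first-order propagation; it does not use the package.

```r
# Made-up values for two hypothetical quantities (not real constants)
value <- c(a = 2.0, b = 5.0)
uncertainty <- c(a = 0.02, b = 0.10)
correlation <- matrix(c(1, 0.8, 0.8, 1), nrow = 2)

# Covariance matrix from standard uncertainties and correlations
covariance <- diag(uncertainty) %*% correlation %*% diag(uncertainty)

# First-order propagation for the product f = a * b
jacobian <- c(value[["b"]], value[["a"]])  # df/da, df/db
u_correlated <- sqrt(drop(t(jacobian) %*% covariance %*% jacobian))
u_independent <- sqrt(sum((jacobian * uncertainty)^2))
c(correlated = u_correlated, independent = u_independent)
```

With a positive correlation, the uncertainty of the product is larger than what would be obtained by treating the two quantities as independent.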

See the README for some usage examples. For questions, suggestions or issues, please use the issue tracker.

Installing and switching to MKL on Fedora

UPDATE: MKL is a bit trickier than other backends. See this and this comment on how to use mklbuilder to generate a specific .so file to use with FlexiBLAS as described below.

In our last post, we presented the FlexiBLAS library, coming to Fedora 33, and the accompanying flexiblas R package, which enables live switching of the BLAS backend among the various open source options readily available in the Fedora repositories.

In this post, we demonstrate how to install, register with FlexiBLAS, and finally switch to Intel’s Math Kernel Library (MKL) in a few steps. First, we prepare a proper environment using docker:

$ docker run --rm -it fedora:33
$ dnf install 'dnf-command(config-manager)' # install config manager
$ dnf install R-flexiblas # install R and the FlexiBLAS API interface for R

Then we add Intel’s YUM repository, import the public key and install MKL:

$ dnf config-manager --add-repo https://yum.repos.intel.com/mkl/setup/intel-mkl.repo
$ rpm --import https://yum.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB
$ dnf install intel-mkl # or a specific version, e.g. intel-mkl-2020.0-088

Then, in an R session:

library(flexiblas)
flexiblas_load_backend("/opt/intel/mkl/lib/intel64/libmkl_rt.so")
#> <flexiblas> BLAS /opt/intel/mkl/lib/intel64/libmkl_rt.so not found in config.
#> <flexiblas> BLAS /opt/intel/mkl/lib/intel64/libmkl_rt.so does not provide an integer size hint. Assuming 4 Byte.
#> [1] 2
backends <- flexiblas_list_loaded()
backends
#> [1] "OPENBLAS-OPENMP"                        
#> [2] "/opt/intel/mkl/lib/intel64/libmkl_rt.so"

And that’s it: now, we are able to switch between the default one and MKL. As in our previous post, let’s compare them with a simple GEMM benchmark:

n <- 2000
runs <- 10
A <- matrix(runif(n*n), nrow=n)
B <- matrix(runif(n*n), nrow=n)
# benchmark
timings <- sapply(seq_along(backends), function(i) {
  flexiblas_switch(i)
  # warm-up
  C <- A[1:100, 1:100] %*% B[1:100, 1:100]
  unname(system.time({
    for (j in seq_len(runs))
      C <- A %*% B
  })[3])
})
results <- data.frame(
  backend = backends,
  `timing [s]` = timings,
  # each n x n product costs about 2 * n^3 flop; the timing covers all runs
  `performance [GFlops]` = runs * (2 * (n / 1000)^3) / timings,
  check.names = FALSE)
results[order(results[["performance [GFlops]"]]),]
#>                                   backend timing [s] performance [GFlops]
#> 2 /opt/intel/mkl/lib/intel64/libmkl_rt.so      3.487             45.88471
#> 1                         OPENBLAS-OPENMP      0.754            212.20159

And still OpenBLAS rocks!
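
For reference, the performance figure in this kind of benchmark comes from a simple operation count: multiplying two n x n matrices takes roughly 2n^3 floating-point operations, so a timing that covers several repetitions corresponds to that many operations per run. A minimal sketch follows, where gemm_gflops() is a hypothetical helper name and the numbers in the usage line are made up:

```r
# GFlop/s for `runs` products of two n x n matrices done in `elapsed` seconds
gemm_gflops <- function(n, runs, elapsed) {
  flop <- 2 * n^3 * runs  # approximate operation count
  flop / elapsed / 1e9
}

# e.g., 5 products of 1000 x 1000 matrices in 2 seconds
gemm_gflops(n = 1000, runs = 5, elapsed = 2)
#> [1] 5
```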

For questions, suggestions or issues related to this R interface, please use its issue tracker or the R-SIG-Fedora mailing list. For more general issues, please use Red Hat Bugzilla or the upstream issue tracker.

Switch BLAS/LAPACK without leaving your R session

BLAS and LAPACK comprise all the low-level linear algebra subroutines that handle your matrix operations in R and other software. Fedora ships the reference implementation from Netlib, which is accurate and stable, but slow, as well as several optimized backends, such as ATLAS, BLIS (serial, OpenMP and threaded versions) and OpenBLAS (serial, OpenMP and threaded flavours as well). However, up to version 32, Fedora lacked a proper mechanism to switch between them.

We are excited to announce that this situation changes with the upcoming release, which is already in beta status. Starting with Fedora 33, R (as well as NumPy, Octave and all the other BLAS/LAPACK consumers) is linked against the outstanding FlexiBLAS library, a BLAS/LAPACK wrapper that enables runtime switching of the optimized backend, and the OpenMP version of OpenBLAS is set as the default system-wide backend.

Moreover, the accompanying flexiblas R package enables changing the BLAS/LAPACK provider, as well as setting the number of threads for parallel backends, without leaving the R session. Let’s give this a quick test using docker:

$ docker run --rm -it fedora:33
$ dnf install R-flexiblas # install R and the FlexiBLAS API interface for R
$ dnf install flexiblas-* # install all available optimized backends

Then, in an R session we see:

library(flexiblas)

# check whether FlexiBLAS is available
flexiblas_avail()
#> [1] TRUE

# get the current backend
flexiblas_current_backend()
#> [1] "OPENBLAS-OPENMP"

# list all available backends
flexiblas_list()
#> [1] "NETLIB"           "__FALLBACK__"     "BLIS-THREADS"     "OPENBLAS-OPENMP"
#> [5] "BLIS-SERIAL"      "ATLAS"            "OPENBLAS-SERIAL"  "OPENBLAS-THREADS"
#> [9] "BLIS-OPENMP"

# get/set the number of threads
flexiblas_set_num_threads(12)
flexiblas_get_num_threads()
#> [1] 12
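
Since switching is this cheap, it may be convenient to wrap it so that a piece of code runs under a given backend and the previous one is restored afterwards. The following is only a sketch: with_backend() is a hypothetical helper, not part of the flexiblas package, and it assumes that flexiblas_load_backend() returns a usable index also for a backend that is already loaded.

```r
# Hypothetical helper (not part of flexiblas): evaluate `expr` under the
# backend `name`, then switch back to the backend that was active before.
with_backend <- function(name, expr) {
  if (!requireNamespace("flexiblas", quietly = TRUE))
    stop("the 'flexiblas' package is required for this sketch")
  previous <- flexiblas::flexiblas_current_backend()
  # restore the previous backend when the function exits
  on.exit(flexiblas::flexiblas_switch(
    flexiblas::flexiblas_load_backend(previous)))
  flexiblas::flexiblas_switch(flexiblas::flexiblas_load_backend(name))
  force(expr)  # lazily evaluated here, i.e., after the switch
}

# Usage (not run here, requires Fedora 33+ with R-flexiblas):
# A <- matrix(runif(1e6), nrow = 1000)
# with_backend("NETLIB", system.time(A %*% A))
```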

This is an example of GEMM benchmark for all the backends available:

library(flexiblas)

n <- 2000
runs <- 10
ignore <- "__FALLBACK__"

A <- matrix(runif(n*n), nrow=n)
B <- matrix(runif(n*n), nrow=n)

# load backends
backends <- setdiff(flexiblas_list(), ignore)
idx <- flexiblas_load_backend(backends)

# benchmark
timings <- sapply(idx, function(i) {
  flexiblas_switch(i)

  # warm-up
  C <- A[1:100, 1:100] %*% B[1:100, 1:100]

  unname(system.time({
    for (j in seq_len(runs))
      C <- A %*% B
  })[3])
})

results <- data.frame(
  backend = backends,
  `timing [s]` = timings,
  # each n x n product costs about 2 * n^3 flop; the timing covers all runs
  `performance [GFlops]` = runs * (2 * (n / 1000)^3) / timings,
  check.names = FALSE)

results[order(results[["performance [GFlops]"]]),]
#>            backend timing [s] performance [GFlops]
#> 1           NETLIB     56.776             2.818092
#> 5            ATLAS      5.988            26.720107
#> 2     BLIS-THREADS      3.442            46.484602
#> 8      BLIS-OPENMP      3.408            46.948357
#> 4      BLIS-SERIAL      3.395            47.128130
#> 6  OPENBLAS-SERIAL      3.206            49.906425
#> 7 OPENBLAS-THREADS      0.773           206.985770
#> 3  OPENBLAS-OPENMP      0.761           210.249671

For questions, suggestions or issues related to this R interface, please use its issue tracker or the R-SIG-Fedora mailing list. For more general issues, please use Red Hat Bugzilla or the upstream issue tracker. There are a couple of posters by the authors of FlexiBLAS (1, 2) with a similar demo for Octave.