{rspm}: easy access to RSPM binary packages with automatic management of system requirements

There are many community projects out there that provide binary R packages for various distributions. You may know Michael Rutter's legendary c2d4u.team/c2d4u4.0+ PPA for Ubuntu, but the situation has improved greatly in recent times with Detlef Steuer's autoCRAN OBS repo for OpenSUSE, my iucar/cran Copr repo for Fedora, and Dirk Eddelbuettel's r2u repo, again for Ubuntu. These have obvious advantages that come with the system package management layer, such as lightning-fast installations and updates with automatic dependency management, reversibility, and multitenancy (several users sharing the same set of packages), among others (see this paper for further details). Moreover, the {bspm} package adds the integration layer that we had been lacking for all these years, enabling a bridge to the system package manager that requires neither admin rights nor leaving your beloved R console (i.e., the Windows experience on Linux).

However, CentOS/RHEL has been the great forgotten one here, even though quite a lot of users out there are tied to this distro for different reasons. Moreover, such reasons usually imply that they don't have admin rights at all, not even for the initial setup.

So here I announce the {rspm} package, which, as its name indicates, provides easy access to RStudio Public Package Manager (i.e., it does the repo setup for you), but also monitors and scans every installation to automatically detect, download, install and configure any missing system requirements. Most importantly, this is done in full user mode (i.e., system requirements are installed into the user home) and dynamically (no need to restart, no need to manage environment variables). It is definitely not as fast as the other projects, but it is complementary in the sense that it may be compatible with {renv}/{pak}-based workflows (I haven't tried yet, but it should require only minor adjustments, if any).

I made this primarily targeted at CentOS Stream 8, but for nearly the same price I added support for Ubuntu (bionic, focal, jammy) too (although note that this requires the installation of the apt-file utility). Please give it a spin if you feel like it, and let me know how it goes. Here’s a demo of the installation of {sf}, which, as you may know, has quite a number of system requirements:
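The demo itself is not reproduced here, but in text form the same session would look roughly like the sketch below. It assumes a supported system (CentOS Stream 8, or Ubuntu with apt-file available), a reachable CRAN mirror at the URL shown, and `rspm::enable()` as the switch that turns on the RSPM repository and the system-requirements monitoring for the current session.

```r
# Install {rspm} from CRAN (mirror URL is an assumption; any CRAN mirror works)
install.packages("rspm", repos = "https://cloud.r-project.org")

# Enable RSPM binaries and the automatic handling of system requirements
rspm::enable()

# Install a package with many system requirements; missing libraries are
# detected and installed into the user home, without admin rights
install.packages("sf")
library(sf)
```

No restart or environment-variable tweaking should be needed between the installation and `library(sf)`, which is the point of the dynamic, user-mode approach described above.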

Finally, I would like to thank RStudio for their investment in providing this extremely useful resource for the Linux R community.

#Naukas21: Beethoven's metronome

After a one-year break due to the pandemic, Naukas Bilbao is back at the Palacio Euskalduna, with restrictions, but stronger and more eager than ever. And this time, we talk about our own research. It is a story that the media have already told, that our friends have told, and that we wanted to tell ourselves too. As every year, EiTB lends its support and makes all the talks available on its Kosmos platform. All that is left is to thank once again, and especially this year, the organizers for their work, and the audience, both in person and remote, for their support. Welcome to the mystery of Beethoven's metronome.

Least squares as springs, the Shiny app

Three weeks ago, I saw on Twitter a nice mechanical recreation of a PCA (or total least squares), done just by pulling some strings attached to a straw.

Some days ago, Joshua Loftus published Least squares as springs, where he presents some nice visualisations and explains that the cost function is the same (up to a constant factor) as the potential energy of springs attaching the data points to the regression line. As a result, if we take any line attached in this way to a point cloud (with the required constraints in place for the springs: vertical movement for regular regression; no constraints for PCA, where the springs slide freely), then the system will oscillate until it reaches its state of minimum energy (i.e., until it settles on the regression line).
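The energy argument can be checked numerically with a small sketch. Assumptions: simulated data, a unit spring constant `k`, and R's general-purpose minimisers `optim()`/`optimize()` standing in for the physical system (this is not the app's code). Minimising the spring energy with vertical springs recovers the `lm()` fit, and with free-sliding (perpendicular) springs it recovers the direction of the first principal component.

```r
set.seed(42)
x <- rnorm(50)
y <- 1 + 2 * x + rnorm(50)
k <- 1  # hypothetical spring constant

# Vertical springs: each elongation is the vertical residual y - (a + b x)
energy_vertical <- function(par) 0.5 * k * sum((y - par[1] - par[2] * x)^2)

fit_lm <- lm(y ~ x)
fit_springs <- optim(c(0, 0), energy_vertical, method = "BFGS")
rbind(lm = coef(fit_lm), springs = fit_springs$par)

# At the OLS fit, the energy is just k/2 times the residual sum of squares
all.equal(energy_vertical(coef(fit_lm)), 0.5 * k * sum(resid(fit_lm)^2))

# Free-sliding springs: each elongation is the perpendicular distance to a
# line with angle theta through the centroid (where the optimal line passes)
xc <- x - mean(x)
yc <- y - mean(y)
energy_perp <- function(theta) {
  0.5 * k * sum((-sin(theta) * xc + cos(theta) * yc)^2)
}

# E(theta) has period pi, so search a window of width pi around the OLS angle
guess <- atan(coef(fit_lm)[[2]])
theta_springs <- optimize(energy_perp, guess + c(-pi, pi) / 2,
                          tol = 1e-8)$minimum %% pi

# Angle of the first principal component, folded into [0, pi)
pc1 <- prcomp(cbind(x, y))$rotation[, 1]
theta_pca <- atan2(pc1[[2]], pc1[[1]]) %% pi
c(springs = theta_springs, pca = theta_pca)
```

The search window is centred on the OLS angle only as a convenient starting guess: both fits share the same centroid, and the orthogonal fit is usually close to it.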

Inspired by this, and based on the MATLAB code for the animation in this excellent StackExchange answer, I created a Shiny app that allows us to play with different parameters:

  • linear regression vs. PCA;
  • covariance matrix for data generation;
  • number of samples;
  • initial angle and shift of the center of mass;
  • velocity loss and inertia (which determines the damping ratio).

The resulting (and oddly satisfying) movement is simulated as a composition of a translation and a rotation. You can play with a plotly-powered JavaScript animation or download it as a gganimate-powered GIF.
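A translation-plus-rotation simulation of this kind could look like the following minimal sketch for the PCA case (free-sliding springs). It is a hypothetical model, not the app's actual code: the line is a rigid body that translates along its normal and rotates about its centre, driven by the spring force and torque, with a per-step velocity loss acting as damping. The names `mass`, `inertia`, `loss`, `dt` and `steps` are illustrative parameters chosen here.

```r
# Hypothetical sketch: damped translation + rotation of a line attached to a
# point cloud by free-sliding springs with energy E = k/2 * sum(d^2)
simulate_springs <- function(x, y, theta0, shift0, k = 1, mass = 1,
                             inertia = 50, loss = 0.05, dt = 0.05,
                             steps = 2000) {
  theta <- theta0
  # start at the centre of mass, shifted along the line's normal
  cx <- mean(x) - shift0 * sin(theta)
  cy <- mean(y) + shift0 * cos(theta)
  v <- 0  # translational velocity along the normal
  w <- 0  # angular velocity
  for (step in seq_len(steps)) {
    dx <- x - cx
    dy <- y - cy
    d <- -sin(theta) * dx + cos(theta) * dy  # signed spring elongations
    a <- cos(theta) * dx + sin(theta) * dy   # lever arms along the line
    force <- k * sum(d)       # -dE/d(shift)
    torque <- k * sum(d * a)  # -dE/d(theta)
    # semi-implicit Euler step with velocity loss (damping)
    v <- (1 - loss) * v + force / mass * dt
    w <- (1 - loss) * w + torque / inertia * dt
    cx <- cx - v * dt * sin(theta)
    cy <- cy + v * dt * cos(theta)
    theta <- theta + w * dt
  }
  list(theta = theta %% pi, center = c(cx, cy))
}

set.seed(1)
x <- rnorm(50)
y <- 1 + 2 * x + rnorm(50)
res <- simulate_springs(x, y, theta0 = 0, shift0 = 2)

# The resting angle should match the first principal component
pc1 <- prcomp(cbind(x, y))$rotation[, 1]
theta_pca <- unname(atan2(pc1[2], pc1[1]) %% pi)
c(springs = res$theta, pca = theta_pca)
```

Raising `inertia` or lowering `loss` makes the line swing longer before settling, which is the qualitative effect of the velocity loss and inertia knobs listed above.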

constants: Update to 2018 CODATA values

The constants package contains CODATA internationally recommended values of the fundamental physical constants, provided as symbols for direct use within the R language. Optionally, the values with uncertainties and/or units are also provided if the errors, units and/or quantities packages are installed. The Committee on Data for Science and Technology (CODATA) is an interdisciplinary committee of the International Council for Science which periodically provides the internationally accepted set of values of the fundamental physical constants. This release contains the "2018 CODATA" version, published in May 2019 [E. Tiesinga, P. J. Mohr, D. B. Newell, and B. N. Taylor (2020) http://physics.nist.gov/constants].

This version contains some breaking changes that are necessary to streamline future updates and provide a stable symbol table:

  • The codata table now includes the absolute uncertainty instead of the relative one. Thus, the rel_uncertainty column has been dropped in favour of the new uncertainty column. Also, columns have been slightly reordered.
  • Symbol names for constants have changed. The old ones were hand-crafted and thus unmanageable. This release adopts the ASCII symbols defined by NIST on their webpage, except for those that collide with some base R function. In particular, there are two cases: c, the speed of light, has been renamed as c0; sigma, the Stefan-Boltzmann constant, has been renamed as sigma0.
  • Constant types, or categories (column codata$type), adopt the names defined by NIST on their webpage too. Some constants belong to more than one category (separated by commas); others belong to no category (missing type).
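A quick way to see these changes in practice is sketched below, assuming constants 1.0 or later and the `symbol`, `value` and `uncertainty` column names implied above. Since only the absolute uncertainty is stored now, the relative one can be recomputed on a copy of the table (`tab` and `rel_uncertainty` here are our own names, not part of the package).

```r
library(constants)

syms$c0      # speed of light in vacuum (formerly `c`)
syms$sigma0  # Stefan-Boltzmann constant (formerly `sigma`)

# Recompute the relative uncertainty on a local copy of the table
tab <- codata
tab$rel_uncertainty <- with(tab, uncertainty / abs(value))
head(tab[c("symbol", "value", "uncertainty", "rel_uncertainty")])
```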

There are some new features too:

  • In addition to the codata data frame, this release includes codata.cor, a correlation matrix for all the constants.
  • In addition to syms_with_errors and syms_with_units, there is a new list of symbols called syms_with_quantities (available if the optional quantities package is installed), which provides constant values with uncertainty and units.
  • Experimental support for correlated values in syms_with_errors and syms_with_quantities is provided (disabled by default; see details in help(syms) for activation instructions).
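As a minimal sketch of the new quantities-based list, assuming the optional quantities package (which builds on errors and units) is installed, each symbol carries its value, absolute uncertainty and units at once; we use the speed of light again because its symbol is confirmed above.

```r
library(constants)

# Value, uncertainty and units in a single object
c0 <- syms_with_quantities$c0
c0

errors::errors(c0)       # zero: the speed of light is exact by definition
units::deparse_unit(c0)  # units of the constant
```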

See the README for some usage examples. For questions, suggestions or issues, please use the issue tracker.

Installing and switching to MKL on Fedora

In our last post, we presented the FlexiBLAS library, coming to Fedora 33, and the accompanying flexiblas R package, which enables live switching of the BLAS backend among the various open source options readily available in the Fedora repositories.

In this post, we demonstrate how to install Intel's Math Kernel Library (MKL), register it with FlexiBLAS, and finally switch to it, in a few steps. First, we prepare a proper environment using Docker:

$ docker run --rm -it fedora:33
$ dnf install 'dnf-command(config-manager)' # install config manager
$ dnf install R-flexiblas # install R and the FlexiBLAS API interface for R

Then we add Intel’s YUM repository, import the public key and install MKL:

$ dnf config-manager --add-repo https://yum.repos.intel.com/mkl/setup/intel-mkl.repo
$ rpm --import https://yum.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB
$ dnf install intel-mkl # or a specific version, e.g. intel-mkl-2020.0-088

Then, in an R session:

library(flexiblas)
flexiblas_load_backend("/opt/intel/mkl/lib/intel64/libmkl_rt.so")
#> flexiblas BLAS /opt/intel/mkl/lib/intel64/libmkl_rt.so not found in config.
#> <flexiblas> BLAS /opt/intel/mkl/lib/intel64/libmkl_rt.so does not provide an integer size hint. Assuming 4 Byte.
#> [1] 2

backends <- flexiblas_list_loaded()
backends
#> [1] "OPENBLAS-OPENMP"
#> [2] "/opt/intel/mkl/lib/intel64/libmkl_rt.so"

And that’s it: now, we are able to switch between the default one and MKL. As in our previous post, let’s compare them with a simple GEMM benchmark:

n <- 2000
runs <- 10

A <- matrix(runif(n*n), nrow=n)
B <- matrix(runif(n*n), nrow=n)

# benchmark
timings <- sapply(seq_along(backends), function(i) {
  # switch to the i-th loaded backend
  flexiblas_switch(i)

  # warm-up
  C <- A[1:100, 1:100] %*% B[1:100, 1:100]

  # average elapsed time per matrix product
  unname(system.time({
    for (j in seq_len(runs))
      C <- A %*% B
  })[3]) / runs
})

results <- data.frame(
  backend = backends,
  `timing [s]` = timings,
  `performance [GFlops]` = (2 * (n / 1000)^3) / timings,
  check.names = FALSE)

results[order(results[["performance [GFlops]"]]), ]

#>                                   backend timing [s] performance [GFlops]
#> 2 /opt/intel/mkl/lib/intel64/libmkl_rt.so      3.487             4.588471
#> 1                         OPENBLAS-OPENMP      0.754            21.220159
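The performance column follows from the usual flop count of a dense matrix product: multiplying two n×n matrices takes about 2n³ floating-point operations (roughly n³ multiplications plus n³ additions). As a quick sanity check, plugging the per-product timings reported above into that formula reproduces the table:

```r
# Per-product timings reported in the table above (seconds)
n <- 2000
timings <- c(MKL = 3.487, OPENBLAS = 0.754)

# About 2 * n^3 flops per n x n matrix product
flops <- 2 * n^3
gflops <- flops / timings / 1e9
round(gflops, 6)  # same values as the performance column
```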

And OpenBLAS still rocks!

For questions, suggestions or issues related to this R interface, please use its issue tracker or the R-SIG-Fedora mailing list. For more general issues, please use Red Hat Bugzilla or the upstream issue tracker.