Suggests and Vignettes

Dirk Eddelbuettel quite rightly reminded us the other day that Suggests is not Depends. I am sorry to say that I am one of those who are using Suggests… “casually”. Mea culpa. I must say that this is restricted to vignettes: there are no tests nor examples using suggested packages. But I am not checking if my suggested packages are available at all, which is definitely wrong. And I understand that it must be frustrating to run reverse dependencies on a package as popular as Rcpp when the rest of us are using Suggests so… casually. So I was definitely determined to solve this, and I finally managed to find a very simple solution that may be helpful to other maintainers.

Our simmer package has seven vignettes. Two of them are very introductory and do not use any external package. But as you try to demonstrate more advanced features and use cases, you start needing some other tools; and their use could be intensive, so that checking suggested packages for every call or every code chunk might not scale. However, I realised that the important thing for those advanced vignettes is just to make the story they tell available to your users, and anyway they are always built and available online on CRAN. Therefore, I decided to add the following at the beginning of each vignette:

required <- c("pkg1", "pkg2", "pkgn")

if (!all(unlist(lapply(required, function(pkg) requireNamespace(pkg, quietly = TRUE)))))
  knitr::opts_chunk$set(eval = FALSE)

Problem solved. Yes, I know, I am still taking knitr for granted. But given that it has its own entry (VignetteBuilder) in the DESCRIPTION, I think this is fair enough. I only hope that Dirk will unblacklist simmer after our next release. ;-)

Extensions for ‘simmer’

A new version of the Discrete-Event Simulator for R was released a few days ago on CRAN. The most interesting new feature is the implementation of the subsetting operators [ and [[ for trajectory objects. Basically, think about trajectories as lists of activities and these operators will do (almost) everything you expect.

library(simmer)

t0 <- trajectory() %>%
  seize("resource", 1) %>%
  timeout(function() rexp(1, 2)) %>%
  release("resource", 2)

t0
## trajectory: anonymous, 3 activities
## { Activity: Seize        | resource: resource | amount: 1 }
## { Activity: Timeout      | delay: 0x55ffd06c5858 }
## { Activity: Release      | resource: resource | amount: 2 }
t0[c(3, 1)]
## trajectory: anonymous, 2 activities
## { Activity: Release      | resource: resource | amount: 2 }
## { Activity: Seize        | resource: resource | amount: 1 }

After the last maintenance update (v3.5.1), which fixed several bugs and included a new interesting vignette with SimPy examples translated to ‘simmer’, this v3.6.0 comes hand in hand with the first ‘simmer’ extension released on CRAN: simmer.plot.

The primary purpose of ‘simmer.plot’ is to detach plotting capabilities from the core package, to systematise and enhance them. If you were using any of the old plot_*() functions, you will get a deprecation warning pointing to the S3 method simmer.plot::plot.simmer. This vignette will help you make the transition.

‘simmer.plot’ also implements a new plot S3 method for trajectories. It produces a diagram of a given trajectory object, which is very helpful for debugging and checking that everything conforms your simulation model. Let us consider, for instance, the following pretty complex trajectory:

t0 <- trajectory() %>%
  seize("res0", 1) %>%
  branch(function() 1, c(TRUE, FALSE),
         trajectory() %>%
           clone(2,
                 trajectory() %>%
                   seize("res1", 1) %>%
                   timeout(1) %>%
                   release("res1", 1),
                 trajectory() %>%
                   trap("signal",
                        handler=trajectory() %>%
                          timeout(1)) %>%
                   timeout(1)),
         trajectory() %>%
           set_attribute("dummy", 1) %>%
           seize("res2", function() 1) %>%
           timeout(function() rnorm(1, 20)) %>%
           release("res2", function() 1) %>%
           release("res0", 1) %>%
           rollback(11)) %>%
  synchronize() %>%
  rollback(2) %>%
  release("res0", 1)

We must ensure that:

  • Resources are seized and released as we expect.
  • Branches end (or continue) where we expect.
  • Rollbacks point back to the activity we expect.

Things are indeed much easier if you can just inspect it visually:

library(simmer.plot)

plot(t0)

Note that different resources are mapped to a qualitative color scale, so that you can quickly glance whether you placed the appropriate seizes/releases for each resource.

Other interesting ‘simmer’ extensions are already on our roadmap. Particularly, Bart has been simmering a new package (still under development) called simmer.optim, which brings parameter optimisation to ‘simmer’. While ‘simmer’, as is, can help you answer a question like the following:

If we have x amount of resources of type A, what will the average waiting time in the process be?

‘simmer.optim’ is targeted to a reformulation like this:

What amount x of resources of type A minimises the waiting time, while still maintaining a utilisation level of \rho_A?

We would be very grateful if someone with experience on DES optimisation could try it out and give us some feedback. Simply install it from GitHub using ‘devtools’

devtools::install_github("r-simmer/simmer.optim")

and start from the README, which demonstrates the current functionalities.

simmer v3.5.0 released on CRAN

We are proud to present simmer 3.5.0, a new release on CRAN of the Discrete-Event Simulator for R with a bunch of exciting features:

  • Simpler logging. So far, if we wanted arrivals to print something into the console, we had, for instance, to make a custom function returning zero, with a print() inside, and to put it into a timeout() activity. Now, with log_() it’s easier and, in addition, you get the simulation time and the arrival name:
library(simmer)

sayer <- create_trajectory() %>%
  log_("hello world!")

simmer() %>%
  add_generator("dummy", sayer, at(3, 6)) %>%
  run() %>% invisible
## 3: dummy0: hello world!
## 6: dummy1: hello world!
  • New resource modifiers. set_capacity() and set_queue_size() existed before, but they were not very useful. Now, these methods become activities and you can use them in your trajectories. They have their associatedset_capacity_selected() and set_queue_size_selected(), just like seize() and release(), for a joint use with the resource selector select().
  • New generator modifiers. Activities activate(), deactivate() allow us to start/stop generators from within trajectories. Activities set_trajectory(), set_distribution() allow us to change a generator’s assigned trajectory or distribution respectively.
  • New interarrival communication activities, allowing asynchronous programming. send(), trap(), untrap()and wait() can be used to send signals, wait for signals, trap them and launch asynchronous handlers. Nothing better than an example to understand the possibilities of this mechanism:
library(simmer)

t_blocked <- create_trajectory() %>%
  trap("you shall pass", 
       handler = create_trajectory() %>%
         log_("ok, I'm packing...") %>%
         timeout(1)) %>%
  log_("waiting...") %>%
  wait() %>%
  log_("and I'm leaving!")

t_signaler <- create_trajectory() %>%
  log_("you shall pass") %>%
  send("you shall pass")

simmer() %>%
  add_generator("blocked", t_blocked, at(0)) %>%
  add_generator("signaler", t_signaler, at(5)) %>%
  run() %>% invisible
## 0: blocked0: waiting...
## 5: signaler0: you shall pass
## 5: blocked0: ok, I'm packing...
## 6: blocked0: and I'm leaving!

Other interesting new features from past releases since our last post include post-seize and reject optional trajectories when seizing resources, arrival cloning and synchronizing, arrival batching and separating or reneging.

Please, refer to the updated Advanced Trajectory Usage vignette for more information and examples. Also remember that you can participate in our simmer-devel mailing list to get help, discuss methodologies… any feedback is always welcome.

Simulating Continuous-Time Markov Chains with ‘simmer’ (2)

In part one, we simulated a simple CTMC. Now, let us complicate things a bit. Remember the example problem there:

A gas station has a single pump and no space for vehicles to wait (if a vehicle arrives and the pump is not available, it leaves). Vehicles arrive to the gas station following a Poisson process with a rate of \lambda=3/20 vehicles per minute, of which 75% are cars and 25% are motorcycles. The refuelling time can be modelled with an exponential random variable with mean 8 minutes for cars and 3 minutes for motorcycles, that is, the services rates are \mu_\mathrm{c}=1/8 cars and \mu_\mathrm{m}=1/3 motorcycles per minute respectively (note that, in this context, \mu is a rate, not a mean).

Consider the previous example, but, this time, there is space for one motorcycle to wait while the pump is being used by another vehicle. In other words, cars see a queue size of 0 and motorcycles see a queue size of 1.

The new Markov chain is the following:

Q = \begin{pmatrix}  -\mu_\mathrm{c} & 0 & 0 & \mu_\mathrm{c} & 0 \\  (1-p)\lambda & -(1-p)\lambda-\mu_\mathrm{c} & \mu_\mathrm{c} & 0 & 0 \\  0 & p\lambda & -\lambda & (1-p)\lambda & 0 \\  0 & 0 & \mu_\mathrm{m} & -(1-p)\lambda-\mu_\mathrm{m} & (1-p)\lambda \\  0 & 0 & 0 & \mu_\mathrm{m} & -\mu_\mathrm{m} \\  \end{pmatrix}

where the states car+ and m/c+ represent car + waiting motorcycle and motorcycle + waiting motorcycle respectively.

With p the steady state distribution, the average number of vehicles in the system is given by

N = 2(p_1 + p_5) + p_2 + p_4

# Arrival rate
lambda <- 3/20
# Service rate (cars, motorcycles)
mu <- c(1/8, 1/3)
# Probability of car
p <- 0.75

# Theoretical resolution
A <- matrix(c(1,                   0,       0,               mu[1],            0,
              1, -(1-p)*lambda-mu[1],   mu[1],                   0,            0,
              1,            p*lambda, -lambda,        (1-p)*lambda,            0,
              1,                   0,   mu[2], -(1-p)*lambda-mu[2], (1-p)*lambda,
              1,                   0,       0,               mu[2],       -mu[2]), 
            byrow=T, ncol=5)
B <- c(1, 0, 0, 0, 0)
P <- solve(t(A), B)
N_average_theor <- sum(P * c(2, 1, 0, 1, 2)) ; N_average_theor
## [1] 0.6349615

As in the previous post, we can simulate this chain by breaking down the problem into two trajectories (one for each type of vehicle and service rate) and two generators. But in order to disallow cars to stay in the pump’s queue, we need to introduce a little trick in the cars’ seize: the argument amount is a function that returns 1 if the pump is vacant and 2 otherwise. This implies that the car gets rejected, because there is only one position in queue and that seize is requesting two positions. Note also that the environment env must be defined before running the simulation, as it is needed inside the trajectory.

library(simmer)
set.seed(1234)

option.1 <- function(t) {
  car <- create_trajectory() %>%
    seize("pump", amount=function() {
      if (env %>% get_server_count("pump")) 2  # rejection
      else 1                                   # serve
    }) %>%
    timeout(function() rexp(1, mu[1])) %>%
    release("pump", amount=1)
  
  mcycle <- create_trajectory() %>%
    seize("pump", amount=1) %>%
    timeout(function() rexp(1, mu[2])) %>%
    release("pump", amount=1)
  
  env <- simmer() %>%
    add_resource("pump", capacity=1, queue_size=1) %>%
    add_generator("car", car, function() rexp(1, p*lambda)) %>%
    add_generator("mcycle", mcycle, function() rexp(1, (1-p)*lambda))
  env %>% run(until=t)
}

The same idea using a branch, with a single generator and a single trajectory.

option.2 <- function(t) {
  vehicle <- create_trajectory() %>%
    branch(function() sample(c(1, 2), 1, prob=c(p, 1-p)), c(F, F), 
           create_trajectory("car") %>%
             seize("pump", amount=function() {
               if (env %>% get_server_count("pump")) 2  # rejection
               else 1                                   # serve
             }) %>%
             timeout(function() rexp(1, mu[1])) %>%
             release("pump", amount=1),                 # always 1
           create_trajectory("mcycle") %>%
             seize("pump", amount=1) %>%
             timeout(function() rexp(1, mu[2])) %>%
             release("pump", amount=1))
  
  env <- simmer() %>%
    add_resource("pump", capacity=1, queue_size=1) %>%
    add_generator("vehicle", vehicle, function() rexp(1, lambda))
  env %>% run(until=t)
}

We may also avoid messing up things with branches and subtrajectories. We can decide the type of vehicle and set it as an attribute of the arrival with set_attribute. Then, every activity’s function is able to retrieve those attributes as a named list. Although the branch option is a little bit faster, this one is nicer, because there are no subtrajectories involved.

option.3 <- function(t) {
  vehicle <- create_trajectory("car") %>%
    set_attribute("vehicle", function() sample(c(1, 2), 1, prob=c(p, 1-p))) %>%
    seize("pump", amount=function(attrs) {
      if (attrs["vehicle"] == 1 &&
          env %>% get_server_count("pump")) 2    # car rejection
      else 1                                     # serve
    }) %>%
    timeout(function(attrs) rexp(1, mu[attrs["vehicle"]])) %>%
    release("pump", amount=1)                    # always 1
  
  env <- simmer() %>%
    add_resource("pump", capacity=1, queue_size=1) %>%
    add_generator("vehicle", vehicle, function() rexp(1, lambda))
  env %>% run(until=t)
}

But if performance is a requirement, we can play cleverly with the resource’s capacity and queue size, and with the amounts requested in each seize, in order to model the problem without checking the status of the resource. Think about this:

  • A resource with capacity=3 and queue_size=2.
  • A car always tries to seize amount=3.
  • A motorcycle always tries to seize amount=2.

In these conditions, we have the following possibilities:

  • Pump empty.
  • One car (3 units) in the server [and optionally one motorcycle (2 units) in the queue].
  • One motorcycle (2 units) in the server [and optionally one motorcycle (2 units) in the queue].

Just as expected! So, let’s try:

option.4 <- function(t) {
  vehicle <- create_trajectory() %>%
    branch(function() sample(c(1, 2), 1, prob=c(p, 1-p)), c(F, F), 
           create_trajectory("car") %>%
             seize("pump", amount=3) %>%
             timeout(function() rexp(1, mu[1])) %>%
             release("pump", amount=3),
           create_trajectory("mcycle") %>%
             seize("pump", amount=2) %>%
             timeout(function() rexp(1, mu[2])) %>%
             release("pump", amount=2))
  
  simmer() %>%
    add_resource("pump", capacity=3, queue_size=2) %>%
    add_generator("vehicle", vehicle, function() rexp(1, lambda)) %>%
    run(until=t)
}

We are still wasting time in the branch decision. We can mix this solution above with the option.1 to gain extra performance:

option.5 <- function(t) {
  car <- create_trajectory() %>%
    seize("pump", amount=3) %>%
    timeout(function() rexp(1, mu[1])) %>%
    release("pump", amount=3)
  
  mcycle <- create_trajectory() %>%
    seize("pump", amount=2) %>%
    timeout(function() rexp(1, mu[2])) %>%
    release("pump", amount=2)
  
  simmer() %>%
    add_resource("pump", capacity=3, queue_size=2) %>%
    add_generator("car", car, function() rexp(1, p*lambda)) %>%
    add_generator("mcycle", mcycle, function() rexp(1, (1-p)*lambda)) %>%
    run(until=t)
}

Options 1, 2 and 3 are slower, but they give us the correct numbers, because the parameters (capacity, queue size, amounts) in the model remain unchanged compared to the problem. For instance,

gas.station <- option.1(5000)

library(ggplot2)

# Evolution + theoretical value
graph <- plot_resource_usage(gas.station, "pump", items="system")
graph + geom_hline(yintercept=N_average_theor)

However, it is not the case in options 4 and 5. The parameters of these models have been adulterated to fit our performance purposes. Therefore, we need to extract the RAW data, rescale the numbers and plot them. And, of course, we get the same figure:

gas.station <- option.5(5000)

limits <- data.frame(item = c("queue", "server", "system"), value = c(1, 1, 2))

library(dplyr); library(tidyr)

graph <- gas.station %>% get_mon_resources() %>% 
  gather(item, value, server, queue, system) %>%
  mutate(value = round(value * 2/5),                  # rescaling here <------
         item = factor(item)) %>%
  filter(item %in% "system") %>%
  group_by(resource, replication, item) %>%
  mutate(mean = c(0, cumsum(head(value, -1) * diff(time))) / time) %>% 
  ungroup() %>%
  ggplot() + aes(x=time, color=item) +
  geom_line(aes(y=mean, group=interaction(replication, item))) +
  ggtitle("Resource usage: pump") +
  ylab("in use") + xlab("time") + expand_limits(y=0) +
  geom_hline(aes(yintercept=value, color=item), limits, lty=2)
graph + geom_hline(yintercept=N_average_theor)

Finally, these are some performance results:

library(microbenchmark)

t <- 1000/lambda
tm <- microbenchmark(option.1(t), 
                     option.2(t), 
                     option.3(t),
                     option.4(t),
                     option.5(t))
graph <- autoplot(tm)
graph + scale_y_log10(breaks=function(limits) pretty(limits, 5)) +
  ylab("Time [milliseconds]")
## Scale for 'y' is already present. Adding another scale for 'y', which
## will replace the existing scale.

Simulating Continuous-Time Markov Chains with ‘simmer’ (1)

In the previous post, we gave you some insights about the simulation of simple birth-death processes with simmer. The extension of such a methodology for more complex queueing networks is immediate and was left as an exercise for the reader.

Similarly, today we are going to explore more features of simmer with a simple Continuous-Time Markov Chain (CTMC) problem as an excuse. CTMCs are more general than birth-death processes (those are special cases of CTMCs) and may push the limits of our simulator. So let’s start.

A gas station has a single pump and no space for vehicles to wait (if a vehicle arrives and the pump is not available, it leaves). Vehicles arrive to the gas station following a Poisson process with a rate of \lambda=3/20 vehicles per minute, of which 75% are cars and 25% are motorcycles. The refuelling time can be modelled with an exponential random variable with mean 8 minutes for cars and 3 minutes for motorcycles, that is, the services rates are \mu_\mathrm{c}=1/8 cars and \mu_\mathrm{m}=1/3 motorcycles per minute respectively (note that, in this context, \mu is a rate, not a mean).

This problem is described by the following CTMC:

Q = \begin{pmatrix}  -\mu_\mathrm{c} & \mu_\mathrm{c} & 0 \\  p\lambda & -\lambda & (1-p) \lambda \\  0 & \mu_\mathrm{m} & -\mu_\mathrm{m}  \end{pmatrix}

with p=0.75. The chain is irreducible and recurrent. To theoretically find the steady state distribution, we have to solve the balance equations

pQ = 0

with the constraint

\sum_i p_i = 1

There are \mathrm{dim}(Q)-1 independent columns, so the latter constraint is equivalent to substitute any column by ones and match it to one at the other side of the equation, that is:

p\cdot\begin{pmatrix}  1 & 1/8 & 0 \\  1 & -3/20 & 0.25\cdot 3/20 \\  1 & 1/3 & -1/3  \end{pmatrix} = (1, 0, 0)

The solution p represents the probability of being at each state in the long-term. Therefore, we can calculate the average number of vehicles in the system by summing these probabilities multiplied by the number of vehicles at each state. In our case,

N = 1\cdot p_1 + 0\cdot p_2 + 1\cdot p_3

# Arrival rate
lambda <- 3/20
# Service rate (cars, motorcycles)
mu <- c(1/8, 1/3)
# Probability of car
p <- 0.75

# Theoretical resolution
A <- matrix(c(1,   mu[1],            0,
              1, -lambda, (1-p)*lambda,
              1,   mu[2],       -mu[2]), byrow=T, ncol=3)
B <- c(1, 0, 0)
P <- solve(t(A), B)
N_average_theor <- sum(P * c(1, 0, 1)) ; N_average_theor
## [1] 0.5031056

Now, we are going to simulate the system with simmer and verify that it converges to the theoretical solution. There are various options for selecting the model. As a first approach, due to the properties of Poisson processes, we can break down the problem into two trajectories (one for each type of vehicle), which differ in their service time only, and therefore two generators with rates \lambda_\mathrm{c} = p\lambda and \lambda_\mathrm{m} = (1-p)\lambda.

library(simmer)
set.seed(1234)

option.1 <- function(t) {
  car <- create_trajectory() %>%
    seize("pump", amount=1) %>%
    timeout(function() rexp(1, mu[1])) %>%
    release("pump", amount=1)
  
  mcycle <- create_trajectory() %>%
    seize("pump", amount=1) %>%
    timeout(function() rexp(1, mu[2])) %>%
    release("pump", amount=1)
  
  simmer() %>%
    add_resource("pump", capacity=1, queue_size=0) %>%
    add_generator("car", car, function() rexp(1, p*lambda)) %>%
    add_generator("mcycle", mcycle, function() rexp(1, (1-p)*lambda)) %>%
    run(until=t)
}

Other arrival processes may not have this property, so we would define a single generator for all kind of vehicles and a single trajectory as follows. In order to distinguish between cars and motorcycles, we could define a branch after seizing the resource to select the proper service time.

option.2 <- function(t) {
  vehicle <- create_trajectory() %>%
    seize("pump", amount=1) %>%
    branch(function() sample(c(1, 2), 1, prob=c(p, 1-p)), continue=c(T, T), 
           create_trajectory("car") %>%
             timeout(function() rexp(1, mu[1])),
           create_trajectory("mcycle") %>%
             timeout(function() rexp(1, mu[2]))) %>%
    release("pump", amount=1)
  
  simmer() %>%
    add_resource("pump", capacity=1, queue_size=0) %>%
    add_generator("vehicle", vehicle, function() rexp(1, lambda)) %>%
    run(until=t)
}

But this option adds an unnecessary overhead since there is an additional call to an R function to select the branch, and therefore performance decreases. A much better option is to select the service time directly inside the timeout function.

option.3 <- function(t) {
  vehicle <- create_trajectory() %>%
    seize("pump", amount=1) %>%
    timeout(function() {
      if (runif(1) < p) rexp(1, mu[1])  # car
      else rexp(1, mu[2])               # mcycle
    }) %>%
    release("pump", amount=1)
  
  simmer() %>%
    add_resource("pump", capacity=1, queue_size=0) %>%
    add_generator("vehicle", vehicle, function() rexp(1, lambda)) %>%
    run(until=t)
}

This option.3 is equivalent to option.1 in terms of performance. But, of course, the three of them lead us to the same result. For instance,

gas.station <- option.3(5000)

library(ggplot2)

# Evolution + theoretical value
graph <- plot_resource_usage(gas.station, "pump", items="system")
graph + geom_hline(yintercept=N_average_theor)

And these are some performance results:

library(microbenchmark)

t <- 1000/lambda
tm <- microbenchmark(option.1(t), 
                     option.2(t), 
                     option.3(t))
graph <- autoplot(tm)
graph + scale_y_log10(breaks=function(limits) pretty(limits, 5)) +
  ylab("Time [milliseconds]")