simmer 4.3.0 + JSS publication

The 4.3.0 release of simmer, the Discrete-Event Simulator for R, is on CRAN. Along with this update, we are very glad to announce that our homonymous paper has finally appeared in the Journal of Statistical Software. Please use the following reference for citations (see citation("simmer")):

  • Ucar I, Smeets B, Azcorra A (2019). “simmer: Discrete-Event Simulation for R.” Journal of Statistical Software, 90(2), 1-30. doi: 10.18637/jss.v090.i02 (URL: https://doi.org/10.18637/jss.v090.i02).

It took quite a lot of work and time, but we are very proud of the final result. We would like to thank the editorial team for their hard work, with special thanks to the anonymous referee for their thorough reviews and valuable comments, and Norman Matloff for his advice and support. Last but not least, we are very grateful for all the discussion and fruitful ideas that our growing community provides via the simmer-devel mailing list and GitHub.

The new release brings us the ability to keep seized resources after reneging, as well as to define a range of arrival priorities that are allowed to access a resource’s queue if there is no room in the server. We moved a lot of activity usage examples that were scattered in a far too long vignette to the appropriate help pages, and of course there is the usual share of bug fixes. See below for a complete list of changes.

Special thanks to Tom Lawton for his contributions to this release, and to Benjamin Sawicki for his generous donation.

New features:

  • Add ability to keep_seized resources after reneging (#204 addressing #200).
  • Add ability to define a range of arrival priorities that are allowed to access a resource’s queue if there is no room in the server (#205 addressing #202).
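
The two features above can be sketched as follows. This is a minimal illustration, assuming the keep_seized argument of renege_in() and the queue_priority argument of add_resource() as documented for this release; the resource names, generator names and the "held" attribute are made up for the example.

```r
library(simmer)

# --- keep_seized: the reneging arrival still holds the nurse ---
env <- simmer()

# Hypothetical drop-out path taken after reneging
give_up <- trajectory() %>%
  # record how many units are still held when the drop-out path starts
  set_attribute("held", function() get_seized(env, "nurse")) %>%
  release("nurse")

patient <- trajectory() %>%
  renege_in(2, out = give_up, keep_seized = TRUE) %>%
  seize("nurse") %>%
  timeout(5) %>%
  release("nurse")

env %>%
  add_resource("nurse", 1) %>%
  add_generator("patient", patient, at(0), mon = 2) %>%
  run() %>%
  invisible()

held <- get_mon_attributes(env)

# --- queue_priority: only priority >= 1 may wait when the server is busy ---
visit <- trajectory() %>%
  seize("desk") %>%
  timeout(5) %>%
  release("desk")

env2 <- simmer() %>%
  add_resource("desk", capacity = 1, queue_size = Inf, queue_priority = 1) %>%
  add_generator("first", visit, at(0), priority = 0) %>%
  add_generator("low", visit, at(1), priority = 0) %>%
  add_generator("high", visit, at(2), priority = 1) %>%
  run()

arrivals <- get_mon_arrivals(env2)
```

If the arguments behave as documented, the "held" attribute should report that the nurse is still seized when the drop-out trajectory starts, and the low-priority arrival that finds the desk busy should appear as unfinished in the arrivals table, while the high-priority one waits and is served.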

Minor changes and fixes:

  • Drop R6 as a dependency (#193 addressing #190).
  • Small fix in from and from_to + documentation update (75a9569).
  • Move activity usage examples to help pages (#194).
  • Fix shortest-queue selection policies (#196).
  • Fix batch triggering (#203).
  • Update JSS paper, CITATION, references and DOI.

simmer 4.2.1

The 4.2.1 release of simmer, the Discrete-Event Simulator for R, is on CRAN with quite interesting new features and fixes. As discussed in the mailing list, there is a way to handle the specific case in which an arrival is rejected because a queue is full:

library(simmer)

reject <- trajectory() %>%
  log_("kicked off...")

patient <- trajectory() %>%
  seize("nurse", continue=FALSE, reject=reject) %>%
  log_("nurse seized") %>%
  timeout(5) %>%
  release("nurse") %>%
  log_("nurse released")

env <- simmer() %>%
  add_resource("nurse", 1, 0) %>%
  add_generator("patient", patient, at(0, 1)) %>%
  run()
## 0: patient0: nurse seized
## 1: patient1: kicked off...
## 5: patient0: nurse released

But as Tom Lawton pointed out, until now, there was no way of handling any alternative path for an arrival that was preempted and “kicked off” from a resource. This mechanism has been implemented in the new handle_unfinished() activity:

patient <- trajectory() %>%
  handle_unfinished(reject) %>%
  seize("nurse") %>%
  log_("nurse seized") %>%
  timeout(5) %>%
  release("nurse") %>%
  log_("nurse released")

env <- simmer() %>%
  add_resource("nurse", 1, 0, preemptive=TRUE, queue_size_strict=TRUE) %>%
  add_generator("LowPrio_patient", patient, at(0), priority=0) %>%
  add_generator("HighPrio_patient", patient, at(1), priority=10) %>%
  run()
## 0: LowPrio_patient0: nurse seized
## 1: HighPrio_patient0: nurse seized
## 1: LowPrio_patient0: kicked off...
## 6: HighPrio_patient0: nurse released

Note that such a mechanism is more general, because it also covers the first scenario:

env <- simmer() %>%
  add_resource("nurse", 1, 0) %>%
  add_generator("patient", patient, at(0, 1)) %>%
  run()
## 0: patient0: nurse seized
## 1: patient1: kicked off...
## 5: patient0: nurse released

Whenever rejection (or preemption) happens and it is caught by the appropriate handler, the new getter get_seized() may be useful to know which resource was abandoned.

Finally, readers may find the new section about the implementation of state-dependent service rates in the Queueing Systems vignette interesting. See below for a complete list of changes.

New features:

  • New handle_unfinished() activity sets a drop-out trajectory for unfinished arrivals, i.e., those dropped from a resource (due to preemption, resource shrinkage or a rejected seize) or those that leave a trajectory (#178 addressing #177).
  • New release_all() and release_selected_all() activities automatically retrieve the amount of resources seized and release it (#180 addressing #25).
  • New get_seized() and get_seized_selected() getters allow an arrival to retrieve the amount of resources seized (#180 addressing #179).
  • New stop_if() activity sets a conditional breakpoint (#181 addressing #100).
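
As a rough sketch of how the new getter and release_all() fit together (the resource name, the amounts and the "held" attribute are illustrative only):

```r
library(simmer)

env <- simmer()

patient <- trajectory() %>%
  seize("nurse", 2) %>%
  # store the amount currently held, as reported by the new getter
  set_attribute("held", function() get_seized(env, "nurse")) %>%
  timeout(5) %>%
  # release whatever amount of "nurse" this arrival holds
  release_all("nurse")

env %>%
  add_resource("nurse", 3) %>%
  add_generator("patient", patient, at(0), mon = 2) %>%
  run() %>%
  invisible()

attrs <- get_mon_attributes(env)
in_use <- get_server_count(env, "nurse")
```

The point of release_all() here is that the trajectory does not need to repeat the amount passed to seize(), which helps when that amount is computed dynamically.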

Minor changes and fixes:

  • Fix performance issues in data sources (#176).
  • Update CITATION.
  • Fix monitored activity for preempted arrivals (as part of #178).
  • Fix seizes/releases with a null amount (as part of #180).
  • Rename internal status codes (as part of #181).
  • Provide more context on error or warning (as part of #181).
  • Extend the Queueing Systems vignette with a section about state-dependent service rates.
  • Fix performance issues in getters (#183).

simmer 4.1.0

The 4.1.0 release of simmer, the Discrete-Event Simulator for R, is on CRAN. As per request in the mailing list, now get_global() is able to work inside a generator function. Moreover, the new add_global() method attaches a global attribute to a simulator.

library(simmer)

env <- simmer()

hello_sayer <- trajectory() %>%
  log_("hello world!") %>%
  set_global("interarrival", 1, mod="+")

generator <- function() get_global(env, "interarrival")

env %>%
  add_global("interarrival", 1) %>%
  add_generator("dummy", hello_sayer, generator) %>%
  run(7) %>%
  get_global("interarrival")
## 1: dummy0: hello world!
## 3: dummy1: hello world!
## 6: dummy2: hello world!
## [1] 4

Compared to plain global variables, these ones are automatically managed and thus reinitialised if the environment is reset.

env %>%
  reset() %>%
  get_global("interarrival")
## [1] 1
env %>%
  run(7) %>%
  get_global("interarrival")
## 1: dummy0: hello world!
## 3: dummy1: hello world!
## 6: dummy2: hello world!
## [1] 4

There has been a small refactoring of some parts of the C++ core, which motivates the minor version bump, but this shouldn’t be noticeable to the users. Finally, several bug fixes and improvements complete this release. See below for a complete list.

New features:

  • New getter get_selected() retrieves names of selected resources via the select() activity (#172 addressing #171).
  • Source and resource getters have been vectorised to retrieve parameters from multiple entities (as part of #172).
  • Simplify C++ Simulator interface for adding processes and resources (#162). The responsibility of building the objects has been moved to the caller.
  • New add_global() method to attach global attributes to a simulation environment (#174 addressing #158).
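
For instance, the vectorised resource getters might be used like this (a small sketch; the resource names and sizes are arbitrary):

```r
library(simmer)

env <- simmer() %>%
  add_resource("nurse", capacity = 2, queue_size = 0) %>%
  add_resource("doctor", capacity = 1, queue_size = 5)

# One call per parameter, several resources at once
capacities <- get_capacity(env, c("nurse", "doctor"))
queue_sizes <- get_queue_size(env, c("nurse", "doctor"))
```

Each getter returns one value per requested resource, in the order in which the names were given.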

Minor changes and fixes:

  • Remove 3.8.0 and 4.0.1 deprecations (#170 addressing #165).
  • Fix get_global() to work outside trajectories (#170 addressing #165).
  • Fix rollback() with an infinite amount (#173).
  • Fix and improve schedules and managers (as part of #174).
  • Fix reset() to avoid overwriting the simulation environment (#175).

simmer 4.0.1

The 4.0.1 release of simmer, the Discrete-Event Simulator for R, has been on CRAN for a couple of weeks. There are few changes, notably new getters (get_sources(), get_resources(), get_trajectory()) for simmer environments and some improvements in resource selection policies (see details in help(select)).

A new convenience function, when_activated, makes it easier to generate arrivals on demand, triggered from trajectories. Let us consider, for instance, a simple restocking pattern:

library(simmer)

restock <- trajectory() %>%
  log_("restock")

serve <- trajectory() %>%
  log_("serve") %>%
  activate("Restock")

env <- simmer() %>%
  add_generator("Customer", serve, at(1, 2, 3)) %>%
  add_generator("Restock", restock, when_activated()) %>%
  run()
## 1: Customer0: serve
## 1: Restock0: restock
## 2: Customer1: serve
## 2: Restock1: restock
## 3: Customer2: serve
## 3: Restock2: restock

Finally, this release leverages the new fast evaluation framework offered by Rcpp (>= 0.12.18) by default, and includes some minor improvements and bug fixes.

New features:

  • New getters (#159):
    • get_sources() and get_resources() retrieve a character vector of source/resource names defined in a simulation environment.
    • get_trajectory() retrieves a trajectory to which a given source is attached.
  • New resource selection policies: shortest-queue-available, round-robin-available, random-available (#156). These are the same as the existing non-available ones, but they exclude unavailable resources (capacity set to zero). Thus, if all resources are unavailable, an error is raised.
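
A short sketch of the new getters and of one of the *-available policies, with made-up resource and generator names ("closed" has zero capacity on purpose):

```r
library(simmer)

visit <- trajectory() %>%
  # "closed" has zero capacity, so the *-available policy skips it
  select(c("closed", "open"), policy = "shortest-queue-available") %>%
  seize_selected() %>%
  timeout(1) %>%
  release_selected()

env <- simmer() %>%
  add_resource("closed", capacity = 0) %>%
  add_resource("open", capacity = 1) %>%
  add_generator("customer", visit, at(0)) %>%
  run()

# Names of the sources and resources defined in the environment
sources <- get_sources(env)
resources <- get_resources(env)

# Which resource actually served the customer
served_by <- get_mon_arrivals(env, per_resource = TRUE)$resource
```

With the plain shortest-queue policy, the customer could end up waiting at the unavailable resource; the -available variant leaves only "open" as a candidate.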

Minor changes and fixes:

  • Rename -DRCPP_PROTECTED_EVAL (Rcpp >= 0.12.17.4) as -DRCPP_USE_UNWIND_PROTECT (6d27671).
  • Keep compilation quieter with -DBOOST_NO_AUTO_PTR (70328b6).
  • Improve log_ print (7c2e3b1).
  • Add when_activated() convenience function to easily generate arrivals on demand from trajectories (#161 closing #160).
  • Enhance schedule printing (9c66285).
  • Fix generator-manager name clashing (#163).
  • Deprecate set_attribute(global=TRUE), get_attribute(global=TRUE) and timeout_from_attribute(global=TRUE) (#164); the *_global versions should be used instead.

Read the docs before questioning R’s defaults

The latest R tip in Win-Vector Blog encourages you to Use Radix Sort based on a simple benchmark showing a 35x speedup compared to the default method, but with no further explanation. In my opinion, though, the complete tip would be, instead, use radix sort… if you know what you are doing, because a quick benchmark shouldn’t spare you the effort of actually reading the docs. And here is a spoiler: you are already using it.

One may wonder why R’s default sorting algorithm is so bad, and why it was even chosen. The thing is that there is a trick here, and to understand it, first we must understand the benchmark’s data and then read the docs. This is the function from the original code (slightly modified for subsequent reuse) that generates the data:

mk_data <- function(nrow, stringsAsFactors = FALSE) {
  alphabet <- paste("sym", seq_len(max(2, floor(nrow^(1/3)))), sep = "_")
  data.frame(col_a = sample(alphabet, nrow, replace=TRUE),
             col_b = sample(alphabet, nrow, replace=TRUE),
             col_c = sample(alphabet, nrow, replace=TRUE),
             col_x = runif(nrow),
             stringsAsFactors = stringsAsFactors)
}

set.seed(32523)
d <- mk_data(1e+6)

summary(d)
##     col_a              col_b              col_c          
##  Length:1000000     Length:1000000     Length:1000000    
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##      col_x          
##  Min.   :0.0000002  
##  1st Qu.:0.2496717  
##  Median :0.4991010  
##  Mean   :0.4996031  
##  3rd Qu.:0.7494089  
##  Max.   :0.9999999
length(table(d$col_a))
## [1] 99

There are three character columns sampled from 99 symbols (sym_1, sym_2, …, sym_99) and a numeric column sampled from a uniform distribution. The first three columns are thus clearly factors, but they are not treated as such. Let’s see now what help(sort) has to tell us about the sorting method, which by default is method="auto":

The “auto” method selects “radix” for short (less than 2^31 elements) numeric vectors, integer vectors, logical vectors and factors; otherwise, “shell”.

So, as I said in the opening paragraph, you are already using radix sort, except for characters. Let’s see then what happens if we treat such columns as proper factors:

library(microbenchmark)

set.seed(32523)
d <- mk_data(1e+6, stringsAsFactors = TRUE)

timings <- microbenchmark(
  order_default = d[order(d$col_a, d$col_b, d$col_c, d$col_x), , 
                    drop = FALSE],
  order_radix = d[order(d$col_a, d$col_b, d$col_c, d$col_x,
                        method = "radix"), ,
                  drop = FALSE],
  times = 10L)

print(timings)
## Unit: milliseconds
##           expr      min       lq     mean   median       uq      max neval
##  order_default 289.4685 312.0257 388.5259 387.8308 418.2673 584.4771    10
##    order_radix 265.6491 321.8337 421.2072 376.1166 512.0047 667.0545    10
##  cld
##    a
##    a

Unsurprisingly, timings are the same, because R automatically selects "radix" for you when appropriate. But when is it considered appropriate and why isn’t it appropriate in general for character vectors? We should go back to the docs:

The implementation is orders of magnitude faster than shell sort for character vectors, in part thanks to clever use of the internal CHARSXP table.

However, there are some caveats with the radix sort:

  • If x is a character vector, all elements must share the same encoding. Only UTF-8 (including ASCII) and Latin-1 encodings are supported. Collation always follows the “C” locale.
  • Long vectors (with more than 2^32 elements) and complex vectors are not supported yet.

And there it is: R is doing the right thing for you for the general case. So let us round up the tip: enforce method="radix" for character vectors if you know what you are doing. And, please, do read the docs.
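
The collation caveat quoted from the docs is easy to see with a toy vector. The following is just a small illustration in base R (the vector is arbitrary), not part of the original benchmark:

```r
x <- c("b", "a", "C", "B")

# Radix sort always collates in the C locale: uppercase before lowercase
radix_sorted <- sort(x, method = "radix")
print(radix_sorted)

# The default method for character vectors follows the session locale,
# so its result may differ from the radix one
default_sorted <- sort(x)
print(default_sorted)

# For factors, the default already resolves to radix, so both orders agree
f <- factor(x)
same_order <- identical(order(f), order(f, method = "radix"))
print(same_order)
```

So forcing method="radix" on character data is safe only when C-locale ordering is acceptable for the task at hand, for example when the sort is just a grouping step.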