Here’s a new package that brings to R a new API for handling child processes, similar to how Python handles them.

Unlike the already available system() and system2() calls from the base package or the mclapply() function from the parallel package, this new API is aimed at handling long-lived child processes that can be controlled by the parent R process in a programmatic way. Here’s an example:

handle <- spawn_process("/usr/bin/sshpass",
                        c("ssh", "-T", "user@domain.com"))
process_write(handle, "password\n")
process_write(handle, "ls\n")
process_read(handle, "stdout")
#> "bin"   "public_html"   "www-backup"

This of course can be done with system2("ssh", c("user@domain.com", "ls")) as well (at least as long as password-less ssh connectivity is enabled). However, if there is a need to make a number of subsequent calls in response to user input, keeping a single connection open can save some time. Otherwise you need to wait for ssh to establish a new connection each time a new command is to be executed.
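
The same idea can be tried without any remote host. The following is only a minimal sketch: it assumes a Unix-like system where /bin/sh exists, and that process_kill() is available to end the child. The one-second pause is an arbitrary choice to give the shell time to respond.

```r
library(subprocess)

# Start one long-lived shell; /bin/sh is an assumed path on a Unix-like system.
handle <- spawn_process("/bin/sh")

# Send several commands to the same process, one per line.
process_write(handle, "echo first\n")
process_write(handle, "echo second\n")

# Give the shell a moment to run the commands, then collect whatever
# has accumulated on its standard output.
Sys.sleep(1)
output <- process_read(handle, "stdout")

# Shut the child down once it is no longer needed (assumed API call).
process_kill(handle)
```

Both commands are served by the same child process, so the start-up cost is paid only once.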

A perhaps sillier example is working with a local (or remote, for that matter) Spark session. Imagine there is no package dedicated to Spark (which might well be the case with the next new thing that you find under your Christmas tree this year). The simplest approach could be to open a Spark console and keep it alive while sending commands to its standard input and parsing the text output. However naive, this approach can save some prototyping time.

handle <- spawn_process("/usr/bin/spark-shell")
process_write(handle, 'val textFile = sc.textFile("README.md")\n')
process_write(handle, 'textFile.count()\n')
process_read(handle)
[1] "textFile: org.apache.spark.rdd.RDD[String] = README.md MapPartitionsRDD[1] at textFile at <console>:25"
[2] "res0: Long = 126"
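
The text that comes back still has to be parsed. Below is a rough sketch in base R of how that might look. The helper parse_repl_results() is hypothetical (it is not part of the subprocess package), and it assumes every interesting line has the "name: Type = value" shape seen above.

```r
# Hypothetical helper (not part of the subprocess package): split lines of
# the form "name: Type = value" into their three components.
parse_repl_results <- function(lines) {
  pattern <- "^([[:alnum:]_]+): ([^=]+) = (.*)$"
  hits <- regmatches(lines, regexec(pattern, lines))
  # Keep only the lines that matched (full match plus three groups).
  hits <- Filter(function(m) length(m) == 4, hits)
  data.frame(
    name  = vapply(hits, `[`, character(1), 2),
    type  = vapply(hits, `[`, character(1), 3),
    value = vapply(hits, `[`, character(1), 4),
    stringsAsFactors = FALSE
  )
}

# The two lines returned by process_read() in the example above.
output <- c(
  "textFile: org.apache.spark.rdd.RDD[String] = README.md MapPartitionsRDD[1] at textFile at <console>:25",
  "res0: Long = 126"
)

parsed <- parse_repl_results(output)

# Pick out the line count as a number.
line_count <- as.numeric(parsed$value[parsed$name == "res0"])
```

Lines that do not match the pattern (prompts, warnings, log messages) are simply dropped, which is usually what you want when prototyping.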

The new subprocess package is available from my GitHub account and CRAN. All functions can be run on both Linux and Windows, and the few OS-specific details (like signals) are described in the respective manual pages. There is also an introductory vignette.

I should also say that this package has been designed with Python’s subprocess module in mind, which (both package and language) I greatly admire. Its R equivalent is now in version 0.7.4, which is there to indicate that it’s not yet a perfect equivalent. More features (such as a simultaneous wait on stdout and stderr) are still to come.
