Automating R scripts on Linux with cron
By Rich
January 21, 2020
Hieronymus Bosch, “The Visions of Tondal” 1479.
Introduction
`cron` is a task scheduler that comes baked into Linux. The heart of `cron` is the crontab file, which you add tasks to.
To edit the crontab file, type:
crontab -e
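This opens the crontab file in your default editor (set by the `VISUAL` or `EDITOR` environment variable), which on many systems is vi. To save and exit vi, press `Esc`, type `:wq`, then press `Enter`. Intuitive, right? I know.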
Comments in the crontab file start with `#`, and tasks take the form:
# Check out this cool task below!
MIN HOUR DOM MON DOW CMD
Comments and tasks can’t live on the same line.
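To make the field layout concrete, here is a small shell sketch that splits a crontab line the way the format above describes: the first five whitespace-separated fields are the schedule, and everything after them is the command. The function `parse_cron_line` is hypothetical (it is not part of cron), and it ignores other crontab syntax such as environment-variable lines. It treats a line starting with `#` as a comment, mirroring the rule above.

```shell
# parse_cron_line LINE
# Hypothetical helper: print the five schedule fields and the command of one
# crontab line, or "comment" if the line starts with '#'.
parse_cron_line() {
  line=$1
  case $line in
    '#'*)
      echo "comment"
      return 0
      ;;
  esac
  # read splits on whitespace; the last variable (cmd) receives the rest of
  # the line, so a command containing spaces stays intact. read does not
  # expand '*', so wildcard fields are kept as-is.
  printf '%s\n' "$line" | {
    read -r min hour dom mon dow cmd
    printf 'MIN=%s HOUR=%s DOM=%s MON=%s DOW=%s CMD=%s\n' \
      "$min" "$hour" "$dom" "$mon" "$dow" "$cmd"
  }
}

parse_cron_line '# Check out this cool task below!'
parse_cron_line '* * * * * Rscript "my_script.R"'
```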
Allowed values for each field are listed in this table, which I copied from GeeksforGeeks:
Field | Description | Allowed values |
---|---|---|
MIN | Minute | 0-59 |
HOUR | Hour | 0-23 |
DOM | Day of month | 1-31 |
MON | Month | 1-12 |
DOW | Day of week | 0-6 (0 is Sunday) |
CMD | Command | Any command to be executed |
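As a rough illustration of the table, the hypothetical shell function below checks whether a single number is allowed in a given field. It only handles plain integers: real cron fields also accept `*`, ranges, lists, and steps, and many implementations also accept 7 for Sunday, none of which this sketch covers.

```shell
# in_range FIELD VALUE
# Hypothetical helper: succeed if VALUE is a single integer inside the
# allowed range for FIELD, using the ranges from the table above.
in_range() {
  case $1 in
    MIN)  lo=0; hi=59 ;;
    HOUR) lo=0; hi=23 ;;
    DOM)  lo=1; hi=31 ;;
    MON)  lo=1; hi=12 ;;
    DOW)  lo=0; hi=6 ;;
    *)    return 2 ;;  # unknown field name
  esac
  # Reject empty strings and anything that is not all digits.
  case $2 in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$2" -ge "$lo" ] && [ "$2" -le "$hi" ]
}

in_range MIN 59 && echo "59 is a valid minute"
in_range HOUR 24 || echo "24 is not a valid hour"
```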
You can use a `*` in any of the date-time fields to indicate all values. Therefore, `* * * * * CMD` executes `CMD` every minute of every hour of every day of the month of every month, and so on. By contrast, `1 * * * * CMD` runs `CMD` only once per hour, at minute 1.
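The difference between `*` and a number in the minute field is easy to get wrong. The sketch below uses a hypothetical, simplified matcher that understands only those two forms (no ranges, lists, or steps) and counts how many minutes in one hour a given minute field would fire.

```shell
# field_matches FIELD VALUE
# Hypothetical, simplified matcher: '*' matches any value; a plain number
# matches only itself. Ranges, lists, and steps are not handled.
field_matches() {
  [ "$1" = "*" ] || [ "$1" -eq "$2" ]
}

# count_fires FIELD: how many of the minutes 0-59 the minute FIELD matches.
count_fires() {
  n=0
  m=0
  while [ "$m" -le 59 ]; do
    if field_matches "$1" "$m"; then
      n=$((n + 1))
    fi
    m=$((m + 1))
  done
  echo "$n"
}

echo "'*' fires $(count_fires '*') times per hour"
echo "'1' fires $(count_fires 1) time per hour"
```

The first line reports 60 and the second reports 1, which is why an every-minute job needs `*` in the minute field.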
But how do we use this to automate R scripts?
First, the `CMD` is `Rscript` (note the lowercase "s"). Next, we pass `Rscript` the `.R` script we want to run (see the docs).
Let’s pretend we have a script (`my_script.R`) that we want to run once per minute. This script generates 100 random samples from a normal distribution with `mean = 0` and `sd = 1` and writes them to a CSV file called `my_file.csv`:
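library(readr)                    # provides write_csv()
d <- rnorm(100)                   # 100 draws from a normal distribution, mean = 0, sd = 1
write_csv(data.frame(num = d), "my_file.csv")  # relative path: written to the working directory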
Now we locate `Rscript`. In your favorite R development environment, run `R.home()`.
On my Mac it’s:
> R.home()
[1] "/Library/Frameworks/R.framework/Resources"
On the AWS EC2 instance I’m running, it’s:
> R.home()
[1] "/usr/lib/R"
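`Rscript` lives in the `bin` subdirectory of this path (for example, `/usr/lib/R/bin/Rscript` on the EC2), and `R.home("bin")` returns that directory directly. You can navigate there to check, or just believe me.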
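If you would rather find `Rscript` from the shell, a sketch like the following checks your `PATH` first with `command -v`, then falls back to the `bin` subdirectory of the two `R.home()` paths shown above. The function name `find_rscript` is hypothetical; whatever it prints is the absolute path to put in your crontab.

```shell
# find_rscript: print the absolute path of the first Rscript found, or fail.
# Hypothetical helper: checks PATH, then <R.home()>/bin for the two
# installations shown above (Debian/Ubuntu-style and macOS framework).
find_rscript() {
  if p=$(command -v Rscript 2>/dev/null); then
    printf '%s\n' "$p"
    return 0
  fi
  for p in /usr/lib/R/bin/Rscript \
           /Library/Frameworks/R.framework/Resources/bin/Rscript; do
    if [ -x "$p" ]; then
      printf '%s\n' "$p"
      return 0
    fi
  done
  echo "Rscript not found" >&2
  return 1
}

find_rscript || echo "install R, or add its bin directory to PATH"
```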
Putting it all together
Let’s create a crontab entry that runs `my_script.R` once every minute, using `Rscript` to run it. We add the following line to the crontab file we opened with `crontab -e`:
# once every minute, run `my_script.R`
* * * * * Rscript "my_script.R"
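The first line is just a comment, whereas the second line is the task. In practice, you also need to replace both names in the example above with full paths:
- the full path of `Rscript`, because cron runs jobs with a minimal `PATH` that may not include it
- the full path of `my_script.R`, because cron does not run the job from the directory you were in when you edited the crontab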
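A minimal sketch of building such an entry, assuming `Rscript` lives at `/usr/lib/R/bin/Rscript` and the script at `/home/richpauloo/my_script.R` (substitute your own paths). The helper `cron_line` is hypothetical: it refuses relative paths and appends the job's output to a log file (`my_script.log`, also an assumption), which makes failures visible. Because cron typically starts jobs in your home directory, the relative `my_file.csv` written by the R script would most likely land there too.

```shell
# cron_line SCHEDULE RSCRIPT SCRIPT LOG
# Hypothetical helper: print one crontab entry that runs SCRIPT with RSCRIPT
# on SCHEDULE, appending stdout and stderr to LOG. Relative paths are
# rejected. Paths containing spaces are not quoted by this sketch.
cron_line() {
  schedule=$1
  rscript=$2
  script=$3
  log=$4
  for path in "$rscript" "$script" "$log"; do
    case $path in
      /*) ;;
      *)
        echo "not an absolute path: $path" >&2
        return 1
        ;;
    esac
  done
  printf '%s %s %s >> %s 2>&1\n' "$schedule" "$rscript" "$script" "$log"
}

cron_line '* * * * *' /usr/lib/R/bin/Rscript \
  /home/richpauloo/my_script.R /home/richpauloo/my_script.log
```

On most Linux systems you could append the printed line to your existing crontab without opening an editor, for example with `(crontab -l 2>/dev/null; cron_line '* * * * *' /usr/lib/R/bin/Rscript /home/richpauloo/my_script.R /home/richpauloo/my_script.log) | crontab -`, though pasting it in via `crontab -e` works just as well.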
I’ve found that on the AWS EC2 I’m using, `~/my_script` doesn’t work, whereas `/home/richpauloo/my_script.R` does.
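One plausible explanation for the `~` behavior, offered as a guess rather than a diagnosis: cron hands the command to `/bin/sh`, and the shell does not expand `~` inside double quotes, so a quoted path such as `"~/my_script.R"` (the example line quotes its script path) is passed to `Rscript` literally. The sketch below shows the difference; `$HOME` is expanded even inside double quotes, so it is a reasonable alternative if you prefer not to hard-code your home directory.

```shell
# Tilde expansion happens only when '~' is unquoted at the start of a word
# (or right after '=' in an assignment).
quoted="~/my_script.R"        # stays literally "~/my_script.R"
unquoted=~/my_script.R        # becomes "$HOME/my_script.R"
via_home="$HOME/my_script.R"  # $HOME is expanded even inside double quotes

echo "quoted:   $quoted"
echo "unquoted: $unquoted"
echo "via HOME: $via_home"
```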
Here are some resources I found helpful in writing this short summary: