Saturday, 19 May 2007

Running R Programs on clusters

1. Syntax for running R programs in BATCH mode from the command-line

$ R CMD BATCH [options] my_script.R [outfile]
$ nohup nice -n 14 R CMD BATCH myRfile.R &


The output file lists the commands from the script file and their outputs. If no outfile is specified, the name of the input file is used, with a trailing '.R' stripped and '.Rout' appended (so 'my_script.R' produces 'my_script.Rout'). To keep the input commands from being echoed into the outfile, add 'options(echo=FALSE)' as the first line of my_script.R. If the command is run as 'R CMD BATCH --no-save my_script.R', the workspace is not saved to the .RData file, which can often get very large. The second form above runs the job in the background at reduced priority ('nice -n 14') and keeps it alive after logout ('nohup'). More on this can be found in the help pages: '$ R CMD BATCH --help' or '> ?BATCH'.
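
R CMD BATCH names its default log after the script, with a trailing '.R' replaced by '.Rout'. The rough sketch below predicts that name and wraps the background call so the log location is printed right away. The helper names 'default_rout' and 'run_batch' are made up for this example and are not part of R, and the sketch assumes the log should land in the directory the command is started from.

```shell
#!/bin/sh
# Hypothetical helper (not part of R): predict the default log file name.
default_rout() {
    # Drop the directory and a trailing ".R", then append ".Rout".
    printf '%s.Rout\n' "$(basename "$1" .R)"
}

# Hypothetical wrapper: start the script in the background at low priority,
# without saving the workspace, and report where the log is written.
run_batch() {
    out=$(default_rout "$1")
    # The outfile is passed explicitly, so the log name does not depend
    # on the defaults of the installed R version.
    nohup nice -n 14 R CMD BATCH --no-save "$1" "$out" &
    echo "log: $out"
}
```

After sourcing these functions, 'run_batch my_script.R' starts the job and prints the log name, which can then be followed with 'tail -f'.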

2. Submitting an R script to a Linux cluster via Torque

Create the following shell script 'my_script.sh':

##################################
#!/bin/sh
cd "$PBS_O_WORKDIR" || exit 1
R CMD BATCH --no-save my_script.R
##################################


This script doesn't need executable permissions. Use the following 'qsub' command to send it to the Linux cluster from the directory where the R script 'my_script.R' is located. To utilize several CPUs on the cluster, one can divide the input data into several smaller subsets and run a separate process for each subset from its own dedicated directory.

$ qsub my_script.sh
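
The idea of splitting the input and running one job per subset can be sketched in shell as follows. This is only an illustration under several assumptions: the input is a plain text file with one independent record per line and no header, 'my_script.R' reads a file called 'input.txt' from its working directory, and both 'my_script.R' and 'my_script.sh' sit in the current directory. The helper names 'prepare_chunks' and 'submit_chunks' and the 'chunk_NN' directories are hypothetical.

```shell
#!/bin/sh
# prepare_chunks INPUT LINES_PER_CHUNK
# Split INPUT into pieces and give each piece its own working directory
# containing input.txt plus copies of my_script.R and my_script.sh.
prepare_chunks() {
    input=$1
    lines=$2
    # split writes part_aa, part_ab, ... with at most $lines lines each
    split -l "$lines" "$input" part_ || return 1
    n=0
    for part in part_*; do
        n=$((n + 1))
        dir=$(printf 'chunk_%02d' "$n")
        mkdir -p "$dir" || return 1
        mv "$part" "$dir/input.txt"
        cp my_script.R my_script.sh "$dir/"
    done
    # Print the number of chunks that were created.
    echo "$n"
}

# submit_chunks [SUBMIT_COMMAND]
# Submit from inside each chunk directory, so that $PBS_O_WORKDIR in
# my_script.sh points at that chunk. The command defaults to qsub.
submit_chunks() {
    submit=${1:-qsub}
    for dir in chunk_*/; do
        (cd "$dir" && "$submit" my_script.sh)
    done
}
```

With a hypothetical input file 'data.txt', the two steps would be run as:

$ prepare_chunks data.txt 1000
$ submit_chunks

Passing 'echo' instead of the default, as in 'submit_chunks echo', gives a dry run that only prints what would be submitted.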

3 comments:

  1. This comment has been removed by the author.

  2. I realize this post has been up for many years now but I came across it today. I felt compelled to thank you. I am a new Linux user and this was exactly what I needed in order for me to run my analysis. There is nothing else as direct & clear as this out on the web - even today! Thank you!

  3. You're more than welcome! Fortunately tips on programming, particularly the ones related to Unix/Linux environment, age better than other resources!
