Using the new RSPH cluster (September 2020)
All of the following tips are based on using the Terminal on macOS. They should also work on any Linux system. On Windows, you need to install a Unix-like environment, such as Cygwin.
Here is some information provided by RSPH IT:
Read it carefully, because the new system uses a new job scheduler.
If you want to access the cluster from outside the School of Public Health (this includes using a laptop on the Emory Wi-Fi), you need to connect through the Emory VPN.
Login to the RSPH cluster
The address of the RSPH cluster is
The login command is
ssh -X userid@email@example.com
Replace userid with your login ID (your Emory ID).
I usually create an alias by adding the following line to my
alias cluster="ssh -X firstname.lastname@example.org"
Then I can log in to the cluster by typing cluster in the terminal.
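You can also add an entry to ~/.ssh/config. OpenSSH reads this file for both ssh and scp, so the short name works with both commands. Below is a minimal sketch. rsph is a hypothetical nickname and add_ssh_host is a hypothetical helper. <cluster-address> stands for the cluster address above, which you must fill in yourself. ForwardX11 yes is the config-file equivalent of ssh -X.

```shell
# Append a Host entry to ~/.ssh/config unless one with the same nickname exists.
# Usage: add_ssh_host NICKNAME HOSTNAME USER   (all three are yours to choose/fill in)
add_ssh_host() {
  local nick="$1" host="$2" user="$3"
  local cfg="$HOME/.ssh/config"
  mkdir -p "$HOME/.ssh"
  chmod 700 "$HOME/.ssh"
  touch "$cfg"
  chmod 600 "$cfg"   # ssh refuses a config file that others can write to
  # -q quiet, -x whole-line match, -F literal string
  if grep -qxF "Host $nick" "$cfg"; then
    return 0
  fi
  # ForwardX11 yes does the same as the -X flag on the command line
  cat >> "$cfg" <<EOF
Host $nick
    HostName $host
    User $user
    ForwardX11 yes
EOF
}
```

Run add_ssh_host rsph <cluster-address> userid once. After that, ssh rsph logs you in, and scp file.txt rsph:~ copies a file to your home directory on the cluster.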
Password-less logins using SSH
It is annoying to type your password every time you log in or scp to/from the cluster. Fortunately, there is a solution. Follow these steps to set up password-less login.
Create Public/Private Keys. First check whether you have a .ssh folder in your home directory that contains the key files id_rsa and id_rsa.pub. Note that .ssh is a hidden directory and is only listed when you type ls -a. If the key files exist, skip this step. Otherwise, type ssh-keygen -t rsa in the terminal and the files will be generated.
Set up logins. First copy your public key (id_rsa.pub) to the remote host:
scp ~/.ssh/id_rsa.pub email@example.com:~
Now log in to the cluster and cd to the .ssh directory. Append the public key from your computer to the end of your authorized_keys file and set the correct permissions by typing the following commands at the terminal:
cat ../id_rsa.pub >> authorized_keys
chmod 600 authorized_keys
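The cluster-side commands can be wrapped in a small function. It also creates ~/.ssh with the permissions sshd expects, and it does not add the same key twice if you repeat the step. This is a sketch that assumes the key was copied to your cluster home directory as above; install_pubkey is a hypothetical helper name. If your computer has OpenSSH's ssh-copy-id, running ssh-copy-id userid@<cluster-address> on your computer does the copying and appending in one step.

```shell
# Run on the cluster. Adds the key in KEYFILE (default ~/id_rsa.pub) to
# ~/.ssh/authorized_keys unless that exact line is already present,
# then sets the permissions sshd expects.
install_pubkey() {
  local keyfile="${1:-$HOME/id_rsa.pub}"
  local auth="$HOME/.ssh/authorized_keys"
  local key
  [ -s "$keyfile" ] || { echo "no key file: $keyfile" >&2; return 1; }
  key=$(cat "$keyfile")
  mkdir -p "$HOME/.ssh"
  chmod 700 "$HOME/.ssh"   # sshd (StrictModes) rejects keys if ~/.ssh is group/world-writable
  touch "$auth"
  # If the existing file does not end with a newline, add one before appending
  if [ -s "$auth" ] && [ -n "$(tail -c 1 "$auth")" ]; then
    echo >> "$auth"
  fi
  # -q quiet, -x whole-line match, -F literal string
  grep -qxF "$key" "$auth" || echo "$key" >> "$auth"
  chmod 600 "$auth"
}
```

On the cluster, run install_pubkey ~/id_rsa.pub and then rm ~/id_rsa.pub. After that, ssh from your computer should no longer ask for a password.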
Transfer data from the old cluster to the new one
You can use scp to copy files over; scp -r copies a whole directory. If the files are large, you can also submit the transfer as a job. For example, I use the following job script to copy a whole directory:
#!/bin/bash
#SBATCH --partition=day-long-cpu
scp -r firstname.lastname@example.org:SourceDir TargetDir
For this to run successfully, you also need to set up password-less login between the new and old clusters. To do this, add the id_rsa.pub line from the new cluster to the authorized_keys file on the old cluster. Otherwise, scp will prompt for a password and the submitted job cannot run.
This is also a good opportunity to reorganize your files.
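When the transfer runs as a job, a missing key makes scp wait for a password that nobody can type. The standard OpenSSH option BatchMode=yes makes scp fail right away instead. Below is a sketch that writes such a job script to a file so you can inspect it before submitting. write_transfer_job, the job name and the log-file name are hypothetical. SLURM replaces the %j in the log name with the job ID.

```shell
# Write a SLURM job script that copies REMOTE_SRC (user@host:path on the old
# cluster) to DEST. BatchMode=yes makes scp fail at once if a password is needed.
write_transfer_job() {
  local remote_src="$1" dest="$2" out="${3:-transfer.sh}"
  cat > "$out" <<EOF
#!/bin/bash
#SBATCH --job-name=transfer
#SBATCH --partition=day-long-cpu
#SBATCH --output=transfer-%j.log

scp -o BatchMode=yes -r "$remote_src" "$dest"
EOF
  chmod +x "$out"
}

# Example (the host part is a placeholder):
#   ssh -o BatchMode=yes userid@OLD_CLUSTER_ADDRESS true   # succeeds only if keys work
#   write_transfer_job userid@OLD_CLUSTER_ADDRESS:SourceDir TargetDir
#   sbatch transfer.sh
```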
The job scheduler on the new cluster
The new cluster uses SLURM as its job scheduler, instead of the Sun Grid Engine (SGE) used on the old clusters. A few basic SGE commands and their corresponding SLURM commands are:
srun --pty bash
For a more comprehensive list, please see this SGE to SLURM conversion page.
Environment for bioinformatics group members
We have created a group, compbio, for all members of the bioinformatics group. If you belong to this group, you have access to some shared data and software.
Use groups userid to check your group membership. For example, I can see my group memberships:
[hwu30@hpc4 ~]$ groups hwu30
hwu30 : hpcusers compbio
So I belong to two groups: hpcusers and compbio.
By default, all users in the compbio group should be able to see each other's files (they have read permission, but not write permission).
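Both the membership check and the permissions can be scripted. The sketch below uses two hypothetical helpers. in_group tests whether you (or another user) belong to a group, using id -nG, which prints the same group names as groups. share_with_group uses chmod g+rX,g-w. That gives the group read access and directory traversal, because X adds execute only to directories and files that are already executable. It also removes group write access.

```shell
# Return 0 if USER (default: yourself) belongs to GROUP.
in_group() {
  local group="$1" user="${2:-$(id -un)}"
  # one group name per line, then look for an exact whole-line match
  id -nG "$user" | tr ' ' '\n' | grep -qxF "$group"
}

# Make DIR and everything below it readable, but not writable, by your group.
share_with_group() {
  chmod -R g+rX,g-w "$1"
}

# Example:
#   in_group compbio && share_with_group /projects/compbio/users/$(id -un)
```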
We currently have around 150 TB of storage under the /projects mount. Useful commands for checking disk usage are:
df -h: report disk space usage
du -h --max-depth=1: report file space usage
Disk space is not as limited as before, but all members still need to manage their disk usage carefully.
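You can combine these commands to find which of your directories take up the most space. Below is a sketch. largest_dirs is a hypothetical helper, and -d 1 is the short form of --max-depth=1. sort -hr sorts human-readable sizes such as 2.1G and 350M, with the largest first.

```shell
# List the N (default 10) largest subdirectories directly under DIR, largest first.
# Output lines are "SIZE<TAB>PATH"; DIR itself also appears, as the total.
largest_dirs() {
  local dir="${1:-.}" n="${2:-10}"
  du -h -d 1 "$dir" 2>/dev/null | sort -hr | head -n "$n"
}

# Example:
#   largest_dirs /projects/compbio/users/$(id -un) 5
```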
Setup your working directories
- All group users should set up their own working directory under /projects/compbio/users. For example, my directory is
- You can create a symbolic link to it by running the following command in your home directory:
ln -s /projects/compbio/users/hwu30 projects
This creates a symbolic link named projects in my home directory, which in fact points to
- Try to be organized in managing your projects. Create directories and sub-directories.
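Creating the directory and the link can be done in one step. The sketch below assumes your directory under /projects/compbio/users is named after your login, as hwu30 is here. setup_workdir and its optional base-path argument are hypothetical. Using ln -sfn makes the function safe to run twice. Without -n, a second ln -s would follow the existing projects link and create a new link inside the target directory.

```shell
# Create BASE/<your login> (default BASE: /projects/compbio/users)
# and point ~/projects at it.
setup_workdir() {
  local base="${1:-/projects/compbio/users}"
  local target="$base/$(id -un)"
  mkdir -p "$target"
  # -s symbolic, -f replace an existing link, -n do not descend into an existing link
  ln -sfn "$target" "$HOME/projects"
}

# Example (on the cluster, once):
#   setup_workdir
#   mkdir -p ~/projects/my_project/{data,code,results}   # hypothetical layout
```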
Shared software, libraries and data are mostly located under /projects/compbio. In particular:
/projects/compbio/bin has a number of frequently used binary tools for genetic/genomic data analysis. You can add the following line to your .bash_profile (in your home directory) so that the software installed here can be accessed from anywhere.
/projects/compbio/data has useful shared data, including index files for alignment, reference genomes, etc.
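For the .bash_profile line, the usual approach is to put the directory in front of PATH. The sketch below assumes that form, export PATH="/projects/compbio/bin:$PATH". add_to_path_profile is a hypothetical helper that appends the line only if it is not already there, so running it twice does no harm.

```shell
# Append an export line that prepends DIR (default /projects/compbio/bin)
# to PATH in ~/.bash_profile, unless the same line is already there.
add_to_path_profile() {
  local dir="${1:-/projects/compbio/bin}"
  local line="export PATH=\"$dir:\$PATH\""   # $PATH stays literal in the file
  local profile="$HOME/.bash_profile"
  touch "$profile"
  grep -qxF "$line" "$profile" || echo "$line" >> "$profile"
}
```

After running it, open a new terminal or run source ~/.bash_profile to pick up the change.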
As of September 2020, the latest R (version 4.0) and Bioconductor (version 3.11) are installed.
Note that R cannot be run on the head node directly. You must first run module load R to load the R module, and then run R. Read the HPC Getting Started Guide for details.
The R library directory for the group is /projects/compbio/Rlib. You need to set up your R library path by adding the following line to your .Rprofile file. Note: .Rprofile is a hidden text file in your home directory. If you don't have one, just create it:
.libPaths( c("/projects/compbio/Rlib", .libPaths()) )
After this, run .libPaths() in R to make sure you have the correct path.
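You can also add the .Rprofile line from the shell, making sure it is written only once. Below is a sketch. add_rlib_profile is a hypothetical helper. It writes the absolute path /projects/compbio/Rlib with a leading slash. Without the slash, R would look for projects/compbio/Rlib relative to whatever directory it was started in.

```shell
# Append the group library path to ~/.Rprofile unless the line is already there.
add_rlib_profile() {
  local lib="${1:-/projects/compbio/Rlib}"
  local line=".libPaths( c(\"$lib\", .libPaths()) )"
  local profile="$HOME/.Rprofile"
  touch "$profile"
  grep -qxF "$line" "$profile" || echo "$line" >> "$profile"
}
```

Then, after module load R, Rscript -e '.libPaths()' should list the group library first.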
To submit an R job to the scheduler, you need to create a .sh file containing the commands to run. The description in the guide is not accurate. The shell script should look like this (assuming you want to submit run.R):
#!/bin/bash
#SBATCH --job-name=run.R
#SBATCH --partition=day-long-cpu
module purge
module load R
srun R CMD BATCH --no-save run.R
Assume the script is called runR.sh. Type sbatch runR.sh to submit the job. You can use squeue to view the job status.
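If you submit many R scripts, you can generate the job script for each file. Below is a sketch. write_r_job is a hypothetical helper. The --output line is an addition to the script above: SLURM replaces %x with the job name and %j with the job ID. sbatch --parsable prints only the job ID (plus the cluster name, if any), which you can then pass to squeue -j.

```shell
# Write a SLURM script that runs RFILE with R CMD BATCH.
# R CMD BATCH writes R's own output to RFILE with the .Rout extension.
write_r_job() {
  local rfile="$1" out="${2:-runR.sh}"
  cat > "$out" <<EOF
#!/bin/bash
#SBATCH --job-name=$rfile
#SBATCH --partition=day-long-cpu
#SBATCH --output=%x-%j.log

module purge
module load R
srun R CMD BATCH --no-save "$rfile"
EOF
}

# Example on the cluster:
#   write_r_job run.R runR.sh
#   jobid=$(sbatch --parsable runR.sh)
#   squeue -j "$jobid"          # or: squeue -u $(id -un)
```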