Thomas J Hoffmann, Department of Biostatistics, Harvard School of Public Health

statgen* Usage

Nothing is ever backed up on these machines. If a hard drive crashes, your data is permanently gone. These are processing machines, not storage machines. Update: statgen1 is now configured with RAID, so one hard drive is redundant there, but only on statgen1.

Some of the same principles here apply to cluster usage, but for that you should check out the cluster site.

Hardware Summary

Logging on to the statgen Machines

Use any ssh client. On *nix, this is just

    ssh user@statgen2.sph.harvard.edu

or, to do it with X,

    ssh user@statgen2.sph.harvard.edu -Y

To get files back and forth on Linux, just type

    sftp://user@statgen2.sph.harvard.edu

into your file browser. For Windows, see my preferred client and some suggestions if you want X.

Disk space

Quotas aren't strictly enforced; just be reasonable and realize other people use the machines. To see your usage, type

    du -h
    
or, for a tree-like summary (suggested),
    du --max-depth=1 -h
    
where the number sets how many directory levels are shown. To put your usage in perspective, see how much space the machine actually has with
    df -h
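
To find out where your space is going, the `du` output can be sorted by size. This is a minimal sketch, assuming GNU `du`, `sort`, and `head` as found on a typical Linux install; `largest_dirs` is a hypothetical helper name, not something provided on the statgen machines.

```shell
# Hypothetical helper: list the n largest entries one level below a directory.
# Sizes are printed in kilobytes (du -k) so a plain numeric sort works.
# The first line is the total for the directory itself.
largest_dirs() {
    dir=${1:-$HOME}   # directory to inspect, defaults to your home directory
    n=${2:-10}        # number of lines to show, defaults to 10
    du -k --max-depth=1 "$dir" 2>/dev/null | sort -rn | head -n "$n"
}

# Usage:
#   largest_dirs            # ten largest entries under ~
#   largest_dirs ~/thesis 5 # five largest entries under ~/thesis
```

Only directories are listed, because `du` without `-a` does not report individual files.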
    

Making Personal Backups

Let's go through a command line way of doing this. Suppose my directory structure is something like the following (note that '~' refers to your home directory, i.e. where you are when you log in, and the base directory where all your files go).
    ~/notes.txt
    ~/thesis/thesis.tex
    ~/thesis/thesis.bib
    ...
    
Suppose we want to create a zip file (Windows user-friendly). Enter the following command from the directory `~/`; it can be run from anywhere, but the paths are relative to the current directory.
    zip -r myZipfile.zip notes.txt thesis
    
This compresses that file and the entire directory. You can keep adding more files to the archive. Then copy it to your own machine and burn a CD for a semi-permanent backup (CDs last a while, but not forever).
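
If you make backups regularly, it helps to put the date in the archive name and to list the archive afterwards to check that everything made it in. The following is only a sketch: `backup_name` and `make_backup` are hypothetical helper names, and it assumes `zip` and `unzip` are installed.

```shell
# Hypothetical helper: build a dated archive name of the form backup-YYYY-MM-DD.zip
backup_name() {
    printf 'backup-%s.zip\n' "$(date +%Y-%m-%d)"
}

# Hypothetical helper: zip the given files and directories (paths relative to
# your home directory) into a dated archive, then list its contents.
# The subshell keeps the caller's current directory unchanged.
make_backup() {
    archive=$(backup_name)
    (cd "$HOME" && zip -r "$archive" "$@" && unzip -l "$archive")
}

# Usage:
#   make_backup notes.txt thesis
```

The archive can then be fetched from your own machine with `scp` or with the sftp address given in the login section.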

Processor Usage / users on a machine

i.e. which statgen should I use? Note that you can run (number of cores) x (number of processors) jobs simultaneously without them affecting each other's performance, provided that they aren't doing lots of disk access or using lots of memory. If they need lots of disk access or memory, you should run only one job at a time; otherwise they will fight each other, and the whole thing will take even longer than if you had run them sequentially. If they fight too much, generally one process will be killed.

To see what is currently running on a machine, and who is running it, type

    top
    
Press 'q' to 'q'uit, space to refresh.
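
For a quick, non-interactive answer to "is there room for my job?", the core count can be compared with the current load. This is a rough sketch, assuming a Linux machine with GNU coreutils (`nproc`) and `/proc/loadavg`; `free_slots` is a hypothetical helper name, and the load average is only an approximation of how many processors are busy.

```shell
# Hypothetical helper: estimate how many more CPU-bound jobs fit on this machine.
free_slots() {
    cores=$(nproc)                          # processing units available
    load=$(cut -d ' ' -f 1 /proc/loadavg)   # 1-minute load average
    # Subtract the load (rounded to the nearest integer) from the core count,
    # never reporting less than zero.
    awk -v c="$cores" -v l="$load" \
        'BEGIN { f = c - int(l + 0.5); if (f < 0) f = 0; print f }'
}

# Usage:
#   free_slots   # estimated number of idle cores
#   who          # list the users currently logged in
```

This says nothing about memory or disk activity, so for heavy jobs it is still worth looking at `top` first.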

Batching Jobs

To make things run when you are not logged into the system, try

    batch [return]
    [What you want to run, perhaps 'R CMD BATCH statgen.R', without the quotes.]
    [CTRL+D]
    
This will make things run when a processor isn't being used. So for instance, if two processors were free and you 'batched' 3 jobs, 2 would run, and after one of those completed, your third one would run.

Or, if you want to be more aggressive and start the job right away,

    at now [return]
    [What you want to run, perhaps 'R CMD BATCH statgen.R', without the quotes.]
    [CTRL+D]
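
The same submission can be done without typing at the prompt, since `batch` reads its commands from standard input. The sketch below is illustrative only: `queue_job` and the `DRY_RUN` variable are hypothetical names introduced here, and it assumes the `at` package (which provides `batch`, `atq`, and `atrm`) is installed.

```shell
# Hypothetical helper: queue one command line with batch.
# With DRY_RUN=1 it only prints what would be queued, which is handy for
# checking a loop before submitting many jobs.
queue_job() {
    cmd=$1
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf 'would queue: %s\n' "$cmd"
    else
        printf '%s\n' "$cmd" | batch
    fi
}

# Usage:
#   queue_job 'R CMD BATCH statgen.R'
#   atq       # list your jobs that are still waiting to start
#   atrm 12   # remove a waiting job, using its number from the atq output
```

Jobs start in the directory they were submitted from, so relative paths such as `statgen.R` resolve as they would at the prompt.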
    

R - Installing libraries

To install libraries for personal use, use the example for pbatR, substituting in your package of choice.

More unix information

Contact Information
Thomas Hoffmann
655 Huntington Ave.
Department of Biostatistics
Building 2, 4th Floor
Boston, MA 02115