statgen* Usage
Nothing is ever backed up on these machines. If a hard drive crashes, your data is permanently gone. These are processing machines, not storage. Update: statgen1 is now configured with RAID, so it has one redundant hard drive, but this applies only to statgen1.
Some of the same principles here apply to cluster usage, but for that you should check out the cluster site.
Hardware Summary
- statgen1.sph.harvard.edu - 8 GB memory, 500 GB HD (RAID), 2 x 2-core AMD Opteron 2.8 GHz, 64-bit Fedora Core 7.
- statgen2.sph.harvard.edu - 4 GB memory, 250 GB HD, 2 x AMD Opteron 2.2 GHz, 32-bit Ubuntu Feisty.
- statgen3.sph.harvard.edu - 2 GB memory, 120 GB HD, 2 x AMD Opteron 2.2 GHz, 64-bit Fedora Core.
Logging on to the statgen Machines
Use an ssh client. In *nix, this is just
ssh user@statgen2.sph.harvard.edu
or, to do it with X,
ssh user@statgen2.sph.harvard.edu -Y
Getting files back and forth is just typing
sftp://user@statgen2.sph.harvard.edu
into your file browser if you happen to use Linux; for Windows, see my preferred client and some suggestions if you want X.
Disk space
Quotas aren't strictly enforced; just be reasonable and realize other people use the machines. To see your usage, type
du -h
or, as suggested for a tree-like view,
du --max-depth=1 -h
where you vary the number to change the tree depth. To put things in perspective, see how much space is actually on the machine by typing
df -h
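When a directory tree is large, it can help to sort the `du` output so the biggest subdirectories stand out. A minimal sketch, assuming GNU `du` (for `--max-depth`); the function name `biggest_dirs` is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: list the immediate subdirectories of a directory
# (default: the current directory), sorted by size in KB, largest last.
biggest_dirs() {
    dir="${1:-.}"
    # -k reports sizes in kilobytes so that sort -n can order them numerically
    du -k --max-depth=1 "$dir" | sort -n
}
```

For example, `biggest_dirs ~` lists each top-level directory in your home directory; the total for the directory itself normally appears on the last line.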
Making Personal Backups
Let's go through a command-line way of doing this. Suppose my directory structure looks something like the following. (Note that '~' refers to your home directory, i.e. where you are when you log in, and the base directory where all your files go.)
~/notes.txt
~/thesis/thesis.tex
~/thesis/thesis.bib
...
Suppose we want to create a zip file (friendly to Windows users). Enter the following command from the directory `~/`; it can be run from anywhere, but the paths are taken relative to the current directory.
zip -r myZipfile.zip notes.txt thesis
This compresses that file and the entire directory. You can keep adding more files to the archive. Then copy it to your own machine and burn a CD for a semi-permanent backup (CDs last a while, but not forever).
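The zip step can be wrapped so that each backup gets a dated name. This is only a sketch: the function name `make_backup` and the `backup-YYYY-MM-DD.zip` naming scheme are assumptions for illustration, not something set up on the statgen machines.

```shell
#!/bin/sh
# Hypothetical helper: zip the given files/directories (relative to the
# current directory) into a dated archive and print the archive's name.
make_backup() {
    archive="backup-$(date +%Y-%m-%d).zip"
    # -r recurses into directories, -q keeps zip quiet
    zip -r -q "$archive" "$@" && echo "$archive"
}

# Example, run from ~/ on a statgen machine:
#   make_backup notes.txt thesis
# Then, from your own machine, fetch the archive with scp
# (the date in the file name is just an example):
#   scp user@statgen2.sph.harvard.edu:backup-2008-01-15.zip .
```

Running `unzip -l` on the resulting archive lists its contents, which is a quick way to check that everything you meant to back up is actually in there.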
Processor Usage / users on a machine
In other words, which statgen should I use? Note that you can run (number of cores) x (number of processors) jobs simultaneously without them affecting each other's performance, provided that they aren't doing lots of disk access or using lots of memory. If they need heavy disk access or lots of memory, you should run only one job at a time, or they will fight each other, and it will take even longer than if you had run them sequentially. If they fight too much, generally one process will be killed. To see what is currently running on a machine, type
top
Press 'q' to 'q'uit, space to refresh.
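To turn the rule of thumb above into a number, you can compare the processor count with the current load average. A rough sketch, assuming a Linux machine with `/proc/cpuinfo` and `/proc/loadavg`; the function name `free_slots` is made up here, and the result is only a hint, since it says nothing about memory or disk contention:

```shell
#!/bin/sh
# Hypothetical helper: given a core count and a 1-minute load average,
# print how many more CPU-bound jobs could run without sharing a core.
free_slots() {
    awk -v cores="$1" -v load="$2" 'BEGIN {
        free = cores - int(load + 0.5)   # round the load to the nearest whole job
        if (free < 0) free = 0
        print free
    }'
}

# On Linux, the inputs can be read like this:
#   cores=$(grep -c '^processor' /proc/cpuinfo)
#   load=$(cut -d' ' -f1 /proc/loadavg)
#   free_slots "$cores" "$load"
```

With 4 cores and a load of about 1, this suggests room for 3 more jobs; with a load above the core count it reports 0, which is a sign to try another statgen machine.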
Batching Jobs
To make things run when you are not logged into the system, try
batch [return]
[What you want to run, perhaps 'R CMD BATCH statgen.R', without the quotes.]
[CTRL+D]
This will make things run when a processor isn't being used. So for instance, on a two-processor machine, if you 'batched' 3 jobs, 2 would run, and after one of those completed, your third one would run.
Or, if you want to be more aggressive and start the job immediately,
at now [return]
[What you want to run, perhaps 'R CMD BATCH statgen.R', without the quotes.]
[CTRL+D]
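Both `batch` and `at` read the job from standard input, so jobs can also be queued from a script instead of being typed interactively. A minimal sketch, assuming the R scripts sit in the current directory and have no spaces in their names; the function name `queue_r_jobs`, the `SUBMIT` variable, and the script names in the example are all made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: queue one 'R CMD BATCH' job per R script given.
# SUBMIT defaults to 'batch'; set SUBMIT=cat to print the job lines
# instead of queueing them (a dry run), or SUBMIT='at now' to start
# them immediately.
queue_r_jobs() {
    for script in "$@"; do
        echo "R CMD BATCH $script" | ${SUBMIT:-batch}
    done
}

# Example:
#   queue_r_jobs sim1.R sim2.R sim3.R
#   atq          # list your queued jobs
#   atrm 12      # remove job number 12 (use a number reported by atq)
```

Jobs queued this way run in the directory you submitted them from, so the `.Rout` files should end up next to the scripts.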
R - Installing libraries
To install libraries for personal use, use the example for pbatR, substituting in your package of choice.