P2BAT 2.1.1
P2BAT provides a massively parallel implementation of Christoph Lange's PBAT software with a user-friendly interface in R. P2BAT is composed of the R package pbatR (which supplies the version number) and PBAT 3.6. P2BAT provides both a graphical user interface (see screenshots) and a standard R command-line interface. Alternatively, you can run PBAT directly from the shell or in batch mode, but P2BAT automates the parallel mode and provides a more user-friendly interface.
To get an idea of what it does and whether you might like it, take a look at the documentation (pdf) that comes with the R package, specifically the function pbat if you are interested in the GUI and pbat.m if you are more interested in the command-line interface. If you are using the command-line interface, you may want to look at pbat.set and pbat.setmode before running the software (these are the first and last options in the GUI). In the pbatR package, the GUI enforces consistency, and the function pbat.m runs some consistency checks before communicating with PBAT.
Note: If you attended the talk I gave on 01/25/2006, you can get a copy of the slides here, though most of that talk was a demonstration, so the slides may not be very useful, and they are quite dated now. The installation instructions on this page are also more comprehensive. Lastly, the package has been modified to incorporate several of your suggestions, including options to avoid loading the pedigree and phenotype files into R and an option to turn off reading in the output. This was for those who rightly pointed out how large these files can be. I appreciate your continued suggestions.
P2BAT Installation instructions
P2BAT will run on windows, linux, and now Mac OS X (via Darwine). The other prerequisite software is described in this installation guide:
FBAT short course note: Steps 2-4 (X11, (Dar)wine, and R) are the slightly time-consuming downloads and installations, so it would be good to try them beforehand. If your computer does not have an ethernet card, you should also try to complete step 5 (installing the pbatR package) beforehand if possible.
- PBAT: First, download and uncompress the pbat archive to a location on your hard drive that has no spaces anywhere in its path. This may require, e.g., renaming 'Software and Datasets' to 'pbat' and not putting it in the 'My Documents' folder. Make a note of this location. For windows and 64-bit linux, download the respective versions. For Mac OS X and 32-bit linux, download the windows version.
- (Mac only): X11. Hopefully this link works for you; otherwise you may need to install X11 from a CD that came with your computer. There is also an update available that you might be able to use instead. X11 was already installed on the computer I am using, so hopefully this step is straightforward.
- (Mac/linux 32-bit only) (Dar)wine:
- For Mac OS X, you need to install Darwine (use the next link to download it). As described in its installation instructions, mount and open the downloaded image, then drag and drop Darwine into the Applications folder. It can be placed elsewhere if you lack sudo privileges. NOTE: Darwine 0.9.27 has been reported to have a problem with pbat on an Intel machine; it is no longer available on sourceforge, and we suspect it was just a bad build. Darwine 0.9.21 and 0.9.12 have worked on an Intel machine that I tested, and are recommended. You can view all sourceforge Darwine releases here. Update: more (possibly unofficial) newer builds of wine have all worked on an Intel machine so far, except that from this site 0.9.46 is currently a bad build (it does not function), and 0.9.44 gives a timezone warning.
- For Linux, you probably have a version of wine that came with your distro that you can use.
- R: Next you need to download and install R.
- Installing the pbatR R package: Start R. On both windows and Mac, you can use the menu to reach the package installer. In windows this is 'Packages > Install from CRAN > pbatR'; you will be prompted to choose a mirror along the way. On a Mac this is 'Packages & Data > Package Installer': press 'Get List', choose a mirror, choose 'pbatR', and press 'Install Selected'. For linux, type the command install.packages(), choose a repository, and choose pbatR, or just type install.packages("pbatR"). In recent versions of R on linux, this will just work; see below if you run into issues.
- Configuring and running pbatR:
- Load the library by typing library(pbatR) (this must be done every time you start R).
- Via the graphical interface:
- Type pbat() and a graphical interface should show up.
- Press 'Pbat exe...' and navigate to where you uncompressed pbat earlier. Choose 'pbat35.exe' on windows, Mac, and 32-bit linux, or 'pbat35' on 64-bit linux.
- (Mac/linux 32-bit) Press the 'Wine exe...' button (it does not appear on windows). If you installed Darwine to the default place on a Mac, this will be /Applications/Darwine/Wine.bundle/Contents/bin/wine (not WineHelper). For linux, this is typically /usr/bin/wine; if wine is in your path, you can also find it from the command line with which wine.
Note: the first time you run wine, it will take a little longer to start, as it is setting itself up, and you will see lots of messages about font metrics.
Aside: you can test whether this should work from the command prompt by entering the path to the wine executable followed by the path to pbat, e.g., depending on your locations, /Applications/Darwine/Wine.bundle/Contents/bin/wine /Applications/pbat/pbat35.exe.
- All set! This information will persist for a given user. Now you can set the options for analysis.
- Semi-graphical interface:
- Type library(tcltk), and then enter the information described above when prompted by the command pbat.set() and, for Mac/32-bit linux, also pbat.setwine().
- Command line:
- Use the commands described under the semi-graphical interface, passing the full path to the file as the first argument, for example, pbat.set("/home/tom/pbat/pbat35")
- Mac note: if you get a tcltk error, you may need to start X11 before loading the pbatR library.
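The wine-plus-pbat test from the Aside above can be scripted. The following is a minimal sketch in POSIX sh, assuming the Windows build is recognized by its .exe suffix (as with 'pbat35.exe'); the function name pbat_command is hypothetical, and it only prints the command line rather than running it, so you can check it before pasting it into the shell.

```shell
#!/bin/sh
# Hypothetical helper: print the command line that runs PBAT.
# Windows builds (names ending in .exe) are prefixed with wine, as needed
# on Mac OS X and 32-bit linux; native builds are printed unchanged.
pbat_command() {
  pbat="$1"
  # Second argument: full path to wine; otherwise look it up on the PATH.
  wine="${2:-$(command -v wine)}"
  case "$pbat" in
    *.exe)
      if [ -z "$wine" ]; then
        echo "wine not found; pass its full path as the second argument" >&2
        return 1
      fi
      printf '%s %s\n' "$wine" "$pbat"
      ;;
    *)
      printf '%s\n' "$pbat"
      ;;
  esac
}

# Example (adjust both paths to your own install locations):
# pbat_command /Applications/pbat/pbat35.exe \
#   /Applications/Darwine/Wine.bundle/Contents/bin/wine
```

Printing instead of executing keeps the helper harmless to try; since neither path may contain spaces, the printed line can be pasted into the shell as-is.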
If you had issues installing the pbatR package because you have an older version of R and lack superuser privileges, there are a few methods to try. Sometimes one of them does not work, so I list them all.
- Method 1 (linux only)
- Create the directory '~/R_LIBS' with the command mkdir ~/R_LIBS (or name the directory whatever suits you; `~' refers to your home directory, e.g. /home/tom for me).
- In the bash shell (the default on many systems), edit the file ~/.bashrc and add the line export R_LIBS=~/R_LIBS . For this to take effect, close and reopen the terminal window if you are logged in locally, or log out and back in if remote. Alternatively, you can just enter the line above at the command line each time (or only this time).
- Method 2 (linux only?)
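- Do the first step of method 1, and put the setting from the second step in the file ~/.Renviron instead, written as R_LIBS=~/R_LIBS (that file takes plain name=value lines, so leave out the export).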
- Method 3 (always works, cumbersome)
- Do the first step of method 1.
- When you install the R package, type install.packages("pbatR",lib="~/R_LIBS")
- Load the package instead with library( pbatR, lib.loc="~/R_LIBS")
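Methods 1 and 2 can be combined into one small shell function. This is a sketch assuming a bash login shell and the directory name ~/R_LIBS used above; the names setup_r_libs and add_line_once are hypothetical. It is safe to run more than once, since it only appends a line that is not already present.

```shell
#!/bin/sh
# Hypothetical helper for Methods 1 and 2: create ~/R_LIBS and tell
# both bash and R about it. Re-running it does not duplicate lines.
setup_r_libs() {
  mkdir -p "$HOME/R_LIBS"

  # Method 1: bash reads ~/.bashrc when a new interactive shell starts.
  add_line_once 'export R_LIBS=~/R_LIBS' "$HOME/.bashrc"

  # Method 2: ~/.Renviron is read by R itself at startup; it takes
  # plain name=value lines, so no "export" here.
  add_line_once 'R_LIBS=~/R_LIBS' "$HOME/.Renviron"
}

# Append line $1 to file $2 unless an identical line is already there.
add_line_once() {
  grep -qxF -- "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}
```

The lines are single-quoted so that `~` is written literally and expanded later by bash or R, rather than when the function runs.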
Aside/Warning: Something strange can happen with R: if you don't have the most recent version of R, the most recent version of pbatR sometimes does not show up (in case you are getting a message that it isn't up to date). Also, it can take a few days after I submit a package for it to propagate through CRAN.
If you use the command-line interface heavily, you might want to look at R syntax highlighting options, or, if you go that route, search for how to get ESS working in XEmacs.
News / Changes (See documentation for more details)
Help
For help/information in setting it up, enter the command pbat.help() in R after loading the library [e.g. after library(pbatR)].
Known "Features"
Below are the known (but only very minor) bugs; please realize that some of them are not fixable on my end.
- You cannot have a space in the filename, or anywhere in the path, of the pedigree or phenotype file (e.g., even `C:\Documents and Settings\My Documents\ped.ped' would be a bad choice).
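Since PBAT cannot cope with spaces in these paths (the same applies to where pbat itself is installed), it can be worth checking the full paths before starting a long analysis. A minimal sketch in sh; check_no_spaces is a hypothetical name, and it only inspects the path strings you pass it, so pass full paths rather than relative ones.

```shell
#!/bin/sh
# Hypothetical helper: report every argument whose path contains a space,
# and return non-zero if any did. Pass full paths, since a relative path
# can hide a space in the current directory's name.
check_no_spaces() {
  status=0
  for path in "$@"; do
    case "$path" in
      *" "*)
        echo "contains a space (not supported by PBAT): $path" >&2
        status=1
        ;;
    esac
  done
  return "$status"
}

# Example:
# check_no_spaces "$PWD/data.ped" "$PWD/data.phe" /home/tom/pbat/pbat35
```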
FAQ
Cluster Help / Advice
The following is some advice on how I would set this up on a cluster; some of it is specific to the cluster platform available to me here (LSF). There are a couple of ways of running it, each with certain requirements. Suppose first that you have installed pbat and pbatR on the cluster (installing is generally just a matter of running something on the cluster). If pbatR isn't installed in R for your user, you could run a script like this one (assuming you have done the linux setup described above):

    ## installPackageUser.R
    ## BTW: I am a comment (preceded by '#')
    ## With R_LIBS set as above, this installs into your personal library.
    ## There is no mirror prompt in batch mode, so a CRAN mirror is named here.
    install.packages( "pbatR", repos="https://cloud.r-project.org" )
Cluster help / advice - an example
Now, let's go through a sample analysis.
- When you want to use the GUI: the master node must support X connections (or perhaps you can log in to a node with X). If you are looking for a free X server, Xming may do the job. A command-line-only ssh client will not suffice for the GUI, but will for the other options. You set up the cluster options here:

If you set clusterRefresh=0, the GUI run is the equivalent of 'process.R' below: you can go ahead and close the GUI, and later use some of the command-line functions to paste the results back together. Otherwise, it will poll every 'clusterRefresh' seconds to check whether everything is done, and it should behave just as if you were running it locally.
- When you want to use the command line: I'll lead you through an example of how I would do this. First I would create an R command file to get the data processed through PBAT, for example

    ## process.R
    ## get the data processed
    ## This script is long because it is full of comments.
    ## You could get this done in a few lines.

    ## load the pbatR library
    ## Consistent with the above installation (Method 3).
    library( pbatR, lib.loc="~/R_LIBS" )
    ## If your administrator installed this package, then you
    ## could just do
    ## library( pbatR )

    ## Set up the executable (preserved for a user)
    pbat.set( "~/bin/pbat" )  ## where is pbat installed?

    ## Set up the mode. You could alternatively do this in a separate file,
    ## as the mode you select is preserved after closing pbatR, but
    ## for example purposes:
    pbat.setmode( mode="cluster",         ## set it for cluster mode
                  jobs=32,                ## split it into 32 jobs (run on 32 nodes)
                  clusterCommand="bsub",  ## see after this
                  clusterRefresh=0 )      ## see after this
    ## this will be saved for your next analysis as well

    ## 'clusterCommand' is cluster platform dependent! This is for LSF.
    ##  - LSF: you'll know it is done when you get 'jobs' e-mails.
    ##    To prevent this (if you are splitting it into lots of jobs),
    ##    try 'bsub -o junk.txt', which will funnel all the
    ##    cluster output into junk.txt
    ##  - 'qsub': Someone from Hopkins suggested
    ##      qsub -cwd -b y sh
    ## 'clusterRefresh' - see pbat.setmode(); this is how I would set
    ## it up (except when I'm testing/debugging this package).

    ## load in the data
    ped <- read.ped( "data" )  ## data.ped is in the current working directory
    phe <- read.phe( "data" )  ## data.phe is in the current working directory

    ## run the model
    ## This command will split itself into 'jobs' jobs, submit them,
    ## and then complete.
    ## Thus this script itself should run almost instantaneously (reasons to
    ## be discussed), but it will submit 'jobs' jobs that will
    ## take a while to run.
    ## We will put the output together from these jobs later, after they
    ## are all done.
    res <- pbat.m( time & censor ~ NONE, ped=ped, phe=phe )  ## see pbat.m

More on how to submit this later.
Now suppose I had gone for a break and wanted to check on the status of everything. I could create an R file with the following:

    ## areWeThereYetTom.R
    ## The workspace of the above is loaded in (R CMD BATCH restores
    ## .RData from the working directory by default), so those objects
    ## still exist!!!
    ## but we need to reload the package
    library( pbatR, lib.loc="~/R_LIBS" )
    ## is it finished?
    is.finished( res )
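If you would rather not check by hand, the same check can be repeated from the shell. The sketch below assumes the script is run with R CMD BATCH, which writes its transcript to areWeThereYetTom.Rout, and that is.finished() prints a single logical, so a finished run shows the line [1] TRUE; the helper names rout_finished and wait_for_pbat and the 300-second default interval are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if an R CMD BATCH transcript shows that
# the check printed TRUE (R prints a length-one logical as "[1] TRUE").
rout_finished() {
  grep -q '^\[1\] TRUE' "$1"
}

# Re-run the check script every $2 seconds (default 300) until the
# transcript reports TRUE. R CMD BATCH names the transcript after the
# script, e.g. areWeThereYetTom.R -> areWeThereYetTom.Rout.
wait_for_pbat() {
  script="$1"
  interval="${2:-300}"
  rout="${script%.R}.Rout"
  until R CMD BATCH "$script" && rout_finished "$rout"; do
    sleep "$interval"
  done
}

# Usage, from the directory that holds process.R and its .RData:
# wait_for_pbat areWeThereYetTom.R 600
```

This is roughly what a non-zero clusterRefresh does inside R; keeping the interval generous avoids loading the head node with frequent R start-ups.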
Assuming everything is done, we can now look at the results. Since we are on a cluster, you are presumably doing a really large analysis, so I'll present a few options.

    ## whatAreMyResults.R
    ## Again, we still have the same workspace as above, but we
    ## need to reload the package.
    library( pbatR, lib.loc="~/R_LIBS" )

    ## option 1: Read the output into R
    ## This might be a bad idea if it's too big and you don't have lots
    ## of memory. It might still be worth trying, as it will convert
    ## everything to '.csv' format (check the files in that directory),
    ## which all spreadsheet programs can read. Even if it cannot be
    ## loaded into R, the '.csv' file should still be there.
    ## The result is an object of class 'pbat';
    ##   results$res - this is a data.frame object, see
    ##   the help files on pbat.m for more details.
    results <- pbat.load( res )
    ## Now do whatever you want with it (e.g. see 'order')

    ## option 2: Just put the output together.
    ## If you've used pbat on its own, you'll recognize this strange format.
    ## The output will be one file that is a spreadsheet, only delimited
    ## with '&' symbols, so you have to specify this when you load
    ## it into a spreadsheet program.
    pbat.concatenate( res, "results.txt" )
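For option 2, if your spreadsheet program makes it awkward to choose '&' as the delimiter, the file can be converted to ordinary CSV first. A minimal sketch with awk, assuming that '&' never occurs inside a field; amp_to_csv and results.csv are hypothetical names. Blanks around each field are trimmed, and fields containing commas or double quotes are quoted so the CSV stays well-formed.

```shell
#!/bin/sh
# Hypothetical helper: convert an '&'-delimited file ($1), such as the one
# written by pbat.concatenate(), into a comma-separated file ($2).
amp_to_csv() {
  awk -F'&' -v OFS=',' '{
    $1 = $1                                   # rebuild the record with OFS
    for (i = 1; i <= NF; i++) {
      gsub(/^[ \t]+|[ \t]+$/, "", $i)         # trim blanks around each field
      if ($i ~ /[",]/) {                      # CSV-quote fields with , or "
        gsub(/"/, "\"\"", $i)
        $i = "\"" $i "\""
      }
    }
    print
  }' "$1" > "$2"
}

# Example:
# amp_to_csv results.txt results.csv
```

The `$1 = $1` line matters: awk only rewrites a record with the new separator once a field has been assigned.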
To submit these on the cluster here, I would use, for example,

    bsub R CMD BATCH whatAreMyResults.R

However, there is a caveat with running the initial script (process.R) on clusters where the compute nodes cannot submit jobs. What does this mean? A cluster usually has a head node that you log into and submit jobs from. This head node controls all the other nodes, telling them what to do. On LSF, the way it is set up here, the non-head nodes can also submit jobs (I believe these go back to the head node, which farms them out again). Why does this matter? The original script creates a bunch of jobs to be run and then completes, as that is its sole purpose; it should therefore finish almost instantaneously, spawning multiple jobs. So the process you run first needs to be able to submit jobs. On LSF, since a job can submit jobs, this is no problem, but I expect this is not the case on all clusters. In that case, you can instead run the script directly on the head node with

    R CMD BATCH process.R

This should work in all other cases, since the head node can send off all of the jobs. You might refer your cluster administrator to this page if you're having trouble.
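The choice between submitting a script and running it on the head node can be wrapped in a small helper. This sketch only prints the command; the scheduler labels lsf, qsub and local and the name batch_command are hypothetical, while the options are the ones quoted on this page (bsub -o junk.txt for LSF, qsub -cwd -b y from the Hopkins report). Check them against your own cluster's documentation.

```shell
#!/bin/sh
# Hypothetical helper: print the command that runs an R script in batch
# mode, either submitted through the scheduler or directly on this node.
batch_command() {
  scheduler="$1"
  script="$2"
  case "$scheduler" in
    lsf)   printf 'bsub -o junk.txt R CMD BATCH %s\n' "$script" ;;  # LSF
    qsub)  printf 'qsub -cwd -b y R CMD BATCH %s\n' "$script" ;;    # qsub-based
    local) printf 'R CMD BATCH %s\n' "$script" ;;  # e.g. process.R on the head node
    *)     echo "unknown scheduler: $scheduler" >&2; return 1 ;;
  esac
}

# Usage: review the printed command, then run it, e.g.
#   cmd=$(batch_command local process.R) && $cmd
```

Running process.R with local on the head node and the later scripts through the scheduler matches the split described above for clusters whose nodes cannot submit jobs.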
Cluster help / advice - others' advice
Feel free to contribute to this. A grad student from Johns Hopkins reports that

    qsub -cwd -b y R CMD BATCH mycommands.R

helps with the qsub batch command. She received the following error messages:

    Warning: no access to tty (Bad file descriptor).
    Thus no job control in this shell.

This has something to do with the clustering platform. I would guess, then, that you might set the cluster option to something like

    qsub -cwd -b y sh
The cluster here uses LSF, and I generally set the cluster option to

    bsub -o junk.txt sh

so that it doesn't send me an e-mail for every finished part (that is what the -o junk.txt does), and I vary the queue depending on the job.