We use Linux, Apple Macintosh, and Microsoft Windows. The Kirschner lab mostly uses Linux, currently Ubuntu 10.04. The Linderman lab mostly uses Macs (as does the College of Engineering in general). Below are some specific systems used for development, model runs, and storing model run results, in addition to individual user desktops and laptops.
When installing software on a system it is generally best to download the install package for that software and follow the official installation instructions, rather than copying directories from a system that already has that package installed. This installs the latest version of the application, and the version will be the proper one for that system. There are circumstances where copying from an existing installation will work, but it can make the installation difficult to manage from then on, e.g. when upgrading or uninstalling the package.
The following specific systems are used. See the Kirschner lab or Linderman lab system administrator for IDs and passwords to access these systems.
Axiom is a compute cluster maintained by the Medical School and CCMB (Center for Computational and Molecular Biology). The main contact person is Jonathon Poisson, jdpoisso@umich.edu. Another contact person is Jim Cavacoli, cavalcol@umich.edu. We use Axiom for LHS runs. Typically we get between 50 and 120 CPU cores when we run an LHS. Axiom has Intel CPUs running Red Hat Linux. It uses PBS for its job scheduling. See the CCMB Cluster Usage Guide for more details about using Axiom. This site has a nice overview of using PBS.
You will need an account set up to access Axiom. Ask one of the Axiom contact persons to have one set up for you. Your user ID and password will be your umich unique name and Kerberos password (the same ID and password you use for your umich e-mail). To access Axiom you log on via ssh to the Axiom head node, which runs Red Hat Linux. You cannot log on to Axiom from outside the university network unless you first log on to a computer on the university network. You also cannot log on to Axiom from North Campus. We are not sure about central campus, but you can definitely log on from the medical school network, such as in the Kirschner lab. For example, ssh to helico and from there ssh to axiom. An alternative is to use a VPN client, in which case you would start the VPN client on your local computer and then ssh to axiom directly. See this ITS web page for instructions on how to use a VPN client with the university network. We have not had any experience with the VPN client approach.
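A typical two-hop login from outside the med school network might look like the following. The full helico host name is an assumption based on the other lab machines listed in this document; substitute your own uniqname and the actual host names used in the lab.

    ssh uniqname@helico.micro.med.umich.edu   # log on to a machine on the med school network first
    ssh uniqname@axiom                        # then ssh to the Axiom head node from there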
Once you have an account on Axiom, to run an LHS do the following:
1. Build the model executable on the Axiom head node with
       make
   and copy the executable to the LHS main directory. The run script will expect it there. Also, building on Axiom means you will be able to make changes to the model for different LHS runs, such as doing a knockout LHS and a depletion LHS, since those sometimes require small model changes.
2. Run the LHS with
       make lhs
3. Check the status of your jobs with
       qstat -u ID
   where ID is your Axiom logon ID, e.g. qstat -u pwolberg.
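Put together, a hypothetical session on the Axiom head node might look like the following; the directory names are placeholders rather than the lab's actual layout.

    cd ~/model            # model source tree (placeholder path)
    make                  # build the model executable
    cp model ~/lhs        # copy the executable to the LHS main directory (placeholder path and name)
    cd ~/lhs
    make lhs              # run the LHS
    qstat -u pwolberg     # check the status of the submitted jobs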
Some problems occasionally occur when running jobs on Axiom. If any of these happen, contact the main Axiom system administrator mentioned above.
The Axiom system administrator described the current scheduling behavior as follows:
Right now actual priority is undefined in the scheduling system; instead it operates on a soft cap, which is set to distribute 8 cores to each user (not counting resources requested by users who have purchased resources on the system) before assigning the remaining cores, in excess of the soft cap, to waiting jobs.
By default the scheduler as currently configured assigns jobs first come, first served, which would naturally put your jobs at a disadvantage, but the soft cap mechanism interferes with the normal mechanism for a user's jobs in excess of the cap. This means that when the scheduler re-evaluates the queue it may allow another user to grab all the cores.
I have increased your soft cap to 64 cores for the time being, but as you may have observed the system is experiencing some abnormally high load, so it may not be able to take advantage of that increased cap immediately.
Some Axiom users have priority on some of the compute nodes, since those users paid for those nodes. Because of this, jobs run on those nodes by other users (our jobs, for example) may be preempted by a job from a user with higher priority. In that case the lower-priority job is halted and placed back on the input queue, to be re-run from the start (there is no automatic checkpointing). The system also does not automatically clean up any files created by the preempted job. This means our PBS scripts must be written to take preemption into account: the directory on the head node file system that receives job result files must be emptied before copying files from a compute node's scratch disk space to that destination directory.
Note also that because of preemption there may be more job result log files than expected, which can also keep the job counts reported by job status scripts from adding up correctly.
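As noted above, the copy-back step of a PBS script has to empty the destination directory first, so that a preempted and re-run job does not mix its output with leftovers from the earlier attempt. Here is a minimal sketch of that step, assuming results are staged in node-local scratch space; the directory names are placeholders, not those used by our actual run scripts.

    # Copy results from compute node scratch space back to the head node.
    SCRATCH=/scratch/$USER/$PBS_JOBID     # node-local scratch area (placeholder path)
    DEST=$HOME/lhs/results/run1           # destination on the head node file system (placeholder path)

    # If this job was preempted and restarted, DEST may contain partial
    # results from the earlier attempt, so empty it before copying.
    rm -rf "$DEST"
    mkdir -p "$DEST"
    cp -r "$SCRATCH"/. "$DEST"/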
Unlike the Flux cluster, Axiom does not use modules to manage dependencies on specific versions of libraries and tools. This can cause some problems with the Boost library and the g++ compiler. The default g++ compiler is 4.1.2; on Axiom, g++ version 4.4 is available as g++44 and g++ version 4.6 is available as g++46. Because library versions differ between systems, it is not possible, for example, to build a version of a model on one of our Linux systems and then run it on Axiom: the executable will expect a particular version of the Boost library, and the version on our systems is typically different from the one on Axiom. The executable will quit immediately with an error message stating the Boost library version found and the version expected. This can happen with any library used by a model, not just Boost. In any case, when running on Axiom, or any cluster, it is better to build on the cluster head node.
The g++ "-march=native" command line option should not be used on Axiom. When building on the head node, that option generates object code for the specific processor type of the head node, which is not the same as the processors on the compute nodes. For example, the head node processors support the ssse3, sse4.1, and sse4.2 vector instructions, whereas the compute nodes only support sse and sse2.
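A hedged example of a build command that follows both recommendations, using the newer compiler explicitly on the head node and targeting only the instruction sets the compute nodes support; the source file name is a placeholder, and in practice the build is driven by the model's Makefile.

    g++44 -O2 -msse2 -o model model.cpp   # no -march=native; sse2 is safe on the compute nodes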
Flux is the UM campus-wide cluster, managed by CAC (the Center for Advanced Computing). CAC also manages the Nyx cluster. The difference is that Flux usage is available on a fee basis, whereas Nyx is free; Nyx typically has long wait times in its batch queue. We do not yet use Flux because we have access to the med school Axiom cluster, which is free for us since we are part of the med school. Flux uses PBS. It also uses modules for managing dependencies on specific versions of libraries and tools. See the Flux web site for more information on using Flux and the fee structure.
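On a cluster that uses modules, the typical commands look like the following. The exact module names vary by site and over time, so check what is actually installed before loading anything.

    module avail          # list the software modules available on the cluster
    module load gcc       # load a particular compiler version
    module load boost     # load a Boost build that matches that compiler
    module list           # show which modules are currently loaded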
A 4 CPU, 8 core desktop system. It is used for small compute runs, such as small LHS runs, and as a development system. It runs Ubuntu 10.04.
innoculant is a computer in the Kirschner lab that is used to run some server applications. It has a file sharing web server at http://innoculant.micro.med.umich.edu/ftp2/. Files can be uploaded to this server for viewing (such as this documentation) or for download. This avoids various issues with transferring files via e-mail, such as file size and file format restrictions. The file sharing server requires authentication: it will ask for an ID and password when a page is accessed for the first time in a session. The authentication remains valid for an extended period of time; when it times out the server will ask for authentication again on the next page access.
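Files on the server can also be fetched from the command line, for example with wget, which will prompt for the password; the file path below is a placeholder.

    wget --user=YOUR_ID --ask-password http://innoculant.micro.med.umich.edu/ftp2/somefile.pdf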
innoculant also hosts the Subversion source archive server at svn://innoculant.micro.med.umich.edu/dev.
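A typical checkout from the archive looks like the following; the project name is a placeholder for whichever project you are working on.

    svn checkout svn://innoculant.micro.med.umich.edu/dev/someproject
    cd someproject
    svn update                              # bring the working copy up to date
    svn commit -m "Describe the change"     # commit local changes back to the archive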
A 4 CPU, 8 core desktop system. It is used for small compute runs, such as small LHS runs, and as a development system. It runs Ubuntu 10.04.
necrosis is the model run server. It runs a web server that allows access to the results of model runs, at http://necrosis.micro.med.umich.edu/ftp2/. It is also used for post-processing of model runs to produce run reports. This typically requires access to a graphical interface, either by being physically at the machine or by using a remote desktop application like VNC (which has not been set up on necrosis).
snow0 is a 32-bit desktop system in the Kirschner lab running Ubuntu Linux 10.04. It can be used for post-processing model run results if necrosis is not available. It can also be used as the primary desktop for grad students or postdocs in the lab.
Teragrid is a US national resource for high-end scientific computing. It provides access to high performance computing (clusters, vector computers, etc.), large data storage facilities, and various software and database resources. Here is the Teragrid official web site.
Teragrid resources are free, but users must apply for a Teragrid allocation. The UM CAC (Center for Advanced Computing) is the campus resource for Teragrid. They can help with applying for a Teragrid allocation, and the CAC Teragrid web page has more information about applying for an allocation and using it once granted. We are not yet making use of Teragrid resources.
Teragrid is being phased out and is to be replaced by a new initiative called XSEDE (eXtreme Science and Engineering Discovery Environment).
XSEDE is the successor to Teragrid. It is a US national resource for high-end scientific computing, providing access to high performance computing (clusters, vector computers, etc.), large data storage facilities, and various software and database resources. Here is the XSEDE official web site. We are not yet making use of XSEDE resources.