Building exabayes (1.2.1) for Rocks 6.1

To build exabayes (note: this is for version 1.2.1; 1.3 just came out and doesn’t build for me just yet) on Rocks 6.1, which is based on CentOS 6.3, LLVM’s clang and libc++ need to be installed. I have a previous blog post about that.

The available prebuilt binaries do not work on CentOS 6.3, but once clang and libc++ are installed, rebuilding it is fairly straightforward. Download and extract exabayes and go into its directory. Use the following commands to configure and build both the serial and parallel versions of exabayes:

CC=clang CXX=clang++ CXXFLAGS="-std=c++11 -stdlib=libc++" ./configure
make
OMPI_CC=clang OMPI_CXX=clang++ OMPI_CXXFLAGS="-std=c++11 -stdlib=libc++" CC=mpicc CXX=mpic++ ./configure --enable-mpi
make clean
OMPI_CC=clang OMPI_CXX=clang++ OMPI_CXXFLAGS="-std=c++11 -stdlib=libc++" CC=mpicc CXX=mpic++ make

mpicc and mpic++ are just wrappers for gcc, but by using those environment variables they can be pointed at another compiler without having to build a separate version of OpenMPI. Once that is done, all the exabayes binaries are in the top-level directory. Ignore the ones in bin/bin; those are the prebuilt ones that don’t work.
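
As a quick sanity check, OpenMPI’s wrapper compilers accept a -showme flag that prints the underlying command without compiling anything, so you can confirm the wrappers are actually pointing at clang:

OMPI_CC=clang OMPI_CXX=clang++ mpicc -showme
OMPI_CC=clang OMPI_CXX=clang++ mpic++ -showme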

Rocks Cluster: Changing the external IP address

Seems simple enough, yes? Just recently I had to move our entire infrastructure out of the main university server room and into the new research datacenter built just for groups like the one I work for. That also meant a change in the network.

For those who use Rocks already, you’ll know right away that Rocks doesn’t use the nifty GUI tools to manage network devices, but goes straight to the network startup scripts in /etc/sysconfig/network-scripts. Don’t worry, this is the easy part that any sysadmin should know: just edit the corresponding ifcfg-ethX file for your external network interface and change the information to what it needs to be (and don’t forget /etc/hosts).
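
As an illustration only (the device name and addresses are placeholders for whatever your external interface actually uses), the file ends up looking something like this:

# /etc/sysconfig/network-scripts/ifcfg-eth1 -- example values only
DEVICE=eth1
ONBOOT=yes
BOOTPROTO=static
IPADDR=x.x.x.x
NETMASK=255.255.255.0
GATEWAY=x.x.x.x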

Second, update the Rocks database entry for the external IP of the head node like so:

rocks set host interface ip xxxxxxx ethX x.x.x.x

Where of course you fill in the blanks with your relevant information. 
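
To confirm the database actually took the change, you can list the interfaces back out (same placeholder for the head node’s name as above):

rocks list host interface xxxxxxx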

This next part wasn’t so obvious and I didn’t know anything was wrong until later.

With the head node back online with its new IP, I started booting up the nodes, only to find they were not finding their way back into the grid engine. When I ssh’d to the nodes, I found out they were still referencing the old external IP address when trying to communicate back to the master grid engine process. Where was it even getting this information? Turns out, from the Rocks database, but didn’t I just fix that?

Not really, there is more. The database stores IP information for all the nodes, as well as for Kickstart, which is why the nodes were using the old external IP address. Use rocks list attr to list all attributes and you’ll see the Kickstart entries and the old IP information. I used the following to fix that:

rocks set attr Kickstart_PublicAddress x.x.x.x 
rocks set attr Kickstart_PublicNetwork x.x.x.x 
rocks set attr Kickstart_PublicBroadcast x.x.x.x 
rocks set attr Kickstart_PublicGateway x.x.x.x 
rocks set attr Kickstart_PublicNetmask x.x.x.x
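
A quick way to double-check is to filter that same rocks list attr output for the public Kickstart entries:

rocks list attr | grep Kickstart_Public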
 

Ta Da! All done.

Getting MEME on the cluster

This was a program recently requested by one of our users. Since it’s publicly available open-source software, it’s something we can readily do. Though that was a bit more easily said than done. I won’t be covering how I set up the web interface, as I forgot how I did most of it. I can say it was a lot of work and I really hate setting up multiple Perl dependencies. I really wish more Perl modules were available as RPMs. It just makes things easier, but of course not 😛

But back on topic: the web interface was built on another server that gets a lot of usage, so much so that we try to encourage users to do more work on the cluster than on that machine, which is more useful for large-memory jobs as it has 128GB of RAM. So onto the cluster MEME goes. The software is readily available on their website, no hoops to jump through, and there is some nice, but not the best, documentation on how to set up the software.

So, following their instructions: download, untar, apply any patches. In this case there were two, but only one of them worked; I didn’t bother with the second one as it only patched web-related files. Run the standard configure script, set the prefix, but also include options for MPICH2 and to build the included libxml. During compiling and linking it was having problems with the system version of libxml, and not wanting to do anything to the cluster, I opted for the included version.
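
For what it’s worth, the configure line ended up along these lines. Treat the flag names as assumptions from memory rather than gospel (they vary between MEME versions, so check ./configure --help); the prefix and the MPICH2 path just match what the job script below uses:

# Sketch only -- verify the exact flag names with ./configure --help for your MEME version
./configure --prefix=/share/apps/meme \
            --with-mpidir=/opt/mpich2/gnu \
            --enable-build-libxml2
make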

So it builds and I run the tests. Second issue: Perl dependencies. Since I wasn’t dealing with the web interface, the list of dependencies isn’t very long and the tests tell you which one is needed. Luckily I was able to find an RPM for it, which lists its own dependencies, which again I was able to find RPMs for. With the tests running successfully, MEME gets installed.

Next was testing the parallel execution. Our Rocks-based cluster uses OpenMPI by default and it’s well integrated into the cluster, but MEME doesn’t support it; it’s either LAM or MPICH2. Since MPICH2 is already installed and working, I went with that (on the big memory machine, LAM was used). At the configure stage, you specify the MPICH2 directory and binaries and it takes care of the rest. It should be noted that one of the setup requirements for using MPICH2 is to create .mpd.conf in your home directory and specify some password within. This is for securing communication between mpd daemons, so there is no cross-talk between different user jobs.
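
Setting that up is a quick one-liner; as I recall, mpd also insists the file be readable only by you, hence the chmod (the secret word itself is anything you like):

echo "MPD_SECRETWORD=pick-something-private" > ~/.mpd.conf
chmod 600 ~/.mpd.conf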

So to run MEME, the following is an example job script for submission to the scheduler (grid engine):

#!/bin/bash
#$ -pe mpich2 5
#$ -N meme_test
#$ -j y
#$ -cwd
#$ -S /bin/bash
export MPICH2_ROOT="/opt/mpich2/gnu"
export PATH="$MPICH2_ROOT/bin:$PATH"
export MPD_CON_EXT="sge_$JOB_ID.$SGE_TASK_ID"
time /share/apps/meme/bin/meme -p $NSLOTS INO_up800.s -dna \
  -mod anr -revcomp -bfile yeast.nc.6.freq
exit 0

*Note: the “\” is just a line continuation, to show that the meme command really should be a single line.
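
Assuming the script is saved as something like meme_test.sh (the filename is arbitrary), it gets submitted to grid engine the usual way:

qsub meme_test.sh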

That’s about it. Luckily it wasn’t as painful as the web interface, and even nicer is the configure option to specify the use of a particular web server, so generated output will have links back to the database information hosted on the big memory machine.

Rocks and 10gig hardware don’t mix

Well at least that was my current experience. I should mention I’m running Rocks 5.3, so just a point version behind. But here is the story:

A faculty member recently purchased some Dell blade servers for some research work. These blades, and thus the chassis, came with 10gig ethernet hardware. Cool. Set up the hardware, check. Plug in all the cables (but no 10gig, since we don’t have 10gig network hardware), check. Set up software, uh oh.

The problem happened when I booted the nodes to have the Rocks installer image them. They PXE-booted just fine off of the first ethernet device (just a plain ol’ 1gig connection). Linux loaded, the installer ran, it tried to find an IP, and failed. The installer was scanning eth0, eth2, then eth1. Turns out the kernel was numbering the 10gig NICs eth0-3, and eth4 was the 1gig NIC it should have been using.

A few days on the mailing list, to no avail. They just gave up on me, but I never give up. I narrowed it down to a problem with the kickstart script overriding options I set in the PXE boot config. I added IPAPPEND 2 and ksdevice=bootif, which tells the system to use the same device it booted from. Well, that wasn’t working. Not until I told it not to run the kickstart script, by removing the ‘ks’ option, was it able to use eth4 as it should have. But the mailing list failed me and offered no solution. Drivers! BIOS! Update!! No no no! But whatev, I just had to do it the hard way.
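
For reference, the relevant bits of the PXELINUX entry looked roughly like this (a sketch from memory; the paths and the rest of the append line are placeholders, not the exact Rocks-generated file). IPAPPEND 2 passes the boot interface’s MAC to the kernel as BOOTIF=, and ksdevice=bootif tells the installer to use that same device; on this hardware it only behaved once the ks option was dropped entirely:

# pxelinux.cfg entry -- sketch only, paths and extra append options are placeholders
default rocks-install
prompt 0
label rocks-install
    kernel vmlinuz-install
    append initrd=initrd-install.img ksdevice=bootif ks
    ipappend 2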

Back to the server room: removed the 10gig cards from the nodes, eth0 was now the 1gig NIC, install the OS, reinstall the hardware, and done. Luckily it was only two nodes, but still, there should have been a software solution for this. Life goes on.

The Cloud: that internet thingy


And just yesterday that buzzword came into academia with a presentation by one of the computing groups here at the university. So of course I had to go and see what it was about. Everyone else was a CS grad student, and then little ol’ me. Short and mostly on point, and I got to leave before I fell asleep.

I don’t have an interest in the cloud. I manage a computing cluster, and while in some regards that’s similar to the concept of the cloud, it’s not something I see our current users thinking of as useful. Maybe I’m just not being creative enough, but I don’t see someone booting up a virtual cluster in the cloud when all they want is some big machine on which to run their program. I think they’d rather have a web interface to run their program than have to go build a cluster (virtual or not).

And that’s my job, to build and take care of that cluster. I see the cloud as something for IT and web stuff. Why host your own website when you could just put it up in the cloud? Saves money too, since you don’t have to buy hardware. Me? I’d rather have my machine in front of me, so I guess I’m old school like that.

So as it stands, the cloud is cool and all that (when it works), but I don’t see much use for it in academia. Research into engineering networks and dealing with failures should interest those CS majors, but for those who just want to get things done, there is the cluster.