Building exabayes (1.2.1) for Rocks 6.1

To build exabayes (note: this is for version 1.2.1; 1.3 just came out and doesn't build for me just yet) on Rocks 6.1, which is based on CentOS 6.3, LLVM's clang and libc++ need to be installed. I have a previous blog post about this.

The available prebuilt binaries do not work on CentOS 6.3, but once clang and libc++ are installed, rebuilding it is fairly straightforward. Download and extract exabayes and go into its directory. Use the following commands to configure and build both the serial and parallel versions of exabayes:

CC=clang CXX=clang++ CXXFLAGS="-std=c++11 -stdlib=libc++" ./configure
make
OMPI_CC=clang OMPI_CXX=clang++ OMPI_CXXFLAGS="-std=c++11 -stdlib=libc++" CC=mpicc CXX=mpic++ ./configure --enable-mpi
make clean
OMPI_CC=clang OMPI_CXX=clang++ OMPI_CXXFLAGS="-std=c++11 -stdlib=libc++" CC=mpicc CXX=mpic++ make

mpicc and mpic++ are just wrappers around gcc, but with those environment variables they can be pointed at another compiler without having to build a separate version of OpenMPI. Once that is done, all of the exabayes binaries are in the top-level directory. Ignore the ones in bin/bin; those are the prebuilt ones that don't work.
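As a quick sanity check that the wrappers really are picking up clang, Open MPI's wrappers can print the underlying command they will run (a minimal sketch; -showme is the Open MPI wrapper flag for this):

OMPI_CXX=clang++ OMPI_CXXFLAGS="-std=c++11 -stdlib=libc++" mpic++ -showme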

Building libc++ on CentOS 6

For the cluster I manage, a user needed exabayes (there will be another post on building that later), but the prebuilt binaries didn't work on Rocks 6.1, which is based on CentOS 6.3. GCC is too old to build it since exabayes uses C++11, but luckily clang 3.4 is available from EPEL. The only problem: it still wouldn't compile. I got the following two errors:

/usr/bin/../lib/gcc/x86_64-redhat-linux/4.4.6/../../../../include/c++/4.4.6/exception_ptr.h:143:13: error:
unknown type name 'type_info'
const type_info*

./src/Density.cpp:79:34: error: use of undeclared identifier 'begin'
double sum = std::accumulate(begin(values), end(values), 0. );

While there was a potential workaround for the first error, nothing viable turned up for the second. But that research pointed in the next direction: building LLVM's libc++, since these errors come from GCC's old version of the standard C++ library. It's a bit complicated and rather hackish, but it looks like it works, so here we go.

Download libc++ via svn, but instead of following their directions for building, do this:

cd libcxx/lib
./buildit

Thanks to this blog post, which is in Chinese, but the commands are easy to understand. After building the library, copy it to /usr/lib (or, because this is 64-bit, I put it in /usr/lib64) and create the needed symlinks. Then copy libcxx/include to /usr/include/c++/v1. Remember this, as we'll be replacing libc++ later with a rebuilt version.
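For reference, the copy and symlink steps look roughly like this (a sketch; the exact .so version that buildit produces may differ, so adjust the filenames to match what ends up in libcxx/lib):

cp libcxx/lib/libc++.so.1.0 /usr/lib64/
ln -s libc++.so.1.0 /usr/lib64/libc++.so.1
ln -s libc++.so.1 /usr/lib64/libc++.so
cp -r libcxx/include /usr/include/c++/v1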

Next is building libc++abi. Again, download it from svn, build it like above, copy the library to /usr/lib64, and make the symlinks. The include directory doesn't need to be copied. Now it's time to rebuild libc++ against libc++abi. This requires CMake, and I opted for the newer version available from EPEL, so the command is cmake28. I also started with a fresh download of libc++:

cd libcxx
mkdir build
cd build
CC=clang CXX=clang++ cmake28 -G "Unix Makefiles" -DLIBCXX_CXX_ABI=libcxxabi -DLIBCXX_LIBCXXABI_INCLUDE_PATHS="<libc++abi-source-dir>/include" -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/usr ../
make

Since I don't like to mess with the system install, I used DESTDIR during the make install step. This then allows me to build an rpm package using rocks create package. I also created a package for libc++abi.
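Roughly, that step looks like the following (a sketch; the staging path and package name are placeholders of my own, and the rocks create package arguments should be checked against the Rocks docs):

make install DESTDIR=/tmp/libcxx-install
rocks create package /tmp/libcxx-install/usr libcxx prefix=/usr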

With this, it's now possible to compile with clang and C++11. Test it out like so: clang++ -stdlib=libc++ -std=c++11 input.cpp
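A quick smoke test of my own (echoing the std::accumulate/begin error from Density.cpp above) would be something like:

cat > test.cpp <<'EOF'
#include <iostream>
#include <numeric>
#include <vector>

int main() {
    std::vector<double> values{1.0, 2.0, 3.0};
    // unqualified begin()/end() only resolve with a C++11 standard library
    double sum = std::accumulate(begin(values), end(values), 0.);
    std::cout << sum << std::endl;
    return 0;
}
EOF
clang++ -stdlib=libc++ -std=c++11 test.cpp -o test && ./test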

Building GCC 4.7.2 for Rocks Cluster

It was not easy, but here are the basic steps I used. First and foremost, read the prerequisites, both online and in the documentation that comes with the tarball (they really should update the website). Download the following, first from GCC's infrastructure ftp site: cloog-0.17, isl-0.10, and ppl-0.11. Then, from their respective sites, download gmp, mpfr, and mpc. Finally, grab the latest gcc source tarball.

I used the following blog post as my starting point, also checking the spec files from Red Hat rpms to get an idea of 'standard' config options. I set up the following environment:

  • I created a folder where I'll be doing all this work; I like to call it workshop.
  • I downloaded all the source packages into workshop and then made another directory, builds, to hold the final results of each built package: so builds/gmp, builds/mpfr, etc.
  • To keep from contaminating the source, and for simple cleanup when I had to redo a build, I always create a separate folder within a source package's directory to build in. It's just good practice to do so (see the sketch below).
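In practice the layout ends up looking something like this (a sketch; the names are just my own convention):

mkdir -p ~/workshop/builds/{gmp,mpfr,mpc,ppl,isl,cloog,gcc}
cd ~/workshop
# source tarballs get downloaded and extracted here; each package is then
# configured from its own build/ subdirectory, as shown with gmp below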

First off, gmp:
tar -jxf gmp-5.0.5.tar.bz2
cd gmp-5.0.5
mkdir build
cd build
../configure --prefix=/opt/hpc/gcc --build=x86_64-linux-gnu --enable-mpbsd --enable-cxx

As I plan on building the Graphite extensions, the --enable-cxx option is important.
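For each of the support libraries here, the configure step is followed by the usual build and install; a sketch (make check is optional but worth running, and the DESTDIR packaging variant is covered at the end of the post):

make
make check
make install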

For mpfr:
../configure --prefix=/opt/hpc/gcc --build=x86_64-linux-gnu --disable-assert --with-gmp=/opt/hpc/gcc

For mpc:
../configure --prefix=/opt/hpc/gcc --build=x86_64-linux-gnu --with-gmp=/opt/hpc/gcc --with-mpfr=/opt/hpc/gcc

Now, while they have moved on from using PPL within CLooG, it's still needed to compile gcc.

For ppl:
../configure --prefix=/opt/hpc/gcc --build=x86_64-linux-gnu --enable-shared --disable-rpath --with-gmp-prefix=/opt/hpc/gcc

For isl:
../configure --prefix=/opt/hpc/gcc --build=x86_64-linux-gnu --with-gmp-prefix=/opt/hpc/gcc

For cloog:
../configure --prefix=/opt/hpc/gcc --build=x86_64-linux-gnu --with-gmp=system --with-gmp-prefix=/opt/hpc/gcc --with-isl=system --with-isl-prefix=/opt/hpc/gcc --with-isl-exec-prefix=/opt/hpc/gcc --with-bits=gmp

Here, --with-bits=gmp is specified on the GCC prerequisites page.

Finally, GCC itself, but there is a small 'bug' that keeps it from using CLooG 0.17. Simply edit the configure script and replace the version number it is looking for (0.16.1) with 0.17.0; there is also a patch for this. Also, read the included docs so you know to include this option: --enable-cloog-backend=isl
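If you'd rather script the edit than do it by hand, something along these lines should work (my own sketch; double-check how the version string actually appears in your copy of configure before trusting a blind substitution):

sed -i 's/0\.16\.1/0.17.0/g' configure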

I then also used this post, along with checking the spec file, for various config options. One warning: do not use --enable-gnu-unique-object unless you are running the latest glibc and GNU assembler, or else make will error out. I also included LD_LIBRARY_PATH, or else compilation of libgcc fails, complaining that it cannot find the shared libraries.

For gcc:
LD_LIBRARY_PATH=/opt/hpc/gcc/lib ../configure --prefix=/opt/hpc/gcc --build=x86_64-linux-gnu --with-gmp=/opt/hpc/gcc --with-mpfr=/opt/hpc/gcc --with-mpc=/opt/hpc/gcc --with-ppl=/opt/hpc/gcc --with-cloog=/opt/hpc/gcc --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-tls --enable-libgomp --disable-nls --with-fpmath=sse --enable-cloog-backend=isl --enable-languages=c,c++,objc,obj-c++,fortran
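The build step then needs the same LD_LIBRARY_PATH as mentioned above (a sketch; add -j for a parallel build if you like):

LD_LIBRARY_PATH=/opt/hpc/gcc/lib make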

While I did install everything where I wanted it to go, for packaging reasons I used make install DESTDIR=/path/to/builds/packageName to keep each package separate. I then used rocks create package to build individual rpms for each.

Packaging GCC with Rocks

Our cluster is still running CentOS 5.4, so its GCC is a bit out of date for some of the scientific simulation programs out there; thus begins the task of building GCC, but that is not the point of this post. Rocks has a nifty command, rocks create package, to create an rpm package out of a given directory. This worked without any issues until I tried it while rebuilding our software stack to be based on the latest GCC.

I had no issues with gcc or with making the package; it was later, when building openmpi, that I discovered the problem. When openmpi had to link against a GCC library, it asked the gcc binary where it was, and gcc in turn used a hard-coded location that no longer existed since this was not the original build machine. The fix is to use DESTDIR. In this case I wanted gcc to live in /opt/gcc-4.7.1, so during the configure step I set the prefix accordingly, but during the install step I used make DESTDIR=/tmp/gcc install, which installs gcc into $DESTDIR/$prefix. Now GCC knows to find its files in /opt/gcc-4.7.1 and not wherever they happened to be on the build server.
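Put together, the pattern looks roughly like this (a sketch; the staging path is arbitrary and the rocks create package arguments should be checked against the Rocks docs):

../configure --prefix=/opt/gcc-4.7.1 ...
make
make DESTDIR=/tmp/gcc install    # files land in /tmp/gcc/opt/gcc-4.7.1
rocks create package /tmp/gcc/opt/gcc-4.7.1 gcc-4.7.1 prefix=/opt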

Grid Engine and Limits

Our cluster has slowly grown over time with new additions, but the oldest group of nodes is the only one with InfiniBand, something we never got around to configuring after rebuilding the cluster the first time. Well, now the time came to give it another shot. Using the OpenFabrics OFED distribution, I installed just the kernel drivers and needed libraries; I planned on building a different version of OpenMPI later. What's nice about this distribution is that it will build rpms for you, so after testing on one node, I copied the rpms to the head node and added them to the list of rpms to install.

Then, picking a few more nodes to test the installation on, this is where my troubles began. I could manually ssh to a node and run the OSU benchmarks without an issue, but what's the point of that if you can't run it distributed? So I made a job script and submitted it, only to find it crashing with the following:

libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
This will severely limit memory registrations.

Strangely, limits were being set even though /etc/security/limits.conf was empty. Thanks to the folks on the Rocks mailing list, I found I needed to add H_MEMORYLOCKED=infinity to the cluster configuration via qconf -mconf, adding it to execd_params.
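Concretely, that means something like the following (a sketch; if execd_params already has entries, add H_MEMORYLOCKED=infinity to the existing list rather than replacing it):

qconf -mconf
# in the editor that opens, set:
execd_params                 H_MEMORYLOCKED=infinity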

Rocks Cluster: Changing the external IP address

Seems simple enough, yes? We just recently had to move our entire infrastructure out of the main university server room and into the new research datacenter built just for groups like the one I work for. This also meant a change in the network.

Those who use Rocks already will know right away that Rocks doesn't use the nifty GUI interface to manage network devices, but goes straight to the network startup scripts in /etc/sysconfig/network-scripts. Don't worry, this is the easy part that any sysadmin should know: just edit the corresponding ifcfg-ethX file for your external network interface and change the information to what it needs to be (and don't forget /etc/hosts).

Second, update the Rocks database entry for the external IP of the head node like so:

rocks set host interface ip xxxxxxx ethX x.x.x.x

Where of course you fill in the blanks with your relevant information. 
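For example, with a made-up hostname, interface, and address (purely illustrative; rocks list host interface will show the real names on your cluster):

rocks set host interface ip mycluster eth1 192.0.2.10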

This next part wasn’t so obvious and I didn’t know anything was wrong until later.

With the head node back online with its new IP, I started booting up the nodes, only to find they were not finding their way back into Grid Engine. When I ssh'd to the nodes, I found they were still referencing the old external IP address when trying to communicate back to the master Grid Engine process. Where were they even getting this information? Turns out, from the Rocks database, but didn't I just fix that?

Not really; there is more. The database stores IP information for all the nodes, as well as for Kickstart, which is why the nodes were still using the old external IP address. Use rocks list attr to list all attributes and you'll see the Kickstart entries with the old IP information. I used the following to fix that:

rocks set attr Kickstart_PublicAddress x.x.x.x 
rocks set attr Kickstart_PublicNetwork x.x.x.x 
rocks set attr Kickstart_PublicBroadcast x.x.x.x 
rocks set attr Kickstart_PublicGateway x.x.x.x 
rocks set attr Kickstart_PublicNetmask x.x.x.x
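Afterwards it's worth double-checking that the new values took, e.g.:

rocks list attr | grep Kickstart_Public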
 

Ta Da! All done.