Where I work, many of our users are involved in bioinformatics, and recently one user was concerned about how long it took to convert an aligned FASTA file into an interleaved PHYLIP file for phylogenetic analysis. BioPython took a very long time, and its in-memory representation of the alignment was many times larger than the file itself, which added to the difficulties the user was facing.
So I thought I could help out. Luckily, a suitable project already existed: pyfasta. This great tool uses NumPy's memory mapping to access a FASTA file without reading it completely into memory, and with a few loops over its records I was able to write the alignment out in PHYLIP format. I'm also happy to report that the user is very satisfied with the program.
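To illustrate the idea, here is a minimal sketch of a memory-mapped FASTA to interleaved PHYLIP conversion. It is not fast2phy's code and does not use pyfasta: it swaps in Python's standard-library mmap module so it runs on its own. The function names (index_fasta, residues, fasta_to_phylip), the 60-residue block width and the name padding are all hypothetical choices. It assumes '\n' line endings, and that every sequence line in a record except the last has the same length (the same layout samtools faidx requires), so a residue position can be turned into a byte offset with arithmetic instead of by reading the whole sequence.

```python
import mmap
from typing import List, NamedTuple


class Record(NamedTuple):
    """Location of one FASTA record's sequence inside the mapped file."""
    name: str
    offset: int    # byte offset of the first residue
    length: int    # number of residues (the aligned length)
    line_len: int  # residues per full line (assumed fixed within a record)


def index_fasta(mm: mmap.mmap) -> List[Record]:
    """One pass over the mapped file, recording where each sequence lives.

    Assumption: '\n' line endings and no blank lines inside a sequence.
    """
    records: List[Record] = []
    pos, size = 0, len(mm)
    name, offset, length, line_len = None, 0, 0, 0
    while pos < size:
        end = mm.find(b"\n", pos)
        if end == -1:
            end = size
        line = mm[pos:end]
        if line.startswith(b">"):
            if name is not None:
                records.append(Record(name, offset, length, line_len))
            name = line[1:].split()[0].decode()  # first word of the header
            offset, length, line_len = end + 1, 0, 0
        elif line:
            if line_len == 0:
                line_len = len(line)
            length += len(line)
        pos = end + 1
    if name is not None:
        records.append(Record(name, offset, length, line_len))
    return records


def residues(mm: mmap.mmap, rec: Record, start: int, stop: int) -> str:
    """Return residues [start, stop) of one record, touching only those bytes."""
    stop = min(stop, rec.length)
    out = bytearray()
    p = start
    while p < stop:
        row, col = divmod(p, rec.line_len)
        # each full line holds line_len residues followed by one '\n' byte
        byte = rec.offset + row * (rec.line_len + 1) + col
        take = min(rec.line_len - col, stop - p)
        out += mm[byte:byte + take]
        p += take
    return out.decode()


def fasta_to_phylip(fasta_path: str, out, width: int = 60) -> None:
    """Write an aligned FASTA file to `out` as interleaved PHYLIP."""
    with open(fasta_path, "rb") as fh, \
            mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        recs = index_fasta(mm)
        if not recs:
            raise ValueError("no FASTA records found")
        nchar = recs[0].length
        if any(r.length != nchar for r in recs):
            raise ValueError("sequences are not aligned (lengths differ)")
        # relaxed-PHYLIP-style padding; strict PHYLIP wants names <= 10 chars
        pad = max(10, max(len(r.name) for r in recs) + 1)
        out.write(f"{len(recs)} {nchar}\n")
        for block in range(0, nchar, width):
            if block:
                out.write("\n")  # blank line between interleaved blocks
            for r in recs:
                label = r.name.ljust(pad) if block == 0 else ""
                out.write(label + residues(mm, r, block, block + width) + "\n")
```

A call such as `fasta_to_phylip("aligned.fasta", sys.stdout)` (the file name is only an example) streams the result. Because residues are sliced out of the mapping one block at a time, memory use grows with the number of records rather than with the size of the alignment.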
fast2phy can be found on GitHub.
To build ExaBayes (note: this is for version 1.2.1; version 1.3 just came out and doesn't build for me yet) on Rocks 6.1, which is based on CentOS 6.3, LLVM's clang and libc++ need to be installed. I have a previous blog post about installing them.
The available prebuilt binaries do not work on CentOS 6.3, but once clang and libc++ are installed, rebuilding ExaBayes is fairly straightforward. Download and extract the ExaBayes source and change into its directory. Then use the following commands to configure and build both the serial and parallel versions:
CC=clang CXX=clang++ CXXFLAGS="-std=c++11 -stdlib=libc++" ./configure
OMPI_CC=clang OMPI_CXX=clang++ OMPI_CXXFLAGS="-std=c++11 -stdlib=libc++" CC=mpicc CXX=mpic++ ./configure --enable-mpi
OMPI_CC=clang OMPI_CXX=clang++ OMPI_CXXFLAGS="-std=c++11 -stdlib=libc++" CC=mpicc CXX=mpic++ make
mpicc and mpic++ are just Open MPI wrappers around gcc, but the OMPI_CC, OMPI_CXX and OMPI_CXXFLAGS environment variables point them at another compiler without having to build a separate version of Open MPI. Once the build finishes, all the ExaBayes binaries are in the top-level directory. Ignore the ones in bin/bin; those are the prebuilt ones that don't work.