Assemble bacterial genomes in minutes on your Mac laptop

Apr 19, 2018


MacVector with Assembler contains some remarkably powerful algorithms for assembling Next Generation Sequencing (NGS) data. Not so long ago, you needed a powerful Linux server with lots of memory for de novo assembly of whole genomes. But with advances in the efficiency of algorithms and improvements in hardware, it is now possible to assemble quite large genomes on a Mac laptop.

MacVector 16 incorporates two separate NGS de novo assemblers, Velvet and SPAdes. Both are capable assemblers with small memory footprints. Velvet is significantly faster, but SPAdes often generates longer contigs because it does a slightly better job of resolving repeats. SPAdes can also handle many more data types for mixed-read assemblies, and its memory footprint is smaller than Velvet's, so it can be used for larger data sets. With Velvet you often need to tweak the parameters for optimal performance, whereas SPAdes usually “just works”: it can often generate meaningful assemblies from relatively poor data where Velvet would fail without considerable parameter tuning.

Both are invoked the same way: use File | New | Assembly Project to create a new project, then click on the Add Reads button and select the read files you want to import. Typically these are paired-end reads (either interleaved or as separate files), but they can also be unpaired reads, consensus sequences exported from a different assembly, or Ion Torrent, PacBio or Oxford Nanopore reads. You can also import compressed (gzip) files directly, with no need to uncompress them, which saves a lot of disk space. Finally, click on the Velvet or SPAdes toolbar button to run the chosen assembler. The end result will be a number of contigs.
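Before importing, it can be useful to check how many reads and bases a file holds, since that gives a rough idea of how large the assembly job will be. The following is a minimal Python sketch, not part of MacVector, that counts reads in a FASTQ file and reads gzip-compressed files directly without uncompressing them to disk. It assumes the common four-lines-per-record FASTQ layout; the function names (`open_reads`, `fastq_records`, `summarize`) are made up for this illustration.

```python
import gzip
from pathlib import Path


def open_reads(path):
    """Open a FASTQ file as text, reading gzip-compressed files directly."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")


def fastq_records(handle):
    """Yield (header, sequence, quality) tuples.

    Assumes the four-line FASTQ layout: header, sequence, '+', quality.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        quality = handle.readline().rstrip("\n")
        yield header, sequence, quality


def summarize(path):
    """Return (read_count, total_bases) for one FASTQ file."""
    reads = 0
    bases = 0
    with open_reads(path) as handle:
        for _, sequence, _ in fastq_records(handle):
            reads += 1
            bases += len(sequence)
    return reads, bases
```

Calling `summarize("sample_R1.fastq.gz")` (a hypothetical file name) returns the read count and total base count for that file. For paired-end data in separate files, running it on both files and comparing the read counts is a quick sanity check that the pairs match up.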

Here are some examples of performance, with all tests run on a 2013 2.7 GHz MacBook Pro with 16 GB RAM:

[Image: assembly performance results]

In the case of the small Mycobacterium genome, Velvet completed the assembly in a little over a minute. Even a moderately large ~7 Mbp Streptomyces sp. assembly of 5 million HiSeq reads took just 16 minutes with Velvet and less than an hour with the more memory-efficient SPAdes algorithm.

For a more in-depth discussion of these results, please see our recent blog post.