Accessing BAM files from an Assembly Project file

All assemblies are stored in the BAM file format. This is a binary format that records each read together with the consensus, contig, or reference it is mapped against and the position of that mapping. BAM is the compressed binary equivalent of the plain-text SAM format. Some post-assembly tasks require further processing of the BAM file.
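Because BAM is block-gzip (BGZF) compressed, and the decompressed stream starts with the magic string BAM followed by a byte of value 1, a quick sanity check is possible with nothing more than gzip. The following is a minimal sketch, not an official tool; the function name is_bam is a hypothetical helper introduced here for illustration.

```shell
# is_bam: hypothetical helper, not part of Assembler or any BAM toolkit.
# Succeeds if the file decompresses with gzip and the decompressed data
# begins with the three letters "BAM" (the start of the BAM magic string).
is_bam() {
    # BGZF is gzip-compatible, so gzip can decompress the start of the file.
    # "|| true" keeps a non-gzip input from aborting scripts run with "set -e".
    magic=$(gzip -dc "$1" 2>/dev/null | head -c 3) || true
    [ "$magic" = "BAM" ]
}
```

For example, is_bam reads.bam && echo "looks like BAM" prints the message only when the check passes (reads.bam is a placeholder file name). This only inspects the first few bytes; it does not validate the rest of the file.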

To keep your filesystem tidy, Assembler stores each assembly and its associated files as a single Assembly Project. This hides the many files that an assembly creates, but the individual files are still easily accessible.

Chr19TEST contigassembly Project

The Assembly Project file is actually an OS X package. This is a folder containing multiple files that the OS X Finder treats as a single file. Packages are a convenient way of organising multiple files, and OS X uses them to store many different kinds of application files.

To view the contents of a Package

  • Right-click on the assembly project and choose Show Package Contents.
  • You will find the BAM file (and its index) in the window that opens.

    If you are working from the command line, a simple cd into the project does the same:

    cd projectname.contigassembly
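Since the package is an ordinary directory, standard shell tools can search it. Here is a minimal sketch for locating the BAM file and its index. It assumes they use the conventional .bam and .bai extensions; the exact layout inside the package is not described here, so the search is recursive. The function name list_bam_files is a hypothetical helper, not something Assembler provides.

```shell
# list_bam_files: hypothetical helper, not part of Assembler.
# Prints every .bam and .bai file found inside the given project package.
# Assumption: the BAM and its index use the conventional extensions.
list_bam_files() {
    project="$1"
    # A package is just a directory as far as the shell is concerned.
    if [ ! -d "$project" ]; then
        echo "Not a project package: $project" >&2
        return 1
    fi
    find "$project" -type f \( -name '*.bam' -o -name '*.bai' \) | sort
}
```

Called as list_bam_files projectname.contigassembly, it prints one path per line, which can then be passed to whatever tool you use for further processing.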

    Note that the original read files (apart from Sanger trace files) are not stored here. To save disk space, the project holds a link to each original file instead. This is not a filesystem link but a reference recorded inside the project itself. The OS X filesystem still keeps track of the original file, so the link is updated if the reads are moved. However, if the reads are on a remote filesystem or some other separate storage, the link may break. If it does, you can restore it by double-clicking the link inside the assembly project and clicking RELOCATE.

    Also remember that you can import a BAM (or SAM) file directly into an Assembly Project and associate it with a reference sequence.
