Accessing BAM files from an Assembly Project file

All assemblies are stored in the BAM file format. This is a binary file that records each read, the consensus, contig or reference it is mapped against, and the position of that mapping. It is a compressed version of the plain-text SAM format. Some post-assembly tasks require further processing of the BAM file.
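
Because SAM is the plain-text counterpart of BAM, the per-read information described above can be illustrated by pulling a few mandatory fields out of a single SAM alignment line. This is a minimal Python sketch that assumes a tab-separated record laid out as in the SAM specification. The function name and the example record are invented for illustration and do not come from Assembler.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    read_name: str   # QNAME: name of the read
    flag: int        # FLAG: bitwise flags (strand, pairing, ...)
    reference: str   # RNAME: contig/reference the read is mapped against
    position: int    # POS: 1-based leftmost mapping position
    cigar: str       # CIGAR: how the read aligns at that position


def parse_sam_line(line: str) -> Alignment:
    """Extract the read name, reference and position from one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line has at least 11 tab-separated fields")
    return Alignment(fields[0], int(fields[1]), fields[2], int(fields[3]), fields[5])


# Hypothetical record, for illustration only.
example = "read1\t0\tcontig_1\t101\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII"
alignment = parse_sam_line(example)
print(alignment.read_name, alignment.reference, alignment.position)
```

A real BAM file is compressed, so it would first need converting to SAM text (or reading with a dedicated library) before a parser like this could be applied.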

To keep your filesystem tidy, Assembler stores each assembly and its associated files as a single Assembly Project. This hides the many files that an assembly creates. However, the individual files are still easily accessible.

The Assembly Project file is actually an OS X Package file. This is a folder containing multiple files that OS X Finder treats as a single file. Packages are a great way of organising multiple files, and OS X uses them to store many different application files.

To view the contents of a Package

  • Right click on the assembly project.
  • Select SHOW PACKAGE CONTENTS.
  • You will find the BAM file (and its index) in there.

    If you are working from the command line then a simple cd will do the same.

    cd projectname.contigassembly
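
The same lookup can be scripted. The sketch below assumes only what is stated above: the project is an ordinary folder and the BAM file and its index sit somewhere inside it. The function name and the choice of `.bam`/`.bai` extensions are illustrative assumptions, not documented Assembler behaviour.

```python
from pathlib import Path


def find_bam_files(package: Path) -> list[Path]:
    """Return BAM files and BAM index files found anywhere inside a package folder."""
    if not package.is_dir():
        raise NotADirectoryError(str(package))
    wanted = {".bam", ".bai"}  # assumed extensions for the BAM file and its index
    return sorted(
        path
        for path in package.rglob("*")
        if path.is_file() and path.suffix.lower() in wanted
    )
```

For example, `find_bam_files(Path("projectname.contigassembly"))` would list any matching files inside the project from the command above.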

    Do note that the original read files (apart from Sanger trace files) are not stored here. To save disk space, the project holds a link to each original file. This is not a filesystem link; the reference is stored inside the project file. The OS X filesystem still keeps track of the original file, so the link is updated if the reads are moved. However, if the reads are on a remote filesystem or other separate storage, the link may break. If it does, you can restore it by double clicking on the link inside the assembly project and clicking RELOCATE.

    Also remember that you can import a BAM (or SAM) file directly into an Assembly Project and associate it with a reference sequence.

    Posted in Tips

    de novo assembly with Velvet

    Velvet is a short read assembler that works very well on a wide variety of reads. It excels at de novo assembly of reads from second and newer generation sequencers.

    In our latest release, MacVector 13, we’ve added Velvet to Assembler. It joins the existing tools: Phrap for Sanger sequencing reads and Bowtie for reference assembly of short reads. Assembler’s toolkit gives the user great flexibility for assembling sequencing data in a single application.

    Velvet was developed by Daniel Zerbino and Ewan Birney at the EBI in Hinxton (just outside Cambridge, UK). Amongst the large range of short read assemblers available, it’s widely recognised as being one of the best.

    Velvet is ideal for assembling Illumina sequencing reads of bacterial genomes on a mid range Mac. With paired read data it produces very good contigs. For large datasets, especially ones with longer reads, it does use a lot of RAM. However, for bacterial genome size projects of most types of sequencing data it works very well on even quite modestly powered desktop Macs.

    The nice aspect of the interface in Assembler is that all the difficult work is done for you. You do not have to be familiar with the command line and the myriad options of the application. Normally, running Velvet involves preparing the project and running at least two command line tools (velveth to prepare your reads, then velvetg to assemble them). In Assembler, all you need to do is import your reads in fastq format and click run. Assembler will build the indexes needed and then run the assembly.

    If your reads are paired end then just check the appropriate box and Assembler will do the rest for you.

    Plus, other than installing MacVector itself, you do not need to download and install lots of software plugins (or update them with new releases). MacVector and Assembler contain all you need, installed and ready to use.

    A simplified explanation of how Velvet works is that it first breaks up all the reads into kmers (words) of a specified length. The kmer value must be shorter than your reads and an odd number (to avoid palindromic kmers). Shorter kmers will produce longer contigs at the risk of less accuracy due to a higher probability of spurious matches, whereas longer kmers will produce more accurate contigs as the longer overlaps between kmers will be more specific. It then constructs a de Bruijn graph from the reads and uses matches between nodes on the graph to produce links between kmers and so construct the contigs. This process is repeated using reverse complemented kmers.
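
To make the kmer step concrete, here is a minimal Python sketch that splits reads into kmers and builds a textbook de Bruijn graph, in which each node is a (k-1)-mer and each kmer contributes one edge. It is an illustration of the general idea only and does not reproduce Velvet's own data structures or error handling. The function names and the example reads are invented.

```python
from collections import defaultdict


def kmers(read: str, k: int) -> list[str]:
    """Split a read into all overlapping words of length k."""
    if k % 2 == 0:
        raise ValueError("k should be an odd number")
    if k >= len(read):
        raise ValueError("k must be shorter than the read")
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def de_bruijn_edges(reads: list[str], k: int) -> dict[str, set[str]]:
    """Map each (k-1)-mer node to the nodes that follow it in at least one read."""
    graph: dict[str, set[str]] = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            # The kmer links its prefix node to its suffix node.
            graph[kmer[:-1]].add(kmer[1:])
    return graph


# Two hypothetical overlapping reads.
graph = de_bruijn_edges(["ACGTAC", "GTACGA"], k=3)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

In this toy example the node `CG` gains two outgoing edges (`GT` and `GA`), which is the kind of branch an assembler has to resolve when walking the graph to build contigs.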

    Hybrid assembly is particularly straightforward with Velvet. Coverage depth is not always consistent over a data set, and adding longer reads really does help produce longer contigs. Long reads are treated differently from short reads and have their own set of parameters. Just set the upper limit of the read length of your short read dataset and MacVector will determine which files contain your long reads and treat them accordingly.

    Velvet will join two contigs, even though no reads lie between them, as long as a certain number of paired reads span the distance between the contigs. This technique is called scaffolding and really helps extend contig length. With single reads the two contigs would remain separate. However, with paired reads, if each read of a pair is correctly mapped against a different contig then those two contigs must be close. The insert length (the length of the sequenced fragment) determines how distant the contigs are, and Velvet fills in that distance of unknown bases as a string of N’s.
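
The arithmetic behind a scaffold gap can be sketched as follows. This is a simplified illustration for a single read pair. Velvet itself combines evidence from many pairs, so the function names, parameters and numbers below are assumptions made for the example rather than a description of Velvet's algorithm.

```python
def estimate_gap(insert_length: int, left_tail: int, right_head: int) -> int:
    """Estimate the number of unknown bases between two contigs from one read pair.

    left_tail:  bases of the fragment that fall on the end of the left contig
    right_head: bases of the fragment that fall on the start of the right contig
    """
    return max(0, insert_length - left_tail - right_head)


def scaffold(left_contig: str, right_contig: str, gap_length: int) -> str:
    """Join two contigs, filling the unsequenced gap with N's."""
    if gap_length < 0:
        raise ValueError("gap_length cannot be negative")
    return left_contig + "N" * gap_length + right_contig


# Hypothetical numbers: a 500 bp fragment with 200 bp on one contig and 250 bp on the other.
gap = estimate_gap(insert_length=500, left_tail=200, right_head=250)
print(gap)                             # 50
print(scaffold("ACGT", "TTGA", 5))     # ACGTNNNNNTTGA
```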

    The resulting contigs are shown both at sequence level and as an overview showing the coverage map. Even within the graphical map you can zoom down to residue level to see the consensus sequence. To improve performance, individual reads are only shown in the editor. The coverage map allows you to see areas that need more work, for example a region that would benefit from some Sanger sequencing for hybrid assembly.

    The summary shows statistics about your assembly, including the all-important N50 statistic. N50 is a reliable quality statistic of your assembly. With the contigs ranked from longest to shortest, it is the length of the shortest contig in the smallest set of longest contigs whose combined length is at least 50% of the total length of all contigs.
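
The N50 definition can be written as a short calculation. This is a minimal Python sketch of the standard definition, not MacVector's own code, and the contig lengths are invented for the example.

```python
def n50(contig_lengths: list[int]) -> int:
    """Return the N50 of a set of contig lengths."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down until half of the total length is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable: the running total always reaches the total")


# Total length is 290, so half is 145; 80 + 70 = 150 covers it, giving N50 = 70.
print(n50([80, 70, 50, 40, 30, 20]))  # 70
```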

    Now with Assembler you have many options for mapping and processing all your sequencing reads!

    References:

    Velvet: Algorithms for de novo short read assembly using de Bruijn graphs
    Daniel R. Zerbino, Ewan Birney
    Genome Res. 2008 May; 18(5): 821–829. doi: 10.1101/gr.074492.107
    PMCID: PMC2336801

    Velvet on Wikipedia

    Posted in Releases, Tips | Tagged , , , | Comments closed

    Calculating the melting temperature of PCR primers

    Accurate melting temperatures for your oligos, your template and the predicted product are important for setting the cycling parameters of your PCR machine.

    The Tm calculations in MacVector were updated in MacVector 12.6 to use a more modern algorithm. MacVector has always used thermodynamic “nearest neighbor” calculations, but there were two changes in MacVector 12.6:

    1) MacVector now uses the unified thermodynamic parameters of SantaLucia (PNAS 95:1460-1465, 1998), which have become widely accepted as the most accurate values.

    2) We also now make adjustments for Mg++ and dNTP concentrations in the reaction mixture, as described by von Ahsen et al. (Clinical Chemistry 47:1956-1961, 2001). These values are now shared by all primer tools in MacVector. However, if you have been using MacVector for many versions, you may need to “default” the primer settings to clear any older values.
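
As an illustration of the second point, the Mg++ and dNTP adjustment is commonly expressed as a "sodium equivalent" concentration that is then fed into the salt correction of the Tm calculation. The sketch below uses the form usually quoted from von Ahsen et al. Whether MacVector applies exactly this expression is an assumption, and the function name and example concentrations are invented.

```python
import math


def sodium_equivalent_mM(monovalent_mM: float, mg_mM: float, dntp_mM: float) -> float:
    """Sodium-equivalent concentration in mM.

    Commonly quoted form: [Na+ eq] = [monovalent] + 120 * sqrt([Mg++] - [dNTP]),
    with all concentrations in mM. dNTPs bind Mg++, so only the excess
    Mg++ contributes; no adjustment is made when there is no excess.
    """
    free_mg = max(0.0, mg_mM - dntp_mM)
    return monovalent_mM + 120.0 * math.sqrt(free_mg)


# Hypothetical reaction: 50 mM monovalent cations, 1.5 mM Mg++, 0.2 mM dNTPs.
print(round(sodium_equivalent_mM(50.0, 1.5, 0.2), 1))

# With Mg++ and dNTP both set to 0 the adjustment disappears.
print(sodium_equivalent_mM(50.0, 0.0, 0.0))  # 50.0
```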

    Additionally, the Test PCR Primer Pairs tool was deprecated in MacVector 12.7.5 and has been removed from MacVector 13. The Design Primers interface in MacVector 13 has new functionality for easily testing a pair of primers. There is nothing the old tool provided that you cannot do with MacVector 13.

    Quicktest Primer does display the Tm results for both the new (“Santa Lucia”) and older (“Breslauer”) calculations, although to get the same results as MacVector 12.5 you will need to set the Mg++ and dNTP concentrations to 0 (as mentioned above).

    Posted in Releases, Tips

    Troubleshooting: Resetting MacVector’s preferences

    • EDIT: December 7, 2018 Updated for macOS Mojave
    • EDIT: August 26, 2014 The script now works for all versions of OS X. For versions before Mavericks the old preferences files are moved to the Desktop rather than being deleted.


    MacVector generally just works. However, even the most well behaved of applications sometimes have problems. If you have restarted the application and restarted your Mac, then a good troubleshooting step to perform next is to reset your preferences.

    For Mountain Lion and earlier, deleting the preferences meant opening up Finder, navigating to your ~/Library folder and deleting a few files. However, with Mavericks it’s easier, and with AppleScript it’s even easier than that!

    Download Reset Prefs script

    It goes without saying to only run this if you are having problems or on the direction of Support. There is no UNDO!

    You may need to activate your license again or open up the license preferences and choose your current license. So have your details at hand.

    Another reason to do this is that if you have not reset your preferences or clicked defaults in all the tools for many years, you may not be using the optimum settings for a tool. Where possible MacVector does try to preserve preferences, even when upgrading from a very old version to the latest. However, with every release we tweak the default settings to achieve the optimum results.

    Running from the command line

    If you are comfortable running commands from the command line then you can run the following command on all macOS versions after Mavericks. Note that this immediately and irreversibly deletes your MacVector Preferences.

    defaults delete com.macvector.MacVector
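
Since the delete cannot be undone, one option is to keep a copy of the preferences file first. The Python sketch below assumes that the preferences for the `com.macvector.MacVector` domain live in a file named after the domain in the user's `~/Library/Preferences` folder, which is the conventional macOS location. That location, the function name and the backup file name are assumptions, so check where the file actually lives on your system before relying on this.

```python
import shutil
from pathlib import Path
from typing import Optional

DOMAIN = "com.macvector.MacVector"  # preferences domain from the command above


def backup_preferences(prefs_dir: Path, backup_dir: Path, domain: str = DOMAIN) -> Optional[Path]:
    """Copy <domain>.plist from prefs_dir into backup_dir; return the copy, or None if absent."""
    source = prefs_dir / f"{domain}.plist"
    if not source.is_file():
        return None
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / f"{domain}.backup.plist"
    shutil.copy2(source, target)  # copy2 also keeps timestamps and permission bits
    return target
```

A typical call would be `backup_preferences(Path.home() / "Library" / "Preferences", Path.home() / "Desktop")`, run before the `defaults delete` command.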

    Posted in Tips

    Importing sequences from ENSEMBL

    There are a few different ways to import annotation from the ENSEMBL database browser, as well as from other databases.

    Using Genbank

    The easiest way to export from ENSEMBL and keep all annotation is to use the Genbank format. The default format will be FASTA which has no annotation. With Genbank all the annotation is stored in the file and will be imported by MacVector.

    In ENSEMBL

  • Choose the region of interest
  • Click EXPORT DATA
  • In OUTPUT choose GENBANK and click NEXT
  • Choose the TEXT format.
  • Depending on which browser you are using, you can just SELECT ALL, COPY, then use FILE > NEW FROM CLIPBOARD in MacVector to import the file. MacVector 12.6 and above will recognise that the copied data is in Genbank format and create the properly annotated sequence.

    If you are using an older version you may first need to create a plain text file (using TEXTEDIT) and save that to your Desktop.

    Import Features

    However, using a flat file format is generally a one way operation. If you need to add newer annotation (e.g. from publications that appeared after your initial import) then, unless you like performing lots of manual annotation, you will want to avoid using Genbank a second time.

    Instead, the most flexible way is to use the IMPORT FEATURES tool. This allows you to export annotation from a genome browser (such as ENSEMBL) in a format such as GTF, GFF or BED, and import it into an existing sequence.

    This allows you to work on your own sequence, but keep it up to date with published annotation about that sequence.

    The Import Features tool was introduced in MacVector 12.5.

  • Choose the region of interest
  • Click EXPORT DATA
  • In OUTPUT choose GFF and click NEXT
  • Choose the TEXT format.
  • Save this to your Desktop
  • Open up MacVector and open your original sequence
  • Choose FILE > IMPORT FEATURES
  • Choose your GFF file and click OK
  • Depending on the sequence start and stop points you may need to adjust the sequence numbering, for example if you are trying to annotate a single gene but the GFF is still numbered as if it were on a chromosome.

    The easiest way is to adjust the numbering of your original sequence.

    To do this:

  • Double click on the red cross at the start of your sequence.
  • Enter the new start number.

    If you have copied and pasted this sequence from an existing chromosome sequence then the original numbering will be preserved.
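
The numbering mismatch can also be seen from the GFF side: the start and end of each feature are the fourth and fifth tab-separated columns. As an illustration of the offset involved (not a MacVector feature), the Python sketch below shifts those two columns by a fixed amount. The function name and the example record are invented.

```python
def shift_gff_line(line: str, offset: int) -> str:
    """Return a GFF line with its start and end coordinates shifted by offset."""
    line = line.rstrip("\n")
    if not line.strip() or line.startswith("#"):
        return line  # blank lines and comment/directive lines are left unchanged
    fields = line.split("\t")
    if len(fields) < 9:
        raise ValueError("a GFF feature line has 9 tab-separated columns")
    start = int(fields[3]) + offset
    end = int(fields[4]) + offset
    if start < 1:
        raise ValueError("shifted start would fall before position 1")
    fields[3], fields[4] = str(start), str(end)
    return "\t".join(fields)


# Hypothetical feature numbered on the chromosome. If the exported region began at
# chromosome position 1001, subtracting 1000 renumbers it relative to the region.
record = "3\tensembl\tgene\t1500\t1800\t.\t+\t.\tID=geneX"
print(shift_gff_line(record, -1000))
```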

    Posted in Tips, Tutorials

    Implementing a new activation code for network license users

    When you’ve got a lot of licenses and a lot of computers to manage, a network license is the most effective way of letting your users access their favourite sequence analysis app for the Mac!

    MacVector network licenses use the KeyServer network license software from Sassafras. However, in addition to the KeyServer setup, MacVector also needs to be activated on each client.

    The actual activation procedure is very simple. To activate a license on a client go to the OPTIONS menu in MacVector and click on ACTIVATE LICENSE… Then you simply need to enter the Licenseholder, the serial number and the activation code, exactly as they are written.

    The actual KeyServer side rarely needs updating. However, whenever you renew your maintenance you will be supplied with a new activation code. When there is a new release you will need to have previously activated each client with the new code. So it is good practice to implement the new activation code whenever you receive it.

    For network licenses the activation file is not hardware dependent, so it can simply be copied into place on each client machine to activate that copy. It does not matter if the hardware is changed: as long as that file is present, MacVector will be able to run and check out a license from the Sassafras KeyServer.

    This also means that if you ‘push’ out the software to client machines, you can activate a single client and then send the ‘activation file’ out to every other client.

    The license details are stored in the following file.

    /Library/Application Support/MacVector/MacVector License Information

    Copying this file into place on a client machine activates that copy. Permissions must be preserved for the file. The file is also migrated to another machine if you use Apple’s Migration Assistant.
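
A minimal Python sketch of copying the activation file while keeping its permission bits is shown below. The destination path is the one given above. The function name is invented, and the use of `shutil.copy2` is just one way to do it: it preserves permission bits and timestamps but not file ownership, and writing into `/Library` normally requires administrator rights. Check both against your own deployment tooling.

```python
import shutil
from pathlib import Path

# Location given in the text above.
LICENSE_PATH = Path("/Library/Application Support/MacVector/MacVector License Information")


def install_license(source: Path, destination: Path = LICENSE_PATH) -> Path:
    """Copy an activation file into place, keeping its permission bits and timestamps."""
    if not source.is_file():
        raise FileNotFoundError(str(source))
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, destination)  # copies content, mode bits and timestamps
    return destination
```

On an activated reference client the `source` would be the existing file at that same location, copied to each target machine by whatever distribution mechanism you already use.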

    As usual contact MacVector Support if you have any questions about this.

    Posted in Tips

    Testing primers with MacVector 13

    New to MacVector 13 is the ability to quickly test a pair of primers.

    Previously to test a pair of primers with the Primer3 tool you needed to modify the expected product and reduce the stringency of all parameters so that your primers would be accepted.

    Now when you enter a pair of primers into Design Primers (Primer3) the interface switches to a new testing mode, TEST PRIMER PAIRS. At this point all parameters are relaxed so that your primers are always accepted.

    A new enhanced summary page then shows statistics about your primers.

    All reaction conditions are now shared, so if you change [dNTP] in one tool the new value is used by all other tools, giving consistent results.

    And don’t forget that since MacVector 12.6 all the tools in MacVector use the SantaLucia algorithm for calculating the melting temperature of primers. This algorithm by John SantaLucia (PNAS 95:1460-1465, 1998) has become widely accepted as producing the most accurate values for the melting temperature of DNA.

    Posted in Releases, Tips

    MacVector 13’s redesigned results windows

    Before MacVector 13, results windows could sometimes get in the way. Some tools, such as the restriction enzyme tool, can generate up to six different results windows: a list of all enzymes that cut a sequence, a list of all non-cutters, a restriction map, etc. This is all useful information and you do not have to view every one, but even in this day of large 27″ Retina monitors your display real estate is important and you do not want clutter!

    So now all of the old style results windows have been redesigned for easier visualization of multiple analyses.

    The results from every analysis of a sequence are collected into a single tabbed window to reduce screen clutter. But you can tear off any tab into its own window when you want to compare results side-by-side. You can easily keep results windows open for many different sequences at the same time. The output from each analysis is only dismissed when you want.

    So you have the same amount of useful information that’s just a mouse click away.

    Posted in Releases

    Public beta of MacVector 13

    There’s a public beta of our imminent release MacVector 13 ready for downloading and testing on our website. You just need an active license for MacVector 12.7 to run this. Watch out for the new easier way of installation (no installer, just drag to your Applications folder!).

    As well as a redesign of the interface MacVector has lots of new features, especially for designing silent mutations in primers. Check out the new version of the QuickTest Primer tool.

    Check out the release notes for more details.

    All feedback is welcome. Please contact Support directly.

    Posted in Releases

    MacVector 13 is almost out…..!

    The next version of MacVector is almost ready to be released. MacVector 13 is more powerful and intuitive than ever. Not only are there some very useful new tools and features, but MacVector 13 has also had a redesign.

    A fresh new interface

    MacVector 13 looks great. The interface is now muted to suit the look and feel of Mavericks and many windows and dialogs have been rewritten and redesigned to be easier to use.

    Introducing restriction sites into primers with Quicktest Primer v2!

    Adding restriction sites to your PCR products is now easy in the usual MacVector way. The Quicktest Primer tool first introduced with MacVector 12.6 has been enhanced to display restriction enzyme sites around the primary binding site of the primer.

    Sites that are created or destroyed by mismatches in the primer or due to the addition of a tail are shown above the sequence. ‘One out’ sites can also be displayed below the sequence, color coded to indicate their potential effect on overlapping coding regions. The display is interactive so that when you “mouse over” a site, additional information is displayed. Clicking and holding on a “one out” site temporarily changes the primer and updates the entire dialog to reflect that change. Finally, double-clicking on a “one out” site makes a permanent change to the primer sequence.

    QuickTestAnimation2

    • Existing sites in the template sequence are shown below the sequence in black.
    • New sites that will be introduced by mismatches in the primer are shown above the sequence in black.
    • Existing sites that will be destroyed by mismatches in the primer are shown in grey.
    • Putative “one out” sites are labelled with an asterisk as already used by the main restriction enzyme analysis tools.
    • Putative “one out” sites that are translationally silent are shown in green.
    • Putative “one out” sites that will change the amino acid sequence of the coding region are shown in red.
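
To illustrate what a “one out” site is, the Python sketch below scans a sequence for windows that differ from a recognition sequence at exactly one position. It is a simplified illustration, not MacVector's implementation: it checks one strand only, ignores ambiguity codes, and does not assess the effect on the translation. The function name and the example sequence are invented, and GAATTC is the EcoRI recognition sequence.

```python
def one_out_sites(sequence: str, site: str) -> list[tuple[int, int]]:
    """Find windows that match a recognition site except for a single base.

    Returns (window_start, mismatch_position) pairs, both 0-based offsets
    into the sequence.
    """
    sequence, site = sequence.upper(), site.upper()
    hits = []
    for start in range(len(sequence) - len(site) + 1):
        window = sequence[start:start + len(site)]
        mismatches = [i for i, (a, b) in enumerate(zip(window, site)) if a != b]
        if len(mismatches) == 1:
            hits.append((start, start + mismatches[0]))
    return hits


# Hypothetical template: GAGTTC is one base away from the EcoRI site GAATTC.
print(one_out_sites("TTGAGTTCAA", "GAATTC"))  # [(2, 4)]
```

Changing the base at the reported mismatch position (here the G at offset 4 to an A) would create the site, which is the kind of single-base change described above.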

    As well as displaying these sites the new tool makes it very easy to introduce the nucleotide change needed for the new restriction site into your primer.

    • If you mouse over one of the putative sites then the mismatched residue is shown in lower case and the recognition sequence of the site is displayed as a tooltip, aligned to the primer (as in the animation above).
    • If you click on a one-out site where the mismatched residue lies within the primer and hold the button down, the primer sequence temporarily changes, replacing the mismatched residue and showing any amino acid and restriction site changes above the primer.
    • Finally if you double-click the site it will make this change permanent in your primer.

    Here are some of the other new features in MacVector 13:

    De novo NGS assembly

    Velvet has been added to Assembler for enhanced de novo assembly of very short reads. Velvet is ideal for assembling Illumina sequencing reads of bacterial genomes on a mid range Mac. With paired read data it produces very good contigs. Now with Assembler you have many options for mapping and processing all your sequencing reads!

    Redesigned results windows

    All of the old style results windows have been rewritten and redesigned for easier visualization of an analysis and with a modern OS X appearance. All results for each sequence window are tabbed to stop window clutter. You can easily keep many results windows open for many different sequences at the same time.

    Improved AppleScript support

    AppleScript functions now allow control of file opening and saving. For example, you will be able to script MacVector to batch open many files and save them in a new format.

    Testing Primers

    A new mode has been added to the Design Primer tool for quick testing of pairs of primers.

    Export trace file quality values:

    A new tab in the Chromatogram sequence window displays the quality values and areas under each trace curve for each sequence residue in a tab delimited format that can be copied into Excel for additional analysis.

    MacVector and Mavericks

    MacVector 13 is fully compatible with OS X Mavericks, as well as Mountain Lion, Lion and Snow Leopard.

    MacVector 13 is undergoing final testing before being released very soon.

    Posted in Uncategorized