Make more of your alignments with MacVector 17.5

Our latest release, MacVector 17.5, gives you new tools to make the most of your alignments.

It displays shared domains in protein alignments so that you can visualize the relationships between aligned proteins. It introduces Flye for de novo assembly of PacBio and Oxford Nanopore long reads, along with a number of enhancements to the Contig and Align to Reference Editors.

As ever, there are also many minor enhancements, bug fixes and changes to better support the latest releases of macOS.

Outlining Shared Domains in Aligned Sequences

Multiple sequence alignments now retain feature information and can use this to outline shared domains in the Picture output tab. You can set the colors of features in the individual sequence documents in the usual way and these are used for the outlines.

There is a feature display mode in the Editor tab where you can see the extent and color of the features. When you switch to the Picture tab, you will see colored outlines around the shared domains.

de novo Assembly of PacBio and Oxford Nanopore reads with Flye

Flye is an assembler tuned for noisy long reads such as those produced by PacBio and Oxford Nanopore sequencers. Because these reads tend to be very error prone, MacVector 17.5 also includes an optional polishing step using Racon. For typical bacterial genomes, it is fairly common for the reads to assemble into a single full-length genome contig.

Contig and Align to Reference Editor Enhancements

There have been a number of enhancements to these editors, primarily to aid in visualizing edits and quality values and to “clean up” the visual appearance of alignments.

Residue Background Colored by Quality

There have been several changes to improve support for quality values in de novo contigs and reference assemblies.
A Shading toolbar button lets you turn on background coloring based on quality. Edited residues are always given a phred quality value of 99 and are shown with a blue background.

Base Calling with Phred

You can now run phred directly on Sanger sequencing trace files in the Align to Reference Editor by clicking on the Basecall toolbar item with the appropriate sequences selected.

Editing Enhancements

There are some new context-sensitive menu items in the Align to Reference Editor tab:

Delete Clipped Residues – deletes any greyed-out (“clipped” or “trimmed”) residues. While these are ignored by the consensus calculation, some users prefer to delete them for a cleaner looking alignment.

Close Gaps by Deleting Residues – you’ll often see gaps in the consensus where one or more reads has an additional erroneous inserted residue. This menu item removes the extra residues from the read, cleaning up the visual appearance of the alignment.

Nudge reads – Select the name of the sequence you want to nudge and use the left/right arrow keys to move it around. If you have problematic alignments where you need to physically insert residues or gaps, hold down the


Miscellaneous Enhancements

There have been a large number of minor enhancements. Some, such as replacing deprecated Apple functions behind the scenes and refactoring code for better stability and performance, help ensure that MacVector will continue to work on upcoming releases of macOS and take advantage of improved hardware. There have also been improvements to Dark Mode support in many areas and much better handling of the labels in crowded Map views.

How to upgrade to MacVector 17.5

If you have active maintenance and are running MacVector 15.5.4 or later then you will be notified about the new release. To install this version, you must have a maintenance contract that was active on 1st February, 2020. You must also be running MacVector 15.5.4 and OS X 10.9 Mavericks or later.

If you have an older version of MacVector then download the trial and request an upgrade quote.

Even if you have downloaded the trial in the past, downloading a new trial will give you a fresh 21 days to evaluate MacVector.

When a trial license expires it becomes MacVector Free. So if you decide against upgrading then you can just delete the trial license and easily go back to your current version. It’s risk free as MacVector files are backwards compatible.

Posted in Releases

Importing BAM files into an Assembly Project

You can import BAM files, containing reads mapped against a reference sequence, into a MacVector Assembly Project. As well as the BAM file(s) you will also need the original reference sequence the reads were mapped against. FASTA is fine, but an annotated reference is better for visualisation.

The tool needed is called ADD CONTIG. This is one of the toolbar buttons in an Assembly Project:

First create a new assembly project.

  • FILE > NEW > ASSEMBLY PROJECT

  • click ADD REF to add the reference sequence.

  • Use ADD CONTIG to import your BAM/SAM file.

Then you need to associate the BAM file(s) with the reference:

  • Select the reference and an imported contig (BAM file).

  • Right click and select UNITE REFERENCE WITH CONSENSUS SEQUENCE

You can optionally also generate a report on any variants (either at the previous step or a later stage).

  • Right Click and choose GENERATE VCF

If you import multiple BAM files against the same reference sequence you can also graphically compare these datasets with the Coverage Tab (third tab along in the Assembly Project window).


Incidentally if you need to access the BAM files from within MacVector’s Assembly Projects then you can right click on an Assembly Project and view the contents.

Posted in Tips

Calculating the optimal PCR annealing temperature

MacVector has several tools to help with primer design and testing. The Analyze | Primer Design/Test (Pairs) function uses the popular Primer3 algorithm to find suitable pairs of primers to amplify specified segments of DNA. You can also enter pairs of pre-designed primers and test their suitability for use in PCR. In both cases, the Tm of each primer is reported, along with the optimal annealing temperature (Ta).

The optimal annealing temperature (in degrees C) is calculated as follows (from W. Rychlik, W. J. Spencer and R. E. Rhoads, Nucl. Acids Res. 18:6409–6412 (1990)):

(Lowest Primer Tm x 0.3) + (Product Tm x 0.7) - 14.9

This means that you can get an optimal annealing temperature for a PCR experiment that is significantly different from the optimal annealing temperature for an individual primer (e.g. in a sequencing experiment) because of the large influence of the product in the calculation.
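As a quick illustration of how strongly the product Tm weighs in, here is a minimal Python sketch of the formula above. The function name and the example Tm values are made up for illustration; they are not MacVector output.

```python
def optimal_annealing_temp(primer_tms, product_tm):
    """Optimal annealing temperature (degrees C) from the formula above.

    primer_tms: Tm values of the primers (degrees C)
    product_tm: Tm of the PCR product (degrees C)
    """
    lowest_primer_tm = min(primer_tms)
    return (lowest_primer_tm * 0.3) + (product_tm * 0.7) - 14.9


# Hypothetical values: two primers and a product Tm.
ta = optimal_annealing_temp([58.0, 62.0], 85.0)
print(f"Optimal Ta: {ta:.1f} C")

# Same primers, different product: the Ta shifts by 0.7 C for every
# 1 C change in product Tm, but only 0.3 C per 1 C of primer Tm.
ta_low = optimal_annealing_temp([58.0, 62.0], 75.0)
print(f"Optimal Ta with a lower product Tm: {ta_low:.1f} C")
```

With these example numbers, dropping the product Tm by 10 degrees lowers the optimal Ta by 7 degrees even though the primers have not changed.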

Posted in Tips

How to copy a specific short amino acid translation of a sequence

There can be times when you are messing about with open reading frames, inserting residues to change frames to try to get the perfect CDS fusion. The MacVector single sequence Editor will show the translations (click and hold on the “Display” toolbar button), but if you select and copy, only the DNA sequence (with any overlapping features) is copied to the clipboard.

If you need to copy a specific translation of a sequence, here’s how to do it: select the region you are interested in, then invoke Analyze | Translation… Select the “Display text view with translation” option, set the Number of Frames to 3 or 6 and click OK.

From the resulting window, you can select the text of the amino acid sequence you are interested in, copy it, and then create a new sequence document (File | New From Clipboard) or paste into an external application.
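If you are curious what a 3 or 6 frame translation involves, here is a minimal Python sketch using the standard genetic code. It only illustrates the idea and is not how MacVector implements it; the function names are hypothetical and incomplete trailing codons are simply dropped.

```python
# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Reverse complement of an unambiguous DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Return translations for frames +1..+3 and -1..-3."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


for name, protein in six_frames("ATGGCCTAA").items():
    print(name, protein)
```

For a 3 frame translation you would just use the three "+" entries.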

Posted in Techniques, Tips

Optimizing Reverse Translations

The Analyze | Reverse Translation menu option lets you create a DNA sequence from a Protein sequence, reverse translated using a specific Genetic Code (by default, the Universal Genetic Code). The default option creates a DNA sequence with N’s and other ambiguities reflecting the degeneracy of the genetic code. This is great if you want to identify less ambiguous sections to design probes or primers and in fact MacVector will even display a list of probes with the least ambiguities.

However, MacVector also offers an optimization function if you are interested in designing a gene with codon usage optimized for expression in a particular organism.


To use this function, you do need to supply a codon usage table – a number of common tables are shipped with MacVector:

/Applications/MacVector/Codon Bias Tables/

There are four different algorithms that MacVector provides for optimizing codon usage.

Most Frequently Used Codon – this simply uses the most commonly occurring codon for each amino acid. So if, e.g. the most common Leu codon is CTC, all Leu codons will be CTC. Perhaps this is only useful if you want to design a “best guess” primer and are willing to accept a certain failure rate. If you used this to optimize expression, the host would likely run out of that tRNA and you wouldn’t see optimal expression.

Frequency Distribution – this selects a random codon for each amino acid, biased towards the most commonly used codon that encodes each amino acid. Each time you run the algorithm, a different, random set of codons will be selected. If you were to generate a new DNA over and over again, eventually this would create a collection of sequences where the average codon usage would exactly match the average for the .bias organism. But any individual reverse translation may randomly be quite different.

Probability Distribution – this is probably the most powerful setting if you are interested in expression. Similar to the Frequency Distribution, this chooses a random codon, biased towards the most frequently used codons for each amino acid. However, this version tries to ensure that the final DNA sequence has a codon usage profile that matches the codon usage of the selected .bias file as closely as possible. Because the overall codon usage in the DNA sequence is as close as possible to the codon usage in the .bias organism, this should, in theory, give you the best chance of high expression. Again, you will get a different sequence each time you invoke this.

Uniform Distribution – this ignores the usage of each codon and randomly assigns an appropriate codon for each amino acid. It’s similar to the default algorithm that uses ambiguities to create an “absolute” coding DNA, but here it just chooses a random codon with no regard for codon usage probability. Again, you will get a different sequence each time you invoke this.
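To make the differences between the four options more concrete, here is a rough Python sketch of each strategy. This is only one way to read the descriptions above, not MacVector’s actual code. The tiny USAGE table is invented for illustration (it is not a real .bias file), and the "Probability Distribution" version assumes a simple quota approach: work out how many times each codon should appear, then shuffle.

```python
import random
from collections import Counter

# Hypothetical relative codon frequencies per amino acid (each sums to 1).
USAGE = {
    "M": {"ATG": 1.0},
    "L": {"CTG": 0.5, "CTC": 0.2, "TTA": 0.1, "TTG": 0.1, "CTT": 0.1},
    "K": {"AAA": 0.75, "AAG": 0.25},
}


def most_frequent(protein):
    """Always use the single most common codon for each amino acid."""
    return "".join(max(USAGE[aa], key=USAGE[aa].get) for aa in protein)


def frequency_distribution(protein, rng):
    """Pick each codon independently, weighted by its usage frequency."""
    out = []
    for aa in protein:
        codons = list(USAGE[aa])
        weights = [USAGE[aa][c] for c in codons]
        out.append(rng.choices(codons, weights=weights)[0])
    return "".join(out)


def probability_distribution(protein, rng):
    """Match overall codon usage to the table as closely as possible.

    Assumed approach: give each codon a quota (largest-remainder
    rounding of frequency x count), then hand the quota out in
    random order.
    """
    pools = {}
    for aa, n in Counter(protein).items():
        exact = {c: f * n for c, f in USAGE[aa].items()}
        counts = {c: int(x) for c, x in exact.items()}
        leftover = n - sum(counts.values())
        by_remainder = sorted(exact, key=lambda c: exact[c] - counts[c], reverse=True)
        for c in by_remainder[:leftover]:
            counts[c] += 1
        pool = [c for c, k in counts.items() for _ in range(k)]
        rng.shuffle(pool)
        pools[aa] = pool
    return "".join(pools[aa].pop() for aa in protein)


def uniform_distribution(protein, rng):
    """Pick any valid codon with equal chance, ignoring usage."""
    return "".join(rng.choice(list(USAGE[aa])) for aa in protein)


rng = random.Random()
protein = "MLLLLLLLLLLKKKK"
print(most_frequent(protein))
print(frequency_distribution(protein, rng))
print(probability_distribution(protein, rng))
print(uniform_distribution(protein, rng))
```

Run it a few times: the first line never changes, while the other three differ on every run. Only the quota-based version always contains exactly five CTG codons for the ten leucines in this example.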

Posted in Tips

Use Database | Auto-Annotate Sequence to annotate prokaryotic genomes

The continuing advances in Next Generation Sequencing have made it relatively low cost to sequence prokaryotic genomes. Many scientists are embarking on large projects to sequence multiple related genomes. These might be clinical isolates of the same species exhibiting different pathogenic properties, environmental isolates from different sites, or a study over time of the changes in microbial genomes from specific locations. Once you have your sequence, the definitive source of annotation is the NCBI Prokaryotic Annotation Pipeline. However, to have that run on your sequence, you must submit the sequence to the NCBI. This is not always ideal – perhaps you are still working on resolving repeat sequences for your genome, you don’t want to wait for it to be published or you don’t want to go through the hassle of a formal submission for many variant sequences. MacVector to the rescue!

First, you need to download existing similar genome sequences – open the Database | Online Search for Keywords (Entrez) browser and search for [name of your organism] “+” “complete genome”. Assuming you are working with a reasonably common organism, you might find a few (or a lot of) hits. Select those you are interested in and click on the To Disk button, selecting a suitable target folder to save the downloaded genomes into.

Now take your unannotated genome sequence and invoke Database | Auto-Annotate Sequence. Select the folder containing your genomes as the target and press OK. On a 2.7 GHz laptop, scanning a 1.8 Mbp Campylobacter jejuni genome against 25 related C. jejuni genomes (average 5,000 features per genome) takes around 10 minutes with the default parameters, resulting in a fully annotated genome.

MacVector 17’s new “Genome Comparison” tool lets you directly compare the features of two related genomes based on DNA or (for CDS features) protein sequences and reports all of the identities, similarities, differences and missing features. In this example, the tool confirmed that no features were missing compared to the NCBI annotated genome and that there were just minor differences in a few CDS features where mutations created or removed stop codons.

Posted in Techniques, Tips

Which DNA Matrix to use in Align To Folder?

The Database | Align To Folder function is a very useful tool to find and retrieve similar sequences from folders on your computer or on other local machines. Think of it as your own personal BLAST service. It can not only search individual sequences in any format MacVector can read (MacVector, GenBank, EMBL, ABI etc.) but will also process collections of sequences in FASTA or FASTQ format.

One important factor to consider in these searches is the DNA Scoring Matrix (.nmat) file to use. There are several included in the /Applications/MacVector/Scoring Matrices/ folder. The default file is DNA database matrix.nmat. This is ideal for identifying sequences that are not particularly closely related, such as the same gene from distant organisms or sequences matching a highly degenerate input sequence such as a reverse translation of a protein sequence.

However, one common use of Align To Folder is to identify and retrieve NGS reads from large collections in FASTA or FASTQ formatted files. It is particularly useful for finding reads to help resolve repeat regions or close gaps between contigs. When running these types of alignments, it is preferable to use a matrix that is tuned to finding reads with a higher identity stringency. The best scoring matrix for this is DNA identity with penalties matrix.nmat. Here are some examples, using a short query sequence, where the searches differ only in the scoring matrix.

Low-scoring alignments using DNA database matrix.nmat


Low-scoring alignments using DNA identity with penalties matrix.nmat

It can clearly be seen that the second example has true matching alignments that represent sections of reads that extend beyond the query fragment. All of the reads can safely be retrieved and used in additional assembly analyses to extend the query or help resolve repeats. However, the DNA database matrix example contains matches that have extensive regions with very poor similarity. These clearly do not represent reads that could be used to extend the sequence of the query sequence.
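To see why mismatch penalties matter here, the following Python toy example scores the same ungapped alignment under two scoring schemes. The scores (+1/0 and +5/-4) are invented purely for illustration and are not the values in MacVector’s .nmat files, and the function is a hypothetical simplification of a real local alignment.

```python
def best_local_segment(query, read, match, mismatch):
    """Best-scoring ungapped segment when query and read are laid side by side.

    Returns (score, start, end) with end exclusive.
    """
    best = (0, 0, 0)
    score, start = 0, 0
    for i, (q, r) in enumerate(zip(query, read)):
        score += match if q == r else mismatch
        if score <= 0:
            # Restart the segment after a run that scored too badly.
            score, start = 0, i + 1
        elif score > best[0]:
            best = (score, start, i + 1)
    return best


# The first 10 bases match perfectly; the tails are essentially unrelated.
query = "ACGTACGTAC" + "TTGACCATGA"
read = "ACGTACGTAC" + "CAGTAGTTCC"

# No mismatch penalty: the segment drifts on through the unrelated tail.
print(best_local_segment(query, read, match=1, mismatch=0))

# With penalties: the segment stops where the real match ends.
print(best_local_segment(query, read, match=5, mismatch=-4))
```

Without a penalty the reported match covers 18 positions even though only the first 10 are a real match; with penalties it is trimmed back to exactly those 10.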

For additional information about possible uses of Align to Folder, check out this blog post.

Posted in Tips

Gap closing and genome finishing tools in Align to Reference and Assembler.

Automated algorithms can only take you so far with genome assembly. The final steps involved in finishing a genome always need manual intervention. MacVector’s various assembly editors have many tools to help finish genome sequencing projects, for example closing gaps, extending reference sequences and even automatically circularizing contigs. If you select reads, then right click (or use CTRL-left click), you will see a context-sensitive menu with the following tools:

  • Export Consensus with/without Gaps
  • Align Selected Reads
  • Delete Selected Reads
  • Reset (unalign) Selected Reads
  • Export Selected Reads as FASTA/FASTQ
  • Select Matching Pairs – if you have aligned a set of paired-end reads, you can select individual read(s) and use this function to select the corresponding mate(s). This is particularly useful if you want to find pairs that will extend a contig and export them for further analysis/assembly.
  • Extend Reference with Selected Read – This is active if you have selected a single read that hangs over either end of a Reference sequence. This will extend the Reference in the appropriate direction using the sequence of the read.
  • Circularize Consensus – This is enabled if it detects direct repeats at the ends of a contig, and even tells you the length of the repeat it found. It will circularize the consensus and create a new circular sequence window with the repeat appropriately deleted.
  • Select Overlapping Reads Containing Selected Sequence – This is enabled if you select a short region in a read. All overlapping reads that contain that selected sequence will be selected. For paired reads you can then use Select Matching Pairs to select their mate, then Export Selected Reads as FASTQ/FASTA to export them to a file.
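The idea behind Circularize Consensus can be sketched in a few lines of Python: look for the longest stretch that appears at both ends of the contig, then keep only one copy. This is a simplified illustration that assumes an exact repeat; the function names and the minimum repeat length are hypothetical, and MacVector’s own detection may well differ.

```python
def terminal_repeat_length(seq, min_len=10):
    """Length of the longest prefix of seq that is also its suffix.

    Only repeats of at least min_len and at most half the sequence
    length are considered. Returns 0 if none is found.
    """
    for n in range(len(seq) // 2, min_len - 1, -1):
        if seq[:n] == seq[-n:]:
            return n
    return 0


def circularize(seq, min_len=10):
    """Drop one copy of the terminal direct repeat, or return None."""
    n = terminal_repeat_length(seq, min_len)
    if n == 0:
        return None
    return seq[:-n]


# Made-up contig with the same 12 bases at both ends.
repeat = "ATGCATGCCGTA"
unique = "ACGTTGCAAGGCTTAACCGGATCGATTACG"
contig = repeat + unique + repeat

print("Repeat length:", terminal_repeat_length(contig))
print("Circular sequence:", circularize(contig))
```

The returned string should be treated as circular: its last base joins back onto its first.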

Not all tools are applicable or available in all editors, and some tools are only enabled when using paired-end reads. Here’s what’s available in each editor.

Align to Reference editor

  • Export Consensus with/without Gaps
  • Align Selected Reads
  • Delete Selected Reads
  • Reset (unalign) Selected Reads
  • Export Selected Reads as FASTA/FASTQ
  • Select Matching Pairs
  • Extend Reference with Selected Read
  • Select Overlapping Reads Containing Selected Sequence

Reference Contig editor

  • Export Consensus with/without Gaps
  • Export Selected Reads as FASTA/FASTQ
  • Select Matching Pairs
  • Select Overlapping Reads Containing Selected Sequence

De novo contig editor

  • Export Consensus with/without Gaps
  • Export Selected Reads as FASTA/FASTQ
  • Select Matching Pairs
  • Circularize Consensus

Read more about the various assembly tools in MacVector.

Posted in Tips

MacVector 17 Workshop at The Crick

Room: HR Training Room 01–2162. Floor: 1 
Date: 15 October 2019  From: 9:30 to 11:30

Now rescheduled – Date to be advised

Chris Lindley of MacVector, Inc. will be giving a training workshop for both novice and advanced users of MacVector at The Crick, reviewing both basic and advanced functions and, in particular, new tools introduced over the last few versions.

The format is very informal and participants are very much encouraged to direct the workshop towards areas of the most interest.

Laptops will be provided for users to work through examples and tutorials as they are demonstrated. Workbooks will also be provided so that attendees can work through them during the workshop and afterwards.

The intention is that all attendees will learn at least one new and useful tool or tip. The workshop is two hours, but Chris will be available in the room for further discussion until 13:00.

Please register for the workshop by emailing Chris (drop-ins on the day will be very welcome, but will not be guaranteed access to a laptop or a workbook).

See what MacVector can do for your lab.


Posted in Meetings

Migrating your Vector NTI sequence database to MacVector.

ThermoFisher (owners of Invitrogen) have announced that Vector NTI Express is nearing the end of its life, and Vector NTI Advance was discontinued quite some time ago. If you are looking for a replacement sequence analysis application, look for one that is reliable and trusted. MacVector is easy to use, has a comprehensive set of tools and is definitely not going away! MacVector has been the tried and trusted sequence analysis application on the Mac for over 20 years. There are many thousands of happy molecular biologists using MacVector in labs all over the world. Don’t take our word for it, read what our users have to say. What’s more, opening Vector NTI files is straightforward.

If you are using Vector NTI Advance 11 or Vector NTI Express

  • Download the Mac or Windows Vector NTI Data Export Tool from ThermoFisher’s website.
  • Run this and migrate your entire sequence database to GenBank format.
  • Double click a GenBank file to open it directly within MacVector.
  • When you make changes and save, MacVector will automatically migrate the data into MacVector’s own NUCL format.
  • You can optionally batch process all your files into MacVector format using an AppleScript that is supplied within the MacVector application folder.

If you are using Vector NTI Advance 10 or earlier

  • Open MacVector
  • Select the Database->Vector NTI Import… menu item
  • Click on the Choose button to locate the Vector NTI database folder on your Mac.
  • MacVector will display a list of all of the sequences available in the database. There is a popup menu to toggle between Nucleic Acid and Protein sequences. The list can be sorted to more easily identify sequence(s) of interest.
  • Select one or more sequences and then click either the To Desktop button to open those sequences in MacVector, or To Disk to save the sequences in MacVector format to a folder on your hard drive.

Sequence annotation

MacVector will read all of the standard features and annotations associated with each sequence. Graphical appearance information is discarded and the highly customizable MacVector graphical features are used instead. If you prefer your sequences to look different then it is easy to curate all graphical features using MacVector’s Auto Annotation tool.

MacVector ignores any restriction enzyme sites annotated in the database sequence and replaces them with the default dynamic set of sites used by MacVector’s RE Picker. However, all sequence features and related information are preserved. MacVector follows the GenBank format for the features table and is always kept up to date with the latest GenBank release, so where possible MacVector will also migrate any old and deprecated feature information contained in the Vector NTI file into the current feature nomenclature.

You will always get a good discount for upgrading to MacVector from Vector NTI.

Your sequences remain yours

The MacVector team strongly believe that you should never be locked out of your data. Your data is yours! Even if you have no MacVector license, you can download MacVector Free and export your data. All versions of MacVector, including MacVector Free, have the tools to migrate sequences in GenBank format.

Posted in General, Tips