MacVector 18.7: Generating custom Codon Usage Tables (CUT) from your own sequences.

Our latest release, MacVector 18.7, has a new Codon Usage Table viewer. You can use this to generate your own codon usage table (CUT or .bias) files. You can use codon usage tables to optimize codon usage of CDS features for enhanced expression in a different organism. They can also be used in the Nucleic Acid Toolbox to predict protein coding ORFs.

You can import data directly from codon usage websites or generate your own tables by translating CDS features in one or more sequences.

The Codon Usage Table viewer displays the data in a standard text format with one row of data per codon, identical to codon usage output windows used in other MacVector translation functions.

A codon usage table generated from a single E. coli genome.

Importing codon usage tables

  • Copy data from a source that matches the format above, or that uses the popular GCG format available on codon usage websites such as CUTG.
  • Select New From Clipboard to create a new CUT file in the Codon Usage Table viewer.
  • Save the table as a .bias file.

Generating custom Codon Usage Tables from a single sequence.

  • Open a sequence that has annotated CDS features.
  • Choose Analyze | Translate All CDS Features…
  • Toggle Create Codon Usage Table (.bias) to on and enter a suitable name.
  • Click OK.
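For readers curious about what a codon usage table contains, here is a minimal Python sketch of the underlying count. It is not MacVector's implementation: the function names, the handling of ambiguous codons and the per-thousand normalization are all assumptions made for illustration.

```python
from collections import Counter


def count_codons(cds: str) -> Counter:
    """Count in-frame codons in one CDS (illustrative helper, not a MacVector API)."""
    seq = cds.upper().replace("U", "T")
    counts = Counter()
    # Step through the sequence three residues at a time, ignoring a trailing partial codon.
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        # Assumption: codons containing ambiguity codes are skipped.
        if set(codon) <= set("ACGT"):
            counts[codon] += 1
    return counts


def per_thousand(counts: Counter) -> dict:
    """Express each codon count as a frequency per 1000 codons."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: 1000.0 * n / total for codon, n in counts.items()}


counts = count_codons("ATGGCTGCTAAATAA")  # ATG GCT GCT AAA TAA
frequencies = per_thousand(counts)
```

In this toy CDS, GCT accounts for two of the five codons, so its per-thousand frequency is 400.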

Generating Codon Usage Tables from multiple sequences.

The CUT viewer has a toolbar button for the Translate All CDS Features in Folder function. You can invoke this multiple times and each new set of results will be added to the existing codon usage data. You can use this to incrementally build up the codon usage information from a large sequence data set spread across multiple folders on your computer.

  • Start by one of the following two methods:
    • Choose File | New | Codon Usage Table (.bias) and click Transl.Folder
    • Run Database | Translate All CDS Features in Folder
  • Choose a folder of sequences you want to generate a CUT from.
  • The CUT viewer will be blocked until the job has finished.
  • Click Transl.Folder to repeat this procedure with multiple folders. At each stage the CUT will be updated.
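The idea of adding each new folder's results to the existing table can be sketched as a running tally. This is only an illustration under stated assumptions: the class name `RunningCodonTable` is hypothetical, and each "folder" is represented as a plain list of CDS strings rather than annotated sequence files.

```python
from collections import Counter


def count_codons(cds):
    """Count in-frame triplets in one CDS string (illustrative helper)."""
    seq = cds.upper()
    return Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))


class RunningCodonTable:
    """Hypothetical accumulator: each batch is added to the counts so far."""

    def __init__(self):
        self.counts = Counter()
        self.batches = 0

    def add_batch(self, cds_sequences):
        # Counter.update adds to existing counts rather than replacing them.
        for cds in cds_sequences:
            self.counts.update(count_codons(cds))
        self.batches += 1

    def fraction(self, codon):
        total = sum(self.counts.values())
        return self.counts[codon] / total if total else 0.0


table = RunningCodonTable()
table.add_batch(["ATGAAATAA"])               # stands in for a first folder
table.add_batch(["ATGAAAAAATGA", "ATGTAA"])  # stands in for a second folder
```

Because only counts are stored, the order in which batches are added does not change the final table.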

MacVector 18.7 was released in July 2024 and introduces a History tab to track the construction of your expression vectors and clones. It also includes direct support for Codon Usage Tables, creating custom Codon Usage Tables and batch translation of CDS features. Additionally, MacVector 18.7 enhances Assembler’s toolkit by adding a new reference assembler for mapping PacBio and ONT sequencing reads to your reference sequences.


MacVector 18.7: Long-Read Reference Alignments using minimap2

Our latest release, MacVector 18.7, adds Minimap2 to Assembler’s sequencing toolkit. If you have the Assembler module, you can now map noisy long-read data from Pacific Biosciences or Oxford Nanopore to one or more genomes. Minimap2 is a reference assembler similar to Bowtie2. But whereas Bowtie2 excels at mapping “short reads” (500 nt or less) to a reference, Minimap2 can also handle very long reads, such as those from Oxford Nanopore or Pacific Biosciences instruments. Additionally, Minimap2 is significantly faster than Bowtie2, even with short reads.

Two alignments of the same sequencing set aligned by bowtie2 and minimap2
Minimap2 (top) versus Bowtie2 (bottom)

Assembling reads against a reference with Minimap2

  • Create a new Assembly Project – File | New | Assembly Project
  • Click on the Add Ref toolbar button to add one or more reference sequences.
  • Then click on the Add Reads toolbar button and select one or more NGS data files.
  • While not essential, it is usually also a good idea to double-click on the Status column of each read to let MacVector know exactly what type of data you are analyzing.
  • Click on the Assemble toolbar button and select minimap2 from the menu.
  • The simplest option is to choose one of the presets in the resulting dialog that will tune the assembly parameters for your specific type of data.
  • Now run the alignment by clicking OK
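If you also run minimap2 from the command line outside MacVector, the idea of choosing a preset per data type looks roughly like the sketch below. The preset names (`map-ont`, `map-pb`, `map-hifi`, `sr`) and the `-a`, `-x` and `-t` options belong to the stand-alone minimap2 program; how they correspond to the presets in MacVector's dialog is an assumption, and the file names and the `PRESETS` keys are hypothetical.

```python
# Map a plain-language read type onto a stand-alone minimap2 preset.
# The dictionary keys are hypothetical labels chosen for this example.
PRESETS = {
    "nanopore": "map-ont",
    "pacbio_clr": "map-pb",
    "pacbio_hifi": "map-hifi",
    "short_reads": "sr",
}


def minimap2_command(reference, reads, read_type, threads=4):
    """Build (but do not run) an argument list for the minimap2 command line."""
    if read_type not in PRESETS:
        raise ValueError(f"unknown read type: {read_type}")
    # -a requests SAM output, -x selects the preset, -t sets the thread count.
    return ["minimap2", "-a", "-x", PRESETS[read_type],
            "-t", str(threads), reference, *reads]


command = minimap2_command("reference.fasta", ["reads.fastq"], "nanopore")
```

The list can be passed to `subprocess.run` with standard output redirected to a file, since minimap2 writes its alignments to standard output by default.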


MacVector 18.7 has just been released.

MacVector 18.7 has just been released. If you are eligible for this release, you will be prompted to upgrade; otherwise, go to MACVECTOR | CHECK FOR UPDATES… and follow the prompts to be upgraded automatically. If your license is not eligible, then why not upgrade?

Overview

MacVector 18.7 introduces a History tab to track the construction of your expression vectors and clones. It also includes new features for codon optimization, such as direct support for Codon Usage Tables (CUT/.bias) and the ability to create custom Codon Usage Tables files from your own sequences. Additionally, MacVector 18.7 enhances Assembler’s toolkit by adding a new reference assembler for mapping PacBio and ONT sequencing reads to your reference sequences.

MacVector 18.7 also makes it easier to store your own Restriction Enzyme files when multiple people use the same Mac (ideal for that shared lab Mac!).

And, as usual, there are many enhancements to existing features such as protein pI calculations.

MacVector 18.7 was developed on macOS Sonoma and is supported on macOS High Sierra to macOS Sonoma. It has also been tested on early development releases of macOS Sequoia and will be fully supported when Apple release it. MacVector 18.7 is a Universal Binary that will run on Apple Silicon Macs and Intel Macs.

Long-Read Reference Alignments using minimap2

The addition of Minimap2 allows Assembler to map noisy long-read data from Pacific Biosciences or Oxford Nanopore to one or more reference sequences. Minimap2 is similar to Bowtie2 but is optimized for long reads rather than short reads of under 500 nucleotides. Minimap2 also excels at assembling short-read data and may even outperform Bowtie2 in certain situations.

Reference alignments using Minimap2 (upper panel) and Bowtie2 (lower panel)

Translate All CDS Features

You can now easily translate all CDS features in a sequence with the new menu option Analyze | Translate All CDS Features. This is useful for translating proteins in bacterial genomes or eukaryotic sequences. You can choose to display all translated proteins in FASTA format or create a codon usage table from the results.

Translate All CDS Features in Folder

The new Database | Translate All CDS Features in Folder menu option is similar to Translate All CDS Features, except that it takes a source folder, loads every sequence file in that folder, and translates each CDS feature it finds. The results are accumulated and offered with the same result options as Translate All CDS Features. A Codon Usage Table viewer window is always created and displayed when you select this option.

Codon Usage Table Viewer

MacVector now includes a viewer for codon usage tables (CUT/.bias) files. This displays the data in a standard text format with one row of data per codon, identical to codon usage output windows used in other MacVector translation functions.

A Codon Usage Table generated from a folder of E. coli genomes.

You can import Codon Usage Tables from CUT files available on codon usage websites such as CUTG. You can also generate custom CUT files from a single sequence using the new Translate All CDS Features tool, or from multiple sequences using the new Translate All CDS Features in Folder tool. You can invoke the folder tool multiple times to build up the codon usage information incrementally from a large sequence data set spread across multiple folders.

History Tab

There is a new tab in nucleic acid single sequence editors called History. This tab lists several MacVector-specific features relating to the editing history of the sequence, such as ‘frag’ and ‘edit’. It also includes Source annotations, which summarize the length of the sequence, the scientific name of the source organism, and the Taxon ID number. Apart from Source annotations, these features now contain additional information such as the date of the operation, the name of the user who performed it, and additional sequence information. In the future, all MacVector sequence modifications will write out this information, allowing the full history of any construct to be determined and even allowing a simple reversion of the construct to how it existed on a specific date.

A ligation and a Codon Usage Optimization operation in MacVector 18.7’s new History Tab

Change in Default Restriction Enzyme File Location

MacVector now saves restriction enzyme files in a new location (~/Library/Application Support/MacVector/Restriction Enzymes/) within a user’s home folder. This is always writeable, even without Administrator access. If you have already saved files in a different location, they will not be affected. MacVector will automatically populate the new directory with the latest restriction enzymes and update any user-edited files from the old location.

Miscellaneous Enhancements and Bug Fixes

  • The Align to Reference SNPs tab now also displays the percentage of each residue present in each heterozygote SNP.
  • The Align to Reference consensus calling threshold default has been raised to 70% so that heterozygous SNPs are more consistently reported on the consensus line.
  • A crash when repeating heterozygote analysis has been fixed.
  • Copied FASTA text data is now more reproducibly parsed as single sequence data by New From Clipboard.
  • The protein pI calculations have been modified to also report the pI ignoring Trp and Cys residues. This brings the results more in agreement with the popular ExPASy website.
  • A bug where the “blocking” for protein sequences was taking the DNA values has been fixed.
  • Exporting sequence data in the Sequin .tbl format now writes out the correct sequence for the minus strand of segmented features.

MacVectorTip: Quality scoring of manual edits to your contigs.

Quality scoring of Assemblies and Align to Reference alignments can be visualized directly on the sequence. Residues can be shaded according to their quality scores. These can be displayed anywhere quality values are available, including de novo and reference assemblies in Assembler and Align to Reference alignments.

The intensity of the shading of residues indicates the phred-based quality value of each residue.

  • For individual reads, this ranges from 0 (deep red) through 20 (white) to 40 or above (deep green). The consensus scale is doubled and ranges from 0 (deep red) through 40 (white) to 80 or above (deep green).
  • Gaps are always shown with a white background. You can “mouse-over” a residue to view the numerical information in a tooltip.
  • Edited residues are always given a phred quality value of 99 and these residues are given a blue background.
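The shading scheme described above can be sketched as a simple colour interpolation. The breakpoints (0, 20 and 40 for reads, doubled for the consensus) and the special value 99 for edits come from the description above; the exact RGB values, the linear blend and the function names are assumptions made for illustration.

```python
# Assumed RGB values; the actual colours used by MacVector are not specified here.
DEEP_RED = (200, 0, 0)
WHITE = (255, 255, 255)
DEEP_GREEN = (0, 150, 0)
EDIT_BLUE = (120, 170, 255)

EDITED_QUALITY = 99  # edited residues carry this phred value


def _blend(start, end, t):
    """Linearly interpolate between two RGB colours, with t in [0, 1]."""
    return tuple(round(a + (b - a) * t) for a, b in zip(start, end))


def shade(quality, is_consensus=False):
    """Return an RGB background colour for one residue's phred quality."""
    if quality == EDITED_QUALITY:
        return EDIT_BLUE  # this sketch treats 99 as the marker for an edit
    midpoint = 40 if is_consensus else 20  # the consensus scale is doubled
    q = max(0, min(quality, 2 * midpoint))  # "40 or above" saturates at deep green
    if q <= midpoint:
        return _blend(DEEP_RED, WHITE, q / midpoint)
    return _blend(WHITE, DEEP_GREEN, (q - midpoint) / midpoint)
```

For example, `shade(20)` and `shade(40, is_consensus=True)` both return white, because each value sits at the midpoint of its own scale.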

Most assembly algorithms are quality score aware and better quality reads will take priority over lower quality sequences. However, edited residues will always override all other sequences/reads even if they are high quality. This also means that a single read with an edited sequence will take priority over many other reads with a different sequence and the edited residues in that single edited read will define the consensus. This is done as we assume that the user knows best!
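As a rough illustration of that priority rule, here is a sketch of calling one consensus column. Only the rule that an edited residue (phred 99) overrides everything else comes from the text above; summing quality values per base to pick the winner is an assumed stand-in for whatever weighting a real assembler uses, and `call_consensus` is a hypothetical name.

```python
from collections import defaultdict

EDITED_QUALITY = 99  # edited residues carry this phred value


def call_consensus(column):
    """Call one consensus base from a non-empty list of (base, quality) pairs."""
    # An edited residue wins outright, however many reads disagree with it.
    edited = [base for base, quality in column if quality == EDITED_QUALITY]
    if edited:
        return edited[0]
    # Assumption: otherwise the base with the largest summed quality wins.
    weight = defaultdict(int)
    for base, quality in column:
        weight[base] += quality
    return max(weight, key=weight.get)


with_edit = call_consensus([("A", 40), ("A", 38), ("A", 41), ("G", 99)])
without_edit = call_consensus([("A", 40), ("A", 38), ("T", 15)])
```

In the first column the single edited G defines the consensus even though three high-quality reads say A; in the second, with no edits, A wins on summed quality.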

Here five edited residues override all other sequences and are shown in the consensus.

How to design a digest to screen minipreps after a ligation.

MacVector’s Agarose Gel tool can be used to quickly design a restriction digest to screen minipreps following a ligation. 

Replicate your ligation in MacVector.

  • Select the two sites for subcloning your target gene and click DIGEST.
  • Drag the digested fragment from the Cloning Clipboard to your vector.
  • Click LIGATE.

Create your agarose gel with the correct insert and a vector only lane.

  • Go to FILE > NEW > AGAROSE GEL
  • Open your sequence and switch to the MAP tab.
  • Drag a restriction site (or sites) that digest within the fragment and also in the vector to the Agarose Gel window.
  • Open up your original cloning vector.
  • Drag the same site(s) to the Agarose Gel window.

Undo the ligation, and repeat with the wrong orientation.

  • Switch back to the ligated sequence, and use UNDO to remove the ligated fragment.
  • Switch back to the Cloning Clipboard.
  • Drag the same digested fragment from the Cloning Clipboard to your vector.
  • Hold down [OPTION] and click LIGATE.
  • Switch to the MAP tab.
  • Drag the same site (or sites) that digest the fragment and the vector to the Agarose Gel window.

Now you will end up with an Agarose Gel with three lanes: A lane with empty vector, a lane with the insert in the correct orientation, and a lane with the insert in the wrong orientation. Now it’s easy to screen your minipreps, as you know the gel bands of a correct miniprep before you’ve even loaded it on the gel!
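The band sizes in those three lanes follow directly from where the chosen enzyme cuts each circular molecule. Here is a minimal sketch of that arithmetic; the plasmid sizes and cut positions are invented for illustration, and `circular_fragments` is a hypothetical helper, not a MacVector function.

```python
def circular_fragments(length, cut_positions):
    """Return fragment sizes (largest first) from cutting a circular molecule."""
    cuts = sorted(set(p % length for p in cut_positions))
    if not cuts:
        return [length]  # uncut plasmid: a single circular species
    # Fragments between neighbouring cuts...
    sizes = [b - a for a, b in zip(cuts, cuts[1:])]
    # ...plus the fragment that wraps around the origin.
    sizes.append(length - cuts[-1] + cuts[0])
    return sorted(sizes, reverse=True)


# Hypothetical example: a 3000 bp vector with one site at 500, and a 1000 bp
# insert cloned at position 1000 that carries one site 200 bp from one end.
empty_vector = circular_fragments(3000, [500])
correct_orientation = circular_fragments(4000, [500, 1200])
wrong_orientation = circular_fragments(4000, [500, 1800])
```

With these made-up numbers the empty vector gives one 3000 bp band, the correct orientation gives 3300 and 700 bp, and the wrong orientation gives 2700 and 1300 bp, which is why an enzyme that cuts asymmetrically within the insert makes a good diagnostic.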


MacVectorTip: Grayed out graphics indicate Missing Features

If the graphics in a nucleic acid sequence Map tab appear somewhat “washed out”, it is because the graphic items represent common features that MacVector has found that are not annotated on the sequence. For example, here are the Map and Features tabs of an unannotated cloning vector:

You can see a number of features on the Map tab, but the Features tab is completely empty. The graphics indicate common features that MacVector has identified that have not been annotated on the sequence. If you select one of the features in the Map tab and right-click (or [ctrl]-click) there is an option in the resulting context sensitive menu to Add CDS Feature. When that is selected, the feature takes on a bold appearance and a new annotation appears in the Features tab.

If you wish, you can select multiple missing features and then add them all with a right-click. Or you can select the Results | Missing Features tree view item in the floating Graphics Palette to select all missing features and then add with a right-click.

Note that the automatic display of missing features is controlled by the MacVector | Settings | Scan DNA tab. From there you can control how they are identified or even point the algorithm to your own folder of annotated sequences to be a source for the missing features.


MacVectorTip: “Nudge” Reads for Better Reference Alignments

The MacVector alignment algorithms are usually pretty good at finding the optimum alignments of reads against a reference sequence. But, very occasionally, they may get confused by repeats or other anomalies in the sequences. Or perhaps you have made edits after the alignment: in the Align to Reference Editor, holding down the option key while typing inserts a residue rather than overwriting the existing one, and holding down option while pressing delete removes a residue rather than replacing it with a gap. In such cases, you may want to “nudge” the read left or right to maintain a better overall alignment without repeating the alignment algorithm. You can do this by selecting the name of the read you want to “nudge”, then pressing the left or right arrow keys to move the entire sequence relative to the reference.

Here is a misaligned read before pressing the [right] arrow.
And here is the same read after pressing the [right] arrow.

MacVectorTip: Trimming by Quality in sequence assemblies

Many of our users may be familiar with the ability of Sequencher to semi-automatically trim poor quality sequences from the ends of Sanger ABI reads. Although it is generally not necessary to do this in MacVector because most of the algorithms can automatically handle poor quality data, there are times when it can be beneficial. MacVector has a Quality Trimming function that removes residues from the ends of Sanger reads that fall below a configurable quality threshold. You can invoke this in either the Align to Reference or Assembly Project windows by clicking on a new Qual Trim toolbar button.

This opens a setup dialog letting you determine how the reads should be trimmed.

The trimmed residues are normally shown greyed out.

They can also be completely hidden by clicking on the Trimmed toggle toolbar button.
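The general idea of end trimming by quality can be sketched in a few lines. This is a deliberately simple version that assumes a single per-residue threshold (20 here, chosen arbitrarily) and no sliding window; MacVector's dialog offers its own settings, which this sketch does not attempt to reproduce.

```python
def trim_bounds(qualities, threshold=20):
    """Return (start, end) of the region kept after trimming both ends.

    Residues are removed from each end until one meets the threshold.
    Low-quality residues in the interior of the read are left alone.
    """
    start = 0
    end = len(qualities)
    while start < end and qualities[start] < threshold:
        start += 1
    while end > start and qualities[end - 1] < threshold:
        end -= 1
    return start, end


read = "NNACGTNN"
quals = [5, 8, 30, 35, 12, 40, 10, 3]
start, end = trim_bounds(quals)
kept = read[start:end]
```

Here the two poor residues at each end are trimmed, while the interior residue with quality 12 is kept because it is flanked by good data. Keeping the bounds rather than deleting residues matches the idea that trimmed residues can be shown greyed out or hidden.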


MacVectorTip: Use self comparison matrix analysis to identify repeats and inversions

The Dot-Plot analysis (Pustell DNA Matrix) function in MacVector is an extremely powerful way of quickly getting an overview of the similarities between a pair of sequences. However, it can also be used to identify repeats and inversions in a single DNA sequence simply by comparing a sequence to itself. For example here is the Alcohol Dehydrogenase gene cluster from Drosophila funebris.

There is a direct tandem duplication of one set of genes which can clearly be seen by the presence of additional lines that are not on the main “identity” diagonal.
You can use this to identify inverted repeats as well. The display is interactive so that you can zoom in to any part of the plot by a simple mouse drag to go from this:

To this:

The image above shows the inverted terminal repeat from a Bovine herpesvirus with the inverted nature of the repeat indicated by the blue colored lines that go from bottom left to upper right.
More complicated structures can often be seen.

In this example there is a tandem direct duplication where each repeat itself consists of 7 direct overlapping repeats.
You can also use Dot Plots as sanity checks when running de novo sequence assemblies. Here is an assembly of what should have been a 6.5 kb circular plasmid that the assembly algorithm assembled into a 28 kb linear sequence consisting of 4 direct copies of the plasmid. This is not uncommon with very noisy long-read NGS data, where algorithms might assume the high error rate is actually a series of SNPs.

You can also view the textual alignments in the Aligned Sequence tab. That data also updates when you zoom in to a specific region.
Hint: If you try this yourself and get a lot of background “noise”, try increasing the Min. % Score parameter from the default 60% to 80% or higher.
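To see why off-diagonal lines reveal repeats, here is a naive windowed dot plot in Python. It is not the Pustell algorithm: it simply compares every window against every other window and keeps those at or above a minimum percent identity, so it is only practical for short sequences. The function names, window size and test sequence are assumptions made for illustration.

```python
def revcomp(seq):
    """Reverse complement, for finding inverted repeats."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def dot_plot(seq_x, seq_y, window=8, min_percent=80):
    """Return (i, j) window start pairs whose identity meets min_percent."""
    needed = window * min_percent / 100
    hits = []
    for i in range(len(seq_x) - window + 1):
        for j in range(len(seq_y) - window + 1):
            matches = sum(a == b for a, b in
                          zip(seq_x[i:i + window], seq_y[j:j + window]))
            if matches >= needed:
                hits.append((i, j))
    return hits


# Self comparison of a sequence carrying a 12 nt direct repeat.
unit = "GATTACAGGCTA"
seq = unit + "CCCCCCCC" + unit
hits = dot_plot(seq, seq, window=8, min_percent=100)
off_diagonal = [(i, j) for i, j in hits if i < j]
```

Every window matches itself, which gives the main identity diagonal. The extra hits at a constant offset of 20 form the parallel line that marks the direct repeat. Comparing `seq` against `revcomp(seq)` in the same way would reveal inverted repeats, and lowering `min_percent` admits more background hits, which is the "noise" mentioned in the hint above.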


MacVectorTip: Restriction enzyme sites and tooltips

Quickly viewing the recognition sequence and cut site of a restriction site is very easy in the Map tab.

By default MacVector’s Scan DNA For… tool will automatically display restriction enzyme recognition sites in the Map tab. If you hover your mouse over a restriction site, a tooltip will show you the restriction enzyme recognition site, the location of the cut site, and number of times that enzyme cuts your sequence.

– Make sure Preferences | Scan DNA | Restriction Sites is turned on:

You can also see the full sequence and cut site when zoomed to sequence level in the Map tab.

– In the Graphics Palette click the Zoom to Residue button.

Unique sites are shown in red, whereas enzymes that cut at two or more sites are shown in blue.
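The information in the tooltip (recognition site, cut position and number of cuts) and the red or blue colouring can be mimicked with a small scan. The three enzymes and their cut offsets are standard (EcoRI G^AATTC, BamHI G^GATCC, HindIII A^AGCTT); the function names, the linear-sequence assumption and the test sequence are made up for this sketch.

```python
# Recognition site and top-strand cut offset for a few common enzymes.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),
    "BamHI": ("GGATCC", 1),
    "HindIII": ("AAGCTT", 1),
}


def find_cuts(seq, site, offset):
    """Return cut positions (bases to the left of each cut) in a linear sequence."""
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + offset)
        start = seq.find(site, start + 1)
    return cuts


def summarize(seq):
    """Report site, cut positions and display colour for each enzyme that cuts."""
    summary = {}
    for name, (site, offset) in ENZYMES.items():
        cuts = find_cuts(seq, site, offset)
        if cuts:
            # Unique cutters are red; enzymes cutting two or more times are blue.
            colour = "red" if len(cuts) == 1 else "blue"
            summary[name] = {"site": site, "cuts": cuts, "colour": colour}
    return summary


result = summarize("TTGAATTCAAGGATCCTTGGATCCAA")
```

Because all three recognition sites are palindromic, scanning the top strand alone is enough; non-palindromic sites would also need a scan of the reverse complement.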