MacVectorTip: visualizing shared domains in a protein alignment

MacVector has a domain-outlining facility for multiple sequence alignments, letting you easily visualize the relationships between features in aligned protein sequences.
MacVector’s new multiple alignment file format retains the features/annotations from the sequences that are used to create the alignment. The colors of features from the individual sequence documents are used to outline the domains in the alignment. You can also create new domains and dynamically show/hide features in alignments.

Note that alignments created using versions of MacVector before 17.5 will not have this information and will need to be recreated.

Displaying shared domains:

  • Ensure your protein sequences are annotated. Tip: you can use DATABASE | INTERPROSCAN to quickly scan your proteins and annotate their domains.
  • Ensure the domains/features you are interested in are visible and set the Fill color to the color you would like to see in the alignment.
  • You can also control the visibility of domains/features using the floating feature palette seen in the EDITOR tab.
  • Use FILE | OPEN and select multiple protein sequences.
  • Click OPTIONS (bottom left-hand corner) and choose OPEN MULTIPLE SEQUENCE FILE – AS MULTIPLE ALIGNMENT
  • Click OPEN
  • Now run the alignment by clicking ALIGN
  • In the EDITOR tab, use the feature display MODE toolbar button to select SHOW FEATURES

In the Editor tab, a new line will appear above each sequence, displaying the extent and color of visible features.

When you switch to the Picture tab, you will see colored outlines around the shared domains.

Posted in Tips

MacVectorTip: correctly flagging PacBio and Oxford Nanopore datasets for assembly by Flye

MacVector 17.5 introduced Flye for assembly of PacBio and Oxford Nanopore reads.

Flye joins Phrap, Velvet and SPAdes as a de novo sequence assembly tool, alongside Bowtie2 and Align to Reference for reference assembly.

Flye is an assembly algorithm tuned for low-quality long reads such as those produced by the new generation of single-molecule sequencers. With typical bacterial genomes, it is fairly common for the reads to assemble into a single full-length genome contig.

Because these long reads tend to be very error-prone, MacVector also includes an optional polishing step using Racon. Polishing is the technique of correcting sequencing errors by aligning the reads against the contigs produced by the first run of the assembler. Multiple rounds of polishing can further increase the accuracy of the resulting consensus.

It is important to tell MacVector what type of reads you are assembling before running Flye. Flye will be disabled unless your read files include at least one PacBio or Nanopore dataset.

This is easily done by double-clicking on the Status item after importing the reads via the Add Reads toolbar button.

Posted in Tips

MacVectorTip: quality score visualization in sequence assemblies.

Quality scores in assemblies and Align to Reference alignments can be visualized directly on the sequence, with residues shaded according to their quality values. Shading is available anywhere quality values exist, including de novo and reference assemblies in Assembler as well as Align to Reference alignments.

A Shading toolbar button lets you turn on coloring based on the quality value assigned to each residue.

The intensity of the colors indicates the phred-based quality value of each residue.

  • For individual reads, this ranges from 0 (deep red) through 20 (white) to 40 or above (deep green). The consensus scale is doubled and ranges from 0 (deep red) through 40 (white) to 80 or above (deep green).
  • Gaps are always shown with a white background. As with earlier versions of MacVector, you can “mouse-over” a residue to view the numerical information in a tooltip.
  • Edited residues are always given a phred quality value of 99 and these residues are given a blue background.

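The shading rules above can be sketched as a small Python function. This is a minimal illustration, not MacVector's code: the exact RGB values chosen for deep red, white, deep green and blue are assumptions, as is the linear blending between them. It also includes the standard Phred definition, Q = -10 log10(P_error), under which a quality of 20 means a 1% chance that the base call is wrong.

```python
# Assumed endpoint colors (RGB); MacVector's actual palette may differ.
DEEP_RED = (180, 0, 0)
WHITE = (255, 255, 255)
DEEP_GREEN = (0, 130, 0)
BLUE = (120, 160, 255)

EDITED_QUALITY = 99  # edited residues are assigned a phred value of 99


def error_probability(q):
    """Phred definition: Q = -10 * log10(P_error), so P_error = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def _blend(a, b, t):
    """Linearly interpolate between two RGB colors, with t in [0, 1]."""
    return tuple(round(x + (y - x) * t) for x, y in zip(a, b))


def shade(q, is_consensus=False, is_gap=False):
    """Return an RGB shading color for one residue's phred quality value."""
    if is_gap:
        return WHITE                       # gaps always have a white background
    if q == EDITED_QUALITY:
        return BLUE                        # edited residues are shown in blue
    midpoint = 40 if is_consensus else 20  # the consensus scale is doubled
    top = 2 * midpoint                     # 40 for reads, 80 for the consensus
    q = max(0, min(q, top))                # values at or above the top are deep green
    if q <= midpoint:
        return _blend(DEEP_RED, WHITE, q / midpoint)
    return _blend(WHITE, DEEP_GREEN, (q - midpoint) / midpoint)
```

With these assumptions, shade(20) and shade(40, is_consensus=True) both return white, matching the midpoints of the read and consensus scales.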

    Posted in Tips

    MacVectorTip: Assembling Fungal Genomes using SPAdes

    MacVector with Assembler can assemble bacterial genomes in just minutes on quite modest hardware. Currently MacVector has four de novo assembly tools (SPAdes, Velvet, Flye and Phrap).

    But what about larger genomes? It is currently impractical to run de novo assemblies of human genomes on a low-cost Mac, though RNA-Seq analyses against the human transcriptome are possible. Here, however, we performed some complete de novo genome assembly tests on a sample NGS experiment from the fungus Aspergillus fischeri (NCBI SRA SRR10092049). This dataset consists of 2 × 18.2 million paired-end 150 nt Illumina HiSeq reads, representing 5.5 Gbp of sequence data. The assembly was run on a 2.9 GHz 6-core i9 MacBook Pro with 32 GB RAM, using SPAdes with 11 threads and slightly modified k-mer values.

    The complete assembly, including the optional Bowtie reference assembly step, took just over 28 hours in total, with a maximum RAM usage of about 24 GB. SPAdes generated 1,314 contigs with a total combined length of 31.38 Mbp, right in line with the reported genome size. The longest contig was 1,132,146 bp and the N50 was 350,491 bp.

    The figures presented here give you some idea of what is achievable on a modest machine using MacVector with Assembler. Assembly would be expected to be faster with a more restricted set of k-mer values, on higher-end machines such as the Mac Pro or Mac Studio, or with much more RAM.

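N50 is a standard measure of assembly contiguity: sort the contigs from longest to shortest and keep a running total of their lengths; the N50 is the length of the contig at which that total first reaches half of the whole assembly length. A minimal Python sketch follows; the contig lengths in the example are made up for illustration and are not from the Aspergillus fischeri run.

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    raise ValueError("no contigs supplied")
```

For example, n50([400, 300, 200, 100]) returns 300, because 400 + 300 already covers half of the 1,000 bp total.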

    Posted in Tips

    MacVectorTip: Viewing external database entries for features in a sequence.

    Sequences, or regions of sequences, can be linked to external databases. For example, an entire sequence entry can be linked to its source database, and annotation tools such as InterProScan add links when they annotate proteins with domain or motif information. This is very useful when you want to view more detailed or more up-to-date information. Within the GenBank specification, which MacVector uses extensively, an external database entry is stored in a /db_xref qualifier, which allows the entry to be viewed easily. The GenBank (and GenPept) specification allows many different databases to be referenced using this qualifier.

    In MacVector, the original database entry can easily be viewed in a web browser: select the feature entry in the Features tab, then right-click (or Control-click) it to see the available DB_XREF entries. Selecting one loads it in your web browser.

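In a GenBank flat file, each cross-reference appears as a /db_xref="database:identifier" qualifier line beneath its feature. The Python sketch below pulls those pairs out of a feature's text. It is an illustration only: the example feature, its gene name and its GeneID value are hypothetical, and the code assumes each /db_xref qualifier sits on a single line.

```python
import re

# Matches qualifier lines of the form /db_xref="Database:Identifier".
# The identifier part may itself contain a colon, so only the first
# colon is treated as the separator between database and identifier.
DB_XREF = re.compile(r'/db_xref="([^":]+):([^"]+)"')


def db_xrefs(feature_text):
    """Return (database, identifier) tuples for each /db_xref qualifier."""
    return DB_XREF.findall(feature_text)


# Hypothetical feature block in GenBank flat-file layout.
feature = '''
     CDS             1..1200
                     /gene="exampleA"
                     /db_xref="GeneID:12345"
                     /db_xref="InterPro:IPR000719"
'''

print(db_xrefs(feature))
```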

    Posted in Tips

    MacVectorTip: How to Customize Window Button Toolbars

    Like many Mac applications, MacVector takes full advantage of the built-in ability to add, delete and rearrange the action buttons on window toolbars. To make these changes, right-click (or Control-click) in the gray space on any toolbar to bring up a context-sensitive menu. Choose Customize Toolbar and a dialog will be displayed with all of the buttons available for that tab, for example the Editor tab of the DNA Sequence window.

    Note that modifying the toolbar is a global change that affects all windows containing that tab. It is also specific to each document type, so you can, for example, have different sets of buttons on the Editor toolbars of the DNA, Protein, Trace/Chromatogram and MSA document windows. Once modified, the changes persist until you either customize the toolbar again or reset your MacVector Preferences.

    Posted in Tips

    MacVectorTip: Understanding Color Groups

    You can align hundreds, or even thousands, of protein sequences within MacVector using three different alignment algorithms: ClustalW, MUSCLE and T-Coffee. Once the sequences are aligned, you will see the familiar colorful display in the Editor tab.

    But there’s more to this than pretty colors. The default Color Group in MacVector is called “Chemical Type”. In this scheme, glycine, leucine, isoleucine, valine and alanine are all considered to be of the same type, and are therefore included in the same group. You can access the Color Group selector/editor by clicking the Groups toolbar button (you may need to enlarge the window to see this button). The selector is also accessible from the Prefs | Consensus pane.

    You can choose from about 20 built-in Color Groups using the Color By: dropdown menu, edit the existing groups, or even create your own. When amino acids belong to the same group, MacVector considers them to be “similar”, which affects many related functions throughout the multiple alignment interface. For example, a dot in the consensus marks a column where no single residue exceeds 51% (the default identity threshold) but all of the residues belong to the same Color Group. If you select a different Color Group scheme, not only will the Editor tab update with the new colors, but the consensus will also change to reflect the new groups.

    The Picture tab also handles similarities – you can shade and outline residues based on the currently selected color grouping scheme. This is controlled from the Prefs | Picture Shading tab.

    Finally, the Text, Pairwise and Matrix tabs also respond to the currently selected Color Group to determine similarities. Prior to MacVector 18.2.5, these always used the ClustalW Default Groups similarity scheme, but now, for consistency, they honor the currently selected group.

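The consensus behavior described earlier for Color Groups can be sketched as a rule applied to each alignment column. This is a hypothetical Python illustration, not MacVector's implementation: the single group below mirrors the Chemical Type example (G, L, I, V, A) and the 51% threshold is the default quoted above, while counting gaps in the column total and printing a space when there is no consensus are assumptions.

```python
from collections import Counter

# Hypothetical group list: only the Chemical Type group named above is
# shown; any other groups in a real scheme are omitted here.
GROUPS = [frozenset("GLIVA")]


def consensus_symbol(column, groups=GROUPS, threshold=0.51, gap="-"):
    """Return a consensus character for one alignment column (a string).

    - a residue whose share of the column exceeds `threshold` -> that residue
    - otherwise, if all residues belong to one color group     -> "."
    - otherwise                                                -> " " (assumed)
    """
    residues = [r for r in column if r != gap]
    if not residues:
        return " "
    residue, count = Counter(residues).most_common(1)[0]
    if count / len(column) > threshold:  # gaps count toward the total (assumption)
        return residue
    if any(set(residues) <= group for group in groups):
        return "."
    return " "
```

Swapping in a different list of groups changes which columns earn a dot, which is the effect described for switching Color Group schemes.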

    Posted in Tips

    Weekly Tip: Use Hash Value = 12 for speedy genome comparisons with Create Dot Plot

    MacVector’s Analyze | Create Dot Plot function can compare entire genomes very quickly, giving both an overall view of similarity (large inversions and duplications) and the ability to “drill down” to the residue level to see individual SNPs. One of the keys to ensuring that the calculation completes in a reasonable time is to set the Hash Value to a large number, typically 11 or 12. For example, a Hash Value of 12 is a good starting point for comparing two E. coli genomes (~4.6 Mbp).

    On a typical laptop, the calculation takes just a few seconds with these settings. Even so, the resulting plot clearly shows the well-documented inversion in E. coli strain W3110 relative to MG1655, caused by recombination between the rrnB and rrnE rRNA gene clusters.

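Assuming the Hash Value works like the word (k-mer) size used by hash-based dot plot methods, the sketch below shows why a large value is fast: every k-letter word in one sequence is stored in a dictionary, and each word in the other sequence is then found with a single lookup. Longer words also mean far fewer chance matches, since there are 4^12 (about 16.8 million) possible 12-letter DNA words. This is an illustrative Python sketch, not MacVector's algorithm, and it only reports forward-strand matches; detecting an inversion such as the one in W3110 would also require comparing against the reverse complement.

```python
from collections import defaultdict


def word_matches(seq_a, seq_b, k=12):
    """Return (i, j) start positions where seq_a[i:i+k] == seq_b[j:j+k].

    Each k-letter word of seq_b is indexed once, so every word of seq_a
    needs only a dictionary lookup rather than a scan of all of seq_b.
    """
    index = defaultdict(list)
    for j in range(len(seq_b) - k + 1):
        index[seq_b[j:j + k]].append(j)
    hits = []
    for i in range(len(seq_a) - k + 1):
        # .get() avoids inserting empty entries for words absent from seq_b
        for j in index.get(seq_a[i:i + k], ()):
            hits.append((i, j))
    return hits
```

Each (i, j) pair corresponds to one dot on the plot; runs of consecutive hits form the diagonal lines that reveal conserved regions.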

    Posted in Tips

    MacVector’s Primer Database – Importing primers from Excel

    Many molecular biologists keep lists of their primer sequences in Excel or another spreadsheet tool. Previously, MacVector had a separate utility that let you import primers kept in spreadsheet format into a Primer Database for direct use within MacVector. With the release of MacVector 18.2, we have integrated this functionality into MacVector itself.

    Rather than importing the file directly, you first open the exported file in TextEdit, then copy and paste its contents into MacVector:

    • Prepare your data in an Excel or Numbers spreadsheet with three columns – “Name”, “Sequence”, “Comment”.
    • Export the data (or Save As…) in Tab Separated Values (TSV) format or Comma Separated Values (CSV) format.
    • Open the file in TextEdit and select all the rows of text.
    • Choose Edit | Copy.
    • Switch to MacVector and select File | New From Clipboard.
    • You can also open an existing Primer Database file and paste the new entries into it.
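If your primer list lives somewhere other than a spreadsheet, the same three-column layout can be generated programmatically. Below is a minimal Python sketch using the standard csv module to produce tab-separated text with the Name, Sequence and Comment header; the primer name and sequence in the example are hypothetical.

```python
import csv
import io


def primers_to_tsv(primers):
    """Return tab-separated text with a Name/Sequence/Comment header row
    followed by one row per (name, sequence, comment) tuple."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["Name", "Sequence", "Comment"])
    writer.writerows(primers)
    return buffer.getvalue()


# Hypothetical primer list.
print(primers_to_tsv([("ExampleF", "ACGTACGTACGT", "hypothetical forward primer")]))
```

Saving the returned text to a .tsv file and opening it in TextEdit lets you follow the copy-and-paste steps above.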

    Remember that as well as scanning sequences to look for your primers, you can also automatically display primer binding sites with any sequence that you open.

    Here’s an overview of all the primer workflows in MacVector.

    [Figure: overview of the primer workflows in MacVector]

    Posted in Releases, Tips

    MacVector 18.2 is out! …and ready for macOS Monterey

    Overview

    We are very pleased to announce that MacVector 18.2 is available to download. MacVector 18.2 is a Universal Binary that runs natively on both Apple Silicon and Intel Macs. It is fully supported on macOS Sierra (10.12) through macOS Monterey (12).

    New features:

    Align to Reference Enhancements

    • The Align to Reference alignment algorithm has been overhauled to do a much better job handling larger numbers of gaps in the alignment between a reference sequence and a read.
    • The alignment algorithm has been further optimized for speed. In addition, thanks to the enhanced detection of consecutive gaps, the Sensitivity setting can now be set lower, which also speeds up calculations.
    • When aligning ABI chromatogram data, or plain sequences, the Map tab now graphically displays the “trimmed” regions at either end of the sequences.

    • There is a new Remove Gaps context-sensitive (right-click) menu option that deletes residues in reads that correspond to a gap in the consensus sequence.
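Conceptually, Remove Gaps deletes the read residues in every aligned column where the consensus has a gap, so those columns disappear from the alignment. Below is a minimal Python sketch of that idea, assuming the consensus and reads are equal-length aligned strings with "-" as the gap character; the function name is hypothetical and this is an illustration, not MacVector's implementation.

```python
def remove_gaps(consensus, reads, gap="-"):
    """Delete read residues wherever the consensus has a gap.

    consensus and every read are aligned strings of equal length; columns
    where the consensus is a gap are dropped from all of them.
    """
    if any(len(read) != len(consensus) for read in reads):
        raise ValueError("reads must be aligned to the consensus length")
    keep = [i for i, c in enumerate(consensus) if c != gap]
    new_consensus = "".join(consensus[i] for i in keep)
    new_reads = ["".join(read[i] for i in keep) for read in reads]
    return new_consensus, new_reads
```

For example, remove_gaps("AC-GT", ["ACTGT", "AC-GT"]) drops the inserted T from the first read and returns ("ACGT", ["ACGT", "ACGT"]).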

    Context-Sensitive Hamburger Menus

    Views that have a context-sensitive menu now contain a “hamburger” button (three parallel horizontal lines) that displays the same menu when clicked.

    Importing of Primer Databases in TSV or CSV Format

    You can import primer data into a MacVector Primer Database (.nsub) file from a TSV or CSV formatted file, for example one exported from Excel. This functionality replaces the old Primer Converter utility.

    Miscellaneous Enhancements

    To reduce clutter in the Assembly Project window toolbar, all of the assembly algorithms have been consolidated into a single Assemble toolbar button with a dropdown menu.

    How to update to MacVector 18.2.

    If you have an active license, then you will be prompted to automatically update within the next few days.
    You can also download the installer and do it manually now.

    Posted in Uncategorized