MacVectorTip: correctly flagging PacBio and Oxford Nanopore datasets for assembly by Flye

MacVector 17.5 introduced Flye for assembly of PacBio and Oxford Nanopore reads.

Flye joins Phrap, Velvet and SPAdes for de novo sequence assembly, alongside Bowtie2 and Align To Reference for reference assembly.

Flye is an assembler tuned for the low-quality long reads produced by the new generation of single-molecule sequencers. With typical bacterial genomes, it is fairly common for the reads to assemble into a single full-length genome contig.

Because these longer reads tend to be very error prone, MacVector also includes an optional polishing step using Racon. Polishing is the technique of correcting sequencing errors by aligning reads against the contigs produced by the first run of the assembler. Multiple rounds of polishing can further increase the accuracy of the resulting consensus.

It is important to tell MacVector what type of reads you are assembling before running Flye. Flye will be disabled unless your read files include at least one PacBio or Nanopore dataset.

This is easily done by double-clicking on the Status item after importing the reads via the Add Reads toolbar button.

MacVectorTip: quality score visualization in sequence assemblies.

Quality scores in assemblies and Align to Reference alignments can be visualized directly on the sequence by shading residues according to their quality values. Shading is available anywhere quality values are present, including de novo and reference assemblies in Assembler and Align to Reference alignments.

A Shading toolbar button lets you turn on coloring based on the quality value assigned to each residue.

The intensity of the colors indicates the phred-based quality value of each residue.

  • For individual reads, this ranges from 0 (deep red) through 20 (white) to 40 or above (deep green). The consensus scale is doubled and ranges from 0 (deep red) through 40 (white) to 80 or above (deep green).
  • Gaps are always shown with a white background. As with earlier versions of MacVector, you can “mouse-over” a residue to view the numerical information in a tooltip.
  • Edited residues are always given a phred quality value of 99 and these residues are given a blue background.
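
The shading scale described above can be sketched as a small Python function. This is only an illustration: the RGB values for "deep red", "deep green" and blue, the linear blending between them, treating a value of 99 as the marker for an edited residue, and the names `quality_color` and `_blend` are all assumptions, not MacVector's actual implementation.

```python
# Anchor colors are assumed values chosen for illustration.
DEEP_RED = (192, 0, 0)
WHITE = (255, 255, 255)
DEEP_GREEN = (0, 128, 0)
BLUE = (0, 0, 255)


def _blend(start, end, t):
    """Linearly interpolate between two RGB tuples, t in 0..1."""
    return tuple(round(a + (b - a) * t) for a, b in zip(start, end))


def quality_color(q, consensus=False, is_gap=False):
    """Map a phred quality value to a background color (illustrative sketch)."""
    if is_gap:
        return WHITE                 # gaps are always shown in white
    if q == 99:
        return BLUE                  # edited residues carry phred 99 (assumed marker)
    mid = 40 if consensus else 20    # white point: 20 for reads, 40 for the consensus
    q = max(0, min(q, 2 * mid))      # clamp to 0..40 (reads) or 0..80 (consensus)
    if q <= mid:
        return _blend(DEEP_RED, WHITE, q / mid)
    return _blend(WHITE, DEEP_GREEN, (q - mid) / mid)
```

For example, `quality_color(20)` is white for a read, while the same value in the consensus (`quality_color(20, consensus=True)`) falls halfway between deep red and white.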

    MacVectorTip: Assembling Fungal Genomes using SPAdes

    MacVector with Assembler can assemble bacterial genomes in just minutes on quite modest hardware. Currently MacVector has four de novo assembly tools (SPAdes, Velvet, Flye and Phrap).

    But what of larger genomes? It is currently impractical to run de novo assemblies of human genomes on a low-cost Mac, though RNA-Seq analyses against the human transcriptome are possible. However, here we performed some complete genome de novo assembly tests on a sample NGS experiment from Aspergillus fischeri (NCBI SRA SRR10092049). This data set consists of 2 x 18.2 million paired-end 150 nt Illumina HiSeq reads, representing 5.5 Gbp of sequence data. The assembly was run on a 2.9 GHz 6-core i9 MacBook Pro with 32 GB RAM, using SPAdes with 11 threads and slightly modified k-mer values.
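
As a quick sanity check on those figures, the data volume can be recomputed from the read count and read length. The function names below are made up for this illustration, and the fold-coverage estimate uses the 31.38 Mbp total assembly length reported later in this post as a stand-in for the genome size.

```python
def total_bases(read_pairs, read_length):
    """Total sequenced bases for a paired-end run (two reads per pair)."""
    return 2 * read_pairs * read_length


def fold_coverage(bases, genome_size):
    """Average number of times each genome position is covered."""
    return bases / genome_size


bases = total_bases(18_200_000, 150)
print(f"{bases / 1e9:.2f} Gbp")                        # 5.46 Gbp
print(f"{fold_coverage(bases, 31_380_000):.0f}x coverage")
```

The result, about 5.46 Gbp, agrees with the rounded 5.5 Gbp quoted above and corresponds to a little over 170-fold coverage of a genome of this size.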

    The complete assembly, including the optional Bowtie reference assembly step, took just over 28 hours in total. Maximum RAM usage during assembly was about 24 GB. The run generated 1,314 contigs with a total combined length of 31.38 Mbp, right in line with the reported genome size. The longest contig was 1,132,146 bp in length and the N50 was 350,491 bp.
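
For readers unfamiliar with the N50 statistic, here is a minimal sketch of the standard definition: the contig length at which contigs of that length or longer contain at least half of the assembled bases. The function name and the toy contig lengths are illustrative only and are not taken from the assembly above.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:   # this contig pushes us past half the assembly
            return length
    raise ValueError("no contigs supplied")


# Toy example: total is 290, so half is 145; 80 + 70 = 150 reaches it.
print(n50([80, 70, 50, 40, 30, 20]))  # 70
```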

    The information presented here gives you some idea of what is achievable on a modest machine using MacVector with Assembler. Assembly would be expected to be faster with a more restricted set of k-mer values, or on higher-end machines such as the Mac Pro or Mac Studio, or with much more RAM.

    MacVectorTip: Viewing external database entries for features in a sequence.

    Sequences, or regions of sequences, can be linked to external databases. The link may apply to an entire sequence entry, or to individual features, for example when annotation tools such as InterProScan are used to annotate proteins with domain or motif information. This is very useful when you want to view more detailed or updated information. Within the GenBank specification, which MacVector uses extensively, an external database entry can be stored in a /DB_XREF qualifier. This allows the database entry to be easily viewed. The GenBank (and GenPept) specifications allow many different databases to be accessed using this qualifier.
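
To make the qualifier format concrete, the sketch below pulls database/identifier pairs out of a block of GenBank feature text. It assumes the usual flat-file form `/db_xref="Database:Identifier"`; the function name, the database name "ExampleDB" and both identifiers are made-up placeholders, and this is not how MacVector itself reads the qualifier.

```python
import re

# Matches /db_xref="Database:Identifier" (case-insensitive on the qualifier name).
DB_XREF = re.compile(r'/db_xref="([^":]+):([^"]+)"', re.IGNORECASE)


def extract_db_xrefs(feature_text):
    """Return a list of (database, identifier) tuples found in the text."""
    return DB_XREF.findall(feature_text)


# Hypothetical feature entry with placeholder identifiers.
example = '''     CDS             100..400
                     /gene="exampleGene"
                     /db_xref="ExampleDB:ABC123"
                     /db_xref="InterPro:IPR000000"
'''

print(extract_db_xrefs(example))
# [('ExampleDB', 'ABC123'), ('InterPro', 'IPR000000')]
```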

    In MacVector the original database entry can easily be viewed in a web browser by selecting, then right clicking (or holding down CTRL and left clicking) the feature entry in the Features tab and viewing the available DB_XREF entries. Selecting one will load it in your web browser.

    MacVectorTip: How to Customize Window Button Toolbars

    Like many Mac applications, MacVector takes full advantage of the built-in ability to add, delete and rearrange the action buttons on window toolbars. To make these changes, right-click (or [ctrl]-click) in the gray space on any toolbar and a context-sensitive menu will appear. Choose Customize Toolbar and a dialog will be displayed with all of the buttons available for that tab, for example the Editor tab of the DNA Sequence Window.

    Note that modifying the toolbar is a global change that affects all windows containing that tab. It is also specific to each document type, so you can have different sets of buttons on the Editor toolbar of the DNA, Protein, Trace/Chromatogram and MSA document windows, for example. Once modified, the changes persist until you either customize the toolbar again or reset your MacVector Preferences.

    MacVectorTip: Understanding Color Groups

    You can align hundreds, or even thousands, of protein sequences within MacVector using three different alignment algorithms: ClustalW, MUSCLE or T-Coffee. Once the sequences are aligned, you may recognize the colorful display in the Editor tab.

    ColorGroups1

    But there’s more to this than pretty colors. The default Color Group in MacVector is one called “Chemical Type”. In this, glycine, leucine, isoleucine, valine and alanine are all considered to be of the same type, and thus are included in the same group. You can access the Color Group selector/editor by clicking on the Groups toolbar button (note that you may need to enlarge the window to see this button). The selector is also accessible from the Prefs | Consensus pane.

    ColorGroups2

    You can change the selected Color Group from about 20 built-in groups using the Color By: dropdown menu. You can edit the groups or even create your own groups. When amino acids belong to the same group, MacVector considers them to be “similar”. This affects many related functions throughout the multiple alignment interface. One example in the first image is a consensus that is a dot – none of the residues individually exceed 51% (the default identity threshold), but all belong to the same Color Group. If you select a different Color Group scheme, not only will the Editor tab update with the new colors, but the consensus will change to reflect the new groups.
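
The interaction between the identity threshold and Color Groups can be sketched as follows. This is a simplified illustration under several assumptions: only the one "Chemical Type" group named above is defined, gaps are ignored, a residue or group counts once it reaches (rather than strictly exceeds) the threshold, and a blank is returned when nothing qualifies. The function name and these rules are not taken from MacVector's source.

```python
from collections import Counter

# The one "Chemical Type" group described in the text; other groups are omitted.
GROUPS = [set("GLIVA")]


def consensus_symbol(column, groups=GROUPS, threshold=0.51):
    """Return a consensus character for one alignment column (illustrative)."""
    residues = [r for r in column if r != "-"]   # assumption: gaps are ignored
    if not residues:
        return " "
    counts = Counter(residues)
    residue, n = counts.most_common(1)[0]
    if n / len(residues) >= threshold:
        return residue                           # a single residue dominates
    for group in groups:
        in_group = sum(c for r, c in counts.items() if r in group)
        if in_group / len(residues) >= threshold:
            return "."                           # residues are "similar"
    return " "


print(consensus_symbol("AAAV"))  # A
print(consensus_symbol("GLIV"))  # .
```

Swapping in a different `groups` list changes which columns produce a dot, which mirrors how selecting a different Color Group changes the consensus line.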

    The Picture tab also handles similarities – you can shade and outline residues based on the currently selected color grouping scheme. This is controlled from the Prefs | Picture Shading tab.

    ColorGroups3

    Finally, the Text, Pairwise and Matrix tabs also respond to the currently selected Color Group to determine similarities. Prior to MacVector 18.2.5, these always used the ClustalW Default Groups similarity scheme, but now, for consistency, they honor the currently selected group.

    ColorGroups4

    Weekly Tip: Use Hash Value = 12 for speedy genome comparisons with Create Dot Plot

    MacVector’s Analyze | Create Dot Plot function can compare entire genomes very quickly, giving both an overall view of similarity (large inversions and duplications) and the ability to “drill down” to the residue level to see individual SNPs. One of the keys to ensuring the calculations complete in a reasonable length of time is to set the Hash Value to a large number, typically 11 or 12. For example, a Hash Value of 12 is a good starting point for comparing two E. coli genomes (~4.6 Mbp).
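
To see why a larger hash value helps, here is a generic sketch of word-based (k-mer) matching, the general technique behind fast dot plots. It is not MacVector's implementation: the function name is made up, it assumes the Hash Value corresponds to the word length `k`, and it only considers the forward strand.

```python
from collections import defaultdict


def kmer_matches(seq_a, seq_b, k=12):
    """Yield (i, j) positions where seq_a[i:i+k] equals seq_b[j:j+k]."""
    index = defaultdict(list)
    for i in range(len(seq_a) - k + 1):
        index[seq_a[i:i + k]].append(i)      # hash every word of seq_a
    for j in range(len(seq_b) - k + 1):
        for i in index.get(seq_b[j:j + k], ()):
            yield i, j                       # one dot on the plot


# Tiny example with k=4 so the matches are easy to follow.
print(sorted(kmer_matches("ACGTACGT", "TTACGTAA", k=4)))
# [(0, 2), (1, 3), (3, 1), (4, 2)]
```

Longer words match by chance far less often, so there are fewer candidate dots to store and draw, which is one reason large values keep whole-genome comparisons fast.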

    On a typical laptop, the calculation takes just a few seconds to run with these settings, yet the resulting plot clearly shows the well-documented inversion in E. coli strain W3110 relative to MG1655 due to a recombination between the rrnB and rrnE rRNA gene clusters.

    MacVector’s Primer Database – Importing primers from Excel

    Many molecular biologists keep lists of their primer sequences in Excel or some other spreadsheet tool. Previously, MacVector had a separate utility that allowed you to import primers kept in spreadsheet format into a Primer Database for direct use within MacVector. With the release of MacVector 18.2, we have integrated this functionality into MacVector itself.

    CDC primers

    Rather than importing the file directly, you will need to first open the exported TSV or CSV file in TextEdit, then copy and paste its contents into MacVector:

    • Prepare your data in an Excel or Numbers spreadsheet with three columns – “Name”, “Sequence”, “Comment”.
    • Export the data (or Save As…) in Tab Separated Values (TSV) format or Comma Separated Values (CSV) format.
    • Open the file with TextEdit and select all the rows of text.
    • Choose Edit | Copy.
    • Switch to MacVector and select File | New From Clipboard.
    • You can also open an existing Primer Database file and paste the new entries into it.
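
If your primers live in a script or database rather than a spreadsheet, the same three-column text can be generated programmatically. The sketch below builds tab-separated text with Python's standard csv module; the function name is made up, and including a header row is an assumption (remove that line if the header shows up as a primer after pasting).

```python
import csv
import io


def primers_to_tsv(primers):
    """Return tab-separated text with Name, Sequence and Comment columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["Name", "Sequence", "Comment"])  # header row (assumed)
    writer.writerows(primers)
    return buffer.getvalue()


# The widely used M13 sequencing primers serve as sample data.
text = primers_to_tsv([
    ("M13 forward", "GTAAAACGACGGCCAGT", "universal sequencing primer"),
    ("M13 reverse", "CAGGAAACAGCTATGAC", "universal sequencing primer"),
])
print(text)
```

The printed text can then be copied and used with File | New From Clipboard as described in the steps above.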

    Remember that as well as scanning sequences to look for your primers, you can also automatically display primer binding sites with any sequence that you open.

    Here’s an overview of all the primer workflows in MacVector.

    ScanDNA primers 1

    MacVector 18.2 is out! …and ready for macOS Monterey

    MacVector 18.2

    Overview

    We are very pleased to announce that MacVector 18.2 is available to download. MacVector 18.2 is a Universal Binary that runs natively on both Apple Silicon and Intel Macs. It is fully supported on macOS Sierra (10.12) to macOS Monterey (12).

    MV18 2 Monterey

    New features:

    Align to Reference Enhancements

    • The Align to Reference alignment algorithm has been overhauled to do a much better job handling larger numbers of gaps in the alignment between a reference sequence and a read.
    • The alignment algorithm has been further optimized for speed. In addition, the Sensitivity setting can now be lower due to the enhanced consecutive gap detection, which also speeds up calculations.
    • When aligning ABI chromatogram data, or plain sequences, the Map tab now graphically displays the “trimmed” regions at either end of the sequences.

    MV18 2TrimmingEnds

    • There is a new Remove Gaps context-sensitive (right-click) menu option that deletes residues in reads that correspond to a gap in the consensus sequence.
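
Conceptually, this operation drops every alignment column in which the consensus is a gap. The sketch below shows that idea on plain strings; it assumes the consensus and reads are already padded to the same length with "-" as the gap character, and the function name is hypothetical rather than anything MacVector exposes.

```python
def remove_consensus_gaps(consensus, reads, gap="-"):
    """Drop alignment columns where the consensus is a gap (illustrative)."""
    keep = [i for i, base in enumerate(consensus) if base != gap]
    new_consensus = "".join(consensus[i] for i in keep)
    new_reads = ["".join(read[i] for i in keep) for read in reads]
    return new_consensus, new_reads


# The third column is a gap in the consensus, so the extra "T" in read 1 is removed.
print(remove_consensus_gaps("AC-GT", ["ACTGT", "AC-GT"]))
# ('ACGT', ['ACGT', 'ACGT'])
```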

    MV18 2 CloseGaps

    Context Sensitive Hamburger Menus

    Where a window has a context-sensitive menu available, the view now contains a “hamburger” button (three parallel horizontal lines) that displays the same context-sensitive menu when clicked.

    Importing of Primer Databases in TSV or CSV Format

    You can import primer data into a MacVector Primer Database (.nsub) file from a TSV or CSV formatted file, such as one exported from Excel. This functionality replaces the old Primer Converter utility.

    CDC primers

    Miscellaneous Enhancements

    To reduce clutter in the Assembly Project window toolbar, all of the assembly algorithms have been consolidated into a single Assemble toolbar button with a dropdown menu.

    How to update to MacVector 18.2.

    If you have an active license, then you will be prompted to automatically update within the next few days.
    You can also download the installer and do it manually now.

    Compare a pair of genomes

    In recent years there has been an explosion of whole-genome sequencing projects. One common question coming out of this has been to ask:

    “Exactly what are the genetic differences between my sequenced organism and another related strain?”

    MacVector to the rescue! MacVector’s Compare Genomes By Feature… tool lets you see the differences between two annotated genomes in fine detail.

    CompareGenomes 1

    The algorithm takes every annotated feature from the source genome and looks for the presence of that feature in the comparison genome based on sequence similarity. CDS features are even translated so that the predicted amino acid sequences are compared. The results are then tabulated to show identical, closely related, and weakly related features in separate tabs, with additional tabs for features that are completely missing and a “details” tab that shows the low-level alignment details for any matching pair of features. Hot-links in the result tabs let you quickly scroll the parent sequences to any individual feature of interest.

    How to compare a pair of genomes

    This tool compares two related annotated genomes (or smaller sequences) to identify and list, in spreadsheet form, identical, similar and weakly similar features along with missing features.

  • Open the pair of sequences you want to compare.
  • Choose Analyze | Compare Genomes By Feature…
  • Choose the feature types you want to compare and the target sequence.

    CompareGenomes dialog

  • Click OK.
  • When the job has completed, you will be presented with the Filter dialog. Normally the defaults will be suitable. However, if the genomes are very similar you may want to increase the Similarity Threshold.
  • Click OK.

    CompareGenomesFilter

    A results window will appear with the following tabs:

    Identical, Similar, Weak, Missing, Details, Plot, Context

    CompareGenomesResults

    The first three tabs group features by how similar they are between the two genomes. The cutoff between the Similar and Weak tabs is the Similarity Threshold set in the Filter dialog.

    Identical lists all of the features that are perfectly conserved between the two genomes based on sequence identity, even if the names and qualifiers are different. CDS features are translated and the amino acid sequences compared, so there may be silent mutation differences in the encoding DNA sequences.

    Similar shows matches that are not identical but match or exceed the Similarity Threshold.

    Weak lists all the remaining matches that exceeded our initial search criteria but were not sufficiently similar to be included on the Similar tab.

    Missing lists features that are completely absent in the second sequence.

    Details displays feature alignments when you click on a hotlink in the first three tabs.

    Plot shows a dot plot of your pair of sequences so you can visualize the relationship between the pair of genomes.

    Context shows the alignment between the pair of genomes.
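
The way matches are sorted into the first four tabs can be summarized in a few lines of Python. This is only a sketch of the rules described above: it assumes identity is expressed as a percentage, that `None` stands for a feature with no match passing the initial search, and that the function name and the threshold value of 90 are purely illustrative.

```python
def classify_match(identity, similarity_threshold):
    """Return the results tab a feature would be listed under (illustrative)."""
    if identity is None:
        return "Missing"                 # nothing found in the target genome
    if identity >= 100.0:
        return "Identical"               # perfectly conserved
    if identity >= similarity_threshold:
        return "Similar"                 # matches or exceeds the threshold
    return "Weak"                        # found, but below the threshold


for score in (100.0, 97.5, 62.0, None):
    print(score, classify_match(score, similarity_threshold=90.0))
```

Raising the threshold moves borderline features from Similar to Weak, which is why a higher value is suggested when the two genomes are very similar.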

    CompareGenomesPlot

    The format of the results tabs

    For the first three tabs the format is similar. The first five columns are the “name”, type, start, stop and strand of the feature in the parent sequence, i.e. the sequence that was frontmost when you invoked the search. The “name” is the label that appears in the Map tab for the feature. By default, for CDS features, this is the /gene= qualifier, but this can be configured on an individual feature basis or for all features of a type. The rightmost columns provide the same information for the feature(s) that matched on the target genome, with the addition of an extra Match Score column. This displays the DNA identity score for each pair of features along with (in brackets) the identity score for the predicted amino acid translation of CDS features, given the current default genetic code.

    Note that features that are duplicated in the target genome will show additional matches.

    Note that when multiple matches are found and one of them is a 100% match, all of the matching features are shown in the match list, even if they do not also have 100% identity. This approach ensures that you are always aware of duplicated genes or pseudogenes with significant but non-identical matches.

    The display is highly interactive and you can click on any blue hotlink to view more information about it.

    For example, if you click on a hotlinked feature name in the first column, the parent sequence document is brought frontmost, switches to the Features tab, and scrolls to and highlights the corresponding feature. You can use this shortcut to quickly jump to any feature of interest.

    CompareGenomesFeature

    Alternatively, if you click on a gene name in the target genome columns, the window switches to the Details tab and shows the sequence alignment between the two features.
