Automatic Assembly of Sub-projects with Phrap (Sub-Assemblies)

New to MacVector 18.6 is the ability to sort and assemble reads from different datasets into individual sub-projects. This functionality is located in the Phrap parameters dialog. When enabled and configured appropriately for your dataset, it automatically breaks out the input reads into sub-projects that are assembled separately.

A simple pattern-matching text box lets you define which characters in the input filenames should be treated as project names, and which should be treated as read names. After assembly, contigs can be exported (to a variety of file formats, including FASTA and FASTQ) with the project name retained in the contig names.

This function can be a great time saver if you do a lot of related small sequencing projects as long as you use a well-defined naming convention.

Pattern Matching


The reads in your datasets must follow a defined naming standard. You construct a pattern that matches both the project name and the read name, using a small set of characters that mark which parts of each filename belong to the project name and which belong to the read name. As you type the pattern in the dialog, the sub-project names are updated dynamically to show what the sub-projects will be called. These characters are:

  • P – a single character to be included in a project name.
  • X – a single character to be excluded from the project names (typically these would be the read names).
  • “-”, “_” or “.” – separators. If present in the pattern they MUST be in the filenames (you can add more separator characters in the dialog).
  • p – one or more characters to be included in the project name. Extends to the next separator or to the end of the filename.
  • x – one or more characters to be excluded from the project name. Extends to the next separator or to the end of the filename.


This is best demonstrated with an example. Here we have a sequencing dataset called BASENAME. Each individual sample that had been sequenced was numbered 1000 to 1100. Typical read names are:

List of read names

  • BASENAME-1001g07_0x00.s01_1.scf
  • BASENAME-1001g07_0x00.s02_1.scf
  • BASENAME-1003g07_1a03.m22_2.scf
  • BASENAME-1003g07_1b06.m23_1.scf
  • BASENAME-1005g07_2c07.s01_1.scf
  • BASENAME-1005g07_0x00.s01_1.scf

Your pattern for this could be:

PPPPPPPP-PPPPxxxx

We can break this down as follows for the first read name:

BASENAME-1001g07_0x00.s01_1.scf
  • PPPPPPPP – the main name up to the separator (BASENAME)
  • “-” – the separator
  • PPPP – the number of the individual sample (1001)
  • xxxx – the first x excludes all characters up to the next separator (g07), the second x excludes the next set of characters up to the following separator (0x00), and so on.

The above set of reads would produce the following three sub-assemblies:

  • BASENAME1001
  • BASENAME1003
  • BASENAME1005
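
The pattern rules above can be illustrated with a small Python sketch. This is not MacVector's implementation: the function names `project_name` and `group_reads` are hypothetical, and the sketch assumes that a separator sitting between two consecutive p/x tokens is stepped over implicitly and that any part of the filename left after the pattern ends is ignored, which is how the worked example appears to behave.

```python
from collections import defaultdict

SEPARATORS = "-_."  # the default separators listed above


def project_name(pattern, filename, separators=SEPARATORS):
    """Return the project name for a filename, or None if it does not fit."""
    name = []
    i = 0  # current position in the filename
    for tok in pattern:
        if tok in separators:
            # A separator in the pattern must be present in the filename.
            if i >= len(filename) or filename[i] != tok:
                return None
            i += 1
        elif tok in "PX":
            # Exactly one non-separator character, kept (P) or dropped (X).
            if i >= len(filename) or filename[i] in separators:
                return None
            if tok == "P":
                name.append(filename[i])
            i += 1
        elif tok in "px":
            # Assumption: if we are sitting on a separator, step over it.
            if i < len(filename) and filename[i] in separators:
                i += 1
            start = i
            # One or more characters, up to the next separator or the end.
            while i < len(filename) and filename[i] not in separators:
                i += 1
            if i == start:
                return None
            if tok == "p":
                name.append(filename[start:i])
        else:
            return None  # unknown pattern character
    return "".join(name)


def group_reads(pattern, filenames):
    """Group filenames by project name (non-matching files go under None)."""
    groups = defaultdict(list)
    for fn in filenames:
        groups[project_name(pattern, fn)].append(fn)
    return dict(groups)


READS = [
    "BASENAME-1001g07_0x00.s01_1.scf",
    "BASENAME-1001g07_0x00.s02_1.scf",
    "BASENAME-1003g07_1a03.m22_2.scf",
    "BASENAME-1003g07_1b06.m23_1.scf",
    "BASENAME-1005g07_2c07.s01_1.scf",
    "BASENAME-1005g07_0x00.s01_1.scf",
]

for project, reads in group_reads("PPPPPPPP-PPPPxxxx", READS).items():
    print(project, len(reads))
```

Under these assumptions the six example reads fall into the three sub-projects listed above, with two reads each. A pattern whose separator is not present in the filenames (for example `PPPPPPPP_PPPPxxxx`) matches nothing, which mirrors the rule that separators in the pattern MUST be in the filenames.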

How to sort reads into sub-projects

  1. Choose File | NEW | ASSEMBLY PROJECT.
  2. Click ADD SEQS to add your dataset.
  3. Click ASSEMBLE | PHRAP.
  4. Click the Sub-Assemblies tab in the Phrap dialog.
  5. Toggle the Enable Sub-assemblies setting to on.
  6. Ensure your separator character is listed in the Valid Separators box.
  7. Construct a suitable matching pattern (see above).
  8. Click OK.

MacVector 18.6 was released in July 2023. This release adds one-click optimization of CDS coding regions, automatic phrap sub-project assembly, direct support of .csv/.tsv files for Primer Database, inclusion of graphical information in GenBank exports and numerous tweaks and improvements to many workflows.

A DNA sequence having a CDS feature optimized for expression in a different organism. Background is macOS Sonoma.

One click Codon Optimization of CDS Features

Our latest release, MacVector 18.6, has a new tool that directly optimizes the codon usage of CDS features for enhanced expression in a different organism.

The new tool pulls together multiple tools into a one-step procedure, which you run by selecting a CDS feature in your nucleic acid sequence and choosing Analyze | Optimize Codon Usage for CDS… You will need to choose an appropriate codon usage table (.bias file) for expression.

After optimization, a new feature is annotated on the sequence showing that the CDS has been optimized, which algorithm was used and which codon usage table was used. It also records the user who made the modification, the date, and the sequence before and after the action.

A few windows showing a DNA sequence being optimized for expression in a different organism

How to optimize codon usage for a CDS feature

  • Select a CDS feature in the Map or Features tab of a nucleic acid sequence.
  • Choose Analyze | Optimize Codon Usage for CDS…
  • Choose the codon usage table (.bias file) to use, along with the genetic code and the optimization algorithm.
  • Either apply the results to the CDS feature or just view the proposed changes.

You will need a codon usage table (.bias) for the organism that the CDS will be expressed in. A number of common tables are shipped with MacVector, but we can generate new ones on request. A future release of MacVector will generate codon usage tables automatically.

Codon Usage optimization algorithms

There are four different algorithms that MacVector provides for optimizing codon usage.

  • Most Frequently Used Codon – this simply uses the most commonly occurring codon for each amino acid. So if, e.g. the most common Leu codon is CTC, all Leu codons will be CTC. Perhaps this is only useful if you want to design a “best guess” primer and are willing to accept a certain failure rate. If you used this to optimize expression, the host would likely run out of that tRNA and you wouldn’t see optimal expression.
  • Frequency Distribution – this selects a random codon for each amino acid, biased towards the most commonly used codon that encodes each amino acid. Each time you run the algorithm, a different, random set of codons will be selected. If you were to generate a new DNA over and over again, eventually this would create a collection of sequences where the average codon usage would exactly match the average for the .bias organism, but any individual reverse translation may randomly be quite different.
  • Probability Distribution – this is probably the most powerful setting if you are interested in expression. Similar to the Frequency Distribution, this chooses a random codon, biased towards the most frequently used codons for each amino acid. However, this version tries to ensure that the final DNA sequence has a codon usage profile matching the codon usage of the selected .bias file as closely as possible. Because the overall codon usage in the DNA sequence is kept as close as possible to the codon usage in the .bias organism, this should, in theory, give you the best chance of high expression. Again, you will get a different sequence each time you invoke this.
  • Uniform Distribution – this ignores the usage of each codon and randomly assigns an appropriate codon for each amino acid. It’s similar to the default algorithm that uses ambiguities to create an “absolute” coding DNA, but here it just chooses a random codon with no regard for codon usage probability. Again, you will get a different sequence each time you invoke this.
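
To make the difference between the four strategies more concrete, here is a minimal Python sketch. It is only an illustration, not MacVector's code. The `USAGE` table holds made-up codon counts for three amino acids rather than a real .bias file, all function names are hypothetical, and the quota-based approach used for `probability_distribution` is one plausible reading of "match the table as closely as possible".

```python
import random
from collections import Counter

# Made-up codon counts for three amino acids (NOT a real .bias table).
USAGE = {
    "L": {"CTG": 50, "CTC": 30, "TTA": 20},
    "K": {"AAA": 75, "AAG": 25},
    "M": {"ATG": 100},
}
CODON_TO_AA = {c: aa for aa, table in USAGE.items() for c in table}


def translate(dna):
    """Translate DNA back to protein using the toy table."""
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


def most_frequent(protein):
    """Always pick the single most common codon for each amino acid."""
    return "".join(max(USAGE[aa], key=USAGE[aa].get) for aa in protein)


def frequency_distribution(protein, rng):
    """Pick each codon at random, weighted by its usage count."""
    out = []
    for aa in protein:
        codons = list(USAGE[aa])
        weights = [USAGE[aa][c] for c in codons]
        out.append(rng.choices(codons, weights=weights, k=1)[0])
    return "".join(out)


def uniform_distribution(protein, rng):
    """Pick each codon at random, ignoring usage entirely."""
    return "".join(rng.choice(list(USAGE[aa])) for aa in protein)


def probability_distribution(protein, rng):
    """Assumed approach: give each codon a quota proportional to its usage,
    then deal the codons out in random order."""
    pools = {}
    for aa, n in Counter(protein).items():
        table = USAGE[aa]
        total = sum(table.values())
        quota = {c: n * cnt // total for c, cnt in table.items()}
        # Hand out any leftover slots by largest remainder.
        leftover = n - sum(quota.values())
        ranked = sorted(table, key=lambda c: (n * table[c]) % total, reverse=True)
        for c in ranked[:leftover]:
            quota[c] += 1
        pool = [c for c, q in quota.items() for _ in range(q)]
        rng.shuffle(pool)
        pools[aa] = pool
    return "".join(pools[aa].pop() for aa in protein)


protein = "M" + "L" * 10 + "K" * 4
rng = random.Random()
print(most_frequent(protein))
print(frequency_distribution(protein, rng))
print(probability_distribution(protein, rng))
print(uniform_distribution(protein, rng))
```

With this toy table, `most_frequent` always uses CTG for every Leu, while `probability_distribution` always yields five CTG, three CTC and two TTA codons for the ten Leu residues, just in a different order on each run. The other two functions vary in both order and composition.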


Importing Sequencher project files into MacVector

Assembler is a plugin for MacVector that provides comprehensive sequence assembly functionality. Assembler is fully integrated into MacVector and allows you to manage sequencing data with the familiar MacVector style. You can design primers directly on a contig or BLAST that contig to identify it.


MacVector’s Assembly Project manager

Like Sequencher, MacVector with Assembler has the Assembly Projects manager which provides a simple interface where you can import and drag and drop files for Assembly.

The Assembly Project manager allows you to manage reference sequences, trace files, multiple datasets, large sequencing datasets in FASTQ format and assemblies. You can also import existing assemblies in BAM/SAM format and work with unaligned reads from an assembly directly in the project. The Assembly Project manager also allows you to compare multiple assemblies with expression level comparison and to map multiple datasets against the same reference.
However, unlike Sequencher, MacVector lets you have multiple projects open at one time.

Importing Sequencher project files into the Assembly Project manager

You can import Sequencher’s .spf files directly into an Assembly Project.

To import a Sequencher assembly project

  • Choose File | Open….
  • Select the .spf file you wish to import and click on the Open button.
  • The assembly will be imported into a new Assembly Project.
  • Choose File | SAVE AS… to save the imported assembly into a new Assembly Project.


The following screenshots show some Sequencher projects imported directly into MacVector.

SequencherFileinMV 90 spf
SequencherfileinMV 2838 Full Donor

Because Sequencher relies heavily on manual editing to optimize assemblies (generally not needed with MacVector/Assembler), many Sequencher project files are non-standard. If you ever come across a Sequencher project file that will not import correctly, please contact MacVector Support and we will resolve the problem for you.

Sequencher does not run on macOS 10.15 Catalina or later


Sequencher will not run on macOS 10.15 Catalina or any later version of macOS. The root of this problem is that Sequencher 5.4.6 (the latest version for the Mac) is a 32-bit application and is particularly reliant on an ancient “Carbon” OS 9-compatibility framework that Apple has been telling developers for many years would be phased out. That day has finally arrived: until Sequencher is completely rewritten for the latest macOS, you cannot upgrade your Mac to Catalina or later and continue to run Sequencher.

MacVector with Assembler has been fully 64-bit for many years and has no dependency on the “Carbon” framework. It is fully supported on macOS High Sierra to macOS Ventura (and shortly macOS Sonoma) and runs natively on both Apple Silicon Macs and Intel Macs.
Assembler adds the Assembly Projects manager and a powerful sequence assembly toolkit to MacVector. It directly imports Sequencher project files, has equivalent functionality and is very easy to use. It provides almost all of the functionality of Sequencher within an updated modern interface, along with a host of additional functionality and integration with all of the DNA analysis tools within the core MacVector application.

Modern DNA sequencing has changed dramatically. Upgrade to a state of the art sequence assembly tool that’s kept ahead of the game!

We offer a 50% discount to all Sequencher users who wish to upgrade to MacVector with Assembler.


Sequence Assembly: What can Assembler do for my lab?

Assembler is fully integrated into MacVector and allows you to manage sequencing data with the familiar MacVector ease.

De novo sequence assembly: assemble reads using Phrap, Velvet and SPAdes, with Flye for PacBio and Oxford Nanopore.

Reference Sequence Assembly: Map millions of reads against genomes, transcriptomes or other reference sequences using Bowtie2.


Compare Genomes: Compare two related annotated genomes to see common or missing genes.

Coverage Tab: compare multiple assemblies with expression level comparison.

Variant Calling: SNPs and INDELS are visualised on your assembly and supplied in VCF.

Bacterial genome tools: Tools for finishing bacterial genomes, including circularizing genomes.

Easy to use interface. Navigating around your assemblies has never been easier. Display an entire contig in the graphical Map and select a read to zoom straight to that region. Click on a base in a contig and see coverage and variants.


Heterozygote analysis. Analyse heterozygotes in Sanger trace files and assemblies.

Assembly Project manager makes it easy to assemble multiple datasets, reference sequences and assemblies. Work with unaligned reads from an assembly directly in the project.


Compare different assemblies. Map multiple datasets against the same reference.

RNA-Seq analysis with read depth visualization and per gene coverage data (RPKM & TPM).


Setting the Numbering Origin

Preserving sequence numbering is particularly useful if you want to work on a smaller more manageable region of a large chromosome but wish to retain the original numbering. When you copy a section of a larger sequence and paste the copy into a new MacVector sequence window (or use FILE | NEW FROM CLIPBOARD), the original numbering is retained.


  • Open pBR322 (in the SAMPLE FILES folder in the MacVector application folder).
  • Either:
    • in the EDITOR tab select the Features popup menu and select the tetracycline resistance CDS.
    • or

    • click on the TET feature in the MAP tab.

    This selects the region from 86 to 1276 in both the EDITOR and MAP tabs.

  • Now choose Edit | Copy, followed by File | New from Clipboard. A new window appears with the numbering origin set to 86. (You can also accomplish this by choosing File | New | Nucleic Acid and then Edit | Paste into the new window).
  • If you want to quickly reset the origin to “1”, you can right-click (or [ctrl]-click) in the sequence area to bring up a context-sensitive menu and choose Reset Origin to 1.

    Changing the numbering of existing sequences

    You can also easily change the numbering of a sequence by selecting then dragging the small red cross that usually appears at the beginning of the sequence.
    Dragging the cross to another location designates that as the “plus 1” residue – all residues before that position will be given negative numbers.
    You can also set the first residue to a positive number. To set this, double-click on the red cross and enter a new start value in the sheet that appears.


    Setting the Circular Origin

    If you are working with a circular sequence, you can change the location where the sequence is “split” in the editor.

    Right-click the sequence and choose SET CIRCULAR ORIGIN from the context-sensitive menu that appears. This also changes the Map tab so that the new position is located at 12 o’clock.

    In the MAP tab you can also select a restriction enzyme site and right-click to choose Set Circular Origin.


    MacVectorTip: Selecting the sequence from a single restriction enzyme site to the end of a linear sequence

    Seeing the distance between any two points on a sequence is easy. For example, select one restriction enzyme site, hold down SHIFT and select the second. The start, stop and length will be shown in the Range Selector (top right corner of every window – see images below). But if you want to see the distance from a restriction enzyme site to the end (or beginning) of the sequence, it is slightly more difficult.

    You have two options.

    • Select the RE site in the MAP tab.

    • Switch to the EDITOR tab.

    • Hold down SHIFT then click right at the end of the sequence. You can also hold down SHIFT, press the RIGHT cursor key, then the DOWN cursor key.


    You can also do this entirely within the MAP tab, although this is slightly more complicated because you need to change the type of cursor.

    • In the Graphics Palette, select the SELECT SEQUENCE cursor button.

    • Now you can drag and select sequence in the MAP tab, exactly as you can do in the EDITOR tab with the other cursor.

    Both of these options will select from the restriction enzyme site right to the end of the sequence and show the length in the top right corner.


    MacVectorTip: How to Customize the Toolbars of MacVector windows

    Like many Mac applications, MacVector takes full advantage of macOS’s ability to add, delete and rearrange the action buttons on window toolbars. To make these changes, right-click (or [ctrl]-click) in the gray space on any toolbar and a context-sensitive menu will appear. Choose Customize Toolbar and a dialog will be displayed with all of the buttons available for that tab, like this one for the Editor tab of the DNA Sequence Window.


    Note that modifying the toolbar is a global change that affects all windows containing that tab. It is also specific to different document types, so you can have different sets of buttons on the Editor toolbar of the DNA, Protein, Trace/Chromatogram and MSA document windows for example. Once modified, the changes remain permanently until you either customize them again, or reset your MacVector Preferences.

    You can also customize the Analyses Toolbar with your most frequently used tools. This is a global change that applies to all windows.


    How to call heterozygotes in trace files or Assembly Projects

    In our latest release, MacVector 18.5, we added a new tool to call heterozygotes in sequencing reads.

    The heterozygote analysis tool allows you either to view heterozygotes in Sanger trace files or to permanently change the basecalled sequence, replacing each called heterozygote with an ambiguity. The tool works on multiple trace files in the Assembly Project manager or the Align to Reference editor. You can also run it on a single trace file in the Single Trace Editor.


    How to run Heterozygote Analysis

    To view putative heterozygotes

    Select trace files in the Align to Reference editor or Assembly Project editor, or open a trace file in the Single Trace Editor.
    1. Run ANALYZE | HETEROZYGOTE ANALYSIS.
    2. An Options dialog will appear. Change the options and click OK.
    3. A summary dialog will appear showing the number of heterozygotes found and the number of sequences they were found in.
    4. Click OK.
    5. A new tab will appear in the Results window showing the location of the possible heterozygotes.
    You can click on a highlighted blue position to be taken to the heterozygote. Note that if you are in the Assembly Project or Align to Reference editors, the heterozygote will be displayed in the Single Trace Editor.

    To permanently basecall the sequence

    Assembler

    1. Add trace files to your Assembly Project
    2. Select one or more trace files
    3. Select ANALYZE | BASECALL | USING HETEROZYGOTE ANALYSIS
    4. An Options dialog will appear. Change the options and click OK
    5. A summary dialog will appear showing the number of heterozygotes found across how many sequences
    6. Click OK

    Align to Reference

    1. Go to FILE | NEW | ALIGN SEQUENCES TO A REFERENCE
    2. Choose your reference sequence
    3. Add trace files to your Assembly Project
    4. Select one or more trace files
    5. Select the BASECALL toolbar button or ANALYZE | BASECALL | USING HETEROZYGOTE ANALYSIS
    6. An Options dialog will appear. Change the options and click OK
    7. A summary dialog will appear showing the number of heterozygotes found across how many sequences
    8. Click OK

    Trace File Editor

    1. Double click to open a trace file.
    2. Select ANALYZE | BASECALL | USING HETEROZYGOTE ANALYSIS
    3. An Options dialog will appear. Change the options and click OK
    4. A summary dialog will appear showing the number of heterozygotes found.
    5. Click OK
    Once the analysis has run, a new BASECALL line shows the new sequence. In the Assembly Project editor an H will appear in the status column. Heterozygotes are indicated by an ambiguity.
    Ensure the BASECALL toolbar button is toggled to green to view basecalled lines.

    Settings

    The default settings should be enough for the majority of trace files. However, there are a number of settings that can be adjusted where needed. The default value is in brackets.
    • Normalise peak heights using a window of (25) residues
    • Percent of each peak width to use (50%)
    • Minimum number of normalised residues (3)
    • Minimum Heterozygote threshold (35%)
    • Minimum Base Call threshold (75%)
    • Ignore low quality regions (yes)
    • Where a window of (21) residues
    • Has an average quality value of less than (20)
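
As a rough illustration of how a heterozygote threshold can turn two overlapping peaks into an ambiguity, here is a minimal Python sketch. It is not MacVector's algorithm. It assumes that the threshold is applied to the ratio of the second-highest to the highest peak at a position, and it leaves out normalisation and quality filtering entirely. The function name `call_base` and the peak values are hypothetical, while the two-base ambiguity codes are the standard IUPAC ones.

```python
# Standard IUPAC ambiguity codes for two-base mixtures.
IUPAC = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}


def call_base(peaks, het_threshold=0.35):
    """Call one position from its four peak heights.

    Assumption: a heterozygote is called when the second-highest peak is
    at least `het_threshold` (35% by default) of the highest peak.
    """
    ranked = sorted(peaks, key=peaks.get, reverse=True)
    first, second = ranked[0], ranked[1]
    if peaks[first] > 0 and peaks[second] / peaks[first] >= het_threshold:
        return IUPAC[frozenset((first, second))]
    return first


# Hypothetical peak heights at two positions of a trace.
print(call_base({"A": 1000, "C": 20, "G": 400, "T": 10}))  # G is 40% of A
print(call_base({"A": 1000, "C": 20, "G": 300, "T": 10}))  # G is 30% of A
```

In this sketch the first position is called as the ambiguity R (A/G) and the second stays as A, because its secondary peak falls below the 35% threshold.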


    MacVectorTip: Changing the font used in the Editor, Results and other windows


    MacVector is very customizable in how you can produce graphical maps of sequences, assemblies, alignments and more. You can also change the default appearance of MacVector itself. The font used in the Editor and Results windows can be changed and increased in size. You are limited to fixed-width fonts (such as Andale Mono and Courier), as otherwise sequences will not align properly. Additionally, you can change the size of the font used in the Graphics Palette. All very useful for very large monitors to avoid squinting at the screen!

  • Open MacVector | Settings…
  • Choose Font.
  • Change the settings and close the Settings dialog.

    MacVectorTip: macOS Dark Mode and forcing MacVector to open in Light Mode

    “Dark Mode makes it easier to stay focused on your work because your content stands out while darkened controls and windows recede into the background”

    If you use Dark Mode with the Auto setting, then with the short winter days in the northern hemisphere (for our southern hemisphere friends, please save this email for six months!) you will be spending a lot more time in Dark Mode than during those long summer days.

    Screenshot: a plasmid map shown in Dark Mode

    Using Dark Mode is a very different user experience and may not be to every user’s taste. For example, as can be seen in the screenshot above, plasmid maps with dark backgrounds can be a little “odd” at first. So if you like working with Dark Mode, but would prefer MacVector to use Light Mode, then read on.

    You have always been able to change this with a command line to change the preferences. However, in the last minor release (MacVector 18.5.1) we exposed this setting in the MacVector Preferences so you do not need to use Terminal.

    To force MacVector to always open in Light Mode

    This new setting is in MacVector | Settings | General | Always use Aqua


    You will need to restart MacVector.

    If you are using an older version then you can still do this from the command line.

  • Close MacVector.
  • Open Terminal and type or copy/paste the following command into the Terminal window:
  • defaults write com.macvector.MacVector NSRequiresAquaSystemAppearance -bool YES
  • When you start MacVector again, it should be running in Light Mode.
  • If you want to reset that preference, open Terminal and type/paste this command:
  • defaults delete com.macvector.MacVector NSRequiresAquaSystemAppearance

    To support Dark Mode we made a considerable number of design changes to MacVector’s user interface, so that toolbar buttons and Map tab colors suit both Dark and Light modes. Nonetheless MacVector’s default colors were originally designed with Light Mode in mind so the colors may not always be ideal for your needs. Do remember that MacVector gives you a lot of control over the default appearance of the display. See the following preferences pane: MacVector | Preferences | Color
