The 40th anniversary of the Mac!

So it is forty years since Steve Jobs walked onto the stage and announced the Mac!

MacVector did not come about until six years later, when MacVector 1.0 was released in March 1990. But we are still proud that MacVector has now been running on the Mac for over thirty years, and especially that MacVector 18.6 (our latest release) will still open files created by that very first version.

Here is the very first version of MacVector running on a virtual machine of Mac OS 8.

  • “MacVector: an integrated sequence analysis program for the Macintosh”. Olson SA. Methods Mol Biol. (1994) 25,195–201.
Posted in General, Releases

MacVectorTip: Grayed out graphics indicate Missing Features

If the graphics in a nucleic acid sequence Map tab appear somewhat “washed out” it is because the graphic items represent common features that MacVector has found that are not annotated on the sequence. For example, here are the Map and Feature tabs of an unannotated cloning vector.

You can see a number of features on the Map tab, but the Features tab is completely empty. The graphics indicate common features that MacVector has identified that have not been annotated on the sequence. If you select one of the features in the Map tab and right-click (or [ctrl]-click) there is an option in the resulting context sensitive menu to Add CDS Feature. When that is selected, the feature takes on a bold appearance and a new annotation appears in the Features tab.

If you wish, you can select multiple missing features and then add them all with a right-click. Or you can select the Results | Missing Features tree view item in the floating Graphics Palette to select all missing features and then add with a right-click.

Note that the automatic display of missing features is controlled by the MacVector Preferences | Scan DNA tab. From there you can control how they are identified or even point the algorithm to your own folder of annotated sequences to be a source for the missing features.

Posted in Tips

MacVector and macOS Sonoma

Apple released macOS Sonoma yesterday (Tuesday 26th Sept 2023).

As usual in the run up to a new macOS release, we have been testing MacVector on development builds of macOS Sonoma. Unfortunately, at a late stage we found an undocumented change that affects a few tools in MacVector.

MacVector uses a number of third-party tools to align and assemble sequences. Disappointingly, in an undocumented change Apple removed the function that we were using to run these third-party tools (“fork()”). We had already started migrating these tools to use a different function (“posix_spawn()”), but we had not completed these changes when we released MacVector 18.6 back in July. So for once, Apple’s removal of this function really did catch us on the hop!

Luckily, since we had mostly completed the work, we were able to finish the transition and test it. We are very pleased to announce that MacVector 18.6.1 was released yesterday and is fully supported on macOS Sonoma.
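
As a rough illustration of the kind of change described above, here is a minimal Python sketch that launches an external program with posix_spawn() instead of a fork()/exec pair. This is not MacVector's own code, which is not shown in this post. The helper name run_tool and the use of the Python interpreter as a stand-in for an alignment tool are assumptions made purely for the example.

```python
import os
import sys


def run_tool(argv):
    """Launch an external program with posix_spawn and return its exit code.

    Hypothetical helper: argv[0] must be the full path to the executable.
    """
    # posix_spawn creates the child process and starts the new program in a
    # single call, so the parent never has to run code in a forked copy.
    pid = os.posix_spawn(argv[0], argv, os.environ)
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    # The child was terminated by a signal; report it as a negative code.
    return -os.WTERMSIG(status)


if __name__ == "__main__":
    # Stand-in for a third-party tool: the Python interpreter itself.
    code = run_tool([sys.executable, "-c", "print('alignment finished')"])
    print("exit code:", code)
```

The caller only sees an exit code, so from the rest of the application's point of view the tool behaves the same whichever function starts it.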

MacOS Sonoma MV1861

As ever this minor release is free for anybody who is already running MacVector 18.6 or whose maintenance was valid on 1st July 2023. You should be automatically prompted to upgrade to MacVector 18.6.1, but if not then go to MACVECTOR | CHECK FOR UPDATES… to update.

If you think you are eligible for MacVector 18.6 but it will not run then please contact MacVector Support. It may just be that you have not entered your updated license activation details.

If your maintenance expired before 1st July 2023 and if you have updated to macOS Sonoma then MacVector will still run fine except the following tools will no longer work:

  • Multiple Sequence alignments
    • ClustalW
    • Muscle
    • T-Coffee
  • Sequence Assembly
    • Phred and Phrap
    • Bowtie2
    • Flye
    • Velvet
    • SPAdes
  • Compare Genomes
  • Exporting sequences to text based formats
    • FASTA, GenBank, RSF, MSF, etc.

When you run one of these the analysis will start but never finish.

All other tools and functionality of MacVector 18.5 and earlier versions are untouched by the issue.

If your maintenance expired before 1st July 2023 and you have not yet updated to macOS Sonoma then you have three options:

  1. Do not upgrade to macOS Sonoma. All your tools will continue to work as normal.
  2. If you want to upgrade to macOS Sonoma and do not use Assembler or multiple sequence alignments, then MacVector 18.5 will continue to work.
  3. If you want to upgrade to macOS Sonoma and use Assembler or multiple sequence alignments, then you must also upgrade MacVector.

Be aware that MacVector 18.5 is only officially supported on macOS Ventura to macOS High Sierra and was only tested on those releases. So if you upgrade to macOS Sonoma we still recommend that you also upgrade MacVector.

For versions of MacVector before MacVector 18.6 you can check compatibility on a table which we update after every official release of macOS.

macOS compatibility of older versions Table

For versions of MacVector released over the past few years it is likely that they will work fine on macOS Sonoma except for the aforementioned tools. Our developers strive to future proof MacVector, and it is only when Apple make significant changes that older versions may stop working.

Posted in Releases

Viewing and applying individual putative heterozygotes

The heterozygote analysis tool allows you to either view heterozygotes in Sanger trace files or to permanently change the basecalled sequence with an ambiguity representing the called heterozygote. The tool works on multiple trace files in the Assembly project manager or the Align to Reference editor. You can also run it on a single trace file in the Single Trace Editor. Rather than applying every called heterozygote at once, you can also review and apply them individually.

Most windows in MacVector are linked. For example, if you open multiple windows using the REPLICA button and then click on a gene feature in the MAP tab, the EDITOR tab will scroll to and highlight that gene. This is very useful for navigating around larger sequences, and is especially useful for large Align to Reference alignments. All Results windows are also linked to the main sequence (including the Het Analysis Results). So if you click on a result, the main sequence (whether MAP or EDITOR) will scroll to and highlight the region that result applies to. You can use this to manually check putative heterozygotes and only apply the ones you deem to be true.

Here a possible het was clicked on in the results window and the main Align to Reference Editor scrolls to and highlights that base.

To view and apply individual putative heterozygotes

  • Go to FILE | NEW | ALIGN SEQUENCES TO A REFERENCE
  • Choose your reference sequence
  • Add trace files and click ALIGN
  • Select the BASECALL toolbar button or ANALYZE | BASECALL | USING HETEROZYGOTE ANALYSIS
  • An Options dialog will appear. Change the options and click OK
  • A summary dialog will appear showing the number of heterozygotes found across how many sequences
  • Do not click APPLY; click VIEW instead. A new Results window will appear.
  • In the Results window click on the blue highlighted HET. The main window will scroll to and show that possible het.
  • If you want to apply the change, then manually type the ambiguity (because the base is highlighted typing a new base will overwrite it).
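
When typing an ambiguity by hand it helps to have the standard IUPAC two-base codes available. The lookup below is a small Python sketch of that mapping. The function name het_code is made up for this example and is not part of MacVector.

```python
# Standard IUPAC nucleotide ambiguity codes for two-base mixtures.
IUPAC_TWO_BASE = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}


def het_code(base1, base2):
    """Return the IUPAC code for a heterozygous call of two bases."""
    pair = frozenset((base1.upper(), base2.upper()))
    if len(pair) == 1:
        # Both peaks call the same base, so there is no ambiguity.
        return base1.upper()
    return IUPAC_TWO_BASE[pair]


print(het_code("A", "G"))  # R
print(het_code("t", "c"))  # Y
```

So a trace position showing overlapping A and G peaks would be typed as R.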
Posted in Tutorials

Automatic Assembly of Sub-projects with Phrap (Sub-Assemblies)

New to MacVector 18.6 is the ability to sort and assemble reads from different datasets into individual sub-projects. This functionality is located in the phrap parameters dialog. When enabled and configured appropriately for your dataset it will automatically break out the input reads into sub-projects to be assembled separately.

A simple pattern-matching text box lets you define which characters in the input filenames should be treated as project names, and which should be treated as read names. After assembly, contigs can be exported (to a variety of file formats, including fasta and fastq) retaining the project name in the contig names.

This function can be a great time saver if you do a lot of related small sequencing projects as long as you use a well-defined naming convention.

Pattern Matching

SubProjects 1

The reads in your datasets must follow a defined naming standard, because you need to construct a pattern that matches both the project name and the read name. The pattern is built from a small set of characters that define which part of the filename is the read name and which is the project name. As an aid to constructing a pattern, the sub-project name shown in the dialog is dynamically updated as you type to show what the sub-projects will be named. These characters are:

  • P – a single character to be included in a project name.
  • X – a single character to be excluded from the project names (typically these would be the read names).
  • “-”, “_” or “.” – separators. If present in the pattern they MUST be in the filenames (you can add more separator characters in the dialog).
  • p – one or more characters to be included in the project name. Extends to the next separator or to the end of the filename.
  • x – one or more characters to be excluded from the project name. Extends to the next separator or to the end of the filename.

SubProjectsPostAssembly

This is best demonstrated with an example. Here we have a sequencing dataset called BASENAME. Each individual sample that had been sequenced was numbered 1000 to 1100. Typical read names are:

List of read names

  • BASENAME-1001g07_0x00.s01_1.scf
  • BASENAME-1001g07_0x00.s02_1.scf
  • BASENAME-1003g07_1a03.m22_2.scf
  • BASENAME-1003g07_1b06.m23_1.scf
  • BASENAME-1005g07_2c07.s01_1.scf
  • BASENAME-1005g07_0x00.s01_1.scf

Your pattern for this could be:

PPPPPPPP-PPPPxxxx

We can break this down as follows for the first readname:

BASENAME-1001g07_0x00.s01_1.scf
  • PPPPPPPP = the main name up to the separator (BASENAME)
  • - = the separator
  • PPPP = the number of the individual sample (1001)
  • xxxx = each x excludes all characters up to the next separator. The first x excludes g07, the second x excludes the next set of characters (0x00), etc.

The above set of reads would produce the following three sub-assemblies:

  • BASENAME1001
  • BASENAME1003
  • BASENAME1005
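
The rules above can be sketched in a few lines of Python. This is an illustration of the pattern rules as described in this post, not MacVector's actual implementation. Two behaviours are assumptions inferred from the worked example: a separator left in the filename between two consecutive x or p runs is skipped, and any part of the filename beyond the end of the pattern (such as .scf) is ignored. The function names are hypothetical.

```python
from collections import defaultdict

DEFAULT_SEPARATORS = "-_."


def project_name(filename, pattern, separators=DEFAULT_SEPARATORS):
    """Return the project name the pattern selects, or None if it does not fit."""
    kept = []
    i = 0  # current position in the filename
    for token in pattern:
        if token in separators:
            # A separator in the pattern must be present in the filename.
            if i >= len(filename) or filename[i] != token:
                return None
            i += 1
        elif token in "PX":
            # Exactly one character: kept for P, dropped for X.
            if i >= len(filename):
                return None
            if token == "P":
                kept.append(filename[i])
            i += 1
        elif token in "px":
            # Assumption: step over a separator left by the previous run.
            while i < len(filename) and filename[i] in separators:
                i += 1
            start = i
            # One or more characters, up to the next separator or the end.
            while i < len(filename) and filename[i] not in separators:
                i += 1
            if i == start:
                return None
            if token == "p":
                kept.append(filename[start:i])
        else:
            raise ValueError(f"unexpected pattern character: {token!r}")
    return "".join(kept)


def group_reads(filenames, pattern):
    """Group read filenames by the project name the pattern assigns them."""
    groups = defaultdict(list)
    for name in filenames:
        groups[project_name(name, pattern)].append(name)
    return dict(groups)


reads = [
    "BASENAME-1001g07_0x00.s01_1.scf",
    "BASENAME-1001g07_0x00.s02_1.scf",
    "BASENAME-1003g07_1a03.m22_2.scf",
    "BASENAME-1003g07_1b06.m23_1.scf",
    "BASENAME-1005g07_2c07.s01_1.scf",
    "BASENAME-1005g07_0x00.s01_1.scf",
]
for project, members in sorted(group_reads(reads, "PPPPPPPP-PPPPxxxx").items()):
    print(project, len(members))
```

With the six example reads this groups two reads under each of BASENAME1001, BASENAME1003 and BASENAME1005, matching the three sub-assemblies listed above.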

How to sort reads into sub-projects

  1. File | NEW | ASSEMBLY PROJECT
  2. Click >ADD SEQS to add your dataset
  3. Click ASSEMBLE | PHRAP
  4. Click the Sub-Assemblies tab in the Phrap dialog.
  5. Toggle the Enable Sub-assemblies setting to on.
  6. Ensure your separator character is listed in the Valid Separators box.
  7. Construct a suitable matching pattern (see above)
  8. Click OK.

MacVector 18.6 was released in July 2023. This release adds one-click optimization of CDS coding regions, automatic phrap sub-project assembly, direct support of .csv/.tsv files for Primer Database, inclusion of graphical information in GenBank exports and numerous tweaks and improvements to many workflows.

A DNA sequence having a CDS feature optimized for expression in a different organism. Background is macOS Sonoma.
Posted in Releases, Tips

One click Codon Optimization of CDS Features

Our latest release, MacVector 18.6, has a new tool that will directly optimize codon usage of CDS features for enhanced expression in a different organism.

The new tool pulls together multiple tools into a one step procedure which can be run by selecting a CDS feature in your nucleic acid sequence and running Analyze | Optimize Codon Usage for CDS… You will need to choose an appropriate codon usage table (.bias file) for expression.

When a CDS is optimized, a new feature is annotated to the sequence showing that the CDS has been optimized, which algorithm was used and which codon usage table was applied. It also shows the user who made the modification, the date, and how the sequence was before and after the action.

A few windows showing a DNA sequence being optimized for expression in a different organism

How to optimize codon usage for a CDS feature

  • Select a CDS feature in the Map or Features tab of a nucleic acid sequence.
  • Choose Analyze | Optimize Codon Usage for CDS…
  • Choose the codon usage table (.bias file) to use, along with the genetic code and the optimization algorithm.
  • Either apply the results to the CDS feature or just view the proposed changes.

You will need a codon usage table (.bias) for the organism that the CDS will be expressed in. A number of common tables are shipped with MacVector, but we can generate new ones on request. A future release of MacVector will generate codon usage tables automatically.

Codon Usage optimization algorithms

There are four different algorithms that MacVector provides for optimizing codon usage.

  • Most Frequently Used Codon – this simply uses the most commonly occurring codon for each amino acid. So if, e.g. the most common Leu codon is CTC, all Leu codons will be CTC. Perhaps this is only useful if you want to design a “best guess” primer and are willing to accept a certain failure rate. If you used this to optimize expression, the host would likely run out of that tRNA and you wouldn’t see optimal expression.
  • Frequency Distribution – this selects a random codon for each amino acid, biased towards the most commonly used codon that encodes each amino acid. Each time you run the algorithm, a different, random set of codons will be selected. If you were to generate a new DNA over and over again, eventually this would create a collection of sequences where the average codon usage would exactly match the average for the .bias organism, but any individual reverse translation may randomly be quite different.
  • Probability Distribution – this is probably the most powerful setting if you are interested in expression. Similar to the Frequency Distribution, this chooses a random codon, biased towards the most frequently used codons for each amino acid. However, this version tries to ensure that the final DNA sequence has a codon usage profile that matches the codon usage of the selected .bias file as closely as possible. As the overall codon usage in the DNA sequence is guaranteed to be as close as possible to the codon usage in the .bias organism, this should, in theory, give you the best chance of high expression. Again, you will get a different sequence each time you invoke this.
  • Uniform Distribution – this ignores the usage of each codon and randomly assigns an appropriate codon for each amino acid. It’s similar to the default algorithm that uses ambiguities to create an “absolute” coding DNA, but here it just chooses a random codon with no regard for codon usage probability. Again, you will get a different sequence each time you invoke this.
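
To make the difference between the four strategies concrete, here is a toy Python sketch of each one. It is an interpretation of the descriptions above, not MacVector's code. The USAGE table is invented for the example and is not taken from any real .bias file. The quota approach used for Probability Distribution (fix the codon counts to match the usage, then shuffle) is just one assumed way to get a profile that matches as closely as possible.

```python
import random

# Toy codon usage fractions per amino acid. These values are invented for
# the example and are not taken from any real .bias file.
USAGE = {
    "M": {"ATG": 1.0},
    "L": {"CTG": 0.5, "CTC": 0.25, "TTA": 0.25},
    "K": {"AAA": 0.75, "AAG": 0.25},
}


def most_frequent(protein):
    """Always use the single most common codon for each amino acid."""
    return "".join(max(USAGE[aa], key=USAGE[aa].get) for aa in protein)


def frequency_distribution(protein, rng):
    """Draw each codon independently, weighted by its usage."""
    picks = []
    for aa in protein:
        codons = list(USAGE[aa])
        weights = [USAGE[aa][c] for c in codons]
        picks.append(rng.choices(codons, weights=weights)[0])
    return "".join(picks)


def probability_distribution(protein, rng):
    """Fix the codon counts to match the usage, then randomise their order."""
    pools = {}
    for aa in set(protein):
        n = protein.count(aa)
        codons = list(USAGE[aa])
        exact = [USAGE[aa][c] * n for c in codons]
        counts = [int(e) for e in exact]
        # Give any leftover slots to the codons with the largest remainders.
        by_remainder = sorted(range(len(codons)),
                              key=lambda i: exact[i] - counts[i], reverse=True)
        for i in by_remainder[:max(0, n - sum(counts))]:
            counts[i] += 1
        pool = [c for c, k in zip(codons, counts) for _ in range(k)]
        rng.shuffle(pool)
        pools[aa] = pool
    return "".join(pools[aa].pop() for aa in protein)


def uniform_distribution(protein, rng):
    """Pick any synonymous codon with equal probability, ignoring usage."""
    return "".join(rng.choice(list(USAGE[aa])) for aa in protein)


if __name__ == "__main__":
    rng = random.Random(0)
    protein = "MLLLLKKKK"
    print(most_frequent(protein))
    print(probability_distribution(protein, rng))
```

For the peptide MLLLLKKKK, the quota version always uses CTG twice, CTC once and TTA once for the four leucines, in a random order. The Frequency Distribution version only matches those proportions on average over many runs.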

Posted in Releases

Importing Sequencher project files into MacVector

Assembler is a plugin for MacVector that provides comprehensive sequence assembly functionality. Assembler is fully integrated into MacVector and allows you to manage sequencing data with the familiar MacVector style. You can design primers directly on a contig or BLAST that contig to identify it.

AssemblerAssrtd

MacVector’s Assembly Project manager

Like Sequencher, MacVector with Assembler has the Assembly Projects manager which provides a simple interface where you can import and drag and drop files for Assembly.

The Assembly Project manager allows you to manage reference sequences, trace files, multiple datasets, large sequencing datasets in FASTQ format and assemblies. You can also import existing assemblies in BAM/SAM format and work with unaligned reads from an assembly directly in the project. The Assembly Project manager also allows you to compare multiple assemblies with expression level comparison and map multiple datasets against the same reference.

However, unlike Sequencher, MacVector lets you have multiple projects open at one time.

Importing Sequencher project files into the Assembly Project manager

You can import Sequencher’s .spf files directly into an Assembly Project.

To import a Sequencher assembly project

  • Choose File | Open…
  • Select the .SPF file you wish to import and click on the Open button.
  • The assembly will be imported into a new Assembly Project.
  • Choose File | SAVE AS… to save the imported assembly into a new Assembly Project.


The following screenshots show some Sequencher projects imported directly into MacVector.

SequencherFileinMV 90 spf
SequencherfileinMV 2838 Full Donor

Because Sequencher heavily relies on manual editing to optimize assemblies (generally not needed with MacVector/Assembler), many Sequencher project files are non standard. If you ever come across a Sequencher project file that will not import correctly then please contact MacVector Support and we will resolve the problem for you.

Sequencher does not run on macOS 10.15 Catalina or later

SequencherNeedsToBeUpdated

Sequencher will not run on macOS 10.15 Catalina or any later version of macOS. The root of this problem is that Sequencher 5.4.6 (the latest version for the Mac) is a 32-bit application and is particularly reliant on an ancient “Carbon” OS 9-compatibility framework that Apple has been telling developers for many years will be phased out in the near future. That day has finally arrived, and until Sequencher is completely rewritten for the latest macOS, you cannot upgrade your Mac to Catalina and keep using it.

MacVector with Assembler has been fully 64-bit for many years and has no dependency on the “Carbon” framework. It is fully supported on macOS High Sierra to macOS Ventura (and shortly macOS Sonoma) and runs natively on both Apple Silicon Macs and Intel Macs.
Assembler adds the Assembly Projects manager and a powerful sequence assembly toolkit to MacVector. It directly imports Sequencher project files, has equivalent functionality and is very easy to use. It provides almost all of the functionality of Sequencher within an updated modern interface, along with a host of additional functionality and integration with all of the DNA analysis tools within the core MacVector application.

Modern DNA sequencing has changed dramatically. Upgrade to a state of the art sequence assembly tool that’s kept ahead of the game!

We offer a 50% discount to all Sequencher users who wish to upgrade to MacVector with Assembler.

Posted in Tips

Sequence Assembly: What can Assembler do for my lab?

Assembler is fully integrated into MacVector and allows you to manage sequencing data with the familiar MacVector ease.

De novo sequence assembly: Assemble reads using Phrap, Velvet and SPAdes, with Flye for PacBio and Oxford Nanopore.

Reference Sequence Assembly: Map millions of reads against genomes, transcriptomes or other reference sequences using Bowtie2.

Compare Genomes: Compare two related annotated genomes to see common or missing genes.

Coverage Tab: compare multiple assemblies with expression level comparison.

Variant Calling: SNPs and INDELS are visualised on your assembly and supplied in VCF.

Bacterial genome tools: Tools for finishing bacterial genomes, including circularizing genomes.

Easy to use interface. Navigating around your assemblies has never been easier. Display an entire contig in the graphical Map and select a read to zoom straight to that region. Click on a base in a contig and see coverage and variants.

Heterozygote analysis. Analyse heterozygotes in Sanger trace files and assemblies.

Assembly Project manager makes it easy to assemble multiple datasets, reference sequences and assemblies. Work with unaligned reads from an assembly directly in the project.


Compare different assemblies. Map multiple datasets against the same reference.

RNA-Seq analysis with read depth visualization and per gene coverage data (RPKM & TPM).

Posted in Tips

Setting the Numbering Origin

Preserving sequence numbering is particularly useful if you want to work on a smaller more manageable region of a large chromosome but wish to retain the original numbering. When you copy a section of a larger sequence and paste the copy into a new MacVector sequence window (or use FILE | NEW FROM CLIPBOARD), the original numbering is retained.

SettingOrigin

  • Open pBR322 (in the SAMPLE FILES folder in the MacVector application folder).
  • Either:
    • in the EDITOR tab select the Features popup menu and select the tetracycline resistance CDS, or
    • click on the TET feature in the MAP tab.

    This selects the region from 86 to 1276 in both the EDITOR and MAP tabs.

  • Now choose Edit | Copy, followed by File | New from Clipboard. A new window appears with the numbering origin set to 86. (You can also accomplish this by choosing File | New | Nucleic Acid and then Edit | Paste into the new window).
  • If you want to quickly reset the origin to “1”, you can right-click (or [ctrl]-click) in the sequence area to bring up a context sensitive menu and choose Reset Origin to 1.

Changing the numbering of existing sequences

You can also easily change the numbering of a sequence by selecting and then dragging the small red cross that usually appears at the beginning of the sequence. Dragging the cross to another location designates that as the “plus 1” residue – all residues before that position will be given negative numbers.

You can also set the first residue to a positive number. To set this, double-click on the red cross and enter a new start value in the sheet that appears.

Setting the Circular Origin

If you are working with a circular sequence then you can change the location where the sequence is “split” in the editor.

Right-click the sequence and choose SET CIRCULAR ORIGIN from the context sensitive menu that appears. This also changes the Map tab so that the new position is located at 12 o’clock.

In the MAP tab you can also select a restriction enzyme site and right-click to choose Set Circular Origin.
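
Conceptually, setting the circular origin is a rotation of the sequence. The short Python sketch below shows the idea on a plain string. The function names are hypothetical and it ignores everything else MacVector has to keep in step, such as feature annotations.

```python
def set_circular_origin(sequence, new_origin):
    """Rotate a circular sequence so 1-based position new_origin becomes 1."""
    if not 1 <= new_origin <= len(sequence):
        raise ValueError("new_origin must lie within the sequence")
    cut = new_origin - 1
    return sequence[cut:] + sequence[:cut]


def renumber(position, new_origin, length):
    """Map an old 1-based coordinate to its coordinate after the rotation."""
    return (position - new_origin) % length + 1


plasmid = "ACGTTG"
print(set_circular_origin(plasmid, 3))   # GTTGAC
print(renumber(1, 3, len(plasmid)))      # 5
```

The same renumber arithmetic would be needed for the start and stop of every feature on the sequence.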

Posted in Tips

MacVectorTip: Selecting the sequence from a single restriction enzyme site to the end of a linear sequence

Seeing the distance between any two points on a sequence is easy. For example, select one restriction enzyme site, hold down SHIFT and select the second. The start, stop and length will be shown in the Range Selector (top right corner of every window – see images below). But if you want to see the distance from a restriction enzyme site to the end (or beginning) of the sequence it is slightly more difficult.

You have two options. The first uses the EDITOR tab:

  • Select the RE site in the MAP tab.
  • Switch to the EDITOR tab.
  • Hold down SHIFT then click right at the end of the sequence. You can also hold down SHIFT, press the RIGHT cursor key, then the DOWN cursor key.

RE Selection EDITOR

You can also do this entirely within the MAP tab, although this is slightly more complicated as you need to change the type of cursor.

  • In the Graphics Palette select the SELECT SEQUENCE cursor button.
  • Now you can drag and select sequence in the MAP tab, exactly as you can do in the EDITOR tab with the other cursor.

RESelection MAP

Both of these options will select from the restriction enzyme site right to the end of the sequence and show the length in the top right corner.

Posted in Tips