Working from home and Roaming Network licenses

The pandemic brought a sudden change to usual working routines, and home working will probably remain part of the working week for some time to come. Most scientific research needs physical lab time, but that’s just “pipetting”! The real science also happens when you think, and that can be done easily at home (once you know how to ignore the varied distractions of working from home!).

The MacVector Network license is a cost-effective licensing solution for large groups that uses Sassafras KeyServer to monitor and control MacVector usage across a local network. One disadvantage is that users must be able to connect to the central license server in order to use MacVector, requiring a VPN connection when working from home and precluding the use of MacVector in the absence of an internet connection.

A few years ago we introduced our Roaming Network License to overcome these limitations. When MacVector is unable to contact the KeyServer, the Roaming license allows MacVector to run with complete functionality for a period of up to three weeks. This means that users generally do not need to have a VPN connection to access MacVector from home and can even use MacVector in the complete absence of an internet connection.

Throughout the pandemic we have been helping users work remotely by providing free temporary licenses to IT departments. We have also converted many sites with larger network licenses to roaming network licenses at zero cost. We have now decided to extend this to all sites with a network license for three or more users.

Over the next few weeks you will be contacted with new License Activation Details for your current license.

We do hope that this helps your users access MacVector from home. Please contact MacVector Support with questions about the change, or indeed with any other questions you may have.

Posted in General

MacVectorTip: Identifying, Selecting and Assembling NGS reads with a variant genotype

When analyzing, assembling or aligning NGS data, there are many scenarios where you might want to separate out the reads representing different genotypes or variant sequences. MacVector makes this very easy. Take a reference sequence and choose Analyze->Align to Reference. Now click the Add Seqs button, then select and add your NGS data files. NOTE: if your reference represents just a subset of the data in the NGS files, you might want to first filter the data using Align to Folder.

Here we see an Align to Reference where about half the reads have obvious SNPs compared to the reference. Note that the Dots toolbar button is toggled on to help emphasize the mismatches:

VariantGenotype

To select all of the reads that contain the SNP, first select a few residues around that SNP, as shown above. This helps ignore the occasional “bad” sequence, though, for most purposes, you can just select the one residue. Then right-click ([ctrl]-click) and choose Select Overlapping Reads Containing Selected Sequence from the context-sensitive menu. This selects every read that aligns at that location and carries the selected residues (here, the G at that position). Finally, right-click and choose Select Matching Pairs. Now you have the SNP reads and their mate-pairs selected, and you can save all the selected reads using the right-click Export Selected Reads as FastA/Q option.

If your sequence has multiple SNPs/genotypes/repeats, you can always then choose the right-click Delete Selected Reads option to remove those reads and start again on another set.
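The selection logic described above can be illustrated outside MacVector with a short Python sketch. It is not MacVector's implementation: the `AlignedRead` record, the function names and the toy reads are all hypothetical, reads are assumed to align without gaps, and mates are assumed to be named with /1 and /2 suffixes.

```python
from typing import NamedTuple

class AlignedRead(NamedTuple):
    name: str   # e.g. "frag7/1"; the /1 and /2 suffixes mark the two mates
    start: int  # 0-based reference position of the first aligned base
    seq: str    # read bases, assumed to align without gaps

def reads_containing(reads, ref_start, residues):
    """Names of reads that span ref_start and carry `residues` at that position."""
    hits = set()
    for r in reads:
        offset = ref_start - r.start
        if offset >= 0 and offset + len(residues) <= len(r.seq):
            if r.seq[offset:offset + len(residues)] == residues:
                hits.add(r.name)
    return hits

def with_mates(names, all_names):
    """Add the /1 or /2 partner of every selected read, when it exists."""
    selected = set(names)
    for n in names:
        base, _, tag = n.rpartition("/")
        mate = base + "/" + ("2" if tag == "1" else "1")
        if mate in all_names:
            selected.add(mate)
    return selected

reads = [
    AlignedRead("frag1/1", 0, "ACGTGACCT"),  # G at reference position 4
    AlignedRead("frag1/2", 20, "TTTTCCCC"),
    AlignedRead("frag2/1", 2, "GTAACCTA"),   # A at reference position 4
    AlignedRead("frag2/2", 25, "CCCCGGGG"),
    AlignedRead("frag3/1", 3, "TGACC"),      # G at reference position 4
]
snp_reads = reads_containing(reads, ref_start=3, residues="TGA")
selected = with_mates(snp_reads, {r.name for r in reads})
print(sorted(selected))  # ['frag1/1', 'frag1/2', 'frag3/1']
```

Selecting three residues ("TGA") rather than the single G mirrors the advice above: a read must match the flanking bases as well, which screens out reads with local sequencing errors.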

Posted in Uncategorized

MacVectorTip: Filtering NGS Data to retrieve reads matching a known sequence

So you just got your NGS reads back from that sequencing experiment and, wow, what a HUGE amount of data. Wouldn’t it be easier to handle if you could pare that down to just the gene/plasmid/sequence(s) you are interested in? MacVector to the rescue: it can read and filter fasta/q files, even if they are compressed! Open a “reference” sequence (or create a fake composite sequence of all the sequences you are interested in; it will work just as well), then choose Database->Align to Folder. Set up the dialog something like this:

FilterNGS Reads 1

Set Search Folder: to the location of your NGS data files. Make sure Hash Value is 10 or more (for speed) and Scores to Keep is at least 1,000 (to make sure you don’t miss any reads). Also consider the scoring matrix: for sequencing data where you are expecting essentially perfect matches, the DNA identity with penalties matrix is by far the best choice. If you are looking for reads from related organisms, other .nmat files may be more appropriate to allow for mismatches.

While the search algorithm has been optimized for use on multi-CPU machines (and for Apple M1 processors), it can still take some time to run. A 100bp sequence scanned against 3 million 133nt paired Illumina HiSeq reads takes less than 10 minutes on a typical Mac laptop, but scanning a 5 Mbp E. coli genome against 100 million reads is likely to be an overnight proposition.

When complete, you can save the matching hits by selecting all of the lines in the Folder Description List tab (Edit->Select All or command-A), then saving them to a pair of matching fasta/q files by choosing Database->Retrieve to File. You will see a PAIR of files with the hits. Even if only one read in the original pair matched the query file, BOTH reads are saved. This is a very powerful approach to help resolve variants and inconsistencies in NGS data. You can see from the example below that we’ve reduced a pair of 118 MB files (compressed!) to just the 2x 374 KB that are of interest to us:

FilterNGS Reads 2
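To show why a word ("hash") lookup makes this kind of filtering fast, and what "both reads are saved" means in practice, here is a minimal Python sketch. It is not MacVector's algorithm: it assumes that a pair is kept when either mate shares at least one exact word of length `k` with the reference on either strand, and the function names and toy sequences are invented for illustration.

```python
def revcomp(seq):
    """Reverse complement of an A/C/G/T string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def words(seq, k):
    """All overlapping k-letter words in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def filter_pairs(reference, pairs, k=10):
    """Keep a whole pair if either mate shares a k-letter word with the reference."""
    ref_words = words(reference, k) | words(revcomp(reference), k)
    return [p for p in pairs if any(words(read, k) & ref_words for read in p[1:])]

reference = "ATGGCGTACGTTAGCCTAGGATCC"
pairs = [
    # (pair name, read 1, read 2)
    ("pairA", "GCGTACGTTAGC", "TTTTTTTTTTTT"),            # read 1 matches the top strand
    ("pairB", "AAAAAAAAAAAA", "CCCCCCCCCCCC"),            # neither read matches
    ("pairC", "GGGGGGGGGGGG", revcomp(reference[10:22])),  # read 2 matches the bottom strand
]
kept = filter_pairs(reference, pairs)
print([name for name, _, _ in kept])  # ['pairA', 'pairC']
```

A longer word means fewer chance matches to test, which is consistent with the advice above to keep Hash Value at 10 or more for speed.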

Posted in Techniques

MacVectorTip: Use Bowtie to remove contaminating reads prior to NGS Assembly

MacVector with Assembler can use Velvet and/or SPAdes for fast and memory-efficient de novo NGS assembly of modest-sized genomes (typically up to 40 Mbp or so), even on a laptop. One common task is to assemble NGS data from BAC clones. However, one problem that often arises is that the BAC DNA preparations may be contaminated with genomic E. coli DNA. SPAdes, in particular, is so efficient that it will easily assemble large contigs from the contaminating reads, even when they are in the minority. In the example below (where E. coli genomic sequence represented over two thirds of the reads), the large genomic contigs overwhelm the few BAC contigs (the main BAC contigs are 192kb and 41kb) – the primary BAC contig is highlighted below, but there are clearly a lot of large genomic contigs that add to the confusion.

BowtieFilteringReads 1

The solution to this is to first run a Bowtie reference alignment of the raw reads against an E. coli genomic sequence. Ideally, you would use the genomic sequence of your host strain, but any E. coli genome will be effective. You can use multiple genomes with relatively little difference in processing time – 10 genomes takes only about twice as long as one genome. In this case, we used the Add Ref button to add two random E. coli genomes as reference sequences, then aligned the NGS data files against them using Bowtie with the default settings, resulting in this alignment:

BowtieFilteringReads 2

The critical point here is not the actual alignments, but the fact that the reads that do NOT align to the reference sequence(s) are saved into a pair of compressed FASTQ (fq.gz) files. These should be massively enriched for non-E. coli DNA sequences. While you can save these files to disk (choose File->Export Selected Reads To…), you can also directly assemble the data in the files by selecting them and then clicking the SPAdes button. When that completes with the default settings, you should see something like this:

BowtieFilteringReads 3

Now the top two assemblies (NODE_x) are the major BAC contigs. You can also click on the ‘#’ column to sort the contigs by number of reads aligned and that (in this case) brings up additional minor BAC contigs.

Note that the approach to remove contaminating sequences via Bowtie alignment is not perfect. In particular, for paired end reads, both reads need to align to be considered “aligned”. So, if one of a pair does not align, or has “failed”, both will be placed in the Unaligned_Reads file.

Another limitation that you should be aware of is that the unaligned reads files may contain pairs of reads that map perfectly, except the distance between the two reads differs between the reference sequence and the organism that has been sequenced.

Paired reads are produced by sequencing both ends of a single fragment. The insert length, or distance between the two reads on the single fragment, is used by assembler algorithms both to improve accuracy and to detect structural variants, for example when resolving tricky sections with long runs of repeats or large INDELs.

Bowtie uses the term concordant for pairs of reads that map well to the reference at the expected distance, and discordant for pairs where both reads map well but the distance between them does not match the insert length.

Unfortunately, for the purpose of filtering out a genome from a mixed dataset, the unaligned reads files will also include discordant reads, which means that you will never fully remove all traces of the genome you are trying to filter out. In the example above, there are small contigs of the E. coli genome which are likely due to discordant reads in the unaligned reads files.

However, even with these two limitations the technique works extremely well to ensure a better assembly of your organism of interest.
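The two limitations can be made concrete with a small Python sketch of how a read pair might be classified. This is an illustration only, not Bowtie's code: `classify_pair`, the insert-size limits and the example positions are assumptions, and read orientation (which Bowtie also checks) is ignored for brevity.

```python
def classify_pair(pos1, pos2, read_len, min_insert, max_insert):
    """Classify one read pair.

    pos1/pos2 are 0-based leftmost reference positions of the two mates,
    or None when that mate did not align.
    """
    if pos1 is None or pos2 is None:
        return "unaligned"   # first limitation: one failed mate drags its partner along
    fragment = max(pos1, pos2) + read_len - min(pos1, pos2)
    if min_insert <= fragment <= max_insert:
        return "concordant"
    return "discordant"      # second limitation: both mates map, but at the wrong distance

# Hypothetical pairs: (position of mate 1, position of mate 2)
pairs = {
    "pairA": (100, 350),     # implied fragment of 350 bp
    "pairB": (100, 5000),    # mates map about 5 kb apart
    "pairC": (100, None),    # mate 2 did not align
}
classes = {
    name: classify_pair(p1, p2, read_len=100, min_insert=200, max_insert=500)
    for name, (p1, p2) in pairs.items()
}
# As described above, everything that is not concordant stays in the unaligned reads files.
carried_forward = [name for name, cls in classes.items() if cls != "concordant"]
print(classes)
print(carried_forward)  # ['pairB', 'pairC']
```

In this toy case only pairA is removed by the filter, even though every aligned read in pairB and pairC also comes from the contaminating genome.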

Posted in Techniques

AppleScript: batch translation of CDS features

Apple’s AppleScript (along with JavaScript for Automation) is a language that is easy to write and understand, and that allows you to automate tasks in supported applications. Many Apple applications have an AppleScript Dictionary that defines what functions you can automate. MacVector has many such functions in its AppleScript Dictionary. You can auto-annotate multiple sequences, search for sequences in Entrez and retrieve them, translate sequences, transcribe sequences and more. AppleScript is excellent for any task that requires batch operations, whether a single operation on multiple input sequences, multiple operations on a single sequence, or taking a single sequence and producing multiple results. It even handles mundane tasks such as converting a folder of sequences into a different format.

Recently we had a support query about translating all the open reading frames in a single sequence to a set of protein sequences. This is a task very well suited to automation. Whereas MacVector can easily translate single CDS features or do a six-frame translation of a sequence, repeating this for a large genome with multiple ORFs would be laborious to do manually. However, once an AppleScript has been written, it is a simple task.

The workflow for the script is simple: go through a DNA sequence and look for every CDS feature. Each CDS feature that is found is translated, then the script moves on to the next CDS feature, and so on. Finally, it produces a FASTA file containing every protein sequence.

Incidentally many tools in MacVector rely on annotated CDS features. If your sequence does not have any CDS features, then you can use SCAN FOR…ORFS to easily add them.
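To make that workflow concrete independently of MacVector, here is a minimal Python sketch of the same loop: take each CDS, translate it, and append a FASTA record. It is not the MacVector script. The feature tuples, the function names and the toy sequence are invented for illustration, and it assumes the standard genetic code and unsegmented CDS features.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def revcomp(seq):
    """Reverse complement of an A/C/G/T string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(cds):
    """Translate codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # X for codons with ambiguous bases
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

def cds_to_fasta(sequence, features):
    """features: (start, end, strand) tuples, 1-based inclusive, strand '+' or '-'."""
    records = []
    for count, (start, end, strand) in enumerate(features, 1):
        cds = sequence[start - 1:end]
        if strand == "-":
            cds = revcomp(cds)
        records.append(f">CDS {count}\n{translate(cds)}\n")
    return "".join(records)

# Toy sequence with one CDS on each strand (hypothetical coordinates)
sequence = "CCATGGCTTAAGGTTAGCCCATCC"
fasta = cds_to_fasta(sequence, [(3, 11, "+"), (14, 22, "-")])
print(fasta)
```

The record headers follow the same "feature key plus running count" pattern that the AppleScript uses for its output.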

Here is the core routine of the script:

AppleScript

The important lines are these two:

repeat with theFeature in (every feature of theSequence whose key is "CDS")
set theTranslation to theFeature's translation as text

All they do is tell MacVector to look for a CDS feature and then translate that open reading frame.

The full script is here:

-- Translate all CDS features in a MacVector Nucl sequence
-- Clindley@MacVector.com
-- v0.2
-- May 14, 2021
-- added direct writing of output fasta file
use AppleScript version "2.7" -- macOS High Sierra or later
use scripting additions

set outputCount to 0
set FastaFile to ""
set inputFile to GetInputFile()
set outputFolder to GetOutputFolder()
set defaultAnswer to "All_CDS_translated.fa"
display dialog "Please enter the Output filename:" default answer defaultAnswer
set OutputFilename to text returned of result

tell application "MacVector"
    set docRef to open inputFile
    delay 0.3
    set theSequence to docRef's sequence
    -- very long timeout to avoid timeouts when translating long sequences; the default is 120 seconds
    with timeout of 10000 seconds
        repeat with theFeature in (every feature of theSequence whose key is "CDS")
            set theTranslation to theFeature's translation as text
            set theName to theFeature's key as text
            set outputCount to outputCount + 1 -- count the CDS features translated
            -- one FASTA record: a ">" header line, then the protein sequence
            set FastaFile to FastaFile & ">" & theName & " " & outputCount & linefeed & theTranslation & linefeed
        end repeat
    end timeout
    close docRef saving no
end tell

-- write the records to the filename entered in the dialog
set myFile to open for access ((outputFolder & OutputFilename) as POSIX file) with write permission
set eof myFile to 0 -- replace any existing contents
write FastaFile to myFile
close access myFile

set outputCount to outputCount as string
set theDialogueText to outputCount & " CDS features in " & inputFile & " were translated and saved as " & outputFolder & OutputFilename
display dialog theDialogueText buttons {"OK"} default button "OK" giving up after 120

on GetInputFile()
    tell application "Finder"
        -- choose the DNA sequence file to translate
        set inputFile to POSIX path of (choose file with prompt "Select DNA sequence to translate:")
        if not (exists inputFile as POSIX file) then
            display dialog inputFile & " does not exist."
            return
        end if
        return inputFile
    end tell
end GetInputFile

on GetOutputFolder()
    tell application "Finder"
        -- choose the folder for the output FASTA file
        set outputFolder to POSIX path of (choose folder with prompt "Select folder for output file:")
    end tell
    return outputFolder
end GetOutputFolder

Just open /Applications/Utilities/Script Editor.app and copy and paste the above code into it. Script Editor is Apple’s default AppleScript editor, although better AppleScript editors do exist, such as Script Debugger. You can also download the script.

If you want to investigate automating MacVector more, then the MacVector application folder contains an AppleScript folder with many example scripts. If there is a repetitive task that you perform in MacVector then please do contact support and ask us if it could be automated. Either we’ll be able to assist developing a script, or we’ll be able to add support to a future release of MacVector.

Posted in Techniques, Tips

MacVector 18.1 and the new InterProScan functional domain analysis tool

MacVector allows you to do functional domain analysis on your protein sequence using the InterProScan service. InterPro contains multiple databases of protein families, domains and motifs, and InterProScan searches these databases with your protein sequence. It also does extra analysis, such as transmembrane region analysis using TMHMM and other tools. MacVector submits your protein sequence to an InterProScan search and allows you to permanently annotate results directly back to your sequence.

However, the InterProScan service is undergoing changes which means that this tool now has limited functionality. You can still submit sequences for analysis, and you will be able to view the results. However, the graphical interface is now not functional and does not allow you to directly annotate the results back to your sequence.

There is a workaround, as MacVector does allow you to annotate protein sequences with GFF files using IMPORT FEATURES. GFF, along with BED and GTF, is a standard format for storing protein/DNA annotation.

However, we have been hard at work replacing MacVector’s InterProScan tool and are pleased to announce that the new tool is available in MacVector 18.1. MacVector 18.1 was released in February 2021 but up to now has not been available via the online updater. MacVector 18.1.3 is now the current release and contains the all-new InterProScan tool. MacVector 18.1.3 is now available for online updating within MacVector and you should be prompted to upgrade shortly.

We’ve not just adapted the tool for the backend changes but made it better. It’s now got a similar interface to MacVector’s Scan DNA For.. tool. Scan DNA For automatically displays restriction sites, missing common features, primer binding sites and putative open reading frames directly on your sequence and allows you to permanently annotate them.

With the new InterProScan tool you will submit your protein sequence in the same way to the InterProScan service using DATABASE | FUNCTIONAL DOMAIN ANALYSIS (InterPro). However, when the results come back they will be presented on your existing sequence’s Results Window.

InterProScan
For clarity this sequence, SARS-CoV-2 Spike Protein, had all existing features removed.

If you hover your mouse cursor over each domain then you will see a detailed list of that domain’s database entry.

InterProScanHover
Hover over a result to see the database entry

To permanently annotate a domain to your sequence, right-click it and choose CREATE DOMAIN FEATURE from the context menu.

InterProScan ContextMenu

Do remember that many tools within MacVector use Context menus that are available with a “right click”.

If you have a maintenance contract that was active on 1st February, 2021, then you can install MacVector 18.1. You will be prompted to upgrade in due course. However, if you have turned off online updates, then you can go to MACVECTOR | CHECK FOR UPDATES.. to upgrade or download the full installer.

Posted in Releases

MacVectorTip: Context-sensitive Menus in MacVector

Although Apple are well known (notorious?) for always providing mice with only a single obvious button, in reality the Mac interface from early versions of MacOS all the way to macOS Big Sur, plus many Mac apps, have always used right click menus (or more accurately “context sensitive menus”) to provide extra functionality.

MacVector is no exception and there are a lot of shortcuts and tools that can be accessed using context-sensitive menus. You can access these by either right-clicking in a window, or by [ctrl] clicking if you don’t have a right-clickable mouse/trackpad. For the most part, it doesn’t matter where in each window you right-click – the same menu will appear. However, the menus may be slightly different depending on items you have selected in the current tab.

Here’s an overview of many menus that you can see throughout the various sequence editors in MacVector. At the end you will find some very useful tools for Assembly Projects and Align to Reference. But let’s first look at the different tabs of the single sequence editor.

Editor tab

Screenshot 2021 04 28 at 16 09 52

The menu shown varies depending on whether the topology is linear or circular (Set Origin), whether the origin is non-zero (Reset Origin to 1) and whether you have some sequence selected (Add to Feature).

Create Feature – creates a feature from the current selection.

Add To Feature – lets you add the current selection as an additional segment to an existing feature.

Set Origin – for linear sequences, lets you change the numbering origin to a specific positive or negative value.

Reset Origin to 1 – for linear sequences, resets the numbering origin.

Map tab

Screenshot 2021 04 28 at 16 12 36

Again this context menu is variable. Its content will change based on the object(s) you currently have selected. In this case, it was a restriction enzyme site (Add To Gel).

Hide Symbol – hides the currently selected graphical object.

Add to Gel – for Restriction Enzymes, adds the current molecule cut with the enzyme to the currently open Agarose Gel window. Will create a new gel if necessary.

Create misc_feature Feature – create a new feature based on the selection. This will vary depending upon the selected graphic object. For an RE Site it’s a misc_feature, for an open reading frame, it will be a CDS.

Set Circular Origin – for circular molecules, sets the “12 o’clock” position to the selected restriction site.

Zoom to Sequence – “zooms” the display to the current selection.

Fit to Window – equivalent to double-clicking in the window to reset the scale to match the current window size.

Unknown

Edit – edits the selected feature.

Delete – deletes the selected feature.

Create – creates a new feature with the current selection in the Editor tab.

Join – if you have two or more features selected, this will join them into a single segmented feature.

Duplicate – duplicates the selected feature.

/protein_id= – this item varies depending on the feature selected. In this case it opens a browser and attempts to find the protein with that specific ID in the NCBI database.

Results window

Screenshot 2021 04 28 at 16 01 09

In Results windows where you can annotate the results back to the original sequence, right-clicking allows you to directly annotate that result. Such analysis tools include InterProScan, Restriction Enzyme Analysis, ORF analysis, Primer design, and more.

Hide All Results – hides all of the result graphics.

Create XXX Feature – annotates that result to the original sequence. The feature created depends on the analysis that has been run.

Zoom to Sequence – “zooms” the display to the current selection.

Fit to Window – equivalent to double-clicking in the window to reset the scale to match the current window size.

The Align to Reference and Contig Editors

The Align to Reference and Contig Editors are very similar, except the Align to Reference Editor has both a reference sequence AND a consensus sequence whereas the Contig Editor just has a consensus sequence. Both have an extensive context-sensitive menu. Let’s look at the Contig Editor first – this is the Editor you use to view contigs generated by phrap, velvet or SPAdes in the Assembly Project window.

Screenshot 2021 04 28 at 16 15 02

Export Consensus with Gaps – saves the consensus as a MacVector .nucl sequence, including any gaps.

Export Consensus without Gaps – saves the consensus as a MacVector .nucl sequence, but removes the gaps. This is the more common function you would typically use for saving the consensus. This is a shortcut to the File | Export Consensus As… function.

Export Selected Reads as FASTA – saves the currently selected reads into a single fasta formatted file.

Export Selected Reads as FASTQ – saves the currently selected reads into a single fastq formatted file. This and the FASTA version are shortcuts to the File | Export Selected Reads As... menu item.

Select Matching Pairs – if you have assembled paired reads, and if the names of the reads follow one of the typical naming conventions, e.g. appending /1 and /2 to the read names, or having “1” or “2” in the description, this will select the appropriate matching pair.

Select Overlapping Reads Containing Selected Sequence – one of the most powerful tools in the Contig Editor. Suppose you have a SNP, or a variant base in a repeat sequence. Select the residue(s) and this command will select all of the other reads that have those residues at that position. In conjunction with Select Matching Pairs, this is a great way of finding, selecting, then exporting pairs of reads representing specific SNPs or repeats in an assembly.

Circularize Consensus – None of the assembler algorithms automatically detect circular sequences. But MacVector will automatically look at the ends of contigs and if there is a significant overlap, will offer to create a new circular sequence using this command.
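The idea behind Circularize Consensus, a contig whose two ends repeat the same sequence, can be sketched in a few lines of Python. This is only an illustration: the article does not describe MacVector's actual overlap criteria, so the exact-match test, the 20-base minimum and the function names below are assumptions.

```python
def end_overlap(contig, min_overlap=20):
    """Length of the longest stretch shared by the start and end of contig, or 0."""
    for n in range(len(contig) // 2, min_overlap - 1, -1):
        if contig[:n] == contig[-n:]:
            return n
    return 0

def circularize(contig, min_overlap=20):
    """Trim the duplicated end so the sequence can be treated as circular, or return None."""
    n = end_overlap(contig, min_overlap)
    return contig[:-n] if n else None

# A toy "plasmid" assembled as a linear contig that runs 25 bases past its own start
plasmid = "ATGCGTACGTTAGCCTAGGATCCGATTACAGGCTTAACGGTCA"
contig = plasmid + plasmid[:25]
print(end_overlap(contig))             # 25
print(circularize(contig) == plasmid)  # True
```

Real assemblies contain sequencing errors, so a practical check would tolerate mismatches in the overlap; the exact comparison here just keeps the sketch short.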

Here’s the menu invoked from the Align to Reference Editor.

Screenshot 2021 04 28 at 16 19 30

Most of the menu items are the same as for the Contig Editor, but there are a few additions that are greyed out in the Contig Editor:

Align Selected Reads – a shortcut to align just those reads you have selected

Delete Selected Reads – does what it says

Reset (un-align) Selected Reads – the only way of actually reverting a read to its unaligned state.

Extend Reference with Selected Read – if you have a linear reference and a read extends beyond the left or right end of the sequence, this lets you extend the reference with the contents of the read. This is a great tool for building up an extended reference to create e.g. a patch sequence to use in subsequent assemblies to join together two contigs using reads you have isolated using the other Align to Reference tools.

Posted in Tips

Make the most of your Apple Silicon Mac with MacVector 18.1

We are very pleased to announce that MacVector 18.1 is now available to download. MacVector 18.1 is a Universal Binary application, which means it runs natively on both Apple Silicon M1 Macs and Intel Macs. MacVector 18.1 matches the “Big Sur” look and feel and, for the first time in many, many years, the MacVector icon has changed to match the square look of macOS Big Sur icons.

UntitledImage

MacVector 18.1 is supported on macOS Sierra to macOS Big Sur.

BigSurMV18 1

But it’s not just how it looks. We ran some benchmarks to see how much faster MacVector now runs on an Apple Silicon MacBook Pro. We compared this against MacVector 18.0, which runs using Rosetta 2 emulation. In some cases you can see that the native Apple Silicon MacVector 18.1 runs 200% faster than the emulated MacVector 18.0. That’s an impressive speed increase!

M1 BenchmarksTable
MacVector 18.0 (X86_64) vs MacVector 18.1 (ARM64) (h:mm:ss).

Here are other new features in MacVector 18.1 and earlier releases.

The release of macOS Big Sur together with Apple Silicon Macs is a major Apple milestone and signifies the end of OS X and the coming of macOS 11. At MacVector we are proud that MacVector 18.1 is fully compatible with macOS Big Sur and will run natively on Apple Silicon Macs!


How to upgrade to MacVector 18.1

If you have a maintenance contract that was active on 1st February, 2021, then you can install MacVector 18.1. However, because this release comes so soon after the release of MacVector 18.0 and the universal binary is a major change, you must install it manually. MacVector 18.1 will not be available through the automatic in-app updater for a few months. MacVector 18.1 is also no longer supported on OS X 10.11 El Capitan. You must be running macOS Sierra to macOS Big Sur.

If you have an older version of MacVector then download the trial and request an upgrade quote.

Even if you have downloaded the trial in the past then downloading a new trial will give you a fresh 21 days to evaluate MacVector.

When a trial license expires it becomes MacVector Free. So if you decide against upgrading then you can just delete the trial license and easily go back to your current version. It’s risk free as MacVector files are backwards compatible.

Posted in Uncategorized

MacVectorTip: Viewing genotype changes in Align to Reference assemblies

The latest releases of MacVector, MacVector 18.0.1 (Intel) and MacVector 18.1.1 (Intel and Apple Silicon), have some tweaks to the output of the SNPs tab in the Align to Reference assembly window. The genotypes of any SNP changes now follow a consistent standard, and short deletions are also reported. If the region containing the nucleotide change(s) is within an annotated CDS feature, then the genotype of the amino acid change is also reported. Here’s an example of some of the common SARS-CoV-2 variant “S” genes aligned against the reference Wuhan-Hu-1 genome:

Align2Ref GenotypeChanges

Here you can clearly see the two characteristic deletions in the B.1.1.7 UK variant, H69_V70del and Y144del, along with the two amino acid changes that are believed to be critical for the increased transmission rate, N501Y and D614G. The B.1.351 South Africa variant also contains the N501Y and D614G changes, but does not have the two deletions.
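The naming pattern behind labels such as H69_V70del, Y144del and N501Y can be illustrated with a short Python sketch. It is not MacVector's code: `describe_changes` is a hypothetical helper, the toy proteins are invented (they are not the spike protein), and it assumes the variant is already aligned to the reference with "-" marking deleted residues and no insertions.

```python
def describe_changes(ref, var_aln):
    """Name amino acid changes between a reference protein and an aligned variant.

    ref     : reference protein sequence
    var_aln : variant aligned to ref, same length, '-' where a residue is deleted
    """
    changes = []
    i = 0
    while i < len(ref):
        if var_aln[i] == "-":
            # extend over a run of consecutive deleted residues
            j = i
            while j + 1 < len(ref) and var_aln[j + 1] == "-":
                j += 1
            name = f"{ref[i]}{i + 1}"
            if j > i:
                name += f"_{ref[j]}{j + 1}"
            changes.append(name + "del")
            i = j + 1
        else:
            if var_aln[i] != ref[i]:
                changes.append(f"{ref[i]}{i + 1}{var_aln[i]}")
            i += 1
    return changes

reference = "MKHVLYNT"
variant = "MK--L-YT"   # residues 3-4 and 6 deleted, residue 7 changed from N to Y
print(describe_changes(reference, variant))  # ['H3_V4del', 'Y6del', 'N7Y']
```

Positions are 1-based and always refer to the reference protein, which is why the same substitution keeps the same name in variants that also carry upstream deletions.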

MacVector 18.1 was released at the end of February and is a Universal Binary which runs on both Intel and Apple Silicon Macs. Since it follows so closely after the release of MacVector 18.0, it is not yet available through the online updater. However, if you are running an M1 Mac then you can download the installer.

Posted in Releases, Techniques, Tips