Designing primers and documenting In-Fusion Cloning with MacVector

The In-Fusion Cloning kits from Takara allow you to perform ligase-free cloning of PCR products into vectors in as little as 15 minutes.

You can use MacVector’s Gibson Cloning/Ligase Independent tool to design primers for In-Fusion cloning workflows. The In-Fusion kits need a 15 nt overlap between each end of a fragment and the matching end of a linearized vector. The reaction uses a 3’ exonuclease to create single-stranded overhangs (Gibson Assembly uses a 5’ exonuclease, but is otherwise very similar).

Here are the basic steps:

  1. Use File | New | Gibson/Ligase-Independent Assembly… and select the second option in the dialog. This creates your Ligase Independent Cloning (LIC) project.

    In fusion 1

  2. Next, prepare your vector. There are a few ways you might do this in the lab:
    1. You might have linearized a vector with one or more restriction enzymes. If it’s a single enzyme, just open the vector sequence, drag the name of the enzyme from the Map tab of the vector and drop it onto the LIC project window. If you are using two enzymes, select both (use [shift] to select the second one), making sure the bulk of the vector is selected so that you include the markers and the replication origin. Then choose Edit | Copy, switch to the LIC Project window and choose Edit | Paste.
      Here, we’ve taken the major EcoRI – HindIII fragment from pUC19.

      In Fusion 2

      For this approach, click on the outlined button and select the No Primer option.

    2. You may want to create a vector backbone using PCR. In this case, select and copy the exact DNA sequence you want to appear in the final construct. In the example below we’ve taken the sequence from the stop codon of the lacZ alpha gene in pUC19 to the ATG start codon of lacZ alpha, as if we were going to create a fusion protein starting with that ATG. In this case, we want MacVector to calculate an Automatic Primer for each end (click the outlined button and change to Automatic Primer). As a second fragment has not yet been entered, MacVector simply generates a primer to match the other end of the vector backbone fragment.

      In fusion 3

    3. The third way might be if you are using a pre-prepared linearized vector from a company.
  3. Next, prepare your insert. Basically, just select the exact piece of DNA you want to insert, copy it, then paste it into the Project. Here we chose a GalK gene to insert:

    In Fusion 4

    The default value for overlaps is 8. Click on the Prefs toolbar button to increase the minimum to 15, as recommended for In-Fusion cloning.

    In fusion 5

  4. Finally, circularize the resulting product to get your vector with its insert. Click the Assemble button in the toolbar and you will get your In-Fusion product in a new window.

In Fusion 6

The new sequence will have /FRAG features in the Features tab showing how the molecule was constructed.

In Fusion 7
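The 15 nt overlap requirement can be illustrated with a minimal Python sketch of how the insert primers are built. It is not MacVector's algorithm: the function names and the fixed 20 nt annealing length (`ANNEAL`) are assumptions made for illustration, whereas a real design tool would choose the gene-specific portion by melting temperature.

```python
# Sketch only: fixed-length annealing regions stand in for Tm-based primer design.
OVERLAP = 15  # In-Fusion overlap recommended in the text
ANNEAL = 20   # hypothetical gene-specific length

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def infusion_primers(vector: str, insert: str,
                     overlap: int = OVERLAP, anneal: int = ANNEAL):
    """Return (forward, reverse) insert primers for a linearized vector.

    `vector` is the top strand of the linearized backbone, written 5'->3'.
    In the circular product the insert follows the vector's 3' end and is
    followed by the vector's 5' end.
    """
    if len(vector) < overlap or len(insert) < anneal:
        raise ValueError("sequence shorter than the requested primer region")
    # Forward primer: vector overlap followed by the start of the insert.
    forward = vector[-overlap:] + insert[:anneal]
    # Reverse primer: reverse complement of the insert end plus vector overlap.
    reverse = reverse_complement(insert[-anneal:] + vector[:overlap])
    return forward, reverse
```

Each primer carries 15 nt of vector sequence at its 5' end, so the amplified insert ends in the same 15 nt as the vector ends it has to join.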

Posted in Techniques

MacVectorTip: Using tabbed sequence windows in MacVector

One of the lesser-known features of macOS is the ability to keep all of an application’s open documents in tabs within a single window. Tabs were first introduced for the Finder in OS X Mavericks, and macOS Sierra extended them to document windows in supported applications. MacVector has supported tabs since their introduction; however, the Tab Bar is turned off by default.

To view the Tab Bar in MacVector, use:

VIEW | SHOW/HIDE TAB BAR

TabBar

However, to control the behaviour when you open new documents you need to use the main system preferences dialog:

SYSTEM PREFERENCES | GENERAL | PREFER TABS [NEVER | IN FULL SCREEN | ALWAYS] WHEN OPENING DOCUMENTS.

When set to ALWAYS, every new document you open in any supported application will open in a new tab. If you prefer multiple windows, you can drag a tab out of the window to open it in a new window. However, you may prefer some windows to be tabbed and others to open separately. So if you want a particular sequence in a separate window, drag its tab out of the window or use:

WINDOW | MOVE TAB TO NEW WINDOW

Please note that MacVector’s Results windows are always tabbed irrespective of the SYSTEM setting. However, you can always drag a results tab out of a window to open in a new window.

Posted in Uncategorized

MacVector Tip: a complex subsequence pattern example.

MacVector’s Subsequence tool allows you to search for motifs in both protein and DNA sequences. As well as a library of existing subsequence files, such as promoters and transcription factor binding sites, you can keep a library of your own subsequences. Subsequence libraries are multiple patterns kept in a single file. A search will look for matches to all subsequences in that file.

We recently had an interesting and tricky question on how to search for a protein motif where one of the amino acids was one of four different residues and the second and/or third amino acids were one of two amino acids.

Looking for ambiguous residues is relatively easy. You just surround the amino acids with parentheses and it will match one of those. For example (MV) would match either methionine or valine at that position.

However, the second part of the motif is trickier. MacVector’s Subsequence search tool lets a pattern have multiple parts combined with either AND or OR, but it has no single “and/or” operator for “one or both of these positions”. You can, however, use OR to combine two parts that together cover every case. Here’s how this was done.

Our example peptide/motif we are looking for has ten amino acids. The amino acids are as follows:

  • The first position is one of four residues: arginine, lysine, aspartic acid or glutamic acid (RKDE).
  • The second and third positions are where one or both are tryptophan, tyrosine or methionine (WYM).
  • A string of any six amino acids.
  • The tenth position can be alanine, isoleucine, leucine, methionine, phenylalanine, valine, proline or glycine (AILMFVPG).

So let’s take that motif position by position and build our subsequence.

  • Position 1 – (RKDE) – would match any amino acid of those four.

  • Positions 2 and 3 – You cannot directly specify that “one or both” positions match, but you can use two parts joined with OR, each fixing one of the two positions. In each part, X (Xaa) matches any amino acid at the other position.

    • (RKDE)(WYM)X would match any amino acid at the third position.

    • (RKDE)X(WYM) would match any amino acid at the second position.

  • Positions 4 to 9 – Then you can use X for the rest. So:

    • (RKDE)(WYM)XXXXXXX
    • (RKDE)X(WYM)XXXXXX
  • Position 10 – (AILMFVPG).

So our full set of matches will be:

ComplexSubsequenceMatches 1

Here’s how this can be entered in the Subsequence Editor:

ComplexSubsequenceMatches 2

The Editing Subsequences help topic covers this.
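For readers who want to check the logic of the two-part pattern outside MacVector, the same motif can be written as a regular expression. This is a minimal Python sketch, not MacVector's subsequence syntax; the name `find_motif` is made up for illustration. The alternation `(?:[WYM].|.[WYM])` plays the role of the two parts joined with OR.

```python
import re

# Position 1: R/K/D/E; positions 2-3: W/Y/M at one or both positions;
# positions 4-9: any six residues; position 10: A/I/L/M/F/V/P/G.
# The lookahead lets overlapping 10-residue windows all be reported.
MOTIF = re.compile(r"(?=([RKDE](?:[WYM].|.[WYM]).{6}[AILMFVPG]))")


def find_motif(protein: str):
    """Return (1-based start, matched peptide) for every matching window."""
    return [(m.start() + 1, m.group(1)) for m in MOTIF.finditer(protein.upper())]
```

A peptide with W, Y or M at both positions 2 and 3 is matched by either alternative, which is why combining the two parts with OR covers the “one or both” requirement.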

Posted in Tips

Working from home and Roaming Network licenses

The pandemic brought a sudden change to usual working routines, and it is probable that home working will remain part of the working week for some time to come. Most scientific research needs physical lab time, but that’s just “pipetting”! The real science also happens when you think, and that can easily be done at home (once you know how to ignore the varied distractions of working from home!).

The MacVector Network license is a cost-effective licensing solution for large groups that uses Sassafras KeyServer to monitor and control MacVector usage across a local network. One disadvantage is that users must be able to connect to the central license server in order to use MacVector, requiring a VPN connection when working from home and precluding the use of MacVector in the absence of an internet connection.

A few years ago we introduced our Roaming Network License to overcome these limitations. When MacVector is unable to contact the KeyServer, the Roaming license allows MacVector to run with complete functionality for a period of up to three weeks. This means that users generally do not need to have a VPN connection to access MacVector from home and can even use MacVector in the complete absence of an internet connection.

Throughout the pandemic we have been trying to help users work remotely from home with free temporary licenses provided to IT departments. We have also converted many sites with larger network licenses, at zero cost, to roaming network licenses. We have now decided to extend this to all sites with a three or more user network license.

Over the next few weeks you will be contacted with new License Activation Details for your current license.

We do hope that this helps your users access MacVector from home. Please contact MacVector Support with questions about the change, or indeed any other questions you may have.

Posted in General

MacVectorTip: Identifying, Selecting and Assembling NGS reads with a variant genotype

When analyzing/assembling/aligning NGS data, there are many scenarios where you might want to separate out the reads representing different genotypes or variant sequences. MacVector makes this very easy. Take a reference sequence and choose Analyze->Align to Reference. Now click the Add Seqs button and select and add your NGS data files. NOTE: if your reference represents just a subset of the data in the NGS files, you might want to first filter the data using Align to Folder.

Here we see an Align to Reference where about half the reads have obvious SNPs compared to the reference. Note that the Dots toolbar button is toggled on to help emphasize the mismatches:

VariantGenotype

To select all of the reads that contain the SNP, first select a few residues around that SNP, as shown above. This helps ignore the occasional “bad” sequence, though, for most purposes, you can just select the one residue. Then right-click ([ctrl]-click) and choose Select Overlapping Reads Containing Selected Sequence from the context sensitive menu. This selects every read that aligns at that location with the G at that position. Finally, right-click and choose Select Matching Pairs. Now you have the mate-pairs of the SNP reads selected and you can save all the selected reads using the right-click Export Selected Reads as FastA/Q option.

If your sequence has multiple SNPs/genotypes/repeats, you can always then choose the right-click Delete Selected Reads option to remove those reads and start again on another set.
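The selection logic described above can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not MacVector's implementation: alignments are treated as ungapped, both mates share one read name, and the `AlignedRead` type and function names are invented for the example.

```python
from typing import NamedTuple


class AlignedRead(NamedTuple):
    name: str   # shared by both mates of a pair
    mate: int   # 1 or 2
    start: int  # 1-based reference position of the first aligned base
    seq: str    # aligned bases (ungapped alignment assumed)


def reads_with_variant(reads, start, variant):
    """Names of reads whose bases equal `variant` at reference position `start`."""
    names = set()
    for read in reads:
        offset = start - read.start
        # The read must span the whole selected region.
        if offset >= 0 and offset + len(variant) <= len(read.seq):
            if read.seq[offset:offset + len(variant)] == variant:
                names.add(read.name)
    return names


def select_with_mates(reads, start, variant):
    """Matching reads plus their mates, whether or not the mate covers the site."""
    names = reads_with_variant(reads, start, variant)
    return [read for read in reads if read.name in names]
```

Selecting a few residues around the SNP corresponds to passing a longer `variant` string, which makes an accidental match from a poor-quality read less likely.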

Posted in Uncategorized

MacVectorTip: Filtering NGS Data to retrieve reads matching a known sequence

So you just got your NGS reads back from that sequencing experiment and, wow, what a HUGE amount of data. Wouldn’t it be easier to handle if you could pare that down to just the gene/plasmid/sequence(s) you are interested in? MacVector to the rescue, as it can read and filter fasta/q files, even if they are compressed! Open a “reference” sequence (or create a fake composite sequence of all the sequences you are interested in; it will work just as well) then choose Database->Align to Folder. Set up the dialog something like this:

FilterNGS Reads 1

Set Search Folder: to the location of your NGS data files. Make sure Hash Value is 10 or more (for speed) and Scores to Keep is at least 1,000 (to make sure you don’t miss any reads), and consider the scoring matrix: for sequencing data where you are expecting essentially perfect matches, the DNA identity with penalties matrix is by far the best choice. If you are looking for reads from related organisms, other .nmat files may be more appropriate to allow for mismatches.

While the search algorithm has been optimized for use on multi-CPU machines (and for Apple M1 processors), it can still take some time to run. A 100bp sequence scanned against 3 million 133nt paired Illumina HiSeq reads takes less than 10 minutes on a typical Mac laptop, but scanning a 5 Mbp E. coli genome against 100 million reads is likely to be an overnight proposition.

When complete, you can save the matching hits by selecting all of the lines in the Folder Description List tab (Edit->Select All or command-A) then saving them to a pair of matching fasta/q files by choosing Database->Retrieve to File. You will see a PAIR of files with the hits: even if only one read in the original pair matched the query file, BOTH reads are saved. This is a very powerful approach to help resolve variants and inconsistencies in NGS data. You can see from the example below that we’ve reduced a pair of 118 MB files (compressed!) to just the 2x 374 KB that are of interest to us:

FilterNGS Reads 2
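The idea of keeping both mates whenever either read matches the reference can be sketched in Python. This is a deliberately crude stand-in for Align to Folder, which scores alignments with a matrix: here a pair is kept if either read shares one exact 10-mer with the reference on either strand. The function names and the link between `K` and the Hash Value setting are assumptions for illustration.

```python
import gzip
from itertools import islice

K = 10  # word size, loosely analogous to the Hash Value setting (an assumption)

_COMP = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMP)[::-1]


def kmers(seq, k=K):
    """Set of all k-length words in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def read_fastq(path):
    """Yield (header, seq, plus, qual) records from a plain or gzipped FASTQ file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            record = [line.rstrip("\n") for line in islice(handle, 4)]
            if len(record) < 4:
                return
            yield tuple(record)


def filter_pairs(reference, r1_path, r2_path, out1_path, out2_path, k=K):
    """Write both mates of every pair in which either read hits the reference."""
    ref = reference.upper()
    index = kmers(ref, k) | kmers(revcomp(ref), k)
    kept = 0
    with open(out1_path, "w") as out1, open(out2_path, "w") as out2:
        for rec1, rec2 in zip(read_fastq(r1_path), read_fastq(r2_path)):
            hit = any(not index.isdisjoint(kmers(rec[1].upper(), k))
                      for rec in (rec1, rec2))
            if hit:
                out1.write("\n".join(rec1) + "\n")
                out2.write("\n".join(rec2) + "\n")
                kept += 1
    return kept
```

Writing the two output files in lockstep keeps the mates in the same order in both files, which is what downstream paired-end tools generally expect.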

Posted in Techniques

MacVectorTip: Use Bowtie to remove contaminating reads prior to NGS Assembly

MacVector with Assembler can use Velvet and/or SPAdes for fast and memory-efficient de novo NGS assembly of modest-sized genomes (typically up to 40 Mbp or so) even on a laptop. One common task is to assemble NGS data from BAC clones. However, one problem that often arises is that the BAC DNA preparations may be contaminated with genomic E. coli DNA. SPAdes, in particular, is so efficient that it will easily assemble large contigs from the minority contaminating reads. In the example below (where E. coli genomic sequence represented over two thirds of the reads) the large genomic contigs overwhelm the few BAC contigs (the main BAC contigs are 192kb and 41kb). The primary BAC contig is highlighted below, but there are clearly a lot of large genomic contigs that add to the confusion.

BowtieFilteringReads 1

The solution to this is to first run a Bowtie reference alignment of the raw reads against an E. coli genomic sequence. Ideally, you would use the genomic sequence of your host strain, but any genome will be effective. You can use multiple genomes with relatively little difference in processing time: 10 genomes takes only about twice as long as one genome. In this case, we used the Add Ref button to add two random E. coli genomes as reference sequences, then aligned them with the NGS data files using Bowtie with the default settings, resulting in this alignment:

BowtieFilteringReads 2

The critical point here is not the actual alignments, but the fact that the reads that do NOT align to the reference sequence(s) are saved into a pair of compressed FASTQ (fq.gz) files. These should be massively enriched for non-E. coli DNA sequences. While you can save these files to disk (choose File->Export Selected Reads To…) you can also directly assemble the data in the files by selecting them and then clicking the SPAdes button. When that completes with the default settings, you should see something like this.

BowtieFilteringReads 3

Now the top two assemblies (NODE_x) are the major BAC contigs. You can also click on the ‘#’ column to sort the contigs by number of reads aligned and that (in this case) brings up additional minor BAC contigs.

Note that the approach to remove contaminating sequences via Bowtie alignment is not perfect. In particular, for paired end reads, both reads need to align to be considered “aligned”. So, if one of a pair does not align, or has “failed”, both will be placed in the Unaligned_Reads file.

Another limitation that you should be aware of is that the unaligned reads files may contain pairs of reads that map perfectly, except the distance between the two reads differs between the reference sequence and the organism that has been sequenced.

Paired reads are generated by sequencing both ends of a single fragment. The insert length, or distance between the two reads on that fragment, is used by assembly algorithms both to improve accuracy and to detect structural variants, for example when resolving tricky sections with long runs of repeats or large indels.

Bowtie uses the term concordant for pairs of reads that map well to the reference, and discordant for pairs where both reads map well but the distance between them does not match the expected insert length.

Unfortunately, for the purpose of filtering out a genome from a mixed dataset, the unaligned reads files will also include discordant reads, which means that you will never fully remove all traces of the genome you are trying to filter out. For example, in the above example there are small contigs of the E. coli genome which are likely due to discordant reads in the unaligned reads files.

However, even with these two limitations the technique works extremely well to ensure a better assembly of your organism of interest.
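The two limitations can be summarized in a small Python sketch showing which pairs end up in the unaligned reads files. It is a simplified model written for this post, not Bowtie's code: read orientation is ignored, and `MateHit`, `classify_pair` and the insert-size bounds are hypothetical names and parameters.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MateHit:
    position: Optional[int]  # 1-based leftmost aligned position, None if unaligned
    length: int              # read length


def classify_pair(m1: MateHit, m2: MateHit, min_insert: int, max_insert: int) -> str:
    """Label a read pair as 'concordant', 'discordant' or 'unaligned'."""
    if m1.position is None or m2.position is None:
        # One failed mate is enough for the whole pair to count as unaligned.
        return "unaligned"
    left = min(m1.position, m2.position)
    right = max(m1.position + m1.length, m2.position + m2.length)
    fragment = right - left
    return "concordant" if min_insert <= fragment <= max_insert else "discordant"


def goes_to_unaligned_file(label: str) -> bool:
    """Per the text, discordant pairs are written out along with unaligned ones."""
    return label != "concordant"
```

Only concordant pairs are removed by the filter, so host reads whose mates map at an unexpected distance survive and can still assemble into small contigs.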

Posted in Techniques

AppleScript: batch translation of CDS features

Apple’s AppleScript (along with JavaScript for Automation) is an easy-to-write and easy-to-understand language that lets you automate tasks in supported applications. Many Apple applications have an AppleScript Dictionary that defines which functions you can automate. MacVector has many such functions in its AppleScript Dictionary: you can auto-annotate multiple sequences, search for sequences in Entrez and retrieve them, translate sequences, transcribe sequences and more. AppleScript is excellent for batch operations, whether a single operation on multiple input sequences, multiple operations on a single sequence, or taking a single sequence and producing multiple results. It also suits mundane tasks such as converting a folder of sequences into a different format.

Recently we had a support query about translating all the open reading frames in a single sequence to a set of protein sequences. This is a task very well suited to automation. Whereas MacVector can easily translate single CDS features or do a six frame translation of a sequence, repeating this for a large genome with multiple ORFs would be laborious to do manually. However, with AppleScript once a script has been written it is a simple task.

The workflow for the script is simple: go through a DNA sequence and look for every CDS feature. Each CDS feature found is translated, and the script then moves on to the next one. Finally, it produces a FASTA file containing every protein sequence.

Incidentally many tools in MacVector rely on annotated CDS features. If your sequence does not have any CDS features, then you can use SCAN FOR…ORFS to easily add them.

Here is the core routine of the script:

AppleScript

The important lines are these two:

repeat with theFeature in (every feature of theSequence whose key is "CDS")
set theTranslation to theFeature's translation as text

All they do is tell MacVector to look for a CDS feature and then translate that open reading frame.

The full script is here:

-- Translate all CDS features in a MacVector Nucl sequence
-- Clindley@MacVector.com
-- v0.2
-- May 14, 2021
-- added direct writing of output fasta file
use AppleScript version "2.7" -- macOS High Sierra or later
use scripting additions

set outputCount to 0
set FastaFile to ""
set inputFile to GetInputFile()
set outputFolder to GetOutputFolder()
set defaultAnswer to "All_CDS_translated.fa"
display dialog "Please enter the Output filename:" default answer defaultAnswer
set OutputFilename to text returned of result

tell application "MacVector"
    set docRef to open inputFile
    delay 0.3
    set theSequence to docRef's sequence
    -- very long timeout to avoid timeouts when translating long sequences (default is 120 seconds)
    with timeout of 10000 seconds
        repeat with theFeature in (every feature of theSequence whose key is "CDS")
            set theTranslation to theFeature's translation as text
            set theName to theFeature's key as text
            set outputCount to outputCount + 1 -- increment the number of CDS translated
            -- one FASTA record per CDS: a header line, then the protein sequence
            set FastaFile to FastaFile & ">" & theName & " " & outputCount & linefeed & theTranslation & linefeed
        end repeat
    end timeout
    close docRef saving no
end tell

-- write the FASTA text to the filename entered in the dialog
set myFile to open for access (outputFolder & OutputFilename) with write permission
set eof of myFile to 0 -- empty the file first in case it already exists
write FastaFile to myFile
close access myFile

set outputCount to outputCount as string
set theDialogueText to outputCount & " CDS features in " & inputFile & " were translated and saved as " & outputFolder & OutputFilename
display dialog theDialogueText buttons {"OK"} default button "OK" giving up after 120

on GetInputFile()
    tell application "Finder"
        -- get the input sequence file
        set inputFile to POSIX path of (choose file with prompt "Select DNA sequence to translate:")
        if not (exists inputFile as POSIX file) then
            display dialog inputFile & " does not exist."
            return
        end if
        return inputFile
    end tell
end GetInputFile

on GetOutputFolder()
    tell application "Finder"
        -- now choose which folder to place the output file in
        set outputFolder to POSIX path of (choose folder with prompt "Select folder for output file:")
    end tell
    return outputFolder
end GetOutputFolder

Just open /Applications/Utilities/Script Editor.app and copy and paste the above code into it. Script Editor is Apple’s default AppleScript editor, although more capable AppleScript editors exist, such as Script Debugger. You can also download the script.

If you want to investigate automating MacVector more, then the MacVector application folder contains an AppleScript folder with many example scripts. If there is a repetitive task that you perform in MacVector then please do contact support and ask us if it could be automated. Either we’ll be able to assist developing a script, or we’ll be able to add support to a future release of MacVector.
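If you want to spot-check the translated output outside MacVector, the same workflow can be approximated in a few lines of Python. This is only a sketch under simplifying assumptions: it uses the standard genetic code, handles forward-strand, unspliced CDS ranges only, and the function names are made up for illustration. MacVector's own translation works from the annotated features and is not reproduced here.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, with codons enumerated in TCAG order.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate a coding sequence, stopping at the first stop codon."""
    cds = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # X for ambiguous codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def cds_to_fasta(sequence, cds_ranges):
    """Build FASTA text from 1-based inclusive (start, end) forward-strand ranges."""
    records = []
    for count, (start, end) in enumerate(cds_ranges, 1):
        protein = translate(sequence[start - 1:end])
        records.append(f">CDS {count}\n{protein}\n")
    return "".join(records)
```

The header format mirrors the one the script builds from the feature key and a running count.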

Posted in Techniques, Tips

MacVector 18.1 and the new InterProScan functional domain analysis tool

MacVector allows you to do functional domain analysis on your protein sequence using the InterProScan service. InterPro contains multiple databases of protein families, domains and motifs and InterProScan will submit a protein sequence to a search of these databases. It will also do extra analysis such as transmembrane region analysis using TMHMM and other tools. MacVector will submit your protein sequence to an InterProScan search and allows you to permanently annotate results directly back to your sequence.

However, the InterProScan service is undergoing changes which means that this tool now has limited functionality. You can still submit sequences for analysis, and you will be able to view the results. However, the graphical interface is now not functional and does not allow you to directly annotate the results back to your sequence.

There is a workaround, as MacVector does allow you to annotate protein sequences with GFF files using IMPORT FEATURES. GFF, along with BED and GTF, is a standard format for storing protein/DNA annotation.
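As an illustration of what such a file contains, here is a minimal Python sketch that writes GFF3 records for protein domains. The nine tab-separated columns follow the GFF3 specification; which columns and attributes MacVector's IMPORT FEATURES actually reads is not stated here, so treat the feature type and the `Name` attribute as assumptions, and the function names as hypothetical.

```python
def gff3_line(seqid, source, feature_type, start, end, name):
    """Format one GFF3 record for an unstranded protein feature.

    Coordinates are 1-based and inclusive. Score, strand and phase are
    left as "." because they do not apply to a protein domain.
    """
    if start < 1 or end < start:
        raise ValueError("GFF3 coordinates are 1-based with start <= end")
    columns = [seqid, source, feature_type, str(start), str(end),
               ".", ".", ".", f"Name={name}"]
    return "\t".join(columns)


def write_gff3(path, records):
    """Write a GFF3 file from (seqid, source, type, start, end, name) tuples."""
    with open(path, "w") as handle:
        handle.write("##gff-version 3\n")
        for record in records:
            handle.write(gff3_line(*record) + "\n")
```

For example, `gff3_line("MyProtein", "InterPro", "polypeptide_domain", 10, 50, "ExampleDomain")` produces one record; all of those values are placeholders rather than real InterProScan results. Names containing tabs, semicolons or equals signs would need URL-escaping before being written.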

However, we have been hard at work replacing MacVector’s InterProScan tool and are pleased to announce that the new tool is available in MacVector 18.1. MacVector 18.1 was released in February 2021 but until now has not been available via the online updater. MacVector 18.1.3 is now the current release and contains the all-new InterProScan tool. It is available for online updating within MacVector and you should be prompted to upgrade shortly.

We’ve not just adapted the tool for the backend changes but made it better. It now has a similar interface to MacVector’s Scan DNA For.. tool. Scan DNA For automatically displays restriction sites, missing common features, primer binding sites and putative open reading frames directly on your sequence and allows you to permanently annotate them.

With the new InterProScan tool you will submit your protein sequence in the same way to the InterProScan service using DATABASE | FUNCTIONAL DOMAIN ANALYSIS (InterPro). However, when the results come back they will be presented on your existing sequence’s Results Window.

InterProScan

For clarity, all existing features were removed from this sequence, the SARS-CoV-2 Spike protein.

If you hover your mouse cursor over each domain then you will see a detailed list of that domain’s database entry.

InterProScanHover

Hover over a result to see the database entry.

To permanently annotate a domain to your sequence, right-click it and choose CREATE DOMAIN FEATURE from the context menu.

InterProScan ContextMenu

Do remember that many tools within MacVector use Context menus that are available with a “right click”.

If you have a maintenance contract that was active on 1st February, 2021, then you can install MacVector 18.1. You will be prompted to upgrade in due course. However, if you have turned off online updates, then you can go to MACVECTOR | CHECK FOR UPDATES.. to upgrade or download the full installer.

Posted in Releases