MacVectorTip: Scan For… Missing Primers: Automatically display Primer Binding Site on your sequences

MacVector’s Scan DNA For… tool automatically displays restriction enzyme recognition sites, putative ORFs, CRISPR PAM sites and missing annotation in each DNA sequence that you open. It can also display binding sites for the primers in your own Primer Database. Here’s an example of a couple of primers displayed on the pET 47b LIC cloning vector, one on each side of the LIC cloning site. You can quickly annotate interesting binding sites with a simple right mouse click.

ScanDNA primers 1

The feature is controlled from the MacVector | Preferences | Scan DNA | Primers tab. By default, it uses the Primer Database.nsub file that you can find in the /Applications/MacVector/Subsequences/ folder, but you can point it to any .nsub file of your choosing.

ScanDNA primers 2
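If you want a feel for what a primer scan involves, here is a minimal Python sketch. It is not MacVector’s implementation: it assumes exact matches only, takes the primers as a plain dictionary rather than reading an .nsub file, and the function names are made up for illustration.

```python
def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_primer_sites(template, primers):
    """Return (name, strand, start) for every exact primer match.

    start is the 0-based position on the top strand. A real scan would
    probably tolerate mismatches; this sketch (an assumption) does not.
    """
    template = template.upper()
    hits = []
    for name, primer in primers.items():
        primer = primer.upper()
        # A primer on the "-" strand shows up as its reverse complement
        # when reading the top strand.
        for strand, probe in (("+", primer), ("-", reverse_complement(primer))):
            pos = template.find(probe)
            while pos != -1:
                hits.append((name, strand, pos))
                pos = template.find(probe, pos + 1)
    return sorted(hits, key=lambda hit: hit[2])


if __name__ == "__main__":
    template = "ATGCCCGGGAAATTTCATGCA"
    primers = {"fwd": "ATGCCC", "rev": "TGCATG"}
    for hit in find_primer_sites(template, primers):
        print(hit)
```

The forward primer is reported on the “+” strand and the reverse primer on the “-” strand, which is the same pair of facing arrows you see flanking a cloning site in the Map tab.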

Posted in Tips | Tagged , , , | Comments closed

MacVectorTip: Using the Align to Reference Shading and Trimming toolbar buttons

MacVector’s Align to Reference Editor and the Contig Editor in Assembly Projects have two useful functions for visualizing assemblies. The Shading button turns on background coloring of the residues in the upper pane, based on quality values (these can be from Sanger reads or from NGS reads). The scale ranges from a dark red for poor quality, through white for “OK” quality (phred score 20 for an individual read) through dark green for high quality (phred score 40 or above). Edited residues appear blue. Coloring can be toggled on and off using the Shading button. The Trimmed button controls the visibility of trimmed (or “clipped”) residues – these are residues that do not align to the reference sequence and so are greyed out to indicate that, and they are not included in consensus calculations. However, you can choose to completely hide those residues by turning off the Trimmed button.

Here is an alignment with Shading and Trimmed both turned ON. Note the greyed out residues in the outlined areas;

Unknown

Now the same alignment with Shading and Trimmed turned off;

Unknown

Don’t forget the Dots button too. That replaces all residues that match the consensus with a “.” to make it easier to see mismatches.
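The shading scale described above can be sketched in a few lines of Python. This is only an illustration: the RGB values and the linear blend between them are assumptions, not the colors MacVector actually uses. The second function is the standard phred definition, included to show why 20 counts as “OK” (a 1 in 100 chance of a wrong call) and 40 as high quality (1 in 10,000).

```python
# Hypothetical anchor colors; MacVector's real palette may differ.
DARK_RED = (139, 0, 0)
WHITE = (255, 255, 255)
DARK_GREEN = (0, 100, 0)


def _blend(start, end, fraction):
    """Linear blend between two RGB colors, fraction in [0, 1]."""
    return tuple(round(a + (b - a) * fraction) for a, b in zip(start, end))


def shade_for_phred(quality):
    """Map a phred score to a background color.

    0 -> dark red, 20 -> white, 40 or above -> dark green.
    """
    quality = max(0, min(quality, 40))
    if quality <= 20:
        return _blend(DARK_RED, WHITE, quality / 20)
    return _blend(WHITE, DARK_GREEN, (quality - 20) / 20)


def error_probability(quality):
    """Standard phred definition: P(error) = 10 ** (-Q / 10)."""
    return 10 ** (-quality / 10)


if __name__ == "__main__":
    for q in (0, 10, 20, 30, 40):
        print(q, shade_for_phred(q), error_probability(q))
```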

Importing SnapGene files into MacVector

MacVector will directly import SnapGene DNA files. You just need to use FILE | OPEN or double click the file.

This is very useful when downloading plasmid sequences from the wonderful Addgene plasmid repository.

When you import a SnapGene file the appearance will be very similar. The colors of features will be the same as the original. However, some aspects of the display differ between the two applications. For example, MacVector has multiple levels (up to six) outside and inside a plasmid and will always try to place features so that no feature overlaps another. SnapGene, by contrast, always places features on the same two levels, so features sometimes overlap.

Here’s a plasmid sequence downloaded from the Addgene website in SnapGene format. It’s been opened directly in MacVector by double clicking the file.

Addgene+macvector

This sequence was opened using the MacVector defaults. However, MacVector’s graphics are highly customizable and you can adjust the graphical settings to display the plasmid exactly as you want. For example you may prefer features to be displayed on just those two levels instead of being distributed over the multiple levels as per the default settings.

MacVector will directly import many file formats, including common sequence formats such as GenBank, FASTA and FASTQ, plus files from other software packages such as Sequencher Projects and Serial Cloner.

Designing primers and documenting In-Fusion Cloning with MacVector

The In-Fusion Cloning kits from Takara allow you to perform ligase free cloning of PCR products into vectors in as little as 15 minutes.

You can use MacVector’s Gibson Cloning/Ligase Independent tool to design primers for In-Fusion cloning workflows. The In-Fusion kits need a 15nt overlap between the ends of a fragment and the ends of a linearized vector. The reaction uses a 3’ exonuclease to create single-stranded overhangs (Gibson Assembly uses a 5’ exonuclease, but otherwise is very similar).

Here are the basic steps;

  1. Use File | New | Gibson/Ligase-Independent Assembly… and select the second option in the dialog. This creates your Ligase Independent Cloning (LIC) project.

    In fusion 1

  2. Next you want to prepare your vector. There are a few ways you might do this in the lab:
    1. You might have linearized a vector with one or more restriction enzymes. If it’s a single enzyme, just open the vector sequence and drag the name of the enzyme from the Map tab of the vector and drop it onto the LIC project window. If you are using two enzymes, select both enzymes (use [shift] to select the second one), making sure you have the bulk of the vector selected so you include markers and the replication origin, then choose Edit | Copy, switch to the LIC Project window and choose Edit | Paste.
      Here, we’ve taken the major EcoRI – HindIII fragment from pUC19.

      In Fusion 2

      For this approach click on the outlined button and select the No Primer option.

    2. You may want to create a vector backbone using PCR. In this case, you want to select and copy the exact DNA sequence you want to appear in the final construct. In the example below we’ve taken the sequence from the stop codon of the lacZ alpha gene in pUC19 to the ATG start codon of lacZ alpha, as if we were going to create a fusion protein starting with that ATG. In this case, we want MacVector to calculate an Automatic Primer for each end (click the outlined button and change to Automatic Primer). As a second fragment has not yet been entered, MacVector simply generates a primer to match the other end of the vector backbone fragment.

      In fusion 3

    3. The third way might be if you are using a pre-prepared linearized vector from a company.
  3. Next you need to prepare your insert. Basically, just select the exact piece of DNA you want to insert, copy, then paste into the Project. Here we chose a GalK gene to insert;

    In Fusion 4

    The default value for overlaps is 8. Click on the Prefs toolbar button to increase the minimum to 15 as recommended for In-Fusion cloning.

    In fusion 5

  4. Finally you need to circularize the resulting product to have your vector with insert. Click the Assemble button in the toolbar and you will get your In-Fusion product in a new window.

In Fusion 6

The new sequence will have /FRAG features in the Features tab showing how the molecule was constructed.

In Fusion 7
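To make the 15nt overlap idea concrete, here is a rough Python sketch of how insert primers for this kind of workflow are put together. It is not how MacVector calculates its Automatic Primers: the function name is hypothetical, the annealing region is a fixed 20nt purely for illustration (a real design would pick the length from the melting temperature), and the vector is assumed to be given as the top strand of the linearized backbone.

```python
def reverse_complement(seq):
    """Reverse complement of an unambiguous, upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def in_fusion_primers(vector, insert, overlap=15, anneal=20):
    """Build forward and reverse insert primers with vector overhangs.

    vector: top strand of the linearized backbone, 5' to 3'.
    insert: top strand of the fragment to clone, 5' to 3'.
    The insert joins the 3' end of the vector, and its far end joins the
    vector's 5' end when the product circularizes.
    """
    # Forward primer: last `overlap` bases of the vector, then the insert start.
    forward = vector[-overlap:] + insert[:anneal]
    # Reverse primer: insert end followed by the vector start, read from
    # the bottom strand.
    reverse = reverse_complement(insert[-anneal:] + vector[:overlap])
    return forward, reverse


if __name__ == "__main__":
    vector = "T" * 15 + "ACGT" * 5 + "G" * 15
    insert = "ATG" + "CATCAT" * 5 + "TAA"
    fwd, rev = in_fusion_primers(vector, insert)
    print("forward:", fwd)
    print("reverse:", rev)
```

Each primer is 35nt here: 15nt that match the end of the vector, followed by 20nt that anneal to the insert.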

MacVectorTip: Using tabbed sequence windows in MacVector

One of the lesser known features of macOS is the ability to store all open documents of an application in tabs. Tabs were initially introduced for the Finder in OS X Mavericks, and macOS Sierra extended them to supported application document windows too. MacVector has supported tabs since their introduction; however, by default the Tab Bar is turned off.

To view the Tab Bar in MacVector, use:

VIEW | SHOW/HIDE TAB BAR

TabBar

However, to control the behaviour when you open new documents you need to use the main system preferences dialog:

SYSTEM PREFERENCES | GENERAL | PREFER TABS [NEVER | IN FULL SCREEN | ALWAYS] WHEN OPENING DOCUMENTS.

When set to ALWAYS, every new document you open in any supported application opens in a new tab. If you prefer multiple windows you can drag the tab out of the window to open it in a new window. However, you may prefer some windows to be tabbed and others to always open in separate windows. So if you want a particular sequence in a separate window, drag the tab out of the window or use:

WINDOW | MOVE TAB TO NEW WINDOW

Please note that MacVector’s Results windows are always tabbed irrespective of the SYSTEM setting. However, you can always drag a results tab out of a window to open in a new window.

MacVector Tip: a complex subsequence pattern example.

MacVector’s Subsequence tools allow you to search for motifs in both protein and DNA sequences. As well as a library of existing subsequence files, such as promoters and transcription factor binding sites, you can keep a library of your own subsequence patterns. A subsequence library is a set of patterns kept in a single file, and a search looks for matches to all of the subsequences in that file.

We recently had an interesting and tricky question on how to search for a protein motif where one of the amino acids was one of four different residues and the second and/or third amino acids were one of three amino acids.

Looking for ambiguous residues is relatively easy. You just surround the amino acids with parentheses and it will match one of those. For example (MV) would match either methionine or valine at that position.

However, the second part of the motif is trickier. MacVector’s Subsequence search tool lets a pattern have multiple parts combined with either AND or OR, but it has no combined AND/OR (“one or both”) operator. However, you can use the OR logic with two parts to get the same result. Here’s how this was done.

Our example peptide/motif we are looking for has ten amino acids. The amino acids are as follows:

  • The first position is one of four residues: arginine, lysine, aspartic acid or glutamic acid (RKDE).
  • The second and third positions are where one or both are tryptophan, tyrosine or methionine (WYM).
  • A string of any six amino acids.
  • The tenth position can be alanine, isoleucine, leucine, methionine, phenylalanine, valine, proline or glycine (AILMFVPG).

So let’s take that motif position by position and build our subsequence.

  • Position 1 – (RKDE) – would match any amino acid of those four.

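  • Positions 2 and 3 – You cannot specify that “one or both” can be a match. But you can specify that one or the other will match by using two parts joined with OR, with X (Xaa), which matches any amino acid, at the other position. Because X also matches W, Y and M, the two parts together cover the “both” case as well.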

    • (RKDE)(WYM)X would match any amino acid at the third position.

    • (RKDE)X(WYM) would match any amino acid at the second position.

  • Positions 4 to 9 – Then you can use X for the rest. So:

    • (RKDE)(WYM)XXXXXXX
    • (RKDE)X(WYM)XXXXXX
  • Position 10 – (AILMFVPG).

So our full set of matches will be:

ComplexSubsequenceMatches 1

Here’s how this can be entered in the Subsequence Editor:

ComplexSubsequenceMatches 2

The Editing Subsequences help topic covers this.
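If you want to sanity check a pattern like this outside MacVector, the same two-part OR logic can be written as a regular expression. The Python sketch below is only an equivalent for testing, not MacVector’s subsequence syntax: square brackets stand in for the parentheses, [A-Z] stands in for X, and the function name is made up.

```python
import re

# Part 1: W, Y or M at position 2, anything at position 3.
PART_1 = "[RKDE][WYM][A-Z]{7}[AILMFVPG]"
# Part 2: anything at position 2, W, Y or M at position 3.
PART_2 = "[RKDE][A-Z][WYM][A-Z]{6}[AILMFVPG]"

# The lookahead makes every start position a candidate, so overlapping
# motifs are all reported.
MOTIF = re.compile(f"(?=({PART_1}|{PART_2}))")


def find_motif(protein):
    """Return (start, matched_peptide) for each 10-residue motif hit."""
    return [(m.start(), m.group(1)) for m in MOTIF.finditer(protein.upper())]


if __name__ == "__main__":
    for peptide in ("RWAAAAAAAA", "GGKAYAAAAAAG", "RWWAAAAAAA", "RAAAAAAAAA"):
        print(peptide, find_motif(peptide))
```

A peptide with W, Y or M at both positions 2 and 3 (such as RWWAAAAAAA) satisfies both parts but is still reported once, which is the “one or both” behavior we were after.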

Working from home and Roaming Network licenses

The pandemic brought a sudden change to usual working routines and it is probable that home working will remain part of the working week for some time to come. Most scientific research needs physical lab time, but that’s just “pipetting”! The real science also happens when you think, and that can be done easily at home (once you know how to ignore the varied distractions of working from home!).

The MacVector Network license is a cost-effective licensing solution for large groups that uses Sassafras KeyServer to monitor and control MacVector usage across a local network. One disadvantage is that users must be able to connect to the central license server in order to use MacVector, requiring a VPN connection when working from home and precluding the use of MacVector in the absence of an internet connection.

A few years ago we introduced our Roaming Network License to overcome these limitations. When MacVector is unable to contact the KeyServer, the Roaming license allows MacVector to run with complete functionality for a period of up to three weeks. This means that users generally do not need to have a VPN connection to access MacVector from home and can even use MacVector in the complete absence of an internet connection.

Throughout the pandemic we have been trying to help users work remotely from home with free temporary licenses provided to IT departments. We have also converted many sites with larger network licenses, at zero cost, to roaming network licenses. We have now decided to extend this to all sites with a network license for three or more users.

Over the next few weeks you will be contacted with new License Activation Details for your current license.

We do hope that this helps your users have access to MacVector from home. Please contact MacVector Support for questions about the change or indeed about any other questions you may have.

MacVectorTip: Identifying, Selecting and Assembling NGS reads with a variant genotype

When analyzing/assembling/aligning NGS data, there are many scenarios where you might want to separate out the reads representing different genotypes or variant sequences. MacVector makes this very easy. Take a reference sequence and choose Analyze->Align to Reference. Now click the Add Seqs button and select and add your NGS data files. NOTE: if your reference represents just a subset of the data in the NGS files, you might want to first filter the data using Align to Folder.

Here we see an Align to Reference where about half the reads have obvious SNPs compared to the reference. Note that the Dots toolbar button is toggled on to help emphasize the mismatches;

Unknown

To select all of the reads that contain the SNP, first select a few residues around that SNP, as shown above. This helps ignore the occasional “bad” sequence, though, for most purposes, you can just select the one residue. Then right-click ([ctrl]-click) and choose Select Overlapping Reads Containing Selected Sequence from the context sensitive menu. This selects every read that aligns at that location with the G at that position. Finally, right-click and choose Select Matching Pairs. Now you have the mate-pairs of the SNP reads selected and you can save all the selected reads using the right-click Export Selected Reads as FastA/Q option.

If your sequence has multiple SNPs/genotypes/repeats, you can always then choose the right-click Delete Selected Reads option to remove those reads and start again on another set.
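The selection logic is easy to picture in code. The Python sketch below is a simplified model rather than MacVector’s implementation: reads are assumed to be gap-free alignments held in memory, mates are assumed to share a name ending in “/1” or “/2”, and all of the names are hypothetical.

```python
from typing import NamedTuple


class AlignedRead(NamedTuple):
    name: str       # e.g. "f1/1"; the "/1" and "/2" suffixes are an assumption
    ref_start: int  # 0-based reference position of the first aligned base
    seq: str        # aligned bases, assumed to contain no gaps


def reads_containing(reads, sel_start, selected):
    """Reads that span the selection and carry exactly the selected bases."""
    sel_end = sel_start + len(selected)
    hits = []
    for read in reads:
        read_end = read.ref_start + len(read.seq)
        if read.ref_start <= sel_start and read_end >= sel_end:
            offset = sel_start - read.ref_start
            if read.seq[offset:offset + len(selected)] == selected:
                hits.append(read)
    return hits


def with_mates(selected_reads, all_reads):
    """Add the mate of every selected read, matched on the shared fragment name."""
    fragments = {read.name.rsplit("/", 1)[0] for read in selected_reads}
    return [read for read in all_reads if read.name.rsplit("/", 1)[0] in fragments]


if __name__ == "__main__":
    # Reference is ACGTACGTAC; the variant reads carry G instead of A at position 4.
    reads = [
        AlignedRead("f1/1", 0, "ACGTGCGT"),
        AlignedRead("f1/2", 20, "TTTTTTTT"),
        AlignedRead("f2/1", 2, "GTACGTAC"),
        AlignedRead("f3/1", 5, "CGTAC"),
    ]
    variant = reads_containing(reads, 3, "TGC")
    print([read.name for read in with_mates(variant, reads)])
```

Selecting a few residues around the SNP (“TGC” here rather than just “G”) is what keeps out a read that happens to have a sequencing error at a neighboring position.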

MacVectorTip: Filtering NGS Data to retrieve reads matching a known sequence

So you just got your NGS reads back from that sequencing experiment and, wow, what a HUGE amount of data. Wouldn’t it be easier to handle if you could pare that down to just the gene/plasmid/sequence(s) you are interested in? MacVector to the rescue, as it can read and filter fasta/q files, even if they are compressed! Open a “reference” sequence (or create a fake composite sequence of all the sequences you are interested in, it will work just as well) then choose Database->Align to Folder. Set up the dialog something like this;

FilterNGS Reads 1

Set Search Folder: to the location of your NGS data files. Make sure Hash Value is 10 or more (for speed) and Scores to Keep is at least 1,000 (to make sure you don’t miss any reads), and consider the scoring matrix: for sequencing data where you are expecting essentially perfect matches, the DNA identity with penalties matrix is by far the best choice. If you are looking for reads from related organisms, other .nmat files may be more appropriate to allow for mismatches.

While the search algorithm has been optimized for use on multi-CPU machines (and for Apple M1 processors), it can still take some time to run. A 100bp sequence scanned against 3 million 133nt paired Illumina HiSeq reads takes less than 10 minutes on a typical Mac laptop, but scanning a 5 Mbp E. coli genome against 100 million reads is likely to be an overnight proposition.

When complete, you can save the matching hits by selecting all of the lines in the Folder Description List tab (Edit->Select All or command-A) and then saving them to a pair of matching fasta/q files by choosing Database->Retrieve to File. You will see a PAIR of files with the hits – even if only one read in the original pair matched the query file, BOTH reads are saved. This is a very powerful approach to help resolve variants and inconsistencies in NGS data. You can see from the example below that we’ve reduced a pair of 118 MB files (compressed!) to just the 2x 374 KB that are of interest to us;

FilterNGS Reads 2
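For the curious, the general idea of a hash-based filter that keeps both reads of a pair can be sketched in Python as below. This is a much simplified stand-in, not MacVector’s algorithm: it keeps a pair when either read shares a single exact k-mer with the reference (there is no scoring matrix), it works on sequences already in memory rather than on fasta/q files, and the function names are made up. The default k of 10 simply mirrors the Hash Value suggested above.

```python
def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence (N is kept as N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def kmers(seq, k):
    """All distinct substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def filter_pairs(reference, pairs, k=10):
    """Keep every (read1, read2) pair where either read hits the reference.

    A "hit" here means sharing at least one exact k-mer with the reference
    or its reverse complement, which is an assumption for this sketch.
    """
    reference = reference.upper()
    index = kmers(reference, k) | kmers(reverse_complement(reference), k)
    kept = []
    for read1, read2 in pairs:
        if any(kmers(read.upper(), k) & index for read in (read1, read2)):
            kept.append((read1, read2))  # both mates are saved together
    return kept


if __name__ == "__main__":
    reference = "ACGTTGCAAGGCTTAACCGGTTAGC"
    pairs = [
        ("TTGCAAGGCTTAACC", "GGGGGGGGGGGGGGG"),
        ("CCCCCCCCCCCCCCC", reverse_complement(reference[5:20])),
        ("AAAAAAAAAAAAAAA", "TTTTTTTTTTTTTTT"),
    ]
    for pair in filter_pairs(reference, pairs):
        print(pair)
```

The first two pairs are kept even though only one mate in each matches, which is the same “BOTH reads are saved” behavior described above.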
