Working from home – Getting your sequence into MacVector

During the Covid-19 pandemic we want to ensure that you have access to the MacVector license that you would use in the lab if you are working from home. If you use MacVector, even an older version, and are having trouble activating or installing it at home, email MacVector Support and we will help. If you are in any way connected with COVID-19 research, then please have our thanks, and an annual license free of charge.

For the next few weekly tips we want to help familiarize you with the wide range of functionality in MacVector. You may have just been using MacVector to design PCR primers, but since you do not have a PCR machine on the kitchen table, here’s what else MacVector can do for you. This week we’ll cover the basic steps of bringing sequences into MacVector.

Starting Point Dialog

When you start MacVector for the first time you will be presented with the Starting Point Dialog. This allows you to create a new MacVector file, reopen a recently opened file or open an existing file.

You can also create a Gibson Assembly project, an Agarose Gel, an Assembly Project or Align sequences to a reference.

Double clicking on a file or dragging it to the Dock

Usually this will just work. However, it relies on the operating system deciding which application to use to open a file. If you have multiple sequence analysis applications installed, or your file has no file extension, you may find that the file opens in the wrong application. If you right click the file and choose OPEN WITH, you can choose which application to use.

Using the FILE > OPEN menu

This is the traditional option that always works. You do not have to worry about file extensions either. MacVector will look inside the file and try to “automagically” use the most appropriate file format importer.

Entrez Browser

MacVector has an Entrez browser that lets you search the online Entrez GenBank database using keywords and either save matching sequences directly to disk or open them directly in MacVector. You can access this via the Database | Online Keyword Search for Sequences (Entrez) menu item.

New From Clipboard

If you have copied any sequence data to the clipboard, you can open it directly using the New From Clipboard function. This also recognises GenBank format if you have copied a file directly. For example, many websites let you download a sequence by displaying it in GenBank format. You can easily copy such sequences and paste them directly into MacVector.

Import Features from a Genome Browser

If you have a blank sequence and no existing curated sequences, you can use Import Features from GFF/GTF and BED files to bring in features from one of the many genome browsers that let you export data as a BED, GTF or GFF file. This function annotates an unannotated or partially annotated sequence with the annotations (or features) contained in a BED/GFF/GTF/GFF3 file. Don’t forget the Auto Annotation tool either, which scans your sequences against your own annotated ones.
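
For readers curious about what such a file contains, here is a minimal Python sketch of reading BED lines into feature records. It is not MacVector's importer; the `Feature` type and the `parse_bed` function are hypothetical names used only for illustration. BED stores 0-based, half-open coordinates, so the sketch converts the start to the 1-based inclusive numbering used in GenBank-style feature tables.

```python
from typing import List, NamedTuple


class Feature(NamedTuple):
    """Hypothetical feature record (illustration only)."""
    seq_id: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    name: str
    strand: str


def parse_bed(text: str) -> List[Feature]:
    """Parse BED-formatted text into a list of Feature records."""
    features = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments and browser/track header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t") if "\t" in line else line.split()
        if len(fields) < 3:
            continue  # BED needs at least chrom, chromStart, chromEnd
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else ""
        strand = fields[5] if len(fields) > 5 else "."
        # BED start is 0-based; the end is exclusive, so it already
        # equals the 1-based inclusive end.
        features.append(Feature(chrom, start + 1, end, name, strand))
    return features
```

A BED line such as `chr1<TAB>99<TAB>200<TAB>geneA<TAB>0<TAB>+` therefore describes a feature covering bases 100 to 200 on the forward strand.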

Posted in Tips

Working From Home – common workflows

If you are working from home, here’s a series of blog posts to familiarize you with tools that you may never have used before. You may have just been using MacVector to design PCR primers, but since you do not have a PCR machine on the kitchen table, here’s what else MacVector can do for you. This week we’ll cover the common workflows that a molecular biologist may need.

Cloning constructs

Select two restriction sites and click DIGEST. Drag the digested fragment from the Cloning Clipboard to your vector and click LIGATE.

Gibson Assembly/Ligase Independent Cloning

FILE | NEW | GIBSON/LIGASE INDEPENDENT ASSEMBLY. Choose the type of project, add a cloning vector, then drag a gene from a sequence to clone.

Designing primers

Select a short region of sequence. Click ANALYZE | QUICKTEST PRIMER. Add a restriction site or mutation, slide the primer until the oligo is optimal.

Testing primers

Open your template sequence and go to ANALYZE | PRIMER DESIGN/TEST (PAIRS). For the left primer click USE THIS PRIMER, then type or paste in your forward primer. Repeat for the reverse primer. Click TEST.

Agarose Gel Simulation

Select FILE | NEW | AGAROSE GEL. Drag a restriction site from the Map tab of a sequence and drop it on the Agarose Gel window.

Comparing sequences

Select FILE | NEW | ALIGNMENT. Click ADD SEQS and then ALIGN.

List genetic differences between one organism and another related strain

Open two genomes and run ANALYZE | COMPARE GENOMES BY FEATURE.

Searching for your sequence

Select DATABASE | ONLINE KEYWORD SEARCH (ENTREZ), enter the accession number of your favorite gene and hit search. Double click the hit to open it directly in MacVector.

Annotating sequences

Right click (or CTRL+left click) on any faded ORF or Feature on your sequence. Select CREATE FEATURE.

To align a small sequencing project against a reference

Run FILE | NEW | ALIGN SEQUENCES TO A REFERENCE. Choose a reference sequence, then add your trace files from the sequencing facility and click ALIGN.

De novo assembly of sequencing reads

Run FILE | NEW | ASSEMBLY PROJECT. Add your sequencing reads and also any reference sequences.

Posted in Tips

Working From Home – Comparing sequences

For the next few weekly tips we want to help familiarize you with the wide range of functionality in MacVector. You may have just been using MacVector to design PCR primers, but since you do not have a PCR machine on the kitchen table, here’s what else MacVector can do for you. This week we’ll cover different types of alignments and what you would use them for:

Multiple Sequence Alignments

The tool most people turn to first for aligning sequences is the Multiple Sequence Alignment tool. This allows you to align multiple DNA or protein sequences using MUSCLE, ClustalW or T-Coffee. This functionality is best suited to protein alignments, or to nucleic acid sequences where you are interested in examining phylogenetic relationships. Remember that you can also align DNA sequences based on their protein translations.

  • FILE > NEW > PROTEIN ALIGNMENT
  • EDIT > ADD SEQUENCES FROM FILE..
  • Click on the Align toolbar button

Confirming a small sequencing project against a reference.

At some point all molecular biologists have to verify that a subcloning procedure has worked or that a new construct is correct. The Align to Reference tool is perfect for quickly verifying a set of sequencing reads against a reference.

  • Go to ANALYZE > ALIGN TO REFERENCE
  • Add your reads and click ALIGN

Aligning cDNA/mRNA sequences against a genomic template.

Align to Reference can also be used to align cDNA clones against a genome sequence. The steps are similar – use the genomic sequence as the reference, then add one or more cDNA clones to the alignment.

  • Go to ANALYZE > ALIGN TO REFERENCE
  • Add your reads
  • Click ALIGN with the algorithm set to cDNA ALIGNMENT

Using a Dot Plot to quickly compare two sequences.

Dot Plot is great for identifying regions of weak similarity between two sequences. Dot Plots are also the best way of identifying sequence rearrangements.

  • Open the two sequences to compare.
  • ANALYZE > CREATE DOT PLOT
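
To illustrate the idea behind a dot plot, here is a minimal Python sketch. It is not MacVector's algorithm; the function name `dot_plot` and the `window` and `min_matches` parameters are assumptions made for this example. A dot is recorded wherever a window of one sequence matches a window of the other at enough positions, so similar regions show up as diagonal runs of dots.

```python
from typing import List, Tuple


def dot_plot(seq_a: str, seq_b: str, window: int = 3,
             min_matches: int = 3) -> List[Tuple[int, int]]:
    """Return (i, j) window start positions where seq_a and seq_b are similar.

    A dot is placed at (i, j) when at least `min_matches` of the `window`
    residues starting at seq_a[i] and seq_b[j] are identical.
    """
    dots = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(
                a == b
                for a, b in zip(seq_a[i:i + window], seq_b[j:j + window])
            )
            if matches >= min_matches:
                dots.append((i, j))
    return dots
```

Lowering `min_matches` relative to `window` admits weaker similarity at the cost of more background dots. This brute-force version compares every pair of windows, so it is only practical for short sequences.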

Comparing two genomes.

Compare Genomes will compare two related annotated genomes (or smaller sequences) to identify and list identical, similar and weakly similar features, along with missing features.

  • Open the two sequences to compare.
  • ANALYZE > COMPARE GENOMES

Internet BLAST

Use this to identify and align a sequence to the databases at the NCBI using the BLAST algorithm. You can download any hits directly to your Desktop including all annotations.

  • Open your sequence
  • DATABASE > INTERNET BLAST SEARCH
  • Choose the database and click OK
  • To retrieve hits select the hits and click TO DESKTOP
  • To download and save hits, select hits and click TO DISK

Aligning a sequence against a folder of local sequences

Align to Folder allows you to scan a local folder full of sequences and align them using the FastA alignment algorithm. It’s kind of like a local BLAST, but more sensitive.

  • Open your Sequence.
  • Run DATABASE > ALIGN TO FOLDER
  • Choose a folder of sequences and click ALIGN

Sequence Assembly

This requires our optional Assembler add-on. Use this if you want to align ten or more DNA sequences with the idea of assembling them into a longer sequence with a consensus (de novo) or for aligning reads against a reference (resequencing).

  • Go to FILE > NEW > ASSEMBLY PROJECT
  • Add your sequencing reads and also any reference sequences.

Primer Database

Primer Database lets you automatically map your primer collection to any sequence you open. Primers can have tails and mismatches.

  • Go to ANALYZE > PRIMER DATABASE SEARCH

or

  • Go to MACVECTOR > PREFERENCES > SCAN DNA FOR.. PRIMER DATABASE
Posted in Tips

101 things you (maybe) didn’t know about MacVector: #52 – Data mining to identify and analyze pangolin CoV-2 analogs to the human COVID-19 virus

One of the most underrated features in MacVector is the Database | Align to Folder function. You can use this as a more sensitive version of a local BLAST search to find sequences in a “database” that match a query sequence. But in this case the “database” is simply a collection of your own sequences, stored in one or more folders on your computer, or on a locally accessible server. More importantly, in these days of huge NGS data sets, the folders can contain FASTA or FASTQ formatted files, and the files can even be compressed using the gzip algorithm. MacVector understands paired-end reads and can retrieve both reads of a pair even if only one of them matches the query sequence.
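
As a rough illustration of what reading such files involves, here is a minimal Python sketch that iterates over the records of a FASTQ file, plain or gzip-compressed. It is not how MacVector reads files; `parse_fastq` and `read_fastq` are hypothetical names, and the sketch assumes the common four-lines-per-record FASTQ layout.

```python
import gzip
from typing import Iterable, Iterator, Tuple


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        sequence = next(it, "").rstrip("\n")
        next(it, "")  # the '+' separator line
        quality = next(it, "").rstrip("\n")
        yield header[1:], sequence, quality  # drop the leading '@'


def read_fastq(path: str) -> Iterator[Tuple[str, str, str]]:
    """Read a FASTQ file, transparently handling a .gz extension."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        yield from parse_fastq(handle)
```

Because the records are yielded one at a time, a large compressed file can be scanned without loading it all into memory.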

As an example of the power of this approach, we used MacVector to retrieve reads matching the human SARS-CoV-2 genome from a collection of RNA-Seq reads from pangolins, assembled those reads into a viral genome and compared the sequence and encoded proteins to published bat and human isolates of SARS-CoV-2. You can read more about how that was accomplished and the results of the analysis in a published Technical Note.

You can use this approach to scan RNA-Seq reads for specific genes, or to identify reads in total genome sequencing experiments that extend sequences of interest, or to retrieve plasmids or bacteriophages. We’ve even used it to retrieve RNA-Seq reads using a protein sequence from a distantly related organism as a query. Here’s how to set up a typical search:

First make sure you have chosen a suitable Search Folder – you can have a hierarchy of folders and ask MacVector to search recursively through all the enclosed folders. Also be sure to check the paired-end reads checkbox if any of your files represent paired end reads.

Increasing the Hash Value speeds up searches dramatically, at the expense of more memory usage. The current maximum is 14, which means that you need at least a 14 residue perfect match before a potential match will even be considered. If you expect a lot of hits, increase Scores to Keep to a large value.
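
The role of the hash value can be pictured with a small Python sketch of k-mer seeding, the general idea behind hash-based searches of this kind. This is an illustration under assumptions, not MacVector's implementation; `build_kmer_index` and `has_seed` are hypothetical names. A read is only passed on for full alignment if it shares at least one exact word of length `k` with the query.

```python
from collections import defaultdict
from typing import Dict, List


def build_kmer_index(query: str, k: int) -> Dict[str, List[int]]:
    """Map every k-residue word in the query to its start positions."""
    index: Dict[str, List[int]] = defaultdict(list)
    for pos in range(len(query) - k + 1):
        index[query[pos:pos + k]].append(pos)
    return index


def has_seed(read: str, index: Dict[str, List[int]], k: int) -> bool:
    """True if the read contains at least one exact k-residue match."""
    return any(read[i:i + k] in index for i in range(len(read) - k + 1))
```

With `k = 14`, a read that never matches the query perfectly over 14 consecutive residues is rejected by `has_seed` before any alignment is attempted.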

Finally, the Scoring Matrix can be critical. If you are looking for matches using a query sequence from a related organism, you should likely use DNA database matrix.nmat so that you can retrieve weak matches. However, if you are trying to extend a genomic sequence where you expect essentially perfect matches, though perhaps with just short overlaps at the ends of reads, then DNA identity with penalties matrix.nmat is tuned for those searches.



This is an article in a long running series of tips to help you get the most out of MacVector. If you want to get notified every time a new tip gets published, follow us @MacVector on twitter (or check the feed for the hashtag #101MacVectorTips) or like us on Facebook.

Posted in 101 Tips, Tips

Working from home with MacVector during the COVID-19 pandemic

A lot of MacVector users are now at home getting used to a new way of working. The MacVector team are distributed throughout the US and Europe and we are used to remote working. However, for those new to working from home, it’s a LOT different to working back in the lab with your colleagues!

(See how we’ve been making the most of the lockdown to use existing sequencing archives to assemble a new Pangolin SARS-CoV-2 genome.)

We want to help make it easier to use MacVector:

  • If you currently use a network license and are struggling to access this from home, then email MacVector Support. We can help you in various ways.
  • If you normally use a Mac desktop in the lab, then we can give you a temporary license to activate on any home Mac you might have.
  • Even if you use an old license of MacVector, we will give you a temporary license for MacVector 17.5.
  • If you use a Standard license, then you can already activate that on any home Mac. However, if you have forgotten your license activation details, email MacVector Support for a reminder.
  • Finally, if you are in any way connected with COVID-19 research, then please have our thanks, and an annual license free of charge.

    Our thoughts are with all those affected by COVID–19, be it directly or indirectly. Our particular thoughts are with those on the front line, from health care workers to those researchers working so hard on behalf of all humanity to find a vaccine and a treatment.

    Posted in General

    Primer validation with MacVector: Primer3, Covid19 and primer design

    The CDC recently published diagnostic real-time primers for identification of SARS-CoV–2 in any person suspected of having COVID–19.

    Unfortunately, as pointed out on the Biome Informatics blog, these primers have issues that should have been easily detected had the primers been tested using a good quality primer testing tool (the linked blog post uses Primer3). What’s more, if a good quality primer design application, such as MacVector’s Primer3 tool, had been used from the beginning, these issues would never have occurred.

    Some primer design software tools can be difficult to use. However, MacVector provides an easy to use interface to Primer3. Designing a pair of primers to amplify a target can be as simple as just three mouse clicks (click on the target gene, click ANALYZE | PRIMER3 and then click OK).

    MacVector also has QuickTest Primer for visual design and testing of primers. QuickTest Primer simplifies primer design by showing your primer and its statistics in real time. Does your primer have a hairpin? Nudge it along your template until the hairpin goes. Want to add a restriction site? Then add one and again nudge your primer to optimise the oligo.
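
    To make the idea of a hairpin concrete, here is a minimal Python sketch that looks for a potential stem in a primer: a short stretch whose reverse complement occurs further along the same oligo, separated by a loop. It is a sequence-only check written for illustration, not the scoring QuickTest Primer or Primer3 performs (real tools weigh thermodynamic stability); `find_hairpin`, `stem` and `min_loop` are hypothetical names.

```python
from typing import Optional, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_hairpin(primer: str, stem: int = 4,
                 min_loop: int = 3) -> Optional[Tuple[int, int]]:
    """Return start positions of the two arms of a potential hairpin stem.

    Looks for a `stem`-residue segment whose reverse complement occurs
    downstream, leaving at least `min_loop` residues between the two arms.
    Returns None when no such pair is found.
    """
    primer = primer.upper()
    for i in range(len(primer) - 2 * stem - min_loop + 1):
        left_arm = primer[i:i + stem]
        j = primer.find(reverse_complement(left_arm), i + stem + min_loop)
        if j != -1:
            return i, j
    return None
```

    Sliding the primer along the template changes its sequence, which is why nudging it a few bases can remove a stem like this.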

    MacVector also has built-in tools to help find suitable targets to amplify, using BLAST or by scanning local sequences.

    Using MacVector’s QuickTest Primer tool you can see the obvious hairpin in the reverse primer of the CDC’s first set of primers.

    Here’s a demonstration workflow on how to use MacVector’s Primer Database and QuickTest Primer tools to look at these primers:

    Download the primers

  • Download the CDC primer list (in a MacVector Primer Database file).
  • Open MACVECTOR | PREFERENCES | SCAN DNA | PRIMERS.
  • Click SET DATABASE FILE and choose the downloaded file.
    Download one of the many sequenced SARS-CoV-2 genomes

  • Open MACVECTOR | ENTREZ and search for “organism=SARS-CoV-2” and “all fields = genome”.
  • There are a lot of submitted genomes from this organism (94 as of March 16, 2020).

  • Double click on a hit to open the genome directly in MacVector.
  • Now switch to the MAP tab.
  • You will see the primers annotated on the sequence (faded as they are dynamically shown).

  • Open ANALYZE | QUICKTEST PRIMER
  • Click the Insert from Primer Database button (as seen in the screenshot below).

    [Screenshot: QT Primer]

    You will see the primer displayed in the QT Primer interface showing any flaws. Most of the primers have hairpins.

    Designing Primers

    Here’s an example workflow of how you would design primers to detect the N gene.

    Please note that for this demonstration workflow we have designed primers against the N gene, which had been selected by the CDC as a target. For real world purposes you would need to select an appropriate target. One way would be to align all sequenced human CoV-2 genomes (or those of whatever organism you are looking for), then look for highly conserved regions, particularly in regions of the virus thought to be critical for pathogenicity. Those conserved regions can then be used as targets to design primers. It would also be useful to select regions that distinguish SARS-CoV-2 from the original SARS-CoV-1.
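
    The "look for highly conserved regions" step can be sketched in a few lines of Python. This is an illustration only, assuming the genomes have already been aligned into equal-length rows with `-` for gaps; `conserved_regions` and `min_length` are hypothetical names and the sketch only reports columns that are identical in every sequence.

```python
from typing import List, Tuple


def conserved_regions(aligned: List[str],
                      min_length: int = 20) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges that are identical in all rows.

    `aligned` holds the rows of a multiple alignment, all the same length.
    Ranges are 0-based and half-open; gapped columns break a region.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(row) != length for row in aligned):
        raise ValueError("all aligned sequences must have the same length")

    regions = []
    start = None
    for col in range(length):
        column = {row[col] for row in aligned}
        conserved = len(column) == 1 and "-" not in column
        if conserved and start is None:
            start = col
        elif not conserved and start is not None:
            if col - start >= min_length:
                regions.append((start, col))
            start = None
    # Close a region that runs to the end of the alignment.
    if start is not None and length - start >= min_length:
        regions.append((start, length))
    return regions
```

    Regions returned by a scan like this would be candidate targets, to be narrowed down further by the biological criteria described above.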

  • Select the N Gene in the SARS-CoV-2 genome (the one you downloaded above).
  • Run ANALYZE | PRIMER DESIGN/TEST (Primer3)
  • Change the setting to REGION TO SCAN and enter a product size of 400 to 500
  • Check Hybridization Probe Sequence.
  • Ensure all primers are set to Find Primer and not USE THIS PRIMER.
  • For this workflow example you do not need to change any advanced options.
  • Click OK

    In the Results window you will see a MAP with the generated primer pairs and products.

    The initial left primer still has a hairpin. You could tweak the Primer3 settings, but it is quicker to tweak the primer itself using QuickTest Primer. If you slide the primer three bases to the left then the hairpin will go.

  • Select the LEFT primer in the Results Spreadsheet tab and choose ADD TO PRIMER DATABASE
  • Choose a suitable name.
  • Repeat for the RIGHT primer and the PROBE.
  • Open QuickTest Primer
  • Click INSERT FROM PRIMER DATABASE and test each primer.
  • To slide a primer left or right click on the cursor buttons either side of the primer.
  • Don’t forget to save any modified primer. You may also want to test the primers again using Primer3. If you save the primers to your Primer Database then this is easy to do.

    https://tomeraltman.net/2020/03/03/technical-problems-COVID-primers.html

    https://www.cdc.gov/coronavirus/2019-ncov/lab/rt-pcr-panel-primer-probes.html

    Posted in Techniques, Tips

    MacVector Training workshop at The Crick: Tuesday 17th March 2020

    Unfortunately, due to the Covid19 outbreak, this workshop is now cancelled. Keep safe and healthy everybody.

    The workshop, previously cancelled last year, is now rescheduled:

    Room: HR Training Room 01.2162. Floor: 1 
    Date: 17th March 2020 – from 9:30 – 11:30

    Chris Lindley of MacVector, Inc. will be giving a training workshop for both novice and advanced users of MacVector at The Crick, reviewing both basic and advanced functions, in particular new tools introduced over the last few versions.

    The format is very informal and participants are very much encouraged to direct the workshop towards areas of the most interest.

    Laptops will be provided for users to work through examples and tutorials as they are demonstrated. Workbooks will also be provided to allow attendees to work through during the workshop and afterwards.

    The intention is that all attendees will learn at least one new and useful tool or tip. The workshop is two hours, but Chris will be available in the room for further discussion until 13:00.

    Please register for the workshop by emailing Chris (drop-ins on the day will be very welcome, but will not be guaranteed access to a laptop or a workbook).

    See what MacVector can do for your lab.

    Posted in Uncategorized

    Make more of your alignments with MacVector 17.5

    Our latest release MacVector 17.5 gives you new tools to make the most of your alignments.

    It displays shared domains in protein alignments to visualize the relationships between aligned proteins. It introduces Flye for de novo assembly of PacBio and Oxford Nanopore long reads and a slew of enhancements to the Contig and Align to Reference Editors.

    As ever, there are also many minor enhancements, bug fixes and changes to better support the latest releases of macOS.

    Outlining Shared Domains in Aligned Sequences

    Multiple sequence alignments now retain feature information and can use this to outline shared domains in the Picture output tab. You can set the colors of features in the individual sequence documents in the usual way and these are used for the outlines.

    There is a feature display mode in the Editor tab where you can see the extent and color of the features. When you switch to the Picture tab, you will see colored outlines around the shared domains.

    de novo Assembly of PacBio and Oxford Nanopore reads with Flye

    Flye is an assembler algorithm tuned to assemble poor quality long reads such as those produced by PacBio and Oxford Nanopore sequencers. Because these reads tend to be very error prone, MacVector 17.5 also includes an optional polishing step using Racon. With typical bacterial genome assemblies it is fairly common to be able to assemble reads into a single full-length genome contig.

    Contig and Align to Reference Editor Enhancements

    There have been a number of enhancements to these editors, primarily to aid in visualizing edits and quality values and to “clean up” the visual appearance of alignments.

    Residue Background Colored by Quality

    There have been several changes to provide improved support for quality values of de novo contigs and reference assemblies.
    A Shading toolbar button lets you turn on coloring based on quality. Edited residues are always given a phred quality value of 99 and are shown with a blue background.
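
    For context on what these quality values mean, a phred score Q is conventionally defined from the estimated base-calling error probability P as Q = -10 × log10(P). The short Python sketch below converts between the two; the function names are hypothetical and the code is independent of MacVector.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Convert a phred quality score to an estimated error probability."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Convert an error probability (0 < p <= 1) to a phred quality score."""
    return -10 * math.log10(p)
```

    On this scale a score of 20 corresponds to an estimated 1 in 100 chance that the call is wrong, while the value of 99 given to edited residues corresponds to an error probability of about 1.3e-10, in effect marking them as trusted.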

    Base Calling with Phred

    You can now directly run phred on Sanger sequencing trace files in the Align to Reference Editor by clicking on the Basecall toolbar item with the appropriate sequences selected.

    Editing Enhancements

    There are some new context-sensitive menu items in the Align to Reference Editor tab

    Delete Clipped Residues – deletes any greyed-out (“clipped” or “trimmed”) residues. While these are ignored by the consensus calculation, some users prefer to delete them for a cleaner looking alignment.

    Close Gaps by Deleting Residues – you’ll often see gaps in the consensus where one or more reads has an additional erroneous inserted residue. This menu item removes the extra residues from the read, cleaning up the visual appearance of the alignment.

    Nudge reads – Select the name of the sequence you want to nudge and use the left/right arrow keys to move it around. If you have problematic alignments where you need to physically insert residues or gaps, hold down the

    Miscellaneous Enhancements

    There have been a large number of minor enhancements. Some, such as reworking code behind the scenes to replace deprecated Apple functions and refactoring code for better stability and performance, help ensure that MacVector will continue to work on upcoming releases of macOS and take advantage of improved hardware. There have also been improvements to Dark Mode support in many areas and much better handling of the labels in crowded Map views.

    How to upgrade to MacVector 17.5

    If you have active maintenance and are running MacVector 15.5.4 or later then you will be notified about the new release. To install this version, you must have a maintenance contract that was active on 1st February, 2020. You must also be running MacVector 15.5.4 and OS X 10.9 Mavericks or later.

    If you have an older version of MacVector then download the trial and request an upgrade quote.

    Even if you have downloaded the trial in the past then downloading a new trial will give you a fresh 21 days to evaluate MacVector.

    When a trial license expires it becomes MacVector Free. So if you decide against upgrading then you can just delete the trial license and easily go back to your current version. It’s risk free as MacVector files are backwards compatible.

    Posted in Releases

    Importing BAM files into an Assembly Project

    You can import BAM files, containing reads mapped against a reference sequence, into a MacVector Assembly Project. As well as the BAM file(s) you will also need the original reference sequence the reads were mapped against. FASTA is fine, but an annotated reference is better for visualisation.

    The tool needed is called ADD CONTIG. This is one of the toolbar buttons in an Assembly Project:

    First create a new assembly project.

    • FILE > NEW > ASSEMBLY PROJECT

    • Click ADD REF to add the reference sequence.

    • Use ADD CONTIG to import your BAM/SAM file.

    Then you need to associate the BAM file(s) with the reference:

    • Select the reference and an imported contig (BAM file).

    • Right click and select UNITE REFERENCE WITH CONSENSUS SEQUENCE

    You can optionally also generate a report on any variants (either at the previous step or a later stage).

    • Right Click and choose GENERATE VCF

    If you import multiple BAM files against the same reference sequence you can also graphically compare these datasets with the Coverage Tab (third tab along in the Assembly Project window).

    Incidentally if you need to access the BAM files from within MacVector’s Assembly Projects then you can right click on an Assembly Project and view the contents.

    Simple DNA sequence assembly on a Mac with MacVector with Assembler.

    MacVector has a software plugin called Assembler that integrates directly into the DNA sequence analysis toolkit and provides DNA sequence assembly functionality. Dealing with sequencing reads has never been easier.

    MacVector includes no fewer than five different assemblers just a few mouse clicks away from your sequencing reads. Phrap assembles Sanger sequencing reads or existing contigs, while there are three separate NGS de novo assemblers: Velvet for short read datasets, Flye for Nanopore and PacBio long reads and SPAdes for mixed assemblies. For reference assembly, Bowtie2 can map millions of sequencing reads against genomic reference sequences and is ideal for RNA-Seq gene expression analysis data too.

    Assembler is tightly integrated into MacVector. It’s easy to bring sequencing reads into MacVector, and it’s just as easy to directly design primers for a contig, run BLAST searches on a contig, and much more, right from your desktop!

    Posted in Tips

    Calculating the optimal PCR annealing temperature

    MacVector has several tools to help with primer design and testing. The Analyze | Primer Design/Test (Pairs) function uses the popular Primer3 algorithm to find suitable pairs of primers to amplify specified segments of DNA. You can also enter pairs of pre-designed primers and test their suitability for use in PCR. In both cases, the Tm of each primer is reported, along with the optimal annealing temperature (Ta).

    The optimal annealing temperature (degrees C) is calculated as follows (from W. Rychlik, W. J. Spencer and R. E. Rhoads, Nucl. Acids Res. 18:6409–6412 (1990)):

    (Lowest Primer Tm x 0.3) + (Product Tm x 0.7) - 14.9

    This means that you can get an optimal annealing temperature for a PCR experiment that is significantly different from the optimal annealing temperature for an individual primer (e.g. in a sequencing experiment) because of the large influence of the product in the calculation.
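
    As a worked illustration of the formula above, here is a small Python sketch. The function name and the example temperatures are made up for this example; the Tm values themselves would come from the primer analysis.

```python
from typing import Iterable


def optimal_annealing_temp(primer_tms: Iterable[float],
                           product_tm: float) -> float:
    """Optimal annealing temperature (degrees C) from the formula above.

    Ta = 0.3 x (lowest primer Tm) + 0.7 x (product Tm) - 14.9
    """
    return 0.3 * min(primer_tms) + 0.7 * product_tm - 14.9


# Hypothetical example: primers with Tm 58.0 and 60.0, product Tm 80.0.
ta = optimal_annealing_temp([58.0, 60.0], 80.0)
print(f"Optimal annealing temperature: {ta:.1f} C")
```

    With these made-up inputs the result is 58.5 degrees C (17.4 + 56.0 - 14.9). The product Tm carries more than twice the weight of the primer Tm, which is why the product dominates the calculation.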

    Posted in Tips