Virtual Gene Cloning from NGS RNA-Seq Data

The NCBI Sequence Read Archive (SRA) database is a huge resource of Next Generation Sequencing experimental data. Many groups and laboratories deposit data here that they have generated for their own specific projects that can be datamined for other unrelated projects with a minimum of effort.

MacVector contains a number of powerful tools that can be used to extract and analyze specific sequences from large quantities of NGS data. We recently used these tools to clone the sequence of 19 distinct C2H2 Zinc Finger proteins from NGS RNA-Seq data prepared from root tissue of the Aloe vera plant.

The basic steps were:

  • Use Align to Folder to find and extract all pairs of reads that could potentially encode the conserved QALGGH domain from C2H2 Zn Finger proteins
  • Assemble the reads using phrap, velvet and/or SPAdes to generate multiple contigs
  • Analyze contigs to identify and translate protein-coding ORFs
  • Extend contigs when required using additional rounds of Align to Folder, contig assembly and Align to Reference
  • Annotate proteins using the built-in InterProScan function
  • Align proteins using ClustalW and visualize the shared QALGGH domains
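
The "identify and translate protein-coding ORFs" step can be illustrated outside MacVector with a short Python sketch. This is not MacVector's implementation: the function names (`find_orfs`, `translate`) are hypothetical, the standard genetic code is assumed, and only complete ORFs (ATG through to a stop codon) are reported, so an ORF that runs off the end of a contig would be missed.

```python
import re
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Reverse-complement a DNA string (non-ACGT characters are left as-is)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate from the first base; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def find_orfs(seq, min_aa=5):
    """Return proteins (without the stop) from complete ORFs in all six frames."""
    seq = seq.upper()
    proteins = []
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            # An ORF here is a Met followed by residues up to the first stop.
            for match in re.finditer(r"M[^*]*\*", translate(strand[frame:])):
                protein = match.group()[:-1]
                if len(protein) >= min_aa:
                    proteins.append(protein)
    return proteins


contig = "CC" + "ATGCAAGCTCTTGGTGGACATTAA" + "GG"
hits = [p for p in find_orfs(contig) if "QALGGH" in p]
```

Filtering the translated ORFs for the QALGGH motif, as in the last line, mirrors the way the tutorial focuses on contigs that encode the conserved domain.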

The full tutorial is available as a PDF and the required data files are also available to download direct from the SRA.

Posted in Tutorials

Working From Home: An overview of primer design workflows in MacVector.

Working from home? We want to help familiarize you with the wide range of functionality in MacVector that you may never have used before. Here’s an overview of workflows for designing, testing, documenting and storing primers. You may not have a PCR machine on the kitchen table, but why not take the time to store all your lab’s regularly used primers in MacVector’s Primer Database?

Amplifying a gene

You can design a set of primers to amplify a gene in as little as three mouse clicks.

  1. Open your sequence.
  2. Open the MAP view, and click on a feature.
  3. Go to ANALYZE | PRIMER DESIGN/TEST(PAIRS).
  4. Click OK.

Design a single targeted primer with a tail.

The QuickTest Primer tool gives great flexibility for designing primers with tails or mismatches.

  1. Select a 20 bp region around the location you want your primer to be.
  2. Click ANALYZE > QUICKTEST PRIMER.
  3. Slide the primer along your template until the oligo is optimal.
  4. Add a restriction site or mutation in the optional tail.
  5. Hover over a green (or red) putative restriction site to view the change to your sequence.
  6. Double click on that putative site to add to your primer. Watch how the coding is changed.
  7. Click Add to Database… to save your primer.

Test a pair of primers.

You can map a pair of primers against your template to see how well they would amplify your target.

  1. Open your template sequence
  2. Go to ANALYZE > PRIMER DESIGN/TEST(PAIRS).
  3. For the left primer click USE THIS Primer
  4. Type or paste in your forward primer
  5. Repeat for the reverse primer
  6. Click TEST
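
To make the idea of mapping a primer pair against a template concrete, here is a minimal Python sketch. It assumes exact, mismatch-free binding and a linear template, which is much simpler than what Primer Design/Test actually evaluates (it ignores Tm, tails, mismatches and secondary structure). The function name `predict_amplicon` is hypothetical.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def predict_amplicon(template, forward, reverse):
    """Return (start, end, length) of the expected product, or None.

    Both primers are given 5'->3'. The forward primer must match the top
    strand exactly; the reverse primer must match the bottom strand, so we
    look for its reverse complement on the top strand downstream of the
    forward primer.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse.upper())
    site = template.find(reverse_site, start + 1)
    if site == -1:
        return None
    end = site + len(reverse_site)
    return start, end, end - start


template = "TTTTGCATGCAAGGCCTTAACCGGTTAGCTAGCTTTT"
product = predict_amplicon(template, "GCATGCAAGG", "GCTAGCTAAC")
```

Here `product` holds the zero-based start, the end and the length of the predicted amplicon, or `None` if either primer fails to bind.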

To save a primer from QT Primer to the Primer Database.

  1. Open ANALYZE > QUICKTEST PRIMER (INDIVIDUAL)
  2. Design your primer
  3. Click ADD TO…
  4. Give the primer a name and add a comment. Click OK

To save a primer from Primer Design to the Primer Database.

  1. Open ANALYZE > PRIMER DESIGN/TEST (PAIRS)
  2. Design your primer pair
  3. In the spreadsheet right click on a primer
  4. Choose ADD PRIMER TO DATABASE
  5. Give the primer a name and add a comment. Click OK

To use the Primer Database Search:

  1. Open your sequence
  2. Select ANALYZE > PRIMER DATABASE SEARCH
  3. Choose parameters and click OK

Design a primer to match an existing primer from the primer database

The Primer database allows you to store your own collection of primers. You can design new primers to match regularly used ones.

  1. Open your template sequence
  2. Switch to the Map tab and select the region you want to amplify.
  3. Go to ANALYZE | PRIMER DESIGN/TEST(PAIRS).
  4. For the left primer click USE THIS Primer
  5. Use the drop down menu to enter the existing primer from the primer database
  6. Click OK
Posted in Techniques, Tips, Tutorials

Working from home: An overview of assembling sequence data with MacVector and Assembler

Working from home? Here’s a series of blog posts on the wide range of functionality in MacVector that you may never have used before. This is an overview of the many different sequence assembly tools within MacVector. The MacVector team used these tools to mine existing sequencing archives to assemble a new Pangolin SARS-CoV-2 genome.

To assemble various types of sequencing reads, follow these steps.

  • Choose File | New | Assembly Project to create a new empty project file.

Then follow one of the following:

To create a de novo assembly from Sanger reads

  • Click on the Add Reads toolbar button, then select the sequence files you wish to assemble and click on the Open button. Read files can also be dragged and dropped onto the open Assembly Project window.
  • Click Phred to basecall sequences in the project. Note that if no sequences are selected, phred will be run on ALL of the files in the project.
  • Click Phrap to assemble the reads.

To create a de novo assembly from NGS datasets

  • Click on the Add Reads toolbar button, then select the sequence files you wish to assemble and click on the Open button. Files can also be dragged and dropped onto the open Assembly Project window. Paired read files are automatically detected.
  • Choose either SPAdes or Velvet to assemble the reads.

To create a de novo assembly from PacBio or Nanopore datasets

  • Click on the Add Reads button to add PacBio or Oxford Nanopore reads in fasta, fastq or gzipped (.gz) format. Files can also be dragged and dropped onto the open Assembly Project window.
  • Double-click on the Status column of the imported read file(s) and set the data type to “PacBio” or “Oxford Nanopore” as appropriate.
  • Choose Flye to assemble all of the sequences of the project.

To create a reference assembly

  • Click the Add Reads button, then select the sequence files you wish to assemble and click on the Open button.
  • Click Add Ref, select the sequence file(s) you wish to align the reads against and click on the Open button.
  • Click Bowtie to map all read files against all of the reference sequences in the project.

Not sure if you have Assembler? Choose MacVector | About MacVector. If the screen that appears says MacVector with Assembler, Pro Edition then you have it. If not, you can sign up for a fully functional 21-day trial version.

Posted in Uncategorized

Working From Home: accessing video tutorials of common workflows inside MacVector

During the Covid-19 pandemic we want to ensure that you have access to the MacVector license that you would use in the lab, if you are working from home. If you use MacVector, even an older version, and are having trouble activating it (or installing it) at home, email MacVector Support and we will help. If you are in any way connected with COVID-19 research, then please have our thanks, and have an annual license free of charge. Here’s an overview of the HOW DO I menu that lists common workflows directly inside MacVector.


MacVector has a wide array of different tools for working with protein and DNA sequences. Since MacVector has always been designed with the Mac’s simplicity in mind, getting started with simple tasks is quick. However, making the most of the many functions and getting familiar with MacVector’s full range of tools does require more help. Nobody has time to read manuals nowadays, so we set out to help new users get up to speed more quickly and to help more advanced users discover tools they may not be familiar with.

MacVector 17 introduced a new menu which lists common workflows that a molecular biologist may need. Each topic has a short video and/or a short step-by-step guide. What’s more, every tool’s dialog now has a link to a video tutorial.

– If you need to know how to do something then try the HOW DO I menu.

– If you need to know more about a tool then click the help button.


(See how we’ve been making the most of the lockdown to use existing sequencing archives to assemble a new Pangolin SARS-CoV-2 genome.)

Posted in Tips

Working from home – Getting your sequence into MacVector

We want to help familiarize you with the wide range of functionality in MacVector. You may have just been using MacVector to design PCR primers, but since you do not have a PCR machine on the kitchen table, then here’s what else MacVector can do for you. This week we’ll cover the basic steps of bringing sequences into MacVector.

Starting Point Dialog

When you start MacVector for the first time you will be presented with the Starting Point Dialog. This allows you to create a new MacVector file, to open a recently opened file or to open an existing file.

You can also create a Gibson Assembly project, an Agarose Gel, an Assembly Project or Align sequences to a reference.

Double clicking on a file or dragging it to the Dock

Usually this will just work. However, this is always reliant upon the operating system deciding which application to use to open a file. If you have multiple sequence analysis applications installed, or your file has no file extension you may find that the file opens in the wrong application. If you right click and choose OPEN WITH you can choose which application to use.

Using the FILE > OPEN menu

This is the traditional, always works, option. You do not have to worry about file extensions either. MacVector will look inside the file and try to “automagically” use the most appropriate file format importer.

Entrez Browser

MacVector has an Entrez browser that lets you search the online Entrez GenBank database using keywords and retrieve matching sequences either directly to disk or to open directly in MacVector. You can access this via the Database | Online Keyword Search for Sequences (Entrez) menu item.

New From Clipboard

If you have copied any sequence data to the clipboard then you can directly open this using the New From Clipboard function. This also recognises GenBank formats if you have copied a file directly. For example, many websites let you download a sequence by displaying it in GenBank format. You can easily copy such sequences and paste them directly into MacVector.

Import Features from a Genome Browser.

If you have a blank sequence and no existing curated sequences you can use Import Features from GFF/GTF and BED files to bring features in from one of the many Genome Browsers that allow you to export data in a BED, GTF or GFF file. This function allows you to annotate an unannotated or partially annotated sequence with annotations (or features) contained within a BED/GFF/GTF/GFF3 file. Don’t forget the Auto Annotation tool either, which scans your sequences against your own annotated ones.

Posted in Tips

Working From Home – common workflows

If you are working from home then here’s a series of blog posts to familiarize you with tools that you may never have used before. You may have just been using MacVector to design PCR primers, but since you do not have a PCR machine on the kitchen table, then here’s what else MacVector can do for you. This week we’ll cover the many common workflows that the molecular biologist may need.

Cloning constructs

Select two restriction sites and click DIGEST. Drag the digested fragment from the Cloning Clipboard to your vector and click LIGATE.

Gibson Assembly/Ligase Independent Cloning

FILE | NEW | GIBSON/LIGASE INDEPENDENT ASSEMBLY. Choose the type of project, add a cloning vector, then drag a gene from a sequence to clone.

Designing primers

Select a short region of sequence. Click ANALYZE | QUICKTEST PRIMER. Add a restriction site or mutation, slide the primer until the oligo is optimal.

Testing primers

Open your template sequence and go to ANALYZE | PRIMER DESIGN/TEST(PAIRS). For the left primer click USE THIS Primer. Type or paste in your forward primer, then repeat for the reverse primer. Click TEST.

Agarose Gel Simulation

Select FILE | NEW | AGAROSE GEL, then drag a restriction site from the Map tab of a sequence and drop it on the Agarose Gel window.
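
The fragment sizes that a gel simulation displays come from a simple calculation, sketched below in Python. This is only an illustration under stated assumptions: a linear molecule, a single enzyme, and exact recognition-site matching. The function name `digest_linear` is hypothetical, and the example uses EcoRI, which cuts G^AATTC (one base into the site).

```python
def digest_linear(seq, site, cut_offset):
    """Return fragment lengths after cutting a linear sequence at every site.

    cut_offset is the number of bases from the start of the recognition
    site to the cut position on the top strand.
    """
    seq = seq.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    # Fragment boundaries are the two ends of the molecule plus each cut.
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


sequence = "AAAA" + "GAATTC" + "CCCCC" + "GAATTC" + "TT"
fragments = digest_linear(sequence, "GAATTC", 1)
```

A circular plasmid would need the first and last fragments joined, which this sketch does not attempt.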

Comparing sequences

Select FILE | NEW | ALIGNMENT, click ADD SEQS and then ALIGN.

List genetic differences between one organism and another related strain

Open two genomes and run ANALYZE | COMPARE GENOMES BY FEATURE..

Searching for your sequence.

Select DATABASE | ONLINE KEYWORD SEARCH (ENTREZ), enter the accession number of your favorite gene and hit search. Double click the hit to open it directly in MacVector.

Annotating sequences

Right click (or CTRL+left Click) on any faded ORF or Feature on your sequence. Select CREATE FEATURE.

To align a small sequencing project against a reference.

Run FILE | NEW | ALIGN SEQUENCES TO A REFERENCE... Choose a reference sequence, then add your trace files from the sequencing facility and click ALIGN.

De novo assembly of sequencing reads

Run FILE | NEW | ASSEMBLY PROJECT, then add your sequencing reads and also any reference sequences.

Posted in Tips

Working From Home – Comparing sequences

For the next few weekly tips we want to help familiarize you with the wide range of functionality in MacVector. You may have just been using MacVector to design PCR primers, but since you do not have a PCR machine on the kitchen table, then here’s what else MacVector can do for you. This week we’ll cover different types of alignments and what you would use them for:

Multiple Sequence Alignments

The tool most people first turn to for aligning sequences is the Multiple Sequence Alignment tool. This allows you to align multiple DNA or protein sequences using either Muscle, ClustalW or T-Coffee. This functionality is most suited for protein alignments, or for nucleic acid sequences where you are interested in examining phylogenetic relationships. Remember that you can also align DNA sequences based on their protein translations.

  • FILE > NEW > PROTEIN ALIGNMENT
  • EDIT > ADD SEQUENCES FROM FILE..
  • Click on the Align toolbar button

Confirming a small sequencing project against a reference.

At some point all molecular biologists have to verify that a subcloning procedure has worked or a new construct is correct. The Align to Reference tool is perfect for quickly verifying a set of sequencing reads against a reference.

  • Go to ANALYZE > ALIGN TO REFERENCE
  • Add your reads and click ALIGN

Aligning cDNA/mRNA sequences against a genomic template.

Align to Reference can also be used to align cDNA clones against a genome sequence. The steps are similar – use the genomic sequence as the reference, then add one or more cDNA clones to the alignment.

  • Go to ANALYZE > ALIGN TO REFERENCE
  • Add your reads
  • Click ALIGN with the algorithm set to cDNA ALIGNMENT

Using a Dot Plot to quickly compare two sequences.

Dot Plot is great for identifying weak regions of similarity between two sequences. Dot Plots are also the best way of identifying sequence rearrangements.

  • Open the two sequences to compare.
  • ANALYZE > CREATE DOT PLOT
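
For readers who have not met dot plots before, the underlying idea can be sketched in a few lines of Python. This is a deliberately simplified illustration that marks a dot only where `window`-length words match exactly. It is not how MacVector computes its plots, and the function names are hypothetical.

```python
def dot_plot(seq_a, seq_b, window=3):
    """Return the set of (i, j) positions where words of length `window`
    starting at seq_a[i] and seq_b[j] are identical."""
    words = {}
    for j in range(len(seq_b) - window + 1):
        words.setdefault(seq_b[j:j + window], []).append(j)
    return {(i, j)
            for i in range(len(seq_a) - window + 1)
            for j in words.get(seq_a[i:i + window], [])}


def render(seq_a, seq_b, dots):
    """Draw the plot as text: rows are seq_a positions, columns seq_b."""
    return "\n".join(
        "".join("*" if (i, j) in dots else "." for j in range(len(seq_b)))
        for i in range(len(seq_a)))


dots = dot_plot("ACGTACGT", "ACGTACGT", window=4)
print(render("ACGTACGT", "ACGTACGT", dots))
```

Shared regions show up as diagonal runs of dots. Repeats appear as extra diagonals off the main one, and rearrangements appear as diagonals that are shifted or broken.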

Comparing two genomes.

Compare Genomes will compare two related annotated genomes (or smaller sequences) to identify and list identical, similar and weakly similar features, along with missing features.

  • Open the two sequences to compare.
  • ANALYZE > COMPARE GENOMES

Internet BLAST

Use this to identify and align a sequence to the databases at the NCBI using the BLAST algorithm. You can download any hits directly to your Desktop including all annotations.

  • Open your sequence
  • DATABASE > INTERNET BLAST SEARCH
  • Choose the database and click OK
  • To retrieve hits select the hits and click TO DESKTOP
  • To download and save hits, select hits and click TO DISK

Aligning a sequence against a folder of local sequences

Align to Folder allows you to scan a local folder full of sequences and align them using the FastA alignment algorithm. It’s kind of like a local BLAST, but more sensitive.

  • Open your Sequence.
  • Run DATABASE > ALIGN TO FOLDER
  • Choose a folder of sequences and click ALIGN

Sequence Assembly

This requires our optional Assembler add-on. Use this if you want to align ten or more DNA sequences with the idea of assembling them into a longer sequence with a consensus (de novo) or for aligning reads against a reference (resequencing).

  • Go to FILE > NEW > ASSEMBLY PROJECT
  • Add your sequencing reads and also any reference sequences.

Primer Database

Primer Database lets you automatically map your primer collection to any sequence you open. Primers can have tails and mismatches.

  • Go to ANALYZE > PRIMER DATABASE SEARCH

or

  • Go to MACVECTOR > PREFERENCES > SCAN DNA FOR.. PRIMER DATABASE
Posted in Tips

101 things you (maybe) didn’t know about MacVector: #52 – Data mining to identify and analyze pangolin CoV-2 analogs to the human COVID-19 virus

One of the most underrated features in MacVector is the Database | Align to Folder function. You can use this as a more sensitive version of a local BLAST search to find sequences in a “database” that match a query sequence. But in this case the “database” is simply a collection of your own sequences, stored in one or more folders on your computer, or on a locally accessible server. More importantly, in these days of huge NGS data sets, the folders can contain fasta or fastq formatted files, and the files can even be compressed using the gzip algorithm. MacVector understands paired-end reads and can retrieve both reads of a pair even if only one of them matches the query sequence.
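
To illustrate what "retrieve both reads of a pair even if only one of them matches" means in practice, here is a minimal Python sketch. It is not MacVector's algorithm: it stands in a plain exact-substring test for the FastA alignment, assumes standard four-line FASTQ records, and assumes the two mate files list reads in the same order. The function names `read_fastq` and `matching_pairs` are hypothetical.

```python
import gzip


def read_fastq(path):
    """Yield (name, sequence) from a FASTQ file, gzip-compressed or plain."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            header = handle.readline().strip()
            if not header:
                break
            sequence = handle.readline().strip()
            handle.readline()  # '+' separator line
            handle.readline()  # quality string (unused here)
            yield header[1:].split()[0], sequence


def matching_pairs(r1_path, r2_path, query):
    """Return (name, read1, read2) for every pair in which either mate
    contains the query as an exact substring."""
    hits = []
    for (name, seq1), (_, seq2) in zip(read_fastq(r1_path), read_fastq(r2_path)):
        if query in seq1 or query in seq2:
            hits.append((name, seq1, seq2))
    return hits
```

Keeping both mates matters for the downstream assembly step, because the non-matching mate often extends the contig beyond the region covered by the query.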

As an example of the power of this approach, we used MacVector to retrieve reads matching the human SARS-CoV-2 genome from a collection of RNA-Seq reads from pangolins, assembled those reads into a viral genome and compared the sequence and encoded proteins to published bat and human isolates of SARS-CoV-2. You can read more about how that was accomplished and the results of the analysis in a published Technical Note.

You can use this approach to scan RNA-Seq reads for specific genes, or to identify reads in total genome sequencing experiments that extend sequences of interest, or to retrieve plasmids or bacteriophages. We’ve even used it to retrieve RNA-Seq reads using a protein sequence from a distantly related organism as a query. Here’s how to set up a typical search:

First make sure you have chosen a suitable Search Folder – you can have a hierarchy of folders and ask MacVector to search recursively through all the enclosed folders. Also be sure to check the paired-end reads checkbox if any of your files represent paired end reads.

Increasing the Hash Value speeds up searches dramatically, at the expense of more memory usage. The current maximum is 14, which means that you need at least a 14 residue perfect match before a potential match will even be considered. If you expect a lot of hits, increase Scores to Keep to a large value.
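
The effect of the Hash Value can be illustrated with a generic k-mer seeding sketch in Python. This shows the general technique only, not MacVector's or FastA's actual code, and the function names are hypothetical: a read is considered further only if it shares at least one perfect k-residue word with the query.

```python
def build_kmer_index(sequence, k):
    """Map every k-residue word in the sequence to its start positions."""
    index = {}
    for pos in range(len(sequence) - k + 1):
        index.setdefault(sequence[pos:pos + k], []).append(pos)
    return index


def seed_hits(query, read, k):
    """Return (query_pos, read_pos) for each perfect k-residue match."""
    index = build_kmer_index(query, k)
    return [(qpos, rpos)
            for rpos in range(len(read) - k + 1)
            for qpos in index.get(read[rpos:rpos + k], [])]


query = "ACGTACGGTCAAGT"
read = "ACGTACGCTCAAGT"  # one mismatch relative to the query

hits_k6 = seed_hits(query, read, 6)
hits_k8 = seed_hits(query, read, 8)
```

With a word size of 6 the read is still seeded on either side of the mismatch, but at 8 no perfect word survives and the read would never be examined. That is the trade-off behind a larger hash value: fewer candidates to align, at the cost of missing more divergent matches.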

Finally, the Scoring Matrix can be critical. If you are looking for matches using a query sequence from a related organism, you should likely use DNA database matrix.nmat so that you can retrieve weak matches. However, if you are trying to extend a genomic sequence where you are expecting essentially perfect matches, though perhaps with just short overlaps at the ends of reads, then DNA identity with penalties matrix.nmat is tuned for those searches.



This is an article in a long running series of tips to help you get the most out of MacVector. If you want to get notified every time a new tip gets published, follow us @MacVector on twitter (or check the feed for the hashtag #101MacVectorTips) or like us on Facebook.

Posted in 101 Tips, Tips

Working from home with MacVector during the COVID-19 pandemic

A lot of MacVector users are now at home getting used to a new way of working. The MacVector team are distributed throughout the US and Europe and we are used to remote working. However, for those new to working from home, it’s a LOT different to working back in the lab with your colleagues!

(See how we’ve been making the most of the lockdown to use existing sequencing archives to assemble a new Pangolin SARS-CoV-2 genome.)

We want to help make it easier to use MacVector:

  • If you currently use a network license and are struggling to access this from home, then email MacVector Support. We can help you in various ways.
  • If you normally use a Mac desktop in the lab, then we can give you a temporary license to activate on any home Mac you might have.
  • Even if you use an old license of MacVector, then we will give you a temporary license of MacVector 17.5.
  • If you use a Standard license, then you can already activate that on any home Mac. However, if you have forgotten your license activation details then email MacVector Support for a reminder.
  • Finally, if you are in any way connected with COVID-19 research, then please have our thanks, and have an academic annual license free of charge.

    Our thoughts are with all those affected by COVID-19, be it directly or indirectly. Our particular thoughts are with those on the front line, from health care workers to those researchers working so hard on behalf of all humanity to find a vaccine and a treatment.

    Posted in General

    Primer validation with MacVector: Primer3, Covid19 and primer design

    The CDC recently published diagnostic real-time primers for identification of SARS-CoV-2 in any person suspected of having COVID-19.

    Unfortunately, as pointed out on the Biome Informatics blog, these primers have issues that should have been easily detected had the primers been tested using a good quality primer testing tool (the linked blog post uses Primer3). What’s more, if a good quality primer design application, such as MacVector’s Primer3 tool, had been used from the beginning, then these issues would never have occurred.

    Some primer design software tools can be difficult to use. However, MacVector provides an easy to use interface to Primer3. Designing a pair of primers to amplify a target can be as simple as just three mouse clicks (click on the target gene, click ANALYZE | PRIMER3 and then click OK).

    MacVector also has QuickTest Primer for visual design and testing of primers. QuickTest Primer simplifies primer design by showing your primer and its statistics in real time. Does your primer have a hairpin? Nudge it along your template until the hairpin goes. Want to add a restriction site? Then add one and again nudge your primer to optimise the oligo.
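
    As a rough illustration of what a hairpin check looks for, the Python sketch below scans a primer for a short stem whose reverse complement appears further along the same oligo, separated by a loop. This is a naive exact-match heuristic and not the thermodynamic calculation that QuickTest Primer or Primer3 performs. The function name `find_hairpins` and the default stem and loop sizes are assumptions chosen for the example.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_hairpins(primer, min_stem=4, min_loop=3):
    """Return (stem_start, partner_start, stem_seq) for each stem of
    length min_stem that can pair with a downstream region of the same
    primer, leaving at least min_loop unpaired bases in between."""
    primer = primer.upper()
    found = []
    for i in range(len(primer) - 2 * min_stem - min_loop + 1):
        stem = primer[i:i + min_stem]
        partner = reverse_complement(stem)
        j = primer.find(partner, i + min_stem + min_loop)
        while j != -1:
            found.append((i, j, stem))
            j = primer.find(partner, j + 1)
    return found


hairpins = find_hairpins("GGGCAAAAGCCCTT")
```

    Sliding the primer window a few bases along the template and re-running such a check is the programmatic equivalent of nudging the primer until the hairpin disappears.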

    MacVector also has built in tools for helping find suitable targets to amplify using Blast or scanning local sequences too.

    Using MacVector’s Quicktest Primer tool you can see the obvious hairpin in the reverse primer of the CDC first set of primers.

    Here’s a demonstration workflow on how to use MacVector’s Primer Database and Quicktest Primer tools to look at these primers:

    Download the primers

  • Download the CDC primer list (in a MacVector Primer Database file).
  • Open MACVECTOR | PREFERENCES | SCAN DNA | PRIMERS.
  • Click SET DATABASE FILE and choose the downloaded file.
    Download one of the many sequenced SARS-CoV-2 genomes

  • Open MACVECTOR | ENTREZ and search for “organism=SARS-CoV-2” and “all fields = genome”.
  • There are a lot of submitted genomes from this organism (94 as of March 16, 2020).

  • Double click on a hit to open the genome directly in MacVector.
  • Now switch to the MAP tab.
  • You will see the primers annotated on the sequence (faded as they are dynamically shown).

  • Open ANALYZE | QUICKTEST PRIMER
  • Click the Insert from Primer Database button (as seen in the screenshot below).

    QT Primer

    You will see the primer displayed in the QT Primer interface showing any flaws. Most of the primers have hairpins.

    SARS CoV 2 QTPrimer

    Designing Primers

    Here’s an example workflow of how you would design primers to detect the N gene.

    Please note that for this demonstration workflow we have designed primers against the N gene, which had been selected by the CDC as a target. For real world purposes you would need to select an appropriate target. One way would be to align all sequenced human CoV-2 genomes (or whatever organism you are looking for), then look for highly conserved regions, particularly in regions of the virus thought to be critical for pathogenicity. Those conserved regions can then be used as targets to design primers. It would also be useful to select regions that would distinguish SARS-CoV-2 from the original SARS-CoV-1.
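
    The "look for highly conserved regions" step can be sketched in Python as a sliding-window scan over an existing multiple alignment. This is a minimal illustration that assumes the sequences are already aligned to equal length and treats a gap in any sequence as a mismatch. The function name `conserved_windows` and its parameters are hypothetical.

```python
def conserved_windows(aligned_seqs, window=20, min_identity=1.0):
    """Return start columns of windows where the fraction of fully
    identical, gap-free columns is at least min_identity."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    # 1 for columns where every sequence carries the same non-gap residue.
    same = [int(len(set(col)) == 1 and col[0] != "-")
            for col in zip(*aligned_seqs)]
    return [start
            for start in range(length - window + 1)
            if sum(same[start:start + window]) / window >= min_identity]


alignment = ["ACGTACGTAA",
             "ACGTTCGTAA",
             "ACGTACGTAA"]
starts = conserved_windows(alignment, window=4)
```

    Windows returned by such a scan are only candidate primer targets. They would still need to be checked for specificity against related viruses and for the usual primer design criteria.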

  • Select the N Gene in the SARS-CoV-2 genome (the one you downloaded above).
  • Run ANALYZE | PRIMER DESIGN/TEST (Primer3)
  • Change the setting to REGION TO SCAN and enter a product size of 400 to 500
  • Check Hybridization Probe Sequence.
  • Ensure all primers are set to Find Primer and not USE THIS PRIMER.
  • For this workflow example you do not need to change any advanced options.
  • Click OK

    In the Results window you will see a MAP with the generated primer pairs and products.

    The initial left primer still has a hairpin. You could tweak Primer3 settings. However, you can quickly tweak the primer itself using Quicktest Primer. If you slide the primer three bases to the left then the hairpin will go.

  • Select the LEFT primer in the Results Spreadsheet tab and choose ADD TO PRIMER DATABASE
  • Choose a suitable name.
  • Repeat for the RIGHT primer and the PROBE.
  • Open QuickTest Primer
  • Click INSERT FROM PRIMER DATABASE and test each primer.
  • To slide a primer left or right click on the cursor buttons either side of the primer.
  • Don’t forget to save any modified primer. You may also want to test the primers again using Primer3. If you save the primers to your Primer Database then this is easy to do.

    https://tomeraltman.net/2020/03/03/technical-problems-COVID-primers.html

    https://www.cdc.gov/coronavirus/2019-ncov/lab/rt-pcr-panel-primer-probes.html

    Posted in Techniques, Tips