Virtual Gene Cloning from NGS RNA-Seq Data

By Chris | Published: August 3, 2020

The NCBI Sequence Read Archive (SRA) is a vast public repository of Next Generation Sequencing (NGS) data. Many groups and laboratories deposit data there that they generated for their own projects, and these data can be mined for unrelated projects with minimal effort.

MacVector includes several powerful tools for extracting and analyzing specific sequences from large NGS datasets. We recently used these tools to clone the sequences of 19 distinct C2H2 zinc finger proteins from RNA-Seq data prepared from Aloe vera root tissue.


The basic steps were:

Use Align to Folder to find and extract all read pairs that could encode the conserved QALGGH domain of C2H2 zinc finger proteins
Assemble the reads with phrap, Velvet and/or SPAdes to generate multiple contigs
Analyze contigs to identify and translate protein-coding ORFs
Extend contigs when required using additional rounds of Align to Folder, contig assembly and Align to Reference
Annotate proteins using the built-in InterProScan function
Align proteins using ClustalW and visualize the shared QALGGH domains
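
These steps are carried out interactively in MacVector. As a rough illustration of what the first and third steps involve, the Python sketch below translates a nucleotide sequence in all six reading frames, flags reads whose translation contains QALGGH exactly, and picks the longest Met-initiated, stop-free stretch from a contig. It is not MacVector's Align to Folder algorithm, which is alignment-based and can tolerate mismatches. It assumes the standard genetic code and reads held in a plain Python dict rather than FASTQ files, and all function names are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order:
# the first codon position varies slowest, matching itertools.product ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (case-insensitive input)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate codon by codon; codons with ambiguous bases become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq: str) -> list[str]:
    """Translations of the three forward frames, then the three reverse frames."""
    rc = reverse_complement(seq)
    return [translate(strand[frame:]) for strand in (seq, rc) for frame in range(3)]


def reads_with_motif(reads: dict[str, str], motif: str = "QALGGH") -> list[str]:
    """Hypothetical filter: names of reads whose six-frame translation contains
    the motif exactly (no mismatches allowed, unlike an alignment search)."""
    return [
        name
        for name, seq in reads.items()
        if any(motif in protein for protein in six_frame_translations(seq))
    ]


def longest_orf(seq: str) -> str:
    """Longest stretch starting at the first M after a stop, across six frames.
    The result may run off the end of the contig without a stop (a partial ORF)."""
    best = ""
    for protein in six_frame_translations(seq):
        for segment in protein.split("*"):
            start = segment.find("M")
            if start != -1 and len(segment) - start > len(best):
                best = segment[start:]
    return best
```

For example, `reads_with_motif({"r1": "CAGGCTCTGGGTGGCCAC"})` returns `["r1"]`, because the first forward frame of that read translates to QALGGH. A real search over millions of reads would also have to cope with sequencing errors and paired mates, which is what an alignment-based tool handles.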

The full tutorial is available as a PDF, and the required data files can be downloaded directly from the SRA.

This entry was posted in Tutorials and tagged cloning, NGS, RNASeq, tutorials.