


NGS Reference Assembly Using Bowtie

You can use Assembler to align millions of short Next Generation Sequencing (NGS) reads against genomic reference sequences. This is useful for identifying SNPs and other variants in clones or mixtures of isolates compared to a known parent or reference, or as the first step in a scaffolding-based assembly of a related species. Blog posts describing this functionality are also available.

Adding Reference and Reads to a Project

Assembly projects have two toolbar buttons for adding sequence data to the project. The Add Seqs button adds read data to the project. These are typically FastQ formatted files containing sequence data from Illumina/Solexa, SOLiD or 454 sequencing runs. To save disk space, the files are not copied: MacVector just notes their location, so it's important not to move them after you have created a project. The Add Ref button lets you add one or more reference sequences to the project. These can be in any format recognized by MacVector. The image below shows a project populated with four Escherichia coli reference genomes and two FastQ read files representing paired-end reads from an E. coli clone.

Project before assembly
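Because paired-end alignment relies on the two FastQ files containing the same reads in the same order, it can be worth checking them before adding them to a project. The following is a minimal sketch of such a check, not part of MacVector. It assumes standard four-line FastQ records and that mate names match once an optional trailing /1 or /2 is removed; the function names are hypothetical.

```python
import gzip
from pathlib import Path


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")


def read_ids(path):
    """Yield the read name from each four-line FastQ record."""
    with open_text(path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 != 0:
                continue  # sequence, '+' separator and quality lines
            if not line.startswith("@"):
                raise ValueError(
                    f"{path}: expected '@' header at line {line_number + 1}"
                )
            fields = line[1:].split()
            name = fields[0] if fields else ""
            # Drop a trailing /1 or /2 mate suffix so mates compare equal.
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name


def check_pairs(forward_path, reverse_path):
    """Return the number of read pairs, or raise if the files are out of step."""
    sentinel = object()
    forward, reverse = read_ids(forward_path), read_ids(reverse_path)
    count = 0
    while True:
        a = next(forward, sentinel)
        b = next(reverse, sentinel)
        if a is sentinel and b is sentinel:
            return count
        if a is sentinel or b is sentinel:
            raise ValueError("files contain different numbers of reads")
        if a != b:
            raise ValueError(f"pair {count + 1}: {a!r} does not match {b!r}")
        count += 1
```

Calling check_pairs on the two read files (for example, hypothetical files named reads_1.fastq and reads_2.fastq) returns the number of pairs when the files are consistent.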

Assembling Using Bowtie

Bowtie can assemble reads against more than one reference sequence in a single run, so you can select all of the reference sequences and both read files, then click the Bowtie button:

Bowtie Parameters

MacVector detects that there are two read files and enables the Use paired-end alignment checkbox. The Report all best alignments option ensures that each read in the sequence files can align separately to each of the four reference sequences. After you click OK, the job runs, taking about an hour on a laptop for this data (~5 million reads aligned against four 4.6 Mb genomes).
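To get a feel for how much data this represents, the average depth of coverage per genome can be estimated as reads × read length ÷ genome length. The sketch below uses the read count and genome size quoted above; the read lengths are hypothetical, since the text does not state them, and the estimate assumes every read aligns and that reads are spread evenly.

```python
def expected_coverage(read_count, read_length, genome_length):
    """Average depth if fixed-length reads were spread evenly over one genome."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return read_count * read_length / genome_length


if __name__ == "__main__":
    reads = 5_000_000    # "~5 million reads" from the text
    genome = 4_600_000   # one 4.6 Mb E. coli genome
    # Read lengths below are assumptions for illustration only.
    for read_length in (36, 100, 150):
        depth = expected_coverage(reads, read_length, genome)
        print(f"{read_length} bp reads: about {depth:.0f}x per genome")
```

Because Report all best alignments lets each read align to every reference, a similar depth would be expected on each of the four genomes wherever they share sequence.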

Viewing Reference Contigs

When the job completes, a reference contig assembly object appears in the project window for each reference sequence. Each can be opened to reveal its child contigs. A full-length alignment has just one, but W3110 has two contigs with a short gap between them:

Project after assembly
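One way to picture why a reference can end up with more than one child contig is to treat a contig as a run of reference positions covered by at least one read. The sketch below illustrates that idea only; it is an assumption about the general principle, not a description of how MacVector builds its contigs, and the function names are hypothetical.

```python
def covered_intervals(depth):
    """Return half-open (start, end) intervals where read depth is above zero."""
    intervals = []
    start = None
    for position, d in enumerate(depth):
        if d > 0 and start is None:
            start = position                      # a covered run begins
        elif d == 0 and start is not None:
            intervals.append((start, position))   # the run ends at a gap
            start = None
    if start is not None:
        intervals.append((start, len(depth)))     # run reaches the end
    return intervals


def gaps_between(intervals):
    """Return the uncovered (start, end) stretches between adjacent intervals."""
    return [
        (left_end, right_start)
        for (_, left_end), (right_start, _) in zip(intervals, intervals[1:])
    ]


if __name__ == "__main__":
    depth = [3, 5, 4, 0, 0, 2, 6]   # toy per-base read depth
    contigs = covered_intervals(depth)
    print(contigs)                  # [(0, 3), (5, 7)]
    print(gaps_between(contigs))    # [(3, 5)]
```

In this toy example the two covered runs correspond to two contigs separated by a two-base gap.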

Double-clicking on a contig opens the contig editor/viewer. You can scroll through the actual aligned assembly:

Contig Editor

You can zoom in on regions where the coverage map indicates there are no overlapping reads, like this transposon inserted in DH10B:

Contig Map

You can also view a summary of the SNPs in the consensus sequence from the reads and the effect these would have on the amino acid sequence of any annotated CDS features:

Contig SNPs
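The effect of a SNP on a CDS comes down to comparing the codon before and after the base change. The following is a minimal sketch of that comparison using the standard genetic code. It is an illustration, not MacVector's implementation, and it assumes a forward-strand coding sequence in frame from its first base; the function name snp_effect is hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def snp_effect(cds, offset, alt_base):
    """Classify a single-base change at a 0-based offset within a CDS.

    Returns (reference amino acid, altered amino acid, description).
    """
    cds = cds.upper()
    if not 0 <= offset < len(cds):
        raise ValueError("offset lies outside the CDS")
    frame = offset % 3
    codon_start = offset - frame
    ref_codon = cds[codon_start:codon_start + 3]
    if len(ref_codon) < 3:
        raise ValueError("offset falls in an incomplete final codon")
    alt_codon = ref_codon[:frame] + alt_base.upper() + ref_codon[frame + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        kind = "synonymous"
    elif alt_aa == "*":
        kind = "nonsense"
    elif ref_aa == "*":
        kind = "stop lost"
    else:
        kind = "missense"
    return ref_aa, alt_aa, kind


if __name__ == "__main__":
    cds = "ATGGAAAAA"                # codons ATG GAA AAA
    print(snp_effect(cds, 3, "A"))   # GAA -> AAA
    print(snp_effect(cds, 5, "G"))   # GAA -> GAG
    print(snp_effect(cds, 3, "T"))   # GAA -> TAA
```

A CDS on the reverse strand would need to be reverse-complemented first, which this sketch leaves out.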


Copyright © 2024 MacVector, Inc. All rights reserved. Terms of Use.

