<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>MacVector talk &#187; NGS</title>
	<atom:link href="http://macvector.com/blog/tag/ngs/feed/" rel="self" type="application/rss+xml" />
	<link>http://macvector.com/blog</link>
	<description>General musings from the MacVector team about sequence analysis, molecular biology, the Mac in general and of course your favorite sequence analysis app for the Mac!</description>
	<lastBuildDate>Mon, 23 Jan 2012 14:30:43 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	
		<item>
		<title>MacVector 12.5: Sequence Assembly made easy.</title>
		<link>http://macvector.com/blog/2011/10/macvector-12-5-using-assembler/</link>
		<comments>http://macvector.com/blog/2011/10/macvector-12-5-using-assembler/#comments</comments>
		<pubDate>Mon, 24 Oct 2011 20:43:29 +0000</pubDate>
		<dc:creator>Chris</dc:creator>
				<category><![CDATA[Releases]]></category>
		<category><![CDATA[Techniques]]></category>
		<category><![CDATA[assembler]]></category>
		<category><![CDATA[NGS]]></category>

		<guid isPermaLink="false">http://macvector.com/blog/?p=669</guid>
		<description><![CDATA[This is part of a series of posts about and leading up to the release of MacVector 12.5. Assembler has always made it easy to assemble your sequencing projects. It hides the complicated algorithms and provides a point and click interface to show you the results. With the release of MacVector 12.5 the range of [...]]]></description>
			<content:encoded><![CDATA[<p><em>This is part of a series of posts about and leading up to the release of MacVector 12.5.</em></p>
<p> Assembler has always made it easy to assemble your sequencing projects. It hides the complicated algorithms and provides a point-and-click interface to show you the results. With the release of MacVector 12.5 the range of tasks that Assembler will perform has been greatly extended. With the addition of high throughput reference assembly using the popular Bowtie algorithm, Assembler can now align many millions of NGS reads against genome-sized references. Assembler will also generate output in the popular BAM format, and collections of contigs can be exported in FASTQ format for additional analysis. The interface has also been enhanced to increase the number of reads that can be submitted for <em>de novo</em> assembly. To help you analyse your assemblies, SNP detection and reporting has been enhanced with VCF output from Bowtie alignments and a listing of all the codon and amino acid changes between the consensus and reference sequences.</p>
<p>Here&#8217;s a selection of tasks that Assembler will make easy for you:</p>
<h3><em>De novo</em> assembly of Sanger trace files</h3>
<p>Add in your Sanger sequencing trace files, basecall the reads to improve accuracy then assemble using quality scores.</p>
<h3><em>De novo</em> short read assembly</h3>
<p>Add short reads from a variety of sources in FASTQ format as well as Sanger sequencing reads. Great for hybrid assemblies.</p>
<p><img style="display:block; margin-left:auto; margin-right:auto;" src="http://macvector.com/blog/wp-content/uploads/2011/10/MV125_BowtieOverview.png" alt="MV125 BowtieOverview" title="MV125_BowtieOverview.png" border="0" width="600" height="425" /></p>
<h3>Reference assembly to identify SNPs in a bacterial/viral isolate</h3>
<p>Add a reference sequence and one or more NGS files representing the sequence of an individual isolate, then assemble using Bowtie. View a report listing all the potential SNPs based on the differences between the consensus and the reference. You can quickly identify the genes that the SNPs lie in and drill down to view the nucleotide and amino acid changes.</p>
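<p>To illustrate what a report of differences between a consensus and a reference contains, here is a minimal Python sketch. It is not how Assembler or Bowtie work internally: the function name <code>list_snps</code> is made up for this example, and it assumes the two sequences are already aligned to the same length, with <code>-</code> marking gaps.</p>

```python
def list_snps(reference, consensus):
    """List single-base differences between two pre-aligned sequences.

    Returns (position, reference_base, consensus_base) tuples, with
    positions counted from 1. Gap columns ('-') are skipped because
    they are insertions or deletions, not substitutions.
    """
    if len(reference) != len(consensus):
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    for index, (ref_base, con_base) in enumerate(zip(reference.upper(), consensus.upper())):
        if "-" in (ref_base, con_base):
            continue
        if ref_base != con_base:
            snps.append((index + 1, ref_base, con_base))
    return snps


# Made-up sequences: two substitutions, at positions 3 and 6.
print(list_snps("ACGTAC", "ACCTAA"))
```

<p>A real SNP report would also weigh read depth and base quality before calling a difference, which this sketch ignores.</p>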
<h3><em>De novo</em> bacterial assembly assisted by a Reference scaffold</h3>
<p>Create a reference assembly using Bowtie, then take the individual contig consensus sequences along with all the input NGS reads that did not assemble, and assemble them with a <em>de novo</em> assembler (Phrap) directly from within a new Assembly Project.</p>
<h3>Assembly to multiple similar references</h3>
<p>Take reference sequences from a series of closely related strains of virus or bacteria. The reads come from a single isolate. Use Bowtie to assemble these against the collection of References to determine which is most closely related (or identical) to the isolate.</p>
<h3>Assembly to multiple dissimilar references to identify SNPs</h3>
<p>Essentially similar to the bacterial SNP assembly, but using yeast or some other organism with multiple chromosomes (or even bacteria that have multiple chromosomes or large plasmids).</p>
<h3>Exome (transcriptome) sequencing</h3>
<p>Use the genomic sequence of an organism as a reference and align reads from sequenced mRNA, cDNA or total RNA of that same organism. Uses include splice junction identification and novel gene identification, amongst many others.</p>
<p><img style="display:block; margin-left:auto; margin-right:auto;" src="http://macvector.com/blog/wp-content/uploads/2011/10/MV125_ReferenceContigCoverageMapSymbols.png" alt="MV125 ReferenceContigCoverageMapSymbols" title="MV125_ReferenceContigCoverageMapSymbols.png" border="0" width="600" height="264" /></p>
<p><strong><br />
...and remember, if you purchase an upgrade or a new license before the release of MacVector 12.5 you will get a 10% discount and a free upgrade to MacVector 12.5 when it is released. Contact <a href="mailto:sales@macvector.com?Subject=MacVector%20discount ">Sales@macvector.com</a> and quote <em>&#8220;MV12510&#8221;</em> for the discount.</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://macvector.com/blog/2011/10/macvector-12-5-using-assembler/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Short read improvements to MacVector 11.1</title>
		<link>http://macvector.com/blog/2010/03/short-read-improvements-to-macvector-11-1/</link>
		<comments>http://macvector.com/blog/2010/03/short-read-improvements-to-macvector-11-1/#comments</comments>
		<pubDate>Mon, 15 Mar 2010 15:01:45 +0000</pubDate>
		<dc:creator>Chris</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Releases]]></category>
		<category><![CDATA[454]]></category>
		<category><![CDATA[illumina]]></category>
		<category><![CDATA[NGS]]></category>
		<category><![CDATA[solid]]></category>

		<guid isPermaLink="false">http://macvector.com/blog/?p=262</guid>
		<description><![CDATA[The ability to produce de novo assemblies of short read data was introduced into MacVector &#038; Assembler 11 and we&#8217;ve enhanced this in MacVector 11.1. Now Assembler stores and visualises metadata on the type of short read you have and also stores and deals with the quality data stored in a better way. Mixing your [...]]]></description>
			<content:encoded><![CDATA[<p>The ability to produce <em>de novo</em> assemblies of short read data was introduced into <a href="http://macvector.com/Assembler/assembler.html">MacVector &#038; Assembler 11</a> and we&#8217;ve enhanced this in <a href="http://macvector.com/MacVector/what%27snewinmacvector11.1.html">MacVector 11.1</a>. Assembler now stores and visualises metadata on the type of short read you have, and also handles the quality data in a better way.</p>
<p><strong>Mixing your reads</strong></p>
<p>Currently there are three major types of short read data, because three main next generation sequencers produce reads of considerably different lengths.</p>
<p>- Illumina reads, which are generally 66bp long (the first generation of Solexa sequencers produced reads 33bp long).</p>
<p>- 454 reads, which can be 400 to 500bp long.</p>
<p>- SOLiD reads, which are around 50bp long.</p>
<p>It is important to know which sequencer your reads have come from, as length is not the only specific characteristic they possess. So with MacVector 11.1 we have introduced new feature types to indicate which type of read each one is. So whether you mix Sanger with 454 and a dash of Illumina reads, your assembly project will always keep the source of the reads stored in the metadata of the project. Furthermore, the default symbol for each read type is different, so the types can be easily distinguished visually.</p>
<p><img src="http://macvector.com/blog/wp-content/uploads/2010/02/ShortReadFeatureType.png" alt="http://macvector.com/blog/wp-content/uploads/2010/02/ShortReadFeatureType.png" border="0" width="649" height="321" align="center" /></p>
<p><strong>Yet another sequence file format inconsistency!</strong></p>
<p>An unfortunate fact about file formats in the bioinformatics world is that there are just so many of them, most of them taking a slightly different approach to the same problem. So it is nice that the <a href="http://en.wikipedia.org/wiki/FASTQ_format">Fastq</a> format already seems to be fairly ubiquitous for raw short read data storage, which is the main reason we chose it to support short read data in MacVector. On the downside, there are already three variations in the way that quality scores are stored in the format (there are variations in the sequence labels as well, but let&#8217;s just consider the quality scores). As well as the <a href="http://en.wikipedia.org/wiki/Phred_base_calling">basecalled</a> sequence, the Fastq format stores a quality score for each base, encoded as a single character whose <a href="http://en.wikipedia.org/wiki/ASCII">ASCII</a> code represents the value. All three variations of the format use that strategy. However, the actual number stored and the way it is encoded into the ASCII character do vary. The first generation of Illumina machines (when the company was still called Solexa) used a proprietary quality score. The current generation of their sequencers uses the more widely recognised <a href="http://en.wikipedia.org/wiki/Phred_quality_score">Phred quality scores</a>. However, they store this number differently to the more widely recognised &#8220;Sanger&#8221; format, which stores a Phred quality score with a range of 0 &#8211; 93 using ASCII codes 33 to 126. It is very important to know when data uses the old Solexa scores. It is less important to distinguish the later Illumina format from the Sanger format, as the two can only be confused when scores are very high, and such high Phred scores are unlikely.</p>
<p>So as well as distinguishing which sequencer produced the data, Assembler now also supports the import of Fastq reads with all three types of quality data, and will take this into account when assembling them.</p>
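<p>To make the encoding concrete, here is a minimal Python sketch (not MacVector code; the function names are made up for this example). It decodes a quality line by subtracting an ASCII offset: 33 for the Sanger format, 64 for both the later Illumina format and the old Solexa format. It also converts an old Solexa score to the equivalent Phred score using the standard relationship between the two scales, which shows why the old scores need special handling: the two scales agree closely at high quality but diverge at low quality.</p>

```python
import math

SANGER_OFFSET = 33    # Sanger Fastq: Phred 0-93 stored as ASCII 33-126
ILLUMINA_OFFSET = 64  # later Illumina and old Solexa files both use offset 64


def decode_quality(quality_line, offset):
    """Turn a Fastq quality line into a list of integer scores."""
    return [ord(character) - offset for character in quality_line]


def solexa_to_phred(solexa_score):
    """Convert an old Solexa score to the equivalent Phred score."""
    return 10 * math.log10(10 ** (solexa_score / 10) + 1)


# A made-up Sanger quality line: 'I' is ASCII 73, '5' is 53, '+' is 43, '!' is 33.
print(decode_quality("I5+!", SANGER_OFFSET))  # [40, 20, 10, 0]

# High scores barely change, low scores change a lot.
for score in (40, 10, 0, -5):
    print(score, round(solexa_to_phred(score), 2))
```

<p>Decoding a file with the wrong offset shifts every score by 31, so it is worth checking the source of the reads before trusting the numbers.</p>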
]]></content:encoded>
			<wfw:commentRss>http://macvector.com/blog/2010/03/short-read-improvements-to-macvector-11-1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>MacVector 11.1 is now released</title>
		<link>http://macvector.com/blog/2010/02/macvector-11-1-is-now-released/</link>
		<comments>http://macvector.com/blog/2010/02/macvector-11-1-is-now-released/#comments</comments>
		<pubDate>Wed, 10 Feb 2010 15:37:11 +0000</pubDate>
		<dc:creator>Chris</dc:creator>
				<category><![CDATA[Releases]]></category>
		<category><![CDATA[NGS]]></category>

		<guid isPermaLink="false">http://macvector.com/blog/?p=260</guid>
		<description><![CDATA[MacVector 11.1 has just been released. You can check out the new features of this release on this blog post or a fuller list is available here. Existing customers can download and install it now. Although as usual you will be notified by MacVector&#8217;s online notifier in due course. We are pleased with this release [...]]]></description>
			<content:encoded><![CDATA[<p>MacVector 11.1 has just been released. You can check out the new features of this release on <a href="http://macvector.com/blog/?p=254">this blog post</a> or a fuller list is available <a href="http://www.macvector.com/MacVector/what%27snewinmacvector11.1.html">here</a>.</p>
<p>Existing customers can <a href="http://www.macvector.com/downloads.html#MacVector111Updater">download</a> and install it now. As usual, you will also be notified by MacVector&#8217;s online notifier in due course. <img class="aligncenter size-medium wp-image-87" title="msa2" src="http://macvector.com/blog/wp-content/uploads/2010/02/OnlineNotifierMV_11.1.png" alt="msa2" width="600" height="402" /></p>
<p>We are pleased with this release and we hope you are too! Please feel free to let us know what you think!</p>
<p><em>Please note that this is a download only release and CDs will not be sent to existing customers.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://macvector.com/blog/2010/02/macvector-11-1-is-now-released/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Next Generation sequencing formats</title>
		<link>http://macvector.com/blog/2009/06/next-generation-sequencing-formats/</link>
		<comments>http://macvector.com/blog/2009/06/next-generation-sequencing-formats/#comments</comments>
		<pubDate>Wed, 24 Jun 2009 13:26:19 +0000</pubDate>
		<dc:creator>Chris</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[NGS]]></category>

		<guid isPermaLink="false">http://macvector.com/blog/?p=100</guid>
		<description><![CDATA[As is common with the lack of standards seen with most emerging technologies there are many different and competing types of sequencing file formats for storage of short read or next generation sequencing data. All these formats try to solve the same question of storing an almost unprecedented amount of sequence data in a useable [...]]]></description>
			<content:encoded><![CDATA[<p>As is common with emerging technologies, which often lack standards, there are many different and competing file formats for storing short read or next generation sequencing data. All these formats try to solve the same problem of storing an almost unprecedented amount of sequence data in a useable and complete format. However, one emerging format that seems very appropriate for this type of data is <a href="http://en.wikipedia.org/wiki/FASTQ_format">Fastq</a>.</p>
<p><a href="http://en.wikipedia.org/wiki/Phrap">Phrap</a> was one of the first real high throughput assemblers that could also deal with quality scores (generated by its stablemate <a href="http://en.wikipedia.org/wiki/Phred_base_calling">Phred</a>). The general input files to Phrap are a single Fasta file containing the reads, and an associated Qual file that contains quality scores for each and every read in the Fasta file. I&#8217;m not sure what it is about bioinformaticians, but they always feel the need to add Yet Another Format rather than reuse one of the many decent formats. However, in this case Fastq is a logical progression of the Fasta + Qual format in that the two individual files are now merged. That is, each read comprises four lines: a label header, a sequence, a second label header and a quality score line.</p>
<p><em>Here&#8217;s an example of such a file. Note that this example has a single &#8216;+&#8217; character to indicate the quality label line, rather than a duplicate of the label.</em></p>
<pre>@ERR000955.3982 IL6_1091:3:1:210:502/1
TCCAAACACACTTTGTGTAGAATCTGCAAGTGGAGAT
+
>>>>>>>>>>>>>>>>>>>;>>>>><>>>;>>;><>></pre>
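<p>The four-line layout is simple enough to read with a few lines of Python. The sketch below is an illustration only, not MacVector&#8217;s importer: the names <code>FastqRecord</code> and <code>read_fastq</code> are made up, and it assumes every record occupies exactly four lines (no wrapped sequence or quality lines).</p>

```python
import io
from typing import NamedTuple


class FastqRecord(NamedTuple):
    label: str
    sequence: str
    quality: str


def read_fastq(handle):
    """Yield FastqRecord objects from a text handle of four-line records."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("not a four-line Fastq record: " + header)
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ: " + header)
        yield FastqRecord(header[1:], sequence, quality)


# Two made-up records: one with a bare '+', one with the label repeated.
example = "@read1\nACGT\n+\nII5!\n@read2\nGG\n+read2\n##\n"
for record in read_fastq(io.StringIO(example)):
    print(record.label, record.sequence, record.quality)
```

<p>Checking that the sequence and quality lines have the same length catches most truncated or corrupted files early.</p>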
<p>Many of the raw file formats (SFF, SRF etc.) are big! They contain the raw image files as well as basecalled sequence and quality data. Fastq files are at the complete opposite end. They are small, and only contain minimal data. They may contain millions of reads, yet a single run&#8217;s data still comes to less than a tenth of a gigabyte. It is worth noting that the various Short Read Archives (NCBI, EBI etc.) require the submission of original raw image files, but only allow the reads to be downloaded in Fastq format.</p>
<p>The quality line must contain the same number of characters as there are bases, i.e. one quality character per base. However, most quality scores are double digits. Fastq gets around this by using a single ASCII character to encode each quality score. However, here&#8217;s where consistency fails. The quality line may be one of three different types. Sanger format encodes a <a href="http://en.wikipedia.org/wiki/Phred_quality_score">Phred quality score</a> starting at 0 using ASCII characters from 33 upwards: the format allows scores of 0 &#8211; 93 (ASCII 33 to 126), although scores in raw reads rarely exceed 60 (ASCII 93). The latest Illumina 1.3 format also contains a Phred quality score from 0 to 40, this time encoded using ASCII 64 to 104. Finally, the older Illumina (nee Solexa) 1.0 format has its own Solexa/Illumina quality score from -5 to 40, encoded using ASCII 59 to 104. Of course this does pose problems: unless you know which quality score was used, there is no way of knowing which it is without guesswork.</p>
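<p>The guesswork can at least be made systematic. The Python sketch below is a heuristic only, with a made-up function name: it looks at the lowest ASCII code seen in a sample of quality lines and uses the starting codes quoted above (33 for Sanger, 59 for Solexa, 64 for Illumina 1.3). A file with only high-quality bases can still be misclassified, so the result is a guess and not a guarantee.</p>

```python
def guess_quality_encoding(quality_lines):
    """Guess the Fastq quality encoding from the lowest character seen.

    Heuristic: characters below ASCII 59 can only occur in Sanger files,
    characters from 59 to 63 suggest the old Solexa scores, and anything
    else is assumed to be Illumina 1.3. Raises ValueError if no quality
    characters are supplied.
    """
    lowest = min(ord(character) for line in quality_lines for character in line)
    if lowest < 59:
        return "sanger"
    if lowest < 64:
        return "solexa"
    return "illumina-1.3"


# Made-up quality lines for each case.
print(guess_quality_encoding(["II5!", "5555"]))  # '!' is ASCII 33
print(guess_quality_encoding(["hh;@"]))          # ';' is ASCII 59
print(guess_quality_encoding(["hhh@", "abcd"]))  # '@' is ASCII 64
```

<p>The more reads you sample, the more likely a low score turns up and settles the question, so it pays to scan a few thousand quality lines instead of just the first record.</p>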
<p>There are also other issues with this format. It could be said that the label on the quality score line is redundant, and the file size could be reduced by 25% if it were removed. Some applications do generate and accept Fastq files that have a single &#8216;@&#8217; or &#8216;+&#8217; in place of the quality label line.</p>
<p>It would be helpful to see a tightening of this format and indeed there is a <a href="http://illumina.ucr.edu/ht/documentation/standardized-fastq-format-aka-fastq2">fastq2</a> format that does not have these weaknesses.</p>
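<p>Removing the repeated label is straightforward, as this minimal Python sketch shows (the function name is made up, and it assumes strict four-line records). Note that it finds the label line by its position in the record and not by its first character, because a quality line can itself legitimately begin with &#8216;+&#8217; or &#8216;@&#8217;.</p>

```python
def strip_quality_labels(lines):
    """Replace the third line of every four-line record with a bare '+'.

    `lines` is an iterable of lines without trailing newlines.
    """
    for index, line in enumerate(lines):
        if index % 4 == 2:
            if not line.startswith("+"):
                raise ValueError("expected a '+' line at line %d" % (index + 1))
            yield "+"
        else:
            yield line


# A made-up record whose quality line happens to start with '+'.
record = ["@read1", "ACGT", "+read1", "+III"]
print(list(strip_quality_labels(record)))
```

<p>How much space this saves depends on how long the labels are compared with the reads.</p>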
<p>With the next release of MacVector and Assembler during the Summer we will be adding support for the Fastq format. We will also be adding support for <i>de novo</i> assembly of short read data. This release is currently in internal beta testing, and will be out for a public beta trial soon.</p>
]]></content:encoded>
			<wfw:commentRss>http://macvector.com/blog/2009/06/next-generation-sequencing-formats/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

