<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>MacVector talk &#187; NGS</title>
	<atom:link href="http://macvector.com/blog/tag/ngs/feed/" rel="self" type="application/rss+xml" />
	<link>http://macvector.com/blog</link>
	<description>General musings from the MacVector team about sequence analysis, molecular biology, the Mac in general and of course your favorite sequence analysis app for the Mac!</description>
	<lastBuildDate>Mon, 14 May 2012 14:40:56 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	
		<item>
		<title>MacVector 12.5: Creating reference assemblies with Bowtie</title>
		<link>http://macvector.com/blog/2011/11/macvector-12-5-creating-reference-assemblies-with-bowtie/</link>
		<comments>http://macvector.com/blog/2011/11/macvector-12-5-creating-reference-assemblies-with-bowtie/#comments</comments>
		<pubDate>Sun, 06 Nov 2011 17:43:38 +0000</pubDate>
		<dc:creator>Chris</dc:creator>
				<category><![CDATA[Releases]]></category>
		<category><![CDATA[assembler]]></category>
		<category><![CDATA[bowtie]]></category>
		<category><![CDATA[NGS]]></category>

		<guid isPermaLink="false">http://macvector.com/blog/?p=683</guid>
		<description><![CDATA[This is part of a series of posts about and leading up to the release of MacVector 12.5. With Assembler 12.5 our developers have come up with an affordable and straightforward solution for assembling and visualizing your NGS data. Generating sequencing data is cheaper than it has ever been, however, with this increase in data [...]]]></description>
			<content:encoded><![CDATA[<p><em>This is part of a series of posts about and leading up to the release of MacVector 12.5.</em></p>
<p>With Assembler 12.5 our developers have come up with an affordable and straightforward solution for assembling and visualizing your NGS data. Generating sequencing data is cheaper than it has ever been; however, this increase in data has brought a problem with analysis. Assembler will now create reference assemblies with just a few mouse clicks using <a href="http://bowtie-bio.sourceforge.net/index.shtml">Bowtie</a>. Instead of sending your millions of reads away to be assembled, or delving into complicated software tools, you&#8217;ll be able to align millions of NGS reads to multi-megabase reference sequences in literally minutes. Bowtie is a fast algorithm, and although it&#8217;s an ungapped aligner, what it loses in accuracy it makes up for in speed. You do not need a 32 GB, 8-core Mac Pro to assemble your data. In addition to the existing phrap/phred tools, this makes Assembler a simple, cost-effective solution for analyzing your Next Generation Sequencing reads.</p>
<h2>Creating a Reference Assembly</h2>
<ul>
<li>Choose <strong>File | New | Assembly Project</strong> to create a new empty project file.</li>
<li>Click on the <strong>Add Reads</strong> toolbar button, then select the sequence files you wish to assemble and click on the Open button. Reads files can also be dragged and dropped onto the open Assembly Project window.</li>
<li>Click on the <strong>Add Ref</strong> toolbar button, then select the sequence file you wish to align the reads against and click on the Open button.</li>
<li>Choose <strong>Analyze | Bowtie</strong> to run the Bowtie algorithm. Note that if no sequences are selected, Bowtie will be run on ALL of the files in the project. However, if any sequences are selected then the reference sequence and at least one reads file must be selected.</li>
</ul>
<p>Your reference sequence can be in any &#8220;openable&#8221; format. However, your reads need to be in FASTQ format.</p>
<h3>Hit Reporting</h3>
<p>In the dialogue you&#8217;ll see an important setting called Hit Reporting. Bowtie uses a concept of strata to score alignments. A stratum is defined by all alignments that contain the same number of mismatches in the seed (the seed is the first &#8220;n&#8221; bases of a read, which is given higher priority in scoring than the entire read). You can choose SHOW ALL ALIGNMENTS, REPORT BEST ALIGNMENT ONLY (show the best alignment in the stratum with the fewest mismatches) or REPORT ALL BEST ALIGNMENTS (which shows the best alignment in all strata). Which you choose depends on several factors: for example, how many references you have, how many repeated regions you expect, and whether you are using a reference sequence from the same organism or a related one. Generally, start with show all alignments, which is the quickest, and work from there.</p>
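<p>To make the idea of strata concrete, here is a minimal Python sketch of the three reporting modes as they are described above. It is only an illustration and not how Bowtie or Assembler implement hit reporting: the <code>Hit</code> record, the mode names and the use of whole-read mismatches as the tie-breaker within a stratum are all assumptions made for this example.</p>
<pre>from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    ref_pos: int           # hypothetical: start position on the reference
    seed_mismatches: int   # mismatches within the seed; defines the stratum
    total_mismatches: int  # mismatches over the whole read; assumed tie-breaker


def group_by_stratum(hits):
    """Group candidate alignments by their number of seed mismatches."""
    strata = defaultdict(list)
    for hit in hits:
        strata[hit.seed_mismatches].append(hit)
    return dict(strata)


def best_in(stratum_hits):
    """Pick the hit with the fewest mismatches over the whole read."""
    return min(stratum_hits, key=lambda hit: hit.total_mismatches)


def report(hits, mode):
    """Return the hits to report for one read under the given mode."""
    if not hits:
        return []
    strata = group_by_stratum(hits)
    if mode == "all":
        return list(hits)
    if mode == "best_only":
        # Best hit from the stratum with the fewest seed mismatches.
        return [best_in(strata[min(strata)])]
    if mode == "all_best":
        # Best hit from every stratum, lowest stratum first.
        return [best_in(strata[key]) for key in sorted(strata)]
    raise ValueError("unknown mode: " + mode)


hits = [Hit(120, 0, 2), Hit(950, 0, 1), Hit(4300, 1, 1), Hit(7800, 2, 3)]
print(report(hits, "best_only"))
print(report(hits, "all_best"))</pre>
<p>With the four example hits, the best-only mode keeps just the hit at position 950, while the all-best mode keeps one hit from each of the three strata.</p>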
<h2>Analysis</h2>
<p>..and that&#8217;s how easy it is. Of course, generating results is always easier than analysing them, and Assembler has a few useful tools to help you analyse your reference contig. We&#8217;ll talk about variant detection in a later blog post, but the coverage map is one of the first tools that you will see upon completing an assembly.</p>
<h3>Using the Coverage Map</h3>
<p>It is extremely useful to know the depth of reads that are aligned on your reference. Areas of low coverage indicate that you need further sequencing, and peaks of high coverage can be indicative of repeats. The Map view of a reference contig shows the depth of reads in a coverage map with four statistics. A single plot line shows a running average of the number of reads at that point. However, an average plot is not very sensitive when viewed at a high level, so two shaded areas indicate the maximum value and the minimum value of the averaged reads at that point. As you zoom in on the coverage map these three values converge, to the extent that when viewed at, or close to, residue level the three plots become identical. Areas of zero coverage are shown in light grey. Note that these areas are always displayed, even when they are disproportionate to the level of magnification.</p>
<p><img style="display:block; margin-left:auto; margin-right:auto;" src="http://macvector.com/blog/wp-content/uploads/2011/10/MV125_ReferenceContigCoverageMapSymbols.png" alt="MV125 ReferenceContigCoverageMapSymbols" title="MV125_ReferenceContigCoverageMapSymbols.png" border="0" width="600" height="264" /></p>
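<p>The relationship between the average, minimum and maximum plots can be sketched in a few lines of Python. This is an illustration only, not Assembler&#8217;s code: it assumes you already have a list of per-base read depths, and it uses simple non-overlapping windows where the map itself uses a running average. The function names are made up for this example.</p>
<pre>def coverage_summary(depths, window):
    """Summarise per-base read depth in consecutive windows.

    Returns one (mean, minimum, maximum) tuple per window.
    """
    if not window > 0:
        raise ValueError("window must be a positive integer")
    summary = []
    for start in range(0, len(depths), window):
        chunk = depths[start:start + window]
        summary.append((sum(chunk) / len(chunk), min(chunk), max(chunk)))
    return summary


def zero_coverage_regions(depths):
    """Return (start, end) pairs, end exclusive, where depth is zero."""
    regions = []
    start = None
    for position, depth in enumerate(depths):
        if depth == 0 and start is None:
            start = position
        elif depth != 0 and start is not None:
            regions.append((start, position))
            start = None
    if start is not None:
        regions.append((start, len(depths)))
    return regions


depths = [4, 6, 5, 9, 0, 0, 3, 8]
print(coverage_summary(depths, 4))    # zoomed out: mean, min and max differ
print(coverage_summary(depths, 1))    # residue level: all three are equal
print(zero_coverage_regions(depths))</pre>
<p>With a window of one base the three values in each tuple are equal, which is why the plots merge as you zoom in to residue level.</p>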
<h3>Multiple reference sequences</h3>
<p>You can add multiple reference sequences and, depending on the settings, reads will be aligned against the best match or against several of them. This is great for tasks such as identifying a sequenced isolate amongst a series of closely related strains of virus or bacteria. Having multiple reference sequences helps determine which is most closely related (or identical) to the isolate.</p>
<h3>Paired end reads</h3>
<p>Paired end reads are very useful for improving the accuracy of alignments and also for indel detection. Paired end reads are created by sequencing both ends of the same DNA molecule, with a known fragment size. Since the two reads are separated by a known distance, assembly and orientation of the two reads is less complicated. In Assembler, if your reads are paired end, all you need to do is ensure that the two files have the same filename with a version number appended, and that Paired End assembly is enabled.</p>
<p>e.g.</p>
<pre>READS_1.fastq
READS_2.fastq</pre>
<p>You&#8217;ll also need to input the fragment size.</p>
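<p>If you have a folder full of reads files, a short script can check that each file has its mate before you add them to a project. The Python sketch below assumes the <code>_1.fastq</code> / <code>_2.fastq</code> naming shown in the example above; the exact matching rule Assembler applies is not described here, so treat the pattern as an assumption.</p>
<pre>import re

# Assumed naming convention: a shared stem, then _1 or _2, then .fastq
MATE_PATTERN = re.compile(r"^(.+)_([12])\.fastq$")


def pair_read_files(filenames):
    """Match READS_1.fastq with READS_2.fastq by their shared stem.

    Returns (pairs, unpaired): pairs is a list of (mate1, mate2) names.
    """
    mates = {}
    unpaired = []
    for name in filenames:
        match = MATE_PATTERN.match(name)
        if match is None:
            unpaired.append(name)
            continue
        stem, mate = match.groups()
        mates.setdefault(stem, {})[mate] = name
    pairs = []
    for stem in sorted(mates):
        found = mates[stem]
        if "1" in found and "2" in found:
            pairs.append((found["1"], found["2"]))
        else:
            unpaired.extend(found.values())
    return pairs, unpaired


files = ["READS_1.fastq", "READS_2.fastq", "OTHER_1.fastq", "notes.txt"]
print(pair_read_files(files))</pre>
<p>Here READS_1.fastq and READS_2.fastq are reported as a pair, while OTHER_1.fastq is flagged as unpaired because its mate is missing.</p>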
<p>In the next Assembler post we&#8217;ll talk about variant detection.</p>
<p><strong><br />
..and remember if you purchase an upgrade or a new license before the release of MacVector 12.5 you can <a href="http://macvector.com/blog/2011/10/macvector-12-5-get-assembler-for-half-price/">get Assembler</a> with a 50% discount and a free upgrade to MacVector 12.5 when it is released. This offer ends on 1st December. Please <a href="http://www.macvector.com/initquoterequest.php">request a quote</a> now. Don&#8217;t forget to quote the promotional code of &#8220;Assembler50%&#8221;</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://macvector.com/blog/2011/11/macvector-12-5-creating-reference-assemblies-with-bowtie/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>MacVector 12.5: Get Assembler for half price!</title>
		<link>http://macvector.com/blog/2011/10/macvector-12-5-get-assembler-for-half-price/</link>
		<comments>http://macvector.com/blog/2011/10/macvector-12-5-get-assembler-for-half-price/#comments</comments>
		<pubDate>Mon, 31 Oct 2011 16:02:20 +0000</pubDate>
		<dc:creator>Chris</dc:creator>
				<category><![CDATA[Releases]]></category>
		<category><![CDATA[assembler]]></category>
		<category><![CDATA[bowtie]]></category>
		<category><![CDATA[NGS]]></category>
		<category><![CDATA[offers]]></category>

		<guid isPermaLink="false">http://macvector.com/blog/?p=679</guid>
		<description><![CDATA[Our latest version, MacVector 12.5, will be released on the 1st of December, 2011. There&#8217;s some great new features in this release with the addition of extra alignment algorithms, Muscle and T-coffee, to the Multiple Sequence Alignment analysis interface, as well as many interface enhancements and performance improvements. However, the most important new feature in [...]]]></description>
			<content:encoded><![CDATA[<p>Our latest version, MacVector 12.5, will be released on the 1st of December, 2011.  There&#8217;s some <a href="http://macvector.com/blog/2011/10/macvector-12-5-new-features/">great new features</a> in this release with the addition of extra alignment algorithms, <a href="http://www.drive5.com/muscle/">Muscle</a> and <a href="http://tcoffee.crg.cat/">T-coffee</a>, to the Multiple Sequence Alignment analysis interface, as well as many interface enhancements and performance improvements.</p>
<p><img src="http://macvector.com/blog/wp-content/uploads/2011/10/MV125_MuscleTcoffee.png" alt="MV125 MuscleTcoffee" title="MV125_MuscleTcoffee.png" border="0" width="300" height="200" style="float:right;" /></p>
<p>However, the most important new feature in MacVector 12.5 is the addition of <a href="http://bowtie-bio.sourceforge.net/index.shtml">Bowtie</a> to MacVector&#8217;s sequence assembly module, <a href="http://macvector.com/blog/2011/10/macvector-12-5-using-assembler/">Assembler</a>, to create reference assemblies with your NGS data. Additionally, the Contig Editor has been enhanced with some useful features to visualise your genome-sized alignments. With these improvements, our developers have come up with an affordable and straightforward solution to assembling your NGS data. Generating sequencing data is cheaper than it has ever been; analyzing it, however, is not. Instead of sending your millions of reads away to be assembled at great expense, or spending time on the computer trying to assemble your reads, you can now let Assembler create reference assemblies with just a few mouse clicks.</p>
<p>To mark the release of MacVector 12.5 and this great new functionality in Assembler we&#8217;re offering users an opportunity to add Assembler to your MacVector license for <strong>half price</strong>. Not only that but if you have an older license of MacVector and want to upgrade your license then you&#8217;ll still be able to add Assembler at half price. If you do not have MacVector and want this great new functionality then you&#8217;ll still be able to add Assembler to a new license for half price.</p>
<p>If you take advantage of this offer, you will not only get a great price on Assembler, but you will get MacVector 12.5 <strong>and</strong> a free upgrade to MacVector 12.6 a few months after. MacVector 12.6 will include even more NGS support as well as a Quick-test function to easily analyze your existing primer sequences, and more. <strong>Plus</strong>, your license will include a year&#8217;s maintenance which will not start until MacVector 12.5 is officially released on the 1st December, 2011.</p>
<p>Remember that this half-price offer ends on 1st December, and the price of Assembler is going up next year. Please <a href="http://www.macvector.com/initquoterequest.php">request a quote</a> now. Don&#8217;t forget to quote the promotional code of &#8220;Assembler50%&#8221;.</p>
<p>If you want to see the new functionality then we have a <a href="http://www.macvector.com/GTCGAC/macvector12.5beta.html">prerelease preview</a> available for download. This is not yet a release candidate but is part of our external beta program.</p>
<p>..and remember both MacVector 12.0.6 and MacVector 12.5 are fully supported on OS X Lion.</p>
<p><em>MacVector 12.5 will be released on the 1st of December, 2011</em></p>
<p><img src="http://macvector.com/blog/wp-content/uploads/2011/10/MV125_BowtieOverview1.png" alt="MV125 BowtieOverview" title="MV125_BowtieOverview.png" border="0" width="600" height="425" style="float:center;" /><br />
</p>
]]></content:encoded>
			<wfw:commentRss>http://macvector.com/blog/2011/10/macvector-12-5-get-assembler-for-half-price/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>MacVector 12.5: Sequence Assembly made easy.</title>
		<link>http://macvector.com/blog/2011/10/macvector-12-5-using-assembler/</link>
		<comments>http://macvector.com/blog/2011/10/macvector-12-5-using-assembler/#comments</comments>
		<pubDate>Mon, 24 Oct 2011 20:43:29 +0000</pubDate>
		<dc:creator>Chris</dc:creator>
				<category><![CDATA[Releases]]></category>
		<category><![CDATA[Techniques]]></category>
		<category><![CDATA[assembler]]></category>
		<category><![CDATA[bowtie]]></category>
		<category><![CDATA[NGS]]></category>

		<guid isPermaLink="false">http://macvector.com/blog/?p=669</guid>
		<description><![CDATA[This is part of a series of posts about and leading up to the release of MacVector 12.5. Assembler has always made it easy to assemble your sequencing projects. It hides the complicated algorithms and provides a point and click interface to show you the results. With the release of MacVector 12.5 the range of [...]]]></description>
			<content:encoded><![CDATA[<p><em>This is part of a series of posts about and leading up to the release of MacVector 12.5.</em></p>
<p>Assembler has always made it easy to assemble your sequencing projects. It hides the complicated algorithms and provides a point-and-click interface to show you the results. With the release of MacVector 12.5 the range of tasks that Assembler will perform has been greatly extended. With the addition of high-throughput reference assembly using the popular Bowtie algorithm, Assembler can now align many millions of NGS reads against genome-sized references. Assembler will also generate output in the popular BAM format, and collections of contigs can be exported in FastQ format for additional analysis. Additionally, the interface has been enhanced to increase the number of reads that can be submitted for <em>de novo</em> assembly. To help you analyse your assemblies, SNP detection and reporting have been enhanced with VCF output from Bowtie alignments and a listing of all the codon and amino acid changes between the consensus and reference sequences.</p>
<p>Here&#8217;s a selection of tasks that Assembler will make easy for you:</p>
<h3><em>De novo</em> assembly of Sanger trace files</h3>
<p>Add in your Sanger sequencing trace files, basecall the reads to improve accuracy, then assemble using quality scores.</p>
<h3><em>De novo</em> short read assembly</h3>
<p>Add short reads from a variety of sources in FASTQ format as well as Sanger sequencing reads. Great for hybrid assemblies.</p>
<p><img style="display:block; margin-left:auto; margin-right:auto;" src="http://macvector.com/blog/wp-content/uploads/2011/10/MV125_BowtieOverview.png" alt="MV125 BowtieOverview" title="MV125_BowtieOverview.png" border="0" width="600" height="425" /></p>
<h3>Reference assembly to identify SNPs in a bacterial/viral isolate</h3>
<p>Add a Reference sequence and one or more NGS files representing the sequence of an individual isolate and assemble using Bowtie. View a report listing all the potential SNPs based on the differences between the consensus and the reference. Quickly identify the genes that the SNPs lie in and drill down to view the nucleotide and amino acid changes.</p>
<h3><em>De novo</em> bacterial assembly assisted by a Reference scaffold</h3>
<p>Create a reference assembly using Bowtie, then take the individual contig consensus sequences along with all the input NGS Reads that did not assemble and assemble with a <em>de novo</em> assembler (Phrap) directly from within a new Assembly Project.</p>
<h3>Assembly to multiple similar references</h3>
<p>Take reference sequences from a series of closely related strains of virus or bacteria. The reads come from a single isolate. Use Bowtie to assemble these against the collection of References to determine which is most closely related (or identical) to the isolate.</p>
<h3>Assembly to multiple dissimilar references to identify SNPs</h3>
<p>Essentially similar to the bacterial SNP assembly, but using yeast or some other organism with multiple genomes (or even some bacteria that have multiple chromosomes or large plasmids).</p>
<h3>Exome (transcriptome) sequencing</h3>
<p>Use the genomic sequence of an organism as a reference and align reads from sequenced mRNA, cDNA or total RNA of that same organism. Uses include splice site junction identification and novel gene identification, amongst many others.</p>
<p><img style="display:block; margin-left:auto; margin-right:auto;" src="http://macvector.com/blog/wp-content/uploads/2011/10/MV125_ReferenceContigCoverageMapSymbols.png" alt="MV125 ReferenceContigCoverageMapSymbols" title="MV125_ReferenceContigCoverageMapSymbols.png" border="0" width="600" height="264" /></p>
<p><strong><br />
..and remember if you purchase an upgrade or a new license before the release of MacVector 12.5 you will get a 10% discount and a free upgrade to MacVector 12.5 when it is released.  Contact <a href="mailto:sales@macvector.com?Subject=MacVector%20discount">Sales@macvector.com</a> and quote <em>&#8220;MV12510&#8221;</em> for the discount.</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://macvector.com/blog/2011/10/macvector-12-5-using-assembler/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Short read improvements to MacVector 11.1</title>
		<link>http://macvector.com/blog/2010/03/short-read-improvements-to-macvector-11-1/</link>
		<comments>http://macvector.com/blog/2010/03/short-read-improvements-to-macvector-11-1/#comments</comments>
		<pubDate>Mon, 15 Mar 2010 15:01:45 +0000</pubDate>
		<dc:creator>Chris</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Releases]]></category>
		<category><![CDATA[454]]></category>
		<category><![CDATA[illumina]]></category>
		<category><![CDATA[NGS]]></category>
		<category><![CDATA[solid]]></category>

		<guid isPermaLink="false">http://macvector.com/blog/?p=262</guid>
		<description><![CDATA[The ability to produce de novo assemblies of short read data was introduced into MacVector &#038; Assembler 11 and we&#8217;ve enhanced this in MacVector 11.1. Now Assembler stores and visualises metadata on the type of short read you have and also stores and deals with the quality data stored in a better way. Mixing your [...]]]></description>
			<content:encoded><![CDATA[<p>The ability to produce <em>de novo</em> assemblies of short read data was introduced into <a href="http://macvector.com/Assembler/assembler.html">MacVector &#038; Assembler 11</a> and we&#8217;ve enhanced this in <a href="http://macvector.com/MacVector/what%27snewinmacvector11.1.html">MacVector 11.1</a>. Now Assembler stores and visualises metadata on the type of short read you have and also stores and deals with the quality data stored in a better way.</p>
<p><strong>Mixing your reads</strong></p>
<p>Currently there are three major types of short read data, because there are three main next generation sequencers, which produce considerably different read lengths.</p>
<p>- Illumina reads, which are generally 66bp in length (the first generation of Solexa sequencers produced reads 33bp long).</p>
<p>- 454 reads, which can be 400 to 500bp long.</p>
<p>- SOLiD reads, which are around 50bp long.</p>
<p>It is important to know which sequencer your reads have come from, as length is not the only specific characteristic they possess. So with MacVector 11.1 we have introduced new feature types to indicate which type of read each one is. Whether you mix Sanger with 454 and a dash of Illumina reads, your assembly project will always keep the source of the reads stored in the metadata of the project. Furthermore, the default symbol for each read type is different, so the types can be easily distinguished.</p>
<p><img src="http://macvector.com/blog/wp-content/uploads/2010/02/ShortReadFeatureType.png" alt="http://macvector.com/blog/wp-content/uploads/2010/02/ShortReadFeatureType.png" border="0" width="649" height="321" align="center" /></p>
<p><strong>Yet another sequence file format inconsistency!</strong></p>
<p>An unfortunate fact about file formats in the bioinformatics world is that there are just so many of them, most of them taking a slightly different approach to the same problem. So it is nice that the <a href="http://en.wikipedia.org/wiki/FASTQ_format">Fastq</a> format already seems to be fairly ubiquitous for raw storage of short read data, which is the main reason we chose it to support short read data in MacVector. On the downside, there are already three variations in the way that quality scores are stored in the format (there are variations in the sequence labels as well, but let&#8217;s just consider the quality scores). As well as the <a href="http://en.wikipedia.org/wiki/Phred_base_calling">basecalled</a> sequence, the Fastq format stores quality data, with each score encoded as the single character whose <a href="http://en.wikipedia.org/wiki/ASCII">ASCII</a> code represents the value. All three variations of the format use that strategy. However, the actual number stored and the way it is encoded into the ASCII character do vary. The first generation of Illumina machines (when the company was still called Solexa) used a proprietary quality score. The current generation of their sequencers uses the more widely recognised <a href="http://en.wikipedia.org/wiki/Phred_quality_score">Phred quality scores</a>. However, they store this number differently from the more widely recognised &#8220;Sanger&#8221; format, which stores a Phred quality score with a range of 0 &#8211; 93 using ASCII codes 33 to 126. It is very important to recognise when data uses the old Solexa scores. It is less important to distinguish between the later Illumina format, as reads will only be relevant when scores are very high and such high Phred scores are unlikely. </p>
<p>So as well as distinguishing which sequencer produced the data, Assembler now also supports the import of Fastq reads with all three types of quality data, and will take this into account when assembling them.</p>
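<p>For anyone handling these files outside MacVector, here is a minimal Python sketch of how the three variants can be decoded to a common Phred scale once you know which one you have. It is not Assembler&#8217;s import code; the variant names are labels chosen for this example. The offsets are the commonly documented ones (33 for Sanger, 64 for both Illumina variants), and old Solexa scores are converted with the standard formula Q<sub>phred</sub> = 10 log<sub>10</sub>(10<sup>Q<sub>solexa</sub>/10</sup> + 1).</p>
<pre>import math

# (ASCII offset, whether the stored value is a Solexa score) per variant.
ENCODINGS = {
    "sanger": (33, False),
    "illumina-1.3": (64, False),
    "solexa": (64, True),
}


def solexa_to_phred(q_solexa):
    """Convert a Solexa score to the equivalent Phred score."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


def decode_quality(quality_line, encoding):
    """Return one Phred score per character of a FASTQ quality line."""
    offset, is_solexa = ENCODINGS[encoding]
    scores = [ord(char) - offset for char in quality_line]
    if is_solexa:
        return [round(solexa_to_phred(score)) for score in scores]
    return scores


print(decode_quality("II5+!", "sanger"))
print(decode_quality("hhTJ@", "illumina-1.3"))
print(decode_quality("hhTJ;", "solexa"))</pre>
<p>The first two lines decode to the same scores (40, 40, 20, 10, 0) even though the characters differ. In the Solexa line the last character stands for a Solexa score of -5, which corresponds to a Phred score of about 1, showing that the two scales only diverge at low quality.</p>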
]]></content:encoded>
			<wfw:commentRss>http://macvector.com/blog/2010/03/short-read-improvements-to-macvector-11-1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>MacVector 11.1 is now released</title>
		<link>http://macvector.com/blog/2010/02/macvector-11-1-is-now-released/</link>
		<comments>http://macvector.com/blog/2010/02/macvector-11-1-is-now-released/#comments</comments>
		<pubDate>Wed, 10 Feb 2010 15:37:11 +0000</pubDate>
		<dc:creator>Chris</dc:creator>
				<category><![CDATA[Releases]]></category>
		<category><![CDATA[NGS]]></category>

		<guid isPermaLink="false">http://macvector.com/blog/?p=260</guid>
		<description><![CDATA[MacVector 11.1 has just been released. You can check out the new features of this release on this blog post or a fuller list is available here. Existing customers can download and install it now. Although as usual you will be notified by MacVector&#8217;s online notifier in due course. We are pleased with this release [...]]]></description>
			<content:encoded><![CDATA[<p>MacVector 11.1 has just been released. You can check out the new features of this release on <a href="http://macvector.com/blog/?p=254">this blog post</a> or a fuller list is available <a href="http://www.macvector.com/MacVector/what%27snewinmacvector11.1.html">here</a>.</p>
<p>Existing customers can <a href="http://www.macvector.com/downloads.html#MacVector111Updater">download</a> and install it now, although as usual you will be notified by MacVector&#8217;s online notifier in due course. <img class="aligncenter size-medium wp-image-87" title="msa2" src="http://macvector.com/blog/wp-content/uploads/2010/02/OnlineNotifierMV_11.1.png" alt="msa2" width="600" height="402" /></p>
<p>We are pleased with this release and we hope you are too! Please feel free to let us know what you think!</p>
<p><em>Please note that this is a download only release and CDs will not be sent to existing customers.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://macvector.com/blog/2010/02/macvector-11-1-is-now-released/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Next Generation sequencing formats</title>
		<link>http://macvector.com/blog/2009/06/next-generation-sequencing-formats/</link>
		<comments>http://macvector.com/blog/2009/06/next-generation-sequencing-formats/#comments</comments>
		<pubDate>Wed, 24 Jun 2009 13:26:19 +0000</pubDate>
		<dc:creator>Chris</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[NGS]]></category>

		<guid isPermaLink="false">http://macvector.com/blog/?p=100</guid>
		<description><![CDATA[As is common with the lack of standards seen with most emerging technologies there are many different and competing types of sequencing file formats for storage of short read or next generation sequencing data. All these formats try to solve the same question of storing an almost unprecedented amount of sequence data in a useable [...]]]></description>
			<content:encoded><![CDATA[<p>As is common with the lack of standards seen with most emerging technologies there are many different and competing types of sequencing file formats for storage of short read or next generation sequencing data. All these formats try to solve the same question of storing an almost unprecedented amount of sequence data in a useable and complete format. However, one emerging format that seems very appropriate for this type of data is <a href="http://en.wikipedia.org/wiki/FASTQ_format">Fastq</a>.  </p>
<p><a href="http://en.wikipedia.org/wiki/Phrap">Phrap</a> was one of the first real high throughput assemblers that could also deal with quality scores (generated by its stablemate <a href="http://en.wikipedia.org/wiki/Phred_base_calling">Phred</a>). The general input files to Phrap are a single Fasta file containing the reads, and an associated Qual file that contains quality scores for each and every read in the Fasta file. I&#8217;m not sure what it is about bioinformaticians, but they always feel the need to add Yet Another Format rather than reuse one of the many decent formats.  However, in this case Fastq is a logical progression of the Fasta + Qual format in that the two individual files are now merged. That is each read comprises of four lines; A label header, a sequence, and second label header and a quality score line.  </p>
<p><em>Here&#8217;s an example of such a file. Note that this example has a single &#8216;+&#8217; character to indicate the quality label line, rather than a duplicate of the label.</em></p>
<pre>@ERR000955.3982 IL6_1091:3:1:210:502/1
TCCAAACACACTTTGTGTAGAATCTGCAAGTGGAGAT
+
>>>>>>>>>>>>>>>>>>>;>>>>><>>>;>>;><>></pre>
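<p>Because each read takes exactly four lines, a simple reader is easy to write. The Python sketch below is just an illustration of the layout described above. It assumes strict four-line records (sequences that wrap over several lines are not handled), and the quality string in the example is a placeholder, not the one shown above.</p>
<pre>def parse_fastq(lines):
    """Yield (label, sequence, quality) from four-line FASTQ records."""
    cleaned = [line.rstrip("\n") for line in lines if line.strip()]
    if len(cleaned) % 4 != 0:
        raise ValueError("FASTQ input must contain four lines per read")
    for index in range(0, len(cleaned), 4):
        header, sequence, plus, quality = cleaned[index:index + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed record at line %d" % (index + 1))
        if len(sequence) != len(quality):
            raise ValueError("quality length differs from sequence length")
        yield header[1:], sequence, quality


record = [
    "@ERR000955.3982 IL6_1091:3:1:210:502/1",
    "TCCAAACACACTTTGTGTAGAATCTGCAAGTGGAGAT",
    "+",
    "I" * 37,  # placeholder quality line: one character per base
]
for label, sequence, quality in parse_fastq(record):
    print(label, len(sequence), len(quality))</pre>
<p>The length check matters because the format requires exactly one quality character for every base.</p>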
<p>Many of the raw file formats (sff, SRF etc.) are big! They contain the raw image files as well as basecalled sequence and quality data. Fastq files are at the opposite extreme. They are small and contain only minimal data. They may contain millions of reads, yet a single run&#8217;s data still takes up less than a tenth of a gigabyte. It is worth noting that the various Short Read Archives (NCBI, EBI etc.) require the submission of original raw image files, but only allow the reads to be downloaded in fastq format.</p>
<p>The quality line must comprise the same number of characters as bases, i.e. one quality character per base. However, most quality scores are double digits. Fastq gets around this by using a single ASCII character to encode each quality score. However, here&#8217;s where consistency fails. The quality line may be one of three different types. Sanger format encodes a <a href="http://en.wikipedia.org/wiki/Phred_quality_score">Phred quality score</a> of 0 &#8211; 93 using the ASCII characters 33 to 126, although scores in raw read data rarely exceed 60. The latest Illumina 1.3 format also contains a Phred quality score, from 0 to 40, but this time encoded using ASCII 64 to 104. Finally, the older Illumina (nee Solexa) 1.0 format has its own Solexa/Illumina quality score from -5 to 40 encoded using ASCII 59 to 104. Of course this poses problems: unless you know which quality score was used, there is no way of knowing, without guesswork, which it is.</p>
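<p>The guesswork can at least be made systematic. The Python sketch below shows one common heuristic: look at the lowest character used across the quality lines. Anything below ASCII 59 can only be Sanger, and anything from 59 to 63 can only be the old Solexa format. If the lowest character is 64 or above, the file is consistent with all three formats, so the answer returned in that case is only a best guess. The function and the labels it returns are made up for this example.</p>
<pre>def guess_encoding(quality_lines):
    """Guess the FASTQ quality variant from the lowest character seen."""
    lowest = min(ord(char) for line in quality_lines for char in line)
    if lowest in range(33, 59):
        return "sanger"        # only Sanger uses characters this low
    if lowest in range(59, 64):
        return "solexa"        # Solexa scores of -5 to -1
    if lowest >= 64:
        return "illumina-1.3"  # a guess: any of the three could fit
    raise ValueError("character below ASCII 33 is not valid in FASTQ")


print(guess_encoding(["II5+!"]))
print(guess_encoding(["hhTJ;"]))
print(guess_encoding(["hhTJ@"]))</pre>
<p>The more reads you scan, the more likely it is that a low-quality base will reveal which format you really have.</p>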
<p>There are also other issues with this format. It could be said that the label line for the quality score line is redundant, and the filesize could be reduced by 25% if this was removed. Some applications do generate and accept fastq files that have a single &#8216;@&#8217; or &#8216;+&#8217; in place of the quality label line. </p>
<p>It would  be helpful to see a tightening of this format and indeed there is a <a href="http://illumina.ucr.edu/ht/documentation/standardized-fastq-format-aka-fastq2">fastq2</a> format that does not have these weaknesses.</p>
<p>With the next release of MacVector and Assembler during the Summer we will be adding support for the Fastq format. We will also be adding support for <i>de novo</i> assembly of short read data. This release is currently in internal beta testing, and will be out for a public beta trial soon.</p>
]]></content:encoded>
			<wfw:commentRss>http://macvector.com/blog/2009/06/next-generation-sequencing-formats/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

