General musings from the MacVector team about sequence analysis, molecular biology, the Mac in general and of course your favorite sequence analysis app for the Mac!

Search fastq files and retrieve matching reads into paired fastq files

The Database | Align To Folder… function is essentially your own personal BLAST search of the sequences on your computer, with the added advantage that you can scan fasta/fastq files containing millions of entries and retrieve the matching Reads into a new file. MacVector 14.5 added an enhancement that lets you search paired-end read files and retrieve both Reads of a pair into a new pair of files. The great advantage of this approach is that even if only one Read of a pair matches your search sequence, both are retrieved and placed into the pair of files. You can then use these “filtered” Reads in other analyses, such as Contig Assembly or Analyze | Align To Reference.
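
To make the "retrieve both Reads if either one matches" idea concrete, here is a minimal Python sketch. It is not how MacVector works internally: the function names (`read_fastq`, `filter_pairs`) are hypothetical, a plain substring test stands in for the real alignment step, and both files are assumed to be uncompressed four-line-per-record fastq with the Reads in the same order.

```python
from itertools import islice


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples from a 4-line-per-record fastq handle."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if len(record) < 4:  # end of file (or a truncated final record)
            return
        yield tuple(record)


def filter_pairs(r1_in, r2_in, r1_out, r2_out, query):
    """Write both Reads of a pair to the output files if EITHER Read contains `query`.

    ASSUMPTION: an exact substring match stands in for a real alignment.
    Returns the number of pairs written.
    """
    kept = 0
    with open(r1_in) as f1, open(r2_in) as f2, \
            open(r1_out, "w") as o1, open(r2_out, "w") as o2:
        # Walk both files in lockstep so record N of file 1 is always
        # compared and written together with record N of file 2.
        for rec1, rec2 in zip(read_fastq(f1), read_fastq(f2)):
            if query in rec1[1] or query in rec2[1]:
                o1.write("\n".join(rec1) + "\n")
                o2.write("\n".join(rec2) + "\n")
                kept += 1
    return kept
```

Because each pair is written to both output files in the same pass, the outputs stay in the same relative order, which is what downstream paired-end tools generally expect.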

There is a checkbox in the Align To Folder setup sheet to tell MacVector that you are using pairs of files. This example shows that you can start with a protein sequence and search for hits in a folder of DNA sequences. After the alignment is complete, you can select hits of interest in the Folder Description List tab, then retrieve the Reads using the Database | Retrieve To File function.

When the hits are retrieved, you will see a pair of files in the destination folder – the matching paired Reads are maintained in order in the two files ready for additional analysis.
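
If you want to confirm for yourself that two retrieved files are still in step, a quick check along these lines may help. This is only a sketch under assumptions: `read_ids` and `pairs_in_sync` are hypothetical helper names, the files are assumed to be well-formed four-line-per-record fastq, and mates are assumed to share a Read name apart from an optional `/1` or `/2` suffix or a trailing space-separated comment.

```python
from itertools import zip_longest


def read_ids(path):
    """Yield the Read name from each header line of a 4-line-per-record fastq file."""
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 0:  # header lines are lines 0, 4, 8, ...
                name = line[1:].split()[0]  # drop '@' and any trailing comment
                if name.endswith(("/1", "/2")):  # ASSUMPTION: old-style mate suffix
                    name = name[:-2]
                yield name


def pairs_in_sync(r1_path, r2_path):
    """Return True if both files hold the same Read names in the same order."""
    # zip_longest pads the shorter file with None, so a length mismatch
    # also counts as out of sync.
    return all(a == b for a, b in zip_longest(read_ids(r1_path), read_ids(r2_path)))
```

Calling `pairs_in_sync("hits_1.fastq", "hits_2.fastq")` (file names here are placeholders) returns `True` only when every Read in the first file has its mate at the same position in the second.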