Search fastq files and retrieve matching reads into paired fastq files

The Database | Align To Folder… function is essentially your own personal BLAST search of the sequences on your computer, with the added advantage that you can scan FASTA/FASTQ files containing millions of entries and retrieve matching Reads into a new file. MacVector 14.5 added an enhancement that lets you search paired-end read files and retrieve both Reads of a pair into a new pair of files. The great advantage of this approach is that even if only one Read of a pair matches your search sequence, both Reads are retrieved and placed into the pair of files. You can then use these “filtered” reads in other analyses, such as Contig Assembly or Analyze | Align To Reference.

There is a checkbox in the Align To Folder setup sheet that tells MacVector you are using pairs of files. For example, you can start with a protein sequence and search for hits in a folder of DNA sequences. After the alignment is complete, select the hits of interest in the Folder Description List tab, then retrieve the Reads using the Database | Retrieve To File function.

When the hits are retrieved, you will see a pair of files in the destination folder. The matching paired Reads are kept in the same order in both files, ready for additional analysis.
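For readers who want to see the idea behind paired retrieval outside the application, here is a minimal Python sketch. It is not MacVector's implementation; the function names (`read_fastq`, `read_id`, `retrieve_pairs`) are hypothetical, and it assumes plain four-line FASTQ records, two input files that list mates in the same order, and mate names that differ at most by a trailing `/1` or `/2`. The set `hit_ids` stands in for the hits you selected: it holds the name of every pair in which at least one Read matched.

```python
import io

def read_fastq(handle):
    """Yield (header, sequence, quality) for each four-line FASTQ record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual

def read_id(header):
    """Drop the leading '@', any description, and a trailing /1 or /2."""
    name = header[1:].split()[0]
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name

def retrieve_pairs(r1_in, r2_in, hit_ids, r1_out, r2_out):
    """Write both mates of every pair named in hit_ids; return pairs kept."""
    kept = 0
    # Walk the two files in lockstep so the output order matches the input.
    for rec1, rec2 in zip(read_fastq(r1_in), read_fastq(r2_in)):
        if read_id(rec1[0]) != read_id(rec2[0]):
            raise ValueError("paired files are out of sync")
        if read_id(rec1[0]) in hit_ids:
            for rec, out in ((rec1, r1_out), (rec2, r2_out)):
                out.write(f"{rec[0]}\n{rec[1]}\n+\n{rec[2]}\n")
            kept += 1
    return kept

if __name__ == "__main__":
    # Tiny in-memory example; real use would pass open file handles.
    r1 = io.StringIO("@a/1\nACGT\n+\nIIII\n@b/1\nGGCC\n+\nIIII\n")
    r2 = io.StringIO("@a/2\nTTTT\n+\nIIII\n@b/2\nCCGG\n+\nIIII\n")
    out1, out2 = io.StringIO(), io.StringIO()
    retrieve_pairs(r1, r2, {"b"}, out1, out2)
    print(out1.getvalue(), out2.getvalue(), sep="")
```

Because the two files are read in lockstep and a pair is written whenever its name is in `hit_ids`, both mates always land at the same position in their respective output files, which is the property downstream assemblers and aligners rely on.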
