Search fastq files and retrieve matching reads into paired fastq files

The Database | Align To Folder… function is essentially your own personal BLAST search of sequences on your computer, with the advantage that you can scan fasta/fastq files containing millions of entries and retrieve matching Reads into a new file. MacVector 14.5 added an enhancement that lets you search paired-end read files and retrieve both Reads of a pair into a new pair of files. The great advantage of this approach is that even if only one Read of a pair matches your search sequence, both will be retrieved and placed into the pair of files. You can then use these “filtered” reads in other analyses, such as Contig Assembly or Analyze | Align To Reference.

NewImage

There is a checkbox in the Align To Folder setup sheet to alert MacVector that you are using pairs of files. This example shows that you can start with a protein sequence and search for hits in a folder of DNA sequences. After the alignment is complete, you can select hits of interest in the Folder Description List tab, then retrieve the Reads using the Database | Retrieve To File function.

NewImage

When the hits are retrieved, you will see a pair of files in the destination folder – the matching paired Reads are maintained in order in the two files ready for additional analysis.
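The pairing behaviour described above can be sketched outside MacVector in a few lines of Python. This is only an illustration under simplifying assumptions: it uses an exact substring test in place of a real alignment, it assumes the two files list mates in the same order, and the function names (`read_fastq`, `retrieve_pairs`) and the tiny in-memory reads are hypothetical.

```python
import io
from itertools import islice


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples from a FASTQ stream."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if len(record) < 4:
            return
        yield tuple(record)


def retrieve_pairs(r1_in, r2_in, r1_out, r2_out, query):
    """Copy BOTH mates to the outputs whenever EITHER mate contains `query`."""
    kept = 0
    for rec1, rec2 in zip(read_fastq(r1_in), read_fastq(r2_in)):
        if query in rec1[1] or query in rec2[1]:
            r1_out.write("\n".join(rec1) + "\n")
            r2_out.write("\n".join(rec2) + "\n")
            kept += 1
    return kept


# Tiny in-memory stand-ins for a pair of fastq files (made-up reads).
R1 = "@p1/1\nACGTGGA\n+\nIIIIIII\n@p2/1\nTTTTTTT\n+\nIIIIIII\n@p3/1\nCCCCCCC\n+\nIIIIIII\n"
R2 = "@p1/2\nGGGGGGG\n+\nIIIIIII\n@p2/2\nAACGTGG\n+\nIIIIIII\n@p3/2\nAAAAAAA\n+\nIIIIIII\n"

out1, out2 = io.StringIO(), io.StringIO()
n_pairs = retrieve_pairs(io.StringIO(R1), io.StringIO(R2), out1, out2, "CGTGG")
print(n_pairs, "pairs retrieved")
```

Pair p1 is kept because its first Read matches and pair p2 because its second Read matches, so the two output files stay in step with each other.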

Posted in Tips

How to retrieve BLAST hits from the Aligned Sequences result tab

After a BLAST search, you can retrieve matching sequences from the Description List results tab. What you may not know is that you can do a similar thing from the Aligned Sequences result tab.

NewImage

One advantage of this approach is that (as in the example above) sometimes there are multiple accession numbers for a hit. Simply select all of the rows containing accession numbers you are interested in, then choose Database | Retrieve to Desktop to download those sequences and open them as windows in MacVector. Alternatively, Database | Retrieve to Disk can be used to download and save them to a folder on your hard drive.

Posted in Techniques, Tips

Displaying CDS features as translations in the Map tab.

MacVector uses CDS features extensively in many areas. If you know the coding region, it’s very useful to have it annotated on your sequence. For example, you can display a CDS feature as its translation directly under the sequence in the Editor tab. You can also display the translation of a feature in the Map tab, instead of a graphical symbol, when there is sufficient space (for example, when zoomed to residue). By default this is enabled for certain features (e.g. CDS features and genes), but it is controlled from the Symbol Editor and can be turned on or off for most features.

  • In the Map tab, double click on a feature to edit it.
  • Change the dropdown menu to Show as Graphic to disable the translation display.
  • Select Show Residue Letters if Room to enable it.
  • For example, if this is enabled for a CDS feature, the amino acids will be shown when the map is zoomed to residue.

    NewImage

    See this and this blog post for more details.

    Posted in Tips

    Screening for CRISPR Indels using Align To Reference

    MacVector’s Analyze | Align To Reference… tool is ideal for screening reads for the short insertions, deletions or substitutions resulting from CRISPR experiments. Simply open your reference sequence, choose Analyze | Align To Reference…, click on the Add Seqs toolbar button to add reads from different clones/experiments, then click on Align to align the reads against the reference. MacVector 15.0.1 introduced a new menu option in the Align dialog that lets you quickly set up parameters optimized for CRISPR indel analysis. The new option cleanly aligns and identifies the full range of changes that you might see.

    NewImage

    In addition, the new option includes some tweaks to the alignment algorithm so that you get cleaner displays of insertions and deletions around the target site.

    NewImage
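To make the idea of calling insertions and deletions against a reference concrete, here is a rough Python sketch. It is not MacVector's algorithm: it swaps in the standard library's `difflib.SequenceMatcher` as a stand-in aligner, and the function name `call_edits` and the short sequences are invented for illustration.

```python
from difflib import SequenceMatcher


def call_edits(reference, read):
    """Return (kind, ref_position, ref_bases, read_bases) for each difference."""
    matcher = SequenceMatcher(None, reference, read, autojunk=False)
    edits = []
    for kind, i1, i2, j1, j2 in matcher.get_opcodes():
        if kind != "equal":
            edits.append((kind, i1, reference[i1:i2], read[j1:j2]))
    return edits


reference = "GATTACAGGCTTCAACGTGACCTGA"
deletion_read = "GATTACAGGCAACGTGACCTGA"        # "CTT" removed after position 9
insertion_read = "GATTACAGGAACTTCAACGTGACCTGA"  # "AA" inserted after position 9

print(call_edits(reference, deletion_read))
print(call_edits(reference, insertion_read))
```

Positions are 0-based offsets into the reference. A real screen would also need to handle sequencing errors, read orientation and ambiguous indel placement in repeats, which this sketch ignores.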

    Posted in Tips

    How to align DNA sequences based on their amino acid translations

    A new tool in MacVector 15 allows you to align DNA sequences based on their amino acid translated sequence.

    For most alignments in MacVector you will use the Multiple Sequence Alignment tool, which aligns DNA or protein sequences using ClustalW, Muscle or T-Coffee. In MacVector 15 you can also display DNA sequences together with their translations, or just the translations, then align the protein sequences using ClustalW, Muscle or T-Coffee to see the effect on the underlying DNA sequences.

  • FILE | NEW | DNA ALIGNMENT
  • EDIT | ADD SEQUENCES FROM FILE…
  • Click on the mode toolbar button.
  • Select VirtualAA to show just the translations or NA & VirtualAA to show the original DNA sequences and their translations too.
  • Click on the Align toolbar button
  • ProteinAlignment
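The principle behind the steps above (translate, align the proteins, then carry the gaps back to the DNA) can be sketched in Python. This is an illustrative sketch rather than MacVector's implementation: it assumes frame 1 and the standard genetic code, takes the protein alignment as already computed by hand, and the helper names `translate` and `thread_codons` are hypothetical.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG ordering.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate frame 1 of `dna`; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def thread_codons(dna, aligned_protein):
    """Project protein alignment gaps back onto the DNA, one codon per residue."""
    codons = iter(dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join("---" if residue == "-" else next(codons) for residue in aligned_protein)


seq_a = "ATGGCCAAATGA"
seq_b = "ATGAAATGA"
print(translate(seq_a), translate(seq_b))

# Suppose a protein aligner returned MAK* / M-K* for the two translations.
print(thread_codons(seq_a, "MAK*"))
print(thread_codons(seq_b, "M-K*"))
```

Because each gap in the protein alignment becomes a three-base gap in the DNA, the reading frame of every sequence is preserved.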

    Posted in Tips

    Functional domain analysis of protein sequences using InterProScan

    There’s a new tool in MacVector 15 that allows you to do functional domain analysis on your protein sequence using the InterProScan service. InterPro contains multiple databases of protein families, domains and motifs, and InterProScan searches a protein sequence against these databases. It also runs extra analyses, such as transmembrane region prediction using TMHMM and other tools. In MacVector 15 you can submit your protein sequence to an InterProScan search and also annotate results directly back to your sequence.

  • Open your protein sequence.
  • Run DATABASE | INTERPROSCAN SEARCH
  • When the job has finished click VIEW to see the results.
  • Click any “hotlinks” to see the original database entry for a “hit”.
  • Choose the most appropriate result for the particular hit that you want to annotate back to your protein sequence.
  • Click the small cross at the right side of the hit.
  • Switch to the FEATURES tab of your protein sequence.
  • Find the PROTEIN_MATCH feature that you have just added and double click on it.
  • In the FEATURES EDITOR click on the FEATURES KEYWORD list and choose the most appropriate FEATURE KEYWORD for your new feature. For example for an Unintegrated Signature from TMHelix choose TRANSMEM.
  • NewImage

    Posted in Techniques, Tips

    Create your own Primer Database.nsub from an Excel spreadsheet

    Wouldn’t it be great if there was an easy way of converting that huge primer collection you have into a format that MacVector can use? Well, luckily there is! There is a utility called “Primer Converter” that you can download from our website.

    To use the utility you first need to get your primers into Microsoft Excel (or other spreadsheet application) organized into three columns – “Name”, “Sequence”, “Comment”, where the comment is optional. If your primers have tails, make sure they are in lower case. Finally, export the data as a comma separated value (.csv) file.
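Before exporting, it can help to sanity-check the spreadsheet data. The Python sketch below reads a three-column CSV laid out as described above and separates lower-case tails from the binding region. It is only an illustration: the function names, the sample rows and the set of accepted base codes are assumptions, and it does not read or write the .nsub format.

```python
import csv
import io
from itertools import takewhile


def load_primers(handle):
    """Read Name, Sequence, Comment rows; the Comment column is optional."""
    primers = []
    for row in csv.reader(handle):
        if not row or row[0].strip().lower() == "name":
            continue  # skip blank lines and an optional header row
        name, sequence = row[0].strip(), row[1].strip()
        comment = row[2].strip() if len(row) > 2 else ""
        bad = set(sequence.upper()) - set("ACGTRYSWKMBDHVN")
        if bad:
            raise ValueError(f"{name}: unexpected characters {sorted(bad)}")
        primers.append({"name": name, "sequence": sequence, "comment": comment})
    return primers


def split_tail(sequence):
    """Split a primer into its lower-case 5' tail and the remaining sequence."""
    tail = "".join(takewhile(str.islower, sequence))
    return tail, sequence[len(tail):]


# Made-up sample rows; the second primer carries a lower-case tail and no comment.
CSV_TEXT = """Name,Sequence,Comment
M13 Forward,GTAAAACGACGGCCAGT,universal
Tailed example,ggaattcGTAAAACGACGGCCAGT
"""

primers = load_primers(io.StringIO(CSV_TEXT))
for primer in primers:
    print(primer["name"], split_tail(primer["sequence"]))
```

With a real export you would pass an opened .csv file instead of the in-memory text.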

    Now run Primer Converter, select your source file and you’ll get a screen something like this:

    NewImage

    The mismatches control lets you decide how many mismatches can be present and still consider the primer to “bind”. You can override this in the Analyze | Primer Database Search… dialog. Finally, click on the Save In MacVector format button – this will save the data as an .nsub file which you can use instead of the default file for Primer Database Searches.

    Posted in Techniques, Tips

    Use Analyze | Primer Database Search… to scan sequences for primer binding sites

    It’s easy to keep track of all your primers using a MacVector subsequence (.nsub) file. We even ship MacVector with a starter file called “Primer Database.nsub” that you can find in the /Applications/MacVector/Subsequences/ folder. If you choose Analyze | Primer Database Search… and use this default file to search pUC19, you’ll find many of the universal primers bind on either side of the multiple cloning sites. Here I’ve zoomed in the display to make this clearer.

    NewImage

    This is very similar to the Restriction Enzyme search results display, so there are also text tabs containing a list of all the binding sites, along with all the primers that did not bind.
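As a rough picture of what a primer binding search with a mismatch allowance involves, the Python sketch below slides each primer along both strands of a template and counts mismatches. It is a simplified illustration, not MacVector's search: it ignores ambiguity codes, tails and 3' end rules, and the function names and the toy template are invented.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_mismatches(a, b):
    """Count positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_binding_sites(template, primer, max_mismatches=0):
    """Return (start, strand, mismatches) for every site within the threshold."""
    template, primer = template.upper(), primer.upper()
    targets = (("+", primer), ("-", reverse_complement(primer)))
    hits = []
    for start in range(len(template) - len(primer) + 1):
        window = template[start:start + len(primer)]
        for strand, target in targets:
            mismatches = count_mismatches(window, target)
            if mismatches <= max_mismatches:
                hits.append((start, strand, mismatches))
    return hits


template = "TTGACCATGCAAGGTTCAGTCCAT"  # toy sequence, not pUC19
print(find_binding_sites(template, "GACCATGC"))      # exact plus strand primer
print(find_binding_sites(template, "ATGGACTG"))      # exact minus strand primer
print(find_binding_sites(template, "GACCTTGC", 1))   # one mismatch allowed
```

Raising `max_mismatches` finds more sites, including weaker ones on either strand, which is the trade-off the mismatches setting controls.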

    BONUS TIP: If you click on one of the plus strand primers, hold down and click on one of the minus strand primers, then choose Edit | Copy, this will copy the sequence of the predicted PCR product to the Clipboard. Then use File | New From Clipboard to create a new PCR product sequence, complete with all of the features between the two primers.
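The predicted PCR product in the tip above is simply the stretch of template from the start of the plus strand primer to the end of the minus strand primer site. A minimal Python sketch of that calculation follows, assuming exact matches on a linear template and using only the first site found; the function name `predicted_product` and the toy sequences are hypothetical, and no features are carried over.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def predicted_product(template, forward_primer, reverse_primer):
    """Return the template region spanned by an exactly matching primer pair."""
    start = template.find(forward_primer)
    if start == -1:
        return None
    # The minus strand primer binds where its reverse complement appears.
    reverse_site = reverse_complement(reverse_primer)
    end = template.find(reverse_site, start + 1)
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


template = "TTGACCATGCAAGGTTCAGTCCAT"  # toy sequence
product = predicted_product(template, "GACCATGC", "ATGGACTG")
print(product, len(product))
```

If either primer has no exact site, the function returns `None` rather than guessing.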

    Posted in Tips

    Batch auto annotation of blank sequences with MacVector and AppleScript

    Over the past few releases we’ve been making more MacVector tools scriptable with AppleScript. The latest is Auto Annotation.

    Auto Annotation is a great tool for curating your sequences. For example, if you receive an unannotated sequence, you can scan it against other sequences to find matching features. Blasting an unknown sequence, fetching the hits, then annotating your unknown sequence against those hits is also a good way to find out more about it. You can also give sequences a consistent look and feel, for example by making all CDS features green, or all AMP genes a blue arrow.

    However, it only works on one sequence at a time.

    The following script will take a folder of blank sequences, then scan them against a curated folder of annotated sequences. It will add any annotation it finds.

    The script is very simple and could be enhanced much more. But it serves as a basis if you need to develop your own scripts. If you need any assistance with developing scripts for a specific purpose, please do email support. We will be happy to help.

    -- Ask for the folder of sequences to annotate and the curated library folder
    set inputFolder to (choose folder with prompt "Select Folder of MV files to annotate:")
    set LibraryFolder to (choose folder with prompt "Select Folder containing your curated library of sequences to use")
    
    -- Collect every file in the input folder
    tell application "Finder"
    	set AllFiles to every file of folder inputFolder
    end tell
    
    tell application "MacVector"
    	repeat with f in AllFiles
    		activate
    		-- Now open the file
    		open f
    		delay 0.3 -- wait a little bit until MV has opened the file
    		set docRef to (a reference to the first document)
    		set modifiable of docRef to 1 -- unlock
    		-- Annotate against the curated library, adding any annotation found
    		annotate docRef replacing existing features by searching LibraryFolder with recursion
    		-- get files considered in annotate results of docRef
    		-- Save the annotated sequence and close it before moving on
    		save docRef
    		close docRef
    	end repeat
    end tell
    
    

    There’s a longer version of the script in the MacVector application folder.

    Posted in Uncategorized

    MacVector For Windows update

    It’s been some time since the last update about MacVector for Windows, although development has certainly not been quiet.

    Here are some screenshots.

    Human mtDNA

    Graphical Map

    PDESTR3R4

    Sequence Editor

    Strands

    Automatic Restriction Enzyme and ORF Analysis

    MapTabResidue1

    Primer Design

    QT primer

    Cloning Clipboard

    CloningClipboard2

    Posted in Releases