General musings from the MacVector team about sequence analysis, molecular biology, the Mac in general and of course your favorite sequence analysis app for the Mac!

Using QuickTest Primer to check for hairpins in sequences, and not just primers.

Although QuickTest Primer is intended for designing primers, the interface is very flexible. If your sequence is not too long, you can use the QuickTest Primer interface to scroll through a sequence and look for hairpins appearing in the hairpin pane. The easiest way to do this is to select the first ~100 nt of the sequence, then choose Analyze | QuickTest Primer (Individual).

You can nudge the “primer” to the right (or left) using the arrow buttons and the “best” hairpin will show up in the outlined pane. You can scroll through quite quickly, especially if you use the Settings button to turn off One-out Restriction Enzymes. Expect to scroll through about 10 residues per second with a 10 kb sequence (the larger the sequence, the slower the scrolling will be). You wouldn’t want to do this with a genome, but if you are looking for transcriptional terminators at the end of specific prokaryotic genes, this should work quite well.
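If you want a rough scripted pre-screen before scrolling through by hand, the idea can be sketched in a few lines of Python. This is only an illustration, not how QuickTest Primer scores hairpins: it simply looks for perfect inverted repeats of a fixed stem length separated by a short loop, and the function names and default lengths are assumptions made for this example.

```python
# Complement table for DNA (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_hairpins(seq, stem=6, min_loop=3, max_loop=8):
    """Return sorted (start, loop_length) pairs for perfect stem-loops.

    start is the 0-based index of the left arm of the stem.
    Assumption: a "hairpin" here is an exact inverted repeat of `stem`
    bases around a loop of min_loop..max_loop bases. Real tools also
    allow mismatches and score stability.
    """
    seq = seq.upper()
    hits = []
    for loop in range(min_loop, max_loop + 1):
        span = 2 * stem + loop
        for i in range(len(seq) - span + 1):
            left = seq[i:i + stem]
            right = seq[i + stem + loop:i + span]
            if right == reverse_complement(left):
                hits.append((i, loop))
    return sorted(hits)
```

Any region the script flags is a good place to start nudging the “primer” in the QuickTest Primer window.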

NewImage

Posted in Tips

Download the latest published version of your favorite sequence with its accession number

It’s very quick to download the latest version of a sequence if you know its accession number. When you start working with a new sequence, it’s the best place to start.

  • Go to DATABASE > ENTREZ
  • Enter the accession number of your favorite sequence
  • Click SEARCH
  • Double click on the result to open up your sequence directly in MacVector.
  • If you do not know the accession number, then it’s still easy, but you might need to perform a more complex search to retrieve only a few hits. For example, “ORGANISM=Homo sapiens, GENE=Presenilin”.

    If you just want to “refresh” your own copy of a sequence with the latest published annotation, then use Import Features instead.

    Remember that due to changes at the NCBI, BLAST and Entrez will only work in MacVector 15.1 and later.
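Outside MacVector, the same lookup can be scripted against the NCBI E-utilities efetch service. The following is a minimal sketch, assuming the nuccore database and GenBank flat-file output. The helper names are hypothetical, and any accession you pass in is your own.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of the NCBI E-utilities efetch endpoint.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, db="nuccore", rettype="gb"):
    """Build the efetch URL for one accession (GenBank text by default)."""
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


def fetch_genbank(accession):
    """Download the current record for an accession as text.

    Requires network access. NCBI asks heavy users to identify
    themselves and to limit their request rate.
    """
    with urlopen(efetch_url(accession)) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_genbank()` with your accession number returns the latest published record as a string that you can save to a `.gb` file and open in MacVector.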

    NewImage

    Posted in Tips

    Customizing BLAST alignment results to make mismatches more noticeable

    When you run a BLAST search, as well as a list of hits, you will get a list of alignments between your query sequence and each hit. As with most other text alignments in MacVector, identical matches are by default represented by a vertical line (a score greater than 1) and mismatches (whether similar or not) are represented with a space.

    However, sometimes you are more interested in identifying gaps or mismatches in the BLAST hits, for example when you are looking for mismatches in motifs or other protein domains, or looking for SNPs in DNA sequences.

    Most results and displays in MacVector are customizable, and BLAST alignments are no exception. You can also change the length of each line.

    To change the match characters:

  • Open OPTIONS | ALIGNED SEQUENCE
  • In the LINES panel change the SCORELINE match characters.
  • The default is to display a vertical line for a match, and a space for mismatches. In the screenshot below we have changed matches to a space, a “-” for scores between 1 and -1 and “|” for mismatches with a score less than -1. This makes mismatches very noticeable whilst scrolling through the aligned sequence results.

    NewImage

    All these changes will be the new defaults until you reset them.

    To change the line length:

  • Open OPTIONS | TEXT VIEW
  • In the APPEARANCE panel change the LINE LENGTH to the length you prefer.

    Remember that due to changes at the NCBI, BLAST and Entrez will only work in MacVector 15.1 and later.
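To see how the three score bands and the line length interact, here is a small Python sketch of a score line renderer. It is not MacVector's code: the thresholds follow the description above (greater than 1, between 1 and -1, less than -1), while `toy_score` is a made-up nucleotide scoring function used purely for illustration.

```python
# Hypothetical scoring: identity = 2, transition = 0, anything else = -2.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def toy_score(a, b):
    """Toy pairwise score, an assumption for this example only."""
    if a == b:
        return 2
    if frozenset((a, b)) in TRANSITIONS:
        return 0
    return -2


def score_line(query, hit, score, match=" ", similar="-", mismatch="|"):
    """Build the score line using one character per score band."""
    chars = []
    for a, b in zip(query, hit):
        s = score(a, b)
        if s > 1:
            chars.append(match)
        elif s < -1:
            chars.append(mismatch)
        else:
            chars.append(similar)
    return "".join(chars)


def wrap_alignment(query, line, hit, width=60):
    """Break the three alignment rows into blocks of `width` columns."""
    blocks = []
    for i in range(0, len(query), width):
        rows = [query[i:i + width], line[i:i + width], hit[i:i + width]]
        blocks.append("\n".join(rows))
    return "\n\n".join(blocks)
```

With the inverted characters used here, a long run of identical residues leaves the score line blank, so the occasional “|” or “-” stands out immediately.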

    Posted in Techniques, Tips

    How to save contig consensus sequences from assembly projects

    The MacVector Assembler module lets you create projects, populate them with Sanger Sequencing or NGS data files (or any sequences in a format that MacVector can read) and then assemble them using the popular phrap and/or Velvet assemblers. Typically, the result will be a collection of contigs that you might want to use in additional analyses. Simply select the contigs you are interested in and choose File | Export…

    NewImage

    In the file dialog that appears, make sure you select either the fasta or fastq options to save all of the consensus sequences into a single file.

    NewImage

    You can use the saved file in additional assembly experiments, or as a “database” for Align To Folder searches, or import them into an Align To Reference assembly.
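If you ever need to build a similar multi-sequence file yourself, for example to combine contigs exported from several projects, the fasta layout is simple enough to write by hand. The sketch below is an illustration only. The contig names are placeholders and the 70-column wrap is an arbitrary choice.

```python
def to_fasta(contigs, width=70):
    """Format (name, sequence) pairs as one multi-record fasta string."""
    lines = []
    for name, seq in contigs:
        lines.append(">" + name)
        # Wrap each sequence at `width` residues per line.
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n"


def save_fasta(contigs, path, width=70):
    """Write the contigs to a single fasta file at `path`."""
    with open(path, "w") as handle:
        handle.write(to_fasta(contigs, width))
```

The resulting file can be used in the same way as one exported from MacVector.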

    Posted in Techniques, Tips

    Search fastq files and retrieve matching reads into paired fastq files

    The Database | Align To Folder… function is essentially your own personal BLAST search of sequences on your computer, but with the advantage that you can scan fasta/fastq files containing millions of entries and retrieve matching Reads into a new file. MacVector 14.5 added an enhancement where you can search paired-end read files and retrieve both reads of a pair into a new pair of files. The great advantage of this approach is that even if only one Read of a pair matches your search sequence, both will be retrieved and placed into a pair of files. You can then use these “filtered” reads in other analyses, such as Contig Assembly or Analyze | Align To Reference.

    NewImage

    There is a checkbox in the Align To Folder setup sheet to alert MacVector that you are using pairs of files. This example shows that you can start with a protein sequence and search for hits in a folder of DNA sequences. After alignment is complete, you can select hits of interest in the Folder Description List tab, then retrieve the Reads using the Database | Retrieve To File function.

    NewImage

    When the hits are retrieved, you will see a pair of files in the destination folder – the matching paired Reads are maintained in order in the two files ready for additional analysis.
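The pairing logic is the important part: a pair is kept when either read matches, and both reads are written in step so the two output files stay in the same order. Here is a minimal Python sketch of that idea. It assumes four-line fastq records in matching order in both files, and it uses a plain substring test as a stand-in for the real alignment that Align To Folder performs. The function names are hypothetical.

```python
from itertools import islice


def read_fastq(handle):
    """Yield (header, sequence, separator, quality) for each record."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if len(record) < 4:
            return
        yield tuple(record)


def filter_pairs(handle1, handle2, query):
    """Yield both reads of every pair where either read contains query."""
    for r1, r2 in zip(read_fastq(handle1), read_fastq(handle2)):
        if query in r1[1] or query in r2[1]:
            yield r1, r2


def write_pairs(pairs, out1, out2):
    """Write matching pairs to two outputs, preserving pair order."""
    count = 0
    for r1, r2 in pairs:
        out1.write("\n".join(r1) + "\n")
        out2.write("\n".join(r2) + "\n")
        count += 1
    return count
```

In practice you would pass file objects from `open()` for the two input files and the two output files. Note that a substring test only finds exact matches on the forward strand.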

    Posted in Tips

    How to retrieve BLAST hits from the Aligned Sequences result tab

    After a BLAST search, you can retrieve matching sequences from the Description List results tab. What you may not know is that you can do a similar thing from the Aligned Sequences result tab.

    NewImage

    One advantage of this approach is that (as in the example above) sometimes there are multiple accession numbers for a hit. Simply select all of the rows containing accession numbers you are interested in, then choose Database | Retrieve to Desktop to download those sequences and open them as windows in MacVector. Alternatively, Database | Retrieve to Disk can be used to download and save them to a folder on your hard drive.

    Posted in Techniques, Tips

    Displaying CDS features as translations in the Map tab.

    MacVector uses CDS features extensively in many areas. If you know the coding region, then it’s very useful to have that annotated on your sequence. For example, you can display a CDS feature as its translation directly under the sequence in the Editor tab. You can also display the translation of a feature in the Map tab, instead of a graphical symbol, when there is sufficient space (for example when zoomed to residue). By default this is enabled for certain features (e.g. CDS and gene features), but it is controlled from the Symbol Editor and can be turned on/off for most features.

  • In the Map tab, double click on a feature to edit it.
  • Change the dropdown menu to Show as Graphic to disable this.
  • Select Show Residue Letters if Room to enable it.
  • For example, if this is enabled for a CDS feature, then when zoomed to residue the amino acids will be shown.
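For readers curious about what is being drawn, the sketch below translates a CDS and lays out one amino acid letter under the middle base of each codon, which is roughly what a residue-level display shows. It is an illustration only: it assumes a forward-strand CDS with 1-based inclusive coordinates, the standard genetic code, and hypothetical function names.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_cds(sequence, start, end):
    """Translate sequence[start..end] (1-based, inclusive, forward strand)."""
    cds = sequence[start - 1:end].upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        # Unknown or ambiguous codons are shown as X.
        protein.append(CODON_TABLE.get(cds[i:i + 3], "X"))
    return "".join(protein)


def translation_track(sequence, start, end):
    """Return a text row with each residue under its codon's middle base."""
    protein = translate_cds(sequence, start, end)
    return " " * (start - 1) + "".join(" " + aa + " " for aa in protein)
```

Printing the sequence and then the track on the next line gives a simple text version of the residue-level view.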

    NewImage

    See this and this blog post for more details.

    Posted in Tips

    Screening for CRISPR Indels using Align To Reference

    MacVector’s Analyze | Align To Reference… tool is ideal for screening reads for the short insertions, deletions or substitutions resulting from CRISPR experiments. Simply open your reference sequence, choose Analyze | Align To Reference…, click on the Add Seqs toolbar button to add reads from different clones/experiments, then click on Align to align the reads against the reference. MacVector 15.0.1 introduced a new menu option in the Align dialog that lets you quickly set up parameters optimized for CRISPR indel analysis. The new option cleanly aligns and identifies the full range of changes that you might see.

    NewImage

    In addition, the new option includes some tweaks to the alignment algorithm so that you get cleaner displays of insertions and deletions around the target site.
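To make the three kinds of change concrete, here is a toy Python sketch that compares a read against a reference and reports insertions, deletions and substitutions. It uses the standard library's `difflib.SequenceMatcher` as a stand-in for a real aligner, so it is only suitable for short, near-identical sequences and is not the algorithm used by Align To Reference. The function name is hypothetical.

```python
from difflib import SequenceMatcher


def call_edits(reference, read):
    """List (kind, position, detail) differences between read and reference.

    position is the 1-based reference coordinate where the change starts.
    For an insertion it is the reference base that follows the new bases.
    """
    matcher = SequenceMatcher(None, reference, read, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "delete":
            edits.append(("deletion", i1 + 1, reference[i1:i2]))
        elif tag == "insert":
            edits.append(("insertion", i1 + 1, read[j1:j2]))
        elif tag == "replace":
            detail = reference[i1:i2] + ">" + read[j1:j2]
            edits.append(("substitution", i1 + 1, detail))
    return edits
```

Running this over the reads from each clone gives a quick text summary of which clones carry an edit, but reads with sequencing errors or low-quality ends really need the quality-aware alignment that MacVector performs.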

    NewImage

    Posted in Tips

    How to align DNA sequences based on their amino acid translations

    A new tool in MacVector 15 allows you to align DNA sequences based on their amino acid translated sequence.

    For most alignments in MacVector you will use the Multiple Sequence Alignment tool, which aligns DNA or protein sequences using ClustalW, Muscle or T-Coffee. MacVector 15 also lets you align DNA sequences based on their amino acid translations. You can display the DNA sequences and their translations at the same time, or just the translations, then align the translated sequences using ClustalW, Muscle or T-Coffee to see the effect on the underlying DNA sequences.

  • FILE | NEW | DNA ALIGNMENT
  • EDIT | ADD SEQUENCES FROM FILE…
  • Click on the mode toolbar button.
  • Select VirtualAA to show just the translations or NA & VirtualAA to show the original DNA sequences and their translations too.
  • Click on the Align toolbar button
  • ProteinAlignment
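The effect on the underlying DNA is that every gap in the protein alignment becomes a three-base gap in the DNA, so codons stay intact. A minimal Python sketch of that "threading" step is shown below. It assumes the DNA is an in-frame coding sequence whose codons correspond one-to-one with the ungapped protein, and the function name is made up for this example.

```python
def thread_codons(aligned_protein, dna):
    """Project the gaps of an aligned protein back onto its coding DNA.

    aligned_protein: protein row from an alignment, with "-" for gaps.
    dna: the ungapped, in-frame DNA that encodes that protein.
    """
    residues = len(aligned_protein) - aligned_protein.count("-")
    if len(dna) != 3 * residues:
        raise ValueError("DNA length must be 3 x the number of residues")
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    out = []
    next_codon = 0
    for aa in aligned_protein:
        if aa == "-":
            out.append("---")  # one protein gap = one codon-sized gap
        else:
            out.append(codons[next_codon])
            next_codon += 1
    return "".join(out)
```

Applying this to every sequence in a protein alignment gives a DNA alignment in which all the gaps fall on codon boundaries.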

    Posted in Tips

    Functional domain analysis of protein sequences using InterProScan

    There’s a new tool in MacVector 15 that allows you to do functional domain analysis on your protein sequence using the InterProScan service. InterPro combines multiple databases of protein families, domains and motifs, and InterProScan searches a protein sequence against these databases. It also runs extra analyses, such as transmembrane region prediction using TMHMM and other tools. In MacVector 15 you can submit your protein sequence to an InterProScan search and annotate results directly back to your sequence.

  • Open your protein sequence.
  • Run DATABASE | INTERPROSCAN SEARCH
  • When the job has finished click VIEW to see the results.
  • Click any “hotlinks” to see the original database entry for a “hit”.
  • Choose the most appropriate result for the particular hit that you want to annotate back to your protein sequence.
  • Click the small cross at the right side of the hit.
  • Switch to the FEATURES tab of your protein sequence.
  • Find the PROTEIN_MATCH feature that you have just added and double click on it.
  • In the FEATURES EDITOR click on the FEATURE KEYWORD list and choose the most appropriate FEATURE KEYWORD for your new feature. For example, for an Unintegrated Signature from TMHelix, choose TRANSMEM.
  • NewImage
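If you also run InterProScan outside MacVector, its tab-separated output can be turned into a simple feature list in the same spirit. The sketch below assumes the InterProScan 5 TSV column order (analysis in column 4, signature accession and description in columns 5 and 6, start and stop in columns 7 and 8), so please check it against your own output. The analysis-to-keyword mapping is a hypothetical example modelled on the TRANSMEM case above, not a MacVector rule.

```python
# Hypothetical mapping from InterProScan analysis name to feature keyword.
KEYWORDS = {"TMHMM": "TRANSMEM"}
DEFAULT_KEYWORD = "PROTEIN_MATCH"


def parse_hits(tsv_text):
    """Convert InterProScan TSV rows into simple feature dictionaries."""
    features = []
    for line in tsv_text.splitlines():
        if not line.strip():
            continue
        cols = line.split("\t")
        analysis = cols[3]
        features.append({
            "keyword": KEYWORDS.get(analysis, DEFAULT_KEYWORD),
            "analysis": analysis,
            "signature": cols[4],
            "description": cols[5],
            "start": int(cols[6]),
            "stop": int(cols[7]),
        })
    return features
```

Each dictionary carries the start and stop positions and a suggested keyword, which is the same information you review in the Features tab after annotating a hit.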

    Posted in Techniques, Tips