Last week we covered the fact that you can use File->Export to save sequences or alignments in different formats. Looking at this in more detail, some views save different types of data depending on what you select in the format menu. The best example of this is the Contig Editor. If you choose File->Export… in this window you get the following File Format options:
MacVector Reference Assembly – this saves a copy of the alignment in MacVector format. It is the same as choosing File->Save As….
MacVector NA Sequence – this saves the Reference sequence only as a MacVector formatted .nucl file. If you have edited the Reference, this is the easy way to save a copy of it under a new filename.
FASTA Multiple Sequence – this saves any selected reads into a single fasta-formatted multiple sequence file. Use this if you have aligned a large number of sequences and want to save a subset of them into a single file for additional analysis (e.g. assembly).
FASTQ Multiple Sequence – similar to the fasta option, but saves in fastq format. This format includes quality information.
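For readers less familiar with the two text formats, here is a minimal Python sketch of how the same read looks as a fasta record and as a fastq record. This is not MacVector code, and the read name, bases and quality string are made-up values used only for illustration.

```python
def to_fasta(name, seq):
    """Format one read as a FASTA record: a '>' header line, then the bases."""
    return f">{name}\n{seq}\n"


def to_fastq(name, seq, qual):
    """Format one read as a FASTQ record: '@' header, bases, '+', qualities."""
    if len(seq) != len(qual):
        raise ValueError("FASTQ needs exactly one quality character per base")
    return f"@{name}\n{seq}\n+\n{qual}\n"


# Hypothetical read, used only for illustration.
read_name, bases, quals = "read1", "ACGTTGCA", "IIIIHHGF"

fasta_text = to_fasta(read_name, bases)
fastq_text = to_fastq(read_name, bases, quals)

print(fasta_text, end="")
print(fastq_text, end="")
```

The only difference is the extra pair of lines in the fastq record, which carry one quality character per base. That is why exporting to fasta discards the quality information.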
Over the years, Apple have changed the recommended layout and functionality of the File menu. At MacVector, we always try to follow Apple’s guidelines wherever possible so that new users will feel comfortable with familiar menus. However, long-term users may experience some confusion, particularly if you want to save sequences in a different format. Here is the definitive guide to the Save options in MacVector.
Save – this saves the current sequence (or alignment) in MacVector format.
Save As… – this saves the current sequence (or alignment) under a different name, but always in MacVector format. The open sequence/alignment window will change its name so that any subsequent Saves will act on the new file.
Export… – this is the menu option to use if you want to save a copy of the sequence/alignment in a different format e.g. GenBank, EMBL, fasta, fastq etc. There is a file format option displayed in the resulting dialog that lets you choose the format of the file. As implied in the menu name, this does not change the name or file location of the sequence/alignment.
Export Tab Contents As… – All of the above menu options act on the entire document. This menu option applies only to the data displayed in the current tab. So, in the example above, which is a graphical view of the sequence, selecting this will give you the option of saving the image in a variety of formats (PDF, tiff, png etc). Tabs that contain text will give options to save in different text formats, whereas “spreadsheet” type views give the option to save in tab-delimited or comma delimited formats.
The floating graphics palette is extremely useful for helping to configure the display of the Map tab. You can easily toggle the visibility of any individual feature or group of features and adjust many aspects of the display. MacVector tries to remember the last position you moved the palette to, so it's there ready for you when you restart MacVector. Sometimes, especially if you plug into external monitors and then close your computer and open it up without shutting down MacVector, that location may be off-screen. While MacVector does try to compensate for this, there are occasions where the graphics palette stubbornly remains off-screen. If this happens to you, the easiest way to restore the graphics palette is to download and run this simple utility.
When you run it, make sure you click on the “Yes” button!
The utility will attempt to close and re-open MacVector after you click “Yes”. You will be prompted if there are any unsaved documents. Generally, it is better to run this after you have quit MacVector.
We’ve talked in previous tips about annotating open reading frames as CDS features. However, what if your sequence has no annotated ORF? MacVector’s ANALYZE | OPEN READING FRAMES… tool will help you find any quickly.
However, if you are new to this tool, a few of the options may be confusing at first. These options modify how ORFs are detected and are intended to help you find ORFs when you only have a partial fragment of a gene. So if you’ve just run an ORF analysis and find too many CDS regions, or if you cannot find an ORF and are not confident you have the full gene, then remember this email!
3’ ENDS ARE STOPS means that, for each reading frame, MacVector checks whether an ORF running up to the last full codon of the sequence would be longer than the minimum number of codons. Put simply, it assumes that the end of the sequence is the end of a coding region, even if no STOP codon is present.
5’ ENDS ARE STARTS does the reverse. That is, it assumes the start of the sequence is a START codon.
CODONS AFTER STOPS ARE STARTS assumes that there is a new START codon after every STOP codon, even if an actual START codon is not present.
In the two screenshots below we have a sequence with a single long CDS region (in blue). If we have these two options turned on then you will see three ORFs on the three forward frames (top image). If you turn them off you see a single CDS (bottom image).
Don’t forget to annotate your CDS region when you’ve found it. Many tools in MacVector will show additional information about a coding region if it is annotated as a CDS region. If you simply drag the ORF from the RESULTS window, MacVector will automatically annotate it as a CDS region. It will also include the coding region’s translation directly in the annotation.
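To make the effect of the three options more concrete, here is a minimal Python sketch of an ORF scan with equivalent switches. It is an illustration only, not MacVector's algorithm. It assumes ATG is the only START codon, uses the standard three STOP codons, scans the forward frames only, and reports 0-based coordinates with the STOP codon included in the ORF. The function name, parameter names and toy sequence are all hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}   # assumption: standard genetic code
START = "ATG"                   # assumption: ATG is the only start codon


def find_orfs(seq, min_codons=3, five_prime_starts=False,
              three_prime_stops=False, after_stop_starts=False):
    """Return (frame, start, end) tuples for ORFs on the three forward frames.

    Coordinates are 0-based and end-exclusive. An ORF closed by a real STOP
    includes that STOP codon; min_codons counts the codons before it.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        n_codons = max(0, (len(seq) - frame) // 3)
        frame_end = frame + 3 * n_codons          # end of the last full codon
        # "5' ends are starts": pretend the frame opens with a START codon.
        start = frame if five_prime_starts else None
        for i in range(frame, frame_end, 3):
            codon = seq[i:i + 3]
            if codon in STOPS:
                if start is not None and (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                # "Codons after stops are starts": reopen immediately.
                start = i + 3 if after_stop_starts else None
            elif start is None and codon == START:
                start = i
        # "3' ends are stops": close any ORF still open at the sequence end.
        if three_prime_stops and start is not None:
            if (frame_end - start) // 3 >= min_codons:
                orfs.append((frame, start, frame_end))
    return orfs


toy = "ATGAAACCCGGGTAA"  # made-up sequence with one complete ORF in frame 0
strict = find_orfs(toy)
relaxed = find_orfs(toy, five_prime_starts=True, three_prime_stops=True,
                    after_stop_starts=True)
print(strict)
print(relaxed)
```

With every option off, only the complete ATG-to-STOP ORF in frame 0 is reported. With the options on, the other two frames also report open-ended ORFs, which is the same kind of extra output you see when these options are enabled on a full-length gene.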
Although QuickTest Primer is intended for designing primers, the interface is very flexible. If your sequence is not too long, you can use the QuickTest Primer interface to scroll through a sequence and visually look for hairpins appearing in the hairpin pane. The easiest way to do this is to select the first ~100 nt of the sequence then choose Analyze->QuickTest Primer (Individual).
You can nudge the “primer” to the right (or left) using the arrow buttons and the “best” hairpin will show up in the outlined pane. You can scroll through quite quickly, especially if you use the Settings button to turn off One-out Restriction Enzymes. Expect to scroll through about 10 residues per second with a 10kb sequence (the larger the sequence, the slower the scrolling will be). You wouldn’t want to do this with a genome, but if you are looking for transcriptional terminators at the end of specific prokaryotic genes, this should work quite well.
It’s very quick to download the latest version of a sequence if you know its accession number. When you start working with a new sequence, it’s the best place to start.
Go to DATABASE > ENTREZ
Enter the accession number of your favorite sequence
Double click on the result to open up your sequence directly in MacVector.
If you do not know the accession number, then it’s still easy, but you might need to perform a more complex search to retrieve only a few hits. For example, ORGANISM=“Homo sapiens”, GENE=“Presenilin”.
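If you ever need to do the same lookups outside MacVector, the two searches above map onto NCBI's public E-utilities service. The Python sketch below only builds the request URLs and does not contact the network. Whether MacVector itself uses these endpoints is not something this tip states, so treat this as an independent illustration. The helper names and the accession in the example are placeholders.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def efetch_url(accession, db="nuccore", rettype="gb"):
    """Build an efetch URL that returns one record as GenBank-format text."""
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


def esearch_url(organism, gene, db="nuccore"):
    """Build an esearch URL restricted by organism and gene name."""
    term = f"{organism}[Organism] AND {gene}[Gene Name]"
    return f"{EUTILS}/esearch.fcgi?{urlencode({'db': db, 'term': term})}"


# Placeholder accession: replace with the sequence you are interested in.
fetch = efetch_url("U49845")
search = esearch_url("Homo sapiens", "presenilin")
print(fetch)
print(search)
```

Pasting either URL into a browser shows the raw response. The esearch call returns a list of matching record IDs rather than the sequences themselves.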
If you just want to “refresh” your own copy of a sequence with the latest published annotation, then use Import Features instead.
Remember that due to changes at the NCBI, BLAST and Entrez will only work in MacVector 15.1 and later.
When you run a BLAST search, as well as a list of hits, you will get a list of alignments between your query sequence and each hit. As with most other text alignments in MacVector, identical matches are by default represented by a vertical line (a score greater than 1) and mismatches (whether similar or not) are represented with a space.
However, sometimes you are more interested in identifying gaps or mismatches in the BLAST hits, for example when you are looking for mismatches in motifs or other protein domains, or looking for SNPs in DNA sequences.
Most results and displays in MacVector are customizable, and BLAST alignments are no exception. You can also change the length of each line.
To change the match characters:
Open OPTIONS | ALIGNED SEQUENCE
In the LINES panel change the SCORELINE match characters.
The default is to display a vertical line for a hit, and a space for mismatches. In the screenshot below we have changed matches to a space, a “-” for scores between 1 and -1 and “|” for mismatches with a score less than -1. This makes mismatches very noticeable whilst scrolling through the aligned sequence results.
All these changes will be the new defaults until you reset them.
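The logic behind the scoreline settings can be sketched in a few lines of Python. This is an illustration of the idea rather than MacVector's code. It assumes you already have one substitution score per alignment column, and the exact handling of scores equal to 1 or -1 is a guess, since the description above does not say which band they fall in.

```python
def scoreline(scores, match_char="|", similar_char=" ", mismatch_char=" "):
    """Turn per-column alignment scores into a line of marker characters.

    Assumed bands: score > 1 is a match, -1 <= score <= 1 is 'similar',
    and score < -1 is a mismatch.
    """
    chars = []
    for score in scores:
        if score > 1:
            chars.append(match_char)
        elif score >= -1:
            chars.append(similar_char)
        else:
            chars.append(mismatch_char)
    return "".join(chars)


# Hypothetical column scores for a five-residue alignment.
column_scores = [5, 5, 0, -3, 5]

default_line = scoreline(column_scores)
highlight_line = scoreline(column_scores, match_char=" ",
                           similar_char="-", mismatch_char="|")
print(repr(default_line))
print(repr(highlight_line))
```

The second call mirrors the settings described above: matches become blank, so the only visible marks on the scoreline are the columns that differ.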
To change the line length:
Open OPTIONS | TEXT VIEW
In the APPEARANCE panel change the LINE LENGTH to the length you prefer.
The MacVector Assembler module lets you create projects, populate them with Sanger Sequencing or NGS data files (or any sequences in a format that MacVector can read) and then assemble them using the popular phrap and/or Velvet assemblers. Typically, the result will be a collection of contigs that you might want to use in additional analyses. Simply select the contigs you are interested in and choose File | Export…
In the file dialog that appears, make sure you select either the fasta or fastq options to save all of the consensus sequences into a single file.
You can use the saved file in additional assembly experiments, or as a “database” for Align To Folder searches, or import them into an Align To Reference assembly.
The Database | Align To Folder… function is essentially your own personal BLAST search of sequences on your computer, but with the advantage that you can scan fasta/fastq files containing millions of entries and retrieve matching Reads into a new file. MacVector 14.5 added an enhancement where you can search paired-end read files and retrieve both reads of a pair into a new pair of files. The great advantage of this approach is that even if only one Read of a pair matches your search sequence, both will be retrieved and placed into a pair of files. You can then use these “filtered” reads in other analyses, such as Contig Assembly or Analyze | Align To Reference.
There is a checkbox in the Align To Folder setup sheet to alert MacVector that you are using pairs of files. This example shows that you can start with a protein sequence and search for hits in a folder of DNA sequences. After alignment is complete, you can select hits of interest in the Folder Description List tab, then retrieve the Reads using the Database | Retrieve To File function.
When the hits are retrieved, you will see a pair of files in the destination folder – the matching paired Reads are maintained in order in the two files ready for additional analysis.
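The pairing rule described above (keep both mates if either one matched, and keep the two outputs in the same order) can be sketched in Python as follows. This is a simplified illustration and not how MacVector is implemented. It assumes the two input lists are already in matching order, and the function name, read IDs and sequences are made up.

```python
def retrieve_pairs(reads_1, reads_2, hit_ids):
    """Keep both mates of every pair in which at least one read was a hit.

    reads_1 and reads_2 are lists of (read_id, sequence) tuples in the same
    pair order; hit_ids is the set of read IDs that matched the search.
    """
    if len(reads_1) != len(reads_2):
        raise ValueError("paired files must contain the same number of reads")
    out_1, out_2 = [], []
    for mate_1, mate_2 in zip(reads_1, reads_2):
        if mate_1[0] in hit_ids or mate_2[0] in hit_ids:
            out_1.append(mate_1)
            out_2.append(mate_2)
    return out_1, out_2


# Hypothetical paired reads; only one mate of pairs p2 and p3 matched.
file_1 = [("p1/1", "ACGT"), ("p2/1", "GGCC"), ("p3/1", "TTAA")]
file_2 = [("p1/2", "TGCA"), ("p2/2", "CCGG"), ("p3/2", "AATT")]
hits = {"p2/2", "p3/1"}

kept_1, kept_2 = retrieve_pairs(file_1, file_2, hits)
print(kept_1)
print(kept_2)
```

Because the mates are appended together, the two output lists stay in step with each other, which is what downstream paired-end tools expect.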
After a BLAST search, you can retrieve matching sequences from the Description List results tab. What you may not know is that you can do a similar thing from the Aligned Sequences result tab.
One advantage of this approach is that (as in the example above) sometimes there are multiple accession numbers for a hit. Simply select all of the rows containing accession numbers you are interested in, then choose Database | Retrieve to Desktop to download those sequences and open them as windows in MacVector. Alternatively, Database | Retrieve to Disk can be used to download and save them to a folder on your hard drive.