101 things you (maybe) didn’t know about MacVector: #32 – Understanding The Sequence Find Function

We get quite a lot of support requests from users unsure of how to use the Find functionality in MacVector. It has changed somewhat over the years to try to simplify the interface, but there are still a few things to be aware of.

You can invoke the Find function by bringing a sequence window (DNA or Protein) to the front and choosing the Edit | Find | Find… menu item;

Find.png

Perhaps the most important setting to consider is the Strand: popup menu. By default, this is set to only search the Plus strand. However, if you are looking for sequences (e.g. primers) that might be on either strand, you should set the menu to Both;

FindBothStrands.png

You do have to be aware of what will happen if you search both strands. Let's look at an example using ATG. First, note that pressing the Find button will always find the first matching sequence in the file, starting at the 5′ end. But if we do this with pBR322, the highlighted sequence is not ATG, but CAT;

CAT found.png

It's easier to understand what is going on if we turn on the minus strand using the Strands toolbar button;

CATMinus.png

Here you can see the ATG on the minus strand, reading from right to left. MacVector doesn’t currently have a way of notifying you of the strand the match was found on, so you need to be aware that minus strand hits will contain your search sequence reversed and complemented.

The Find Next button always finds the next matching sequence (if one is present) starting at the end of the current selection (or from the insertion point if there is no selection). If you continue to press the Find Next button, the highlighted region in pBR322 will step through the ATG and CAT matches in order until the end of the sequence is reached.
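If it helps to see the both-strands behavior spelled out, here is a minimal Python sketch. It is only an illustration of the idea, not MacVector's actual code: the function names (`reverse_complement`, `find_both_strands`, `find_all`) and the toy sequence are made up for this example, and positions are 0-based plus-strand coordinates.

```python
# Illustrative sketch only; this is NOT MacVector's implementation.
# All names here are hypothetical and chosen for this example.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_both_strands(target, query, start=0):
    """Return (position, strand) of the first hit at or after `start`, or None.

    A minus-strand hit shows up on the plus strand as the reverse
    complement of the query, which is why a search for ATG can
    highlight CAT.
    """
    target, query = target.upper(), query.upper()
    candidates = []
    plus = target.find(query, start)
    if plus != -1:
        candidates.append((plus, "+"))
    minus = target.find(reverse_complement(query), start)
    if minus != -1:
        candidates.append((minus, "-"))
    # The hit closest to the 5' end wins, whichever strand it is on.
    return min(candidates) if candidates else None


def find_all(target, query):
    """Mimic repeated Find Next: resume at the end of the previous hit."""
    hits = []
    start = 0
    while True:
        hit = find_both_strands(target, query, start)
        if hit is None:
            return hits
        hits.append(hit)
        start = hit[0] + len(query)


toy = "GGCATTTATGCC"  # made-up sequence, not pBR322
print(find_both_strands(toy, "ATG"))
print(find_all(toy, "ATG"))
```

In this toy sequence the first hit is the CAT near the 5′ end (a minus strand match), and the next one is the plus strand ATG, which mirrors what you see in pBR322.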

There is currently no way of getting the Find function to identify partial matches to the search sequence. However, you can use IUPAC characters to identify sequences that might have mismatches at one or more positions. For example, you can search with the sequence RTG (where R is the IUPAC code for A or G) to find both ATG and GTG triplets in the sequence. Similarly, if the target sequence contains IUPAC ambiguities, they will be considered in the search, so that a search with ATG will pick out (e.g.) RTG or NTG in the target sequence.
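One way to picture this kind of ambiguity-aware matching is to treat each IUPAC code as a set of bases and call two residues a match when their sets overlap. The sketch below makes that assumption; it is not taken from MacVector, and `residues_match` and `iupac_find_all` are hypothetical names. The sequences are invented for the example.

```python
# Illustrative sketch only; assumes "match" means the two IUPAC codes
# share at least one base. Not MacVector's actual algorithm.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def residues_match(a, b):
    """True if the base sets of two IUPAC codes overlap."""
    return bool(set(IUPAC[a]) & set(IUPAC[b]))


def iupac_find_all(target, query):
    """Return 0-based plus-strand positions where query matches target."""
    target, query = target.upper(), query.upper()
    n = len(query)
    return [
        i
        for i in range(len(target) - n + 1)
        if all(residues_match(q, t) for q, t in zip(query, target[i:i + n]))
    ]


# An ambiguous query finds both ATG and GTG in the target.
print(iupac_find_all("CCATGAAGTGCC", "RTG"))
# An unambiguous query also picks out RTG and NTG in the target.
print(iupac_find_all("CCRTGAANTGCC", "ATG"))
```

Because the comparison is symmetric, the same rule covers ambiguities in the search sequence and ambiguities in the target.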

However, there may be times when you are specifically looking for particular ambiguities in the target sequence. So, if you want to see if there are any N’s in the target sequence, the key is to select the Literal checkbox;

FindLiteral.png

This will then search just for the literal N character in the target sequence. If the Literal checkbox were not selected, this search would match every residue in the sequence, because N is the IUPAC code for any base.
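To see why the Literal setting matters, here is a small Python sketch that compares the two modes. The `literal` flag and the `find_residues` function are hypothetical stand-ins for the checkbox, and the overlap rule for IUPAC codes is an assumption, not MacVector's documented algorithm.

```python
# Illustrative sketch only; `literal` is a made-up stand-in for the
# Literal checkbox, and the matching rule is an assumption.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def find_residues(target, query, literal=False):
    """Return 0-based positions where query matches target.

    literal=True compares characters exactly; literal=False treats
    IUPAC codes as sets of bases and matches when the sets overlap.
    """
    target, query = target.upper(), query.upper()

    def match(q, t):
        if literal:
            return q == t
        return bool(set(IUPAC[q]) & set(IUPAC[t]))

    n = len(query)
    return [
        i
        for i in range(len(target) - n + 1)
        if all(match(q, t) for q, t in zip(query, target[i:i + n]))
    ]


toy = "ACGTNACGT"  # made-up sequence with a single N
print(find_residues(toy, "N"))                # every position matches
print(find_residues(toy, "N", literal=True))  # only the real N
```

With ambiguity matching on, N overlaps with every base, so every position is a hit; in literal mode only the actual N character is reported.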

Finally, you can scan a DNA target sequence with an amino acid search sequence. The key to this functionality is the little DNA/Protein button;

DNAProteinFindButton.png

The sequence in the main edit box is dynamically affected by this button. So, if you have ATG in the edit box and toggle the button from DNA to Protein, the sequence will change from ATG to M (methionine, the translation product for ATG using the default genetic code). Similarly, if you toggle the button back to DNA, the sequence will change back to ATG. When you have the edit box sequence as Protein, the search algorithm takes the currently selected genetic code (usually Universal) into account and will find DNA sequences that could encode those amino acids. So the amino acid sequence MY (methionine-tyrosine) would find ATGTAT or ATGTAC, the two possible DNA sequences that could encode those two amino acids. The number of combinations can get quite large when the search sequence contains amino acids such as leucine, serine or arginine, each of which is encoded by six possible codons.
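The idea of searching DNA with a protein sequence can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the standard (Universal) genetic code, looks at the plus strand only, and the function names (`codons_for`, `back_translate`, `find_peptide`) and the toy sequence are invented for the example. It says nothing about how MacVector implements the search internally.

```python
# Illustrative sketch only; standard genetic code, plus strand only.
# Function names are hypothetical, not part of MacVector.
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def codons_for(aa):
    """Return the sorted list of codons that encode one amino acid."""
    return sorted(c for c, a in CODON_TABLE.items() if a == aa.upper())


def back_translate(peptide):
    """Return every DNA sequence that could encode the peptide."""
    choices = (codons_for(aa) for aa in peptide.upper())
    return sorted("".join(combo) for combo in product(*choices))


def find_peptide(target, peptide):
    """Return 0-based positions (any frame) where target could encode peptide."""
    target, peptide = target.upper(), peptide.upper()
    n = 3 * len(peptide)
    hits = []
    for i in range(len(target) - n + 1):
        window = target[i:i + n]
        # Codons containing ambiguity codes are treated as untranslatable.
        translated = "".join(
            CODON_TABLE.get(window[j:j + 3], "?") for j in range(0, n, 3)
        )
        if translated == peptide:
            hits.append(i)
    return hits


print(back_translate("MY"))
print(len(codons_for("L")), len(codons_for("S")), len(codons_for("R")))
print(find_peptide("GGATGTATCCATGTACAA", "MY"))
```

Translating each window and comparing it to the peptide avoids having to enumerate every possible DNA sequence up front, which matters once the peptide contains several six-codon amino acids.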

You can do the reverse, searching a protein target sequence with a DNA search sequence.

The MacVector Find function is quite powerful, as you can hopefully see from this short post. In addition to searching sequence residues, you can also search the feature descriptions associated with a sequence, or search the text results of an analysis. But those are posts for another day.

This is an article in a long-running series of tips to help you get the most out of MacVector. If you want to get notified every time a new tip gets published, follow us @MacVector on Twitter (or check the feed for the hashtag #101MacVectorTips) or like us on Facebook.
