MacVector and AppleScript

Up to MacVector 13, MacVector has had limited AppleScript support. The current release (12.7.5) can open sequences and print them, and that’s about it!

We’ve had frequent requests for MacVector to batch process files, and the new release (coming very soon) makes that possible. No analysis can be scripted just yet, but if you need to convert files to a different format then this is easily scriptable.

This simple script will open and save all files in a folder. If you want to add extensions to all your files or convert to the latest file format then this will quickly churn through your files.

-- Open, save then close each MV file in folder
set inputFolder to (choose folder with prompt "Select Folder of MV files to open and save:")
tell application "Finder"
	set AllFiles to every file of folder inputFolder
end tell
tell application "MacVector"
	repeat with f in AllFiles
		open f
		delay 0.3
		set docRef to (a reference to the first document)
		save docRef
		close docRef
	end repeat
end tell

This AppleScript will convert a folder of MacVector NA files into GenBank format.

-- Batch convert all MacVector files in a folder into Genbank format in a second folder.
-- clindley@macvector.com
-- v0.2
-- 2 April  2014
-- 2 April 2014 added routine to ignore any file other than MacVector NA files.

set inputFolder to (choose folder with prompt "Select Folder of MV files to convert:")
set outputFolder to (choose folder with prompt "Select Folder to save Genbank files") as text -- we need this as text to manipulate later

tell application "Finder"
	set AllFiles to every file of folder inputFolder
end tell

repeat with f in AllFiles
	--create the output filepath and add ".gb".
	tell application "Finder"
		-- get the file extension of the file
		set mvExtension to the name extension of f
		-- get the file name (just the name, not the full path)
		set inputFilePath to name of f
		-- swap the final extension for "gb"
		set AppleScript's text item delimiters to "."
		set outputFilePathBits to text items of inputFilePath
		set last text item of outputFilePathBits to "gb"
		set outputFileName to outputFilePathBits as text
		set AppleScript's text item delimiters to "" -- restore the default delimiter
		set outputFilePath to outputFolder & outputFileName as text
	end tell
	
	if mvExtension = "nucl" then
		tell application "MacVector"
			--Now open the file
			open f
			delay 0.3 -- wait a little bit until MV has opened the file
			--now save it as a genbank file
			set docRef to (a reference to the first document)
			save docRef in outputFilePath as GenBank
			close docRef
		end tell
	end if
end repeat
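
For readers who want to check the file-name handling outside AppleScript, here is a minimal Python sketch of the same idea: skip anything that is not a .nucl file and swap only the final extension for .gb. The function name genbank_output_path and the example file names are made up for illustration, and this does not talk to MacVector at all.

```python
from pathlib import Path


def genbank_output_path(input_file, output_folder):
    """Return the .gb path for a .nucl file, or None for any other file type."""
    source = Path(input_file)
    # Mirror the script: ignore anything that is not a MacVector NA file.
    # AppleScript string comparison ignores case by default, so lower() is used here.
    if source.suffix.lower() != ".nucl":
        return None
    # with_suffix replaces only the final extension, like the text item swap.
    return Path(output_folder) / source.with_suffix(".gb").name


print(genbank_output_path("my.plasmid.nucl", "converted"))
print(genbank_output_path("notes.txt", "converted"))  # None, so the file is skipped
```

Only the last extension is replaced, so a name containing extra dots keeps the rest of its name intact.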

MacVector 13 is still undergoing testing but will be out very soon.

Posted in Development, Releases, Tips

Quicktest Primer and restriction sites

With the introduction of the Quicktest Primer tool in MacVector 12.6, primer design became even easier. Being able to slide your primer along a template sequence to see undesirable secondary structure and other attributes in real time really does make it quicker and more straightforward to design good primers. All the user feedback we have had says it really does make a difference to primer design!

Now with the release of MacVector 13 it’s been enhanced to display “one out” and existing restriction enzyme sites around the primary binding site of the primer.

Sites that are created or destroyed by mismatches in the primer or due to the addition of a tail are now shown above the sequence. ‘One out’ sites can also be displayed below the sequence, color coded to indicate their potential effect on overlapping coding regions. The display is interactive so that when you “mouse over” a site, additional information is displayed. Clicking and holding on a “one out” site temporarily changes the primer and updates the entire dialog to reflect that change. Finally, double-clicking on a “one out” site makes a permanent change to the primer sequence.

Existing restriction sites are shown below the template sequence in black. However, sites that are unique to just the primer (including any tail that’s been added) will also be shown.

QuickTestAnimation2

  • Existing sites in the template sequence are shown below the sequence in black.
  • New sites that will be introduced by mismatches in the primer are shown above the sequence in black.
  • Existing sites that will be destroyed by mismatches in the primer are shown in grey.
  • Putative “one out” sites are labelled with an asterisk as already used by the main restriction enzyme analysis tools.
  • Putative “one out” sites that are translationally silent are shown in green.
  • Putative “one out” sites that will change the amino acid sequence of the coding region are shown in red.

As well as displaying these sites the new tool makes it very easy to introduce the nucleotide change needed for the new restriction site into your primer.

  1. If you mouse over one of the putative sites then the mismatched residue is shown in lower case and the recognition sequence of the site is displayed as a tooltip, aligned to the primer (as in the animation above).
  2. If you click on a one-out site where the mismatched residue lies within the primer and hold the button down, the primer sequence temporarily changes, replacing the mismatched residue and showing any amino acid and restriction site changes above the primer.
  3. Finally if you double-click the site it will make this change permanent in your primer.
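
To make the idea of a “one out” site concrete, here is a small Python sketch that scans a sequence for windows differing from a recognition sequence at exactly one position, and shows the mismatched residue in lower case as the tooltip does. It is only an illustration: the function name one_out_sites and the test sequence are invented, it checks the plus strand only, it assumes a recognition sequence without ambiguity codes, and it says nothing about how MacVector itself implements the search.

```python
def one_out_sites(sequence, site):
    """Find windows that differ from a recognition site at exactly one position.

    Returns (start, window, needed_base) tuples, where start is 0-based, the
    mismatched residue in window is lower case, and needed_base is the base
    that would complete the site.
    """
    sequence, site = sequence.upper(), site.upper()
    hits = []
    for start in range(len(sequence) - len(site) + 1):
        window = sequence[start:start + len(site)]
        mismatches = [i for i, (a, b) in enumerate(zip(window, site)) if a != b]
        if len(mismatches) == 1:
            i = mismatches[0]
            shown = window[:i] + window[i].lower() + window[i + 1:]
            hits.append((start, shown, site[i]))
    return hits


# EcoRI recognizes GAATTC; the made-up template below is one base away from it.
print(one_out_sites("TTGAGTTCAA", "GAATTC"))  # [(2, 'GAgTTC', 'A')]
```

Whether a change is translationally silent (green) or not (red) depends on the reading frame of any overlapping coding region, which this sketch does not attempt to model.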

Quicktest Primer and Graphics Palette

Adding restriction sites to your PCR products has never been easier. In the usual MacVector way, it’s just a few mouse clicks!

Posted in Releases

MacVector and Mavericks

OS X Mavericks was released last week.

As usual we’d been testing MacVector on the prerelease developer previews. We did come across a few issues; however, these all seem to be resolved now that Mavericks has been officially released.

Screen Shot 2013 10 30 at 09 50 37

So far we’ve done plenty of informal testing with MacVector 12.7.5 and we’ve not come across any major issues.

There are two minor ones:

  • In a few rare circumstances the Primer3 sheet may appear to be blank, generally after unplugging or plugging in external monitors.
  • There are general display glitches with some older graphics cards.

We’ve had no reported issues from any users, but we really do encourage you to let us know if you find any: email support, comment below or contact us on Twitter @macvector.

    We’re currently undergoing more formal testing and we’ll update this blog when that is complete.

    The tentative summary is that MacVector 12.7.5 works fine on Mavericks.

    Incidentally MacVector 13 is being developed on Mavericks and will be fully supported.

    Posted in Releases

    MacVector 12.7 Training Workshop, LMB, Cambridge

    When: Thursday 19th September, 2013, 2:00 - 4:00
    Where: Max Perutz Lecture Theatre, LMB

    Chris Lindley of MacVector, Inc. will be giving a workshop for both novice and advanced users of MacVector, reviewing both basic and advanced functions in MacVector. In particular, he will highlight the new functionality introduced over the past two years to MacVector. The format is very informal and participants are encouraged to ask questions and help direct the workshop towards areas of the most interest.

    For further information please contact Chris Brown, HR Advisor.

    Posted in Meetings, Techniques, Tips

    The oldest entries in GenBank? Some fun for Labor Day.

    Inspired by some tweets from @ewanbirney, because I’m waiting for lunch and because it’s Labor Day, I used the Entrez tool to find the oldest entries (using the publication date field) in GenBank.

    For proteins there’s a single hit in 1979:

    Internet Entrez Browser

    For NA there are 12 hits from 1982:

    Internet Entrez Browser

    Nothing useful here and it’s probably not even correct so I promise the next article will be useful!

    Enjoy Labor Day!

    Posted in Uncategorized

    101 things you (maybe) didn’t know about MacVector: #32 – Understanding The Sequence Find Function

    We get quite a lot of support requests from users unsure of how to use the Find functionality in MacVector. It has changed somewhat over the years to try to simplify the interface, but there are still a few things to be aware of.

    You can invoke the Find function by bringing a sequence window (DNA or Protein) to the front and choosing the Edit | Find | Find… menu item;

    Find.png

    Perhaps the most important setting to consider is the Strand: popup menu. By default, this is set to only search the Plus strand. However, if you are looking for sequences (e.g. primers) that might be on either strand, you should set the menu to Both;

    FindBothStrands.png

    You do have to be aware of what will happen if you search both strands. Let’s look at an example using ATG. First, note that pressing the Find button will always find the first matching sequence in the file, starting at the 5′ end. But if we do this with pBR322 we see not ATG highlighted, but CAT;

    CAT found.png

    It’s easier to understand what is going on if we turn on the minus strand using the Strands toolbar button;

    CATMinus.png

    Here you can see the ATG on the minus strand, reading from right to left. MacVector doesn’t currently have a way of notifying you of the strand the match was found on, so you need to be aware that minus strand hits will contain your search sequence reversed and complemented.

    The Find Next button always finds the next matching sequence (if one is present) starting at the end of the current selection (or from the insertion point if there is no selection). If you continue to press the Find Next button you will find the highlighted region in pBR322 alternating between ATG and CAT matches until the end of the sequence is reached.
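
If it helps to see the both-strands behaviour spelled out, here is a short Python sketch. It searches the plus strand for the query and for its reverse complement (which is what a minus strand hit looks like when read on the plus strand), then lists the hits from the 5′ end. The function names and the test sequence are invented for illustration, positions are 0-based, and this is not MacVector’s own code.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_both_strands(target, query):
    """List (start, strand, matched_text) hits in 5' to 3' order on the plus strand."""
    target, query = target.upper(), query.upper()
    hits = []
    for strand, pattern in (("+", query), ("-", reverse_complement(query))):
        start = target.find(pattern)
        while start != -1:
            hits.append((start, strand, pattern))
            start = target.find(pattern, start + 1)
    return sorted(hits)


# Searching for ATG also reports CAT, the minus strand form of the same query.
print(find_both_strands("TTCATGGATGC", "ATG"))
# [(2, '-', 'CAT'), (3, '+', 'ATG'), (7, '+', 'ATG')]
```

Unlike the Find dialog, the sketch reports the strand of each hit, which makes it easy to see why the highlighted text alternates between ATG and CAT.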

    There is currently no way of getting the find function to identify partial matches to the search sequence. However, you can use IUPAC characters to identify sequences that might have mismatches at one or more positions. For example you can search with the sequence RTG (where R is the IUPAC code for A or G) to find both ATG and GTG triplets in the sequence. Similarly, if the target sequence contains IUPAC ambiguities, they will be considered in the search, so that a search with ATG will pick out (e.g.) RTG or NTG in the target sequence.

    However, there may be times when you are specifically looking for particular ambiguities in the target sequence. So, if you want to see if there are any N’s in the target sequence, the key is to select the Literal checkbox;

    FindLiteral.png

    This will then search just for the N character in the target sequence. If the Literal checkbox were not selected, this search would find every residue in the sequence.
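
The difference between an ambiguity-aware search and a Literal search can be sketched in a few lines of Python. The assumption here, based on the description above, is that two residues match when the sets of bases they stand for overlap, unless Literal is on, in which case the characters must be identical. The names residues_match and find_all are invented, and positions are 0-based.

```python
# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def residues_match(query_base, target_base, literal=False):
    """Literal: identical characters only. Otherwise: overlapping base sets."""
    if literal:
        return query_base == target_base
    return bool(set(IUPAC[query_base]) & set(IUPAC[target_base]))


def find_all(target, query, literal=False):
    """Return the 0-based start of every plus strand match."""
    target, query = target.upper(), query.upper()
    width = len(query)
    return [
        start
        for start in range(len(target) - width + 1)
        if all(residues_match(q, t, literal)
               for q, t in zip(query, target[start:start + width]))
    ]


print(find_all("CCATGGTGNTGA", "RTG"))        # [2, 5, 8]: ATG, GTG and NTG
print(find_all("ACNGT", "N"))                 # [0, 1, 2, 3, 4]: every residue
print(find_all("ACNGT", "N", literal=True))   # [2]: only the actual N
```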

    Finally, you can scan a DNA target sequence with an amino acid search sequence. The key to this functionality is the little DNA/Protein button;

    DNAProteinFindButton.png

    The sequence in the main edit box is dynamically affected by this button. So, if you have ATG in the edit box and toggle the button from DNA to Protein, the sequence will change from ATG to M (methionine, the translation product for ATG using the default genetic code). Similarly, if you toggle the button back to DNA, the sequence will change back to ATG. When you have the edit box sequence as Protein, the search algorithm takes the currently selected genetic code (usually Universal) into account and will find DNA sequences that could encode those amino acids. So the amino acid sequence MY (methionine-tyrosine) would find ATGTAT or ATGTAC, the two possible DNA sequences that could encode those two amino acids. Obviously, the combinations can get quite extensive when using amino acids (e.g. Leucine, Serine or Arginine) that each have 6 possible codons encoding them.
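
Here is a rough Python sketch of what searching DNA with an amino acid sequence involves. It builds the standard (Universal) genetic code, lists the codons for a given amino acid, and reports every plus strand position where the DNA could encode the query. The names codons_for and find_encoded are invented for illustration, only the standard code and the plus strand are handled, and positions are 0-based.

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_for(amino_acid):
    """Return every codon that encodes the given one-letter amino acid."""
    return [codon for codon, aa in CODON_TABLE.items() if aa == amino_acid]


def find_encoded(dna, protein):
    """Return 0-based starts where the DNA window translates to the protein."""
    dna, protein = dna.upper(), protein.upper()
    width = 3 * len(protein)
    hits = []
    for start in range(len(dna) - width + 1):
        window = dna[start:start + width]
        # Codons containing anything other than A, C, G or T translate to "?".
        translated = "".join(
            CODON_TABLE.get(window[i:i + 3], "?") for i in range(0, width, 3)
        )
        if translated == protein:
            hits.append(start)
    return hits


print(codons_for("Y"))                   # ['TAT', 'TAC']
print(len(codons_for("L")))              # 6
print(find_encoded("CCATGTATGG", "MY"))  # [2], the ATGTAT at position 2
```

Translating each window and comparing is simpler than expanding the query into every possible DNA sequence, which matters once amino acids with six codons are involved.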

    You can do the reverse, searching a protein target sequence with a DNA search sequence.

    The MacVector Find function is quite powerful, as you can hopefully see from this short post. In addition to searching sequence residues, you can also search the feature descriptions associated with a sequence, or search the text results of an analysis. But those are posts for another day.

    This is an article in a long running series of tips to help you get the most out of MacVector. If you want to get notified every time a new tip gets published, follow us @MacVector on twitter (or check the feed for the hashtag #101MacVectorTips) or like us on Facebook.

    Posted in 101 Tips

    101 things you (maybe) didn’t know about MacVector: #31 – Exporting Subsequences To Excel

    Many MacVector users like to keep track of primers used in the lab by maintaining them in a MacVector Nucleic Acid Subsequence file. I discussed this in a previous post and later described how to create a primer “database” from a Microsoft Excel file. We recently had a support request asking how to do the reverse i.e. how to create an Excel file from a MacVector subsequence file?

    As it turns out, this is very simple with MacVector. The key is that you can copy selections from the Nucleic Acid Subsequence list view as tab-delimited text which can then be directly pasted into Excel. Let’s take a look at an example.

    First, open your primer nucleic acid subsequence file in MacVector, then select all of the entries you want to export into Excel. Edit | Select All (command-A) will select all of the entries;

    CommonPrimersAllSelected.png

    Then choose Edit | Copy (command-C), switch to Excel and simply Edit | Paste (command-V) into a new worksheet;

    Workbook1.png

    Once in the worksheet you can save data in any format supported by Excel, such as csv, tab delimited, or the native Excel format.
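
If you would rather skip Excel altogether, the copied tab-delimited text can be turned into CSV with a few lines of Python. This is only a sketch: the function name tab_text_to_csv is invented, and the two-column name/sequence layout in the example is an assumption for illustration, since the columns you actually get depend on what the list view shows.

```python
import csv
import io


def tab_text_to_csv(tab_text):
    """Convert tab-delimited text (as pasted from the clipboard) to CSV text."""
    rows = csv.reader(io.StringIO(tab_text), delimiter="\t")
    out = io.StringIO()
    # The csv module quotes any field that itself contains a comma.
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Illustrative rows only; real copied data may have more columns.
copied = "M13 Forward\tGTAAAACGACGGCCAGT\nM13 Reverse\tCAGGAAACAGCTATGAC\n"
print(tab_text_to_csv(copied))
```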

    Posted in 101 Tips

    101 things you (maybe) didn’t know about MacVector: #30 – Submitting Sequences To GenBank using Sequin

    Note: while preparing this blog post we discovered a bug in MacVector 12.7.4 that prevents submission using the exact steps shown here. Be sure you are using MacVector 12.7.5 or later which has the bug fixed. If you are using an earlier version, send an e-mail to support@macvector.com and we’ll send you the details of the additional steps required.

    I’ve been helping a customer submit his annotated sequence to GenBank using the Sequin application that you can download from the NCBI website, so I thought it would be a good time to share what we’ve learned using MacVector to prepare sequences for submission.

    First, if you have protein-coding open reading frames in your sequence, be sure to annotate those as CDS features. You should also include a /gene= qualifier with the proposed name for the open reading frame. Plus, GenBank prefers that you also include a /translation= qualifier for each CDS feature. I’ll show you how to do that in a separate blog post.

    Second, to prevent Sequin from reporting errors in the feature list, delete any MacVector-specific features in the Feature tab list. The features you are most likely to see are the frag features that MacVector uses to indicate the history of copied and pasted fragments.

    Once you are comfortable that you have annotated all of the features you are interested in, switch to the Annotations tab and double-click on the LOCUS line;

    LocusEditor.png

    Type in a suitable name for the sequence. Keep this short and DO NOT USE SPACES OR PUNCTUATION in the name.

    Now it’s time to create the specially formatted input files required by Sequin. Make sure you save the source sequence in standard MacVector .nucl format first. Now choose File | Save As… and select Sequin Feature Table Format from the Format menu;

    SaveSequinTable.png

    Note that the file will be given a .tbl extension. I like to save the file with the same name as I used in the LOCUS to be sure all the names remain in sync.

    After saving the Sequin table file, repeat the Save As… but this time choose Sequin FastA Format from the Format menu;

    SaveSequinFasta.png

    Again I like to use the LOCUS name as the file name, but this time note that a .sqn extension is automatically added.

    Now we are ready to run Sequin. Click on Start New Submission to get going and follow the steps, adding your name and affiliation information etc. When you get to the Preparing the Sequences dialog, select the Use the normal submission dialog radio button;

    Preparing the Sequences.png

    Then choose FASTA as the data format;

    SequinSequenceFormat.png

    Finally, click on the Import Nucleotide FASTA button and select the .sqn file you saved from MacVector. Carry on working through the pages, including entering Organism information, until you get to the Annotation tab;

    Organism and Sequences.png

    There is no button on this page to import the .tbl file. Instead you have to choose the File | Open menu item and select the .tbl file there. If all goes well, the file imports, but there is no visible indication that anything has happened. In fact, you only get notification if something has gone wrong! To see the results of your import, click on the Next Form >> button. You may be prompted to add missing organism data at this time, so go back to the Organism tab to do that. You will then see a misleading message saying “You have not entered proteins and have not created any features. Is this correct?”. Click OK and on the next screen you should see a preliminary GenBank entry that should show that yes, indeed, you did enter some features;

    SequinFinalWindow.png

    Now you can click on the Done button. Sequin will attempt to validate the data and will often generate a list of errors and warnings;

    Sequin Validation Errors.png

    The easiest way to fix these is to open the sequence in MacVector, fix the problems there and save the .tbl file again. I’ll discuss some of the common errors in a future blog post.

    Once you are happy with the sequence and features, choose File | Prepare Submission and choose a location to save the completed sequence data file. This is also given a .sqn extension, although in this case it is in ASN.1 format, not FastA.

    Posted in 101 Tips

    101 things you (maybe) didn’t know about MacVector: #29 – Option-Click to Close All Windows

    If you are like me, you often find yourself with many, many windows open while using MacVector. Sometimes you just want to get rid of everything and start all over again with a different project. You can always just quit MacVector and start again, but there is an easier way:

    Hold down the option key while clicking on the red close button of any window. All of the currently open windows will close. If there are any windows with unsaved changes, these will stay open and you will be prompted to save each open file;

    OptionSave.png

    You can perhaps see better how this works by looking at the MacVector File menu. Normally, if you open that menu, it will simply have the Close option;

    FileClose.png

    But if you hold down the option key, the menu item changes to Close All…

    FileCloseAll-1.png

    Posted in 101 Tips

    101 things you (maybe) didn’t know about MacVector: #28 – Identifying Methylation Blocked Restriction Sites

    A big thanks to Jeffrey Dvorin at Boston Children’s Hospital for this great suggestion.

    Most common laboratory strains of E. coli contain a number of methylase enzymes that modify DNA residues, preventing certain restriction enzymes from cutting DNA isolated from those strains. The two most relevant enzymes are the Dam methylase that methylates the A in the sequence GATC and the Dcm methylase that methylates the second C in the sequences CCAGG and CCTGG. For a more detailed description of the Dam and Dcm methylases, check out this page on the New England Biolabs website.

    Some enzymes are entirely blocked by this methylation. For example MboI recognizes the sequence GATC, the same as the Dam methylase recognition sequence, but is blocked from cutting if the A is methylated. Thus MboI does not cut DNA isolated from a Dam+ E. coli strain. However, Sau3A also recognizes GATC, but is unaffected by the methylation, so cuts Dam+ E. coli DNA completely.

    The situation with other enzymes can be more complicated: ClaI recognizes the sequence ATCGAT, but will not cut if either of the A residues is methylated. That means that if the site is flanked by a 5′ G or a 3′ C residue (e.g. GATCGAT or ATCGATC) then one of the A’s will be methylated and the site will not be cleaved. However, other sites (e.g. TATCGATT or AATCGAT) do not contain Dam methylase sites and so will be cleaved normally.

    MacVector does not have any direct support for identifying cleavage sites that are blocked by the Dam or Dcm methylases. However, thanks to a trick that Jeff Dvorin suggested, there is an easy way to display them on maps. The basic idea is that you modify the restriction enzyme files you normally use for RE searching to include customized methylation site entries. Let’s look at an example using the ClaI enzyme discussed above;

    DAM-ClaI.renz.png

    In the screenshot above, I’ve modified the default Common Enzymes.renz restriction enzyme file to include a new site called DAM-ClaI that is almost identical to the normal ClaI site except that it contains an additional 3′ C residue. Because DNA is double-stranded and MacVector searches both strands, this site will find both GATCGAT and ATCGATC. So now, if I search a DNA sequence with both of these enzymes selected, I can easily see which sites will be blocked by Dam methylase activity. Here’s an example. It’s obviously artificial, as you would never normally see so many ClaI sites in such a short piece of DNA, but you can clearly see the ClaI sites that would be blocked as they have both ClaI and DAM-ClaI sites at the same position on the DNA strand;

    ClaISampleSequence.png

    Note that the two DAM-ClaI sites are ...GATCGAT... and ...ATCGATC... demonstrating that the single DAM-ClaI site in the .renz file does indeed identify both types of site.
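
The logic behind the DAM-ClaI trick can be checked with a short Python sketch. It confirms that the reverse complement of ATCGATC is GATCGAT, then flags each ClaI site as blocked when it is flanked by a 5′ G or a 3′ C. The function names and the test sequence are invented for illustration, positions are 0-based, and this is a sketch of the reasoning rather than anything MacVector runs.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_all(target, pattern):
    """Return every 0-based start of pattern in target, overlaps included."""
    starts, pos = [], target.find(pattern)
    while pos != -1:
        starts.append(pos)
        pos = target.find(pattern, pos + 1)
    return starts


def clai_sites(target):
    """Return (start, blocked_by_dam) for every ClaI site (ATCGAT)."""
    target = target.upper()
    blocked = set(find_all(target, "ATCGATC"))                # 3' C completes a GATC
    blocked |= {p + 1 for p in find_all(target, "GATCGAT")}   # 5' G completes a GATC
    return [(p, p in blocked) for p in find_all(target, "ATCGAT")]


# One entry in the .renz file covers both forms because it is searched on both strands.
print(reverse_complement("ATCGATC"))  # GATCGAT

# A made-up sequence with a 5' G site, an unflanked site and a 3' C site.
sample = "TT" + "GATCGAT" + "TTA" + "ATCGAT" + "TT" + "ATCGATC" + "AA"
print(clai_sites(sample))  # [(3, True), (12, False), (20, True)]
```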

    So, we have a simple way of identifying sites that will be blocked by the Dam and Dcm methylases. All you have to do is individually enter the sites listed on the New England Biolabs page into each of your restriction enzyme files and you are done! What? Think that’s too much work? Of course we have a file to help you out! If you download this file from our website, we’ve already included all the methylation-sensitive restriction enzyme sites and labelled them based on their sensitivity to either the Dam or Dcm methylase. You do need to add all of the sites to your favorite enzyme files, but that is pretty simple;

    (a) Use File | Open and locate and open the Methylation-Sensitive Restriction Enzymes.renz file.

    (b) Use File | Open and locate and open your favorite restriction enzyme file (e.g. Common Enzymes.renz or New England Biolabs.renz).

    (c) Bring Methylation-Sensitive Restriction Enzymes.renz to the front, choose Edit | Select All, followed by Edit | Copy.

    (d) Switch to your target .renz file and choose Edit | Paste. The DAM and DCM enzymes will get pasted into the file. Save the file.

    Now, whenever you use that file for searches, the methylation sensitive versions will show up in the results. However, note the following caveats;

    (i) If you run a search using “Selected Enzymes” (and that is the default for the automatic search that is run whenever you view the Map tab of an open DNA sequence), you must also make sure any corresponding DAM- or DCM- enzymes are checked to be sure they will appear in the results.

    (ii) If you limit the results of searches based on the number of cut sites, you may not see the results you expect. For example, in the above ClaI example there were 4 ClaI sites and 2 DAM-ClaI sites. If you set the filter to only show enzymes that cut twice or fewer, you would see only the DAM-ClaI sites. Conversely, if you set the filter to show enzymes that cut between 3 and 5 times, you would see the ClaI sites, but not the DAM-ClaI sites.
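
The second caveat is easy to see with the numbers from the ClaI example. The Python sketch below treats ClaI and DAM-ClaI as two separate entries, each with its own cut count, and applies a simple minimum/maximum filter. The function name filter_by_cut_count is invented, and the sketch assumes the filter is applied to each entry independently, as the description above implies.

```python
def filter_by_cut_count(site_counts, minimum, maximum):
    """Keep only the entries whose number of cut sites is within the range."""
    return {
        enzyme: count
        for enzyme, count in site_counts.items()
        if minimum <= count <= maximum
    }


# Counts taken from the ClaI example: 4 ClaI sites and 2 DAM-ClaI sites.
counts = {"ClaI": 4, "DAM-ClaI": 2}

print(filter_by_cut_count(counts, 0, 2))  # {'DAM-ClaI': 2}
print(filter_by_cut_count(counts, 3, 5))  # {'ClaI': 4}
```

Because the two entries are counted separately, a filter can hide one and show the other, so widen the range if you want to see both on the same map.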

    We’ll likely incorporate this functionality into a future version of MacVector with a simplified user interface. But for now, this is a great way to identify those pesky methylation blocked sites and works for any version of MacVector.

    Posted in 101 Tips