101 things you (maybe) didn’t know about MacVector: #30 – Submitting Sequences To GenBank using Sequin

Note: while preparing this blog post we discovered a bug in MacVector 12.7.4 that prevents submission using the exact steps shown here. Be sure you are using MacVector 12.7.5 or later, which fixes the bug. If you are using an earlier version, send an e-mail to support@macvector.com and we’ll send you details of the additional steps required.

I’ve been helping a customer submit his annotated sequence to GenBank using Sequin, the submission application you can download from the NCBI website, so I thought this would be a good time to share what we’ve learned about using MacVector to prepare sequences for submission.

First, if your sequence contains protein-coding open reading frames, be sure to annotate them as CDS features. You should also include a /gene= qualifier with the proposed name for each open reading frame. GenBank also prefers that you include a /translation= qualifier for each CDS feature; I’ll show you how to do that in a separate blog post.
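For readers wondering what goes into a /translation= qualifier: it is the amino acid sequence encoded by the CDS, without the terminal stop codon. The short Python sketch below is our own illustration, not MacVector’s code. It assumes the standard genetic code (NCBI translation table 1), a CDS read 5’ to 3’ on the top strand starting at its first base, and no introns; the function name translate_cds is hypothetical.

```python
# Standard genetic code (NCBI table 1); codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_cds(cds: str) -> str:
    """Translate a CDS (top strand, no introns) up to, but not including, the first stop codon."""
    seq = cds.upper().replace("U", "T")
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        # Codons containing ambiguity codes (N, R, Y, ...) become X.
        aa = CODON_TABLE.get(seq[pos:pos + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate_cds("ATGGCTAAATGA"))  # → MAK
```

A real submission also has to deal with alternative start codons and partial CDS features, which this sketch ignores.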

Second, to prevent Sequin from reporting errors in the feature list, delete any MacVector-specific features from the list in the Feature tab. The ones you are most likely to see are the frag features that MacVector uses to record the history of copied and pasted fragments.
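In MacVector you delete these features by hand in the Feature tab. If you also manage feature lists in your own scripts, the same cleanup is a simple filter. The Python sketch below is a hypothetical illustration, not part of MacVector: the Feature record and the MACVECTOR_ONLY_KEYS set are our own assumptions, and the set contains only the frag key mentioned above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Feature:
    key: str    # feature key, e.g. "gene", "CDS", "frag"
    start: int  # 1-based start position
    stop: int   # 1-based stop position


# Hypothetical set of keys treated as MacVector-specific; extend as needed.
MACVECTOR_ONLY_KEYS = {"frag"}


def submission_features(features: List[Feature]) -> List[Feature]:
    """Return only the features that should be sent to GenBank."""
    return [f for f in features if f.key not in MACVECTOR_ONLY_KEYS]


features = [Feature("gene", 1, 900), Feature("CDS", 1, 900), Feature("frag", 1, 450)]
print([f.key for f in submission_features(features)])  # → ['gene', 'CDS']
```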

Once you are satisfied that you have annotated all of the features you are interested in, switch to the Annotations tab and double-click the LOCUS line:

[Screenshot: the LOCUS editor]

Type in a suitable name for the sequence. Keep this short and DO NOT USE SPACES OR PUNCTUATION in the name.
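If you generate LOCUS names in bulk, a quick check can catch spaces and punctuation before you save. This Python sketch accepts letters and digits only. The 16-character default for max_length is an assumed limit chosen for illustration, so check NCBI’s current guidance before relying on it; the function name is hypothetical.

```python
import re

# Letters and digits only: no spaces, underscores, hyphens or other punctuation.
_LOCUS_NAME = re.compile(r"[A-Za-z0-9]+")


def is_valid_locus_name(name: str, max_length: int = 16) -> bool:
    """Check a proposed LOCUS name; max_length is an assumed, adjustable limit."""
    return len(name) <= max_length and _LOCUS_NAME.fullmatch(name) is not None


print(is_valid_locus_name("pMV12"))       # → True
print(is_valid_locus_name("my plasmid"))  # → False
```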

Now it’s time to create the specially formatted input files that Sequin requires. First make sure you have saved the source sequence in the standard MacVector .nucl format. Then choose File | Save As… and select Sequin Feature Table Format from the Format menu:

[Screenshot: the Save As dialog with Sequin Feature Table Format selected]

Note that the file will be given a .tbl extension. I like to save the file with the same name as I used in the LOCUS to be sure all the names remain in sync.
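MacVector writes this file for you, but knowing what it contains helps when you read Sequin’s error messages. NCBI’s five-column feature table format starts with a >Feature line naming the sequence, followed by a tab-separated start, stop and feature key line for each feature, each followed by its qualifier lines indented with three tabs. The Python sketch below builds such a table; MacVector’s own output may differ in detail, and the sequence name pMV12, gene name abcD and product text are made-up examples.

```python
from typing import Iterable, Sequence, Tuple

# One feature: (start, stop, key, [(qualifier, value), ...]) with 1-based coordinates.
FeatureRow = Tuple[int, int, str, Sequence[Tuple[str, str]]]


def format_feature_table(seq_id: str, features: Iterable[FeatureRow]) -> str:
    """Build a five-column feature table for one sequence.

    For a feature on the complementary strand, pass start > stop.
    """
    lines = [f">Feature {seq_id}"]
    for start, stop, key, qualifiers in features:
        lines.append(f"{start}\t{stop}\t{key}")
        for name, value in qualifiers:
            lines.append(f"\t\t\t{name}\t{value}")
    return "\n".join(lines) + "\n"


table = format_feature_table("pMV12", [
    (1, 900, "gene", [("gene", "abcD")]),
    (1, 900, "CDS", [("gene", "abcD"), ("product", "hypothetical protein")]),
])
print(table)
```

Using the LOCUS name as the identifier on the >Feature line, as well as for the file name, is another way of keeping the names in sync.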

After saving the Sequin table file, repeat the Save As… command, but this time choose Sequin FastA Format from the Format menu:

[Screenshot: the Save As dialog with Sequin FastA Format selected]

Again, I like to use the LOCUS name as the file name, but note that this time a .sqn extension is added automatically.
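A FASTA record is simply a > line carrying an identifier (and optionally a description) followed by the sequence split over several lines. This Python sketch uses the LOCUS name as the identifier so it matches the name used elsewhere in the submission. The 60-residue line width and the function name format_fasta are our own assumptions, not necessarily what MacVector writes.

```python
def format_fasta(seq_id: str, sequence: str, description: str = "", width: int = 60) -> str:
    """Return a FASTA record whose identifier is the LOCUS name."""
    header = f">{seq_id} {description}".rstrip()
    seq = sequence.upper()
    # Split the sequence into fixed-width lines.
    body = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([header, *body]) + "\n"


record = format_fasta("pMV12", "acgt" * 40)
print(record)
```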

Now we are ready to run Sequin. Click Start New Submission and follow the steps, entering your name, affiliation information and so on. When you get to the Preparing the Sequences dialog, select the Use the normal submission dialog radio button:

[Screenshot: Sequin’s Preparing the Sequences dialog]

Then choose FASTA as the data format:

[Screenshot: choosing FASTA as the sequence data format in Sequin]

Next, click the Import Nucleotide FASTA button and select the .sqn file you saved from MacVector. Continue working through the pages, including entering the Organism information, until you reach the Annotation tab:

[Screenshot: Sequin’s Organism and Sequences page]

There is no button on this page for importing the .tbl file. Instead, choose File | Open and select the .tbl file there. If all goes well the file is imported, but there is no visible indication that anything has happened; Sequin only notifies you when something goes wrong. To see the results of the import, click the Next Form >> button. If you are prompted to add missing organism data at this point, go back to the Organism tab and enter it. You will then see a misleading message saying “You have not entered proteins and have not created any features. Is this correct?” Click OK, and the next screen should show a preliminary GenBank entry confirming that yes, you did indeed enter some features:
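Because a successful import gives no feedback, it can be worth confirming beforehand that the feature table and the FASTA file name the same sequence. This Python sketch is our own pre-flight check, not a Sequin feature. It assumes both files have been read into strings, and that the identifier is the first word after > in the FASTA file and the word after >Feature in the feature table; the function names are hypothetical.

```python
from typing import List


def fasta_ids(text: str) -> List[str]:
    """Identifiers (first word after '>') from a FASTA file."""
    ids = []
    for line in text.splitlines():
        words = line[1:].split()
        if line.startswith(">") and words:
            ids.append(words[0])
    return ids


def feature_table_ids(text: str) -> List[str]:
    """Sequence identifiers from the '>Feature' lines of a feature table."""
    ids = []
    for line in text.splitlines():
        words = line.split()
        if len(words) >= 2 and words[0] == ">Feature":
            ids.append(words[1])
    return ids


def unmatched_table_ids(fasta_text: str, table_text: str) -> List[str]:
    """Feature-table identifiers that have no matching FASTA record."""
    known = set(fasta_ids(fasta_text))
    return [seq_id for seq_id in feature_table_ids(table_text) if seq_id not in known]


fasta_text = ">pMV12 example\nACGTACGT\n"
table_text = ">Feature pMV12\n1\t8\tgene\n\t\t\tgene\tabcD\n"
print(unmatched_table_ids(fasta_text, table_text))  # → []
```

An empty list means every feature table refers to a sequence in the FASTA file; the names pMV12 and abcD are made-up examples.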

[Screenshot: Sequin’s final window showing the preliminary GenBank entry]

Now click the Done button. Sequin will attempt to validate the data and will often generate a list of errors and warnings:

[Screenshot: Sequin validation errors]

The easiest way to fix these is to open the sequence in MacVector, fix the problems there and save the .tbl file again. I’ll discuss some of the common errors in a future blog post.

Once you are happy with the sequence and features, choose File | Prepare Submission and choose a location to save the completed sequence data file. This file is also given a .sqn extension, although this time it is in ASN.1 format, not FASTA.
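Since both files end in .sqn, it is easy to mix them up. This Python sketch guesses which kind a file is from its first characters. It assumes the final submission is written as text ASN.1 beginning with a Seq-submit object, which is what we would expect but which you should confirm against your own file; the function name sqn_format is hypothetical.

```python
def sqn_format(text: str) -> str:
    """Guess whether .sqn file contents are FASTA or text ASN.1."""
    stripped = text.lstrip()
    if stripped.startswith(">"):
        return "FASTA"
    if stripped.startswith("Seq-submit"):
        return "ASN.1"
    return "unknown"


print(sqn_format(">pMV12\nACGT\n"))      # → FASTA
print(sqn_format("Seq-submit ::= {\n"))  # → ASN.1
```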

This is an article in a long-running series of tips to help you get the most out of MacVector. If you want to be notified every time a new tip is published, follow us @MacVector on Twitter (or check the feed for the hashtag #101MacVectorTips) or like us on Facebook.

This entry was posted in 101 Tips.