(updated March 21, 2018)
MacVector’s Import Features tool allows you to import annotation from many genome browsers (e.g. Ensembl, UCSC). MacVector can annotate either an empty sequence or one that already carries annotation.
BED, GFF, GTF, and GFF3 formats
GFF, GTF, GFF3 and BED are file formats used to store annotation (features), generally without any sequence, although they are commonly accompanied by a FASTA file containing just the sequence. They emerged as a way of exporting, or exchanging, information about a specified region of a genome without having to take the entire genome.
Most sequence formats were developed to describe a specific gene or protein. Although this is no longer true, they are still oriented towards a region of fixed length. These annotation files are not at all length specific and could potentially store just two features at either end of the same chromosome. They are a much more flexible way of dealing with annotation, especially a large amount of it, than a fixed-length sequence format such as GenBank.
They are also not limited to a single sequence and can contain information from multiple sequences in the same file (FASTA files can also contain multiple sequences). For example, you could store the entire human set of chromosomes in a pair of (quite large!) files: a multiple-sequence FASTA file and a single GFF file.
The format of these annotation files does vary (whoever said bioinformaticians had to be consistent!) but basically each file consists of a set of individual lines (one line per feature) along the following lines:
SEQUENCE ID, START, STOP, FEATURE TYPE, NOTE
SEQUENCE ID is the sequence these annotations belong to.
START and STOP are the region of sequence they are annotated against.
FEATURE TYPE is obvious! Note that this does not always correspond to a valid GenBank feature keyword.
NOTE holds any additional description of the feature.
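As a rough illustration of how the real formats map onto that generic layout, here is a minimal Python sketch. It assumes tab-delimited input, the standard nine-column GFF/GTF/GFF3 line (sequence ID, source, type, start, end, score, strand, phase, attributes) and the first three or four BED columns (chrom, chromStart, chromEnd, name). The `Feature` record, the function names, the `"region"` placeholder type for BED lines and the sample lines are all made up for this example; none of this is how MacVector itself parses these files.

```python
from collections import defaultdict
from typing import NamedTuple


class Feature(NamedTuple):
    """Generic record: SEQUENCE ID, START, STOP, FEATURE TYPE, NOTE."""
    seq_id: str
    start: int          # 1-based, inclusive
    stop: int           # 1-based, inclusive
    feature_type: str
    note: str


def parse_gff_line(line: str) -> Feature:
    """Parse one tab-delimited GFF/GTF/GFF3 feature line (nine columns)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 columns, found {len(cols)}")
    seq_id, _source, ftype, start, end, _score, _strand, _phase, attrs = cols
    # GFF coordinates are already 1-based and inclusive.
    return Feature(seq_id, int(start), int(end), ftype, attrs)


def parse_bed_line(line: str) -> Feature:
    """Parse one tab-delimited BED line (at least three columns)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 3:
        raise ValueError(f"expected at least 3 columns, found {len(cols)}")
    chrom, chrom_start, chrom_end = cols[:3]
    name = cols[3] if len(cols) > 3 else ""
    # BED starts are 0-based and ends are exclusive, so only the start
    # needs shifting to get 1-based inclusive coordinates.
    # BED has no feature type column; "region" is a placeholder (assumption).
    return Feature(chrom, int(chrom_start) + 1, int(chrom_end), "region", name)


def group_by_sequence(features):
    """Group features by sequence ID, since one file can cover many sequences."""
    grouped = defaultdict(list)
    for feature in features:
        grouped[feature.seq_id].append(feature)
    return dict(grouped)


# Illustrative lines only; the coordinates and names are invented.
features = [
    parse_gff_line("chrX\texample\tgene\t100\t200\t.\t+\t.\tID=gene1"),
    parse_bed_line("chrX\t99\t200\tgene1"),
    parse_bed_line("chrI\t0\t10"),
]
for seq_id, items in group_by_sequence(features).items():
    print(seq_id, len(items))
```

The point of the coordinate shift in `parse_bed_line` is that the same region is written differently in BED and GFF, which is worth keeping in mind when comparing files exported in different formats.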
Genome Browsers
These tools (generally online web gateways) allow you to browse the entire chromosome or genome of a particular organism, almost like a graphical model of a sequence database. All the information known about that organism’s sequence that has been submitted to one of the large sequence databases (e.g. GenBank at the NCBI) should be visualised within the genome browser. You can download all the annotation contained within a particular region fairly easily using one of these annotation formats. You can then annotate either an empty sequence or an existing file that you are working with (so combining your own “private” annotation with all known public annotation).
To annotate a sequence with a BED/GFF/GFF3/GTF file in MacVector
From the UCSC Genome Browser
- Click on this link to open the UCSC Genome Browser.
- Select C. elegans (or click the link to “worm”) and enter sel-12 as the gene name. Click SUBMIT.
![](http://macvector.com/blog/wp-content/uploads/2012/05/UCSC_Genome_Browser_Gateway-300x51.png)
The interface will change and show all annotation associated with that region. You can modify the amount or type of annotation being shown. This particular gene, C. elegans sel-12, is located on Chromosome X.
- Click the Tools menu at the top of the page, then the Table Browser link.
This will now allow you to export all the annotation (tracks) associated with the previously displayed region.
- Change the REGION to POSITION.
If it is left at genome, the entire genome will be downloaded.