General musings from the MacVector team about sequence analysis, molecular biology, the Mac in general and of course your favorite sequence analysis app for the Mac!

Importing sequences from ENSEMBL

There are a few different ways to import annotation from the ENSEMBL genome browser, as well as from other databases.

Using Genbank

The easiest way to export from ENSEMBL and keep all of the annotation is to use the Genbank format. The default export format is FASTA, which carries no annotation. A Genbank file stores the annotation alongside the sequence, and MacVector imports both.

[Screenshot: Ensembl genome browser release 75, Homo sapiens, gene summary for PARL (ENSG00000175193)]

In ENSEMBL

  • Choose the region of interest
  • Click EXPORT DATA
  • In OUTPUT choose GENBANK and click NEXT
  • Choose the TEXT format.
  • Depending on which browser you are using, you may be able to just SELECT ALL, COPY, then use FILE > NEW FROM CLIPBOARD in MacVector to import the sequence. MacVector 12.6 and above will recognise that the copied data is in Genbank format and create a properly annotated sequence.

    If you are using an older version you may first need to create a plain text file (using TEXTEDIT) and save that to your Desktop.
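If you are not sure whether the text you copied or saved is really Genbank rather than the default FASTA, a quick check outside MacVector can help. The following is a minimal Python sketch, not a description of how MacVector itself detects the format. It assumes a standard flat file where FASTA starts with `>` and Genbank starts with `LOCUS`, and the function names `guess_sequence_format` and `count_feature_keys` are made up for this illustration.

```python
def guess_sequence_format(text: str) -> str:
    """Guess the flat-file format from the first non-blank line."""
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith(">"):
            return "fasta"      # FASTA header line: sequence only, no annotation
        if stripped.startswith("LOCUS"):
            return "genbank"    # Genbank records open with a LOCUS line
        return "unknown"
    return "unknown"


def count_feature_keys(genbank_text: str) -> dict:
    """Count feature keys (gene, CDS, ...) in the FEATURES table of a Genbank record."""
    counts = {}
    in_features = False
    for line in genbank_text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if line.startswith("ORIGIN") or line.startswith("//"):
            in_features = False
        # Feature keys follow five spaces; qualifier lines are indented further.
        if in_features and line.startswith("     ") and len(line) > 5 and line[5] != " ":
            key = line[5:].split()[0]
            counts[key] = counts.get(key, 0) + 1
    return counts
```

If `guess_sequence_format` reports `fasta`, the export was probably left on the default output format and will arrive in MacVector without annotation.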

Import Features

    However, importing a flat file is generally a one-way operation. If you later need to add newer annotation (e.g. from papers published after your initial import), then unless you like performing lots of manual annotation you will want to avoid importing from Genbank a second time.

    Instead, the most flexible way is to use the IMPORT FEATURES tool. This allows you to export annotation from a genome browser (such as ENSEMBL) in a format such as GTF, GFF or BED, and import it into an existing sequence.

    This allows you to work on your own sequence, but keep it up to date with published annotation about that sequence.

    The Import Features tool was introduced in MacVector 12.5.
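It can help to know what these feature files look like before importing one. A GFF file is plain text with one feature per line and nine tab-separated columns: sequence name, source, feature type, start, end, score, strand, phase and attributes. The Python sketch below reads one such line. It assumes all nine columns are present, and the names `GffFeature`, `parse_gff_line` and `bed_to_one_based` are invented for this example. It also shows the one coordinate difference worth remembering: GFF positions are 1-based and inclusive, whereas BED starts are 0-based with an exclusive end.

```python
from typing import NamedTuple, Optional


class GffFeature(NamedTuple):
    seqid: str
    source: str
    type: str
    start: int              # 1-based, inclusive
    end: int                # 1-based, inclusive
    score: Optional[float]  # None when the column holds "."
    strand: str
    phase: str
    attributes: str


def parse_gff_line(line: str) -> Optional[GffFeature]:
    """Parse one GFF line; return None for blank lines and '#' comments."""
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    seqid, source, ftype, start, end, score, strand, phase, attributes = fields
    return GffFeature(
        seqid, source, ftype, int(start), int(end),
        None if score == "." else float(score),
        strand, phase, attributes,
    )


def bed_to_one_based(bed_start: int, bed_end: int) -> tuple:
    """Convert a BED interval (0-based, end exclusive) to 1-based inclusive."""
    return bed_start + 1, bed_end
```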

  • Choose the region of interest
  • Click EXPORT DATA
  • In OUTPUT choose GFF and click NEXT
  • Choose the TEXT format.
  • Save this to your Desktop
  • Open up MacVector and open your original sequence
  • Choose FILE > IMPORT FEATURES
  • Choose your GFF file and click OK
  • Depending on the sequence start and stop points, you may need to adjust the sequence numbering, for example if you are trying to annotate a single gene but the GFF is still numbered as if it were on a chromosome.

    The easiest way is to adjust the numbering of your original sequence.

    To do this:

  • Double click on the red cross at the start of your sequence.
  • Enter the new start number.

    If you have copied and pasted this sequence from an existing chromosome sequence then the original numbering will be preserved.
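The numbering mismatch is just an offset. If your sequence begins at chromosome position S, then chromosome position P corresponds to local position P - S + 1. Renumbering the sequence in MacVector, as described above, is the simplest fix. As an alternative, you could shift the coordinates in the GFF file before importing it. The Python sketch below shows that arithmetic. It is not a MacVector feature, it assumes tab-separated GFF lines with start and end in columns 4 and 5, and the names `to_local` and `shift_gff_lines` are made up for this example.

```python
def to_local(position: int, region_start: int) -> int:
    """Map a chromosome coordinate to a 1-based coordinate within the region."""
    return position - region_start + 1


def shift_gff_lines(lines, region_start: int) -> list:
    """Return GFF lines with start/end renumbered relative to region_start."""
    shifted = []
    for line in lines:
        line = line.rstrip("\n")
        # Leave comments, directives and blank lines unchanged.
        if not line.strip() or line.startswith("#"):
            shifted.append(line)
            continue
        fields = line.split("\t")
        fields[3] = str(to_local(int(fields[3]), region_start))  # start column
        fields[4] = str(to_local(int(fields[4]), region_start))  # end column
        shifted.append("\t".join(fields))
    return shifted
```

Either approach gets you to the same place: the feature positions and the sequence numbering have to agree before the import.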
