AppleScript: batch translation of CDS features

Apple’s AppleScript (along with JavaScript for Automation) is a language that is easy to write and easy to read, and it lets you automate tasks in any application that supports it. Many Apple applications have an AppleScript dictionary that defines which functions you can automate. MacVector has many such functions in its AppleScript dictionary: you can auto-annotate multiple sequences, search Entrez for sequences and retrieve them, translate sequences, transcribe sequences and more. AppleScript is excellent for batch operations, whether that means a single operation on multiple input sequences, multiple operations on a single sequence, or taking a single sequence and producing multiple results. It even helps with mundane tasks such as converting a folder of sequences into a different format.

Recently we had a support query about translating all the open reading frames in a single sequence into a set of protein sequences. This is a task very well suited to automation. MacVector can easily translate a single CDS feature or do a six-frame translation of a sequence, but repeating this for every ORF in a large genome would be laborious to do manually. With AppleScript, once the script has been written, it becomes a simple task.

The workflow for the script is simple: go through a DNA sequence and find every CDS feature. Each CDS feature is translated in turn, and finally all of the protein sequences are saved together in a single FASTA file.
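Each FASTA record is a header line that starts with ">" followed by the sequence on the next line. As a minimal sketch of that final step in plain AppleScript (no MacVector commands involved), here is a handler that builds one record. The handler name makeFastaRecord and the example protein sequences are hypothetical; the built-in linefeed constant makes the line breaks explicit instead of typing them inside a quoted string:

```applescript
-- Hypothetical helper (not part of MacVector's dictionary): build one FASTA record.
-- The header line starts with ">", and both lines end with a linefeed (LF).
on makeFastaRecord(theHeader, theProtein)
    return ">" & theHeader & linefeed & theProtein & linefeed
end makeFastaRecord

-- Example: append two made-up records to one text, one record per CDS feature
set fastaText to ""
set fastaText to fastaText & makeFastaRecord("CDS 1", "MKTAYIAKQR")
set fastaText to fastaText & makeFastaRecord("CDS 2", "MSLLTEVETP")
```

Appending one record per translated feature like this mirrors what the loop in the full script does.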

Incidentally, many tools in MacVector rely on annotated CDS features. If your sequence does not have any CDS features, you can use SCAN FOR…ORFS to add them easily.

The core of the script is a loop over the CDS features. The important lines are these two:

repeat with theFeature in (every feature of theSequence whose key is "CDS")
set theTranslation to theFeature's translation as text

All they do is ask MacVector for every feature of the sequence whose key is "CDS", stepping through them one at a time, and then get the translation of each open reading frame as text.

The full script is here:

-- Translate all CDS features in a MacVector Nucl sequence
-- v0.2
-- May 14, 2021 
-- added direct writing of output fasta file
use AppleScript version "2.7" -- macOS High Sierra or later
use scripting additions
set outputCount to 0
set FastaFile to ""
set inputFile to GetInputFile()
set outputFolder to GetOutputFolder()
set defaultAnswer to "All_CDS_translated.fa"
display dialog "Please enter the Output filename:" default answer defaultAnswer
set OutputFilename to text returned of result

tell application "MacVector"
    set docRef to open inputFile
    delay 0.3
    set theSequence to docRef's sequence
    with timeout of 10000 seconds -- very long timeout so long sequences do not time out; the default timeout is 120 seconds
        repeat with theFeature in (every feature of theSequence whose key is "CDS")
            set theTranslation to theFeature's translation as text
            set theName to theFeature's key as text
            set outputCount to outputCount + 1 -- increment the number of CDS features translated
            -- append one FASTA record: a ">" header line, then the protein sequence, each ending in a linefeed
            set FastaFile to FastaFile & ">" & theName & " " & outputCount & linefeed & theTranslation & linefeed
        end repeat
    end timeout
    close docRef saving no
end tell

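-- write the FASTA text to the chosen output file, replacing any existing file of that name
set outputPath to outputFolder & OutputFilename
set myFile to open for access (POSIX file outputPath) with write permission
try
    set eof myFile to 0 -- truncate the file so no old content is left behind
    write FastaFile to myFile
    close access myFile
on error errMsg
    close access myFile -- always release the file before reporting the error
    error errMsg
end try

set outputCount to outputCount as string
set theDialogueText to outputCount & " CDS features in " & inputFile & " were translated and saved as " & outputPath
display dialog theDialogueText buttons {"OK"} default button "OK" giving up after 120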

on GetInputFile()
    -- ask for the DNA sequence file to translate; choose file only returns files that exist
    set inputFile to POSIX path of (choose file with prompt "Select DNA sequence to translate:")
    return inputFile
end GetInputFile

on GetOutputFolder()
    -- choose the folder in which to save the output FASTA file
    set outputFolder to POSIX path of (choose folder with prompt "Select folder for output file:")
    return outputFolder
end GetOutputFolder
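The script writes each protein translation on a single line, which most programs that read FASTA accept. If you would rather wrap the sequence lines, as many FASTA files do at 60 characters, the following is a minimal sketch of a hypothetical wrapSequence handler (plain AppleScript, not a MacVector command) that could be added to the script:

```applescript
-- Hypothetical helper: split a sequence into lines of at most lineWidth characters,
-- each line followed by a linefeed. An empty sequence gives empty text.
on wrapSequence(theSeq, lineWidth)
    set wrapped to ""
    set seqLength to length of theSeq
    repeat with startPos from 1 to seqLength by lineWidth
        set endPos to startPos + lineWidth - 1
        if endPos > seqLength then set endPos to seqLength -- last line may be shorter
        set wrapped to wrapped & (text startPos thru endPos of theSeq) & linefeed
    end repeat
    return wrapped
end wrapSequence
```

Inside the tell application "MacVector" block, call it as my wrapSequence(theTranslation, 60); the my keyword tells AppleScript to run the script's own handler instead of sending the command to MacVector.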

Just open Script Editor in /Applications/Utilities/ and copy and paste the code above into a new document. Script Editor is Apple’s default AppleScript editor, although more capable AppleScript editors exist, such as Script Debugger. You can also download the script.

If you want to explore automating MacVector further, the MacVector application folder contains an AppleScript folder with many example scripts. If there is a repetitive task that you perform in MacVector, please contact support and ask whether it could be automated. Either we’ll be able to help develop a script, or we’ll be able to add support for it in a future release of MacVector.

This entry was posted in Techniques, Tips.