How to copy a specific short amino acid translation of a sequence

There can be times when you are working with open reading frames, inserting residues to shift frames while trying to get the perfect CDS fusion. The MacVector single sequence Editor can display the translations (click and hold on the “Display” toolbar button), but if you select and copy, only the DNA sequence (with any overlapping features) is copied to the clipboard. If you need to copy a specific translation of a sequence, here’s how to do it: select the region you are interested in, then invoke Analyze | Translation… Select the “Display text view with translation” option, set the Number of Frames to 3 or 6 and click OK.

From the result window, you can select the text of the amino acid sequence you are interested in, copy it, and then either create a new sequence document (File | New From Clipboard) or paste it into an external application.
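If you ever want to reproduce a 3 or 6 frame translation in a script, the idea is simple to sketch. The Python below is a minimal illustration, not MacVector's own code: it assumes the standard (Universal) genetic code, and the function names (`translate`, `frame_translations`) are invented for this example.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate from the first base; a trailing partial codon is ignored."""
    dna = dna.upper()
    # Codons containing anything other than A, C, G, T become 'X'.
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def reverse_complement(dna):
    return dna.upper().translate(COMPLEMENT)[::-1]


def frame_translations(dna, frames=3):
    """Return {label: protein} for frames +1..+3, plus -1..-3 when frames == 6."""
    result = {f"+{i + 1}": translate(dna[i:]) for i in range(3)}
    if frames == 6:
        rc = reverse_complement(dna)
        result.update({f"-{i + 1}": translate(rc[i:]) for i in range(3)})
    return result


if __name__ == "__main__":
    for label, protein in frame_translations("ATGGCCTAA", frames=6).items():
        print(label, protein)
```

Each value in the returned dictionary is plain text, so picking one frame and copying it is the scripted equivalent of selecting a single translation in the result window.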

Posted in Techniques, Tips

Optimizing Reverse Translations

The Analyze | Reverse Translation menu option lets you create a DNA sequence from a protein sequence, reverse translated using a specific Genetic Code (by default, the Universal Genetic Code). The default option creates a DNA sequence with N’s and other ambiguity codes reflecting the degeneracy of the genetic code. This is great if you want to identify less ambiguous sections for designing probes or primers; in fact, MacVector will even display a list of the probes with the fewest ambiguities.
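To see why some stretches of a protein make better probe targets than others, it helps to count how many codons can encode each residue. The sketch below is only an illustration under stated assumptions: it uses the standard genetic code, scores a window by the number of distinct DNA sequences that could encode it, and the helper names (`degeneracy`, `least_degenerate_window`) are invented here. It does not claim to reproduce how MacVector ranks probes.

```python
from collections import Counter
from math import prod

# Standard genetic code amino acids, codons in T, C, A, G order ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Number of codons per amino acid, e.g. M -> 1, W -> 1, L -> 6.
CODON_COUNTS = dict(Counter(AMINO_ACIDS.replace("*", "")))


def degeneracy(peptide):
    """Number of distinct DNA sequences that encode this peptide."""
    return prod(CODON_COUNTS[aa] for aa in peptide.upper())


def least_degenerate_window(protein, width):
    """Return (start_index, peptide, degeneracy) for the best window.

    start_index is 0-based; ties go to the leftmost window.
    """
    if width < 1 or width > len(protein):
        raise ValueError("width must be between 1 and the protein length")
    scored = [
        (degeneracy(protein[i:i + width]), i)
        for i in range(len(protein) - width + 1)
    ]
    best, start = min(scored)
    return start, protein[start:start + width], best


if __name__ == "__main__":
    print(least_degenerate_window("LSRMWMKL", 3))
```

Windows rich in Met and Trp (one codon each) score best, while runs of Leu, Ser and Arg (six codons each) score worst.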

However, MacVector also offers an optimization function if you are interested in designing a gene with codon usage optimized for expression in a particular organism.

To use this function, you do need to supply a codon usage table – a number of common tables are shipped with MacVector:

/Applications/MacVector/Codon Bias Tables/

There are four different algorithms that MacVector provides for optimizing codon usage.

Most Frequently Used Codon – this simply uses the most commonly occurring codon for each amino acid. So if, e.g. the most common Leu codon is CTC, all Leu codons will be CTC. Perhaps this is only useful if you want to design a “best guess” primer and are willing to accept a certain failure rate. If you used this to optimize expression, the host would likely run out of that tRNA and you wouldn’t see optimal expression.

Frequency Distribution – this selects a random codon for each amino acid, biased towards the most commonly used codons for that amino acid. Each time you run the algorithm, a different, random set of codons will be selected. If you were to generate a new DNA sequence over and over again, the average codon usage across the whole collection would converge on the usage in the .bias file for the organism. But any individual reverse translation may randomly be quite different.

Probability Distribution – this is probably the most powerful setting if you are interested in expression. Like the Frequency Distribution, it chooses a random codon, biased towards the most frequently used codons for each amino acid. However, this version tries to ensure that the final DNA sequence has a codon usage profile matching the selected .bias file as closely as possible. Because the overall codon usage in the DNA sequence is guaranteed to be as close as possible to the codon usage of the .bias organism, this should, in theory, give you the best chance of high expression. Again, you will get a different sequence each time you invoke this.

Uniform Distribution – this ignores the usage of each codon and randomly assigns an appropriate codon for each amino acid. It’s similar to the default algorithm that uses ambiguities to create an “absolute” coding DNA, but here it just chooses a random codon with no regard for codon usage probability. Again, you will get a different sequence each time you invoke this.
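The differences between the four options are easier to see side by side in code. The Python below is a hedged sketch of the ideas as described above, not MacVector's implementation. The `USAGE` table holds invented counts for three amino acids only (it is not taken from any real .bias file), and the quota-based approach in `probability_distribution` is just one plausible way to make a single sequence track the table closely.

```python
import random
from collections import Counter

# Toy codon usage counts. Invented numbers for illustration only.
USAGE = {
    "M": {"ATG": 100},
    "L": {"CTG": 50, "CTC": 20, "TTA": 10, "TTG": 10, "CTT": 5, "CTA": 5},
    "K": {"AAA": 75, "AAG": 25},
}


def most_frequent(protein, usage=USAGE):
    """Always pick the single most used codon for each residue."""
    return "".join(max(usage[aa], key=usage[aa].get) for aa in protein)


def frequency_distribution(protein, usage=USAGE, rng=random):
    """Independent weighted draw per residue; matches usage only on average."""
    return "".join(
        rng.choices(list(usage[aa]), weights=list(usage[aa].values()))[0]
        for aa in protein
    )


def probability_distribution(protein, usage=USAGE, rng=random):
    """Give each codon a quota proportional to its usage, then shuffle.

    Quotas are rounded with the largest-remainder method so that they add
    up to the number of times each residue occurs in the protein.
    """
    pools = {}
    for aa, n in Counter(protein).items():
        total = sum(usage[aa].values())
        exact = {codon: n * count / total for codon, count in usage[aa].items()}
        quota = {codon: int(x) for codon, x in exact.items()}
        by_remainder = sorted(exact, key=lambda c: exact[c] - quota[c], reverse=True)
        for codon in by_remainder[: n - sum(quota.values())]:
            quota[codon] += 1
        pool = [codon for codon, k in quota.items() for _ in range(k)]
        rng.shuffle(pool)  # random placement, fixed overall composition
        pools[aa] = pool
    return "".join(pools[aa].pop() for aa in protein)


def uniform_distribution(protein, usage=USAGE, rng=random):
    """Pick any valid codon with equal probability, ignoring usage."""
    return "".join(rng.choice(list(usage[aa])) for aa in protein)


if __name__ == "__main__":
    protein = "M" + "L" * 20 + "K" * 4
    for strategy in (most_frequent, frequency_distribution,
                     probability_distribution, uniform_distribution):
        print(f"{strategy.__name__:26s} {strategy(protein)}")
```

Running it a few times shows the behaviour described above: the first strategy never changes, while the other three give a different sequence on each run, and only the quota-based one keeps the per-sequence codon counts pinned to the table.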

Posted in Tips

Use Database | Auto-Annotate Sequence to annotate prokaryotic genomes

The continuing advances in Next Generation Sequencing have made it relatively low cost to sequence prokaryotic genomes. Many scientists are embarking on large projects to sequence multiple related genomes. These might be clinical isolates of the same species exhibiting different pathogenic properties, environmental isolates from different sites, or a study over time of the changes in microbial genomes from specific locations. Once you have your sequence, the definitive source of annotation is the NCBI Prokaryotic Annotation Pipeline. However, to have that run on your sequence, you must submit the sequence to the NCBI. This is not always ideal – perhaps you are still working on resolving repeat sequences for your genome, you don’t want to wait for it to be published, or you don’t want to go through the hassle of a formal submission for many variant sequences. MacVector to the rescue!

First, you need to download existing similar genome sequences – open the Database | Online Search for Keywords (Entrez) browser and search for [name of your organism] “+” “complete genome”. Assuming you are working with a reasonably common organism, you might find a few (or a lot of) hits. Select those you are interested in and click on the To Disk button, selecting a suitable target folder to save the downloaded genomes into.

Now take your unannotated genome sequence and invoke Database | Auto-Annotate Sequence. Select the folder containing your genomes as the target and press OK. On a 2.7 GHz laptop, scanning a 1.8 Mbp Campylobacter jejuni genome against 25 related C. jejuni genomes (average 5,000 features per genome) takes around 10 minutes with the default parameters, resulting in a fully annotated genome.
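Conceptually, this kind of annotation transfer means taking each feature from the annotated relatives and looking for the matching stretch in your new genome. The toy Python below illustrates only that idea and is not how MacVector does it: it requires perfect matches, whereas the real tool works with related (not identical) genomes and has its own parameters. The function name `transfer_features` and the `(name, sequence)` input format are assumptions made for this sketch.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def transfer_features(genome, reference_features):
    """Locate reference feature sequences in an unannotated genome.

    reference_features: iterable of (name, feature_sequence) pairs.
    Returns a list of (name, start, end, strand) with 1-based inclusive
    coordinates, sorted by start. Only exact matches are reported.
    """
    genome = genome.upper()
    hits = set()
    for name, feature in reference_features:
        forward = feature.upper()
        for strand, query in (("+", forward), ("-", reverse_complement(forward))):
            pos = genome.find(query)
            while pos != -1:
                hits.add((name, pos + 1, pos + len(query), strand))
                pos = genome.find(query, pos + 1)
    return sorted(hits, key=lambda hit: (hit[1], hit[2], hit[0]))


if __name__ == "__main__":
    genome = "AAATGGCCTAAGGG"
    features = [("geneA", "ATGGCCTAA"), ("geneB", "CCCTT")]
    for hit in transfer_features(genome, features):
        print(hit)
```

Using a set means a feature that appears in several of the reference genomes is only annotated once per location.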

MacVector 17’s new “Genome Comparison” tool lets you directly compare the features of two related genomes based on DNA or (for CDS features) protein sequences, and reports all of the identities, similarities, differences and missing features. In this example, the tool confirmed that no features were missing compared to the NCBI annotated genome; the only differences were in a few CDS features where mutations created or removed stop codons.

Posted in Techniques, Tips

Which DNA Matrix to use in Align To Folder?

The Database | Align To Folder function is a very useful tool to find and retrieve similar sequences from folders on your computer or on other local machines. Think of it as your own personal BLAST service. It can not only search individual sequences in any format MacVector can read (MacVector, Genbank, EMBL, ABI etc) but will also process collections of sequences in fasta or fastq format.

One important factor to consider in these searches is the DNA Scoring Matrix (.nmat) file to use. There are several included in the /Applications/MacVector/Scoring Matrices/ folder. The default file is DNA database matrix.nmat. This is ideal for identifying sequences that are not particularly closely related, such as the same gene from distant organisms or sequences matching a highly degenerate input sequence such as a reverse translation of a protein sequence.

However, one common use of Align To Folder is to identify and retrieve NGS reads from large collections in fasta or fastq formatted files. It is particularly useful for finding reads to help resolve repeat regions or close gaps between contigs. When running these types of alignments, it is preferable to use a matrix tuned to find reads that match with higher identity. The best scoring matrix for this is DNA identity with penalties matrix.nmat. Here are some examples, using a short query sequence, where the searches differ only in the scoring matrix.

Low-scoring alignments using DNA database matrix.nmat

Low-scoring alignments using DNA identity with penalties matrix.nmat

It can clearly be seen that the second example has true matching alignments that represent sections of reads that extend beyond the query fragment. All of the reads can safely be retrieved and used in additional assembly analyses to extend the query or help resolve repeats. However, the DNA database matrix example contains matches that have extensive regions with very poor similarity. These clearly do not represent reads that could be used to extend the sequence of the query sequence.
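The reason the two matrices behave so differently comes down to how hard a mismatch is punished. The small Python sketch below makes the point with two made-up scoring schemes. The values in `LENIENT` and `STRICT` are hypothetical and are not the contents of the shipped .nmat files, and real searches also handle gaps and ambiguity codes, which this ignores.

```python
# Hypothetical match/mismatch scores, chosen only to show the effect.
LENIENT = {"match": 2, "mismatch": -1}  # stands in for a "database" style matrix
STRICT = {"match": 1, "mismatch": -3}   # stands in for "identity with penalties"


def ungapped_score(query, read, match, mismatch):
    """Score two equal-length sequences position by position (no gaps)."""
    if len(query) != len(read):
        raise ValueError("sequences must be the same length")
    return sum(match if q == r else mismatch for q, r in zip(query, read))


if __name__ == "__main__":
    query = "ACGT" * 5
    diverged = "ATTT" * 5  # differs from the query at half of its positions
    for name, scheme in (("lenient", LENIENT), ("strict", STRICT)):
        print(name,
              "perfect:", ungapped_score(query, query, **scheme),
              "diverged:", ungapped_score(query, diverged, **scheme))
```

Under the lenient scheme the poorly matching read still gets a positive score and would be reported as a hit. Under the strict scheme its score goes negative, so only near-identical reads survive.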

For additional information about possible uses of Align to Folder, check out this blog post.

Posted in Tips

Gap closing and genome finishing tools in Align to Reference and Assembler.

Automated algorithms can only take you so far with genome assembly. The final steps involved in finishing a genome always need manual intervention. MacVector’s various assembly editors have many tools to help finish genome sequencing projects, for example closing gaps, extending reference sequences and even automatically circularizing contigs. If you select reads and then right-click (or CTRL-click), you will see a context-sensitive menu with the following tools:

  • Export Consensus with/without Gaps
  • Align Selected Reads
  • Delete Selected Reads
  • Reset (unalign) Selected Reads
  • Export Selected Reads as FASTA/FASTQ
  • Select Matching Pairs – if you have aligned a set of paired-end reads, you can select individual read(s) and use this function to select the corresponding mate(s). This is particularly useful if you want to find pairs that will extend a contig and export them for further analysis/assembly.
  • Extend Reference with Selected Read – This is active if you have selected a single read that hangs over either end of a Reference sequence. This will extend the Reference in the appropriate direction using the sequence of the read.
  • Circularize Consensus – This is enabled if it detects direct repeats at the ends of a contig, and even tells you the length of the repeat it found. It will circularize the consensus and create a new circular sequence window with the repeat appropriately deleted.
  • Select Overlapping Reads Containing Selected Sequence – This is enabled if you select a short region in a read. All overlapping reads that contain that selected sequence will be selected. For paired reads you can then use Select Matching Pairs to select their mate, then Export Selected Reads as FASTQ/FASTA to export them to a file.
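The last workflow in the list (select reads containing a sequence, then add their mates) is simple to picture in code. The Python below is an illustrative sketch only, with several assumptions of our own: reads are held in a dictionary of name to sequence, mates are named with the common `/1` and `/2` suffixes, and a read counts as a hit if it contains the selection on either strand. MacVector's editors work on the aligned reads directly and may differ on all three points.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def reads_containing(reads, selection):
    """Names of reads whose sequence contains the selection on either strand."""
    wanted = selection.upper()
    wanted_rc = reverse_complement(wanted)
    return [
        name for name, seq in reads.items()
        if wanted in seq.upper() or wanted_rc in seq.upper()
    ]


def mate_name(name):
    """Return the mate's name, assuming '<id>/1' and '<id>/2' naming."""
    stem, _, suffix = name.rpartition("/")
    if not stem or suffix not in ("1", "2"):
        raise ValueError(f"not a paired read name: {name!r}")
    return f"{stem}/{'2' if suffix == '1' else '1'}"


def with_mates(names, reads):
    """Add the mate of every selected read, if that mate is present."""
    selected = set(names)
    selected.update(mate_name(n) for n in names if mate_name(n) in reads)
    return sorted(selected)


if __name__ == "__main__":
    reads = {
        "r1/1": "AAACCCGGG", "r1/2": "TTTTTTTT",
        "r2/1": "GGGTTTAAA", "r2/2": "CCCGGGTTT",
        "r3/1": "ACACACAC",
    }
    hits = reads_containing(reads, "CCCGG")
    print(hits)
    print(with_mates(hits, reads))
```

The sorted list from `with_mates` is what you would then write out as FASTA or FASTQ for further assembly.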

Not all tools are applicable or available in all editors, and some tools are only enabled when using paired-end reads. Here’s what’s available in each editor.

Align to Reference editor

  • Export Consensus with/without Gaps
  • Align Selected Reads
  • Delete Selected Reads
  • Reset (unalign) Selected Reads
  • Export Selected Reads as FASTA/FASTQ
  • Select Matching Pairs
  • Extend Reference with Selected Read
  • Select Overlapping Reads Containing Selected Sequence

Reference Contig editor

  • Export Consensus with/without Gaps
  • Export Selected Reads as FASTA/FASTQ
  • Select Matching Pairs
  • Select Overlapping Reads Containing Selected Sequence

De novo contig editor

  • Export Consensus with/without Gaps
  • Export Selected Reads as FASTA/FASTQ
  • Select Matching Pairs
  • Circularize Consensus

Read more about the various assembly tools in MacVector.

Posted in Tips

MacVector 17 Workshop at The Crick

Room: HR Training Room 01–2162. Floor: 1 
Date: 15 October 2019  From: 9:30 to 11:30

Now rescheduled – Date to be advised

Chris Lindley of MacVector, Inc. will be giving a training workshop for both novice and advanced users of MacVector at The Crick, reviewing both basic and advanced functions, in particular new tools introduced over the last few versions.

The format is very informal and participants are very much encouraged to direct the workshop towards areas of the most interest.

Laptops will be provided for users to work through examples and tutorials as they are demonstrated. Workbooks will also be provided for attendees to work through during the workshop and afterwards.

The intention is that all attendees will learn at least one new and useful tool or tip. The workshop is two hours, but Chris will be available in the room for further discussion until 13:00.

Please register for the workshop by emailing Chris (drop-ins on the day will be very welcome, but will not be guaranteed access to a laptop or a workbook).

See what MacVector can do for your lab.

Posted in Meetings

Migrating your Vector NTI sequence database to MacVector.

ThermoFisher (owners of Invitrogen) have announced that Vector NTI Express is nearing the end of its life, and Vector NTI Advance was terminated quite some time ago. If you are looking for an easy to use sequence analysis application, then look for a reliable and trusted application. MacVector is easy to use, has a comprehensive set of tools and is definitely not going away! MacVector has been the tried and trusted sequence analysis application on the Mac for over 20 years. There are many thousands of happy molecular biologists using MacVector in labs all over the world. Don’t take our word for it, read what our users have to say. What’s more, opening Vector NTI files is straightforward.

If you are using Vector NTI Advance 11 or Vector NTI Express

  • Download the Mac or Windows Vector NTI Data Export Tool from ThermoFisher’s website.
  • Run this and migrate your entire sequence database to Genbank format.
  • To open a sequence, simply double-click the Genbank file and it opens directly within MacVector.
  • When you make changes and save, MacVector will automatically migrate the data into its own NUCL format.
  • You can optionally batch process all your files into MacVector format using an AppleScript that is supplied within the MacVector application folder.

If you are using Vector NTI Advance 10 or earlier

  • Open MacVector
  • Select the Database | Vector NTI Import… menu item
  • Click on the Choose button to locate the Vector NTI database folder on your Mac.
  • MacVector will display a list of all of the sequences available in the database. There is a popup menu to toggle between Nucleic Acid and Protein sequences. The list can be sorted to more easily identify sequence(s) of interest.
  • Select one or more sequences and then click either the To Desktop button to open those sequences in MacVector, or To Disk to save the sequences in MacVector format to a folder on your hard drive.

Sequence annotation

MacVector will read all of the standard features and annotations associated with each sequence. Graphical appearance information is discarded and the highly customizable MacVector graphical features are used instead. If you prefer your sequences to look different, it is easy to curate all graphical features using MacVector’s Auto Annotation tool.

MacVector ignores any restriction enzyme sites annotated in the database sequence and replaces them with the default dynamic set of sites used by MacVector’s RE Picker. However, all sequence features and related information are preserved. MacVector follows the Genbank format for the features table and is always kept up to date with the latest Genbank release, so where possible MacVector will also migrate any old and deprecated feature information contained in the Vector NTI file into the current features nomenclature.

You will always get a good discount for upgrading to MacVector from Vector NTI.

Your sequences remain yours

The MacVector team strongly believe that you should never be locked out of your data. Your data is yours! Even if you have no MacVector license, you can download MacVector Free and export your data. All versions of MacVector, including MacVector Free, have the tools to migrate sequences in Genbank format.

Posted in General, Tips

Identifying transposon insertion sites from multiplexed NGS data

Transposon mutagenesis is a common approach for investigating gene function in bacterial genomes: you select for clones in which a transposon insertion into the genome has generated a specific phenotype. You can then simply sequence the entire genome of each clone by NGS to identify the transposon insertion site. To lower the cost of such experiments, it is common to pool several individual genomes into each NGS sample and then run appropriate sequence analysis to identify the genes disrupted by the transposition events.

There is a new Transposon Insertion Analysis Tutorial that describes how to perform this analysis using MacVector with Assembler. To follow along, you can download sample data. The basic strategy is to use MacVector’s Align to Folder functionality to pull out all pairs of reads that contain transposon sequences, then align those to the genome to identify the end points of the transposon insertion site.
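The logic of that strategy can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the tutorial's procedure: it uses exact string matching instead of alignment, looks at one strand and one transposon end only, and ignores read pairing. The function name `insertion_sites`, the `min_flank` cutoff and the example sequences are all hypothetical.

```python
def insertion_sites(reads, genome, tn_end, min_flank=12):
    """Find genome positions where transposon/genome junction reads land.

    For each read containing tn_end (the last bases of the transposon),
    the bases following it are taken as genomic flank and located in the
    genome by exact match. Returns sorted 1-based positions of the first
    genomic base after the junction.
    """
    genome = genome.upper()
    tn_end = tn_end.upper()
    sites = set()
    for read in reads:
        read = read.upper()
        idx = read.find(tn_end)
        if idx == -1:
            continue  # read does not span the transposon end
        flank = read[idx + len(tn_end):]
        if len(flank) < min_flank:
            continue  # too little genomic sequence to place reliably
        pos = genome.find(flank)
        if pos != -1:
            sites.add(pos + 1)
    return sorted(sites)


if __name__ == "__main__":
    genome = ("ATGCGTACGTTAGCCTAGGATCCGATTACG"
              "GCTAAGCTTGCATGCCTGCAGGTCGACTCT")
    tn_end = "GGTTGAGATGTG"  # made-up transposon end
    junction_read = "CCCC" + tn_end + genome[30:50]
    print(insertion_sites([junction_read, "ACGT" * 10], genome, tn_end))
```

With pooled samples, each distinct position in the returned list would correspond to a candidate insertion in one of the pooled genomes.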

The tutorial goes into detail, describing several approaches you can use to identify the insertion locations, along with shortcuts and suggestions on how to rapidly annotate the insertion sites on the complete genome. While the tutorial does use MacVector with Assembler for parts of the analysis, you can actually accomplish the same end result using plain MacVector.

Posted in Techniques, Tutorials

Human Transcriptome RNA-Seq Analysis Using MacVector

With MacVector Pro and Assembler you can use Bowtie to perform RNA-Seq analyses using NGS data. The interface even has specialized output tabs listing the coverage information and statistics for each annotated CDS and gene feature on the genome. There is an example tutorial in the /Applications/MacVector/Documentation folder called “RNASeq Expression Analysis Tutorial.pdf” that illustrates the analysis using a small (1.6 Mbp) prokaryotic genome.

What surprises many people is that the combination of MacVector and modest Macintosh hardware can actually perform this analysis on the human genome. There are limitations to this – it’s not currently practical to do this with the entire genome due to memory and processing constraints, but it is possible to run an analysis against the known Human Transcriptome. The latest version of this can be downloaded from the GENCODE database. There is a new RNA-Seq Human Transcriptome Analysis Tutorial that describes the basic procedure in detail, along with some sample data that can be downloaded. The end result is a table that can be copied and pasted into Microsoft Excel for additional analysis.

Posted in Techniques, Tutorials

Use a right-click in the Editor tab to see if your contig can be circularized

MacVector incorporates no fewer than THREE different de novo assemblers: phrap, velvet and SPAdes. While all are great assemblers, each with its own specific advantages, none of them will generate a circular sequence from input reads. However, MacVector also includes a tool to help you with this. If you are assembling reads representing plasmid sequences, or if you are closing gaps in a circular genome, you can find out if a contig can be circularized by double-clicking on it in the Assembly Project and then right-clicking* in the Contig Editor to bring up a context-sensitive menu.

The algorithm looks for a perfect overlap of at least 20 bases between the two ends of the contig. If no overlap exists, the menu item is greyed out and reads “Cannot Circularize Consensus”. Otherwise it indicates the length of the overlap. If you select the menu item, a new sequence window opens containing the circularized consensus of the contig, with all gaps removed.
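The end-overlap test described above is easy to express in code. The Python below is a minimal sketch under our own assumptions, not MacVector's implementation: gaps are written as `-`, the longest qualifying overlap is used, and the overlap is limited to half the contig length so the two ends cannot run into each other.

```python
def find_end_overlap(seq, min_overlap=20):
    """Length of the longest perfect overlap between the two ends of seq.

    Returns 0 when no overlap of at least min_overlap bases exists.
    """
    for k in range(len(seq) // 2, min_overlap - 1, -1):
        if seq[:k] == seq[-k:]:
            return k
    return 0


def circularize(consensus, min_overlap=20):
    """Return the circularized sequence, or None if the ends do not overlap.

    Gap characters are removed first, then one copy of the terminal
    repeat is dropped so the repeat appears only once in the circle.
    """
    seq = consensus.replace("-", "").upper()
    k = find_end_overlap(seq, min_overlap)
    if k == 0:
        return None  # the "Cannot Circularize Consensus" case
    return seq[:-k]


if __name__ == "__main__":
    repeat = "ACGTTGCAAGCTTAGGCTAACGTGA"            # 25 bases
    middle = "TTTTCCCCGGGGAAAATTTTCCCCGGGGAAAA"     # 32 bases
    contig = repeat + "--" + middle + repeat
    print(find_end_overlap(contig.replace("-", "")))
    print(circularize(contig))
```

The returned string should be treated as circular, with the last base joining back to the first.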

*To right-click with a trackpad, hold down [CTRL] and click once, or tap with two fingers. MacVector has many “right click” menus with extra functionality.

Not sure if you have Assembler? Choose MacVector | About MacVector. If the screen that appears says “MacVector with Assembler, Pro Edition” then you have it. If not, you can sign up for a fully functional 21-day trial version.

Posted in Techniques, Tips