General musings from the MacVector team about sequence analysis, molecular biology, the Mac in general and of course your favorite sequence analysis app for the Mac!

Using the Primer Database to store your lab’s collection of primers

MacVector’s Primer Database lets you save primers to, and retrieve them from, a database of primers using the Primer3 and QuickTest Primer tools. You can also scan sequences for potential primer binding sites using Primer Database Search. The tool comes with a starter database of primers, and you can also add primers from existing subsequence files or import them from Excel.


To save a primer from QuickTest Primer:

  • Open ANALYZE > QUICKTEST PRIMER (INDIVIDUAL)
  • Design your primer
  • Click ADD TO…
  • Give the primer a name and add a comment. Click OK
To save a primer from Primer Design:

  • Open ANALYZE > PRIMER DESIGN/TEST (PAIRS)
  • Design your primer pair
  • In the spreadsheet, right-click on a primer
  • Choose ADD PRIMER TO DATABASE
  • Give the primer a name and add a comment. Click OK
To use the Primer Database Search:

  • Open your sequence
  • Select ANALYZE > PRIMER DATABASE SEARCH
  • Choose parameters and click OK
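To illustrate what a binding-site scan involves, here is a minimal bash sketch that reports exact matches of a primer on either strand of a sequence. It is only an illustration and not how Primer Database Search works. The function name `primer_sites` is hypothetical, the sketch ignores mismatches and ambiguity codes, and it assumes plain A/C/G/T input.

```bash
#!/usr/bin/env bash
# primer_sites PRIMER SEQUENCE   (hypothetical helper, exact matches only)
# Prints START..END and a strand marker for every hit:
#   +  the primer sequence itself occurs in SEQUENCE (it anneals to the other strand)
#   -  its reverse complement occurs, i.e. the primer anneals to SEQUENCE
primer_sites() {
  awk -v p="$1" -v seq="$2" 'BEGIN {
    comp["A"] = "T"; comp["C"] = "G"; comp["G"] = "C"; comp["T"] = "A"
    p = toupper(p); seq = toupper(seq); w = length(p)
    # Build the reverse complement of the primer.
    rc = ""
    for (i = w; i >= 1; i--) { b = substr(p, i, 1); rc = rc ((b in comp) ? comp[b] : "N") }
    # Slide a primer-sized window along the sequence and compare.
    for (i = 1; i + w - 1 <= length(seq); i++) {
      win = substr(seq, i, w)
      if (win == p)  printf "%d..%d +\n", i, i + w - 1
      if (win == rc) printf "%d..%d -\n", i, i + w - 1
    }
  }'
}
```

For example, `primer_sites GATTC AAGATTCAAGAATCAA` prints `3..7 +` and `10..14 -`. A real search would also need to tolerate mismatches, which is where most of the work lies.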
Posted in Uncategorized

    Viewing external database entries for features in a sequence.

    Sequences, or regions of sequences, can be linked to entries in external databases. The link might cover an entire sequence entry, or it might be added when annotation tools annotate proteins with domain or motif information (e.g. InterProScan, available in MacVector 15). This is very useful when you want to view more detailed or more up-to-date information. Within the GenBank specification, which MacVector uses extensively, an external database entry can be stored in a /db_xref (DB_XREF) qualifier, so that the entry can easily be viewed. The GenBank (and GenPept) specifications allow cross-references to many different databases through this qualifier.


    In MacVector, the original database entry can easily be viewed in a web browser: select the feature entry in the Features tab, right-click it and look through the available DB_XREF entries. Selecting one loads it in your web browser.
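    The /db_xref values are ordinary qualifier lines in a GenBank flat file, so they are also easy to extract outside MacVector. Below is a small bash sketch. It assumes a standard GenBank text file where each /db_xref qualifier sits on a single line. The helper names `list_db_xrefs` and `split_xref` are hypothetical.

```bash
#!/usr/bin/env bash
# list_db_xrefs FILE   (hypothetical helper)
# Prints the value of every /db_xref="..." qualifier in a GenBank flat file.
list_db_xrefs() {
  sed -n 's/^ *\/db_xref="\([^"]*\)".*$/\1/p' "$1"
}

# split_xref DB:ID   prints the database name and the identifier separately.
split_xref() {
  printf '%s %s\n' "${1%%:*}" "${1#*:}"
}
```

    Running `list_db_xrefs my_sequence.gb` prints one value per line, such as `GeneID:12345` (an illustrative value). Running `split_xref GeneID:12345` prints `GeneID 12345`.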


    Posted in Tips

    Happy Holidays from all of us at MacVector…!


    We wish you a happy, healthy and rewarding New Year!

    Posted in Uncategorized

    Optimizing the Reverse Translation function

    The Analyze | Reverse Translation menu option lets you create a DNA sequence from a protein sequence, reverse translated using a specific Genetic Code (by default, the Universal Genetic Code). The default option creates a DNA sequence with N’s and other ambiguity codes reflecting the degeneracy of the genetic code. This is great if you want to identify less ambiguous sections for designing probes or primers. In fact, MacVector will even display a list of the probes with the fewest ambiguities.
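    One simple way to build such an ambiguous codon works position by position. At each of the three codon positions, collect the bases used by the synonymous codons and write that set as a single IUPAC code. The bash sketch below does this. The name `degenerate_codon` is hypothetical, and MacVector does not necessarily use this method. A position-by-position union can over-generalize: the six leucine codons give YTN, which also matches the phenylalanine codons TTT and TTC.

```bash
#!/usr/bin/env bash
# degenerate_codon CODON...   (hypothetical helper)
# Collapses synonymous codons into one IUPAC-coded codon, position by position.
degenerate_codon() {
  printf '%s\n' "$@" | awk '
    # Record which bases occur at each of the three codon positions.
    { for (p = 1; p <= 3; p++) seen[p, toupper(substr($0, p, 1))] = 1 }
    END {
      # IUPAC code indexed by bit mask A=1, C=2, G=4, T=8 (mask 1..15).
      codes = "ACMGRSVTWYHKDBN"
      out = ""
      for (p = 1; p <= 3; p++) {
        mask = 0
        if ((p, "A") in seen) mask += 1
        if ((p, "C") in seen) mask += 2
        if ((p, "G") in seen) mask += 4
        if ((p, "T") in seen) mask += 8
        out = out substr(codes, mask, 1)
      }
      print out
    }'
}
```

    For example, `degenerate_codon CTT CTC CTA CTG TTA TTG` prints `YTN`, and `degenerate_codon GAA GAG` prints `GAR`.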

    However, MacVector also offers an optimization function if you are interested in designing a gene with codon usage optimized for expression in a particular organism.


    To use this function, you need to supply a codon usage table. A number of common tables are shipped with MacVector in the /Applications/MacVector/Codon Bias Tables/ directory. MacVector provides four different algorithms for optimizing codon usage:

  • Most Frequently Used Codon – this simply uses the most commonly occurring codon for each amino acid. For example, if the most common Leu codon is CTC, every Leu will be encoded by CTC. This is probably only useful if you want to design a “best guess” primer and are willing to accept a certain failure rate. If you used it to optimize expression, the host would likely run out of the corresponding tRNA and you wouldn’t see optimal expression.
  • Frequency Distribution – this selects a random codon for each amino acid, weighted towards the most commonly used codons for that amino acid. Each time you run the algorithm, a different, random set of codons will be selected. If you generated new DNA sequences over and over again, the average codon usage across the whole collection would approach that of the .bias organism. However, any individual reverse translation may, by chance, be quite different.
  • Probability Distribution – this is probably the most powerful setting if you are interested in expression. Like Frequency Distribution, it chooses a random codon, biased towards the most frequently used codons for each amino acid. However, this version tries to ensure that the codon usage profile of the final DNA sequence matches that of the selected .bias file as closely as possible. Because the overall codon usage in the DNA sequence is guaranteed to be as close as possible to that of the .bias organism, this should, in theory, give you the best chance of high expression. Again, you will get a different sequence each time you invoke this.
  • Uniform Distribution – this ignores codon usage and randomly assigns an appropriate codon for each amino acid. It’s similar to the default algorithm that uses ambiguity codes to create an “absolute” coding DNA, except that here it simply chooses a random codon with no regard for codon usage probability. Again, you will get a different sequence each time you invoke this.
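    To make the differences between the four options concrete, here is a bash/awk sketch of each one. It is an illustration under stated assumptions, not MacVector's implementation:

  • The codon table is a hypothetical three-column text file (`AA CODON WEIGHT`, with one-letter amino acid codes). It is not the .bias format.
  • The `prob` mode uses one possible approach. It fixes each codon's count in proportion to its usage, using largest-remainder rounding, and then shuffles those codons among the positions.
  • The optional seed makes runs repeatable. Change it to get a different sequence.

```bash
#!/usr/bin/env bash
# reverse_translate MODE TABLE PROTEIN [SEED]   (hypothetical helper)
#   MODE  : most | freq | prob | uniform
#   TABLE : lines "AA CODON WEIGHT", e.g. "L CTG 5" (assumed layout, NOT a
#           MacVector .bias file); weights may be fractions or raw counts
reverse_translate() {
  local mode=$1 table=$2 protein=$3 seed=${4:-1}
  case $mode in
    most|freq|prob|uniform) ;;
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
  awk -v mode="$mode" -v prot="$protein" -v seed="$seed" '
    # Load every synonymous codon and its weight, per amino acid.
    NF == 3 { n[$1]++; codon[$1, n[$1]] = $2; w[$1, n[$1]] = $3 + 0; total[$1] += $3 }
    END {
      srand(seed)
      L = length(prot)
      for (i = 1; i <= L; i++) {
        aa = substr(prot, i, 1)
        if (!(aa in n)) { print "no codons for " aa > "/dev/stderr"; exit 1 }
        cnt[aa]++; pos[aa, cnt[aa]] = i        # remember where each residue occurs
      }
      for (i = 1; i <= L; i++) {
        aa = substr(prot, i, 1)
        if (mode == "most") {                  # always the highest-weight codon
          pick = 1
          for (k = 2; k <= n[aa]; k++) if (w[aa, k] > w[aa, pick]) pick = k
        } else if (mode == "uniform") {        # weights ignored
          pick = int(rand() * n[aa]) + 1
        } else if (mode == "freq") {           # independent weighted draw
          r = rand() * total[aa]; pick = n[aa]
          for (k = 1; k <= n[aa]; k++) { r -= w[aa, k]; if (r < 0) { pick = k; break } }
        } else {
          continue                             # prob: assigned per amino acid below
        }
        choice[i] = pick
      }
      if (mode == "prob") {
        for (aa in cnt) {
          # Target count per codon, rounded by largest remainder so that the
          # counts add up to the number of times this residue occurs.
          assigned = 0
          for (k = 1; k <= n[aa]; k++) {
            q = cnt[aa] * w[aa, k] / total[aa]
            quota[k] = int(q); rem[k] = q - quota[k]; assigned += quota[k]
          }
          while (assigned < cnt[aa]) {
            best = 1
            for (k = 2; k <= n[aa]; k++) if (rem[k] > rem[best]) best = k
            quota[best]++; rem[best] = -1; assigned++
          }
          # Fill a bag with exactly those codons, shuffle it (Fisher-Yates)
          # and hand the codons out to the positions of this residue.
          m = 0
          for (k = 1; k <= n[aa]; k++) for (j = 1; j <= quota[k]; j++) bag[++m] = k
          for (j = m; j > 1; j--) { r = int(rand() * j) + 1; t = bag[j]; bag[j] = bag[r]; bag[r] = t }
          for (j = 1; j <= m; j++) choice[pos[aa, j]] = bag[j]
        }
      }
      out = ""
      for (i = 1; i <= L; i++) out = out codon[substr(prot, i, 1), choice[i]]
      print out
    }' "$table"
}
```

    Take a table containing `M ATG 1`, `L CTG 5`, `L CTC 2`, `L TTA 3`, `K AAA 7` and `K AAG 3`. With it, `reverse_translate most table.txt MLK` prints `ATGCTGAAA`. By contrast, `reverse_translate prob table.txt LLLLLLLLLL` always uses CTG five times, TTA three times and CTC twice, in a shuffled order.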
    Posted in Tips

    Tear-off Result Window Tabs: make viewing results easier.

    All analysis results for an individual sequence are collected into a single tabbed result window to reduce window clutter. However, there are times when it is very convenient to have results displayed in side-by-side windows. For example, when you run a dot plot, you can zoom in on part of the comparison by drag-selecting over a region of interest in the Matrix Plot tab. The Aligned Sequence tab then updates to display only the text alignments across the new selection. Constantly toggling between the tabs to drill down to the region you are interested in (e.g. a potential splice site in a genome versus cDNA alignment) can be very frustrating.

    The fix is simple: click on the title of a tab, hold down the mouse button, then drag the tab away from the parent window. When you release the mouse button, a new window is created containing just that tab.

    Not only that, you can organize the tabs into multiple windows if you like. If you drag a tab from one window and drop it onto the tab bar of another window (this only works on the tab bar, you can’t drop on the content region of a window), then the tab will be added to the target window.

    Give it a try and get your result windows under control!

    Posted in Tips

    IMPORTANT SERVICE ANNOUNCEMENT: Entrez and BLAST services will stop working in November for MacVector 15.0.3 and earlier.

    UPDATE – November 8, 2016 – Although the official switch off date is not until Wednesday the 9th, Entrez and BLAST are NOT currently working from MacVector 15.0.3 and earlier. We suspect this change is now permanent.

    UPDATE – November 7, 2016 – We have just been notified by the NCBI that they will be making the server changes that will prevent Entrez and BLAST from working on any version from MacVector 15.0 and earlier this Wednesday, 9th November, NOT the 1st of December as we were previously told.

    The NCBI hosts one of the world’s definitive sequence repositories, and MacVector gives you direct access to sequences from these databases. Unfortunately, because of recently announced major infrastructure changes at the NCBI, MacVector’s ability to access these services will be severely impacted from December of this year. These changes are completely out of our control; however, our developers have been hard at work on a solution. We are pleased to announce that we are about to release a new version that restores this functionality: MacVector 15.1.

    ENTREZ AND BLAST

    MacVector gives you access to the NCBI’s Entrez database. This lets you search for sequences and retrieve them directly to your desktop. Both simple and complex queries are allowed. For example, you can search for all human kinases or for specific accession numbers.

    (Image: Internet Entrez Browser)

    MacVector also lets you run BLAST searches directly from your desktop against the NCBI’s BLAST databases. For example, you can submit a protein or gene sequence against published sequence databases, or even a reverse translated protein against DNA databases. You can very easily retrieve any hits directly to your desktop, as well as view the alignments.

    Both of these tools directly access the NCBI’s databases via services that the NCBI offer.

    NCBI INFRASTRUCTURE CHANGES

    From September to December the NCBI are making two major infrastructure changes that will impact MacVector’s use of these two services.

    HTTP to HTTPS

    In September the NCBI are moving all their web servers to HTTPS only and in December they are switching off the HTTP web servers.

    These changes are part of a US government-wide initiative to ensure secure access to websites for all. HTTP is insecure, whereas HTTPS encrypts the traffic between your computer and the server and is much safer. The US government website explains this further.

    Accession numbers, GI numbers and versioning

    From September onwards, the NCBI are gradually changing the way sequences are referenced, and therefore retrieved. Previously, every sequence was assigned a GI number on submission (since 1994) as well as an accession number. An accession number never changes, but the GI number changed with each new version of a submitted sequence. From September 2016, new sequence submissions will instead be identified by a single identifier: the accession number and version combined (ACCESSION.VERSION). The redundant GI number will no longer be assigned, although it will be a long time before older sequences have theirs replaced. Because the accession number and version are always read together, you now have a simpler way of referencing a sequence and, more importantly, a human-readable way of determining its version.
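    Because the version is part of the identifier, plain string handling is enough to check whether two identifiers refer to the same record and which one is newer. Here is a bash sketch. `compare_versions` is a hypothetical helper, and the identifiers in the example are made up.

```bash
#!/usr/bin/env bash
# compare_versions ACCESSION.VERSION ACCESSION.VERSION   (hypothetical helper)
compare_versions() {
  local a=$1 b=$2 id
  for id in "$a" "$b"; do
    case $id in
      *.*) ;;
      *) echo "no version in $id" >&2; return 2 ;;
    esac
  done
  # The accession (before the dot) identifies the record and never changes.
  if [ "${a%.*}" != "${b%.*}" ]; then
    echo "different records"; return 1
  fi
  # The version (after the dot) goes up each time the sequence is revised.
  local va=${a##*.} vb=${b##*.}
  if [ "$va" -gt "$vb" ]; then echo "$a is newer"
  elif [ "$va" -lt "$vb" ]; then echo "$b is newer"
  else echo "same version"
  fi
}
```

    For example, `compare_versions XX000001.2 XX000001.3` prints `XX000001.3 is newer`.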

    NCBI Toolkit and Entrez Programming Utilities.

    The NCBI offer two toolkits for accessing these services from within a software application: the NCBI Toolkit and the Entrez Programming Utilities (E-utilities). The NCBI have released a new version of E-utilities that supports HTTPS connections and ACCESSION.VERSION numbering. However, the NCBI Toolkit is not going to be updated. MacVector 15.0.3 uses the NCBI Toolkit, which unfortunately means that from December MacVector 15.0.3 will no longer be able to access Entrez or BLAST.
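    To show what E-utilities requests look like, the sketch below builds an HTTPS efetch URL for an ACCESSION.VERSION identifier and downloads the record with curl. The eutils base URL and the db, id, rettype and retmode parameters are part of NCBI's documented E-utilities interface. The helper names are hypothetical, and this is not how MacVector 15.1 is implemented. NCBI also asks that scripts limit how frequently they send requests.

```bash
#!/usr/bin/env bash
# Base URL of NCBI E-utilities (HTTPS only).
EUTILS="https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# efetch_url DB ID [RETTYPE]   (hypothetical helper)
#   DB: an Entrez database such as nuccore or protein
#   ID: an ACCESSION.VERSION identifier
#   RETTYPE: fasta (default), or gb for GenBank flat file text from nuccore
efetch_url() {
  printf '%s/efetch.fcgi?db=%s&id=%s&rettype=%s&retmode=text\n' \
    "$EUTILS" "$1" "$2" "${3:-fasta}"
}

# fetch_record DB ID [RETTYPE]: download the record; -f makes curl fail on HTTP errors.
fetch_record() {
  curl -fsS "$(efetch_url "$@")"
}
```

    `efetch_url nuccore ACCESSION.VERSION` only prints the URL. `fetch_record nuccore ACCESSION.VERSION gb` downloads the GenBank text and requires network access. Replace the placeholder with a real identifier before running either command.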

    MacVector 15.1

    This change was announced at short notice, and since the announcement our developers have been working extremely hard to migrate MacVector to E-utilities.

    We’re pleased to announce that MacVector 15.1, which contains a whole new implementation of the BLAST and Entrez tools, will be released very shortly, a few months before the final switch-off of these services.

    To avoid any interruption in service, please ensure that you have downloaded and installed MacVector 15.1 before the start of December.

    MacVector 15.5 and beyond

    Although our hand has been forced somewhat by the release of MacVector 15.1, we are still planning to release MacVector 15.5. Since the BLAST and Entrez tools have now been rewritten, expect to see even more enhancements to these tools in future releases.

    Posted in Releases

    Highlighting sequence using color and lower case in the Editor tab

    You can very quickly annotate a region of interest in your sequence in the Editor tab, for example by showing introns in lower case or by highlighting CDS features with a colored background.


    Using the TRANSFORMATIONS menu

    To enter sequences in mixed case:

  • Enable Edit | Transformations | Enable Mixed Case Entry
  • Type your sequence using SHIFT or CAPS LOCK
    To color a region:

  • Select the region you want to highlight.
  • Choose Edit | Transformations | Color.
    To change the case of a region:

  • Select the region you want to change to lower case.
  • Choose Edit | Transformations | Make Lower Case.
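    The same lower-case convention is also easy to apply to plain sequence text outside MacVector, for example to mark an intron. Below is a minimal bash sketch. `lowercase_region` is a hypothetical helper that takes 1-based, inclusive coordinates.

```bash
#!/usr/bin/env bash
# lowercase_region SEQUENCE START END   (hypothetical helper, 1-based, inclusive)
lowercase_region() {
  awk -v s="$1" -v a="$2" -v b="$3" 'BEGIN {
    # Keep the flanks as they are and lower-case only START..END.
    print substr(s, 1, a - 1) tolower(substr(s, a, b - a + 1)) substr(s, b + 1)
  }'
}
```

    For example, `lowercase_region ATGGTAAGTCCAG 4 9` prints `ATGgtaagtCCAG`.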
    Posted in Tips

    Estimating insert length quickly for a read pair

    Insert length is the length of the sequence between the two reads of a pair. Sequencers are supplied with DNA samples as fragments of a known length, and each end is sequenced (generally in a 5′ to 3′ direction from both ends).

    For example, if your fragments are 2 kbp and your reads average 500 bp, the insert length will be around 1 kbp. Insert length is determined by the protocol used to prepare samples for sequencing, so you should know the insert length (and orientation) of your sequencing data.
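    As arithmetic, the insert length is the fragment length minus both read lengths: 2,000 - 500 - 500 = 1,000 bp in the example above. You may record the start and end coordinates of both mates for several pairs, for instance from the Alignment Editor as described below. If so, a short bash/awk sketch can average them. The input layout (`start1 end1 start2 end2` per line, 1-based and inclusive) and the name `mean_insert` are assumptions for illustration. Overlapping mates give a negative insert length.

```bash
#!/usr/bin/env bash
# mean_insert FILE   (hypothetical helper)
# FILE: one read pair per line, "start1 end1 start2 end2" (1-based, inclusive
# coordinates of each mate on the reference).
mean_insert() {
  awk '
    NF == 4 {
      left  = ($1 < $3) ? $1 : $3
      right = ($2 > $4) ? $2 : $4
      span  = right - left + 1                       # what a selection covering both reads measures
      gap   = span - ($2 - $1 + 1) - ($4 - $3 + 1)   # bases between the two reads
      sum_span += span; sum_gap += gap; pairs++
    }
    END {
      if (pairs == 0) { print "no pairs" > "/dev/stderr"; exit 1 }
      printf "pairs=%d mean_span=%.1f mean_insert=%.1f\n", pairs, sum_span / pairs, sum_gap / pairs
    }' "$1"
}
```

    Consider a file containing the lines `1 500 1501 2000` and `201 700 1611 2110`. For it, `mean_insert` prints `pairs=2 mean_span=1955.0 mean_insert=955.0`.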

    However, you may not know it, or perhaps you only have a wide range and want to see whether more stringent values would improve your assembly. Velvet will estimate this for you, but Bowtie does not.

    Here’s a quick way to estimate the insert length using MacVector and Assembler.

  • Choose a published sequence relevant to your dataset, e.g. a reference sequence or a gene. If no such sequence exists, take a contig from any assembly you have already done.
  • Align this sequence against your dataset of paired reads using Align to Folder. If your read files are large, this may take some time (*consider extracting the first few hundred pairs, as described below).
  • Select a subset of hits in the results (fewer read pairs will be less accurate, but a lot quicker) and save them as FASTQ using DATABASE | RETRIEVE TO FILE.
    Now repeat the alignment with these extracted hits.

  • Align these extracted reads using Align to Reference.
  • Once aligned, toggle the SORT menu so that the reads are in alphabetical order. As long as your read pairs are named according to the usual conventions, the two mates of each pair will sit next to each other, for example “SLXA-EAS1_89:1:100:858:113/1” and “SLXA-EAS1_89:1:100:858:113/2” in the example shown.

    Now measure the insert length from a few aligned pairs.

  • Scroll down the list until you reach the first pair. Select both reads and check the length from the selection.
  • Repeat with a few pairs until you are happy with the estimated length.
    (Image: Ecoli K12 1000bp Alignment Editor)

    *If you are comfortable working from the command line, a quick way to extract 100 pairs is to use “head”.

    A read in a FASTQ file is generally composed of four lines: the header (starting with @), the sequence, a separator line starting with + (which may repeat the header) and the quality line. So extracting 4 × N lines gives you N reads.

    If your pairs are in two files, run the following to extract the first 100 reads from each file (100 pairs).

    head -n 400 Mate1.fastq > Mate1_100reads.fastq
    head -n 400 Mate2.fastq > Mate2_100reads.fastq
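    Because each record is exactly four lines, the same command works for any number of reads. The header lines can also be used to check that the two mate files are still in step. Below is a bash sketch with hypothetical helper names. It assumes unwrapped four-line records and read names that differ only by a trailing /1 or /2. Any description after the first space is ignored.

```bash
#!/usr/bin/env bash
# first_reads N FILE   (hypothetical helper): the first N four-line records.
first_reads() {
  head -n "$(( $1 * 4 ))" "$2"
}

# read_names FILE: header lines with any description after the first
# space and any trailing /1 or /2 removed, so mates get identical names.
read_names() {
  awk 'NR % 4 == 1 { sub(/[ \t].*$/, ""); sub(/\/[12]$/, ""); print }' "$1"
}

# mates_in_sync FILE1 FILE2: succeeds if both files list the same
# read names in the same order (holds the names in memory).
mates_in_sync() {
  [ "$(read_names "$1")" = "$(read_names "$2")" ]
}
```

    `first_reads 100 Mate1.fastq > Mate1_100reads.fastq` does the same as the `head -n 400` command above. Afterwards, `mates_in_sync Mate1_100reads.fastq Mate2_100reads.fastq && echo "pairs match"` checks the result.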
    Posted in Tips

    Restriction enzyme analysis in MacVector and REBASE

    There are two different ways to perform restriction enzyme analysis in MacVector, and restriction enzyme sites are also shown in several other places. All of these tools use the same set of restriction enzyme files to recognise enzyme sites. These files are updated regularly from the REBASE database.

    The restriction enzymes are divided into multiple files. The “Common Enzymes” file contains commonly used enzymes and is the default list for all tools. Other files group enzymes by supplier or contain all known enzymes. If none of these suits you, it is very easy to create your own set of enzymes, for example all the enzymes stored in your lab’s freezer drawer!

    You can use the same enzyme file for every tool or a different file for each, so each tool has its own way of selecting which set of enzymes to use. All tools also let you choose between SHOW ALL ENZYMES and ONLY USE SELECTED ENZYMES. If ONLY USE SELECTED ENZYMES is checked, click the OPEN button, make sure your required enzymes are selected, and save the file.

  • To show restriction enzyme cut sites dynamically in the Map view, choose MACVECTOR | PREFERENCES | MAP VIEW. Then click the SET ENZYME FILE button and make sure your file is selected. Because the sites are calculated dynamically, this display is turned off by default for sequences larger than 50 kb, although you can increase this limit.
  • For the ANALYZE | RESTRICTION ENZYME… tool, click the CHOOSE button in the initial dialog and make sure your file is selected. Again, if you are using ONLY USE SELECTED ENZYMES, click the OPEN button and make sure those enzymes are selected in the file.
  • The QuickTest Primer tool uses the same settings as the Automatic Restriction Enzymes setting in the MAP VIEW preferences. However, you also have the option to SHOW ALL ENZYMES or ONLY USE SELECTED ENZYMES.
  • And don’t forget that all restriction enzyme tools can optionally show “one out” sites. These are restriction enzyme sites that can be introduced with a single substitution. QuickTest Primer also shows whether each such mutation is silent, that is, whether introducing the site would change the translation.
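    The idea behind “one out” sites can be sketched as a scan for every window that differs from the recognition site at exactly one base. The bash sketch below makes simplifying assumptions and is not MacVector's algorithm:

  • `one_out_sites` is a hypothetical helper.
  • It only handles plain A/C/G/T recognition sequences.
  • It scans the given strand only, which is enough for palindromic sites such as GAATTC.
  • It does not check whether the change is silent.

```bash
#!/usr/bin/env bash
# one_out_sites SITE SEQUENCE   (hypothetical helper)
# Prints "POSITION OLD>NEW" for each single-base change that would create SITE.
one_out_sites() {
  awk -v site="$1" -v seq="$2" 'BEGIN {
    site = toupper(site); seq = toupper(seq); w = length(site)
    for (i = 1; i + w - 1 <= length(seq); i++) {
      diff = 0
      # Count mismatches in this window, giving up after the second one.
      for (j = 1; j <= w && diff <= 1; j++) {
        if (substr(seq, i + j - 1, 1) != substr(site, j, 1)) { diff++; at = j }
      }
      # Exactly one mismatch: a single substitution creates the site here.
      if (diff == 1) printf "%d %s>%s\n", i + at - 1, substr(seq, i + at - 1, 1), substr(site, at, 1)
    }
  }'
}
```

    For example, `one_out_sites GAATTC TTGAATTGTT` prints `8 G>C`: changing the G at position 8 to C creates an EcoRI site (GAATTC).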

    (Image: QuickTest Primer)

    Posted in Techniques

    Drag and drop to quickly annotate ORFs

    You can use the Analyze | Open Reading Frames function to find ORFs on a sequence very quickly. Did you know that you can just as quickly turn those results into permanent CDS features on your sequence? After running the Open Reading Frames analysis, simply drag the ORF objects you are interested in from the Results window and drop them onto the original sequence window. Here we are dropping the results into the Map tab, but it works with any of the tabs.

    Note that MacVector automatically fills out the /translation= qualifier for you with the predicted amino acid sequence. For optimum GenBank compliance, you should also manually add a /gene= qualifier with the preferred short name of the gene encoding your open reading frame, and a /product= qualifier with the full name of the product.
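    An ORF search is essentially a walk along each reading frame from a start codon to the next in-frame stop codon. The bash/awk sketch below makes these assumptions:

  • It uses the standard genetic code.
  • ATG is the only start codon.
  • It scans the forward strand only.
  • It applies a minimum length in codons.

    `find_orfs` is a hypothetical helper, and MacVector's Open Reading Frames tool has its own options. For each ORF, the sketch prints the frame, the coordinates (including the stop codon) and the translation. The translation is the kind of value that ends up in /translation=.

```bash
#!/usr/bin/env bash
# find_orfs SEQUENCE [MIN_CODONS]   (hypothetical helper)
# Forward strand only, ATG starts, standard genetic code.
find_orfs() {
  awk -v seq="$1" -v min="${2:-30}" 'BEGIN {
    # Standard code: amino acids for codons in TCAG order (TTT, TTC, TTA, ...).
    code = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
    idx["T"] = 0; idx["C"] = 1; idx["A"] = 2; idx["G"] = 3
    seq = toupper(seq); n = length(seq)
    for (f = 1; f <= 3; f++) {
      start = 0; prot = ""
      for (i = f; i + 2 <= n; i += 3) {
        c = substr(seq, i, 3)
        b1 = substr(c, 1, 1); b2 = substr(c, 2, 1); b3 = substr(c, 3, 1)
        if ((b1 in idx) && (b2 in idx) && (b3 in idx)) {
          aa = substr(code, 16 * idx[b1] + 4 * idx[b2] + idx[b3] + 1, 1)
        } else {
          aa = "X"                                  # ambiguous base: unknown residue
        }
        if (start == 0) {
          if (c == "ATG") { start = i; prot = "M" } # open a new ORF
        } else if (aa == "*") {
          # A stop codon closes the ORF; report it if it is long enough.
          if (length(prot) >= min + 0) printf "frame%d %d..%d %s\n", f, start, i + 2, prot
          start = 0; prot = ""
        } else {
          prot = prot aa
        }
      }
    }
  }'
}
```

    For example, `find_orfs CCATGAAATTTGGGTAACC 3` prints `frame3 3..17 MKFG`. ORFs that run off the end of the sequence without a stop codon are not reported.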

    Posted in Tips