General musings from the MacVector team about sequence analysis, molecular biology, the Mac in general and of course your favorite sequence analysis app for the Mac!

How to split large fastq files for more manageable assemblies

We’ve previously discussed how important it is to use an appropriate number of fastq reads from an NGS experiment to get the results you are looking for. Using too many reads can confuse assembly algorithms: with massive coverage, background errors in the reads lead to more mis-assemblies. In addition, large numbers of reads can significantly impact CPU performance, memory usage, and even disk usage. At MacVector we have coded a simple utility that splits large fastq files into smaller chunks. It’s completely free to download and should work on all versions of macOS/Mac OS X.

Download the SplitFastqFile utility

You run it by simply dropping fastq files onto the application and following the prompts. When complete, you’ll see the split files in a folder, with naming similar to this.

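If you would rather script the split yourself, the idea can be sketched in a few lines of Python. This is not the SplitFastqFile utility and makes no claim about how it works internally: the function name `split_fastq`, the `reads_per_chunk` parameter and the `_partNNN` output naming are all assumptions made for illustration, and the sketch assumes the usual four lines per fastq record.

```python
from itertools import islice
from pathlib import Path


def split_fastq(path, reads_per_chunk, out_dir):
    """Split a fastq file into chunks of at most reads_per_chunk reads.

    Assumes the common 4-lines-per-record layout (no wrapped sequence lines).
    The <stem>_partNNN.fastq naming is hypothetical, chosen for this sketch.
    """
    src = Path(path)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with src.open() as handle:
        part = 1
        while True:
            # Read the next block of complete records (4 lines each).
            lines = list(islice(handle, 4 * reads_per_chunk))
            if not lines:
                break
            if len(lines) % 4:
                raise ValueError("fastq file ends in a truncated record")
            target = out / f"{src.stem}_part{part:03d}.fastq"
            target.write_text("".join(lines))
            written.append(target)
            part += 1
    return written
```

For paired-end data in two separate files, calling the function on each file with the same `reads_per_chunk` keeps the mates in matching chunks, provided both files list the reads in the same order.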

Not sure if you have Assembler? Choose MacVector | About MacVector. If the screen that appears says “MacVector with Assembler, Pro Edition” then you have it. If not, you can sign up for a fully functional 21 day trial version.

Posted in Techniques, Tips

Balancing Velvet KMER and coverage

The Velvet assembly algorithm in MacVector is blazingly fast and generates excellent assemblies. However, you do have to be careful when assembling NGS data to be sure that the parameters you submit are appropriate for the data you are assembling in order to get optimal results. By far the most important parameter is the KMER value. If you are not getting good assemblies, this is the parameter you should change. Below are the results of varying the KMER value for an NGS assembly of a circular 8,859 bp plasmid using data acquired from an Illumina HiSeq machine. In this case, the data consisted of a pair of fastq files with a read length of 75 nt. The original files each contained 1,370,000 paired end reads. The table below shows the longest contig that resulted from using varying numbers of input reads, versus varying the KMER parameter in Velvet.

MacVector has a feature in the contig editor that simplifies circularization of contigs with overlapping direct repeats at the ends. All the contigs in black could be circularized to generate an 8,859 bp plasmid. Those in red were not full length, or could not be circularized.

  • First, note that Velvet (like most assemblers) does not like a massive over-abundance of coverage. If you submit too many reads, it confuses the algorithm and you have to be very careful with your choice of KMER to get a good assembly.

  • Second, note that the more reads you submit, the higher the KMER needs to be to generate a complete contig.

The take home lesson from this is that in general, you should tune the amount of data in your NGS set to be between 100x and 1,000x coverage as that gives the most flexibility in your choice of KMER. You should start with a KMER that is ~70% of the average length of your reads (it has to be odd, so 51 in this case), then vary the KMER to see what impact that has. This holds true for bacterial genome assemblies as well as simple plasmids like this. Next week we will discuss a tool to help you break up large NGS data files into smaller segments to facilitate this analysis.
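The arithmetic behind these rules of thumb is easy to script. The sketch below is only an illustration of the calculation described above; the function names are made up for this example and are not part of MacVector or Velvet.

```python
def coverage(total_reads, read_length, genome_length):
    """Average fold coverage: total sequenced bases / genome length."""
    return total_reads * read_length / genome_length


def reads_for_coverage(target_coverage, read_length, genome_length):
    """Number of reads (counting both mates) needed for a target coverage."""
    # Integer ceiling division, so we never fall just short of the target.
    return -(-(target_coverage * genome_length) // read_length)


def starting_kmer(read_length, fraction=0.7):
    """Largest odd value not exceeding fraction * read_length."""
    k = int(read_length * fraction)
    return k if k % 2 == 1 else k - 1


# The plasmid example: two files of 1,370,000 reads, 75 nt each, 8,859 bp.
full_coverage = coverage(2 * 1_370_000, 75, 8_859)
reads_needed = reads_for_coverage(1_000, 75, 8_859)
kmer = starting_kmer(75)
```

For the plasmid data set this gives roughly 23,000x coverage from the full files, far above the suggested 100x to 1,000x window, about 118,000 reads for 1,000x, and a starting KMER of 51.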


Posted in Tips

Use a right-click in the Editor tab to see if your contig can be circularized

MacVector 16 incorporates no fewer than three different de novo assemblers: phrap, Velvet and SPAdes. While all are great assemblers, each with its own specific advantages, none of them will generate a circular sequence from input reads. However, MacVector 16 also includes a new feature to help you with this. If you are assembling reads representing plasmid sequences, or if you are closing gaps in a circular genome, you can find out if a contig can be circularized by double-clicking on it in the Assembly Project and then right-clicking* in the Contig Editor to bring up a context-sensitive menu.


The algorithm looks for a perfect overlap between the ends of at least 20 bases. If no overlap exists, the menu item is greyed out and reads “Cannot Circularize Consensus”. Otherwise it indicates the length of the overlap. If you select the menu item, a new sequence window opens containing the circularized consensus of the contig, with all gaps removed.

*To right-click with a trackpad, hold down [CTRL] and click once, or tap with two fingers. MacVector has many “right click” menus with extra functionality.
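To make the overlap test concrete, here is a minimal Python sketch of the idea: look for a perfect match of at least 20 bases between the end and the start of a consensus, and if one exists, drop the duplicated copy. It is not MacVector's implementation. The function names, the choice to report the longest overlap, the limit of half the sequence length, and stripping `-` gap characters before the check are all assumptions made for this example.

```python
def end_overlap(seq, min_overlap=20):
    """Length of the longest perfect overlap between the end and the
    start of seq, or 0 if none reaches min_overlap.

    Assumption: overlaps longer than half the sequence are ignored.
    """
    seq = seq.upper()
    for n in range(len(seq) // 2, min_overlap - 1, -1):
        if seq[-n:] == seq[:n]:
            return n
    return 0


def circularize(consensus, min_overlap=20):
    """Return the circularized sequence, or None if the ends do not overlap.

    Gap characters are removed first (an assumption about ordering).
    """
    seq = consensus.replace("-", "")
    n = end_overlap(seq, min_overlap)
    if n == 0:
        return None
    # Keep one copy of the repeated end so the circle has no duplication.
    return seq[:-n]
```

A contig that ends with a repeat of its first 25 bases would be trimmed by 25 bases, while a contig with no end overlap returns `None`, the equivalent of the greyed-out menu item.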

Posted in Tips

Getting Started with MacVector

Any molecular biologist should be able to start getting useful results within an hour of starting to use MacVector by just exploring the toolbar and menus. Everything is logically named to be obvious. However, there’s still a lot of functionality that a user will never know exists. That is, unless you read the manual or the help pages, and we all know nobody does that any more!
This blog post shows all the different resources we have available to help you get the most out of MacVector.

But first, before you read anything, watch this video on how to directly download a sequence into MacVector from the Entrez database. It’s always better to learn with your own sequences!

Getting Started guide

The first document that any user should turn to is Getting Started with MacVector. This document is geared for both new users and those who already use MacVector and introduces you to the new functions and features in recent versions.

MacVector Help

For specific and up to date information turn to the MacVector Help directly within the MacVector Application itself.

Weekly tips email newsletter

We have a short weekly email with tips on using MacVector. Most of the tips are answers to questions that have recently been asked by users. We think that if someone has a question about it, then others probably do too! We will never forward your information on to a third party and you are able to easily unsubscribe from these emails.
Every email is meant to be read in a few minutes or quickly skipped if it is not relevant. They rarely have more than three short paragraphs.
Here are some past posts.

101 tips

These blog posts show short tips into MacVector’s functionality.

Screencasts

Sometimes a video is more instructional than text. There are short video tips on using MacVector on our blog and our YouTube channel.
These cover subjects as diverse as adding primers to the Primer database and aligning large sequencing read datasets against a genome. Each one is less than 2 minutes and generally shorter than that!
The latest screencasts include showing Missing Features on unannotated sequences, searching for sequences in the Entrez database, and checking the orientation of a ligated insert using MacVector’s Restriction Digest and Agarose Gel tools.

Workshops

We will come and run workshops. Contact MacVector Support to find out about workshops in your area.

Documentation

The documentation folder in the MacVector application folder (Finder window > Applications > MacVector > Documentation), as well as our website, contains many excellent resources:

Manuals

  • The MacVector 16.0 Workshop Manual is geared for those who already use MacVector and introduces you to the new functions and features in MV 10.0 through 16.0. It is still useful if you are not very familiar with any version of MacVector.
  • The latest version of the full MacVector User Guide, MV 12.6 (July 2012), is available in PDF format. We update the User Guide every couple of years.

Tutorials

In the documentation folder right in MacVector you will also find some PDF tutorials. All these tutorials work using sequence files that you will find directly in the “Tutorial Files” section of the MacVector application folder.

What’s New in MacVector

Here’s a list of all new features over the many versions.

Posted in Tips

The MacVector Team are at ASM Microbe 2018 in Atlanta this week.

We’re at ASM Microbe 2018 in Atlanta from Friday until Sunday (8–10th June).

The exhibit hall hours are again different from the last two years.

  • Friday, June 8th – 10:30 AM – 5:00 PM
  • Saturday, June 9th – 10:30 AM – 5:00 PM
  • Sunday, June 10th – 10:00 AM – 4:00 PM
  • We’re on booth 1410.

    Please do drop by. We’ll be showing our latest release, MacVector 16, and previews of our upcoming release MacVector 17.

    If you’ve never used MacVector before, or you are a power user, then please drop by and say hello. We guarantee we can teach you something new. If not, then hopefully you’ll be able to teach us something new! So it’s a win-win for all.

    See you in Atlanta!

    Kevin and Chris in Boston ASM2016

    The Twitter hashtag looks to be #ASMMICROBE2018.

    Posted in Meetings

    Workflows on designing, testing and storing primers in MacVector

    MacVector has many primer tools to make designing, analyzing and cataloging your primers easy. Here are a few typical workflows.

    Designing primers

    Amplifying a gene

    You can design a set of primers to amplify a gene in as few as three mouse clicks.

    1. Open your sequence.
    2. Open the MAP view, and click on a feature.
    3. Go to ANALYZE > PRIMER DESIGN/TEST(PAIRS).
    4. Click OK.

    You will get a ranked list of the best primer pairs to amplify that feature, along with a spreadsheet table that you can copy and send to your oligo synthesis service.

    Design a single targeted primer with a tail

    The QuickTest Primer tool gives great flexibility for designing primers with tails or mismatches. You can slide your primer along your template to find the optimal sequence. You can add mismatches and view their effect on any CDS features. You can add restriction sites too and view “one out” sites.

    1. Select a 20 bp region around the location you want your primer to be.
    2. Click ANALYZE > QUICKTEST PRIMER.
    3. Slide the primer along your template until the oligo is optimal.
    4. Add a restriction site or mutation in the optional tail.
    5. Hover over a green (or red) putative restriction site to view the change to your sequence.
    6. Double click on that putative site to add to your primer. Watch how the coding is changed.
    7. Click Add to Database… to save your primer.

    Testing Primers

    How do I test a pair of primers?

    You can map pairs of primers against your template to see how well they would amplify your target.

    1. Open your template sequence
    2. Go to ANALYZE | PRIMER DESIGN/TEST(PAIRS).
    3. For the left primer click USE THIS Primer
    4. Type or paste in your forward primer
    5. Repeat for the reverse primer
    6. Click TEST

    Storing primers in the Primer Database.

    MacVector’s Primer Database allows you to save and retrieve regularly used primers. You can also scan sequences for potential primer binding sites using Primer Database Search. The tool comes with a starter database of primers, but you can use existing subsequence files or import primers from Excel.

    To save a primer from QT Primer to the Primer Database.

    1. Open ANALYZE > QUICKTEST PRIMER (INDIVIDUAL)
    2. Design your primer
    3. Click ADD TO…
    4. Give the primer a name and add a comment. Click OK

    To save a primer from Primer Design to the Primer Database.

    1. Open ANALYZE > PRIMER DESIGN/TEST (PAIRS)
    2. Design your primer pair
    3. In the spreadsheet right click on a primer
    4. Choose ADD PRIMER TO DATABASE
    5. Give the primer a name and add a comment. Click OK

    To use the Primer Database Search:

    1. Open your sequence
    2. Select ANALYZE > PRIMER DATABASE SEARCH
    3. Choose parameters and click OK
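Conceptually, scanning a sequence for primer binding sites means looking for each stored primer on both strands of the template. The Python sketch below shows only the simplest version of that idea, exact matches with no mismatch tolerance or scoring, and should not be read as a description of how Primer Database Search works. The function names and the dictionary used as a stand-in for a primer database are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_binding_sites(template, primers):
    """Find exact primer matches on both strands of template.

    primers maps a primer name to its 5'-3' sequence. Returns a list of
    (name, strand, start) tuples with 0-based starts on the top strand.
    """
    template = template.upper()
    hits = []
    for name, primer in primers.items():
        primer = primer.upper()
        queries = (("+", primer), ("-", reverse_complement(primer)))
        for strand, query in queries:
            start = template.find(query)
            while start != -1:
                hits.append((name, strand, start))
                # Continue one base on, so overlapping sites are found too.
                start = template.find(query, start + 1)
    return hits
```

A real search would also need to allow mismatches near the 5' end and apply thresholds such as those offered in the search dialog.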

    Design a primer to match an existing primer from the primer database

    The Primer database allows you to store your own collection of primers. You can design new primers to match regularly used ones.

    1. Open your template sequence
    2. Switch to the Map tab and select the region you want to amplify.
    3. Go to ANALYZE | PRIMER DESIGN/TEST(PAIRS).
    4. For the left primer click USE THIS Primer
    5. Use the drop down menu to enter the existing primer from the primer database
    6. Click OK
    Posted in Techniques, Tips

    MacVector 16 and macOS High Sierra 10.13.4.

    Apple have just released macOS® High Sierra 10.13.4. This may appear to be just an incremental update to macOS High Sierra. However, under the hood there are some major changes, not least a new warning dialog that appears if you are running an older application on macOS High Sierra.

    With the release of macOS High Sierra 10.13.4, Apple have added warnings to all 32 bit applications, and it is possible that 32 bit applications will no longer work with the next macOS release. Apple have stated:

    "....macOS High Sierra would be the last version of macOS to run 32-bit apps without compromise." 

    We’re pleased to inform you that MacVector 16 is fully supported and compatible with macOS High Sierra 10.13.4.

    However, older versions of MacVector are not compatible with this OS release and you may see the following dialog when you start MacVector 13.5 or earlier.


    Here at MacVector we always strive to stay ahead of the game. When Apple recommended that all applications should be 64 bit, we immediately started the move to making MacVector a fully 64 bit application, culminating in the release of MacVector 14 in 2015. MacVector is a modern Mac application that takes full advantage of all the new technologies and ease of use of today’s macOS operating system.

    If you are using an older version of MacVector then see what you are missing. Why not download a trial version and request an upgrade quote?

    Posted in Tips

    Assemble bacterial genomes in minutes on your Mac laptop

    MacVector with Assembler contains some remarkably powerful algorithms for assembling Next Generation Sequencing (NGS) data. Not so long ago, you needed a powerful Linux server with lots of memory for de novo assembly of whole genomes. But with advances in the efficiency of algorithms and improvements in hardware, it is now possible to assemble quite large genomes on a Mac laptop.

    MacVector 16 incorporates two separate NGS de novo assemblers, Velvet and SPAdes. Both are very capable assemblers with a small memory footprint. Velvet is significantly the faster of the two, but SPAdes often generates longer contigs as it does a slightly better job at resolving repeats. SPAdes can also handle many more data types for mixed read assemblies and has the smaller memory footprint of the two, allowing it to be used for larger data sets. With Velvet you often need to tweak the parameters for optimal performance, whereas SPAdes usually “just works”. SPAdes can often generate meaningful assemblies from relatively poor data where Velvet will fail without considerable tweaking of the parameters.

    Both are invoked the same way: use File | New | Assembly Project to create a new project, then click on the Add Reads button and select the read files you want to import. Typically these are paired-end reads (either interleaved or as separate files), but they can be unpaired reads, consensus sequences exported from a different assembly, Ion Torrent, PacBio or Oxford Nanopore reads. You can also import compressed (gzip) files directly, with no need to uncompress them, saving a lot of disk space. Finally, click on the Velvet or SPAdes toolbar button to run the algorithms. The end result will be a number of contigs.
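Before importing, it can be handy to know how many reads a compressed file actually holds, for example to judge whether the coverage is sensible. The short Python sketch below counts fastq records in a plain or gzip-compressed file without writing an uncompressed copy to disk. It is a stand-alone illustration rather than anything MacVector runs; the function name is made up and the sketch assumes four lines per record.

```python
import gzip


def count_fastq_reads(path):
    """Count records in a fastq file, reading .gz files directly.

    Assumes the common 4-lines-per-record fastq layout.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        # Stream line by line so large files never sit in memory.
        line_count = sum(1 for _ in handle)
    if line_count % 4:
        raise ValueError("line count is not a multiple of 4")
    return line_count // 4
```

For paired-end data in separate files, the two counts should be identical; a mismatch usually means one of the files is incomplete.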

    Here are some examples of performance, with all tests run on a 2013 2.7 GHz MacBook Pro with 16 GB RAM.


    In the case of the small Mycobacterium genome, Velvet completed the assembly in a little over a minute. Even a moderately large ~7 Mbp Streptomyces sp. assembly of 5 million HiSeq reads took just 16 minutes with Velvet and less than an hour with the more memory-efficient SPAdes algorithm.

    For a more in depth discussion of these results, please see our recent blog post.

    Posted in Tips

    Simple Assembly of Sanger Sequencing Files with MacVector Assembler

    With MacVector Assembler, assembling ABI Sanger Sequencing files is simple, fast and accurate. MacVector uses the popular phred/phrap/cross_match set of tools from the University of Washington. To improve accuracy, and to help resolve repeats, these tools use “quality scores” (popularly known as “phred scores”), giving them an advantage over many other methods. To assemble two or more ABI files, follow these steps.

  • Use File | New | Assembly Project to create a new project
  • Click on the Add Seqs toolbar button and select all of your ABI (or SCF) chromatogram files to import
  • Click on the phred toolbar button – this re-calls the traces and generates quality scores (no need to select any items in the project, though you can if you want to run phred on specific files only)
  • Click on the phrap toolbar button and accept the defaults
  • After phrap has run, you will be presented with one or more contigs (assuming your ABI reads actually overlap). If you double-click on one of those, a contig editor will open letting you view and edit the actual alignments.
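For readers unfamiliar with the quality scores mentioned above, a phred score Q is defined from the estimated probability P that a base call is wrong, as Q = -10 × log10(P). The Python sketch below just restates that standard definition; the function names are chosen for this example.

```python
import math


def phred_to_error_probability(q):
    """Probability that a base call with phred score q is wrong."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Phred score corresponding to an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)
```

A score of 20 therefore corresponds to a 1 in 100 chance of an incorrect base call, and a score of 30 to 1 in 1,000, which is why an assembler that weighs bases by these scores can discount low-quality calls at the ends of reads.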



    Posted in Tips

    An overview of assembling sequencing data with MacVector’s Assembler plugin

    To assemble various types of sequencing reads, follow these steps.


    • Choose File | New | Assembly Project to create a new empty project file.

    Then do one of the following:

    To create a de novo assembly from Sanger reads

    • Click on the Add Reads tool bar button, then select the sequence files you wish to assemble and click on the Open button. Read files can also be dragged and dropped onto the open Assembly Project window.
    • Click Phred to basecall sequences in the project. Note that if no sequences are selected, phred will be run on ALL of the files in the project.
    • Click Phrap to assemble the reads.

    To create a de novo assembly from NGS datasets

    • Click on the Add Reads tool bar button, then select the sequence files you wish to assemble and click on the Open button. Files can also be dragged and dropped onto the open Assembly Project window. Paired reads files are automatically detected.
    • Choose either SPAdes or Velvet to assemble the reads.

    To create a reference assembly

    • Click the Add Reads button, then select the sequence files you wish to assemble and click on the Open button.
    • Click Add Ref, select the sequence file(s) you wish to align the reads against and click on the Open button.
    • Click Bowtie to map all read files against all of the reference sequences in the project.


    Posted in Techniques, Tips