With MacVector 12.5 we’ve added more multiple sequence alignment (MSA) algorithms. Muscle and T-Coffee have been added to the Multiple Sequence Alignment editor, complementing the existing ClustalW algorithm. We’ve wanted both of these for a while now and, judging from the results of last year’s survey, so have many users.
Both T-Coffee and Muscle are progressive alignment algorithms, as is ClustalW. Progressive alignments generally build a guide tree that represents the relationships between every possible pair of sequences in the alignment. A multiple sequence alignment is then built sequentially using the tree as a construction guide. T-Coffee builds a library of all pairwise alignments but also aligns each sequence in the pair with a third sequence in the sequence set before building the MSA. Muscle does not do full pairwise alignments to build its tree but instead uses an approximate method, comparing the number of short subsequences (k-mers, k-tuples or words) that each pair of sequences shares. Counting shared k-mers is far cheaper than aligning two sequences, and a set of N sequences has N(N-1)/2 pairs, so you can immediately see how this is much faster for alignments containing many sequences, where the number of pairwise comparisons needed to construct the tree is high.
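To make the k-mer idea concrete, here is a minimal Python sketch. It is not Muscle's actual implementation (which, among other things, works on k-mer counts rather than the plain sets used here); the function names, the choice of k=3 and the toy sequences are all assumptions made for illustration.

```python
from itertools import combinations

def kmer_set(seq, k=3):
    """Return the set of all length-k substrings (k-mers) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def shared_kmer_fraction(a, b, k=3):
    """Fraction of the smaller k-mer set that also occurs in the other sequence.

    A crude stand-in for an alignment-free similarity score (illustrative only).
    """
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / min(len(ka), len(kb))

def pair_count(n):
    """Number of distinct sequence pairs among n sequences: n(n-1)/2."""
    return n * (n - 1) // 2

# Toy protein fragments (made up for this example).
sequences = {
    "seq1": "MKTAYIAKQR",
    "seq2": "MKTAYIAKQL",
    "seq3": "GGGSSSPPPW",
}

# Score every pair without performing a single alignment.
scores = {
    (x, y): shared_kmer_fraction(sequences[x], sequences[y])
    for x, y in combinations(sequences, 2)
}

for pair, score in scores.items():
    print(pair, round(score, 3))
print("pairs for 100 sequences:", pair_count(100))
```

Similar sequences (seq1 and seq2) score high and unrelated ones score zero, which is enough information to build a rough guide tree. The pair count shows why the cost per pair matters: 100 sequences already mean 4,950 comparisons.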
T-Coffee is regarded as being slower than ClustalW but will produce more accurate alignments for distantly related amino acid sequences. Here’s the original publication. Incidentally, T-Coffee stands for Tree-based Consistency Objective Function For AlignmEnt Evaluation.
Muscle is generally regarded as faster than ClustalW and T-Coffee, at the cost of being slightly less accurate.
All three algorithms are integrated into the MSA editor. This means you can try all three algorithms on the same alignment to see the results.
Here are a few benchmarks for protein sequence alignments (the files are in the Sample Files folder in the MacVector application folder):
– ClustalW: 1 min 26 secs
– Muscle: 12 secs
– T-Coffee: 17 mins 22 secs
Because of the initial step of constructing the tree from pairwise alignments, progressive alignment algorithms have difficulties with alignments where the sequences do not all share a common region. For example, take an alignment of three sequences where you have a 5kb sequence and two subsequences of it, each around 1kb long. One subsequence aligns from 1,000 to 2,000 of the long sequence and the other aligns at 4,000 to 5,000, so the two subsequences have nothing in common with each other. Most progressive algorithms will try to create initial pairwise alignments of all combinations of sequences, and that skews the result towards alignments with at least one segment of overlap between all of the input sequences. Because it does not create pairwise alignments, Muscle is unique amongst these three algorithms in that it will align this type of data, as long as the “Diagonals” optimization parameter is set to “On”.
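The three-sequence case can be mocked up in a few lines of Python. This is only a sketch of the data layout, not of how any of the three algorithms works internally: the random 5,000-base sequence, the k-mer length of 16 and the helper names are assumptions chosen for the illustration.

```python
import random

def kmer_set(seq, k):
    """Return the set of all length-k substrings (k-mers) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def shared_fraction(a, b, k=16):
    """Fraction of the smaller k-mer set that also occurs in the other sequence."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    return len(ka & kb) / min(len(ka), len(kb))

# Hypothetical data: one long random DNA sequence and two fragments cut from it.
random.seed(0)
long_seq = "".join(random.choices("ACGT", k=5000))
frag_a = long_seq[1000:2000]  # matches positions 1,000 to 2,000
frag_b = long_seq[4000:5000]  # matches positions 4,000 to 5,000

a_vs_long = shared_fraction(frag_a, long_seq)
b_vs_long = shared_fraction(frag_b, long_seq)
a_vs_b = shared_fraction(frag_a, frag_b)

print("fragment A vs long sequence:", a_vs_long)
print("fragment B vs long sequence:", b_vs_long)
print("fragment A vs fragment B:   ", a_vs_b)
```

Each fragment matches the long sequence perfectly, but the two fragments share essentially nothing with each other. An algorithm that insists on a meaningful alignment for every pair has no real signal to work with for the fragment-to-fragment pair.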