101 things you (maybe) didn’t know about MacVector: #41 – Extracting raw data from chromatogram files

Have you ever wanted to know exactly what the total signal value is for individual peaks in a chromatogram file? Perhaps you are looking for mixtures of residues at a particular location and want to get some idea of the relative proportions? You can open .ab1 and .scf chromatogram files directly in MacVector and view the “areas under the curve” in the Raw Data tab. You can open a chromatogram file by selecting File | Open in MacVector, or by dragging the file onto the MacVector application icon, whether it be on the dock, on your desktop, or in the MacVector application folder. Double-clicking on an .ab1 or .scf file *may* open that file in MacVector, depending on the other applications you have installed on your machine.

[Figure: D01a .scf file open in the Editor]

When you click on the Raw Data tab you get a textual representation of the data.

[Figure: D01a .scf file, Raw Data tab]

The first four columns are fairly self-explanatory: the number of the residue, the residue at that position (A, G, C, T, etc.), the quality value if set (e.g. after a phred basecall) and the position of the peak in the chromatogram.

The “Tot. Area” value is the total value of all four traces under the curve for that peak. Note that MacVector places the sides of each peak at the midway point between the current peak and the peaks on either side. It does not attempt to account for the “wide base” you might see when an isolated residue’s trace bleeds into adjacent peaks.
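To make the midpoint rule concrete, here is a minimal Python sketch of how such areas could be computed. It assumes each trace is a plain list of intensity samples and that the called peak positions are given as sample indices; the function name `peak_areas` is hypothetical, and MacVector’s exact rounding at the boundaries may differ. The example shows the “wide base” effect described above: the tail of the A signal just after the first peak is counted in the second peak’s area.

```python
from typing import Dict, List


def peak_areas(traces: Dict[str, List[float]], peaks: List[int]) -> List[Dict[str, float]]:
    """Sum each trace between the midpoints to the neighbouring peaks.

    Hypothetical sketch: `traces` maps a base ("A", "C", "G", "T") to its
    intensity samples, and `peaks` lists called peak positions in order.
    """
    n = len(next(iter(traces.values())))
    areas = []
    for i, pos in enumerate(peaks):
        # Left side: midway to the previous peak (or the start of the trace).
        left = (peaks[i - 1] + pos) // 2 if i > 0 else 0
        # Right side (exclusive): midway to the next peak (or the end of the trace).
        right = (pos + peaks[i + 1]) // 2 if i + 1 < len(peaks) else n
        per_base = {base: sum(signal[left:right]) for base, signal in traces.items()}
        # "Total" plays the role of the Tot. Area column: all four traces combined.
        per_base["Total"] = sum(per_base.values())
        areas.append(per_base)
    return areas


traces = {
    "A": [0, 1, 5, 1, 0, 0, 0, 0],
    "C": [0, 0, 0, 0, 1, 6, 1, 0],
    "G": [0] * 8,
    "T": [0] * 8,
}
areas = peak_areas(traces, [2, 5])
# areas[0]: A=6, C=0, Total=6
# areas[1]: A=1 (bleed from the first peak), C=8, Total=9
```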

Following on from that, the “A Area” is the value under the peak for the “A” trace, and the “A%” is the percentage of the entire signal that the “A” area represents. This repeats for the “C”, “G” and “T” traces.
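The percentage columns are simple ratios of each trace’s area to the total area. A short Python sketch, where the function name `base_percentages` is hypothetical:

```python
from typing import Dict


def base_percentages(areas: Dict[str, float]) -> Dict[str, float]:
    """Express each base's area as a percentage of the summed area of all traces."""
    total = sum(areas.values())
    if total == 0:
        # No signal at all: report 0% rather than dividing by zero.
        return {base: 0.0 for base in areas}
    return {base: 100.0 * area / total for base, area in areas.items()}


pct = base_percentages({"A": 600, "C": 100, "G": 300, "T": 0})
# {'A': 60.0, 'C': 10.0, 'G': 30.0, 'T': 0.0}
```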

Finally, the “mix” columns list the traces that exceed the given percentage at that position, to help you identify potential mixed alleles. So, if you see “AGC” in the “15% mix” column, that means the A, G and C traces each represented more than 15% of the signal at that position. The “45% mix” column is the most stringent: if you see a pair of residues there, that is a good sign that this is a position worth investigating in more detail if you are looking for allelic differences in a sample. Of course, because of sequencing artifacts, the beginning (as in this example) and end of sequencing runs are most likely to contain errors.
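The rule behind a mix column can be sketched as a threshold filter over the per-base percentages. This is an illustration only: the function name `mix_call` is hypothetical, and it lists bases in A, C, G, T order, whereas MacVector’s own column may order them differently (the “AGC” example above suggests it does).

```python
from typing import Dict


def mix_call(percentages: Dict[str, float], threshold: float) -> str:
    """Return the bases whose share of the signal is strictly above `threshold` percent."""
    return "".join(base for base in "ACGT" if percentages.get(base, 0.0) > threshold)


pct = {"A": 30.0, "C": 16.0, "G": 50.0, "T": 4.0}
low = mix_call(pct, 15)   # "ACG": three traces each above 15%
high = mix_call(pct, 45)  # "G": only one trace above 45%
```

A genuine heterozygous position would typically show two bases even at the 45% level, for example `mix_call({"A": 48.0, "G": 47.0, "C": 3.0, "T": 2.0}, 45)` gives `"AG"`.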

The entire table can be selected, copied, and pasted into Microsoft Excel if you want to analyze the data in more detail. The data is presented in tab-delimited format so that when you copy and paste, each value gets pasted into a separate cell.
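Because the copied text is tab-delimited, it can also be read by a script rather than a spreadsheet. The Python sketch below assumes the first pasted line contains the column headers and that the stringent mix column is headed `"45% mix"`; both are assumptions, so check them against your own copied data. The function name `candidate_mixed_positions` is hypothetical.

```python
import csv
import io
from typing import Dict, List


def candidate_mixed_positions(pasted: str, mix_column: str = "45% mix") -> List[Dict[str, str]]:
    """Return rows whose mix column lists two or more bases.

    Assumes a header row and a column named `mix_column` (hypothetical default).
    """
    reader = csv.DictReader(io.StringIO(pasted), delimiter="\t")
    hits = []
    for row in reader:
        # Missing or empty cells are treated as "no mixed call".
        bases = (row.get(mix_column) or "").strip()
        if len(bases) >= 2:
            hits.append(row)
    return hits


pasted = "Pos\tBase\t45% mix\n1\tA\t\n2\tA\tAG\n3\tC\tC\n"
hits = candidate_mixed_positions(pasted)
# one hit: the row for position 2
```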



This is an article in a long-running series of tips to help you get the most out of MacVector. If you want to get notified every time a new tip gets published, follow us @MacVector on Twitter (or check the feed for the hashtag #101MacVectorTips) or like us on Facebook.

This entry was posted in 101 Tips.