General musings from the MacVector team about sequence analysis, molecular biology, the Mac in general and of course your favorite sequence analysis app for the Mac!

Changes to isoelectric point calculation in MacVector

We have a policy of constant improvement in all areas of MacVector, even for algorithms and analyses that have been unchanged for many years. We recently changed the way the isoelectric point of a protein is calculated. The previous version of the algorithm was accurate for short proteins, but for longer amino acid sequences the values were not as accurate as they should have been. Three problems caused this discrepancy:

(a) using some out-of-date pKa values for the pI calculation,

The pKa values for the ionizable groups were not as accurate as they should have been. The first set below shows the old values and the second set shows the new ones. In addition, the old calculation did not include Cys at all.


       Asp    Glu    Tyr     Lys     Arg     His    Cys    NH3    COOH
Old    3.86   4.25   10.10    9.80   12.48   6.00   n/a    8.00   3.00
New    3.90   4.07   10.46   10.54   12.48   6.04   8.18   8.20   3.65

Incidentally, the latest values are taken from Wikipedia. We reviewed many different sources but found a fair number of discrepancies between them. Wikipedia seemed to have the most commonly cited values, so those are the ones we used. The algorithm itself is unchanged and, in theory, should give the same results as the Expasy server, as both use the same approach. In practice, however, there are slight differences.
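To make the role of these values concrete, here is a minimal Python sketch of one common way to estimate pI: compute the net charge at a given pH from the Henderson-Hasselbalch equation, then bisect for the pH where the charge is zero. This is an illustration only, not MacVector's or Expasy's actual code. The function names, the 0 to 14 search range and the tolerance are all assumptions; only the numeric constants come from the new set of values above.

```python
# Constants copied from the "new" set of values in this post.
PKA_POSITIVE = {"K": 10.54, "R": 12.48, "H": 6.04}            # basic side chains
PKA_NEGATIVE = {"D": 3.90, "E": 4.07, "Y": 10.46, "C": 8.18}  # acidic side chains
PKA_N_TERM = 8.20  # "NH3" column
PKA_C_TERM = 3.65  # "COOH" column


def net_charge(sequence, ph):
    """Estimate the net charge of a peptide at the given pH."""
    # The N-terminus carries a positive charge when protonated.
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))
    # The C-terminus carries a negative charge when deprotonated.
    charge -= 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))
    for residue in sequence:
        if residue in PKA_POSITIVE:
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[residue]))
        elif residue in PKA_NEGATIVE:
            charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[residue] - ph))
    return charge


def isoelectric_point(sequence, tolerance=1e-4):
    """Bisect for the pH at which the estimated net charge is zero."""
    low, high = 0.0, 14.0
    # Net charge falls steadily as pH rises, so bisection converges.
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid   # still positive: the pI is at a higher pH
        else:
            high = mid  # negative (or zero): the pI is at a lower pH
    return (low + high) / 2.0
```

For a sequence with no ionizable side chains, such as a single "G", only the two termini contribute, so this sketch returns a value close to the midpoint of 8.20 and 3.65, which is 5.925.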

(b) using imprecise values for each amino acid’s molecular weight (given to only 2 decimal places rather than 5)

Here’s the raw data for molecular weight. For each residue code, the old value is listed first and the new value second. The values are taken from www.webqc.org/aminoacids.php

Residue   Old       New
-         0.0       0.0
A         71.07     71.07822
C         103.13    103.14372
D         115.08    115.08792
E         129.11    129.11462
F         147.17    147.17472
G         57.05     57.05162
H         137.14    137.13992
I         113.15    113.15832
K         128.13    128.17292
L         113.15    113.15832
M         131.19    131.19712
N         114.10    114.10312
P         97.11     97.11572
Q         128.13    128.12982
R         156.18    156.18642
S         87.07     87.07772
T         101.10    101.10442
V         99.13     99.13162
W         186.22    186.21092
Y         163.17    163.17412
B         114.59    114.10312
Z         128.62    128.62222
X         110.00    118.836325
-         0.0       0.0


The old and new values are very close, but the small errors build up over a couple of thousand residues.
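As a rough illustration of how those small differences add up, the Python sketch below sums residue values from the old and new data over a long sequence and reports the gap. The function names and the choice of three residues are ours, purely for illustration. The sketch only sums the per-residue values and ignores anything else a full molecular weight calculation may add.

```python
# A few entries copied from the old and new values listed above.
OLD_MASS = {"A": 71.07, "G": 57.05, "K": 128.13}
NEW_MASS = {"A": 71.07822, "G": 57.05162, "K": 128.17292}


def residue_mass_sum(sequence, table):
    """Sum the per-residue values for a sequence using the given table."""
    return sum(table[residue] for residue in sequence)


def table_drift(sequence):
    """Difference between the totals from the new and the old tables."""
    return residue_mass_sum(sequence, NEW_MASS) - residue_mass_sum(sequence, OLD_MASS)


if __name__ == "__main__":
    for residue in "AGK":
        drift = table_drift(residue * 2000)
        print(f"2000 x {residue}: totals differ by {drift:.2f}")
```

For alanine the per-residue difference is only 0.00822, but over 2000 residues it grows to about 16.44. For lysine, where the two values differ by 0.04292, the gap reaches about 85.84.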

(c) using 32-bit “short float” variables, which lost further accuracy with larger proteins.

The calculations now use 64-bit floating-point variables.
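The effect of variable width can be seen with a small Python experiment. Python floats are 64-bit, so this sketch simulates 32-bit arithmetic by rounding every intermediate total through the standard `struct` module. It is an assumption-laden illustration of accumulated rounding error, not a reproduction of the original MacVector code, and the helper names are hypothetical.

```python
import struct
from decimal import Decimal


def to_float32(value):
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", value))[0]


def sum_float32(mass, count):
    """Add `mass` `count` times, rounding to 32 bits after every step."""
    mass32 = to_float32(mass)
    total = to_float32(0.0)
    for _ in range(count):
        total = to_float32(total + mass32)
    return total


def sum_float64(mass, count):
    """Add `mass` `count` times using ordinary 64-bit floats."""
    total = 0.0
    for _ in range(count):
        total += mass
    return total


ALA = 71.07822  # new alanine value from the data above
COUNT = 2000    # "a couple of thousand residues"

# Exact decimal reference for comparison.
exact = Decimal("71.07822") * COUNT
error32 = abs(Decimal(sum_float32(ALA, COUNT)) - exact)
error64 = abs(Decimal(sum_float64(ALA, COUNT)) - exact)

print(f"32-bit accumulated error: {error32:.6f}")
print(f"64-bit accumulated error: {error64:.12f}")
```

At totals in the hundred-thousands range a 32-bit float can only resolve steps of roughly 0.016, so each addition is rounded noticeably. The 64-bit total stays accurate to many more decimal places.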

