14 Handbook of Plant Biotechnology
Table 4.3 A PAM 250 scoring matrix for amino acid substitutions. The row and column headings are the IUPAC single-letter codes
for each residue. The matrix is symmetric. Each number is the logarithm of the ratio of (i) the probability of substitution between
the row and column residues, estimated from empirical data, to (ii) the probability of the same pairing expected from amino acid
frequencies alone. Thus, the more positive the number, the more likely that particular substitution is relative to chance. Numbers
along the diagonal are related to the probability that the residue is identical after 250 PAM units (250 substitutions per 100 amino
acids); note that this does not require that the site has undergone no substitution events.
   A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V  B  Z  X  ∗
A  2 −2  0  0 −2  0  0  1 −1 −1 −2 −1 −1 −3  1  1  1 −6 −3  0  0  0  0 −8
R −2  6  0 −1 −4  1 −1 −3  2 −2 −3  3  0 −4  0  0 −1  2 −4 −2 −1  0 −1 −8
N  0  0  2  2 −4  1  1  0  2 −2 −3  1 −2 −3  0  1  0 −4 −2 −2  2  1  0 −8
D  0 −1  2  4 −5  2  3  1  1 −2 −4  0 −3 −6 −1  0  0 −7 −4 −2  3  3 −1 −8
C −2 −4 −4 −5 12 −5 −5 −3 −3 −2 −6 −5 −5 −4 −3  0 −2 −8  0 −2 −4 −5 −3 −8
Q  0  1  1  2 −5  4  2 −1  3 −2 −2  1 −1 −5  0 −1 −1 −5 −4 −2  1  3 −1 −8
E  0 −1  1  3 −5  2  4  0  1 −2 −3  0 −2 −5 −1  0  0 −7 −4 −2  3  3 −1 −8
G  1 −3  0  1 −3 −1  0  5 −2 −3 −4 −2 −3 −5  0  1  0 −7 −5 −1  0  0 −1 −8
H −1  2  2  1 −3  3  1 −2  6 −2 −2  0 −2 −2  0 −1 −1 −3  0 −2  1  2 −1 −8
I −1 −2 −2 −2 −2 −2 −2 −3 −2  5  2 −2  2  1 −2 −1  0 −5 −1  4 −2 −2 −1 −8
L −2 −3 −3 −4 −6 −2 −3 −4 −2  2  6 −3  4  2 −3 −3 −2 −2 −1  2 −3 −3 −1 −8
K −1  3  1  0 −5  1  0 −2  0 −2 −3  5  0 −5 −1  0  0 −3 −4 −2  1  0 −1 −8
M −1  0 −2 −3 −5 −1 −2 −3 −2  2  4  0  6  0 −2 −2 −1 −4 −2  2 −2 −2 −1 −8
F −3 −4 −3 −6 −4 −5 −5 −5 −2  1  2 −5  0  9 −5 −3 −3  0  7 −1 −4 −5 −2 −8
P  1  0  0 −1 −3  0 −1  0  0 −2 −3 −1 −2 −5  6  1  0 −6 −5 −1 −1  0 −1 −8
S  1  0  1  0  0 −1  0  1 −1 −1 −3  0 −2 −3  1  2  1 −2 −3 −1  0  0  0 −8
T  1 −1  0  0 −2 −1  0  0 −1  0 −2  0 −1 −3  0  1  3 −5 −3  0  0 −1  0 −8
W −6  2 −4 −7 −8 −5 −7 −7 −3 −5 −2 −3 −4  0 −6 −2 −5 17  0 −6 −5 −6 −4 −8
Y −3 −4 −2 −4  0 −4 −4 −5  0 −1 −1 −4 −2  7 −5 −3 −3  0 10 −2 −3 −4 −2 −8
V  0 −2 −2 −2 −2 −2 −2 −1 −2  4  2 −2  2 −1 −1 −1  0 −6 −2  4 −2 −2 −1 −8
B  0 −1  2  3 −4  1  3  0  1 −2 −3  1 −2 −4 −1  0  0 −5 −3 −2  3  2 −1 −8
Z  0  0  1  3 −5  3  3  0  2 −2 −3  0 −2 −5  0  0 −1 −6 −4 −2  2  3 −1 −8
X  0 −1  0 −1 −3 −1 −1 −1 −1 −1 −1 −1 −1 −2 −1  0  0 −4 −2 −1 −1 −1 −1 −8
∗ −8 −8 −8 −8 −8 −8 −8 −8 −8 −8 −8 −8 −8 −8 −8 −8 −8 −8 −8 −8 −8 −8 −8  1
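As a rough illustration of how the entries in Table 4.3 are used, the Python sketch below sums matrix values over an ungapped alignment and shows how a single log-odds entry could be derived. Only a handful of entries (for residues A, C, G, F, W and Y) are copied from the table. The function names, the example sequences and the probabilities passed to log_odds are hypothetical, and the 10 x log10 scaling with rounding to the nearest integer is an assumption based on the usual PAM convention rather than something stated in the caption.

```python
import math

# Subset of Table 4.3 (PAM 250). The matrix is symmetric, so each
# unordered residue pair is stored only once.
PAM250_SUBSET = {
    ("A", "A"): 2, ("A", "C"): -2, ("A", "G"): 1,
    ("A", "F"): -3, ("A", "W"): -6, ("A", "Y"): -3,
    ("C", "C"): 12, ("C", "G"): -3, ("C", "F"): -4,
    ("C", "W"): -8, ("C", "Y"): 0,
    ("G", "G"): 5, ("G", "F"): -5, ("G", "W"): -7, ("G", "Y"): -5,
    ("F", "F"): 9, ("F", "W"): 0, ("F", "Y"): 7,
    ("W", "W"): 17, ("W", "Y"): 0,
    ("Y", "Y"): 10,
}


def pair_score(a, b, matrix=PAM250_SUBSET):
    """Return the substitution score for residues a and b, in either order."""
    if (a, b) in matrix:
        return matrix[(a, b)]
    return matrix[(b, a)]  # raises KeyError for residues outside the subset


def ungapped_score(seq1, seq2, matrix=PAM250_SUBSET):
    """Sum the pairwise scores over two aligned sequences of equal length."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    return sum(pair_score(a, b, matrix) for a, b in zip(seq1, seq2))


def log_odds(p_observed, p_expected, scale=10.0):
    """Scaled, rounded log-odds score (the scaling is an assumed convention)."""
    return round(scale * math.log10(p_observed / p_expected))


if __name__ == "__main__":
    # F/Y (7) + W/W (17) + C/C (12)
    print(ungapped_score("FWC", "YWC"))  # 36
    # A pairing seen twice as often as chance predicts (hypothetical values)
    print(log_odds(0.02, 0.01))          # 3
```

Because the matrix is symmetric, storing each unordered pair once is sufficient. A fuller version would load all 24 rows of the table and add gap penalties.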
time consuming and, as a result, were feasible only on very powerful computers searching small databases. However, recently there have been advances both in the speed of the average computer and in the implementation of the algorithm. The fastest running implementation of the true Smith–Waterman algorithm is MPsrch. Though about 10 times slower than the BLAST algorithm (described below), it is acceptable for smaller databases. The software is available free to academic users from the Edinburgh Biocomputing Systems website (www.edinburgh-biocomputing.com/) for local use on a UNIX/Linux operating system. MPsrch is also available online at the EBI website (see Table 4.2).

Unlike other similarity search programs (such as BLAST), MPsrch is currently only available for querying protein sequences against a protein database. If you have a DNA sequence, you must perform the translation yourself (which may be done using the transeq program on the EBI website). Thus, in the case where you do not know the reading frame of the encoded protein, it is necessary to translate in all six possible reading frames, run MPsrch six times independently, and manually combine the output.

BLAST

The most commonly used similarity search method is the Basic Local Alignment Search Tool (BLAST; Altschul et al., 1997). BLAST is a heuristic modification of the Smith–Waterman (1981) algorithm (i.e. it is not guaranteed to produce an optimal pairwise alignment and score); in practice, however, it performs very well. More importantly, it is orders of magnitude faster than any true Smith–Waterman implementation. The most popular online interface to BLAST is available at NCBI, where a standalone version is also available for download.

There are several parameters controlling the behaviour of the BLAST algorithm, and these