2003 Eur J Biochem 270 (16) Riveros-Rosas


Eur. J. Biochem. 270, 3309–3334 (2003) © FEBS 2003 doi:10.1046/j.1432-1033.2003.03704.

Diversity, taxonomy and evolution of medium-chain dehydrogenase/reductase superfamily

Héctor Riveros-Rosas1, Adriana Julián-Sánchez1, Rafael Villalobos-Molina2, Juan Pablo Pardo1 and Enrique Piña1

1Depto. Bioquímica, Fac. Medicina, UNAM, Cd. Universitaria, México D.F., México; 2Depto. Farmacobiología, CINVESTAV-Sede Sur, México D.F., México

A comprehensive, structural and functional, in silico analysis of the medium-chain dehydrogenase/reductase (MDR) superfamily, including 583 proteins, was carried out by use of extensive database mining and the BLASTP program in an iterative manner to identify all known members of the superfamily. Based on phylogenetic, sequence, and functional similarities, the protein members of the MDR superfamily were classified into three different taxonomic categories: (a) subfamilies, consisting of a closed group containing a set of ideally orthologous proteins that perform the same function; (b) families, each comprising a cluster of monophyletic subfamilies that possess significant sequence identity among them and may or may not share common substrates or mechanisms of reaction; and (c) macrofamilies, each comprising a cluster of monophyletic protein families with protein members from the three domains of life, which includes at least one subfamily member that displays activity related to a very ancient metabolic pathway. In this context, a superfamily is a group of homologous protein families (and/or macrofamilies) with monophyletic origin that shares at least a barely detectable sequence similarity, but showing the same 3D fold.

The MDR superfamily encloses three macrofamilies, with eight families and 49 subfamilies. These subfamilies exhibit great functional diversity including noncatalytic members with different subcellular, phylogenetic, and species distributions. This results from constant enzymogenesis and proteinogenesis within each kingdom, and highlights the huge plasticity that MDR superfamily members possess. Thus, through evolution a great number of taxa-specific new functions were acquired by MDRs. The generation of new functions fulfilled by proteins can be considered as the essence of protein evolution. The mechanisms of protein evolution inside MDR are not constrained to conserve substrate specificity and/or chemistry of catalysis. In consequence, MDR functional diversity is more complex than sequence diversity.

MDR is a very ancient protein superfamily that existed in the last universal common ancestor. It had at least two (and probably three) different ancestral activities related to formaldehyde metabolism and alcoholic fermentation. Eukaryotic members of this superfamily are more related to bacterial than to archaeal members; horizontal gene transfer among the domains of life appears to be a rare event in modern organisms.

Keywords: protein taxonomy; protein evolution; medium-chain alcohol dehydrogenase; enoyl reductase; formaldehyde dehydrogenase.

Correspondence to H. Riveros-Rosas, Depto. Bioquímica, Fac. Medicina, UNAM, Apdo. Postal 70–159, Cd. Universitaria, México, 04510, D.F., México. Fax: +52 55 5616 2419, Tel.: +52 55 5622 0829, E-mail: [email protected]

Abbreviations: AADH, allyl alcohol dehydrogenase; ACR, acyl-CoA reductase; ADH, alcohol dehydrogenase; AL, alginate lyase; ARP, auxin-regulated protein; AST, membrane traffic protein; BCHC, 2-desacetyl-2-hydroxyethyl bacteriochlorophyllide-a dehydrogenase; BDH, 2,3-butanediol dehydrogenase; BDOR, bi-domain oxidoreductase; BRP, bacteriocin-related protein; CADH, cinnamyl alcohol dehydrogenase; CCAR, crotonyl-CoA reductase; COG, cluster of orthologous groups of proteins; DHSO, sorbitol dehydrogenase; DINAP, dinoflagellate nuclear-associated protein; DI-QOR, dark induced-quinone oxidoreductase; ELI3, elicitor-inducible defense-related proteins; ER, enoyl reductase; FADH, formaldehyde dehydrogenase; FAS, fatty acid synthase; FDEH, 5-exo-hydroxycamphor dehydrogenase; GATD, galactitol 1-phosphate dehydrogenase; GDH, glucose dehydrogenase; GSH, glutathione; HNL, hydroxynitrile lyase; LTD, leukotriene B4 12-dehydrogenase; MDR, medium-chain dehydrogenases/reductases; MP, maximum parsimony; MRF, mitochondrial respiratory function protein; MSH, mycothiol; MTD, mannitol-1-phosphate dehydrogenase; NCBI, National Center for Biotechnology Information; NJ, neighbour-joining; NRBP, nuclear receptor binding protein; PDH, polyol dehydrogenase; pER, probable enoyl reductase; PGR, 15-oxoprostaglandin 13-reductase; PIG3, animal P53-induced gene 3; PKS, polyketide synthase; PKS-IAP, polyketide synthase-independent associated protein; QOR, quinone oxidoreductase; QORL-1, quinone oxidoreductase-like 1; SORE, L-sorbose-1-phosphate dehydrogenase; SSP, sensing starvation protein; TDH, threonine dehydrogenase; TED2, quinone oxidoreductase involved in tracheary element differentiation in plants; UPGMA, unweighted pair-group method using arithmetic averages; Y-ADH, yeast alcohol dehydrogenase.

Note: a web site is available at http://laguna.fmedic.unam.mx/%7Eadh/

(Received 2 April 2003, revised 27 May 2003, accepted 5 June 2003)

NAD(P)-dependent alcohol dehydrogenase (ADH) activity is widely distributed in nature and is carried out by three main superfamilies of enzymes that arose independently throughout evolution [1]. Their amino acid identity is 20% or less and they exhibit different structures and reaction mechanisms. The first superfamily corresponds to the Fe-dependent ADHs and makes up the smallest and least studied family of alcohol dehydrogenases [2–4]. The second group includes the short-chain dehydrogenase/reductase superfamily; this large family of enzymes does not require a metallic ion as cofactor [5,6]. The third superfamily is composed of zinc-dependent ADHs, and is preferentially named medium-chain dehydrogenases/reductases (MDRs) [7,8]. These enzymes usually require zinc atom(s) as cofactor and the family includes the classical horse liver ADH. In addition to these three NAD(P)-dependent ADH families, other minor families of ADH exist, which use different cofactors such as FAD and pyrroloquinoline quinone, among others; however, the distribution of these minor families is limited to some bacterial groups [1].

To date, nearly 1000 protein sequences have been identified as MDR superfamily members [8–10]. Identification of new members of the MDR superfamily is performed with high statistical significance using tools such as BLASTP [11] or FASTA [12,13]. However, efforts to assign proteins to families and/or subfamilies within the MDR superfamily have not been equally successful. Public protein databases use different criteria to classify proteins, and therefore, several inconsistencies in the identification of protein subfamilies and families have been observed. Recently, Nordling et al. [14], based on analysis of five complete eukaryotic genomes and Escherichia coli, constructed an evolutionary tree of the MDR in which at least eight families can be distinguished: dimeric ADHs in animals and plants, tetrameric ADHs in fungi (Y-ADHs), polyol dehydrogenases (PDHs), quinone oxidoreductases (QORs), cinnamyl alcohol dehydrogenases (CADHs), leukotriene B4 dehydrogenases (LTDs), enoyl reductases (ERs), and nuclear receptor binding proteins (NRBPs). ERs and NRBPs were originally described [14] as acyl-CoA reductases (ACRs) and mitochondrial respiratory function proteins (MRFs), respectively; the Results section discusses why the names of these enzymes are described differently here.

Because the MDR protein families proposed by Nordling et al. [14] were identified considering only a few genomes, it is possible that other protein families of the MDR may be identified if complete sets of their protein sequences are used. Furthermore, a larger set of MDRs will allow us to make a more detailed taxonomic analysis. Therefore, in this report we analysed MDR taxonomy on the basis of the entire set of currently known MDR members, and completed the work initiated by Nordling et al. with identification of further protein subfamilies that comprise each protein family within the MDR superfamily. To contribute to validation of the eight protein families previously identified, we grouped protein sequences employing a different method from that used by Nordling et al. [14]. Indeed, the limited number of protein sequences employed by Nordling et al. [14] precluded them from identifying protein subfamilies. Finally, we analysed evolution of the MDR superfamily and identified some putative selective forces that directed their enzymogenesis. This analysis is valuable as a paradigm of protein evolution and provides information to understand previously defined concepts such as protein family, subfamily, and superfamily, and their relationships to several protein classification efforts. Furthermore, recruitment of selected members of this superfamily may offer clues about the evolution of some metabolic pathways, and show the evolutionary history of different organisms: for example, ER was recruited from MDR and incorporated into the multifunctional enzyme fatty acid synthase from animals (not fungi or plants); additionally, the capacity for retinoic acid synthesis, a powerful regulator of genetic expression active only in vertebrates, evolved in parallel to evolution of animal ADHs; and animal ADHs are involved in the synthetic or catabolic route of paramount modulators such as epinephrine, serotonin, and dopamine [15].

Materials and methods

Extensive database searches for zinc-dependent ADH, sorbitol dehydrogenase, threonine dehydrogenase, CADH, mannitol dehydrogenase, ER, and QOR were performed. Protein sequence data were taken from SWISS-PROT + TrEMBL protein databases [16] and the GenBank nonredundant protein sequence database at the National Center for Biotechnology Information (NCBI) [17]. Access to NCBI databases was achieved by means of the integrated database retrieval system ENTREZ [17]. The gapped BLASTP program with default gap penalties and the BLOSUM62 substitution matrix was employed [11]. Thus, based on selected protein sequences that belong to each of the subfamilies that compose the MDR superfamily, a search for homologous sequences was performed through BLASTP for each selected sequence to identify new members of MDRs not yet recognized. Whenever a new sequence was identified (P < 0.00001), the BLASTP search was repeated, seeking closer relative sequences. The procedure was repeated iteratively until no new members of MDRs were recognized.

Progressive multiple protein sequence alignment was calculated with the CLUSTAL_X package [18] using secondary structure-based penalties and corrected according to results of gapped BLASTP [11]. Dendrograms were calculated using CLUSTAL_X [18] and displayed with TREEVIEW [19]. Phylogenetic analyses were performed with MEGA2 software [20], using both maximum parsimony (MP) and distance-based methods [UPGMA and neighbour-joining (NJ)], with the Poisson correction distance method, and gaps treated by pairwise deletion. Confidence limits of branch points were estimated by 1000 bootstrap replications.

The procedure to define protein subfamilies and families is explained in detail in the Results section.

Results

A total of 656 nonredundant sequences (allelic forms excluded) were identified as members of the MDR superfamily. Of this total, 73 sequences were excluded from final analysis for one of the following reasons: (a) sequences with fewer than 75 amino acids; (b) isozymes with 100% identity; (c) multiple sequences corresponding to orthologous genes identified in several species from the same genera, because they were considered redundant for the phylogenetic analysis; and

(d) duplication of information, for example, two fragments of proteins in Streptomyces coelicolor (CAB53403 and CAB55521) were identified as the N- and C-terminus, respectively, of the same protein (kindly confirmed by S. Bentley, Sanger Institute, Hinxton, Cambridge, UK; personal communication). Thus, 583 nonredundant protein sequences were considered for phylogenetic analysis; of these, 21 proteins belong to archaea, 234 to bacteria, 11 to protista, 62 to fungi, 148 to plants, and 107 to animals.

The 583 sequences permitted construction of the unrooted tree shown in Fig. 1. Protein sequences were ascribed to different subfamilies, as indicated in the SWISSPROT database. Conserved groups with a high degree of identity can be identified easily (e.g. class III ADH, plant ADHs, animal ADHs), as well as poorly conserved subfamilies, such as sorbitol dehydrogenase, ER, or QOR. Conserved protein subfamilies are identified because distances between their members are short, and they appear as a group of branches that join among themselves far from the centre of the tree. In comparison, poorly conserved subfamilies with low identity among themselves resemble groups of long branches that depart close to the centre of the tree. However, the latter, more than being an inherent property of these subfamilies, might be due to problems with the reliability of database information, because a significant fraction of functional annotations in databases is dubious or even incorrect [21,22]. This problem arises because there are many noncharacterized sequences.

Fig. 1. Unrooted tree constructed with the 583 identified nonredundant protein sequences that belong to the MDR superfamily. Each sequence is coloured as follows: red, animals; green, plants; brown, fungi; light blue, protista; orange, bacteria; dark blue, archaea. Protein sequences were ascribed to different subfamilies, as indicated in the SWISSPROT database [16]. As a guide, the protein families considered by the COG Database [30–32] are displayed (Table 1); grey pins mark the boundaries of clusters of orthologous groups of proteins (COGs). They do not correspond to the protein families and subfamilies proposed in this work.

An especially illustrative example is the case of the QOR/ζ-crystallin subfamily, in which many protein sequences are assumed to be QOR only by sequence similarities with the well-characterized animal QOR/ζ-crystallins. Thus, other noncharacterized distantly related sequences are assumed to be also QOR only by similarity to the second group of QOR-related sequences.

In summary, GenBank reports might be produced before characterization is completed and/or published; usually, authors do not update the original GenBank report after publication. Therefore, many proteins would already have been characterized, but this information is not quoted in GenBank and other protein databases. Thus, to record reliable functional identification for most proteins, an extensive search for published papers by authors who made contributions to GenBank for each of the MDRs was carried out. This functional identification, together with the statistically significant degree of similarity calculated with BLAST (E-value), allowed us to identify many additional small subfamilies as members of the MDR superfamily. The E-value represents the number of alignments with an equivalent or greater score that would be expected to occur purely by chance [23].

Table 1 lists the main protein families that are found within the MDR superfamily, as stated by several public protein databases. Several inconsistencies in the nomenclature for protein subfamilies, families and superfamilies are observed: for example, Pfam [24] does not attempt to identify families or subfamilies in the MDR superfamily; PROSITE [25] uses motifs to identify two protein families in the MDR superfamily; PIR [26,27] uses distance-based criteria to identify 119 families in MDR; CATH [28,29] uses structural data to identify six superfamilies in MDR; COG [30–32] uses phylogenetic criteria to identify six families; and SYSTERS uses a non-distance-based method to identify 80 families. This discrepancy is due to the different criteria used for defining each of these terms.

To clarify this, we have defined a protein subfamily as a set of homologous (ideally orthologous) protein sequences that (a) perform the same function and (b) form a closed group in which identity, similarity, and statistical significance between any two members of the closed group are higher than to any other protein sequence outside the subfamily, i.e. clusters of proteins with BLAST reciprocal best hits. Often, members of protein subfamilies share more than 30% sequence identity, and an E-value of approximately 10⁻³⁰ or less. It should be mentioned that all-vs.-all BLAST-based searches have recently been used to find orthologs [33–36], and that these methods bypass multiple alignments and construction of phylogenetic trees, which can be slow and error-prone steps in classical ortholog detection [37].

The previously mentioned definition of subfamily is nearly identical to the approach employed in the SYSTERS database to define protein families or clusters of protein sequences [38–40], but with the additional condition that all sequences in a cluster must (ideally) share the same function. This functional criterion is necessary because true orthologous proteins must perform the same function; if this last condition is not true, then the proteins are paralogous. In contrast, paralogous proteins do not necessarily possess different functions, in that by definition, two proteins are said to be paralogous if they are derived from a duplication event, but orthologous if they are derived from a speciation event [41–44]. Therefore, initially a duplication event will produce two proteins possessing identical properties, and only after evolution might they acquire different functions.
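
The closed-group criterion for subfamilies described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the nested-dictionary input format, and the use of a "higher is better" similarity score (for example a BLAST bit score, rather than the E-values quoted in the text) are all assumptions made for illustration.

```python
def best_hit(scores, query):
    """Return the highest-scoring hit for `query`, or None if it has no hits.

    `scores` is a hypothetical nested dict: scores[query][subject] = similarity,
    where a larger value means a more significant match (an assumption).
    """
    hits = scores.get(query, {})
    return max(hits, key=hits.get) if hits else None


def reciprocal_best_hits(scores):
    """Return unordered pairs of sequences that are each other's best hit."""
    pairs = set()
    for a in scores:
        b = best_hit(scores, a)
        if b is not None and best_hit(scores, b) == a:
            pairs.add(frozenset((a, b)))
    return pairs


def is_closed_group(scores, group):
    """Check the closed-group condition for a candidate subfamily.

    For every member, the weakest score to another member must be strictly
    higher than the strongest score to any sequence outside the group.
    """
    group = set(group)
    for a in group:
        hits = scores.get(a, {})
        inside = [hits.get(b, float("-inf")) for b in group if b != a]
        outside = [s for x, s in hits.items() if x not in group]
        if inside and outside and min(inside) <= max(outside):
            return False
    return True


# Toy all-vs-all scores with invented sequence names.
toy_scores = {
    "adh1": {"adh2": 300, "qor1": 40},
    "adh2": {"adh1": 300, "qor1": 35},
    "qor1": {"qor2": 250, "adh1": 40},
    "qor2": {"qor1": 250, "adh2": 30},
}

print(reciprocal_best_hits(toy_scores))
print(is_closed_group(toy_scores, {"adh1", "adh2"}))
print(is_closed_group(toy_scores, {"adh1", "qor1"}))
```

Because the test compares scores within the data rather than against a fixed threshold, no identity cutoff has to be chosen. The functional condition (at least one characterized member, same function) would still have to be checked separately.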

Table 1. Protein families/subfamilies within the medium-chain dehydrogenase/reductase (MDR) superfamily, as indicated in several public databases.

Database: protein families/subfamilies considered within MDR

Pfam [24]
    PF00107 adh_zinc (considers only one superfamily)

PROSITE [25]
    PDOC00058 Zinc-containing alcohol dehydrogenases
    Considers two patterns or signatures: PS00059 ADH-ZINC and PS01162 QOR_ZETA_CRYSTAL

SCOP [147]
    Family: alcohol dehydrogenase-like, N-terminal domain
    Family: alcohol/glucose dehydrogenases, C-terminal domain
    Considers two similar families, and both contain the same five domains: sorbitol dehydrogenase, secondary ADH, glucose dehydrogenase, alcohol dehydrogenase, quinone oxidoreductase

InterPro [148]
    IPR002085 Zinc-containing alcohol dehydrogenase superfamily
    Considers two families:
        IPR002364 Quinone oxidoreductase/zeta-crystallin
        IPR002328 Zinc-containing alcohol dehydrogenase
    Considers one subfamily:
        IPR004627 L-threonine 3-dehydrogenase

CATH [28,29]
    Considers six homologous superfamilies based on structural data. Two of them are domains contained inside the other four multidomain superfamilies.
    Homologous superfamily 3.40.50.720: NAD(P)-binding Rossmann-like domain
    Homologous superfamily 3.90.180.10: Medium-chain alcohol dehydrogenases, catalytic domain
    Homologous superfamily 5.1.120.1: Oxidoreductase (NAD(A)-CHOH(D)); includes animal ADH, class III ADH
    Homologous superfamily 5.1.2796.1: Oxidoreductase; includes secondary ADH
    Homologous superfamily 5.1.1670.1: Oxidoreductase; includes quinone oxidoreductase
    Homologous superfamily 7.1.147.10: Oxidoreductase; includes sorbitol dehydrogenase

PIR-PSD (MIPS/IESA) [26,27]
    SF000091 alcohol dehydrogenase superfamily
    Considers 119 protein families; the main protein families are:
        Fam000150 (94 sequences: includes animal ADH, plant ADH, class III ADH)
        Fam000152 (18 sequences: includes fungi ADH)
        Fam007438 (31 sequences: includes CADH)
    Considers two motifs:
        PCM00059 zinc-containing ADH
        PCM0162 Quinone oxidoreductase/zeta crystalline

COG [30–32]
    Considers six families or Clusters of Orthologous Groups of proteins (COGs):
        COG 1063: Threonine dehydrogenase and related Zinc-dependent dehydrogenases
        COG 1062: Zinc-dependent alcohol dehydrogenases, class III (and related)
        COG 1064: Zinc-dependent alcohol dehydrogenases (includes CADH and fungi ADH)
        COG 0604: NADPH: quinone oxidoreductase and related Zinc-dependent oxidoreductases
        COG 3321: Polyketide synthase (PKS) modules and related proteins (enoyl reductase from PKS and FAS)
        COG 2130: Putative NADP-dependent oxidoreductases AADH/LHD (and related)

SYSTERS [38–40]
    adh_zinc: includes 80 clusters (families), organized into superfamilies; the main superfamilies are:
        Superfamily of cluster O60787: includes six additional clusters with sequences from animal ADH, plant ADH, class III ADH (equivalent to COG1062)
        Superfamily of cluster N60795: includes 13 additional clusters with sequences from CADH, fungi ADH, DHSO, TDH, secondary ADH among others (equivalent to COG1063 plus COG1064)
        Superfamily of cluster N60499: includes five additional clusters with sequences from QOR/ζ-crystallin and related (equivalent to COG0604)
        Superfamily of cluster O59495 and O59531: includes other nonrelated clusters (equivalent to COG3321)

This explanation is necessary because some papers provide inexact definitions [45–47].

This non-distance-based method allows us to sort MDR sequences into nonoverlapping clusters (subfamilies), in which the granularity of this clustering is determined by data and not by a user-supplied data-dependent cut-off [38]. Identification of closed groups of protein sequences, or perfect clusters (in agreement with SYSTERS nomenclature), is advantageous over distance-based clustering methods because it is not necessary to set an arbitrary identity cutoff value to define a subfamily (or families in the SYSTERS database), and permits identification of both highly and poorly conserved groups of orthologous proteins. Furthermore, Krause & Vingron [39] showed that this

method is highly conservative, as the probability of obtaining a false positive is extremely low, i.e. we almost never observe sequences that do not belong to a cluster being included.

On the other hand, this subfamily definition fits with the widely used nomenclature proposed by Persson et al. [7] for the MDR superfamily. Thus, only closed groups with at least one characterized protein were listed as true protein subfamilies in this work. This criterion excluded some minor clusters without characterized proteins, or protein sequences located in the twilight zone, which cannot be assigned with certainty to a protein subfamily. Furthermore, there is always the possibility that the best match in a database hit is solely a well-conserved paralog [22] that in reality belongs to a related, but different, protein subfamily.

As a consequence of application of these criteria, subfamilies identified in this work are equivalent to a carefully crafted, manually curated version of the clusters of proteins proposed in the SYSTERS database. Figure 2 shows an unrooted tree constructed with all the MDR protein sequences identified in bacteria and archaea, with recognized protein subfamilies indicated. Figure 3 shows an equivalent unrooted tree constructed with protein sequences identified in eukaryota. In both trees, the main subfamilies of the MDR superfamily are easily visualized. Comparison of Figs 2 and 3 clearly shows that in addition to the well-characterized protein subfamilies that exist simultaneously in several phylogenetic lineages, there are additional subfamilies associated with only one phylogenetic lineage, suggesting a more recent evolutionary origin.

Fig. 2. Unrooted tree constructed with identified protein sequences that belong to MDR in bacteria and archaea. Subfamilies were identified based on statistical identity and similarity calculated with BLAST. Only subfamilies with at least one functionally characterized protein received a name. The three main clusters of subfamilies (macrofamilies) are indicated with roman numerals and the name of each family and subfamily is abbreviated. Grey pins mark the boundaries of protein families; yellow-capped pins mark the boundaries of protein macrofamilies. COGs are also indicated in boxes. The complete names of the protein subfamilies are indicated in Tables 3–8, according to the protein family to which they belong. Subfamilies present only in one kingdom (bacteria or archaea) are indicated in italics; normal type indicates subfamilies present in two or more kingdoms. All archaea sequences are coloured in blue; for clarity, bacterial sequences are coloured in the font colour selected to name each subfamily.

Fig. 3. Unrooted tree constructed with 328 protein sequences that belong to MDR in eukaryota. Each sequence is coloured as follows: red, animals; green, plants; brown, fungi; light blue, protista. The three main clusters of subfamilies (macrofamilies) are indicated with roman numerals and the name of each family and subfamily is abbreviated. Grey pins mark the boundaries of protein families; yellow-capped pins mark the boundaries of protein macrofamilies. COGs are also indicated in boxes. The complete names of the protein subfamilies are indicated in Tables 3–8, according to the protein family to which they belong. Subfamilies with restricted distribution are shown in italics, with subfamilies with broad distribution shown in normal font.

It can also be observed that several protein subfamilies are formed by clusters of related subfamilies (Figs 2 and 3). According to the previous proposal for protein subfamilies, we define a protein family as a set of protein subfamilies in which identity and/or similarity of proteins in the family is higher among them than when compared with other proteins belonging to a different family. Therefore, a family is composed of a closed group of subfamilies in which the closest relative of one subfamily is always another subfamily member from the same family. However, although the protein subfamily definition used in this work comprises (ideally) a natural unit (orthologous proteins with the same function), the protein family is not a straightforward concept, as it is necessary to set author cutoff criteria to identify it. In fact, with tools such as BLASTP, identification of the protein superfamily to which one new protein belongs is easy and accurate. An additional functional analysis of the new protein permits recognition of the orthologous group (subfamily) to which this protein belongs. Nonetheless, at present there are no universal criteria to classify proteins into intermediate categories located between subfamily and superfamily. Indeed, a universally accepted protein family definition does not exist; thus, different authors use different concepts with a different emphasis, e.g. homology in sequence, structure, and/or function.

Therefore, using BLAST to compare E-values and identity/similarity values among different protein subfamilies, we can identify several clusters of protein subfamilies in the MDR superfamily. In this way, at the highest level of

integration, we herein identify three great clusters or macrofamilies in the MDR superfamily (see Figs 2 and 3). At lower levels of integration, we identify six clusters of orthologous groups of proteins (COGs) that comprise the MDR superfamily (according to the COG database proposed by Koonin & Tatusov (see Table 1) [30–32]), or the eight protein families recently proposed by Nordling et al. [14]. To illustrate the criteria used to identify clusters of protein subfamilies, Fig. 4 shows schematically the main relationships among the different subfamily members that comprise macrofamily II in Figs 3 and 4 (this big cluster is equivalent to COG1064, and comprises the Y-ADH and CADH families from Nordling et al. [14]). Similar data were obtained with the other protein subfamilies (not shown).

Fig. 4. Schematic diagram showing the main relationships between different protein subfamily members of macrofamily II (COG1064), listed in Table 4. The arrows point toward subfamilies with the highest statistical significance (E-value); not all possible relationships are displayed. Two clusters of closely related subfamilies (CADH family and Y-ADH family) are seen, but all are interrelated among themselves, forming a closed group. The relationships between subfamilies are not necessarily symmetric; nonsymmetric relationships can be observed in amino acid sequences [39]. Inside each subfamily, taxa, where found, are indicated. Identity (I), indicated as a percentage, is shown for illustrative purposes only. The dotted line separates the CADH and Y-ADH families.

Additionally, the proposed taxonomic categories (subfamilies, families, and macrofamilies) were validated by bootstrap analysis with conventional phylogenetic methods, using both distance-based methods (neighbour-joining and UPGMA) and character-based methods (maximum parsimony). To perform this phylogenetic analysis, only subsets of the MDR superfamily were utilized (the complete set demands excessive resources of computing power). Initial subsets employed for phylogenetic analysis included protein sequences that belong to only one kingdom (archaea, bacteria, animals, plants, or fungi). These kingdom-specific subsets were used to validate by bootstrap analysis the proposed taxonomic categories: macrofamilies and families. Later, subsets of proteins that belong to each of the proposed three macrofamilies, or eight families, were used to validate, by bootstrap analyses, the proposed 49 protein subfamilies. Figure 5 shows a phylogenetic tree constructed with protein sequences belonging to macrofamily II of the MDR superfamily. The additional phylogenetic trees constructed with protein sequences pertaining to macrofamilies I and III, and to each of the kingdoms to which the MDR proteins belong (archaea, bacteria, fungi, animals or plants), are not shown.

Table 2 shows a comparison of the proposed protein families that comprise the MDR superfamily, according to the COG database, the Nordling et al. paper [14], and the three macrofamilies or main clusters identified in this work. It is clear that information in addition to sequence data is needed to define the true protein families comprising the MDR superfamily. Consensus agreements among protein taxonomists must be reached before setting up intermediate categories between ideally true orthologous clusters (subfamilies in this paper) and superfamilies. Sequence data alone are not enough to set up true protein families with a real biological sense. It is important to point out that the intermediate categories proposed in the COG database, the Nordling et al. paper [14], and in this work create a congruent pattern despite the different criteria used to define them in each study.

Tables 3–8 present lists of subfamilies in the eight families of the MDRs, and their distribution into the different kingdoms, with a brief summary for each subfamily (a complete list with all protein sequences and consulted references was included as supplementary material and can be requested from the publisher or the authors).

Interestingly, archaea protein sequences appear to be concentrated in only two families (macrofamily I: PDH family, COG1063, and macrofamily II: Y-ADH family, COG1064), suggesting that these two families, with a universal distribution, are the probable ancestral protein families in the MDR superfamily. However, in macrofamily III, a small uncharacterized cluster related to the crotonyl-CoA reductase (CCAR) subfamily also possesses archaea members, also suggesting an ancient group.

In bacterial phyla, the taxa with sequences most related to eukaryota are firmicutes (Gram-positive) and proteobacteria (γ subdivision), see Tables 3–8. However, this proximity could simply be due to the fact that these bacterial clades possess the greatest number of completely sequenced genomes. Table 9 shows the number of identified genes that belong to the MDR in completely sequenced species. There is great variability with respect to the total number of genes identified in each organism, even within the same taxonomic category, as well as variability with respect to the number of genes identified in the MDR superfamily.

Macrofamily I: PDH family (COG1063): DHSO, TDH, and related subfamilies

This family was formerly denominated by Nordling et al. [14] as the PDH (polyol dehydrogenase) family; however, after including bacteria and archaea members, it is clear that less than half of their subfamily members possess an activity related to polyol metabolism. The PDH family is

Fig. 5. Phylogenetic tree constructed with the protein sequences that belong to macrofamily II within MDR superfamily. Shown is the consensus
UPGMA tree which was constructed with the computer software MEGA v. 2.1 [20], using the 50% majority-rule. Sequence names are shaded as
follows: red, animals; green, plants; brown, fungi; light blue, protista; orange, bacteria; dark blue, archaea. The circles indicate those nodes
supported in >70% (open), >80% (grey) or >90% (closed) of 1000 random bootstrap replicates of all NJ, UPGMA and MP. Resultant trees were
rooted with threonine dehydrogenase protein sequences (macrofamily I). Grey pins mark the boundaries of protein families (Y-ADH family and
CADH family); yellow-capped pins mark the boundaries of protein macrofamilies. Sequence names are indicated with a SwissProt-like identifier
(Gene_organism), followed by the accession number assigned by the database (GenBank, PIR, TrEMBL, etc.; only sequence names reported by
the nonredundant SWISSPROT database were used directly).
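
As a rough illustration of the bootstrap support categories used in the legend of Fig. 5, the following Python sketch resamples alignment columns with replacement and maps a support fraction to the circle style described above. It is not the MEGA v. 2.1 procedure used for the figure; the function names, the toy alignment, and the rule of taking the minimum support across NJ, UPGMA and MP are assumptions made only for illustration.

```python
import random


def bootstrap_columns(alignment, rng):
    """Return one bootstrap replicate: alignment columns resampled with replacement."""
    n_cols = len(alignment[0])
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    # The same column picks are applied to every sequence so columns stay intact.
    return ["".join(seq[i] for i in picks) for seq in alignment]


def clade_support(replicate_clades, clade):
    """Fraction of replicate trees (each given as a set of clades) that contain `clade`."""
    hits = sum(1 for clades in replicate_clades if clade in clades)
    return hits / len(replicate_clades)


def support_symbol(fraction):
    """Map a support fraction to the circle style of the legend (None if <= 70%)."""
    if fraction > 0.90:
        return "closed"
    if fraction > 0.80:
        return "grey"
    if fraction > 0.70:
        return "open"
    return None


def node_symbol(nj, upgma, mp):
    """Assumed reading of 'all NJ, UPGMA and MP': use the lowest of the three supports."""
    return support_symbol(min(nj, upgma, mp))


# Toy alignment (hypothetical sequences, not taken from the MDR data set).
toy_alignment = ["ACDEFG", "ACDEYG", "TCDEFG"]
replicate = bootstrap_columns(toy_alignment, random.Random(0))
```

In this sketch a node supported in 95% of NJ replicates but only 85% of UPGMA replicates would be drawn with a grey circle, because the lowest value decides the category.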

composed of 12 subfamilies (Table 3). Their characterized members contain zinc, show dehydrogenase or reductase activities, bind NAD(H), except secondary ADHs that use NADP(H), and are cytosolic proteins, with the exception of the bi-domain oxidoreductase subfamily (BDOR), which appears to be represented by transmembrane proteins. They are organized as homotetramers or homodimers that are involved in several metabolic roles, but only two correspond to anabolic activities: BDOR, involved in exopolysaccharide biosynthesis, and 2-desacetyl-2-hydroxyethyl bacteriochlorophyllide-a dehydrogenase subfamily (BCHC), in bacteriochlorophyll-a biosynthesis in proteobacteria. Remaining enzymes in PDH family show catabolic activities related to aryl/alkyl metabolism (FDEH, secondary ADH, and BDH), formaldehyde metabolism (FADH, formaldehyde dismutase), carbohydrate catabolism (DHSO, SORE, GATD, and archaea GDH), and threonine and derivative compound catabolism (TDH and SSP). Five subfamilies have polyphyletic distribution and simultaneously exist in at least two domains (eukaryota and bacteria, or archaea and bacteria). Of these five subfamilies, four include tetrameric proteins and three are present in archaea.

Macrofamily I: ADH family (COG1062): class III ADH and related subfamilies

This family includes classical ADHs from animals and plants. ADH family comprises seven subfamilies absent in archaea (Table 4). Only one subfamily has a broad distribution: class III ADH, which is present in animals, plants, fungi and bacteria (cyanobacteria and proteobacteria). Proteins belonging to these subfamilies are cytoplasmic, although class III ADHs in animals are also nuclear [48]. They contain zinc, bind NAD(H), except animal ADH8 from Rana perezi that uses NADP(H) [49,50], and show dehydrogenase or reductase activities, with the exception of hydroxynitrile lyase (HNL) in plants. They are homodimers and only mycothiol-dependent formaldehyde dehydrogenase is atypically reported as a homotrimer [51–53].

With the exception of HNL, involved in cyanogenesis in plants, all enzymatic activities fulfilled by the MDR subfamilies in the ADH family are catabolic activities related either to aryl/alkyl metabolism (benzyl ADH, firmicute aryl/alkyl ADH), or formaldehyde metabolism (class III ADH, mycothiol-dependent FADH). It is likely

that the function of plant and animal ADHs, although typically associated with ethanol metabolism, is more complex, in that these comprise an intricate system with a broad diversity of enzymatic forms. The animal ADH subfamily, in addition to ethanol oxidation, participates in oxidation or reduction of diverse endogenous substrates involved in retinoic acid and bile acid synthesis, norepinephrine, leukotriene, serotonin, and dopamine catabolism, or in detoxification of cytotoxic products of lipoperoxidation such as 4-hydroxynonenal (reviewed in [15]). Thus, it is difficult to accept that this complex enzymatic system with its broad diversity of enzymatic forms and substrates (up to eight ADH classes in vertebrates) [49,54] was produced in the course of vertebrate evolution with the sole purpose of oxidizing ethanol, an exogenous metabolite found in minimal quantities under regular conditions: in fact, there are several endogenous substrates metabolized by this complex of enzymatic forms with an efficiency at least one thousand times higher than that of ethanol [15]. A similar history probably occurred in plants. Plant ADHs comprise a complex subfamily with numerous enzymatic forms expressed in a developmental and tissue-specific manner; it was suggested recently that these participate in flooding tolerance, anther development, fruit ripening, disease resistance, and stress response (reviewed in [55]).

Macrofamily II: CADH family (COG1064): ELI3, CADH and related subfamilies

The CADH family comprises two subfamilies; only one shows a broad distribution (Table 5). Their members are oxidoreductases and use zinc. All are dimeric proteins and bind NADP(H), except ELI3 in celery. Enzymes in the

Table 2. Comparison of the protein families included within MDR superfamily according to COG database, Nordling et al. [14], and the three
macrofamilies or main clusters of protein subfamilies identified in this work. The distribution of MDR subfamilies inside each protein family is
indicated, as well as their distribution into eukaryota, bacteria, and archaea domain.

Table 2. (Continued).

1 This family was formerly denominated by Nordling et al. [14] as the mitochondrial respiratory function proteins (MRF) family.
2 This subfamily is probably composed of two or more paralogous related groups.
3 Nordling et al. [14] inappropriately named this family acyl-CoA reductase (ACR).

CADH subfamily perform anabolic functions and participate in biosynthesis of cinnamyl alcohols, the monomeric precursors of lignin in plants. In bacteria, in which lignin is absent, CADH-related proteins participate in biosynthesis of the lipids composing the bacterial cell envelope; in fungi, they could participate in ligninolysis and fusel alcohol synthesis pathways [56,57].

Elicitor-inducible defense-related proteins (ELI3) are present only in eudicot plants, and show different, but related, defense activities: CADH, benzyl alcohol dehydrogenase, or mannitol dehydrogenase. ELI3 expression is elicited by fungal pathogens [58], wounds [59], salicylic acid [60], and leaf senescence [61]. In celery, there is down-regulation by sugars or salt stress [62–64].

Macrofamily II: Y-ADH family (COG1064): yeast ADH, and related subfamilies

The Y-ADH family comprises four subfamilies; two show broad distribution (Table 5). Their members are oxidoreductases and use zinc. This family contains tetrameric proteins that use NAD(H) and have catabolic functions, involved mainly in metabolism of ethanol or short-chain alcohols (typical yeast ADH, broad ADH, and fungal-secondary ADH), or metabolism of mannitol (fungal MTD). The most ancient subfamily is probably the broad ADH; it is present in archaea and bacteria, and its members exhibit broad substrate specificity.

Table 3. Main subfamilies that comprise the PDH family of MDR (COG1063) and their occurrence in eukaryota, archaea and bacteria.
d
Subfamily/main characteristics Eukaryota Archaea/Bacteria
a
DHSO (sorbitol dehydrogenase)
Homotetramer Animals Firmicutes
NAD+/NADH Plants Proteobacteria (c subdivision)
1 Zn2+/subunit Fungi Proteobacteria (a subdivision)
Cytoplasm
BDH (2,3-butanediol dehydrogenase)
Homodimer Fungi Firmicutes
NAD+/NADH Proteobacteria (c subdivision)
2 Zn2+/subunit (putative) Proteobacteria (b subdivision)
Cytoplasm
TDH (threonine dehydrogenase)
Homotetramer – Euryarchaeota
1 Zn2+/subunit (2 Zn2+/subunit?) Firmicutes
NAD+/NADH Proteobacteria (c subdivision)
Cytoplasm Proteobacteria (a subdivision)
Thermus/Deinococcus group
BCHC (2-desacetyl-2-hydroxyethyl bacteriochlorophyllide a dehydrogenase)
Unpurified protein, characterized by genetic – Proteobacteria (a subdivision)
analysis only Proteobacteria (b subdivision)
SORE (L-sorbose-1-phosphate reductase)
Homodimer – Proteobacteria (c subdivision)
Use both NAD+/NADH and NADP+/NADPH
Requires an activating divalent metal (Zn2+)
Secondary ADH
Homotetramer Protista: Firmicutes
NADP/NADPH Entamoebidae Proteobacteria (c subdivision)
1 Zn2+/subunit (only catalytic) Proteobacteria (b subdivision)
Cytoplasm
GATD (galactitol 1-phosphate dehydrogenase)
Homodimer – Proteobacteria (c subdivision)
NAD+/NADH
Require divalent cations for activity and stability
Cytoplasm
SSP and related (sensing starvation protein)
Unpurified protein Firmicutes
Catabolic enzyme that suppresses induction of rpoS Proteobacteria (c subdivision)
expression at starvation or stationary phase Thermotogales
FDEH (5-exo-hydroxycamphor dehydrogenase)
Homodimer
NAD/NADH – Proteobacteria (c subdivision)
2 Zn2+ (putative) Thermotogales
BDOR (bi-domain oxidoreductase) b
Unpurified protein Firmicutes
Probable transmembrane protein Proteobacteria (b subdivision)
Proteobacteria (c subdivision)
Archaea GDH (glucose dehydrogenase)
Homotetramer (Sulfolobus: crenarchaeota) Euryarchaeota
Homodimer (Haloferax: euryarchaeota) Crenarchaeota
Both NAD+/NADH and NADP+/NADPH
2 Zn2+/subunit

Table 3. (Continued).
d
Subfamily/main characteristics Eukaryota Archaea/Bacteria

FADH (formaldehyde dehydrogenase independent of cofactor/formaldehyde dismutase)
Homotetramer – Euryarchaeota
NAD+/NADH Firmicutes
2 Zn2+/subunit Proteobacteria (c subdivision)
Proteobacteria (b subdivision)
Thermus/Deinococcus group

a The members of this subfamily receive the official name of L-iditol 2-dehydrogenase, and possess alternative names such as glucitol dehydrogenase, xylitol dehydrogenase or polyol dehydrogenase, in addition to sorbitol dehydrogenase. This subfamily catalyzes the reversible oxidation of D-sorbitol and other polyalcohols, like xylitol and L-iditol, to the corresponding keto-sugars [149–152].
b N-terminus is similar to diverse DHSO; C-terminus is probably an NAD(P)H oxidoreductase, which belongs to the GFO_IDH_MocA family. It is related to synthesis of exopolysaccharides.
c Two enzymes have been purified and characterized: formaldehyde dehydrogenase from Pseudomonas putida, and formaldehyde dismutase also from Pseudomonas putida. However, Oppenheimer et al. recently demonstrated that formaldehyde dehydrogenase from P. putida is a functional alcohol dehydrogenase that conducts the efficient dismutation of a wide range of aldehydes (including formaldehyde), where NADH production represents a pH-dependent burst. Thus, both enzymes can be considered formaldehyde dismutases.
d For bacteria and archaea, only sequences that can be unambiguously assigned to one subfamily are considered in the table. References are included in Table S2 of supplementary material.

Macrofamily III: QOR family (COG0604): QOR and related subfamilies

Members of this family lack zinc and use mainly NADP(H) as cofactor. It is the most complex and divergent family, with 16 subfamilies (Table 6). Twelve subfamilies are found in only one taxon, suggesting intensive and recent enzymogenesis. In functional and structural terms, this is a highly divergent family and their members, in addition to oxidoreductase activity, act as lyases, nuclear-associated proteins, membrane traffic proteins (that participate in subcellular protein distribution), and integral membrane proteins with ATPase activity and calcium-binding capacity. This family is nearly absent in archaea; only Halobacterium sp. and Sulfolobus solfataricus have proteins related to CCARs. It is likely that CCAR and related proteins are the most ancient subfamily of macrofamily III, because they have the widest distribution (archaea, bacteria, and eukaryota) and because it is the only subfamily with a physiologic role related to primary metabolic pathways.

Macrofamily III: NRBP family (COG0604): NRBP1 subfamily and related

This small family comprises only nuclear receptor binding protein 1 (NRBP1) and related subfamily (Table 6). It has broad distribution, and is present in animals, plants, fungi and bacteria. Their members are homodimers, with both nuclear and cytosolic location. This family was formerly designated by Nordling et al. [14] as the mitochondrial respiratory function proteins (MRF) family; however, this name is unfortunate in that members of this family probably do not have enzymatic activity. In animals these proteins are nuclear receptor co-operators; in the cytosol, in the presence of the appropriate ligand, they interact with several nuclear hormone receptors, such as peroxisome proliferator-activated receptor α, thyroid hormone receptor, retinoic acid receptor, retinoid-X receptor, and hepatocyte nuclear factor-4 [65]. Later, NRBP1-activated nuclear receptor complex is translocated to the nucleus by a piggyback mechanism, where they act as transcription factors. Although fungi and bacteria lack nuclear receptors, in Saccharomyces cerevisiae, MRF1_YEAST (P38071), a single-stranded DNA-binding protein, has acquired the activity of a transcription factor [66,67]. Indeed, it is a transcriptional regulatory protein of certain genes whose products are necessary for the functional assembly of mitochondrial respiratory proteins. In bacteria, uncharacterized related proteins are reported in Corynebacterium glutamicum and Xanthomonas campestris. Thus, it is likely that in the course of evolution, NRBP1 acquired a new function to work with nuclear receptors. This family appears to have evolved from members of QOR family (COG0604).

Macrofamily III: LTD family (COG2130): LTD/AADH and related subfamilies

This is a small family with only three subfamilies (Table 7). Members lack zinc and have a preference for NADP(H) over NAD(H). Two subfamilies are found in only one taxon: leukotriene B4 12-hydroxydehydrogenase (LTD)/15-oxoprostaglandin 13-reductase (PGR), found in animals, and allyl alcohol dehydrogenase (AADH), found in plants. Both subfamilies clearly have their origin in an uncharacterized protein subfamily (LTD/AADH related) with broad distribution. This protein family is closely related to QOR family COG0604 (Figs 2 and 3).

Macrofamily III: ER family (COG3321): enoyl reductases

This family contains four related subfamilies comprising multifunctional polypeptides that enclose an MDR domain with ER activity (Table 8). ER domains in MDR enzymes use NADP(H) and lack zinc. These subfamilies show limited distribution and are involved in biosynthesis of fatty acids and polyketides. Nordling et al. [14] inappropriately named this family as acyl-CoA reductase (ACR). As they

Table 4. Main subfamilies that comprise the ADH family of MDR (COG1062) and their occurrence in eukaryota, archaea and bacteria.

Subfamily/main characteristics Eukaryota Archaea/Bacteria

Aryl/Alkyl ADH: Firmicutes a


Unpurified protein; characterized by genetic analysis only – Firmicutes
Benzyl ADH b
Homodimer (Pseudomonas putida) – Proteobacteria (c subdivision)
Homotetramer (Acinetobacter calcoaceticus) Proteobacteria (a subdivision)
2 Zn2+/subunit Firmicutes
NAD+/NADH
Cytoplasm
HNL (Hydroxynitrile lyase: acetone cyanohydrin lyase)
Homodimer Plants (derived from –
(not an oxidoreductase) plant-/class III ADH)
2 Zn2+/subunit
Cytoplasm
FADH: mycothiol-dependent (formaldehyde dehydrogenase
dependent on mycothiol)
Homotrimer – Firmicutes
NAD+/NADH
2 Zn2+/subunit
Cytoplasm
Class III ADH (formaldehyde dehydrogenase
dependent on glutathione)
Homodimer (Eukaryota; Cyanobacteria Animals Cyanobacteria
and Proteobacteria) Fungi Proteobacteria (c subdivision)
Homotetramer (Paracoccus: Proteobacteria a) Plants Proteobacteria (b subdivision)
NAD+/NADH Proteobacteria (a subdivision)
2 Zn2+/subunit
Cytoplasm (all) and nucleus (animals)
Animal ADH c
Homodimer d Animals –
NAD+/NADH e (derived from animal class III)
2 Zn2+/subunit
Cytoplasm
Plant ADH
Dimer Plants –
NAD+/NADH (derived from plant class III)
2 Zn2+/subunit
Cytoplasm

a This belongs to a highly conserved gene cluster encoding haloalkane catabolism on the plasmid Prtl1.
b These show affinity for a wide range of (substituted) aromatic alcohols, but are not capable of oxidizing aliphatic alcohols.
c This subfamily comprises eight different classes involved, besides ethanol metabolism, in the synthesis and catabolism of several endogenous metabolites that regulate growth, metabolism, differentiation, and neuroendocrine functions [15,50,54].
d Some animal ADH are also heterodimers (e.g., isozymes from human class I ADH).
e Only class VIII ADH from Rana perezi uses NADP(H) rather than NAD(H) [49,50].
See final note (d) in Table 3.

identified correctly the enoyl-acyl carrier protein (ACP) reductase domain contained in multifunctional fatty acid synthase from animals, or enoyl-ACP reductase domain from iterative polyketide synthase in fungi, the generic name enoyl reductase is preferable. The enzyme ACR is absent in fatty acid synthase; this latter multidomain enzyme uses ACP as carrier for intermediates, not coenzyme A. ACR is usually a membrane-bound enzyme involved in the biosynthesis of fatty alcohols and waxes, and it is clearly a different enzyme that does not belong to the MDR superfamily [68,69].

Animal fatty acid synthases are closer to fungal iterative polyketide synthases than to any other fatty acid synthases from fungi, plant, or bacteria. The latter kingdoms possess one ER that does not belong to the MDRs. As can be seen in Figs 2 and 3, this protein family is also closely related to QOR family (COG0604).

Discussion

We will focus our discussion on five topics: criteria used to define a protein family; mechanisms of evolution in MDR;

Table 5. Main subfamilies that comprise the CADH family and Y-ADH family of MDR (COG1064) and their occurrence in eukaryota, archaea, and
bacteria.

Subfamily/main characteristics Eukaryota Archaea/Bacteria

CADH FAMILY
CADH and related (cinnamyl alcohol dehydrogenase) a
Homodimer Plants (tracheophytes) Firmicutes
NADP+/NADPH Fungi Proteobacteria (c subdivision)
Protista: Euglenozoa Proteobacteria (e subdivision)
Cyanobacteria
b
ELI3 (elicitor-inducible defense-related proteins)
Homodimer Plants –
Monomer (celery) (Eudicots: derived from CADH)
NADP+/NADPH
NAD+/NADH (in celery)
Y-ADH FAMILY
Yeast ADH and related
Homotetramer Fungi Proteobacteria (c subdivision)
NAD+/NADH Animals Proteobacteria (a subdivision)
2 Zn2+/subunit Proteobacteria (b subdivision)
Cytoplasm and mitochondria Firmicutes
Fungi MTD (mannitol-1-phosphate dehydrogenase)
Homotetramer Fungi –
NAD+/NADH (derived from yeast ADH)
2 Zn2+/subunit
Cytosol
Fungi secondary ADH
Homotetramer Fungi –
NAD+/NADH (derived from yeast ADH)
2 Zn2+/subunit (putative)
Cytosol
c
Broad ADH (broad substrate specificity ADH)
Homotetramer – Crenarchaeota
NAD/NADH Firmicutes
2 Zn2+/subunit Proteobacteria (c subdivision)
Cytosol

See final note (d) in Table 3. a Induced by several elicitors, such as pathogens, ozone, and wounding. b Proteins described with different
activities: cinnamyl alcohol dehydrogenase, benzyl alcohol dehydrogenase, or mannitol dehydrogenase. Induced by fungal pathogens,
wound, salicylic acid, and leaf senescence; shows a down-regulation by sugar or salt stress. c Shows broad substrate specificity; carbon source
stimulated.

whether eukaryota inherited their enzymatic machinery mainly from bacteria; ancestral activities of MDR; and taxonomy within MDR superfamily.

Criteria used to define a protein family: sequence over functional similarities

Generally, the term protein family describes a group of homologous (frequently orthologous) enzymes that catalyse the same reaction (mechanism and substrate specificity) [47]. However, in addition to their primary activities, enzymes often have other secondary activities with lower efficiency and different substrates and mechanism of reaction [70]. For example, horse ADH also exhibits aldehyde dismutase [71,72] and esterase activities [73]; yeast ADH additionally shows methylformate synthase activity [74]. Therefore, it is clear that through evolution several proteins acquired, with only a few point mutations, activities that differed from the primary activity [46]. This implies the existence of several structurally related proteins with high identity or similarity, but different functional roles [75]. These proteins (closely related paralogues, but with a different mechanism of reaction and/or substrates) might even show higher similarity than the most distant phylogenetic derivatives in the same protein family (true orthologues) with the same activity, substrates, and mechanism of reaction. For example, identity and similarity between plant ADHs and class III ADHs from plants (paralogous proteins with different substrates) are higher than identity and similarity between class III ADHs from plants and bacteria, although both orthologous proteins have the same activity, substrates, and mechanism of reaction [indeed, identity between ADH1_MAIZE (P00333) and ADHX_MAIZE (P93629) (paralogous proteins) is 59%, but identity between ADHX_MAIZE (P93629) and FADH_PARDE (P45382) (orthologous proteins) is 55%]. Based on this type of data, it is clear that several proteins exhibit significant similarity (>30–40% identity), but have

Table 6. Main subfamilies that comprise the QOR family and NRBP family of MDR (COG0604) and their occurrence in eukaryota, archaea and
bacteria.

Subfamily/main characteristics Eukaryota Archaea/Bacteria

NRBP FAMILY
a
NRBP1 (nuclear receptor binding protein/transcription factor)
Homodimer Animals Firmicutes
Transcription factor (yeast) Plants
Nuclear receptor co-operator (animals) Fungi
Nucleus (fungi and animals) and cytosol (animals)
QOR FAMILY
b
ζ-crystallin/QOR (quinone oxidoreductase)
Taxon-specific lens crystallin Animals Firmicutes
Homotetramer
NADP+/NADPH
Lack Zn2+
PIG3 and related (animal P53 Induced Gene 3: putative quinone oxidoreductase) c
Unpurified protein; characterized by genetic analysis only. Animals Firmicutes
Cytoplasm Plants Proteobacteria (a subdivision)
Protozoa: Euglenozoa
TED2 and related (quinone oxidoreductase involved in Tracheary Element Differentiation in plants)
Homodimer (c Proteobacteria: E. coli) Plants Proteobacteria (c subdivision)
Both NAD+/NADH and NADP+/NADPH (E. coli) Protozoa: Euglenozoa Proteobacteria (a subdivision)
Lack Zn2+ Fungi Firmicutes
Cytoplasm
d
Bifunctional QOR and related
Monomer (Euglenozoa) Plants –
NADP+/NADPH Protozoa: Euglenozoa
Lack Zn2+
Cytoplasm
VAT1 e
Localized in the synaptic membranes, as an integral membrane protein Animals –
f
pER in actinomycetes (probable enoyl reductase in actinomycetes)
Unpurified protein; characterized by genetic analysis only. – Firmicutes
g
PKS-IAP (polyketide synthase-independent associated proteins)
Unpurified protein; characterized by genetic analysis only. Fungi –
Heterodimers?
h
QORL-1 (quinone oxidoreductase-like 1)
Unpurified proteins. Animals –
i
DINAP (dinoflagellate nuclear associated protein)
Unpurified protein Protozoa: –
Nucleus Alveolata, dinophyceae
ARP (auxin regulated protein) j
Unpurified protein; characterized by genetic analysis only Plants –
k
DI-QOR (dark induced-quinone oxidoreductase)
Unpurified protein; characterized by genetic analysis only Plants –
DI-QOR/ARP related
Unpurified protein; uncharacterized Fungi –
AL (alginate lyase)
This protein is not an oxidoreductase – Proteobacteria (c subdivision)
Does not require either NAD+/NADH or NADP+/NADPH.
Cytosol
AST (membrane traffic protein)
Unpurified protein; characterized by genetic analysis only Fungi –
Plasma membrane-associated

Table 6. (Continued).

Subfamily/main characteristics Eukaryota Archaea/Bacteria


l
BRP (bacteriocin-related proteins)
Unpurified proteins – Firmicutes
CCAR (crotonyl-CoA reductase) and related
Homodimer Fungi Euryarchaeota
NADP+/NADPH Firmicutes
Proteobacteria (c subdivision)
Proteobacteria (a subdivision)

See final note (d) in Table 3.
a In animals, NRBP1 is translocated to the nucleus by a piggyback mechanism. In rat, it interacts with peroxisome proliferator-activated receptor α, PPARα; thyroid hormone receptor, TR; retinoic acid receptor, RAR; retinoid-X receptor, RXR; and hepatocyte nuclear factor-4, HNF-4. Fungi lack nuclear receptors; in yeast, it is a single-stranded DNA-binding protein that fulfills a role as transcription factor.
b Several activities for ζ-crystallin/QOR have been reported; however, the relative importance of any remains an enigma. Nevertheless, all ζ-crystallins retain NADPH binding capacity as a common character.
c PIG3 in humans seems to be a redox-related protein involved in the formation of reactive oxygen species in response to p53-induced apoptosis.
d Bifunctional protein in plants; monofunctional protein in Euglenozoa. In plants, it is a defense protein whose synthesis is activated in response to pathogen inoculation. In Euglenozoa, its functional role is not resolved.
e VAT-1 forms a high-molecular-mass complex within the synaptic vesicle membrane that is composed of three or four VAT-1 subunits, displays an ATPase activity, and binds calcium with low affinity.
f Probable monofunctional enoyl reductase involved in biosynthesis of actinomycete aromatic polyketides in a multicomponent (type II) polyketide synthase complex.
g Monofunctional enoyl reductase associated with iterative multidomain type I polyketide synthase.
h Expressed mainly in heart, brain, and skeletal muscle, and moderately expressed in placenta, kidney, and pancreas.
i Dinap1 protein is one of the quantitatively major nuclear proteins in the dinoflagellate Crypthecodinium cohnii. Although Dinap1 did not bind directly to DNA, it activated basal transcription activity.
j Protein highly expressed during fruit ripening, or induced in response to auxin treatment.
k These proteins are expressed in plant roots, where light induces a negative regulation. They are involved in biosynthesis of antimicrobial or allelopathic quinones.
l They are included inside plasmids that contain a bacteriocin production region.

Table 7. Main subfamilies that comprise the LTD family of MDR (COG2130) and their occurrence in eukaryota, archaea and bacteria.

Subfamily/main characteristics Eukaryota Archaea/Bacteria


a
LTD (Leukotriene B4 12-hydroxydehydrogenase)/PGR (15 oxoprostaglandin 13-reductase)
Monomer Animals –
Preference for NADP+/NADPH over
NAD+/NADH
Cytoplasm
b
AADH (allyl alcohol dehydrogenase)
Homodimer Plants –
NADP+/NADPH
Cytoplasm (probably)
LTD/AADH related c
Uncharacterized proteins Fungi Euryarchaeota
Animal? Firmicutes
Proteobacteria (c subdivision)

See final note (d) in Table 3. a This subfamily in animals corresponds to proteins with two different activities, indicating that enzymes are
capable of carrying out reduction of a double bond, as well as oxidation of a hydroxy group. b Enzymes efficient for dehydrogenation of
secondary allylic alcohols and reduction of azodicarbonyl compounds and quinones. Induced by various oxidative-stress treatments.
c Bacterial and archaea proteins show 40.2 ± 2.5% (SD, n = 36) average identity with animal LTD family, and 39.6 ± 2.4% (SD, n = 36) with plant AADH family.
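
Identity values of the kind reported in footnote c can be summarized with a few lines of code. The Python sketch below computes percent identity over the aligned, ungapped positions of two pre-aligned sequences and reports the mean and sample SD of a set of pairwise values. The choice of denominator, the use of the sample SD, the function names, and the toy sequences are all assumptions; the identity definition used for the table is not stated in this section.

```python
from statistics import mean, stdev


def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over positions where neither aligned sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def mean_and_sd(values):
    """Mean and sample standard deviation (n - 1 denominator) of identity values."""
    return mean(values), stdev(values)


# Toy pre-aligned sequences (hypothetical, not MDR proteins).
toy = ["MKAAV-LGT", "MKSAVHLGT", "MRAAV-LGS"]
identities = [
    percent_identity(toy[i], toy[j])
    for i in range(len(toy))
    for j in range(i + 1, len(toy))
]
avg, sd = mean_and_sd(identities)
print(f"{avg:.1f} ± {sd:.1f}% (SD, n = {len(identities)})")
```

Other conventions, such as dividing by the length of the shorter sequence or by the full alignment length, would give different percentages for the same alignment, so the convention should be reported alongside any such value.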

different functional roles. Therefore, sequence data alone cannot be used as sole criterion to define protein families, because without functional data, orthologous and paralogous groups cannot be accurately identified.

On the other hand, the protein function cannot be the main criterion used to define a protein family because one domain might have several catalytic activities. In fact, LTD subfamily shows two different and equally efficient catalytic activities: leukotriene B4 12-hydroxydehydrogenase, which catalyses oxidation of a hydroxyl group, and 15-oxoprostaglandin 13-reductase, which carries out reduction of a double bond [76]. In contrast, there are several examples where the same function can be fulfilled by several nonrelated proteins with distinct domains, constituting analogous enzymes [75,77,78]. The MDR and the short-chain dehydrogenase/reductase (SDR) superfamilies contain several analogous enzymes. Thus, the SDR superfamily contains an analogous alcohol dehydrogenase found in

Table 8. Main subfamilies that comprise the ER family of MDR (COG3321) and their occurrence in eukaryota, archaea and bacteria.

Subfamily/main characteristics Eukaryota Archaea/Bacteria


a
Enoyl reductase (Fatty acid synthase -FAS-)
Homodimer Animal –
NADP+/NADPH
Cytoplasm
b
Enoyl reductase (modular polyketide synthase -PKS-)
Homodimer Firmicutes
NADP+/NADPH Proteobacteria (d subdivision)
Cytoplasm Proteobacteria (c subdivision)
Proteobacteria (a subdivision)
c
Enoyl reductase (iterative polyketide synthase -PKS-)
Heterodimer Fungi (PKS) –
NADP+/NADPH (by similarity to modular
PKS and FAS)
Cytoplasm
d
ER-FAS: alveolata (enoyl reductase from type I fatty acid synthase in alveolata)
Homodimer? Protozoa: alveolata –
NADP+/NADPH (by similarity to modular
PKS and FAS)
Cytoplasm

See final note (d) in Table 3. a This enoyl reductase domain belongs to a multifunctional polypeptide of approximately 2500 aa that contains
seven enzymatic domains. b This enoyl reductase domain belongs to a multifunctional polypeptide with modular organization where each
module designates a repeated unit whose functional domains resemble a single type I fatty acid synthase. c This enoyl reductase domain
belongs to a multifunctional polypeptide whose functional domains resemble a single type I fatty acid synthase. In fungi, PKS is involved in
mycotoxin biosynthesis. d This enoyl reductase domain belongs to a multifunctional polypeptide of 8243 aa that contains 21 enzymatic
domains in Cryptosporidium parvum. Three ER domains are organized inside three modules, each containing a complete set of six enzymes
for elongation of fatty acid C2-units (i.e., one ER/module).

Drosophila [79], a glucose dehydrogenase from Bacillus [80,81], an ER from bacteria and plants [82–84], a sorbitol dehydrogenase from Klebsiella [85], and a threonine dehydrogenase in animals [86]. These enzymes represent different protein structure solutions to the same activities observed in MDRs.
In summary, phylogenetic data cannot be overlooked as a criterion for identification of a protein family. All families recognized inside the MDR superfamily are made up of clusters of phylogenetically related paralogous proteins, which may or may not conserve their original substrates or mechanisms of reaction. All paralogous proteins are generated by duplication events, and initially possess the same function; selective pressures and evolutionary forces shape the functional role that duplicated proteins will perform. A change in the functional role of a protein is not necessarily related to a change in substrates or mechanism of reaction. Recruitment of a duplicated protein into a different metabolic pathway, a different physiological role, or even a change in the spatiotemporal pattern of expression, expressing a protein in novel tissues and/or developmental stages [87], could be a good evolutionary reason to conserve the duplicated protein, and result in a novel paralogous protein with a different functional role.
Therefore, we propose that the condition of performing the same function (with one, two, or more catalytic activities) must be assigned solely at a more specific (or restricted) taxonomic level, such as at the subfamily level (employed in this work). A protein family must be defined based mainly on sequence similarities, but in conjunction with other biological criteria different from function, such as phylogenetic data, since minor changes in amino acid sequence may induce changes of function.

Mechanisms of evolution in MDR superfamily

Enzymogenesis. Currently, two different evolutionary scenarios are envisioned for enzyme evolution [88]. New catalytic functions of enzymes can evolve by: (a) changing the chemistry of catalysis, while retaining the binding capacity for a common ligand (hypothesis initially proposed by Horowitz [89]) or (b) retaining the chemistry of catalysis while changing the substrate specificity. Interestingly, we found several enzymes of the MDR superfamily that conserved their chemistry of catalysis, but changed their substrate specificity, e.g. the plant ADH and animal ADH subfamilies, which both evolved from the class III ADH subfamily; or secondary ADH from fungi and mannitol-1-phosphate dehydrogenase from fungi (Fungi MTD), which both evolved from the yeast ADH subfamily. In contrast, we could not find two related enzymes of the MDR superfamily that maintained their binding capacity for a common ligand, but with modification in their chemistry of catalysis. This possibility, described as retrograde evolution or substrate-driven evolution, suggests that metabolic pathways evolved in a backward manner, i.e. divergent members of the same protein family catalyse successive reactions inside a metabolic pathway. To our knowledge, only a few examples have been reliably identified to date: two pairs of enzymes in tryptophan and histidine biosynthesis [47,88].

Table 9. Number of MDR members in organisms with complete genome sequences. The numbers of protein-coding genes in each genome were taken from NCBI (http://www.ncbi.nlm.nih.gov), except for human [153,154] and fruit fly (http://www.fruitfly.org).

Organism | Number of protein coding genes | PDH Family [COG 1063] | ADH Family [COG 1062] | CADH & Y-ADH Family [COG 1064] | QOR & NRBP Family [COG 0604] | LTD Family [COG 2130] | ER Family [COG 3321]

Archaea
Euryarchaeota
Archaeoglobus fulgidus 2407 1 – – – – –
Methanobacterium thermoautotrophicum 1869 – – – – – –
Methanococcus jannaschii 1715 – – – – – –
Pyrococcus abyssi 1765 1 – – – – –
Pyrococcus horikoshii 2064 1 – – – – –
Halobacterium sp. NRC-1 2630 3 – – 1 1 –
Crenarchaeota
Aeropyrum pernix 2694 1 – 2 – – –
Bacteria
Thermotogales
Thermotoga maritima 1846 3 – – – – –
Spirochaetales
Borrelia burgdorferi 850 – – – – – –
Treponema pallidum 1031 – – – – – –
Thermus/Deinococcus group
Deinococcus radiodurans 2937 3 – – 2 – –
Chlamydiales
Chlamydia muridarum 818 – – – – – –
Chlamydia trachomatis 894 – – – – – –
Chlamydia pneumoniae 1052–1110 – – – – – –
Proteobacteria; gamma subdivision
Buchnera sp. 564 – – – – – –
Vibrio cholerae 3828 1 – – – – –
Escherichia coli 4289 11 2 4 2 1 –
Haemophilus influenzae 1709 1 1 – – – –
Pseudomonas aeruginosa 5565 5 1 2 4 2 –
Xylella fastidiosa 2766 1 – 4 1 – –
Proteobacteria; alpha subdivision
Rickettsia prowazekii 834 – – – – – –
Proteobacteria; beta subdivision
Neisseria meningitidis 2025–2121 2 1 1 – – –
Proteobacteria; epsilon subdivision
Campylobacter jejuni 1654 – – 1 – – –
Helicobacter pylori 1491–1553 – – 1–2 – – –
Firmicutes (Gram positives)
Bacillus subtilis 4100 6 – 1 2 1 –
Bacillus halodurans 4066 4 – 1 3 – –
Mycoplasma genitalium 480 – – – – – –
Mycoplasma pneumoniae 677 1 – – – – –
Ureaplasma urealyticum 611 – – – – – –
Actinobacteria
Mycobacterium tuberculosis 3918 3 4 2 5 – 10
Cyanobacteria
Synechocystis sp 3169 – 1 1 – – –
Aquificales
Aquifex aeolicus 1522 – – – 1 – –
Eukaryota
Fungi
Saccharomyces cerevisiae 6297 5 1 6 8 1 –
Plant
Arabidopsis thaliana 27707 1 4 9 5 5 –
Animal
Drosophila melanogaster 13601 3 1 1 1 – 3
Caenorhabditis elegans 20238 2 1 4 3 1 1
Homo sapiens 42 000–48 000 1 7 – 8 1 1
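
The per-family counts in Table 9 can be combined into a total per genome and a density relative to genome size. The following is a minimal sketch, assuming that the six family columns together account for all MDR members of a genome and that "–" means zero; the names `TABLE9` and `mdr_summary` are illustrative and do not come from the paper. Only a few rows are copied from the table.

```python
# Rows copied from Table 9: (protein-coding genes, counts per family column).
# Assumption: a dash in the table is entered as 0.
FAMILIES = ("PDH", "ADH", "CADH & Y-ADH", "QOR & NRBP", "LTD", "ER")
TABLE9 = {
    "Archaeoglobus fulgidus": (2407, (1, 0, 0, 0, 0, 0)),
    "Escherichia coli": (4289, (11, 2, 4, 2, 1, 0)),
    "Mycobacterium tuberculosis": (3918, (3, 4, 2, 5, 0, 10)),
    "Saccharomyces cerevisiae": (6297, (5, 1, 6, 8, 1, 0)),
    "Arabidopsis thaliana": (27707, (1, 4, 9, 5, 5, 0)),
}


def mdr_summary(table):
    """Return {organism: (total MDR members, MDR members per 1000 genes)}."""
    summary = {}
    for organism, (genes, counts) in table.items():
        total = sum(counts)
        summary[organism] = (total, 1000 * total / genes)
    return summary


if __name__ == "__main__":
    for organism, (total, per_thousand) in mdr_summary(TABLE9).items():
        print(f"{organism}: {total} MDRs, {per_thousand:.1f} per 1000 genes")
```

Normalizing by genome size is one simple way to compare organisms with very different gene counts; it is offered here as an illustration, not as an analysis performed in the paper.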

The data presented in our manuscript enlarge perspectives on protein evolution, because in addition to the previously mentioned mechanism of enzyme evolution, we showed that preexisting enzymes can be recruited to form novel pathways in which proteins acquire new activities by changing both their binding capacity and their chemistry of catalysis. This last possibility is in concordance with a novel third hypothesis, recently proposed by Gerlt & Babbitt [47], which does not require conservation of either substrate specificity or chemical mechanisms; instead, they proposed that an active site is able to support an alternate reaction that may use some functional groups of the active site in a different mechanistic and metabolic context; in this proposal, only active site architecture is conserved. We discuss below one interesting example to support this third hypothesis. A divergent plant ADH with an acetone cyanohydrin lyase activity (P93243) has been described in flax (Linum usitatissimum) [90–93]. This protein belongs to a novel class of hydroxynitrile lyases (HNLs), and its amino acid sequence shows no overall homology to any cloned HNLs. Indeed, HNLs from plants form a heterogeneous group of proteins differing in molecular mass, quaternary structure, presence or absence of flavin adenine dinucleotide, as well as glycosylation. They have convergently evolved from FAD-dependent oxidoreductases, α/β hydrolases, and MDRs [94]. Interestingly, HNL from flax is a zinc-containing protein and conserves all amino acid residues important for structural integrity or coordinating zinc [91,92]; however, flax HNL neither displays ADH activity nor is inhibited by reagents interfering with zinc coordination [91]. This information, together with the fact that flax HNL is more related to plant-, animal- and class III ADH [93], suggests that flax HNL evolved late from a plant-/class III ADH, which was recruited for cyanogenesis in plants, a recent secondary pathway used as a defence mechanism against herbivores [95]. Existence of multiple phylogenetically independent HNLs in plants supports this proposal. Therefore, this novel activity within the MDR superfamily was acquired without conservation of the original binding capacity and the chemistry of catalysis. In conclusion, proteins exhibit a huge unrecognized plasticity.
Another alternative mechanism for enzyme evolution, also observed in members of the MDR superfamily, corresponds to modular construction or gene fusion, in which separate gene products join together and generate new genes containing two or more domains with novel activities [75,96]. Examples of this modular construction within the MDR superfamily are as follows: bi-domain oxidoreductase (BDOR) involved in biosynthesis of exopolysaccharides [97]; bifunctional QOR in plants, with an N-terminal domain related to short-chain dehydrogenase/reductase superfamily [98,99]; fatty acid synthase (FAS), a multifunctional polypeptide with seven enzymatic domains from animals [100] or alveolata (protozoa) [101]; modular polyketide synthase from bacteria [100], and the iterative polyketide synthase from fungi [102,103]. All of them possess modular architecture. In this sense, it is important to mention that oligomerization is not conserved among members of the MDR superfamily. For example, monomers, homodimers, homotrimers, homotetramers and heterodimers are present in this superfamily, and it has been proposed that the degree of oligomerization might be involved in changes in the functional role developed by proteins [75,96].
Taken together, we conclude that the deep-rooted statements 'one enzyme, one function' and 'one protein family, one function' are not accurate for many enzymes. Several secondary activities might exist in one protein, as in the previously mentioned animal ADH or yeast ADH subfamilies (see the first topic in the Discussion section), and this can be the point of departure to gain novel and completely different functions. Indeed, we point out the fact that two different and equally efficient catalytic activities can be a feature of a single protein, as described for the LTD/PGR subfamily. This catalytic promiscuity has been recognized as a vital springboard from which new catalytic activities can emerge from existing folds and active sites [70,104].
Data presented in this paper reinforce the idea that a protein can gain or lose a function through a limited number of amino acid changes, and several such examples from natural protein evolution are shown. MDR belongs to the limited number of protein superfamilies that possess both different mechanisms of reaction and substrate specificity [47,75]. Indeed, several laboratories [45,88,105] have mimicked the evolution of paralog proteins in vitro, showing generation of new catalytic or binding properties by modifications of a preexisting protein scaffold, yet forget that evolution has carried out many such successful experiments.

Proteinogenesis vs. enzymogenesis. Several subfamilies within the MDR superfamily evolved as nonenzyme homologs, i.e. novel proteins that have lost their original catalytic activity. ζ-Crystallin/QOR is probably the most well-investigated example. This protein is expressed in a taxon-specific fashion in the lens of the phylogenetically distant guinea pig, camel, and Japanese tree frog (Hyla japonica) [106–109], and constitutes approximately 10% of total water-soluble proteins of the lens. Other examples of nonenzymes within the MDR superfamily are: (a) NRBP1 that functions as a transcription factor in yeast [66,67], or nuclear receptor co-operator in animals [65]; (b) dinoflagellate nuclear-associated protein (DINAP) that corresponds to the quantitatively major nuclear protein in Crypthecodinium cohnii, and although DINAP did not bind directly to DNA, it activated basal transcription activity [110,111]; and (c) the membrane traffic protein (AST) in fungi [112].
On the other hand, subcellular location is not conserved across members of the MDR superfamily. Although the great majority are soluble cytoplasmic proteins, some of them are located in mitochondria (yeast ADH), and nuclei (DINAP; NRBP1; class III ADH in animals), and others have a membrane location (VAT-1, and probably BDOR), or function as a structural protein (ζ-crystallin/QOR).
All these examples serve as a cogent reminder that Nature is not restricted to chemically or substrate-conserved strategies for divergent evolution; instead, divergent evolution is opportunistic and one active site architecture can be used to develop mechanistically distinct catalytic [47] or noncatalytic functions. In other words, inside one protein superfamily (e.g. MDR), functional diversity is more complex than sequence diversity.

Eukaryota inherited MDR from bacteria

Our analysis of the MDR superfamily shows that most MDR subfamilies in eukaryota are more closely related to their counterparts in bacteria than in archaea. This supports the idea that in eukaryota, although the machinery for DNA duplication, transcription, and protein synthesis is more related to archaea (informational genes), the enzymatic machinery is more related to bacteria (operational genes) [113]. This agrees with the generally accepted notion that eukaryotic cells are the symbiotic result of bacteria (the symbiont) and archaea (the host). Therefore, horizontal gene transfer of operational genes had a significant role in development of metabolic pathways in eukaryotes. In bacterial taxa, phylogenetic relationships that can be established within each protein subfamily suggest a significant horizontal gene transfer. In fact, it is calculated that nearly 20% of Escherichia coli genes were acquired by lateral transfer events in the last 100 million years [114]. This contrasts with the nearly complete absence of recent examples of horizontal gene transfer between species that belong to different domains of life (eukaryota, bacteria, and archaea) in MDRs. Thus, although horizontal gene transfer among bacterial taxa appears to be a recurrent event, horizontal gene transfer between bacteria and eukaryota or between bacteria and archaea is a rare event (at least in MDRs). Only two clear-cut examples were identified: the first corresponds to the previously reported horizontal gene transfer of a secondary ADH from anaerobic bacteria to the protist Entamoeba histolytica [115], and the second, not previously reported, corresponds to horizontal gene transfer of an LTD/AADH-related protein from firmicutes (Gram-positive bacteria) to the archaeon Halobacterium sp. NRC-1 (NCBI accession no. AAG19273). This latter example is shown in Fig. 2, where the LTD/AADH subfamily contains some bacterial sequences that are more related to the archaeal sequence (coloured in dark blue) than to other bacterial sequences within the same subfamily, giving a phylogenetically discordant pattern whose distribution is compatible with horizontal gene transfer. Furthermore, this archaeal sequence is the only sequence whose branch departs far from the centre of the unrooted tree (see Fig. 2).

Is there an MDR ancestral activity?

A preliminary answer to this question can be approached from several directions, but it is clear that ancestral activity (within a protein subfamily) should be related to a primary (also ancient) metabolic pathway with (an ideally) broad phylogenetic distribution. Thus, protein subfamilies with restricted phylogenetic distribution involved in secondary metabolic pathways cannot be considered as ancestral subfamilies.
Glutathione-dependent formaldehyde dehydrogenase activity of class III ADH in ADH family (COG1062). This has been proposed as the ancient activity from which both animal and plant ADHs are derived [116]. However, this activity cannot be the ancestral function for the remaining subfamilies within the MDR superfamily, as shown by several pieces of evidence. First, glutathione (GSH) does not show the universal distribution observed for MDRs, inasmuch as GSH is restricted to proteobacteria, cyanobacteria, and eukaryotes [117,118]. Second, in organisms in which the mycothiol (MSH) molecule fulfils the functions of GSH, as in firmicutes, formaldehyde dehydrogenase activity exists in any event, but now as a mycothiol-dependent activity. Third, a cofactor-independent formaldehyde dehydrogenase subfamily (FADH) exists, present in proteobacteria (with GSH), firmicutes (with MSH), and archaea (without GSH or MSH). Overall, data suggest that formaldehyde dehydrogenase activity in MDRs is very ancient and predates the origin of GSH or MSH. This is reasonable if we consider that formaldehyde reacts spontaneously with GSH or MSH to form S-hydroxymethyl-glutathione or S-hydroxymethyl-mycothiol, the true substrates for glutathione-dependent formaldehyde dehydrogenase (class III ADH) or mycothiol-dependent formaldehyde dehydrogenase, respectively. Furthermore, the FADH subfamily also shows formaldehyde dismutase activity and the capacity to catalyse a dismutation reaction has been conserved in animal ADH, a subfamily derived from class III ADH. Consequently, it is probable that ADH family (COG1062), absent in archaea, forms a paralogous group derived from the FADH subfamily, which in turn exhibits a wider distribution than ADH family (COG1062).
Another interesting option for ancestral activity within the MDR superfamily is ER; it is necessary in one of the primary (and ancient?) anabolic pathways, i.e. synthesis of

fatty acids. However, little evidence supports this proposal. First, archaea contain membranes with isoprenoid-based ether lipids, lacking fatty acids. Furthermore, gene(s) for fatty acid synthase complex (FAS), as occurs in both bacteria and eukaryotes, is (are) absent in Methanococcus jannaschii [119], as well as in other completely sequenced archaea genomes such as Aeropyrum pernix K1, Archaeoglobus fulgidus, Methanobacterium thermoautotrophicum, Pyrococcus abyssi, and P. horikoshii (in agreement with our BLAST results). Thus, although archaea possess some members of the MDR superfamily, ER activity probably cannot be the ancestral activity of this superfamily because archaea lack known FAS, as well as medium-chain ER. Second, different types of FASs exist and each possesses different and unrelated ER. Thus, the ER member of the MDR superfamily is one of seven activities that comprise type I multifunctional fatty acid synthase in animals [100]. ER present in type II fatty acid synthase characteristic of bacteria and plants belongs to the short-chain dehydrogenase/reductase (SDR) superfamily, not to MDR as occurs in type I animal fatty acid synthase and some bacterial polyketide synthases. Additionally, ER present in fungi (type I fatty acid synthase α6β6 complex) does not show significant homology either to medium-chain ER or to short-chain ER (calculated with BLAST), suggesting the existence of a third class of ER. Indeed, the finding that multifunctional FAS protein exists in two distinct architectural forms, the α2 animal FAS and the α6β6 yeast FAS, with protein domains arranged in a different order, is compatible with the idea that FAS complexes evolved independently several times and that they are a late acquisition in metabolic evolution of organisms, subsequent to the split of major kingdoms. Thus, both arguments strengthen the idea that ER is not an ancestral activity of the MDR superfamily. Furthermore, extensive similarity between each domain in FAS and polyketide synthase (PKS), the presence of medium-chain ER, and the order in which the domains are arranged in these multifunctional complexes [100] suggest that animal FAS is more closely related to PKS than to any other FAS from fungi, plants, or bacteria. In conclusion, there is no one member in ER family (COG3321) that can be considered as an ancestral group.
According to heterotrophic theory, the only theory with experimental support to substantiate the origin of the first metabolic pathways [120], the most ancient catabolic activities should be semienzymatic fermentative routes fed by stable and available prebiotic compounds. Thus, glycolysis, proposed as the first catabolic route [121], should have been preceded by simpler versions. The upper part of glycolysis, from hexoses to trioses, appeared as a late adaptation because glucose 6-phosphate and aldopentoses are unlikely prebiotic compounds due to rapid decomposition on a geological timescale [122]. Additionally, the step from glucose to glyceraldehyde 3-phosphate is not a universal pathway; it is absent in archaea, while there are other alternatives to transform glucose into triose derivatives [123–125]. On the other hand, the lower part of glycolysis, from glyceraldehyde 3-phosphate to pyruvate, is universally conserved, and glyceraldehyde is one of the most attractive intermediates as an energy source for primitive organisms provided with nascent glycolysis. Some advantages of glyceraldehyde are: (a) it can be produced from formaldehyde under plausible prebiotic conditions [126–128]; (b) through glycolysis, it is an energy source for living purposes; (c) it is an important metabolite in photosynthesis; (d) it can be used in prebiotic condensation reactions [129,130]; and (e) it is a source of glycerol, necessary for synthesis of glycerolipids, the precursors of biomembranes.
Furthermore, results of Fukuchi & Otsuka [131] suggest that the glycolytic stage from glyceraldehyde 3-phosphate to pyruvate corresponds to one of the most ancient catabolic pathways, because genes involved in this stage of glycolysis exhibit the highest similarity to nucleotide sequences of ribosomal RNA and/or transfer RNA gene clusters, clearly predating the origin of protein enzymes in the ancient RNA world and strongly suggesting that these metabolic pathways were developed by chance assembly of enzyme proteins generated from pre-existing genes. If this is true, it is clear that fermentative activity should be an early metabolic development to sustain activity of the ancient stage of the glycolytic pathway by disposing of the NAD(P)H generated. Alcoholic fermentation has been suggested as an early pathway, considering that ethanol permeates the membrane and is easily eliminated by the cell. Lactic acid fermentation should be a later development, in that lactate is a nonpenetrant product, hence retained inside the cell to be utilized to regenerate carbohydrates when autotrophic pathways became available [132]. Therefore, one ancestral activity of the MDR superfamily is probably related to an ancient alcoholic fermentative activity, such as that currently observed in some subfamilies like broad ADH (from the Y-ADH family), present in eukaryota, bacteria, and archaea [133,134]; these enzymes catalyse oxidation of a broad variety of substrates, which includes primary and secondary, linear- and branched-chain, aliphatic and aromatic alcohols, in addition to several of their corresponding aldehydes and ketones. Moreover, theoretical studies predict that primordial enzymes were nonspecific, with broad substrate specificity, and showing different activities characterized by slow reaction rates [120,135]. Indeed, some MDRs fulfil all these requirements (e.g. broad ADH subfamily [133,136,137], or animal ADH subfamily [15,138]).
Finally, we cannot disregard other activities, such as threonine dehydrogenase (TDH) or crotonyl CoA-reductase (CCAR), present both in archaea and bacteria. These activities are also probably ancient. TDH is involved in amino acid metabolism, and CCAR in benzoate catabolism, acetate assimilation, and interestingly, in the supply of precursors for polyketide biosynthesis [139]. In animals, TDH initiates a minor degradative pathway [140], and the enzyme does not belong to the MDR superfamily. It is a small subfamily whose distribution is restricted to animals, and was recruited from the short-chain dehydrogenase/reductase superfamily (bacterial UDP-glucose 4-epimerase, according to our BLAST analysis). On the other hand, the supply of precursors for fatty acid synthesis in bacteria and eukaryota is provided by acetyl-CoA carboxylase, an ancient enzyme also present in archaea. This suggests that the origin of acetyl-CoA carboxylase predates that of fatty acid synthesis, because fatty acids are absent in archaea. Apparently, the role of acetyl-CoA carboxylase in the supply of precursors for fatty acid synthesis is a later recruitment in the evolution of this enzyme. Thus, TDH and

CCAR probably belong to ancient metabolic pathways subsequently substituted by other metabolic pathways.

Taxonomy within the MDR superfamily

Use of the complete set of known MDR proteins, together with criteria and procedures described under the Results section, has allowed us to identify within the MDR superfamily 49 subfamilies, and two additional taxonomic levels containing eight families and three macrofamilies. Of these three taxonomic levels, only the subfamily level, as defined by us, comprises a natural unit that can be used to sort protein members of a protein superfamily with clear-cut rules. Thus, each subfamily encloses a set of ideally orthologous proteins that perform the same function, and delineates a closed group (see Results).
Two specific examples of subfamilies containing highly related paralogous rather than orthologous proteins are the animal ADH and plant ADH subfamilies. Both subfamilies originated by successive gene duplications from an ancient class III ADH. Animal ADH evolved only in vertebrates and plant ADH, only in tracheophytes. Within the former subfamily, fishes possess one animal ADH, while amphibia, reptiles, and birds appear to have at least two enzymes and mammals, up to six. It seems that animal ADH enzymogenesis developed in parallel to vertebrate evolution. Animal ADHs conserved the same mechanism of reaction, and share the same substrates; their main differences occur in their pattern of expression. Today, the functional roles developed by the different animal ADHs overlap, and this functional redundancy allows the individual to tolerate mutational or environmental perturbations [141]. Absence of one ADH can be overcome by the existence of other members of the animal ADH subfamily [142]. This partial functional redundancy contributes to a more general phenomenon designated canalization, which is the genetic capacity to buffer developmental pathways against deleterious perturbations [141]; similar advantages can be described in plant ADHs. Therefore, these singular subfamilies comprise clusters of highly related paralogous proteins that share functional roles.
A protein family, as discussed previously, must comprise a cluster of monophyletic subfamilies, i.e. highly related paralogous proteins, that all derive from a common ancestor. They possess significant sequence identity and/or similarity, and may or may not share common substrates or mechanisms of reaction.
In contrast, a protein macrofamily within MDR comprises a cluster of related protein families with broad phylogenetic distribution, i.e. with protein members from the three domains of life, and that originate from a common ancestor (monophyletic). Furthermore, within each macrofamily at least one subfamily possesses a physiological role related to primary metabolic pathways (with a probable ancient origin). Thus, the advantage of clustering protein families into macrofamilies lies in the fact that not all families are equally related, and this is probably due to the fact that some protein families are more ancient than others. Indeed, within each MDR macrofamily, there is a probable ancestral group (see the previous section), that might be tracked to the last universal common ancestor. If the latter is true, the number of macrofamilies within the MDR superfamily reflects the original number of MDR proteins that existed in the last universal common ancestor. It is important to mention that Castresana [143], after analysing the phylogenetic distribution and evolution of bioenergetic pathways, concluded that the last universal common ancestor contained several members of each gene family. This agrees with the idea that the last universal common ancestor was a metabolically sophisticated organism.
Finally, it is interesting to point out that in comparison with the other taxonomic categories, the superfamily concept is not the focus of extensive discussion and there is a near consensus agreement that in addition to sequence similarities, and a common evolutionary origin, 3D structure data should be taken into consideration. Thus, a superfamily can be considered as groups of homologous protein families (and/or macrofamilies) with a monophyletic origin, that share, at least, a barely detectable sequence similarity, but showing similar 3D structure [144,145].
Inclusion of phylogenetic criteria to define subfamilies, families, macrofamilies, and superfamilies is in line with the present tendency to construct a natural taxonomy of proteins and protein families. Figure 6 illustrates the relationships among the different taxonomic categories defined in this work.

Fig. 6. Schematic display showing the main relationships among the different taxonomic categories inside a protein superfamily. Although the definition of homology has remained elusive and is the subject of intense debates [146], in this work, the concept of homologous proteins essentially refers to proteins derived from a common ancestor (phylogenetic homology). Therefore, all the taxonomic ranks comprise monophyletic groups. Identification of protein subfamilies as non-overlapping clusters (closed groups) is advantageous over distance-based clustering methods because it is not necessary to set an arbitrary identity cutoff value, and permits the identification of both highly and poorly conserved groups of orthologous proteins. Because of the huge protein plasticity, families cannot be defined by taking the function as a criterion, as only inside subfamilies (orthologous groups) is the function conserved. Macrofamilies represent probable ancestral groups that might be tracked to the last universal common ancestor; in addition, they show a wide phylogenetic range, with protein members in archaea, bacteria and eukarya.
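
The idea of subfamilies as non-overlapping closed groups, obtained without an identity cutoff, can be illustrated with reciprocal best hits. The sketch below is one plausible reading and not the authors' exact procedure: it assumes that an all-vs.-all search has already produced a score for each (query, subject) pair, links two proteins from different species when each is the other's best hit in that species, and reports the connected components as candidate groups. The function names, protein names, species labels, and scores are all hypothetical.

```python
from collections import defaultdict


def best_hits(scores, species):
    """For each query, keep its top-scoring hit in every other species."""
    best = {}  # (query, target species) -> (score, subject)
    for (query, subject), score in scores.items():
        if species[query] == species[subject]:
            continue  # only cross-species hits are considered here
        key = (query, species[subject])
        if key not in best or score > best[key][0]:
            best[key] = (score, subject)
    return {key: subject for key, (_, subject) in best.items()}


def reciprocal_best_hit_groups(scores, species):
    """Group proteins connected by reciprocal best hits (union-find)."""
    best = best_hits(scores, species)
    parent = {protein: protein for protein in species}

    def find(protein):
        while parent[protein] != protein:
            parent[protein] = parent[parent[protein]]  # path halving
            protein = parent[protein]
        return protein

    for (query, _), subject in best.items():
        # Reciprocal: the subject's best hit in the query's species is the query.
        if best.get((subject, species[query])) == query:
            parent[find(query)] = find(subject)

    groups = defaultdict(set)
    for protein in species:
        groups[find(protein)].add(protein)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Hypothetical toy data: two subfamilies spread over three species.
species = {"adh_1": "sp1", "adh_2": "sp2", "adh_3": "sp3",
           "qor_1": "sp1", "qor_2": "sp2"}
pair_scores = {("adh_1", "adh_2"): 400, ("adh_1", "adh_3"): 380,
               ("adh_2", "adh_3"): 390, ("qor_1", "qor_2"): 300,
               ("adh_1", "qor_2"): 60, ("adh_2", "qor_1"): 55,
               ("adh_3", "qor_1"): 50, ("adh_3", "qor_2"): 52}
scores = {}
for (a, b), s in pair_scores.items():
    scores[(a, b)] = s  # made-up scores, mirrored so both directions exist
    scores[(b, a)] = s

groups = reciprocal_best_hit_groups(scores, species)
```

In this toy case the weak cross-family hits never become reciprocal, so the two families separate without any score threshold being chosen, which is the property the text emphasizes. Within-species paralogs are deliberately ignored in this sketch; handling them would need an additional rule.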

Final consideration 7. Persson, B., Zigler, J.S.J. & Jornvall, H. (1994) A super-family
of medium-chain dehydrogenases/reductases (MDR). Sub-lines
After development of MDR molecular taxonomy, we including zeta-crystallin, alcohol and polyol dehydrogenases,
propose application of the methodology employed in this
paper to other protein superfamilies for several reasons.
First, use of the BLASTP program in an iterative manner
allows identification of all known members of any protein
superfamily. Second, use of all-vs.-all BLAST-based searches
within one protein superfamily, together with extensive
database mining, allows the members of any protein
superfamily to be sorted into subfamilies, i.e. closed groups
of orthologous proteins with BLASTP reciprocal best hits.
This procedure provides an advantage over classical methods
for ortholog detection because it permits use of all available
protein sequence members of one superfamily, bypassing global
multiple alignments and construction of phylogenetic trees,
which can contain slow and error-prone steps. Thus, with this
faster and clear-cut procedure one can benefit from all the
available information without having to select representative
proteins and/or genomes. In addition, the different taxonomic
categories proposed in this work (subfamily, family and
macrofamily) can be applied to other protein superfamilies,
once formal definitions for each taxonomic rank are provided.

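The grouping of proteins into closed sets by reciprocal best hits, as described above, can be illustrated with a minimal sketch. It is not the authors' actual implementation. It assumes that all-vs.-all BLASTP results are already available as (query, subject, bit score) triples, that the best hit is taken per target species, and that a subfamily corresponds to a connected component of reciprocal-best-hit pairs. The protein names, species labels and scores are hypothetical toy values.

```python
from collections import defaultdict

# Toy all-vs.-all hit table: (query, subject, bit score). Hypothetical values.
HITS = [
    ("human_A", "yeast_A", 400), ("yeast_A", "human_A", 400),
    ("human_A", "ecoli_A", 380), ("ecoli_A", "human_A", 380),
    ("yeast_A", "ecoli_A", 370), ("ecoli_A", "yeast_A", 370),
    ("human_B", "yeast_B", 300), ("yeast_B", "human_B", 300),
    ("human_B", "yeast_A", 90), ("yeast_A", "human_B", 90),
    ("human_A", "yeast_B", 85), ("yeast_B", "human_A", 85),
]
SPECIES = {
    "human_A": "human", "human_B": "human",
    "yeast_A": "yeast", "yeast_B": "yeast",
    "ecoli_A": "ecoli",
}


def reciprocal_best_hits(hits, species):
    """Return the set of protein pairs that are each other's best hit.

    Assumption: the best hit of a query is the highest-scoring subject
    within each *other* species; same-species hits are ignored.
    """
    best = {}  # (query, subject species) -> (score, subject)
    for query, subject, score in hits:
        if query == subject or species[query] == species[subject]:
            continue
        key = (query, species[subject])
        if key not in best or score > best[key][0]:
            best[key] = (score, subject)

    pairs = set()
    for (query, _), (_, subject) in best.items():
        back = best.get((subject, species[query]))
        if back is not None and back[1] == query:
            pairs.add(frozenset((query, subject)))
    return pairs


def group_subfamilies(pairs):
    """Merge reciprocal-best-hit pairs into groups (connected components)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for pair in pairs:
        a, b = tuple(pair)
        parent[find(a)] = find(b)

    groups = defaultdict(set)
    for protein in list(parent):
        groups[find(protein)].add(protein)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    for group in group_subfamilies(reciprocal_best_hits(HITS, SPECIES)):
        print(group)
```

In this toy data set the weak cross hits between the "A" and "B" proteins are never reciprocal best hits, so the two groups stay separate without any multiple alignment or tree building.
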
Acknowledgements

We thank R.N. Ondarza (Instituto Nacional de Salud Pública,
México), H. Weiner (Purdue University, USA), S. Bentley (Sanger
Institute, UK), R.F. Doolittle (University of California, La Jolla,
USA), A. Steinbüchel (Wilhelms-Universität, Münster, Germany),
M. Pharr (North Carolina State University, USA), A. Gómez-Puyou
and M. Tuena de Gómez-Puyou (Instituto de Fisiología Celular-
UNAM, México), K. Yazaki (Kyoto University, Japan), A. Sosa-
Peinado (Facultad de Medicina-UNAM, México), X. Parés,
J. Farrés, J.A. Biosca and their collaborators (Universitat
Autònoma de Barcelona, Spain), and three anonymous referees for
helpful critical review of this manuscript and/or discussions.
This work was supported by grants 34823-M from CONACyT, México,
and IN214101 from DGAPA-UNAM, México. H.R.R. has been
supported by a graduate fellowship from DGEP-UNAM and
CONACyT, México.

References Bioinformatics 17, 1244–1245.
21. Pennisi, E. (1999) Keeping genome databases clean and up to
1. Reid, M.F. & Fewson, C.A. (1994) Molecular characterization date. Science 286, 447–450.
of microbial alcohol dehydrogenases. Crit. Rev. Microbiol. 20, 22. Chen, R. & Jeong, S.S. (2000) Functional prediction: identifica-
13–56. tion of protein orthologs and paralogs. Protein Sci. 9, 2344–2353.
2. Conway, T. & Ingram, L.O. (1989) Similarity of Escherichia coli 23. Altschul, S.F. & Koonin, E.V. (1998) Iterated profile searches
propanediol oxidoreductase (fucO product) and an unusual with PSI-BLAST – a tool for discovery in protein databases.
alcohol dehydrogenase from Zymomonas mobilis and Saccharo- Trends Biochem. Sci. 23, 444–447.
myces cerevisiae. J. Bacteriol. 171, 3754–3759. 24. Bateman, A., Birney, E., Cerruti, L., Durbin, R., Etwiller, L.,
3. Scopes, R.K. (1983) An iron-activated alcohol dehydrogenase. Eddy, S.R., Griffiths-Jones, S., Howe, K.L., Marshall, M. &
FEBS Lett. 156, 303–306. Sonnhammer, E.L. (2002) The Pfam protein families database.
4. Williamson, V.M. & Paquin, C.E. (1987) Homology of Sac- Nucleic Acids Res. 30, 276–280.
charomyces cerevisiae ADH4 to an iron-activated alcohol dehy- 25. Falquet, L., Pagni, M., Bucher, P., Hulo, N., Sigrist, C.J.,
drogenase from Zymomonas mobilis. Mol. General Genet. 209, Hofmann, K. & Bairoch, A. (2002) The PROSITE database, its
374–381. status in 2002. Nucleic Acids Res. 30, 235–238.
5. Krozowski, Z. (1994) The short-chain alcohol dehydrogenase 26. Barker, W.C., Garavelli, J.S., Hou, Z., Huang, H., Ledley, R.S.,
superfamily: variations on a common theme. J. Steroid Biochem. McGarvey, P.B., Mewes, H.W., Orcutt, B.C., Pfeiffer, F., Tsugita,
Mol. Biol. 51, 125–130. A., Vinayaka, C.R., Xiao, C., Yeh, L.S. & Wu, C. (2001) Protein
6. Persson, B., Krook, M. & Jornvall, H. (1991) Characteristics of Information Resource: a community resource for expert anno-
short-chain alcohol dehydrogenases and related enzymes. Eur. J. tation of protein data. Nucleic Acids Res. 29, 29–32.
Biochem. 200, 537–543.
Ó FEBS 2003 MDR superfamily (Eur. J. Biochem. 270) 3331

27. Wu, C.H., Huang, H., Arminski, L., Castro-Alvear, J., Chen, Y., 44. Sonnhammer, E.L.L. & Koonin, E.V. (2002) Orthology, para-
Hu, Z.Z., Ledley, R.S., Lewis, K.C., Mewes, H.W., Orcutt, B.C., logy and proposed classification for paralog subtypes. Trends
Suzek, B.E., Tsugita, A., Vinayaka, C.R., Yeh, L.S., Zhang, J. & Genet. 18, 619–620.
Barker, W.C. (2002) The Protein Information Resource: an 45. Altamirano, M.M., Blackburn, J.M., Aguayo, C. & Fersht, A.R.
integrated public resource of functional annotation of proteins. (2000) Directed evolution of new catalytic activity using the
Nucleic Acids Res. 30, 35–37. alpha/beta-barrel scaffold. Nature 403, 617–622.
28. Orengo, C.A., Michie, A.D., Jones, S., Jones, D.T., Swindells, 46. Gerlt, J.A. & Babbit, P.C. (2000) Can sequence determine
M.B. & Thornton, J.M. (1997) CATH – a hierarchic classification function? Genome Biol. 1, REVIEWS0005.1–000.10.
of protein domain structures. Structure 5, 1093–1108. 47. Gerlt, J.A. & Babbit, P.C. (2001) Divergent evolution of
29. Orengo, C.A., Bray, J.E., Buchan, D.W.A., Harrison, A., Lee, enzymatic function: mechanistically diverse superfamilies and
D., Pearl, F.M.G., Sillitoe, I., Todd, A.E. & Thornton, J.M. functionally distinct suprafamilies. Annu. Rev. Biochem. 70,
(2002) The CATH protein family database: a resource for 209–246.
structural and functional annotation of genomes. Proteomics 2, 48. Iborra, F.J., Renau-Piqueras, J., Portoles, M., Boleda, M.D.,
11–21. Guerri, C. & Pares, X. (1992) Immunocytochemical and bio-
30. Tatusov, R.L., Koonin, E.V. & Lipman, D.J. (1997) A genomic chemical demonstration of formaldehyde dehydrogenase (class
perspective on protein families. Science 278, 631–637. III alcohol dehydrogenase) in the nucleus. J. Histochem.
31. Tatusov, R.L., Galperin, M.Y., Natale, D.A. & Koonin, E.V. Cytochem. 40, 1865–1878.
(2000) The COG database: a tool for genome-scale analysis of 49. Peralba, J.M., Cederlund, E., Crosas, B., Moreno, A., Julia, P.,
protein functions and evolution. Nucleic Acids Res. 28, 33–36. Martinez, S.E., Persson, B., Farres, J., Pares, X. & Jornvall, H.
32. Tatusov, R.L., Natale, D.A., Garkavtsev, I.V., Tatusova, T.A., (1999) Structural and enzymatic properties of a gastric
Shankavaram, U.T., Rao, B.S., Kiryutin, B., Galperin, M.Y., NADP(H)-dependent and retinal-active alcohol dehydrogenase.
Fedorova, N.D. & Koonin, E.V. (2001) The COG database: new J. Biol. Chem. 274, 26021–26026.
developments in phylogenetic classification of proteins from 50. Valencia, E., Rosell, A., Larroy, C., Farres, J., Biosca, J.A., Fita,
complete genomes. Nucleic Acids Res. 29, 22–28. I., Pares, X. & Ochoa, W.F. (2003) Crystallization and
33. Mushegian, A.R., Garey, J.R., Martin, J. & Liu, L.X. (1998) preliminary X-ray analysis of NADP(H)-dependent alcohol
Large-scale taxonomic profiling of eukaryotic model organisms: dehydrogenases from Saccharomyces cerevisiae and Rana perezi.
a comparison of orthologous proteins encoded by the human, fly, Acta Crystallogr. D-Biol. Cryst. 59, 334–337.
nematode, and yeast genomes. Genome Res. 8, 590–598. 51. Eggeling, L. & Sahm, H. (1985) The formaldehyde dehy-
34. Chervitz, S.A., Aravind, L., Sherlock, G., Ball, C.A., Koonin, drogenase of Rhodococcus erythropolis, a trimeric enzyme
E.V., Dwight, S.S., Harris, M.A., Dolinski, K., Mohr, S., Smith, requiring a cofactor and active with alcohols. Eur. J. Biochem.
T., Weng, S., Cherry, J.M. & Botstein, D. (1998) Comparison of 150, 129–134.
the complete protein sets of worm and yeast: orthology and 52. Norin, A., Van Ophem, P.W., Piersma, S.R., Persson, B., Duine,
divergence. Science 282, 2022–2028. J.A. & Jornvall, H. (1997) Mycothiol-dependent formaldehyde
35. Wheelan, S.J., Boguski, M.S., Duret, L. & Makalowski, W. dehydrogenase, a prokaryotic medium-chain dehydrogenase/
(1999) Human and nematode orthologs – lessons from the ana- reductase, phylogenetically links different eukaroytic alcohol de-
lysis of 1800 human genes and the proteome of Caenorhabditis hydrogenases – primary structure, conformational modelling and
elegans. Gene 238, 163–170. functional correlations. Eur. J. Biochem. 248, 282–289.
36. Rubin, G.M., Yandell, M.D., Wortman, J.R., Gabor, M.G., 53. Van Ophem, P.W., Van Beeumen, J. & Duine, J.A. (1992)
Nelson, C.R., Hariharan, I.K., Fortini, M.E., Li, P.W., Apweiler, NAD-linked, factor-dependent formaldehyde dehydrogenase
R., Fleischmann, W., Cherry, J.M., Henikoff, S., Skupski, M.P., or trimeric, zinc-containing, long-chain alcohol dehydrogenase
Misra, S., Ashburner, M., Birney, E., Boguski, M.S., Brody, T., from Amycolatopsis methanolica. Eur. J. Biochem. 206,
Brokstein, P., Celniker, S.E., Chervitz, S.A., Coates, D., 511–518.
Cravchik, A., Gabrielian, A., Galle, R.F., Gelbart, W.M., 54. Duester, G., Farres, J., Felder, M.R., Holmes, R.S., Hoog, J.O.,
George, R.A., Goldstein, L.S., Gong, F., Guan, P., Harris, N.L., Pares, X., Plapp, B.V., Yin, S.J. & Jornvall, H. (1999)
Hay, B.A., Hoskins, R.A., Li, J., Li, Z., Hynes, R.O., Jones, S.J., Recommended nomenclature for the vertebrate alcohol
Kuehl, P.M., Lemaitre, B., Littleton, J.T., Morrison, D.K., dehydrogenase gene family. Biochem. Pharmacol. 58, 389–395.
Mungall, C., O’Farrell, P.H., Pickeral, O.K., Shue, C., Vosshall, 55. Tadege, M., Dupuis, I. & Kuhlemeier, C. (1999) Ethanolic fer-
L.B., Zhang, J., Zhao, Q., Zheng, X.H. & Lewis, S. (2000) mentation: new functions for an old pathway. Trends Plant Sci. 4,
Comparative genomics of the eukaryotes. Science 287, 2204– 320–325.
2215. 56. Larroy, C., Pares, X. & Biosca, J.A. (2002) Characterization of a
37. Remm, M., Storm, C.E. & Sonnhammer, E.L. (2001) Automatic Saccharomyces cerevisiae NADP(H)-dependent alcohol dehy-
clustering of orthologs and in-paralogs from pairwise species drogenase (ADHVII), a member of the cinnamyl alcohol dehy-
comparisons. J. Mol. Biol. 314, 1041–1052. drogenase family. Eur. J. Biochem. 269, 5738–5745.
38. Krause, A. & Vingron, M. (1998) A set-theoretic approach to 57. Larroy, C., Fernandez, M.R., Gonzalez, E., Pares, X. & Biosca,
database searching and clustering. Bioinformatics 14, 430–438. J.A. (2002) Characterization of the Saccharomyces cerevisiae
39. Krause, A., Stoye, J. & Vingron, M. (2000) The SYSTERS YMR318C (ADH6) gene product as a broad specificity
protein sequence cluster set. Nucleic Acids Res. 28, 270–272. NADPH-dependent alcohol dehydrogenase: relevance in alde-
40. Krause, A., Haas, S.A., Coward, E. & Vingron, M. (2002) hyde reduction. Biochem. J. 361, 163–172.
SYSTERS, GeneNest, SpliceNest: exploring sequence space from 58. Logemann, E., Reinold, S., Somssich, I.E. & Hahlbrock, K.
genome to protein. Nucleic Acids Res. 30, 299–300. (1997) A novel type of pathogen defense-related cinnamyl alcohol
41. Li, W.-H. (1997) Molecular Evolution. Sinauer Associates, dehydrogenase. Biol. Chem. 378, 909–913.
Sunderland, MA, USA. 59. Brill, E.M., Abrahams, S., Hayes, C.M., Jenkins, C.L. & Watson,
42. Page, R.D. & Holmes, E.C. (1999) Molecular Evolution: A. J.M. (1999) Molecular characterisation and expression of a
Phylogenetic Approach. Blackwell Science, Oxford, UK. wound-inducible cDNA encoding a novel cinnamyl-alcohol
43. Graur, D. & Li, W.-H. (1999) Fundamentals of Molecular dehydrogenase enzyme in lucerne (Medicago sativa L.). Plant
Evolution. 2nd edn. Sinauer, Sunderland, MA, USA. Mol. Biol. 41, 279–291.
3332 H. Riveros-Rosas et al. (Eur. J. Biochem. 270) Ó FEBS 2003

60. Williamson, J.D., Stoop, J.M., Massel, M.O., Conkling, M.A. & 77. Galperin, M.Y., Walker, D.R. & Koonin, E.V. (1998) Analogous
Pharr, D.M. (1995) Sequence analysis of a mannitol dehy- enzymes: independent inventions in enzyme evolution. Genome
drogenase cDNA from plants reveals a function for the patho- Res. 8, 779–790.
genesis-related protein ELI3. Proc. Natl Acad. Sci. USA 92, 78. Todd, A.E., Orengo, C.A. & Thornton, J.M. (1999) Evolution of
7148–7152. protein function, from a structural perspective. Curr. Opin. Chem.
61. Quirino, B.F., Normanly, J. & Amasino, R.M. (1999) Diverse Biol. 3, 548–556.
range of gene activity during Arabidopsis thaliana leaf senescence 79. Benach, J., Atrian, S., Ladenstein, R. & Gonzalez-Duarte, R.
includes pathogen-independent induction of defense-related (2001) Genesis of Drosophila ADH: the shaping of the enzymatic
genes. Plant Mol. Biol. 40, 267–278. activity from a SDR ancestor. Chem. Biol. Interact. 130–132,
62. Prata, R.T.N., Williamson, J.D., Conkling, M.A. & Pharr, D.M. 405–415.
(1997) Sugar repression of mannitol dehydrogenase activity in 80. Pal, G.P., Jany, K.D. & Saenger, W. (1987) Crystallization of and
celery cells. Plant Physiol. 114, 307–314. X-ray investigations on glucose dehydrogenase from Bacillus
63. Stoop, J.M.H., Williamson, J.D., Conkling, M.A., MacKay, J.J. megaterium. Eur. J. Biochem. 167, 123–124.
& Pharr, D.M. (1998) Characterization of NAD-dependent 81. Yamamoto, K., Kusunoki, M., Urabe, I., Tabata, S. & Osaki, S.
mannitol dehydrogenase from celery as affected by ions, chela- (2000) Crystallization and preliminary X-ray analysis of glucose
tors, reducing agents and metabolites. Plant Sci. 131, 43–51. dehydrogenase from Bacillus megaterium IWG3. Acta. Crystal-
64. Pharr, D.M., Prata, R.T.N., Jennings, D.B., Williamson, J.D., logr. D Biol. Crystallogr. 56, 1443–1445.
Zamski, E., Yamamoto, Y.T. & Conkling, M.A. (1999) Reg- 82. Baldock, C., Rafferty, J.B., Stuitje, A.R., Slabas, A.R. & Rice,
ulation of mannitol dehydrogenase: relationship to plant growth D.W. (1998) The X-ray structure of Escherichia coli enoyl
and stress tolerance. Hortscience 34, 1027–1032. reductase with bound NAD+ at 2.1 Å resolution. J. Mol. Biol.
65. Masuda, N., Yasumo, H., Furusawa, T., Tsukamoto, T., 284, 1529–1546.
Sadano, H. & Osumi, T. (1998) Nuclear receptor binding factor-1 83. Kater, M.M., Koningstein, G.M., Nijkamp, H.J. & Stuitje, A.R.
(NRBF-1), a protein interacting with a wide spectrum of nuclear (1994) The use of a hybrid genetic system to study the functional
hormone receptors. Gene 221, 225–233. relationship between prokaryotic and plant multi-enzyme fatty
66. Yamazoe, M., Shirahige, K., Rashid, M.B., Kaneko, Y., acid synthetase complexes. Plant Mol. Biol. 25, 771–790.
Nakayama, T., Ogasawara, N. & Yoshikawa, H. (1994) A pro- 84. Slabas, A.R., Cottingham, I., Austin, A., Fawcett, T. &
tein which binds preferentially to single-stranded core sequence of Sidebottom, C.M. (1991) Amino acid sequence analysis of rape
autonomously replicating sequence is essential for respiratory seed (Brassica napus) NADH-enoyl ACP reductase. Plant Mol.
function in mitochondria of Saccharomyces cerevisiae. J. Biol. Biol. 17, 911–914.
Chem. 269, 15244–15252. 85. Jornvall, H., von Bahr-Lindstrom, H., Jany, K.D., Ulmer, W. &
67. Owen, G.I. & Zelent, A. (2000) Origins and evolutionary Froschle, M. (1984) Extended superfamily of short alcohol-
diversification of the nuclear receptor superfamily. Cell. Mol. Life polyol-sugar dehydrogenases: structural similarities between
Sci. 57, 809–827. glucose and ribitol dehydrogenases. FEBS Lett. 165, 190–196.
68. Metz, J.G., Pollard, M.R., Anderson, L., Hayes, T.R. & Lassner, 86. Edgar, A.J. (2002) Molecular cloning and tissue distribution of
M.W. (2000) Purification of a jojoba embryo fatty acyl-coenzyme mammalian L-threonine 3-dehydrogenases. BMC Biochem. 3, 19.
A reductase and expression of its cDNA in high erucic acid 87. True, J.R. & Carroll, S.B. (2002) Gene co-option in physiological
rapeseed. Plant Physiol. 122, 635–644. and morphological evolution. Annu. Rev. Cell. Dev. Biol. 18,
69. Wang, X. & Kolattukudy, P.E. (1995) Solubilization, purification 53–80.
and characterization of fatty acyl-CoA reductase from duck 88. Jurgens, C., Strom, A., Wegener, D., Hettwer, S., Wilmanns, M.
uropygial gland. Biochem. Biophys. Res. Commun. 208, 210–215. & Sterner, R. (2000) Directed evolution of a (beta alpha) 8-barrel
70. O’Brien, P.J. & Herschlag, D. (1999) Catalytic promiscuity enzyme to catalyze related reactions in two different metabolic
and the evolution of new enzymatic activities. Chem. Biol. 6, pathways. Proc. Natl Acad. Sci. USA 97, 9925–9930.
R91–R105. 89. Horowitz, N.H. (1945) On the evolution of biochemical
71. Henehan, G.T. & Oppenheimer, N.J. (1993) Horse liver alcohol syntheses. Proc. Natl Acad. Sci. USA 31, 153–157.
dehydrogenase-catalyzed oxidation of aldehydes: dismutation 90. Xu, L.L., Singh, B.K. & Conn, E.E. (1988) Purification and
precedes net production of reduced nicotinamide adenine dinu- characterization of acetone cyanohydrin lyase from Linum
cleotide. Biochemistry 32, 735–738. usitatissimum. Arch. Biochem. Biophys. 263, 256–263.
72. Svensson, S., Lundsjo, A., Cronholm, T. & Hoog, J.O. (1996) 91. Trummler, K. & Wajant, H. (1997) Molecular cloning of acetone
Aldehyde dismutase activity of human liver alcohol dehy- cyanohydrin lyase from flax (Linum usitatissimum). Definition
drogenase. FEBS Lett. 394, 217–220. of a novel class of hydroxynitrile lyases. J. Biol. Chem. 272,
73. Tsai, C.S. (1982) Multifunctionality of liver alcohol dehydro- 4770–4774.
genase: kinetic and mechanistic studies of esterase reaction. Arch. 92. Trummler, K., Roos, J., Schwaneberg, U., Effenberger, F.,
Biochem. Biophys. 213, 635–642. Forster, S., Pfizenmaier, K. & Wajant, H. (1998) Expression of
74. Kusano, M., Sakai, Y., Kato, N., Yoshimoto, H., Sone, H. & the Zn2+-containing hydroxynitrile lyase from flax (Linum
Tamai, Y. (1998) Hemiacetal dehydrogenation activity of alcohol usitatissimum) in Pichia pastoris – utilization of the recombinant
dehydrogenases in Saccharomyces cerevisiae. Biosci. Biotechnol. enzyme for enzymatic analysis and site-directed mutagenesis.
Biochem. 62, 1956–1961. Plant Sci. 139, 19–27.
75. Todd, A.E., Orengo, C.A. & Thornton, J.M. (2001) Evolution of 93. Breithaupt, H., Pohl, M., Bonigk, W., Heim, P., Schimz, K.L. &
function in protein superfamilies, from a structural perspective. Kula, M.R. (1999) Cloning and expression of (R)-hydroxynitrile
J. Mol. Biol. 307, 1113–1143. lyase from Linum usitatissimum (flax). J. Mol. Catal. B-Enzym. 6,
76. Clish, C.B., Levy, B.D., Chiang, N., Tai, H.H. & Serhan, C.N. 315–332.
(2000) Oxidoreductases in lipoxin A4 metabolic inactivation: a 94. Dreveny, I., Gruber, K., Glieder, A., Thompson, A. & Kratky, C.
novel role for 15-oxoprostaglandin 13-reductase/leukotriene B4 (2001) The hydroxynitrile lyase from almond: a lyase that looks
12-hydroxydehydrogenase in inflammation. J. Biol. Chem. 275, like an oxidoreductase. Structure 9, 803–815.
25372–25380. 95. Vetter, J. (2000) Plant cyanogenic glycosides. Toxicon 38, 11–36.
Ó FEBS 2003 MDR superfamily (Eur. J. Biochem. 270) 3333

96. Thornton, J.M., Orengo, C.A., Todd, A.E. & Pearl, F.M. 115. Field, J., Rosenthal, B. & Samuelson, J. (2000) Early lateral
(1999) Protein folds, functions and evolution. J. Mol. Biol. 293, transfer of genes encoding malic enzyme, acetyl-CoA synthetase
333–342. and alcohol dehydrogenases from anaerobic prokaryotes to
97. Nakar, D. & Gutnick, D.L. (2001) Analysis of the wee gene Entamoeba histolytica. Mol. Microbiol. 38, 446–455.
cluster responsible for the biosynthesis of the polymeric bio- 116. Shafqat, J., El-Ahmad, M., Danielsson, O., Martinez, M.C.,
emulsifier from the oil- degrading strain Acinetobacter lwoffii Persson, B., Pares, X. & Jornvall, H. (1996) Pea formaldehyde-
RAG-1. Microbiology 147, 1937–1946. active class III alcohol dehydrogenase: common derivation of the
98. Babiychuk, E., Kushnir, S., Belles-Boix, E., Van Montagu, M. & plant and animal forms but not of the corresponding ethanol-
Inze, D. (1995) Arabidopsis thaliana NADPH oxidoreductase active forms (classes I and P). Proc. Natl Acad. Sci. USA 93,
homologs confer tolerance of yeasts toward the thiol-oxidizing 5595–5599.
drug diamide. J. Biol. Chem. 270, 26224–26231. 117. Ondarza, R.N., Rendon, J.L. & Ondarza, M. (1983) Glutathione
99. Ichinose, Y., Tiemann, K., Schwenger-Erger, C., Toyoda, K., reductase in evolution. J. Mol. Evol. 19, 371–375.
Hein, F., Hanselle, T., Cornels, H. & Barz, W. (2000) Genes 118. Fahey, R.C. & Sundquist, A.R. (1991) Evolution of glutathione
expressed in Ascochyta rabiei-inoculated chickpea plants and metabolism. Adv. Enzymol. Relat. Areas Mol. Biol. 64, 1–53.
elicited cell cultures as detected by differential cDNA-hybridiza- 119. Selkov, E., Maltsev, N., Olsen, G.J., Overbeek, R. & Whitman,
tion. Z. Naturforsch C. 55, 44–54. W.B. (1997) A reconstruction of the metabolism of Methano-
100. Smith, S. (1994) The animal fatty acid synthase: one gene, one coccus jannaschii from sequence data. Gene 197, GC11-GC26.
polypeptide, seven enzymes. FASEB J. 8, 1248–1259. 120. Lazcano, A. & Miller, S.L. (1999) On the origin of metabolic
101. Zhu, G., Marchewka, M.J., Woods, K.M., Upton, S.J. & pathways. J. Mol. Evol. 49, 424–431.
Keithly, J.S. (2000) Molecular analysis of a Type I fatty acid 121. Fothergill-Gilmore, L.A. & Michels, P.A. (1993) Evolution of
synthase in Cryptosporidium parvum. Mol. Biochem. Parasitol. glycolysis. Prog. Biophys. Mol. Biol. 59, 105–235.
105, 253–260. 122. Larralde, R., Robertson, M.P. & Miller, S.L. (1995) Rates of
102. Hutchinson, C.R., Kennedy, J., Park, C., Kendrew, S., Auclair, decomposition of ribose and other sugars: implications for che-
K. & Vederas, J. (2000) Aspects of the biosynthesis of non-aro- mical evolution. Proc. Natl Acad. Sci. USA 92, 8158–8160.
matic fungal polyketides by iterative polyketide synthases. 123. Conway, T. (1992) The Entner-Doudoroff pathway: history,
Antonie Van Leeuwenhoek 78, 287–295. physiology and molecular biology. FEMS Microbiol. Rev. 9,
103. Kennedy, J., Auclair, K., Kendrew, S.G., Park, C., Vederas, J.C. 1–27.
& Hutchinson, C.R. (1999) Modulation of polyketide synthase 124. Romano, A.H. & Conway, T. (1996) Evolution of carbohydrate
activity by accessory proteins during lovastatin biosynthesis. metabolic pathways. Res. Microbiol. 147, 448–455.
Science 284, 1368–1372. 125. Dandekar, T., Schuster, S., Snel, B., Huynen, M. & Bork, P.
104. James, L.C. & Tawfik, D.S. (2001) Catalytic and binding (1999) Pathway alignment: application to the comparative ana-
poly-reactivities shared by two unrelated proteins: The potential lysis of glycolytic enzymes. Biochem. J. 343, 115–124.
role of promiscuity in enzyme evolution. Protein Sci. 10, 126. Gabel, N.W. & Ponnamperuma, C. (1967) Model for origin of
2600–2607. monosaccharides. Nature 216, 453–455.
105. Tao, H. & Cornish, V.W. (2002) Milestones in directed enzyme 127. Reid, C. & Orgel, L.E. (1967) Synthesis in sugars in potentially
evolution. Curr. Opin. Chem. Biol. 6, 858–864. prebiotic conditions. Nature 216, 455.
106. Garland, D., Rao, P.V., Del Corso, A., Mura, U. & Zigler, J.S.J. 128. Epps, D.E., Nooner, D.W., Eichberg, J., Sherwood, E. & Oro, J.
(1991) zeta-Crystallin is a major protein in the lens of Camelus (1979) Cyanamide mediated synthesis under plausible primitive
dromedarius. Arch. Biochem. Biophys. 285, 134–136. earth conditions. VI. The synthesis of glycerol and glycero-
107. Rao, P.V. & Zigler, J.S.J. (1992) Purification and characterization phosphates. J. Mol. Evol. 14, 235–241.
of zeta-crystallin/quinone reductase from guinea pig liver. 129. Weber, A.L. (1987) The triose model: glyceraldehyde as a source
Biochim. Biophys. Acta 1117, 315–320. of energy and monomers for prebiotic condensation reactions.
108. Gonzalez, P., Rao, P.V., Nunez, S.B. & Zigler, J.S.J. (1995) Orig. Life Evol. Biosph. 17, 107–119.
Evidence for independent recruitment of zeta-crystallin/quinone 130. Weber, A.L. & Hsu, V. (1990) Energy-rich glyceric acid oxygen
reductase (CRYZ) as a crystallin in camelids and hystricomorph esters: implications for the origin of glycolysis. Orig. Life Evol.
rodents. Mol. Biol. Evol. 12, 773–781. Biosph. 20, 145–150.
109. Fujii, Y., Kimoto, H., Ishikawa, K., Watanabe, K., Yokota, Y., 131. Fukuchi, S. & Otsuka, J. (1992) Evolution of metabolic pathways
Nakai, N. & Taketo, A. (2001) Taxon-specific zeta-crystallin in by chance assembly of enzyme proteins generated from sense
Japanese tree frog (Hyla japonica) lens. J. Biol. Chem. 276, and antisense strands of pre-existing genes. J. Theor. Biol. 158,
28134–28139. 271–291.
110. Bhaud, Y., Geraud, M.L., Ausseil, J., Soyer-Gobillard, M.O. & 132. Skulachev, V.P. (1994) Bioenergetics: the evolution of molecular
Moreau, H. (1999) Cyclic expression of a nuclear protein in a mechanisms and the development of bioenergetic concepts.
dinoflagellate. J. Eukaryot. Microbiol. 46, 259–267. Antonie Van Leeuwenhoek 65, 271–284.
111. Guillebault, D., Derelle, E., Bhaud, Y. & Moreau, H. (2001) Role 133. Rella, R., Raia, C.A., Pensa, M., Pisani, F.M., Gambacorta, A.,
of nuclear WW domains and proline-rich proteins in dino- De Rosa, M. & Rossi, M. (1987) A novel archaebacterial NAD+-
flagellate transcription. Protist 152, 127–138. dependent alcohol dehydrogenase. Purification Properties. Eur. J.
112. Chang, A. & Fink, G.R. (1995) Targeting of the yeast plasma Biochem. 167, 475–479.
membrane [H+]ATPase: a novel gene AST1 prevents mislo- 134. Peretz, M., Bogin, O., Keinan, E. & Burstein, Y. (1993) Stereo-
calization of mutant ATPase to the vacuole. J. Cell Biol. 128, specificity of hydrogen transfer by the NADP-linked alcohol
39–49. dehydrogenase from the thermophilic bacterium Thermo-
113. Jain, R., Rivera, M.C. & Lake, J.A. (1999) Horizontal gene anaerobium brockii. Int. J. Pept. Protein Res. 42, 490–495.
transfer among genomes: The complexity hypothesis. Proc. Natl 135. Demetrius, L. (1998) Role of enzyme-substrate flexibility in
Acad. Sci. USA 96, 3801–3806. catalytic activity: an evolutionary perspective. J. Theor. Biol. 194,
114. Lawrence, J.G. & Ochman, H. (1998) Molecular archaeology 175–194.
of the Escherichia coli genome. Proc. Natl Acad. Sci. USA 95, 136. Giordano, A., Cannio, R., La Cara, F., Bartolucci, S., Rossi, M.
9413–9417. & Raia, C.A. (1999) Asn249Tyr substitution at the coenzyme
3334 H. Riveros-Rosas et al. (Eur. J. Biochem. 270) Ó FEBS 2003

binding domain activates Sulfolobus solfataricus alcohol dehy- 148. Apweiler, R., Attwood, T.K., Bairoch, A., Bateman, A., Birney,
drogenase and increases its thermal stability. Biochemistry 38, E., Biswas, M., Bucher, P., Cerutti, L., Corpet, F., Croning,
3043–3054. M.D., Durbin, R., Falquet, L., Fleischmann, W., Gouzy, J.,
137. Raia, C.A., Caruso, C., Marino, M., Vespa, N. & Rossi, M. Hermjakob, H., Hulo, N., Jonassen, I., Kahn, D., Kanapin, A.,
(1996) Activation of Sulfolobus solfataricus alcohol dehydro- Karavidopoulou, Y., Lopez, R., Marx, B., Mulder, N.J., Oinn,
genase by modification of cysteine residue 38 with iodoacetic acid. T.M., Pagni, M. & Servant, F. (2001) The InterPro database, an
Biochemistry 35, 638–647. integrated documentation resource for protein families, domains
138. Pietruszko, R. (1979) Nonethanol substrates of alcohol dehy- and functional sites. Nucleic Acids Res. 29, 37–40.
drogenase. In Biochemistry and Pharmacology of Ethanol 149. Ng, K.YeR., Wu, X.C. & Wong, S.L. (1992) Sorbitol dehy-
(Majchrowicz, E. & Noble, E.P., eds), pp. 87–106. Plenum Press, drogenase from Bacillus subtilis. Purification, characterization,
New York, USA. and gene cloning. J. Biol. Chem. 267, 24989–24994.
139. Liu, H. & Reynolds, K.A. (2001) Precursor supply for polyketide 150. Marini, I., Bucchioni, L., Borella, P., Del Corso, A. & Mura, U.
biosynthesis: the role of crotonyl-CoA reductase. Metab. Eng. 3, (1997) Sorbitol dehydrogenase from bovine lens: purification and
40–48. properties. Arch. Biochem. Biophys. 340, 383–391.
140. Darling, P.B., Grunow, J., Rafii, M., Brookes, S., Ball, R.O. & 151. Lindstad, R.I., Koll, P. & McKinley-McKee, J.S. (1998) Sub-
Pencharz, P.B. (2000) Threonine dehydrogenase is a minor strate specificity of sheep liver sorbitol dehydrogenase. Biochem.
degradative pathway of threonine catabolism in adult humans. J. 330, 479–487.
Am. J. Physiol. Endocrinol. Metab. 278, E877–E884. 152. Oura, Y., Yamada, K., Shiratake, K. & Yamaki, S. (2000) Pur-
141. Wilkins, A.S. (1997) Canalization: a molecular genetic ification and characterization of a NAD+-dependent sorbitol
perspective. Bioessays 19, 257–262. dehydrogenase from Japanese pear fruit. Phytochemistry 54,
142. Deltour, L., Foglio, M.H. & Duester, G. (1999) Metabolic defi- 567–572.
ciencies in alcohol dehydrogenase Adh1, Adh3, and Adh4 null 153. Hogenesch, J.B., Ching, K.A., Batalov, S., Su, A.I., Walker, J.R.,
mutant mice. Overlapping roles of Adh1 and Adh4 in ethanol Zhou, Y., Kay, S.A., Schultz, P.G. & Cooke, M.P. (2001) A
clearance and metabolism of retinol to retinoic acid. J. Biol. comparison of the Celera and Ensembl predicted gene sets reveals
Chem. 274, 16796–16801. little overlap in novel genes. Cell 106, 413–415.
143. Castresana, J. (2001) Comparative genomics and bioenergetics. 154. Shouse, B. (2002) American Association for the Advancement of
Biochim. Biophys. Acta 1506, 147–162. Science Annual Meeting. Human gene count on the rise. Science
144. Koonin, E.V., Tatusov, R.L. & Galperin, M.Y. (1998) Beyond 295, 1457.
complete genomes: from sequence to structure and function.
Curr. Opin. Struct. Biol. 8, 355–363.
Supplementary material

The following material is available from
http://www.blackwellpublishing.com/products/journals/suppmat/EJB/EJB3704/EJB3704sm.htm

Table S1. Proteins that belong to MDR superfamily.
Table S2. References for Tables 3–8.
