*To whom correspondence should be addressed. Tel: +33 4 91 82 55 87; Fax: +33 4 91 26 67 20; Email: [email protected]
Correspondence may also be addressed to Pedro M. Coutinho. Email: [email protected]
the hydrolysis and/or transglycosylation of glycosidic bonds. GH-coding genes are abundant and present in the vast majority of genomes, and GHs account for almost half (presently about 47%) of the enzymes classified in CAZy. Because of their widespread importance for biotechnological and biomedical applications, GHs are so far the best biochemically characterized set of enzymes in the CAZy database.
(2) Glycosyltransferases (GTs). These are the enzymes responsible for the biosynthesis of glycosidic bonds from phospho-activated sugar donors (6–8). They

groups of organisms. Day-to-day inspection of new enzyme characterizations reported in the literature has regularly led, and continues to lead, to the definition of new enzyme families. Significantly, the CAZy families, originally created in the 1990s by hydrophobic cluster analysis of the very limited number of sequences then available (2–6) and later complemented by BLAST- and HMMer-based sequence similarity approaches, have largely withstood the test of time despite a hundred-fold increase in the number of sequences.
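As a rough illustration of HMMer-based family assignment, the sketch below assumes protein sequences have been searched against a library of family profile HMMs with HMMER3 `hmmscan --domtblout`. It is not CAZy's actual pipeline: the E-value cutoff, the greedy overlap rule, and the profile and protein names in the comments are assumptions made for the example. Each non-overlapping module that survives receives its own family label, so a multi-module protein ends up in one or several families.

```python
from collections import defaultdict

# 0-based column indices in HMMER3 --domtblout lines (hmmscan output).
TARGET_NAME = 0   # profile HMM name, here assumed to be a family name such as "GH5"
QUERY_NAME = 3    # protein identifier
I_EVALUE = 12     # independent E-value of this domain hit
ENV_FROM = 19     # envelope start on the protein
ENV_TO = 20       # envelope end on the protein


def parse_domtblout(lines, max_evalue=1e-5):
    """Return (protein, family, start, end, i_evalue) for hits under the cutoff.

    The default cutoff is an illustrative assumption, not a CAZy setting.
    """
    hits = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        fields = line.split()
        evalue = float(fields[I_EVALUE])
        if evalue <= max_evalue:
            hits.append((fields[QUERY_NAME], fields[TARGET_NAME],
                         int(fields[ENV_FROM]), int(fields[ENV_TO]), evalue))
    return hits


def assign_modules(hits):
    """Greedily keep the lowest-E-value hit for each region of each protein.

    Returns {protein: [(family, start, end), ...]} in N- to C-terminal order.
    """
    by_protein = defaultdict(list)
    for protein, family, start, end, evalue in hits:
        by_protein[protein].append((evalue, start, end, family))
    modules = {}
    for protein, candidates in by_protein.items():
        kept = []
        for evalue, start, end, family in sorted(candidates):
            # Accept a hit only if it does not overlap any module kept so far.
            if all(end < s or start > e for _, s, e, _ in kept):
                kept.append((evalue, start, end, family))
        kept.sort(key=lambda k: k[1])
        modules[protein] = [(fam, s, e) for _, s, e, fam in kept]
    return modules
```

Greedy selection by E-value is only one simple way to resolve overlapping hits. In the approach described in this paper, a family assignment of this kind is only the starting point, and it is followed by manual examination of alignments.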
also included in the database. In addition, because the functions described in CAZy are exclusively enzymatic, additional and complementary binding and inhibitory functions known to be associated with several CAZy proteins will be curated and explored in the near future.

SEMI-AUTOMATIC MODULAR ASSIGNMENT

Carbohydrate-active enzymes can exhibit a modular structure (Figure 1), where a module can be defined as a

MANUAL FUNCTIONAL ANALYSIS

All too often, functional annotation methods employed during whole-genome annotation are erroneous and lack consistent language (12,15). Sequence similarity to genes with GO annotations, or to best BLAST hits, can be a good starting point for assignment to pathways or to general functions such as serine/threonine kinase. Unfortunately, many automatic functional assignments are overly specific. This is particularly true for CAZymes, because many CAZy families group together enzymes of widely differing specificity.
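The over-specific annotation problem can be made concrete with a minimal sketch of a hypothetical annotation helper. It copies a specific EC number from the best hit only when two conditions hold: the hit's activity was biochemically characterized, and the family is known to carry that single activity. In every other case the label stays at the family level. The activity table is illustrative and deliberately incomplete. GH5 and GH13 are genuinely polyspecific families, but `GH_HYPO` is an invented single-activity family included only to exercise the other branch. The `BestHit` record and the function name are likewise hypothetical.

```python
from dataclasses import dataclass

# Illustrative, incomplete activity sets (EC numbers) per family.
FAMILY_ACTIVITIES: dict[str, set[str]] = {
    "GH5": {"3.2.1.4", "3.2.1.78"},               # endoglucanase, endomannanase
    "GH13": {"3.2.1.1", "3.2.1.41", "2.4.1.19"},  # alpha-amylase, pullulanase, CGTase
    "GH_HYPO": {"3.2.1.8"},                       # hypothetical monospecific family
}


@dataclass
class BestHit:
    """Best sequence-similarity hit for a query protein (hypothetical layout)."""
    subject_id: str
    family: str
    ec_number: str       # activity annotated on the hit
    characterized: bool  # True if that activity was demonstrated experimentally


def conservative_annotation(hit: BestHit) -> str:
    """Return an annotation no more specific than the family evidence allows."""
    activities = FAMILY_ACTIVITIES.get(hit.family, set())
    # Transfer the EC number only from a characterized hit in a family
    # whose known activity set consists of exactly that activity.
    if hit.characterized and activities == {hit.ec_number}:
        return f"{hit.family} (EC {hit.ec_number})"
    # Polyspecific, unknown, or uncharacterized: stay at the family level.
    return f"{hit.family} family protein"
```

For example, a characterized alpha-amylase hit in GH13 yields only "GH13 family protein", because the family also contains pullulanases and cyclodextrin glucanotransferases. This is the kind of over-specific transfer that the manual analysis is meant to catch.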
of sequences than a few years ago, making it possible to perform large-scale analyses such as the annotation of CAZyme systems in genomes and metagenomic investigations of the breakdown of complex carbohydrates. A typical genome analysis begins with the assignment of protein models to one or several CAZy families, depending on the number of CAZy modules present within the sequence. This family assignment is followed by the prediction of general functional classes through manual examination of alignments to closely related sequences, taking care to check that active-site residues are retained. Once a

components of carbohydrate-based systems now emerges. Examples include N- and O-glycosylation of proteins, starch metabolism, and biosynthesis of the cell wall and its subcomponents. Geisler-Lee et al. (19) combined bioinformatics and transcriptome analysis of various poplar and Arabidopsis tissues and organs, and showed that CAZyme transcripts are particularly abundant in wood tissues.

NEW FEATURES
Figure 2. (A) After a search is performed, for example for the protein accession P00275, the resulting page indicates the modular families that compose that protein. (B) Clicking the links provided in (A) takes users to a page describing the family, which lists all annotated
In the last 2 years, the number of sequences in CAZy has nearly doubled, and the number of available genomes now exceeds 750. We believe this trend will continue in the coming years. Unfortunately, while sequencing is ever more rapid, progress in structural information and biochemical characterization is much slower: the amount of biochemical data has grown by only 8% over the last 2 years (Figure 3). The gap between available sequences and biochemically characterized enzymes is therefore widening, which makes better methods for high-throughput biochemical characterization increasingly valuable.

the website and at www.cazypedia.org. Software from the group is available at www.cazy.org/tools.

FUNDING

The authors wish to thank the Département des Sciences de la Vie of CNRS for a 2-year funding grant to B.L.C. and Novozymes for a contract supporting V.L.

Conflict of interest statement. P.M.C. is affiliated with Université de Provence (Aix-Marseille I); B.H. and C.R. are members of CNRS.
