
The WRKY superfamily of plant transcription factors

2000, Trends in Plant Science


Thomas Eulgem, Paul J. Rushton, Silke Robatzek and Imre E. Somssich

The WRKY proteins are a superfamily of transcription factors with up to 100 representatives in Arabidopsis. Family members appear to be involved in the regulation of various physiological programs that are unique to plants, including pathogen defense, senescence and trichome development. In spite of the strong conservation of their DNA-binding domain, the overall structures of WRKY proteins are highly divergent and can be categorized into distinct groups, which might reflect their different functions.

One of the apparent fundamental principles of biological evolution is that the progression from ancient to advanced life forms is inseparably connected to an increase in regulatory capacity. Genome-sequencing efforts have provided evidence for a positive correlation between the proportion of genes involved in information processing and the complexity of organisms. More than 20% of the genes within the sequence available for the Arabidopsis thaliana genome appear to encode proteins that play a role in signal transduction or transcription [1], whereas only 12% of the genome of the single-celled yeast Saccharomyces cerevisiae contains genes of this type [2]. This increase in biological complexity coincides with the appearance or expansion of specific groups of regulator genes. One example is the nuclear-receptor-gene family, which is completely absent in yeast but highly represented in metazoan organisms [3]. The evolution of nuclear receptors is believed to be a key event in the development of intercellular communication, a prerequisite for the multicellularity of metazoans [4]. Similarly, the establishment of a complex animal body plan was driven by the amplification and divergence of ancestral homeobox genes, thereby generating a sophisticated regulatory system of functionally interconnected transcriptional regulators [5].
To meet their disparate biological requirements, plants and animals have evolved unique regulatory mechanisms. This was partly achieved by combining functional domains from pre-existing factors to build new regulators, as exemplified by the MADS-box factors, which play a central role in determining floral and organ identity in plants [6]. In addition, completely new factors have arisen and we focus here on the potential biological roles of WRKY (pronounced ‘worky’) proteins, a large family of transcriptional regulators that has to date only been found in plants. The abundance of information provided by the Arabidopsis sequencing projects is an ideal basis for comparative analysis of this superfamily within one plant species. Although their precise regulatory functions are largely unknown, the fact that these factors appear to be specific to plants, with probably up to 100 members in Arabidopsis, suggests that they play an important role during plant evolution.

Biochemical properties of WRKY proteins

The first WRKY cDNAs were cloned from sweet potato (Ipomoea batatas; SPF1), wild oat (Avena fatua; ABF1,2), parsley (Petroselinum crispum; PcWRKY1,2,3) and Arabidopsis (ZAP1), based on the ability of the encoded proteins to bind specifically to the DNA sequence motif (T)(T)TGAC(C/T), which is known as the W box [7–10]. It has been suggested that the cognate binding site for SPF1 is different from that of other WRKY proteins. However, the oligonucleotide used to isolate SPF1 does have a W box in the flanking sequence [7]. The name of the WRKY family is derived from the most prominent feature of these proteins, the WRKY domain, a 60 amino acid region that is highly conserved amongst family members. The emerging picture is that these proteins are regulatory transcription factors with a binding preference for the W box, but with the potential to differentially regulate the expression of a variety of target genes.
Consistent with a role as transcription factors, PcWRKY1 and WIZZ (from tobacco) have been shown to be targeted to the nucleus [11,12].

The WRKY domain and the W box

The WRKY domain is defined by the conserved amino acid sequence WRKYGQK at its N-terminal end, together with a novel zinc-finger-like motif [8] (Fig. 1). Because of the clear binding preference of all characterized WRKY proteins for the same DNA motif, it has been assumed that the WRKY domain, as their only conserved structural feature, constitutes a DNA-binding domain. Indeed, it has recently been shown that an isolated WRKY domain has sequence-specific DNA-binding activity [12]. The divalent metal chelators 1,10-o-phenanthroline and EDTA abolish in vitro DNA binding, which is taken as strong support for a zinc-finger structure within the WRKY domain [8,10,11]. However, it has not yet been proven that zinc is actually complexed in the WRKY domain. In addition, nothing is known about the function of the WRKYGQK heptapeptide stretch, the hallmark of this superfamily.

All known WRKY proteins contain either one or two WRKY domains. They can be classified on the basis of both the number of WRKY domains and the features of their zinc-finger-like motif. WRKY proteins with two WRKY domains belong to group I, whereas most proteins with one WRKY domain belong to group II (Fig. 2). Generally, the WRKY domains of group I and group II members have the same type of finger motif, whose pattern of potential zinc ligands (C–X(4–5)–C–X(22–23)–H–X(1)–H; Fig. 1) is unique among all described zinc-finger-like motifs [13]. The single finger motif of a small subset of WRKY proteins is distinct from that of group I and II members. Instead of a C2–H2 pattern, their WRKY domains contain a C2–HC motif (C–X(7)–C–X(23)–H–X(1)–C; Fig. 1). Owing to this distinction, they were recently assigned to the newly defined group III.
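The grouping rules described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' procedure: the function names `finger_type` and `classify` are hypothetical, the heptapeptide is matched literally as WRKYGQK (variants that appear in Fig. 1 are ignored), and alignment gaps written as dots are simply removed before matching. The two regular expressions are direct transcriptions of the C2–H2 and C2–HC ligand patterns quoted in the text.

```python
import re

# Ligand patterns quoted in the text, written as regular expressions.
C2H2 = re.compile(r"C.{4,5}C.{22,23}H.H")   # groups I and II
C2HC = re.compile(r"C.{7}C.{23}H.C")        # group III
HEPTAPEPTIDE = "WRKYGQK"                    # assumption: exact match only


def finger_type(domain):
    """Return 'C2-H2', 'C2-HC' or None for a WRKY domain sequence."""
    seq = domain.replace(".", "").upper()   # drop alignment gaps
    if C2HC.search(seq):
        return "C2-HC"
    if C2H2.search(seq):
        return "C2-H2"
    return None


def classify(protein):
    """Assign group I, II or III from domain count and finger type."""
    seq = protein.replace(".", "").upper()
    n_domains = seq.count(HEPTAPEPTIDE)
    if n_domains == 0:
        return None                         # not recognizably a WRKY protein
    if n_domains >= 2:
        return "I"
    return {"C2-H2": "II", "C2-HC": "III"}.get(finger_type(seq), "unassigned")


# Domain sequences copied from Fig. 1.
wrky18 = "DTSLTVKDGFQWRKYGQKVTRDNPSPRAYFRCSFAPS..CPVKKKVQRSAEDPSLLVATYEGTHNHLGP"
wrky30 = "GVDRTLDDGFSWRKYGQKDILGAKFPRGYYRCTYRKSQGCEATKQVQRSDENQMLLEISYRGIHSCSQA"
print(classify(wrky18), classify(wrky30))
```

A sequence whose cysteine and histidine spacing fits neither pattern is reported as "unassigned" rather than forced into a group, which mirrors the caution the text applies to family members that do not fit neatly into one group.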
Nevertheless, experimental evidence has shown that members of all three groups bind sequence specifically to various W box elements (R.S. Cormack et al., unpublished).

The two WRKY domains of group I members appear to be functionally distinct. As has been shown for SPF1, ZAP1 and PcWRKY1, sequence-specific binding to their target DNA sequences is mediated mainly by the C-terminal WRKY domain [7,10,12]. The function of the N-terminal WRKY domain remains unclear. Because protein regions outside of the C-terminal WRKY domain contribute to the overall strength of DNA binding, the N-terminal domain might participate in the binding process, increasing the affinity or specificity of these proteins for their target sites. Alternatively, it might provide an interface for protein–protein interactions, a known function of some zinc-finger-like domains [14]; this could allow more efficient DNA binding through interactions with other DNA-associated proteins. Not unexpectedly, the single WRKY domains of group II and III family members are more similar in sequence to the C-terminal than to the N-terminal WRKY domain of group I proteins, suggesting that the C-terminal and single WRKY domains are functionally equivalent and constitute the major DNA-binding domain.

1360-1385/00/$ – see front matter © 2000 Elsevier Science Ltd. All rights reserved. PII: S1360-1385(00)01600-9. Trends in Plant Science, May 2000, Vol. 5, No. 5

Fig. 1. Comparison of WRKY domain sequences from AtWRKY proteins. Sequences encoding the peptide stretch WRKYGQK were found using the BLAST programs tblastn and blastp [37] in genomic and EST databases. Gaps (dots) have been inserted for optimal alignment. Residues that are highly conserved within each of the major groups are in red and potential zinc ligands are highlighted in black boxes. For each (sub)group, the position of a conserved intron is indicated by an arrowhead.

Group I
WRKY1 TLFDIVNDGYRWRKYGQKSVKGSPYPRSYYRCSSPG...CPVKKHVERSSHDTKLLITTYEGKHDHDMP
WRKY2 SDVDILDDGYRWRKYGQKVVKGNPNPRSYYKCTAPG...CTVRKHVERASHDLKSVITTYEGKHNHDVP
WRKY3 SEVDLLDDGYRWRKYGQKVVKGNPYPRSYYKCTTPD...CGVRKHVERAATDPKAVVTTYEGKHNHDVP
WRKY4 SEVDLLDDGYRWRKYGQKVVKGNPYPRSYYKCTTPG...CGVRKHVERAATDPKAVVTTYEGKHNHDLP
WRKY20 SEVDILDDGYRWRKYGQKVVRGNPNPRSYYKCTAHG...CPVRKHVERASHDPKAVITTYEGKHDHDVP
WRKY25 SDIDVLIDGFRWRKYGQKVVKGNTNPRSYYKCTFQG...CGVKKQVERSAADERAVLTTYEGRHNHDIP
WRKY26 SDIDILDDGYRWRKYGQKVVKGNPNPRSYYKCTFTG...CFVRKHVERAFQDPKSVITTYEGKHKHQIP
WRKY32 GDVGICGDGYRWRKYGQKMVKGNPHPRNYYRCTSAG...CPVRKHIETAVENTKAVIITYKGVHNHDMP
WRKY33 SDIDILDDGYRWRKYGQKVVKGNPNPRSYYKCTTIG...CPVRKHVERASHDMRAVITTYEGKHNHDVP
WRKY34 SDIDILDDGYRWRKYGQKVVKGNPNPRSYYKCTANG...CTVTKHVERASDDFKSVLTTYIGKHTHVVP
WRKY44 VESDSLEDGFRWRKYGQKVVGGNAYPRSYYRCTSAN...CRARKHVERASDDPRAFITTYEGKHNHHLL
WRKY45 SQVDILDDGYRWRKYGQKAVKNNPFPRSYYKCTEEG...CRVKKQVQRQWGDEGVVVTTYQGVHTHAVD
WRKY58 SEVDLLDDGYRWRKYGQKVVKGNPHPRSYYKCTTPN...CTVRKHVERASTDAKAVITTYEGKHNHDVP
WRKY10 SDEDNPNDGYRWRKYGQKVVKGNPNPRSYFKCTNIE...CRVKKHVERGADNIKLVVTTYDGIHNHPSP

Group II (a)
WRKY18 DTSLTVKDGFQWRKYGQKVTRDNPSPRAYFRCSFAPS..CPVKKKVQRSAEDPSLLVATYEGTHNHLGP
WRKY40 KDGYQWRKYGQKVTRDNPSPRAYFKCACAPS..CSVKKKVQRSVEDQSVLVATYEGEHNHPMP
WRKY60 VSSLTVKDGYQWRKYGQKITRDNPSPRAYFRCSFSPS..CLVKKKVQRSAEDPSFLVATYEGTHNHTGP

Group II (b)
WRKY6 SEAPMISDGCQWRKYGQKMAKGNPCPRAYYRCTMATG..CPVRKQVQRCAEDRSILITTYEGNHNHPLP
WRKY9 CETATMNDGCQWRKYGQKTAKGNPCPRAYYRCTVAPG..CPVRKQVQRCLEDMSILITTYEGTHNHPLP
WRKY31 SEAAMISDGCQWRKYGQKMAKGNPCPRAYYRCTMAGG..CPVRKQVQRCAEDRSILITTYEGNHNHPLP
WRKY36 CEDPSINDGCQWRKYGQKTAKTNPLPRAYYRCSMSSN..CPVRKQVQRCGEETSAFMTTYEGNHDHPLP
WRKY42 SEAPMLSDGCQWRKYGQKMAKGNPCPRAYYRCTMAVG..CPVRKQVQRCAEDRTILITTYEGNHNHPLP
WRKY47 HKQHEVNDGCQWRKYGQKMAKGNPCPRAYYRCTMAVG..CPVRKQVQRCAEDTTILTTTYEGNHNHPLP
WRKY61 NDGCQWRKYGQKIAKGNPCPRAYYRCTIAAS..CPVRKQVQRCSEDMSILISTYEGTHNHPLP

Group II (c)
WRKY8 TEVDHLEDGYRWRKYGQKAVKNSPYPRSYYRCTTQK...CNVKKRVERSYQDPTVVITTYESQHNHPIP
WRKY12 SDVDVLDDGYKWRKYGQKVVKNSLHPRSYYRCTHNN...CRVKKRVERLSEDCRMVITTYEGRHNHIPS
WRKY13 SEVDVLDDGYRWRKYGXKVVKNTQHPRSYYRCTQDK...CRVKKRVERLADDPRMVITTYEGRHLHSPS
WRKY23 SEVDHLEDGYRWRKYGQKAVKNSPFPRSYYRCTTAS...CNVKKRVERSFRDPSTVVTTYEGQHTHISP
WRKY24 SDDDVLDDGYRWRKYGQKSVKHNAHPRSYYRCTYHT...CNVKKQVQRLAKDPNVVVTTYEGVHNHPCE
WRKY28 SEVDHLEDGYRWRKYGQKAVKNSPYPRSYYRCTTQK...CNVKKRVERSFQDPTVVITTYEGQHNHPIP
WRKY43 SDADILDDGYRWRKYGQKSVKNSLYPRSYYRCTQHM...CNVKKQVQRLSKETSIVETTYEGIHNHPCE
WRKY48 KSIDNLDDGYRWRKYGQKAVKNSPYPRSYYRCTTVG...CGVKKRVERSSDDPSIVMTTYEGQHTHPFP
WRKY49 NSNGMCDDGYKWRKYGQKSIKNSPNPRSYYKCTNPI...CNAKKQVERSIDESNTYIITYEGFHFHYTY
WRKY50 SEVEVLDDGFKWRKYGKKMVKNSPHPRNYYKCSVDG...CPVKKRVERDRDDPSFVITTYEGSHNHSSM
WRKY51 DVMDDGFKWRKYGKKSVKNNINKRNYYKCSSEG...CSVKKRVERDGDDAAYVITTYEGVHNHESL
WRKY56 SDDDVLDDGYRWRKYGQKSVKNNAHPRSYYRCTYHT...CNVKKQVQRLAKDPNVVVTTYEGVHNHPCE
WRKY57 SDVDNLEDGYRWRKYGQKAVKNSPFPRSYYRCTNSR...CTVKKRVERSSDDPSIVITTYEGQHCHQTI
WRKY59 DEKVALDDGYKWRKYGKKPITGSPFPRHYHKCSSPD...CNVKKKIERDTNNPDYILTTYEGRHNHPSP

Group II (d)
WRKY7 KMADIPSDEFSWRKYGQKPIKGSPHPRGYYKCSSVRG..CPARKHVERALDDAMMLIVTYEGDHNHALV
WRKY11 KIADIPPDEYSWRKYGQKPIKGSPHPRGYYKCSTFRG..CPARKHVERALDDPAMLIVTYEGEHRHNQS
WRKY15 KMSDVPPDDYSWRKYGQKPIKGSPHPRGYYKCSSVRG..CPARKHVERAADDSSMLIVTYEGDHNHSLS
WRKY17 KIADIPPDEYSWRKYGQKPIKGSPHPRGYYKCSTFRG..CPARKHVERALDDSTMLIVTYEGEHRHHQS
WRKY21 KVADIPPDDYSWRKYGQKPIKGSPYPRGYYKCSSMRG..CPARKHVERCLEDPAMLIVTYEAEHNHPKL
WRKY39 KIADIPPDEYSWRKYGQKPIKGSPHPRGYYKCSSVRG..CPARKHVERCIDETSMLIVTYEGEHNHSRI

Group II (e)
WRKY14 SGEVVPSDLWAWRKYGQKPIKGSPFPRGYYRCSSSKG..CSARKQVERSRTDPNMLVITYTSEHNHPWP
WRKY16 DRGSRSSDLWVWRKYGQKPIKSSPYPRSYYRCASSKG..CFARKQVERSRTDPNVSVITYISEHNHPFP
WRKY22 AAEALNSDVWAWRKYGQKPIKGSPYPRGYYRCSTSKG..CLARKQVERNRSDPKMFIVTYTAEHNHPAP
WRKY27 TQENLSSDLWAWRKYGQKPIKGSPYPRNYYRCSSSKG..CLARKQVERSNLDPNIFIVTYTGEHTHPRP
WRKY29 KEENLLSDAWAWRKYGQKPIKGSPYPRSYYRCSSSKG..CLARKQVERNPQNPEKFTITYTNEHNHELP
WRKY35 SGEVVPSDLWAWRKYGQKPIKGSPYPRGYYRCSSSKG..CSARKQVERSRTDPNMLVITYTSEHNHPWP

Group III
WRKY30 GVDRTLDDGFSWRKYGQKDILGAKFPRGYYRCTYRKSQGCEATKQVQRSDENQMLLEISYRGIHSCSQA
WRKY41 GLEGPHDDIFSWRKYGQKDILGAKFPRSYYRCTFRNTQYCWATKQVQRSDGDPTIFEVTYRGTHTCSQG
WRKY46 QENGSIDDGHCWRKYGQKEIHGSKNPRAYYRCTHRFTQDCLAVKQVQKSDTDPSLFEVKYLGNHTCNNI
WRKY53 GLEGPQDDVFSWRKYGQKDILGAKFPRSYYRCTHRSTQNCWATKQVQRSDGDATVFEVTYRGTHTCSQA
WRKY54 VEAKSSEDRYAWRKYGQKEILNTTFPRSYFRCTHKPTQGCKATKQVQKQDQDSEMFQITYIGYHTCTAN
WRKY55 NTDLPPDDNHTWRKYGQKEILGSRFPRAYYRCTHQKLYNCPAKKQVQRLNDDPFTFRVTYRGSHTCYNS

Unassigned
WRKY38 SPDPIYYDGYLWRKYGQKSIKKSNHQRSYYRCSYNKDHNCEARKHEQKIKDNPPVYRTTYFGHHTCKTE
WRKY52 IPAIDEGDLWTWRKYGQKDILGSRFPRGYYRCAYKFTHGCKATKQVQRSETDSNMLAITYLSEHNHPRP

Fig. 2. Schematic representation of published full length WRKY proteins from parsley (Pc), sweet potato (Ib), Arabidopsis (At), tobacco (Nt), cucumber (Cs) and wild oat (Af). They are divided into three groups based on the number and type of the WRKY domains they contain. WRKY domains are black, putative basic nuclear localization signals are blue and leucine zippers are pink. Serine–threonine-rich regions are yellow, glutamine-rich regions are purple, proline-rich regions are green and acidic regions are red. Proteins shown: group I, PcWRKY1, IbSPF1, AtZAP1, NtWRKY1, NtWRKY2 and CsSE71; group II, PcWRKY3, AfABF2, PcWRKY4 and NtWIZZ; group III, PcWRKY5, NtWRKY4 and NtWRKY5.

The conservation of the WRKY domain is mirrored by a remarkable conservation of the cognate cis-acting W box elements. These (T)(T)TGAC(C/T) sequence elements contain the invariant TGAC core, which is essential for function and WRKY binding. They mediate transcriptional responses to pathogen-derived elicitors [9,15] and are present in the promoters of many plant genes that are associated with defense [16]. Functional W boxes frequently cluster within short promoter stretches [15–17] and can act together synergistically [12].
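To make the W box consensus concrete, the following sketch scans a DNA string for the invariant TGAC core followed by C or T, on both strands. It is a minimal illustration under stated assumptions: the function name `find_w_boxes` and the example promoter are hypothetical, the two optional leading T residues of (T)(T)TGAC(C/T) are not required for a hit, and a match in the sequence says nothing about whether the element is functional in vivo.

```python
import re

# Core of the W box consensus (T)(T)TGAC(C/T) on the given strand, and its
# reverse complement for hits on the opposite strand. Lookaheads are used so
# that overlapping occurrences would also be reported.
W_BOX_FWD = re.compile(r"(?=(TGAC[CT]))")
W_BOX_REV = re.compile(r"(?=([AG]GTCA))")


def find_w_boxes(promoter):
    """Return sorted (position, strand) pairs for W box cores in a sequence."""
    seq = promoter.upper()
    hits = [(m.start(1), "+") for m in W_BOX_FWD.finditer(seq)]
    hits += [(m.start(1), "-") for m in W_BOX_REV.finditer(seq)]
    return sorted(hits)


# Hypothetical promoter fragment with two forward hits and one reverse hit.
example = "AATTGACCAAAGGTCAAGTTGACTTT"
print(find_w_boxes(example))
```

Reporting positions rather than a simple count makes it easy to see whether hits cluster within a short promoter stretch, which is the arrangement the text associates with functional W boxes.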
WRKY–W box interactions have been demonstrated by numerous binding experiments, both in vitro and in vivo [8–10,12,18,19] (R.S. Cormack et al., unpublished), and random binding-site selection assays have shown that the optimal binding site for ZAP1 contains the W box motif [10]. Interactions of WRKY proteins with W boxes can be regulated post-translationally, because binding of WRKY-like DNA-binding activities to W boxes in tobacco is abolished by treatment with alkaline phosphatase [18] and the protein-kinase inhibitor staurosporine [20].

In spite of the stereotypic binding preferences of WRKY proteins for W boxes, their affinities for certain types or arrangements of this element can vary (R.S. Cormack et al., unpublished). Sequences flanking the invariant W box TGAC core might be partly responsible for the observed specificity. In addition, the cooperative assembly of discrete higher-order WRKY–DNA complexes at defined W box arrangements might also account for specific promoter recognition [12]. Owing to the high variability in overall protein structure, access to certain promoters would be restricted to distinct family members that fit into the three-dimensional shapes of such complexes. Indeed, WRKY proteins might be part of multimeric protein–DNA complexes. Both WRKY protein-containing nuclear extracts and purified recombinant WRKYs from tobacco lose their DNA-binding activity when treated with the protein-dissociating agent deoxycholate [18]. Furthermore, some WRKY proteins contain potential leucine zippers (LZs), structures known to allow protein dimerization. They appear to be functional in PcWRKY4 and 5, because their deletion greatly reduces reporter-gene expression mediated by these proteins in yeast (R.S. Cormack et al., unpublished).

Transcriptional regulation

ZAP1 and PcWRKY1, 4 and 5 can activate transcription in yeast [10,12] (R.S. Cormack et al., unpublished), a feature that has been confirmed for ZAP1 and PcWRKY1 in plant cells [10,12]. Although it has yet to be studied in detail, the primary structures of WRKY proteins have an abundance of potential transcriptional activation or repression domains (Fig. 2). A common feature of many domains affecting transcription is the predominance of certain amino acids, including alanine, glutamine, proline, serine, threonine and charged amino acids [21,22]. At least two of the seven potential ‘transregulatory’ domains in PcWRKY1 activate transcription in yeast [12]. The possibility that WRKY proteins possess both activator and repressor functions, as shown for the maize VP1 (Ref. 23), remains to be tested.

Complexity of the WRKY family in Arabidopsis

The large amount of genomic and cDNA sequences available from Arabidopsis yields insights into the complexity of the WRKY family in a single plant species. In total, 61 distinct ORFs potentially encoding WRKY proteins can be found in the databases to date (Table 1). With the exception of AtWRKY1, which is identical to ZAP1 (Ref. 10), and AtWRKY44, which is defined by the ttg2 mutant (C.S. Johnson and D.R. Smyth, pers. commun.), none of these proteins has been described before. We encourage the use of the designations given in Table 1 in future studies to avoid the confusion often caused when multiple names are assigned to a given gene member within large families.

Table 1. Identified members of the Arabidopsis WRKY superfamily of transcription factors. Columns: AtWRKY number, group, chromosome (Chr.), EST, gene and BAC.ORF; EST and gene entries are GenBank accession numbers. Abbreviations: Chr., chromosome; ORF, open reading frame; question marks denote either inconclusive group assignment or inconclusive chromosomal position.

The AtWRKY genes are randomly distributed over the five chromosomes and preliminary analyses suggest that they might all be present as single copies (I. Somssich and S. Robatzek, unpublished). Many of these putative WRKY proteins are represented by ESTs, showing that the corresponding genes are expressed. By the number and sequence of their WRKY domains, these proteins can be assigned to the three major groups. Given that about two-thirds of the Arabidopsis genome has been sequenced to date, the total number of WRKY genes in this species might be as high as 100.

A phylogenetic tree of the AtWRKY proteins based on their WRKY domains (Fig. 3) clearly indicates that group II splits up into five distinct subgroups (IIa–e). The resulting refined classification is further substantiated by the presence of ten additional structural motifs that are conserved among subsets of AtWRKY family members. Each of these motifs occurs only in certain subgroups and each subgroup seems to be best defined by combinations of such motifs. In some cases, the sequences of these motifs can reveal clues about their potential functions. In addition to peptide sequences that might serve as nuclear localization signals [24], a heptad repeat of bulky hydrophobic residues characteristic of LZs (Ref. 25) is present in some of the proteins. The heptad repeat occurs exclusively in members of subgroups IIa and IIb. Recent experiments have shown that the LZ region of AtWRKY6 mediates dimerization (S. Robatzek and I.E. Somssich, unpublished).
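The heptad repeat mentioned above, a bulky hydrophobic residue at every seventh position, can be screened for with a simple positional check. The sketch below is only a rough illustration and not a substitute for dedicated coiled-coil prediction programs: the function name `find_heptad_repeats`, the residue set L/I/V/M and the default of four consecutive heptads are all assumptions chosen for the example, and the test peptide is the LZ consensus shown in Fig. 3 with its unspecified x positions arbitrarily filled with alanine.

```python
def find_heptad_repeats(seq, min_repeats=4, hydrophobic="LIVM"):
    """Return start positions where every 7th residue is bulky hydrophobic.

    A position i is reported when seq[i], seq[i+7], ..., up to
    seq[i + 7*(min_repeats-1)] all belong to the `hydrophobic` set.
    """
    seq = seq.upper()
    span = 7 * (min_repeats - 1)          # distance from first to last residue
    starts = []
    for i in range(len(seq) - span):
        if all(seq[i + 7 * k] in hydrophobic for k in range(min_repeats)):
            starts.append(i)
    return starts


# LZ consensus from Fig. 3 (LREELxRVNxENKKLKEMLx2Vx6L) with every x replaced
# by alanine purely for illustration.
lz_like = "LREELARVNAENKKLKEMLAAVAAAAAAL"
print(find_heptad_repeats(lz_like))
```

With the default settings the check reports two overlapping windows in this peptide, reflecting a run of five hydrophobic residues spaced seven apart; raising `min_repeats` to 5 leaves only the window starting at the first leucine.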
An additional common feature that is found in the WRKY genes is the existence of an intron within the region encoding the C-terminal WRKY domain of group I members or the single WRKY domain of group II and III members. This intron position is highly conserved, being localized after the codon encoding arginine that is N terminal to the zinc-finger-like motif (Fig. 1). Strikingly, in all the genes encoding subgroup IIa and IIb members, the position of this intron is exactly 16 codons further towards the C terminus.

In spite of the phylogenetic distance of their WRKY domains, members of all three groups have been shown to recognize W box elements, indicating that this is a general feature of the entire superfamily. A few AtWRKY proteins do not fit neatly into any one (sub)group. For example, AtWRKY10, which carries only one WRKY domain, appears to be more related to group I (Fig. 3). This might be explained by the secondary loss of the N-terminal WRKY domain. Furthermore, based on the pattern of cysteine and histidine residues within their WRKY domains (Fig. 1), AtWRKY38 and AtWRKY52 could either belong to group III or represent members of a novel group (Fig. 3).

Biological roles of WRKY factors

One of the most challenging questions concerns the regulatory processes governed by WRKY proteins. Clues might come partly from gene expression studies. Because many WRKY genes are themselves transcriptionally regulated, their distinct expression patterns might yield hints as to the regulatory functions of the encoded factors under particular biological conditions. In addition, a full understanding of the biological roles of these factors will require the identification of the target genes whose expression they affect.

Expression behavior of WRKY genes

Current data point to many WRKY proteins having a regulatory function in the response to pathogen infection and other stresses.
Effective plant defense against pathogenic microorganisms is associated with the concerted activation of a large variety of genes, occurring in several temporally distinct waves [26]. Increases in WRKY mRNA, protein and DNA-binding activity have been reported in response to infection with viruses [19], bacteria (A. Dellagi and P. Birch, pers. commun.) or oomycetes [12], to fungal elicitors [9,20] (R.S. Cormack et al., unpublished), and to signaling substances such as salicylic acid [18]. In addition, WRKY gene expression has been shown to be upregulated in response to wounding [11] (S. Robatzek and I.E. Somssich, unpublished) and upon local mechanical stimulation of plant protoplasts [27]. Induced WRKY mRNA accumulation is often extremely rapid and transient, and seems not to require de novo synthesis of regulatory factors [9,11] (R.S. Cormack et al., unpublished). This immediate–early expression behaviour indicates a role for the WRKY proteins in regulating subsequently activated secondary-response genes, whose products carry out the protective and defensive reactions.

Comparative expression studies with several AtWRKY genes also suggest that certain family members have a role in the regulation of senescence (S. Robatzek and I.E. Somssich, unpublished). Transcript levels of AtWRKY4, 6, 7 and 11 are enhanced in senescent leaves. In transgenic Arabidopsis plants, an AtWRKY6 promoter–GUS reporter gene is strongly activated in senescent leaves as well as in response to infection by pathogenic bacteria. As several genes are known to be highly expressed during both leaf senescence and defense, we might expect the existence of common regulatory mechanisms between these two physiological processes [28].

Inspection of plant databases has revealed the existence of more than 500 WRKY ESTs identified from various tissue sources, including roots, leaves, inflorescences, abscission zones, seeds and vascular tissue, as well as from drought- or salt-stressed, or pathogen-infected tissue.
Thus, WRKY genes appear to be expressed in numerous cell types and under different physiological conditions and could therefore participate in the control of a wide variety of biological processes.

Targets of WRKY regulation

As suggested by the general binding preference of WRKY proteins for W boxes, genes containing these promoter elements are likely targets of WRKY factors, and these include the WRKY genes themselves [12] as well as a large variety of defense-related genes of the PR type [16,18]. Additionally, gibberellic acid-induced expression of the wild-oat α-Amy2/54 gene [8] and activation of the barley HvLox1 gene in response to the defense and wound signaling molecule jasmonic acid [29] also appear to involve WRKY–W box interactions. Furthermore, a role has been suggested for SPF1 in the sucrose- or polygalacturonic-acid-induced expression of genes coding for sporamin and β-amylase in sweet potato [7]. However, as mentioned earlier, uncertainties about the exact binding site of SPF1 mean that more work is required to establish its role in vivo.

Fig. 3. Phylogenetic analysis of 58 members of the AtWRKY family. Amino acid sequences from the single WRKY domain of group II and III members or the C-terminal WRKY domain of group I members were aligned using PileUp (Wisconsin Package Version 10.0, Genetics Computer Group, Madison, WI, USA). The diagram shows the most parsimonious tree constructed using PAUP 3.1.1 (Smithsonian Institution, Washington, DC, USA) to perform a heuristic search with a pre-aligned reduced data set including only representatives of each AtWRKY (sub)group (indicated by asterisks). Based on the results of additional PAUP 3.1.1 runs with extended data sets, further members of each (sub)group were added to the figure. The tree shown is unrooted and has a consistency index of 0.808. The numbers above the branches are bootstrap values from 100 replicates. AtWRKYs that consistently clustered together are grouped in blue boxes. Members of subgroups IIa and IIe are not lined up at the branches of separate subtrees but nevertheless share significant similarities in their WRKY domains. Higher-order branches on the right-hand side of the tree representing relationships within each (sub)group were not highly reproducible and were therefore eliminated. White extensions of branches within the blue boxes indicate that the branch leads only to one distinct AtWRKY. Family members that cannot be unequivocally assigned to a defined (sub)group are highlighted by gray boxes. Conserved primary structural features of the AtWRKY family outside the WRKY domains were identified using MEME (Ref. 38; http://www.sdsc.edu/MEME) and are shown below the tree. Schematic representations of typical members of each (sub)group are shown on the right: WRKY domains are indicated by black boxes; motifs 1, 2 and 3 are basic stretches that might be nuclear localization sequences; additional basic motifs not detected by MEME are shown by blue boxes without numbers; LZ indicates potential leucine zipper structures that were also predicted by the programs COILSCAN and COIL (Wisconsin Package Version 10.0) [39].

Conserved motifs shown below the tree:
A [K/R]EPRVAV[Q/K]T[K/V]SEVD[I/V]L
B NALAGSTR
C VSSFK[K/R]VISLL
D LSPSNLLESPxL
E EGDLxAVVG
1 KKRKx[K/R]xK[R/K]TV[R/I][V/K]PA
2 EEPExKRRKxE
3 KAKKxxQK
LZ LREELxRVNxENKKLKEMLx2Vx6L
HARF RTGHARFRR[A/G]P

To date, W boxes have been described as positive cis-acting elements upregulating transcription. However, in the case of the Arabidopsis PR1 gene, the basal and salicylic acid-induced expression levels might be negatively regulated by W boxes [17]. SNI1, a negative regulator of PR gene expression, was recently identified in a genetic screen for second-site suppressors of the Arabidopsis mutation npr1 (Refs 30,31). Interestingly, SNI1, which is nuclear localized, contains no obvious DNA-binding domain. One possible mode of SNI1 action would involve interaction with WRKY factors bound to the W box [31].

The involvement of WRKY factors in regulating part of the defense program is further substantiated by a large-scale expression profiling study (J. Dangl and R.A. Dietrich, pers. commun.). Using a DNA microarray with 10 000 Arabidopsis ESTs, a group of 25 genes, including PR1, was identified whose expression responded coordinately to various pathogens as well as to other defense-inducing conditions. Within the first kilobase of their promoters, these genes shared only the W box motifs (TTGAC), with on average four copies, which were often clustered. By contrast, the promoters of a control set of genes not coordinately regulated with PR1 contained, on average, fewer than two W boxes.
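The kind of comparison described above, average W box copy number in one promoter set versus a control set, can be sketched as follows. This is an illustrative sketch only and not the analysis used in the cited study: the function names are hypothetical, each input string is assumed to already be the 1-kb promoter region of interest, the motif is counted as TTGAC on either strand, and the toy sequences are invented.

```python
def count_ttgac(region):
    """Count TTGAC motifs on both strands of a promoter region."""
    seq = region.upper()
    # GTCAA is the reverse complement of TTGAC, i.e. a hit on the other strand.
    return seq.count("TTGAC") + seq.count("GTCAA")


def mean_w_boxes(promoters):
    """Average number of TTGAC motifs per promoter in a set."""
    if not promoters:
        raise ValueError("need at least one promoter sequence")
    return sum(count_ttgac(p) for p in promoters) / len(promoters)


# Invented toy sequences standing in for two promoter sets.
coregulated = ["TTGACAAGTCAATTGAC", "CCTTGACGG"]
control = ["ACGTACGTAC", "GGTTGACGGA"]
print(mean_w_boxes(coregulated), mean_w_boxes(control))
```

A difference in means on its own does not establish enrichment; with real promoter sets one would also want to account for sequence composition and test the difference statistically.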
The only WRKY mutant so far described is transparent testa glabra 2 (ttg2), which is based on a transposon insertion within AtWRKY44/TTG2 (C.S. Johnson and D.R. Smyth, pers. commun.). In ttg2, the number of trichomes and their branching are reduced, as is anthocyanin pigmentation of the seed coat, together with a loss of mucilage. This pleiotropic phenotype resembles that of ttg1, which is defective for a regulatory protein of the WD40-repeat type [32]. AtWRKY44/TTG2 and TTG1 might therefore act in the same regulatory cascade, controlling a common set of genes.

The extensive use of reverse genetics to obtain additional tagged WRKY mutants, as well as the generation of WRKY promoter–reporter gene and WRKY overexpressor lines, will allow us to gain a more comprehensive understanding of the various biological roles of WRKY proteins. Furthermore, inducible expression systems [33] could be used for controlled temporal overexpression of WRKY transgenes in their own loss-of-function background; combined with methods of large-scale gene expression profiling (e.g. differential display, DNA chips), this should facilitate the identification of defined WRKY target genes. In a similar way, the Arabidopsis NAP gene was identified as a target of the APETALA3–PISTILLATA transcription factor dimer [34].

Conclusions

WRKY proteins have only recently been identified as a new family of transcription factors. In Arabidopsis, this family appears to be nearly as complex as the well-known MYB family [35], but it is restricted to the plant kingdom. This suggests that WRKY genes originated concurrently with the major plant phyla. Current information suggests that WRKY factors play a key role in regulating the pathogen-induced defense program. The exposure of plants to a wide variety of biotic or abiotic stresses connected with their sessile, autotrophic lifestyle could be one major factor in the enormous expansion of the WRKY family during evolution.
In addition, the extensive metabolic changes associated with the establishment of defense responses26 or senescence36 might require an elaborate regulatory system. WRKY proteins also seem to be involved in other plant-specific processes, such as trichome development and the biosynthesis of secondary metabolites. Thus, they appear to participate in controlling the expression of a plethora of genes. As with other large gene families, the problem of functional redundancy will complicate genetic attempts to determine the role of individual WRKY proteins. Comparative studies in lower plants (e.g. ferns, mosses and algae) can give clues to whether WRKY gene diversification correlates with increasing developmental and metabolic pathway complexity. Furthermore, generating Arabidopsis knock-out lines that affect several members of individual subgroups might help to ‘wrky’ matters out.
Acknowledgements
We thank Hiroshi Sano (NAIST, Japan); David R. Smyth (Monash University, Australia); Zhixiang Chen (University of Idaho, USA); Jeff Dangl (University of North Carolina, USA); Robert Dietrich (Novartis, Research Triangle, USA); Alia Dellagi and Paul Birch (Scottish Crop Research Institute, UK), for providing preprints of unpublished data; and Klaus Hahlbrock for critical reading of the manuscript and continuous support.
References
1 Bevan, M. et al. (1998) Analysis of 1.9 Mb of contiguous sequence from chromosome 4 of Arabidopsis thaliana. Nature 391, 485–488
2 Mewes, H.W. et al. (1997) Overview of the yeast genome. Nature 387, 7–8
3 Clarke, N.D. and Berg, J.M. (1998) Zinc fingers in Caenorhabditis elegans: finding families and probing pathways. Science 282, 2018–2022
4 Laudet, V. et al. (1992) Evolution of the nuclear receptor gene superfamily. EMBO J. 11, 1003–1013
5 Gellon, G. and McGinnis, W. (1998) Shaping animal body plans in development and evolution by modulation of Hox expression patterns. BioEssays 20, 116–125
6 Riechmann, J.L. and Meyerowitz, E.M. (1997) MADS domain proteins in plant development. Biol. Chem. 378, 1079–1101
7 Ishiguro, S. and Nakamura, K. (1994) Characterization of a cDNA encoding a novel DNA-binding protein, SPF1, that recognizes SP8 sequences in the 5′ upstream regions of genes coding for sporamin and β-amylase from sweet potato. Mol. Gen. Genet. 244, 563–571
8 Rushton, P.J. et al. (1995) Members of a new family of DNA-binding proteins bind to a conserved cis-element in the promoters of α-Amy2 genes. Plant Mol. Biol. 29, 691–702
9 Rushton, P.J. et al. (1996) Interaction of elicitor-induced DNA binding proteins with elicitor response elements in the promoters of parsley PR1 genes. EMBO J. 15, 5690–5700
10 de Pater, S. et al. (1996) Characterization of a zinc-dependent transcriptional activator from Arabidopsis. Nucleic Acids Res. 24, 4624–4631
11 Hara, K. et al. (2000) Rapid systemic accumulation of transcripts encoding a tobacco WRKY transcription factor upon wounding. Mol. Gen. Genet. 263, 30–37
12 Eulgem, T. et al. (1999) Early nuclear events in plant defense: rapid gene activation by WRKY transcription factors. EMBO J. 18, 4689–4699
13 Berg, J.M. and Shi, Y. (1996) The galvanization of biology: a growing appreciation for the roles of zinc. Science 271, 1081–1085
14 Mackay, J.P. and Crossley, M. (1998) Zinc fingers are sticking together. Trends Biochem. Sci. 23, 1–4
15 Fukuda, Y. and Shinshi, H. (1994) Characterization of a novel cis-acting element that is responsive to a fungal elicitor in the promoter of a tobacco class I chitinase gene. Plant Mol. Biol. 24, 485–493
16 Rushton, P.J. and Somssich, I.E. (1998) Transcriptional control of plant genes responsive to pathogens. Curr. Opin. Plant Biol. 1, 311–315
17 Lebel, E. et al. (1998) Functional analysis of the regulatory sequences controlling PR-1 gene expression in Arabidopsis. Plant J. 16, 223–233
18 Yang, P. et al. (1999) A pathogen- and salicylic acid-induced WRKY DNA-binding activity recognizes the elicitor response element of tobacco class I chitinase gene promoter. Plant J. 18, 141–149
19 Wang, Z. et al. (1998) An oligo selection procedure for identification of sequence-specific DNA-binding activities associated with plant defense. Plant J. 16, 515–522
20 Fukuda, Y. (1997) Interaction of tobacco nuclear proteins with an elicitor-responsive element in the promoter of a basic class I chitinase gene. Plant Mol. Biol. 34, 81–87
21 Triezenberg, S.J. (1995) Structure and function of transcriptional activation domains. Curr. Opin. Genet. Dev. 5, 190–196
22 Hanna-Rose, W. and Hansen, U. (1996) Active repression mechanisms of eukaryotic transcription repressors. Trends Genet. 12, 229–234
23 Hoecker, U. et al. (1995) Integrated control of seed maturation and germination programs by activator and repressor functions of Viviparous-1 of maize. Genes Dev. 9, 2459–2469
24 Garcia-Bustos, J. et al. (1991) Nuclear protein localization. Biochim. Biophys. Acta 1071, 83–101
25 Landschulz, W.H. et al. (1988) The leucine zipper: a hypothetical structure common to a new class of DNA-binding proteins. Science 240, 1759–1764
26 Somssich, I.E. and Hahlbrock, K. (1998) Pathogen defense in plants – a paradigm of biological complexity. Trends Plant Sci. 3, 86–90
27 Gus-Mayer, S. et al. (1998) Local mechanical stimulation induces components of the pathogen defense response in parsley. Proc. Natl. Acad. Sci. U. S. A. 95, 8398–8403
28 Quirino, B.F. et al. (1999) Diverse range of gene activity during Arabidopsis thaliana leaf senescence includes pathogen-independent induction of defense-related genes. Plant Mol. Biol. 40, 267–278
29 Rouster, J. et al. (1997) Identification of a methyl-jasmonate-responsive region in the promoter of a lipoxygenase-1 gene expressed in barley grain. Plant J. 11, 513–523
30 Cao, H. et al. (1997) The Arabidopsis NPR1 gene that controls systemic acquired resistance encodes a novel protein containing ankyrin repeats. Cell 88, 57–63
31 Li, X. et al. (1999) Identification and cloning of a negative regulator of systemic acquired resistance, SNI1, through a screen for suppressors of npr1-1. Cell 98, 329–339
32 Walker, A.R. et al. (1999) The TRANSPARENT TESTA GLABRA1 locus, which regulates trichome differentiation and anthocyanin biosynthesis in Arabidopsis, encodes a WD40 repeat protein. Plant Cell 11, 1337–1349
33 Gatz, C. and Lenk, I. (1998) Promoters that respond to chemical inducers. Trends Plant Sci. 3, 352–358
34 Sablowski, R.W.M. and Meyerowitz, E.M. (1998) A homolog of NO APICAL MERISTEM is an immediate target of the floral homeotic genes APETALA3/PISTILLATA. Cell 92, 93–103
35 Martin, C. and Paz-Ares, J. (1997) MYB transcription factors in plants. Trends Genet. 13, 67–73
36 Gan, S. and Amasino, R.M. (1997) Making sense of senescence. Plant Physiol. 113, 313–319
37 Altschul, S.F. et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402
38 Bailey, T.L. and Elkan, C. (1994) Fitting a mixture model by expectation maximization to discover motifs in biopolymers. In Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology (Altmann, R., ed.), pp. 28–36, AAAI Press
39 Lupas, A. (1996) Coiled coils: new structures and new functions. Trends Biochem. Sci. 21, 375–382
Thomas Eulgem, Paul Rushton, Silke Robatzek and Imre Somssich* are at the Max-Planck-Institut für Züchtungsforschung, Abteilung Biochemie, Carl-von-Linné-Weg 10, D-50829 Köln, Germany; Thomas Eulgem is currently at the Dept of Biology, 108 Coker Hall CB#3280, University of North Carolina, Chapel Hill, NC 27599-3280, USA. *Author for correspondence (tel +49 221 5062310; fax +49 221 5062313; e-mail [email protected]).
Plant one-carbon metabolism and its engineering
Andrew D. Hanson, Douglas A. Gage and Yair Shachar-Hill
1360-1385/00/$ – see front matter © 2000 Elsevier Science Ltd. All rights reserved. PII: S1360-1385(00)01599-5
The metabolism of one-carbon (C1) units is vital to plants. It involves unique enzymes and takes place in four subcellular compartments. Plant C1 biochemistry has remained relatively unexplored, partly because of the low abundance or the lability of many of its enzymes and intermediates. Fortunately, DNA sequence databases now make it easier to characterize known C1 enzymes and to discover new ones, to identify pathways that might carry high C1 fluxes, and to use engineering to redirect C1 fluxes and to understand their control better.
One-carbon (C1) metabolism is essential to all organisms. In plants, it supplies the C1 units needed to synthesize proteins, nucleic acids, pantothenate and many methylated molecules1. Fluxes through C1 pathways are particularly high in plants that are rich in methylated compounds such as lignin, alkaloids and betaines because methyl moieties make up several percent of their dry weight2. Transfers of C1 units are also central to the massive photorespiratory fluxes that occur in all C3 plants3. In spite of the fundamental significance of these roles, and the interest in the metabolic engineering of lignin2, betaines4 and photorespiration3, there is much that is not understood about the enzymes, pathways and regulatory mechanisms of plant C1 metabolism. In part this is because of the obstacles that C1 metabolism presents for classical biochemistry and genetics: its enzymes can be of low abundance and/or exist as several isoforms, mutants are lacking, and its key intermediates – C1-substituted folates – are labile and hard to quantify. Fortunately, classical approaches to C1 metabolism can now be complemented by genomics-driven approaches that exploit the fast-growing DNA sequence databases. Accordingly, this review has three aims:
• To illustrate how genomics-based approaches are advancing our knowledge of plant C1 biochemistry.
• To bring together biochemical and genomics-derived data to show which C1 pathways might operate in plants, and where they operate in the cell.
• To examine progress towards engineering C1 metabolism.
Nucleotide sequence information – from genomes, cDNAs and ESTs – can be used to complement biochemical approaches in several ways. Because most enzymes of C1 metabolism are highly