Structure and Dynamics of Green Fluorescent Protein
George N Phillips Jr

Many marine organisms are luminescent. The proteins that produce the light include a primary light producer (aequorin or luciferase) and often a secondary photoprotein that red shifts the light for better penetration in the ocean. Green fluorescent protein is one such secondary protein. It is remarkable in that it autocatalyzes the formation of its own fluorophore and thus can be expressed in a variety of organisms in its fluorescent form. The recent determination of its 3D structure and other physical characterizations are revealing its molecular mechanism of action.

Addresses
Department of Biochemistry and Cell Biology, Rice University, Houston, TX 77005-1899, USA; e-mail: [email protected]

Current Opinion in Structural Biology 1997, 7:821-827
© Current Biology Ltd ISSN 0959-440X

Abbreviation
GFP green fluorescent protein

Introduction
Bioluminescence was written about as early as the first century AD by Pliny the Elder, who noticed the bright light emanating from some jellyfish [1]. There appear to be many independent evolutionary origins of luminous species [2], some of which have primary light-generating reactions followed by color-shifting secondary proteins. Green fluorescent protein (GFP) is a spontaneously fluorescent protein isolated from marine organisms, such as certain jellyfish and sea pansies. It converts the blue chemiluminescence of the primary proteins, aequorin or luciferase, into green fluorescent light [3,4], presumably to reduce scattering and hence improve penetration of the light over longer distances.

The molecular cloning of GFP cDNA from the Pacific jellyfish Aequorea victoria [5] and the demonstration by Chalfie et al. [6] that this GFP can be functionally expressed in bacteria have opened exciting new avenues of investigation in cell, developmental and molecular biology, as pointed out in prior reviews [7,8••]. A very dramatic recent example is the production of transgenic green mice [9]. Knowledge of the 3D structures of GFP is helping to understand the basic photochemistry of this protein, and the desire for specially engineered fluorescent proteins has led at least half a dozen laboratories to carry out structural studies on GFP and its mutants. This review will highlight the relationship between the structure, function and dynamics of GFP.

Spectral and physical properties of GFP
The wild-type fluorescence absorbance/excitation peak is at 395 nm, with a minor peak at 475 nm (extinction coefficients of ~30,000 and ~7,000 M⁻¹ cm⁻¹, respectively) [10]. The normal emission peak is at 508 nm. Interestingly, continued excitation leads to a decrease over time of the 395 nm excitation peak and a reciprocal increase in the 475 nm excitation peak [11]. This interconversion effect is especially evident with irradiation of GFP with UV light. Recently, femtosecond time-resolved spectroscopic studies have revealed that the two states corresponding to the two major absorption bands can interconvert quickly in the excited state, and the presence of an isotope effect implicates proton movement in the interconversion [...] fluorescence spectrum and lifetime to that in aqueous solution [14], and fluorescence is not an inherent property of the isolated fluorophore, the elucidation of its 3D structure helps provide an explanation for the generation of fluorescence in the mature protein, as well as the mechanism of autocatalytic fluorophore formation. Furthermore, the development of fluorescent proteins with varied emission and excitation or other characteristics would dramatically expand their biological applications as intracellular reporters.

The β-can structure of GFP
The structure of the wild-type protein was originally solved by Yang et al. [15••] and that of the Ser65→Thr mutant was solved by Ormö et al. [16••]. The monomer structure consists of a quite regular β barrel with 11 strands on the outside of a cylinder (Figure 1). The cylinder has a diameter of ~30 Å and a length of ~40 Å. Inside the cylinder sits the fluorescent center of the molecule (a modified tyrosine sidechain and cyclized protein backbone) as a part of an irregular α-helical segment. Small sections of α-helix also form caps on the ends of the cylinders. This motif, with a single α-helix inside a very uniform cylinder of β-sheet structure, represents a new class of protein fold, which we have named the 'β-can'.

The regularity of the β-can of GFP is quite remarkable. The eleven strands of the sheet form an almost seamless symmetrical structure, the only irregularities being between two of the strands. In fact, the structure is so regular that water molecules on the outside of the can also form 'stripes' around the surface of the cylinder. The tightly constructed β barrel would appear to serve the role of protecting the fluorophore well, providing overall
Figure 1
The cylindrical β-can structure of GFP [15••]. (a) End-on view. (b) Side view. Eleven strands of β-sheet form an antiparallel β barrel with short α helices forming lids on each end. The fluorophore is inside the can, as a part of a distorted α helix which runs along the axis of the cylinder. (c) GFP usually forms dimers in the crystal, aligned largely along the sides of the cylinders. Figure generated using Ribbons, PDB entry code 1GFL.
stability and resistance to unfolding caused by heat and denaturants [17].

The known proteins that most closely resemble the β-can fold of GFP are porin, which has not 11 but 16 antiparallel strands and has no 'lids' at the ends of the barrel [18], and streptavidin, which is a smaller, eight-stranded antiparallel β barrel [19]. Unlike streptavidin, however, both GFP and porin have water molecules inside the barrel, as well as small segments of polypeptide chain inside. In the case of porin, the function of which is to allow passage of small molecules through its center, the design needs to be open, whereas the function of GFP is better served by a closed structure. Because of the smaller number of strands in streptavidin, and hence the smaller inside diameter, its center consists simply of sidechains originating from the staves of the barrel.

Figure 2
[Chemical diagram: neutral form and anionic form of the fluorophore; labeled residues include Gln94, Arg96 and His148.]
Schematic diagram of the resonant forms of the fluorophore with nearby basic amino acids, His148, Gln94, and Arg96, and the acid Glu222. The bases appear to stabilize anionic oxygen atoms at the opposite ends of the fluorophore, and the acid forms a hydrogen bond with the hydroxyl of Ser65. Interactions between fluorophore and amino acids are shown with dotted lines.

The fluorophore and its environment
Analysis of a hexapeptide derived by proteolysis of purified GFP has led to the prediction that the fluorophore originates from an internal Ser-Tyr-Gly sequence which is post-translationally modified to a 4-(p-hydroxybenzylidene)-imidazolidin-5-one structure [20]. Studies of recombinant GFP expression in E. coli have led to a proposed sequential mechanism initiated by a rapid cyclization between Ser65 and Gly67 to form an imidazolin-5-one intermediate, followed by a much slower (hours) rate-limiting oxidation of the Tyr66 sidechain by O2 [21,22•]. Combinatorial mutagenesis suggests that Gly67 is required for formation of the fluorophore [23]. While no known cofactors or enzymatic components are required for this apparently autocatalytic process, it is rather thermosensitive, with the ratio of fluorescently active to total GFP protein decreasing at temperatures >30°C [24]. Once produced, however, GFP is quite thermostable.
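The proposed sequential mechanism (rapid cyclization followed by a much slower, rate-limiting oxidation) can be illustrated with a minimal kinetic sketch. It assumes both steps are irreversible and first order, which the text does not state, and the two rate constants are hypothetical placeholders chosen only to make the first step fast and the second take hours; they are not measured values.

```python
import math


def mature_fraction(t_hours, k_cyclize=60.0, k_oxidize=0.5):
    """Fraction of protein carrying a mature fluorophore after t_hours.

    Assumed scheme: precursor -> cyclized intermediate -> oxidized (mature),
    both steps irreversible and first order. The default rate constants
    (per hour) are hypothetical placeholders, not experimental values.
    """
    if math.isclose(k_cyclize, k_oxidize):
        # Equal rate constants need their own closed-form solution.
        k = k_cyclize
        return 1.0 - math.exp(-k * t_hours) * (1.0 + k * t_hours)
    precursor = math.exp(-k_cyclize * t_hours)
    cyclized = (k_cyclize / (k_oxidize - k_cyclize)) * (
        math.exp(-k_cyclize * t_hours) - math.exp(-k_oxidize * t_hours)
    )
    return 1.0 - precursor - cyclized


if __name__ == "__main__":
    for t in (0.5, 1, 2, 4, 8):
        full = mature_fraction(t)
        slow_step_only = 1.0 - math.exp(-0.5 * t)
        print(f"{t:>4} h  two-step: {full:.3f}  slow step alone: {slow_step_only:.3f}")
```

When the first step is much faster than the second, the two-step curve is nearly indistinguishable from the curve for the slow step alone, which is what "rate-limiting" means in this context.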
The critical fluorophore-forming sequence, Ser-Tyr-Gly, occurs frequently in proteins. How is it that cyclization occurs in GFP, but not in other proteins? It would appear that two factors are required for fluorophore formation: close proximity of the backbone atoms of residues 65 and 67; and acid/base chemistry to catalyze the cyclization. Close proximity is achieved by the removal of steric hindrance from the sidechain at position 67, by using a glycine residue. In fact, despite many mutations at position 67, no functionally fluorescent GFPs have been found with anything but glycine at position 67. Amongst the proteins in the Protein Data Bank with a Ser-Tyr-Gly sequence, 10 out of 206 found have the required proximity yet no fluorophore is formed [25]; therefore steric factors seem to be necessary but not sufficient for cyclization. Arginine at position 96 is close by and could act as a base, withdrawing electrons by hydrogen bonding with the carbonyl oxygen of Ser65 and activating the carbonyl carbon for nucleophilic attack by the amide nitrogen of Gly67. Aspects of this scheme have been supported by ab initio calculations, and by database searches of similar compounds and protein sequences [26].

As model compounds identical to the hydroxyphenyl imidazolidinone core of the fluorophore have been synthesized and shown not to be significantly fluorescent in solution [27], the protein and its strategically placed acids and bases at the edges of the fluorophore are implicated in providing key resonance stabilization and immobilization. The remarkable cylindrical fold of the protein seems ideally suited for this function. Together with the short α helices and loops on the ends, the barrel structure forms a single compact domain and does not have obvious clefts for easy access of diffusible ligands to the fluorophore.

The fluorophore is protected from collisions with fluorescence quenchers such as molecular oxygen (Kbm < 0.004 L/mol·s) [28] and hence from reduction of the quantum yield of green photons that are produced from the excited state. The lack of quenching of the fluorophore by molecular oxygen can be explained by one of two mechanisms. Either the β-can structure provides significant barriers to penetration by oxygen, or the fluorophore is tightly held by local interactions. The observation that the fluorophore is near the geometrical center of the can supports the notion that the can structure protects the fluorophore from diffusional penetrative quenching and/or protects the organism, by contained self-destruction, from eventual and unavoidable free-radical reactions begun by photochemical processes at the fluorophore. The crystallographic temperature factors are indeed lowest in the center of the can, implying more rigidity, but it is often the case that centers have low mobility within protein crystal structures, and therefore this cannot be taken as proof of a special design. GFP from the sea pansy (Renilla reniformis) may have even more specialized structures for maintaining the rigidity of the fluorophore, as its absorption and emission spectra are essentially mirror images [4], a hallmark of highly immobilized fluorescent molecules.

Most of the other polar residues in the pocket form an extensive hydrogen-bonding network on the side of Tyr66 that requires abstraction of protons in the oxidation process. It is tempting to speculate that these residues, particularly Arg96, help abstract the protons. As for the mutants, atoms in the sidechains of Thr203, Glu222, and Ile167 are in van der Waals contact with Tyr66; therefore mutation of these would have direct steric effects on the fluorophore and would also change its electrostatic environment if the charge were changed, as has been suggested previously [29]. It seems probable that other mutations of the residues identified to be near the fluorophore would also have an effect on the absorption and/or emission spectra.

Structures of variant GFPs
The location of certain amino acid sidechains within the vicinity of the fluorophore also begins to explain the fluorescence and the behavior of certain mutants of GFP. At least two resonant forms of the fluorophore can be drawn, one with a partial negative charge on the benzyl oxygen of Tyr66, and one with the charge on the carbonyl oxygen of the imidazolidone ring. Interestingly, basic residues appear to form hydrogen bonds with each of these oxygen atoms, His148 with Tyr66 and Gln94 and Arg96 with the imidazolidone. These basic residues presumably act to stabilize and possibly further delocalize the charge on the fluorophore (Figure 2). The most probable possibility for charge changes is the Tyr66 distal oxygen-His148 donor/acceptor pair, together with the Glu222-Ser65 interactions [30••]. Perturbation of the interaction by mutation to histidine at position 66 results in only the high-energy absorption peak and a blue-shifted emission band, whereas disruption of the Ser65 hydroxyl-Glu222 interactions results in a red-shifted absorption peak and an unchanged emission spectrum. The Tyr66→Phe mutation has been reported to have dramatically reduced fluorescence [21], presumably due to its inability to form the anionic form of the fluorophore. Analysis of other mutants appears to support the relationship between charge on the fluorophore and the absorption/excitation spectra [31].

The availability of E. coli clones expressing GFP has led to extensive mutational analysis of GFP, with more crystal structures now starting to appear that test ideas about the designed rearrangements of sidechains resulting from mutagenesis (Table 1). Screens of random and directed point mutations for changes in fluorescent behavior have uncovered a number of informative amino acid substitutions. Mutation of Ser65 to threonine, alanine, cysteine or leucine causes a loss of the 395 nm excitation peak with a major increase in blue (475 nm) excitation [23,32]. When combined with Ser65 mutants, mutations
at other sites near the fluorophore such as Val68→Leu and Ser72→Ala can further enhance the intensity of green fluorescence produced by excitation at 488 nm [23,33]. Amino acid substitutions significantly outside this region, however, also affect the protein's spectral character. For example, Ser202→Phe and Thr203→Ile both cause the loss of excitation in the 475 nm region with preservation of 395 nm excitation [6,21,29]. Ile167→Thr results in a reversed ratio of 395 to 475 nm sensitivity [11], whereas Glu222→Gly is associated with the elimination of only the 395 nm excitation [29]. The pH dependence of the excitation bands at 395 and 475 nm [17] is almost certainly due to His148, the N atom of which is 3.3 Å from the Tyr66 hydroxyl oxygen atom of the fluorophore, although NMR pKa measurements or mutagenesis studies would be needed for confirmation.

Structure-based engineering of GFP
Mutations in regions of the sequence adjacent to the fluorophore, that is, in the range of positions 65-67, have been systematically explored [23]; some have significant wavelength shifts and most suffer a loss of fluorescence intensity. For example, mutation of the central tyrosine to phenylalanine or histidine shifts the excitation bands, but an overall loss of intensity occurs. Secondary mutations to compensate for the deleterious intensity effects may also now be possible. The Ser65→Thr mutant is particularly interesting because of its reported increase in fluorescence intensity [21,32]. On the one hand, the mechanism for increased fluorescence may be reduced collisional quenching, as the additional methyl group may make for better packing in the interior of the protein. On the other hand, the effect has been suggested to be due to improved conversion of the tyrosine to dehydrotyrosine. This is unlikely, however, as essentially we see fully cyclized structure in the wild-type protein. This effect is most likely produced by increased expression and/or folding of the protein. The report of improvements in GFP by DNA shuffling [34], comprising the mutations Phe99→Ser, Met153→Thr and Val163→Ala, as numbered in the TU#58 system [6], is difficult to explain simply on the basis of the structure. Positions 153 and 163 are on the surface of the protein and may exert their effects via improved solubility and/or reduced aggregation. At first glance, the Phe99→Ser mutation would appear to destabilize the core of the protein, and at present we have no good explanation of how it would improve the system.

GFP truncation and fusion constructs
Truncation of more than seven amino acids from the C terminus or more than the N-terminal methionine leads to total loss of fluorescence [35]. These N- and C-terminal truncation studies and the fluorescent fusion products are now understandable, given the structure of the protein. As the C terminus loops back outside the cylinder, and the last seven or so amino acids are disordered, their presence shouldn't be critical, and further addition would seem to be easily tolerated. These residues do not form a stave of the barrel. The role of the N terminus is a little less clear, as the first strand in the barrel does not begin until amino acid 10 or 11. Thus, barrel formation does not require the N-terminal region. The N-terminal segment is, however, an integral part of the cap on one end of the protein, and may be essential in folding events or in protecting the fluorophore. Again, extensions at the N terminus would not disrupt the motif structure of the protein. In fact, the GFP crystal structure solved by Wu et al. [36] has a 37-residue histidine tag on the N terminus, of which only two amino acids are ordered and hence visible in the electron density map.
Table 1
Summary of different crystal forms of GFP currently under study.
Columns: Variant; Space group; Cell dimensions (Å); Resolution (Å); Form; Conditions.
*Includes the inadvertent Q80R mutation and an additional alanine at the N terminus.
†Has the following sequence differences from the cloned isoform: R80G; F100Y; T108S; L141M; E172L.
‡Has a 37-residue His-tag at the N terminus (only 0 and -1 positions seen in the electron density maps).
§Four monomers per asymmetric unit, but not the commonly seen dimer.
(a) MA Perozzo, F Yang, GN Phillips, W Ward, K Ward, unpublished data.
AS, ammonium sulfate; MPD, 2-methyl-2,4-pentane diol; PEG, polyethylene glycol.
Control of GFP dimerization
Green fluorescent protein can form homodimers both in solution and in crystals. The KD is approximately 100 µM as measured by analytical ultracentrifugation (F Yang and GN Phillips, unpublished data). This weak association is consistent with the observation that most but not all crystal forms exist as dimers, and with correlation microscopy of purified GFP, which reveals a diffusion coefficient [...]

[...] and calibrations [41] but the results will most likely have a widespread effect in cell and developmental biology.

Acknowledgements
We thank the following for financial support: the Robert A Welch Foundation; the WM Keck Center for Computational Biology; and the National Institutes of Health (AR40252 and AR32764).
