Duons
Updated information and services, including high-resolution figures, can be found in the online
version of this article at:
http://www.sciencemag.org/content/342/6164/1367.full.html
Supporting Online Material can be found at:
http://www.sciencemag.org/content/suppl/2013/12/11/342.6164.1367.DC1.html
Science (print ISSN 0036-8075; online ISSN 1095-9203) is published weekly, except the last week in December, by the
American Association for the Advancement of Science, 1200 New York Avenue NW, Washington, DC 20005. Copyright
2013 by the American Association for the Advancement of Science; all rights reserved. The title Science is a
registered trademark of AAAS.
REPORTS
References and Notes
1. R. E. Lenski, M. R. Rose, S. C. Simpson, S. C. Tadler, Am. Nat. 138, 1315–1341 (1991).
2. C. L. Burch, L. Chao, Genetics 151, 921–927 (1999).
3. D. M. Weinreich, N. F. Delaney, M. A. Depristo, D. L. Hartl, Science 312, 111–114 (2006).
4. S. Kryazhimskiy, G. Tkačik, J. B. Plotkin, Proc. Natl. Acad. Sci. U.S.A. 106, 18638–18643 (2009).
5. H.-H. Chou, H.-C. Chiu, N. F. Delaney, D. Segrè, C. J. Marx, Science 332, 1190–1192 (2011).
6. A. I. Khan, D. M. Dinh, D. Schneider, R. E. Lenski, T. F. Cooper, Science 332, 1193–1196 (2011).
7. I. G. Szendro, M. F. Schenk, J. Franke, J. Krug, J. A. G. M. de Visser, J. Stat. Mech. 2013, P01005 (2013).
8. T. J. Kawecki et al., Trends Ecol. Evol. 27, 547–560 (2012).
9. R. E. Lenski, M. Travisano, Proc. Natl. Acad. Sci. U.S.A. 91, 6808–6814 (1994).
10. J. E. Barrick et al., Nature 461, 1243–1247 (2009).
11. Materials and methods and supplementary text are available as supporting material on Science Online.
12. P. Sibani, M. Brandt, P. Alstrøm, Intl. J. Mod. Phys. 12, 361–391 (1998).
13. P. J. Gerrish, R. E. Lenski, Genetica 102-103, 127–144 (1998).
14. M. Hegreness, N. Shoresh, D. Hartl, R. Kishony, Science 311, 1615–1617 (2006).
15. S.-C. Park, J. Krug, Proc. Natl. Acad. Sci. U.S.A. 104, 18135–18140 (2007).
16. G. I. Lang et al., Nature 500, 571–574 (2013).
17. S. Wielgoss et al., Proc. Natl. Acad. Sci. U.S.A. 110, 222–227 (2013).
18. R. J. Woods et al., Science 331, 1433–1436 (2011).
19. M. M. Desai, D. S. Fisher, A. W. Murray, Curr. Biol. 17, 385–394 (2007).
20. F. Vasi, M. Travisano, R. E. Lenski, Am. Nat. 144, 432–456 (1994).
21. R. G. Eagon, J. Bacteriol. 83, 736–737 (1962).
22. S. Goyal et al., Genetics 191, 1309–1319 (2012).
23. S. Wielgoss et al., G3 (Bethesda) 1, 183–186 (2011).
24. S. F. Elena, R. E. Lenski, Evolution 51, 1058–1067 (1997).
25. C. E. Paquin, J. Adams, Nature 306, 368–370 (1983).

Acknowledgments: This work was supported by grants from the National Science Foundation (DEB-1019989) including the BEACON Center for the Study of Evolution in Action (DBI-0939454), and by funds from the Hannah Chair Endowment at Michigan State University. We thank three reviewers for comments; I. Dworkin, J. Krug, A. McAdam, C. Wilke, and L. Zaman for discussions; and N. Hajela for technical assistance. R.E.L. will make strains available to qualified recipients, subject to completion of a material transfer agreement that can be found at www.technologies.msu.edu/inventors/mta-cda/mta/mta-forms. Datasets and analysis scripts are available at the Dryad Digital Repository (doi:10.5061/dryad.0hc2m).

Supplementary Materials
www.sciencemag.org/content/342/6164/1364/suppl/DC1
Materials and Methods
Supplementary Text
Figs. S1 to S7
Tables S1 to S4
References (26–40)

17 July 2013; accepted 4 November 2013
Published online 14 November 2013;
10.1126/science.1243357
Exonic Transcription Factor Binding Directs Codon Choice and Affects Protein Evolution

Andrew B. Stergachis,1 Eric Haugen,1 Anthony Shafer,1 Wenqing Fu,1 Benjamin Vernot,1 Alex Reynolds,1 Anthony Raubitschek,2,3 Steven Ziegler,3 Emily M. LeProust,4* Joshua M. Akey,1 John A. Stamatoyannopoulos1,5†

Genomes contain both a genetic code specifying amino acids and a regulatory code specifying transcription factor (TF) recognition sequences. We used genomic deoxyribonuclease I (DNaseI) footprinting to map nucleotide-resolution TF occupancy across the human exome in 81 diverse cell types. We found that ~15% of human codons are dual-use codons ("duons") that simultaneously specify both amino acids and TF recognition sites. Duons are highly conserved and have shaped protein evolution, and TF-imposed constraint appears to be a major driver of codon usage bias. Conversely, the regulatory code has been selectively depleted of TFs that recognize stop codons. More than 17% of single-nucleotide variants within duons directly alter TF binding. Pervasive dual encoding of amino acid and regulatory information appears to be a fundamental feature of genome evolution.

The genetic code, common to all organisms, …

… mammalian genomes (7–11), which appear to be under …

… (Fig. 1, A and B; fig. S1A; and table S1). Approximately 14% of all human coding bases contact a TF in at least one cell type (average 1.1% per cell type) (Fig. 1C and fig. S1B), and 86.9% of genes contained coding TF footprints (average 33% per cell type) (fig. S1, C and D).

The exonic TF footprints we observed likely underestimate the true fraction of protein-coding bases that contact TFs because (i) TF footprint detection increases substantially with sequencing depth (13), and (ii) the sampling of 81 cell types, although extensive, is far from complete; we saw little evidence of saturation of coding TF footprint discovery (fig. S2).

To ascertain coding footprints more completely, we developed an approach for targeted exonic footprinting via solution-phase capture of DNaseI-seq libraries using RNA probes complementary to human exons (19). Targeted capture footprinting of exons from abdominal skin and mammary stromal fibroblasts yielded ~10-fold increases in DNaseI cleavage (equivalent to sequencing >4 billion reads per sample with conventional genomic footprinting; fig. S3A), quantitatively exposing many …
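The two coverage figures above (1.1% of coding bases per cell type on average versus ~14% in at least one cell type) differ because one is a mean across cell types and the other is a union across them. The Python sketch below illustrates that arithmetic on toy data and also counts a codon as a duon when at least one of its three bases lies inside a footprint in any cell type. That duon rule, the half-open interval convention, and every coordinate and cell-type label are assumptions for illustration, not the authors' pipeline; at genome scale one would use interval arithmetic rather than expanding positions into sets.

```python
from typing import Dict, List, Set, Tuple

# Half-open [start, end) coordinates on a single strand (assumed convention).
Interval = Tuple[int, int]


def covered(intervals: List[Interval]) -> Set[int]:
    """Expand intervals into the set of base positions they cover."""
    return {pos for start, end in intervals for pos in range(start, end)}


def coverage_summary(coding: Set[int],
                     fps_by_cell: Dict[str, List[Interval]]) -> Tuple[float, float]:
    """Return (mean per-cell-type fraction, any-cell-type fraction) of
    coding bases that fall inside footprints."""
    union: Set[int] = set()
    per_cell: List[float] = []
    for fps in fps_by_cell.values():
        hit = coding & covered(fps)
        per_cell.append(len(hit) / len(coding))
        union |= hit
    return sum(per_cell) / len(per_cell), len(union) / len(coding)


def duon_fraction(cds_start: int, cds_end: int,
                  fps_by_cell: Dict[str, List[Interval]]) -> float:
    """Fraction of complete codons with at least one base inside a footprint
    in any cell type (hypothetical duon rule, not the paper's definition)."""
    footprinted = set().union(*(covered(f) for f in fps_by_cell.values()))
    starts = range(cds_start, cds_end - 2, 3)
    if len(starts) == 0:
        return 0.0
    duons = sum(any(p in footprinted for p in range(s, s + 3)) for s in starts)
    return duons / len(starts)


if __name__ == "__main__":
    coding_bases = set(range(0, 30))  # a toy 10-codon coding region
    footprints = {
        "cell_A": [(4, 9)],
        "cell_B": [(6, 12)],
        "cell_C": [(25, 27)],
    }
    mean_frac, any_frac = coverage_summary(coding_bases, footprints)
    print(f"mean per cell type: {mean_frac:.3f}; any cell type: {any_frac:.3f}")
    print(f"duon fraction: {duon_fraction(0, 30, footprints):.2f}")
```

On this toy input the any-cell-type fraction (0.333) is more than twice the per-cell-type mean (0.144), the same direction of difference as in the reported data.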
Fig. 1. TFs densely populate and evolutionarily constrain protein-coding exons. (A) Distribution of DNaseI footprints. (B) Per-nucleotide DNaseI cleavage and chromatin immunoprecipitation sequencing (ChIP-seq) signal for coding CTCF (left) and NRSF (right) binding elements. (C) Proportion of coding bases within DNaseI footprints in each of 81 cell types (left), or any cell type (right). (D) Average footprint density within first, internal, or final coding exons [mean ± SEM; P value, paired t test; nonsignificant (n.s.) indicates P > 0.1]. (E) PhyloP conservation at 4FDBs within and outside footprints. (F) Estimated mutational age at all (gray), synonymous (brown), and nonsynonymous (red) coding SNVs (European) within and outside footprints [P values per (21)]. (G) Structure of DNA-bound KLF4 versus average per-nucleotide DNaseI cleavage and evolutionary constraint at KLF4 footprints. (H) Average per-nucleotide conservation at 4FDBs (brown) and NDBs (red) overlapping KLF4 (left) and NFIC (right) footprints [r, Pearson correlation; conservation at promoter bases versus 4FDBs (top) or NDBs (bottom)]. (I) Evolutionary constraint imparted by 63 TFs at promoter elements, 4FDBs and NDBs (Pearson correlations).
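Panels (E) and (H) compare conservation at fourfold-degenerate bases (4FDBs: third codon positions at which any nucleotide encodes the same amino acid) inside versus outside footprints. A minimal Python sketch of that comparison, assuming the standard genetic code, an in-frame coding sequence whose first base sits at `cds_start`, and a hypothetical `scores` mapping from position to a per-base conservation value such as phyloP. It illustrates the idea only and is not the authors' analysis.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Standard genetic code, codons enumerated with bases in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def third_position_is_fourfold(codon: str) -> bool:
    """True if all four codons sharing this codon's first two bases encode one amino acid."""
    prefix = codon[:2].upper()
    return len({CODON_TABLE[prefix + b] for b in BASES}) == 1


def fourfold_positions(cds_seq: str, cds_start: int) -> List[int]:
    """Coordinates of fourfold-degenerate third positions in an in-frame CDS."""
    return [
        cds_start + offset + 2
        for offset in range(0, len(cds_seq) - 2, 3)
        if third_position_is_fourfold(cds_seq[offset:offset + 3])
    ]


def conservation_in_vs_out(cds_seq: str, cds_start: int, scores: Dict[int, float],
                           footprints: List[Tuple[int, int]]) -> Tuple[float, float]:
    """Mean score at 4FDBs inside and outside half-open footprint intervals
    (NaN when a group is empty)."""
    inside: List[float] = []
    outside: List[float] = []
    for pos in fourfold_positions(cds_seq, cds_start):
        in_fp = any(start <= pos < end for start, end in footprints)
        (inside if in_fp else outside).append(scores[pos])
    nan = float("nan")
    return (mean(inside) if inside else nan, mean(outside) if outside else nan)


if __name__ == "__main__":
    cds = "CTGAAAGCCTGG"  # Leu-Lys-Ala-Trp; CTG and GCC end in 4FDBs
    toy_scores = {102: 2.5, 108: 0.3}  # made-up conservation values
    print(fourfold_positions(cds, 100))  # [102, 108]
    print(conservation_in_vs_out(cds, 100, toy_scores, [(100, 104)]))  # (2.5, 0.3)
```

Deriving fourfold degeneracy from the codon table, rather than hard-coding the eight codon families, keeps the sketch correct if a different genetic code table were substituted.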
Fig. 2. TFs modulate global codon biases. (A) Proportions of all codons (gray), or codons outside of (yellow) or within (purple) footprints, that encode asparagine (top) or leucine (bottom). Codons with bias (AAC for asparagine and CTG for leucine) preferentially localize within footprints. (B) Preferential footprinting of biased codons, calculated as in (A) (P values, Pearson's χ² test). (C) Preferential footprinting of each codon trinucleotide in coding versus noncoding regions (C, coding; NC, noncoding). (D) Difference in average evolutionary constraint at third positions of biased codons outside versus within footprints (P values, Mann-Whitney test). (E) Proportions of amino acids encoded by CpG-containing codons among all codons (gray), codons outside footprints (yellow), or codons within footprints (purple).
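Whether a biased codon (for example AAC among asparagine codons) is over-represented within footprints can be framed as a 2×2 contingency table (inside/outside footprints by AAC/AAT) and tested with Pearson's χ² test, the test named in panel (B). A dependency-free Python sketch; the table layout is an assumption, the counts are invented, and no continuity correction is applied. With 1 degree of freedom the p-value equals erfc(√(χ²/2)).

```python
import math
from typing import Tuple


def pearson_chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson's chi-square test (no continuity correction) for the table
        [[a, b],
         [c, d]]
    Returns (statistic, p_value) with 1 degree of freedom."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column total must be non-zero")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical asparagine codon counts.
    #                AAC  AAT
    in_footprints = (60, 40)
    outside = (50, 50)
    chi2, p = pearson_chi_square_2x2(*in_footprints, *outside)
    print(f"AAC share inside: {in_footprints[0] / sum(in_footprints):.2f}, "
          f"outside: {outside[0] / sum(outside):.2f}")
    print(f"chi2 = {chi2:.3f}, p = {p:.3f}")
```

For these made-up counts the AAC share is 0.60 inside versus 0.50 outside and χ² ≈ 2.02, which is not significant at conventional thresholds; a real analysis would use genome-wide counts.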
Fig. 3. TFs exploit and avoid specific coding features. (A) Percentage of TF motifs occupied in coding versus noncoding regions (P values, paired t test). (B) Density of NFYA (left), AP2 (middle), and SP1 (right) footprints relative to translated region of first coding exons. (C) (Top) Density of YY1 footprints across first coding exons. (Bottom) YY1 recognition sequence and corresponding amino acid sequence within YY1 footprints overlapping start codons. (D) (Top left and bottom) For NRSF as per (C). (Right, arrow) Protein domain annotation of first exon third-frame NRSF footprints versus SP1 footprints. (E) TF preference (avoidance) of stop codon trinucleotides within versus outside footprints in noncoding regions (P values, Pearson's χ² test).
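Panel (E) asks whether stop-codon trinucleotides (TAA, TAG, TGA) are preferred or avoided within footprints relative to outside them in noncoding DNA. One simple way to express this is a frequency ratio, sketched below in Python. The rule that a trinucleotide counts as inside only when all three bases lie within a single footprint, and the toy sequence, are assumptions; a ratio below 1 would indicate avoidance.

```python
from typing import List, Tuple

STOP_TRINUCLEOTIDES = {"TAA", "TAG", "TGA"}


def stop_trinucleotide_ratio(seq: str, footprints: List[Tuple[int, int]]) -> float:
    """Frequency of stop trinucleotides inside footprints divided by their
    frequency elsewhere. Footprints are half-open [start, end) offsets into seq;
    a trinucleotide is 'inside' only if all three bases fall in one footprint."""
    seq = seq.upper()
    counts = {"inside": [0, 0], "outside": [0, 0]}  # [stop count, total count]
    for i in range(len(seq) - 2):
        inside = any(start <= i and i + 3 <= end for start, end in footprints)
        tally = counts["inside" if inside else "outside"]
        tally[0] += seq[i:i + 3] in STOP_TRINUCLEOTIDES
        tally[1] += 1
    (in_stop, in_total), (out_stop, out_total) = counts["inside"], counts["outside"]
    if in_total == 0 or out_stop == 0:
        raise ValueError("need trinucleotides inside and stop trinucleotides outside")
    return (in_stop / in_total) / (out_stop / out_total)


if __name__ == "__main__":
    # Toy noncoding sequence with one footprint covering its first six bases.
    print(stop_trinucleotide_ratio("TAACCCTGACCC", [(0, 6)]))
```

In the toy example one of four inside trinucleotides and one of six outside trinucleotides is a stop, giving a ratio of about 1.5; significance would then be assessed with a test such as the χ² test named in the caption.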
Fig. 4. Genetic variation in duons frequently alters TF occupancy. (A) Proportion of coding footprints overlapping a SNV in any of 81 cell types. (B) Proportion of SNVs in duons that allelically alter TF occupancy. (C) (Top) Per-nucleotide DNaseI cleavage at common nonsynonymous G→A SNV (rs8110393) in G/G and A/A homozygous cells. (Bottom) Allelic SP1 occupancy in heterozygous (G/A) cells. (D) Proportion of synonymous and nonsynonymous variants in duons that allelically alter TF occupancy. (E and F) Proportion of nonsynonymous variants from (D) grouped by predicted impact of coding variant on protein function using (E) SIFT or (F) PolyPhen-2. None of the bins are significantly different (Fisher's exact test; n.s. indicates P > 0.1).
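Panels (D) to (F) split duon variants into synonymous and nonsynonymous classes before asking how often each alters TF occupancy. The Python sketch below shows one way to do both steps: classify a single-base substitution against the standard genetic code, then tabulate the fraction of occupancy-altering variants per class. The variant records and their `alters_occupancy` flags are hypothetical stand-ins for allelic measurements like those in panel (C), and changes to or from a stop codon are simply counted as nonsynonymous here.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Standard genetic code, codons enumerated with bases in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snv(codon: str, position: int, alt: str) -> str:
    """Classify a substitution at codon position 0, 1 or 2 as synonymous or
    nonsynonymous (stop gains and losses count as nonsynonymous)."""
    codon, alt = codon.upper(), alt.upper()
    if position not in (0, 1, 2):
        raise ValueError("position must be 0, 1 or 2")
    if codon[position] == alt:
        raise ValueError("alternate allele matches the reference base")
    mutant = codon[:position] + alt + codon[position + 1:]
    return "synonymous" if CODON_TABLE[codon] == CODON_TABLE[mutant] else "nonsynonymous"


def fraction_altering_by_class(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """records: (variant class, alters_occupancy) pairs -> fraction altering per class."""
    totals: Counter = Counter()
    altering: Counter = Counter()
    for variant_class, alters in records:
        totals[variant_class] += 1
        altering[variant_class] += int(alters)
    return {cls: altering[cls] / totals[cls] for cls in totals}


if __name__ == "__main__":
    # Hypothetical duon variants: (codon, position, alt allele, alters TF occupancy?)
    variants = [
        ("CTG", 2, "A", True),   # CTG -> CTA, Leu -> Leu
        ("GAC", 2, "T", False),  # GAC -> GAT, Asp -> Asp
        ("CTG", 0, "A", True),   # CTG -> ATG, Leu -> Met
        ("TGG", 2, "A", False),  # TGG -> TGA, Trp -> stop
    ]
    records = [(classify_snv(c, p, a), alters) for c, p, a, alters in variants]
    print(fraction_altering_by_class(records))
```

Comparing such per-class fractions (or per-bin fractions after SIFT or PolyPhen-2 scoring) is where a test like the Fisher's exact test named in the caption would be applied.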