Invited Review
Figure 1. Simplified representation of regulatory ncRNAs and their functions. Generalized gene models are presented in dark
grey and light orange and overlap the double-strand DNA structure (light grey), representative of the genome. Each class of
regulatory RNA is defined by a colour. Functions are ascribed to each class of ncRNA by colour below the text at the top of the
figure. Bars and arrows indicate the direction of transcription. For more detail please see the text and references therein. PARs,
promoter-associated RNAs; lncRNAs, long non-coding RNAs; miRNAs, microRNAs; snoRNAs, small nucleolar RNAs; sdRNAs,
sno-derived RNAs; endo-siRNAs, endogenous siRNAs; piRNAs, PIWI-interacting RNAs; tiRNAs, transcription initiation RNAs
to include a host of short transcripts that sit adjacent to transcription start sites [38,39], including promoter-associated small RNAs (PASRs) [9,40] and transcription initiation RNAs (tiRNAs) [41], species that are derived from centromeres and telomeres [42,43], and tiny species processed from other short RNAs [44] (Figure 1, Table 1). Indeed, over the last decade we have witnessed a near-exponential growth of manuscripts devoted to regulatory RNAs (Figure 2).

Of the classes identified to date, miRNAs, siRNAs and piRNAs, which guide effector Argonaute proteins to genomic loci or target RNAs in a sequence-specific manner, have been most thoroughly investigated. In humans there are at least 700 miRNAs, hundreds of siRNAs and millions of unique piRNA sequences [15–17,45], suggesting that small RNAs are a substantial portion of the RNA output of cells and that they comprise a diverse, widespread and basal regulatory system. Indeed, several recent studies have shown that miRNAs and piRNAs are detectable in the most primitive multicellular organisms [46] and that once acquired, they are seldom, if ever, lost [47–49]. Although exogenous siRNAs were discovered a decade ago, endogenous siRNAs (endo-siRNAs) have only recently been identified in fruitflies and mammals [50–58], where their biogenesis is dependent on Dicer processing of duplexes formed by overlapping transcripts or long perfect hairpin structures. Work from other organisms, including nematode, fruitfly, yeast and plants, indicates that these endo-siRNAs are involved in anti-viral defence, transposon silencing, chromatin remodelling and post-transcriptional gene regulation through Argonaute-mediated cleavage of target transcripts (reviewed in [16,17]). The longest small RNA class, piRNAs, which are ∼25–30 nt in length, are also largely derived from, and involved in, transposon defence (reviewed in [15,19]) but are largely restricted to the germline, where active transposons could severely disrupt embryogenesis. Intriguingly, in a departure from the Dicer biogenesis pathway that defines siRNAs and miRNAs, piRNAs are produced by successive waves of Argonaute-cleavage of long non-coding transcripts (reviewed in [15,59]), which may suggest that other Dicer-independent small RNA species are still to be discovered.

A detailed examination of miRNA biogenesis and function is beyond the scope of this work and has recently been covered in detail in several excellent reviews [16,17,60]. However, miRNA mechanisms of action, and the autoregulatory feedback loops that increasingly characterize small RNA biogenesis, are well illustrated by one of the first miRNAs discovered, let-7 (Figure 3). The let-7 family of miRNAs is highly conserved throughout the Metazoa and functions as a master temporal regulator of development and differentiation, both in early embryos and complex adult tissues such as brain, in nematode, fruitfly, zebrafish and mouse [61–67]. Indeed, let-7 targets well-established cell-cycle regulators, including Cdk6 and Ras [68,69]. Like the majority of miRNAs, the let-7 precursor hairpin, or pre-miRNA, is processed from a long RNA polymerase II transcript by the nuclear RNase Drosha; the hairpin is then exported to the cytosol and processed to a ∼22 nt mature miRNA by Dicer [70].

Each of these steps of let-7 biogenesis is tightly regulated (Figure 3a). For example, while differentiation factors such as Notch [71] induce transcription, pluripotency factors (i.e. those that support an undifferentiated cellular state), such as c-Myc, repress transcriptional activation [72–74]. Likewise, the pluripotency factor LIN28 can bind to the conserved loop of the primary let-7 transcript (pri-let-7) to directly inhibit the Drosha cleavage steps [75–77] and can inhibit Dicer cleavage directly or by facilitating pre-miRNA degradation [78,79]. Completing the feedback loop, let-7 targets LIN28 [75,80,81], c-Myc [82,83] and the c-Myc-activating gene IMP-1 [80]. let-7 also forms a separate overlapping loop with the TRIM–NHL family of proteins that negatively regulate c-Myc and enhance let-7 activity [66,84–86]. These let-7 targets are ‘canonical’, i.e. the miRNA ‘seed’ sequence (nucleotides 2–8) binds to a target mRNA 3′ UTR and (generally) represses translation (Figure 3b).
However, let-7 also targets the Dicer coding sequence (CDS) [87] (Figure 3b), consistent with what appears to be an emerging theme of non-canonical miRNA targets in developmentally regulated genes [88–92]. Additionally, let-7 was recently shown to regulate HMGA2, an oncofetal gene and pluripotency factor, in a cell cycle-dependent manner. HMGA2 translation is up-regulated upon cell cycle arrest but inhibited in proliferating cells [93]. Taken as an example, let-7 provides a compelling illustration of the complexity of small RNA biogenesis and function, and points more generally towards small RNAs having a wide range of regulatory functions facilitated by sequence-specific interactions, any of which may malfunction to cause disease.

Long non-coding RNAs

Genome-wide transcriptomic studies have now shown that the mammalian genome is abundantly transcribed [2–10] and that at least 80% of this transcription is exclusively associated with long non-coding RNAs (lncRNAs) [9]. Although lncRNAs have frequently been disregarded as artifacts of chromatin remodelling or transcriptional ‘noise’ [94,95], there is substantial evidence to suggest that they mirror protein-coding genes. Indeed, they are frequently long (generally >2 and some >100 kb) [96], spliced and contain canonical polyadenylation signals [97,98]. Additionally, lncRNA promoters are bound and regulated by transcription factors, including Oct3/4, Nanog, CREB, Sp1, c-Myc, Sox2, NF-κB and p53 [99–102] and are epigenetically marked with specific histone modifications [102,103]. Overall, there are at least tens of thousands of lncRNAs that show signatures of selection, many of which, like small RNAs, are tissue and developmental stage-specific [97,104–108].

Long ncRNAs have a variety of functions, but one of their primary roles appears to be as epigenetic
Figure 3. let-7 provides a window into miRNA biogenesis and function. (a) let-7 biogenesis and gene regulation are characterized by
a series of autoregulatory feedback loops. Lines ending in bars indicate inhibitory interactions, while those terminating in arrows
indicate activating interactions. For simplicity, all let-7 family members (of which there are 11 in vertebrates) are considered as a
group. Likewise, mammalian LIN28 homologues (LIN28 and LIN28B) and TRIM–NHL family members (TRIM71 and TRIM32) are
depicted as single elements within the schematic. Mature let-7 mediates its effects through a complex composed of an Argonaute
protein (grey) and GW182 (brown), which is also depicted in simplified form in the lower panel. Consistent with its expression in
late embryogenesis, the principal targets of let-7 are cell cycle regulators, oncofetal genes, pluripotency factors and components
of the miRNA biogenesis pathway. Please see the text for more detail and references for each depicted interaction. (b) A general
schematic of mRNA transcription and miRNA targeting. Canonical miRNA targets (blue) are dependent on base pairing between
nucleotides 2–8, the seed sequence, and the mRNA 3′ UTR. Due to the short length of the seed sequence, legitimate interactions
can be abolished and illegitimate targets created by single base changes. Non-canonical targets (orange), e.g. those in coding
sequences (CDSs) or 5′ UTRs, are not reliant on the ‘seed’ sequence and generally show more extensive base pairing. Canonical
and non-canonical targets are depicted for let-7a:HMGA2 [235] and let-7b:Dicer [87], respectively
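
As a rough illustration of the canonical targeting rule summarized in Figure 3b, the following Python sketch scans a 3′ UTR for perfect matches to the seed (nucleotides 2–8) of a miRNA. The let-7a sequence is the annotated mature sequence; the two UTR fragments and the function names are hypothetical and serve only as an example. Real target prediction also weighs site context, conservation and pairing beyond the seed, none of which is modelled here.

```python
# Toy illustration of canonical seed matching; not a target-prediction tool.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Mature hsa-let-7a-5p sequence, written 5'->3'.
LET7A = "UGAGGUAGUAGGUUGUAUAGUU"


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA string (Watson-Crick pairs only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_sites(mirna: str, utr: str) -> list[int]:
    """Return 0-based UTR positions that pair perfectly with miRNA nucleotides 2-8."""
    seed = mirna[1:8]                # nucleotides 2-8 in 1-based numbering
    site = reverse_complement(seed)  # what the seed pairs with, read 5'->3' on the mRNA
    width = len(site)
    return [i for i in range(len(utr) - width + 1) if utr[i:i + width] == site]


# Hypothetical UTR fragments: the variant differs from the reference by one base.
utr_reference = "AAGCUACCUCAUU"
utr_variant = "AAGCUACGUCAUU"

print(seed_sites(LET7A, utr_reference))  # [3]
print(seed_sites(LET7A, utr_variant))    # []
```

In this toy case a single base change removes the only seed match, which is the sense in which short seed sequences make target sites sensitive to point variation.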
Similarly, loss of specific small RNA loci is associated with Prader–Willi syndrome (PWS), a disorder caused by the loss of imprinting on chromosome 15q11-q13 and characterized by hyperphagia, hypogonadism and cognitive impairment. A recent study has shown that a single microdeletion involving several small nucleolar RNA clusters (HBII-85 and HBII-52) results in PWS, suggesting that loss of small RNAs is a causal determinant of the disease [144]. Consistent with this hypothesis, knockout mice lacking the relevant snoRNAs largely recapitulate the PWS phenotype [145]. Interestingly, HBII-52 forms an antisense duplex with the serotonin receptor 2C (5HT2C) mRNA and negatively regulates its post-transcriptional editing [146,147], strongly implicating it in PWS-associated and autistic neurological defects [148]. Taken together, these studies suggest that the loss of small RNA loci plays an important role in human illness.

Like protein-coding genes, small RNAs can function either as activators or inhibitors of disease. Consistent with its role as a differentiation factor, let-7 is a well-established tumour suppressor [61,149–151] whose reduced expression is associated with poor survival in human lung cancers [152]. Likewise, miR-29b expression is associated with disease-free survival in patients with ovarian serous carcinoma [153], potentially due to regulation of the de novo methyl transferases Dnmt3a and Dnmt3b [154]. Indeed, altered expression of a broad suite of miRNAs that, depending on their targets, can act either as tumour suppressors or oncogenes (so-called oncomiRs), has been detected in virtually all cancer types examined (for reviews and tables of cancer-associated miRNAs and their targets, see [134,151,155,156]). Similar relationships are apparent in cardiovascular illnesses [140,141]. For example, miR-92a controls functional recovery of ischaemic tissues in mice [157], and miR-145 and miR-143, which have recently been implicated in differentiation of progenitors into cardiac myocytes, are down-regulated in injured and atherosclerotic vessels [158]. MicroRNAs may even play a direct role in viral defence. A study of human T lymphocytes has shown that miR-29a targets the HIV-1 3′ UTR and directs it to P bodies, where it is suppressed by the RNA-induced silencing complex (RISC) [159].

Small RNA dysregulation occurs for multiple reasons and reflects the processes involved in their biogenesis, regulation and targeting. MicroRNA loci and individual components of the miRNA biogenesis pathway are frequently lost or amplified in a wide range of cancers [160,161], and there is now widespread evidence that miRNAs that act as differentiation factors (e.g. let-7, above) are globally down-regulated in cancers [134,150,151]. Indeed, decreased expression of Dicer and Drosha, the RNases involved in miRNA production (see above), in ovarian cancer patients is associated with poor prognoses, suboptimal surgical cytoreduction and advanced tumour stages [162]. Likewise, a mutation resulting in premature termination of DICER1 results in pleuropulmonary blastoma, a rare paediatric lung tumour [163]. Consistent with these findings, studies in mice have shown that mammalian systems are highly sensitive to Dicer activity. Complete loss of Dicer results in the disruption of the developmental programme and early embryonic lethality [164].

Elements associated with the regulation of miRNA processing can also be associated with various pathologies. A recent study has shown that over-expression of LIN28 and LIN28B, negative regulators of let-7 biogenesis, correlates with repression of let-7, occurs in at least 15% of human malignancies and is associated with more advanced disease states [165]. Similarly, and consistent with let-7's role in developmental regulation, genetic variants of the LIN28 locus have recently been associated with altered timing of human pubertal growth and development [166].

Single nucleotide polymorphisms (SNPs) in mature and precursor miRNAs have been robustly associated with schizophrenia and autism [167,168], and a pathogenic SNP in the seed sequence of miR-96 is responsible for progressive hearing loss [169] (Figure 3b). A SNP in the 3′ UTR of K-Ras, a well-characterized GTPase-regulated oncogene and target of let-7, inhibits let-7 translational suppression and results in reduced survival in oral cancers [170]. Indeed, SNPs in the 3′ UTRs of mRNAs that abolish or create target sites may be common in miRNA-associated diseases (reviewed in [171]). As examples, SNP-induced illegitimate miRNA binding sites are associated with muscular hypertrophy in sheep, Tourette’s syndrome and cardiovascular disease [171–173]. More generally, allele-specific polymorphisms in miRNA target sites have been shown to play a role in the tissue-specific miRNA regulation of hundreds of genes, suggesting that such genetic subtleties may be a widespread underlying cause of individual phenotypic variability [174].

Long non-coding RNAs

The data gathered to date strongly implicate lncRNAs in the basal regulation of protein-coding genes, including those central to normal development and oncogenesis, at both the transcriptional (e.g. epigenetic) and post-transcriptional (e.g. subcellular dynamics) levels, and an increasing number have been functionally validated to affect different cellular and developmental pathways (see [107]). It is not surprising, then, that the dysregulation of lncRNAs appears to be a primary feature of many complex human diseases, including leukaemia [175], colon cancer [176], prostate cancer [177], breast cancer [178], hepatocellular carcinoma [175,179], psoriasis [180], ischaemic heart disease [181,182], Alzheimer’s disease [183] and spinocerebellar ataxia type 8 [184].

In some cases, the mechanisms by which lncRNAs contribute to disease have been carefully dissected. For example, the dsDNA-binding protein PSF constitutively silences the proto-oncogene GAGE6. However, at least five lncRNAs can bind to PSF, which results in deactivation of PSF-induced silencing, expression of GAGE6 and enhanced tumorigenicity [185]. Long ncRNAs overlapping or antisense to protein-coding gene promoters may also contribute to oncogenesis. A transcript antisense to the p15 tumour suppressor gene, first identified in a human leukaemia, regulates the chromatin and DNA methylation status of the p15 locus [186]. A lncRNA antisense to p21 was
also recently shown to behave similarly [187]. These results, combined with the observation that antisense transcripts are present at thousands of protein-coding genes, have led to speculation that antisense lncRNAs generally control the expression of their cognate protein-coding genes through epigenetic modifications [188,189]. This model has profound ramifications for our understanding of disease, particularly cancer: dysregulation of a lncRNA regulating the expression of a tumour suppressor or oncogene, and not the protein-coding sequence itself, may be one of the ‘hits’ that leads to oncogenesis.

The hidden layer of non-coding variation

These examples likely represent the tip of a very big iceberg. The same technologies that have revealed a breadth of ncRNA expression are also driving a revolution in genome sequencing that will ultimately identify variations in the human genome that underpin disease susceptibility and aetiology. However, given the focus on mutations in protein-coding exons that cause most of the high-penetrance simple genetic disorders, the variation that occurs in non-protein-coding regions of the genome has, to date, largely been ignored or at least not been considered [107,190]. This is changing: the emergence of genome-wide association studies to identify variant loci affecting complex diseases and traits and an increased awareness of ncRNA biology have prompted a reconsideration of the underlying protein-centric assumptions and provided a number of novel insights into disease-causing mechanisms. For example, many pathological mechanisms are now known to involve aberrant regulation (and in many cases ncRNAs) rather than alterations to the protein-coding sequences themselves. This is perhaps not surprising, given that the primary engine of phenotypic radiation and higher complexity has been the expansion and divergence of the regulatory architecture that controls the deployment of protein components during differentiation and development [191], much of which may be embedded within ncRNAs [12,105]. Indeed, the same forces that drive evolutionary innovations can result in deleterious variations.

Genome-wide association studies are beginning to identify novel ncRNAs as candidate disease-associated genes. For example, the lncRNA MIAT is associated with myocardial infarction [181], and a novel lncRNA induced by a chromosomal deletion that truncates the polyadenylation site of the LUC7L gene [192] results in aberrant methylation and silencing of the neighbouring HBA2 gene, leading to the onset of α-thalassemia. Indeed, many disease variants map far from protein-coding genes and, given the level of genome-wide transcription, are therefore likely to interrupt lncRNAs. For example, a disease-causing 7.4 kb deletion associated with blepharophimosis syndrome occurs over 250 kb upstream from the nearest gene, FOXL2 [193], and this mutation interrupts a lncRNA of unknown function, PISRT1, which has also been identified as a candidate gene in a goat model of this disease [193].

The potential role of lncRNAs in long-range enhancer function, and therefore dysfunction, is illustrated by the Evf2 lncRNA (see above), which may contribute to split-hand/split-foot malformation 1 (SHFM1). Although the region associated with this developmental disorder encompasses three genes, DLX5, DLX6 and DSS1, none are directly mutated in patients [194]. Instead, exhibition of the limb phenotype requires the expression of both the Dlx5 and Dlx6 genes to be disrupted, suggesting that SHFM1 results from the ablation of a shared regulatory element [195]. Since it is now known that Evf2 regulates the expression of both these genes, it warrants investigation as a candidate SHFM1 disease locus.

Similarly, two lncRNAs, SOX2OT and SOX2DOT, exhibit enriched expression in the lens of the eye and overlap a known myopia susceptibility locus (Figure 4) [196,197]. These transcripts, one of which is transcribed from a distal ultraconserved enhancer, also overlap the SOX2 gene, itself an important regulator of ocular development. Given that developmental genes are significantly enriched for a proximal association with lncRNAs, the understanding of developmental disorders may well benefit from an appreciation of lncRNA biology. The future convergence of lncRNA identification by deep RNA sequencing with the increased resolution of disease variants afforded by genomic sequencing will, we suggest, prove a potent combination in elucidating the functional contribution of ncRNAs to disease.

Non-coding RNAs as diagnostics and therapeutics

The growing body of research showing that ncRNAs may be primary genetic regulators in complex animals has led to the corresponding realization that this may make them ideal diagnostic markers. For example, in some cases the expression profiles of miRNAs, in contrast to those of protein-coding mRNAs, are able to accurately identify the origin of poorly differentiated tumours and carcinomas [198,199]. Indeed, a signature of as few as 200 miRNAs may be sufficient for cancer classification [198], and it appears that some of the difficulties of early detection associated with colon and other occult cancers may be overcome by profiling miRNAs obtained from patient serum, plasma, saliva and tissues [200,201]. Likewise, colon, lung and breast cancer prognosis is strongly associated with a small suite of miRNAs (reviewed in [202]), suggesting that assays designed to query ncRNAs may eventually become core components of the pathologist’s toolkit. This will undoubtedly be facilitated by recent advancements in massively parallel sequencing technologies [203,204], which allow rapid and sensitive profiling of both long and short ncRNAs, and will almost certainly make personal genomics a reality in the next 5 years
Figure 4. Long non-coding RNAs, SOX2 distal overlapping transcript (SOX2DOT) and SOX2 overlapping transcript (SOX2OT)
map to the myopia susceptibility locus. A representation of these features, the SOX2 gene, and a highly conserved SOX2DOT
enhancer [236] are shown in the inset schematic representative of a region of the human genome (chr3:182,255,415–182,945,055;
UCSC Genome Browser hg18). The relative expression of spliced ESTs corresponding to SOX2DOT or SOX2OT is depicted
in the lower half of the figure and shows that these transcripts are highly expressed in the lens of the eye. Adapted from Amaral
et al. [196]
[205]. The analysis and integration of this information with other datasets (e.g. protein interaction and genome-wide association studies) will pose a considerable, but tractable, challenge well into the future.

The link between endogenous ncRNAs and disease, and the perfection of RNAi-based techniques to silence genes in simple animals, have led to speculation that RNA molecules can be employed as therapeutic agents. Indeed, it may be both easier and more productive to adjust the regulatory software (i.e. ncRNAs) than to try to correct the hardware (i.e. protein-coding genes). Hopes for RNA-based and RNA-targeted therapies were bolstered by early successes using siRNAs in human in vitro culture systems [206] and in targeting HIV-1 and human BCL2 with siRNA-like molecules [207–209]. Like gene therapy, however, RNA therapeutics face considerable hurdles, including development of reliable delivery systems, dosage regimes and techniques to ameliorate off-target effects [210,211]. Nonetheless, multiple modes of administration have been developed, including viral, liposome and nanoparticle delivery systems, and there are currently multiple ongoing clinical trials targeting age-related macular degeneration, respiratory syncytial virus, acute renal failure, hepatocellular carcinoma and congenital pachyonychia, among others (reviewed in [212]).

There is also an increasing interest in RNA therapeutics that mimic or regulate miRNA activity in human cancers (reviewed in [213]). This could be facilitated by exogenous expression of a repressed miRNA (using the same delivery systems as siRNA therapeutics), by the introduction of antagomirs [214] that are complementary and bind to miRNAs, or by the use of ‘sponges’ that contain multiple artificial miRNA-binding sites [215]. Artificial expression of specific miRNAs in vivo may be a powerful therapeutic mechanism, particularly given recent reports that over-expression of a single miRNA, miR-302, is capable of inducing stemness [216,217].

A series of recent studies has suggested that an equally fruitful target may be gene promoters. Indeed,
