The Human Genome - Final
NCBI Genome ID 51
Ploidy Diploid
Number of chromosomes 23 pairs
Genome size ~3 × 10^9 bp
mtDNA:
•Small circular DNA (16,569 bp)
•Contains just 37 genes
– 13 of these genes encode proteins of the respiratory complexes, the main
biochemical components of energy generation in mitochondria (e.g. genes for
ATPase subunits 6 and 8; cytochrome c oxidase subunits I, II, III; cytochrome b;
NADH dehydrogenase subunits 1–6 and 4L)
– The other 24 specify ncRNA molecules required for expression of the
mitochondrial genome [e.g. rRNA (12S and 16S) genes; tRNA genes]
– Genes are more tightly packed than in the nuclear genome and contain no
introns
•The most abundant DNA molecule in the cell (~800
mitochondria/cell; ~10 copies of mtDNA/mitochondrion)
– The abundance of mtDNA has been exploited in the analysis of ancient
DNA and in forensic investigations, where limited starting material can
hinder genetic analysis
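The mtDNA figures above can be cross-checked with a little arithmetic. The sketch below assumes the 24 ncRNA genes split into 22 tRNA and 2 rRNA genes (a standard figure not stated explicitly above) and treats ~800 mitochondria and ~10 copies per mitochondrion as rough averages; the names `mt_genes`, `total_genes` and `mtdna_copies_per_cell` are illustrative only.

```python
# Sketch: tally the mtDNA gene content and copy number quoted above.
# Assumption: the 24 ncRNA genes = 22 tRNA genes + 2 rRNA genes.
# The per-cell figures are rough averages that differ between cell types.

MTDNA_LENGTH_BP = 16_569

mt_genes = {
    "protein_coding": 13,  # respiratory-complex subunits
    "tRNA": 22,
    "rRNA": 2,             # 12S and 16S
}


def total_genes(genes: dict) -> int:
    """Sum the gene counts over all gene classes."""
    return sum(genes.values())


def mtdna_copies_per_cell(mitochondria: int = 800, copies_each: int = 10) -> int:
    """Rough number of mtDNA molecules per cell."""
    return mitochondria * copies_each


ncrna_genes = mt_genes["tRNA"] + mt_genes["rRNA"]
print("ncRNA genes:", ncrna_genes)                        # ncRNA genes: 24
print("total genes:", total_genes(mt_genes))              # total genes: 37
print("mtDNA copies per cell:", mtdna_copies_per_cell())  # mtDNA copies per cell: 8000

# Gene density: roughly one gene per ~448 bp of mtDNA, versus one protein-coding
# gene per ~150,000 bp of nuclear DNA (3e9 bp / 20,000 genes, haploid).
print(f"mtDNA: {MTDNA_LENGTH_BP / total_genes(mt_genes):.0f} bp per gene")  # 448
print(f"nuclear: {3e9 / 20_000:,.0f} bp per protein-coding gene")          # 150,000
```

The density comparison is only a crude illustration of how tightly mtDNA genes are packed; the nuclear figure counts protein-coding genes only.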
Human genomes include:
– Protein-coding DNA genes
– Noncoding DNA (ncDNA)
Protein coding DNA sequences:
– Defined as those sequences that can be transcribed into mRNA and translated
into proteins during the human life cycle
– ~20,000 protein-coding genes per haploid genome (fewer than anticipated)
– Account for only a very small fraction of the genome (<2%)
– Distributed unevenly across the chromosomes, with an especially high gene
density within chromosomes 19, 11, and 1
– Protein-coding sequences represent the most widely studied and best
understood component of the human genome
– These sequences lead to the production of all human proteins, although
several biological processes (e.g. DNA rearrangements and alternative pre-
mRNA splicing) can lead to the production of many more unique proteins than
the number of protein-coding genes
– The complete modular protein-coding capacity of the genome is contained
within the exome, the set of exon sequences that can be translated into
proteins
– Because of its biological importance and small size (<2% of the genome),
sequencing of the exome was the first major milestone of the Human Genome
Project (HGP)
• The size of protein-coding genes within the human
genome shows enormous variability
• For example,
– The gene for histone H1a (HIST1H1A) is relatively small and
simple, lacking introns and encoding an mRNA of 781 nt
and a 215-amino-acid protein (648-nt open reading frame)
– Dystrophin (DMD) is the largest protein-coding gene in the
human reference genome, spanning a total of 2.2 Mbp
– Titin (TTN) has the longest coding sequence (80,780 bp), the
largest number of exons (364), and the longest single exon
(17,106 bp)
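The HIST1H1A numbers are internally consistent: each amino acid needs one 3-nt codon and the reading frame ends with a 3-nt stop codon. A minimal sketch of that arithmetic follows; `orf_length_nt` is a hypothetical helper name, and the 133-nt remainder is simply the part of the 781-nt mRNA lying outside the ORF (the untranslated regions).

```python
def orf_length_nt(protein_length_aa: int, include_stop: bool = True) -> int:
    """Nucleotide length of an open reading frame.

    Each amino acid takes one 3-nt codon; the ORF also ends with a
    3-nt stop codon that adds no amino acid to the protein.
    """
    codons = protein_length_aa + (1 if include_stop else 0)
    return 3 * codons


orf = orf_length_nt(215)  # HIST1H1A: 215-aa protein
print(orf)                # 648
print(781 - orf)          # 133 nt of the 781-nt mRNA lie outside the ORF (UTRs)
```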
• Over the whole genome:
– The average size of an exon is 145 bp
– The average number of exons per gene is 8.8
– The average coding sequence encodes 447 amino acids
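The genome-wide averages can be combined into a back-of-the-envelope estimate of how much of the genome is protein-coding. This sketch assumes ~20,000 genes, a 447-aa average protein plus one stop codon per gene, and a ~3 × 10^9 bp haploid genome; it ignores alternative splicing and UTRs, so it is only an order-of-magnitude check.

```python
GENOME_SIZE_BP = 3e9           # approximate haploid genome size
PROTEIN_CODING_GENES = 20_000  # approximate
AVG_PROTEIN_AA = 447           # average coding sequence, in amino acids


def coding_fraction(genes: int, aa_per_gene: int, genome_bp: float) -> float:
    """Fraction of the genome occupied by coding sequence (stop codon included)."""
    coding_bp = genes * (aa_per_gene + 1) * 3
    return coding_bp / genome_bp


frac = coding_fraction(PROTEIN_CODING_GENES, AVG_PROTEIN_AA, GENOME_SIZE_BP)
print(f"{frac:.2%}")  # 0.90%
```

The result, just under 1%, is consistent with the statement that protein-coding sequence accounts for <2% of the genome.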
Noncoding DNA:
– Defined as all DNA sequences that are never represented in the
amino acid sequence of proteins, i.e. everything except the
protein-coding parts of exons
– Accounts for >98% of the genome
– Roles include:
• Regulation of gene expression
• Organization of chromosome architecture
• Control of epigenetic inheritance
• Includes:
– Genes for noncoding RNA (e.g. tRNA and rRNA)
– Untranslated components of protein-coding genes
• e.g. introns (in most protein-coding genes of the human genome, intron
sequences are 10 to 100 times longer than exon sequences)
• 5′- and 3′-UTRs of mRNA
– Regulatory DNA sequences
• Some play a role in gene expression, e.g. promoters, enhancers, silencers
• Some regulate structural features of the chromosomes, e.g. telomeres,
centromeres
• Origins of replication
– Sequences related to mobile genetic elements
– Pseudogenes
– Repetitive DNA sequences
• Tandem repeats
• Interspersed repeats
– Sequences for which as yet no function has been elucidated
ncRNA (transcribed noncoding regions):
•Several regions are transcribed into functional noncoding RNA
•The human genome contains genes encoding several classes of ncRNA:
tRNAs, rRNAs, miRNAs, siRNAs, snRNAs, snoRNAs, lncRNAs
•Contributes to epigenetic regulation, transcription, RNA processing and
the translational machinery
– Regulates the expression of protein-coding genes
– Regulates splicing
– Regulates mRNA translation and stability
– Regulates chromatin structure (including histone modifications)
– Regulates DNA methylation
– Regulates DNA recombination
– Cross-regulates other noncoding RNAs
– Some may have no role (their transcription may be the product of nonspecific
RNA polymerase activity)
Repetitive sequences:
•Microsatellites
– Tandem repeats of units shorter than 13 nucleotides
– e.g. dinucleotide repeat (AC)n
– Trinucleotide repeats are of particular importance, as they sometimes
occur within the coding regions of protein-coding genes and may lead
to genetic disorders. For example, Huntington's disease results
from an expansion of the trinucleotide repeat (CAG)n within the
huntingtin (HTT) gene on human chromosome 4
– Telomeres end with a microsatellite hexanucleotide repeat of the
sequence (TTAGGG)n
•Minisatellites
– Tandem repeats of longer sequences
– Arrays of repeat units 14–60 nucleotides long
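Finding a tandem repeat such as (CAG)n or (TTAGGG)n amounts to counting how many consecutive copies of a repeat unit occur in a sequence. The sketch below is a naive illustration of that idea, not a production repeat finder; `longest_tandem_run` is a hypothetical helper and the example sequences are made up, not real genomic data.

```python
def longest_tandem_run(seq: str, unit: str) -> int:
    """Largest n such that `unit` repeated n times occurs contiguously in `seq`.

    Naive scan: from every start position, count how many consecutive
    copies of `unit` follow. Quadratic in the worst case; fine for short reads.
    """
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    seq, unit = seq.upper(), unit.upper()
    k = len(unit)
    best = 0
    for start in range(len(seq)):
        n, pos = 0, start
        while seq[pos:pos + k] == unit:
            n += 1
            pos += k
        best = max(best, n)
    return best


# Hypothetical example sequences (not real genomic data):
htt_like = "ATG" + "CAG" * 21 + "CCG"
telomere_like = "AC" + "TTAGGG" * 5
print(longest_tandem_run(htt_like, "CAG"))          # 21
print(longest_tandem_run(telomere_like, "TTAGGG"))  # 5
```

Counting repeat copies this way is the kind of measurement behind the idea of repeat "expansion": a longer (CAG)n run simply yields a larger n.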
Mobile genetic elements (retrotransposons and transposons)
– An abundant component of the human genome
– Have played a major role in sculpting the human genome
– Some of these sequences represent endogenous retroviruses
(DNA copies of viral sequences that have become permanently
integrated into the genome and are now passed on to
succeeding generations)
– Mobile elements within the human genome can be classified into
• LTR retrotransposons
• SINEs (including Alu elements)
• LINEs
• Class II DNA transposons
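Types of genome-wide repeats in the human genome
(source: IHGSC 2001 / Genomes 3):
Type of repeat | Subtype | Approximate number of copies in human genome
SINEs | (total) | 1,558,000
SINEs | Alu | 1,090,000
SINEs | MIR | 393,000
SINEs | MIR3 | 75,000
LINEs | (total) | 868,000
LINEs | LINE-1 | 516,000
LINEs | LINE-2 | 315,000
LINEs | LINE-3 | 37,000
LTR elements | (total) | 443,000
LTR elements | ERV class I | 112,000
LTR elements | ERV(K) class II | 8,000
LTR elements | ERV(L) class III | 83,000
LTR elements | MaLR | 240,000
DNA transposon | (total) | 294,000
DNA transposon | hAT | 195,000
DNA transposon | Tc-1 | 75,000
DNA transposon | PiggyBac | 2,000
DNA transposon | Unclassified | 22,000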
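The repeat table is hierarchical: each family total should equal the sum of its subtypes. The sketch below transcribes the copy numbers from the table into a dictionary (a representation chosen here for illustration) and checks those sums, plus the Alu share of SINE copies.

```python
# Copy numbers transcribed from the repeat table: family -> (total, {subtype: copies})
repeats = {
    "SINEs": (1_558_000, {"Alu": 1_090_000, "MIR": 393_000, "MIR3": 75_000}),
    "LINEs": (868_000, {"LINE-1": 516_000, "LINE-2": 315_000, "LINE-3": 37_000}),
    "LTR elements": (443_000, {"ERV class I": 112_000, "ERV(K) class II": 8_000,
                               "ERV(L) class III": 83_000, "MaLR": 240_000}),
    "DNA transposon": (294_000, {"hAT": 195_000, "Tc-1": 75_000,
                                 "PiggyBac": 2_000, "Unclassified": 22_000}),
}

# Check that each family total matches the sum of its subtypes.
for family, (total, subtypes) in repeats.items():
    subtotal = sum(subtypes.values())
    status = "OK" if subtotal == total else "MISMATCH"
    print(f"{family:15s} {total:>10,} {subtotal:>10,} {status}")

grand_total = sum(total for total, _ in repeats.values())
print(f"All families: {grand_total:,}")  # All families: 3,163,000

alu, sines = repeats["SINEs"][1]["Alu"], repeats["SINEs"][0]
print(f"Alu share of SINE copies: {alu / sines:.0%}")  # 70%
```

All four families sum correctly, and Alu elements make up roughly 70% of SINE copies.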
The Human Reference Genome
• With the exception of identical twins, all humans show
variation in genomic DNA sequences
• The Human Reference Genome (HRG) is used as a
standard sequence reference
• There are several important points concerning the
Human Reference Genome
– The HRG is a haploid sequence. Each chromosome is
represented once
– The HRG is a composite sequence, and does not correspond to
any actual human individual
– The HRG is periodically updated to correct errors and
ambiguities
– The HRG in no way represents an "ideal" or "perfect" human
individual; it is simply a standardized representation or model
used for comparative purposes
Table:
•Basic information about the human chromosomes (DNA molecules) and their
gene content, based on a reference genome
•Summarizes the physical organization and gene content of
the human reference genome (which does not represent the
sequence of any specific individual)
•Data source: Ensembl genome browser release 68, July
2012
Chromosome | Length (mm) | Base pairs | Confirmed proteins | Putative proteins | Pseudogenes | miRNA | rRNA | snRNA | snoRNA | Misc ncRNA | Centromere position (Mbp)
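A physical length in millimetres can be related to a base-pair count by multiplying by the rise per base pair of the double helix. The sketch below assumes 0.34 nm per bp (the usual figure for B-form DNA) and that the length refers to fully extended DNA; `dna_length_mm` is a hypothetical helper, and the inputs are the genome and mtDNA sizes given earlier in these notes.

```python
NM_PER_BP = 0.34  # assumed helical rise of B-form DNA, in nm per base pair


def dna_length_mm(base_pairs: float) -> float:
    """Contour length of a fully extended double helix, in millimetres."""
    nm = base_pairs * NM_PER_BP
    return nm / 1e6  # 1 mm = 1e6 nm


print(f"{dna_length_mm(3e9):.0f} mm")              # haploid genome: ~1020 mm
print(f"{dna_length_mm(16_569) * 1000:.1f} µm")    # mtDNA: ~5.6 µm
```

Under this assumption a haploid set of chromosomes would stretch to roughly a metre, while a single mtDNA circle is only a few micrometres long.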