Mito NGS


NGS of Mitochondrial DNA

What is Mitochondrial DNA?

Mitochondrion: energy-producing organelle
~500 mitochondria per cell
1-15 copies of the mtDNA chromosome per mitochondrion
Comparison of Human Nuclear DNA and Mitochondrial DNA
Characteristic | Nuclear DNA | mtDNA
Size of genome | ~3.2 billion bp | ~16,569 bp
Copies per cell | 1 full set (23 homologous chromosomes from each parent) | up to >1000
Percent of total DNA | 99.75% | 0.25%
Structure | Linear; supercoiled and packed into chromosomes | Circular
Inheritance | Father and mother | Mother
Chromosomal pairing | Diploid | Haploid
Generational recombination | Yes | No
Replication repair | Yes | No
Uniqueness | Unique to all individuals (except identical twins) | Same as all maternal relatives
Mutation rate | Low | 10-20x the nuclear DNA mutation rate
Reference sequence | 2001 Human Genome Project/Celera sequence | 1981 Anderson et al.; 1999 rCRS

Only sperm’s nuclear DNA enters egg and joins directly with egg nucleus at conception
Sperm mitochondria are ubiquitin-tagged during spermatogenesis for degradation and those
that enter are destroyed in egg
Heteroplasmy
• Presence of more than one DNA type in an individual
• The mutation or variant mtDNA mixes with the “normal”
type mtDNA in the cell
• Cells can accumulate mutations over time and the
aggregate is observed in the sequencing data.
• Heteroplasmy can be transmitted differently to different
organs and tissues and produce a mosaicism.
– Same tissue
– Different tissue
– Homoplasmic within a tissue, heteroplasmic overall
• Triplasmy: heteroplasmy at two sites in an individual
Figure 14.8: (A) Sanger sequence showing heteroplasmy at position 16093, with both C and T nucleotides (C/T), compared to (B) the same region (positions 16086-16101) from a different sample containing only a T at position 16093.
Mito DNA Variations
Inheritance
Mito DNA Applications
• Ancient, degraded/damaged material with
low/no nuclear DNA (e.g., bone, teeth, & hair)
to map to maternal relatives
• Evolutionary biology
• Disease mutations – diabetes, cancers
• Molecular anthropology – migration patterns
• Genetic genealogy
• Mutations in subpopulations of mitochondria -
heteroplasmy
https://www.mitomap.org/foswiki/pub/MITOMAP/MitomapFigures/WorldMigrations2012.pdf
mtDNA Cases
• Tomb of the Unknown Soldier (Vietnam) – Lt. Blassie
• Bones of Tsar Nicholas II
• Claims of Anna Anderson Manahan to be the Russian princess Anastasia
• Remains of outlaw Jesse James
• 9/11 mass disaster
• Romanov family, murdered in 1918 after the Bolshevik Revolution
Mito DNA Sequence
• Cambridge Reference Sequence – 1981
• Revised CRS – 1999 (NCBI NC_012920)
Standard Reference Materials
• Produced by NIST, checked by FBI
• SRM 2392: 3 mt genome samples
• SRM 2392-I: HL-60 cell line
• 22 tRNAs, 2 rRNAs, 13 proteins
• 1122 bp “control” region
• 15,447 bp coding region, of which only 55 bp is non-coding
• 28 genes on the H strand
• 8 tRNAs + ND6 on the L strand
Top 40 Variants in MitoMap

There are twelve variants with ≥50% overall frequency that are widespread across all lineages (bold).

Variants present at ≥80% in lineages L, M, or N are in yellow. Variants present at ≥50% are in light blue.
Top 40 Variants in MitoMap

Additional variants present at ≥50% in lineages L, M, or N (in light blue), or in ≥10,000 sequences overall (in light beige)
Frequency of Selected SNPs
Locations of Selected SNPs
SNP 16519 A/T

16381 TCAGATAGGG GTCCCTTGAC CACCATCCTC CGTGAAATCA ATATCCCGCA CAAGAGTGCT
      AGTCTATCCC CAGGGAACTG GTGGTAGGAG GCACTTTAGT TATAGGGCGT GTTCTCACGA

16441 ACTCTCCTCG CTCCGGGCCC ATAACACTTG GGGGTAGCTA AAGTGAACTG TATCCGACAT
      TGAGAGGAGC GAGGCCCGGG TATTGTGAAC CCCCATCGAT TTCACTTGAC ATAGGCTGTA

16501 CTGGTTCCTA CTTCAGGGTC ATAAAGCCTA AATAGCCCAC ACGTTCCCCT TAAATAAGAC
      GACCAAGGAT GAAGTCCCAG TATTTCGGAT TTATCGGGTG TGCAAGGGGA ATTTATTCTG
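As a quick check of the excerpt above, here is a minimal Python sketch that looks up position 16519 in the displayed top strand. It assumes the three labeled rows are contiguous 60-base rows starting at 16381, 16441, and 16501; the names ROWS, COMPLEMENT, and base_at are illustrative and not part of any mtDNA tool.

```python
# Top-strand rows as displayed above, keyed by the position of their first base.
ROWS = {
    16381: "TCAGATAGGG GTCCCTTGAC CACCATCCTC CGTGAAATCA ATATCCCGCA CAAGAGTGCT",
    16441: "ACTCTCCTCG CTCCGGGCCC ATAACACTTG GGGGTAGCTA AAGTGAACTG TATCCGACAT",
    16501: "CTGGTTCCTA CTTCAGGGTC ATAAAGCCTA AATAGCCCAC ACGTTCCCCT TAAATAAGAC",
}

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def base_at(position: int) -> str:
    """Return the displayed top-strand base at a 1-based mtDNA position."""
    for start, row in ROWS.items():
        seq = row.replace(" ", "")  # drop the spaces between 10-base blocks
        if start <= position < start + len(seq):
            return seq[position - start]
    raise KeyError(f"position {position} is outside the displayed excerpt")


top = base_at(16519)
print(top, COMPLEMENT[top])
```

Under these assumptions the top strand carries T at 16519 and the complementary strand carries A, which is consistent with the A/T label on the slide.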
Haplogroup Markers
• https://www.mitomap.org/foswiki/bin/view/MITOMAP/HaplogroupMarkers
History of mtDNA Analysis

• RFLP – 1980s – using 5-6 restriction enzymes


• PCR amplification of 9 overlapping fragments
and digestion with 12-14 restriction enzymes
– Haplogroups have been defined based on site losses
or gains with various restriction endonucleases
• DNA sequence analysis of portions of the control region (HVI & HVII) – 1990s
– Power of discrimination is less than autosomal STR testing … but better than no data at all
• Dec. 2000 – published study of 53 complete mtDNA sequences of people from around the world
Challenges with Sanger Sequencing
beyond poly-C stretches
• C-stretches in HVI (16184-16193, T at 16189) & HVII (303-315, T at 310); can be all C’s in the case of a homopolymeric C-stretch
• Create problems for polymerases (slippage), especially after T to C at 16189
• Can screen for this in advance; remedy by using different primers or by sequencing the same stretch twice
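The advance screen mentioned above could look something like the following Python sketch, which measures the longest uninterrupted run of C in the HV1 window. The two window strings are written from the slide's description (C's at 16184-16193 with a T at 16189, or all C's after the T to C change), and the min_run threshold of 8 is a hypothetical value chosen for illustration, not a published cutoff.

```python
import re


def longest_c_run(seq: str) -> int:
    """Length of the longest uninterrupted run of C in seq."""
    runs = re.findall(r"C+", seq.upper())
    return max((len(run) for run in runs), default=0)


def needs_c_stretch_strategy(window: str, min_run: int = 8) -> bool:
    """Flag a sample whose C run is long enough to risk polymerase slippage.

    min_run is a hypothetical threshold for this sketch.
    """
    return longest_c_run(window) >= min_run


REFERENCE_WINDOW = "CCCCCTCCCC"  # 16184-16193 with T at 16189
VARIANT_WINDOW = "CCCCCCCCCC"    # same window after T to C at 16189

print(longest_c_run(REFERENCE_WINDOW), needs_c_stretch_strategy(REFERENCE_WINDOW))
print(longest_c_run(VARIANT_WINDOW), needs_c_stretch_strategy(VARIANT_WINDOW))
```

A flagged sample would then be routed to one of the remedies listed above, such as alternative primers or a repeated read of the same stretch.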
Sanger Sequence:
Out-of-Frame after poly-C Stretch
Figure 14.6: Comparison of a sample with (A) 16189T (no HV1 C-stretch; good quality sequence) to (B) one with the HV1 C-stretch (poor quality sequence; two length variants out of phase). (C) Primer strategies typically used with C-stretch containing samples: use of internal primers, or double reactions from the same strand.
Mito NGS Tools
• AFDIL method (Fendt)
• ForenSeq mtDNA Control Region and Whole Genome kits (Verogen)
• Mito kits, control region and whole genome (Qiagen)
• Precision ID mtDNA Whole Genome Panel
(ThermoFisher Scientific)
Working with mito DNA
• Digest nuclear DNA to avoid interference with amplification by the mtDNA assay primer set
• REPLI-g (Qiagen): amplify human mtDNA via whole genome amplification (WGA) to enrich the sample
ForenSeq mtDNA Control Region Kit

• Verogen
• Two primer sets of 122 primers, amplified
separately then combined post-amp
• 18 primary amplicons
• < 150 bp in length
• Average primary amplicon size is 118 bp
• Recommended DNA input - 50 pg

https://verogen.com/introduction-mtdna-analysis/
ForenSeq Sequencing and Analysis

• Verogen MiSeq FGx (upgraded Illumina MiSeq model)
• Up to 48 samples can be multiplexed for
sequencing in a single run.
• ForenSeq Universal Analysis Software v2
• Amplicons overlap by ≥ 3 bp for
bioinformatics alignment
• Other tools - mtDNA BaseSpace applications
Verogen mtDNA Analysis Software (UAS)

• Displays a coverage map and calls the base at each position as compared to the rCRS
• Read count
• Percentage of each base detected at each position presented numerically and graphically
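To make the read count and per-base percentage outputs concrete, here is a minimal Python sketch of that kind of per-position summary. It is not how the UAS computes its values; the function name summarize_position and the 10% minor_threshold are assumptions made for illustration only.

```python
from collections import Counter

BASES = "ACGT"
VALID = set(BASES)


def summarize_position(calls, minor_threshold=10.0):
    """Summarize the read-level base calls observed at one position.

    Returns (read_count, percent_by_base, call). The call lists every base at
    or above minor_threshold percent, most frequent first, so a mixed position
    is reported like "T/C". minor_threshold is a hypothetical setting.
    """
    counts = Counter(c for c in calls if c in VALID)
    read_count = sum(counts.values())
    if read_count == 0:
        return 0, {}, "N"
    pct = {b: 100.0 * counts[b] / read_count for b in BASES}
    reported = sorted(
        (b for b in BASES if pct[b] >= minor_threshold),
        key=lambda b: pct[b],
        reverse=True,
    )
    return read_count, pct, "/".join(reported) or "N"


# Hypothetical pileup at position 16093: 70 reads with T, 30 reads with C.
print(summarize_position("T" * 70 + "C" * 30))
```

A position where a second base clears the threshold is a candidate for heteroplasmy and would still need review against the laboratory's own interpretation criteria.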
Precision ID mtDNA Whole Genome
Panel and Control Region Panel
• ThermoFisher
• Precision ID mtDNA Whole Genome Panel kit optimal input - 125 pg
(as little as 2 pg of DNA can be used)
• Overlapping tile approach
• 163 bp amplicon, on average
• NGS chip preparation - Ion Chef robot or manually
• Ion GeneStudio S5 Systems sequencing
• mtDNA Control region library
– 37 samples on the Ion 510 chip
– 56 samples on an Ion 520 chip
• mtDNA whole genome library
– 25 samples on the Ion 520 chip
– 32 samples on the Ion 530 chip
Data Analysis
• Applied Biosystems Converge Software
NGS Data Analysis module – base calling,
alignment
Qiagen Whole Genome and
Control Region NGS
• AFDIL primer set
• Qiagen Multiplex PCR Kit
• Sequence on Illumina NGS platform (e.g., HiSeq,
NextSeq, MiSeq, or MiniSeq)
• QIAseq 1-Step Amplicon Library Kit
• Qiagen GeneRead Adapter I Set A 12-plex includes
twelve barcoded adapters for ligation to the DNA
library.
• QIAseq Index Kit
• Data analysis: BaseSpace
Three mtDNA HV regions (Figure 14.5)
• HV1: 16024-16365 (342 bp), contains a C-stretch
• VR1: between HV1 and HV2, spans the 16569/1 junction and includes 16519
• HV2: 73-340 (268 bp), contains a C-stretch
• VR2: between HV2 and HV3
• HV3: 438-574 (137 bp)
AFDIL primer set
• PSI: F15989/R16251 (263 bp)
• PSII: F16190/R16410 (221 bp)
• PSIII: F15/R285 (271 bp)
• PSIV: F155/R381 (227 bp)
AFDIL “mini-primer” set
• MPS1A (170 bp), MPS1B (126 bp)
• MPS2A (133 bp), MPS2B (143 bp)
• MPS3A (126 bp), MPS3B (132 bp)
• MPS4A (142 bp), MPS4B (158 bp)
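The region and amplicon sizes in this figure follow from inclusive coordinate arithmetic on a circular 16,569 bp genome. The Python sketch below reproduces them. It assumes that the primer names (F15989, R16251, and so on) give the outermost positions of each amplicon, and it adds one coordinate pair that is not in the figure, the control region at 16024-576, to show the wrap across the 16569/1 junction.

```python
MT_LENGTH = 16569  # length of the rCRS in bp


def span_length(start: int, end: int, genome_length: int = MT_LENGTH) -> int:
    """Inclusive length from start to end on a circular genome.

    When end < start, the span is taken to wrap past the 16569/1 junction.
    """
    if end >= start:
        return end - start + 1
    return (genome_length - start + 1) + end


SPANS = {
    "HV1": (16024, 16365),
    "HV2": (73, 340),
    "HV3": (438, 574),
    "PSI": (15989, 16251),
    "PSII": (16190, 16410),
    "PSIII": (15, 285),
    "PSIV": (155, 381),
    "control region": (16024, 576),  # assumption: not listed in the figure
}

for name, (start, end) in SPANS.items():
    print(f"{name}: {span_length(start, end)} bp")
```

The wrapped control region span comes out at 1122 bp, matching the control region size quoted earlier in the deck.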


SWGDAM NGS Guidelines
• Scientific Working Group on DNA Analysis
Methods (SWGDAM) “Interpretation
Guidelines for Mitochondrial DNA Analysis
by Forensic DNA Testing Laboratories”
• 2019 document updates include sections
on sequence analysis criteria,
interpretation of C-stretches, and mixture
interpretation.
SWGDAM 2003 Recommendations
■ Exclusion - if there are two or more nucleotide differences
between the questioned and known samples, the samples
can be excluded as originating from the same person or
maternal lineage.
■ Inconclusive - if there is one nucleotide difference
between the questioned and known samples, the result will
be inconclusive.
■ Cannot Exclude (Failure to Exclude) - if the sequences
from questioned and known samples under comparison
have a common base at each position or a common length
variant in the HV2 C-stretch, the samples cannot be
excluded as originating from the same person or maternal
lineage.
Some Examples
Sequence results (Q = questioned, K = known) | Interpretation
Q TATTGTACGG | Cannot exclude (fully concordant)
K TATTGTACGG |
Q TATTGCACAG | Exclusion (differ at two positions)
K TATTGTACGG |
Q TATTNTACGG | Cannot exclude (unspecified base, common bases)
K TATTGTACGG |
Q TATTNTACGG | Cannot exclude (ambiguous, common bases)
K TATTGTACNG |
Q TATTGTACA/GG | Cannot exclude (heteroplasmy)
K TATTGTACGG |
Q TATTGTACA/GG | Cannot exclude (heteroplasmy)
K TATTGTACA/GG |
Q TATTGCACGG | Inconclusive (one difference)
K TATTGTACGG |
Table 14.4
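The 2003 decision rule can be sketched in Python as a count of positions at which Q and K share no common base. This is only an illustration of the counting logic, not the SWGDAM procedure itself: it assumes pre-aligned sequences of equal length, writes A/G heteroplasmy as the IUPAC code R, and ignores length variants in the C-stretches. The names IUPAC, count_differences, and interpret are illustrative.

```python
# Bases compatible with each reported code.
IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"},            # confirmed A/G heteroplasmy
    "Y": {"C", "T"},            # confirmed C/T heteroplasmy
    "N": {"A", "C", "G", "T"},  # base could not be determined
}


def count_differences(q: str, k: str) -> int:
    """Count aligned positions where Q and K have no base in common."""
    if len(q) != len(k):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(q, k) if not (IUPAC[a] & IUPAC[b]))


def interpret(q: str, k: str) -> str:
    """Apply the two-or-more / one / zero difference rule."""
    differences = count_differences(q, k)
    if differences == 0:
        return "Cannot exclude"
    if differences == 1:
        return "Inconclusive"
    return "Exclusion"


print(interpret("TATTGCACAG", "TATTGTACGG"))  # two differences
print(interpret("TATTGTACRG", "TATTGTACGG"))  # A/G heteroplasmy in Q
print(interpret("TATTGCACGG", "TATTGTACGG"))  # one difference
```

Treating N and heteroplasmy codes as sets of possible bases is what lets an ambiguous or mixed position count as a common base instead of a difference.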
Combined DNA Index System
(CODIS)
• U.S. National DNA Index System (NDIS)
Procedures Board approved (5/2/19) the
use of the MiSeq FGx® Forensic Genomics
System to collect data that can be entered
into NDIS.
• Control region data collected using the
Precision ID mtDNA Whole Genome Panel
was first to be approved for inclusion in the
NDIS CODIS database.
PCR Melt Results
PCR Melt Results 16519
Process for evaluation of mtDNA samples in a mass disaster (Figure 14.3)
• Evidence (Q) sample and reference (K) sample each go through the same steps; the K sample is performed separately and preferably after the evidence is completed
– Extract mtDNA
– PCR amplify HV1 and HV2 regions
– Sequence HV1 and HV2 amplicons (both strands)
– Confirm sequence with forward and reverse strands
– Note differences from Anderson (reference) sequence
• Compare Q and K sequences
• Compare with database to determine haplotype frequency
• Need clean lab: PCR cycles 36-42
• Assess contamination with reagent blank, negative control, HL-60 positive control, elimination samples
Comparison of Sequence Alignments for Hypothetical Q and K Samples

(A) mtDNA Sequences Aligned with rCRS (positions 16081-16140)

          16090      16100      16110      16120      16130      16140
rCRS ACCGCTATGT ATTTCGTACA TTACTGCCAG CCACCATGAA TATTGTACGG TACCATAAAT
Q    ACCGCTATGT ATCTCGTACA TTACTGCCAG CCACCATGAA TATTGTACAG TACCATAAAT
K    ACCGCTATGT ATCTCGTACA TTACTGCCAG CCACCATGAA TATTGTACAG TACCATAAAT

(B) Reporting Format with Differences from rCRS

Sample Q | Sample K
16093C | 16093C
16129A | 16129A
Figure 14.7
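The step from panel (A) to panel (B) can be sketched in Python as a position-by-position comparison against the rCRS. The sketch assumes that the ruler labels (16090 to 16140) mark the last base of each 10-base block, so the first displayed base is 16081, and it handles substitutions only, not insertions or deletions. The function name differences is illustrative.

```python
RCRS = "ACCGCTATGT ATTTCGTACA TTACTGCCAG CCACCATGAA TATTGTACGG TACCATAAAT"
Q = "ACCGCTATGT ATCTCGTACA TTACTGCCAG CCACCATGAA TATTGTACAG TACCATAAAT"
FIRST_POSITION = 16081  # assumption: ruler labels mark the end of each block


def differences(sample: str, reference: str, first_position: int) -> list:
    """List substitutions relative to the reference as position + base."""
    sample = sample.replace(" ", "")
    reference = reference.replace(" ", "")
    if len(sample) != len(reference):
        raise ValueError("sample and reference must be aligned to equal length")
    return [
        f"{first_position + i}{s}"
        for i, (s, r) in enumerate(zip(sample, reference))
        if s != r
    ]


print(differences(Q, RCRS, FIRST_POSITION))  # ['16093C', '16129A']
```

Because K has the same sequence as Q in this figure, it yields the same two entries, which is why the two reporting columns match.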
Discrimination Power

• Some haplotypes are common


• FBI mtDNA database of 1655 Caucasians: 15 individuals match at 263G, 315.1C and 153 others differ by only a single position, so 168/1655 of the database (10.2%) would not be excluded if a sample were observed with this common type
Reporting Results
• Minimum data format as compared to rCRS: position + base, e.g., 16129A
• If a base cannot be unambiguously determined: N
• In confirmed heteroplasmy: A/G = R, C/T = Y
• Insertions are noted using the site immediately 5’ to the insertion, followed by a point and “1” for the first inserted base, “2” for the second, and so on: 315.1C (6 C’s instead of 5 following the T at 310, i.e., one additional C prior to 316)
• Deletions are noted by a dash or a “D,” “d,” or “del” following the position where the deletion was observed (309D, 309d, or 309del)
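The reporting conventions above are regular enough to parse mechanically. The Python sketch below is one way to do so; the Variant type, the kind labels, and the pattern are assumptions for illustration and cover only the notations listed on this slide (position + base, N, R, Y, point-numbered insertions, and the four deletion markers).

```python
import re
from typing import NamedTuple, Optional


class Variant(NamedTuple):
    position: int
    kind: str  # substitution, heteroplasmy, ambiguous, insertion, or deletion
    base: Optional[str]
    insertion_index: Optional[int]


# position, optional ".n" insertion index, then a base code or deletion marker
PATTERN = re.compile(r"(\d+)(?:\.(\d+))?(del|[ACGTRYN]|[-Dd])")
DELETION_MARKERS = {"-", "D", "d", "del"}


def parse_variant(text: str) -> Variant:
    """Parse one reported difference such as 16129A, 315.1C, or 309del."""
    match = PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognized variant notation: {text!r}")
    position = int(match.group(1))
    index, token = match.group(2), match.group(3)
    is_deletion = token in DELETION_MARKERS
    if index is not None:
        if is_deletion:
            raise ValueError(f"insertion needs a base, got: {text!r}")
        return Variant(position, "insertion", token, int(index))
    if is_deletion:
        return Variant(position, "deletion", None, None)
    kinds = {"N": "ambiguous", "R": "heteroplasmy", "Y": "heteroplasmy"}
    return Variant(position, kinds.get(token, "substitution"), token, None)


for text in ("16129A", "315.1C", "309del", "16093Y"):
    print(parse_variant(text))
```

Normalizing the three deletion spellings to a single kind is the point of the exercise: it keeps the same haplotype from being recorded in more than one form.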
Nomenclature Issues
• Two different analysts may describe the same sample differently
• Population databases could then have multiple entries for the same mtDNA haplotype, preventing an accurate estimate of the frequency of a particular type
• 1. Characterize profiles using the fewest differences from the reference sequence (rCRS)
• 2. If there is more than one way to maintain the same number of differences with respect to the reference sequence, prioritize in this order:
– Insertions/deletions (indels)
– Transitions
– Transversions
• 3. Insertions and deletions should be placed 3’ with respect to the light strand. Insertions and deletions should be combined in situations where the same number of differences to the reference sequence is maintained
Mixtures
• Mixture analysis is generally not attempted with mtDNA sequencing
• 3 or more heteroplasmic sites in HV1 &
HV2 usually indicate mixture
• Nuclear pseudogenes (chromosome 11)
can amplify and contaminate mtDNA
sequence
Laboratories Performing mtDNA
Testing
• First efforts by Forensic Science Service in
England
• Within US, Armed Forces DNA
Identification Laboratory and the FBI Lab
have led the efforts
Armed Forces DNA Identification
Laboratory (AFDIL)
• Rockville, MD
• Goal: identify remains of military personnel
– Bones from Vietnam, WWII, Korea
– Some contract civilian work including mass
disaster victim identification
FBI Laboratory
• Focuses on use of forensic evidence including
mtDNA in criminal investigations
• Explored use of mtDNA in the late 1980s and started using the methods in 1992
• FBI Laboratory DNA Unit II – mtDNA casework
since June 1996
• First court testimony in Aug. 1996
– State of Tennessee versus Paul William Ware
involving DNA analysis of a single pubic hair found
in throat of young victim that matched the defendant
FBI Regional Laboratories

• In fall 2003, 4 regional labs were selected to be fully funded by the FBI to do mtDNA analysis; they were expected to be fully functional by Sept. 2005 and able to analyze 120 cases each annually
– Arizona Department of Public Safety
– Connecticut State Police
– Minnesota Bureau of Criminal Apprehension
– New Jersey State Police
Private Laboratories Conducting
Forensic mtDNA Casework
• Mitotyping Technologies, LLC
• ReliaGene Technologies, Inc.
• Bode Technology Group
• Orchid Cellmark
• University of North Texas Health Science Center DNA Identity Lab
• Laboratory Corporation of America
• AFDIL Consulting Services
