Rat Genomics (001-095)


Methods in Molecular Biology 2018

G. Thomas Hayman
Jennifer R. Smith
Melinda R. Dwinell
Mary Shimoyama Editors

Rat
Genomics
METHODS IN MOLECULAR BIOLOGY

Series Editor
John M. Walker
School of Life and Medical Sciences
University of Hertfordshire
Hatfield, Hertfordshire, UK

For further volumes:
http://www.springer.com/series/7651
For over 35 years, biological scientists have come to rely on the research protocols and
methodologies in the critically acclaimed Methods in Molecular Biology series. The series
was the first to introduce the step-by-step protocols approach that has become the standard
in all biomedical protocol publishing. Each protocol is provided in readily-reproducible
step-by-step fashion, opening with an introductory overview, a list of the materials and
reagents needed to complete the experiment, and followed by a detailed procedure that is
supported with a helpful notes section offering tips and tricks of the trade as well as
troubleshooting advice. These hallmark features were introduced by series editor Dr. John
Walker and constitute the key ingredient in each and every volume of the Methods in
Molecular Biology series. Tested and trusted, comprehensive and reliable, all protocols
from the series are indexed in PubMed.
Rat Genomics

Edited by

G. Thomas Hayman
Department of Biomedical Engineering, Rat Genome Database, Medical College of Wisconsin,
Milwaukee, WI, USA

Jennifer R. Smith
Department of Biomedical Engineering, Rat Genome Database, Medical College of Wisconsin,
Milwaukee, WI, USA

Melinda R. Dwinell
Genomic Sciences and Precision Medicine Center, Medical College of Wisconsin, Milwaukee, WI, USA
Department of Physiology, Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI,
USA

Mary Shimoyama
Department of Biomedical Engineering, Rat Genome Database, Medical College of Wisconsin,
Milwaukee, WI, USA

Editors

G. Thomas Hayman
Department of Biomedical Engineering
Rat Genome Database
Medical College of Wisconsin
Milwaukee, WI, USA

Jennifer R. Smith
Department of Biomedical Engineering
Rat Genome Database
Medical College of Wisconsin
Milwaukee, WI, USA

Melinda R. Dwinell
Genomic Sciences and Precision Medicine Center
Medical College of Wisconsin
Milwaukee, WI, USA
Department of Physiology
Rat Genome Database
Medical College of Wisconsin
Milwaukee, WI, USA

Mary Shimoyama
Department of Biomedical Engineering
Rat Genome Database
Medical College of Wisconsin
Milwaukee, WI, USA

ISSN 1064-3745 ISSN 1940-6029 (electronic)


Methods in Molecular Biology
ISBN 978-1-4939-9580-6 ISBN 978-1-4939-9581-3 (eBook)
https://doi.org/10.1007/978-1-4939-9581-3
© Springer Science+Business Media, LLC, part of Springer Nature 2019
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction
on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation,
computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not
imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and
regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed
to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty,
express or implied, with respect to the material contained herein or for any errors or omissions that may have been
made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional
affiliations.

This Humana imprint is published by the registered company Springer Science+Business Media, LLC, part of Springer
Nature.
The registered company address is: 233 Spring Street, New York, NY 10013, U.S.A.
Dedication

This book is gratefully dedicated to the rat research community and to colleagues and
friends who are gone too soon: Rat Genome Database members, Dr. Timothy F. Lowry
and Dr. Victoria Petri, and founder of the Rat Resource and Research Center, Dr. John
K. Critser. They are sincerely missed.

Preface

It is an exciting time to be involved in rat research. The rich history of physiological and
behavioral data that is available for the rat spans over 150 years of research. This extensive
body of data provides a solid foundation for the use of genomic technologies such as
whole-genome and whole-exome sequencing, single-nucleotide variant discovery, and
transcriptomics to explore similarities and differences between established rat models for
human diseases such as kidney disease, cancer, and metabolic syndrome, as well as new
models like the hybrid rat diversity panel. Recent advances in the use of genome-editing
reagents and embryonic stem cells now allow researchers to produce new, more targeted
models and to discover the molecular mechanisms underlying both normal and
disease-related physiological processes. Thanks to the improvements in cryopreservation and
rederivation, new models can be produced, studied for a period of time, and then preserved
and stored to await new questions or the advent of new technologies to uncover the
answers to questions we can’t answer now and, in some cases, don’t even know we should
be asking. The emerging areas of interest, such as the microbiome, have opened up new
vistas for researchers interested in the interactions between genetics and the environment.
This book provides both a historical perspective on rat research through the years and
practical information to support researchers either currently involved in genomic research
or planning to begin such a project. In some cases, a detailed protocol is provided for
researchers looking to move into a new area of investigation or to leverage a new
technology. In other cases, a detailed review of the existing models or a description of
available resources can help the researcher find, understand, and/or utilize the information,
the data, and the tools that they need to support their research efforts. Whatever the
application, it is becoming increasingly obvious that in this so-called post-genomic era, no
single type of research is sufficient to answer the increasingly complex questions of human
disease and translational research. The rat as a biomedical model is uniquely poised to
provide the ideal combination of established experimental models, extensive physiological
data, and genomic manipulability to facilitate exploration of the underlying biology.

Milwaukee, WI, USA Jennifer R. Smith

Acknowledgments

We are thankful for nearly 20 years of funding from the National Heart, Lung, and Blood
Institute on behalf of the National Institutes of Health. We appreciate the contributions of
the authors and the assistance of Prof. John Walker and Ms. Anna Rakovsky, which helped
make this book a reality.

Contents

Dedication
Preface
Acknowledgments
Contributors

1 The Rat: A Model Used in Biomedical Research
Jennifer R. Smith, Elizabeth R. Bolton, and Melinda R. Dwinell
2 Rat Genome Assemblies, Annotation, and Variant Repository
Monika Tutaj, Jennifer R. Smith, and Elizabeth R. Bolton
3 Rat Genome Databases, Repositories, and Tools
Stanley J. F. Laulederkind, G. Thomas Hayman, Shur-Jen Wang,
Matthew J. Hoffman, Jennifer R. Smith, Elizabeth R. Bolton,
Jeff De Pons, Marek A. Tutaj, Monika Tutaj, Jyothi Thota,
Melinda R. Dwinell, and Mary Shimoyama
4 Next Generation Transgenic Rat Model Production
Wanda E. Filipiak, Elizabeth D. Hughes, Galina B. Gavrilina,
Anna K. LaForest, and Thomas L. Saunders
5 Embryonic Stem Cells and Gene Manipulation in Rat
Masumi Hirabayashi, Akiko Takizawa, and Shinichi Hochi
6 Protocols for Cryopreservation and Rederivation of Rat Gametes
Akiko Takizawa and Tomoo Eto
7 Fluorescent Imaging and Microscopy for Dynamic Processes in Rats
Ruben M. Sandoval, Bruce A. Molitoris, and Oleg Palygin
8 Library Preparation for Multiplexed Reduced Representation
Bisulfite Sequencing with a Universal Adapter
Yong Liu, Alison J. Kriegel, and Mingyu Liang
9 Characterization of the Rat Gut Microbiota via 16S rRNA
Amplicon Library Sequencing
Aaron C. Ericsson, Susheel B. Busi, and James M. Amos-Landgraf
10 Networking in Biology: The Hybrid Rat Diversity Panel
Boris Tabakoff, Harry Smith, Lauren A. Vanderlinden,
Paula L. Hoffman, and Laura M. Saba
11 Using Heterogeneous Stocks for Fine-Mapping Genetically
Complex Traits
Leah C. Solberg Woods and Abraham A. Palmer
12 Mapping Mammary Tumor Traits in the Rat
Michael J. Flister, Amit Joshi, Carmen Bergom, and Hallgeir Rui
13 Rat Models of Metabolic Syndrome
Anne E. Kwitek
14 Genomic Research in Rat Models of Kidney Disease
Yoram Yagil, Ronen Levi-Varadi, and Chana Yagil
15 Rat Models of Exercise for the Study of Complex Disease
Lauren Gerard Koch and Steven L. Britton
16 Behavioral Genetic Studies in Rats
Yangsu Ren and Abraham A. Palmer

Index
Contributors

JAMES M. AMOS-LANDGRAF ● Department of Veterinary Pathology, College of Veterinary
Medicine, University of Missouri, Columbia, MO, USA
CARMEN BERGOM ● Department of Radiation Oncology, Medical College of Wisconsin,
Milwaukee, WI, USA
ELIZABETH R. BOLTON ● Department of Biomedical Engineering, Rat Genome Database,
Medical College of Wisconsin, Milwaukee, WI, USA
STEVEN L. BRITTON ● Department of Anesthesiology, University of Michigan, Ann Arbor, MI,
USA
SUSHEEL B. BUSI ● Department of Veterinary Pathology, College of Veterinary Medicine,
University of Missouri, Columbia, MO, USA
JEFF DE PONS ● Department of Biomedical Engineering, Rat Genome Database, Medical
College of Wisconsin, Milwaukee, WI, USA
MELINDA R. DWINELL ● Genomic Sciences and Precision Medicine Center, Medical College of
Wisconsin, Milwaukee, WI, USA; Department of Physiology, Rat Genome Database,
Medical College of Wisconsin, Milwaukee, WI, USA
AARON C. ERICSSON ● Department of Veterinary Pathology, College of Veterinary Medicine,
University of Missouri, Columbia, MO, USA
TOMOO ETO ● Central Institute for Experimental Animals, Kawasaki, Kanagawa, Japan
WANDA E. FILIPIAK ● Transgenic Animal Model Core, University of Michigan Medical School,
Ann Arbor, MI, USA
MICHAEL J. FLISTER ● Genomic Sciences and Precision Medicine Center, Medical College of
Wisconsin, Milwaukee, WI, USA; Department of Physiology, Medical College of Wisconsin,
Milwaukee, WI, USA
GALINA B. GAVRILINA ● Transgenic Animal Model Core, University of Michigan Medical School,
Ann Arbor, MI, USA
G. THOMAS HAYMAN ● Department of Biomedical Engineering, Rat Genome Database, Medical
College of Wisconsin, Milwaukee, WI, USA
MASUMI HIRABAYASHI ● Center for Genetic Analysis of Behavior, National Institute for Physiological
Sciences, Okazaki, Aichi, Japan
SHINICHI HOCHI ● Faculty of Textile Science and Technology, Shinshu University, Ueda,
Nagano, Japan
MATTHEW J. HOFFMAN ● Department of Biomedical Engineering, Rat Genome Database,
Medical College of Wisconsin, Milwaukee, WI, USA; Genomic Sciences and Precision
Medicine Center, Medical College of Wisconsin, Milwaukee, WI, USA; Department of
Physiology, Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
PAULA L. HOFFMAN ● Department of Pharmaceutical Sciences, Skaggs School of Pharmacy
and Pharmaceutical Sciences, University of Colorado, Aurora, CO, USA; Department of
Pharmacology, School of Medicine, University of Colorado Anschutz Medical Campus,
Aurora, CO, USA
ELIZABETH D. HUGHES ● Transgenic Animal Model Core, University of Michigan Medical
School, Ann Arbor, MI, USA
AMIT JOSHI ● Department of Radiology, Medical College of Wisconsin, Milwaukee, WI, USA

LAUREN GERARD KOCH ● Department of Physiology and Pharmacology, University of Toledo,
Toledo, OH, USA
ALISON J. KRIEGEL ● Department of Physiology, Center of Systems Molecular Medicine, Medical
College of Wisconsin, Milwaukee, WI, USA
ANNE E. KWITEK ● Department of Physiology, Medical College of Wisconsin, Milwaukee, WI,
USA
ANNA K. LAFOREST ● Transgenic Animal Model Core, University of Michigan Medical School,
Ann Arbor, MI, USA
STANLEY J. F. LAULEDERKIND ● Department of Biomedical Engineering, Rat Genome
Database, Medical College of Wisconsin, Milwaukee, WI, USA
RONEN LEVI-VARADI ● Laboratory for Molecular Medicine, Israeli Rat Genome Center,
Barzilai University Medical Center, Ashkelon, Israel; Department of Nephrology and
Hypertension, Barzilai University Medical Center, Ashkelon, Israel; Faculty of Health
Sciences, Ben-Gurion University of the Negev, Beer-Sheba, Israel
MINGYU LIANG ● Department of Physiology, Center of Systems Molecular Medicine, Medical
College of Wisconsin, Milwaukee, WI, USA
YONG LIU ● Department of Physiology, Center of Systems Molecular Medicine, Medical College
of Wisconsin, Milwaukee, WI, USA
BRUCE A. MOLITORIS ● Division of Nephrology, Indiana University School of Medicine,
Indianapolis, IN, USA; Indiana Center for Biological Microscopy, Indianapolis, IN, USA
ABRAHAM A. PALMER ● Department of Psychiatry, University of California San Diego, La
Jolla, CA, USA; Institute for Genomic Medicine, University of California San Diego, La
Jolla, CA, USA
OLEG PALYGIN ● Department of Physiology, Medical College of Wisconsin, Milwaukee, WI,
USA
YANGSU REN ● Department of Psychiatry, University of California San Diego, La Jolla, CA,
USA
HALLGEIR RUI ● Department of Pathology, Medical College of Wisconsin, Milwaukee, WI,
USA
LAURA M. SABA ● Department of Pharmaceutical Sciences, Skaggs School of Pharmacy and
Pharmaceutical Sciences, University of Colorado, Aurora, CO, USA
RUBEN M. SANDOVAL ● Division of Nephrology, Indiana University School of Medicine,
Indianapolis, IN, USA; Indiana Center for Biological Microscopy, Indianapolis, IN, USA
THOMAS L. SAUNDERS ● Transgenic Animal Model Core, University of Michigan Medical
School, Ann Arbor, MI, USA; Division of Genetic Medicine Genetics, Department of
Internal Medicine, University of Michigan Medical School, Ann Arbor, MI, USA
MARY SHIMOYAMA ● Department of Biomedical Engineering, Rat Genome Database, Medical
College of Wisconsin, Milwaukee, WI, USA
HARRY SMITH ● Department of Biostatistics and Informatics, Colorado School of Public
Health, University of Colorado, Aurora, CO, USA
JENNIFER R. SMITH ● Department of Biomedical Engineering, Rat Genome Database,
Medical College of Wisconsin, Milwaukee, WI, USA
LEAH C. SOLBERG WOODS ● Department of Internal Medicine, Section on Molecular Medicine,
Wake Forest University School of Medicine, Winston Salem, NC, USA
BORIS TABAKOFF ● Department of Pharmaceutical Sciences, Skaggs School of Pharmacy and
Pharmaceutical Sciences, University of Colorado, Aurora, CO, USA
AKIKO TAKIZAWA ● Department of Physiology, Genomic Sciences and Precision Medicine
Center, Medical College of Wisconsin, Milwaukee, WI, USA
JYOTHI THOTA ● Department of Biomedical Engineering, Rat Genome Database, Medical
College of Wisconsin, Milwaukee, WI, USA
MAREK A. TUTAJ ● Department of Biomedical Engineering, Rat Genome Database, Medical
College of Wisconsin, Milwaukee, WI, USA
MONIKA TUTAJ ● Department of Biomedical Engineering, Rat Genome Database, Medical
College of Wisconsin, Milwaukee, WI, USA
LAUREN A. VANDERLINDEN ● Department of Biostatistics and Informatics, Colorado School of
Public Health, University of Colorado, Aurora, CO, USA
SHUR-JEN WANG ● Department of Biomedical Engineering, Rat Genome Database, Medical
College of Wisconsin, Milwaukee, WI, USA
CHANA YAGIL ● Laboratory for Molecular Medicine, Israeli Rat Genome Center, Barzilai
University Medical Center, Ashkelon, Israel; Department of Nephrology and Hypertension,
Barzilai University Medical Center, Ashkelon, Israel; Faculty of Health Sciences, Ben-
Gurion University of the Negev, Beer-Sheba, Israel
YORAM YAGIL ● Laboratory for Molecular Medicine, Israeli Rat Genome Center, Barzilai
University Medical Center, Ashkelon, Israel; Department of Nephrology and Hypertension,
Barzilai University Medical Center, Ashkelon, Israel; Faculty of Health Sciences, Ben-
Gurion University of the Negev, Beer-Sheba, Israel
Chapter 1

The Rat: A Model Used in Biomedical Research


Jennifer R. Smith, Elizabeth R. Bolton, and Melinda R. Dwinell

Abstract
The laboratory rat, Rattus norvegicus, has been used in biomedical research for more than 150 years,
and in many cases remains the model of choice for studies of physiology, behavior, and complex
human disease. This book provides detailed information on a number of methodologies that can be
used in rat. This chapter gives an introduction to rat as a species and as a biomedical model,
providing historical information, a brief introduction to the current state of rat research, and a
perspective on the future of rat as a model for human disease.

Key words Rat, History, Models, Biomedical research, Resources, Data

1 The Evolutionary History of Rattus norvegicus

Rodents in general, and rats in particular, have been the subject of both affection and
hatred for hundreds of years. From the rats mentioned in the Yoso-tama-no-kakehashi
Japanese guidebook on raising rats, published in 1775, to those owned by today’s “rat
fanciers” (members of organizations such as the American Fancy Rat and Mouse
Association (AFRMA), the National Fancy Rat Society (NFRS) in the UK, and rat clubs
and rat fancier societies around the world), rats have been bred, raised, exhibited, and
loved as pets. One source (http://www.petrats.org/home_.aspx) estimates that nearly half
a million households in the USA own at least one pet rat or mouse.
On the other hand, across the world, rats are considered to be a major pest, destroying
farm crops and spreading disease to domestic animals and humans. It is estimated that
rats damage 1–5% of cereal crops worldwide
(http://www.knowledgebank.irri.org/step-by-step-production/postharvest/storage/storage-pests/rodents-as-storage-pest).
The ability of the rat to successfully invade, compete with existing species, and colonize
new territory has led to changes in ecosystems, especially those that are relatively
isolated such as islands. According to Lack et al. 2013 [1], “Of the approximately 123
island groups worldwide, about 82% have been invaded by R. norvegicus, R. rattus, or the
Polynesian rat (R. exulans; [2]), and recent reports estimated that introduced rats have
been responsible for 40–60% of all bird and reptile extinctions since 1600 (Island
Conservation 2006).” In addition, rats are well documented to transmit a substantial
number of zoonotic diseases including Hantavirus-related diseases such as hemorrhagic
fever with renal syndrome, salmonellosis, typhus, and, of course, plague [3–5].

G. Thomas Hayman et al. (eds.), Rat Genomics, Methods in Molecular Biology, vol. 2018,
https://doi.org/10.1007/978-1-4939-9581-3_1, © Springer Science+Business Media, LLC, part of Springer Nature 2019

With few exceptions (notably some areas of the Arctic and New Zealand, the continent of
Antarctica, and the Canadian province of Alberta), the species Rattus norvegicus is
essentially ubiquitous in its global distribution [6], and is considered a true commensal
with respect to humans (https://www.aciar.gov.au/node/8241), with the two species
uneasily coexisting in a relationship where the rats reap the benefits of warmth, shelter,
and abundant food supplies while the humans are, more or less, not directly harmed by
the association.
Although the general spread of the rat coinciding with the spread of humans is well
documented, the evolutionary origin of the species R. norvegicus remains unclear. There
is general agreement that the species first developed in Asia. Traditional understanding
placed central Asia, for example, Kazakhstan and/or south central Russia [7], or northern
China and southern Mongolia [8], as the place of origin for the species, whereas more
recent estimates based on archeological evidence, fossils, and bone remains suggested an
origin in southwestern China. Recent studies based on mitochondrial DNA sequencing [9]
and whole-genome sequencing [10] gave strong evidence for southeastern Asia as the area
where the species arose and used these analyses to predict the subsequent migration of
R. norvegicus and other rat species. Both groups postulated early migrations from
southern East Asia to northern China, followed by later migrations from southern East
Asia to the Middle East, Europe, and Africa. As an example, Zeng et al. [10] used
multiple analysis methods including the construction of phylogenetic trees using multiple
algorithms, principal component analysis, Bayesian clustering, demographic modeling,
and haplotype sharing analyses to demonstrate that rat populations from non-Asian
locales are more closely related to rats from southern East Asia than they are to rats
from northern China. Interestingly, their analysis suggested that the migration from
southern Asia to northern Asia was far earlier (~202,000 years ago) than the westward
migration. They dated the spread of the brown rat from southern Asia to the Middle East
at approximately 3600 years ago, to Africa at about 2600 years ago, and to Europe at
~1800 years ago, far earlier than previous estimates for the arrival of R. norvegicus in
Europe (estimated to be between 1500 and 1750 CE).
Mitochondrial DNA analysis has also been used to elucidate the colonization and spread
of two rat species, R. rattus and R. norvegicus, in the United States [1]. Haplotype
comparisons between samples from within and outside of the USA suggested that,
although R. rattus appeared to have expanded from relatively few (perhaps only one or
two) colonization events, R. norvegicus populations appeared to be substantially more
diverse, suggesting multiple colonization and migration events. Although the authors
admitted that their conclusions were somewhat speculative, they suggested that there
could have been two invasions of rats from East Asia into the US West Coast, in addition
to one from southern locations such as the Lesser Antilles, South America, and South
Africa, and another from Europe into the US East Coast. In each case, the rat populations
spread from their coastal locations across the neighboring inland areas, and in some
cases, across the country. Relatively high genetic diversity in coastal locations suggests
a continuing influx of individuals being integrated into the established coastal
populations.

2 The Early History of the Rat in Research

The use of rats in biomedical research began more than 160 years ago. The first recorded
use of rats for scientific investigation was a study by J. M. Philipeaux in 1856 [11] on
the results of adrenalectomy in albino rats (the 1836 article by Samuel Moss, “Notes on
the habits of a domesticated White Rat and a Terrier Dog that lived in harmony together”
notwithstanding). Since that time, rats have been used as models to study a wide variety
of biological, physiological, and medical subjects.

2.1 Nutrition

In 1993, Dr. Janet R. Hunt [12] stated, “Rats were the principal animal used to discover
most of the vitamins, the essential trace elements, and the essential amino acids. As a
result, more is known about the nutritional requirements of the rat than about any other
species.”
The first study of the nutritional quality of proteins in a mammal was an article
published in both The Lancet [13] and the Proceedings of the Royal Society of London
[14] in 1863 entitled “Experiments on food; its destination and uses.” In it, William
S. Savory detailed how he fed rats “nitrogenous” (high protein, very low fat),
“non-nitrogenous” (high carbohydrate and fat, very low protein), and mixed diets to
ascertain whether nitrogenous materials were utilized for “heat production,” tissue
formation, or both. Savory explained his use of rats in a footnote: “Rats were chosen as
subjects for these experiments because they are omnivorous and will readily feed on
almost any kind of diet. Moreover from their size they are very convenient to manage.”
In 1912, F. G. Hopkins [15] published a study in which he fed young rats, those at a
stage when rapid growth is expected, carefully controlled diets of protein, starch, sugar,
lard, and “salts” consisting of the ash of incinerated “oats and dog-biscuits” with or
without the addition of a small ration of milk each day. He found that the rats without
the milk grew slowly for the first 13 days, then began to lose weight. Within 4 weeks, 5
of the 6 rats that did not receive milk had died. By contrast, all of the rats given milk
grew normally for the entire experimental period. In additional experiments, he found
that addition of milk to the diets of the malnourished rats “rescued” their growth rate
and that addition of a larger amount of milk in the diet did not significantly improve the
growth rate. From these results, Hopkins concluded that milk contained some “accessory
factors,” possibly with a “catalytic or stimulative function,” since the amount needed
was so small. These “factors” we now know to be vitamins, and on the basis of his early
work in this field, Hopkins was awarded a Nobel Prize in 1929 “for his discovery of the
growth-stimulating vitamins.”
E. V. McCollum was an organic chemist who, through a number of circumstances, became
interested in biochemistry and animal nutrition. Early in his career he noted that in all
published cases, animals fed simplified diets restricted to isolated proteins,
carbohydrates, fats, and mineral salt mixtures failed to thrive. He decided then that
“the most important discovery to be made in nutrition would be the elucidation of the
cause or causes of these failures” [16].
In 1907 McCollum began a colony of albino rats originally purchased from a “pet-stock
dealer,” the first rat colony maintained for nutrition studies. He selected rats as his
research subjects because their size made them manageable and their shorter life span
made it easier to follow the effects of changes in diet throughout the life of the animal.
In 1913, McCollum and Davis published a paper on an ether-soluble extract of egg yolk or
butter that is necessary for healthy growth and reproduction in rats [17]. This “factor,”
separately discovered by T. B. Osborne and L. B. Mendel in the same year (also using rats)
and published in the same issue of the Journal of Biological Chemistry [18], was later
termed “Vitamin A” [19].
McCollum continued his studies with various simplified diets to determine what
components could be combined to provide the nutrition necessary for health in rats. Two
years later McCollum, again with Marguerite Davis, discovered a water-soluble factor
which promoted growth and was found to correct polyneuritis in pigeons fed an inadequate
diet of purified food stuffs [20, 21]. In 1916, he and Cornelia Kennedy lobbied to
eliminate the increasingly popular term “vitamine” which they maintained implied an
indispensability (vita) and suggested a specific chemical structure (amine), neither of
which had been shown [22]. Rather, they suggested the neutral terms “fat-soluble A” and
“water-soluble B.” By 1917 McCollum had begun to generalize his findings to the American
diet “as being of poor quality because it was derived too largely from white flour or
cornmeal, muscle meats, potatoes, and sugar” because the “foods listed ... were not
constituted to supplement each other by making good their deficiencies.” [16].
Like McCollum, T. B. Osborne and L. B. Mendel were early pioneers in the study of animal
nutrition using rats. In addition to the essentially simultaneous (with McCollum and
Davis) discovery of a factor in butter that increased growth in rats, Osborne, Mendel
et al. meticulously documented the effects on rats of diets composed of minimal purified
components supplemented with various amino acids, to determine which should be
considered essential and which could be manufactured by the body without direct
supplementation [23–25]; with cod liver oil as a source of fat-soluble factors,
especially “fat-soluble A”; and with trace minerals such as phosphorus, to demonstrate
that certain minerals are essential for health. Osborne, Mendel et al. demonstrated in
1914 that cod liver oil was able to reverse xerophthalmia in rats [26]. In addition to
numerous advancements in the field of nutrition, Osborne and Mendel, with their colleague
E. L. Ferry, were the first to design and utilize “metabolic cages” for their rats that
allowed them to closely monitor the animals’ intake and excretion, and to prevent
coprophagy which interfered with their attempts to do so [27].
Other discoveries in the field of nutrition for which the use of
rats was instrumental include the discovery by H. Steenbock that
ultraviolet irradiation increased the content of vitamin D in foods
and other organic materials [28, 29]. By irradiating the rodent food
he was able to reverse the symptoms of rickets in his rats.
H. C. Sherman made extensive use of rats in his studies, not only
of the types of supplementation experiments that McCollum and
Davis, Osborne and Mendel, and others performed (e.g., Sherman
and Pappenheimer [30]), but of the effects of diet on life span and
longevity [31, 32]. He is also cited as an early pioneer in the use of
statistics for the analysis of biological data [7]. As an illustration
of the importance of rat research to advances in nutrition research,
in the Journal of Nutrition’s 2008 Symposium on Animal Models
in Nutrition Research [33], rats were named as being instrumental
in the discovery of cures for six of the nine vitamin deficiency
diseases listed.

2.2 Breeding, Genetics, and Characterization

The practice of breeding rats for variations in coat color substantially
predates their use in laboratory science. It was common in Japan at
least as far back as the 1700s, as evidenced by the first guidebook for
breeding fancy rats, entitled Yoso-tama-no-kakehashi, published in 1775
[34]. Text and illustrations in the book describe and show a variety of
coat colors and patterns, many of which are still seen in modern
laboratory rat strains and fancy rat
lines. In addition, the author gave advice for breeding these rats so
as to not lose their “special characteristics.”
In his 1947 paper on the domestication of the rat, W. E. Castle
described a number of early studies of coat color inheritance in rats
[35]. Between 1877 and 1885, H. Crampe published a series of
articles detailing extensive breeding experiments beginning with a
tame albino female rat and a wild gray male and continuing with
successive rounds of interbreeding of the offspring. This pre-
Mendelian study showed inheritance patterns which we now know
to be controlled by three coat color mutations occurring at the c
(albino), a (non-agouti), and h (hooded) loci, although this
explanation for the various patterns of coat color and of
inheritance was far from clear at the time. Crampe’s prodigious
dataset was reviewed and reanalyzed using Mendelian principles
by Bateson in 1903. Doncaster used Crampe’s data to categorize
the offspring color patterns from his breeding experiments of
brown/gray, black, and albino rats, published in 1906. A
publication by MacCurdy and Castle in 1907 [36], which also
detailed breeding experiments to determine coat color in rats and
guinea pigs, was beginning to move closer to the idea that the
inheritance patterns of color and markings are controlled by more
than one factor, although obviously without an understanding of
the specific genes or mutations.
W. E. Castle began studying rat coat color mutations in 1907,
reporting the development of a “pink-eyed yellow” and a “red-
eyed yellow” mutant in England at that time [35]. Among other
pursuits, he and colleagues continued their exploration of the
genetics of coat color, mutations, and linkage in the rat through
1951, resulting in the publication of more than 20 articles on the
subject (e.g., [37–54]). Several of these were published with Dr.
Helen Dean King of the Wistar Institute, looking at the linkage
groups for the mutations—both those determining coat color and
physiological mutations such as “waltzing”—which were found
or confirmed by Dr. King during her domestication studies of wild
gray rats and breeding studies of the Wistar albinos. In the
aforementioned article on the domestication of the rat [35], Castle
lists 23 known mutations, 14 of which he was able to place into
four linkage groups, representing four of the 20 rat autosomes.
Dr. Henry H. Donaldson was a professor of neurology at the
University of Chicago while John B. Watson was working on his
degree. Donaldson had previously published papers on the nervous
systems of humans and frogs, but during his tenure at the University
of Chicago he was introduced to the rat as a model for human
neurology, probably through the influence of Swiss neuropathologist
Adolf Meyer. After careful consideration, Donaldson chose to
begin working with the albino rat. E. G. Conklin quoted him in his
1938 biography of Donaldson [55] as saying “It was found that the
nervous system of the rat grows in the same manner as that of man
—only some thirty times as fast. Further, the rat of three
years may be regarded as equivalent in age to a man of ninety
years, and this equivalence holds through all portions of the span
of life, from birth to maturity. By the use of the equivalent ages,
observations on the nervous system of the rat can be transferred
to man and tested. The results so obtained show a satisfactory
agreement and indicate that the rat may be used for further studies
in this field.”
In 1905, Dr. Donaldson, his assistant Dr. Shinkishi Hatai and
his rat colony moved to the Wistar Institute. Believing that it was
necessary to establish a standardized stock of rats to ensure
accurate quantitative results by minimizing individual differences,
Donaldson began to systematically breed albino rats under
controlled conditions, giving rise to the Wistar and Wistar-derived
rat strains. During this time, detailed records of morphological
characteristics such as body and organ sizes, and “life processes”
such as reproduction and growth were maintained for both the
albino stocks and wild “Norways.” In 1915, Donaldson published
the first edition of these extensive records as “The Rat: Reference
Tables and Data for the Albino Rat and the Norway Rat” [56]. A
second and even more extensive edition was published in 1924
[57] and consisted of 496 pages containing 212 tables, 72 charts,
and more than 2000 references. The intent of these publications
was to create a detailed reference record that could be used by
researchers to comparatively assess the characteristics of their own
rats.
Dr. Helen Dean King joined the Wistar Institute in 1908,
starting as a volunteer before being officially hired [58]. She
began working with Dr. Donaldson in 1909, helping him to characterize
the albino rats and beginning a project to further standardize
the rats by inbreeding them. The rats inbred by Dr. King and
others at the Wistar Institute have given rise to a number of strains
and substrains of the “Wistar rat,” including the PA, WKA, WF,
WKY, and LEW, to name a few [7].
King and Donaldson were particularly interested in the process
of domestication and the physiological changes that resulted from
the process. In this regard they first attempted to plant “Albinos
where they might lead a wild life, in order to see how far, under
these conditions, they would return toward the ancestral type”
[59]. Unfortunately, in all five cases the colonies failed. Rather
than try again, the decision was made to recapitulate the domestication
process under controlled conditions and follow the resulting
changes in the physiology of the rats over time. The project began
in 1909 with 16 male and 20 female wild gray rats captured in the
Philadelphia area and continued for years, with detailed reports on
the physiology and morphology of the captive rats released at the
tenth generation in 1929 [59] and the 25th generation in 1939
[60]. It was noted in the first of these articles that “in the eleventh
generation the strain ‘broke’ and several mutant varieties
appeared, so that these ten generations are marked off by this
event from those that follow.”
Changes reported in the domesticated rats vs. their wild progenitors
include “(1) accelerated growth rate resulting in increased
body size; (2) decreased ‘nervous tension’ resulting in tamableness
when the animals were handled frequently in early life; (3) mutations
in color or structure of the hair” [35]. These changes are
among those listed in a recent publication about the “domestication
syndrome” [61] which also cited the project by Donaldson and
King as the “first set of experimental domestication studies.”

2.3 Behavior

Reports about studies on rat behavior began in 1898 with a paper
in the inaugural issue of the American Journal of Physiology,
“Variations in daily activity produced by alcohol and by changes
in barometric pressure and diet, with a description of recording
methods,” by Colin C. Stewart [62]. The paper described a fairly
sophisticated system for recording the total daily activity of the
rats. A drum-shaped cage, which the rat rotated by running in it, was
connected to a clock modified to count the rotations of the cage and
display the total on its dial, possibly the first
semi-automated activity monitor. This paper was probably also
one of the first papers to look at the behavioral impact of nutrition
and of addictive substances such as alcohol.
The paper, “An experimental study of the mental processes
of the rat. II.” by Willard S. Small [63], published in 1901 in the
American Journal of Psychology, appears to be the first
description of the use of a maze to test rat behavior. Small
mentioned that the “Hampton Court Maze served as model for
the apparatus. The diagram given in the Encyclopedia
Britannica was corrected to a rectangular form, as being easier
of construction.” Since the goal of the study was to examine
“the method of animal intelligence” rather than being a
quantitative study of the time needed for a rat to traverse the
maze, the author gave extensive descriptions of the movements
of the rats in the maze during two series of trials, one
consisting of five experiments and the other consisting of nine.
Not surprisingly, he noted that overall, the time to achieve the
reward at the center of the maze and the number of errors along
the way both diminished over time, although there were some
variations.
Two years later, in 1903, the psychologist John B. Watson,
who is best known for establishing the “behaviorism” theory of
psychology, earned his PhD from the University of Chicago on the
basis of his work on the relationship between brain myelination
and learning in rats of various ages [64]. Watson also employed
mazes as a test of learning ability. In the book “Behavior: an
introduction to comparative psychology” [65], published in 1914,
Watson began his description of the modified Hampton Court
maze by saying that it was “too well known to require
description,” implying that such mazes were commonly in use.
Although best known for his work on human behaviors and
behaviorism as a branch of psychology, Watson was a strong
advocate for studying animal behavior: “Psychology, as the behaviorist
views it, is a purely objective, experimental branch of natural
science... Heretofore the viewpoint has been that such data have
value only in so far as they can be interpreted by analogy in terms
of consciousness. The position is taken here that the behavior of
man and the behavior of animals must be considered on the same
plane; as being equally essential to a general understanding of
behavior” ([65], p. 27).
Dr. Curt Richter began his career in psychobiology in 1919 by
studying the innate behaviors of rats, particularly variations in and
periodicity of their running activity [66]. He is well known for his
early descriptions of circadian rhythms, of gender differences in
activity, and of the influence of the estrous cycle on the running
behavior of female rats [67]. He did considerable work on rats’
ability to maintain their homeostatic balances when allowed to
select freely from a “cafeteria” of proteins, carbohydrates,
minerals, and vitamins, and demonstrated that animals deprived of
an essential component, such as sodium, would compensate by
eating more of that component. He also showed that an
adrenalectomized rat, which had been shown to consistently lose
sodium as a result of the loss of the sodium-retaining hormones
secreted by the adrenals, had a substantially increased “salt
appetite” and that the increase in salt consumption led to an
increase in the survival rate following adrenalectomy [68].
Interestingly, he later discovered that this sodium-balancing
ability, though present in domesticated rats, was not found in their
wild progenitors. Such changes in both the physiology and behavior of
rats during domestication were of interest to Richter and were a matter
of study, for example, in connection with his efforts during World War II
to control the rat population in Baltimore [69]. He noted that rats
displayed “bait shyness” and “learned-poison-avoidance,” now referred
to, and frequently studied, as neophobia and conditioned taste aversion.
Richter also
developed several pieces of equipment for measuring rat behavior,
including the “Richter tube” for measuring fluid consumption and
the running wheel for monitoring activity.
In an often-quoted passage in his 1968 article “Experiences of a
Reluctant Rat-Catcher: The Common Norway Rat – Friend or
Enemy?” [70], Dr. Richter stated “During my almost half-century
in behavioral and neurological research I have chiefly used rats, but
also many different animals such as cats, dogs, monkeys, sloths,
rabbits, beavers, porcupines, honey bears, alligators, and others. If
someone were to give me the power to create an animal most useful
for all types of studies on problems concerned directly or indirectly
with human welfare, I could not possibly improve on the Norway
rat.”

2.4 Endocrinology

As mentioned previously, the first recorded use of rats for the study
of endocrinology was in an 1856 study by J. M. Philipeaux on the
results of adrenalectomy in albino rats. Like Philipeaux’s
experiments, many of the earlier studies of endocrinology in rats
were studies of the effects of wholly removing endocrine glands,
including J. M. Stotsenburg’s studies at the Wistar Institute of the
effects of castration [71] and spaying [72] on growth in rats, and
F. S. Hammett’s 1924 examination of the effects of the removal of
the thyroid and/or parathyroid glands on the growth of the brain
and spinal cord [73].
J. A. Long and H. M. Evans made observations using both
intact animals and those with various endocrine glands removed or
ablated, including removal of the ovaries, total hysterectomy and
removal of the mammary glands, as well as ovarian transplantation
studies, published in the sixth volume of the Memoirs of the
University of California in 1922 [74]. In the methods section of
this publication they included a brief description of the rats that
they used—a cross between “several white females and a wild
gray male caught in Berkeley, black, gray and hooded varieties
resulting.” The white females were said to have come from the
Wistar Institute albino stock [7]. This is the first mention of the
“Long Evans” rat which was later inbred to give the “LE” strains
and substrains.
In 1921, Drs. Long and Evans published two short articles on
the administration of anterior pituitary—one in which they fed the
rats fresh or dried whole glands [75] and another in which they
injected an extract from the glands intraperitoneally [76]. The
feeding study showed no results, but following injection of the
extract, the authors reported both the lengthening or cessation of
the rats’ normally regular estrous cycle and a marked increase in
growth compared to litter mate controls. H. M. Evans and his
collaborators subsequently continued this work, characterizing
the hormones of the anterior pituitary (for examples, see Refs.
[77–94]).

3 “The Rat Toolbox”

3.1 Rat as a Precision Model for Disease

From Philipeaux’s first adrenalectomy studies in the 1850s, rats have
been used as models for human physiology and disease. In
“Animal Models of Human Nutrition” Dr. Janet R. Hunt
references E. V. McCollum’s decision to use rats in his nutrition
studies “because of their convenient size, omnivorous feeding
habits and lack of economic value” and goes on to state that the
“omnivorous feeding patterns of the rat usually make it a better
model for human nutrition questions than a strict herbivore such as
the rabbit.” [12]. Likewise, H. C. Sherman, who was the first to
develop quantitative measures of nutrients based on their ability to
correct diseases stemming from nutritional deficiencies, said of his
research with rats, “These animals are my burettes and balances.
They give
quantitative answers in chemical terms to many of man’s greatest
problems!” [95].
The same is true today. Rats are used as pinch hitters for
humans in the study of debilitating and life-threatening diseases.
The ability to quickly breed rats and select for specific traits and,
in recent years, to genomically manipulate rats to produce traits,
correct deficiencies or explore the influence of specific genes or
genomic regions in the development of diseases, along with their
extensive history of physiological measurements, makes them an
ideal model for targeting specific diseases in humans. In part
because the ability to do genomic manipulation in rats was
delayed, researchers traditionally concentrated on development of
a wide variety of rat strains to model human disease. We will
mention some of these here and explore the resources and
techniques for obtaining or developing more models throughout
the rest of this book.

3.2 The Development of Rat Strains to Study Disease Mechanisms

3.2.1 Cardiovascular Diseases

In 1958, Smirk and Hall reported results of ongoing breeding of
genetically hypertensive (GH) rats from the Wistar-derived rat
colony at the University of Otago Medical School [96]. They
developed several lines of rats by both cross-breeding and
brother-sister mated inbreeding. In the first report, they showed that the
line produced by cross-breeding had a higher average blood
pressure than the lines produced by inbreeding (141.95 ± 12.53 mmHg
for cross-bred males vs. 135.81 ± 8.2 mmHg for one strain of inbred
males and 124.14 ± 10.64 mmHg for control males). The differences
reported are not as great as for the SHR rats; however, they are
statistically significant. In a subsequent publication [97], in which
they reported on
the development of cardiac hypertrophy in the B strain of their
genetically hypertensive rats, they stated that more than 50% of the
male rats in that strain “have blood pressures exceeding 150
mmHg.”
In 1962, L. K. Dahl et al. published the first report of two
strains of rats, selected from an outbred (unselected) colony of
Sprague-Dawley rats [98]. At the time of publication, these rats
had been selectively bred by brother-sister matings for three generations
and already displayed a divergence in the effects of salt on
their blood pressure. The rats were selected based on blood pressure
measurements after being fed a diet containing 11.6% sea salt
and administration of triiodothyronine which had been shown to
accelerate the development of hypertension in these rats. Later
studies demonstrated that the “S,” i.e., salt sensitive (SS), rats
developed increased blood pressure whether on T3 + high salt or
on high salt alone whereas the blood pressure of the “R,” or salt
resistant, rats was less than or equal to the BP of the parental
Sprague Dawley rats regardless of the conditions. By contrast,
when fed a low salt diet, both lines remained normotensive. In
addition to increased blood pressure, SS rats displayed
cardiovascular abnormalities including left ventricular
hypertrophy [99] and susceptibility to myocardial ischemia [100].
These abnormal traits could be a direct result of the hypertension
or could be separate but linked genetic traits.
In 1963, Okamoto and Aoki reported on their work to develop
a rat strain with spontaneous hypertension [101]. Starting with rats
from an inbred Wistar strain, they selected one male “that had
shown persistent high blood pressures (150–175 mmHg) since
7 weeks after birth and a female rat with a blood pressure slightly
above the average (from 130 to 140 mmHg).” The two were mated
and progeny with blood pressure > 150 mmHg for more than a
month were selected and further brother-sister mated through the
sixth generation. Testing of rats in the sixth generation of
inbreeding showed that the control rats remained relatively normotensive
throughout the test period of 60 weeks, whereas the blood
pressure of the hypertensive rats rose steadily with age. In males,
the blood pressure was 136 ± 13.4 mmHg at 5 weeks of age, rising
to 206 ± 16.2 mmHg at 55 weeks of age. At this point in the
breeding, the incidence of spontaneous hypertension was 100% in
this strain and the authors began to refer to them as
“spontaneously hypertensive rats (SHR).”
In the 1970s Okamoto and his colleagues began breeding
sublines of the SHR rat [102, 103]. In one case, they selected for
rats with a tendency to develop cerebral hemorrhage and/or cerebral
infarction (stroke), and in the other they selected for those that
were hypertensive but did not develop cerebrovascular lesions.
They referred to these lines as stroke-prone (SHRSP) and stroke-
resistant (SHRSR) spontaneously hypertensive rats. A 1975 article
[104] demonstrated that, although the rats did not have as substantial
a response to salt loading as the Dahl SS rats, both blood
pressure and incidence of cerebrovascular lesions increased
with salt.
In addition to strains that were developed specifically as controls
for cardiovascular disease model strains, such as the Dahl SR
and the stroke-resistant SHRSR strains, several additional strains
are widely accepted as control strains for cardiovascular diseases.
Among these are the Brown Norway (BN) strain which has been
shown to be resistant to salt-sensitive hypertension [105] and to
myocardial ischemia [100], Lewis rats (LEW) which are highly
resistant to salt-induced hypertension [106], and the Wistar
Kyoto (WKY) strain which is considered a normotensive control
for SHR [107]. It should be noted, however, that an inbred strain
that is considered a control in one set of studies could be
“affected” for a study which focuses on a different trait.

3.2.2 Metabolic Syndrome

The Lyon hypertensive (LH), normotensive (LN), and hypotensive
(LL) rat strains were first developed in 1973 in Lyon, France, from
a colony of Sprague Dawley rats [108]. The rats for breeding were
selected on the basis of the mean systolic blood pressure at
6–12 weeks of age and the slope of systolic blood pressure vs. age.
Although only selected for blood pressure traits, the inbred LH
strain was found to also display an increased body weight and
increased plasma lipids. Plasma phospholipids, total cholesterol,
HDL-cholesterol, and VLDL + LDL-cholesterol were all elevated
in the LH strain relative to LN and LL. Interestingly, at 5 weeks of
age, the LL strain had the highest plasma triglyceride level. However,
as the rats aged, the triglyceride level in the LH rats
increased so that at 32 weeks of age LH was significantly higher
than either LN or LL, but LL was still significantly higher than
LN. Additional studies showed that the LH strain displayed
further metabolic disorders, namely an increase in both the
insulin level and the insulin:glucose ratio [109].
Like the LH rat strain, the SHR strain was selected for
increased blood pressure, but was later found to display symptoms
of metabolic dysfunction. Fasting glucose was greater in SHR rats
than WKY [110] and the insulin response to an oral glucose
challenge was higher in SHR than WKY, suggesting possible
insulin resistance [111].
The OLETF strain was developed at the Tokushima Research
Institute in Japan from a spontaneously diabetic rat discovered in
1984 in an outbred colony of Long-Evans rats [112, 113]. The rats
were mildly obese and developed spontaneous hyperglycemia. Sex
differences were noted in the course of the disease. Males
developed hyperglycemia much earlier than females (25 weeks of
age for males vs. 65 weeks for females). Over time, male rats also
became hypoinsulinemic and required insulin therapy to survive,
which was not seen in females. Histopathological changes in the
pancreatic islets and in the kidney were also seen in males but not
females.
The Zucker “fatty” rat was first discovered as a mutant in the
13M rat stock, an outbred line derived from black offspring of
albinos from the colony of Dr. H. C. Sherman (Columbia University),
crossed with wild males (the “M” line) and additional rats
from the Sherman colony [114] at the Harriet G. Bird Memorial
Laboratory in Stow, Massachusetts. The mutation was named
“fatty” because when present in a homozygous form the rat
became extremely obese as a juvenile. Since heterozygous litter
mates were lean and phenotypically indistinguishable from the
non-mutant homozygotes, the mutation was understood to be a
recessive allele in a single gene and has since been shown to be a
p.Gln269Pro mutation in the extracellular domain of the leptin
receptor. The original description of the Zucker rat includes
severe hyperlipidemia and kidney lesions, but not hyperglycemia
[115].
Although the Zucker rats were not routinely hyperglycemic, it
was later found that they did show some signs of glucose intolerance,
and occasionally animals with high blood glucose were
observed. Peterson et al. reported in 1990 the development of an
inbred diabetic strain from the Zucker rats, the ZDF strain
[116]. Male ZDF rats displayed substantial increases in blood
glucose with age, leveling off in the 400–600 mg% range at 10–15
weeks of age. They also showed increased levels of glycosylated
hemoglobin, free fatty acids, triglycerides, and cholesterol
compared to lean controls. Insulin was increased in younger animals
but was reduced in older animals due to fatty acid-induced
apoptosis of the pancreatic beta cells [117].

3.2.3 Behavior and Addiction

As noted earlier in this chapter, the use of rats for studies of
behavior began over 100 years ago with studies by C. C. Stewart
on the effect of alcohol consumption, diet, and barometric pressure
on activity in captive rats. Since that time, rats have been the
model of choice for the study of behavior and addiction, and for
testing treatments for psychiatric disorders. The body of literature
covering these topics is enormous: a search in PubMed for “rat
behavior” in 2018 returned over 150,000 articles, including almost
6500 review articles.
Often rat strains developed for the study of a non-behavioral
phenotype are found to also show behavioral abnormalities. For
instance, the Spontaneously Hypertensive Rat (SHR), established
as a model of age-related hypertension, was found to develop
vascular brain disorder with associated behavioral changes as a
result of its increased blood pressure, and has also been used as a
model for Attention Deficit Hyperactivity Disorder (ADHD) as a
result of observed changes to the catecholaminergic transmission
system [118]. Similarly, the WAG/Rij rat, a model for absence
epilepsy, also showed depression-like symptoms [119]. The
Wistar Kyoto (WKY) strain was originally bred as a normotensive
control for the SHR but has been shown to display
“depressive-like symptoms,” including increased immobility in the forced
swim test [120] and anhedonia characterized by lower consumption of a
sweet-tasting solution in response to acute or chronic mild stress
[121]. Physiologically, the WKY exhibited abnormalities in dopaminergic
and noradrenergic responses and the HPA axis and TSH
systems [122]. The WKY rat, in addition to being considered a
model for depression, also displayed traits considered to be indicative
of anxiety such as reduced activity in the open field as well as
development of stress-induced ulcers. Another model of depression,
the Flinders Sensitive Line (FSL) rat, on the other hand,
displayed similar immobility in the forced swim test but did not
appear to have an increased tendency toward anxiety [120]. It was
shown, however, that young FSL rats engaged in more “intrusive”
social play than Sprague-Dawley controls, whereas as adults they
displayed less non-play social interaction/investigation behaviors
than controls, both of which were interpreted as depressive-like
behaviors.
The Fawn Hooded Hypertensive (FHH) inbred strain was also
developed as a model of hypertension and was later found to
display both depressive-like behaviors and abnormally high
voluntary alcohol consumption, leading to the suggestion that the
FHH rat could be used as a model for comorbid depression and
alcoholism [122]. Interestingly, it was found that immobility in
the forced swim test and alcohol consumption were not correlated,
and in fact, administration of antidepressant drugs reduced the
immobility in the swim test without affecting the alcohol
consumed.
A number of rat strains have been developed specifically for
high and low consumption of alcohol. These include the UChA
and UChB lines from the University of Chile [123], the “Alko-
accepting” (AA) and “Alko-nonaccepting” (ANA) lines,
developed at the Research Laboratories of the State Alcohol
Monopoly (Alko), in Helsinki, Finland [124], the Alcohol
Preferring and Non-preferring (P/NP) [125, 126] and the High
Alcohol Drinking and Low Alcohol Drinking (HAD1/LAD1 and
HAD2/LAD2) strains [127] produced at the Indiana University
School of Medicine, and the Sardinian Preferring and Non-
preferring (sP/sNP) lines [128, 129].
The AA/ANA, P/NP, and HAD/LAD strain pairs were all
selected by giving only alcohol for a period of time, then
measuring ethanol consumption in a two-choice paradigm.
Both AA/ANA and P/NP started with outbred Wistar rats, then
high and low drinkers were inbred. Looking to develop lines
with more genetic diversity than the Wistar-derived P/NP rats,
Li et al. developed two replicate sets of model strains,
HAD1/LAD1 and HAD2/LAD2, using the same selection
process with the National Institutes of Health’s Heterogeneous
Stock (N:HS, also referred to as N:NIH) outbred rats as the
initial breeding stock. HS rats are descended from eight
genetically and phenotypically diverse founder strains (ACI/N,
BN/SsN, BUF/N, F344/N, M520/N, MR/N,
WKY/N, and WN/N) [130]. The genetic diversity of the rats is
maintained through a rotational breeding scheme to avoid
inbreeding, drift, and fixation. Since the resulting rats are a
random mosaic of the genomes of the founder strains, they
provided far more diversity for the development of models for
drinking behavior than any one strain could provide. The P strain
and both HAD lines met the criteria for animal models of
alcoholism [131]. Given a choice, their voluntary consumption of
ethanol resulted in pharmacologically meaningful blood ethanol
concentrations. They displayed alcohol-seeking behavior for the
pharmacological effects, rather than for the taste or other
properties, as evidenced by their
willingness to dispense an ethanol solution intragastrically
(although a recent article reported that P rats did not show a
preference for intravenous ethanol [132]). In addition, they have
been shown to develop a tolerance to the effects of ethanol, to
display signs of physical dependence when the ethanol was withdrawn
after long-term exposure, and to display relapse behavior
following two or more weeks of abstinence.

3.2.4 Cancer

In 1919, Dr. F. D. Bullock and Dr. M. R. Curtis at the Crocker
Institute of Cancer Research (Columbia University, New York,
NY, USA) began a project to produce a number of inbred strains
of rat for use in their studies of cancer. Previous work to induce
neoplasms using tapeworm infestation as a chronic irritant had
demonstrated that rats from some sources were more susceptible
to tumor induction than those from other sources [133]. The group
began inbreeding rats from four commercial breeders (Fischer,
Zimmerman, Marshall, and August) in 1919 and expanded their
efforts to include rats originally sourced from Copenhagen,
Denmark, in 1920 [7]. From these efforts, at least ten inbred
strains, including ACI, COP, Marshall 520 (M520), Fischer 344
(F344), and the now extinct Fischer 230, were developed, a
number of which are still in active use for cancer research.
Between 1920 and 1970, Dr. Curtis with her colleague Dr. W. F.
Dunning and coworkers produced a substantial number of articles
(for example, [134–142]) on tapeworm-induced sarcomas, strain
differences in susceptibility to a variety of spontaneous tumors,
chemical carcinogenesis, and chemotherapy using these inbred
strains.
Because of its history of use for cancer research, the F344/N
rat strain was the model of choice for the National Cancer
Institute (NCI), and subsequently the National Toxicology Program
(NTP), for standardized bioassays of carcinogenicity for chemical
compounds. During the more than three decades of use for these
studies, the NCI and NTP amassed an immense, publicly available
dataset derived from the testing of thousands of possibly
carcinogenic compounds [143, 144]. However, in 2006, an NTP
workshop, “Animal Models for the NTP Rodent Cancer Bioassay:
Strains and Stocks—Should We Switch?” [145], reviewed the use
of the F344/N strain for future studies. Workshop participants
concluded that, due to problems with infertility, seizures and
chylothorax in the F344/N colony in particular, and a variable but
relatively high incidence of spontaneous tumors, particularly
testicular interstitial cell tumors and mononuclear cell leukemia,
inherent to the F344 strain in general [146], the F344/N strain
should no longer be used for NTP bioassays. After considering
several alternatives, including using a different substrain of F344
or using F1 rats from a cross between F344 and BN, the
recommendation was to replace the inbred strain with outbred rats.
The initial recommendation was to use the outbred Wistar Han rat
as the “default” for most toxicological bioassays. However, after
some experimentation it was found that reproductive performance of
the Wistar Han rats was less than ideal and the outbred Hsd:SD
Sprague-Dawley strain became the default.
The Wistar Furth (WF) strain was inbred in the hope of
producing a rat strain that developed leukemia at high frequency
[147]. Dr. J. Furth began with a partially inbred line of Wistar rats
which had been shown to develop malignant lymphomas. However,
despite the large number of lymphomas found in the progenitor
stock, the resulting fully inbred strain did not show
particularly high susceptibility to either lymphomas or leukemia.
The incidences of leukemias and malignant lymphomas in the
WF strain were reported as 9% and 7%, respectively, as of 1960.
Instead of leukemias/lymphomas, the rats were documented to
spontaneously develop pituitary and mammary tumors at high
frequency (27% and 21%, respectively). In addition, adrenal and
uterine neoplasms and lipomas occurred at much lower
frequency (2–3%), and a general category of “unclassified tumors”
was seen at a rate of approximately 4%. Despite the
lower-than-expected frequency of leukemias in the WF strain, the
authors still noted that the intractability of the WF leukemias,
that is, their relative resistance to the chemotherapeutic agents
of the time, made WF a better model for human disease than the
Fischer strain, where the leukemias proved more sensitive to
chemotherapeutics.

4 Resources for Rat Researchers

4.1 Where to Find Rat Strain Models

There are a number of both commercial and non-commercial
sources for rat model strains. Some of these will be covered more
thoroughly later in this book (see Chapter 3). By far, the widest
variety of strains, including inbred, outbred, mutant, and
transgenic strains, can be found at the Rat Resource and Research
Center (RRRC) in the United States and the National BioResource
Project for the Rat (NBRP-Rat) in Japan.

4.1.1 RRRC

The Rat Resource and Research Center (RRRC,
http://www.rrrc.us/, [148]) was established at the University of
Missouri in 2001 by Dr. John Critser. The Center was developed
to supply the rat research community with “high quality,
well-characterized inbred, hybrid and genetically engineered
rat[s].” In 2011, the leadership of the RRRC was transferred to
Dr. Elizabeth Bryda. RRRC continues to maintain a limited number
of live strains and a much larger collection of cryopreserved
germplasm for submitted strains. Researchers submitting strains for
distribution and/or cryopreservation by the RRRC supply
information about any diseases for which that strain is a model,
and in what research area(s) the strain has been used. In addition
to this, strain pages show information about the history of the
strain, the researcher who submitted it, associated references,
product availability (e.g., available as live animals,
cryopreserved embryos or sperm), and descriptions of the
genetics, breeding and husbandry where this is available. RRRC
works with the Rat Genome Database (RGD) to ensure correct
nomenclature of the strains (see below) and links to additional
information at RGD.

4.1.2 NBRP-Rat

NBRP-Rat (http://www.anim.med.kyoto-u.ac.jp/nbr/Default.aspx,
[148]) is one branch of the larger National BioResource Projects
(NBRP) in Japan. The NBRP was established in 2002 by the Ministry
of Education, Culture, Sports, Science and Technology (MEXT) to
“collect, preserve, and provide bioresources (such as experimental
animals and plants) that are essential experimental materials for
life sciences research” (http://www.nbrp.jp/about/about.jsp). As
such, NBRP-Rat’s stated mission is “collection of rat strains and
genetic substrains, phenotypic and genotypic characterization,
cryopreservation of embryos/sperm, supply of the collected rat
strains and a publicly accessible database of all assembled
data” (http://www.anim.med.kyoto-u.ac.jp/nbr/about.aspx).
The strains available at NBRP-Rat include inbred, congenic, and
recombinant strains, as well as spontaneous mutants and
transgenic and mutagenized rats. Additionally, NBRP-Rat has
undertaken an extensive standardized phenotyping project, the
goal of which is to phenotype all of the standard strains and many
of the mutant strains submitted. Six male and/or female rats of
each strain undergo a battery of tests between 5 and 10 weeks of
age. Tests include morphological measurements such as body and
organ weights, and physiological tests including blood pressure,
blood chemistry, hematology and urine chemistry, as well as
locomotor and neurobehavioral tests. The data provide an
invaluable survey of phenotypes across a wide variety of strains
measured under control, pathogen-free conditions, and can be
accessed both at NBRP and in RGD’s PhenoMiner tool.

4.1.3 GERRC

With the advent of genome editing technologies for the rat, the
demand for genetically modified rats for use as disease models
has skyrocketed. However, for many researchers, production of
a gene-edited rat to confirm the involvement of a gene or
genomic region in their disease or phenotype of interest was
out of reach due to the lack of expertise and/or funding to
produce such models. In 2013, Dr. Howard Jacob and his
colleagues at the Medical College of Wisconsin were awarded a
grant to begin the MCW Gene Editing Rat Resource Center
(GERRC, https://rgd.mcw.edu/wg/gerrc/). The GERRC was
designed to leverage existing infrastructure and expertise in
gene editing to support the needs of the rat research
community by designing, producing, and distributing rats with
modifications in specific genes generated on specific genetic
backgrounds (i.e., existing, well-characterized inbred strains) at
low cost to researchers. The project was funded by the National
Heart, Lung and Blood Institute (NHLBI) to produce 250
gene-targeted rat models over a period of five years to support and
accelerate research of complex diseases, particularly those of
interest to the NHLBI. Researchers nominated a gene to be edited on
one or more background strains, and an external advisory board (EAB)
reviewed each nomination and approved, rejected, or deferred it
for editing. As of this writing, the GERRC, now under
the leadership of Dr. Melinda Dwinell and Dr. Aron Geurts, has
received a list of 109 genes for which nominations have been
accepted by the EAB, and has produced 134 publicly available
strains with genomic modifications in 53 unique genes. Strains
are made available first to the nominating group, then to any
researcher via links on the GERRC webpage.

4.1.4 Commercial Vendors

In addition to the large non-commercial repositories already
mentioned, there are several commercial sources for rat strains.
Sprague Dawley, Inc. was started in 1925 by Robert Dawley near
Madison, Wisconsin. The original breeding stock was purported to
have come from mating a hooded male of unclear origin with
albino females from Wistar stock, and subsequently with the
albino offspring of that mating. The original male was described
as “a hybrid hooded male rat of exceptional size and vigor which
genetically was half-white” [149]. The line was partially inbred,
then changed to random breeding and the parental strain was
considered outbred. Sprague Dawley, Inc. was acquired by Harlan
in 1980 to form Harlan Sprague Dawley, which was in turn
acquired by Envigo, Inc. in 2015 (http://www.envigo.com/).
Envigo sells 14 strains of rats, two of which are offered
specifically as aged animals: Sprague Dawley® outbred rats (SD)
and Fischer 344 inbred rats. The rats
offered by Envigo include direct descendants of a number of the
original laboratory rat stocks, including Holtzman rats (an offshoot
of SD), Lewis rats, and Wistar outbred rats, in addition to the
aforementioned SD rats.
Charles River Laboratories (https://www.criver.com/) was
started in 1947 by veterinarian Dr. Henry Foster to breed rats
and supply them to laboratories in the Boston, Massachusetts
area. Their catalog currently lists 40 rat strains that are available,
including 22 inbred strains and 16 outbred.
Horizon Discovery (originally Sage Laboratories) offers a stock of
“off the shelf” knockout rats
(https://www.horizondiscovery.com/in-vivo-models) that includes
models for several research areas. These include knockouts of
xenobiotic sensors and drug transporters for
toxicology/ADMET (absorption, distribution,
metabolism, and excretion-toxicity) studies. For researchers
interested in translational studies, Horizon offers knockout models
for Alzheimer’s disease, autism, and Parkinson’s disease, as well as
models for oncology and cardiovascular research. For optogenetics
studies, Horizon offers ready-made Cre-driver, fluorescence
reporter, and opsin-expressing rats.
For researchers in need of specific genomic alterations, there
are a number of companies that offer modifications as a service. A
list of these can be found on the RGD Laboratory Resources
webpage at
https://rgd.mcw.edu/wg/resource-links/laboratory-resources/#strains.
The companies listed include Applied StemCell, Cyagen Biosciences,
Transposagen, and PolyGene Transgenetics. Horizon Discovery also
offers generation of genome-modified rats in addition to their
“off the shelf” lines. Several of these companies also supply
reagents for researchers to generate their own models.

4.2 Where to Find Data for Rat Model Strains

The first two of the FAIR principles for data management
[150, 151] require that data be both Findable and Accessible.
This requires the development and maintenance of data stores and
knowledgebases to consolidate and integrate data from various
sources, and in many cases, to expand, interpret, and/or analyze
the data through processes such as manual curation of the
literature. Such resources also create environments in which
researchers can access and utilize the data for their own analyses
and download both the original data and their analysis results for
their own records. The resources available for rat data will be
covered more completely in Chapter 3, but we will touch on some of
these sources here.

4.2.1 RGD

Arguably, the most diverse and inclusive source for rat data is the
Rat Genome Database (RGD, https://rgd.mcw.edu, [152]).
RGD was started in 1999 “to collect, consolidate and integrate
data generated from ongoing rat genetic and genomic research
efforts and make these data widely available to the scientific
community.” From the beginning, RGD was intended as a multiple
datatype and cross-species resource, including data for rat genes,
markers, quantitative trait loci (QTLs), and strains, as well as
homologous mouse and human genes for comparative purposes.
This cross-disciplinary focus has continued. RGD now houses data
which associate disease, phenotype, molecular function, biological
process, subcellular localization, molecular pathway,
gene-chemical interactions, and protein-protein interactions with
the genomes of rat, human, mouse, dog, squirrel, chinchilla, pig,
and bonobo. Many of these associations are via the genes for these
species. In addition, RGD imports data for disease and phenotype
associations for human variants, as well as the extensive
phenotype data for mouse genes and QTLs to assist with
comparative genomic analyses.
In addition to genome-associated data, RGD houses a
comprehensive listing of rat strains. Strain data include
information about the origin of the strain and its characteristics
such as any documented disease or phenotype associations,
information about breeding/husbandry, and drug or chemical
responses. More recently, RGD has begun a major project to curate
the results of quantitative phenotype measurements. These data can
be accessed using RGD’s PhenoMiner tool
(https://rgd.mcw.edu/rgdweb/phenominer/home.jsp, [153,
154]). For each measurement, the strain, the number of animals,
their age and sex, the conditions under which the measurement was
made, and the method used to make the measurement are captured in
a standardized format so that results can be directly compared
not only across strains and conditions, but also across studies.
Researchers interested in accessing the data can search based on
strain, clinical measurement, measurement method, and/or
experimental condition [155–157] to retrieve their data of
interest. When a result set of interest is found,
the data can be downloaded for the researcher’s records or for
further analysis in other tools. RGD’s PhenoMiner includes
quantitative phenotype data from published research papers as
well as from high-throughput phenotyping projects such as the
PhysGen Program for Genomic Applications (PGA) and NBRP-Rat.
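The value of capturing each measurement in a standardized format can be illustrated with a minimal sketch. The record layout, field names, strain labels, and values below are all invented for illustration and are not RGD's actual schema or data; the point is only that records from different studies can be pooled once strain, measurement, method, and condition are recorded in the same way.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PhenotypeRecord:
    """One group-level measurement. All field names are hypothetical."""
    study: str
    strain: str
    sex: str
    age_weeks: float
    n_animals: int
    measurement: str  # clinical measurement
    method: str       # measurement method
    condition: str    # experimental condition
    value: float      # group mean
    units: str


def mean_by_strain(records, measurement, method, condition):
    """Animal-weighted mean per strain over records that share the same
    measurement, method, and condition, regardless of study."""
    totals = defaultdict(lambda: [0.0, 0])  # strain -> [weighted sum, animals]
    for r in records:
        if (r.measurement, r.method, r.condition) == (measurement, method, condition):
            totals[r.strain][0] += r.value * r.n_animals
            totals[r.strain][1] += r.n_animals
    return {strain: total / n for strain, (total, n) in totals.items()}


# Invented example records from two hypothetical studies.
records = [
    PhenotypeRecord("study_1", "StrainA", "male", 10, 6,
                    "systolic blood pressure", "tail cuff", "control", 150.0, "mmHg"),
    PhenotypeRecord("study_2", "StrainA", "male", 10, 4,
                    "systolic blood pressure", "tail cuff", "control", 160.0, "mmHg"),
    PhenotypeRecord("study_1", "StrainB", "male", 10, 6,
                    "systolic blood pressure", "tail cuff", "control", 110.0, "mmHg"),
    # Different method: excluded from the tail-cuff comparison.
    PhenotypeRecord("study_2", "StrainA", "male", 10, 5,
                    "systolic blood pressure", "telemetry", "control", 170.0, "mmHg"),
]

print(mean_by_strain(records, "systolic blood pressure", "tail cuff", "control"))
```

Filtering on method and condition before pooling is the design choice that matters here: without those fields, values obtained by different techniques would be averaged together.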
Researchers interested in disease can access information about
disease models either through the Phenotypes and Models portal
or through the Disease Portals [158]. The Strains and Models
section of the former portal contains links to established models
for six disease categories: cardiovascular, neurological,
respiratory and immune/inflammatory diseases, as well as
mammary cancer and diabetes. The assignment of strains as
models for these diseases is done manually and is based on an
extensive review of the literature. RGD’s Disease Portals also list
strains that have been associated with the applicable disease
categories. As of this writing, there are 12 portals covering the
following disease categories: Aging and Age-Related Disease,
Cancer, Cardiovascular Disease, Developmental Disease,
Diabetes, Hematologic Disease, Immune and Inflammatory
Disease, Neurological Disease, Obesity and Metabolic
Syndrome, Renal Disease, Respiratory Disease, and Sensory
Organ Disease. Each portal is a consolidated view of the genes,
QTLs, and strains which are associated with any disease in the
category across rat, mouse, and human. Researchers can view all
of the associated data objects or drill down to more specific groups
of diseases.

4.2.2 NBRP-Rat

As mentioned previously, NBRP-Rat stores and disseminates both
the rats themselves and data about those rats. The NBRP-Rat strain
pages give information about the genetic status of the strain, the
research categories for which the strain has been used, strain
characteristics and breeding performance, and references where
applicable. In many cases, there is a picture of the rat and an
image of representative organs for that strain. As also mentioned
previously, NBRP-Rat performs extensive phenotyping on strains
submitted to their repository. These data are available on their
website under the “Phenome” tab
(http://www.anim.med.kyoto-u.ac.jp/NBR/phenome.aspx) in both
graphical and tabular format.

4.2.3 NCBI, Ensembl, and UCSC Genome Browser for Genes and Genomics

The National Center for Biotechnology Information (NCBI,
https://www.ncbi.nlm.nih.gov/), the European
Bioinformatics Institute’s Ensembl
(https://www.ensembl.org/index.html), and the University of
California, Santa Cruz’s (UCSC, https://genome.ucsc.edu/)
Genome resources are multispecies
resources with diverse datasets that include genome sequences,
gene, transcript and protein records, protein domain
information, functional data, and more. Much of the data
provided are consolidated from other resources, including the
Rat Genome Database in terms of functional annotations for rat
genes as well as curated records for QTLs. NCBI and Ensembl
both do gene predictions for whole genome assemblies.
Because the algorithms they use are disparate, the predicted
gene sets are not the same although there is substantial
overlap. For more information about the NCBI and Ensembl
genome annotation pipelines, see Chapter 2 in this book. These
resources also supply tools for analysis of the data they
provide, including genome browsers for viewing genes and
other genomic elements in the wider genomic context.

4.2.4 UniProtKB

UniProtKB (https://www.uniprot.org/, [159, 160]) is a
cross-species resource that provides data for rat proteins,
including protein sequences, protein structure and domains,
post-translational modifications, tissue-level expression and
subcellular localization, protein-protein interaction data, and
protein family assignments. Manual gene ontology (GO) annotations
for rat proteins at UniProtKB are imported from the Rat Genome
Database. UniProtKB also employs an automated pipeline to predict
GO annotations based on domains, keywords, etc.

4.2.5 dbSNP/EBI’s European Variation Archive

The dbSNP database at NCBI and EBI’s European Variation
Archive (EVA) have, in the past, both accepted submissions of rat
genomic variant data to be included in their multispecies variant
resources. As of 2017, however, dbSNP is no longer storing or
presenting variant data for nonhuman species, making EVA the
major source for nonhuman variants. The data presented include
genomic positions, affected genes, where applicable, and predicted
or validated variant consequences for corresponding transcripts.
dbSNP in the past and EVA going forward consolidate multiple
records corresponding to the same variant into a single “reference”
variant record, eliminating what can at times be substantial
redundancy in variant data from multiple sources. The “rs ID” has
been and will continue to be the gold standard for nonredundant
variant designation.
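The consolidation of redundant submissions into a single reference record can be sketched as a simple grouping step. This is an illustration only, not the procedure used by dbSNP or EVA: the record fields, the submitter IDs, and the "ref1"-style identifiers are hypothetical placeholders (they are not real rs IDs), and the sketch assumes that submissions have already been normalized to the same assembly and allele representation.

```python
from collections import defaultdict


def consolidate(submissions):
    """Group submitted variant records that describe the same change
    (same chromosome, position, reference and alternate allele) under
    one placeholder reference ID."""
    groups = defaultdict(list)
    for sub in submissions:
        key = (sub["chrom"], sub["pos"], sub["ref"], sub["alt"])
        groups[key].append(sub["submitted_id"])

    reference = {}
    # Sort loci so that placeholder IDs are assigned deterministically.
    for i, key in enumerate(sorted(groups), start=1):
        reference[f"ref{i}"] = {
            "locus": key,
            "submitted_ids": sorted(groups[key]),
        }
    return reference


# Invented submissions from two hypothetical laboratories.
submissions = [
    {"submitted_id": "lab1_v1", "chrom": "chr1", "pos": 1000, "ref": "A", "alt": "G"},
    {"submitted_id": "lab2_v7", "chrom": "chr1", "pos": 1000, "ref": "A", "alt": "G"},
    {"submitted_id": "lab2_v8", "chrom": "chr1", "pos": 2000, "ref": "C", "alt": "T"},
]

for ref_id, record in consolidate(submissions).items():
    print(ref_id, record["locus"], record["submitted_ids"])
```

Three submitted records collapse to two reference records here, which is the redundancy reduction the text describes.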

4.3 Strain and Allele Nomenclature

The ability to identify a specific strain and/or its associated
allele(s) is absolutely essential for disseminating and accessing
the appropriate data for that strain or allele. Without proper
identification, it is difficult, if not impossible, to reproduce any
results associated with it. As such, committees for determining
guidelines for strain, gene, and allele nomenclature were
established for both mouse (International Committee on
Standardized Genetic Nomenclature for Mice) and rat (Rat
Genome and Nomenclature Committee, RGNC) strains. The
RGNC has tasked RGD with assigning the correct
nomenclature to these objects. In terms of emerging rat strains,
RGD curators review the available information, including the
origin of the strain and what, if any, genomic modifications
have been made, and assign nomenclature based on the
“Guidelines for Nomenclature of Mouse and Rat Strains”
(https://rgd.mcw.edu/nomen/nomen.shtml). Authors are
asked to contact RGD before publication to register their strains
in order to receive proper nomenclature and RGD IDs, and to
use the appropriate nomenclature and ID(s) in their
publications for maximum traceability and reproducibility of
their results.

4.4 Molecular Genetic Tools

The molecular genetic toolbox for rat includes genetic and
genomic data as well as a variety of tools for using them. Rat
genetics has been a foundational research focus since the first
studies on the genetics of coat color linkage and inheritance in
the late nineteenth and early twentieth centuries. Since that time,
the field has evolved to incorporate the use of genetic markers and
single nucleotide variants as markers for QTLs, establishment of a
reference genome sequence for the rat, and whole genome
sequencing (WGS) of a number of inbred rat strains which are
considered to be either established models of human disease or
control strains for those models. Many of the chapters that follow
outline research, methods and resources that came as a direct result
of this molecular genetic toolbox and/or associated investments
into infrastructure for the rat.

4.4.1 Genetic Markers

Until the advent of whole genome sequencing, assignment of
genes and genetic markers to relative positions on chromosomes
was accomplished by somatic cell hybrids, genetic or radiation
hybrid mapping, or in situ hybridization. In 1990 [161] and again
in 1991 [162], Levan et al. published the then-current rat gene
map, consisting of 214 genes and 11 linkage groups with
assignments to 20 of the 22 rat chromosomes. In 1991, Jacob et al.
[163] published the first QTL analysis in rat, calculating linkage
between blood pressure and the genome. They developed a set of
112 polymorphic simple sequence repeat (SSR) markers and
estimated that approximately 90% of the genome should lie within
30 cM of a marker. Using these markers, they were able to find
statistical associations between two genomic regions and blood
pressure, Bp1 on chromosome 10 for salt-loaded hypertension
and Bp2 on chromosome 18 for baseline diastolic blood pressure.
SSRs, later termed simple sequence length polymorphisms
(SSLPs), are stretches of tandem repeats of two, three, or four
nucleotides which are often polymorphic between individuals and,
in the case of rat, between strains. They can be followed through
genetic crosses to assign genomic regions in the progeny to
one or the other progenitor strain. SSLPs have been used
extensively as markers to delineate regions of linkage, as peak
and flanking markers, for QTLs. Between 1991 and 2004, when
the first draft of the rat genome sequence was published, the
Jacob laboratory and other groups released high density genetic
and RH maps for the rat [164–167], increasing the number of
markers available for analyses and decreasing the distance
between markers to facilitate the paring down of QTL sizes in
order to target the actual gene or genes responsible for the
observed effect on a phenotype. Data about these markers were
deposited into the Rat Genome Database, including the
centimorgan chromosomal position, the relative order of markers
on each chromosome, the PCR primer sequences for each, and, in
many cases, the sequence of the amplified PCR product. Steen et
al. [167] also characterized 4328 SSLPs in 48 commonly
used inbred rat strains and supplied the expected sizes of the
PCR products for each strain. As noted in that work, “These
maps provide the basic tools for rat genomics. They will
facilitate studies of multifactorial disease and functional genomics,
allow construction of physical maps, and provide a scaffold for
both directed and large-scale sequencing efforts and comparative
genomics in this important experimental organism.”
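How an SSLP is followed through a cross can be sketched in a few lines. The function below is a minimal illustration under stated assumptions, not a published genotyping method: it assumes that the expected PCR product size (in base pairs) is known for each of two progenitor strains, here simply called A and B, and that a product within a small tolerance of an expected size counts as that strain's allele. The function name, the tolerance, and all sizes are hypothetical.

```python
def assign_origin(observed_sizes, size_a, size_b, tolerance=1):
    """Classify a progeny genotype at one SSLP marker from its PCR
    product sizes (bp).

    Returns "A/A", "A/B", or "B/B", or None when no call can be made
    (no product, or a product matching neither progenitor strain).
    """
    # The marker is only informative if the two progenitor alleles can
    # be told apart given the sizing tolerance.
    if abs(size_a - size_b) <= 2 * tolerance:
        raise ValueError("marker is not informative for this pair of strains")

    alleles = set()
    for size in observed_sizes:
        if abs(size - size_a) <= tolerance:
            alleles.add("A")
        elif abs(size - size_b) <= tolerance:
            alleles.add("B")
        else:
            return None  # unexpected product size

    if alleles == {"A"}:
        return "A/A"
    if alleles == {"B"}:
        return "B/B"
    if alleles == {"A", "B"}:
        return "A/B"
    return None  # no products observed


# Invented sizes: strain A gives a 152 bp product, strain B a 160 bp product.
print(assign_origin([152], 152, 160))       # one band matching strain A
print(assign_origin([152, 160], 152, 160))  # two bands, one from each strain
```

Repeating this call for ordered markers along a chromosome marks where the inferred origin switches from one progenitor to the other, which is how flanking markers bound a region of linkage.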

4.4.2 Genome Sequence

The project to sequence the rat genome was initiated in 2000
with a Request for Application (RFA) to form a “Network for
Large-Scale Sequencing of the Rat Genome”
(https://grants.nih.gov/grants/guide/rfa-files/RFA-HG-00-002.html).
The initiative was funded jointly by the National
Human Genome Research Institute (NHGRI) and the National
Heart, Lung and Blood Institute (NHLBI). The stated goal was
to produce “a working draft version (3-4 fold sequence
coverage) of the rat genome sequence in two years or less.”
The specification that the result would be a working draft
version indicated that there was no intention to “finish” the
sequence, meaning that, although the draft was expected to be
of high quality, it was intended that a certain proportion of the
errors would never be corrected. As of this writing, this is still
the case. Chapter 2 of this book contains a detailed analysis of
the available genome assemblies for rat. We will introduce the
subject here.
Following the release of several early versions of the rat
genome between 2001 and 2003, the release of the RGSC 3.1
assembly in June of 2003 made rat the third mammalian genome
to be available as a high-quality draft assembly. In April of 2004,
Gibbs et al. published an article announcing the completion of the
first high quality draft of the rat assembly and reported on their
extensive analysis of the rat genome sequence and their
comparisons of the rat, human, and mouse reference sequences
[168]. The 3.x assembly was generated using a new dual-method
approach, utilizing both whole genome shotgun (WGS)
sequencing and low-coverage BAC sequencing. Additional
methods used to support assembly of the sequence included the
development of “fingerprint contig (FPC)” maps of clones from
the Children’s Hospital Oakland Research Institute (CHORI-230)
rat BAC library, BAC end sequencing, and construction of a yeast
artificial chromosome (YAC)-based physical map. The methods
were developed and utilized in parallel, reducing the amount of
time needed to produce the assembly. In addition, existing genetic
and radiation hybrid maps were used to support assembly and to
assess the quality of the resulting reference, allowing for robust
draft assembly. The sequence was further enhanced over time with
the incorporation of finished BAC sequences, e.g., of the
Encyclopedia of DNA Elements (ENCODE) regions [169],
constituting minor upgrades of the assembly to RGSC 3.4 in
November 2004.
In 2008, Baylor College of Medicine released a new,
independent assembly of the rat genome sequence, the Rnor
4.0 assembly, and an upgrade of that assembly approximately a
year later. In addition to incorporating additional Solexa,
SOLiD, and 454 sequencing reads, the sequence was
assembled using a new version of the Atlas software which
employed a different method for merging overlapping eBACs and
WGS scaffolds
(ftp://ftp.hgsc.bcm.edu/Rnorvegicus/Rnor4.0/README_Rnor4.txt,
ftp://ftp.hgsc.bcm.edu/Rnorvegicus/Rnor4.1/READMErat4.1.txt).
However, although NCBI began the work of annotating the
latter assembly and fully integrating it into their data resources,
the 4.x versions were not generally accepted by the rat research
community and the assembly was not promoted to RefSeq
assembly status at NCBI
(https://www.ncbi.nlm.nih.gov/assembly/GCA_000001895.2,
note that there is no GCF reference accession).
Many of the underlying problems with the assembly were
addressed in the Rnor_5.0 version of the rat reference. The Atlas
assembler used to build the initial rat reference assembly was
improved over time to better handle the BAC sequences that had
been used to build Rnor_3.1 [169]. Rnor_5.0 also featured
more cDNAs for annotating untranslated regions of
protein-coding genes and an updated rat-specific repeat library
(http://mar2015.archive.ensembl.org/info/genome/genebuild/2012_04_rat_5_genebuild.pdf).
Additional improvements included the addition
of annotations of genes for small structured RNAs from RFAM
and miRBase, and a substantial increase in the number of genes
at NCBI with more than one transcript (from 303 genes for the
v3.4 assembly to almost 7400 genes for the v5.0 assembly).
The most recent assembly, Rnor_6.0, was released in 2014,
and includes additional BAC sequences, manual corrections, and
PacBio data to fill scaffold gaps. It also includes the Y
chromosome from the SHR rat. More information on rat genome
assemblies and annotation pipelines can be found in Chapter 2 of
this book.

4.5 Genome-Related Data

The availability of whole genome sequence for the rat has resulted
in increased availability of genome-dependent data such as
variants and RNA sequencing data.

4.5.1 Strain-Specific Variants

Along with upgrading and maintaining the BN reference, there
have been many attempts to identify and catalog regions of genetic
variation in strains and substrains. As detailed earlier in this
chapter, the rat has served as a physiological model for over 100
years, and as multiple strains have been developed to study
pathophysiological processes, comparative analyses between
phenotypically similar and phenotypically disparate rat models
may yield insights into the underlying mechanisms. As such, a
number of whole genome sequencing projects were undertaken in
order to find and catalog these genetic differences between rat
strains.
Atanur et al. [170] sequenced the genomes of 27 rat strains,
including both cardiovascular and metabolic models of disease as
well as control strains, and determined variations using the
Rnor_3.4 assembly. Two years later, Hermsen et al. [171] not
only reanalyzed the sequence data for these 27 strains, but also
expanded the analysis by including an additional 13 strains,
aligning all of the sequences against the Rnor_5.0 assembly, and
calling the variants using updated software.
Findings from these and other similar studies suggested that in
pathophysiological models of diseases like hypertension, relevant
phenotypes may arise due to different combinations of genetic
factors, and this, in turn, reflects the complex nature of these
diseases in humans [170, 172].
When the Rnor 6.0 version of the rat reference genome was
released, an interim remapping of variants from some of the
strains was done, utilizing UCSC’s Batch Coordinate Conversion,
or “LiftOver” tool to convert the coordinates of SNPs from the
v5.0 assembly to the corresponding coordinates on the v6.0
assembly. This, however, does not always give uniformly reliable
results when used for single nucleotide variants and small indels.
Because of this, a more recent project was undertaken at the Rat
Genome Database to again reanalyze the sequences for the various
strains, aligning the raw sequence reads to the Rnor 6.0 assembly
and using the latest software to call the variants.
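The difference between converting coordinates and realigning reads can be illustrated with a toy model of coordinate conversion. The sketch below is not UCSC's implementation and does not read real chain files; the block table is invented, and the model ignores strand, multiple chromosomes, and scoring. It assumes only that a conversion is described by ungapped blocks that pair a stretch of the old assembly with a stretch of the new one.

```python
from bisect import bisect_right

# Invented aligned blocks on one chromosome, sorted by old_start:
# (old_start, old_end, new_start), 0-based with an exclusive end.
BLOCKS = [
    (0, 1000, 0),        # unchanged region
    (1000, 2000, 1050),  # shifted by a 50 bp insertion in the new assembly
    (2100, 3000, 2150),  # old positions 2000-2099 have no counterpart
]


def lift(pos, blocks=BLOCKS):
    """Convert an old-assembly position to the new assembly.

    Returns the new position, or None if the position falls outside
    every aligned block and therefore cannot be converted.
    """
    starts = [block[0] for block in blocks]
    i = bisect_right(starts, pos) - 1  # last block starting at or before pos
    if i < 0:
        return None
    old_start, old_end, new_start = blocks[i]
    if pos >= old_end:
        return None  # position lies in a gap between blocks
    return new_start + (pos - old_start)


for old_pos in (500, 1500, 2050):
    print(old_pos, "->", lift(old_pos))
```

In this model a variant either receives a shifted coordinate or is dropped. Nothing in the conversion checks whether the reference base at the new position matches the one the variant was originally called against, which is one plausible reason why realigning the raw reads and calling variants again is the more dependable route.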
The full set of strain-specific single nucleotide variants
(SNVs), small insertions and deletions, and, where these were
called, copy number variants is available for all assemblies as
variant call format (.vcf) files from the RGD ftp site
(ftp://ftp.rgd.mcw.edu/pub/strain_specific_variants/). In
addition, researchers can query the RGD database for
strain-specific SNVs and small insertions and deletions using the
Variant Visualizer tool
(https://rgd.mcw.edu/rgdweb/front/config.html). A more complete
description of this tool can be found in Chapter 3 of this book.
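For readers who have not worked with VCF files, the following minimal sketch shows how the fixed columns of a VCF record can be read and how SNVs can be told apart from small insertions and deletions. The records are invented and embedded in the script; they are not taken from the RGD files, whose exact contents (sample columns, INFO fields) are not assumed here. Production work would normally rely on an established VCF library rather than hand parsing.

```python
from collections import Counter

# Invented example; the eight fixed VCF columns are
# CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO (tab-separated).
VCF_TEXT = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t1000\t.\tA\tG\t.\tPASS\t.\n"
    "chr1\t2000\t.\tAT\tA\t.\tPASS\t.\n"
    "chr2\t3000\t.\tC\tCGG,T\t.\tPASS\t.\n"
)


def classify(ref, alt):
    """Classify one REF/ALT pair by allele length."""
    if alt.startswith("<") or alt == "*":
        return "other"  # symbolic or spanning-deletion alleles
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) > len(alt):
        return "deletion"
    if len(ref) < len(alt):
        return "insertion"
    return "other"  # e.g., multi-nucleotide substitutions


def count_variant_types(vcf_text):
    """Count variant types across all records and alternate alleles."""
    counts = Counter()
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines, meta-information, and the header
        fields = line.split("\t")
        ref, alts = fields[3], fields[4].split(",")
        for alt in alts:
            counts[classify(ref, alt)] += 1
    return counts


print(count_variant_types(VCF_TEXT))
```

A record may list several alternate alleles separated by commas, which is why the third example line contributes both an insertion and an SNV.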

4.5.2 Variant Archives

In May of 2017, NCBI announced that they would no longer
support the submission, storage, and presentation of nonhuman
variants, including single nucleotide polymorphisms in dbSNP and
larger structural variations in dbVar. Responsibility for all
nonhuman variants was transitioned to the European Variation
Archive (EVA) at EMBL-EBI as of November of 2017
(https://www.ebi.ac.uk/eva/?Help#key-steps-transitional-process).
During and immediately following the initial transition period,
all nonhuman variants were transferred from NCBI’s dbSNP database
to the EVA. The EVA now accepts variant submissions, assigns
unique IDs to each variant, and periodically consolidates
redundant variant records into single “reference variant” records
with the same rs-formatted IDs previously assigned by dbSNP. In
addition, EVA normalizes the data to ensure that variant
positions are standardized, annotates variant effects, and
calculates allele frequencies. Variants, whether from small
studies with only a few variants or from high-throughput studies
producing millions of variants, are submitted in VCF files. A VCF
validation software suite is provided so that submitters can
confirm their files are ready for loading, which expedites the
process. Rat researchers unfamiliar with the validation and
submission process can submit their variants to the Rat Genome
Database. RGD can incorporate the variants into the Variant
Visualizer tool and concurrently help with the submission process
to the EVA.
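
To illustrate what standardizing variant positions can involve, the sketch below trims bases shared by the REF and ALT alleles so that equivalent representations of a variant collapse to one form. This is a simplified illustration only, not the EVA's procedure: full normalization also left-aligns indels against the reference sequence, which is omitted here, and the function name is hypothetical.

```python
def trim_alleles(pos: int, ref: str, alt: str):
    """Trim shared trailing, then shared leading, bases from REF/ALT.

    At least one base is kept in each allele, and POS is advanced
    for every leading base that is removed.
    """
    # Remove bases shared at the right-hand end of both alleles.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Remove bases shared at the left-hand end, shifting the position.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# A substitution written with redundant flanking bases.
print(trim_alleles(100, "GCAT", "GTAT"))
# A deletion written with redundant trailing bases.
print(trim_alleles(100, "CTCC", "CCC"))
```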

4.5.3 RNA-Seq

The advent of high-throughput sequencing technologies was
closely followed by the application of those technologies to
mapping and quantification of the transcriptome via RNA
sequencing (RNA-Seq) [173]. Total or fractionated RNA from a
tissue or a population of cells, or more recently from a single
cell, is sequenced and the resulting “reads” are either aligned
to a reference genome or to a list of reference transcripts, or
assembled de novo to form a genome-wide transcription map
consisting of the structure of the transcripts (including
alternatively spliced transcripts) and/or the level of expression
at either the gene or transcript level. RNA-Seq’s wide dynamic
range for detecting differences in expression and its potential
for revealing differences in transcript structure make it a
powerful tool for comparison of transcriptomes between organs,
between species, between individuals or strains of the same
species, between developmental stages, or between conditions.

4.5.4 PhenoGen

PhenoGen Informatics (https://phenogen.org) comprises a database
and website that provide tools and data sets to explore DNA
variants, RNA expression, and QTLs. PhenoGen provides a
substantial body of data for download, as well as a
genome/transcriptome browser to explore specific regions of the
genome and tools for gene list analysis. Functionality includes
tools for weighted gene co-expression network analysis, pathway
analysis, exon expression correlations, and promoter analysis.
PhenoGen also has data and tools for eQTL analysis.
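
Weighted gene co-expression network analysis starts from pairwise correlations between expression profiles. As a rough illustration (not a description of PhenoGen's implementation, which is not detailed here), the sketch below computes an unsigned soft-threshold adjacency, |r| raised to a power beta, for two genes. The expression values, the function names, and the exponent beta = 6 are assumptions chosen for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def adjacency(x, y, beta=6):
    """Unsigned soft-threshold adjacency: |r| ** beta."""
    return abs(pearson(x, y)) ** beta


# Hypothetical expression values across four samples.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]   # rises with gene_a
gene_c = [8.0, 6.0, 4.0, 2.0]   # falls as gene_a rises

print(adjacency(gene_a, gene_b))
print(adjacency(gene_a, gene_c))
```

Raising the correlation to a power keeps strong correlations close to 1 while pushing weak ones toward 0, which is why the resulting network is described as weighted rather than thresholded.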

4.5.5 GEO/ArrayExpress

The Gene Expression Omnibus at NCBI
(https://www.ncbi.nlm.nih.gov/geo/, [174]) and ArrayExpress at
EMBL-EBI (https://www.ebi.ac.uk/arrayexpress/, [175]) are the
major public repositories for high-throughput functional genomics
datasets, including expression data derived from microarray and
RNA-Seq analyses, and ChIP-Seq and methylation profiling data
based on DNA sequencing, among others. Both repositories accept
data submissions that comply with the “Minimum Information About
A Microarray Experiment” (MIAME) and “Minimum Information About a
Sequencing Experiment” (MINSEQE) guidelines. For all studies,
these groups store the experimental metadata (that is,
information about the experimental design, the samples and the
submitters, with links to applicable publications where
available) as well as the processed data, e.g., normalized
expression values for array experiments or FPKM/RPKM values for
RNA-Seq experiments. The raw sequence data for RNA-Seq, ChIP-Seq,
and other high-throughput sequencing assays, on the other hand,
are stored in NCBI’s Sequence Read Archive (SRA,
https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi) or EBI’s
European Nucleotide Archive (ENA, https://www.ebi.ac.uk/ena).
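
For readers unfamiliar with the processed values mentioned above, RPKM (reads per kilobase of transcript per million mapped reads) scales a raw read count by both gene length and sequencing depth; FPKM applies the same arithmetic to fragments rather than reads. The short sketch below works through the calculation. The function name and the numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads.

    Equivalent to read_count / (gene_length_bp / 1e3) / (total_mapped_reads / 1e6).
    """
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical gene: 500 reads on a 2,000 bp transcript
# in a library of 20 million mapped reads.
print(rpkm(500, 2_000, 20_000_000))
```

Here 500 reads over 2 kb gives 250 reads per kilobase, and dividing by 20 (million mapped reads) gives 12.5.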
GEO and ArrayExpress both offer analysis tools and pre-analyzed
results based on the data they store. GEO’s DataSet record pages
include functionality to compare samples within a dataset and to
produce several types of cluster heatmaps for gene expression.
They also offer the “Experiment design and value distribution”
tool, which shows box plots of the distribution of expression
values for each sample in a dataset, grouped into subsets
determined by specific experimental variables, for example,
samples grouped by age or disease status. Along the same lines,
GEO2R is an online tool that uses R to compare groups of samples
within a dataset to identify differentially expressed genes. In
addition to the study-based presentation of data, GEO provides
gene-level expression profiles. Each profile displays a chart
showing the expression level of a gene for each sample in a
dataset. GEO Profiles make it easy to see whether a particular
gene was differentially expressed across conditions. ArrayExpress
provides an R Bioconductor package to access records and build
Bioconductor data structures. In addition, ArrayExpress data
provide the basis for EBI’s Expression Atlas tool
(https://www.ebi.ac.uk/gxa/home). Users can search for genes
and/or conditions within or across organisms, or browse
experiments to view and download expression results. As of this
writing, the Expression Atlas included 141 experiments for rat.
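
The idea behind comparing groups of samples can be sketched with a log2 fold change and a Welch t statistic for a single gene. This is a deliberately simplified stand-in and should not be read as the method GEO2R applies, which relies on R packages; the function names and expression values below are hypothetical and are assumed to be already log2-transformed.

```python
import math
from statistics import mean, variance


def log2_fold_change(group_a, group_b):
    """Difference of mean log2 expression (group_b minus group_a)."""
    return mean(group_b) - mean(group_a)


def welch_t(group_a, group_b):
    """Welch's t statistic for two groups with possibly unequal variances."""
    standard_error = math.sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_b) - mean(group_a)) / standard_error


# Hypothetical log2 expression values for one gene.
control = [5.0, 5.2, 4.8]
treated = [7.1, 6.9, 7.0]

print(log2_fold_change(control, treated))
print(welch_t(control, treated))
```

In a real analysis the statistic would be computed for every gene and the resulting p-values corrected for multiple testing, steps that are left out of this sketch.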

4.6 Gene Manipulation in the Rat

Transgenic rat models have been generated for more than 30 years
by DNA microinjection of donor DNA into embryos [176, 177] and
used to study the function of a gene of interest. Early methods
to create mutations within specific genes include ENU
(N-ethyl-N-nitrosourea) mutagenesis and introduction of a
Sleeping Beauty transposon [178]. Both of these methods
successfully identified mutations in targeted genes, but both are
limited in efficiency and specificity: many pups need to be
screened using a variety of strategies to identify positive
mutant founders [179]. Although these random mutagenesis
strategies require large-scale screening efforts to identify
specific mutations, ENU-induced mutant models have been archived
at the Rat Resource and Research Center (http://www.rrrc.us/) and
the PhysGen Knockout Program for subsequent follow-up phenotyping
[178]. Additionally, an archive of ENU-induced mutant sperm was
created by the Kyoto University Mutant Rat Archive [180] to be
used for large-scale screening or rederivation of models using
intracytoplasmic sperm injection. The Sleeping Beauty transposon
system was implemented in rats after successful use in mice
[181, 182]. This strategy had several advantages over ENU
mutagenesis, including the ability to modify both the transposon
and transposases and the fact that only a few transposon
insertions were made in each founder [178]. As with the ENU
models, the Rat Resource and Research Center repository has
nearly 100 transposon-derived models available to investigators.
The advent of site-directed nucleases to target specific genes
in rat embryos has rapidly changed the gene engineering landscape
for investigators using rat models [183]. These sequence-specific
nucleases, including zinc finger nucleases (ZFNs), TALENs, and
CRISPR/Cas approaches, have rapidly allowed the rat genome to be
manipulated in ways previously only available in the mouse.
These strategies were initially used to create a double-stranded
break in the genomic DNA at specific locations. These breaks are
typically repaired by nonhomologous end joining (NHEJ), which
can lead to a loss or gain of DNA. Such insertions or deletions
can result in a knockout of the gene through removal of protein
coding information or introduction of a frameshift mutation
[183, 184]. In addition, a knockin can be created through
homology-directed repair (HDR) and the inclusion of a homologous
DNA template. Genetically modified rats have been developed using
ZFNs, TALENs, and the CRISPR/Cas9 approaches using both NHEJ and
HDR events. CRISPR/Cas is currently the most widely used approach
due to its ease of use, efficiency, and cost. In addition, the
CRISPR/Cas system can be multiplexed, allowing multiple founders
to be generated from a single round of embryo injections [184].
However, current reports suggest that the CRISPR/Cas system
results in more frequent off-target effects than ZFNs and TALENs
[183].
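
Whether an NHEJ-induced insertion or deletion shifts the reading frame depends on whether its net length is a multiple of three. The sketch below makes that check explicit. It assumes, for simplicity, that the indel lies entirely within protein-coding sequence and does not disturb a splice site; the function name and example alleles are hypothetical.

```python
def indel_effect(ref: str, alt: str) -> str:
    """Report whether a coding indel preserves or shifts the reading frame."""
    net_change = len(alt) - len(ref)   # positive: insertion, negative: deletion
    if net_change == 0:
        return "no length change"
    if net_change % 3 == 0:
        return "in-frame indel"        # whole codons gained or lost
    return "frameshift"                # downstream codons are misread


print(indel_effect("ACGTA", "A"))      # 4 bp deletion
print(indel_effect("ACGT", "A"))       # 3 bp deletion
print(indel_effect("A", "AGG"))        # 2 bp insertion
```

An in-frame indel can still disrupt the protein, so a check like this only flags the alleles most likely to produce a knockout through a frameshift.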
For laboratories without access to or expertise in embryo
microinjection, a new technique for in vivo genome editing,
genome editing via oviductal nucleic acids delivery (GONAD),
has shown promising results in rats. This new strategy (referred
to as i-GONAD [185] and rGONAD [186]) eliminates the need to
handle embryos in order to manipulate the genome. “Improved
GONAD” (i-GONAD) was developed and tested in mice to bypass the
need for collection of embryos and microinjection (e.g.,
pronuclear injection). Rather, early preimplantation embryos are
modified by intraoviductal injection of Cas9 protein and
synthetic gRNAs, followed by in vivo electroporation. This
technology has been extended to rats with demonstration of highly
efficient knockout and knockin modifications in two rat strains
(DA and WKY).
In contrast to the mouse, rat embryonic stem cells (ESCs) have
been used less frequently for gene engineering, primarily due to
technical challenges [187]. Rat ESCs have been established for
several different rat strains [188–191] and have been
successfully used for genome editing [191]. Recent work has
demonstrated the successful use of the CRISPR/Cas9 system in rat
ESCs to create both in vitro and in vivo models [187]. These
techniques have the potential to create many new models to study
early embryonic development, focus on single specific cell types
in vitro, and potentially to humanize regions of the rat genome
as has been done in the mouse [184, 192–194].

4.7 Rat in the Larger Context

The rat is an excellent model for human disease, and in many
cases of complex disease it is the model of choice. But for cases
where another model is preferred, or where additional information
is needed, a researcher needs to be able to leverage the research
done in other organisms and apply that information to their
system. In these cases, it is helpful, possibly even necessary,
to be able to access data for multiple species on a single site,
and even in a single view. To meet this need, groups like the Rat
Genome Database (RGD) and the Alliance of Genome Resources offer
sites that incorporate data across multiple species. For
comparative purposes, RGD has always provided data for mouse and
human, in addition to rat. RGD curators manually assign disease
and pathway terms for all three species and import phenotype and
disease annotations for mouse and human to complement their
annotation of rat genes, QTLs and strains. In recent years,
however, RGD has expanded their repertoire of species to also
include chinchilla, 13-lined ground squirrel, dog, pig, and
bonobo. In this way, a researcher interested in otitis media, for
which the chinchilla is the model of choice, or retinal diseases,
for which squirrel is the better model, can easily compare data
for that species to a rich set of functional, disease, pathway,
phenotype, and gene-chemical interaction data for rat, mouse, and
human.
The Alliance of Genome Resources, on the other hand, is a
collaboration between six model organism databases and the Gene
Ontology Consortium. The founding databases for this effort
include Saccharomyces Genome Database (SGD, [195]), WormBase
[196], FlyBase [197], Mouse Genome Database (MGD, [198, 199]),
Rat Genome Database (RGD), and the Zebrafish Information Network
(ZFIN, [200]). The Alliance is developing an infrastructure and
website to standardize data storage and presentation across seven
species: human, mouse, rat, zebrafish, fly, nematode, and yeast.
The goal is a federated view which highlights the diverse types
of research performed using the various organisms and pulls
together these diverse data into a unified presentation of the
most complete information possible for a given gene, genomic
region, disease, phenotype or network to drive research into
human diseases forward.

5 Future Directions

Several large initiatives are building and extending rat
resources to accelerate the understanding of disease mechanisms,
map biology to the genome, and create new methods to build better
animal models, datasets, and analytic tools. Two rat-centric
centers have been funded by the National Institute on Drug Abuse
(NIDA) as Centers of Excellence with a focus on drug abuse and
addiction. The NIDA Center for GWAS in Outbred Rats
(https://projectreporter.nih.gov/project_info_description.cfm?aid=9464528&icde=42187407)
uses heterogeneous stock (HS) rats [130] and sophisticated
genomic approaches to identify associations, eQTLs, and genes
influencing behavior. The new NIDA Core Center of Excellence in
Omics, Systems Genetics, and the Addictome
(https://projectreporter.nih.gov/project_info_description.cfm?aid=9531327&icde=42187459)
is focused on providing resources to the community for analysis
of omics data sets to link sequence variation, epigenetic
factors, and environmental factors to phenotypes in order to
further understanding of how the phenotype is altered by the
genomic variation [201]. Although this new center has a focus on
rodent models of addiction, the new methods, tools, and assembled
datasets will be useful for many complex diseases. A third
rat-centric program, the Hybrid Rat Diversity Program
(https://projectreporter.nih.gov/project_info_description.cfm?aid=9488560&icde=42187641),
was funded by the Office of the Director at the National
Institutes of Health to build a panel similar to the Hybrid Mouse
Diversity Panel [202] involving 96 rat strains to be used broadly
to study the genetic and phenotypic diversity in complex
diseases. This 96-strain panel, consisting of 33 classic inbred
strains and two panels of recombinant inbred strains, will
include rederived rat models, basic phenotypic characterization,
complete genomic sequencing for all strains, and data analysis,
integration, and dissemination through the HRDP Portal at the Rat
Genome Database. Saba and Tabakoff have demonstrated how the
first stages in the HRDP are a renewable resource that can be
used by investigators across many disciplines to generate
connectomes to further investigate pathways and gene interactions
associated with complex traits [203].
On the horizon are genome editors and delivery mechanisms
to enable repair in somatic cells. The new NIH Somatic Cell
Genome Editing (SCGE) Program, launched in January 2018, has
funded projects to develop cell-specific and tissue-specific
delivery vehicles to target genome editing to specific cells
[204]. Although rat is currently not a validation model, the
breakthroughs made within this program should be translatable to
other model organisms, with the goal of developing treatments for
human disorders.

References
1. Lack JB, Hamilton MJ, Braun JK, Mares MA, Van Den Bussche RA (2013)
Comparative phylogeography of invasive Rattus rattus and Rattus
norvegicus in the U.S. reveals distinct colonization histories and
dispersal. Biol Invasions 15(5):1067–1087.
https://doi.org/10.1007/s10530-012-0351-5
2. Courchamp F, Chapuis JL, Pascal M (2003) Mammal invaders on islands:
impact, control and control impact. Biol Rev Camb Philos Soc
78(3):347–383
3. Centers for Disease Control and Prevention.
https://www.cdc.gov/rodents/diseases/index.html. Accessed 15 Aug 2018
4. Lin XD, Guo WP, Wang W, Zou Y, Hao ZY, Zhou DJ et al (2012) Migration
of Norway rats resulted in the worldwide distribution of Seoul
hantavirus today. J Virol 86(2):972–981.
https://doi.org/10.1128/jvi.00725-11
5. Kosoy M, Khlyap L, Cosson J-F, Morand S (2015) Aboriginal and
invasive rats of genus Rattus as hosts of infectious agents. Vector
Borne Zoonotic Dis 15(1):3–12. https://doi.org/10.1089/vbz.2014.1629
6. Harper GA, Bunbury N (2015) Invasive rats on tropical islands: their
population biology and impacts on native species. Glob Ecol Conserv
3:607–627. https://doi.org/10.1016/j.gecco.2015.02.010
7. Lindsey JR (1979) Historical foundations. In: Baker HJ, Lindsey JR,
Weisbroth SH (eds) The laboratory rat: biology and diseases, American
College of Laboratory Animal Medicine Series, vol 1. Academic Press,
New York, pp 1–36
8. Puckett EE, Park J, Combs M, Blum MJ, Bryant JE, Caccone A et al
(2016) Global population divergence and admixture of the brown rat
(Rattus norvegicus). Proc Biol Sci
283(1841):20161762. https://doi.org/10.1098/rspb.2016.1762
9. Song Y, Lan Z, Kohn MH (2014) Mitochondrial DNA phylogeography of
the Norway rat. PLoS One 9(2):e88425.
https://doi.org/10.1371/journal.pone.0088425
10. Zeng L, Ming C, Li Y, Su L-Y, Su Y-H, Otecko NO et al (2016)
Evolutionary history of the brown rat: out of southern East Asia and
selection. bioRxiv. https://doi.org/10.1101/096800
11. Philipeaux JM (1856) Note sur l'extirpation des capsules surrénales
chez les rats albinos (Mus ratus), vol XLIII. Comptes Rendus, Paris,
pp 904–906
12. Hunt JR (1993) Animal models of human nutrition. In: Macrae R,
Robinson RK, Sadler MJ (eds) Encyclopaedia of food science, food
technology, and nutrition, vol 1. Academic Press, Ltd., San Diego, CA,
pp 188–194
13. Savory W (1863) Experiments on food; its destination and uses.
Lancet 81(2066):381–383.
https://doi.org/10.1016/S0140-6736(02)65694-6
14. Savory WS (1863) III. Experiments on food; its destination and
uses. Proc R Soc Lond 12:121–123.
https://doi.org/10.1098/rspl.1862.0019
15. Hopkins FG (1912) Feeding experiments illustrating the importance
of accessory factors in normal dietaries. J Physiol 44(5–6):425–460
16. McCollum EV (1953) My early experiences in the study of foods and
nutrition. Annu Rev Biochem 22:1–16.
https://doi.org/10.1146/annurev.bi.22.070153.000245
17. McCollum EV, Davis M (1913) The necessity of certain lipins in the
diet during growth. J Biol Chem 15(1):167–175
18. Osborne TB, Mendel LB, Ferry EL, Wakeman AJ (1913) The relation of
growth to the chemical constituents of the diet. J Biol Chem
15(2):311–326
19. Semba RD (2012) On the ‘discovery’ of vitamin A. Ann Nutr Metab
61(3):192–198. https://doi.org/10.1159/000343124
20. McCollum EV, Davis M (1915) The nature of the dietary deficiencies
of rice. J Biol Chem 23(1):181–230
21. McCollum EV, Davis M (1915) The essential factors in the diet
during growth. J Biol Chem 23(1):231–246
22. McCollum EV, Kennedy C (1916) The dietary factors operating in the
production of polyneuritis. J Biol Chem 24(4):491–502
23. Osborne TB, Mendel LB (1916) The growth of rats upon diets of
isolated food substances. Biochem J 10(4):534–538
24. Osborne TB, Mendel LB, Ferry EL, Wakeman AJ (1914) Amino-acids in
nutrition and growth. J Biol Chem 17(3):325–349
25. Osborne TB, Mendel LB, Ferry EL, Wakeman AJ (1917) The relative
value of certain proteins and protein concentrates as supplements to
corn gluten. J Biol Chem 29(1):69–92
26. Osborne TB, Mendel LB, Ferry EL, Wakeman AJ (1914) The influence of
cod liver oil and some other fats on growth. J Biol Chem 17(3):401–408
27. Ferry EL (1919) Nutrition experiments with rats. A description of
methods and technic. J Lab Clin Med 5(11):735–745.
https://doi.org/10.5555/uri:pii:S0022214320900161
28. Nelson EM, Steenbock H (1925) Fat-soluble vitamins: XXI.
Observations bearing on the alleged induction of growth-promoting
properties in air by irradiation with ultra-violet light. J Biol Chem
62(3):575–593
29. Steenbock H (1924) The induction of growth promoting and calcifying
properties in a ration by exposure to light. Science
60(1549):224–225. https://doi.org/10.1126/science.60.1549.224
30. Sherman HC, Pappenheimer AM (1921) Experimental rickets in rats: I.
A diet producing rickets in white rats, and its prevention by the
addition of an inorganic salt. J Exp Med 34(2):189–198
31. Sherman HC, Campbell HL (1928) The influence of food upon
longevity. Proc Natl Acad Sci U S A 14(11):852–855
32. Sherman HC, Trupp HY (1949) Further experiments with vitamin A in
relation to aging and to length of life. Proc Natl Acad Sci U S A
35(2):90–92
33. Baker DH (2008) Animal models in nutrition research. J Nutr
138(2):391–396
34. Kuramoto T (2011) Yoso-tama-no-kakehashi; the first Japanese
guidebook on raising rats. Exp Anim 60(1):1–6
35. Castle WE (1947) The domestication of the rat. Proc Natl Acad Sci
U S A 33(5):109–117
36. MacCurdy H, Castle WE (1907) Selection and cross-breeding in
relation to the inheritance of coat-pigments and coat-patterns in rats
and guinea-pigs. The Carnegie Institution of Washington, Washington, DC
37. Castle WE, Phillips JC (1914) Piebald rats and selection; an
experimental test of the
effectiveness of selection and of the theory of gametic purity in
Mendelian crosses. The Carnegie Institution of Washington,
Washington, DC
38. Castle WE, Wright S (1915) Two color mutations of rats which show
partial coupling. Science 42(1075):193–195.
https://doi.org/10.1126/science.42.1075.193
39. Castle WE (1919) Piebald rats and the theory of genes. Proc Natl
Acad Sci U S A 5(4):126–130
40. Castle WE, Wachter WL (1924) Variations of linkage in rats and
mice. Genetics 9(1):1–12
41. Castle WE (1925) A sex difference in linkage in rats and mice.
Genetics 10(6):580–582
42. King HD, Castle WE (1935) Linkage studies of the rat (Rattus
norvegicus). Proc Natl Acad Sci U S A 21(6):390–399
43. King HD, Castle WE (1937) Linkage studies of the rat (Rattus
norvegicus): II. Proc Natl Acad Sci U S A 23(2):56–60
44. Castle WE, King HD (1940) Linkage studies of the rat (Rattus
norvegicus): III. Proc Natl Acad Sci U S A 26(9):578–580
45. Castle WE, King HD (1941) Linkage studies of the rat (Rattus
norvegicus): V. Proc Natl Acad Sci U S A 27(8):394–398
46. Castle WE, King HD, Daniels AL (1941) Linkage studies of the rat
(Rattus norvegicus): IV. Proc Natl Acad Sci U S A 27(6):250–254
47. Castle WE (1941) Influence of certain color mutations on body size
in mice, rats, and rabbits. Genetics 26(2):177–191
48. Castle WE (1944) Linkage of Waltzing in the rat. Proc Natl Acad Sci
U S A 30(9):226–230
49. Castle WE, King HD (1944) Linkage studies of the rat (Rattus
norvegicus): VI. Proc Natl Acad Sci U S A 30(4):79–82
50. Castle WE (1946) Linkage in the albino chromosome of the rat. Proc
Natl Acad Sci U S A 32(2):33–36
51. Castle WE, King HD (1947) Linkage studies of the rat. J Hered
38(11):341–344
52. Castle WE, King HD (1948) Linkage studies of the rat: IX. Cataract.
Proc Natl Acad Sci U S A 34(4):135–136
53. Castle WE, King HD (1949) Linkage studies of the rat: X. Proc Natl
Acad Sci U S A 35(9):545–546
54. Castle WE (1951) Variation in the hooded pattern of rats, and a new
allele of hooded. Genetics 36(3):254–266
55. Conklin EG (1938) Biographical memoir of Henry Herbert Donaldson
1857–1938. Biogr Mem Natl Acad Sci 20:227–243
56. Donaldson HH (1915) The rat: reference tables and data for the
albino rat and the Norway rat. The Wistar Institute of Anatomy and
Biology, Philadelphia
57. Donaldson HH (1924) The rat: data and reference tables for the
albino rat (Mus norvegius albinus) and the Norway rat (Mus norvegius).
The Wistar Institute of Anatomy and Biology, Philadelphia
58. Ogilvie MB (2007) Inbreeding, eugenics, and Helen Dean King
(1869–1955). J Hist Biol 40(3):467–507.
https://doi.org/10.1007/s10739-006-9117-1
59. King HD, Donaldson HH (1929) Life processes and size of the body
and organs of the gray Norway rat during ten generations in captivity.
In: Stockard CR, Evans HM (eds) The American Anatomical Memoirs,
vol 14. The Wistar Institute of Anatomy and Biology, Philadelphia
60. King HD (1939) Life processes in gray Norway rats during fourteen
years in captivity. In: Stockard CR, Evans HM (eds) The American
Anatomical Memoirs, vol 17. The Wistar Institute of Anatomy and
Biology, Philadelphia
61. Wilkins AS, Wrangham RW, Fitch WT (2014) The “domestication
syndrome” in mammals: a unified explanation based on neural crest cell
behavior and genetics. Genetics 197(3):795–808.
https://doi.org/10.1534/genetics.114.165423
62. Stewart CC (1898) Variations in daily activity produced by alcohol
and by changes in barometric pressure and diet, with a description of
recording methods. Am J Phys 1(1):40–56.
https://doi.org/10.1152/ajplegacy.1898.1.1.40
63. Willard SS (1901) Experimental Study of the Mental Processes of the
Rat. II. Am J Psychol 12(2):206–239. https://doi.org/10.2307/1412534
64. Watson JB (1903) Animal education; an experimental study on the
psychical development of the white rat, correlated with the growth of
its nervous system. The University of Chicago Press, Chicago, p 122
65. Watson JB (1914) Behavior: an introduction to comparative
psychology. Holt, New York
66. Schulkin J, Rozin P, Stellar E (1994) Curt P. Richter – February
20, 1894–December 21, 1988. Biogr Mem Natl Acad Sci 65:311–320
67. Richter CP (1922) A behavioristic study of the activity of the rat.
Williams & Wilkins Company, Baltimore
68. Richter CP (1936) Increased salt appetite in adrenalectomized rats.
Am J Phys 115(1):155–161.
https://doi.org/10.1152/ajplegacy.1936.115.1.155
69. Richter CP (1953) Experimentally produced behavior reactions to
food poisoning in wild and domesticated rats. Ann N Y Acad Sci
56(2):225–239
70. Richter CP (1968) Experiences of a reluctant rat-catcher: the
common Norway rat-friend or enemy? Proc Am Philos Soc 112(6):403–415
71. Stotsenberg J (1909) On the growth of the albino rat (Mus
norvegicus var. albus) after castration. Anat Rec 3(4):233–244.
https://doi.org/10.1002/ar.1090030410
72. Stotsenburg JM (1913) The effect of spaying and semi-spaying young
albino rats (Mus norvegicus albinus) on the growth in body weight and
body length. Anat Rec 7(6):183–194.
https://doi.org/10.1002/ar.1090070602
73. Hammett FS (1924) Studies of the thyroid apparatus. XX. The effect
of thyro-parathyroidectomy and parathyroidectomy at 75 days of age on
the growth of the brain and spinal cord of male and female albino rats.
J Comp Neurol 37(1):15–30. https://doi.org/10.1002/cne.900370103
74. Long JA, Evans HM (1922) The oestrous cycle in the rat and its
associated phenomena, vol 6. Memoirs of the University of California.
University of California Press, Berkeley, CA
75. Evans HM, Long JA (1921) The effect of feeding the anterior lobe of
the hypophysis on the oestrous cycle of the rat. Anat Rec 21(1):62.
https://doi.org/10.1002/ar.1090210105
76. Evans HM, Long JA (1921) The effect of the anterior lobe
administered intraperitoneally upon growth, maturity, and oestrous
cycles of the rat. Anat Rec 21(1):62–63.
https://doi.org/10.1002/ar.1090210105
77. Simpson ME, Evans HM (1946) Comparison of the spermatogenic and
androgenic properties of testosterone propionate with those of
pituitary ICSH in hypophysectomized 40-day old male rats.
Endocrinology 39(5):281–285. https://doi.org/10.1210/endo-39-5-281
78. Li CH, Kalman C, Evans HM (1947) The effect of the hypophyseal
growth hormone on the alkaline phosphatase of rat plasma. J Biol Chem
169(3):625–629
79. Evans HM, Becks H, Asling CW, Li CH (1947) Gigantism produced in
normal female rats by chronic treatment with pure pituitary growth
hormone. Anat Rec 97(3):333
80. Evans HM, Becks H (1948) The gigantism produced in normal rats by
injection of the pituitary growth hormone; skeletal changes; tibia,
costochondral junction, and caudal vertebrae. Growth 12(1):43–54
81. Li CH, Simpson ME, Evans HM (1948) The gigantism produced in normal
rats by injection of the pituitary growth hormone; main chemical
components of the body. Growth 12(1):39–42
82. Evans HM, Simpson ME, Li CH (1948) The gigantism produced in normal
rats by injection of the pituitary growth hormone; body growth and
organ changes. Growth 12(1):15–32
83. Li CH, Simpson ME, Evans HM (1949) Influence of growth and
adrenocorticotropic hormones on the body composition of
hypophysectomized rats. Endocrinology 44(1):71–75.
https://doi.org/10.1210/endo-44-1-71
84. Asling CW, Walker DG, Simpson ME, Evans HM (1950) Differences in
the skeletal development attained by 60-day-old female rats
hypophysectomized at ages varying from 6 to 28 days. Anat Rec
106(4):555–569
85. Walker DG, Simpson ME, Asling CW, Evans HM (1950) Growth and
differentiation in the rat following hypophysectomy at 6 days of age.
Anat Rec 106(4):536–554
86. Koneff AA, Moon HD, Simpson ME, Li CH, Evans HM (1951) Neoplasms in
rats treated with pituitary growth hormone. IV Pituitary gland. Cancer
Res 11(2):113–117
87. Van Dyke DC, Garcia JF, Simpson ME, Huff RL, Contopoulos AN, Evans
HM (1952) Maintenance of circulating red cell volume in rats after
removal of the posterior and intermediate lobes of the pituitary.
Blood 7(10):1005–1016
88. Asling CW, Walker DG, Simpson ME, Li CH, Evans HM (1952) Deaths in
rats submitted to hypophysectomy at an extremely early age and the
survival effected by growth hormone. Anat Rec 114(1):49–65
89. Walker DG, Asling CW, Simpson ME, Li CH, Evans HM (1952) Structural
alterations in rats hypophysectomized at six days of age and their
correction with growth hormone. Anat Rec 114(1):19–47
90. Fels IG, Simpson ME, Evans HM (1953) The effect of magnesium ion
upon the alkaline phosphatase activity in the thyroid of the
hypophysectomized rat. J Biol Chem 204(2):807–814
91. Nelson MM, Lyons WR, Evans HM (1953) Comparison of ovarian and
pituitary
hormones for maintenance of pregnancy in pyridoxine-deficient rats.
Endocrinology 52(5):585–589. https://doi.org/10.1210/endo-52-5-585
92. Contopoulos AN, Van Dyke DC, Ellis S, Simpson ME, Lawrence JH,
Evans HM (1955) Prevention of neonatal anemia in the rat by the
pituitary erythropoietic factor. Blood 10(2):115–119
93. Wooten E, Nelson MM, Simpson ME, Evans HM (1955) Effect of
pyridoxine deficiency on the gonadotrophic content of the anterior
pituitary in the rat. Endocrinology 56(1):59–66.
https://doi.org/10.1210/endo-56-1-59
94. Wooten E, Nelson MM, Simpson ME, Evans HM (1958) Response of
vitamin B6-deficient rats to hypophyseal follicle-stimulating and
interstitial-cell-stimulating hormones. Endocrinology 63(6):860–866.
https://doi.org/10.1210/endo-63-6-860
95. King CG (1975) Biographical memoir of Henry Clapp Sherman. Natl
Acad Sci Biograph Memoirs. 395–429
96. Smirk FH, Hall WH (1958) Inherited hypertension in rats. Nature
182(4637):727–728
97. Phelan EL, Smirk FH (1960) Cardiac hypertrophy in genetically
hypertensive rats. J Pathol Bacteriol 80:445–448
98. Dahl LK, Heine M, Tassinari L (1962) Effects of chronic excess salt
ingestion. Evidence that genetic factors play an important role in
susceptibility to experimental hypertension. J Exp Med 115:1173–1190
99. Moreno C, Dumas P, Kaldunski ML, Tonellato PJ, Greene AS, Roman RJ
et al (2003) Genomic map of cardiovascular phenotypes of hypertension
in female Dahl S rats. Physiol Genomics 15(3):243–257.
https://doi.org/10.1152/physiolgenomics.00105.2003
100. Baker JE, Konorev EA, Gross GJ, Chilian WM, Jacob HJ (2000)
Resistance to myocardial ischemia in five rat strains: is there a
genetic component of cardioprotection? Am J Physiol Heart Circ Physiol
278(4):H1395–H1400. https://doi.org/10.1152/ajpheart.2000.278.4.H1395
101. Okamoto K, Aoki K (1963) Development of a strain of spontaneously
hypertensive rats. Jpn Circ J 27:282–293
102. Okamoto K, Yamori Y, Oshima A, Tanaka T (1972) Development of
substrains in spontaneously hypertensive rats: genealogy, isozymes and
effect of hypercholesterolemic diet. Jpn Circ J 36(5):461–470
103. Nagaoka A, Iwatsuka H, Suzuoki Z, Okamoto K (1976) Genetic
predisposition to stroke in spontaneously hypertensive rats. Am J Phys
230(5):1354–1359. https://doi.org/10.1152/ajplegacy.1976.230.5.1354
104. Hazama F, Ooshima A, Tanaka T, Tomimoto K, Okamoto K (1975)
Vascular lesions in the various substrains of spontaneously
hypertensive rats and the effects of chronic salt ingestion. Jpn Circ J
39(1):7–22
105. Kunert MP, Drenjancevic-Peric I, Dwinell MR, Lombard JH, Cowley AW
Jr, Greene AS et al (2006) Consomic strategies to localize genomic
regions related to vascular reactivity in the Dahl salt-sensitive rat.
Physiol Genomics 26(3):218–225.
https://doi.org/10.1152/physiolgenomics.00004.2006
106. Garrett MR, Dene H, Walder R, Zhang QY, Cicila GT, Assadnia S et al
(1998) Genome scan and congenic strains for blood pressure QTL using
Dahl salt-sensitive rats. Genome Res 8(7):711–723
107. Ye P, West MJ (2003) Cosegregation analysis of natriuretic peptide
genes and blood pressure in the spontaneously hypertensive rat. Clin
Exp Pharmacol Physiol 30(12):930–936
108. Dupont J, Dupont JC, Froment A, Milon H, Vincent M (1973) Selection
of three strains of rats with spontaneously different levels of blood
pressure. Biomedicine 19(1):36–41
109. Bilusic M, Bataillard A, Tschannen MR, Gao L, Barreto NE, Vincent M
et al (2004) Mapping the genetic determinants of hypertension,
metabolic diseases, and related phenotypes in the Lyon hypertensive
rat. Hypertension 44(5):695–701.
https://doi.org/10.1161/01.HYP.0000144542.57306.5e
110. Yamori Y, Ohta K, Ohtaka M, Nara Y, Horie R, Ooshima A (1978)
Glucose metabolism in spontaneously hypertensive rats. Jpn Heart J
19(4):559–560
111. Mondon CE, Reaven GM (1988) Evidence of abnormalities of insulin
metabolism in rats with spontaneous hypertension. Metabolism
37(4):303–305
112. Kawano K, Hirashima T, Mori S, Saitoh Y, Kurosumi M, Natori T
(1992) Spontaneous long-term hyperglycemic rat with diabetic
complications. Otsuka Long-Evans Tokushima Fatty (OLETF) strain.
Diabetes 41(11):1422–1428
113. Kawano K, Hirashima T, Mori S, Natori T (1994) OLETF (Otsuka
Long-Evans Tokushima Fatty) rat: a new NIDDM rat strain. Diabetes Res
Clin Pract 24(Suppl):S317–S320
114. Zucker LM (1960) Two-way selection for body size in rats, with
observations on
Rat in Biomedical Research 37
2005.00029.x
simultaneous changes in coat color pattern
and hood size. Genetics 45(4):467–483
115. Zucker LM, Zucker TF (1961) Fatty, a new
mutation in the rat. J Hered 52(6):275–278.
https://doi.org/10.1093/oxfordjournals.
jhered.a107093
116. Peterson RG, Shaw WN, Neel M-A, Little
LA, Eichberg J (1990) Zucker diabetic
fatty rat as a model for non-insulin-
dependent dia- betes mellitus. ILAR J
32(3):16–19. https://
doi.org/10.1093/ilar.32.3.16
117. Shimabukuro M, Zhou YT, Levi M, Unger
RH (1998) Fatty acid-induced beta cell
apo- ptosis: a link between obesity and
diabetes. Proc Natl Acad Sci U S A
95(5):2498–2502
118. Tayebati SK, Tomassoni D, Amenta F (2012)
Spontaneously hypertensive rat as a model of
vascular brain disorder: microanatomy, neu-
rochemistry and behavior. J Neurol Sci 322
(1–2):241–249.
https://doi.org/10.1016/j. jns.2012.05.047
119. Sarkisova K, van Luijtelaar G (2011) The
WAG/Rij strain: a genetic animal model of
absence epilepsy with comorbidity of depres-
sion [corrected]. Prog Neuro-
Psychopharmacol Biol Psychiatry 35
(4):854–876.
https://doi.org/10.1016/j.
pnpbp.2010.11.010
120. Malkesman O, Weller A (2009) Two
different putative genetic animal models of
childhood depression--a review. Prog
Neurobiol 88 (3):153–169.
https://doi.org/10.1016/j.
pneurobio.2009.03.003
121. Malkesman O, Braw Y, Zagoory-Sharon O,
Golan O, Lavi-Avnon Y, Schroeder M et al
(2005) Reward and anxiety in genetic animal
models of childhood depression. Behav Brain
Res 164(1):1–10.
https://doi.org/10.1016/
j.bbr.2005.04.023
122. Overstreet DH (2012) Modeling depression
in animal models. Methods Mol Biol
829:125–144.
https://doi.org/10.1007/ 978-1-61779-
458-2_7
123. Mardones J, Segovia-Riquelme N (1983)
Thirty-two years of selection of rats by etha-
nol preference: UChA and UChB strains.
Neurobehav Toxicol Teratol 5(2):171–178
124. Eriksson K (1968) Genetic selection for
vol- untary alcohol consumption in the
albino rat. Science 159(3816):739–741.
https://doi.
org/10.1126/science.159.3816.739
125. Bell RL, Rodd ZA, Lumeng L, Murphy JM,
McBride WJ (2006) The alcohol-preferring P
rat and animal models of excessive alcohol
drinking. Addict Biol 11(3–4):270–288.
https://doi.org/10.1111/j.1369-1600.
126. Li TK, Lumeng L, Doolittle DP (1993)
Selective breeding for alcohol preference
and associated responses. Behav Genet 23
(2):163–170
127. Li TK, Lumeng L, Doolittle DP, Carr LG
(1991) Molecular associations of alcohol-
seeking behavior in rat lines selectively bred
for high and low voluntary ethanol drinking.
Alcohol Alcohol Suppl 1:121–124
128. Colombo G, Agabio R, Lobina C, Reali R,
Zocchi A, Fadda F, Gessa GL (1995)
Sardin- ian alcohol-preferring rats: a genetic
animal model of anxiety. Physiol Behav 57
(6):1181–1185
129. Fadda F, Mosca E, Colombo G, Gessa GL
(1989) Effect of spontaneous ingestion of
ethanol on brain dopamine metabolism. Life
Sci 44(4):281–287
130. Parker CC, Chen H, Flagel SB, Geurts
AM, Richards JB, Robinson TE et al
(2014) Rats are the smart choice: Rationale
for a renewed focus on rats in behavioral
genetics. Neuropharma- cology 76(Pt
B):250–258. https://doi.org/
10.1016/j.neuropharm.2013.05.047
131. Murphy JM, Stewart RB, Bell RL, Badia-
Elder NE, Carr LG, McBride WJ et al
(2002) Phenotypic and genotypic
characteri- zation of the Indiana
University rat lines selectively bred for
high and low alcohol pref- erence. Behav
Genet 32(5):363–388
132. Windisch KA, Kosobud AE, Czachowski
CL (2014) Intravenous alcohol self-
administration in the P rat. Alcohol 48
(5):419–425.
https://doi.org/10.1016/j.
alcohol.2013.12.007
133. Bullock FD, Curtis MR (1920) The
experi- mental production of sarcoma of
the liver of rats. Proc Meet NY Pathol Soc
20:149–175
134. Curtis MR, Dunning WF, Bullock FD
(1933) Is malignancy due to a process
analogous to somatic mutation? Science 77
(1989):175–176.
https://doi.org/10.1126/
science.77.1989.175
135. Dunning WF, Curtis MR (1946) The respec-
tive roles of longevity and genetic
specificity in the occurrence of spontaneous
tumors in the hybrids between two inbred
lines of rats. Can- cer Res 6:61–81
136. Dunning WF, Curtis MR (1954) Further
studies on the relation of dietary
tryptophan to the induction of neoplasms
in rats. Cancer Res 14(4):299–302
137. Dunning WF, Curtis MR (1952) The inci-
dence of diethylstilbestrol-induced cancer in
reciprocal F hybrids obtained from crosses
between rats of inbred lines that are
suscepti- ble and resistant to the induction
of
38 Jennifer R. Smith et al.

mammary cancer by this agent. Cancer Res section of the National Institutes of Health,
12 (10):702–706 in “Rat Quality: A Consideration of
138. Dunning WF, Curtis MR, Madsen ME Heredity, Diet and Disease.” Proceedings
(1947) The induction of neoplasms in five of the Sym- posium Held at Columbia
strains of rats with acetylaminofluorene. Can- University, College of Physicians and
cer Res 7(3):134–140 Surgeons, New York, January 31, 1952. Q
139. Dunning WF, Curtis MR, Maun ME (1949) Rev Biol 30:4. https://
The effect of dietary fat and carbohydrate on doi.org/10.1086/401094
diethylstilbestrol-induced mammary cancer 150. Wilkinson MD, Dumontier M, Aalbersberg
in rats. Cancer Res 9(6):354–361 IJ, Appleton G, Axton M, Baak A et al
140. Dunning WF, Curtis MR, Segaloff A (2016) The FAIR Guiding Principles for sci-
(1953) Strain differences in response to entific data management and stewardship. Sci
estrone and the induction of mammary Data 3:160018. https://doi.org/10.1038/
gland, adrenal, and bladder cancer in rats. sdata.2016.18
Cancer Res 13 (2):147–152 151. FAIR principles for data stewardship
141. Dunning WF, Curtis MR, Stevens M (1968) (2016) Nat Genetics 48(4):343.
Comparative carcinogenic activity of https://doi.org/10. 1038/ng.3544
dimethyl and trimethyl derivatives of 152. Shimoyama M, De Pons J, Hayman GT, Lau-
benz(a)anthra- cene in Fischer line 344 rats. lederkind SJ, Liu W, Nigam R et al (2015)
Proc Soc Exp Biol Med 128(3):720–722 The Rat Genome Database 2015: genomic,
142. Dunning WF, Curtis MR, Stevens ML, phenotypic and environmental variations and
Dumenigo F (1967) Five transplantable leu- disease. Nucleic Acids Res 43(Database
kemias in the Fischer rat, and their respon- issue):D743–D750.
siveness to steroids. Cancer Res 27(6 Pt https://doi.org/10. 1093/nar/gku1026
2):696–727 153. Laulederkind SJ, Liu W, Smith JR, Hayman
143. Zeiger E (2017) Reflections on a career and GT, Wang SJ, Nigam R et al (2013) Pheno-
on the history of genetic toxicity testing in Miner: quantitative phenotype curation at the
the National Toxicology Program. Mutation rat genome database. Database 2013:bat015.
Res 773:282–292. https://doi.org/10.1093/database/bat015
https://doi.org/10.1016/j. 154. Wang SJ, Laulederkind SJ, Hayman GT,
mrrev.2017.03.002 Petri V, Liu W, Smith JR et al (2015)
144. Bucher JR (2002) The National Toxicology Pheno- Miner: a quantitative phenotype
Program rodent bioassay: designs, interpreta- database for the laboratory rat, Rattus
tions, and scientific contributions. Ann N Y norvegicus. Appli- cation in hypertension
Acad Sci 982:198–207 and renal disease. Database 2015:bau128.
145. King-Herbert A, Thayer K (2006) NTP https://doi.org/10. 1093/database/bau128
workshop: animal models for the NTP 155. Shimoyama M, Nigam R, McIntosh LS,
rodent cancer bioassay: stocks and strains-- Nagarajan R, Rice T, Rao DC, Dwinell MR
should we switch? Toxicol Pathol (2012) Three ontologies to define
34(6):802–805. https://doi.org/10.1080/ phenotype measurement data. Front Genet
01926230600935938 3:87. https://
146. King-Herbert AP, Sills RC, Bucher JR doi.org/10.3389/fgene.2012.00087
(2010) Commentary: update on animal 156. Smith JR, Park CA, Nigam R,
models for NTP studies. Toxicol Pathol 38 Laulederkind SJ, Hayman GT, Wang SJ et
(1):180–181. al (2013) The clinical measurement,
https://doi.org/10.1177/ measurement method and experimental
0192623309356450 condition ontologies: expansion,
147. Kim U, Clifton KH, Furth J (1960) A highly improvements and new applica- tions. J
inbred line of Wistar rats yielding spontane- Biomed Semantics 4(1):26. https://
ous mammo-somatotropic pituitary and doi.org/10.1186/2041-1480-4-26
other tumors. J Natl Cancer Inst 24:1031– 157. Nigam R, Munzenmaier DH, Worthey EA,
1055 Dwinell MR, Shimoyama M, Jacob HJ
148. Shimoyama M, Smith JR, Bryda E, (2013) Rat Strain Ontology: structured con-
Kuramoto T, Saba L, Dwinell M (2017) Rat trolled vocabulary designed to facilitate
genome and model resources. ILAR J 58 access to strain data at RGD. J Biomed
(1):42–58. Semantics 4 (1):36.
https://doi.org/10.1093/ilar/ ilw041 https://doi.org/10.1186/2041- 1480-4-36
149. Poiley SM (1955) History and information 158. Hayman GT, Laulederkind SJ, Smith JR,
concerning the rat colonies in the animal Wang SJ, Petri V, Nigam R et al (2016) The
disease portals, disease-gene annotation and
the RGD disease ontology at the Rat Genome
Rat in Biomedical Research 39

Database. Database 2016:baw034. https:// (3):273–282. https://doi.org/10.1152/


doi.org/10.1093/database/baw034 physiolgenomics.00208.2007
159. The UniProt Consortium (2018) UniProt: 170. Atanur SS, Diaz AG, Maratou K, Sarkis A,
the universal protein knowledgebase. Nucleic Rotival M, Game L et al (2013) Genome
Acids Res 46(5):2699. sequencing reveals loci under artificial
https://doi.org/10. 1093/nar/gky092 selec- tion that underlie disease phenotypes
160. UniProt Consortium (2015) UniProt: a hub in the laboratory rat. Cell 154(3):691–703.
for protein information. Nucleic Acids Res https://
43 (Database issue):D204–D212. doi.org/10.1016/j.cell.2013.06.040
https://doi. org/10.1093/nar/gku989 171. Hermsen R, de Ligt J, Spee W, Blokzijl F,
161. Levan G, Klinga K, Szpirer C, Szpirier J Schafer S, Adami E et al (2015) Genomic
(1990) Gene map of the rat (Rattus norvegi- landscape of rat strain and substrain
cus). In: O’Brien Stephen J (ed) Genetic variation. BMC Genomics 16:357.
maps: locus maps of complex genomes, vol https://doi.org/10. 1186/s12864-015-
4, 5th edn. Cold Spring Harbor Laboratory, 1594-1
Cold Spring Harbor NY, pp 4.80–84.87 172. Baud A, Hermsen R, Guryev V, Stridh P,
162. Levan G, Szpirer J, Szpirer C, Klinga K, Graham D, McBride MW et al (2013) Com-
Hanson C, Islam MQ (1991) The gene map bined sequence-based and genetic mapping
of the Norway rat (Rattus norvegicus) and analysis of complex traits in outbred rats. Nat
comparative mapping with mouse and man. Genetics 45(7):767–775. https://doi.
Genomics 10(3):699–718 org/10.1038/ng.2644
163. Jacob HJ, Lindpaintner K, Lincoln SE, 173. Wang Z, Gerstein M, Snyder M (2009)
Kusumi K, Bunker RK, Mao YP et al (1991) RNA-Seq: a revolutionary tool for transcrip-
Genetic mapping of a gene causing hyperten- tomics. Nat Reviews Genet 10(1):57–63.
sion in the stroke-prone spontaneously hyper- https://doi.org/10.1038/nrg2484
tensive rat. Cell 67(1):213–224 174. Barrett T, Wilhite SE, Ledoux P,
164. Brown DM, Matise TC, Koike G, Simon Evangelista C, Kim IF, Tomashevsky M et al
JS, Winer ES, Zangen S et al (1998) An (2013) NCBI GEO: archive for functional
integrated genetic linkage map of the genomics data sets--update. Nucleic Acids
labora- tory rat. Mamm Genome 9(7):521– Res 41(Database issue):D991–D995.
530 https://doi.org/10.1093/nar/gks1193
165. Jacob HJ, Brown DM, Bunker RK, Daly 175. Kolesnikov N, Hastings E, Keays M,
MJ, Dzau VJ, Goodman A et al (1995) A Melnichuk O, Tang YA, Williams E et al
genetic linkage map of the laboratory rat, (2015) ArrayExpress update--simplifying
Rattus nor- vegicus. Nat Genetics 9(1):63– data submissions. Nucleic Acids Res
69. https:// doi.org/10.1038/ng0195-63 43(Data- base issue):D1113–D1116.
166. Kwitek AE, Gullings-Handley J, Yu J, Carlos https://doi.org/ 10.1093/nar/gku1057
DC, Orlebeke K, Nie J et al (2004) High- 176. Dwinell MR, Lazar J, Geurts AM (2011) The
density rat radiation hybrid maps containing emerging role for rat models in gene discov-
over 24,000 SSLPs, genes, and ESTs provide ery. Mamm Genome 22(7–8):466–475.
a direct link to the rat genome sequence. https://doi.org/10.1007/s00335-011-
Genome Res 14(4):750–757. 9346-2
https://doi. org/10.1101/gr.1968704 177. Yoshimi K, Mashimo T (2018) Application
167. Steen RG, Kwitek-Black AE, Glenn C, of genome editing technologies in rats for
Gullings-Handley J, Van Etten W, Atkinson human disease models. J Hum Genet 63
OS et al (1999) A high-density integrated (2):115–123. https://doi.org/10.1038/
genetic linkage and radiation hybrid map of s10038-017-0346-2
the laboratory rat. Genome Res 9(6): AP1– 178. Jacob HJ, Lazar J, Dwinell MR, Moreno C,
AP8 Geurts AM (2010) Gene targeting in the
168. Gibbs RA, Weinstock GM, Metzker ML, rat: advances and opportunities. Trends
Muzny DM, Sodergren EJ, Scherer S et al Genet 26 (12):510–518.
(2004) Genome sequence of the Brown https://doi.org/10.1016/j.
Nor- way rat yields insights into mammalian tig.2010.08.006
evolu- tion. Nature 428(6982):493–521. 179. Smits BM, Mudde JB, van de Belt J,
https:// doi.org/10.1038/nature02426 Verheul M, Olivier J, Homberg J et al
169. Worley KC, Weinstock GM, Gibbs RA (2006) Generation of gene knockouts and
(2008) Rats in the genomic era. Physiol mutant models in the laboratory rat by ENU-
Genomics 32 driven target-selected mutagenesis.
Pharmacogenet Genomics 16(3):159–169.
40 Jennifer R. Smith et al.

https://doi.org/10.1097/01.fpc blastocysts. Cell 135(7):1299–1310.


. 0000184960.82903.8f https://doi.org/10.1016/j.cell.2008.12.
180. Mashimo T, Yanagihara K, Tokuda S, Voigt 006
B, Takizawa A, Nakajima R et al (2008) An 190. Men H, Bryda EC (2013) Derivation of a
ENU-induced mutant archive for gene tar- germline competent transgenic Fischer
geting in rats. Nat Genetics 40(5):514–515. 344 embryonic stem cell line. PLoS One 8
https://doi.org/10.1038/ng0508-514 (2):e56518. https://doi.org/10.1371/jour
181. Kitada K, Ishishita S, Tosaka K, Takahashi nal.pone.0056518
R, Ueda M, Keng VW et al (2007) 191. Yamamoto S, Nakata M, Sasada R,
Transposon- tagged mutagenesis in the rat. Ooshima Y, Yano T, Shinozawa T et al
Nat Methods 4 (2):131–133. (2012) Derivation of rat embryonic stem
https://doi.org/10.1038/ nmeth1002 cells and generation of protease-activated
182. Lu B, Geurts AM, Poirier C, Petit DC, receptor-2 knockout rats. Transgenic Res 21
Harrison W, Overbeek PA et al (2007) (4):743–755.
Gen- eration of rat mutants using a coat https://doi.org/10.1007/ s11248-011-
color- tagged Sleeping Beauty transposon 9564-0
system. Mamm Genome 18(5):338–346. 192. Wallace HA, Marques-Kranc F,
https:// doi.org/10.1007/s00335-007- Richardson M, Luna-Crespo F, Sharpe JA,
9025-5 Hughes J et al (2007) Manipulating the
183. Flister MJ, Prokop JW, Lazar J, mouse genome to engineer precise
Shimoyama M, Dwinell M, Geurts A functional syntenic replacements with
(2015) 2015 Guidelines for establishing human sequence. Cell 128(1):197–209.
genetically modified rat models for https://doi.org/10.
cardiovascular research. J Cardiovasc 1016/j.cell.2006.11.044
Transl Res 8 (4):269–277. 193. Lee EC, Liang Q, Ali H, Bayliss L, Beasley
https://doi.org/10.1007/ s12265-015- A, Bloomfield-Gerdes T et al (2014)
9626-4 Complete humanization of the mouse
184. Meek S, Mashimo T, Burdon T (2017) immunoglobulin loci enables efficient
From engineering to editing the rat therapeutic antibody dis- covery. Nat
genome. Mamm Genome 28(7–8):302– Biotechnol 32(4):356–363.
314. https:// doi.org/10.1007/s00335- https://doi.org/10.1038/nbt.2825
017-9705-8 194. Macdonald LE, Karow M, Stevens S,
185. Takabayashi S, Aoshima T, Kabashima K, Auerbach W, Poueymirou WT, Yasenchak J
Aoto K, Ohtsuka M, Sato M (2018) i- et al (2014) Precise and in situ genetic
GONAD (improved genome-editing via humanization of 6 Mb of mouse immuno-
oviductal nucleic acids delivery), a globulin genes. Proc Natl Acad Sci U S A
convenient in vivo tool to produce genome- 111(14):5147–5152. https://doi.org/10.
edited rats. Sci Rep 8(1):12059. 1073/pnas.1323896111
https://doi.org/10. 1038/s41598-018- 195. Cherry JM, Hong EL, Amundsen C,
30137-x Balakrishnan R, Binkley G, Chan ET et al
186. Kobayashi T, Namba M, Koyano T, (2012) Saccharomyces Genome Database:
Fukushima M, Sato M, Ohtsuka M et al the genomics resource of budding yeast.
(2018) Successful production of genome- Nucleic Acids Res 40(Database issue):
edited rats by the rGONAD method. BMC D700–D705.
Biotechnol 18(1):19. https://doi.org/10.1093/ nar/gkr1029
https://doi.org/10. 1186/s12896-018- 196. Lee RYN, Howe KL, Harris TW,
0430-5 Arnaboldi V, Cain S, Chan J et al (2018)
187. Chen Y, Spitzer S, Agathou S, Karadottir WormBase 2017: molting into a new stage.
RT, Smith A (2017) Gene editing in rat Nucleic Acids Res 46(D1):D869–D874.
embry- onic stem cells to produce in vitro https://doi.org/10. 1093/nar/gkx998
models and in vivo reporters. Stem Cell 197. Thurmond J, Goodman JL, Strelets VB,
Reports 9 (4):1262–1274. Attrill H, Gramates LS, Marygold SJ et al
https://doi.org/10.1016/j. (2018) FlyBase 2.0: the next generation.
stemcr.2017.09.005 Nucleic Acids Res. https://doi.org/10.
188. Buehr M, Meek S, Blair K, Yang J, Ure J, 1093/nar/gky1003
Silva J et al (2008) Capture of authentic 198. Bult CJ, Blake JA, Smith CL, Kadin JA,
embryonic stem cells from rat blastocysts. Richardson JE (2018) Mouse Genome
Cell 135 (7):1287–1298. Data- base (MGD) 2019. Nucleic Acids
https://doi.org/10.1016/j. Res 47 (D1):D801–D806.
cell.2008.12.007 https://doi.org/10. 1093/nar/gky1056
189. Li P, Tong C, Mehrian-Shai R, Jia L, Wu N,
Yan Y et al (2008) Germline competent
embryonic stem cells derived from rat
Rat in Biomedical Research 41

199. Smith CM, Hayamizu TF, Finger JH, Bello Truong A, Yang WP, He A, Kayne P,
SM, McCright IJ, Xu J et al (2018) The Gargalovic P, Kirchgessner T, Pan C, Castel-
mouse Gene Expression Database (GXD): lani LW, Kostem E, Furlotte N, Drake TA,
2019 update. Nucleic Acids Res 47(Data- Eskin E, Lusis AJ (2010) A high-resolution
base-Issue):D774–D779. association mapping panel for the dissection
https://doi.org/ 10.1093/nar/gky922 of complex traits in mice. Genome Res 20
200. Ruzicka L, Bradford YM, Frazer K, Howe (2):281–290. https://doi.org/10.1101/gr.
DG, Paddock H, Ramachandran S et al 099234.109
(2015) ZFIN, The zebrafish model organism 203. Saba L, Hoffman P, Tabakoff B (2017)
database: updates and new directions. Using baseline transcriptional connectomes
Genesis 53(8):498–509. in rat to identify genetic pathways associated
https://doi.org/10.1002/ dvg.22868 with pre- disposition to complex traits.
201. Ashbrook DG, Mulligan MK, Williams Methods Mol Biol 1488:299–317.
RW (2018) Post-genomic behavioral https://doi.org/10. 1007/978-1-4939-
genetics: from revolution to routine. Genes 6427-7_14
Brain Behav 17(3):e12441. 204. Perry ME, Valdes KM, Wilder E, Austin CP,
https://doi.org/10. 1111/gbb.12441 Brooks PJ (2018) Genome editing to ‘re-
202. Bennett BJ, Farber CR, Orozco L, Kang HM, write’ wrongs. Nat Rev Drug Discov 17
Ghazalpour A, Siemers N, Neubauer M, (10):689–690.
Neuhaus I, Yordanova R, Guan B, https://doi.org/10.1038/ nrd.2018.91
Chapter 2

Rat Genome Assemblies, Annotation, and Variant Repository
Monika Tutaj, Jennifer R. Smith, and Elizabeth R. Bolton

Abstract
The first and only published version of the rat reference genome sequence was RGSC3.1, produced
by the Rat Genome Sequencing Project Consortium. Here we present the history of the community effort
to correct sequence errors and fill gaps in the process of refining the assembly and providing
researchers with a high-quality rat reference sequence. Improvements to the genome assembly, the addition
of different evidence resources over time, such as RNA-Seq data, and advances in software development
methodologies had a positive impact on the gene model annotations. Over the years we observed a great
increase in the numbers of genes, protein coding sequences, predicted transcripts and transcript features.
Before sequencing of the rat genome was possible, first biochemical markers and later genomic markers
such as RAPD, AFLP, RFLP, and SSLP were fundamental in research studies involving cross-breeding between
different rat strains, in determining the level of polymorphism, and in linkage mapping and phylogeny.
Linkage maps provide information on recombination rates, give insight into intra- and interspecies gene
rearrangements, and help to identify Mendelian loci and Quantitative Trait Loci (QTL). In the 1990s many
reports were published on the construction of rat linkage maps that incorporated increasing numbers of
markers and facilitated the localization of disease loci. Current genetic monitoring and linkage mapping
rely on single nucleotide polymorphisms (SNPs). The Rat Genome Database collects information on genetic
variation from the worldwide community of rat researchers and provides tools for searching and retrieving
these data. As of today we show details about almost 605 million variants coming from many studies in our
Variant Visualizer tool.

Key words Reference genome, Gene model annotations, Genomic markers, Variants, SNPs

1 Rat Genome Assemblies

1.1 History of the Rat Genome Sequencing Project

The laboratory rat (Rattus norvegicus) became the third mammalian
genome to be sequenced when the Rat Genome Sequencing Project
Consortium (RGSPC) published a high-quality draft sequence of the
rat genome. The project was a collaborative effort involving
sequencing and analyses by researchers at 40 organizations from
seven countries, coordinated by the Baylor College of Medicine
Human Genome Sequencing Center (BCM-HGSC). Funding was primarily
supplied by the National Human Genome Research Institute (NHGRI)
and the National Heart, Lung, and Blood Institute (NHLBI)
supplemented by additional funding for individual researchers from
a variety of sources [1].

G. Thomas Hayman et al. (eds.), Rat Genomics, Methods in Molecular Biology, vol. 2018,
https://doi.org/10.1007/978-1-4939-9581-3_2, © Springer Science+Business Media, LLC, part of Springer Nature 2019

The animals selected for sequencing were from a substrain of
the Brown Norway (BN) rat strain. BN/SsNHsd rats had been
obtained by the Medical College of Wisconsin (MCW) from Harlan Sprague
Dawley but were found to be incompletely inbred, that is,
containing regions of heterozygosity when subjected to genetic
testing. Additional inbreeding was carried out at MCW to create a
fully inbred substrain. Two female rats taken from the MCW
colony at the 13th generation were used for most of the
sequencing. The BN strain was selected by the rat research
community based on the fact that it was considered genetically
diverse and had been used as a control strain in a wide variety of
studies. In addition, BN rats were being used to develop congenic
and consomic strains, as founders for the LXB recombinant inbred
panel of strains, and as one of eight founder strains for the
Heterogeneous Stock rats (see Chapter 11).
The RGSPC used a combination of whole-genome shotgun
sequencing (WGS) and a bacterial artificial chromosome (BAC)
sequencing approach to generate the first draft of the rat genome.
BAC clones combined with WGS data formed intermediate products
called “enriched BACs” (eBACs). The researchers developed the
Atlas Genome Assembly system [2] to generate almost 19,000 eBACs.
At the same time, a large number of clones from the Children’s
Hospital Oakland Research Institute BAC library (CHORI-230) were
used to create a fingerprint contig (FPC) map. Clones were
“fingerprinted” by restriction enzyme digestion and assembled by
overlapping these segments into a genome-wide contig map. This map
was then used to assemble the sequenced eBACs into larger super
bactigs and even larger ultrabactigs. In addition to the FPC map, a
yeast artificial chromosome (YAC)-based physical map was
constructed to improve clone positioning [2, 3]. BAC-end and in
silico coordinates were used together to localize the contigs to
their correct chromosomal region in the assembly (NCBI RefSeq
accession number prefix NC_). The contigs without assigned genomic
positions were annotated as unlocalized and unplaced sequences
(NCBI RefSeq accession number prefix NW_). Although some sequence
for the Y chromosome was generated in the first round of
sequencing, the Y chromosome was not included in the initial
versions of the assembly. Attempts to purify the Y chromosome for
sequencing revealed that this chromosome in the BN strain was
almost twice the size of the Y in other strains [4]. Because of the
large size, it was found that insufficient material had been
isolated, resulting in unacceptably low coverage of the sequence
(about 2×). The sequence was therefore excluded from the assembly.
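
The accession prefix convention described above can be illustrated with a minimal sketch. It only encodes the two prefixes named in the text (NC_ and NW_); the function name and the accession numbers in the example are hypothetical and were chosen for illustration, not taken from any rat assembly.

```python
import re

# Prefixes named in the text: NC_ for sequences placed on chromosomes,
# NW_ for unlocalized and unplaced sequences.
PREFIX_MEANING = {
    "NC_": "assembled chromosome",
    "NW_": "unlocalized or unplaced sequence",
}

# Two uppercase letters and an underscore, digits, optional ".version".
ACCESSION_RE = re.compile(r"^([A-Z]{2}_)(\d+)(?:\.(\d+))?$")


def classify_accession(accession):
    """Return a short description of a RefSeq-style accession (hypothetical helper)."""
    match = ACCESSION_RE.match(accession.strip())
    if match is None:
        return "unrecognized"
    return PREFIX_MEANING.get(match.group(1), "other record type")


# Made-up accession numbers, used only to exercise the function.
for acc in ["NC_000001.1", "NW_000002.1", "NM_000003.2", "chr1"]:
    print(acc, "->", classify_accession(acc))
```

In practice such a check might be used to separate chromosome-level records from unlocalized and unplaced ones when reading an assembly report.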
The first public release of the rat genome sequence was desig-
nated RGSC/Rnor2.0. Its release in November of 2002 was fol-
lowed shortly thereafter by the release of Rnor version 2.1 in
January of 2003. Release notes for the v2.1 assembly noted that a
“reduction of assembly artifacts has slightly reduced the number of
bases assembled while increasing the size of contigs, scaffolds,
and ultrabactigs” (reports available at the BCM-HGSC ftp site).
The total size of the assembly mapped onto chromosomes was
2.72 Gbp for the 2.0 release and 2.66 Gbp for the 2.1 release,
while the average size of ultrabactigs and scaffolds increased by
5.9% and 17.2%, respectively.
Rnor3.0 represented a complete reassembly of the genome
sequence. Improvements incorporated into this version of the
sequence included the addition of new sequences (over 1100 new
BACs to cover gaps), better software accuracy and relevance, utili-
zation of an improved marker set from the Medical College of
Wisconsin, and a new FPC map from the BC Cancer Agency
Genome Sciences Centre [5]. The new FPC map was based on
automated assembly of BAC clones based on the “fingerprinting,”
followed by a process of manual curation and sequencing of
clones, and the use of human and mouse orthology information to
resolve conflicts and to correct placement of sequence units. This
process of automated and manual editing expanded the contiguity
of the rat fingerprint map and in turn allowed for targeted BAC
clone selection and filling of contig gaps, as well as linking some
of the unlocalized segments of the rat assembly to
chromosomes [3]. Rnor3.0 was rapidly followed by the release of
version 3.1 in June of 2003. Rnor3.1 was considered a minor
update to the previous assembly with changes only affecting
chromosomes 7 and X. Although not the first public version of
the sequence, Rnor3.1 was the first (and only) published version
of the rat reference genome sequence [1]. Analysis of the
assembly revealed that it had approximately sevenfold average
sequence coverage overall, with 60% provided by WGS and
40% by BACs. The assembly covered about 90% of the estimated
2.75 Gbp rat genome and contained a similar number of genes as
described for human and mouse (20,000–25,000) [1, 6].
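
The coverage figures quoted above can be worked through as simple arithmetic. The sketch below assumes that the 60%/40% split refers to shares of the total sevenfold coverage; the variable and function names are illustrative only.

```python
# Figures quoted in the text for the Rnor3.1 assembly.
ESTIMATED_GENOME_GBP = 2.75   # estimated rat genome size
FRACTION_COVERED = 0.90       # share of the genome covered by the assembly
TOTAL_FOLD_COVERAGE = 7.0     # approximate average sequence coverage
SHARE_BY_SOURCE = {"WGS": 0.60, "BAC": 0.40}  # assumed to be shares of coverage


def fold_coverage_by_source(total_fold, shares):
    """Split a total fold coverage into per-source fold coverage."""
    return {source: total_fold * share for source, share in shares.items()}


assembled_gbp = ESTIMATED_GENOME_GBP * FRACTION_COVERED
per_source = fold_coverage_by_source(TOTAL_FOLD_COVERAGE, SHARE_BY_SOURCE)

print(f"Assembled sequence: about {assembled_gbp:.3f} Gbp")
for source, fold in per_source.items():
    print(f"{source}: about {fold:.1f}-fold")
```

Under this reading, the assembly represents roughly 2.5 Gbp of sequence, with about 4.2-fold coverage from WGS reads and about 2.8-fold from BACs.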
In 2004, a series of minor updates to the assembly brought the
designation to Rnor3.4. Updates included three additions of fin-
ished BAC sequences and the correction of several alignment
switch points. Despite additional work done to improve the assem-
bly during the interim, the Rnor3.4 assembly remained the de facto
reference assembly for the rat for almost eight years. After 2004 a
number of improvements were proposed and funded by the
NHGRI to provide a more complete genome with improved accuracy.
New assemblies were released in March of 2008 (Rnor4.0)
and November of 2009 (Rnor4.1), both of which utilized reads
downloaded from NCBI in January of 2007. The assemblies were
constructed using updated Atlas genome assembly software which
had been extended to include better methods to deal with repeats
and improve the alignment of BAC sequences, and included new
modules for handling heterozygosity. However, although some
work was done at NCBI to incorporate and begin to annotate
Rnor4.1, neither assembly was accepted by the rat research
community as an improvement over version 3.4, and therefore neither
was ever completely incorporated into the major sequence databases.
The BN/NHsdMcwi strain was initially sequenced by the tra-
ditional capillary sequencing method. With advancing sequencing
methodology, SOLiD (Sequencing by Oligonucleotide Ligation
and Detection) sequence reads were generated and combined
with the previous WGS-plus-BAC assembly data for a more com-
plete representation of the genome. Highly repetitive reads were
omitted and the SOLiD data were used for revision of the scaffold-
ing. This resulted in the Rnor5.0 version of the rat reference
assembly in 2012 that showed improvements at both the nucleo-
tide and structural level.
The latest version of the assembly, Rnor6.0, was manually
checked and corrected for potential mis-assemblies by comparison
to the human and mouse genomes as well as to the previous
version 3.4 assembly. Scaffold gaps were filled using long-read
PacBio sequences with the PBjelly software, and high-quality BAC
clone sequences from phase 2 and phase 3 sequencing were spliced
into the assembly to replace matched locations. In addition, a draft
sequence of the Y chromosome from a male rat of the
SHR/NHsdAkr strain, which had been produced as part of a
collaborative project to sequence Y chromosomes from several
mammalian species, including rat [7, 8], was added. The Rnor6.0
genome reference is longer and more contiguous than previous
versions, and many gaps have been closed (see Total assembly gap
length in Table 1). The ungapped assembly lengths improved
with each release version: v6.0 ungapped length is 2.730 Gb,
versus v5.0 at 2.573 Gb and v3.4 at 2.568 Gb. The Rnor6.0
assembly comprises 75,697 contigs in comparison with 132,131
contigs for v5.0 and 238,325 contigs for v3.4. Older versions of
the rat genome assemblies can be found in the archive sites (see
Table 2).
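
Table 1 reports contiguity as N50 and L50 values, and the gap length as the difference between total and ungapped length. As a minimal sketch of how these statistics are commonly defined, the example below computes N50 and L50 for a made-up list of contig lengths and checks the Rnor_6.0 gap length from the totals given in Table 1. The function name and the toy lengths are assumptions for illustration; the calculation used by NCBI may differ in detail.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of sequence lengths.

    N50 is the length of the sequence at which the running total, taken
    from longest to shortest, first reaches half of the summed length.
    L50 is the number of sequences needed to reach that point.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise ValueError("lengths must not be empty")


# Toy contig lengths (not real data): total 300, so half is 150.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
n50, l50 = n50_l50(toy_contigs)
print("N50:", n50, "L50:", l50)

# Rnor_6.0 values from Table 1: gap length = total length - ungapped length.
rnor6_total = 2_870_184_193
rnor6_ungapped = 2_729_985_404
rnor6_gap = rnor6_total - rnor6_ungapped
print("Rnor_6.0 gap length:", rnor6_gap)
```

A higher N50 together with a lower L50 indicates that fewer, longer sequences make up half of the assembly, which is the trend Table 1 shows for contigs from v3.4 through v6.0.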

1.2 Other Rat Genome Assemblies

Since 2004, the genomes of a number of other rat strains have been
sequenced by the rat community. In 2008, the STAR consortium used a
combination of shotgun sequencing, low coverage WGS, and BAC end
sequencing to discover almost three million single nucleotide
variants from the SS/Jr, WKY/Bbb, GK/Ox, SHRSP/Bbb, Sprague Dawley,
and F344/Stm strains [9]. A subset of approximately 20,000 of these
was used to genotype 167 inbred strains and 2 recombinant inbred
(RI) panels (the 31 HXB-BXH strains and the 33 FXLE-LEXF strains).
Almost 10,000 of the SNVs were then used to genotype an additional 89
F2 animals
Table 1
Statistics of rat genome assemblies

NCBI assembly information                 RGSC_v3.4        Rnor4.1-scaffold  Rnor_4.1         Rnor_5.0         Rnor_6.0

RefSeq Assembly Accession                 GCF_000001895.3  -                 -                GCF_000001895.4  GCF_000001895.5
GenBank Assembly Accession                -                GCA_000001895.1   GCA_000001895.2  GCA_000001895.3  GCA_000001895.4
Release Date                              12/13/2004       2/5/2010          3/12/2012        3/16/2012        7/1/2014
Total sequence length                     2,826,224,306    2,472,228,416     2,796,368,354    2,909,698,938    2,870,184,193
Ungapped length                           2,567,937,207    2,471,873,957     2,480,163,185    2,573,083,111    2,729,985,404
Total assembly gap length                 258,270,786      354,459           316,205,169      336,599,514      140,198,789
Spanned gaps                              249,838          10,043            207,756          121,283          74,303
Gaps between scaffolds                    270              0                 46               8,109            440
Number of scaffolds                       741              187,024           3,238            10,848           1,395
Scaffold N50                              18,621,810       45,127            126,646,908      2,178,346        14,986,627
Scaffold L50                              46               14,621            10               387              65
Number of contigs                         238,325          197,067           210,974          132,131          75,697
Contig N50                                36,847           40,558            35,614           52,491           100,461
Contig L50                                18,984           16,657            18,076           13,663           7,356
Unlocalized sequences count               246              -                 3,106            1,278            354
Unplaced sequences                        203              -                 65               1,439            578
Total number of chromosomes and plasmids  21               0                 21               22               23

Table 2
Archived references

Baylor release UCSC version Release date


v3.1 rn3 Jun. 2003
v2.1 rn2 Jan. 2003
v1.0 rn1 Nov. 2002

from a cross between the BN/Par and GK/Ox strains. In 2010 the
SHR/OlaIpcv rat genome was sequenced at 10.7-fold coverage by
paired-end sequencing on the Illumina platform. Initially 681.8
million reads were mapped to the BN reference genome (v3.4)
and covered 97.7% of the reference assembly by at least three
reads [10]. Subsequently, the NGS data set of the SHR/OlaIpcv
strain was expanded, increasing the median coverage of this genome
to 23-fold in 2012. In the same study, the researchers also
generated whole-genome NGS data from the same genetic material that
was used to create the BN reference sequence (referred to in that
study as “Eve”), as well as from BN-Lx, a mutant strain closely
related to BN. The data from the new BN sequence (32-fold NGS
coverage) were used to search for Rnor3.4 assembly errors [11].
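To put figures such as "10.7-fold coverage" and "97.7% of the reference covered by at least three reads" in context, the sketch below computes the fraction of bases expected to be covered by a minimum number of reads under an idealized Poisson model of read placement. The Poisson assumption is ours, not the cited authors'. Real libraries cover the genome less evenly (repeats, base-composition bias, unmappable regions), so the model gives an upper bound rather than a prediction.

```python
import math


def fraction_covered(mean_depth, min_reads):
    """P(X >= min_reads) for X ~ Poisson(mean_depth).

    mean_depth: average fold coverage of the genome.
    min_reads:  minimum number of reads required at a base.
    """
    p_below = sum(
        math.exp(-mean_depth) * mean_depth ** k / math.factorial(k)
        for k in range(min_reads)
    )
    return 1.0 - p_below


if __name__ == "__main__":
    # 10.7-fold coverage and the three-read threshold are taken from the text.
    ideal = fraction_covered(10.7, 3)
    print(f"Idealized fraction with >= 3 reads at 10.7x: {ideal:.4f}")
```

Under this model about 99.8% of bases would carry three or more reads at 10.7-fold depth, so the reported 97.7% is consistent with moderately uneven coverage of the reference.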
In 2013, two more genomes were sequenced, becoming the first
non-reference de novo assemblies of rat genomes [12]. The DA and
F344 rat strains were sequenced with an average depth of 32×
using Illumina technology. Researchers employed a reference-aided
assembly method (RAM), using the BN reference genome as well as the
Short Oligonucleotide Alignment Program (SOAP) and GapCloser, an
algorithm for contig-end extension and gap filling. First, a
semi-finished genome was constructed by aligning sequencing reads to
Rnor3.4 using SOAPaligner [13]; then an independent de novo assembly
of contigs and scaffolds was generated using SOAPdenovo [14].
Finally, the genome draft was assembled by anchoring scaffolds onto
the semi-finished genome. The read alignment of each strain with the
BN genome covered 98% of the reference (three reads or more). The DA
and F344 genome drafts were 1.94% and 1.91% larger than the BN
genome, respectively. Each draft contained more than 49 million
novel base pairs that bridged around 400,000 gaps of the BN genome
[12].
In 2013 the eight inbred strains (ACI/N, BN/SsN, BUF/N, F344/N,
M520/N, MR/N, WKY/N, and WN/N) which had been used as the
founder/progenitor strains for the NIH’s heterogeneous stock (N:HS)
rats were sequenced using SOLiD technology at an average of 22-fold
sequence coverage, which represents ~88% of the reference genome
[15]. The same year another large-scale sequencing project was
completed. The sequencing of 27 rat strains that served as popular
disease models of hypertension, diabetes, and insulin resistance
resulted in the identification of a number of genomic variants and
coevolved gene clusters [16]. The researchers produced the sequence
data with 20-fold coverage on average for all strains except for the
BBDP/Wor and WKY/NHsd rat strains, which reached approximately
tenfold coverage. Variant data from each of these large-scale
strain-specific sequencing projects are available at the Rat Genome
Database (see below for details).

1.3 Reported Reference Genome Errors

The regions that posed special problems to complete genome assembly
were regions with unusual repeat structures, polymorphisms, possible
BAC rearrangements, and low sequence coverage. The Rnor3.4 assembly
contained many gaps, inconsistencies, and sequence errors due to the
relatively low coverage and errors associated with capillary
technology. Genetic single nucleotide polymorphism (SNP) mapping by
the STAR consortium in 2008 identified discrepancies between the
genetic map and the draft genome: a p11-centromeric segment of
chromosome 1 was wrongly inserted into the p14-telomeric region of
chromosome 17, and intra- and inter-chromosomal relocations were
observed in regions of chromosomes 2, 4, 11, 12, 14, and 17 [16].
The relocation in the p14 region of chromosome 17 and one conflict
on chromosome 9 were discovered during the revision of differences
between the BCM and Celera rat genome assemblies [9]. In the study
of rat genomic variation in complex traits, four pairs of regions on
chromosomes 1, 4 (2 regions), 9, 12, 14, 17, and 19 showed high
inter-chromosomal linkage disequilibrium due to mis-assembly of the
Rnor3.4 reference sequence [15]. Analyses at the Rat Genome Database
of changes in NCBI (National Center for Biotechnology Information)
gene position annotations between rat genome assemblies showed a
number of co-localized genes that were re-annotated in upgraded
reference versions, frequently together, to different chromosomes
(see Table 3). We observed that 25 of the 49 genes that occupied an
8.3 Mbp region of chromosome 1 in reference version 3.4 were
relocated to two different genomic regions in v5.0: 17 genes moved
to chromosome 8, while 6 genes moved to chromosome 9. Altogether, we
found 7 clusters of 19 to 96 co-localized genes, spanning 1.2 to
8.3 Mbp regions of chromosomes 1, 4, 7, 8, 17, and X in assembly
v3.4, that changed genomic position in v5.0 (see Fig. 1). The
changes were less profound between assemblies v5.0 and v6.0. Four
clusters of 10 to 93 genes spanning 1.5 to 6 Mbp regions of
chromosomes 1, 3, and X changed genomic position in the v6.0
assembly with reference to v5.0. However, there were also numerous
smaller changes in other chromosomes in both transitions: 435 genes in

Table 3
Annotation changes between different rat genome assemblies

Compared      Chromosome     Number of            Genomic position (bp)    Number of      Chromosome
assemblies    OLD position   genes^a/total  Mbp   on older assembly        genes^a/total  new position

Rnor3.4 and   1              25/49          8.3   58877734–67150234        17/49          8
Rnor5.0                                                                    6/49           9
              4              19/28          4.6   99068636–103646030       17/28          3
              7              49/69          1.7   137288635–1389558414     49/69          X
                                                                           9/69           6
              8              40/41          1.2   40493892–41711691        18/41          15
                                                                           16/41          4
              17             27/33          5.3   44656–5318612            28/33          1
              X              96/132         7     153730373–160683450      83/132         1
                             23/132         6.5   122692432–129236338      17/132         3
Rnor5.0 and   1              93/113         6     147946356–153934661      95/113         X
Rnor6.0                      10/113         6.5   64184207–70664968        6/113          11
              3              20/23          1.5   52231912–53715589        20/23          X
              X              60/63          1.6   114700497–116300972      60/63          7

^a Including protein-coding genes, noncoding genes, pseudogenes, and genes under revision

Fig. 1 Number of genes re-annotated from chromosomes of the assembly Rnor3.4 to different
genomic position in Rnor5.0; Number of genes limited to 100 for better visualization

total changed chromosome position between assemblies v3.4 and
v5.0, and 249 genes between assemblies v5.0 and v6.0.
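The kind of comparison summarized in Table 3 can be sketched as follows. This is an illustrative outline only, not the procedure used at RGD: it assumes each assembly's annotation has been reduced to a mapping from a gene identifier to a (chromosome, start position) pair, and the gene names and coordinates in the usage example are hypothetical.

```python
from collections import Counter


def relocated_genes(old_positions, new_positions):
    """Return {gene: (old_chrom, new_chrom)} for genes whose chromosome changed.

    Both arguments map a gene ID to a (chromosome, start) tuple.
    Genes missing from the newer annotation are ignored.
    """
    moved = {}
    for gene, (old_chrom, _start) in old_positions.items():
        if gene in new_positions and new_positions[gene][0] != old_chrom:
            moved[gene] = (old_chrom, new_positions[gene][0])
    return moved


def moves_by_chromosome_pair(moved):
    """Count relocated genes per (old chromosome, new chromosome) pair."""
    return Counter(moved.values())


if __name__ == "__main__":
    # Hypothetical toy annotations for two assembly versions.
    old = {"geneA": ("1", 58_900_000), "geneB": ("1", 59_100_000),
           "geneC": ("1", 60_000_000), "geneD": ("4", 99_100_000)}
    new = {"geneA": ("8", 12_000_000), "geneB": ("8", 12_200_000),
           "geneC": ("1", 61_000_000), "geneD": ("3", 5_000_000)}
    print(moves_by_chromosome_pair(relocated_genes(old, new)))
```

Grouping the relocated genes by their old coordinates would then reveal co-localized clusters that moved together, as in the chromosome 1 example above.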
In particular, repetitive regions of the genome are often mis-
assembled. Researchers reported that a duplication of the Fcgr3
locus associated with autoimmune nephritis in a rat model and in
human was not represented in the Rnor3.4 genome assembly
[5, 17]. BAC and genomic Southern blots and clonotype analysis
suggest that Fcgr3 underwent at least two duplications during the
time of divergence between the mouse and rat lineages, and the rat
has at least three expressed genes. Because of the similarity in the
sequence between the duplicated genes and the presence of a SINE
repeat element, this region presents a particular challenge for
assembly. Other regions reported as improperly assembled
included the Ttn gene with highly complex alternative splicing, a
region of duplication around the Cd36 gene, and a 5 Mbp region
with repeat sequences on chromosome 1 which contains a number
of genes studied in stroke, hypertension, and metabolic syndrome
(P2ry2, P2ry6, Pde2a, and Slco2b1) [5].

2 Rat Genome Assembly Annotations

2.1 Gene Model Prediction

In 2004, the Ensembl gene prediction pipeline predicted 20,973 genes
with 28,516 transcripts and 205,623 exons for the Rnor3.1 assembly
[1]. The improvement provided by reassembly of the reference
sequence in general, and by the Rnor6.0 assembly in particular, had
a positive impact on the assembly annotation. Gene model predictions
consider known protein and transcript data for rat, as well as
homology to other sequences, including rodent proteins, non-rodent
vertebrate proteins, rat cDNA data from RefSeq and EMBL, and mouse
cDNAs from Riken, RefSeq, and EMBL. The statistics depend on the
quality of the genome sequence, the gene prediction method, the
alignment criteria, and the amount of expressed sequence evidence.
Table 4 lists the current number of gene model predictions provided
by NCBI for the v3.4, v5.0, and v6.0 rat genome assemblies. There is
an increase in the numbers of genes, protein coding sequences (CDS),
and defined noncoding 5′ and 3′ untranslated regions (UTRs). Genome
annotations and prediction accuracy benefit from the addition of
different evidence resources, such as RNA-Seq data, and from new
methodologies. This is clearly demonstrated by the substantial
increase in the number of predicted transcripts and transcript
features for the v5.0 and v6.0 assemblies, where the incorporation
of RNA-Seq transcriptomic data improved the identification of
isoforms, UTRs, exon boundaries, and transcripts with only low
expression. It is worth noting that the number of noncoding genes
doubles from v3.4 to v5.0 and more than triples from v3.4 to v6.0.

Table 4
Number of genomic features for the rat assemblies

NCBI annotations RGD QC     Rnor3.4    Rnor5.0    Rnor6.0

Genes                       24,949     33,330     41,517
Protein-coding genes        19,623     22,480     23,485
Non-protein-coding genes    5,235      10,694     17,932
Transcripts                 16,973     38,453     71,613
Exons                       147,831    373,816    726,638
5′ UTR                      19,006     46,936     88,311
3′ UTR                      14,814     32,999     56,722
CDS                         139,669    353,905    685,260
QTLs                        2,272      2,294      2,266
Genetic markers             43,306     45,663     44,828

2.2 NCBI and Ensembl Gene Annotation Models

Genome annotations, i.e., the prediction and localization of genes
and other genomic elements on a genome sequence, differ between NCBI
and Ensembl because of variations in annotation strategies,
algorithms, and input data. The NCBI Eukaryotic Genome Annotation
Pipeline [18] utilizes a suite of informatic tools that include the
alignment programs Splign and ProSplign, and the gene prediction
program Gnomon, to generate sets of genes with their associated
transcripts and proteins. The annotation process relies heavily on
the availability of transcript or protein evidence for the species.
Originally implemented in 2000 as a semi-manual process that aligned
GenBank and RefSeq transcripts to the genome using the BLAST
algorithm and then supplemented these with ab initio gene predictions
from GenomeScan [19], NCBI’s pipeline has gone through a number of
substantial improvements. These include the addition of EST and
protein data as input and the development of the splicing-aware
aligners Splign for transcripts and ProSplign for proteins. The
addition of RNA-Seq data improved the quality of the annotations,
particularly for organisms that have little or no experimental mRNA
or EST data available. Reengineering the pipeline using a new
framework for parallel execution enhanced its extensibility,
robustness, and reproducibility, and improved tracking, all of which
were necessary to keep pace with both annotation of new genomic
sequences and reannotation of improved genome assemblies.
The current Eukaryotic Genome Annotation Pipeline takes as input
same-species transcripts, proteins, and RNA-Seq reads and, where
necessary, transcripts and proteins from closely related species.
Input transcripts include known coding and noncoding RefSeq
transcripts (i.e., those with NM_ or NR_ prefixes), long transcripts,
and ESTs from NCBI’s Nucleotide database. Both short and long
RNA-Seq reads from the Sequence Read Archive (SRA) are utilized. In
addition, the following proteins are aligned: known RefSeq (NP_)
proteins and proteins derived from transcripts by the International
Nucleotide Sequence Database Collaboration (INSDC, which includes
the DDBJ, ENA, and GenBank collaborators). Curated RefSeq genomic
sequences (i.e., those with NG_ prefixes, representing
non-transcribed pseudogenes and manually annotated gene clusters)
can be used, if available. Aligned sequences are then submitted to
the Gnomon tool. Gnomon is a two-step gene prediction program: it
first assembles overlapping alignments into “chains” and then
extends these chains into complete models in an ab initio prediction
step that uses a Hidden Markov Model (HMM). Predicted gene models
are aligned against proteins from the curated Swiss-Prot database (a
subset of the UniProtKB knowledgebase) to confirm and refine the
predictions, and the best model for each gene is selected.
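The RefSeq accession prefixes mentioned above can be used to sort input sequences by type before alignment. The following is a minimal sketch covering only the prefixes named in this section; RefSeq defines further prefixes that are not listed here, and the function name and example accessions are hypothetical rather than taken from NCBI software.

```python
# Prefix meanings as described in the text above.
REFSEQ_PREFIXES = {
    "NM_": "known protein-coding transcript",
    "NR_": "known noncoding transcript",
    "NP_": "known protein",
    "NG_": "curated genomic sequence",
}


def classify_accession(accession):
    """Return the sequence type implied by a RefSeq accession prefix."""
    prefix = accession.strip()[:3].upper()
    return REFSEQ_PREFIXES.get(prefix, "not covered in this sketch")


if __name__ == "__main__":
    # Placeholder accessions, used only to exercise the lookup.
    for acc in ("NM_000000.1", "NR_000000.1", "NP_000000.1", "NG_000000.1", "AB000000"):
        print(acc, "->", classify_accession(acc))
```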
The Ensembl gene annotation process starts with a model-building
phase that involves the alignment of protein, cDNA, EST, and RNA-Seq
sequences to the genome assembly [20]. The methods used in this
phase depend on the input data available at the time of annotation,
with same-species data preferred over data from other species, and
annotated sequences preferred over computed sequences. The Targeted
Pipeline uses same-species protein sequences to identify the genomic
location of protein-coding genes and then to produce coding models
using GeneWise [21]. Only high-confidence same-species protein
sequences are downloaded: Swiss-Prot and TrEMBL sequences labeled as
PE level 1 or 2, and RefSeq annotated sequences with NP_ and AP_
accessions. The pipeline finds the genomic location of transcripts
by aligning protein sequences to the genome, and both the DNA and
protein sequences for this region are passed to GeneWise. The
software uses a splice-aware algorithm and generates a
protein-coding transcript model as output. In addition, cDNA data
together with their annotated coding sequences (CDS) from INSDC are
used to complement protein-coding gene models with the addition of
untranslated regions (UTRs).
The Similarity Pipeline uses as input UniProtKB proteins from a wide
range of species. Its output is a set of models that supplements the
Targeted Pipeline models. This approach is especially useful for
species that do not have many same-species proteins. Increases in the
amount of available RNA-Seq sequence data prompted the addition of an
RNA-Seq pipeline, which uses these data to produce both
protein-coding and noncoding transcript models in the gene annotation
process. The model-building steps of the pipeline are followed by a
filtering step that selects the models with the highest confidence
at each location (generating a hierarchy of models). Protein-coding
models that overlap with RNA-Seq, cDNA, and/or EST models are ranked
as top priority, and the output is a set of protein-coding
transcript models that are further extended to include UTR regions.
Selected models are then passed to the GeneBuilder module, which
removes redundant transcript models in the process of clustering
protein-coding models into multi-transcript gene structures. In
addition to the modules that produce protein-coding gene
predictions, the Ensembl GeneBuild generates annotations of
pseudogenes, short noncoding RNAs such as microRNAs and transfer
RNAs, and long intergenic noncoding RNAs (lincRNAs). Output from all
of these pipelines is integrated to produce the final Ensembl gene
set [20].
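
The filtering step described above can be illustrated with a deliberately simplified sketch. It is not the Ensembl implementation. It assumes models on a single chromosome, ranks models with transcript support (RNA-Seq, cDNA, or EST overlap) first and then by an assumed pipeline order (targeted before similarity before RNA-Seq), and keeps only one model per overlapping locus, whereas the real pipeline retains multiple transcripts per gene. All class, field, and model names are hypothetical.

```python
from dataclasses import dataclass

# Assumed pipeline order for this sketch: lower number = higher priority.
PIPELINE_RANK = {"targeted": 0, "similarity": 1, "rnaseq": 2}


@dataclass(frozen=True)
class Model:
    name: str
    start: int
    end: int
    pipeline: str
    has_transcript_support: bool


def overlaps(a, b):
    """True if two models share at least one base (closed intervals)."""
    return a.start <= b.end and b.start <= a.end


def priority(model):
    """Sort key: transcript-supported models first, then by pipeline rank."""
    return (0 if model.has_transcript_support else 1, PIPELINE_RANK[model.pipeline])


def filter_models(models):
    """Greedily keep the highest-priority model at each overlapping locus."""
    kept = []
    for model in sorted(models, key=priority):
        if not any(overlaps(model, other) for other in kept):
            kept.append(model)
    return sorted(kept, key=lambda m: m.start)


if __name__ == "__main__":
    candidates = [
        Model("A", 100, 500, "targeted", False),
        Model("B", 400, 900, "similarity", True),
        Model("C", 1000, 1500, "rnaseq", False),
    ]
    print([m.name for m in filter_models(candidates)])
```

In this toy input, model B displaces the overlapping model A because it has transcript support, while C is kept because nothing competes with it at its location.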

2.3 Gene Annotation Differences

The genome size for human is 3.257 Gb (GRCh38), slightly larger
than for rat (2.870 Gb, Rnor6.0) or mouse (2.819 Gb, GRCm38), but
there are substantial differences in the number of annotated
proteins and transcripts among the three. Currently, expressed
sequence evidence is much more abundant for human and mouse than for
rat. Table 5 presents data collected from annotation pages for
individual species that are available in the NCBI and Ensembl
databases [22, 23]. There are more than 2 times the number of
transcripts and 3–4 times more EST data used for the human and mouse
gene prediction models than for the rat in both Ensembl and NCBI.
There are 4 times more protein sequences used for the human model,
and 2 times more protein data for the mouse, in the NCBI prediction.
We compared the number of genes between NCBI and Ensembl for human,
mouse, and rat (see Fig. 2). There are 19,633 rat genes shared by
the two gene models, 25,372 human genes, and 24,637 mouse genes.
Even though the amount of evidence for both human and mouse is much
higher than for rat, the proportion of overlapping predicted genes
is low. This suggests that the prediction results depend on the
design of the prediction strategy rather than on the amount of
evidence provided. We counted how many of NCBI’s and Ensembl’s
predicted genes have been assigned the same genomic position in the
genome reference (gene boundaries, from start to stop position). We
found that an exact match of position applies to only a limited
number of genes: 15% of human genes (9595), 12% of rat (3902), and
10% of mouse (5637) genes. Examples of differences in rat gene model
prediction between Ensembl and NCBI are shown in Fig. 3. Some genes
are predicted to be in the same genomic location but differ in
length, number of exons, or the exons’ positions. In some cases,
exons have the same predicted positions but different genes are
assigned to that position by the two models. There are examples of
single genes in one database that are split into two genes in the
other. In some locations one model predicts the presence of genes
whereas the other one does not. The number and length of predicted
transcripts also differ.
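
The position comparison described above can be sketched as follows. This is an assumption-laden outline, not the authors' script: it presumes that genes from the two sources have already been matched by a shared identifier and that each annotation is a mapping from that identifier to (chromosome, start, end). The gene identifiers and coordinates in the example are hypothetical.

```python
def classify_shared_genes(ncbi, ensembl):
    """Compare gene boundaries for genes present in both annotations.

    Both arguments map a gene ID to (chromosome, start, end).
    Returns counts of exact boundary matches, partial overlaps,
    and genes whose positions do not overlap at all.
    """
    counts = {"exact": 0, "overlap": 0, "no_match": 0}
    for gene_id in ncbi.keys() & ensembl.keys():
        chrom_a, start_a, end_a = ncbi[gene_id]
        chrom_b, start_b, end_b = ensembl[gene_id]
        if (chrom_a, start_a, end_a) == (chrom_b, start_b, end_b):
            counts["exact"] += 1
        elif chrom_a == chrom_b and start_a <= end_b and start_b <= end_a:
            counts["overlap"] += 1
        else:
            counts["no_match"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical toy annotations.
    ncbi = {"g1": ("1", 100, 900), "g2": ("2", 500, 1500),
            "g3": ("3", 10, 200), "g4": ("X", 1, 50)}
    ensembl = {"g1": ("1", 100, 900), "g2": ("2", 700, 1800),
               "g3": ("5", 10, 200)}
    print(classify_shared_genes(ncbi, ensembl))
```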
Table 5
Resources for generating gene models

                                        Human                   Mouse                   Rat
Alignment                               NCBI        Ensembl     NCBI        Ensembl     NCBI        Ensembl
type        Database                    GRCh38.p7   GRCh38      GRCm38.p4   GRCm38      Rnor6.0     Rnor6.0

Transcript  RefSeq (NM_/NR_)            55,892      159,081     34,827      213,883     17,901      65,192^a
            GenBank                     178,172     -           223,325     -           81,472      -
            EST                         4,398,669   3,976,554   3,110,250   3,094,335   818,919     994,706
Protein     GenBank/ENA/DDBJ/RefSeq     119,177     108,207     60,386      56,044      17,558      65,192^a
            RefSeq (NP_)/UniProt        42,610      74,356      29,728      30,746      19,226      30,721

Other resources:
  Human, NCBI and Ensembl: none listed
  Mouse, NCBI: human and rat RefSeq
  Mouse, Ensembl: Mammalia and vertebrate UniProt
  Rat, NCBI: mouse and human RefSeq
  Rat, Ensembl: mouse UniProt/RefSeq, HI-KNAW RNASeq

^a RefSeq cDNAs with accession prefix “NM_” matching RefSeq proteins with “NP_” prefix

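[Figure: Venn diagrams of NCBI and Ensembl gene sets for human, mouse, and rat (the rat diagram also includes RGD), with a summary table. Values recovered from the figure:]

                                          Human     Mouse     Rat
Total gene number (IDs), NCBI             60496     69020     47959
Total gene number (IDs), Ensembl          63967     53946     32883
Shared between NCBI and Ensembl           25372     24637     19633
Ensembl only                              38595     29309     13250
NCBI only                                 35124     44383     25338 + 2988
Gene overlap (by position), exact match   9594      5637      3902
Gene overlap (by position), no match      15457     8134      5027

Fig. 2 The comparison of gene annotations for human, rat, and mouse. Number of genes with unique ID
shared between NCBI and Ensembl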
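
The percentages quoted in Subheading 2.3 can be reproduced from the totals in Fig. 2. The sketch below is our own arithmetic check; it shows that the rounded values of 15%, 10%, and 12% are obtained when the Ensembl gene totals are used as the denominator. The exact-match count for human is taken from the text (9595); the figure lists 9594, which gives the same rounded percentage.

```python
# Totals and exact-match counts as given in the text and Fig. 2.
TOTAL_GENES = {
    "human": {"NCBI": 60_496, "Ensembl": 63_967},
    "mouse": {"NCBI": 69_020, "Ensembl": 53_946},
    "rat": {"NCBI": 47_959, "Ensembl": 32_883},
}
SHARED_IDS = {"human": 25_372, "mouse": 24_637, "rat": 19_633}
EXACT_POSITION_MATCH = {"human": 9_595, "mouse": 5_637, "rat": 3_902}


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


if __name__ == "__main__":
    for species, totals in TOTAL_GENES.items():
        exact = percent(EXACT_POSITION_MATCH[species], totals["Ensembl"])
        shared_ncbi = percent(SHARED_IDS[species], totals["NCBI"])
        shared_ens = percent(SHARED_IDS[species], totals["Ensembl"])
        print(f"{species}: exact position match {exact:.1f}% of Ensembl genes; "
              f"shared IDs {shared_ncbi:.1f}% of NCBI, {shared_ens:.1f}% of Ensembl")
```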

Kumar et al. proposed to improve the annotation of the rat genome
by utilizing transcriptomics and proteomics data together [25]. The
researchers built a reference-based transcriptome assembly from
RNA-Seq reads and analyzed publicly available RNA-Seq and mass
spectrometry (MS)-based proteomics data. They discovered hundreds of
novel peptides in rat brain microglia that were expressed by 249
genes. The evidence helped to identify unknown exons, pseudogenes,
and splice variants for various loci, many of which have important
disease associations. Wu et al. showed that the choice of genome
annotation has a significant influence on the outcome of human
RNA-Seq expression analysis [26]. These examples highlight a need to
improve existing methods of genome annotation and to utilize all
available resources in the annotation process. The Rat Genome
Database (RGD) is in the process of incorporating Ensembl
annotations in addition to the NCBI annotations that are currently
prioritized in our database (Fig. 2).

3 Rat Variants and Genomic Markers: RGD Repository

3.1 Genomic Markers

There are over 700 inbred strains of rats, and the history of their
generation and evolution is not always well known. Markers are
important in research studies involving cross-breeding between
different rat strains, and essential in finding the level of
polymorphism and genetic homogeneity between them (inter-strain and
intra-strain differences).
Years before the sequencing of the rat genome was possible, there
was an intensive search for novel markers that could be integrated
into rat genetic and radiation hybrid maps [27–29]. The inbred mouse
and rat strains were identified by coat colors and MHC until the
1970s, when biochemical markers
Fig. 3 Examples of differences in gene model prediction between ENSEMBL (yellow-blue) and NCBI
(red-brown) using the JBrowse genome viewer [24]: (a) genes in the same genomic location differ in
length and number of exons or other genes are assigned to that position; (b) the same genes differ
in the exons’ position; (c) one gene in NCBI is split into two genes in ENSEMBL; (d) in some locations one
model predicts the presence of genes whereas the other one does not; (e) number and length of
predicted transcripts differ
became popular [30, 31] and a decade later polymorphic DNA markers
were introduced [32]. DNA fingerprinting originally relied on the
use of restriction fragment length polymorphisms (RFLP), which with
the advent of PCR were replaced by amplification of simple sequence
repeats (SSRs), also known as simple sequence length polymorphisms
(SSLPs) [33]. SSLPs (also called microsatellites) are DNA regions
containing di-, tri-, or tetranucleotide repeats that are found
randomly and abundantly throughout the genome. The number of these
repeats is highly polymorphic between inbred rats, allowing
identification of different strains [34]. In 1991, Jacob et al.
developed a set of 112 SSRs that were found to be polymorphic in
length between the SHRSP and WKY strains and used them to discover a
region of chromosome 10 linked to hypertension in this model [35].
In 1995, the same group expanded this set of markers to 432 SSLPs
and used them to construct a more complete linkage map for the rat
[28]. Linkage maps are created using various types of polymorphic
markers and are calculated from the recombination rates between
these markers. They show the linear position of genes or markers on
a chromosome, provide information on recombination rates, give
insight into intra- and interspecies gene rearrangements, and help
to identify Mendelian loci and Quantitative Trait Loci (QTL). In the
1990s many reports were published on the construction of rat linkage
maps that incorporated increasing numbers of markers and facilitated
the localization of disease loci (see Table 6 and [27–40]). Current
genetic monitoring and linkage mapping rely on single nucleotide
polymorphisms (SNPs). The identification of SNPs has advanced
rapidly, and SNPs are routinely used in linkage and haplotype
mapping, association studies, pharmacogenetics, and forensics. The
most complete rat linkage map, published recently, distinguishes
male and female recombination rates [40]. Littrell et al.
constructed the refined genetic map (870 meioses; 95,769 markers),
comparable with the high-resolution human map (104,246 meioses;
833,754 markers) [41] and mouse map (15,832 meioses; 120,789
markers) [42]. Increased accuracy of marker placement in
high-density genetic maps is essential for QTL localization and
subsequent fine gene mapping.
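
Because linkage maps are calculated from recombination rates, a mapping function is needed to convert an observed recombination fraction between two markers into an additive map distance. The sketch below implements the Kosambi function (listed in Table 6 for one of the maps) together with the Haldane function for comparison. The function names are ours, and the example recombination fractions are arbitrary.

```python
import math


def kosambi_cm(recombination_fraction):
    """Map distance in centimorgans under the Kosambi mapping function."""
    r = recombination_fraction
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def haldane_cm(recombination_fraction):
    """Map distance in centimorgans under the Haldane mapping function."""
    r = recombination_fraction
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


if __name__ == "__main__":
    for r in (0.01, 0.10, 0.30):
        print(f"r = {r:.2f}: Kosambi {kosambi_cm(r):.2f} cM, "
              f"Haldane {haldane_cm(r):.2f} cM")
```

For small recombination fractions both functions return a distance close to 100 × r cM; they diverge as r grows because Kosambi allows for crossover interference while Haldane assumes none.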

3.2 Genetic Variations

For personal genomics it is important to collect all information
about genetic variations in order to establish the extent to which a
gene’s function can be compromised. Disease phenotypes are often
determined by many genetic and environmental interactions and are
described as multifactorial or complex genetic traits, as opposed to
single-locus Mendelian traits. Genome-wide association studies
(GWAS) have discovered thousands of associations between SNPs and
complex traits [43, 44]. Most complex traits result from the
combination of many genetic variants with small
Table 6
Rat linkage maps

Publication  First author   Number of  Number of crosses  Map length  Average intermarker interval  Software or             Reference
year         last name      markers    (meioses)          (cM)        or (map resolution), cM       (mapping function)      (PubMed PMID)

1992         Serikawa       125        9                  ~2400       ~10                           GeneLink                27 (1628813)
1995         Jacob          432        1                  1509        3.7                           MapMaker                28 (7704027)
1997         Bihoreau       767        3                  1998        4.7                           JoinMap, GMS, Multimap  37 (1472068)
1998         Brown          678        4                  1749        2.43                          MapMaker, Multimap      38 (9657848)
1999         Steen          4736       2                  1503        0.403                         MapMaker                29 (10400928)
2004         Jensen-Seaman  2305       2 (90)             1542        (1.1)                         (Kosambi function)      39 (15059993)
2018         Littrell       95,769     528 (870)          1708        (<0.02)                       Lep-MAP3                40 (29760201)

effects that together could account for a considerable portion of
the variation in disease risk [43]. Mapping causal variants in model
organisms requires time-consuming and costly breeding and/or
mutation procedures. It also requires the development of statistical
algorithms for genomic prediction to estimate the distribution of
SNP effects and distinguish true causal mutations from neutral
variants.
RGD collects information on genetic variation from the worldwide
community of rat researchers (see Table 7) and provides tools for
searching and retrieving these data (see Chapter 3). Currently, we
show details about almost 605 million variants (SNVs and small
insertions and deletions) and about the studies that have identified
these variants using different genome reference assemblies and
methods in our Variant Visualizer [45]. The STAR project contributed
greatly to expanding the dbSNP repository [46]. The STAR Consortium
identified three million new SNPs and evaluated a subset of 20,238
SNPs across 167 distinct inbred rat strains and two rat recombinant
inbred (RI) panels [9]. High-density maps were constructed for
mapping disease genes and for identifying the functional effects of
325,788 SNPs. Fifty-six nonsynonymous coding SNPs were predicted to
affect protein function, and seven of these SNPs were located in
genes involved in hereditary diseases and cancer (Aldh2, acute
alcohol intolerance; Pccb, propionic acidemia; Aff4, leukemia). In
addition, 250 significant QTL were identified in the panel of 33 RI
strains that could be applicable in the detection of traits and risk
factors for complex diseases [9]. Atanur et al. identified 3.6
million high-quality SNPs and 343,243 short indels (≤15 bp) in the
SHR/OlaIpcv rat strain [16]. These variants produced 161 gains or
losses of stop codons, but most importantly the large deletions
resulted in complete or partial absence of 107 genes in the
SHR/OlaIpcv genome. They found an intronic deletion in the SHR
Echdc2 gene (beta-oxidation pathway of fatty acids), a deletion in
an intergenic region near the Cyp4a8 gene (hypertension candidate
gene), and confirmed the presence of a CNV in the Cd36 gene (insulin
resistance, dyslipidemia, and hypertension). The regions with
cis-regulated expression quantitative trait loci (eQTL) were
enriched with SNPs, short indels, and larger deletions [16]. Simonis
et al. identified 3.2 million single nucleotide variants and 425,924
small insertions and deletions in the SHR/OlaIpcv and BN-Lx/Cub
strains, the founders of an RI panel (HXB/BXH) [11]. The
distribution of SNVs was not random. They found eight regions with a
high SNV density, with a combined length of 51 Mb, together holding
97% of the SNVs. Nine substrains of the BN rat strain from different
institutes had the same genotype in the seven regions as BN-Lx, and
there were no SNV high-density regions in the SHR strain. RNA
sequencing of the liver tissues in both strains identified 532
differentially expressed genes. 1% of the SNVs and
Table 7
Rat variants resources in RGD

Number
of rat Secondary Publication data dbSNP
Assembly SNVs total strains analysis provider Primary data source PMID provider Sequence platform dbSNP build
3.4 205,581,620 28 ICL ICL, MCW, WTSI, UI, MDC, 23890820, Atanur et al. Illumina HiSeq 4,877,558 dbSnp136
KNAW, ERIBA-UMCG, 24628878, 2013, Ma 2000, SOLiD
CHPM-UT, UMMC, 20430781 et al. 2014, 2,3 and
UniSR, ICAMS, ILA Atanur et al. 4 (2 strains)
KyotoU, EMBL-EBI, 2010
DZHK, INSERM; Simonis
et al. 2012
12 HI-KNAW RGSMC1: WTCHG, 23708188 Baud et al. 2013 SOLiD 4 and
HI-KNAW, ERIBA- 5500, Illumina R
UMCG, KI-CNS, ICAMS, HiSeq 2000 at
IUSM, INSERM, KI-MBB, (1 strain) G
MDC, INc-UAB, e
CEA-CNG, MPIMG, n
ECRC, HGSC-BCM, o
WTSI, ICL, EMBL-EBI, m
DZHK e
A
2 HI-KNAW HI-KNAW, ICL, BCGSC, 22541052 Simonis et al. SOLiD 2,3 and ss
IPHYS-CAS 2012 4 (2 strains) e
2 ICAHN BGI, FIMR, UESTC 23695301 Guo et al. 2013 Illumina HiSeq m
2000 bli
es
4 UMich UMich PhD thesis2 Dr. Jun Z. Li Illumina HiSeq ,
group 2000 A
7 MCW MCW (1 strain SMPH-UW) NA Dr. Howard Illumina HiSeq n
Jacob group 2000 n
ot
(continued) ati
o

6
1
Table 7 6
(continued) 2

Number M
of rat Secondary Publication data dbSNP o
Assembly SNVs total strains analysis provider Primary data source PMID provider Sequence platform dbSNP build ni
ka
3 MDC MDC NA Dr. Norbert Illumina/Solexa, T
Huebner Genome ut
group Analyzer II aj
et
5.0 261,827,016 42 HI-KNAW HI-KNAW, MDC, UCSM, 25943489 Hermsen et al. Illumina HiSeq 4,806,887 dbSnp138
MRC-ICS, ERIBA - 2015 2000
UMCG; Ma et al. 2014,
Guo et al. 2013, Atanur
et al. 2013, Baud et al.
2013, Simonis et al. 2012
20 MiB-KyushuU MiB-KyushuU, KDRI, 27882299 Yoshihara et al. Illumina NextSeq
ILA-KyotoU 2016 500
3 UDEL Nemours, UDEL, Penn Med 26502805 Barthold et al. Illumina HiSeq
2016 2500
9 MCW MCW (1 strain SMPH-UW) NA Dr. Howard Illumina HiSeq
Jacob group 2000
6.0 137,495,090 25 RGD Atanur et al. 2013 NA in preparation Illumina HiSeq 4,726,744 dbSnp146
2000
9 MCW MCW (1 strain SMPH-UW) NA Dr. Howard Illumina HiSeq
Jacob group 2000
8 MCW (lift-over HI-KNAW NA Dr. Michael SOLiD 4 and
results) Flister group 5500
1. Rat Genome Sequencing and Mapping Consortium, 2. Ref. [57]. BCM-HGSC Human Genome Sequencing Center, Baylor College of Medicine, USA, BGI BGI-Shenzhen,
China, CBMR The Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, Denmark, CEA-CNG Commissariat a l’Energie Atomique, Centre
National de Ge´notypage, France, DZHK German Centre for Cardiovascular Research, Germany, ECRC Experimental and Clinical Research Center, Charite´
Universittsmedizin Berlin, Germany, ERIBA UMCG European Research Institute for the Biology of Ageing, University Medical Center Groningen, Netherlands, EMBL-EBI
European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, UK, FIMR Laboratory of Experimental Rheumatology, Feinstein
Institute for Medical Research, USA, HGSC-BCM Human Genome Sequencing Center, Baylor College of Medicine, USA, HI-KNAW Hubrecht Institute, Royal Netherlands
Academy of Arts and Sciences, Netherlands, ICL Imperial College London, UK, ICAHN The Icahn School of Medicine at Mount Sinai, USA, ICAMS The Institute of
Cardiovascular and Medical Sciences, University of Glasgow, UK, ILA-KyotoU Institute of Laboratory Animals, Kyoto University, Japan, INc-UAB Institute of Neurosciences, Universitat Autònoma de Barcelona, Spain,
INSERM INSERM UMR-S872, Cordeliers Research Centre & INSERM U698, Hôpital Bichat, France, IPHYS-CAS Institute of Physiology, Czech Academy of Sciences, Czech
Republic, IUSM Department of Medical and Molecular Genetics, Indiana University School of Medicine, USA, KDRI Department of Technology Development, Kazusa DNA
Research Institute, Japan, KI-CNS Department of Clinical Neuroscience, Karolinska Institute, Sweden, KI-MBB Department of Medical Biochemistry and Biophysics, Karolinska
Institute, Sweden, MiB-KyushuU Medical Institute of Bioregulation, Kyushu University, Japan, MDC Max Delbrück Center for Molecular Medicine, Germany, MCW Medical
College of Wisconsin, USA, MPIMG Department of Computational Biology, Max Planck Institute for Molecular Genetics, Germany, MRC-ICS Physiological Genomic and
Medicine Group, Institute of Clinical Sciences, UK, Nemours Nemours Alfred I. duPont Hospital for Children, USA, Penn Med Perelman School of Medicine, University of
Pennsylvania, USA, SMPH-UW School of Medicine and Public Health, University of Wisconsin, USA, UCSF University of California, San Francisco, USA, UCSM Department of
Pharmacology, University of Colorado School of Medicine, USA, UDEL Center for Bioinformatics and Computational Biology, University of Delaware, USA, UESTC School of
Life Science and Technology, University of Electronic Science and Technology of China, China, UMich Department of Human Genetics, University of Michigan, USA, UMMC
University of Mississippi Medical Center, USA, UniSR The Vita-Salute San Raffaele University, Italy, WTCHG Wellcome Trust Centre for Human Genetics, UK, WTSI The
Wellcome Trust Sanger Institute, UK


indels were located in coding regions or splice sites, and 489 genes
were affected by variant changes. Only 19 of 193 genes expressed
in liver were differentially expressed between SHR and BN-Lx.
The variant predictions were correct but the unaffected genes
produced transcripts without affected exons. The duplicated gene
Mx2 showed heterozygous SNVs and the expression was also
heterozygous in the RNA-seq data. In BN-Lx, which does not
carry the duplications, all positions were homozygous at the DNA
and RNA levels. The results implied that many of the analyzed
liver transcripts were not spliced according to the available
annotations. Changes in transcript structure rarely overlapped with
genomic variants [11].
The Rat Genome Sequencing and Mapping Consortium analyzed
160 phenotypes from an outbred rat heterogeneous stock
(NIH-HS) and showed the presence of segregating variation in
commonly used laboratory rats [15]. NIH-HS rats were phenotyped
for six disease models (anxiety, diabetes, hypertension,
aortic elastic lamina ruptures, multiple sclerosis, and osteoporosis)
and several related risk factors (lipid and cholesterol levels,
cardiac hypertrophy, etc.). The researchers identified 355 QTL
for 122 phenotypes using 265,551 polymorphic high-quality
SNPs. They investigated the extent to which the variants would
identify genes and causative mutations; 212 QTL (62%) had no
candidate variant. The median proportion of heritability explained
by QTL in rats and mice was above 30%, substantially higher than
the mean proportion of heritability explained in humans, which was
less than 10%. An important observation was that genetic variants
present in both inbred rat strains and inbred mouse strains rarely
contributed to the same phenotype [15].
Atanur et al. studied coevolved gene clusters in regions they named
“putative artificial selective sweep (PASS)” regions, defined
by the presence of many fixed rare variants and at least one variant
that contributed to selection [16]. PASS regions co-localized with
QTL, indicating increased genetic variation in these regions
and therefore an enrichment of variation in genes associated with
the disease phenotypes for which the strains were selected. Across
27 rat strains, including 11 models of hypertension, diabetes, and
insulin resistance, they identified 9,665,340 SNVs and 3,502,117
short indels; 29,131 of the SNVs were nonsynonymous coding (NSC)
variants. Half of all single-strain SNVs resided within 189
segments that occupied only 0.8% of the genome, so private SNVs
were concentrated in a small number of discrete regions of the
genome. They identified clusters of coevolved transcripts that
were unique to each disease model. In the Milan hypertensive rat
strain (MHS), a cluster of 65 transcripts (47 genes) containing
NSC sequence variants was found. This cluster included the
Add1 gene, which is known to cause hypertension in MHS rats through
an amino acid substitution (F316Y). ADD1 is also associated with
human hypertension and responsiveness to antihypertensive medications.
However, they could not find clusters with shared functional
mutations between strains representing the same disease model,
such as the hypertensive strains FHH, MHS, SHR, SHRSP, and LH.
Hermsen et al. reanalyzed primary sequence data from 40 rat strains
against the upgraded Rnor_5.0 rat reference and identified
over 12 million genomic variants (SNVs, indels, and structural
variants) [47]. They found 601 SNVs predicted to have a deleterious
effect on gene function, including stop_gained variants (SnpEff
consequence type; SO:0001587) and alterations of splice sites,
but for 60 of these SNVs (10%), a neighboring single variant or
indel restored the open reading frame. Genes affected by the
remaining 541 high-impact SNVs were expressed at lower levels.
The biological relevance of the high-impact SNVs was limited
by exon skipping or the canceling effect of adjacent variants.
They observed more nonsynonymous SNVs among the substrain
variants than in the control set. They identified 3006
protein-coding genes that contain 6 or more SNVs in the coding
region and analyzed 909 genes that were under positive selection.
Functional annotation showed that this set of genes is enriched for
genes related to the immune and olfactory systems. The WKY
strain had the highest degree of substrain variation, partly
because of a geographical relocation before the inbreeding was
completed.
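The canceling effect of an adjacent variant can be illustrated with a minimal Python sketch. It is not the pipeline used in [47]; the function names, the codon-offset representation, and the example codon are assumptions chosen for illustration only. The sketch checks whether an SNV that creates a stop codon on its own still yields a stop codon once a neighboring SNV in the same codon is applied.

```python
# Illustrative sketch only: names and data layout are hypothetical,
# not taken from SnpEff or from the study described above.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def apply_snvs(ref_codon, snvs):
    """Return the codon obtained by applying SNVs given as {offset: alt_base}."""
    bases = list(ref_codon.upper())
    if len(bases) != 3:
        raise ValueError("a codon must have exactly three bases")
    for offset, alt in snvs.items():
        if not 0 <= offset < 3:
            raise ValueError("codon offsets must be 0, 1, or 2")
        bases[offset] = alt.upper()
    return "".join(bases)


def stop_is_rescued(ref_codon, stop_snv, neighbor_snvs):
    """True if stop_snv alone creates a stop codon but the codon carrying
    stop_snv together with the neighboring SNVs is no longer a stop."""
    offset, alt = stop_snv
    alone = apply_snvs(ref_codon, {offset: alt})
    combined = apply_snvs(ref_codon, {offset: alt, **neighbor_snvs})
    return alone in STOP_CODONS and combined not in STOP_CODONS


# CAA (Gln) -> TAA is a stop; with a second SNV at the third base it becomes TAC (Tyr).
rescued = stop_is_rescued("CAA", (0, "T"), {2: "C"})
# With a second SNV at the middle base it becomes TGA, which is still a stop.
not_rescued = stop_is_rescued("CAA", (0, "T"), {1: "G"})
```

Frame-restoring indels would need sequence context beyond a single codon and are outside the scope of this sketch.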
Evidently, causal variants are not easy to identify. An accurate
and complete set of called variants together with gene information
and QTL data, available for many rat strains for a broad range of
complex traits, may assist in identifying the potential causal
variants. Recently, She and Jarosz proposed a novel computational
strategy for an optimal crossing scheme in yeast that enables
single-nucleotide resolution mapping of genetic variation and
distinguishes causal variants from adjacent passenger
mutations [48]. This new strategy could yield a breakthrough in
the search for disease susceptibility markers.
In addition to rat data, 420,850 human variants from ClinVar
along with their predicted effects are available in the Variant
Visualizer tool [45]. As the rat is used to study gene-phenotype
associations or specific diseases, researchers need access to all
information necessary to connect genes to phenotypes or diseases
of interest. RGD currently integrates multi-organism data and has
improved multilevel navigation to make this information easier to find.

3.3 Phylogeny

Researchers interested in the relationships between rat strains have
used different types of markers to produce phylogenetic trees. Early
researchers, using 28 biochemical marker loci, distinguished 52
genetically different rat strains that grouped in three clusters [49].
In later studies, highly variable microsatellites and high-density
SNPs were used as genetic markers to construct phylogenetic
trees, establish the relatedness of organisms, and predict
linkage between genomic regions and various phenotypes and
diseases (see Table 8). In 1995, Canzian et al. published a
phylogenetic tree for 13 commonly used rat strains based on
264 genome-wide loci [50]. In 1997, Canzian built a tree for
63 inbred strains and 214 substrains using
995 microsatellite and biochemical markers [51]. The average
polymorphism for pairwise comparisons of rat strains derived
using both biochemical and microsatellite markers was 53%,
whereas the average increased to 64% when only microsatellite
markers were included in the analysis, possibly due to differences
in selective pressures between biochemical and genetic markers
and/or the fact that a single biochemical type can result from more
than a single genetic polymorphism. Thomas et al. used 48
substrains from 46 distinct strains and over 4800 microsatellite
markers in their population study [52]. Mashimo et al.
constructed a phylogenetic tree for 93 rat strains from a set of
357 SSLP markers [53]. Nijman et al. used a selected genome-wide set of
820 SNPs on 38 rats of 34 different strains and 3 wild rat strains
[54]. The STAR Consortium presented a phylogenetic network
using 20,283 SNPs from 167 inbred strains [9]. Atanur et al.
constructed a tree using 9.6 million SNVs [16]. In 2015, two
studies reported rat strain phylogenetic analyses: Hermsen et al.
presented a population structure of 40 rat strains using almost nine
million SNVs, and Battula et al. reported the genetic relatedness
of WNIN and WNIN/Ob rats with other strains using 76
unlinked microsatellite markers [47, 55].

Table 8
Comparison of phylogeny studies of laboratory rats

| Publication year | First author | Number of strains | Marker type | Number of markers |
| 1984 | Festing | 52 | Bioch | 28 |
| 1995 | Canzian | 13 | RAPD | 264 |
| 1997 | Canzian | 63 (214 sub) | Bioch, SSLP | 995 |
| 2003 | Thomas | 48 | SSLP | >4800 |
| 2005 | Smits* | 39 | SNP | 861 |
| 2006 | Mashimo | 93 | SSLP | 357 |
| 2008 | Nijman | 37 | SNP | 820 |
| 2008 | STAR | 167 | SNP | 20,283 |
| 2013 | Atanur | 28 | SNP | 9.6 Mln |
| 2015 | Hermsen* | 41 | SNP | ~9 Mln |
| 2015 | Battula | 51 | SSLP | 76 |
| 2017 | Puckett | 326 (29 inb) | SNP | 32,127 |

The body of the table gives, for each study, the number of strains in each cluster: Brown Norway, Long Evans, Sprague Dawley 1, Sprague Dawley 2, Wistar - WKY, Wistar - BB, F344, Sabra & Cohen, PVG, BD, and ACI.
* Phylogeny studies that did not explore phylogeny distances
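As a rough illustration of what an average pairwise polymorphism figure such as the 53% and 64% values above measures, the following Python sketch computes the fraction of shared markers at which two strains carry different alleles, and averages it over all strain pairs. The strain names, marker names, and allele sizes are hypothetical, and the cited studies may have handled missing data or multi-allelic markers differently.

```python
from itertools import combinations


def pairwise_polymorphism(genotypes_a, genotypes_b):
    """Fraction of markers typed in both strains whose alleles differ.

    Each argument maps marker name -> allele (None means not typed).
    """
    shared = [
        marker
        for marker in genotypes_a
        if genotypes_a[marker] is not None and genotypes_b.get(marker) is not None
    ]
    if not shared:
        raise ValueError("no markers typed in both strains")
    differing = sum(1 for m in shared if genotypes_a[m] != genotypes_b[m])
    return differing / len(shared)


def mean_pairwise_polymorphism(genotypes_by_strain):
    """Average of pairwise_polymorphism over all unordered strain pairs."""
    pairs = list(combinations(sorted(genotypes_by_strain), 2))
    if not pairs:
        raise ValueError("need at least two strains")
    total = sum(
        pairwise_polymorphism(genotypes_by_strain[a], genotypes_by_strain[b])
        for a, b in pairs
    )
    return total / len(pairs)


# Hypothetical microsatellite allele sizes (bp) for three made-up strains.
example = {
    "StrainA": {"m1": 120, "m2": 200, "m3": 150, "m4": 90},
    "StrainB": {"m1": 120, "m2": 204, "m3": 150, "m4": 94},
    "StrainC": {"m1": 124, "m2": 204, "m3": None, "m4": 90},
}
average = mean_pairwise_polymorphism(example)
```

One minus this fraction can serve as a simple similarity measure; tree-building methods then operate on the full matrix of pairwise values.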
All published phylogenetic studies for rat strains consistently
showed that the Brown Norway strain was the most distant from
all the other strains. The STAR Consortium found 10 clusters: two
Wistar clusters (WKY, with the SHR, SHRSP, WKY, and GK strains;
BB, with the MHS, MNS, LEW, BB, and LUDW strains), two
Sprague-Dawley subtrees (SD1, with the DRH and NAR strains; SD2,
with the SS, SR, LN, and LH strains), Cohen with Sabra rats (CD,
with the SBH, SBN, CDR, and CDS strains), a Fischer 344 group
(F344, with the F344, BUF, and MES strains), rats grouped with the
Piebald Virol Glaxo strain (PVG, with the WAG, BS, GH, and LOU
strains), a cluster of Berlin Druckrey rats (BD, with BDIX, BH,
and E3), the cluster furthest from the BN root, which contains the
August Copenhagen Irish strain (ACI, with ACI, DA, and COP), and
the cluster nearest to BN, the Long Evans cluster (LE, with the
KDP, FHH, LE, and R33 strains) [9]. We used
their cluster structure to compare other published trees in Table 8,
which shows the numbers of strains that were included in the 10 clusters.
Two studies, by Hermsen et al. and Smits et al., identified the number
of populations (clusters) and explored similarities between them
instead of constructing phylogenetic trees [47, 56]. Smits et
al. presented a phylogenetic network that explores alternative
evolutionary paths along the network [56], and Hermsen et al.
investigated a “population” structure, appropriate for studying
the ancestry of samples genotyped at a large number of genetic
markers and with a complex evolutionary history [47].
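The cluster-based comparison described above amounts to assigning each strain in a study to one of the 10 clusters and counting strains per cluster. A minimal Python sketch of that bookkeeping follows. The cluster membership is limited to the representative strains named in the text, so it is incomplete by construction; the function name and the "unassigned" label are assumptions made for illustration.

```python
from collections import Counter

# Representative members only, as listed in the text; the published
# clusters contain additional strains that are not reproduced here.
STAR_CLUSTERS = {
    "WKY": {"SHR", "SHRSP", "WKY", "GK"},
    "BB": {"MHS", "MNS", "LEW", "BB", "LUDW"},
    "SD1": {"DRH", "NAR"},
    "SD2": {"SS", "SR", "LN", "LH"},
    "CD": {"SBH", "SBN", "CDR", "CDS"},
    "F344": {"F344", "BUF", "MES"},
    "PVG": {"WAG", "BS", "GH", "LOU"},
    "BD": {"BDIX", "BH", "E3"},
    "ACI": {"ACI", "DA", "COP"},
    "LE": {"KDP", "FHH", "LE", "R33"},
}

# Reverse lookup: strain symbol -> cluster name.
STRAIN_TO_CLUSTER = {
    strain: cluster
    for cluster, strains in STAR_CLUSTERS.items()
    for strain in strains
}


def count_strains_per_cluster(strains):
    """Count distinct strains per cluster; unknown strains are 'unassigned'."""
    counts = Counter()
    for strain in {s.upper() for s in strains}:
        counts[STRAIN_TO_CLUSTER.get(strain, "unassigned")] += 1
    return counts


# Hypothetical strain list for one study; BN is the outgroup, not a cluster member.
counts = count_strains_per_cluster(
    ["BN", "SHR", "WKY", "F344", "DA", "LE", "SS", "Lew"]
)
```

Running the same function over the strain list of each published tree would give one column of counts per study, which is the shape of the comparison table.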
Nijman et al. determined seven very similar clusters, but the
WKY cluster was further in distance from the BN strain, the PVG
strain was separated with the LOU and WAG strains (a similar
separation was present in the Canzian 1997 tree), the ACI cluster
was closer to the BN root, and the most distant was the BD group.
All Wistar strains (WKY and BB clusters) were also separated
from the group of BN in the Atanur et al. 2013 study. This
contrasts with three other phylogenetic analyses (the Canzian 1997,
Thomas 2003, and Battula 2015 trees), in which the WKY group
was in close proximity to the BN root. The LE group strains were the
closest branch to BN in most of the trees. The ACI cluster was
close to the F344 group in the Thomas, STAR Consortium, and
Atanur trees. The F344 strain was not included in the Nijman tree
analysis, but the Buffalo strain was placed next to the ACI cluster.
The SS, SR, SHR, and MHS strains, which represent models of
hypertension, remain close in distance in most of the trees (SD2
and BB clusters). The MNS and MHS strains were bred together
and are closely related, but in the Canzian 1997 tree they were
placed separately. Two inbred WNIN strains in the Battula
analysis were placed as an individual cluster between the F344
group and the LE strain. Some strains showed greater genetic
differences between substrains than others [51]. Variation between
substrains was observed in LE (29% in the pairwise substrain
comparison), WKY (up to 19%), LEW (13%), SHR (11%), BB
(10%), PKD (5%) and, to a lesser degree, in GK (1%) and BN
(0.6%) inbred strains [51]. These observations are important for
research strategy design, as the results and reproducibility
depend on the choice of particular substrains.

4 Conclusion

In summary, RGD provides a resource for rat genetic markers that
includes simple sequence length polymorphisms (SSLP), copy
number variations (CNV), deletions, insertions, single nucleotide
variants (SNV), and QTL. The data are available for three rat
genome assemblies for a range of commonly used laboratory rat
strains. This repository is valuable for researchers who use rats in
medical research and also for those who do comparative analysis
using other organisms. RGD’s major goal is to present rat genomic
and phenotypic data in a form that is easy to interpret, to assist in
experimental design, and ultimately to facilitate rat research and
interspecies comparison. RGD’s resources may improve the
reproducibility of scientific research between laboratories and thus
help ensure the overall quality of biomedical animal research.

References
1. Gibbs RA, Weinstock GM, Metzker ML, Muzny DM, Sodergren EJ, Scherer S et al (2004) Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature 428(6982):493–521
2. Havlak P, Chen R, Durbin KJ, Egan A, Ren Y, Song XZ et al (2004) The Atlas genome assembly system. Genome Res 14(4):721–732
3. Krzywinski M, Wallis J, Gösele C, Bosdet I, Chiu R, Graves T et al (2004) Integrated and sequence-ordered BAC- and YAC-based physical maps for the rat genome. Genome Res 14(4):766–779
4. Kren V, Qi N, Krenova D, Zidek V, Sladká M, Jáchymová M, Míková B et al (2001) Y-chromosome transfer induces changes in blood pressure and blood lipids in SHR. Hypertension 37(4):1147–1152
5. Gibbs R, Weinstock G (2005) Upgrading the DNA sequence of the rat genome. White paper available at https://www.genome.gov/pages/research/sequencing/seqproposals/ratupgradeseq.pdf
6. van Boxtel R, Cuppen E (2010) Rat traps: filling the toolbox for manipulating the rat genome. Genome Biol 11(9):217. https://doi.org/10.1186/gb-2010-11-9-217
7. Prokop JW, Underwood AC, Turner ME, Miller N, Pietrzak D, Scott S et al (2013) Analysis of Sry duplications on the Rattus norvegicus Y-chromosome. BMC Genomics 14:792. https://doi.org/10.1186/1471-2164-14-792
8. Rozen S, Warren WC, Weinstock G, O’Brien SJ, Gibbs RA, Richard K et al (2006) Sequencing and annotating new mammalian Y chromosomes. White paper available at https://www.genome.gov/pages/research/sequencing/seqproposals/ychromosomewp.pdf
9. STAR Consortium, Saar K, Beck A, Bihoreau MT, Birney E, Brocklebank D et al (2008) SNP and haplotype mapping for genetic analysis in the rat. Nat Genet 40(5):560–566. https://doi.org/10.1038/ng.124
10. Atanur SS, Birol I, Guryev V, Hirst M, Hummel O, Morrissey C et al (2010) The genome sequence of the spontaneously hypertensive rat: analysis and functional significance. Genome Res 20(6):791–803. https://doi.org/10.1101/gr.103499.109
11. Simonis M, Atanur SS, Linsen S, Guryev V, Ruzius FP, Game L et al (2012) Genetic basis of transcriptome differences between the founder strains of the rat HXB/BXH recombinant inbred panel. Genome Biol 13(4):r31. https://doi.org/10.1186/gb-2012-13-4-r31
12. Guo X, Brenner M, Zhang X, Laragione T, Tai S, Li Y et al (2013) Whole-genome sequences of DA and F344 rats with different susceptibilities to arthritis, autoimmunity, inflammation and cancer. Genetics 194(4):1017–1028. https://doi.org/10.1534/genetics.113.153049
13. Li R, Yu C, Li Y, Lam TW, Yiu SM, Kristiansen K et al (2009) SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics 25(15):1966–1967. https://doi.org/10.1093/bioinformatics/btp336
14. Li R, Zhu H, Ruan J, Qian W, Fang X, Shi Z et al (2010) De novo assembly of human genomes with massively parallel short read sequencing. Genome Res 20(2):265–272. https://doi.org/10.1101/gr.097261.109
15. Rat Genome Sequencing and Mapping Consortium, Baud A, Hermsen R, Guryev V, Stridh P, Graham D et al (2013) Combined sequence-based and genetic mapping analysis of complex traits in outbred rats. Nat Genet 45(7):767–775. https://doi.org/10.1038/ng.2644
16. Atanur SS, Diaz AG, Maratou K, Sarkis A, Rotival M, Game L et al (2013) Genome sequencing reveals loci under artificial selection that underlie disease phenotypes in the laboratory rat. Cell 154(3):691–703. https://doi.org/10.1016/j.cell.2013.06.040
17. Aitman TJ, Dong R, Vyse TJ, Norsworthy PJ, Johnson MD, Smith J et al (2006) Copy number polymorphism in Fcgr3 predisposes to glomerulonephritis in rats and humans. Nature 439(7078):851–855
18. Thibaud-Nissen F, Souvorov A, Murphy T, DiCuccio M, Kitts P (2013) Eukaryotic genome annotation pipeline. In: The NCBI handbook, 2nd edn. National Center for Biotechnology Information, Bethesda. https://www.ncbi.nlm.nih.gov/books/NBK169439/
19. Yeh RF, Lim LP, Burge CB (2001) Computational inference of homologous gene structures in the human genome. Genome Res 11(5):803–816
20. Aken BL, Ayling S, Barrell D, Clarke L, Curwen V, Fairley S et al (2016) The Ensembl gene annotation system. Database (Oxford) 2016:baw093. https://doi.org/10.1093/database/baw093
21. Birney E, Clamp M, Durbin R (2004) GeneWise and Genomewise. Genome Res 14(5):988–995
22. National Center for Biotechnology Information (2005) US National Library of Medicine, Bethesda. http://www.ncbi.nlm.nih.gov. Accessed 1 Feb 2015
23. Yates A, Akanni W, Amode MR, Barrell D, Billis K, Carvalho-Silva D et al (2016) Ensembl 2016. Nucleic Acids Res 44:D710–D716. https://doi.org/10.1093/nar/gkv1157
24. Buels R, Yao E, Diesh CM, Hayes RD, Munoz-Torres M, Helt G et al (2016) JBrowse: a dynamic web platform for genome visualization and analysis. Genome Biol 17:66. https://doi.org/10.1186/s13059-016-0924-1
25. Kumar D, Yadav AK, Jia X, Mulvenna J, Dash D (2015) Integrated transcriptomic-proteomic analysis using a proteogenomic workflow refines rat genome annotation. Mol Cell Proteomics 15(1):329–339. https://doi.org/10.1074/mcp.M114.047126
26. Wu PY, Phan JH, Wang MD (2013) Assessing the impact of human genome annotation choice on RNA-seq expression estimates. BMC Bioinformatics 11:S8. https://doi.org/10.1186/1471-2105-14-S11-S8
27. Serikawa T, Kuramoto T, Hilbert P, Mori M, Yamada J, Dubay CJ et al (1992) Rat gene mapping using PCR-analyzed microsatellites. Genetics 131(3):701–721
28. Jacob HJ, Brown DM, Bunker RK, Daly MJ, Dzau VJ, Goodman A et al (1995) A genetic linkage map of the laboratory rat, Rattus norvegicus. Nat Genet 9(1):63–69
29. Steen RG, Kwitek-Black AE, Glenn C, Gullings-Handley J, Van Etten W, Atkinson OS et al (1999) A high-density integrated genetic linkage and radiation hybrid map of the laboratory rat. Genome Res 9(6):AP1–AP8. Erratum in: Genome Res 1999 9(8):793
30. Hutton JJ, Roderick TH (1970) Linkage analyses using biochemical variants in mice. 3. Linkage relationships of eleven biochemical markers. Biochem Genet 4(2):339–350
31. Moutier R, Toyama K, Charrier MF (1973) Biochemical polymorphism in the rat, Rattus norvegicus: genetic study of four markers. Biochem Genet 8(3):321–328
32. Botstein D, White RL, Skolnick M, Davis RW (1980) Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32(3):314–331
33. Williams JG, Kubelik AR, Livak KJ, Rafalski JA, Tingey SV (1990) DNA polymorphisms amplified by arbitrary primers are useful as genetic markers. Nucleic Acids Res 18:6531–6535
34. Bryda EC, Riley LK (2008) Multiplex microsatellite marker panels for genetic monitoring of common rat strains. J Am Assoc Lab Anim Sci 47(3):37–41
35. Jacob HJ, Lindpaintner K, Lincoln SE, Kusumi K, Bunker RK, Mao YP et al (1991) Genetic mapping of a gene causing hypertension in the stroke-prone spontaneously hypertensive rat. Cell 67(1):213–224
36. Levan G, Szpirer J, Szpirer C, Klinga K, Hanson C, Islam MQ (1991) The gene map of the Norway rat (Rattus norvegicus) and comparative mapping with mouse and man. Genomics 10:699–718
37. Bihoreau M-T, Sebag-Montefiore L, Godfrey RF, Wallis RH, Brown JH, Danoy PA et al (1997) A high-resolution consensus linkage map of the rat, integrating radiation hybrid and genetic maps. Genomics 75:57–69
38. Brown DM, Matise TC, Koike G, Simon JS, Winer ES, Zangen S et al (1998) An integrated genetic linkage map of the laboratory rat. Mamm Genome 9(7):521–530
39. Jensen-Seaman MI, Furey TS, Payseur BA, Lu Y, Roskin KM, Chen CF et al (2004) Comparative recombination rates in the rat, mouse, and human genomes. Genome Res 14(4):528–538
40. Littrell J, Tsaih SW, Baud A, Rastas P, Solberg-Woods L, Flister MJ (2018) A high-resolution genetic map for the laboratory rat. G3 (Bethesda) 8(7):2241–2248
41. Bhérer C, Campbell CL, Auton A (2017) Refined genetic maps reveal sexual dimorphism in human meiotic recombination at multiple scales. Nat Commun 8:14994
42. Morgan AP, Gatti DM, Najarian ML, Keane TM, Galante RJ, Pack AI et al (2017) Structural variation shapes the landscape of recombination in mouse. Genetics 206:603–619
43. Ulirsch JC, Nandakumar SK, Wang L, Giani FC, Zhang X, Rogov P et al (2016) Systematic functional dissection of common genetic variation affecting red blood cell traits. Cell 165(6):1530–1545. https://doi.org/10.1016/j.cell.2016.04.048
44. Wood AR, Esko T, Yang J, Vedantam S, Pers TH, Gustafsson S et al (2014) Defining the role of common variation in the genomic and biological architecture of adult human height. Nat Genet 46(11):1173–1186. https://doi.org/10.1038/ng.3097
45. Shimoyama M, De Pons J, Hayman GT, Laulederkind SJ, Liu W, Nigam R et al (2015) The Rat Genome Database 2015: genomic, phenotypic and environmental variations and disease. Nucleic Acids Res 43(Database issue):D743–D750
46. Twigger SN, Pruitt KD, Fernández-Suárez XM, Karolchik D, Worley KC, Maglott DR et al (2008) What everybody should know about the rat genome and its online resources. Nat Genet 40(5):523–527. https://doi.org/10.1038/ng0508-523
47. Hermsen R, de Ligt J, Spee W, Blokzijl F, Schäfer S, Adami E et al (2015) Genomic landscape of rat strain and substrain variation. BMC Genomics 16:357. https://doi.org/10.1186/s12864-015-1594-1
48. She R, Jarosz DF (2018) Mapping causal variants with single-nucleotide resolution reveals biochemical drivers of phenotypic change. Cell 172(3):478–490. https://doi.org/10.1016/j.cell.2017.12.015
49. Festing MF, Bender K (1984) Genetic relationships between inbred strains of rats. An analysis based on genetic markers at 28 biochemical loci. Genet Res 44(3):271–281
50. Canzian F, Ushijima T, Pascale R, Sugimura T, Dragani TA, Nagao M (1995) Construction of a phylogenetic tree for inbred strains of rat by arbitrarily primed polymerase chain reaction (AP-PCR). Mamm Genome 6(4):231–235
51. Canzian F (1997) Phylogenetics of the laboratory rat Rattus norvegicus. Genome Res 7(3):262–267
52. Thomas MA, Chen CF, Jensen-Seaman MI, Tonellato PJ, Twigger SN (2003) Phylogenetics of rat inbred strains. Mamm Genome 14(1):61–64
53. Mashimo T, Voigt B, Tsurumi T, Naoi K, Nakanishi S, Yamasaki K et al (2006) A set of highly informative rat simple sequence length polymorphism (SSLP) markers and genetically defined rat strains. BMC Genet 7:19
54. Nijman IJ, Kuipers S, Verheul M, Guryev V, Cuppen E (2008) A genome-wide SNP panel for mapping and association studies in the rat. BMC Genomics 9:95. https://doi.org/10.1186/1471-2164-9-95
55. Battula KK, Nappanveettil G, Nakanishi S, Kuramoto T, Friedman JM, Kalashikam RR (2015) Genetic relatedness of WNIN and WNIN/Ob with major rat strains in biomedical research. Biochem Genet 53(4–6):132–140. https://doi.org/10.1007/s10528-015-9679-8
56. Smits BM, Guryev V, Zeegers D, Wedekind D, Hedrich HJ, Cuppen E (2005) Efficient single nucleotide polymorphism discovery in laboratory rat strains using wild rat-derived SNP candidates. BMC Genomics 6:170
57. Ren Y (2016) Multi-omics analysis of a rat model of aerobic exercise capacity and metabolic fitness. PhD dissertation, University of Michigan, Michigan
Chapter 3

Rat Genome Databases, Repositories, and Tools


Stanley J. F. Laulederkind, G. Thomas Hayman, Shur-Jen Wang,
Matthew J. Hoffman, Jennifer R. Smith, Elizabeth R. Bolton,
Jeff De Pons, Marek A. Tutaj, Monika Tutaj, Jyothi Thota,
Melinda R. Dwinell, and Mary Shimoyama

Abstract
Resources for rat researchers are extensive, including strain repositories and databases all around the
world. The Rat Genome Database (RGD) serves as the primary rat data repository, providing both manual
and computationally collected data from other databases.

Key words Database, Genomics, Analysis, Visualization, Disease, Phenotype, Pathway, Gene,
Annotation, Model organism

1 Introduction

The laboratory rat (Rattus norvegicus) has been used as an
animal model for physiology, pharmacology, toxicology, nutrition,
behavior, immunology, and disease for over 150 years [1]. It was
the first animal to be domesticated for use by scientists [2]. The
rat’s value continues to grow, as indicated by the more than 1.5
million publications in PubMed, with about 40,000 being added
every year. Advanced sequencing technologies, genome
modification techniques, and the development of embryonic stem
cell protocols ensure that the rat remains an important mammalian
model for disease studies. The 2004 release of the reference
genome has been followed by the sequencing of genomes for more
than two dozen individual strains using next-generation sequencing
technologies. These analyses have identified over 50 million
variants [3, 4].
This explosion of genomic data has been accompanied by the
ability to selectively edit the rat genome, leading to hundreds of
new strains through technologies using the CRISPR/Cas9 system
[5], zinc finger nucleases [6], transcription activator-like effector
nucleases [7], transposons [8], and meganucleases [9]. A number
of resources have been developed to provide investigators access
to precision rat models, comprehensive datasets, and sophisticated
software tools necessary for their research. Those include the Rat
Genome Database (RGD), Gene Editing Rat Resource Center
(GERRC), Rat Resource and Research Center (RRRC), the
National BioResource Project-Rat (NBRP-Rat), PhenoGen, and
more, as detailed later.

G. Thomas Hayman et al. (eds.), Rat Genomics, Methods in Molecular Biology, vol. 2018,
https://doi.org/10.1007/978-1-4939-9581-3_3, © Springer Science+Business Media, LLC, part of Springer Nature 2019

2 Rat Strain Repositories

Hundreds of rat strains have been developed during the past
100 years, including inbred, consomic, congenic, ENU-mutant,
and, more recently, genetically engineered mutant strains. This
large number of strains is managed mainly by two rat resource
centers, the Rat Resource and Research Center (RRRC) in the
United States and the National BioResource Project-Rat
(NBRP-Rat) in Japan. These resource centers collect, maintain, and
distribute rat strains as animals or cryopreserved embryos and
spermatozoa. The two centers also perform phenotypic and
genetic characterization of the specimens and disseminate that
information through their respective publicly accessible databases.

2.1 RRRC

Many important rat strains for life science research have been
maintained by scientists in individual laboratories. This type of
resource propagation is inefficient and susceptible to changes in
funding or local interest. The NIH rat model repository workshop
was held in 1998, with scientists from around the world discussing
the needs, opportunities, and parameters for optimal standardization,
maintenance, and distribution of genetically defined rat
strains. Those scientists strongly encouraged the NIH to establish a
national rat genetics resource center, and as a result, the Rat
Resource and Research Center (RRRC) was established in 2001.
The service functions of the RRRC (https://www.rrrc.us/) involve
the procurement of non-commercial rat lines, sperm and embryo
cryopreservation, cryoresuscitation or rederivation with pathogen
and genotype quality control, genotyping and cytogenetic services,
gut microbiome characterization, and distribution of rats, cell
lines, and tissues to biomedical investigators. RRRC also performs
research to make improvements in rat model development and
enhancement.

2.2 NBRP-Rat

The National BioResource Project-Rat (NBRP-Rat)
(https://www.anim.med.kyoto-u.ac.jp/nbr/) was initiated in 2002
to establish a system to facilitate the systematic collection,
preservation, and provision of laboratory rats. It is the world’s
largest rat repository with specimens kept as live animals and
cryopreserved embryos or sperm. Hundreds of laboratories across
Japan and the world have been supplied with rat strains or rat DNA
from NBRP-Rat. Protocols for cryopreservation and rederivation
techniques have also been supplied by NBRP-Rat to the research
community [10].
NBRP-Rat’s Phenome Project was a reevaluation of more than
150 strains based on over 100 phenotypic parameters in seven
general categories [11]. A major benefit of all these phenotypic
measurements is the generation of biological ranges of various
parameters, which allows visualization of normal and abnormal
values for the different rat strains examined. The data can be
visualized on the NBRP-Rat web site or in the RGD PhenoMiner
tool [12].
More than 700 rat strains have been deposited at NBRP-Rat,
with most of those available as cryopreserved sperm or embryos
and the remaining available as live animals. All of the deposited
strains can be obtained by interested researchers. The Kyoto
University Rat Mutant Archive (KURMA) was added to NBRP-Rat
to supply ENU mutant strains, which provide many models
for biomedical research. More than 150 strains have been
genotyped by NBRP-Rat with more than 300 microsatellite
markers [13, 14]. These genotyped rats have provided data to
create phylogenetic charts and SSLP charts, which allow a visual
approximation of the genetic distance between different strains.
Various other tools also allow public access to
data at NBRP-Rat.
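
One simple way to approximate genetic distance from microsatellite genotypes is the fraction of shared markers at which two inbred strains carry different alleles. The sketch below assumes that measure; it is not necessarily the method NBRP-Rat uses, and the strain names, marker names, and allele sizes are invented purely for illustration.

```python
from itertools import combinations

# Hypothetical genotypes: strain -> {marker: allele size in bp}.
# Inbred strains are assumed homozygous, so one allele per marker.
GENOTYPES = {
    "StrainA": {"M1": 150, "M2": 202, "M3": 118, "M4": 240},
    "StrainB": {"M1": 150, "M2": 206, "M3": 118, "M4": 244},
    "StrainC": {"M1": 154, "M2": 206, "M3": 122, "M4": 244},
}


def allele_difference(a, b):
    """Fraction of markers typed in both strains whose alleles differ."""
    shared = a.keys() & b.keys()
    if not shared:
        raise ValueError("no markers in common")
    return sum(a[m] != b[m] for m in shared) / len(shared)


def distance_matrix(genotypes):
    """Pairwise distances keyed by alphabetically ordered strain pairs."""
    return {
        (s, t): allele_difference(genotypes[s], genotypes[t])
        for s, t in combinations(sorted(genotypes), 2)
    }


if __name__ == "__main__":
    for pair, d in distance_matrix(GENOTYPES).items():
        print(pair, round(d, 2))
```

A matrix of this kind is the usual input to tree-building or clustering methods, which is how a chart of relatedness between strains could be drawn.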

2.3 Gene Editing Rat Resource Center

The Gene Editing Rat Resource Center (GERRC;
http://rgd.mcw.edu/wg/gerrc) at the Medical College of
Wisconsin committed in 2013 to produce about 200 genetically
modified rat strains over a five-year period for use by researchers.
These selected strains have specific genes knocked out using
several different gene editing technologies. There were two
application rounds each year, during which researchers requested
genes to be knocked out in a specific strain, with up to two
applications allowed per laboratory. Applications were reviewed
by an external advisory board to determine which models, up to
25, were to be made. After the strains were created, usually 9 to 12
months after application, the requesting investigator received the
first breeder pair. Any other breeder pairs are available to other
investigators on a first-come, first-served basis. An annotated list
of all the mutant strains generated by the project is available on the
GERRC web site.

3 Rat-Specific Data Resources

3.1 Rat Genome Project

The original 2004 release of the reference genome for the rat
[15] was produced by the Rat Genome Sequencing Consortium
(RGSC), led by the Human Genome Sequencing Center at
Baylor College of Medicine (BCM-HGSC). Access to the
original data and assembly updates (including Rnor_6.0) is
available on the BCM-HGSC web
site (https://www.hgsc.bcm.edu/other-mammals/rat-genome-project)
and at the National Center for Biotechnology Information (NCBI)
(https://www.ncbi.nlm.nih.gov/genome/73).

3.2 Rat Genome Database

The Rat Genome Database (https://rgd.mcw.edu/) was
established in 1999 as a resource to support the emerging
genomic data for the rat. This role has continued to expand with
continuing work on the rat reference genome sequence (the current
assembly is Rnor_6.0 - RGSC Genome Assembly v6.0), strain-
specific DNA sequencing [16], expanded SNP discovery, and
large-scale phenotyping projects such as the PhysGen project
(http://pga.mcw.edu) and NBRP [10], all needing to be
integrated with existing and newly published research data. As the
amount of data has grown, so has the challenge of mining relevant
information and defining its meaning in the broader context of
biomedical science. With this in mind, much effort has gone into
the development and incorporation of biomedical ontologies such
as the Gene Ontology [17], the Mammalian Phenotype Ontology
[18], the Pathway Ontology [19], and others [20]. These are
incorporated into the search and analysis tools, greatly facilitating
the discovery of information and interpretation of its meaning.
Many researchers using the rat as a model system are
ultimately studying a specific phenotype or disease with the goal
of applying this knowledge to humans. To meet this need, RGD
has developed “Disease Portals” that present RGD data and tools
from the perspective of a particular disease. The Disease Portals
allow researchers to visit a single page that is focused on a single
disease area, such as cardiovascular or neurological disease. These disease
categories are being expanded in an ongoing process of targeted
curation to create more portals devoted to particular disease areas
that will cater directly to researchers working in those areas. The
rest of RGD is accessible via these portals, but researchers will
find the items of their greatest interest first, reducing the challenge
of finding the data and interpreting its meaning. Similarly, the
Phenotypes & Models Portal and the Pathways Portal focus on
specific areas of research, which allows easier access to targeted
searches for relevant data.
In addition to the portal style of data organization, access
to different software tools at RGD is an important part of the
database. The tools range from annotation-based analysis to
sequence-based analysis and offer extensive options for
manipulating both RGD data and user-uploaded data. Further
analysis may be done with data downloaded via the FTP site
or the REST API.

3.2.1 RGD Data Objects

RGD stores data about various “objects,” including genes, quantitative
trait loci (QTLs), markers, references, strains, and cell lines.
Report pages for these objects are presented in a similar format.
The most data-rich report pages are gene, QTL, and strain pages.
Disease data for genes, QTLs, and strains can also be accessed
through various RGD Disease Portals. Pathway data for genes can
be accessed through the RGD Pathway Portal. Physiological data
for strains is accessible through the Phenotypes & Models Portal.

Genes

To allow comparative investigation, RGD has historically provided
gene report pages for rat, mouse, and human, with additional
species added recently. A typical rat gene report page is shown in
Fig. 1. The top section contains the gene name and annotation-
based description, links to orthologs and external information/
analysis sites, and map information. Next are data sections (see
Fig. 1B), the first being the “Annotation” section containing
both RGD manual and imported annotations using various ontologies:
Disease Ontology (DO), Gene Ontology (GO), Pathway
Ontology (PW), and phenotype ontologies (Mammalian Phenotype
Ontology (MP) and Human Phenotype Ontology (HP)). Each
annotation has a term, an evidence code, a reference, and source
information. Sections on Genomics, Sequence (DNA and protein),
Strain (Sequence) Variation and Additional Information are found
below the Annotation section.

Quantitative Trait Loci

Another type of RGD object presented on report pages is the
Quantitative Trait Locus (QTL), which is a large region of DNA
associated with a physiological or pathological phenotype. RGD
has data on a large variety of QTLs (rat, mouse, and human),
ranging from physiologic and anatomic traits, like blood pressure
and organ weight, to disease traits for cancer, diabetes, and other
pathological conditions.
The top section of a QTL report page provides the QTL
name, trait and measurement type. Significance scores, map
information, and strains crossed to derive the QTL are also
provided. The Annotation section contains disease annotations
with DO terms, phenotype annotations with MP (rat, mouse) or
HP (human) terms, and experimental data annotations, which
use the following ontologies: the Vertebrate Trait Ontology
(VT), the Clinical Measurement Ontology (CMO), the
Measurement Method Ontology (MMO), and the Experimental
Condition Ontology (XCO). The CMO, MMO, and XCO are
RGD-produced and maintained ontologies. References and
disease portal links are also provided in the Annotation section
of the QTL report page. The “Region” section provides
position markers for the QTL, and genes, markers, and
overlapping QTLs in the region.

Strains

Rat strains in RGD are named according to the official
nomenclature rules (https://rgd.mcw.edu/nomen/rules-for-nomen.shtml)
and are organized hierarchically by the Rat Strain (RS) ontology
[21] to facilitate access to curated strain data. The RS is available
in the RGD ontology browser [20] for easy navigation through
the

Fig. 1 Gene Report Page for Rat Ptgs1. (A) The top half of the page contains general information,
ortholog assignments, genomic positions, JBrowse model, and links to external sites. (B) The bottom half
of the page has annotations in various categories, genomic information, sequence information, and more,
all in expandable, labeled bars

Fig. 2 Ontology Term Browser. The RGD term browser with “rat strain” (RS) selected in the top
panel. (A) Selection “SHR” in the bottom panel has an accompanying “View Strain Report” link

rat strain nomenclature to find strains and sub-strains of interest.


Users can access strain report pages through the general RGD
search at the top of most RGD pages, through the strain search
page, or the link next to the selected strain (see Fig. 2A) in the
ontology browser. Each registered strain has a report page that
includes information on the source and availability of the strain,
and the type of manipulation used to derive the strain. On the rat
strain report pages the Annotation section has Disease, Phenotype,
Experimental Data, and Phenotype Values via Phenominer subsec-
tions containing annotations from curation of the scientific litera-
ture or user submission data. The Experimental Data annotations
and Phenotype Values via Phenominer annotations relate to quan-
titative physiological data that is available in the RGD
PhenoMiner tool (see RGD Data Analysis and Visualization
Tools section below).

3.2.2 Portal Access to RGD Data

Disease Portals

RGD currently has 12 disease portals encompassing many disease
areas from developmental and age-related to cardiovascular and
neurological. Each portal is an entry point where investigators can
access data and tools relevant to their research area. One can
access rat, mouse, and human genes and QTLs, and rat strains
annotated to a selected disease category or subcategory (see
Fig. 3A). Annotations for a disease-related phenotype, biological
process, or

Fig. 3 Hematologic Disease Portal Home Page. (A) Drop down menus for selection of disease category
and specific disease. (B) Numerical summary of results for the selected disease category/disease. (C)
GViewer display of results with approximate positions of all disease genes, QTLs, and strains. (D) Lists of
genes, QTLs, and strains annotated to selected disease category/disease. (E) Graphs showing Gene
Ontology annotations for all selected disease-annotated genes, using GO slim (subset)
representations of the three GO aspects

pathway can also be accessed through a tab selected at the top of


the portal homepage. A summary box of the number of objects
annotated to the selected category is shown (see Fig. 3B). These
objects are presented in a genomic context via an instance of the
Genome Viewer below the summary box. The Genome Viewer
can be set to rat, mouse, or human, and to synteny views of the
unselected species. Beneath the Genome Viewer, the genes, QTLs,
and strains

Fig. 4 De Novo Pyrimidine Biosynthetic Pathway Diagram. The diagram is accompanied by a text
description above it and a key to the left of it

are listed, linked to their respective report pages. The bottom of the
page shows graphs displaying GO annotation enrichment data.

Pathway Portal

The RGD Pathway Portal presently contains 200 interactive pathway
diagram pages organized into five branches, based on the five
branches of the Pathway Ontology, which was developed at RGD.
Some pathway pages are organized into suites of related pathways
and into suite networks, which are higher-order organizations of suites. The
molecular pathway diagrams (see Fig. 4) are designed with
Elsevier’s Pathway Studio software
(https://support.pathwaystudio.com/) and feature hyperlinks
from most of the objects in the diagram to RGD pages
representing the respective term, gene, chemical, or associated
secondary pathway. Beneath the diagram is a downloadable list
of genes in the pathway (see Fig. 5A), with tabs for rat, human,
mouse, and other species. Below the gene lists are tables of
additional elements in the pathway (see Fig. 5B), disease

Fig. 5 Pathway Gene/Element Lists. A number of gene lists are found on pathway diagram pages
below the diagram. (A) A list of genes annotated to de novo pyrimidine biosynthetic pathway and its
children terms. The

annotations to genes in the pathway (see Fig. 5C), additional
pathway annotations to genes in the diagrammed pathway (see
Fig. 5D), and, when available, phenotype annotations to the genes
in the pathway (see Fig. 5E). These tables toggle from
annotation/gene to gene/annotation displays, with all objects
linked to report pages. Below the gene lists there is a reference list
of publications associated with the diagrammed pathway. Lastly,
below the references is an ontology graph that shows the
diagrammed term and all its ancestor terms up to the root term.
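
An ancestor graph of this kind can be thought of as a walk over parent links in the ontology. The following is a minimal sketch, assuming each term stores its direct parents; the term identifiers and hierarchy are made up for illustration and do not reproduce the actual Pathway Ontology.

```python
from collections import deque

# Hypothetical child -> direct parents mapping (a term may have several parents).
PARENTS = {
    "root": [],
    "metabolic": ["root"],
    "nucleotide": ["metabolic"],
    "pyrimidine": ["nucleotide"],
    "biosynthetic": ["metabolic"],
    "de_novo_pyrimidine": ["pyrimidine", "biosynthetic"],
}


def ancestors(term, parents):
    """Return every ancestor of `term`, nearest first (breadth-first)."""
    seen, order = set(), []
    queue = deque(parents[term])
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already reached through another parent path
        seen.add(current)
        order.append(current)
        queue.extend(parents[current])
    return order


if __name__ == "__main__":
    print(ancestors("de_novo_pyrimidine", PARENTS))
```

Because a term can have more than one parent, the `seen` set keeps shared ancestors from being listed twice.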

Phenotypes & Models Portal

This portal contains data related to rat strains and phenotypes, as
well as essential information for conducting physiological research,
identifying disease models, and comparative analysis of strain-
centered data. Comparative analysis is the main focus of the Phe-
noMiner tool, which is described below.

3.2.3 RGD Data Analysis and Visualization Tools

Overview

Some of the data analysis tools at RGD are database-specific
instances of freely available software. These include JBrowse,
RatMine, and InterViewer (Cytoscape). The remaining
analysis/visualization tools described in this section were developed at RGD:
Gene Annotator, GViewer (Genome Viewer), OLGA (Object List
Generator & Analyzer), PhenoMiner, and Variant Visualizer. They
all provide different views or different types of analysis of the data in
RGD. All of the tools may be accessed by the “Analysis & Visuali-
zation” icon in the middle of the RGD home page or the tab near
the top of most RGD pages.

InterViewer

InterViewer, RGD’s Cytoscape-based
(https://www.cytoscape.org/) [22] protein-protein interaction
viewer, takes one or more gene symbols, RGD gene IDs, or
UniProtKB protein IDs for rat, mouse, human, and/or dog (see
Fig. 6) and displays pairwise protein interactions for them, with
information about the types of interactions and links to the
associated genes in RGD, proteins in UniProt
(http://www.uniprot.org/), and the originating interaction
records at the International Molecular Exchange (IMEx)
consortium [23, 24].

JBrowse

The JBrowse genome browser [25, 26] from the Generic Model
Organism Database project (http://www.gmod.org) is an
interactive tool which allows researchers to visualize a variety of
genetic and phenotypic data types in their genomic context.
Virtually all of

Fig. 5 (continued) list includes links to RGD gene report pages, JBrowse, and reference pages. (B) A
list of additional elements in the pathway. (C) A list of disease ontology terms/genes that can be
toggled by the title bar to genes/disease terms. All the disease terms link to ontology report pages
and the gene symbols link to gene report pages. (D) A list of additional pathways associated with
genes annotated to the diagrammed pathway. (E) A list of phenotypes associated with the genes
annotated to the diagrammed pathway

Fig. 6 InterViewer Search/Results. The target protein (rat Grb2) that initiated the search is shown in the
center of the graphic display. Individual proteins are indicated by color-coded circles (red: rat, green:
mouse, blue: human). The types of interactions are designated by color-coded lines between
the circles

the data within the Rat Genome Database has been associated with
the genome sequence in one way or another. As fundamental
datasets such as genes, quantitative trait loci, microsatellite and
SNP markers, and sequence resources such as ESTs, are aligned
with the genome sequence, they bring with them phenotypic and
other information. This information includes gene-chemical inter-
action data, genetic associations with disease, RNA-Seq data, syn-
teny views of rat, mouse, and human genomes, and many types of
variant/mutation data. Any or all of these can be accessed via the
JBrowse genome browser and their relationship to the genomic
sequence explored.

RatMine

RatMine integrates data on function, disease, phenotype,
variation, and comparative genomics from RGD, UniProtKB,
Ensembl (https://www.ensembl.org), NCBI
(https://www.ncbi.nlm.nih.gov/), PubMed
(https://www.ncbi.nlm.nih.gov/pubmed), and KEGG
(https://www.genome.jp/kegg/) to form a web-based data
warehousing, mining, and analysis tool tailored to the needs of
rat researchers. Datasets derived from querying this data or
from uploading researchers’ own data can be saved,
manipulated, and/or downloaded for use in other applications.
RatMine also has interaction datasets imported from
BioGRID (Biological General Repository for Interaction Datasets)
(https://thebiogrid.org) [27] and IntAct
(https://www.ebi.ac.uk/intact) [23]. The BioGRID database
manually curates the biomedical literature for genetic, protein,
and chemical interaction data for major model organisms and
humans. IntAct is a molecular interaction database that
provides data derived from literature curation or direct user
submissions to IntAct.
A key component of RatMine and of InterMine instances in
general is the “MyMine” feature. Logging in as a specific user
allows one to keep object lists (genes, etc.), user-created queries,
and a history of activity. An API (application programming interface)
allows queries to be run in RatMine from client programs
(Perl, Python, Ruby, or Java).

Gene Annotator

The Gene Annotator (GA) takes a list of gene symbols, RGD IDs,
GenBank accession numbers, Ensembl identifiers, and/or a chro-
mosomal region, and retrieves annotation data from RGD. The
tool will retrieve annotations from most ontologies used at RGD
for genes and their orthologs, as well as links to additional informa-
tion at other databases. The entry page (see Fig. 7A) is very similar
to the InterViewer entry page.
The first GA page after a search is an annotation/external link/
species selection page where everything is selected by default
(see Fig. 7B). Clicking the submit button returns a page with all
annotations for the first gene (and selected orthologs) in the list.
The lists include links to RGD gene pages, ontology term pages,
annotation pages, and external data pages (see Fig. 7C).
A list of links at the top of the page allows the user to pick a
particular type of analysis to view (Annotation Distribution or
Comparison Heat Map) or to send the gene list to another tool
by selecting the “All Analysis Tools” link.
On the “Annotation Distribution” page (see Fig. 7D) there are
enrichment lists of terms by category, which rank the terms
according to how many of the searched genes are annotated to
those particular terms. Each entry in the list can be opened to see
which genes and which specific terms are in the annotations.
Subsets of annotations can be displayed by selecting at least two of
the check boxes which appear to the right of every term in the
lists.
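
The ranking described for the Annotation Distribution page amounts to counting, for each term, how many of the submitted genes carry it. The sketch below works under that assumption; the gene symbols and term labels are placeholders, the intersection used for the subset view is only one plausible reading of the check-box behavior, and the actual tool may group or weight terms differently.

```python
from collections import defaultdict

# Hypothetical annotations: (gene symbol, ontology term) pairs.
ANNOTATIONS = [
    ("GeneA", "hypertension"),
    ("GeneB", "hypertension"),
    ("GeneC", "hypertension"),
    ("GeneA", "diabetes mellitus"),
    ("GeneC", "diabetes mellitus"),
    ("GeneB", "asthma"),
    ("GeneB", "asthma"),  # duplicate annotation, counted once per gene
]


def annotation_distribution(annotations, genes):
    """Rank terms by the number of distinct query genes annotated to them."""
    genes = set(genes)
    genes_by_term = defaultdict(set)
    for gene, term in annotations:
        if gene in genes:
            genes_by_term[term].add(gene)
    # Sort by descending gene count, then alphabetically for stable output.
    return sorted(
        ((term, sorted(members)) for term, members in genes_by_term.items()),
        key=lambda item: (-len(item[1]), item[0]),
    )


def shared_genes(ranked, terms):
    """Genes annotated to every selected term (assumed meaning of a subset)."""
    lookup = dict(ranked)
    sets = [set(lookup[t]) for t in terms]
    return sorted(set.intersection(*sets)) if sets else []


if __name__ == "__main__":
    ranked = annotation_distribution(ANNOTATIONS, ["GeneA", "GeneB", "GeneC"])
    for term, members in ranked:
        print(len(members), term, members)
    print(shared_genes(ranked, ["hypertension", "diabetes mellitus"]))
```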
