The Coronavirus Infectious Disease Ontology (CIDO) is a community-based ontology that supports coronavirus disease knowledge and data standardization, integration, sharing, and analysis. Ontologies, as the term is used in informatics, are structured vocabularies composed of human- and computer-interpretable terms and relations that represent entities and relationships. Within informatics fields, ontologies play an important role in knowledge and data standardization, representation, integration, sharing, and analysis. They have also become a foundation of artificial intelligence (AI) research. In what follows, we outline the Coronavirus Infectious Disease Ontology (CIDO), which covers multiple areas in the domain of coronavirus diseases, including etiology, transmission, epidemiology, pathogenesis, diagnosis, prevention, and treatment. We emphasize CIDO development relevant to COVID-19. Human coronaviruses have given rise to a series of major crises in global public health. Severe acute respiratory syndrome (SARS) emerged in China in November 2002, lasted for eight months, and resulted in 8,098 confirmed human cases in 29 countries with 774 deaths (case-fatality rate: 9.6%) 1. Approximately ten years later, in June 2012, the Middle East Respiratory Syndrome (MERS), another highly pathogenic coronavirus disease, was identified in Saudi Arabia. The MERS outbreak has caused 2,260 cases in 27 countries and 803 deaths (35.5%) 2. More recently, the World Health Organization (WHO) declared the Coronavirus Disease 2019 (COVID-19) outbreak a pandemic on March 11, 2020, when there were 118,326 confirmed cases and 4,292 deaths. As of May 13, 2020, there have been over 4.4 million confirmed cases and over 295,000 deaths globally. Unfortunately, effective drugs and vaccines against these highly pathogenic coronaviruses are still not available.
Extensive studies have been conducted on coronaviruses, the results of many of which exist in publicly available data repositories such as GEO 3. Publications concerning COVID-19 have exploded in recent months, and new clinical trials have been and are being conducted to develop drugs and vaccines against COVID-19, 1,430 of which have been registered in ClinicalTrials.
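The case-fatality rates quoted for SARS and MERS in the abstract above are simply deaths divided by confirmed cases. The following is a minimal sketch of that arithmetic using only the counts stated in the abstract; the function name `case_fatality_rate` is our own illustrative choice and does not come from the CIDO paper.

```python
def case_fatality_rate(deaths: int, confirmed_cases: int) -> float:
    """Return the case-fatality rate as a percentage of confirmed cases."""
    if confirmed_cases <= 0:
        raise ValueError("confirmed_cases must be positive")
    return 100.0 * deaths / confirmed_cases


# Counts taken directly from the abstract above.
sars_cfr = case_fatality_rate(deaths=774, confirmed_cases=8098)
mers_cfr = case_fatality_rate(deaths=803, confirmed_cases=2260)

print(f"SARS case-fatality rate: {sars_cfr:.1f}%")
print(f"MERS case-fatality rate: {mers_cfr:.1f}%")
```

Rounded to one decimal place, both values agree with the percentages reported in the abstract.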
Background: Stem cells and stem cell lines are widely used in biomedical research. The Cell Ontology (CL) and Cell Line Ontology (CLO) are two community-based OBO Foundry ontologies in the domains of in vivo cells and in vitro cell line cells, respectively. Results: To support standardized stem cell investigations, we have developed an Ontology for Stem Cell Investigations (OSCI). OSCI imports stem cell and cell line terms from CL and CLO, and investigation-related terms from existing ontologies. A novel focus of OSCI is its application in representing metadata types associated with various stem cell investigations. We also applied OSCI to systematically categorize experimental variables in an induced pluripotent stem cell line cell study related to bipolar disorder. In addition, we used a semiautomated literature mining approach to identify over 200 stem cell gene markers. The relations between these genes and stem cells are modeled and represented in OSCI. Conclusions: OSCI standardizes stem cells found in vivo and in vitro and in various stem cell investigation processes and entities. The presented use cases demonstrate the utility of OSCI in iPSC studies and literature mining related to bipolar disorder.
COVID-19 often manifests with different outcomes in different patients, highlighting the complexity of the host-pathogen interactions involved in manifestations of the disease at the molecular and cellular levels. In this paper, we propose a set of postulates and a framework for systematically understanding complex molecular host-pathogen interaction networks. Specifically, we first propose four host-pathogen interaction (HPI) postulates as the basis for understanding molecular and cellular host-pathogen interactions and their relations to disease outcomes. These four postulates cover the evolutionary dispositions involved in HPIs, the dynamic nature of HPI outcomes, roles that HPI components may occupy leading to such outcomes, and HPI checkpoints that are critical for specific disease outcomes. Based on these postulates, an HPI Postulate and Ontology (HPIPO) framework is proposed to apply interoperable ontologies to systematically model and represent various granular details and k...
Background: The current COVID-19 pandemic and the previous SARS/MERS outbreaks of 2003 and 2012 have resulted in a series of major global public health crises. We argue that, in the interest of developing effective and safe vaccines and drugs and to better understand coronaviruses and associated disease mechanisms, it is necessary to integrate the large and exponentially growing body of heterogeneous coronavirus data. Ontologies play an important role in standard-based knowledge and data representation, integration, sharing, and analysis. Accordingly, we initiated the development of the community-based Coronavirus Infectious Disease Ontology (CIDO) in early 2020. Results: As an Open Biomedical Ontology (OBO) library ontology, CIDO is open source and interoperable with other existing OBO ontologies. CIDO is aligned with the Basic Formal Ontology and Viral Infectious Disease Ontology. CIDO has imported terms from over 30 OBO ontologies. For example, CIDO imports all SARS-CoV-2 protein ter...
The current COVID-19 pandemic and previous SARS/MERS outbreaks have caused a series of major crises to global public health. We must integrate the large and exponentially growing amount of heterogeneous coronavirus data to better understand coronaviruses and associated disease mechanisms, in the interest of developing effective and safe vaccines and drugs. Ontologies have emerged to play an important role in standard knowledge and data representation, integration, sharing, and analysis. We have initiated the development of the community-based Coronavirus Infectious Disease Ontology (CIDO). As an Open Biomedical Ontology (OBO) library ontology, CIDO is open source and interoperable with other existing OBO ontologies. In this article, the general architecture and the design patterns of CIDO are introduced, CIDO representation of coronaviruses, phenotypes, anti-coronavirus drugs and medical devices (e.g., ventilators) are illustrated, and an application of CIDO implemented to identify repu...
Vaccines stimulate various immune factors critical to protective immune responses. However, a comprehensive picture of vaccine-induced immune factors and pathways has not been systematically collected and analyzed. To address this issue, we developed VaximmutorDB, a web-based database system of vaccine immune factors (abbreviated as “vaximmutors”) manually curated from peer-reviewed articles. VaximmutorDB currently stores 1,740 vaccine immune factors from 13 host species (e.g., human, mouse, and pig). These vaximmutors were induced by 154 vaccines for 46 pathogens. The top 10 vaximmutors include three antibodies (IgG, IgG2a and IgG1), Th1 immune factors (IFN-γ and IL-2), Th2 immune factors (IL-4 and IL-6), TNF-α, CASP-1, and TLR8. Many host processes (e.g., stimulatory C-type lectin receptor signaling pathway, SRP-dependent cotranslational protein targeting to membrane) and cellular components (e.g., extracellular exosome, nucleoplasm) enriched across all the vaximmutors were identified. U...
A critical issue in the usage of cancer drugs is their association with various adverse events (AEs) in some, but not all, patients. The National Cancer Institute (NCI) Common Terminology Criteria for Adverse Events (CTCAE) is a controlled terminology for AE classification and analysis in cancer clinical trials. The Ontology of Adverse Events (OAE) is a community-based ontology in the domain of AEs. In this study, OAE was first updated by including AE severity grading and OAE-CTCAE mapping. An OAE subset containing CTCAE-related terms and their associated OAE terms was generated to facilitate term usage. A use case study based on a published cancer drug clinical trial demonstrates that OAE provides better hierarchical representation, includes semantic relations, and supports automated reasoning. Demonstrated with a single patient analysis, the OAE framework supports precision informatics for representing AEs and related genetic and clinical conditions in individual patients treated wi...
Background: Use of medication can cause adverse drug reactions (ADRs), unwanted or unexpected events, which are a major safety concern. Drug labels (also called prescribing information or package inserts) describe ADRs. Therefore, systematically identifying ADR information from drug labels is critical in multiple aspects; however, this task is challenging because drug labels are written in natural language. Results: In this paper, we present a machine learning- and rule-based system for the identification of ADR entity mentions in the text of drug labels and their normalization through the Medical Dictionary for Regulatory Activities (MedDRA) dictionary. The machine learning approach is based on a recently proposed deep learning architecture, which integrates bi-directional Long Short-Term Memory (Bi-LSTM), Convolutional Neural Network (CNN), and Conditional Random Fields (CRF) for entity recognition. The rule-based approach, used for normalizing the identified ADR mentions to MedDRA term...
This Editorial first introduces the background of vaccine and drug relations and how biomedical terminologies and ontologies have been used to support their studies. The history of the seven workshops, initially named VDOSME and then named VDOS, is also summarized. Then the 7th International Workshop on Vaccine and Drug Ontology Studies (VDOS 2018), held on August 10, 2018, in Corvallis, Oregon, USA, is introduced in detail. These VDOS workshops have greatly supported the development, applications, and discussion of vaccine- and drug-related terminology and drug studies.
Adverse drug reactions (ADRs), also called drug adverse events (AEs), are reported in the FDA drug labels; however, it is challenging to properly retrieve and analyze the ADRs and their potential relationships from textual data. Previously, we identified and ontologically modeled over 240 drugs that can induce peripheral neuropathy through mining public drug-related databases and drug labels. However, the ADR mechanisms of these drugs are still unclear. In this study, we aimed to develop an ontology-based literature mining system to identify ADRs from drug labels and to elucidate potential mechanisms of the neuropathy-inducing drugs (NIDs). We developed and applied an ontology-based SciMiner literature mining strategy to mine ADRs from the drug labels provided in the Text Analysis Conference (TAC) 2017, which included drug labels for 53 NIDs. We identified an average of 243 ADRs per NID and constructed an ADR-ADR network, which consists of 29 ADR n...
Background: The Interaction Network Ontology (INO) logically represents biological interactions, pathways, and networks. INO has been demonstrated to be valuable in providing a set of structured ontological terms and associated keywords to support literature mining of gene-gene interactions from biomedical literature. However, previous work using INO focused on single keyword matching, while many interactions are represented with two or more interaction keywords used in combination. Methods: This paper reports our extension of INO to include combinatory patterns of two or more literature mining keywords co-existing in one sentence to represent specific INO interaction classes. Such keyword combinations and related INO interaction type information could be automatically obtained via SPARQL queries, formatted in Excel format, and used in an INO-supported SciMiner, an in-house literature mining program. We studied the gene interaction sentences from the commonly used benchmark Learning Logic in Language (LLL) dataset and one internally generated vaccine-related dataset to identify and analyze interaction types containing multiple keywords. Patterns obtained from the dependency parse trees of the sentences were used to identify the interaction keywords that are related to each other and collectively represent an interaction type. Results: The INO ontology currently has 575 terms including 202 terms under the interaction branch. The relations between the INO interaction types and associated keywords are represented using the INO annotation relations: 'has literature mining keywords' and 'has keyword dependency pattern'. The keyword dependency patterns were generated via running the Stanford Parser to obtain dependency relation types.
Out of the 107 interactions in the LLL dataset represented with two-keyword interaction types, 86 were identified by using the direct dependency relations. The LLL dataset contained 34 gene regulation interaction types, each of which is associated with multiple keywords. A hierarchical display of these 34 interaction types and their ancestor terms in INO resulted in the identification of specific gene-gene interaction patterns from the LLL dataset. Multi-keyword interaction types were also frequently observed in the vaccine dataset.
Background: Pathogenic Escherichia coli infections cause various diseases in humans and many animal species. However, despite extensive E. coli vaccine research, we are still unable to fully protect ourselves against E. coli infections. To support more rational development of effective and safe E. coli vaccines, it is important to better understand E. coli vaccine-associated gene interaction networks. Methods: In this study, we first extended the Vaccine Ontology (VO) to semantically represent various E. coli vaccines and genes used in the vaccine development. We also normalized E. coli gene names compiled from the annotations of various E. coli strains using a pan-genome-based annotation strategy. The Interaction Network Ontology (INO) includes a hierarchy of various interaction-related keywords useful for literature mining. Using VO, INO, and normalized E. coli gene names, we applied an ontology-based SciMiner literature mining strategy to mine all PubMed abstracts and retrieve E. coli vaccine-associated E. coli gene interactions. Four centrality metrics (i.e., degree, eigenvector, closeness, and betweenness) were calculated for identifying highly ranked genes and interaction types. Results: Using vaccine-related PubMed abstracts, our study identified 11,350 sentences that contain 88 unique INO interaction types and 1,781 unique E. coli genes. Each sentence contained at least one interaction type and two unique E. coli genes. An E. coli gene interaction network of genes and INO interaction types was created. From this large network, a sub-network consisting of 5 E. coli vaccine genes (carA, carB, fimH, fepA, and vat), 62 other E. coli genes, and 25 INO interaction types was identified.
While many interaction types represent direct interactions between two indicated genes, our study has also shown that many of these retrieved interaction types are indirect, in that the two genes participate in the specified interaction process in a required but indirect way. Our centrality analysis of these gene interaction networks identified top-ranked E. coli genes and 6 INO interaction types (e.g., regulation and gene expression). Conclusions: A vaccine-related E. coli gene-gene interaction network was constructed using an ontology-based literature mining strategy, which identified important E. coli vaccine genes and their interactions with other genes through specific interaction types.
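To illustrate how centrality metrics rank nodes in an interaction network, here is a minimal pure-Python sketch of two of the four metrics named above (degree and closeness). It is not the authors' pipeline: the toy graph, the node names (`gene_a` to `gene_d`), and the function names are hypothetical, and the closeness formula assumes an undirected, connected graph.

```python
from collections import deque


def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}


def closeness_centrality(graph):
    """Reachable node count divided by the sum of shortest-path distances."""
    scores = {}
    for source in graph:
        # Breadth-first search gives shortest path lengths in an unweighted graph.
        distance = {source: 0}
        queue = deque([source])
        while queue:
            current = queue.popleft()
            for neighbor in graph[current]:
                if neighbor not in distance:
                    distance[neighbor] = distance[current] + 1
                    queue.append(neighbor)
        total = sum(distance.values())
        scores[source] = (len(distance) - 1) / total if total > 0 else 0.0
    return scores


# Hypothetical undirected toy network stored as adjacency sets.
toy_network = {
    "gene_a": {"gene_b", "gene_c", "gene_d"},
    "gene_b": {"gene_a", "gene_c"},
    "gene_c": {"gene_a", "gene_b"},
    "gene_d": {"gene_a"},
}

degree = degree_centrality(toy_network)
closeness = closeness_centrality(toy_network)
top_by_degree = max(degree, key=degree.get)
print(top_by_degree, degree[top_by_degree], closeness[top_by_degree])
```

In this toy network `gene_a` ranks first on both metrics because it is linked to every other node. Eigenvector and betweenness centrality need more machinery and are usually computed with a graph library rather than by hand.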
Statistics play a critical role in biological and clinical research. However, most reports of scientific results in the published literature make it difficult for the reader to reproduce the statistical analyses performed in achieving those results because they provide inadequate documentation of the statistical tests and algorithms applied. The Ontology of Biological and Clinical Statistics (OBCS) is put forward here as a step towards solving this problem. The terms in OBCS, including 'data collection', 'data transformation in statistics', 'data visualization', 'statistical data analysis', and 'drawing a conclusion based on data', cover the major types of statistical processes used in basic biological research and clinical outcome studies. OBCS is aligned with the Basic Formal Ontology (BFO) and extends the Ontology of Biomedical Investigations (OBI), an OBO (Open Biological and Biomedical Ontologies) Foundry ontology supported by over 20 rese...
Animal models are indispensable for vaccine research and development. However, choosing which species to use and designing a vaccine study that is optimized for that species is often challenging. Vaxar (http://www.violinet.org/vaxar/) is a web-based database and analysis system that stores manually curated data regarding vaccine-induced responses in animals. To date, Vaxar encompasses models from 35 animal species including rodents, rabbits, ferrets, primates, and birds. These 35 species have been used to study more than 1300 experimentally tested vaccines for 164 pathogens and diseases significant to humans and domestic animals. The responses to vaccines by animals in more than 1500 experimental studies are recorded in Vaxar; these data can be used for systematic meta-analysis of various animal responses to a particular vaccine. For example, several variables, including animal strain, animal age, and the dose or route of either vaccination or challenge, might affect host response o...
Background: Neuropathy often occurs following drug treatment such as chemotherapy. Severe instances of neuropathy can result in cessation of life-saving chemotherapy treatment. Results: To support data representation and analysis of drug-associated neuropathy adverse events (AEs), we developed the Ontology of Drug Neuropathy Adverse Events (ODNAE). ODNAE extends the Ontology of Adverse Events (OAE). Our combinatorial approach identified 215 US FDA-licensed small molecule drugs that induce signs and symptoms of various types of neuropathy. ODNAE imports related drugs from the Drug Ontology (DrON) with their chemical ingredients defined in ChEBI. ODNAE includes 139 drug mechanisms of action from NDF-RT and 186 biological processes represented in the Gene Ontology (GO). In total, ODNAE contains 1579 terms. Our analysis of the ODNAE knowledge base shows neuropathy-inducing drugs classified under specific molecular entity groups, especially carbon, pnictogen, chalcogen, and heterocyclic compounds. The carbon drug group includes 127 organic chemical drugs. Thirty-nine receptor agonist and antagonist terms were identified, including 4 pairs (31 drugs) of agonists and antagonists that share targets (e.g., adrenergic receptor, dopamine, serotonin, and sex hormone receptor). Many drugs regulate neurological system processes (e.g., negative regulation of dopamine or serotonin uptake). SPARQL scripts were used to query the ODNAE ontology knowledge base. Conclusions: ODNAE is an effective platform for building a drug-induced neuropathy knowledge base and for analyzing the underlying mechanisms of drug-induced neuropathy. The ODNAE-based methods used in this study can also be extended to the representation and study of other categories of adverse events.
A translational bioinformatics challenge exists in connecting population and individual clinical phenotypes in various formats to biological mechanisms. The Medical Dictionary for Regulatory Activities (MedDRA®) is the default dictionary for adverse event (AE) reporting in the US Food and Drug Administration Adverse Event Reporting System (FAERS). The Ontology of Adverse Events (OAE) represents AEs as pathological processes occurring after drug exposures. The aim of this work was to establish a semantic framework to link biological mechanisms to phenotypes of AEs by combining OAE with MedDRA® in FAERS data analysis. We investigated the AEs associated with tyrosine kinase inhibitors (TKIs) and monoclonal antibodies (mAbs) targeting tyrosine kinases. The five selected TKIs/mAbs (i.e., dasatinib, imatinib, lapatinib, cetuximab, and trastuzumab) are known to induce impaired ventricular function (non-QT) cardiotoxicity. Statistical analysis of FAERS data identified 1053 distinct MedD...
The "Vaccine and Drug Ontology Studies" (VDOS) international workshop series focuses on vaccine- and drug-related ontology modeling and applications. Drugs and vaccines have been critical to prevent and treat human and animal diseases. Work in both areas (drugs and vaccines) is closely related, from preclinical research and development to manufacturing, clinical trials, government approval and regulation, and post-licensure usage surveillance and monitoring. Over the last decade, tremendous efforts have been made in the biomedical ontology community to ontologically represent various areas associated with vaccines and drugs: extending existing clinical terminology systems such as SNOMED, RxNorm, NDF-RT, and MedDRA, developing new models such as the Vaccine Ontology (VO) and Ontology of Adverse Events (OAE), and vernacular medical terminologies such as the Consumer Health Vocabulary (CHV). The VDOS workshop series provides a platform for discussing innovative solutions as well as the challenges in the development and applications of biomedical ontologies for representing and analyzing drugs and vaccines, their administration, host immune responses, adverse events, and other related topics. The five full-length papers included in this 2014 thematic issue focus on two main themes: (i) General vaccine/drug-related ontology development and exploration, and (ii) Interaction and network-related ontology studies.
Brucella is an intracellular bacterium that causes chronic brucellosis in humans and various mammals. The identification of host-Brucella interaction is crucial to understand host immunity against Brucella infection and Brucella pathogenesis against host immune responses. Most of the information about the inter-species interactions between host and Brucella genes is only available in the text of the scientific publications. Many text-mining systems for extracting gene and protein interactions have been proposed. However, only a few of them have been designed by considering the peculiarities of host-pathogen interactions. In this paper, we used a text mining approach for extracting host-Brucella gene-gene interactions from the abstracts of articles in PubMed. The gene-gene interactions here represent the interactions between genes and/or gene products (e.g., proteins). The SciMiner tool, originally designed for detecting mammalian gene/protein names in text, was extended to identify host and Brucella gene/protein names in the abstracts. Next, sentence-level and abstract-level co-occurrence based approaches, as well as sentence-level machine learning based methods, originally designed for extracting intra-species gene interactions, were utilized to extract the interactions among the identified host and Brucella genes. The extracted interactions were manually evaluated. A total of 46 host-Brucella gene interactions were identified and represented as an interaction network. Twenty-four of these interactions were identified from sentence-level processing. Twenty-two additional interactions were identified when abstract-level processing was performed. The Interaction Network Ontology (INO) was used to represent the identified interaction types in a hierarchical ontology structure.
Ontological modeling of specific gene-gene interactions demonstrates that host-pathogen gene-gene interactions occur under experimental conditions which can be ontologically represented. Our results show that the introduced literature mining and ontology-based modeling approach is effective in retrieving and analyzing host-pathogen gene-gene interaction networks.
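The difference between sentence-level and abstract-level co-occurrence described above can be sketched as follows. This is a simplified illustration under our own assumptions, not SciMiner's implementation: the gene names (`HostA`, `HostB`, `PathX`, `PathY`), the example text, and all function names are hypothetical, and sentence splitting uses a naive punctuation rule.

```python
import re
from itertools import product


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_mentions(unit, names):
    """Return the subset of names that appear as whole words in the text unit."""
    return {name for name in names if re.search(rf"\b{re.escape(name)}\b", unit)}


def cooccurring_pairs(text, host_names, pathogen_names, level="sentence"):
    """Collect (host, pathogen) pairs mentioned together in the same text unit."""
    units = split_sentences(text) if level == "sentence" else [text]
    pairs = set()
    for unit in units:
        hosts = find_mentions(unit, host_names)
        pathogens = find_mentions(unit, pathogen_names)
        pairs.update(product(hosts, pathogens))
    return pairs


# Hypothetical abstract text and gene dictionaries.
abstract = (
    "PathX reduced expression of HostA in macrophages. "
    "HostB was unchanged in the same assay. "
    "PathY was required for intracellular survival."
)
host_genes = {"HostA", "HostB"}
pathogen_genes = {"PathX", "PathY"}

sentence_pairs = cooccurring_pairs(abstract, host_genes, pathogen_genes, level="sentence")
abstract_pairs = cooccurring_pairs(abstract, host_genes, pathogen_genes, level="abstract")
print(sorted(sentence_pairs))
print(len(abstract_pairs))
```

Abstract-level matching returns every host-pathogen combination in the text, so it yields more candidate pairs than sentence-level matching. Co-occurrence alone does not establish an interaction, which is why the extracted pairs in the study were manually evaluated.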
the Coronavirus Infectious Disease Ontology (CIDO) is a community-based ontology that supports co... more the Coronavirus Infectious Disease Ontology (CIDO) is a community-based ontology that supports coronavirus disease knowledge and data standardization, integration, sharing, and analysis. O ntologies, as the term is used in informatics, are structured vocabularies comprised of human-and computer-interpretable terms and relations that represent entities and relationships. Within informatics fields, ontologies play an important role in knowledge and data standardization, representation, integration, sharing and analysis. They have also become a foundation of artificial intelligence (AI) research. In what follows, we outline the Coronavirus Infectious Disease Ontology (CIDO), which covers multiple areas in the domain of coronavirus diseases, including etiology, transmission, epidemiology, pathogenesis, diagnosis, prevention, and treatment. We emphasize CIDO development relevant to COVID-19. Human coronaviruses have given rise to a series of major crises in global public health. Severe acute respiratory syndrome (SARS) emerged in China in November 2002, lasted for eight months and resulted in 8,098 confirmed human cases in 29 countries with 774 deaths (case-fatality rate: 9.6%) 1. Approximately ten years later in June 2012, the Middle East Respiratory Syndrome (MERS), another highly pathogenic coronavirus disease, was identified in Saudi Arabia. The MERS outbreak has caused 2,260 cases in 27 countries and 803 deaths (35.5%) 2. More recently, the World Health Organization (WHO) declared the Coronavirus Disease 2019 (COVID-19) outbreak as a pandemic on March 11, 2020, when there were 118,326 confirmed cases and 4,292 deaths. As of May 13, there have been over 4.4 million confirmed cases and over 295,000 deaths globally. Unfortunately, we still do not have available effective drugs and vaccines against these highly pathological coronaviruses. 
Extensive studies have been conducted on coronaviruses, the results of many of which exist in publicly available data repositories such as GEO 3. Publications concerning COVID-19 have exploded in recent months, and new clinical trials have been and are being conducted to develop drugs and vaccines against COVID-19, 1,430 of which have been registered in ClinicalTrials.
Background: Stem cells and stem cell lines are widely used in biomedical research. The Cell Ontol... more Background: Stem cells and stem cell lines are widely used in biomedical research. The Cell Ontology (CL) and Cell Line Ontology (CLO) are two community-based OBO Foundry ontologies in the domains of in vivo cells and in vitro cell line cells, respectively. Results: To support standardized stem cell investigations, we have developed an Ontology for Stem Cell Investigations (OSCI). OSCI imports stem cell and cell line terms from CL and CLO, and investigation-related terms from existing ontologies. A novel focus of OSCI is its application in representing metadata types associated with various stem cell investigations. We also applied OSCI to systematically categorize experimental variables in an induced pluripotent stem cell line cell study related to bipolar disorder. In addition, we used a semiautomated literature mining approach to identify over 200 stem cell gene markers. The relations between these genes and stem cells are modeled and represented in OSCI. Conclusions: OSCI standardizes stem cells found in vivo and in vitro and in various stem cell investigation processes and entities. The presented use cases demonstrate the utility of OSCI in iPSC studies and literature mining related to bipolar disorder.
COVID-19 often manifests with different outcomes in different patients, highlighting the complexi... more COVID-19 often manifests with different outcomes in different patients, highlighting the complexity of the host-pathogen interactions involved in manifestations of the disease at the molecular and cellular levels. In this paper, we propose a set of postulates and a framework for systematically understanding complex molecular host-pathogen interaction networks. Specifically, we first propose four host-pathogen interaction (HPI) postulates as the basis for understanding molecular and cellular host-pathogen interactions and their relations to disease outcomes. These four postulates cover the evolutionary dispositions involved in HPIs, the dynamic nature of HPI outcomes, roles that HPI components may occupy leading to such outcomes, and HPI checkpoints that are critical for specific disease outcomes. Based on these postulates, an HPI Postulate and Ontology (HPIPO) framework is proposed to apply interoperable ontologies to systematically model and represent various granular details and k...
Background The current COVID-19 pandemic and the previous SARS/MERS outbreaks of 2003 and 2012 ha... more Background The current COVID-19 pandemic and the previous SARS/MERS outbreaks of 2003 and 2012 have resulted in a series of major global public health crises. We argue that in the interest of developing effective and safe vaccines and drugs and to better understand coronaviruses and associated disease mechenisms it is necessary to integrate the large and exponentially growing body of heterogeneous coronavirus data. Ontologies play an important role in standard-based knowledge and data representation, integration, sharing, and analysis. Accordingly, we initiated the development of the community-based Coronavirus Infectious Disease Ontology (CIDO) in early 2020. Results As an Open Biomedical Ontology (OBO) library ontology, CIDO is open source and interoperable with other existing OBO ontologies. CIDO is aligned with the Basic Formal Ontology and Viral Infectious Disease Ontology. CIDO has imported terms from over 30 OBO ontologies. For example, CIDO imports all SARS-CoV-2 protein ter...
Current COVID-19 pandemic and previous SARS/MERS outbreaks have caused a series of major crises t... more Current COVID-19 pandemic and previous SARS/MERS outbreaks have caused a series of major crises to global public health We must integrate the large and exponentially growing amount of heterogeneous coronavirus data to better understand coronaviruses and associated disease mechanisms, in the interest of developing effective and safe vaccines and drugs Ontologies have emerged to play an important role in standard knowledge and data representation, integration, sharing, and analysis We have initiated the development of the community-based Coronavirus Infectious Disease Ontology (CIDO) As an Open Biomedical Ontology (OBO) library ontology, CIDO is an open source and interoperable with other existing OBO ontologies In this article, the general architecture and the design patterns of the CIDO are introduced, CIDO representation of coronaviruses, phenotypes, anti-coronavirus drugs and medical devices (e g ventilators) are illustrated, and an application of CIDO implemented to identify repu...
Vaccines stimulate various immune factors critical to protective immune responses. However, a comprehensive picture of vaccine-induced immune factors and pathways has not been systematically collected and analyzed. To address this issue, we developed VaximmutorDB, a web-based database system of vaccine immune factors (abbreviated as “vaximmutors”) manually curated from peer-reviewed articles. VaximmutorDB currently stores 1,740 vaccine immune factors from 13 host species (e.g., human, mouse, and pig). These vaximmutors were induced by 154 vaccines for 46 pathogens. The top 10 vaximmutors include three antibodies (IgG, IgG2a and IgG1), Th1 immune factors (IFN-γ and IL-2), Th2 immune factors (IL-4 and IL-6), TNF-α, CASP-1, and TLR8. Many host processes (e.g., stimulatory C-type lectin receptor signaling pathway, SRP-dependent cotranslational protein targeting to membrane) and cellular components (e.g., extracellular exosome, nucleoplasm) enriched among all the vaximmutors were identified. U...
A critical issue in the usage of cancer drugs is their association with various adverse events (AEs) in some, but not all, patients. The National Cancer Institute (NCI) Common Terminology Criteria for Adverse Events (CTCAE) is a controlled terminology for AE classification and analysis in cancer clinical trials. The Ontology of Adverse Events (OAE) is a community-based ontology in the domain of AEs. In this study, OAE was first updated by including AE severity grading and OAE-CTCAE mapping. An OAE subset containing CTCAE-related terms and their associated OAE terms was generated to facilitate term usage. A use case study based on a published cancer drug clinical trial demonstrates that OAE provides better hierarchical representation, includes semantic relations, and supports automated reasoning. Demonstrated with a single patient analysis, the OAE framework supports precision informatics for representing AEs and related genetic and clinical conditions in individual patients treated wi...
Background: Use of medication can cause adverse drug reactions (ADRs), unwanted or unexpected events, which are a major safety concern. Drug labels, also known as prescribing information or package inserts, describe ADRs. Therefore, systematically identifying ADR information from drug labels is critical in multiple aspects; however, this task is challenging due to the nature of the natural language of drug labels. Results: In this paper, we present a machine learning- and rule-based system for the identification of ADR entity mentions in the text of drug labels and their normalization through the Medical Dictionary for Regulatory Activities (MedDRA) dictionary. The machine learning approach is based on a recently proposed deep learning architecture, which integrates bi-directional Long Short-Term Memory (Bi-LSTM), Convolutional Neural Network (CNN), and Conditional Random Fields (CRF) for entity recognition. The rule-based approach, used for normalizing the identified ADR mentions to MedDRA term...
This Editorial first introduces the background of the vaccine and drug relations and how biomedical terminologies and ontologies have been used to support their studies. The history of the seven workshops, initially named VDOSME, and then named VDOS, is also summarized and introduced. Then the 7th International Workshop on Vaccine and Drug Ontology Studies (VDOS 2018), held on August 10th, 2018, Corvallis, Oregon, USA, is introduced in detail. These VDOS workshops have greatly supported the development, applications, and discussion of vaccine- and drug-related terminology and drug studies.
Adverse drug reactions (ADRs), also called drug adverse events (AEs), are reported in the FDA drug labels; however, it is a big challenge to properly retrieve and analyze the ADRs and their potential relationships from textual data. Previously, we identified and ontologically modeled over 240 drugs that can induce peripheral neuropathy through mining public drug-related databases and drug labels. However, the ADR mechanisms of these drugs are still unclear. In this study, we aimed to develop an ontology-based literature mining system to identify ADRs from drug labels and to elucidate potential mechanisms of the neuropathy-inducing drugs (NIDs). We developed and applied an ontology-based SciMiner literature mining strategy to mine ADRs from the drug labels provided in the Text Analysis Conference (TAC) 2017, which included drug labels for 53 neuropathy-inducing drugs (NIDs). We identified an average of 243 ADRs per NID and constructed an ADR-ADR network, which consists of 29 ADR n...
Background: The Interaction Network Ontology (INO) logically represents biological interactions, pathways, and networks. INO has been demonstrated to be valuable in providing a set of structured ontological terms and associated keywords to support literature mining of gene-gene interactions from biomedical literature. However, previous work using INO focused on single keyword matching, while many interactions are represented with two or more interaction keywords used in combination. Methods: This paper reports our extension of INO to include combinatory patterns of two or more literature mining keywords co-existing in one sentence to represent specific INO interaction classes. Such keyword combinations and related INO interaction type information could be automatically obtained via SPARQL queries, formatted in Excel format, and used in an INO-supported SciMiner, an in-house literature mining program. We studied the gene interaction sentences from the commonly used benchmark Learning Logic in Language (LLL) dataset and one internally generated vaccine-related dataset to identify and analyze interaction types containing multiple keywords. Patterns obtained from the dependency parse trees of the sentences were used to identify the interaction keywords that are related to each other and collectively represent an interaction type. Results: The INO ontology currently has 575 terms including 202 terms under the interaction branch. The relations between the INO interaction types and associated keywords are represented using the INO annotation relations: 'has literature mining keywords' and 'has keyword dependency pattern'. The keyword dependency patterns were generated via running the Stanford Parser to obtain dependency relation types.
Out of the 107 interactions in the LLL dataset represented with two-keyword interaction types, 86 were identified by using the direct dependency relations. The LLL dataset contained 34 gene regulation interaction types, each of which is associated with multiple keywords. A hierarchical display of these 34 interaction types and their ancestor terms in INO resulted in the identification of specific gene-gene interaction patterns from the LLL dataset. The phenomenon of having multi-keyword interaction types was also frequently observed in the vaccine dataset.
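The idea of matching two or more interaction keywords that co-exist in one sentence can be illustrated with a minimal Python sketch. It is not the INO-supported SciMiner implementation: the pattern table, the interaction type names, and the function names below are hypothetical placeholders, and the sketch checks only keyword co-occurrence, without the dependency-parse step described in the abstract.

```python
import re

# Hypothetical keyword combinations for illustration only; the actual INO
# 'has literature mining keywords' annotations are not reproduced here.
KEYWORD_PATTERNS = {
    "regulation of transcription": ("regulates", "transcription"),
    "activation of expression": ("activates", "expression"),
}


def tokenize(sentence):
    """Lowercase the sentence and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def match_interaction_types(sentence, patterns=KEYWORD_PATTERNS):
    """Return the interaction types whose keywords all occur in the sentence."""
    tokens = set(tokenize(sentence))
    return sorted(
        name
        for name, keywords in patterns.items()
        if all(keyword in tokens for keyword in keywords)
    )


# Made-up sentence with placeholder gene names.
print(match_interaction_types("geneA regulates the transcription of geneB."))
```

A sentence containing only one keyword of a combination yields no match under this scheme, which is the distinction between single-keyword and multi-keyword interaction types that the abstract draws.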
Background: Pathogenic Escherichia coli infections cause various diseases in humans and many animal species. However, despite extensive E. coli vaccine research, we are still unable to fully protect ourselves against E. coli infections. To support more rational development of effective and safe E. coli vaccines, it is important to better understand E. coli vaccine-associated gene interaction networks. Methods: In this study, we first extended the Vaccine Ontology (VO) to semantically represent various E. coli vaccines and genes used in the vaccine development. We also normalized E. coli gene names compiled from the annotations of various E. coli strains using a pan-genome-based annotation strategy. The Interaction Network Ontology (INO) includes a hierarchy of various interaction-related keywords useful for literature mining. Using VO, INO, and normalized E. coli gene names, we applied an ontology-based SciMiner literature mining strategy to mine all PubMed abstracts and retrieve E. coli vaccine-associated E. coli gene interactions. Four centrality metrics (i.e., degree, eigenvector, closeness, and betweenness) were calculated for identifying highly ranked genes and interaction types. Results: Using vaccine-related PubMed abstracts, our study identified 11,350 sentences that contain 88 unique INO interaction types and 1,781 unique E. coli genes. Each sentence contained at least one interaction type and two unique E. coli genes. An E. coli gene interaction network of genes and INO interaction types was created. From this big network, a sub-network consisting of 5 E. coli vaccine genes, including carA, carB, fimH, fepA, and vat, and 62 other E. coli genes, and 25 INO interaction types was identified.
While many interaction types represent direct interactions between two indicated genes, our study has also shown that many of the retrieved interaction types are indirect, in that the two genes participate in the specified interaction process in a required but indirect manner. Our centrality analysis of these gene interaction networks identified top ranked E. coli genes and 6 INO interaction types (e.g., regulation and gene expression). Conclusions: A vaccine-related E. coli gene-gene interaction network was constructed using an ontology-based literature mining strategy, which identified important E. coli vaccine genes and their interactions with other genes through specific interaction types.
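Two of the four centrality metrics named in the abstract, degree and closeness, can be sketched in plain Python. This is an illustrative sketch rather than the study's analysis code: the node names `g1` to `g5` and the edges are made-up placeholders, not genes or interactions from the paper, and eigenvector and betweenness centrality are omitted for brevity.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degree_centrality(graph):
    """Number of neighbours divided by the maximum possible (n - 1)."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}


def closeness_centrality(graph):
    """Reachable nodes divided by the sum of shortest-path lengths to them."""
    result = {}
    for source in graph:
        # Breadth-first search gives shortest path lengths in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            current = queue.popleft()
            for neighbour in graph[current]:
                if neighbour not in dist:
                    dist[neighbour] = dist[current] + 1
                    queue.append(neighbour)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


# Placeholder network; nodes and edges are hypothetical.
EDGES = [("g1", "g2"), ("g1", "g3"), ("g1", "g4"), ("g4", "g5")]
graph = build_graph(EDGES)
degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
top_node = max(degree, key=degree.get)
print(top_node, degree[top_node], closeness[top_node])
```

In this toy network the hub node ranks highest on both metrics; on a real gene interaction network the rankings from different metrics can diverge, which is one reason to compute several of them.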
Statistics play a critical role in biological and clinical research. However, most reports of scientific results in the published literature make it difficult for the reader to reproduce the statistical analyses performed in achieving those results because they provide inadequate documentation of the statistical tests and algorithms applied. The Ontology of Biological and Clinical Statistics (OBCS) is put forward here as a step towards solving this problem. The terms in OBCS, including 'data collection', 'data transformation in statistics', 'data visualization', 'statistical data analysis', and 'drawing a conclusion based on data', cover the major types of statistical processes used in basic biological research and clinical outcome studies. OBCS is aligned with the Basic Formal Ontology (BFO) and extends the Ontology of Biomedical Investigations (OBI), an OBO (Open Biological and Biomedical Ontologies) Foundry ontology supported by over 20 rese...
Animal models are indispensable for vaccine research and development. However, choosing which species to use and designing a vaccine study that is optimized for that species is often challenging. Vaxar (http://www.violinet.org/vaxar/) is a web-based database and analysis system that stores manually curated data regarding vaccine-induced responses in animals. To date, Vaxar encompasses models from 35 animal species including rodents, rabbits, ferrets, primates, and birds. These 35 species have been used to study more than 1300 experimentally tested vaccines for 164 pathogens and diseases significant to humans and domestic animals. The responses to vaccines by animals in more than 1500 experimental studies are recorded in Vaxar; these data can be used for systematic meta-analysis of various animal responses to a particular vaccine. For example, several variables, including animal strain, animal age, and the dose or route of either vaccination or challenge, might affect host response o...
Background: Neuropathy often occurs following drug treatment such as chemotherapy. Severe instances of neuropathy can result in cessation of life-saving chemotherapy treatment. Results: To support data representation and analysis of drug-associated neuropathy adverse events (AEs), we developed the Ontology of Drug Neuropathy Adverse Events (ODNAE). ODNAE extends the Ontology of Adverse Events (OAE). Our combinatorial approach identified 215 US FDA-licensed small molecule drugs that induce signs and symptoms of various types of neuropathy. ODNAE imports related drugs from the Drug Ontology (DrON) with their chemical ingredients defined in ChEBI. ODNAE includes 139 drug mechanisms of action from NDF-RT and 186 biological processes represented in the Gene Ontology (GO). In total, ODNAE contains 1579 terms. Our analysis of the ODNAE knowledge base shows neuropathy-inducing drugs classified under specific molecular entity groups, especially carbon, pnictogen, chalcogen, and heterocyclic compounds. The carbon drug group includes 127 organic chemical drugs. Thirty-nine receptor agonist and antagonist terms were identified, including 4 pairs (31 drugs) of agonists and antagonists that share targets (e.g., adrenergic receptor, dopamine, serotonin, and sex hormone receptor). Many drugs regulate neurological system processes (e.g., negative regulation of dopamine or serotonin uptake). SPARQL scripts were used to query the ODNAE ontology knowledge base. Conclusions: ODNAE is an effective platform for building a drug-induced neuropathy knowledge base and for analyzing the underlying mechanisms of drug-induced neuropathy. The ODNAE-based methods used in this study can also be extended to the representation and study of other categories of adverse events.
A translational bioinformatics challenge exists in connecting population and individual clinical phenotypes in various formats to biological mechanisms. The Medical Dictionary for Regulatory Activities (MedDRA®) is the default dictionary for adverse event (AE) reporting in the US Food and Drug Administration Adverse Event Reporting System (FAERS). The Ontology of Adverse Events (OAE) represents AEs as pathological processes occurring after drug exposures. The aim of this work was to establish a semantic framework to link biological mechanisms to phenotypes of AEs by combining OAE with MedDRA® in FAERS data analysis. We investigated the AEs associated with tyrosine kinase inhibitors (TKIs) and monoclonal antibodies (mAbs) targeting tyrosine kinases. The five selected TKIs/mAbs (i.e., dasatinib, imatinib, lapatinib, cetuximab, and trastuzumab) are known to induce impaired ventricular function (non-QT) cardiotoxicity. Statistical analysis of FAERS data identified 1053 distinct MedD...
The "Vaccine and Drug Ontology Studies" (VDOS) international workshop series focuses on vaccine- and drug-related ontology modeling and applications. Drugs and vaccines have been critical to prevent and treat human and animal diseases. Work in both areas (drugs and vaccines) is closely related, from preclinical research and development to manufacturing, clinical trials, government approval and regulation, and post-licensure usage surveillance and monitoring. Over the last decade, tremendous efforts have been made in the biomedical ontology community to ontologically represent various areas associated with vaccines and drugs: extending existing clinical terminology systems such as SNOMED, RxNorm, NDF-RT, and MedDRA, and developing new models such as the Vaccine Ontology (VO) and Ontology of Adverse Events (OAE), and vernacular medical terminologies such as the Consumer Health Vocabulary (CHV). The VDOS workshop series provides a platform for discussing innovative solutions as well as the challenges in the development and applications of biomedical ontologies for representing and analyzing drugs and vaccines, their administration, host immune responses, adverse events, and other related topics. The five full-length papers included in this 2014 thematic issue focus on two main themes: (i) General vaccine/drug-related ontology development and exploration, and (ii) Interaction and network-related ontology studies.
Brucella is an intracellular bacterium that causes chronic brucellosis in humans and various mammals. The identification of host-Brucella interaction is crucial to understand host immunity against Brucella infection and Brucella pathogenesis against host immune responses. Most of the information about the inter-species interactions between host and Brucella genes is only available in the text of the scientific publications. Many text-mining systems for extracting gene and protein interactions have been proposed. However, only a few of them have been designed by considering the peculiarities of host-pathogen interactions. In this paper, we used a text mining approach for extracting host-Brucella gene-gene interactions from the abstracts of articles in PubMed. The gene-gene interactions here represent the interactions between genes and/or gene products (e.g., proteins). The SciMiner tool, originally designed for detecting mammalian gene/protein names in text, was extended to identify host and Brucella gene/protein names in the abstracts. Next, sentence-level and abstract-level co-occurrence based approaches, as well as sentence-level machine learning based methods, originally designed for extracting intra-species gene interactions, were utilized to extract the interactions among the identified host and Brucella genes. The extracted interactions were manually evaluated. A total of 46 host-Brucella gene interactions were identified and represented as an interaction network. Twenty-four of these interactions were identified from sentence-level processing. Twenty-two additional interactions were identified when abstract-level processing was performed. The Interaction Network Ontology (INO) was used to represent the identified interaction types in a hierarchical ontology structure.
Ontological modeling of specific gene-gene interactions demonstrates that host-pathogen gene-gene interactions occur under experimental conditions which can be ontologically represented. Our results show that the introduced literature mining and ontology-based modeling approaches are effective in retrieving and analyzing host-pathogen gene-gene interaction networks.
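The difference between sentence-level and abstract-level co-occurrence can be made concrete with a small Python sketch. It is a simplified illustration under stated assumptions, not the extended SciMiner pipeline: the gene names `HostA`, `HostB`, and `PathX`, the example text, and the function names are hypothetical; gene mentions are found by exact token match; and the machine learning methods and manual evaluation described in the abstract are not shown.

```python
import re
from itertools import product


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def mentions(text, names):
    """Return the names that appear as whole tokens in the text."""
    tokens = set(re.findall(r"[A-Za-z0-9]+", text))
    return {name for name in names if name in tokens}


def cooccurrence_pairs(abstract, host_genes, pathogen_genes):
    """Return (sentence-level pairs, pairs found only at abstract level)."""
    sentence_pairs = set()
    for sentence in split_sentences(abstract):
        hosts = mentions(sentence, host_genes)
        pathogens = mentions(sentence, pathogen_genes)
        sentence_pairs |= set(product(hosts, pathogens))
    abstract_pairs = set(
        product(mentions(abstract, host_genes), mentions(abstract, pathogen_genes))
    )
    return sentence_pairs, abstract_pairs - sentence_pairs


# Made-up abstract with placeholder gene names.
text = (
    "HostA expression increased after infection with a strain lacking PathX. "
    "HostB was unchanged."
)
sentence_level, abstract_only = cooccurrence_pairs(text, {"HostA", "HostB"}, {"PathX"})
print(sentence_level, abstract_only)
```

Abstract-level processing recovers pairs that never share a sentence, at the cost of more candidate pairs that need checking, which is consistent with the manual evaluation step the abstract mentions.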
Papers by Yongqun He