Papers by Larisa Soldatova
Nature Biotechnology, 2006
Proceedings of the National Academy of Sciences of the United States of America, Nov 29, 2021
Almost all machine learning (ML) is based on representing examples using intrinsic features. When there are multiple related ML problems (tasks), it is possible to transform these features into extrinsic features by first training ML models on other tasks and having each of them make a prediction for every example of the new task, yielding a novel representation. We call this transformational ML (TML). TML is very closely related to, and synergistic with, transfer learning, multitask learning, and stacking. TML is applicable to improving any nonlinear ML method. We tested TML using the most important classes of nonlinear ML: random forests, gradient boosting machines, support vector machines, k-nearest neighbors, and neural networks. To ensure the generality and robustness of the evaluation, we used thousands of ML problems from three scientific domains: drug design, predicting gene expression, and ML algorithm selection. We found that TML significantly improved the predictive performance of all the ML methods in all the domains (4 to 50% average improvements) and that TML features generally outperformed intrinsic features. The use of TML also enhances scientific understanding through explainable ML. In drug design, we found that TML provided insight into drug target specificity, the relationships between drugs, and the relationships between target proteins. TML leads to an ecosystem-based approach to ML, where new tasks, examples, predictions, and so on synergistically interact to improve performance. To contribute to this ecosystem, all our data, code, and ∼50,000 ML models have been fully annotated with metadata, linked, and openly published following the Findability, Accessibility, Interoperability, and Reusability (FAIR) principles (∼100 Gbytes).
Keywords: AI | drug design | transfer learning | stacking | multitask learning
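The feature transformation at the core of TML can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: every task's model is a hand-written NumPy k-nearest-neighbors regressor (one of the method classes named above, chosen only because it needs no extra libraries), and the helper names `knn_predict` and `tml_features`, the synthetic tasks, and `k=3` are all hypothetical. Each source task contributes one trained model, and that model's predictions for the new task's examples become one extrinsic feature column.

```python
import numpy as np


def knn_predict(X_train, y_train, X_query, k=3):
    """k-nearest-neighbors regression: average the targets of the k closest training rows."""
    # Squared Euclidean distances between every query row and every training row.
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    # Indices of the k nearest training rows for each query row.
    nearest = np.argsort(d2, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)


def tml_features(source_tasks, X, k=3):
    """Hypothetical helper: map intrinsic features X to extrinsic (TML) features.

    source_tasks is a list of (X_i, y_i) training sets, one per related task.
    Column i of the result holds the task-i model's prediction for each row of X.
    """
    columns = [knn_predict(X_s, y_s, X, k=k) for X_s, y_s in source_tasks]
    return np.column_stack(columns)


# Synthetic related tasks (illustrative only): same nonlinear signal, different offsets.
rng = np.random.default_rng(0)
w = rng.normal(size=4)


def make_task(offset, n=50):
    X = rng.normal(size=(n, 4))
    y = np.sin(X @ w) + offset
    return X, y


source = [make_task(offset) for offset in (0.0, 0.5, 1.0)]
X_new, y_new = make_task(0.25, n=40)

# Re-represent the new task's examples by the source-task models' predictions ...
F_train = tml_features(source, X_new[:30])
F_test = tml_features(source, X_new[30:])
# ... then predict the new task with a kNN model over the extrinsic features.
y_pred = knn_predict(F_train, y_new[:30], F_test)
```

Here the extrinsic features replace the intrinsic ones, in line with the finding that TML features generally outperformed intrinsic features; they could also be concatenated with `X`. The new task's examples are disjoint from the source tasks' training data in this sketch, so every extrinsic feature is an out-of-sample prediction.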
Machine Learning, Oct 20, 2020
Journal of Biomedical Semantics, May 17, 2011
Over the years, the Bio-Ontologies SIG at ISMB has provided a forum for discussion of the latest and most innovative research in the application of ontologies and, more generally, the organisation, presentation and dissemination of knowledge in biomedicine and the life sciences. The ten papers selected for this supplement are extended versions of the original papers presented at the 2010 SIG. The papers span a wide range of topics, including practical solutions for data and knowledge integration for translational medicine, hypothesis-based querying, understanding kidney and urinary pathways, and mining the pharmacogenomics literature; and theoretical research into the orthogonality of biomedical ontologies, the representation of diseases, the representation of research hypotheses, the combination of ontologies and natural language processing for an annotation framework, the generation of textual definitions, and the discovery of gene interaction networks.
Lecture Notes in Artificial Intelligence, 2018
知的教育システム研究会 (Intelligent Educational Systems Study Group), Mar 15, 2003
Journal of the Royal Society Interface, Apr 1, 2022
Scientific results should not just be ‘repeatable’ (replicable in the same laboratory under identical conditions), but also ‘reproducible’ (replicable in other laboratories under similar conditions). Results should also, if possible, be ‘robust’ (replicable under a wide range of conditions). The reproducibility and robustness of only a small fraction of published biomedical results have been tested; furthermore, when reproducibility is tested, it is often not found. This situation is termed ‘the reproducibility crisis’, and it is one of the most important issues facing biomedicine. This crisis would be solved if it were possible to automate reproducibility testing. Here, we describe semi-automated testing of the reproducibility and robustness of simple statements (propositions) about cancer cell biology automatically extracted from the literature. From 12 260 papers, we automatically extracted statements predicted to describe experimental results regarding a change of gene expression in response to drug treatment in breast cancer; from these, we selected 74 statements of high biomedical interest. To test the reproducibility of these statements, two different teams used the laboratory automation system Eve and two breast cancer cell lines (MCF7 and MDA-MB-231). Statistically significant evidence for repeatability was found for 43 statements, and significant evidence for reproducibility/robustness for 22 statements. In two cases, the automation made serendipitous discoveries. The reproduced/robust knowledge provides significant insight into cancer. We conclude that semi-automated reproducibility testing is currently achievable, that it could be scaled up to generate a substantive source of reliable knowledge and that automation has the potential to mitigate the reproducibility crisis.
Springer eBooks, 2004
How smart should we be in order to cope with the complex authoring process of smart courseware? This question has lately gained more attention, with attempts to simplify the process and efforts to define authoring systems and tools to support it. The goal of this paper is to specify an evolutional perspective on Intelligent Educational Systems (IES) authoring and, in this context, to define the authoring framework EASE: powerful in its functionality, generic in its support of instructional strategies, and user-friendly in its interaction with the author. The evolutional authoring support is enabled by an authoring task ontology that, at a meta-level, defines and controls the configuration and tuning of an authoring tool for a specific authoring process. In this way, we achieve more control over the evolution of the intelligence in IES and reach a computational formalization of IES engineering.
arXiv (Cornell University), Jul 9, 2023
Recent machine learning and AI advances are disrupting scientific practice, technological innovation, product development, and society. As a rule, success in classification, pattern recognition, and gaming occurs whenever there are clear performance evaluation criteria and access to extensive training data sets. Yet, AI has contributed less to fundamental science, such as discovering new principled explanatory models and equations. To set the stage for a fundamental AI4Science, we explore a perspective for an AI-driven, automated, generative, closed-loop approach to scientific discovery, including self-driven hypothesis generation and open-ended autonomous exploration of the hypothesis space. Generative AI, in general, and Large Language Models (LLMs), in particular, serve here to translate and break down high-level human or machine conjectures into smaller computable modules inserted in the automated loop. Discovering fundamental explanatory models requires causality analysis while enabling unbiased, efficient search across the space of putative causal explanations. In addition, integrating AI-driven automation into the practice of science would mitigate current problems, including the replication of findings, systematic production of data, and ultimately democratisation of the scientific process. These advances promise to unleash AI's potential for searching and discovering the fundamental structure of our world beyond what human scientists have achieved or can achieve. Such a vision would push the boundaries of new fundamental science beyond automating current workflows and unleash new possibilities to solve some of humanity's most significant challenges.
Journal of Biomedical Semantics, 2012
Over its 14 years, the Bio-Ontologies SIG at ISMB has provided a forum for discussion of the latest and most innovative research in bio-ontology development, its applications to biomedicine and, more generally, the organisation, presentation and dissemination of knowledge in biomedicine and the life sciences. The seven papers selected for this supplement span a wide range of topics, including web-based querying over multiple ontologies, integration of data from wikis, innovative methods of annotating and mining electronic health records, advances in annotating web documents and biomedical literature, quality control of ontology alignments, and ontology support for predictive models of toxicity and open access to toxicity data.
Journal of Biomedical Semantics, 2010
Springer eBooks, Apr 13, 2007
Chapter 6: Ontology Engineering for Biological Applications. Larisa N. Soldatova and Ross D. King, The Computer Science Department, The University of Wales, Aberystwyth, UK. Abstract: Ontology engineering ...
Springer eBooks, 2020
Ensemble learning has been shown to significantly improve predictive accuracy in a variety of machine learning problems. For a given predictive task, the goal of ensemble learning is to improve predictive accuracy by combining the predictive power of multiple models. In this paper, we present an ensemble learning algorithm for regression problems which leverages the distribution of the samples in a learning set to achieve improved performance. We apply the proposed algorithm to a problem in precision medicine where the goal is to predict drug perturbation effects on genes in cancer cell lines. The proposed approach significantly outperforms the base case.
The main goal of this paper is to report on the research and development of KneeCare, a mobile solution that supports knee rehabilitation. We carried out an in-depth analysis of the common causes of knee injuries and the types of knee rehabilitation, as well as of state-of-the-art mobile technology and the usability aspects of mobile applications. This helped to structure the solution by providing a set of functional and nonfunctional requirements. The requirements were evaluated from the perspective of healthcare professionals, whose opinions were obtained through interviews and questionnaires. A design solution was formulated by creating low-fidelity and high-fidelity prototypes. A number of tools and techniques, such as the Eclipse IDE, were used to implement the design solution. Tests, including black-box and usability testing, were carried out to evaluate whether the user requirements had been met, and a number of informal interviews were conducted to ensure that the application met the users' needs. The results showed a positive reaction, and the usability tests showed that the application met 8 of Nielsen's 10 usability heuristics.
Journal of Biomedical Semantics, 2014
Over its 16 years, the Bio-Ontologies SIG at ISMB has provided a forum for vibrant discussions of the latest and most innovative advances in bio-ontology research, its applications to biomedicine and, more generally, the organisation, sharing and re-use of knowledge in biomedicine and the life sciences. The six papers selected for this supplement span a wide range of topics, including ontology-based data integration, ontology-based annotation of scientific literature, ontology and data model development, representation of scientific results, and gene candidate prediction.
Journal of Biomedical Semantics, 2013
Over its 15 years, the Bio-Ontologies SIG at ISMB has provided a forum for discussion of the latest and most innovative research in bio-ontology development, its applications to biomedicine and, more generally, the organisation, presentation and dissemination of knowledge in biomedicine and the life sciences. The seven papers and the commentary selected for this supplement span a wide range of topics, including web-based querying over multiple ontologies, data integration, annotation of patent records, NCBO Web services, ontology development for probabilistic reasoning and for physiological processes, and analysis of the progress of annotation and of structural changes in GO.
Lecture Notes in Computer Science, 2018