Larisa Soldatova

Papers by Larisa Soldatova

Reply to Wrestling with SUMO and bio-ontologies

Nature Biotechnology, 2006

Transformational machine learning: Learning how to learn from many related scientific problems

Proceedings of the National Academy of Sciences of the United States of America, Nov 29, 2021

Almost all machine learning (ML) is based on representing examples using intrinsic features. When there are multiple related ML problems (tasks), it is possible to transform these features into extrinsic features by first training ML models on other tasks and letting them each make predictions for each example of the new task, yielding a novel representation. We call this transformational ML (TML). TML is very closely related to, and synergistic with, transfer learning, multitask learning, and stacking. TML is applicable to improving any nonlinear ML method. We tested TML using the most important classes of nonlinear ML: random forests, gradient boosting machines, support vector machines, k-nearest neighbors, and neural networks. To ensure the generality and robustness of the evaluation, we utilized thousands of ML problems from three scientific domains: drug design, predicting gene expression, and ML algorithm selection. We found that TML significantly improved the predictive performance of all the ML methods in all the domains (4 to 50% average improvements) and that TML features generally outperformed intrinsic features. Use of TML also enhances scientific understanding through explainable ML. In drug design, we found that TML provided insight into drug target specificity, the relationships between drugs, and the relationships between target proteins. TML leads to an ecosystem-based approach to ML, where new tasks, examples, predictions, and so on synergistically interact to improve performance. To contribute to this ecosystem, all our data, code, and our ∼50,000 ML models have been fully annotated with metadata, linked, and openly published using Findability, Accessibility, Interoperability, and Reusability (FAIR) principles (∼100 Gbytes).

Keywords: AI | drug design | transfer learning | stacking | multitask learning
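The TML idea described above (train a model per related task, then represent each example of a new task by those models' predictions) can be illustrated with a minimal sketch. It assumes that all tasks share one intrinsic feature space and uses a hand-written k-nearest-neighbour regressor, one of the method classes the abstract lists, only to keep the example dependency-free. The names `train_knn`, `tml_features` and `train_tml` are hypothetical; this is not the authors' published code.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Model = Callable[[Vector], float]
Task = Tuple[List[Vector], List[float]]  # (examples, labels); assumed shared intrinsic feature space


def train_knn(X: List[Vector], y: List[float], k: int = 3) -> Model:
    """k-nearest-neighbour regressor: average label of the k closest training examples."""
    data = list(zip(X, y))

    def predict(x: Vector) -> float:
        # Sort by squared Euclidean distance; sorted() is stable, so ties keep training order.
        nearest = sorted(data, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[:k]
        return sum(label for _, label in nearest) / len(nearest)

    return predict


def tml_features(x: Vector, task_models: List[Model]) -> List[float]:
    """Extrinsic representation of x: one prediction per model trained on another task."""
    return [model(x) for model in task_models]


def train_tml(related_tasks: List[Task], X_new: List[Vector], y_new: List[float], k: int = 3) -> Model:
    """Fit one model per related task, re-represent the new task's examples by their
    predictions, then fit the new task's model on those extrinsic features."""
    task_models = [train_knn(X, y, k) for X, y in related_tasks]
    Z_new = [tml_features(x, task_models) for x in X_new]
    head = train_knn(Z_new, y_new, k)
    return lambda x: head(tml_features(x, task_models))


# Hypothetical toy data: two related tasks and a new task whose label is their sum.
grid = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
task_a = (grid, [0.0, 1.0, 0.0, 1.0])  # label = first feature
task_b = (grid, [0.0, 0.0, 1.0, 1.0])  # label = second feature
predict = train_tml([task_a, task_b], grid, [0.0, 1.0, 1.0, 2.0], k=1)
print(predict([0.9, 0.1]))  # 1.0
```

Swapping `train_knn` for any other learner leaves the structure unchanged, which mirrors the abstract's claim that TML applies to any nonlinear ML method.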

Guest editors’ introduction to the special issue on Discovery Science

Machine Learning, Oct 20, 2020

Selected papers from the 13th Annual Bio-Ontologies Special Interest Group Meeting

Journal of Biomedical Semantics, May 17, 2011

Over the years, the Bio-Ontologies SIG at ISMB has provided a forum for discussion of the latest and most innovative research in the application of ontologies and, more generally, the organisation, presentation and dissemination of knowledge in biomedicine and the life sciences. The ten papers selected for this supplement are extended versions of the original papers presented at the 2010 SIG. The papers span a wide range of topics, including practical solutions for data and knowledge integration for translational medicine, hypothesis-based querying, understanding kidney and urinary pathways, and mining the pharmacogenomics literature; and theoretical research into the orthogonality of biomedical ontologies, the representation of diseases, the representation of research hypotheses, the combination of ontologies and natural language processing for an annotation framework, the generation of textual definitions, and the discovery of gene interaction networks.

Discovery Science: 21st International Conference, DS 2018, Limassol, Cyprus, October 29–31, 2018, Proceedings

Lecture Notes in Artificial Intelligence, 2018

Meta-QSAR: learning how to learn QSARs

Test Generation Systems (Theme: "New Learning Technology & Science" and general topics)

Special Interest Group on Intelligent Educational Systems (知的教育システム研究会), Mar 15, 2003

Testing the reproducibility and robustness of the cancer biology literature by robot

Journal of the Royal Society Interface, Apr 1, 2022

Scientific results should not just be ‘repeatable’ (replicable in the same laboratory under identical conditions), but also ‘reproducible’ (replicable in other laboratories under similar conditions). Results should also, if possible, be ‘robust’ (replicable under a wide range of conditions). The reproducibility and robustness of only a small fraction of published biomedical results has been tested; furthermore, when reproducibility is tested, it is often not found. This situation is termed ‘the reproducibility crisis’, and it is one of the most important issues facing biomedicine. This crisis would be solved if it were possible to automate reproducibility testing. Here, we describe the semi-automated testing for reproducibility and robustness of simple statements (propositions) about cancer cell biology automatically extracted from the literature. From 12 260 papers, we automatically extracted statements predicted to describe experimental results regarding a change of gene expression in response to drug treatment in breast cancer; from these, we selected 74 statements of high biomedical interest. To test the reproducibility of these statements, two different teams used the laboratory automation system Eve and two breast cancer cell lines (MCF7 and MDA-MB-231). Statistically significant evidence for repeatability was found for 43 statements, and significant evidence for reproducibility/robustness in 22 statements. In two cases, the automation made serendipitous discoveries. The reproduced/robust knowledge provides significant insight into cancer. We conclude that semi-automated reproducibility testing is currently achievable, that it could be scaled up to generate a substantive source of reliable knowledge and that automation has the potential to mitigate the reproducibility crisis.

EASE: Evolutional Authoring Support Environment

Springer eBooks, 2004

How smart should we be in order to cope with the complex authoring process of smart courseware? Lately, this question has gained more attention, with attempts to simplify the process and efforts to define authoring systems and tools to support it. The goal of this paper is to specify an evolutional perspective on Intelligent Educational Systems (IES) authoring and, in this context, to define the authoring framework EASE: powerful in its functionality, generic in its support of instructional strategies and user-friendly in its interaction with the author. The evolutional authoring support is enabled by an authoring task ontology that, at a meta-level, defines and controls the configuration and tuning of an authoring tool for a specific authoring process. In this way we achieve more control over the evolution of the intelligence in IES and reach a computational formalization of IES engineering.

IJCAI-09 Workshop on Abductive and Inductive Knowledge Development

The Future of Fundamental Science Led by Generative Closed-Loop Artificial Intelligence

arXiv (Cornell University), Jul 9, 2023

Recent machine learning and AI advances disrupt scientific practice, technological innovation, product development, and society. As a rule, success in classification, pattern recognition, and gaming occurs whenever there are clear performance evaluation criteria and access to extensive training data sets. Yet, AI has contributed less to fundamental science, such as discovering new principled explanatory models and equations. To set the stage for a fundamental AI4Science, we explore a perspective for an AI-driven, automated, generative, closed-loop approach to scientific discovery, including self-driven hypothesis generation and open-ended autonomous exploration of the hypothesis space. Generative AI, in general, and Large Language Models (LLMs), in particular, serve here to translate and break down high-level human or machine conjectures into smaller computable modules inserted in the automated loop. Discovering fundamental explanatory models requires causality analysis while enabling unbiased efficient search across the space of putative causal explanations. In addition, integrating AI-driven automation into the practice of science would mitigate current problems, including the replication of findings, systematic production of data, and ultimately democratisation of the scientific process. These advances promise to unleash AI's potential for searching and discovering the fundamental structure of our world beyond what human scientists have achieved or can achieve. Such a vision would push the boundaries of new fundamental science beyond automating current workflows and unleash new possibilities to solve some of humanity's most significant challenges.

Selected papers from the 14th Annual Bio-Ontologies Special Interest Group Meeting

Journal of Biomedical Semantics, 2012

Over its 14 years, the Bio-Ontologies SIG at ISMB has provided a forum for discussion of the latest and most innovative research in bio-ontology development, its applications to biomedicine and, more generally, the organisation, presentation and dissemination of knowledge in biomedicine and the life sciences. The seven papers selected for this supplement span a wide range of topics, including web-based querying over multiple ontologies, integration of data from wikis, innovative methods of annotating and mining electronic health records, advances in annotating web documents and biomedical literature, quality control of ontology alignments, and ontology support for predictive models of toxicity and open access to toxicity data.

Selected papers from the 12th annual Bio-Ontologies meeting

Journal of Biomedical Semantics, 2010

Ontology Engineering for Biological Applications

Springer eBooks, Apr 13, 2007

Chapter 6. Ontology Engineering for Biological Applications. Larisa N. Soldatova and Ross D...

Federated Ensemble Regression Using Classification

Springer eBooks, 2020

Ensemble learning has been shown to significantly improve predictive accuracy in a variety of machine learning problems. For a given predictive task, the goal of ensemble learning is to improve predictive accuracy by combining the predictive power of multiple models. In this paper, we present an ensemble learning algorithm for regression problems which leverages the distribution of the samples in a learning set to achieve improved performance. We apply the proposed algorithm to a problem in precision medicine where the goal is to predict drug perturbation effects on genes in cancer cell lines. The proposed approach significantly outperforms the base case.
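The general principle stated in this abstract, combining the predictive power of several models, can be shown with the simplest ensemble rule for regression: averaging. This is a swapped-in baseline, not the paper's algorithm, which additionally exploits the distribution of samples in the learning set. The names `average_ensemble`, `over` and `under` are hypothetical.

```python
from statistics import mean
from typing import Callable, List, Sequence

Regressor = Callable[[Sequence[float]], float]


def average_ensemble(models: List[Regressor]) -> Regressor:
    """Combine regressors by averaging their predictions (plain averaging, not the paper's method)."""
    def predict(x: Sequence[float]) -> float:
        return mean(model(x) for model in models)
    return predict


# Two hypothetical base models whose errors have opposite sign on the target y = 2 * x[0].
def over(x: Sequence[float]) -> float:
    return 2.0 * x[0] + 1.0


def under(x: Sequence[float]) -> float:
    return 2.0 * x[0] - 1.0


ensemble = average_ensemble([over, under])
print(ensemble([3.0]))  # 6.0
```

Here each base model is off by one unit, but in opposite directions, so the average recovers the target exactly; in practice errors only partly cancel, which is why ensembles tend to help rather than guarantee improvement.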

Mobile application KneeCare to support knee rehabilitation

The main goal of this paper is to report on the research and development of KneeCare, a mobile solution that supports knee rehabilitation. We carried out an in-depth analysis of the common causes of knee injuries and the types of knee rehabilitation, as well as of state-of-the-art mobile technology and the usability aspects of a mobile application. This analysis helped to structure the solution by providing a set of functional and non-functional requirements, which were evaluated from the perspective of healthcare professionals, whose opinions were obtained through interviews and questionnaires. A design solution was formulated by creating a low-fidelity and a high-fidelity prototype. A number of tools and techniques, such as the Eclipse IDE, were used to implement the design. Tests such as black-box and usability testing were carried out to evaluate whether the user requirements had been met, and a number of informal interviews were conducted to ensure that the application met the users' needs. The results showed a positive reaction: the usability tests showed that the application met 8 of Nielsen's 10 usability heuristics.

Selected papers from the 16th Annual Bio-Ontologies Special Interest Group Meeting

Journal of Biomedical Semantics, 2014

Over its 16 years, the Bio-Ontologies SIG at ISMB has provided a forum for vibrant discussions of the latest and most innovative advances in the research area of bio-ontologies, their applications to biomedicine and, more generally, the organisation, sharing and re-use of knowledge in biomedicine and the life sciences. The six papers selected for this supplement span a wide range of topics, including ontology-based data integration, ontology-based annotation of scientific literature, ontology and data model development, representation of scientific results, and gene candidate prediction.

An ontology-based disambiguation of terms

Selected papers from the 15th Annual Bio-Ontologies Special Interest Group Meeting

Journal of Biomedical Semantics, 2013

Over its 15 years, the Bio-Ontologies SIG at ISMB has provided a forum for discussion of the latest and most innovative research in bio-ontology development, its applications to biomedicine and, more generally, the organisation, presentation and dissemination of knowledge in biomedicine and the life sciences. The seven papers and the commentary selected for this supplement span a wide range of topics, including web-based querying over multiple ontologies, data integration, annotating patent records, NCBO Web services, ontology development for probabilistic reasoning and for physiological processes, and analysis of the progress of annotation and structural GO changes.
