Papers by Alexander Sboev
Research Square, Mar 20, 2024
The problem of controlling an agent that follows a leader moving along a complicated route unknown to the follower is relevant both from a methodological point of view and from a practical one. In the first case, it is a rich testbed for investigating and further developing reinforcement learning (RL) methods. In the second case, a solution to this problem provides a means of route control for a follower agent that has neither a detailed map of the area nor the ability to navigate using external systems such as the Global Positioning System (GPS). The agent is required to remain on the route at a given distance from the leader. We consider the problem in the following statement: the
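As a purely illustrative sketch of the kind of reward shaping such a leader-following task implies (the distance target, tolerance, and penalty below are hypothetical and not taken from the paper), one might write:

```python
import numpy as np

def follower_reward(follower_xy, leader_xy, target_dist=3.0, tolerance=1.0,
                    off_route_penalty=-10.0, on_route=True):
    """Illustrative reward for a leader-following agent.

    The agent is rewarded for keeping its distance to the leader close to
    `target_dist` and penalized for leaving the route. All constants are
    placeholders; the paper's actual reward formulation may differ.
    """
    if not on_route:
        return off_route_penalty
    dist = float(np.linalg.norm(np.asarray(follower_xy) - np.asarray(leader_xy)))
    # Reward decays linearly with the deviation from the desired distance.
    return max(0.0, 1.0 - abs(dist - target_dist) / tolerance)

# Example: follower 3.4 m behind the leader, still on the route.
print(follower_reward((0.0, 0.0), (0.0, 3.4)))
```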
Journal of Physics, Feb 3, 2016
A study of the possibility of modeling the learning process on the basis of different forms of spike-timing-dependent plasticity (STDP) was performed. It is shown that the learning ability depends on the choice of spike pairing scheme and on the type of input signal used for learning. The performance of several STDP rules combined with several neuron models (leaky integrate-and-fire, static, Izhikevich, and Hodgkin-Huxley) was compared using the NEST simulator. The combinations of input signal and STDP spike pairing scheme that demonstrate the best learning abilities were identified.
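For reference, a minimal pair-based STDP weight update with exponential windows can be sketched as follows; the time constants and amplitudes are illustrative placeholders, not the parameters studied in the paper:

```python
import math

def stdp_update(w, t_pre, t_post, a_plus=0.01, a_minus=0.012,
                tau_plus=20.0, tau_minus=20.0, w_max=1.0):
    """Pair-based STDP: potentiate if the presynaptic spike precedes the
    postsynaptic one, depress otherwise. All parameters are illustrative."""
    dt = t_post - t_pre  # ms
    if dt > 0:      # pre before post -> long-term potentiation
        w += a_plus * math.exp(-dt / tau_plus)
    elif dt < 0:    # post before pre -> long-term depression
        w -= a_minus * math.exp(dt / tau_minus)
    return min(max(w, 0.0), w_max)

# Example: a causal pre->post pairing 5 ms apart strengthens the synapse.
print(stdp_update(0.5, t_pre=10.0, t_post=15.0))
```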
arXiv, Apr 30, 2021
We present a full-size Russian corpus of Internet user reviews with complex NER labeling, along with an evaluation of the accuracy levels reached on this corpus by a set of advanced deep learning neural networks for extracting pharmacologically meaningful entities from Russian texts. The corpus annotation includes mentions of the following entities: Medication (33005 mentions), Adverse Drug Reaction (1778), Disease (17403), and Note (4490). Two of them, Medication and Disease, comprise a set of attributes. A part of the corpus has coreference annotation with 1560 coreference chains in 300 documents. A special multi-label model based on a language model and a set of features, appropriate for the presented corpus labeling, is developed. The influence of different modifications of the models is analyzed: word vector representations, types of language models pre-trained for Russian, text normalization styles, and other preliminary processing. The sufficient size of our corpus makes it possible to study the effects of the particularities of corpus labeling and of balancing entities in the corpus. As a result, the state of the art for the pharmacological entity extraction problem in Russian is established on a full-size labeled corpus. For adverse drug reaction (ADR) recognition it is 61.1 by the F1-exact metric, which, as our analysis shows, is on par with the accuracy level for other-language corpora with similar characteristics and ADR representativeness. The evaluated baseline precision of coreference relation extraction on the corpus is 71, which is higher than the results reached on other Russian corpora.
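As an illustration only (the paper's actual architecture and feature set are not reproduced here), a multi-label token classification head on top of a pretrained XLM-RoBERTa encoder could be sketched with PyTorch and HuggingFace Transformers as below; the label count and decision threshold are hypothetical:

```python
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

class MultiLabelTagger(nn.Module):
    """Token-level multi-label tagger: each token may carry several entity
    labels at once (e.g. Medication plus one of its attributes)."""
    def __init__(self, encoder_name="xlm-roberta-base", num_labels=8):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        self.head = nn.Linear(self.encoder.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        return torch.sigmoid(self.head(hidden))  # per-token label probabilities

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = MultiLabelTagger()
batch = tokenizer(["Example drug review text"], return_tensors="pt")
probs = model(batch["input_ids"], batch["attention_mask"])
tags = probs > 0.5  # illustrative decision threshold
```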
Procedia Computer Science, 2022
Nucleation and Atmospheric Aerosols, 2022
In this paper we estimate the accuracy of relation extraction from texts containing pharmacologically significant information on the basis of the expanded version of the RDRS corpus, which contains Internet reviews on medications in Russian. The accuracy of relation extraction is estimated and compared for two multilingual language models: XLM-RoBERTa-large and XLM-RoBERTa-large-sag. Earlier research showed XLM-RoBERTa-large-sag to be the most efficient language model for relation extraction on the previous version of the RDRS dataset using ground-truth named entity annotation. In the current work we use a two-step relation extraction approach: automated named entity recognition followed by extraction of relations between the predicted entities. The implemented approach makes it possible to estimate the accuracy of the proposed solution to the relation extraction problem, as well as the accuracy at each step of the analysis. As a result, it is shown that the multilingual XLM-RoBERTa-large-sag model achieves a relation extraction macro-averaged F1-score of 86.4% on the ground-truth named entities and 60.1% on the predicted named entities on the new version of the RDRS corpus, which contains more than 3800 annotated texts. Consequently, the implemented approach based on the XLM-RoBERTa-large-sag language model sets the state of the art for the considered type of texts in Russian.
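A schematic of the two-step pipeline (NER followed by pairwise relation classification) might look like the following sketch; `ner_model` and `relation_classifier` are hypothetical callables standing in for the trained models described in the paper:

```python
from itertools import combinations

def extract_relations(text, ner_model, relation_classifier):
    """Two-step relation extraction sketch: predict entities first, then
    classify the relation (or its absence) for every entity pair."""
    entities = ner_model(text)   # e.g. [{"span": (0, 7), "type": "Drugname"}, ...]
    relations = []
    for e1, e2 in combinations(entities, 2):
        label = relation_classifier(text, e1, e2)  # e.g. "ADR-Drugname" or "no_relation"
        if label != "no_relation":
            relations.append((e1, e2, label))
    return relations
```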
Procedia Computer Science, 2021
An important task in the field of automatic data analysis is detecting emotions in texts. The paper presents an approach to emotion recognition for text data in Russian. To conduct the emotion analysis, a method was created based on vector representations of words obtained with the ELMo language model and their subsequent processing by an ensemble classifier. To configure and test the created method, a specially prepared dataset of texts for five basic emotions (joy, sadness, anger, fear, and surprise) is used. The dataset was prepared using a crowdsourcing platform and a home-grown procedure for collecting and controlling the annotators' markup. The overall accuracy is 0.78 (by the F1-macro score), which is currently the new state of the art for Russian. The results can be used for a wide range of tasks, for example monitoring social moods, generating control signals for mobile robotic systems, etc.
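As a hedged sketch of the second stage only (the ELMo embeddings are assumed to be precomputed and averaged per text; the ensemble composition below is illustrative, not the one from the paper):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression

# X: one averaged ELMo vector per text (n_texts x embedding_dim),
# y: emotion labels. Random placeholders stand in for real data here.
X = np.random.rand(100, 1024)
y = np.random.choice(["joy", "sadness", "anger", "fear", "surprise"], size=100)

ensemble = VotingClassifier(
    estimators=[
        ("logreg", LogisticRegression(max_iter=1000)),
        ("forest", RandomForestClassifier(n_estimators=200)),
    ],
    voting="soft",  # average predicted class probabilities
)
ensemble.fit(X, y)
print(ensemble.predict(X[:3]))
```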
Mathematics, Dec 14, 2021
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Mechanisms and Machine Science, 2020
Procedia Computer Science, 2015
An algorithm for finding documents on a given topic based on a selected reference collection of documents is presented, along with the construction of a context-semantic graph for visualizing themes in the search results. The algorithm is based on the integration of a set of probabilistic, entropic, and semantic markers for extracting weighted keywords and word combinations that describe the given topic. Test results demonstrate an average precision of 99% and a recall of 84% against an expert selection of documents. A special approach to constructing the graph on the basis of algorithms that extract weighted key phrases was also developed. It makes it possible to show the structure of subtopics in large document collections in a compact graph form.
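To illustrate the general idea (not the paper's specific marker set), a toy term-frequency and inverse-document-frequency weighting combined with a keyword co-occurrence graph built with networkx could look like this:

```python
import math
from collections import Counter
import networkx as nx

def keyword_weights(all_docs, topic_docs):
    """Toy keyword weighting: term frequency within the topic collection
    scaled by rarity across the whole collection (illustrative only)."""
    topic_tf = Counter(w for d in topic_docs for w in d.split())
    weights = {}
    for word, tf in topic_tf.items():
        df = sum(1 for d in all_docs if word in d.split())
        idf = math.log((1 + len(all_docs)) / (1 + df))
        weights[word] = tf * idf
    return weights

def cooccurrence_graph(docs, keywords):
    """Connect keywords that appear together in the same document."""
    g = nx.Graph()
    g.add_nodes_from(keywords)
    for d in docs:
        present = [w for w in keywords if w in d.split()]
        for i, w1 in enumerate(present):
            for w2 in present[i + 1:]:
                g.add_edge(w1, w2)
    return g
```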
Advances in Intelligent Systems and Computing, Dec 9, 2020
AIP Advances, Nov 1, 2016
Георесурсы, Dec 1, 2020
The article is devoted to the development of a hybrid method for predicting and preventing troubles in the process of drilling wells based on machine learning methods and modern neural network models. Troubles during the drilling process, such as filtrate leakoff, gas, oil and water shows, and pipe sticking, lead to an increase in unproductive time, i.e. time that is not technically necessary for well construction and is caused by various violations of the production process. Several different approaches have been considered, including one based on a regression model for predicting an indicator function that reflects the approach of a developing trouble, as well as anomaly detection models built both on basic machine learning algorithms and on deep neural network models. Visualized examples of the operation of the developed methods on simulated and real data are shown. The intelligent analysis of big geodata from geological and technological measurement stations is based on well-proven machine learning algorithms. Based on these data, a neural network model was proposed to prevent troubles and emergencies during the construction of wells. The use of this method will minimize unproductive drilling time.
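As an illustrative sketch of the anomaly-detection side of such a pipeline (the feature set and model choice below are assumptions, not the authors' configuration), unusual sensor readings could be flagged with scikit-learn:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Placeholder drilling telemetry: columns might be hook load, standpipe
# pressure, mud flow rate, etc. (hypothetical channels).
rng = np.random.default_rng(0)
telemetry = rng.normal(size=(1000, 4))
telemetry[990:] += 6.0  # inject a block of anomalous readings

detector = IsolationForest(contamination=0.01, random_state=0)
labels = detector.fit_predict(telemetry)  # -1 marks suspected anomalies
print(np.where(labels == -1)[0])
```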
Георесурсы, Sep 30, 2020
This paper poses and solves the problem of using artificial intelligence methods for processing big volumes of geodata from geological and technological measurement stations in order to identify and predict complications during well drilling. Digital modernization of the well life cycle using artificial intelligence methods helps to improve the efficiency of drilling oil and gas wells. In the course of creating and training artificial neural networks, regularities were modeled with a given accuracy, and hidden relationships between geological, geophysical, technical, and technological parameters were revealed. Clustering of big data volumes from various sources and types of sensors used to measure parameters during well drilling has been carried out. Artificial intelligence classification models have been developed to predict the operational results of well construction. The analysis of these issues is carried out, and the main directions for their solution are determined.
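A minimal sketch of the clustering step mentioned above (standardized sensor channels grouped with k-means; the number of clusters and features are assumptions, not values from the paper):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Placeholder sensor matrix: rows are time steps, columns are measured
# parameters from different station sensors (hypothetical).
sensor_data = np.random.rand(500, 6)

scaled = StandardScaler().fit_transform(sensor_data)
clusters = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(scaled)
print(np.bincount(clusters))  # how many time steps fall into each regime
```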
Nowadays, the analysis of virtual media to predict society's reaction to events or processes is a task of great relevance. This especially concerns meaningful information on healthcare problems. Internet sources contain a large amount of pharmacologically meaningful information useful for pharmacovigilance and drug repurposing. Analyzing information on such a scale demands the development of methods that require the creation of a corpus with labeled relations among entities. Previously, there were no such Russian-language datasets. This paper presents the first Russian-language dataset in which labeled entity pairs are divided into multiple contexts within a single text (by drugs used, by different users, by cases of use, etc.), and a method based on the XLM-RoBERTa language model, previously trained on medical texts, to evaluate the state-of-the-art accuracy for the task of identifying four types of relationships among entities: ADR-Drugname, Drugname-Diseasename, Drugname-SourceInfoDrug, and Diseasename-Indication. As shown on the presented dataset from the Russian Drug Review Corpus, the developed method achieves an F1-score of 81.2% (obtained using cross-validation and averaged over the four types of relationships), which is 7.8% higher than that of the baseline classifiers.
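As an illustrative sketch only (the entity-marker insertion and label list follow the relation types named above, but the actual model and preprocessing in the paper may differ), pairwise relation classification with a sequence classification head could look like:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

RELATIONS = ["no_relation", "ADR-Drugname", "Drugname-Diseasename",
             "Drugname-SourceInfoDrug", "Diseasename-Indication"]

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=len(RELATIONS))

def mark_pair(text, e1_span, e2_span):
    """Wrap the two candidate entities in marker tokens (illustrative)."""
    (s1, t1), (s2, t2) = sorted([e1_span, e2_span])
    return (text[:s1] + "[E1] " + text[s1:t1] + " [/E1]" + text[t1:s2]
            + "[E2] " + text[s2:t2] + " [/E2]" + text[t2:])

sample = mark_pair("Took Aspirin and got a headache afterwards.", (5, 12), (23, 31))
inputs = tokenizer(sample, return_tensors="pt")
with torch.no_grad():
    pred = model(**inputs).logits.argmax(dim=-1).item()
print(RELATIONS[pred])  # untrained head -> arbitrary label, shown for shape only
```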
This paper poses and solves the problem of using artificial intelligence methods for processing big volumes of geodata from geological and technological measurement stations in order to identify and predict complications during well drilling. The volumes of geodata from the stations of geological and technological measurements during drilling varied from a few to tens of terabytes. Digital modernization of the life cycle of well construction using machine learning methods contributes to improving the efficiency of drilling oil and gas wells. Clustering of big volumes of geodata from various sources and types of sensors used to measure parameters during drilling has been carried out. In the process of creating, training, and applying software components with artificial neural networks, the specified calculation accuracy was achieved, and hidden, non-obvious patterns were revealed in big volumes of geological, geophysical, technical, and technological parameters. To predict the operational results of drilling wells, classification models were developed using artificial intelligence methods. The use of a high-performance computing cluster significantly reduced the time spent on assessing the probability of complications and on predicting these probabilities 7-10 minutes ahead. A hierarchical distributed data warehouse has been formed, containing real-time drilling data in the WITSML format and using Microsoft SQL Server. The module for preprocessing and uploading geodata to the WITSML repository uses the Energistics Standards DevKit API and Energistics data objects to work with geodata in the WITSML format. The drilling-problem forecast accuracy reached with the developed system may significantly reduce the non-productive time spent on eliminating stuck pipe, mud loss, and oil and gas influx events.
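To make the short-horizon forecasting setup concrete, a minimal sliding-window sketch (window length, horizon, and classifier choice are assumptions for illustration, not the system's actual parameters) could be:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def make_windows(series, labels, window=60, horizon=10):
    """Build (features, target) pairs: each window of past sensor readings is
    used to predict whether a complication occurs `horizon` steps later."""
    X, y = [], []
    for t in range(window, len(series) - horizon):
        X.append(series[t - window:t].ravel())
        y.append(labels[t + horizon])
    return np.asarray(X), np.asarray(y)

# Placeholder telemetry (one reading per minute, 3 channels) and event labels.
telemetry = np.random.rand(2000, 3)
events = (np.random.rand(2000) > 0.98).astype(int)  # rare complications

X, y = make_windows(telemetry, events)
model = GradientBoostingClassifier().fit(X, y)
print(model.predict_proba(X[:5])[:, 1])  # probability of a complication ahead
```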