
Briefings in Bioinformatics, 2024, 25(1), 1–14
https://doi.org/10.1093/bib/bbad514
Review
From data to diagnosis: how machine learning is revolutionizing biomarker discovery in idiopathic inflammatory myopathies
Emily McLeish, Nataliya Slater, Frank L. Mastaglia, Merrilee Needham and Jerome D. Coudert
Corresponding author. J.D. Coudert, Centre for Molecular Medicine and Innovative Therapeutics, Murdoch University, Murdoch, WA, Australia.
Tel.: +61 8 9360 1366. E-mail: [email protected]
Abstract
Idiopathic inflammatory myopathies (IIMs) are a heterogeneous group of muscle disorders including adult and juvenile dermatomyositis, polymyositis, immune-mediated necrotising myopathy and sporadic inclusion body myositis, all of which present with variable symptoms and disease progression. The identification of effective biomarkers for IIMs has been challenging due to the heterogeneity between IIMs and within IIM subgroups, but recent advances in machine learning (ML) techniques have shown promise in identifying novel biomarkers. This paper reviews recent studies on potential biomarkers for IIM and evaluates their clinical utility. We also explore how data analytic tools and ML algorithms have been used to identify biomarkers, highlighting their potential to advance our understanding and diagnosis of IIM and improve patient outcomes. Overall, ML techniques have great potential to revolutionize biomarker discovery in IIMs and lead to more effective diagnosis and treatment.
Keywords: idiopathic inflammatory myopathies; machine learning; biomarkers; myositis-specific autoantibodies
INTRODUCTION
Idiopathic inflammatory myopathies (IIMs) encompass a diverse group of disorders, including adult dermatomyositis (ADM), juvenile dermatomyositis (JDM), anti-synthetase syndrome (ASS), overlap myositis (OM), polymyositis (PM), immune-mediated necrotising myopathy (IMNM), interchangeably referred to as necrotising autoimmune myopathy, and sporadic inclusion body myositis [1]. Biomarkers are ‘a defined characteristic that is measured as an indicator of normal biological processes, pathogenic processes or responses to an exposure or intervention’ [2]. They have emerged as powerful tools for diagnosis, predicting disease prognosis and identifying therapeutic targets. For example, lymphocytes expressing Bcl-2 and CCR4 are indicative of anti-HMGCR+ IMNM [3], and DM skeletal muscle biopsies have upregulated interferon (IFN)-stimulated gene signatures, indicating a role for type 1 IFNs in DM pathogenesis [4, 5]. These findings have led to promising mechanism-based treatments, such as tofacitinib or ruxolitinib (JAK/STAT inhibitors), which have been shown to reduce serum IFN-I levels and improve skin lesions and muscle weakness in DM patients [6, 7].
However, the heterogeneity in symptoms and disease progression within IIM subgroups often poses additional challenges to identifying effective biomarkers. Thus, a successful biomarker would not only accurately distinguish IIM from other conditions that can present with similar symptoms, such as muscular dystrophies or metabolic myopathies, but also differentiate one IIM from another [8]. In addition, the capacity to identify patients across the spectrum of disease severities or to stratify rapidly progressing patients would be invaluable for disease management. Currently, a thorough evaluation that includes a combination of clinical, laboratory, radiological and pathological assessments is necessary to establish an accurate diagnosis.
This paper reviews recent studies on potential biomarkers for IIM and assesses their clinical utility. We also explore data analytic tools and machine learning (ML) algorithms that have proven valuable for biomarker discovery, highlighting their potential to advance our understanding of IIM and improve patient outcomes.
ML approaches for biomarker discovery
ML algorithms have revolutionized this field of biomedicine. Inflammatory myopathies have been investigated using various ML techniques, such as clustering algorithms, principal component analysis (PCA) and deep neural networks. These models learn complex relationships between variables, handle missing or noisy data, and assist in making real-time predictions [9]. In addition to diagnosing diseases, ML algorithms provide valuable insights into therapeutic outcomes in various diseases, allowing clinicians to tailor treatment plans based on a patient’s predicted response to therapy. It is important to note, however, that further research is still needed to validate their accuracy and determine their clinical utility. Nonetheless, the potential for ML to revolutionize biomarker discovery and therapeutic outcomes in various diseases including IIM is becoming increasingly evident.
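As a rough illustration of what a clustering algorithm does with biomarker measurements, the sketch below runs a minimal k-means on six invented patients described by two hypothetical marker values. The data, the function names and the choice of starting centroids are all assumptions made for illustration; none of them come from the studies reviewed here.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points: Sequence[Point], centroids: List[Point],
            n_iter: int = 10) -> Tuple[List[int], List[Point]]:
    """Minimal k-means: alternate assignment and centroid update."""
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids: List[Point] = []
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids


# Hypothetical two-marker measurements (arbitrary units) for six patients.
points = [(1.0, 1.2), (0.8, 1.0), (1.2, 0.9),
          (5.0, 5.1), (4.8, 5.3), (5.2, 4.9)]
initial_centroids = [points[0], points[3]]  # assumed starting guesses

labels, centroids = k_means(points, initial_centroids)
print(labels)
print(centroids)
```

In practice the number of clusters and the starting centroids are not known in advance, which is one reason cluster assignments from such methods need independent clinical validation.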
Received: October 5, 2023. Revised: November 29, 2023. Accepted: December 17, 2023
© The Author(s) 2024. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which
permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Figure 1. Interactive starburst plot showing machine learning categories, subsets, and algorithms. The interactive starburst plot displays the main categories of machine learning, including supervised and unsupervised learning, and their corresponding subsets and algorithms. Users can explore the plot to gain insights into the various machine learning techniques and their applications. This is an interactive plot; follow the link: https://chart-studio.plotly.com/~Emilymc/3.embed
GAN: generative adversarial networks, RNN: recurrent neural networks, CNN: convolutional neural networks, AE: autoencoders, BIRCH: Balanced Iterative Reducing and Clustering using Hierarchies, OPTICS: Ordering Points To Identify the Clustering Structure, SARSA: State-Action-Reward-State-Action.
ML algorithms can be classified into five main categories (Figure 1): supervised, unsupervised, semi-supervised, reinforcement learning and deep learning [9, 10].
In supervised learning, the algorithm is trained on labelled data to learn a mapping from inputs to outputs. This approach is best applied for classification or regression tasks [9]. Examples include decision trees and support vector machines (SVMs).
In unsupervised learning, the algorithm is trained on unlabelled data to find patterns or relationships within the data [10]. This method is used for clustering or dimensionality reduction tasks. Examples include k-means clustering and PCA.
Semi-supervised learning involves training an algorithm from both labelled and unlabelled data. In this approach, the algorithm is provided with some labelled data to learn from, and then it translates this knowledge to make predictions on the unlabelled data.
In reinforcement learning, the algorithm learns to make decisions based on feedback from its environment [10]. This is often used in game playing or robotics. Examples of these algorithms include Q-learning and deep reinforcement learning networks.
Deep learning involves training artificial neural networks to recognize patterns in data. These neural networks are made up of layers of interconnected nodes that process and transform input data to produce an output [11]. These networks often possess multiple layers, allowing them to learn complex representations of the input data. Deep learning has been particularly successful in applications such as image and speech recognition, natural language processing and autonomous driving [11].
ML modelling versus statistical modelling
ML and statistical modelling (SM) are two popular approaches for analyzing medical and scientific data (Figure 2). Although ML and
Figure 2. The relationship between artificial intelligence, machine learning, deep learning and data science. The diagram highlights how these fields
build on each other to provide advanced solutions for data-driven problems. Figure created with Biorender.com
SM are related fields, they are not synonymous. Statistics is a branch of mathematics that deals with collecting, analyzing and interpreting data. ML, on the other hand, constitutes a branch of artificial intelligence dedicated to devising algorithms and models that enable computers to acquire knowledge from data and generate predictions [12]. Although ML has a strong mathematical foundation, incorporating statistical techniques like inference, hypothesis testing and regression analysis, it also extends beyond traditional methods with techniques like neural networks, deep learning and natural language processing to address complex problems.
In some cases, ML and SM overlap, as seen with logistic regression (LR) analysis. LR is conventionally considered a statistical model that determines the odds ratio (OR) based on binary outcomes, where the dependent variable has two possible categorical values [13]. For example, the binary outcome of not having or having a disease can be represented as 0 and 1, respectively. However, LR can also be viewed as a supervised ML model since it utilizes a training dataset to learn the relationship between predictor variables and the binary outcome. Once trained, the LR model can make predictions on new data by estimating the probability of the binary outcome [14].
One of the key differences between ML and SM is the focus of each field. Statistics is primarily concerned with comparing and summarizing data, while ML is focused on building predictive models [12, 15]. Furthermore, statistics typically deals with relatively small and carefully defined datasets, while ML mostly involves working with large and complex datasets.
ML models are powerful tools for exploratory research because they identify complex patterns in large and high-dimensional datasets; they can also handle missing data. They make no prior assumptions about the data and are flexible as they can be trained on new data input that becomes available. SM is a valuable approach, particularly when the underlying mechanisms of the data are known, and when the research question is confirmatory [12]. Both methods have their advantages and disadvantages, and researchers must consider their research aims and the nature of their data before deciding which approach is the most appropriate.
ML models for diagnosis of IIM and subgroups of IIM
Accurate diagnosis of a type of IIM is often challenging, as many muscle conditions possess overlapping clinical features and laboratory findings [8]. Recent ML models have been applied to multiple different patient datasets, including clinical, histopathological and imaging data, and have provided new opportunities for improving the accuracy and speed of IIM diagnosis.
Earlier diagnostic criteria for IIM such as the Bohan and Peter criteria predominantly focused on distinguishing between DM and PM, as many other myopathies such as IBM and IMNM had not yet been separated from PM [16]. These initial classifications heavily relied on clinical presentation and histology, both of which required a high level of medical expertise for interpretation (Table 1: Polymyositis and dermatomyositis diagnostic criteria). However, later adaptations of the Bohan and Peter criteria utilized ‘computer-assisted analysis’, although it is unclear which methods specifically this refers to in that study [17]. More recent criteria for IIM combine clinical, histological and serological features (predominantly myositis-specific or myositis-associated antibodies (MSA); see below). However, in many of these earlier publications, validation of the specificity and sensitivity for these criteria were
Table 1: Specificity, sensitivity and characteristics of various diagnostic and classification criteria for IIM adapted from Lundberg et al. [34]
Criteria | Symptom duration | Age | Muscle weakness | Muscle pain | Muscle biopsy | EMG | Muscle enzymes | Extramuscular features | MSA | Sensitivity and specificity
Polymyositis and dermatomyositis diagnostic criteria
Medsger et al. [19] X X X X X X Not validated
DeVere and Bradley [20] X X X X X X Not validated
Bohan and Peter [16] X X X X X 94.3%; 29.4%; Validated by [18]
Dalakas [21] X X X X X 88.6%; 47.1%; Validated by [18]
Tanimoto et al. [22] X X X X X X X 88.6%; 29.4%; Validated by [18]
Targoff et al. [23] X X X X X X 97.1%; 29.4%; Validated by [18]
Dalakas and Hohlfeld [24] X X X X X 77.1%; 99.9%; Validated by [18]
Hoogendijk et al. [25] X X X X X X X 71.4%; 82.4%; Validated by [18]
Oddis et al. [26] X X X X X 93%; 93%
IBM-specific diagnostic criteria
Griggs criteria [27] X X X X X X Sensitivity: 11%–100%; Specificity: 73%–100%; Validated by [29, 38]
2000 European Neuromuscular Centre (ENMC) criteria [28] X X X X X X Sensitivity: 46%–65%; Specificity: 98%–100%; Validated by [38]
2010 MRC Centre for Neuromuscul. Dis. [30] X X X X X X Sensitivity: 11%–73%; Specificity: 98%–100%; Validated by [38]
2013 European Neuromuscular Centre (ENMC) criteria [32] X X X X X X Sensitivity: 15%–84%; Specificity: 98%–100%; Validated by [38]
Lloyd et al. [38] X X 90%; 96%
a Criteria based on high-performing features from other criteria.
IMNM-specific diagnostic criteria
Triplett et al. [37] X X X X X X AUC ROC 97.1%
224th ENMC International Workshop (32) X X X X X X X 93%; 88%; Validated by [1]
Overlap myositis specific criteria
Troyanov et al. [39] X X X X Sensitivity 87%
a Modified Bohan & Peter clinico-serological classifications
All IIM diagnostic criteria
2017 EULAR/ACR [1] X X X X X X 93%; 88%; Reviewed by [33]; Sensitivity 80.9–99.6%; Reviewed by [36]
Mariampillai et al. [40] X X X X 77%; 92%
Eng et al. [35] X X X X AUROCs between 78% and 97% and AUPRCs between 55% and 96% for individual MSA
either not performed at the time or have been conducted in later studies [18–34] (Table 1).
A recent data-derived set of diagnostic criteria for IIM is the 2017 European League Against Rheumatism/American College of Rheumatology (EULAR/ACR) classification criteria [1] (Table 1: All IIM diagnostic criteria). These criteria include 12 clinical features including age, gender, the pattern of muscle weakness, skin manifestations, laboratory features such as elevated serum creatine kinase (CK) concentrations, presence of autoantibodies and histopathological features including the pattern of inflammation, perifascicular atrophy and vacuoles [1]. Each feature is scored, and the total value translates into diagnosis classification as either ‘definite’, ‘probable’ or ‘possible’ JDM, DM, ADM, IBM, or PM. The sensitivity and specificity of the 2017 EULAR/ACR classification were evaluated at 93% and 88%, respectively, and were greater when a muscle biopsy had been performed. Nonetheless, the criteria still performed well without a biopsy [1].
Despite significant progress in the diagnostic criteria of IIM, they are not exempt from limitations. One notable example is their failure to account for the disease heterogeneity and distinguish between IMNM and PM, while also being unable to assess myositis-specific autoantibodies other than anti-Jo-1. As a result, there are still challenges to accurately classify and subtype patients [35, 36]. This has highlighted the need for identifying novel biomarkers that can aid in the diagnosis, classification and prognostication of IIM, and ultimately improve patient outcomes. Over the last two decades, significant advances in computational capabilities have resulted in the development of more powerful ML models that are increasingly being applied in the medical field. The number of yearly publications that used ML for diagnosis and subclassification of IIMs increased by nearly 10 times between 2014 and May 2023 (Figure 3). ML has the potential to effectively tackle the heterogeneity of IIMs, offering a promising avenue to enhance the accuracy of predicting disease progression and outcome.
Figure 3. Number of publications using ML algorithms in IIM research. Bar graph showing the number of publications investigating the use of ML in the field of IIM between 2014 and 2023∗ (∗January to October only). The data presented herein have been derived from PubMed and are reflective of publications available as of 4 October 2023. These publications were identified using specific search criteria, employing the terms ‘Inflammatory myopathies and Machine Learning’. Figure created with Biorender.com
In 2020, a study by Triplett et al. [37] (Table 1: IMNM specific diagnostic criteria) described a novel criterion for diagnosing IMNM using ML regression analysis. In a cohort of 119 IMNM patients and 938 with other types of myopathy, a multivariate regression analysis of 20 variables identified eight predictors that could accurately distinguish IMNM from other myopathies (97% area under the receiver operating characteristic curve [AUC ROC]). These included statin exposure, increased CK levels (>1000 U/l), muscle weakness in the deltoid, gluteus maximus and finger extensors with sparing of the finger flexors and ankle dorsiflexors, and electrical myotonia. The authors determined that electrical myotonia was much more significant in IMNM than other forms of myopathy and could help improve the diagnosis of IMNM, particularly in cases where the disease has a chronic and indolent course, and where patients test negative for autoantibodies against 3-hydroxy-3-methylglutaryl-coenzyme-A reductase (HMGCR) or signal recognition particle (SRP54) [37].
Since 1987, 24 diagnostic criteria for IBM have been proposed by IBM experts (Table 1: IBM-specific diagnostic criteria). Although some of these criteria showed high specificity (97% or higher), their sensitivities varied widely. In response to this, Lloyd and coworkers [38] developed a new diagnostic criterion for IBM based only on the most effective features from the previous criteria and constructed using a range of classification ML algorithms. It includes a combination of three main parameters: weakness of finger flexors or quadriceps, endomysial inflammation, and invasion of non-necrotic muscle fibres or rimmed vacuoles. This new criterion was tested on 371 patients and reported a sensitivity of 90% and a specificity of 96%.
OM is a term used to describe patients with an inflammatory myopathy that occurs together with other connective tissue disorders (CTDs) such as systemic sclerosis (SSc, scleroderma), systemic lupus erythematosus (SLE), rheumatoid arthritis (RA), Sjogren’s syndrome (SS), or mixed connective tissue diseases (MCTDs) [39]. Alternatively, some authors consider a definition of OM if certain MSA are present even without clinical features of CTD. The diagnostic classification of OM was reviewed by Troyanov and colleagues, who developed a new classification system, placing overlap features at the core, and compared it with the original Bohan and Peter classification [39]. They found that the modified classification that includes overlap antibodies led to a higher frequency of OM diagnosis than the original classification. The modified classification also showed better sensitivity for identifying OM patients. The authors identified different types of OM-related antibodies that can be used as biomarkers for different disease courses and treatment responses. They proposed that the new classification has diagnostic, prognostic and therapeutic value and should replace the original classification (Table 1: Overlap myositis specific criteria).
In a study by Eng et al. [35], IIM patients from a previous Rituximab trial were stratified into five groups using similarity network fusion (SNF). SNF is a powerful ML approach for combining multiple types of biological and clinical data and is designed to uncover hidden relationships and clusters within patients by leveraging similarity information between data points. The five patient group assignments were then predicted using a sparse multinomial regressor. The outcomes, with areas under the receiver operating characteristic curve (AUROCs) ranging from 78% to 97% and areas under the precision-recall curve (AUPRCs) spanning from 55% to 96%, indicate the feasibility of stratifying IIM groups based on the presence of MSA. The presence of anti-Mi-2 and anti-synthetase autoantibodies (more commonly referred to as anti-histidyl tRNA synthetase antibodies) was observed in adult DM. Conversely, anti-NXP2 autoantibodies were associated with juvenile DM. Furthermore, within the PM subgroup, notable observations included a reduction in IgM levels and the presence of anti-SRP autoantibodies. While these findings might align more closely with the features of IMNM, it is imperative to consider that the study participants were enrolled prior to the introduction of the IMNM classification. Consequently, their inclusion in the PM subgroup, despite the characteristic findings, is rooted in the context of the study’s pre-existing classification criteria [35].
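The performance figures quoted for these criteria (sensitivity, specificity, AUROC and AUPRC) can be made concrete with a small from-scratch sketch. The labels and scores below are invented for illustration and are not drawn from any cited study; the function names are hypothetical, and AUPRC is approximated here by average precision, which is one common estimator rather than necessarily the one used by the authors.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int],
                            y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: mean precision at the rank of each true positive."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Hypothetical data: 1 = disease present, 0 = absent; scores from some classifier.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed decision threshold of 0.5

sens, spec = sensitivity_specificity(y_true, y_pred)
roc = auroc(y_true, scores)
ap = average_precision(y_true, scores)
print(sens, spec, roc, ap)
```

Sensitivity and specificity depend on the chosen threshold, whereas AUROC and average precision summarize ranking quality across all thresholds, which is why studies often report both kinds of metric.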
In another study, Mariampillai and colleagues stratified IIM patients using unsupervised multiple correspondence analysis and hierarchical clustering [40]. This approach successfully categorized the patients into four well-defined groups: DM, IBM, IMNM and ASS; the patients who had previously been diagnosed with PM fell into the IMNM or ASS groups, suggesting that PM was no longer considered a separate diagnosis. Additionally, the algorithm showed that once again, MSA and myositis-associated antibodies (MAAs) were crucial for IIM classification, emphasizing the utility of autoantibody detection for accurate diagnosis [40]. Further study will determine whether serological testing could replace the need for muscle biopsies. Both of these studies have highlighted the potential of unsupervised ML algorithms for identifying clinically and biologically homogeneous patient groups and underscored the unique contribution of ML for identifying biomarkers for IIM.
Clinical utility of autoantibodies in IIM
Myositis-specific antibodies (MSAs), detected in approximately 60–70% of IIM patients, have emerged as pivotal biomarkers and offer a unique lens to dissect the heterogeneity within the IIM spectrum. Anti-Jo-1 autoantibodies are detected in ASS, while anti-Mi-2 autoantibodies, which represent the most extensively studied MSAs, are specific for DM [41]. Other examples of MSAs that are strongly associated with IMNM include antibodies to the SRP and to 3-hydroxy-3-methylglutaryl CoA reductase (HMGCR) [42, 43]. Anti-cytosolic 5′-nucleotidase 1A (cN1A) antibodies have been found in autoimmune diseases such as SS and SLE; however, in the context of IIMs, they are restricted to IBM [44, 45].
The clinical utility of MSAs extends beyond their ability to differentiate subgroups of IIMs. They have demonstrated an association with distinct clinical attributes, thereby aiding in prognosis prediction and treatment planning. For instance, the presence of certain MSAs, like anti-TIF1-γ, anti-NXP2 and anti-HMGCR, has long been linked to an elevated risk of malignancy in IIM patients (described in more detail below) [46–49]. Moreover, anti-MDA5 antibodies are often found in amyopathic DM and are a risk factor for rapidly progressive interstitial lung disease (ILD), particularly among Eastern-Asian populations [41]. Furthermore, anti-HMGCR IMNM, particularly when statin-associated, is often associated with a good response to treatment [42]; in contrast, approximately 30% of IMNM patients with anti-SRP antibodies are refractory to steroid treatments [43].
Using unsupervised hierarchical clustering analysis, Allenbach and colleagues [50] explored the phenotypic landscape of anti-MDA5 antibody positive patients and found that patients could be stratified into three different subgroups. In the first subset, patients faced a rapidly progressing ILD which also corresponded to an elevated mortality rate. The second cluster predominantly displayed dermatological and rheumatological symptoms, offering a more favourable prognosis. Lastly, the third group exhibited severe skin vasculopathy, were mostly male, and had an intermediate prognosis in comparison to the other two patient groups [50].
The presence of anti-cN1A antibodies has been proposed as a potential biomarker for IBM but its diagnostic and prognostic significance remains uncertain. Sensitivity and specificity of anti-cN1A detection for IBM diagnosis have shown wide variability, ranging from 32.8% to 88.6% and 80% to 100%, respectively [45]. A meta-analysis by Mavroudis and colleagues explored its diagnostic utility using Bayesian models [51]. Bayesian models are a statistical approach that integrates prior knowledge, sourced from previous studies or expert assumptions about the data. These models are updated based on new data or information, generating posterior probabilities. This adaptability makes them valuable in decision-making processes and modelling scenarios with uncertain or limited data. While not strictly considered ML, Bayesian models are used in data analysis and diagnostics [52]. Contrary to other studies, Mavroudis and colleagues found that anti-cN1A antibodies could not effectively discriminate IBM from other conditions like PM/DM and motor neurone disease (MND). However, the variability in testing methodologies used across studies has introduced potential bias, given the lack of standardized protocols [51]. Furthermore, studies of the prognostic value of anti-cN1A antibodies in IBM have produced inconsistent conclusions. Several studies noted limited prognostic value [53–55], while another study reported that seropositive patients showed increased mortality risk, less proximal upper limb weakness at disease onset, and more cytochrome c oxidase (COX)-deficient muscle fibres [56].
The use of ML algorithms to identify unique MSA profiles associated with distinct clinical features has greatly improved IIM stratification. However, it is important to acknowledge that there are various methods used for antibody detection, with each method having varying degrees of specificity and sensitivity [57]. As discussed in the instance of anti-cN1A antibodies in IBM, discrepancies in methodologies can lead to contradictory results regarding the autoantibodies’ clinical utility. This illustrates the crucial point that the reliability of computational methods for biomarker discovery is limited by the accuracy of the detection methods. As such, careful consideration and validation of detection methodologies are imperative for accurate and meaningful results.
Immunophenotyping as a tool for identifying biomarkers in inflammatory myopathies
Immunophenotyping has become an essential tool for uncovering novel biomarkers for inflammatory myopathies. This entails a systematic exploration of the subsets, activation state and differentiation pattern of immune cells across various biological samples such as peripheral blood, muscles and other affected tissues. This approach can also involve deciphering the cytokines and chemokines that these cells produce. While certain studies have utilized ML algorithms to analyze extensive immunophenotyping data produced by techniques like flow cytometry and mass cytometry, there appears to be a preference for dimensionality reduction techniques in the broader landscape [58, 59]. Techniques such as Uniform Manifold Approximation and Projection for Dimension Reduction (UMAP) and t-distributed stochastic neighbour embedding (t-SNE/viSNE) are dimensionality reduction algorithms [60], whereas FlowSOM (Self-Organizing Map) and Spanning-tree Progression Analysis of Density-normalized Events (SPADE) cluster cells based on similarities in their surface markers [61–65]. They have become a prominent feature in the analysis of single-cell technologies, including flow and mass cytometry, as well as scRNA-seq (Figure 4). They are often considered an improved alternative to manual gating, as they offer an unbiased and systematic exploration of the data [64]. For instance, Dzangué-Tchoupou and colleagues [58] performed comprehensive immune profiling of peripheral blood cells from 18 IBM patients, 26 other IIM patients and 16 healthy controls (HC) through mass cytometry. By leveraging SPADE, CITRUS and classification and regression trees (CART) algorithms, along with receiver operating characteristic curves, they identified that a frequency of CD8+ T-bet+ cells exceeding 51.5% provided a potential diagnostic biomarker specific to IBM exhibiting high sensitivity and
Figure 4. Pipeline for the main steps in the FlowSOM analysis. (A) Data preparation and quality control checks. (i) The fcs-files are read, (ii) compensated, (iii) QC checked and (iv) concatenated. (B) FlowSOM model training and evaluation of model quality. (v) The model is trained and the visualization is shown as a minimum-spanning tree, which is composed of multiple inter-connecting nodes. (vi) Each node comprises a star chart of different colours, each representing an immune marker. (vii) Example of a star chart with mean immune marker values. (C) Analysis of the FlowSOM model using other visualization tools such as (viii) clustering analysis via t-SNE map, (ix) heatmaps or (x) differential analysis, which can be used to infer biological conclusions about the data. Figure created with Biorender.com
specificity. Similarly, Wilfong et al. [59] dissected mass cytometry data by integrating t-SNE, CITRUS and marker enrichment modelling (MEM). The authors revealed shared immunological features across 17 IIM patients (6 DM, 4 PM, 7 ASS) including a decreased expression of the activation marker CD180 on B cells and the homing marker CXCR3 on T cells, relative to healthy controls. Additionally, two distinct subgroups of IIM patients could be delineated. The first group demonstrated an upregulation of CXCR4 across all cell populations, with the authors suggesting this upregulation may be associated with increased disease severity. Alternatively, an increased frequency of CD19+ CD21lo CD11c+ and CD3+ CD4+ PD1+ cells delineated the second IIM subset and represented a pro-fibrotic phenotype [59].
Supervised classification algorithms, like SVMs and random forests (RF), are also popular methods for analyzing immunophenotyping data. These algorithms recognize patterns and complex relationships between surface and intracellular marker expression, fuelling predictive models for disease diagnosis and prognosis. In a study by Ye and colleagues, immune signatures were scrutinized in 82 amyopathic dermatomyositis with interstitial lung disease (ADM-ILD) patients and 82 HC [66]. Patients were stratified based on their immune cell subset frequencies using hierarchical clustering analysis followed by supervised ML methods (Balanced Random Forest Model) to identify the subsets of predictive value. The study identified two distinct clusters correlating with different disease activities and clinical outcomes in ADM-ILD. Cluster 1 was characterized by an enrichment of activated CD45RA+, HLA-DR+ and CD8+ T cells with a decreased proportion of the CD56dim NK cell subset that correlated with a higher prevalence of rapidly progressive ILD and a higher mortality rate. In contrast, cluster 2 was characterized by abundant non-activated T cells and had favourable clinical outcomes, with a higher survival rate over 6 years than cluster 1. These findings suggest that peripheral immunological features may be used to stratify ADM-ILD patients and correlate with differential disease severity and clinical outcomes [66]. Through hierarchical clustering and Balanced Random Forest Models, distinct clusters surfaced, bearing correlations with different disease activities and outcomes. Notably, these clusters showcased a variety of immune cell subset frequencies, each tied to a divergent prognosis. Similar strategies were adopted in delineating 421 DM patients with anti-MDA5 antibodies into three distinct prognostic clusters based on lymphocyte counts [67]. Specifically, the arthritis-associated cluster demonstrates elevated lymphocyte counts and has the most favourable prognosis, suggesting a subset with a more positive disease trajectory. In contrast, the rapidly progressive interstitial lung disease (RP-ILD) cluster is characterized by the lowest peripheral lymphocyte levels and an unfavourable prognosis, highlighting a subgroup with heightened
disease severity. Additionally, the cluster associated with the typical DM rash presents a moderate peripheral lymphocyte count, indicating an intermediate prognosis and offering a nuanced understanding of disease outcomes within this particular subgroup.
Overall, immunophenotyping has become an essential tool for identifying novel biomarkers and understanding the complex immune system dysregulation that occurs in inflammatory myopathies. Dimensionality reduction and ML algorithms, particularly those involving unsupervised clustering, have revolutionized the way researchers approach data analysis in single-cell technologies. They allow for the unbiased identification of cell populations, which provides insights into the aetiopathology of disease, offering new avenues for more accurate diagnosis and the identification of novel therapeutic targets. Moreover, these findings underscore the heterogeneity of inflammatory myopathies and the potential utility of distinct biomarker profiles in predicting and managing diverse clinical trajectories.
Leveraging the power of ML on multi-omic data helps unveil mechanism-based pathways in IIM
Multi-omics profiling is a rapidly emerging field that aims to integrate data from genomics, transcriptomics, proteomics and metabolomics to obtain a comprehensive understanding of biological systems. The generation of multi-omic meta datasets has significantly increased the complexity of analysis, which demands greater computational power for processing. The majority of multi-omic studies in IIM have utilized supervised classification-based methods such as SVMs, linear regression and RFs as well as dimensionality reduction methods such as Partial Least Squares Discriminant Analysis (PLS-DA). These methods identify patterns and complex relationships in the data that would be difficult to identify using traditional statistical methods alone.
Using high-throughput RNA sequencing in muscles isolated from 18 IMNM patients and 10 HC, Chen and coworkers [68] identified 193 differentially expressed genes associated with inflammatory immune responses, cardiac muscle contraction, skeletal muscle regulation and lipoprotein metabolism. Three feature genes, LTK, MYBPH and MYL4, which are associated with the autophagy-lysosome pathway and muscle inflammation, were identified as potential biomarker genes for IMNM with an accuracy of 97.3% using the least absolute shrinkage and selection operator (LASSO) and SVM-recursive feature elimination (SVM-RFE) algorithms [68].
Pinal-Fernandez and colleagues applied ML algorithms to muscle isolated from 20 HC and 119 myositis patients (39 with DM, 49 with IMNM, 18 with anti-Jo1-positive AS and 13 with IBM). RNA-transcriptomic analysis found over 10,000 unique gene expression patterns that distinguish DM, AS, IMNM and IBM from HC [69]. The support vector ML algorithm demonstrated >90% accuracy in classifying patients. Further investigations using recursive feature elimination identified genes that were overexpressed in one type of myositis. For instance, CAMK1G, EGR4 and CXCL8 transcripts were increased in AS, but neither in DM nor in other types of myositis. Additionally, the same method identified genes uniquely overexpressed in various MSA-defined myositis subtypes: for example, APOA4 was significantly overexpressed in anti-HMGCR-positive myopathy, and mucosal vascular addressin cell adhesion molecule 1 (MADCAM1) was overexpressed in anti-Mi2-positive DM. These findings demonstrated the potential of ML to identify genes related to specific myositis types and MSA-defined subtypes [69].
In addition to genomics, other investigative approaches include metabolomics. It can be applied to various biofluids, including blood and urine, which are more easily accessible than invasive muscle biopsies and traditional histological analysis. ML models have emerged as valuable tools for identifying biomarkers and unravelling molecular mechanisms from metabolomic data. In a recent study conducted by Liu et al., supervised classification algorithms such as RF and AdaBoost were effectively employed to detect perturbations in metabolic pathways across various subtypes of IIMs. The study encompassed 52 healthy donors and 79 patients with major IIM subtypes, including DM, ASS, IMNM and MSA-defined subtypes, such as anti-Mi2+, anti-MDA5+, anti-TIF1-γ+, anti-Jo1+, anti-PL7+, anti-PL12+, anti-EJ+ and anti-SRP+. The analysis revealed significant disturbances in fatty acid biosynthesis in both plasma and urine samples, with several metabolites exhibiting differential expression across various IIM subtypes. Notably, creatine in plasma was identified as a potential specific biomarker for IMNM, while tiglylcarnitine in urine showed promise as a distinctive biomarker for the anti-glycyl tRNA synthetase (anti-EJ) subtype of ASS. Additionally, 16 shared metabolites were detected among the plasma and urine samples of different IIM subtypes [70].
Kang et al. [71] conducted a comparative study using ML techniques to identify metabolic differences among IIM patients, 30 ankylosing spondylitis (AS) patients and 10 HC. They employed supervised ML models, including linear regression, RF and SVMs, and discovered seven distinct metabolites, including branched-chain amino acids (BCAAs), biogenic amines and lipids, that effectively distinguished IIM patients from both healthy controls and AS groups. Notably, elevated levels of specific amino acids, like BCAAs, were associated with inflammation through mTORC1 activation. The study also explored metabolic changes in skeletal muscles using a mouse model of IIM induced by C-protein immunogens, identifying 68 significantly altered metabolites. Pathway analysis indicated a significant decrease in spermine and spermidine levels, indicative of polyamine pathway downregulation. Furthermore, changes in metabolites related to beta-alanine and histidine metabolism suggested potential muscle cell damage during inflammation [71].
In another study, a combined metabolomic and transcriptomic analysis of 14 IBM muscle samples revealed specific metabolic alterations [72]. Employing the widely used Partial Least Squares Discriminant Analysis (PLS-DA) model, the researchers deciphered complex relationships, identifying 198 metabolites linked to upregulated histamine biosynthesis and associated with an accumulation of mast cells in IBM. The glycosaminoglycan pathways were notably upregulated, as evident from the excessive chondroitin sulphate levels observed in both metabolomic and transcriptomic analyses. Histopathological examinations further corroborated these findings, confirming the presence of substantial chondroitin sulphate accumulations within the muscle tissues of IBM patients. Notably, deficiencies in key energy metabolism molecules, carnitine and creatine, were also unveiled, suggesting potential biomarker avenues for IBM treatment through diet supplementation [72].
The field of multi-omics profiling has rapidly evolved to gain a comprehensive understanding of IIM by integrating data from genomics, transcriptomics, proteomics and metabolomics. Noteworthy findings include identification of genetic biomarkers associated with the autophagy–lysosome pathway and muscle inflammation in IMNM. Pinal-Fernandez and colleagues utilized ML algorithms to classify distinct myositis subtypes based on unique
Figure 5. Building blocks of a typical CNN applied to an image. Convolutional layer: a set of filters is learned during training and applied to the input image to extract features at different spatial locations. Each filter convolves over the input image to produce a feature map. Pooling layer: the pooling layer down-samples the output of the convolutional layer, reducing the spatial dimensions of the feature maps while retaining the important features. Fully connected layer: the fully connected layer produces the final output of the network. It takes the flattened output from the previous layer and applies a set of weights to produce a vector of outputs. Figure created with Biorender.com.
gene expression patterns [69]. Metabolomic studies revealed perturbations in metabolic pathways across IIM subtypes, with specific biomarkers identified for IMNM and different MSA-defined IIM subtypes [69, 70]. The combined metabolomic and transcriptomic analysis in IBM uncovered specific metabolic alterations associated with histamine biosynthesis, glycosaminoglycan pathways and deficiencies in key energy metabolism molecules, presenting potential therapeutic avenues. These studies collectively highlight the power of multi-omics approaches and ML techniques in uncovering intricate molecular signatures, biomarkers and potential therapeutic targets.
ML approaches for analyzing medical images in IIMs
ML has been widely used to analyze medical images for tasks such as segmentation, classification and diagnosis. Deep learning models, particularly convolutional neural networks (CNN), have shown great success in various medical imaging applications, including radiology, ophthalmology and pathology [73]. CNNs are specifically designed to work with images, and their success lies in their ability to learn hierarchical representations of visual features directly from the raw input data (Figure 5). They have shown superior performance compared to traditional ML methods such as SVMs and RFs. Kabeya and colleagues trained a CNN on muscle biopsy images from patients with PM, DM and IBM, as well as healthy controls. This model accurately differentiated these IIMs from hereditary muscle diseases, with a sensitivity and specificity that outcompeted specialist physicians [74].
Texture image analysis (TA) is a common method used in radiomics, which involves the extraction of quantitative features from digital imaging data (CT, MRI, ultrasound, PET) to characterize the underlying tissue properties [75]. TA quantifies texture attributes like roughness or smoothness, furnishing supplementary diagnostic or prognostic insights. Nagawa et al. [76] employed TA on MRI data from 55 IIM and 19 non-IIM patients. Several ML models classified TA features, unveiling disease activity trends in IIM subgroups and distinguishing anti-Jo-1 and anti-aminoacyl tRNA synthetases (ARS) IIM subgroups. However, the approach showed limited ability to differentiate IIM from non-IIM samples.
Deep learning approaches such as unsupervised novelty detection (ND) aid medical image analysis by training on healthy data to detect deviations. For example, Burlina and colleagues used ND on 3586 ultrasound images obtained from 89 subjects, including 35 controls and 54 with myositis, achieving a baseline ROC AUC of 71.92% (reported with a 95% CI error margin). These promising results indicated the potential of implementing this method as a prescreening tool for myopathies [77]. In a similar study, a DL neural network applied to whole-body MRI achieved correct classification percentages of 69–77%, and comparable diagnostic prowess to radiologists in distinguishing facioscapulohumeral muscular dystrophy (FSHD1) from myositis. DL even corrected radiologists’ misclassifications, showcasing its capacity to generate accurate diagnoses from MRI data [78].
In conclusion, ML has shown its potential to become an indispensable tool in medical image analysis, providing significant advantages over classical human-made analysis of IIM biopsies. DL models such as CNNs have proven to be highly effective in distinguishing between different types of muscle diseases, and in some instances outcompeted specialist physicians. TA provides additional information to aid in diagnosis or prognosis by quantifying the underlying tissue properties. Unsupervised ND provides a promising prescreening tool for myopathies given its effectiveness at identifying abnormal or novel patterns in imaging data. These advanced ML techniques applied to imaging have the potential to provide faster and more accurate diagnoses and to prevent the patient discomfort associated with invasive conventional muscle biopsy.
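The novelty-detection idea described in this section (learn a profile from healthy data only, then flag deviations and summarize performance as a ROC AUC) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used in [77]: the feature vectors are made-up toy values standing in for image-derived features, a per-feature z-score distance stands in for a deep model, and the function names `fit_healthy_profile`, `novelty_score` and `roc_auc` are hypothetical.

```python
from statistics import mean, pstdev


def fit_healthy_profile(healthy):
    """Learn per-feature mean and standard deviation from healthy samples only."""
    columns = list(zip(*healthy))
    means = [mean(c) for c in columns]
    stds = [pstdev(c) or 1.0 for c in columns]  # guard against zero spread
    return means, stds


def novelty_score(sample, profile):
    """Sum of squared z-scores: larger means further from the healthy profile."""
    means, stds = profile
    return sum(((x - m) / s) ** 2 for x, m, s in zip(sample, means, stds))


def roc_auc(scores, labels):
    """AUC as the probability that an abnormal sample outscores a normal one.

    Ties between an abnormal and a normal score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy, made-up feature vectors (assumption: two features per image).
healthy_train = [(1.0, 2.0), (1.2, 1.8), (0.9, 2.1), (1.1, 2.2)]
test_samples = [(1.0, 2.0), (1.1, 1.9), (2.5, 3.5), (0.2, 0.5), (1.3, 2.6)]
test_labels = [0, 0, 1, 1, 1]  # 1 = abnormal, 0 = healthy

profile = fit_healthy_profile(healthy_train)
scores = [novelty_score(s, profile) for s in test_samples]
auc = roc_auc(scores, test_labels)
print(f"ROC AUC on toy data: {auc:.2f}")
```

In this toy set every abnormal sample lies further from the healthy profile than every healthy test sample, so the AUC is 1.0; real imaging data would be far noisier. The design point is that no abnormal examples are needed for training, which is what makes the approach attractive for prescreening when labelled disease images are scarce.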
Machine learning models for predicting patients’ response to treatments
Beyond biomarker discovery for disease diagnosis and prognosis, biomarkers can assist clinicians to make informed decisions regarding patient treatment strategies. ML has also proven useful in predicting treatment responses. For instance, in a study of 51 IIM patients (DM, PM, ASS, IMNM), demographic, clinical and serological parameters were evaluated to determine the most effective predictors of patients’ response to intravenous and subcutaneous administration of immunoglobulins [79]. Previously, the evaluation of five supervised ML models showed that elastic net regression, which combines features of both Ridge regression and Lasso regression, was the most effective model for this application [80]. The authors determined that dysphagia, skin disorders and the myositis activity index (MITAX) were good predictors of muscle strength (as measured by the manual muscle testing of eight groups (MMT8)) and found that IVIg therapy yielded better results in patients with more active systemic disease [79].
Anti-SRP antibody-positive IMNM patients are refractory to corticosteroids [43], and several clinical risk factors are associated with refractory disease, including being male, severe muscle weakness and concurrent ILD. In addition, the extent of fatty infiltration of thigh muscles over time has been identified as a predictor of treatment response. ML algorithms have been used to analyze these pathological factors. Elevated expression of B cell activating factor receptor (BAFF-R) in muscle tissue has been identified as a predictor of refractory disease in SRP-positive IMNM patients. Leveraging these factors related to refractory disease and using ML-based predictive models may critically help healthcare professionals to better identify at-risk patients and adjust care plans.
ML approaches for predicting comorbidities
IIMs are complex multisystem autoimmune disorders, involving inflammation and immune system dysfunctions that affect not only skeletal muscle but also other tissues and organs, including skin, joints and lungs [81]. Given the spectrum of systems that can be affected, individuals with IIM often experience comorbidities. These comorbidities can range from rheumatic diseases to ILD, reinforcing the multifaceted nature of IIM. Understanding and managing these comorbidities are essential aspects of comprehensive patient care.
As previously mentioned, certain subtypes of myositis are associated with an increased risk of malignancy, such as DM patients with anti-TIF1-γ [46]. It is estimated that one third of myositis patients will develop a malignancy. In fact, malignancy is the leading cause of death in adults with IIM [81]. Zhao et al. employed various ML techniques, including Sankey diagrams, elastic net, RF, multidimensional scaling and hierarchical clustering, to categorize subtypes of anti-TIF1-γ+ myositis and assess the most critical factors for predicting cancer risk [46]. Among the patients studied, 54% had cancer, typically diagnosed within 6 months of myositis diagnosis. The anti-TIF1-γ+ myositis patients were grouped into low, intermediate or high cancer risk subtypes. Key predictive features included disease duration, blood lymphocyte percentage, neutrophil percentage, neutrophil-to-lymphocyte ratio, gender, C-reactive protein (CRP) levels, shawl sign, arthritis/arthralgia, V-neck sign and anti-PM-Scl75 antibodies. Notably, RF achieved an accuracy of over 90%, underscoring the potential of ML models in aiding physicians in selecting appropriate cancer screening strategies for anti-TIF1-γ+ myositis patients [46].
Also, Zhang and colleagues performed LR modelling in a cohort of 168 IIM patients including DM, PM, ASS and IMNM to determine the key features that could be used for malignancy prediction [82]. Three predictors (age, alanine aminotransferase (ALT) < 80 U/L and seropositivity for anti-TIF1-γ antibodies) were identified as positive predictors for malignancy, while ILD was found to be a negative predictor of malignancy. The LR model was as good as or better than the other ML models, including RF, neural network and extreme gradient boosting, at predicting malignancy. The AUC of the ROC was determined at 78.4% [82].
In another study, the researchers examined the medical records of 397 patients with IIM to identify potential risk factors for ILD, other rheumatic diseases and malignancies [83]. Antibodies such as anti-PM/Scl, anti-Ro52, anti-aminoacyl-tRNA synthetase and anti-MDA5 constituted risks for ILD. Raynaud’s phenomenon, arthralgia and anti-nuclear antibodies were found to be prominent risk factors for other overlapping rheumatic diseases. For IIM patients with associated malignancies, being male and the presence of anti-TIF1-γ antibodies were risk factors. Hierarchical clustering generated a subclassification into six subgroups including (1) malignancy overlapping DM, (2) classical DM, (3) PM with severe muscle involvement, (4) DM with ILD, (5) PM with ILD and (6) overlapping of myositis with other rheumatic diseases [83].
Overall, ML modelling provides numerous benefits to healthcare providers, such as rapidly identifying patients who stand to gain from specific treatments or who may alternatively be susceptible to adverse reactions. Additionally, ML models can help discern patients at risk of certain comorbidities, enabling implementation of targeted interventions.
Advantages and limitations of ML for biomarker discovery
As ML algorithms become increasingly prevalent in the biomedical field, it is important to note that the implementation and interpretation of these models requires both expertise in data analysis
Table 2: Advantages and limitations of ML for biomarker discovery
Advantages | Limitations
Can handle large amounts of data | May require significant computational resources
Can detect complex patterns in data | May be prone to overfitting or underfitting data
Can be used for real-time prediction | May require significant training time
Can improve diagnostic accuracy | May be limited by the quality and completeness of data
Can identify new biomarkers and disease subtypes | May require expertise in data analysis and machine learning
Can tailor treatment plans for individual patients | May not be able to capture all relevant variables in the data
Can reduce human error and bias | May raise ethical concerns about the use of AI in healthcare
Can be applied to diverse types of data (e.g. imaging, genomics) | May be limited by the availability of high-quality data
Can accelerate drug discovery and development | May require collaboration between researchers with different expertise
and domain-specific knowledge. Additionally, the accuracy and generalisability of these models rely heavily on the quality and quantity of data available for training and testing [9]. For instance, in transcriptomics studies, reference genomes may lack complete annotations for certain genes or regions, leading to the complete omission of important transcripts. Quantification challenges, such as accurate measurement of low-abundance transcripts and susceptibility to noise, further compound these issues. Additionally, transparency about data acquisition, with a detailed description of methodology, is essential to enable generalization of methods, and data reproducibility is essential, as variations in sequencing platforms and bioinformatics pipelines can introduce biases. While the potential benefits of using ML for biomarker discovery are numerous, it is important to carefully consider the limitations and potential biases inherent in these models (Table 2).
Furthermore, the lack of standardization in ML modelling for biomarker discovery is a significant challenge. There is often variability in the selection of features, model training and evaluation metrics, leading to inconsistent or conflicting results. Moreover, different ML algorithms may perform differently depending on the dataset and the specific research question, making it difficult to identify the best approach. Efforts are being made to address these issues, including the development of standardized protocols for data sharing and analysis, and the establishment of benchmark datasets for evaluating the performance of ML algorithms [84].
Overall, using ML algorithms to assist and complement conventional human interpretation can help to improve the accuracy and efficiency of biomarker discovery, and may lead to new insights into disease mechanisms and potential therapeutic targets for IIM patients. While ML algorithms have the potential to revolutionize biomarker discovery, it is important to carefully consider the caveats, limitations and ethics of using these algorithms and to validate the results with conventional human interpretation.
CONCLUSION
In conclusion, the integration of ML in biomarker discovery for IIMs holds tremendous promise for advancing our understanding of these complex diseases. ML techniques have demonstrated efficacy in predicting features that can be incorporated into innovative diagnostic criteria and evaluating the specificity and sensitivity of these criteria. Numerous studies have underscored the clinical significance of MSAs as diagnostic, prognostic and predictive biomarkers, enabling clinicians to tailor treatment plans and address patient comorbidities effectively. Furthermore, advancements in medical image analysis present non-invasive alternatives for diagnosing IIM rapidly and effectively. Overcoming the challenges posed by the heterogeneity among patients, ML-based predictive modelling, driven by high-dimensional data from immunophenotyping and multi-omic studies, has unveiled novel biomarkers.
However, the diverse landscape of experimentation and testing, particularly in autoantibody detection and RNA-based transcriptomic approaches, makes the establishment of standardized protocols imperative to ensure the reproducibility and comparability of results across studies. This is especially crucial for the robust implementation of ML-based predictive modelling, which relies heavily on consistent and high-quality data inputs. Addressing these standardization challenges is essential for fostering collaboration among researchers and clinicians, facilitating the pooling of data and ultimately enhancing the reliability of biomarker discoveries.
Looking forward, the establishment of patient registries becomes crucial for comprehensive data collation, and the integration of AI/ML into these registries can provide direct feedback to clinicians, contributing to personalized treatment strategies. While challenges and limitations persist, the ongoing application of ML in IIM research has the potential to revolutionize our understanding of these diseases, paving the way for more targeted and efficacious therapies, provided that standardized protocols are implemented and adhered to across the scientific community.
Key Points
• Integrating ML into biomarker discovery for IIMs holds great potential for refining current diagnostic paradigms, predicting prognosis and tailoring targeted and effective treatment strategies.
• ML-driven predictive modelling with high-dimensional data reveals novel biomarkers, providing nuanced insights into the diverse IIM patient population and overcoming challenges presented by its heterogeneity.
• By leveraging ML, medical image analysis offers rapid and effective non-invasive alternatives for diagnosis.
• Careful consideration must address the limitations and biases inherent to ML models, emphasizing robust validation strategies, transparent documentation of data sources and continuous refinement to ensure reliable outcomes.
• Implementing standardized protocols across all data, especially in autoantibody detection and transcriptomics, is essential. This standardization plays a critical role in ensuring the reproducibility of ML-driven predictive modelling, thereby bolstering the overall reliability of biomarker discovery in IIMs.
ACKNOWLEDGEMENTS
Figures 2-5 were made with Biorender.com and are subject to licensing rights.
FUNDING
The execution of this project received valuable support from the Brain Foundation and the Spinnaker Health Research Foundation through research grants. Additionally, we are grateful for the generous contribution made by a late IBM patient, whose name remains confidential, through a bequest. We would like to acknowledge the Murdoch University Research Training Programme Scholarship, awarded to Emily McLeish and Nataliya Slater, as well as the Byron Kakulas Scholarship funded by the Perron Institute for Neurological and Translational Science, of which Nataliya Slater is the recipient. Their scholarships have played a significant role in facilitating this research endeavour.
AUTHORS’ CONTRIBUTIONS
Conceptualisation: E.M. Writing (original draft): E.M. Reviewed and edited the manuscript: E.M., N.S., J.D.C., M.N., F.L.M. Figures: E.M., J.D.C. Funding acquisition: J.D.C. and M.N. Supervision: J.D.C. and M.N. All authors contributed to the article and approved the
submitted version. All material in this paper was conceptualized, written and initially reviewed by the authors.
DECLARATION OF GENERATIVE AI AND AI-ASSISTED TECHNOLOGIES IN THE WRITING PROCESS
During the preparation of this work, the author(s) used ChatGPT 3.5 for assistance with punctuation and reducing content during the revision process. After using this tool, the author(s) reviewed and edited the content as needed and take(s) full responsibility for the content of the publication.
DATA AVAILABILITY STATEMENT
The machine learning sunburst plot presented in this article is based on the data available in the GitHub repository hosted by EmilyJane994. The repository, titled “Machine-learning-sunburst-plot,” can be accessed at the following URL: https://github.com/Emilyjane994/Machine-Learning-sunburst-plot DOI: 10.5281/zenodo.10445877. Additionally, datasets that were derived from sources in the public domain (Pubmed) can be found here: https://pubmed.ncbi.nlm.nih.gov/?term=Machine+Learning+%26+Inflammatory+myopathies.
REFERENCES
1. Hočevar A, Rotar Z, Krosel M, et al. Performance of the 2017 European league against rheumatism/American College of Rheumatology Classification Criteria for adult and juvenile idiopathic inflammatory myopathies in clinical practice. Ann Rheum Dis 2018;77(12):e90.
2. Group F-NBW. Best (Biomarkers, Endpoints, and Other Tools) Resource. Silver Spring, Bethesda, Maryland: Food and Drug Administration (US) National Institutes of Health (US), 2016.
3. Kurashige T, Murao T, Mine N, et al. Anti-Hmgcr antibody-positive myopathy shows Bcl-2-positive inflammation and lymphocytic accumulations. J Neuropathol Exp Neurol 2020;79(4):448–57.
4. Benveniste O, Goebel HH, Stenzel W. Biomarkers in inflammatory myopathies-an expanded definition. Front Neurol 2019;10:554.
5. Greenberg SA, Pinkus JL, Pinkus GS, et al. Interferon-alpha/beta-mediated innate immune mechanisms in dermatomyositis. Ann Neurol 2005;57(5):664–78.
6. Ladislau L, Suárez-Calvet X, Toquet S, et al. Jak inhibitor improves type I interferon induced damage: proof of concept in dermatomyositis. Brain 2018;141(6):1609–21.
7. Paik JJ, Casciola-Rosen L, Shin JY, et al. Study of tofacitinib in refractory dermatomyositis: an open-label pilot study of ten patients. Arthritis Rheumatol 2021;73(5):858–65.
8. Chinoy H, Lilleker JB. Pitfalls in the diagnosis of myositis. Best Pract Res Clin Rheumatol 2020;34(1):101486.
9. Badillo S, Banfai B, Birzele F, et al. An introduction to machine learning. Clin Pharmacol Ther 2020;107(4):871–85.
10. Choi RY, Coyner AS, Kalpathy-Cramer J, et al. Introduction to machine learning, neural networks, and deep learning. Transl Vis Sci Technol 2020;9(2):14.
11. Min S, Lee B, Yoon S. Deep learning in bioinformatics. Brief Bioinform 2017;18(5):851–69.
12. Rajula HSR, Verlato G, Manchia M, et al. Comparison of conventional statistical methods with machine learning in medicine: diagnosis, drug development, and treatment. Medicina (Kaunas) 2020;56(9):455. https://doi.org/10.3390/medicina56090455.
13. Sperandei S. Understanding logistic regression analysis. Biochem Med (Zagreb) 2014;24(1):12–8.
14. Shipe ME, Deppen SA, Farjah F, Grogan EL. Developing prediction models for clinical use using logistic regression: an overview. J Thorac Dis 2019;11(Suppl 4):S574–s84.
15. Bzdok D, Altman N, Krzywinski M. Statistics versus machine learning. Nat Methods 2018;15(4):233–4.
16. Bohan A, Peter JB. Polymyositis and dermatomyositis. New Engl J Med 1975;292(7):344–7.
17. Bohan A, Peter JB, Bowman RL, Pearson CM. A computer-assisted analysis of 153 patients with polymyositis and dermatomyositis. Medicine 1977;56(4):255–86.
18. Linklater H, Pipitone N, Rose MR, et al. Classifying idiopathic inflammatory myopathies: comparing the performance of six existing criteria. Clin Exp Rheumatol 2013;31(5):767–9 (Epub 2013/06/29).
19. Medsger TA, Dawson WN, Masi AT. The epidemiology of polymyositis. Am J Med 1970;48(6):715–23.
20. DeVere R, Bradley WG. Polymyositis: its presentation, morbidity and mortality. Brain 1975;98(4):637–66.
21. Dalakas MC. Polymyositis, dermatomyositis and inclusion-body myositis. N Engl J Med 1991;325(21):1487–98.
22. Tanimoto K, Nakano K, Kano S, et al. Classification criteria for polymyositis and dermatomyositis. J Rheumatol 1995;22(4):668–74.
23. Targoff IN, Miller FW, Medsger TA, Jr, et al. Classification criteria for the idiopathic inflammatory myopathies. Curr Opin Rheumatol 1997;9(6):527–35.
24. Dalakas MC, Hohlfeld R. Polymyositis and dermatomyositis. Lancet 2003;362(9388):971–82.
25. Hoogendijk JE, Amato AA, Lecky BR, et al. 119th Enmc international workshop: trial design in adult idiopathic inflammatory myopathies, with the exception of inclusion body myositis, 10-12 October 2003, Naarden, the Netherlands. Neuromuscul Disord 2004;14(5):337–45.
26. Oddis CV, Medsger TA, Jr. Inflammatory myopathies. Baillieres Clin Rheumatol 1995;9(3):497–514.
27. Griggs RC, Askanas V, DiMauro S, et al. Inclusion body myositis and myopathies. Ann Neurol 1995;38(5):705–13.
28. Badrising UA, Maat-Schieman M, van Duinen SG, et al. Epidemiology of inclusion body myositis in the Netherlands: a nationwide study. Neurology 2000;55(9):1385–7.
29. Stefen B, Waney S, Caroline S, et al. A retrospective cohort study identifying the principal pathological features useful in the diagnosis of inclusion body myositis. BMJ Open 2014;4(4):e004552.
30. Hilton-Jones D, Miller A, Parton M, et al. Inclusion body myositis: Mrc Centre for Neuromuscular Diseases, Ibm workshop, London, 13 June 2008. Neuromuscul Disord 2010;20(2):142–7.
31. Allenbach Y, Mammen AL, Benveniste O, et al. 224th Enmc international workshop: Clinico-Sero-pathological classification of immune-mediated necrotizing myopathies Zandvoort, the Netherlands, 14-16 October 2016. Neuromuscul Disord 2018;28(1):87–99.
32. Rose MR. 188th Enmc international workshop: inclusion body myositis, 2-4 December 2011, Naarden, the Netherlands. Neuromuscul Disord 2013;23(12):1044–55.
33. Schmidt J. Current classification and management of inflammatory myopathies. J Neuromuscul Dis 2018;5(2):109–29.
34. Lundberg IE, Miller FW, Tjärnlund A, et al. Diagnosis and classification of idiopathic inflammatory myopathies. J Intern Med 2016;280(1):39–51.
35. Eng SWM, Olazagasti JM, Goldenberg A, et al. A clinically and biologically based subclassification of the idiopathic inflammatory myopathies using machine learning. ACR Open Rheumatol 2020;2(3):158–66.
36. Parker MJS, Oldroyd A, Roberts ME, et al. The performance of the European League Against Rheumatism/American College of Rheumatology Idiopathic Inflammatory Myopathies Classification Criteria in an expert-defined 10 year incident cohort. Rheumatology (Oxford) 2019;58(3):468–75.
37. Triplett JD, Shelly S, Livne G, et al. Diagnostic modelling and therapeutic monitoring of immune-mediated necrotizing myopathy: role of electrical myotonia. Brain Commun 2020;2(2):fcaa191.
38. Lloyd TE, Mammen AL, Amato AA, et al. Evaluation and construction of diagnostic criteria for inclusion body myositis. Neurology 2014;83(5):426–33.
39. Troyanov Y, Targoff IN, Tremblay JL, et al. Novel classification of idiopathic inflammatory myopathies based on overlap syndrome features and autoantibodies: analysis of 100 French Canadian patients. Medicine (Baltimore) 2005;84(4):231–49.
40. Mariampillai K, Granger B, Amelin D, et al. Development of a new classification system for idiopathic inflammatory myopathies based on clinical manifestations and myositis-specific autoantibodies. JAMA Neurol 2018;75(12):1528–37.
41. McHugh NJ, Tansley SL. Autoantibodies in myositis. Nat Rev Rheumatol 2018;14(5):290–302.
42. Weeding E, Tiniakou E. Therapeutic management of immune-mediated necrotizing myositis. Curr Treatm Opt Rheumatol 2021;7(2):150–60.
43. Zhao Y, Zhang W, Liu Y, et al. Factors associated with refractory autoimmune necrotizing myopathy with anti-signal recognition particle antibodies. Orphanet J Rare Dis 2020;15(1):181.
44. Levy D, Nespola B, Giannini M, et al. Significance of Sjögren’s syndrome and anti-cN1A antibody in myositis patients. Rheumatology 2021;61(2):756–63.
45. Salam S, Dimachkie MM, Hanna MG, Machado PM. Diagnostic and prognostic value of anti-cN1A antibodies in inclusion body myositis. Clin Exp Rheumatol 2022;40(2):384–93.
46. Zhao L, Xie S, Zhou B, et al. Machine learning algorithms identify clinical subtypes and cancer in anti-TIF1γ+ myositis: a longitudinal study of 87 patients. Front Immunol 2022;13:802499.
47. Ichimura Y, Konishi R, Shobo M, et al. Anti-nuclear matrix protein 2 antibody-positive inflammatory myopathies represent extensive myositis without dermatomyositis-specific rash. Rheumatology (Oxford) 2022;61(3):1222–7.
48. Lu X, Peng Q, Wang G. The role of cancer-associated autoantibodies as biomarkers in paraneoplastic myositis syndrome. Curr Opin Rheumatol 2019;31(6):643–9.
49. Yang H, Peng Q, Yin L, et al. Identification of multiple cancer-associated myositis-specific autoantibodies in idiopathic inflammatory myopathies: a large longitudinal cohort study. Arthritis Res Ther 2017;19(1):259.
50. Allenbach Y, Uzunhan Y, Toquet S, et al. Different phenotypes in dermatomyositis associated with anti-MDA5 antibody: study of 121 cases. Neurology 2020;95(1):e70–8.
51. Mavroudis I, Knights M, Petridis F, et al. Diagnostic accuracy of anti-cN1A on the diagnosis of inclusion body myositis. A hierarchical bivariate and Bayesian meta-analysis. J Clin Neuromuscul Dis 2021;23(1):31–8.
52. van de Schoot R, Depaoli S, King R, et al. Bayesian statistics and modelling. Nat Rev Methods Primers 2021;1(1):1.
53. Felice KJ, Whitaker CH, Wu Q, et al. Sensitivity and clinical utility of the anti-cytosolic 5’-nucleotidase 1A (cN1A) antibody test in sporadic inclusion body myositis: report of 40 patients from a single neuromuscular center. Neuromuscul Disord 2018;28(8):660–4.
54. Paul P, Liewluck T, Ernste FC, et al. Anti-cN1A antibodies do not correlate with specific clinical, electromyographic, or pathological findings in sporadic inclusion body myositis. Muscle Nerve 2021;63(4):490–6.
55. Lucchini M, Maggi L, Pegoraro E, et al. Anti-cN1A antibodies are associated with more severe dysphagia in sporadic inclusion body myositis. Cells 2021;10:1146.
56. Lilleker JB, Rietveld A, Pye SR, et al. Cytosolic 5’-nucleotidase 1A autoantibody profile and clinical characteristics in inclusion body myositis. Ann Rheum Dis 2017;76(5):862–8.
57. Betteridge Z, McHugh N. Myositis-specific autoantibodies: an important tool to support diagnosis of myositis. J Intern Med 2016;280(1):8–23.
58. Dzangué-Tchoupou G, Mariampillai K, Bolko L, et al. CD8+T-bet+ cells as a predominant biomarker for inclusion body myositis. Autoimmun Rev 2019;18(4):325–33.
59. Wilfong EM, Bartkowiak T, Vowell KN, et al. High-dimensional analysis reveals distinct endotypes in patients with idiopathic inflammatory myopathies. Front Immunol 2022;13:756018.
60. E-aD A, Davis KL, Tadmor MD, et al. viSNE enables visualization of high dimensional single-cell data and reveals phenotypic heterogeneity of leukemia. Nat Biotechnol 2013;31(6):545–52.
61. Van Gassen S, Callebaut B, Van Helden MJ, et al. FlowSOM: using self-organizing maps for visualization and interpretation of cytometry data. Cytometry A 2015;87(7):636–45.
62. Weber LM, Robinson MD. Comparison of clustering methods for high-dimensional single-cell flow and mass cytometry data. Cytometry A 2016;89(12):1084–96.
63. Quintelier K, Couckuyt A, Emmaneel A, et al. Analyzing high-dimensional cytometry data using FlowSOM. Nat Protoc 2021;16(8):3775–801.
64. Pedersen CB, Olsen LR. Algorithmic clustering of single-cell cytometry data: how unsupervised are these analyses really? Cytometry A 2020;97(3):219–21.
65. Anchang B, Hart TDP, Bendall SC, et al. Visualization and cellular hierarchy inference of single-cell data using SPADE. Nat Protoc 2016;11(7):1264–79.
66. Ye Y, Zhang X, Li T, et al. Two distinct immune cell signatures predict the clinical outcomes in patients with amyopathic dermatomyositis with interstitial lung disease. Arthritis Rheumatol 2022;74(11):1822–32.
67. Jin Q, Fu L, Yang H, et al. Peripheral lymphocyte count defines the clinical phenotypes and prognosis in patients with anti-MDA5-positive dermatomyositis. J Intern Med 2023;293(4):494–507.
68. Chen K, Zhu CY, Bai JY, et al. Identification of feature genes and key biological pathways in immune-mediated necrotizing myopathy: high-throughput sequencing and bioinformatics analysis. Comput Struct Biotechnol J 2023;21:2228–40.
69. Pinal-Fernandez I, Casal-Dominguez M, Derfoul A, et al. Machine learning algorithms reveal unique gene expression profiles in muscle biopsies from patients with different types of myositis. Ann Rheum Dis 2020;79(9):1234–42.
70. Liu D, Zhao L, Jiang Y, et al. Integrated analysis of plasma and urine reveals unique metabolomic profiles in idiopathic inflammatory myopathies subtypes. J Cachexia Sarcopenia Muscle 2022;13(5):2456–72.
71. Kang J, Kim JY, Jung Y, et al. Identification of metabolic signature associated with idiopathic inflammatory myopathy reveals polyamine pathway alteration in muscle tissue. Metabolites 2022;12(10):1004. https://doi.org/10.3390/metabo12101004.
72. Murakami A, Noda S, Kazuta T, et al. Metabolome and transcriptome analysis on muscle of sporadic inclusion body myositis. Ann Clin Transl Neurol 2022;9(10):1602–15.
73. Lundervold AS, Lundervold A. An overview of deep learning in medical imaging focusing on MRI. Z Med Phys 2019;29(2):102–27.
74. Kabeya Y, Okubo M, Yonezawa S, et al. Deep convolutional neural network-based algorithm for muscle biopsy diagnosis. Lab Invest 2022;102(3):220–6.
75. Bharati MH, Liu JJ, MacGregor JF. Image texture analysis: methods and comparisons. Chemom Intell Lab Syst 2004;72(1):57–71.
76. Nagawa K, Suzuki M, Yamamoto Y, et al. Texture analysis of muscle MRI: machine learning-based classifications in idiopathic inflammatory myopathies. Sci Rep 2021;11(1):9821.
77. Burlina P, Joshi N, Billings S, et al. Deep embeddings for novelty detection in myopathy. Comput Biol Med 2019;105:46–53.
78. Fabry V, Mamalet F, Laforet A, et al. A deep learning tool without muscle-by-muscle grading to differentiate myositis from facio-scapulo-humeral dystrophy using MRI. Diagn Interv Imaging 2022;103(7):353–9.
79. Danieli MG, Tonacci A, Paladini A, et al. A machine learning analysis to predict the response to intravenous and subcutaneous immunoglobulin in inflammatory myopathies. A proposal for a future multi-omics approach in autoimmune diseases. Autoimmun Rev 2022;21(6):103105.
80. Engebretsen S, Bohlin J. Statistical predictions with glmnet. Clin Epigenetics 2019;11(1):123.
81. Oldroyd AGS, Allard AB, Callen JP, et al. A systematic review and meta-analysis to inform cancer screening guidelines in idiopathic inflammatory myopathies. Rheumatology 2021;60(6):2615–28.
82. Zhang W, Huang G, Zheng K, et al. Application of logistic regression and machine learning methods for idiopathic inflammatory myopathies malignancy prediction. Clin Exp Rheumatol 2023;41:330–9.
83. Zhu J, Wu L, Zhou Y, et al. A retrospective cohort study in Chinese patients with adult polymyositis and dermatomyositis: risk of comorbidities and subclassification using machine learning. Clin Exp Rheumatol 2022;40:224–36.
84. Feng J, Phillips RV, Malenica I, et al. Clinical artificial intelligence quality improvement: towards continual monitoring and updating of AI algorithms in healthcare. NPJ Digit Med 2022;5(1):66. https://doi.org/10.1038/s41746-022-00611-y.
