Idiopathic inflammatory myopathies (IIMs) are a heterogeneous group of muscle disorders including adult and juvenile dermatomyositis, polymyositis, immune-mediated necrotising myopathy and sporadic inclusion body myositis, all of which present with variable symptoms and disease progression. The identification of effective biomarkers for IIMs has been challenging due to the heterogeneity between IIMs and within IIM subgroups, but recent advances in machine learning (ML) techniques have shown promise in identifying novel biomarkers. This paper reviews recent studies on potential biomarkers for IIM and evaluates their clinical utility. We also explore how data analytic tools and ML algorithms have been used to identify biomarkers, highlighting their potential to advance our understanding and diagnosis of IIM and improve patient outcomes. Overall, ML techniques have great potential to revolutionize biomarker discovery in IIMs and lead to more effective diagnosis and treatment.
Keywords: idiopathic inflammatory myopathies; machine learning; biomarkers; myositis-specific autoantibodies
Figure 1. Interactive starburst plot showing machine learning categories, subsets, and algorithms. The interactive starburst plot displays the main categories of machine learning, including supervised and unsupervised learning, and their corresponding subsets and algorithms. Users can explore the plot to gain insights into the various machine learning techniques and their applications. This is an interactive plot; follow the link: https://chart-studio.plotly.com/~Emilymc/3.embed
GAN: generative adversarial networks, RNN: recurrent neural networks, CNN: convolutional neural networks, AE: autoencoders, BIRCH: Balanced Iterative Reducing and Clustering using Hierarchies, OPTICS: Ordering Points To Identify the Clustering Structure, SARSA:
ML algorithms can be classified into five main categories (Figure 1): supervised, unsupervised, semi-supervised, reinforcement learning and deep learning [9, 10].
In supervised learning, the algorithm is trained on labelled data to learn a mapping from inputs to outputs. This approach is best applied to classification or regression tasks [9]. Examples include decision trees and support vector machines (SVMs).
In unsupervised learning, the algorithm is trained on unlabelled data to find patterns or relationships within the data [10]. This method is used for clustering or dimensionality reduction tasks. Examples include k-means clustering and principal component analysis (PCA).
Semi-supervised learning involves training an algorithm on both labelled and unlabelled data. In this approach, the algorithm is provided with some labelled data to learn from, and then it transfers this knowledge to make predictions on the unlabelled data.
In reinforcement learning, the algorithm learns to make decisions based on feedback from its environment [10]. This is often used in game playing or robotics. Examples of these algorithms include Q-learning and deep reinforcement learning networks.
Deep learning involves training artificial neural networks to recognize patterns in data. These neural networks are made up of layers of interconnected nodes that process and transform input data to produce an output [11]. These networks often possess multiple layers, allowing them to learn complex representations of the input data. Deep learning has been particularly successful in applications such as image and speech recognition, natural language processing and autonomous driving [11].

Figure 2. The relationship between artificial intelligence, machine learning, deep learning and data science. The diagram highlights how these fields build on each other to provide advanced solutions for data-driven problems. Figure created with Biorender.com

ML modelling versus statistical modelling

ML and statistical modelling (SM) are two popular approaches for analyzing medical and scientific data (Figure 2). Although ML and
SM are related fields, they are not synonymous. Statistics is a branch of mathematics that deals with collecting, analyzing and interpreting data. ML, on the other hand, constitutes a branch of artificial intelligence dedicated to devising algorithms and models that enable computers to acquire knowledge from data and generate predictions [12]. Although ML has a strong mathematical foundation, incorporating statistical techniques like inference, hypothesis testing and regression analysis, it also extends beyond traditional methods with techniques like neural networks, deep learning and natural language processing to address complex problems.
In some cases, ML and SM overlap, as seen with logistic regression (LR) analysis. LR is conventionally considered a statistical model that determines the odds ratio (OR) based on binary outcomes, where the dependent variable has two possible categorical values [13]. For example, the binary outcome of not having or having a disease can be represented as 0 and 1, respectively. However, LR can also be viewed as a supervised ML model since it utilizes a training dataset to learn the relationship between predictor variables and the binary outcome. Once trained, the LR model can make predictions on new data by estimating the probability of the binary outcome [14].
One of the key differences between ML and SM is the focus of each field. Statistics is primarily concerned with comparing and summarizing data, while ML is focused on building predictive models [12, 15]. Furthermore, statistics typically deals with relatively small and carefully defined datasets, while ML mostly involves working with large and complex datasets.
ML models are powerful tools for exploratory research because they identify complex patterns in large and high-dimensional datasets; they can also handle missing data. They make no prior assumptions about the data and are flexible, as they can be retrained as new data become available. SM is a valuable approach, particularly when the underlying mechanisms of the data are known, and when the research question is confirmatory [12]. Both methods have their advantages and disadvantages, and researchers must consider their research aims and the nature of their data before deciding which approach is the most appropriate.

ML models for diagnosis of IIM and subgroups of IIM

Accurate diagnosis of a type of IIM is often challenging, as many muscle conditions possess overlapping clinical features and laboratory findings [8]. Recent ML models have been applied to multiple different patient datasets, including clinical, histopathological and imaging data, and have provided new opportunities for improving the accuracy and speed of IIM diagnosis.

Table 1: Specificity, sensitivity and characteristics of various diagnostic and classification criteria for IIM adapted from Lundberg et al. [34]
Criteria | Symptom duration | Age | Muscle weakness | Muscle pain | Muscle biopsy | EMG | Muscle enzymes | Extramuscular | MSA | Sensitivity and specificity

Earlier diagnostic criteria for IIM, such as the Bohan and Peter criteria, predominantly focused on distinguishing between DM and PM, as many other myopathies such as IBM and IMNM had not yet been separated from PM [16]. These initial classifications relied heavily on clinical presentation and histology, both of which required a high level of medical expertise for interpretation (Table 1: Polymyositis and dermatomyositis diagnostic criteria). However, later adaptations of the Bohan and Peter criteria utilized ‘computer-assisted analysis’, although it is unclear which specific methods this refers to in that study [17]. More recent criteria for IIM combine clinical, histological and serological features (predominantly myositis-specific antibodies (MSA) or myositis-associated antibodies; see below). However, in many of these earlier publications, validation of the specificity and sensitivity of these criteria was
either not performed at the time or conducted in later studies [18–34] (Table 1).
A recent data-derived set of diagnostic criteria for IIM is the 2017 European League Against Rheumatism/American College of Rheumatology (EULAR/ACR) classification criteria [1] (Table 1: All IIM diagnostic criteria). These criteria include 12 clinical features including age, gender, the pattern of muscle weakness, skin manifestations, laboratory features such as elevated serum creatine kinase (CK) concentrations, presence of autoantibodies and histopathological
In another study, Mariampillai and colleagues stratified IIM patients using unsupervised multiple correspondence analysis and hierarchical clustering [40]. This approach successfully categorized the patients into four well-defined groups: DM, IBM, IMNM and ASS. The patients who had previously been diagnosed with PM fell into the IMNM or ASS groups, suggesting that PM should no longer be considered a separate diagnosis. Additionally, the algorithm showed that, once again, MSA and myositis-associated antibodies (MAAs) were crucial for IIM classification, emphasizing the utility of autoantibody detection for accurate diagnosis [40]. Further study will determine whether serological testing could replace the need for muscle biopsies. Both of these studies have highlighted the potential of unsupervised ML algorithms for identifying clinically and biologically homogeneous patient groups and underscored the unique contribution of ML for identifying biomarkers for IIM.

Clinical utility of autoantibodies in IIM

Myositis-specific antibodies, detected in approximately 60–70% of IIM patients, have emerged as pivotal biomarkers and offer a unique lens to dissect the heterogeneity within the IIM spectrum. Anti-Jo-1 autoantibodies are detected in ASS, while anti-Mi-2 autoantibodies, which are among the most extensively studied MSAs, are specific for DM [41]. Other MSAs strongly associated with IMNM include antibodies to SRP and to 3-hydroxy-3-methylglutaryl CoA reductase (HMGCR) [42, 43]. Anti-cytosolic 5′-nucleotidase 1A (cN1A) antibodies have been found in autoimmune diseases such as SS and SLE; however, in the context of IIMs, they are restricted to IBM [44, 45].
The clinical utility of MSAs extends beyond their ability to differentiate subgroups of IIMs. They are associated with distinct clinical attributes, thereby aiding in prognosis prediction and treatment planning. For instance, the presence of certain MSAs, like anti-TIF1-γ, anti-NXP2 and anti-HMGCR, has long been linked to an elevated risk of malignancy in IIM patients (described in more detail below) [46–49]. Moreover, anti-MDA5 antibodies, which are often found in amyopathic DM, are a risk factor for rapidly progressive interstitial lung disease (ILD), particularly among East Asian populations [41]. Furthermore, anti-HMGCR IMNM, particularly when statin-associated, is often associated with a good response to treatment [42]; in contrast, approximately 30% of IMNM patients with anti-SRP antibodies are refractory to steroid treatments [43].
Using unsupervised hierarchical clustering analysis, Allenbach and colleagues [50] explored the phenotypic landscape of anti-MDA5 antibody-positive patients and found that patients could be stratified into three different subgroups. In the first subgroup, patients had rapidly progressing ILD, which also corresponded to an elevated mortality rate. The second cluster predominantly displayed dermatological and rheumatological symptoms and had a more favourable prognosis. Lastly, the third group exhibited severe skin vasculopathy, was mostly male, and had an intermediate prognosis in comparison to the other two patient groups [50].
The presence of anti-cN1A antibodies has been proposed as a potential biomarker for IBM, but its diagnostic and prognostic significance remains uncertain. Sensitivity and specificity of anti-cN1A detection for IBM diagnosis have shown wide variability, ranging from 32.8% to 88.6% and 80% to 100%, respectively [45]. A meta-analysis by Mavroudis and colleagues explored its diagnostic utility using Bayesian models [51]. Bayesian models are a statistical approach that integrates prior knowledge, sourced from previous studies or expert assumptions about the data. These models are updated based on new data or information, generating posterior probabilities. This adaptability makes them valuable in decision-making processes and modelling scenarios with uncertain or limited data. While not strictly considered ML, Bayesian models are used in data analysis and diagnostics [52]. Contrary to other studies, Mavroudis and colleagues found that anti-cN1A antibodies could not effectively discriminate IBM from other conditions like PM/DM and MND. However, the variability in testing methodologies used across studies has introduced potential bias, given the lack of standardized protocols [51]. Furthermore, studies of the prognostic value of anti-cN1A antibodies in IBM have produced inconsistent conclusions. Several studies noted limited prognostic value [53–55], while another reported that seropositive patients showed increased mortality risk, less proximal upper limb weakness at disease onset, and more cytochrome c oxidase (COX)-deficient muscle fibres [56].
The use of ML algorithms to identify unique MSA profiles associated with distinct clinical features has greatly improved IIM stratification. However, it is important to acknowledge that various methods are used for antibody detection, each with a varying degree of specificity and sensitivity [57]. As discussed for anti-cN1A antibodies in IBM, discrepancies in methodologies can lead to contradictory results regarding the autoantibodies’ clinical utility. This illustrates the crucial point that the reliability of computational methods for biomarker discovery is limited by the accuracy of the detection methods. As such, careful consideration and validation of detection methodologies are imperative for accurate and meaningful results.

Immunophenotyping as a tool for identifying biomarkers in inflammatory myopathies

Immunophenotyping has become an essential tool for uncovering novel biomarkers for inflammatory myopathies. This entails a systematic exploration of the subsets, activation state and differentiation pattern of immune cells across various biological samples such as peripheral blood, muscles and other affected tissues. This approach can also involve deciphering the cytokines and chemokines that these cells produce. While certain studies have utilized ML algorithms to analyze extensive immunophenotyping data produced by techniques like flow cytometry and mass cytometry, there appears to be a preference for dimensionality reduction techniques in the broader landscape [58, 59]. Uniform Manifold Approximation and Projection for Dimension Reduction (UMAP) and t-distributed stochastic neighbour embedding (t-SNE/viSNE) are dimensionality reduction algorithms [60], whereas FlowSOM (Self-Organizing Map) and Spanning-tree Progression Analysis of Density-normalized Events (SPADE) cluster cells based on similarities in their surface markers [61–65]. They have become a prominent feature in the analysis of data from single-cell technologies, including flow and mass cytometry, as well as scRNA-seq (Figure 4). They are often considered an improved alternative to manual gating, as they offer an unbiased and systematic exploration of the data [64].

Figure 4. Pipeline for the main steps in the FlowSOM analysis. (A) Data preparation and quality control checks: (i) the fcs-files are read, (ii) compensated, (iii) QC checked and (iv) concatenated. (B) FlowSOM model training and evaluation of model quality: (v) the model is trained and visualized as a minimum-spanning tree, which is composed of multiple inter-connecting nodes; (vi) each node comprises a star chart of different colours, each representing an immune marker; (vii) example of a star chart with mean immune marker values. (C) Analysis of the FlowSOM model using other visualization tools such as (viii) clustering analysis via t-SNE map, (ix) heatmaps or (x) differential analysis, which can be used to infer biological conclusions about the data. Figure created with Biorender.com

For instance, Dzangué-Tchoupou and colleagues [58] performed comprehensive immune profiling of peripheral blood cells from 18 IBM patients, 26 other IIM patients and 16 HC through mass cytometry. By leveraging SPADE, CITRUS and classification and regression trees (CART) algorithms, along with receiver operating characteristic (ROC) curves, they identified that a frequency of CD8+ T-bet+ cells exceeding 51.5% provided a potential diagnostic biomarker specific to IBM, exhibiting high sensitivity and
specificity. Similarly, Wilfong et al. [59] dissected mass cytometry data by integrating t-SNE, CITRUS and marker enrichment modelling (MEM). The authors revealed shared immunological features across 17 IIM patients (6 DM, 4 PM, 7 ASS), including a decreased expression of the activation marker CD180 on B cells and the homing marker CXCR3 on T cells, relative to healthy controls. Additionally, two distinct subgroups of IIM patients could be delineated. The first group demonstrated an upregulation of CXCR4 across all cell populations, with the authors suggesting this upregulation may be associated with increased disease severity. In contrast, an increased frequency of CD19+ CD21lo CD11c+ and CD3+ CD4+ PD1+ cells defined the second IIM subgroup and represented a pro-fibrotic phenotype [59].
Supervised classification algorithms, like SVMs and random forests (RF), are also popular methods for analyzing immunophenotyping data. These algorithms recognize patterns and complex relationships between surface and intracellular marker expression, fuelling predictive models for disease diagnosis and prognosis. In a study by Ye and colleagues, immune signatures were examined in 82 patients with amyopathic dermatomyositis and interstitial lung disease (ADM-ILD) and 82 HC [66]. Patients were stratified based on their immune cell subset frequencies using hierarchical clustering analysis followed by supervised ML methods (Balanced Random Forest Model) to identify the subsets of predictive value. The study identified two distinct clusters correlating with different disease activities and clinical outcomes in ADM-ILD. Cluster 1 was characterized by an enrichment of activated CD45RA+, HLA-DR+ and CD8+ T cells with a decreased proportion of the CD56dim NK cell subset, which correlated with a higher prevalence of rapidly progressive ILD and a higher mortality rate. In contrast, cluster 2 was characterized by abundant non-activated T cells and had favourable clinical outcomes, with a survival rate over 6 years that was higher than in cluster 1. These findings suggest that peripheral immunological features may be used to stratify ADM-ILD patients and correlate with differential disease severity and clinical outcomes [66]. Through hierarchical clustering and Balanced Random Forest Models, distinct clusters surfaced, bearing correlations with different disease activities and outcomes. Notably, these clusters showed a variety of immune cell subset frequencies, each tied to a divergent prognosis. Similar strategies were adopted in delineating 421 DM patients with anti-MDA5 antibodies into three distinct prognostic clusters based on lymphocyte counts [67]. Specifically, the arthritis-associated cluster demonstrates elevated lymphocyte counts and has the most favourable prognosis, suggesting a subset with a more positive disease trajectory. In contrast, the rapidly progressive interstitial lung disease (RP-ILD) cluster is characterized by the lowest peripheral lymphocyte levels and an unfavourable prognosis, highlighting a subgroup with heightened
disease severity. Additionally, the cluster associated with the typical DM rash presents a moderate peripheral lymphocyte count, indicating an intermediate prognosis and offering a nuanced understanding of disease outcomes within this particular subgroup.
Overall, immunophenotyping has become an essential tool for identifying novel biomarkers and understanding the complex immune system dysregulation that occurs in inflammatory myopathies. Dimensionality reduction and ML algorithms, particularly those involving unsupervised clustering, have revolutionized the way researchers approach data analysis in single-cell technologies. They allow for the unbiased identification of cell populations, which provides insights into the aetiopathology of disease, offering new avenues for more accurate diagnosis and the identification of novel therapeutic targets. Moreover, these findings underscore the heterogeneity of inflammatory myopathies and the potential utility of distinct biomarker profiles in predicting and managing diverse clinical trajectories.

Leveraging the power of ML on multi-omic data helps unveil mechanism-based pathways in IIM

Multi-omics profiling is a rapidly emerging field that aims to integrate data from genomics, transcriptomics, proteomics and metabolomics to obtain a comprehensive understanding of biological systems. The generation of multi-omic meta datasets has significantly increased the complexity of analysis, which demands greater computational power for processing and analysis. The majority of multi-omic studies in IIM have utilized supervised classification-based methods such as SVMs, linear regression and RFs as well as dimensionality reduction methods such as Partial Least Squares Discriminant Analysis (PLS-DA). These methods identify patterns and complex relationships in the data that would be difficult to identify using traditional statistical methods alone.
Using high-throughput RNA sequencing in muscles isolated from 18 IMNM patients and 10 HC, Chen and coworkers [68] identified 193 differentially expressed genes associated with inflammatory immune responses, cardiac muscle contraction, skeletal muscle regulation and lipoprotein metabolism. Three feature genes, LTK, MYBPH and MYL4, which are associated with the autophagy-lysosome pathway and muscle inflammation, were identified as potential biomarker genes for IMNM with an accuracy of 97.3% using the least absolute shrinkage and selection operator (LASSO) and SVM-recursive feature elimination (SVM-RFE) algorithms [68].
Pinal-Fernandez and colleagues applied ML algorithms to muscle isolated from 20 HC and 119 myositis patients (39 with DM, 49 with IMNM, 18 with anti-Jo1-positive ASS and 13 with IBM). Transcriptomic analysis found over 10,000 unique gene expression patterns that distinguish DM, ASS, IMNM and IBM from HC [69]. The SVM algorithm demonstrated >90% accuracy in classifying patients. Further investigations using recursive feature elimination identified genes that were overexpressed in one type of myositis. For instance, CAMK1G, EGR4 and CXCL8 transcripts were increased in ASS but not in DM or other types of myositis. Additionally, the same method identified genes uniquely overexpressed in various MSA-defined myositis subtypes: for example, APOA4 was significantly overexpressed in anti-HMGCR-positive myopathy, and mucosal vascular addressin cell adhesion molecule 1 (MADCAM1) was overexpressed in anti-Mi2-positive DM. These findings demonstrated the potential of ML to identify genes related to specific myositis types and MSA-defined subtypes [69].
In addition to genomics, other investigative approaches include metabolomics. It can be applied to various biofluids, including blood and urine, that are more easily accessible than invasive muscle biopsies and traditional histological analysis. ML models have emerged as valuable tools for identifying biomarkers and unravelling molecular mechanisms from metabolomic data. In a recent study conducted by Liu et al., supervised classification algorithms such as RF and AdaBoost were employed to detect perturbations in metabolic pathways across various subtypes of IIMs. The study encompassed 52 healthy donors and 79 patients with major IIM subtypes, including DM, ASS, IMNM and MSA-defined subtypes, such as anti-Mi2+, anti-MDA5+, anti-TIF1γ+, anti-Jo1+, anti-PL7+, anti-PL12+, anti-EJ+ and anti-SRP+. The analysis revealed significant disturbances in fatty acid biosynthesis in both plasma and urine samples, with several metabolites exhibiting differential expression across various IIM subtypes. Notably, creatine in plasma was identified as a potential specific biomarker for IMNM, while tiglylcarnitine in urine showed promise as a distinctive biomarker for the anti-glycyl tRNA synthetase (anti-EJ) subtype of ASS. Additionally, 16 shared metabolites were detected among the plasma and urine samples of different IIM subtypes [70].
Kang et al. [71] conducted a comparative study using ML techniques to identify metabolic differences among IIM patients, 30 ankylosing spondylitis (AS) patients and 10 HC. They employed supervised ML models, including linear regression, RF and SVMs, and discovered seven distinct metabolites, including branched-chain amino acids (BCAAs), biogenic amines and lipids, that effectively distinguished IIM patients from both healthy controls and AS groups. Notably, elevated levels of specific amino acids, like BCAAs, were associated with inflammation through mTORC1 activation. The study also explored metabolic changes in skeletal muscles using a mouse model of IIM induced by C-protein immunogens, identifying 68 significantly altered metabolites. Pathway analysis indicated a significant decrease in spermine and spermidine levels, indicative of polyamine pathway downregulation. Furthermore, changes in metabolites related to beta-alanine and histidine metabolism suggested potential muscle cell damage during inflammation [71].
In another study, a combined metabolomic and transcriptomic analysis of 14 IBM muscle samples revealed specific metabolic alterations [72]. Employing the widely used Partial Least Squares Discriminant Analysis (PLS-DA) model, the researchers identified 198 metabolites that were linked to upregulated histamine biosynthesis and associated with an accumulation of mast cells in IBM. The glycosaminoglycan pathways were notably upregulated, as evident from the excessive chondroitin sulphate levels observed in both metabolomic and transcriptomic analyses. Histopathological examinations further corroborated these findings, confirming the presence of substantial chondroitin sulphate accumulations within the muscle tissues of IBM patients. Notably, deficiencies in key energy metabolism molecules, carnitine and creatine, were also revealed, suggesting potential biomarker avenues for IBM treatment through diet supplementation [72].

Figure 5. Building blocks of a typical CNN from an image. Convolutional layer: A set of filters are learned during training and applied to the input image to extract features at different spatial locations. Each filter convolves over the input image to produce a feature map. Pooling layer: The pooling layer is used to down-sample the output of the convolutional layer, reducing the spatial dimensions of the feature maps while retaining the important features. Fully connected layer: The fully connected layer is used to produce the final output of the network. It takes the flattened output from the previous layer and applies a set of weights to produce a vector of outputs. Figure created with Biorender.com

The field of multi-omics profiling has rapidly evolved to gain a comprehensive understanding of IIM by integrating data from genomics, transcriptomics, proteomics and metabolomics. Noteworthy findings include the identification of genetic biomarkers associated with the autophagy–lysosome pathway and muscle inflammation in IMNM. Pinal-Fernandez and colleagues utilized ML algorithms to classify distinct myositis subtypes based on unique
gene expression patterns [69]. Metabolomic studies revealed perturbations in metabolic pathways across IIM subtypes, with specific biomarkers identified for IMNM and different MSA-defined IIM subtypes [69, 70]. The combined metabolomic and transcriptomic analysis in IBM uncovered specific metabolic alterations associated with histamine biosynthesis, glycosaminoglycan pathways and deficiencies in key energy metabolism molecules, presenting potential therapeutic avenues. These studies collectively highlight the power of multi-omics approaches and ML techniques in uncovering intricate molecular signatures, biomarkers and potential therapeutic targets.

ML approaches for analyzing medical images in IIMs

ML has been widely used to analyze medical images for tasks such as segmentation, classification and diagnosis. Deep learning models, particularly convolutional neural networks (CNN), have shown great success in various medical imaging applications, including radiology, ophthalmology and pathology [73]. CNNs are specifically designed to work with images, and their success lies in their ability to learn hierarchical representations of visual features directly from the raw input data (Figure 5). They have shown superior performance compared to traditional ML methods such as SVMs and RFs. Kabeya and colleagues trained a CNN on muscle biopsy images from patients with PM, DM and IBM, as well as healthy controls. This model accurately differentiated these IIMs from hereditary muscle diseases, with a sensitivity and specificity that outperformed specialist physicians [74].
Texture image analysis (TA) is a common method used in radiomics, which involves the extraction of quantitative features from digital imaging data (CT, MRI, ultrasound, PET) to characterize the underlying tissue properties [75]. TA quantifies texture attributes like roughness or smoothness, furnishing supplementary diagnostic or prognostic insights. Nagawa et al. [76] employed TA on MRI data from 55 IIM and 19 non-IIM patients. Several ML models classified TA features, revealing disease activity trends in IIM subgroups and distinguishing anti-Jo-1 and anti-aminoacyl tRNA synthetases (ARS) IIM subgroups. However, the approach showed limited ability to differentiate IIM from non-IIM samples.
Deep learning approaches such as unsupervised novelty detection (ND) aid medical image analysis by training on healthy data to detect deviations. For example, Burlina and colleagues used ND on 3586 ultrasound images obtained from 89 subjects, including 35 controls and 54 with myositis, achieving a baseline ROC AUC of 71.92% (with a 95% CI error margin). These promising results indicated the potential of implementing this method as a prescreening tool for myopathies [77]. In a similar study, a DL neural network applied to whole-body MRI achieved correct classification percentages of 69–77% and diagnostic performance comparable to that of radiologists in distinguishing facioscapulohumeral muscular dystrophy (FSHD1) from myositis. DL even corrected radiologists’ misclassifications, showcasing its ability to generate accurate diagnoses from MRI data [78].
In conclusion, ML has shown its potential to become an indispensable tool in medical image analysis, providing significant advantages over conventional manual analysis of IIM biopsies. DL models such as CNNs have proven to be highly effective in distinguishing between different types of muscle diseases, and in some instances outperformed specialist physicians. TA provides additional information to aid in diagnosis or prognosis by quantifying the underlying tissue properties. Unsupervised ND provides a promising prescreening tool for myopathies given its effectiveness at identifying abnormal or novel patterns in imaging data. These advanced ML techniques applied to imaging have the potential to provide faster and more accurate diagnoses and to prevent the patient discomfort associated with invasive conventional muscle biopsy.
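The three CNN building blocks summarized in Figure 5 (convolution, pooling and a fully connected layer) can be illustrated with a minimal forward pass. The sketch below is written in plain Python under stated assumptions: the 5x5 toy image, the single hand-set edge filter, and the dense-layer weights and biases are all hypothetical values chosen for illustration, whereas in a real CNN the filters and weights are learned during training. It is not the architecture used in any of the studies cited above.

```python
# Minimal forward pass through the three CNN building blocks (illustrative only).

def conv2d(image, kernel):
    """Slide one filter over a 2D image (no padding, stride 1) to get a feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(fmap):
    """Element-wise non-linearity: negative responses are set to zero."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    return [
        [
            max(fmap[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]


def dense(vector, weights, bias):
    """Fully connected layer: one weighted sum plus bias per output unit."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


# Toy 5x5 "image" containing a vertical edge (dark left, bright right).
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]

# Hypothetical hand-set filter that responds to left-to-right intensity increases.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = relu(conv2d(image, kernel))       # 3x3 feature map
pooled = max_pool(feature_map, size=2)          # down-sampled to 1x1
flat = [v for row in pooled for v in row]       # flatten for the dense layer
scores = dense(flat, weights=[[1.0], [-1.0]], bias=[0.0, 0.5])  # two output units

print(feature_map)
print(pooled)
print(scores)
```

The feature map is largest where the filter overlaps the edge and zero where the image is uniform, which is the sense in which a convolutional filter extracts a feature at different spatial locations; pooling then keeps only the strongest response in each window before the fully connected layer turns it into output scores.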
Machine learning models for predicting patients’ response to treatments
Beyond biomarker discovery for disease diagnosis and prognosis, biomarkers can help clinicians make informed decisions regarding patient treatment strategies. ML has also proven useful in predicting treatment responses. For instance, in a study of 51 IIM (DM, PM, ASS, IMNM) patients, demographic, clinical and serological parameters were evaluated to determine the most effective predictors of patients’ response to intravenous and subcutaneous administration of immunoglobulins [79]. Previously, the evaluation of five supervised ML models showed that elastic net regression, which combines the penalties of Ridge regression and Lasso regression, was the most effective model for this application [80]. The authors determined that dysphagia, skin disorders and the myositis activity index (MITAX) were good predictors of muscle strength (as measured by the manual muscle testing of eight groups (MMT8)) and found that IVIg therapy yielded better results in patients with more active systemic disease [79].
Anti-SRP antibody-positive IMNM patients are refractory to corticosteroids [43], and several clinical risk factors are associated with refractory disease, including male sex, severe muscle weakness and concurrent ILD. In addition, the extent of fatty infiltration of thigh muscles over time has been identified as a predictor of treatment response. ML algorithms have been used to analyze these pathological factors. Elevated expression of B cell activating factor receptor (BAFF-R) in muscle tissue has been identified as a predictor of refractory disease in SRP-positive IMNM patients. Leveraging these factors in ML-based predictive models may help healthcare professionals to better identify at-risk patients and adjust care plans.
ML approaches for predicting comorbidities
IIMs are complex multisystem autoimmune disorders, involving inflammation and immune system dysfunctions that mainly impact skeletal muscle but also affect other tissues and organs, including skin, joints and lungs [81]. Given the spectrum of systems that can be affected, individuals with IIM often experience comorbidities. These comorbidities can range from rheumatic diseases to ILD, reinforcing the multifaceted nature of IIM. Understanding and managing these comorbidities are essential aspects of comprehensive patient care.
As previously mentioned, certain subtypes of myositis are associated with an increased risk of malignancy, such as DM with anti-TIF1-γ antibodies [46]. It is estimated that one third of myositis patients will develop a malignancy. In fact, malignancy is the leading cause of death in adults with IIM [81]. Zhao et al. employed various data analytic and ML techniques, including Sankey diagrams, elastic net, RF, multidimensional scaling and hierarchical clustering, to categorize subtypes of anti-TIF1-γ+ myositis and assess the most critical factors for predicting cancer risk [46]. Among the patients studied, 54% had cancer, typically diagnosed within 6 months of myositis diagnosis. The anti-TIF1-γ+ myositis patients were grouped into low, intermediate, or high cancer risk subtypes. Key predictive features included disease duration, blood lymphocyte percentage, neutrophil percentage, neutrophil-to-lymphocyte ratio, gender, C-reactive protein (CRP) levels, shawl sign, arthritis/arthralgia, V-neck sign and anti-PM-Scl75 antibodies. Notably, RF achieved an accuracy of over 90%, underscoring the potential of ML models in aiding physicians in selecting appropriate cancer screening strategies for anti-TIF1-γ+ myositis patients [46].
Zhang and colleagues also performed LR modelling in a cohort of 168 IIM patients including DM, PM, ASS and IMNM to determine the key features that could be used for malignancy prediction [82]. Three features (age, alanine aminotransferase (ALT) < 80 U/L and seropositivity for anti-TIF1-γ antibodies) were identified as positive predictors of malignancy, while ILD was found to be a negative predictor. The LR model was as good as or better than the other ML models, including RF, neural network and extreme gradient boosting, at predicting malignancy. The area under the ROC curve (AUC) was 78.4% [82].
In another study, researchers examined the medical records of 397 patients with IIM to identify potential risk factors for ILD, other rheumatic diseases and malignancies [83]. Antibodies such as anti-PM/Scl, anti-Ro52, anti-aminoacyl-tRNA synthetase and anti-MDA5 were risk factors for ILD. Raynaud’s phenomenon, arthralgia and anti-nuclear antibodies were found to be prominent risk factors for other overlapping rheumatic diseases. For associated malignancies, male sex and the presence of anti-TIF1-γ antibodies were risk factors. Hierarchical clustering generated a subclassification into six subgroups: (1) malignancy overlapping DM, (2) classical DM, (3) PM with severe muscle involvement, (4) DM with ILD, (5) PM with ILD and (6) overlapping of myositis with other rheumatic diseases [83].
Overall, ML modelling of biomarkers provides numerous benefits to healthcare providers, such as rapidly identifying patients who stand to gain from specific treatments or who may be susceptible to adverse reactions. Additionally, ML models can help discern patients at risk of certain comorbidities, enabling implementation of targeted interventions.
Advantages and limitations of ML for biomarker discovery
As ML algorithms become increasingly prevalent in the biomedical field, it is important to note that the implementation and interpretation of these models require both expertise in data analysis
and domain-specific knowledge. Additionally, the accuracy and generalisability of these models rely heavily on the quality and quantity of data available for training and testing [9]. For instance, in transcriptomics studies, reference genomes may lack complete annotations for certain genes or regions, leading to the complete omission of important transcripts. Quantification challenges, such as accurate measurement of low-abundance transcripts and susceptibility to noise, further compound these issues. Additionally, transparent reporting of data acquisition, with detailed methodology descriptions, is essential for methods to generalize, and data reproducibility is equally essential, as variations in sequencing platforms and bioinformatics pipelines can introduce biases. While the potential benefits of using ML for biomarker discovery are numerous, it is important to carefully
models (Table 2).
Advantages | Limitations
Can handle large amounts of data | May require significant computational resources
Can detect complex patterns in data | May be prone to overfitting or underfitting data
Can be used for real-time prediction | May require significant training time
Can improve diagnostic accuracy | May be limited by the quality and completeness of data
Can identify new biomarkers and disease subtypes | May require expertise in data analysis and machine learning
Can tailor treatment plans for individual patients | May not be able to capture all relevant variables in the data
Can reduce human error and bias | May raise ethical concerns about the use of AI in healthcare
Can be applied to diverse types of data (e.g. imaging, genomics) | May be limited by the availability of high-quality data
Can accelerate drug discovery and development | May require collaboration between researchers with different expertise
the pooling of data and ultimately enhancing the reliability of biomarker discoveries.
Looking forward, the establishment of patient registries becomes crucial for comprehensive data collation, and the integration of AI/ML into these registries can provide direct feedback to clinicians, contributing to personalized treatment strategies. While challenges and limitations persist, the ongoing application of ML in IIM research has the potential to revolutionize our understanding of these diseases, paving the way for more targeted and efficacious therapies, provided that standardized protocols are implemented and adhered to across the scientific community.
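Several of the studies reviewed here report model discrimination as the area under the ROC curve (AUC), for example 78.4% for the LR malignancy model [82]. As a reminder of what that figure measures, the sketch below computes the AUC as the fraction of positive/negative case pairs in which the positive case receives the higher score, with ties counted as one half. The function name, labels and scores are made up for illustration and do not come from any cited dataset.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison of positive and negative cases.

    `labels` holds 1 (e.g. malignancy) or 0 (no malignancy) and `scores`
    holds the model's predicted risk for each patient. Hypothetical helper
    written for clarity, not speed: it compares every positive/negative pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Assumed held-out test data: 3 patients with malignancy, 4 without.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.1]

# 10 of the 12 positive/negative pairs are ranked correctly.
print(f"AUC = {roc_auc(labels, scores):.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. Because the estimate depends on the cases used to compute it, it is most informative when the scores come from patients not seen during training, in line with the data quality and quantity caveats discussed in this section.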
DATA AVAILABILITY STATEMENT
18. Linklater H, Pipitone N, Rose MR, et al. Classifying idiopathic
