Papers by Luca Longo

Parkinson's disease (PD) is an incurable neurological disorder that degenerates the cerebrospinal nervous system and hinders motor functions. Electroencephalography (EEG) signal analysis can provide reliable information regarding PD conditions. However, EEG is a complex, multichannel, and nonlinear signal whose noise complicates the identification of PD symptoms. A few studies have employed fractal dimension (FD) to extract distinguishing PD features from EEG signals; however, to our knowledge, no exploratory study exists on the efficiency of the different FD measures. We aim to conduct a comparative analysis of the various FDs that, as feature extraction measures, can discriminate PD patients who are ON and OFF medication from healthy controls using a machine learning (ML) architecture. This study has implemented and analyzed several techniques for segmentation, feature extraction, and ML models. The results show that the k-nearest neighbors (KNN) classifier with Higuchi FD and 90% overlap between segmented windows delivers the highest accuracies, yielding mean accuracies of 99.65 ± 0.15% for PD patients ON medication and 99.45 ± 0.18% for PD patients OFF medication. The model accurately identifies the signs of the disease in resting-state EEG, with almost equivalent accuracy for both OFF and ON medication patients. To enhance interpretability, we leveraged XGBoost's feature importance to generate brain topographic plots. This integration of explainable AI (XAI) enhanced the transparency and comprehensibility of the model's classifications. Additionally, a comparison between the performance of FD and a few entropy measures has been drawn to validate the significance of FD as a superior feature extraction measure. This study contributes to the body of knowledge with an architectural pipeline for detecting PD in resting-state EEG while emphasizing fractal dimension as an effective way of extracting salient features from EEG signals.
Machine Learning and Knowledge Extraction, Nov 18, 2022
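The pipeline above combines overlapping windowing with fractal-dimension features. As a minimal sketch (not the paper's code), the following Python implements the standard Higuchi FD estimator and a 90%-overlap windowing helper; `k_max` and the window length are illustrative assumptions.

```python
import numpy as np

def higuchi_fd(x, k_max=10):
    """Estimate the Higuchi fractal dimension of a 1-D signal.

    For each scale k, a normalised mean curve length L(k) is computed over
    k down-sampled sub-series; the FD is the slope of log L(k) vs log(1/k).
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    lengths = []
    for k in range(1, k_max + 1):
        lk = 0.0
        for m in range(k):
            idx = np.arange(m, n, k)          # sub-series x[m], x[m+k], ...
            if len(idx) < 2:
                continue
            dist = np.abs(np.diff(x[idx])).sum()
            norm = (n - 1) / ((len(idx) - 1) * k)  # length normalisation
            lk += dist * norm / k
        lengths.append(lk / k)                 # mean over the k sub-series
    k_vals = np.arange(1, k_max + 1)
    slope, _ = np.polyfit(np.log(1.0 / k_vals), np.log(lengths), 1)
    return slope

def sliding_windows(signal, win_len, overlap=0.9):
    """Yield fixed-length windows with fractional overlap (0.9 = 90%)."""
    step = max(1, int(win_len * (1 - overlap)))
    for start in range(0, len(signal) - win_len + 1, step):
        yield signal[start:start + win_len]

# Example: FD of each 2-second window of a synthetic 256 Hz channel.
eeg = np.random.randn(2560)                    # placeholder for one channel
features = [higuchi_fd(w) for w in sliding_windows(eeg, win_len=512)]
```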
Biometrics is the process of measuring and analyzing human characteristics to verify a given person's identity. Most real-world applications rely on unique human traits such as fingerprints or the iris. However, among these characteristics, the Electroencephalogram (EEG) stands out given its high inter-subject variability. Recent advances in Deep Learning and a deeper understanding of EEG processing methods have led to the development of models that accurately discriminate unique individuals. However, it is still uncertain how much EEG data is required to train such models. This work aims at determining the minimal amount of training data required to develop a robust EEG-based biometric model (+95% and +99% testing accuracies) from a subject in a task-dependent setting. This goal is achieved by performing and analyzing 11,780 combinations of training sizes, by employing various neural network-based learning techniques of increasing complexity, and fe...
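A hedged sketch of the kind of training-size sweep such a study implies; all data here is placeholder noise, and the real work uses EEG features with neural networks rather than this simple classifier:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hypothetical inputs: per-window feature vectors labelled by subject ID.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 64))          # placeholder features
y = rng.integers(0, 10, size=2000)       # placeholder subject IDs

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Sweep increasing training sizes and record test accuracy.
for frac in (0.05, 0.1, 0.25, 0.5, 1.0):
    n = int(frac * len(X_train))
    clf = LogisticRegression(max_iter=1000).fit(X_train[:n], y_train[:n])
    acc = accuracy_score(y_test, clf.predict(X_test))
    print(f"train size {n:5d}: test accuracy {acc:.3f}")
```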
Human mental workload is arguably the most invoked multidimensional construct in Human Factors and Ergonomics, and it is gaining momentum in Neuroscience and Neuroergonomics as well. Uncertainties exist in its characterization, motivating the design and development of computational models, and the topic has thus recently and actively received support from the discipline of Computer Science. However, its role in human performance prediction is assured. This work aims to provide a synthesis of the current state of the art in human mental workload assessment through considerations, definitions, measurement techniques, and applications. Findings suggest that, despite an increasing number of associated research works, a single, reliable, and generally applicable framework for mental workload research does not yet appear fully established. One reason for this gap is the existence of a wide swath of operational definitions, built upon different theoretical assumptions which are rarely examined collectively. ...
Data is of high quality if it is fit for its intended use in operations, decision-making, and planning. There is a colossal amount of linked data available on the web. However, it is difficult to understand how well linked data fits modeling tasks because of the defects present in the data. Faults that emerge in linked data spread far and wide, affecting all the services built on top of it. Addressing linked data quality deficiencies requires identifying quality problems, assessing quality, and refining the data to improve it. This study aims to identify existing end-to-end frameworks for the assessment and improvement of data quality. One important finding is that most of the work deals with only one aspect rather than a combined approach. Another finding is that most frameworks aim at solving problems related to DBpedia. Therefore, a standard scalable system is required that integrates the identification of quality issues, their evaluation, and the improvement of linked data quality. This survey contributes to understanding the state of the art of data quality evaluation and improvement. A solution based on ontology is also proposed to build an end-to-end system that analyzes the root causes of quality violations.
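As one concrete example of the kind of automated metric such quality-assessment frameworks compute (a sketch under assumed inputs, not taken from the survey), the snippet below measures label completeness of an RDF graph with rdflib; `data.ttl` is a placeholder path:

```python
from rdflib import Graph
from rdflib.namespace import RDFS

# Load an RDF dataset; the file name is a placeholder.
g = Graph()
g.parse("data.ttl", format="turtle")

subjects = set(g.subjects())
labelled = set(g.subjects(RDFS.label, None))

# Completeness: share of resources carrying a human-readable label.
completeness = len(labelled & subjects) / len(subjects) if subjects else 0.0
print(f"{len(subjects)} resources, label completeness = {completeness:.2%}")
```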
This paper presents a method to extract train driver taskload from downloads of on-train data recorders (OTDR). OTDR are in widespread use for condition monitoring of trains, but they may also have applications in operations monitoring and management; evaluation of train driver workload is one such application. The paper describes the type of data held in OTDR recordings and how it can be transformed into driver actions throughout a journey. Example data from 16 commuter journeys are presented, which highlight the increased taskload during arrival at stations. Finally, the possibilities and limitations of the data are discussed.
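A minimal sketch of the transformation the paper describes, turning timestamped recorder events into a per-minute taskload proxy; the column names and events are illustrative assumptions, not the actual OTDR schema:

```python
import pandas as pd

# Hypothetical OTDR export: timestamped channel events such as brake or
# power-controller movements (names here are invented for illustration).
events = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2021-05-01 08:00:05", "2021-05-01 08:00:41",
         "2021-05-01 08:01:02", "2021-05-01 08:01:07"]),
    "channel": ["power_notch", "brake_demand", "brake_demand", "doors"],
})

# Taskload proxy: driver-initiated control events per minute of the journey.
taskload = (events.set_index("timestamp")
                  .resample("1min")["channel"]
                  .count())
print(taskload)
```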
Communications in Computer and Information Science, 2021
Instructional efficiency within education is a measurable concept, and models have been proposed to assess it. The main assumption behind these models is that efficiency is the capacity to achieve established goals at the minimal expense of resources. This article challenges this assumption by contributing to the body of knowledge with a novel model grounded in ideal mental workload and performance, namely the parabolic model of instructional efficiency. A comparative empirical investigation was conducted to demonstrate the potential of this model for instructional design evaluation. Evidence demonstrated that this model achieved good concurrent validity with the well-known likelihood model of instructional efficiency, treated as a baseline, but better discriminant validity for the evaluation of the training and learning phases. Additionally, the inferences produced by this novel model led to a superior information gain when compared to the baseline.
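For context, the baseline mentioned above, the likelihood model, is usually stated as a simple ratio of performance to invested effort (this formulation follows the efficiency literature, e.g. Hoffman and Schraw, rather than being quoted from this article); the parabolic model instead treats efficiency as peaking at an ideal, rather than minimal, workload:

```latex
% Likelihood model of instructional efficiency: performance P obtained
% per unit of invested mental workload (effort) W.
E_{\text{likelihood}} = \frac{P}{W}
```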
Reinforcement Learning (RL) has shown promise in optimizing complex control and decision-making processes, but Deep Reinforcement Learning (DRL) lacks interpretability, limiting its adoption in regulated sectors like manufacturing, finance, and healthcare. Difficulties arise from DRL's opaque decision-making, hindering efficiency and resource use, and this issue is amplified with every advancement. While many seek to move from Experience Replay to A3C, the latter demands more resources. Despite efforts to improve Experience Replay selection strategies, there is a tendency to keep the capacity high. We investigate training a Deep Convolutional Q-learning agent across 20 Atari games while intentionally reducing Experience Replay capacity from 1×10^6 to 5×10^2. We find that a reduction from 1×10^4 to 5×10^3 doesn't significantly affect rewards, offering a practical path to resource-efficient DRL. To illuminate agent decisions and align them with game mechanics, we employ a novel method: visualizing E...
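A minimal sketch of the component under study, a fixed-capacity experience replay buffer whose size is the knob being reduced (generic DQN-style code, not the paper's implementation):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity experience replay; old transitions are evicted FIFO."""

    def __init__(self, capacity: int = 5_000):  # e.g. the reduced 5x10^3 setting
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        """Store one transition, silently dropping the oldest when full."""
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int):
        """Draw a uniform random minibatch for a Q-learning update."""
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```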
Argumentation has recently shown appealing properties for inference under uncertainty and conflicting knowledge. However, there is a lack of studies examining its capacity to exploit real-world knowledge bases for performing quantitative, case-by-case inferences. This study analyzes the inferential capacity of a set of argument-based models, designed by a human reasoner, for the problem of trust assessment. Specifically, these models are applied to data from Wikipedia and are aimed at inferring the trustworthiness of its editors. A comparison against non-deductive approaches revealed that these models were superior in terms of the values inferred for recognised trustworthy editors. This research contributes to the field of argumentation by employing a replicable modular design suitable for modelling reasoning under uncertainty in distinct real-world domains.
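To make the underlying machinery concrete: argument-based models of this kind are typically evaluated under an acceptability semantics. The sketch below computes the grounded extension of an abstract argumentation framework in the sense of Dung (1995); it illustrates the standard semantics, not the trust models of this study:

```python
def grounded_extension(arguments, attacks):
    """Grounded extension of an abstract argumentation framework,
    obtained by iterating Dung's characteristic function from the
    empty set until a fixpoint is reached."""
    extension = set()
    while True:
        # An argument is acceptable w.r.t. the current extension if every
        # one of its attackers is itself attacked by the extension.
        acceptable = {
            a for a in arguments
            if all(any((d, b) in attacks for d in extension)
                   for (b, t) in attacks if t == a)
        }
        if acceptable == extension:
            return extension
        extension = acceptable

# Example: A attacks B, B attacks C; A is unattacked and defends C.
args = {"A", "B", "C"}
atts = {("A", "B"), ("B", "C")}
print(grounded_extension(args, atts))   # {'A', 'C'}
```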
The NASA Task Load Index (NASA-TLX) and the Workload Profile (WP) are likely the most employed instruments for subjective mental workload (MWL) measurement. Numerous areas have made use of these methods for assessing human performance and thus improving the design of systems and tasks. Unfortunately, MWL is still a vague concept, with different definitions and no universal measure. This research investigates the use of defeasible reasoning to represent and assess MWL. Reasoning is defeasible when a conclusion, supported by a set of premises, can be retracted in the light of new information. In this empirical study, this type of reasoning is considered for modelling MWL, given the intrinsic uncertainty involved in assessing it. In particular, it is shown how the NASA-TLX and the WP can be translated into defeasible structures whose inferences can achieve similar validity to the original instruments, even when less information is available. It is also discussed how these structures can have higher extensibility and how their inferences are more self-explanatory than the ones produced by the NASA-TLX and WP.
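For reference, the standard NASA-TLX score that such defeasible structures aim to reproduce is a weighted mean of six subscale ratings, with weights drawn from 15 pairwise comparisons; a sketch with illustrative numbers, not data from the study:

```python
# Standard NASA-TLX weighted score: six subscale ratings (0-100) are
# combined with weights from 15 pairwise comparisons (weights sum to 15).
ratings = {   # illustrative ratings
    "mental": 70, "physical": 20, "temporal": 60,
    "performance": 40, "effort": 65, "frustration": 35,
}
weights = {   # illustrative pairwise-comparison tallies (sum = 15)
    "mental": 5, "physical": 1, "temporal": 3,
    "performance": 2, "effort": 3, "frustration": 1,
}

overall = sum(ratings[d] * weights[d] for d in ratings) / 15
print(f"Overall weighted workload: {overall:.1f}")   # on a 0-100 scale
```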
Clinicians often work under high pressure because of emergency situations, high volume, or the speed required. Their cognitive state is in constant flux, and while using digital interfaces, their experience and judgement are likely influenced by their mental state. Experience from aviation in using the NASA-TLX (Task Load Index) tool for assessing Human Mental Workload (HMW) can be usefully applied, along with Nielsen's usability heuristics, to evaluating an Electronic Health Record (EHR) system. A pilot study was conducted to assess clinicians' cognitive state and investigate how the HMW imposed by the EHR influences its usability. Two wards demanding different levels of physical/mental stress for staff, both using the same EHR to document patients' daily progress, were compared. Ward-1: 18 long-stay elderly patients with high dependency scores; Ward-2: 10 short-stay elderly patients with low dependency scores. Method: questionnaires incorporating the NASA-TLX model and Nielsen's design/usability principles were completed by a clinician in each scenario, following each use of the EHR. The NASA-TLX model measures mental, physical, and temporal demands, effort, performance, and frustration levels. Results: HMW influences usability for the same EHR interface. Towards the end of the day, clinician performance in using the EHR decreases drastically; they need to work harder mentally to reach the same level of performance (high HMW). The Pearson correlation between the Nielsen and NASA-TLX scores is significant (Ward-1: r = -0.86; Ward-2: r = -0.93). Increments in HMW correspond to moderate decrements in usability. This evidence suggests that an EHR design process should give more consideration to the context of use and the cognitive workload of its clinicians.
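Ward-level correlations of this kind can be reproduced mechanically with a few lines of SciPy; the paired scores below are placeholders, not the study's data:

```python
from scipy.stats import pearsonr

# Illustrative paired observations: usability scores from a Nielsen-style
# heuristic checklist vs NASA-TLX workload scores for the same sessions.
usability = [82, 75, 70, 64, 58, 51]
workload  = [35, 44, 52, 61, 70, 78]

r, p_value = pearsonr(usability, workload)
print(f"Pearson r = {r:.2f}, p = {p_value:.3f}")   # strongly negative r
```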
Research on the discovery, classification, and validation of biological markers, or biomarkers, has grown extensively in the last decades. Newfound and correctly validated biomarkers have great potential as prognostic and diagnostic indicators, but present a complex relationship with pertinent endpoints such as survival or other disease manifestations. This research proposes the use of computational argumentation theory as a starting point for addressing this problem in cases where a large amount of data is unavailable. A knowledge base containing 51 different biomarkers and their associations with mortality risks in the elderly was provided by a clinician. It was applied to the construction of several argument-based models capable of inferring survival or otherwise. The prediction accuracy and sensitivity of these models were investigated, showing that they are in line with inductive classification using decision trees trained on limited data.
A pilot study is reported to identify an improved method of evaluating digital user interfaces in health care. Experience and developments from the aviation industry and the NASA-TLX mental workload assessment tool are applied in conjunction with Nielsen's heuristics for evaluating an Electronic Health Record system in an Irish hospital. The NASA-TLX performs subjective workload assessments on operators working with various human-computer systems. Results suggest that, depending on the cognitive workload and the working context of users, usability will differ for the same digital interface. We conclude that incorporating the NASA-TLX with Nielsen's heuristics offers a more reliable method for the design and evaluation of digital user interfaces in clinical environments, since the healthcare work context is taken into account. Improved interfaces can be expected to reduce medical errors and improve patient care.
This paper proposes a novel machine learning procedure for genome-wide association studies (GWAS), named LightGWAS. It is based on the LightGBM framework and is a single, resilient, autonomous, and scalable solution to common limitations of the GWAS implementations found in the literature. These include reliance on massive manual quality-control steps and on specific GWAS methods for each type of dataset morphology and size. In this research, LightGWAS has been contrasted against PLINK2, one of the current state-of-the-art GWAS implementations, based on the general linear model with support for Firth regularisation. The mean differences measured on standard classification metrics, extracted via quantitative empirical tests using k-fold cross-validation, indicated that LightGWAS outperforms PLINK2 for balanced, imbalanced, and highly imbalanced genomic datasets. Paired difference tests denoted statistical significance in the results extracted from the experiments with imbalanced datasets. This article contributes to the body of knowledge by presenting a potentially more efficient GWAS procedure based on nonparametric approaches. LightGWAS ensures adaptability with higher precision in the discovery of causal single-nucleotide polymorphisms, thanks to the leaf-wise tree-growth algorithm offered by the state of the art in gradient-boosted decision trees. Control of false positives and statistical power are automatically addressed by the model's training process, which significantly reduces human dependency during study design.
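A hedged sketch of the core idea, gradient-boosted trees applied to a genotype matrix, using the public LightGBM scikit-learn API; the data shapes and hyperparameters are placeholders, not LightGWAS itself:

```python
import numpy as np
from lightgbm import LGBMClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical GWAS-style input: genotypes coded 0/1/2 (minor-allele
# counts) per SNP, with a binary case/control phenotype.
rng = np.random.default_rng(42)
X = rng.integers(0, 3, size=(500, 1000)).astype(float)  # samples x SNPs
y = rng.integers(0, 2, size=500)                        # phenotype labels

# Leaf-wise boosted trees; class_weight helps with imbalanced phenotypes.
model = LGBMClassifier(n_estimators=200, class_weight="balanced")
scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print(f"5-fold AUC: {scores.mean():.3f} ± {scores.std():.3f}")
```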