Individual Project
Papers
David Lindahl
s234817
June 21, 2024
Paper 2: Detecting influenza epidemics using search engine query data [2]
Paper 2 evaluates the effectiveness of using Google search query data to track and predict influenza-
like illness (ILI) trends across the United States. The authors do not perform cross-validation or any
comparison against alternative models.
Implementing a cross-validation approach and dynamically retraining the model on real-time data
could further improve the robustness and accuracy of the predictions. This would allow the model to be
evaluated on the go, and ensure more reliable monitoring of influenza-like illness (ILI) trends. Perhaps
this adjustment could have prevented the model from predicting systematically high in 100 out of 108
weeks starting in August 2011. [3]
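One way to realize such on-the-go evaluation is a rolling-origin scheme: refit the model each week on all data observed so far, then score it on the following week. A minimal sketch with synthetic stand-in data (the variable names and the simple linear model are illustrative assumptions, not the paper's actual method):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins: a weekly query signal and the ILI rate it loosely tracks.
queries = rng.random(30)
ili = 0.8 * queries + rng.normal(0, 0.05, 30)

preds = []
for t in range(10, 30):
    # Refit a least-squares line on all weeks strictly before week t ...
    coef = np.polyfit(queries[:t], ili[:t], 1)
    # ... then predict week t out-of-sample, as real-time monitoring would.
    preds.append(np.polyval(coef, queries[t]))
```

Comparing `preds` against the observed values week by week would expose a systematic bias as soon as it begins, rather than after years of drift.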
An important conclusion to draw from this paper is that big data and search queries on their own
may not always provide accurate and reliable predictions. We already know that combining attributes to
train models can sometimes improve accuracy by a wide margin. Perhaps if the researchers had included
more medical data, the model could have predicted with higher accuracy.
Furthermore, this paper teaches us the importance of adding model regularization to avoid overfitting
to the training data, and of building models that generalize to new data.
Task 2: Predicting Frustration from Heart
Rate Signals
Abstract
This study aims to build and evaluate predictive models to classify levels of frustration based on
heart rate (HR) features. The motivation is to enhance mental health monitoring and user experience
improvement. The problem addressed is the lack of accurate predictive models for frustration levels.
Our approach involves using Logistic Regression, Decision Tree, ANN, RF, and a baseline model,
trained on HR data. The results indicate no significant differences among the models, as confirmed
by ANOVA (p-value = 0.5488). The conclusion is that current methods do not achieve high accuracy,
and future work should explore additional features or more complex models.
1 Introduction
Understanding and predicting emotional states such as frustration can be crucial for various applications,
including mental health monitoring and user experience improvement. Heart rate (HR) signals are a
non-invasive measure that can provide insights into these emotional states. This study aims to develop
and evaluate five models to classify frustration levels from HR signals. The five models are a Logistic
Regression model, an ANN, a Random Forest classifier (RF), a Decision Tree, and a baseline model
(majority pick). The frustration attribute is binarized, with levels above the median considered ’Frustrated’.
2 Data Preprocessing
The classification problem we are trying to solve is predicting the frustration level given a heart rate.
More specifically, the output of each model is a binary value, either 1 or 0, representing the prediction
of the attribute ’Frustration Binary’. A value of 1 indicates the presence of frustration (frustration ≥ 2),
while a value of 0 indicates its absence (frustration < 2). The only features included in the models are
those describing the heart rate.
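The binarization step can be sketched as follows; the column name `frustration` and the example scores are illustrative assumptions, not the dataset's actual values:

```python
import pandas as pd

# Hypothetical per-round frustration scores (stand-ins for the real data).
df = pd.DataFrame({"frustration": [0, 1, 2, 3, 5]})

# Binarize: 1 when frustration >= 2 ('Frustrated'), else 0.
df["Frustration Binary"] = (df["frustration"] >= 2).astype(int)
print(df["Frustration Binary"].tolist())  # [0, 0, 1, 1, 1]
```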
Before proceeding with model training, it is essential to examine the relationships between the features
to assess their suitability for inclusion in the predictive models. This involves analyzing the correlation
matrix and the pairwise relationship scatterplots.
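The correlation matrix itself is a one-line computation with pandas. A minimal sketch with hypothetical stand-in data (the feature names follow the report; the numbers are simulated, with ’HR Median’ deliberately built to correlate strongly with ’HR Mean’):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical stand-in for the real dataset: HR summary features per round.
df = pd.DataFrame({
    "HR Mean": rng.normal(75, 8, 100),
    "HR Min": rng.normal(60, 6, 100),
    "Frustration Binary": rng.integers(0, 2, 100),
})
# Built to mirror the report's observed 0.95 correlation with 'HR Mean'.
df["HR Median"] = df["HR Mean"] + rng.normal(0, 2, 100)

corr = df.corr(method="pearson")  # full Pearson correlation matrix
```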
2.1 Correlation Matrix Analysis
The correlation matrix, shown in Figure 1, indicates the linear relationships between the heart rate
features and the target variable ’Frustration Binary’. The correlation coefficients between the features
and ’Frustration Binary’ range from -0.02 to 0.14. These low correlation values suggest that none of
the features have a strong linear relationship with the target variable. This suggests that linear models
could have a hard time predicting whether or not an individual is frustrated.
Looking at Figure 1, one could argue that since the correlation between the features ’HR Mean’ and
’HR Median’ is 0.95, including both is redundant. Additionally, ’HR Min’ has a correlation of only 0.03
with ’Frustration Binary’, suggesting that it might contribute more noise than useful information. However,
even though these features are not strongly linearly related to ’Frustration Binary’, they can still provide
valuable information for the predictive models through non-linear relationships and interactions with
other features.
2.3 Input Features
Therefore, despite the potential redundancy and low correlation values, all six features were included in
the dataset. This decision ensures that the models have access to the full range of information captured by
the heart rate signals, which might improve their ability to accurately predict frustration levels through
complex patterns that are not immediately apparent from linear correlations alone. The input features
used for these models are therefore:
3 Cross-validation
3.1 Cross-Validation: Grouped K-Fold
It is crucial to ensure that the same individuals are not present in both the training and test data to avoid
data leakage and to ensure unbiased evaluation of the models. Additionally, due to the class imbalance
(94 instances of ’Frustrated’ = true, 74 instances of ’Frustrated’ = false), we have chosen the Grouped
K-Fold method. This guarantees that no individual is present in both the training and test sets, and
ensures that the models are tested on every individual.
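With scikit-learn, `GroupKFold` enforces this split directly when the individual IDs are passed as groups. A minimal sketch with hypothetical data (4 individuals, 3 rounds each; the shapes are illustrative assumptions):

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Hypothetical setup: 12 rounds from 4 individuals, 3 rounds each.
X = np.arange(12).reshape(-1, 1)
y = np.array([1, 0, 1] * 4)
groups = np.repeat([0, 1, 2, 3], 3)  # individual IDs

gkf = GroupKFold(n_splits=4)
for train_idx, test_idx in gkf.split(X, y, groups):
    # No individual appears in both the training and the test set of a fold.
    assert set(groups[train_idx]).isdisjoint(groups[test_idx])
```

With one fold per individual, every individual also ends up in a test set exactly once, matching the requirement above.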
Classification Accuracy

Model                 Mean Accuracy   Lower Bound   Upper Bound
Decision Tree         0.4996          0.4107        0.5893
ANN                   0.5893          0.4583        0.7202
Baseline              0.5586          0.4226        0.6964
RF                    0.5590          0.4583        0.6429
Logistic Regression   0.5417          0.4345        0.6488
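One common way to obtain bounds like those in the table is a 95% t-interval over the per-fold accuracies; whether the report used exactly this construction is an assumption. A minimal sketch with hypothetical fold accuracies:

```python
import numpy as np
from scipy import stats

# Hypothetical per-fold accuracies for one model (one score per test individual).
fold_acc = np.array([0.50, 0.62, 0.55, 0.58, 0.60, 0.52])

mean = fold_acc.mean()
# 95% t-interval over folds: mean +/- t * standard error.
half = stats.t.ppf(0.975, df=len(fold_acc) - 1) * stats.sem(fold_acc)
lower, upper = mean - half, mean + half
```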
4 Model Comparison
4.1 ANOVA: Assumptions
Before conducting the ANOVA, it is essential to verify that the data meets the assumptions required for
the test.
4.1.1 Normality
To test for normality, we used the Shapiro-Wilk Test. The results for each model are summarized in
Table 2.
Since all p-values are above 0.05, we fail to reject the hypothesis that the accuracies are normally
distributed. This conclusion is further supported by the Q-Q plots shown in Figure 3.
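The Shapiro-Wilk test is available in SciPy; a minimal sketch with simulated fold accuracies (the numbers are stand-ins, not the study's actual per-fold results):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical fold accuracies for one model.
acc = rng.normal(0.55, 0.05, size=10)

stat, p = stats.shapiro(acc)
# A p-value above 0.05 gives no evidence against normality for this sample.
```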
With the normality assumption satisfied, we can proceed to test the other assumptions before con-
ducting the ANOVA.
4.1.2 Homogeneity of Variance

To test for homogeneity of variance across the models’ accuracies, we used Levene’s Test.

Test            p-value
Levene’s Test   0.1383
Since the p-value is above 0.05, we cannot reject the null hypothesis of homogeneity of variance,
indicating that the variance of accuracies is similar across all models.
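Levene's test compares the spread of the accuracy samples across models; a minimal sketch with hypothetical per-fold accuracies for three of the models:

```python
from scipy import stats

# Hypothetical fold accuracies for three of the models (stand-in values).
acc_tree = [0.45, 0.50, 0.55, 0.48, 0.52]
acc_ann = [0.55, 0.60, 0.62, 0.57, 0.61]
acc_base = [0.52, 0.56, 0.58, 0.54, 0.59]

stat, p = stats.levene(acc_tree, acc_ann, acc_base)
# A p-value above 0.05 means the variances can be treated as equal.
```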
4.1.3 Independence
We are comparing models that, in each fold, are trained on the same data and tested on the same data.
Therefore, one cannot assume independence. For this reason, we have chosen to use the ”Repeated
Measures ANOVA” (RM ANOVA), which accounts for correlation between the models being examined.
Below are the results from the RM ANOVA. [4]
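The RM ANOVA F-statistic can be computed directly from the fold-by-model accuracy matrix by partitioning the total sum of squares into fold, model, and residual components. A minimal sketch with hypothetical accuracies (not the study's actual per-fold numbers):

```python
import numpy as np
from scipy import stats

# Hypothetical accuracy matrix: rows = folds (subjects), cols = models.
acc = np.array([
    [0.45, 0.55, 0.52],
    [0.50, 0.60, 0.56],
    [0.55, 0.62, 0.58],
    [0.48, 0.57, 0.54],
])
n, k = acc.shape

grand = acc.mean()
ss_total = ((acc - grand) ** 2).sum()
ss_subj = k * ((acc.mean(axis=1) - grand) ** 2).sum()   # variation between folds
ss_treat = n * ((acc.mean(axis=0) - grand) ** 2).sum()  # variation between models
ss_err = ss_total - ss_subj - ss_treat                  # residual variation

# F-test on the model effect, with the fold effect removed.
f_stat = (ss_treat / (k - 1)) / (ss_err / ((n - 1) * (k - 1)))
p = stats.f.sf(f_stat, k - 1, (n - 1) * (k - 1))
```

Removing the per-fold (subject) variation is exactly what accounts for the correlation between models evaluated on the same folds.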
The p-value from the ANOVA test is 0.5488, which is above the significance level of 0.05. Therefore,
we fail to reject the null hypothesis that there are no significant differences in the mean accuracies of the
different models. This implies that none of the models significantly outperforms the others in predicting
frustration levels from heart rate signals.
5 Conclusion
Our study aimed to develop and evaluate predictive models for classifying frustration levels based on
heart rate signals. We utilized Logistic Regression, Decision Tree, ANN, RF, and a baseline model.
The ANOVA results, with a p-value of 0.5488, indicate no significant differences in the mean accuracies
among the models. All models demonstrated similar robustness, generalization, and consistency. Given
these findings, no model significantly outperforms the others, suggesting that simpler models like the
baseline can be as effective as more complex ones for this task. Overall, this study has found that the
task of predicting frustration levels based on heart rate signals has not been accomplished with high
accuracy given the current dataset.
Future work could explore a larger dataset, or include more complex models such as ensemble models
or deep learning models, to achieve better accuracy when predicting frustration levels from heart rate
data.
References
[1] Ruibo Wang, Jihong Li. Block-regularized 5 × 2 cross-validated McNemar’s test for comparing two
classification algorithms. arXiv, 2015.
[2] Jeremy Ginsberg et al. Detecting influenza epidemics using search engine query data. Nature, 2009.
[3] David Lazer, Ryan Kennedy, Gary King, Alessandro Vespignani. The parable of Google Flu: traps
in big data analysis. dash.harvard.edu, 2018.
[4] Lutfiyya N. Muhammad. Guidelines for repeated measures statistical analysis approaches with basic
science research considerations. National Library of Medicine, 2018.