Detecting unknown worms is a challenging task. Existing solutions, such as anti-virus tools, rely mainly on prior explicit knowledge of specific worm signatures. As a result, after a new worm appears on the Web, there is a significant delay until an update carrying the worm's signature is distributed to anti-virus tools. During this interval, the new worm can infect many computers and cause significant damage. We propose an innovative technique for detecting the presence of an unknown worm, not necessarily by recognizing specific instances of the worm, but rather based on measurements of the computer's behavior. We designed an experiment to test the new technique, employing several computer configurations and background application activities. During the experiments, 323 computer features were monitored. Four feature selection techniques were used to reduce the number of features, and four classification algorithms were applied to the resulting feature subsets. Our results indicate that this approach achieved above 90% average accuracy, and above 99% accuracy for specific unknown worms, using just 20 features, while maintaining a low false-positive rate.
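A minimal sketch of the kind of pipeline this describes: a 323-feature host-measurement matrix is reduced to 20 selected features and two classifiers are scored by accuracy and false-positive rate. The synthetic data, the specific selector (ANOVA F-score), and the classifiers shown are illustrative assumptions, not the exact techniques evaluated in the study.

```python
# Hedged sketch: reduce 323 monitored host features to 20 and evaluate
# two classifiers on the resulting subset (illustrative configuration only).
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 323))          # 323 monitored computer features (synthetic stand-in)
y = rng.integers(0, 2, size=1000)         # 1 = worm activity present, 0 = clean

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

selector = SelectKBest(f_classif, k=20).fit(X_tr, y_tr)    # keep the 20 most informative features
X_tr20, X_te20 = selector.transform(X_tr), selector.transform(X_te)

for clf in (DecisionTreeClassifier(random_state=0), GaussianNB()):
    clf.fit(X_tr20, y_tr)
    pred = clf.predict(X_te20)
    tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
    fpr = fp / (fp + tn)                  # false-positive rate on clean examples
    print(type(clf).__name__, "accuracy:", accuracy_score(y_te, pred), "FPR:", fpr)
```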
Discretization is widely used in data mining as a preprocessing step; it usually leads to improved performance. In time series analysis, the data are commonly divided into time windows. Measurements are extracted from each time window into a vectorial representation, and static mining methods are then applied, which avoids an explicit analysis along time. Abstracting time series into meaningful…
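A minimal sketch of the time-window vectorization step described above: a univariate series is cut into fixed-length windows and each window is summarized by a few static measurements. The window length and the chosen statistics are illustrative assumptions.

```python
# Hedged sketch: turn a time series into one feature vector per fixed-length window.
import numpy as np

def windows_to_vectors(series: np.ndarray, window: int) -> np.ndarray:
    n = len(series) // window
    rows = []
    for i in range(n):
        w = series[i * window:(i + 1) * window]
        rows.append([w.mean(), w.std(), w.min(), w.max()])  # static summary of the window
    return np.array(rows)

signal = np.sin(np.linspace(0, 20, 600)) + np.random.default_rng(1).normal(0, 0.1, 600)
vectors = windows_to_vectors(signal, window=60)   # one row per time window
print(vectors.shape)                              # (10, 4): ready for a static mining method
```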
Clinicians can benefit from automated support for guideline (GL) application at the point of care. However, several conceptual dimensions should be considered for a realistic application: 1) the knowledge might be represented as structured text (semi-formal) or specified in a machine-comprehensible language (formal); 2) the availability of electronic patient data might be partial or full; 3) GL-based recommendations might be triggered by a human-initiated (synchronous) session or be data-driven (asynchronous). In addition, several requirements must be fulfilled, such as an evaluation of the GL-application engine by a GL simulation engine. Finally, to apply multiple GLs, by multiple users, in multiple settings, the GL-application engine should be designed as an enterprise architecture that can plug into any Electronic Medical Record (EMR). We present an architecture fulfilling these desiderata, describe application examples with different conceptual dimensions and requirements using our new GL-application engine, PICARD, discuss lessons learned, and briefly describe a clinical evaluation of the current framework in the domain of pre-eclampsia/toxemia of pregnancy.
Journal of Urban Health: Bulletin of the New York Academy of Medicine, Apr 4, 2022
…obesity rate, mean commute time, and mask usage statistics significantly affected morbidity rates, while ethnicity, median income, poverty rate, and education levels heavily influenced mortality rates. Surprisingly, the correlation between several of these factors and COVID-19 morbidity and mortality gradually shifted and even reversed during the study period; our analysis suggests that this phenomenon was probably due to COVID-19 being initially associated with more urbanized areas and then, from 9/2020, with less urbanized ones. Thus, socioeconomic features such as ethnicity, education, and economic disparity are the major factors for predicting county-level COVID-19 mortality rates. Between counties, low-variance factors (e.g., age) are not meaningful predictors. The inversion of some correlations over time can be explained by COVID-19 spreading from urban to rural areas.
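The reported inversion can be examined by recomputing, per month, the correlation between a static county feature and that month's case rate. The sketch below does this on synthetic data whose urbanization effect flips sign in 9/2020; the column names and data frame are illustrative assumptions, not the study's dataset.

```python
# Hedged sketch: per-month correlation between a county feature and case rates.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
counties = pd.DataFrame({"county": range(200),
                         "pct_urban": rng.uniform(0, 100, 200)})   # a static county-level feature

records = []
for m in pd.period_range("2020-03", "2021-02", freq="M"):
    # synthetic monthly case rates whose association with urbanization flips sign after 9/2020
    sign = 1.0 if m < pd.Period("2020-09", freq="M") else -1.0
    rate = sign * 0.5 * counties["pct_urban"] + rng.normal(0, 30, 200)
    records.append(pd.DataFrame({"county": counties["county"],
                                 "month": str(m), "case_rate": rate}))

merged = pd.concat(records).merge(counties, on="county")
monthly_corr = {m: g["pct_urban"].corr(g["case_rate"]) for m, g in merged.groupby("month")}
for month, corr in sorted(monthly_corr.items()):
    print(month, round(corr, 2))        # positive early, negative from 2020-09 onward
```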
We followed 295 young infantry recruits during their first 14 weeks of basic training. The prevalence of smoking increased by 50%. About half of this increase was accounted for by ex-smokers, 57% of whom had resumed the habit. The average education and military psychometric measures of both the baseline smokers and the new smokers were significantly lower than those of the abstaining never-smokers. Asian and North African origin and a lower peer-group evaluation score were also risk factors. These relationships were not demonstrated among resuming ex-smokers. The rise in the smoking rate accounts for most of the known rise during full military service. We suggest early preventive measures, especially for the two groups at risk.
Goldstein A, et al. Evaluation of an automated knowledge-based textual summarization system for longitudinal clinical data, in the intensive care domain. Artif Intell Med (2017).
We use a constraint-based language to specify periodic temporal patterns. Our language is intended to be simple to use, while allowing a wide variety of patterns to be expressed. The language solves problems such as (1) how to use calendar-based constraints to define the repetition of a periodic event, (2) what temporal relations must exist between consecutive repeats of a pattern, and (3) how expressivity is limited if the same temporal relations must hold between each pair of intervals in the pattern. This language has been implemented in a temporal-abstraction system called Résumé, and researchers have used it in a graphical knowledge-acquisition tool to acquire domain-specific knowledge from experts about patterns to be found in large databases. We summarize the results of preliminary experiments using the pattern-specification and pattern-detection tools on data on patients who have cancer and were seen at the University of Chicago bone-marrow-transplantation center.
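A minimal sketch of the two kinds of constraints mentioned in points (1) and (2): a calendar-based constraint on when a repetition of the periodic event may occur, and a temporal relation that must hold between consecutive repeats. The toy constraint vocabulary is an illustrative assumption and is far simpler than the actual pattern-specification language.

```python
# Hedged sketch: check dated occurrences against a calendar constraint and a
# constraint on the gap between consecutive repeats (illustrative vocabulary only).
from datetime import date, timedelta
from typing import List, Tuple

Interval = Tuple[date, date]  # (start, end) of one occurrence of the event

def satisfies(occurrences: List[Interval],
              allowed_weekday: int,          # calendar-based constraint, e.g. 0 = Monday
              min_gap: timedelta,            # relation between consecutive repeats
              max_gap: timedelta) -> bool:
    for start, end in occurrences:
        if start.weekday() != allowed_weekday:
            return False                     # violates the calendar-based constraint
    for (s1, e1), (s2, e2) in zip(occurrences, occurrences[1:]):
        gap = s2 - e1
        if not (min_gap <= gap <= max_gap):
            return False                     # consecutive repeats too close or too far apart
    return True

weekly_checkups = [(date(2024, 1, 1), date(2024, 1, 1)),
                   (date(2024, 1, 8), date(2024, 1, 8)),
                   (date(2024, 1, 15), date(2024, 1, 15))]
print(satisfies(weekly_checkups, allowed_weekday=0,
                min_gap=timedelta(days=5), max_gap=timedelta(days=9)))  # True
```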
Skeletal plans are a powerful way to reuse existing domain-specific procedural knowledge. In the Asgaard project, we designed a set of tasks that support the design and the execution of skeletal plans by a human executing agent other than the original plan designer. The underlying requirement for developing task-specific problem-solving methods is a modeling language. Therefore, within the Asgaard project, a time-oriented, intention-based language, called Asbru, was developed. During the design phase of plans, Asbru allows the expression of durative actions and plans caused by durative states of an observed agent. The intentions underlying these plans are represented explicitly as temporal patterns to be maintained, achieved, or avoided. We present the underlying idea of the Asgaard project and explain the time-oriented Asbru language. Finally, we show the benefits and limitations of the time-oriented, skeletal-plan representation with respect to its applicability in real-world, high-frequency domains.
International Journal of Computer and Information Engineering, Sep 22, 2008
Computer worm detection is commonly performed by antivirus software tools that rely on prior explicit knowledge of the worm's code (detection based on code signatures). We present an approach for detecting the presence of computer worms based on Artificial Neural Networks (ANN), using the computer's behavioral measures. The identification of significant features that describe the activity of a worm within a host is commonly based on knowledge acquired from security experts. We suggest acquiring these features by applying feature selection methods. We compare three different feature selection techniques for dimensionality reduction and for identifying the most prominent features that efficiently capture the computer's behavior in the context of worm activity. Additionally, we explore three different temporal representation techniques for the most prominent features. To evaluate the different techniques, several computers were infected with five different worms, and 323 different features of the infected computers were measured. We evaluated each technique by preprocessing the dataset accordingly and training the ANN model with the preprocessed data. We then evaluated the ability of the model to detect the presence of a new computer worm, in particular during heavy user activity on the infected computers.
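A minimal sketch of combining a temporal representation of a selected host feature with a small feed-forward ANN; the three representations (last value, window mean, within-window trend) and the network size are illustrative assumptions, not necessarily the techniques compared in the study.

```python
# Hedged sketch: simple temporal representations of one monitored feature, fed to an ANN.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(3)
windows = rng.normal(size=(800, 10))      # 10 consecutive samples of one monitored feature
labels = rng.integers(0, 2, size=800)     # 1 = worm present during the window (synthetic)

def temporal_features(w: np.ndarray) -> np.ndarray:
    return np.column_stack([w[:, -1],                 # most recent raw sample
                            w.mean(axis=1),           # mean over the window
                            w[:, -1] - w[:, 0]])      # overall trend within the window

X = temporal_features(windows)
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.3, random_state=0)
ann = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0).fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, ann.predict(X_te)))
```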
Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019
Sepsis is a condition caused by the body's overwhelming and life-threatening response to infection, which can lead to tissue damage, organ failure, and finally death. Today, sepsis is one of the leading causes of mortality among populations in intensive care units (ICUs). Sepsis is difficult to predict, diagnose, and treat, as it involves analyzing different sets of multivariate time series, usually with problems of missing data, different sampling frequencies, and random noise. Here, we propose a new dynamic-behavior-based model, which we call a Temporal Probabilistic proFile (TPF), for classification and prediction tasks on multivariate time series. In the TPF method, the raw, time-stamped data are first abstracted into a series of higher-level, meaningful concepts, which hold over intervals characterizing time periods. We then discover frequently repeating temporal patterns within the data. Using the discovered patterns, we create a probabilistic distribution of the temporal patterns of the overall entity population, of each target class in it, and of each entity. We then exploit TPFs as meta-features to classify the time series of new entities, or to predict their outcome, by measuring their TPF distance, either to the aggregated TPF of each class or to the individual TPFs of each of the entities, using negative cross entropy. Our experimental results on a large benchmark clinical data set show that TPFs improve sepsis prediction capabilities and perform better than other machine learning approaches.
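A minimal sketch of the final classification step: given the TPF (a probability distribution over discovered temporal patterns) of each class and of a new entity, assign the entity to the class whose TPF is closest under negative cross entropy. The distributions below are illustrative; the abstraction and pattern-discovery steps are assumed to have already produced them.

```python
# Hedged sketch: classify an entity by comparing its TPF to class TPFs
# using negative cross entropy (illustrative distributions).
import numpy as np

def negative_cross_entropy(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    # -H(p, q) = sum_i p_i * log(q_i); a higher value means q explains p better
    return float(np.sum(p * np.log(q + eps)))

class_tpfs = {
    "sepsis":    np.array([0.50, 0.30, 0.15, 0.05]),   # distribution over 4 frequent patterns
    "no_sepsis": np.array([0.10, 0.20, 0.30, 0.40]),
}
entity_tpf = np.array([0.45, 0.35, 0.10, 0.10])         # pattern distribution of a new patient

predicted = max(class_tpfs, key=lambda c: negative_cross_entropy(entity_tpf, class_tpfs[c]))
print(predicted)   # "sepsis": its TPF is the closest by negative cross entropy
```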
Increasingly, frequent temporal patterns discovered in longitudinal patient records are proposed as features for classification and prediction, and as a means to cluster patient clinical trajectories. However, to justify that, we must demonstrate that most frequent temporal patterns are indeed consistently discoverable within the records of different patient subsets within similar patient populations. We have developed several measures for the consistency of the discovery of temporal patterns. We focus on time-interval relations patterns (TIRPs) that can be discovered within different subsets of the same patient population. We expect the discovered TIRPs (1) to be frequent in each subset, (2) to preserve their "local" metrics (the absolute frequency of each pattern, measured by a Proportion Test), and (3) to preserve their "global" characteristics (their overall distribution, measured by a Kolmogorov-Smirnov test). We also wanted to examine the effect on consistency, over a variety of settings, of varying the minimal frequency threshold for TIRP discovery, and of using a TIRP-filtering criterion that we previously introduced, the Semantic Adjacency Criterion (SAC). We applied our methodology to three medical domains (oncology, infectious hepatitis, and diabetes). We found that, within the minimal frequency ranges we had examined, 70-95% of the discovered TIRPs were consistently discoverable; 40-48% of them maintained their local frequency. TIRP global distribution similarity varied widely, from 0% to 65%. Increasing the threshold usually increased the percentage of TIRPs that were repeatedly discovered across different patient subsets within the same domain, as well as the probability of a similar TIRP distribution. Using the SAC principle enhanced, for most minimal support levels, the percentage of repeating TIRPs, their local consistency, and their global consistency. The effect of using the SAC was further strengthened as the minimal frequency threshold was raised.
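A minimal sketch of the two consistency checks, applied to one TIRP's frequency in two patient subsets (a two-proportion z-test as a stand-in for the Proportion Test) and to the two subsets' overall TIRP frequency distributions (a Kolmogorov-Smirnov test); the counts and frequencies are illustrative.

```python
# Hedged sketch: local consistency via a two-proportion z-test, global consistency via KS.
import math
from scipy.stats import ks_2samp, norm

def two_proportion_z_test(k1: int, n1: int, k2: int, n2: int) -> float:
    """p-value for H0: the TIRP is equally frequent in both patient subsets."""
    p_pool = (k1 + k2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (k1 / n1 - k2 / n2) / se
    return 2 * (1 - norm.cdf(abs(z)))

# local consistency: the same TIRP found in 120/400 vs. 105/380 patients
print("proportion-test p:", two_proportion_z_test(120, 400, 105, 380))

# global consistency: frequencies of all discovered TIRPs in each subset
subset_a = [0.30, 0.25, 0.22, 0.18, 0.15, 0.12, 0.10]
subset_b = [0.28, 0.27, 0.20, 0.17, 0.16, 0.11, 0.09]
print("KS-test p:", ks_2samp(subset_a, subset_b).pvalue)
```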
MobiGuide is a ubiquitous, distributed, and personalized evidence-based decision-support system (DSS) used by patients and their care providers. Its central DSS applies computer-interpretable clinical guidelines (CIGs) to provide real-time, patient-specific, and personalized recommendations by matching CIG knowledge with a highly adaptive patient model, the parameters of which are stored in a personal health record (PHR). The PHR integrates data from hospital medical records, mobile biosensors, data entered by patients, and recommendations and abstractions output by the DSS. CIGs are customized to consider the patients' psycho-social context and their preferences; shared decision making is supported via decision trees instantiated with patient utilities. The central DSS "projects" personalized CIG knowledge to a mobile DSS operating on the patients' smartphones, which applies that knowledge locally. In this paper we explain the knowledge elicitation and specification methodologies that we have developed for making CIGs patient-centered and enabling their personalization. We then demonstrate the feasibility of the full architecture that we have designed and implemented, in two very different clinical domains and at two different geographic sites, as part of a multinational feasibility study. We analyze usage patterns and opinions collected via questionnaires from the 10 atrial fibrillation (AF) and 20 gestational diabetes mellitus (GDM) patients and their care providers. The analysis is guided by three hypotheses concerning the effect of the personal patient model on the behavior of patients and clinicians and on patients' satisfaction. The results demonstrate the sustainable usage of the system by patients and their care providers, and the patients' satisfaction, which stems mostly from their increased sense of safety. The system has affected the behavior of the clinicians, who inspected the patients' models between scheduled visits, resulting in a change of diagnosis for two of the ten AF patients and an anticipated change in therapy for eleven of the twenty GDM patients.
The outcome of a collective decision-making process, such as crowdsourcing, often relies on the procedure through which the perspectives of its individual members are aggregated. Popular aggregation methods, such as the majority rule, often fail to produce the optimal result, especially in high-complexity tasks. Methods that rely on meta-cognitive information, such as confidence-based methods and the Surprisingly Popular Option, have shown an improvement in various tasks. However, there is still a significant number of cases with no optimal solution. Our aim is to exploit meta-cognitive information and to learn from it, for the purpose of enhancing the ability of the group to produce a correct answer. Specifically, we propose two different feature-representation approaches: (1) Response-Centered feature Representation (RCR), which focuses on the characteristics of the individual response instances, and (2) Answer-Centered feature Representation (ACR), which focuses on the characteristics of each of the potential answers. Using these two feature-representation approaches, we train Machine-Learning (ML) models for the purpose of predicting the correctness of a response and of an answer. The trained models are used as the basis of an ML-based aggregation methodology that, contrary to other ML-based techniques, has the advantage of being a "one-shot" technique, independent of the crowd-specific composition and personal record, and adaptive to various types of situations. To evaluate our methodology, we collected 2490 responses for different tasks, which we used for feature engineering and for the training of ML models. We tested our feature-representation approaches through the performance of our proposed ML-based aggregation methods. The results show an increase of 20% to 35% in the success rate, compared to the use of standard rule-based aggregation methods.
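A minimal sketch of the answer-centered (ACR) direction: each candidate answer is represented by features such as its vote share and the mean confidence of its supporters, a model is trained to predict whether an answer is correct, and the group answer is the highest-scoring candidate. The feature choices, toy training data, and responses are illustrative assumptions.

```python
# Hedged sketch: ML-based aggregation over answer-centered features (illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression

def answer_features(responses, answer):
    """responses: list of (chosen_answer, confidence in [0, 1])."""
    votes = [c for a, c in responses if a == answer]
    share = len(votes) / len(responses)                 # vote share of this answer
    mean_conf = float(np.mean(votes)) if votes else 0.0 # mean confidence of its supporters
    return [share, mean_conf]

# toy training set: one row per (task, candidate answer), label = answer was correct
X_train = np.array([[0.7, 0.8], [0.3, 0.4], [0.4, 0.9], [0.6, 0.5]])
y_train = np.array([1, 0, 1, 0])
model = LogisticRegression().fit(X_train, y_train)

task_responses = [("A", 0.9), ("A", 0.8), ("B", 0.6), ("B", 0.5), ("B", 0.4)]
candidates = {"A", "B"}
scores = {a: model.predict_proba([answer_features(task_responses, a)])[0, 1] for a in candidates}
print(max(scores, key=scores.get))   # the ML-aggregated group answer
```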
The temporal dynamics of social interactions have been shown to influence the spread of disease. Here, we model the conditions of progression and competition for several viral strains, exploring various levels of cross-immunity over temporal networks. We use our interaction-driven contagion model to characterize several viral variants. Our results, obtained on temporal random networks and on real-world interaction data, demonstrate that temporal dynamics are crucial in determining the outcome of the competition. We consider two and three competing pathogens and show the conditions under which a slower pathogen will remain active and create a second wave that infects most of the population. We then show that when the duration of the encounters is considered, the spreading dynamics change significantly. Our results indicate that, when considering airborne diseases, it might be crucial to consider the duration of temporal meetings in order to model the spread of pathogens in a population.
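A minimal sketch of an interaction-driven, two-strain contagion process over a temporal contact list with partial cross-immunity (a node that recovered from one strain is infected by the other with reduced probability); the parameters, contact list, and immunity model are illustrative assumptions, not the paper's exact model.

```python
# Hedged sketch: two competing strains spreading over time-stamped contacts.
import random

random.seed(0)
N = 50
contacts = [(t, random.randrange(N), random.randrange(N))
            for t in range(200) for _ in range(30)]          # (time, node, node) meetings
beta = {"fast": 0.20, "slow": 0.08}                          # per-contact infection probability
cross_immunity = 0.5                                         # reduced susceptibility after clearing the other strain
recovery_time = 20

state = {n: None for n in range(N)}                          # None, or (strain, infection time)
recovered = {n: set() for n in range(N)}                     # strains a node has already cleared
state[0], state[1] = ("fast", 0), ("slow", 0)                # one seed per strain

for t, u, v in sorted(contacts):
    for a in (u, v):                                         # recover nodes whose infectious period ended
        if state[a] and state[a][1] + recovery_time <= t:
            recovered[a].add(state[a][0])
            state[a] = None
    for src, dst in ((u, v), (v, u)):                        # attempt transmission in both directions
        if state[src] and state[dst] is None:
            strain = state[src][0]
            if strain in recovered[dst]:
                continue                                     # full immunity to a strain already cleared
            p = beta[strain] * (cross_immunity if recovered[dst] else 1.0)
            if random.random() < p:
                state[dst] = (strain, t)

ever_infected = {s: sum(1 for n in range(N)
                        if s in recovered[n] or (state[n] and state[n][0] == s))
                 for s in beta}
print(ever_infected)                                         # nodes ever infected by each strain
```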
Contacts’ temporal dynamics, such as their order and timing, are crucial for understanding the transmission of infectious diseases. Using path-preserving temporal networks, we evaluate the effect of spatial pods (social-distancing pods) and temporal pods (a reduction in the rate of meetings) on the spread of the disease. We use our interaction-driven contagion model, instantiated for COVID-19, over history-maintaining random temporal networks as well as over real-world contacts. We find that temporal pods significantly reduce the overall number of infected individuals and slow the spread of the disease. This result is robust under changing initial conditions, such as the number and locations of the initial patients. Social-distancing (spatial) pods perform well only in the initial phase of the disease, i.e., with a minimal number of initial patients. Using real-life contact information and extending our interaction-driven model to consider the exposures, we demonstrate the beneficial effect…
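A minimal sketch of the two interventions expressed as transformations of a temporal contact list: temporal pods thin out meetings (rate reduction), while spatial pods drop meetings that cross pod boundaries; the pod assignment and reduction factor are illustrative assumptions.

```python
# Hedged sketch: temporal vs. spatial pods as filters over a temporal contact list.
import random

def temporal_pods(contacts, keep_fraction, seed=0):
    """Rate reduction: keep only a fraction of the meetings, partners unchanged."""
    rng = random.Random(seed)
    return [c for c in contacts if rng.random() < keep_fraction]

def spatial_pods(contacts, pod_of):
    """Social-distancing pods: keep only meetings whose endpoints share a pod."""
    return [(t, u, v) for t, u, v in contacts if pod_of[u] == pod_of[v]]

rng = random.Random(0)
contacts = [(t, rng.randrange(20), rng.randrange(20)) for t in range(300)]  # (time, node, node)
pod_of = {n: n % 4 for n in range(20)}                                      # four pods

print(len(contacts),
      len(temporal_pods(contacts, keep_fraction=0.5)),
      len(spatial_pods(contacts, pod_of)))   # both interventions shrink the contact list
```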