Papers by Aris Gkoulalas-Divanis
Mobility, Data Mining and Privacy, 2008
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '11, 2011
Sequence datasets are encountered in a plethora of applications spanning from web usage analysis to healthcare studies and ubiquitous computing. Disseminating such datasets offers remarkable opportunities for discovering interesting knowledge patterns, but may lead to serious privacy violations if sensitive patterns, such as business secrets, are disclosed. In this work, we consider how to sanitize data to prevent the disclosure of sensitive patterns during sequential pattern mining, while ensuring that the nonsensitive patterns can still be discovered. First, we re-define the problem of sequential pattern hiding to capture the information loss incurred by sanitization in terms of both events' modification (distortion) and lost nonsensitive knowledge patterns (side-effects). Second, we model sequences as graphs and propose two algorithms to solve the problem by operating on the graphs. The first algorithm attempts to sanitize data with minimal distortion, whereas the second focuses on reducing the side-effects. Extensive experiments show that our algorithms outperform the existing solution in terms of data distortion and side-effects and are more efficient.
Proceedings of the 4th International Workshop on Privacy and Anonymity in the Information Society - PAIS '11, 2011
Transaction data about individuals are increasingly collected to support a plethora of applications, spanning from marketing to biomedical studies. Publishing these data is required by many organizations, but may result in privacy breaches if an attacker exploits potentially identifying information to link individuals to their records in the published data. Algorithms that prevent this threat by transforming transaction data prior to their release have been proposed recently, but incur significant information loss due to their inability to accommodate a range of different privacy requirements that data owners often have. To address this issue, we propose a novel clustering-based framework for anonymizing transaction data. Our framework provides the basis for designing algorithms that explore a larger solution space than existing methods, which allows publishing data with less information loss, and can satisfy a wide range of privacy requirements. Based on this framework, we develop PCTA, a generalization-based algorithm to construct anonymizations that incur a small amount of information loss under many different privacy requirements. Experiments with benchmark datasets verify that PCTA significantly outperforms the current state-of-the-art algorithms in terms of data utility, while being comparable in terms of efficiency.
Crossroads, 2009
As it becomes evident, there exists an extended set of application scenarios in which information or knowledge derived from the data must be shared with other (possibly untrusted) entities. The sharing of data and/or knowledge may come at a cost to privacy, primarily due to two reasons:
SIAM International Conference on Data Mining, 2009
In this paper, we propose a privacy model that offers trajectory privacy to the requesters of Location-Based Services (LBSs), by utilizing an underlying network of user movement. The privacy model has been implemented as a framework that (i) reconstructs the user movement from a series of independent location updates, (ii) identifies routes where user privacy is at
Proceedings of the 15th ACM international conference on Information and knowledge management - CIKM '06, 2006
The rapid growth of transactional data soon drew attention to the need for its further exploitation. In this paper, we investigate the problem of securing sensitive knowledge from being exposed in patterns extracted during association rule mining. Instead of hiding the produced rules directly, we decide to hide the sensitive frequent itemsets that may lead to the production of these rules. As a first step, we introduce the notion of distance between two databases and a measure for quantifying it. By trying to minimize the distance between the original database and its sanitized version (that can safely be released), we propose a novel, exact algorithm for association rule hiding and evaluate it on real-world datasets, demonstrating its effectiveness towards solving the problem.
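The idea of measuring how far a sanitized database drifts from the original can be illustrated with a small sketch. The abstract does not state which distance measure the paper uses, so the item-level difference count below is an assumption made for illustration, and the names `support`, `distance`, and `is_hidden` are hypothetical.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def support(db: List[Transaction], itemset: FrozenSet[str]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)


def distance(original: List[Transaction], sanitized: List[Transaction]) -> int:
    """Assumed measure: count of item-level differences between the
    two databases, compared transaction by transaction."""
    if len(original) != len(sanitized):
        raise ValueError("databases must have the same number of transactions")
    return sum(len(a ^ b) for a, b in zip(original, sanitized))


def is_hidden(db: List[Transaction], itemset: FrozenSet[str], min_support: int) -> bool:
    """An itemset counts as hidden once its support is below the threshold."""
    return support(db, itemset) < min_support


original = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("bc")]
sensitive = frozenset("ab")

# Removing item "b" from the second transaction hides {a, b} at threshold 2.
sanitized = [frozenset("abc"), frozenset("a"), frozenset("ac"), frozenset("bc")]
```

Under this assumed measure, a hiding algorithm would look for a sanitized database in which every sensitive itemset is hidden while `distance` stays as small as possible.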
2009 Tenth International Conference on Mobile Data Management: Systems, Services and Middleware, 2009
The widespread adoption of Location Based Services (LBSs), coupled with recent advances in location tracking technologies, poses serious concerns to user privacy. As a consequence, privacy preserving approaches have been proposed to protect the location information which is communicated during a request for an LBS. Most existing approaches are centralized as they rely on a trusted server to protect the real location of the user. Although the centralized approaches are commonplace, so far no attempt has been made to integrate them in a unified framework. Such an integration would provide the means for easily implementing and testing new techniques by offering ready-made vanilla system components and allow for both the experimental and analytical evaluation of the implemented techniques.
Proceedings of the 20th ACM international conference on Information and knowledge management - CIKM '11, 2011
Existing approaches for privacy-aware mobility data sharing aim at publishing an anonymized version of the mobility dataset, operating under the assumption that most of the information in the original dataset can be disclosed without causing any privacy violations. In this paper, we assume that the majority of the information that exists in the mobility dataset must remain private and the
Proceedings of the 15th International Conference on Extending Database Technology - EDBT '12, 2012
Mobility data sources feed larger and larger trajectory databases nowadays. Due to the need to extract useful knowledge patterns that improve services based on users' and customers' behavior, querying and mining such databases has gained significant attention in recent years. However, publishing mobility data may lead to severe privacy violations. In this paper, we present Private-HERMES, an integrated platform for applying data mining and privacy-preserving querying over mobility data. The presented platform provides a two-dimension benchmark framework that includes: (i) a query engine that provides privacy-aware data management functionality of the in-house data via a set of auditing mechanisms that protect the sensitive information against several types of attacks, and (ii) a progressive analysis framework, which, apart from anonymization methods for data publishing, includes various well-known mobility data mining techniques to evaluate the effect of anonymization in the querying and mining results. The demonstration of Private-HERMES via a real-world case study illustrates the flexibility and usefulness of the platform for supporting privacy-aware data analysis, as well as for providing an extensible blueprint benchmark architecture for privacy-preservation related methods in mobility data.
Lecture Notes in Computer Science, 2013
Publishing datasets about individuals that contain both relational and transaction (i.e., set-valued) attributes is essential to support many applications, ranging from healthcare to marketing. However, preserving the privacy and utility of these datasets is challenging, as it requires (i) guarding against attackers, whose knowledge spans both attribute types, and (ii) minimizing the overall information loss. Existing anonymization techniques are not applicable to such datasets, and the problem cannot be tackled based on popular, multi-objective optimization strategies. This work proposes the first approach to address this problem. Based on this approach, we develop two frameworks to offer privacy, with bounded information loss in one attribute type and minimal information loss in the other. To realize each framework, we propose privacy algorithms that effectively preserve data utility, as verified by extensive experiments.
ITAB 2010 Corfu, Greece : 10th International Conference on Information Technology and Applications in Biomedicine : Emerging Technologies for Patient Specific Healthcare : 2-5 November 2010, Aquis Corfu Holiday Palace Hotel, Greece. Int..., 2010
Patient-specific records contained in Electronic Medical Record (EMR) systems are increasingly combined with genomic sequences and deposited into bio-repositories. This allows researchers to perform large-scale, low-cost biomedical studies, such as Genome-Wide Association Studies (GWAS) aimed at identifying associations between genetic factors and complex health-related phenomena, which are an integral facet of personalized medicine. Disseminating this data, however, raises serious privacy concerns because patients' genomic sequences can be linked to their identities through diagnosis codes. This work proposes an approach that guards against this type of data linkage by modifying diagnosis codes in a way that limits the probability of associating a patient's identity to their genomic sequence. Experiments using EMRs from the Vanderbilt University Medical Center verify that our approach generates data that can support up to 29.4% more GWAS than the best-so-far method, while p...
Proceedings of the 2009 SIAM International Conference on Data Mining, 2009
In this paper, we propose a privacy model that offers trajectory privacy to the requesters of Location-Based Services (LBSs), by utilizing an underlying network of user movement. The privacy model has been implemented as a framework that (i) reconstructs the user movement from a series of independent location updates, (ii) identifies routes where user privacy is at risk, and (iii) anonymizes online user requests for LBSs to protect the requester for as long as the service withstands completion. In order to achieve (iii), we propose two anonymization techniques, the K-present (weak) and the K-frequent (strong) trajectory anonymity, and a second chance approach that takes over when anonymization fails to ensure that the privacy of the user is preserved. To the best of our knowledge, this is the first work to propose a trajectory privacy model that utilizes an underlying network of user movement to offer in an interactive way personalized privacy to online user requests on trajectory data.
Lecture Notes in Computer Science, 2010
Publishing transaction data containing individuals' activities may risk privacy breaches, so the need for anonymizing such data before their release is increasingly recognized by organizations. Several approaches have been proposed recently to deal with this issue, but they are still inadequate for preserving both data utility and privacy. Some incur unnecessary information loss in order to protect data, while others
IFIP – The International Federation for Information Processing, 2008
The hiding of sensitive knowledge, mined from transactional databases, is one of the primary goals of privacy preserving data mining. The increased storage capabilities of modern databases and the necessity for hiding solutions of superior quality paved the way for parallelization of the hiding process. In this paper, we introduce a novel framework for decomposition and parallel solving of a
ACM SIGKDD Explorations Newsletter, 2011
Mobility, Data Mining and Privacy, 2008
Citation: Privacy and Security in Spatio-Temporal Data and Trajectories / V. Verykios, ML Damiani, A. Gkoulalas-Divanis. In: Mobility, Data Mining and Privacy: Geographic Knowledge Discovery / [edited by] F. Giannotti, D. Pedreschi. Berlin: Springer, 2008. ISBN ...
Lecture Notes in Computer Science, 2008
This paper introduces a privacy model for location based services that utilizes collected movement data to identify parts of the user trajectories, where user privacy is at an elevated risk. To protect the privacy of the user, the proposed methodology transforms the original requests into anonymous counterparts by offering trajectory K-anonymity. As a proof of concept, we build a working
19th IEEE International Conference on Tools with Artificial Intelligence(ICTAI 2007), 2007
In this paper, we propose a novel, exact border-based approach that provides an optimal solution for the hiding of sensitive frequent itemsets by (i) minimally extending the original database by a synthetically generated database part - the database extension, (ii) formulating the creation of the database extension as a constraint satisfaction problem that is solved by using binary integer programming, and (iii) providing an approximate solution close to the optimal one when an ideal solution does not exist. Extending the original database for sensitive itemset hiding is proved to provide optimal solutions to an extended set of hiding problems compared to previous approaches and to provide solutions of higher quality.
ACM SIGKDD Explorations Newsletter, 2008
Advances in telecommunications and GPS sensor technology have made possible the collection of data like time series of locations, related to the movement of individuals. The analysis of this so-called trajectory data is beneficial both for the individuals (e.g., through location-based services) and for the community as a whole (e.g., decision support for urban planning or traffic control). However, because of the very nature of this data, strict safeguards must be enforced to ensure the privacy of the individuals whose movement is recorded. In this paper, we present a privacy-aware trajectory tracking query engine that offers strict guarantees about what can be observed by untrusted third parties. Through the query engine, subscribed users can gain restricted access to an in-house trajectory data warehouse, to perform certain analysis tasks. In addition to regular queries involving non-spatial non-temporal attributes, the engine supports a variety of spatiotemporal queries, including range queries, nearest neighbor queries and queries for aggregate statistics. The query results are augmented with fake trajectory data (dummies) to fulfil the requirements of K-anonymity. Through qualitative analysis, we prove the effectiveness of our approach towards blocking certain types of attacks, while minimally distorting the dataset.
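Padding a query result with dummies until it reaches K trajectories can be sketched as follows. The abstract does not describe how the engine generates its fake trajectories, so the jitter-based generator below is purely an assumption, and `make_dummy` and `pad_to_k` are hypothetical names.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, timestamp)
Trajectory = List[Point]


def make_dummy(template: Trajectory, rng: random.Random, jitter: float = 0.5) -> Trajectory:
    """Assumed generator: perturb the coordinates of a real trajectory
    while keeping its timestamps unchanged."""
    return [
        (x + rng.uniform(-jitter, jitter), y + rng.uniform(-jitter, jitter), t)
        for x, y, t in template
    ]


def pad_to_k(result: List[Trajectory], k: int, rng: random.Random) -> List[Trajectory]:
    """Return the query result augmented with dummies so that it holds
    at least k trajectories. An empty result is returned unchanged."""
    if not result or len(result) >= k:
        return list(result)
    padded = list(result)
    while len(padded) < k:
        padded.append(make_dummy(rng.choice(result), rng))
    rng.shuffle(padded)  # avoid revealing which entries are real by position
    return padded
```

A realistic engine would also need dummies that are plausible with respect to the road network and movement speed; this sketch makes no attempt at that.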