
Christoph Koch

Cornell University, Computer Science, Faculty Member


Address: 4105A Upson Hall
Cornell University
Ithaca, NY 14853


Papers by Christoph Koch


$10^{10^{6}}$ worlds and beyond: efficient representation and processing of incomplete information

Abstract: We present a decomposition-based approach to managing probabilistic information. We introduce world-set decompositions (WSDs), a space-efficient and complete representation system for finite sets of worlds. We study the problem of efficiently evaluating relational algebra queries on world-sets represented by WSDs. We also evaluate our technique experimentally in a large census data scenario and show that it is both scalable and efficient.
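As a rough illustration of why a decomposition is compact, the sketch below models a world-set decomposition as a list of independent components, each holding alternative sets of tuples; a world is one choice per component. This is a simplified, hypothetical encoding (no probabilities; the names `num_worlds` and `worlds` and the census rows are illustrative), not the paper's data structure. The arithmetic alone explains the title: $10^6$ components with 10 alternatives each already describe up to $10^{10^6}$ worlds.

```python
from itertools import product
from math import prod

# Hypothetical toy encoding of a world-set decomposition (WSD):
# a list of components; each component lists alternatives (sets of tuples).
# A world picks one alternative per component and takes their union.
WSD = list[list[frozenset]]

def num_worlds(wsd: WSD) -> int:
    # Upper bound on the number of represented worlds: product of component sizes.
    return prod(len(component) for component in wsd)

def worlds(wsd: WSD):
    # Lazily enumerate worlds; only feasible for tiny inputs.
    for choice in product(*wsd):
        yield frozenset().union(*choice)

# Two census records with an illegible marital-status field.
wsd = [
    [frozenset({("r1", "single")}), frozenset({("r1", "married")})],
    [frozenset({("r2", "single")}), frozenset({("r2", "married")}),
     frozenset({("r2", "divorced")})],
]
print(num_worlds(wsd))  # 6
```

The storage cost grows with the sum of the component sizes, while the number of worlds grows with their product.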

Schema-based scheduling of event processors and buffer minimization for queries on structured data streams

Abstract: We introduce FluX, an extension of the XQuery language that supports event-based query processing and the conscious handling of main-memory buffers. Purely event-based queries in this language can be executed on streaming XML data in a very direct way. We then develop an algorithm for efficiently rewriting XQueries into the event-based FluX language. This algorithm uses order constraints from a DTD to schedule event handlers and thus minimize the amount of buffering required for evaluating a query.
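FluX itself is not sketched here; the snippet below only illustrates the underlying idea of event-based evaluation, using Python's standard `xml.sax` module. It evaluates a query in the spirit of `for $b in /bib/book return $b/title` by reacting to start, text, and end events, buffering nothing but the text of the current title. The element names and the class `TitleStreamer` are assumptions for the example. If the query instead had to emit each book's authors before its title while the stream delivers the title first, the title would have to be buffered until the authors arrive; deciding from DTD order constraints when such buffers can be avoided is the problem the abstract describes.

```python
import xml.sax

class TitleStreamer(xml.sax.ContentHandler):
    """Event-driven evaluation of a query like `for $b in /bib/book return $b/title`.

    Only the text of the title currently being read is buffered.
    """

    TITLE_PATH = ["bib", "book", "title"]

    def __init__(self):
        super().__init__()
        self.path = []   # stack of open element names
        self.buf = []    # text chunks of the current title only
        self.out = []    # emitted results

    def startElement(self, name, attrs):
        self.path.append(name)
        if self.path == self.TITLE_PATH:
            self.buf = []

    def characters(self, content):
        if self.path == self.TITLE_PATH:
            self.buf.append(content)

    def endElement(self, name):
        if self.path == self.TITLE_PATH:
            self.out.append("".join(self.buf))  # emit as soon as the title closes
        self.path.pop()

handler = TitleStreamer()
xml.sax.parseString(
    b"<bib><book><title>A</title><author>X</author></book>"
    b"<book><title>B</title></book></bib>",
    handler,
)
print(handler.out)  # ['A', 'B']
```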

Efficient algorithms for the tree homeomorphism problem

Tree pattern matching is a fundamental problem with a wide range of applications in Web data management, XML processing, and selective data dissemination. In this paper we develop efficient algorithms for the tree homeomorphism problem, i.e., the problem of matching a tree pattern with exclusively transitive (descendant) edges. We first prove that deciding whether there is a tree homeomorphism is LOGSPACE-complete, improving on the current LOGCFL upper bound.
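For intuition about the problem (not the paper's LOGSPACE algorithm), here is a direct memoized recursive check. It assumes the usual tree-pattern reading in which every pattern edge means "proper descendant", labels must agree, and distinct pattern nodes may map to the same data node; `Node` and `has_homeomorphism` are illustrative names. Memoizing on (pattern node, data node) pairs keeps the check polynomial.

```python
from dataclasses import dataclass

@dataclass
class Node:
    label: str
    children: tuple["Node", ...] = ()

def descendants(v: Node):
    # Proper descendants of v in preorder.
    for c in v.children:
        yield c
        yield from descendants(c)

def has_homeomorphism(pattern: Node, tree: Node) -> bool:
    memo = {}

    def maps_to(p: Node, v: Node) -> bool:
        # p can be mapped to v if labels agree and every child of p can be
        # mapped to some proper descendant of v (descendant edges only).
        key = (id(p), id(v))
        if key not in memo:
            memo[key] = p.label == v.label and all(
                any(maps_to(pc, d) for d in descendants(v)) for pc in p.children
            )
        return memo[key]

    # The pattern root may map to any node of the data tree.
    return any(maps_to(pattern, v) for v in (tree, *descendants(tree)))

pattern = Node("a", (Node("b"), Node("c")))
tree = Node("a", (Node("x", (Node("b"),)), Node("c")))
print(has_homeomorphism(pattern, tree))                   # True
print(has_homeomorphism(Node("c", (Node("b"),)), tree))   # False
```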

DLV: a system for declarative problem solving

System description: DLV

DLV is an efficient Answer Set Programming (ASP) system implementing the consistent answer set se...

Better scripts, better games

Abstract: This survey gives an overview of formal results on the XML query language XPath. We identify several important fragments of XPath, focusing on subsets of XPath 1.0. We then give results on the expressiveness of XPath and its fragments compared to other formalisms for querying trees, algorithms and complexity bounds for evaluating XPath queries, and static analysis of XPath queries.
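Python's standard `xml.etree.ElementTree` module supports only a limited subset of XPath 1.0 syntax (child and descendant steps, `..`, and simple attribute, child, and position predicates), which makes it a handy concrete example of a practical XPath fragment. The document and queries below are made up for illustration; the fragments analyzed in the survey are defined formally and differ from this one.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<lib>"
    "<book year='2009'><title>A</title></book>"
    "<book year='2012'><title>B</title></book>"
    "</lib>"
)
# Descendant step with an attribute-equality predicate, then a child step.
recent = [t.text for t in doc.findall(".//book[@year='2012']/title")]
# Positional predicate (1-based, as in XPath).
first = [t.text for t in doc.findall("book[1]/title")]
print(recent, first)  # ['B'] ['A']
```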

DBToaster: higher-order delta processing for dynamic, frequently fresh views

Abstract: Applications ranging from algorithmic trading to scientific data analysis require real-time analytics based on views over databases that change at very high rates. Such views have to be kept fresh at low maintenance cost and latency. At the same time, these views have to support classical SQL, rather than window semantics, to enable applications that combine current with aged or historical data.
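To make "higher-order delta processing" concrete, here is a hand-written sketch (not DBToaster output) for the hypothetical view Q = SUM(R.a * S.c) over R(a, b) joined with S(b, c) on b. The delta of Q for an update R(a, b) is a · m_S[b], where m_S[b] is the sum of c over S tuples with key b, so that sum is kept as an auxiliary view; symmetrically for m_R. The deltas of these auxiliary views are constants, so the recursion stops there. Deletions are updates with multiplicity -1, matching bag (SQL) semantics rather than windows. Class and function names are illustrative.

```python
from collections import defaultdict

class SumJoinView:
    """Incrementally maintains Q = SUM(R.a * S.c) over R(a, b) JOIN S(b, c) ON b."""

    def __init__(self):
        self.q = 0
        self.m_r = defaultdict(int)  # b -> SUM(a) over R; used for updates to S
        self.m_s = defaultdict(int)  # b -> SUM(c) over S; used for updates to R

    def update_r(self, a, b, mult=1):
        # Insert (mult=1) or delete (mult=-1) R(a, b): dQ = mult * a * m_s[b].
        self.q += mult * a * self.m_s[b]
        self.m_r[b] += mult * a  # delta of m_r is a constant: no further views

    def update_s(self, b, c, mult=1):
        # Insert or delete S(b, c): dQ = mult * c * m_r[b].
        self.q += mult * c * self.m_r[b]
        self.m_s[b] += mult * c

def recompute(r, s):
    # Naive full re-evaluation, used only to check the incremental result.
    return sum(a * c for a, b1 in r for b2, c in s if b1 == b2)

view, R, S = SumJoinView(), [], []
for a, b in [(2, "x"), (3, "y")]:
    view.update_r(a, b)
    R.append((a, b))
for b, c in [("x", 10), ("x", 1), ("y", 5)]:
    view.update_s(b, c)
    S.append((b, c))
view.update_r(2, "x", mult=-1)
R.remove((2, "x"))
print(view.q, recompute(R, S))  # 15 15
```

Each update costs a constant number of dictionary lookups, independent of the sizes of R and S.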

On Information Integration in Large Scientific Collaborations

Abstract: We discuss the requirements for information integration in large scientific collaborations and conclude that an architecture is needed that follows the declarative paradigm for reasoning completeness, maintainability, and reuse of previously encoded knowledge, but does not take the classical approach of integrating all sources against a single common “global” information model.

Querying the web reconsidered: Design principles for versatile web query languages

Abstract: A decade of experience with research proposals as well as standardized query languages f...

Fast and simple relational processing of uncertain data

Abstract: This paper introduces U-relations, a succinct and purely relational representation system for uncertain databases. U-relations support attribute-level uncertainty using vertical partitioning. If we consider positive relational algebra extended by an operation for computing possible answers, a query on the logical level can be translated into, and evaluated as, a single relational algebra query on the U-relational representation.
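The following sketch emulates U-relations in Python rather than as a single relational-algebra query. It assumes a hypothetical relation R(name, age, city), vertically partitioned into one U-relation per attribute; each row carries a world-set descriptor (a conjunction of variable assignments, here a dict), a tuple id, and the attribute value. Reassembling tuples is a join on tuple id that discards pairs with contradictory descriptors, and possible answers are then read off the result. All names and data are illustrative.

```python
from itertools import product

# Hypothetical U-relations for R(name, age, city), one per attribute.
# Row = (ws-descriptor, tuple id, value tuple); the row exists in every
# world that satisfies its descriptor.
U_name = [({}, 1, ("Ann",)), ({}, 2, ("Bob",))]
U_age = [({"x": 1}, 1, (30,)), ({"x": 2}, 1, (31,)), ({}, 2, (25,))]
U_city = [({"x": 1}, 1, ("Ithaca",)), ({"x": 2}, 1, ("Zurich",)), ({}, 2, ("Ithaca",))]

def consistent(d1: dict, d2: dict) -> bool:
    # Two descriptors can hold together unless they assign a variable differently.
    return all(d2.get(var, val) == val for var, val in d1.items())

def merge(u1, u2):
    # Join on tuple id, keep only consistent descriptor pairs, concatenate values.
    return [({**d1, **d2}, t1, v1 + v2)
            for (d1, t1, v1), (d2, t2, v2) in product(u1, u2)
            if t1 == t2 and consistent(d1, d2)]

def possible_answers(u, pred, project):
    # A tuple is a possible answer if it satisfies pred in at least one world.
    return {project(row) for _, _, row in u if pred(row)}

R = merge(merge(U_name, U_age), U_city)
answers = possible_answers(R, lambda r: r[1] > 30, lambda r: (r[0], r[2]))
print(len(R), answers)  # 3 {('Ann', 'Zurich')}
```

Without the descriptor check, ("Ann", "Ithaca") would wrongly appear, because age 31 and city Ithaca never occur in the same world.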

The dlv system: Model generator and application frontends

In recent years, much research has been done on the semantics and complexity of Disjunctive Deductive Databases (DDDBs). While DDDBs (function-free disjunctive logic programs with negation allowed in rule bodies) are now generally considered a powerful tool for common-sense reasoning and knowledge representation, there has been a shortage of actual (let alone efficient) implementations ([ST94, ADN97]). This paper presents a brief overview of the architecture of the dlv (datalog with disjunction) system currently developed at TU Wien in the FWF project P11580-MAT “A Query System for Disjunctive Deductive Databases”, focusing especially on the Model Generator (the “heart” of the dlv system) and the integrated frontends for diagnostic reasoning and SQL3.
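To illustrate the semantics that dlv computes, the brute-force sketch below enumerates the answer sets of a tiny ground disjunctive program: a candidate set M is an answer set if it is a minimal model of the Gelfond-Lifschitz reduct of the program with respect to M. The rule encoding and function names are assumptions made for this example; the search is exponential and bears no resemblance to dlv's Model Generator.

```python
from itertools import chain, combinations

# A ground rule: (head atoms read disjunctively, positive body, negative body).
Rule = tuple[frozenset, frozenset, frozenset]

def rule(head, pos=(), neg=()) -> Rule:
    return (frozenset(head), frozenset(pos), frozenset(neg))

def subsets(atoms):
    atoms = list(atoms)
    return (frozenset(c) for c in chain.from_iterable(
        combinations(atoms, k) for k in range(len(atoms) + 1)))

def is_model(rules, m):
    # Every rule whose positive body holds in m must have a true head atom.
    return all(head & m for head, pos, _ in rules if pos <= m)

def reduct(program, m):
    # Gelfond-Lifschitz reduct: drop rules blocked by m, delete "not" literals.
    return [(head, pos, frozenset()) for head, pos, neg in program if not neg & m]

def answer_sets(program: list[Rule]):
    atoms = set().union(*(h | p | n for h, p, n in program))
    found = []
    for m in subsets(atoms):
        red = reduct(program, m)
        if is_model(red, m) and not any(is_model(red, s) for s in subsets(m) if s < m):
            found.append(m)
    return found

# a v b.    c :- a, not b.
program = [rule({"a", "b"}), rule({"c"}, pos={"a"}, neg={"b"})]
print(sorted(sorted(m) for m in answer_sets(program)))  # [['a', 'c'], ['b']]
```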

MayBMS: Managing incomplete information with probabilistic world-set decompositions

Abstract: Managing incomplete information is important in many real-world applications. In this de...

MayBMS: a probabilistic database management system

Abstract: MayBMS is a state-of-the-art probabilistic database management system that leverages the strengths of previous database research to achieve scalability. As a proof of concept for its ease of use, we have built on top of MayBMS a Web-based application that offers NBA-related information based on what-if analysis of team dynamics, using data available at www.nba.com.
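As a minimal sketch of the kind of number a probabilistic DBMS returns when asked for the confidence of an answer, the code below assumes a hypothetical tuple-independent table of players with the probability that each is fit to play. It computes the probability that at least one center plays, both in closed form and by summing over all $2^n$ possible worlds. It does not use MayBMS or its query language; the table and function names are illustrative.

```python
from itertools import product
from math import prod

# Hypothetical tuple-independent table: player -> probability of being fit to play.
fit = {"center_1": 0.9, "center_2": 0.8, "guard_1": 0.95}

def is_center(player):
    return player.startswith("center")

def conf_exists(table, pred):
    # P(some present tuple satisfies pred) = 1 - product of (1 - p) over matching tuples.
    return 1 - prod(1 - p for t, p in table.items() if pred(t))

def conf_by_worlds(table, pred):
    # Brute-force check: add up the probabilities of all worlds where the query holds.
    rows = list(table.items())
    total = 0.0
    for keep in product([True, False], repeat=len(rows)):
        weight = prod(p if k else 1 - p for (_, p), k in zip(rows, keep))
        if any(k and pred(t) for (t, _), k in zip(rows, keep)):
            total += weight
    return total

print(round(conf_exists(fit, is_center), 4))     # 0.98
print(round(conf_by_worlds(fit, is_center), 4))  # 0.98
```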

Monadic queries over tree-structured data

Abstract: Monadic query languages over trees currently receive considerable interest in the databa...

Processing queries on tree-structured data efficiently

Abstract: This is a survey of algorithms, complexity results, and general solution techniques for efficiently processing queries on tree-structured data. I focus on query languages that compute nodes or tuples of nodes: conjunctive queries, first-order queries, datalog, and XPath. I also point out a number of connections among previous results that have not been observed before.

Cooperative update exchange in the Youtopia system

Abstract: Youtopia is a platform for collaborative management and integration of relational data. At the heart of Youtopia is an update exchange abstraction: changes to the data propagate through the system to satisfy user-specified mappings. We present a novel change propagation model that combines a deterministic chase with human intervention. The process is fundamentally cooperative and gives users significant control over how mappings are repaired.
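A chase step can be sketched for a single hypothetical mapping, Emp(name, dept) → ∃ mgr. Dept(dept, mgr): whenever an employee's department has no Dept tuple, a tuple with a fresh labeled null for the manager is inserted. This covers only the deterministic part; the human intervention that Youtopia adds (for example, a user supplying the real manager or deciding how to repair a violated mapping) is not modeled, and all names are illustrative.

```python
from itertools import count

def chase_step(emp, dept, fresh):
    """One application of the hypothetical mapping Emp(n, d) -> exists m. Dept(d, m).

    Adds a Dept tuple with a fresh labeled null for every department that is
    referenced by Emp but missing from Dept; returns the added tuples.
    """
    known = {d for d, _ in dept}
    added = []
    for _, d in emp:
        if d not in known:
            added.append((d, f"_N{next(fresh)}"))
            known.add(d)
    dept.extend(added)
    return added

emp = [("ann", "db"), ("bob", "os"), ("cy", "db")]
dept = [("os", "eve")]
fresh = count(1)
print(chase_step(emp, dept, fresh))  # [('db', '_N1')]
print(chase_step(emp, dept, fresh))  # [] (the mapping is now satisfied)
```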

Approximating predicates and expressive queries on probabilistic databases

Abstract: We study complexity and approximation of queries in an expressive query language for probabilistic databases. The language studied supports the compositional use of confidence computation. It allows for a wide range of new use cases, such as the computation of conditional probabilities and of selections based on predicates that involve marginal and conditional probabilities. These features have important applications in areas such as data cleaning and the processing of sensor data.
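Exact confidence computation is #P-hard in general, which is why approximation matters. The sketch below estimates a conditional probability P(A | B) over a hypothetical tuple-independent sensor table by naive Monte Carlo sampling of worlds and compares it with exact enumeration. The sampler is a plain textbook estimator, not the approximation technique of the paper; the table, events, and function names are illustrative.

```python
import random
from itertools import product
from math import prod

# Hypothetical tuple-independent sensor table: reading -> probability it is real.
readings = {("s1", "hot"): 0.7, ("s2", "hot"): 0.4, ("s3", "cold"): 0.5}

def two_hot(world):   # event A: at least two sensors report "hot"
    return sum(1 for _, v in world if v == "hot") >= 2

def s3_cold(world):   # condition B: sensor s3 reports "cold"
    return ("s3", "cold") in world

def exact_conditional(table, event, given):
    # P(A | B) = P(A and B) / P(B), summing over all 2^n worlds.
    rows = list(table.items())
    num = den = 0.0
    for keep in product([True, False], repeat=len(rows)):
        world = {t for (t, _), k in zip(rows, keep) if k}
        p = prod(q if k else 1 - q for (_, q), k in zip(rows, keep))
        if given(world):
            den += p
            if event(world):
                num += p
    return num / den

def sampled_conditional(table, event, given, n, seed=0):
    # Naive Monte Carlo: sample worlds, count those satisfying B, then A and B.
    rng = random.Random(seed)
    hits_given = hits_both = 0
    for _ in range(n):
        world = {t for t, q in table.items() if rng.random() < q}
        if given(world):
            hits_given += 1
            hits_both += event(world)
    return hits_both / hits_given

exact = exact_conditional(readings, two_hot, s3_cold)
estimate = sampled_conditional(readings, two_hot, s3_cold, n=20_000)
print(round(exact, 4))  # 0.28
```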

Optimizing Queries Using a Meta-level Database

Abstract: Graph simulation (using graph schemata or data guides) has been successfully proposed as a technique for adding structure to semistructured data. Design patterns for description (such as meta-classes and homomorphisms between schema layers), which are prominent in the object-oriented programming community, constitute a generalization of this graph simulation approach.
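The graph simulation that the abstract builds on can be computed by a naive greatest-fixpoint loop: start with all (data node, schema node) pairs and delete pairs that violate the simulation condition until nothing changes. The sketch assumes edge-labeled graphs stored as dicts and the convention that a data node u is simulated by a schema node s if every labeled edge out of u is matched by an equally labeled edge out of s leading to a simulating node. The meta-level technique of the paper is not sketched; graphs and names are illustrative.

```python
def max_simulation(data, schema):
    """Largest relation sim such that (u, s) in sim and u --l--> u2 in data imply
    some s --l--> s2 in schema with (u2, s2) in sim.

    Graphs are dicts: node -> list of (edge_label, target_node).
    """
    sim = {(u, s) for u in data for s in schema}
    changed = True
    while changed:
        changed = False
        for u, s in list(sim):
            ok = all(any(lab2 == lab and (u2, s2) in sim for lab2, s2 in schema[s])
                     for lab, u2 in data[u])
            if not ok:
                sim.discard((u, s))
                changed = True
    return sim

data = {
    "root": [("person", "p1"), ("person", "p2")],
    "p1": [("name", "n1"), ("email", "e1")],
    "p2": [("name", "n2")],
    "n1": [], "e1": [], "n2": [],
}
schema = {"R": [("person", "P")], "P": [("name", "N")], "N": []}
sim = max_simulation(data, schema)
print(("p2", "P") in sim, ("p1", "P") in sim, ("root", "R") in sim)  # True False False
```

Here the schema lacks an `email` edge, so `p1`, and therefore `root`, is not covered by it.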

Sparse projections onto the simplex

Abstract: The past decade has seen the rise of $\ell_1$-relaxation methods to promote sparsity for better interpretability and generalization of learning results. However, there are several important learning applications, such as Markowitz portfolio selection and sparse mixture density estimation, that feature simplex constraints, which disallow the application of the standard $\ell_1$-penalty. In this setting, we show how to efficiently obtain sparse projections onto the positive and general simplex with sparsity constraints.
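For the positive simplex $\{w : w \ge 0, \sum_i w_i = \lambda\}$ with at most $k$ nonzeros, the sketch below follows a simple greedy strategy: keep the $k$ largest entries and project them onto the simplex with the standard sort-based Euclidean projection. This greedy choice is believed to be exact for the positive simplex, but treat it as a sketch under that assumption rather than as the paper's algorithm; the general-simplex case is not covered, and the function names are illustrative.

```python
def project_simplex(v: list[float], lam: float = 1.0) -> list[float]:
    """Euclidean projection of v onto {w : w >= 0, sum(w) = lam} (sort-based)."""
    u = sorted(v, reverse=True)
    css, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        css += uj
        t = (css - lam) / j
        if uj - t > 0:  # the largest such j determines the threshold
            theta = t
    return [max(x - theta, 0.0) for x in v]

def sparse_project_simplex(x: list[float], k: int, lam: float = 1.0) -> list[float]:
    """Keep the k largest entries of x, project them onto the simplex, zero the rest."""
    support = sorted(range(len(x)), key=lambda i: x[i], reverse=True)[:k]
    w = [0.0] * len(x)
    for i, p in zip(support, project_simplex([x[i] for i in support], lam)):
        w[i] = p
    return w

x = [0.5, 0.3, 0.2, -0.1]
print([round(v, 6) for v in project_simplex(x)])            # [0.5, 0.3, 0.2, 0.0]
print([round(v, 6) for v in sparse_project_simplex(x, 2)])  # [0.6, 0.4, 0.0, 0.0]
```

Without the sparsity constraint the projection keeps three nonzeros; restricting the support to two entries redistributes the mass onto them.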
