RDF provides a basic way to represent data for the Semantic Web. We have been experimenting with the query paradigm for working with RDF data in semantic web applications. Querying RDF data provides a declarative access mechanism that is suitable for application usage and remote access. We describe work on a conceptual model for querying RDF data that refines ideas first presented at the W3C Workshop on Query Languages [14], and the design of one possible syntax, derived from [7], that is suitable for application programmers. Further, we present experience gained from three implementations of the query language.
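As a concrete illustration of the declarative query paradigm described above, the sketch below runs a small SELECT query over an in-memory model. It uses the current Apache Jena ARQ API rather than the earlier query language the paper describes; the query string, example URI and class name are illustrative assumptions, not the paper's original syntax.

```java
import org.apache.jena.query.*;
import org.apache.jena.rdf.model.*;
import org.apache.jena.vocabulary.VCARD;

// Minimal sketch: declarative query over a small in-memory RDF model.
public class DeclarativeQueryExample {
    public static void main(String[] args) {
        // Build a tiny model: one resource with a vCard formatted name.
        Model model = ModelFactory.createDefaultModel();
        model.createResource("http://example.org/alice")   // assumed URI
             .addProperty(VCARD.FN, "Alice Example");

        // A SELECT query: ask for every resource and its vcard:FN value.
        String queryString =
            "PREFIX vcard: <http://www.w3.org/2001/vcard-rdf/3.0#> " +
            "SELECT ?person ?name WHERE { ?person vcard:FN ?name }";

        try (QueryExecution qexec =
                 QueryExecutionFactory.create(QueryFactory.create(queryString), model)) {
            ResultSet results = qexec.execSelect();
            while (results.hasNext()) {
                QuerySolution row = results.next();
                System.out.println(row.getResource("person") + " -> " + row.getLiteral("name"));
            }
        }
    }
}
```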
Keywords: RDF, parallel processing. This document describes the hardware and software architecture of a parallel processing framework for RDF data. The aim of the framework is to simplify the implementation of parallel programs and to support execution over a cluster of off-the-shelf machines. From a scripting language, logical execution plans are constructed and then compiled into physical execution plans, which are scheduled for execution in parallel over the cluster. At runtime, processing elements simply exchange data as streams of tuples. We are not pioneers in this field; however, few of the existing systems support parallel RDF data processing.
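The runtime model sketched above, processing elements exchanging streams of tuples, can be illustrated with a minimal sketch. The framework's real interfaces are not given in the abstract, so the `Tuple` record, the `ProcessingElement` interface, and the filter element below are hypothetical, assuming a pull-based stream of tuples between stages.

```java
import java.util.List;
import java.util.stream.Stream;

// Hypothetical types only: the framework's real interfaces are not given in the abstract.
record Tuple(List<Object> values) {}

interface ProcessingElement {
    // A processing element consumes one stream of tuples and emits another.
    Stream<Tuple> process(Stream<Tuple> input);
}

// Example element: keep only tuples whose predicate slot (position 1 of an
// S/P/O tuple) equals a given URI.
class FilterByPredicate implements ProcessingElement {
    private final String predicateUri;

    FilterByPredicate(String predicateUri) { this.predicateUri = predicateUri; }

    @Override
    public Stream<Tuple> process(Stream<Tuple> input) {
        return input.filter(t -> predicateUri.equals(t.values().get(1)));
    }
}
```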
In this paper we describe the architecture of ARQo, a first approach to SPARQL static query optimization in ARQ. Specifically, we focus on static optimization of basic graph patterns (BGPs) for in-memory models. Static query optimization is intended as a query rewriting process in which the set of triple patterns defined for a BGP is rewritten into a specific order. We propose rewriting the triple patterns in increasing order of their estimated execution cost. Specifically, the estimated execution cost is a function of multiple parameters, such as the estimated selectivity of joined triple patterns, the availability of indexes, or precalculated result sets.
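A minimal sketch of the reordering idea: sort the triple patterns of a BGP by an estimated cost before execution. The `TriplePattern` type and the crude cost function below are hypothetical placeholders, not ARQo's implementation; the paper's actual cost model combines selectivity estimates, index availability, and precomputed result sets.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of cost-based BGP reordering; not ARQo's actual cost model.
record TriplePattern(String s, String p, String o) {
    private static boolean isVariable(String term) { return term.startsWith("?"); }

    // Crude cost estimate: more unbound positions -> higher estimated cost.
    double estimatedCost() {
        double cost = 1.0;
        if (isVariable(s)) cost *= 10;
        if (isVariable(p)) cost *= 10;
        if (isVariable(o)) cost *= 10;
        return cost;
    }
}

class BgpReorderer {
    // Rewrite a BGP so cheaper (more selective) patterns are evaluated first.
    static List<TriplePattern> reorder(List<TriplePattern> bgp) {
        return bgp.stream()
                  .sorted(Comparator.comparingDouble(TriplePattern::estimatedCost))
                  .collect(Collectors.toList());
    }
}
```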
SIMILE is a joint project between MIT Libraries, MIT Computer Science and Artificial Intelligence Laboratory (CSAIL), HP Labs and the World Wide Web Consortium (W3C). It is investigating the application of Semantic Web tools, such as the Resource Description Framework (RDF), to the problem of dealing with heterogeneous metadata. This report describes how XML and RDF tools are used to perform data conversion, extraction and record linkage on some sample datasets featuring visual images (ARTstor) and learning objects (OpenCourseWare) in the first SIMILE proof-of-concept demo.
Keywords: distributed objects; classes; inheritance. Atlas is a prototype of an object infrastructure. It is designed to support hundreds of millions of objects distributed in a heterogeneous wide-area network of millions of computers. Atlas supports much finer-granularity objects than does a traditional client/server networking model. An Atlas application will typically consist of a large collection of objects that use method invocation as their primary means of communication.
This document describes SPARQL/Update (nicknamed "SPARUL"), an update language for RDF graphs. It uses a syntax derived from SPARQL. Update operations are performed on a collection of graphs in a Graph Store. Operations are provided to change existing RDF graphs as well as to create and remove graphs within the Graph Store. This document does not discuss protocol issues. Status of this document: published for discussion.
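For illustration, the sketch below executes a small update request against an in-memory dataset using Apache Jena's update API. The INSERT DATA and DELETE WHERE forms shown follow SPARQL 1.1 Update, the standard that SPARQL/Update evolved into; the prefix and triples are made-up examples rather than text from the submission.

```java
import org.apache.jena.query.Dataset;
import org.apache.jena.query.DatasetFactory;
import org.apache.jena.update.UpdateAction;

// Minimal sketch: applying an update request to an in-memory Graph Store.
public class UpdateExample {
    public static void main(String[] args) {
        Dataset dataset = DatasetFactory.create();   // empty, in-memory graph store

        // Insert a triple into the default graph, then remove it again.
        String update =
            "PREFIX ex: <http://example.org/> " +
            "INSERT DATA { ex:book1 ex:title \"An RDF Primer\" } ; " +
            "DELETE WHERE { ?s ex:title ?o }";

        UpdateAction.parseExecute(update, dataset);
        System.out.println("Triples left in default graph: " +
                           dataset.getDefaultModel().size());
    }
}
```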
This paper describes the design of Clustered TDB, a clustered triple store designed to store and query very large quantities of Resource Description Framework (RDF) data. It presents an evaluation of an initial prototype, showing that Clustered TDB offers excellent scaling characteristics with respect to load times and query throughput. Design decisions are justified in the context of a literature review on Database Management System (DBMS) and RDF store clustering, and it is shown that many techniques created during the course of DBMS research are applicable to the problem of storing RDF data.
The storage and query of RDF is a challenging problem: the graph structure of RDF combined with unpredictable query patterns means that even simple queries generate a lot of random access, inhibiting the performance of disk-backed stores. RDF store performance is thus typically poor compared to traditional RDBMSs, inhibiting the use of RDF in large-scale, low-latency applications. As available memory on mid-range servers scales into tens of gigabytes, in-memory DBMSs are an attractive alternative for managing a wide variety of datasets. We argue that RDF would benefit from such in-memory storage, justifying this with a detailed analysis of the popular Jena Memory Model, aimed at evaluating and improving the Memory Model's capacity to work with large datasets in low-latency applications. From this analysis we conclude that while in-memory storage offers promising performance, Jena's data structures must be redesigned to reduce memory overhead while maintaining performance.
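To make the memory-overhead concern concrete, here is a hypothetical sketch of the kind of nested-map triple index an in-memory store might keep (one of several such indexes, here for subject/predicate lookups). It is not Jena's actual Memory Model implementation; it only illustrates why per-entry map and object overhead comes to dominate the footprint as datasets grow.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical S -> P -> {O} index, illustrating the per-entry overhead of
// nested hash maps; not the actual Jena Memory Model data structure.
class SpoIndex {
    private final Map<String, Map<String, Set<String>>> index = new HashMap<>();

    void add(String s, String p, String o) {
        index.computeIfAbsent(s, k -> new HashMap<>())
             .computeIfAbsent(p, k -> new HashSet<>())
             .add(o);
    }

    // Find all objects for a given subject/predicate pair.
    Set<String> find(String s, String p) {
        return index.getOrDefault(s, Map.of()).getOrDefault(p, Set.of());
    }
}
```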
This paper describes some initial work on a NetAPI for accessing and updating RDF data over the web. The NetAPI includes actions for conditional extraction or update of RDF data, actions for model upload and download, and also the ability to enquire about the capabilities of a hosting server. An initial experimental system is described which partially implements these ideas within the Jena toolkit.
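A minimal sketch of what remote query access over such a web API can look like today: sending a SELECT query to an HTTP endpoint with Apache Jena. The endpoint URL is a made-up placeholder, and this uses the modern SPARQL protocol machinery rather than the NetAPI actions described in the paper.

```java
import org.apache.jena.query.*;

// Minimal sketch: querying a remote RDF server over HTTP.
// The service URL below is a placeholder, not a real endpoint.
public class RemoteQueryExample {
    public static void main(String[] args) {
        String service = "http://example.org/sparql";   // assumed endpoint
        String queryString = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10";

        try (QueryExecution qexec =
                 QueryExecutionFactory.sparqlService(service, QueryFactory.create(queryString))) {
            ResultSet results = qexec.execSelect();
            ResultSetFormatter.out(System.out, results);
        }
    }
}
```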