Linked Data: Structured data on the Web
Ebook, 496 pages, 5 hours

About this ebook

Summary

Linked Data presents the Linked Data model in plain, jargon-free language to Web developers. Avoiding the overly academic terminology of the Semantic Web, this new book presents practical techniques, using everyday tools like JavaScript and Python.

About this Book

The current Web is mostly a collection of linked documents useful for human consumption. The evolving Web includes data collections that may be identified and linked so that they can be consumed by automated processes. The W3C approach to this is Linked Data and it is already used by Google, Facebook, IBM, Oracle, and government agencies worldwide.

Linked Data presents practical techniques for using Linked Data on the Web via familiar tools like JavaScript and Python. You'll work step-by-step through examples of increasing complexity as you explore foundational concepts such as HTTP URIs, the Resource Description Framework (RDF), and the SPARQL query language. Then you'll use various Linked Data document formats to create powerful Web applications and mashups.

Written to be immediately useful to Web developers, this book requires no previous exposure to Linked Data or Semantic Web technologies.

Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.

What's Inside
  • Finding and consuming Linked Data
  • Using Linked Data in your applications
  • Building Linked Data applications using standard Web techniques

About the Authors

David Wood is co-chair of the W3C's RDF Working Group. Marsha Zaidman served as CS chair at University of Mary Washington. Luke Ruth is a Linked Data developer on the Callimachus Project. Michael Hausenblas led the Linked Data Research Centre.

Table of Contents
    PART 1 THE LINKED DATA WEB
  1. Introducing Linked Data
  2. RDF: the data model for Linked Data
  3. Consuming Linked Data
    PART 2 TAMING LINKED DATA
  4. Creating Linked Data with FOAF
  5. SPARQL—querying the Linked Data Web
    PART 3 LINKED DATA IN THE WILD
  6. Enhancing results from search engines
  7. RDF database fundamentals
  8. Datasets
    PART 4 PULLING IT ALL TOGETHER
  9. Callimachus: a Linked Data management system
  10. Publishing Linked Data—a recap
  11. The evolving Web
Language: English
Publisher: Manning
Release date: Dec 30, 2013
ISBN: 9781638352167
Author

Luke Ruth

Luke Ruth is a Linked Data developer supporting the Callimachus Project.



Copyright

For online information and ordering of this and other Manning books, please visit www.manning.com. The publisher offers discounts on this book when ordered in quantity. For more information, please contact

      Special Sales Department

      Manning Publications Co.

      20 Baldwin Road

      PO Box 261

      Shelter Island, NY 11964

      Email:

[email protected]

©2014 by Manning Publications Co. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Recognizing the importance of preserving what has been written, it is Manning’s policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.

ISBN 9781617290398

Printed in the United States of America

Brief Table of Contents

Copyright

Brief Table of Contents

Table of Contents

Foreword

Preface

Acknowledgments

About this Book

About the Cover Illustration

1. The Linked Data Web

Chapter 1. Introducing Linked Data

Chapter 2. RDF: the data model for Linked Data

Chapter 3. Consuming Linked Data

2. Taming Linked Data

Chapter 4. Creating Linked Data with FOAF

Chapter 5. SPARQL—querying the Linked Data Web

3. Linked Data in the wild

Chapter 6. Enhancing results from search engines

Chapter 7. RDF database fundamentals

Chapter 8. Datasets

4. Pulling it all together

Chapter 9. Callimachus: a Linked Data management system

Chapter 10. Publishing Linked Data—a recap

Chapter 11. The evolving Web

Appendix A. Development environments

Appendix B. SPARQL results formats

 Glossary

Index

List of Figures

List of Tables

List of Listings

Table of Contents

Copyright

Brief Table of Contents

Table of Contents

Foreword

Preface

Acknowledgments

About this Book

About the Cover Illustration

1. The Linked Data Web

Chapter 1. Introducing Linked Data

1.1. Linked Data defined

1.2. What Linked Data won’t do for you

1.3. Linked Data in action

1.3.1. Freeing data

1.3.2. Linked Data with Google rich snippets and Facebook likes

1.3.3. Linked Data to the rescue at the BBC

1.4. The Linked Data principles

1.4.1. Principle 1: Use URIs as names for things

1.4.2. Principle 2: Use HTTP URIs so people can look up those names

1.4.3. Principle 3: When someone looks up a URI, provide useful information

1.4.4. Principle 4: Include links to other URIs

1.5. The Linking Open Data project

1.6. Describing data

1.7. RDF: a data model for Linked Data

1.8. Anatomy of a Linked Data application

1.8.1. Accessing a facility’s Linked Data

1.8.2. Creating the user interface from Linked Data

1.9. Summary

Chapter 2. RDF: the data model for Linked Data

2.1. The Linked Data principles extend RDF

2.2. The RDF data model

2.2.1. Triples

2.2.2. Blank nodes

2.2.3. Classes

2.2.4. Typed literals

2.3. RDF vocabularies

2.3.1. Commonly used vocabularies

2.3.2. Making your own vocabularies

2.4. RDF formats for Linked Data

2.4.1. Turtle—human-readable RDF

2.4.2. RDF/XML—RDF for enterprises

2.4.3. RDFa—RDF in HTML

2.4.4. JSON-LD—RDF for JavaScript Developers

2.5. Issues related to web servers and published Linked Data

2.6. File types and web servers

2.6.1. When you can configure Apache

2.7. When you have limited control over Apache

2.8. Linked Data platforms

2.9. Summary

Chapter 3. Consuming Linked Data

3.1. Thinking like the Web

3.2. How to consume Linked Data

3.3. Tools for finding distributed Linked Data

3.3.1. Sindice

3.3.2. SameAs.org

3.3.3. Data Hub

3.4. Aggregating Linked Data

3.4.1. Aggregating some Linked Data from known datasets

3.4.2. Getting Linked Data and RDF from web pages using browser plug-ins

3.5. Crawling the Linked Data Web and aggregating data

3.5.1. Using Python to crawl the Linked Data Web

3.5.2. Creating HTML output from your aggregated RDF

3.6. Summary

2. Taming Linked Data

Chapter 4. Creating Linked Data with FOAF

4.1. Creating a personal FOAF profile

4.1.1. Introducing the FOAF vocabulary

4.1.2. Method I: manual creation of a basic FOAF profile

4.1.3. Enhancing a basic FOAF profile

4.1.4. Method II: automated generation of a FOAF profile

4.2. Adding more content to a FOAF profile

4.3. Publishing your FOAF profile

4.4. Visualization of a FOAF profile

4.5. Application: linking RDF documents using a custom vocabulary

4.5.1. Creating a wish list vocabulary

4.5.2. Creating, publishing, and linking the wish list document

4.5.3. Adding wish list items to our wish list document

4.5.4. Explanation of our bookmarklet tool

4.6. Summary

Chapter 5. SPARQL—querying the Linked Data Web

5.1. An overview of a typical SPARQL query

5.2. Querying flat RDF files with SPARQL

5.2.1. Querying a single RDF data file

5.2.2. Querying multiple RDF files

5.2.3. Querying an RDF file on the Web

5.3. Querying SPARQL endpoints

5.4. Types of SPARQL queries

5.4.1. The SELECT query

5.4.2. The ASK query

5.4.3. The DESCRIBE query

5.4.4. The CONSTRUCT query

5.4.5. SPARQL 1.1 Update

5.5. SPARQL result formats (XML, JSON)

5.6. Creating web pages from SPARQL queries

5.6.1. Creating the SPARQL query

5.6.2. Creating the HTML page

5.6.3. Creating the JavaScript for the table

5.6.4. Creating JavaScript for the map

5.7. Summary

3. Linked Data in the wild

Chapter 6. Enhancing results from search engines

6.1. Enhancing HTML by embedding RDFa

6.1.1. RDFa markup using FOAF vocabulary

6.1.2. Using the HTML span attribute with RDFa

6.1.3. Extracting Linked Data from a FOAF-enhanced HTML document

6.2. Embedding RDFa using the GoodRelations vocabulary

6.2.1. An overview of the GoodRelations vocabulary

6.2.2. Enhancing HTML with RDFa using GoodRelations

6.2.3. A closer look at selections of RDFa GoodRelations

6.2.4. Extracting Linked Data from GoodRelations-enhanced HTML document

6.3. Embedding RDFa using the schema.org vocabulary

6.3.1. An overview of schema.org

6.3.2. Enhancing HTML with RDFa Lite using schema.org

6.3.3. A closer look at selections of RDFa Lite using schema.org

6.3.4. Extracting Linked Data from a schema.org enhanced HTML document

6.4. How do you choose between using schema.org or GoodRelations?

6.5. Extracting RDFa from HTML and applying SPARQL

6.6. Summary

Chapter 7. RDF database fundamentals

7.1. Classifying RDF databases

7.1.1. Selecting an RDF database system

7.1.2. RDF databases versus RDBMS

7.1.3. Benefits of RDF database systems

7.2. Transforming spreadsheet data to RDF

7.2.1. A basic RDF conversion of MS Excel

7.2.2. Transforming MS Excel to Linked Data

7.2.3. Finding RDF converter tools

7.3. Application: collecting Linked Data in an RDF database

7.3.1. Outlining the process

7.3.2. Using Python to aggregate our data sources

7.3.3. Understanding the output

7.4. Summary

Chapter 8. Datasets

8.1. Description of a Project

8.1.1. Creating a DOAP profile

8.1.2. Using the DOAP vocabulary

8.2. Documenting your datasets using VoID

8.2.1. The Vocabulary of Interlinked Datasets

8.2.2. Preparing a VoID file

8.3. Sitemaps

8.3.1. Non-semantic sitemaps

8.3.2. Semantic sitemaps

8.3.3. Enabling discovery of your site

8.4. Linking to other people’s data

8.5. Examples of using owl:sameAs to interlink datasets

8.6. Joining Data Hub

8.7. Requesting outgoing links from DBpedia to your dataset

8.8. Summary

4. Pulling it all together

Chapter 9. Callimachus: a Linked Data management system

9.1. Getting started with Callimachus

9.2. Creating web pages using RDF classes

9.2.1. Adding data to Callimachus

9.2.2. Telling Callimachus about your OWL class

9.2.3. Associating a Callimachus view template to your class

9.3. Creating and editing class instances

9.3.1. Creating a new note

9.3.2. Creating a view template for a note

9.3.3. Creating an edit template for notes

9.4. Application: creating a web page from multiple data sources

9.4.1. Making and querying Linked Data from NOAA and EPA

9.4.2. Creating a web page to contain the application

9.4.3. Creating JavaScript to retrieve and display Linked Data

9.4.4. Bringing it all together

9.5. Summary

Chapter 10. Publishing Linked Data—a recap

10.1. Preparing your data

10.2. Minting URIs

10.3. Selecting vocabularies

10.4. Customizing vocabulary

10.5. Interlinking your data to other datasets

10.6. Publishing your data

10.7. Summary

Chapter 11. The evolving Web

11.1. The relationship between Linked Data and the Semantic Web

11.1.1. Demonstrated successes

11.2. What’s coming

11.2.1. Google extended rich snippets

11.2.2. Digital accountability and transparency legislation

11.2.3. Impact of advertising

11.2.4. Enhanced searches

11.2.5. Participation by the big guys

11.3. Conclusion

Appendix A. Development environments

A.1. cURL

A.2. Python

A.3. ARQ

A.4. Fuseki

A.5. Callimachus

Appendix B. SPARQL results formats

B.1. SPARQL XML results format

B.2. SPARQL JSON results format

B.3. SPARQL CSV and TSV results format

 Glossary

Index

List of Figures

List of Tables

List of Listings

Foreword

Linked Data: Structured data on the Web, the book, is just what Linked Data, the technology, has needed. It is a friendly introduction to the use and publication of structured data on the World Wide Web.

Linked Data was part of my initial vision for the Web and is an important part of the Web’s future. The Web took off as a web of hyperlinked documents which were exciting to read, but which could not effectively be used as data.

And, yes, in fact, much of the Web is data-driven, and the data has been hidden on files inside the server. In slides from my wrap-up talk at the very first WWW conference in 1994, I pointed out that while documents talk about people and things, such as a title deed saying who owns a house, the system was not capturing the data—the actual ownership fact—in a way that could be processed. As the Web evolved, and became more driven by data, there has been frustration that changing, hidden data is not exposed to the reader. Linked Data standards allow you to publish data in a way that can be read by people and processed by machines so that previously hidden flows of data become evident.

Linked Data may not be as exciting as a hypertext Web to read, but it is more exciting in terms of making everything work more effectively, from business to scientific research. Machines can read, follow, and combine Linked Data much more effectively than they can perform those actions using other forms of data currently on the Web.

The role of machines has previously been subservient to the role of people in the technology used to allow people to communicate. Now machines are beginning to become active participants in the communication. Linked Data allows machines to become more useful partners in our daily lives.

Linked Data has come of age in the last couple of years. In the last two years we have seen Google announce its Knowledge Graph and adopt the JSON-LD serialization format for Gmail, and produce a large set of terms for general use at schema.org; IBM announce that the DB2 database will become a Linked Data server; and Facebook expose Linked Data via its Graph API. Other large companies and government organizations have followed suit. We have needed a book like this one to introduce Linked Data development to a new and wider group of programmers. Linked Data will provide you with the questions to ask, even if it doesn’t answer them all. It is a great place to begin your study and kick-start your development.

I have known Dave Wood for just about a decade. We met when he started his work with the World Wide Web Consortium. We later worked on a Web research project together. Dave has worked tirelessly to develop Semantic Web and Linked Data frameworks since the late 1990s. As a developer, he is well-placed to show others how it is done.

The building blocks of Linked Data are not particularly new. The original proposal for the World Wide Web that I wrote in 1989 for my bosses at CERN included hyperlinks with semantics. The proposal read, in part, “The system we need is like a diagram of circles and arrows, where circles and arrows can stand for anything.” In fact, the Enquire program I had written in 1980 captured the relationships between things in a graph. That was the vision. Now Linked Data is delivering on this vision, by adding meaning that computers can process.

As we all know, in the basic hypertext Web, the arrows we ended up with all stood for the same thing: “There is some interesting information over here!” Linked Data extends the document Web by allowing arrows to stand for anything we can name with a URI. Hyperlinks gain the semantics they need and, in the process, become much more useful.

The Web of hypertext-linked documents is complemented by the very powerful Linked Web of Data. Why linked? Well, think of how the value of a Web page is very much a function of what it links to, as well as the inherent value of the information within the Web page. So it is—in a way even more so—also in the Semantic Web of Linked Data. The data itself is valuable, but the links to other data make it much more so.

I believe that the Web should evolve to serve all of us, regardless of our nationality, language, economic motivation, or interests. Linked Data is just one part of that evolution. It is not the end—it is just another part of the beginning. There is still plenty to do, so come join us in building the next generation of the Web!

TIM BERNERS-LEE

DIRECTOR OF THE WORLD WIDE WEB CONSORTIUM (W3C)

3COM FOUNDERS PROFESSOR OF ENGINEERING, MASSACHUSETTS INSTITUTE OF TECHNOLOGY

PROFESSOR IN THE ELECTRONICS AND COMPUTER SCIENCE DEPARTMENT, UNIVERSITY OF SOUTHAMPTON UK

Preface

We love the Web and we love the way it’s evolving from the rather simple web of linked documents of the early 1990s into the framework for the world’s information. Representing data on the Web is an obvious, but slightly harder, next step.

We each came to the Web in our own ways but came to Linked Data nearly together. David found the Web as a programmer and later as an entrepreneur, Marsha as an educator, and Luke as a student. Marsha and David are old enough to have started computing with punch cards and paper tape. The Web was a very welcome degree of abstraction from ones and zeros.

David was introduced to the Web at Digital Equipment Corporation’s fabled Western Research Lab in California in 1993. It was an eye-opener. One of the first large websites showed photos of thousands of pieces of artwork held by the Vatican. Another showed a list of projects that Digital researchers were working on and linked to each of their own individual web servers for detailed documents. David was hooked. Tellingly, it was the project website that he found most interesting. If only you could link into databases and spreadsheets the way you could link to documents.

Marsha also found the Web in the early days, when Gopher was the primary search tool and Web browsers worked in a terminal, and she kept up to date with its rapid changes in order to teach new generations of computer scientists. Her career has lasted long enough for her to see the incredible changes wrought by the invention of spreadsheets and databases on decision making, and this fostered an interest in moving data to the Web.

Marsha gave David the chance to teach at the University of Mary Washington just as the Linking Open Data project was starting. Luke took the first class offered to U.S. undergraduates on Linked Data in 2011, followed by an independent study and an internship, all with David. He was eventually hired by David to work on Linked Data projects.

Luke and David contribute to the Callimachus Project, an open source Linked Data platform described in this book. We’ve used it to build applications for a variety of organizations, from U.S. government agencies and pharmaceutical companies to publishers and health-care companies. Each of those projects is based on the creation, manipulation, and use of Linked Data.

We decided to write a Linked Data book for Web developers because there simply wasn’t one. We all had to learn Linked Data from the specifications or by reading academic papers. There are some other books on Linked Data (David edited two of them), but none are aimed specifically at developers. We thought that our combination of real-world development experience and experience teaching technology would result in a useful book. We hope you agree.

It’s our privilege to work with a loosely affiliated international group of people working to bring data to the Web. We hope that you’ll read this book and then join us. We can’t wait to see what the Web will become next.

Acknowledgments

We would like to extend our gratitude to the original members of the Linking Open Data project, many of whom are quoted in this book. We would like to thank Michael Stephens, Jeff Bleiel, Ozren Harlovic, Maureen Spencer, Mary Piergies, Linda Recktenwald, Elizabeth Martin, and Janet Vail, and the rest of the team at Manning Publications for working so hard to make this book a success.

We also owe thanks to the following reviewers who read and commented on our book through its many iterations and multiple review phases: Alain Buferne, Artur Nowak, Craig Taverner, Cristofer Weber, Curt Tilmes, Daniel Ayers, Gary Ewan Park, Glenn McDonald, Innes Fisher, Luka Raljević, M. Edward Borasky, Michael Brunnbauer, Michael Pendleton, Michael Piscatello, Mike Westaway, Owen Stephens, Paulo Schreiner, Philip Poots, Robert Crowther, Ron Sher, Thomas Baker, Thomas Gängler, and Thomas Horton.

Special thanks to Zachary Whitley, our technical proofreader, for his careful review of the final manuscript shortly before it went into production, and to Tim Berners-Lee for contributing the foreword.

The book was greatly improved by those who contributed to the Author Online Forum, the Public LOD mailing list, and the W3C RDF Working Group. Sincere thanks to the readers who participated in the Manning Early Access Program (MEAP) and left feedback in the Author Online forum. Their comments had a strong impact on the quality of the final manuscript. Lastly, we would like to thank the organizers of the Cambridge, New York City, Washington D.C., Northern Virginia, and Central Maryland SemWeb meet-ups for letting us make presentations on the book.

Dave would like to thank Bernadette, who is always there for him when he starts some silly project, as well as his coauthors for making the creation of this book much less of a silly project.

Marsha would like to extend her gratitude to her husband, Steven, who believed in her and encouraged her to pursue this new venture. A special thanks to her coauthor, David, who solicited her participation and had faith that she could extend her previous teaching experiences into written communications. Thanks to both Luke and David for making writing this book a rewarding experience.

Luke would like to thank Dave and Marsha for including him in this process and teaching him so much about technology—and about the world. He would also like to thank his parents, Rick and Tania, for instilling in him the importance of education and trying new things, and his wife Laura for her constant support.

About this Book

Linked Data is a set of techniques to represent and connect structured data on the Web. This book shows you how to access, create, and use Linked Data. Linked Data has one amazing property: it can be easily combined with other Linked Data to form new knowledge.
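To make the "easily combined" property concrete, here is a minimal sketch in plain Python rather than a real RDF library. It assumes two hypothetical sources, each modeled as a set of (subject, predicate, object) triples. The example.org URIs and the helper names `merge` and `names_of_acquaintances` are invented for illustration; the predicates are terms from the FOAF vocabulary.

```python
# Hypothetical example data: two independently published sets of triples.
# Each triple is (subject, predicate, object); the example.org URIs are made up.
EX = "http://example.org/"
FOAF = "http://xmlns.com/foaf/0.1/"

source_a = {
    (EX + "alice", FOAF + "name", "Alice"),
    (EX + "alice", FOAF + "knows", EX + "bob"),
}
source_b = {
    (EX + "bob", FOAF + "name", "Bob"),
    (EX + "bob", FOAF + "homepage", "http://example.org/~bob"),
}


def merge(*graphs):
    """Combine graphs by set union; triples that share a URI line up automatically."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def names_of_acquaintances(graph, person):
    """Follow foaf:knows links from `person`, then look up each foaf:name."""
    known = {o for s, p, o in graph if s == person and p == FOAF + "knows"}
    return sorted(o for s, p, o in graph if s in known and p == FOAF + "name")


combined = merge(source_a, source_b)
print(names_of_acquaintances(combined, EX + "alice"))
```

Neither source alone can name the people Alice knows. The answer appears only after the union, because both sources use the same URI for Bob.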

Linked Data makes the World Wide Web into a global database that we call the Web of Data. Developers can query Linked Data using a query language called SPARQL from multiple sources at once and combine those results dynamically, something difficult or impossible to do with traditional data-management technologies. The examples in this book are intentionally drawn from public sources, but the techniques illustrated can just as easily be used with private data. You may be unfamiliar with some of the resources that we use, but they’re readily accessible on the Web, and we encourage you to check them out as you encounter them. We apologize in advance for any inconsistencies between the screen shots and URLs referenced in the text and the actual content when you visit those sites on the Web. The Web is a rapidly changing entity, and no printed matter can absolutely represent that. We do promise that all the screen shots and URLs were correct as we entered production.
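As a rough illustration of querying more than one source at once, the sketch below builds a SPARQL query that uses the SERVICE keyword from SPARQL 1.1 Federated Query and encodes it as a SPARQL Protocol GET request. Both endpoint URLs are hypothetical placeholders, and the helper `build_request_url` is an illustrative name, not part of any library. Nothing is sent over the network.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoints, used only for illustration.
LOCAL_ENDPOINT = "http://example.org/sparql"
REMOTE_ENDPOINT = "http://example.com/sparql"

# The outer pattern is matched by the endpoint that receives the query;
# the SERVICE block is delegated to the second endpoint.
QUERY = f"""
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?name
WHERE {{
  ?me foaf:knows ?person .
  SERVICE <{REMOTE_ENDPOINT}> {{
    ?person foaf:name ?name .
  }}
}}
"""


def build_request_url(endpoint, query):
    """Encode a query as a SPARQL Protocol GET request using the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = build_request_url(LOCAL_ENDPOINT, QUERY)
print(url)
```

Whether a given endpoint accepts federated queries depends on its configuration, so treat this as a shape to adapt rather than a request guaranteed to succeed.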

The techniques of Linked Data enable us to more easily share our knowledge with others. Literally anything can be described by Linked Data. Linked Data on the World Wide Web may be found, shared, and combined with other people’s data. Unlike traditional data-management systems, Linked Data frees information from proprietary containers so anyone can use it. As with any data, the consumer is responsible for evaluating its quality and utility. We use sources whose data we trust.

Intended audience

Linked Data: Structured data on the Web should be read by application developers who want to appreciate, consume, and publish Linked Data. This book assumes that you have a basic familiarity with fundamental web technologies such as HTML, URIs, and HTTP. We introduce you to Linked Data, place it in context, outline its principles, and show you how to use it by walking you through the process of finding, consuming, and publishing Linked Data on the Web. We illustrate this process with real-world applications of gradually increasing complexity.

Roadmap

This book has eleven chapters, divided into four parts, a glossary, and two appendixes.

Part 1, The Linked Data Web, provides an introduction to the fundamentals of Linked Data, the Resource Description Framework (RDF) data model, and the common standard serializations used in representing this data. It guides the reader in identifying and consuming Linked Data on the Web.

Chapter 1, an introduction to Linked Data, places it in context, outlines its principles, and shows you how to use it by walking you through a Linked Data application.

Chapter 2 introduces the Resource Description Framework and its relationship to Linked Data. We describe the RDF data model along with the key concepts that you’re likely to use in your own Linked Data. In closing this chapter, we address common issues of file types and web servers and provide techniques for resolving those issues.

Chapter 3 acquaints you with the distributed nature
