Home Project 2017
Home Project
Linked Data Programming
This home project can be completed at home, using the knowledge and skills acquired in
previous lectures. All students are invited to present their solutions in the exercises for a
grade improvement. To participate, please proceed as follows:
Presenting and submitting a correct solution (on time) will result in a 0.3 grade improvement
in the final exam.
Involved Endpoints/Datasets
Wikidata
Endpoint: https://query.wikidata.org/ (SPARQL endpoint:
https://query.wikidata.org/sparql)
Example request: https://query.wikidata.org/sparql?query=select distinct
?Concept where {[] a ?Concept} LIMIT 100
DBpedia
Endpoint: http://dbpedia.org/sparql or http://live.dbpedia.org/sparql
Example request: http://dbpedia.org/sparql?query=select distinct ?Concept
where {[] a ?Concept} LIMIT 100
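As the example requests show, both endpoints take the query in the `query` URL parameter. In an actual HTTP request the query text has to be percent-encoded. The following is a minimal Python sketch using only the standard library. The helper names `build_request_url` and `run_query`, the `Accept` header for JSON results, and the `User-Agent` value are illustrative assumptions and not prescribed by this assignment.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"
DBPEDIA_ENDPOINT = "http://dbpedia.org/sparql"

# The example query from the endpoint descriptions above.
EXAMPLE_QUERY = "select distinct ?Concept where {[] a ?Concept} LIMIT 100"


def build_request_url(endpoint, query):
    """Return the endpoint URL with the percent-encoded query attached."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def run_query(endpoint, query, timeout=60):
    """Send the query and return the list of result bindings.

    Needs network access. Assumes the endpoint answers with the
    SPARQL JSON results format when asked for it via the Accept header.
    """
    request = urllib.request.Request(
        build_request_url(endpoint, query),
        headers={
            "Accept": "application/sparql-results+json",
            # Hypothetical client name; public endpoints may expect one.
            "User-Agent": "home-project-example/0.1",
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return payload["results"]["bindings"]
```

Calling `run_query(DBPEDIA_ENDPOINT, EXAMPLE_QUERY)` would then return one dictionary per result row, with each variable's value under `row["Concept"]["value"]`.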
Task Description
Build an application which is able to do the following.
Note: there is a mandatory and an optional part. The mandatory results are required to
achieve the grade improvement. The optional part provides some suggestions to make
your project more sophisticated and stand out against the others.
Mandatory
1. Retrieve all actors who died since 1950 according to DBpedia (advice: the UMBEL
types seem better populated in DBpedia, i.e. you will find more actors by looking for
<http://umbel.org/umbel/rc/Actor> rather than <http://dbpedia.org/ontology/Actor>).
2. For each actor, fetch some general data (incl. name, birth date, birth place, movies
starred in, cause of death).
3. For each actor, retrieve from Wikidata some additional information which is not present
in DBpedia: awards received, IMDb ID (if available), Discogs ID (if available and
assuming the actor is a musician as well).
4. Create ranking 1: rank your set of actors according to the number of awards received
(descending order).
5. Create ranking 2: rank your set of actors according to the ratio of awards per movie
(descending order).
6. Lift your data into a representation of your choice (a basic HTML table, a diagram, a
visualization) and show the rankings including the fetched data as described above.
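The two rankings and the basic HTML table from the steps above can be sketched in a few lines of Python. This is only one possible design, under stated assumptions: each actor is a dictionary with the hypothetical keys `name`, `awards` and `movies`, the sample records are made up, and an actor without any known movie is given a ratio of 0.0 (excluding such actors would be an equally defensible choice).

```python
from html import escape


def awards_per_movie(actor):
    """Ratio for ranking 2; 0.0 when no movies are known (an assumption)."""
    movies = len(actor["movies"])
    return len(actor["awards"]) / movies if movies else 0.0


def ranking_by_awards(actors):
    """Ranking 1: number of awards received, descending."""
    return sorted(actors, key=lambda a: len(a["awards"]), reverse=True)


def ranking_by_ratio(actors):
    """Ranking 2: awards per movie, descending."""
    return sorted(actors, key=awards_per_movie, reverse=True)


def to_html_table(actors):
    """Render the actors, in the given order, as a basic HTML table."""
    header = "<tr><th>Name</th><th>Awards</th><th>Movies</th><th>Awards per movie</th></tr>"
    rows = [
        "<tr><td>{}</td><td>{}</td><td>{}</td><td>{:.2f}</td></tr>".format(
            escape(a["name"]), len(a["awards"]), len(a["movies"]), awards_per_movie(a)
        )
        for a in actors
    ]
    return "<table>\n" + "\n".join([header] + rows) + "\n</table>"


# Made-up sample records, for illustration only.
SAMPLE = [
    {"name": "Actor A", "awards": ["x", "y"], "movies": ["m1", "m2", "m3", "m4"]},
    {"name": "Actor B", "awards": ["x"], "movies": ["m1"]},
    {"name": "Actor C", "awards": [], "movies": []},
]

if __name__ == "__main__":
    print(to_html_table(ranking_by_ratio(SAMPLE)))
```

On the sample, ranking 1 puts Actor A first (two awards), while ranking 2 puts Actor B first (one award for one movie).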
Note: for several of these steps, different approaches for implementation are feasible and
the schemas and representation offer a variety of choices (e.g. on how to identify the group
of actors or to match entities from DBpedia to Wikidata instances). It is your choice on how
to design and implement the solution. While some are more efficient than others, any
solution which appears reasonable and fulfills the (rather low) thresholds described below
will be accepted.
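One of the feasible ways to match DBpedia entities to Wikidata instances is to follow the owl:sameAs links that DBpedia resources commonly carry. The sketch below assumes such links exist for the actors in question. The property dbo:deathDate, the Wikidata properties P166 (award received), P345 (IMDb ID) and P1953 (Discogs artist ID), and all helper names are assumptions that should be checked against the live endpoints; they are not prescribed by this assignment.

```python
import re

# Candidate DBpedia query: actors with a death date from 1950 onwards,
# together with their Wikidata counterpart (if linked via owl:sameAs).
DBPEDIA_ACTORS_QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT DISTINCT ?actor ?wikidata WHERE {
  ?actor a <http://umbel.org/umbel/rc/Actor> ;
         dbo:deathDate ?death ;
         owl:sameAs ?wikidata .
  FILTER (?death >= "1950-01-01"^^xsd:date)
  FILTER (STRSTARTS(STR(?wikidata), "http://www.wikidata.org/entity/"))
}
"""

WIKIDATA_ENTITY_PREFIX = "http://www.wikidata.org/entity/"

# Template for the Wikidata side; %VALUES% is replaced by the matched IDs.
WIKIDATA_TEMPLATE = """
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?item ?award ?imdb ?discogs WHERE {
  VALUES ?item { %VALUES% }
  OPTIONAL { ?item wdt:P166 ?award }
  OPTIONAL { ?item wdt:P345 ?imdb }
  OPTIONAL { ?item wdt:P1953 ?discogs }
}
"""


def qid_from_uri(uri):
    """Return the Wikidata ID (such as 'Q42') from an entity URI, else None."""
    pattern = re.escape(WIKIDATA_ENTITY_PREFIX) + r"(Q[1-9][0-9]*)"
    match = re.fullmatch(pattern, uri)
    return match.group(1) if match else None


def build_wikidata_query(qids):
    """Build one Wikidata query covering a batch of matched entity IDs."""
    values = " ".join("wd:" + qid for qid in qids)
    return WIKIDATA_TEMPLATE.replace("%VALUES%", values)
```

Sending the IDs in batches through `VALUES` keeps the number of requests low. Matching by label or by a shared external identifier would be alternative designs with different trade-offs.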
Optional
- How different was the data you encountered? Note: you can do this very informally or
by providing statistics about inconsistencies, or even through advanced metrics, such
as Spearman's rank correlation coefficient to compute the difference between your
rankings.
- What is the most frequent cause of death according to DBpedia and according to
Wikidata?
- Which dataset/KB do you think is more complete at the instance level?
- Which one is more comprehensive with respect to its schema and properties?
- Which one appears to be more up to date?
- Which one would you prefer to use and why?
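For the ranking comparison mentioned in the first question, Spearman's coefficient for two rankings of the same n items without ties is rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), where d_i is the difference between the two rank positions of item i. A minimal Python sketch follows; the function name `spearman_rho` is illustrative, and the no-ties assumption matters, since actors with equal award counts would need averaged ranks, which this sketch does not handle.

```python
def spearman_rho(ranking_a, ranking_b):
    """Spearman's rho for two orderings of the same items (no ties).

    Each argument is a list of item identifiers, best-ranked first.
    """
    if len(ranking_a) != len(ranking_b) or set(ranking_a) != set(ranking_b):
        raise ValueError("both rankings must contain exactly the same items")
    if len(set(ranking_a)) != len(ranking_a):
        raise ValueError("items must be unique within a ranking")
    n = len(ranking_a)
    if n < 2:
        raise ValueError("at least two items are needed")

    # Rank position of every item in the second ranking.
    position_b = {item: rank for rank, item in enumerate(ranking_b)}
    # Sum of squared rank differences over all items.
    d_squared = sum(
        (rank - position_b[item]) ** 2 for rank, item in enumerate(ranking_a)
    )
    return 1 - 6 * d_squared / (n * (n * n - 1))
```

A value of 1 means both rankings agree completely, and -1 means one is the exact reverse of the other.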
Presentation
For your project presentation, you have approximately 10 minutes. A good presentation
should cover the following aspects:
- overall approach
- schema terms used from both datasets
- SPARQL queries used
- results (descriptive statistics about the data, for instance, how many actors per
knowledge base, min/max/avg awards per actor in each KB, how many actors are
musicians)
- demo (if available)
- discussion (see the optional part above)
Note: this is not a table of contents but a list of relevant items to cover.
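The descriptive statistics named in the results item can be computed directly from the collected records. The following Python sketch assumes each actor is a dictionary with the hypothetical keys `awards`, `discogs_id` and `cause_of_death`, and it treats the presence of a Discogs ID as the sign that an actor is also a musician. Both the record layout and the sample values are made up for illustration.

```python
from collections import Counter
from statistics import mean


def describe(actors):
    """Return basic statistics for one knowledge base's actor records."""
    if not actors:
        raise ValueError("no actors to describe")
    award_counts = [len(a["awards"]) for a in actors]
    # Count only actors for which a cause of death is actually recorded.
    causes = Counter(
        a["cause_of_death"] for a in actors if a.get("cause_of_death")
    )
    return {
        "actors": len(actors),
        "min_awards": min(award_counts),
        "max_awards": max(award_counts),
        "avg_awards": mean(award_counts),
        "musicians": sum(1 for a in actors if a.get("discogs_id")),
        "top_cause_of_death": causes.most_common(1)[0][0] if causes else None,
    }


# Made-up sample records, for illustration only.
SAMPLE = [
    {"awards": ["x", "y"], "discogs_id": "123", "cause_of_death": "cause X"},
    {"awards": [], "discogs_id": None, "cause_of_death": "cause X"},
    {"awards": ["z"], "discogs_id": None, "cause_of_death": None},
]

if __name__ == "__main__":
    print(describe(SAMPLE))
```

Running `describe` once per knowledge base gives directly comparable figures for the presentation.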
In addition, since there are plenty of ways in which the data could be further visualised,
processed or enriched, there will be a special reward for the most convincing project and
presentation. This decision will be made jointly by the lecturer and tutors and will take into
account criteria such as originality, soundness and overall quality.