Home Project 2017


Summer Semester 2017
Knowledge Engineering & Semantic Web
Leibniz Universität Hannover

Dr. Stefan Dietze (lecturer)
Besnik Fetahu (assistant)
Ujwal Gadiraju (assistant)

Home Project
Linked Data Programming

This home project exercise can be completed at home, using the expertise and skill sets
acquired in previous lectures. All students are invited to present solutions in the exercises for
grade improvement. In order to participate, please proceed as follows:

- Prepare your solution (task description below) at home.
- Indicate your intention to submit and present by 18.05.2017 via email to
[email protected] (no code or results required at this stage).
- Present your solution (approach, results) to the class during the KESW17 lecture on
01.06.2017 (max. 10-15 mins presentation, presentation details below).
- Provide the source code and generated output (e.g. HTML files) via email to
[email protected] by 02.06.2017.

Presenting and submitting a correct solution (on time) will result in a 0.3 grade improvement
in the final exam.

Involved Endpoints/Datasets
WikiData:
Endpoint: https://query.wikidata.org/ (SPARQL endpoint: https://query.wikidata.org/sparql)
Example request: https://query.wikidata.org/sparql?query=select distinct ?Concept where {[] a ?Concept} LIMIT 100

DBpedia:
Endpoint: http://dbpedia.org/sparql or http://live.dbpedia.org/sparql
Example request: http://dbpedia.org/sparql?query=select distinct ?Concept where {[] a ?Concept} LIMIT 100
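
The example requests above pass the SPARQL query as a `query` URL parameter, which has to be URL-encoded when sent from a program. The following is a minimal sketch using only the Python standard library. The helper names `build_request_url` and `run_select`, the `Accept` header value and the `User-Agent` string are assumptions made for illustration and are not prescribed by this exercise.

```python
import json
import urllib.parse
import urllib.request

DBPEDIA_ENDPOINT = "http://dbpedia.org/sparql"
WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"


def build_request_url(endpoint: str, query: str) -> str:
    """Return the endpoint URL with the SPARQL query URL-encoded as ?query=..."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def run_select(endpoint: str, query: str, timeout: float = 60.0) -> list:
    """Send a SELECT query and return one dict (variable -> value) per result row.

    Assumption: the endpoint honours the standard SPARQL JSON results media
    type requested via the Accept header.
    """
    request = urllib.request.Request(
        build_request_url(endpoint, query),
        headers={
            "Accept": "application/sparql-results+json",
            # Hypothetical client name; public endpoints may reject anonymous clients.
            "User-Agent": "kesw17-home-project/0.1 (student exercise)",
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return [
        {name: cell["value"] for name, cell in row.items()}
        for row in payload["results"]["bindings"]
    ]
```

A call such as `run_select(DBPEDIA_ENDPOINT, "select distinct ?Concept where {[] a ?Concept} LIMIT 100")` would then return the rows of the example request, provided the endpoint is reachable.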

Task Description
Build an application which is able to do the following.

Note: there is a mandatory and an optional part. The mandatory results are required to
achieve the grade improvement. The non-mandatory optional part provides some
suggestions to make your project more sophisticated and stand out against the others.

Mandatory

1. Retrieve all actors who died since 1950 according to DBpedia (advice: the UMBEL
types seem better populated in DBpedia, i.e. you will find more actors by looking for
<http://umbel.org/umbel/rc/Actor> rather than <http://dbpedia.org/ontology/Actor>).
2. For each actor, fetch some general data (incl. name, birth date, birth place, movies
starred in, cause of death).
3. For each actor, retrieve from WikiData some additional information which is not
present in DBpedia: awards received, IMDB ID (if available), Discogs ID (if available
and assuming the actor is a musician as well).
4. Create ranking 1: rank your set of actors according to the number of awards received
(descending order).
5. Create ranking 2: rank your set of actors according to the ratio of awards per
movie (descending order).
6. Lift your data into a representation of your choice (a basic HTML table, a diagram, a
visualization) and show the rankings including the fetched data as described above.
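
Steps 4 to 6 can be sketched as follows, assuming the award and movie counts have already been collected for each actor. The `Actor` record, the function names and the decision to give actors without any known movie a ratio of 0 are assumptions for illustration. Other choices, such as excluding those actors from ranking 2, are equally reasonable.

```python
from dataclasses import dataclass
from html import escape
from typing import List


@dataclass
class Actor:
    name: str
    awards: int  # number of awards found in WikiData
    movies: int  # number of movies found in DBpedia


def rank_by_awards(actors: List[Actor]) -> List[Actor]:
    """Ranking 1: most awards first."""
    return sorted(actors, key=lambda actor: actor.awards, reverse=True)


def awards_per_movie(actor: Actor) -> float:
    # Assumption: no known movies means a ratio of 0.0 instead of a division error.
    return actor.awards / actor.movies if actor.movies else 0.0


def rank_by_awards_per_movie(actors: List[Actor]) -> List[Actor]:
    """Ranking 2: highest awards-per-movie ratio first."""
    return sorted(actors, key=awards_per_movie, reverse=True)


def to_html_table(ranked: List[Actor]) -> str:
    """Render an already ranked list as a basic HTML table."""
    header = (
        "<tr><th>Rank</th><th>Name</th><th>Awards</th>"
        "<th>Movies</th><th>Awards per movie</th></tr>"
    )
    rows = "\n".join(
        f"<tr><td>{rank}</td><td>{escape(actor.name)}</td><td>{actor.awards}</td>"
        f"<td>{actor.movies}</td><td>{awards_per_movie(actor):.2f}</td></tr>"
        for rank, actor in enumerate(ranked, start=1)
    )
    return f"<table>\n{header}\n{rows}\n</table>"
```

Names are passed through `html.escape` so that characters such as `&` in the fetched data do not break the generated page.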

Note: for several of these steps, different approaches for implementation are feasible and
the schemas and representation offer a variety of choices (e.g. on how to identify the group
of actors or to match entities from DBpedia to Wikidata instances). It is your choice on how
to design and implement the solution. While some are more efficient than others, any
solution which appears reasonable and fulfills the (rather low) thresholds described below
will be accepted.
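
As one possible approach among many, the actors can be selected in DBpedia together with their owl:sameAs links, and the linked Wikidata items can then be queried for the additional properties. The sketch below only builds the query strings. It assumes that DBpedia exposes owl:sameAs links into the `http://www.wikidata.org/entity/` namespace and death dates via dbo:deathDate, and that P166, P345 and P1953 are the Wikidata properties for award received, IMDb ID and Discogs artist ID. Please verify these schema terms against the live endpoints before relying on them. All function names are hypothetical.

```python
import re
from typing import Iterable, Optional

WIKIDATA_ENTITY_PREFIX = "http://www.wikidata.org/entity/"

_DBPEDIA_ACTORS = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT DISTINCT ?actor ?death ?wikidata WHERE {
  ?actor a <http://umbel.org/umbel/rc/Actor> ;
         dbo:deathDate ?death ;
         owl:sameAs ?wikidata .
  FILTER (?death >= "1950-01-01"^^xsd:date)
  FILTER (STRSTARTS(STR(?wikidata), "http://www.wikidata.org/entity/"))
}
"""


def dbpedia_actor_query(limit: int = 100) -> str:
    """DBpedia query: actors who died since 1950, with their Wikidata link."""
    return _DBPEDIA_ACTORS + f"LIMIT {int(limit)}"


def qid_from_uri(uri: str) -> Optional[str]:
    """Extract an item ID such as 'Q42' from a Wikidata entity URI, else None."""
    if uri.startswith(WIKIDATA_ENTITY_PREFIX):
        candidate = uri[len(WIKIDATA_ENTITY_PREFIX):]
        if re.fullmatch(r"Q\d+", candidate):
            return candidate
    return None


def wikidata_details_query(qids: Iterable[str]) -> str:
    """Wikidata query: awards, IMDb ID and Discogs ID for the given items."""
    checked = [qid for qid in qids if re.fullmatch(r"Q\d+", qid)]
    values = " ".join(f"wd:{qid}" for qid in checked)
    return (
        "PREFIX wd: <http://www.wikidata.org/entity/>\n"
        "PREFIX wdt: <http://www.wikidata.org/prop/direct/>\n"
        "SELECT ?item ?award ?imdb ?discogs WHERE {\n"
        f"  VALUES ?item {{ {values} }}\n"
        "  OPTIONAL { ?item wdt:P166 ?award }\n"    # assumed: award received
        "  OPTIONAL { ?item wdt:P345 ?imdb }\n"     # assumed: IMDb ID
        "  OPTIONAL { ?item wdt:P1953 ?discogs }\n" # assumed: Discogs artist ID
        "}"
    )
```

Querying Wikidata in batches of items via VALUES keeps the number of requests low. Matching by name or by a shared external identifier would be an alternative where no owl:sameAs link exists.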

Optional questions for discussion


The data in the two knowledge bases is likely to differ to a certain extent. Please discuss
these differences, for instance by addressing the following questions:

- How different was the data you encountered? Note: you can do this very informally or
by providing statistics about inconsistencies, or even through advanced metrics, such
as Spearman's correlation coefficient to compute the difference between your
rankings.
- What is the most frequent cause of death according to DBpedia and according to
WikiData?
- Which dataset/KB do you think is more complete at the instance-level?
- Which one is more comprehensive with respect to its schema and properties?
- Which one appears to be more up to date?
- Which one would you prefer to use and why?
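
For the comparison of two rankings, Spearman's correlation coefficient can be computed directly from the rank differences. The following sketch uses the textbook formula rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), which is only valid under the assumption that both rankings contain exactly the same actors and no ties. With tied ranks, a tie-aware implementation would be needed instead. The function name is hypothetical.

```python
from typing import Sequence


def spearman_rho(ranking_a: Sequence[str], ranking_b: Sequence[str]) -> float:
    """Spearman's rho for two rankings of the same items (no ties assumed).

    Each ranking lists the items from best to worst.
    """
    if len(ranking_a) != len(ranking_b) or set(ranking_a) != set(ranking_b):
        raise ValueError("both rankings must contain exactly the same items")
    n = len(ranking_a)
    if n < 2 or len(set(ranking_a)) != n:
        raise ValueError("need at least two distinct items")
    # Position of every item in the second ranking.
    position_b = {item: rank for rank, item in enumerate(ranking_b)}
    # Sum of squared rank differences over all items.
    d_squared = sum(
        (rank_a - position_b[item]) ** 2 for rank_a, item in enumerate(ranking_a)
    )
    return 1 - 6 * d_squared / (n * (n * n - 1))
```

A value of 1 means both rankings agree completely, and -1 means one is the exact reverse of the other.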

Presentation
For your project presentation, you have approximately 10 mins. A good presentation should
cover the following aspects:

- overall approach
- used schema terms from both datasets
- used SPARQL queries
- results (descriptive statistics about the data, for instance, how many actors per
knowledge base, min/max/avg awards per actor in each KB, how many actors are
musicians)
- demo (if available)
- discussion (see the optional part above)

Note: this is not a table of contents but a list of relevant items to cover.
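
The descriptive statistics mentioned under "results" can be derived with a few lines once the data has been fetched. This is a minimal sketch with hypothetical helper names. It assumes the award counts are available as a list of integers per knowledge base and the causes of death as a list of strings in which missing values are None or empty.

```python
from collections import Counter
from statistics import mean
from typing import Iterable, Optional, Sequence, Tuple


def award_stats(award_counts: Sequence[int]) -> dict:
    """Minimum, maximum and average number of awards per actor."""
    return {
        "min": min(award_counts),
        "max": max(award_counts),
        "avg": mean(award_counts),
    }


def most_frequent(values: Iterable[Optional[str]]) -> Optional[Tuple[str, int]]:
    """Most frequent non-empty value with its count, or None if there is none."""
    counts = Counter(value for value in values if value)
    return counts.most_common(1)[0] if counts else None
```

Running `most_frequent` once on the DBpedia values and once on the WikiData values gives the most frequent cause of death for each knowledge base.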

Evaluation & Rewards


The presented solution should be able to automatically retrieve and produce the correct data
from WikiData and DBpedia for at least 50 actors (note: this is a very light threshold as the
number of actors meeting this condition is far larger). All students who present solutions
which meet these criteria will receive a 0.3 grade improvement in the final exam.

In addition, while there are plenty of ways in which the data could be further visualised,
processed or enriched, there will be a special reward for the most convincing project and
presentation. This decision will be made jointly by the lecturer and tutors and will take into
account criteria such as originality, soundness and overall quality.
