Semantic Knowledge Graph Notes
With the rapid growth of the Internet and the big-data environment it has created, people
depend more and more on information search. Keyword-based full-text search alone
increasingly struggles to satisfy users' search needs. To address this, an information
retrieval method based on knowledge graphs has been proposed.
The Semantic Knowledge Graph serves as a data scientist's toolkit, allowing you to
discover and compare any entities modeled within a corpus of data from any domain.
Problem Statement
The problem we are trying to solve: suppose a user searches for
“Machine learning research and development Portland, OR software engineer AND
Hadoop, jee”
Broken into its component phrases, the query becomes
“machine learning” AND “research and development” AND “Portland, OR” AND “software
engineer” AND “hadoop” AND “jee”
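Turning the raw query into the phrase-quoted form above requires recognizing multi-word phrases. One simple way to do this is greedy longest-match segmentation against a dictionary of known phrases. The sketch below rests on some assumptions. `KNOWN_PHRASES`, `segment` and `to_boolean` are hypothetical names. The dictionary is hand-made here, whereas a real system would derive such entities from the indexed corpus. Output is lowercased.

```python
# Hypothetical dictionary of multi-word phrases known to the index.
KNOWN_PHRASES = {
    "machine learning",
    "research and development",
    "portland, or",
    "software engineer",
}


def segment(query: str, phrases: set[str]) -> list[str]:
    """Split a raw query into phrases by greedy longest match."""
    # Uppercase AND is treated as a Boolean operator; lowercase "and" may be part of a phrase.
    words = [w.lower() for w in query.split() if w != "AND"]
    phrase_words = {tuple(p.split()) for p in phrases}
    longest = max((len(p) for p in phrase_words), default=1)
    parts, i = [], 0
    while i < len(words):
        # Try the longest candidate first; a single word always matches as a fallback.
        for n in range(min(longest, len(words) - i), 0, -1):
            candidate = tuple(words[i:i + n])
            if n == 1 or candidate in phrase_words:
                # Drop trailing commas left over from the raw text ("hadoop," -> "hadoop").
                parts.append(" ".join(candidate).strip(","))
                i += n
                break
    return parts


def to_boolean(parts: list[str]) -> str:
    """Quote each phrase and require all of them."""
    return " AND ".join(f'"{p}"' for p in parts)


print(to_boolean(segment(
    "Machine learning research and development Portland, OR "
    "software engineer AND Hadoop, jee", KNOWN_PHRASES)))
```

In "Portland, OR", the "OR" is a state abbreviation, not an operator. Grouping it with "portland," through the phrase dictionary is exactly the kind of ambiguity this step has to resolve.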
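Matching these phrases literally only finds documents that use exactly these words.
The best matches come from a semantically expanded query, in which each phrase is
broadened with related terms. Producing that expansion is what the semantic
knowledge graph is for.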
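A semantically expanded query keeps the user's AND between concepts but lets each concept also match its related terms. The minimal sketch below shows only the query-building half. The related terms and their scores in the hypothetical `RELATED` table are invented placeholders. In a semantic knowledge graph they would come from relatedness scores computed over the corpus. The `^score` suffix is standard Lucene query-syntax boosting. `expand` and `min_score` are illustrative names.

```python
# Hypothetical related terms with illustrative relatedness-style scores.
# In a semantic knowledge graph these would be computed from the corpus.
RELATED = {
    "hadoop": [("spark", 0.8), ("hive", 0.6), ("excel", 0.1)],
    "software engineer": [("software developer", 0.9)],
}


def expand(phrases: list[str], related: dict[str, list[tuple[str, float]]],
           min_score: float = 0.5) -> str:
    """OR each phrase with its sufficiently related terms, boosted by score."""
    clauses = []
    for phrase in phrases:
        alternatives = [f'"{phrase}"'] + [
            f'"{term}"^{score}'  # Lucene-style boost by relatedness score
            for term, score in related.get(phrase, [])
            if score >= min_score  # drop weakly related terms
        ]
        if len(alternatives) == 1:
            clauses.append(alternatives[0])
        else:
            clauses.append("(" + " OR ".join(alternatives) + ")")
    # AND between clauses preserves the requirement that every concept be present.
    return " AND ".join(clauses)


print(expand(["software engineer", "hadoop", "jee"], RELATED))
```

The OR inside each clause lets a document match on related vocabulary. The outer AND keeps every concept the user asked for. The threshold keeps weakly related terms (here the made-up "excel") from diluting the query.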
The sections below describe how the semantic knowledge graph provides richer
contextual data for a given user query.
Solution Approach
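The solution approach rests on two concepts: a Foreground Set (the documents matching the
condition of interest) and a Background Set (the documents it is compared against, here the
entire collection). The example uses this sample collection of 16 documents, each with an
age, a state, and a list of hobbies: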
[
  {"id":"01","age":15,"state":"AZ","hobbies":["soccer","painting","cycling"]},
  {"id":"02","age":22,"state":"AZ","hobbies":["swimming","darts","cycling"]},
  {"id":"03","age":27,"state":"AZ","hobbies":["swimming","frisbee","painting"]},
  {"id":"04","age":33,"state":"AZ","hobbies":["darts"]},
  {"id":"05","age":42,"state":"AZ","hobbies":["swimming","golf","painting"]},
  {"id":"06","age":54,"state":"AZ","hobbies":["swimming","golf"]},
  {"id":"07","age":67,"state":"AZ","hobbies":["golf","painting"]},
  {"id":"08","age":71,"state":"AZ","hobbies":["painting"]},
  {"id":"09","age":14,"state":"CO","hobbies":["soccer","frisbee","skiing","swimming","skating"]},
  {"id":"10","age":23,"state":"CO","hobbies":["skiing","darts","cycling","swimming"]},
  {"id":"11","age":26,"state":"CO","hobbies":["skiing","golf"]},
  {"id":"12","age":35,"state":"CO","hobbies":["golf","frisbee","painting","skiing"]},
  {"id":"13","age":47,"state":"CO","hobbies":["skiing","darts","painting","skating"]},
  {"id":"14","age":51,"state":"CO","hobbies":["skiing","golf"]},
  {"id":"15","age":64,"state":"CO","hobbies":["skating","cycling"]},
  {"id":"16","age":73,"state":"CO","hobbies":["painting"]}
]
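To make the two sets concrete, here is a plain-Python reconstruction (not Solr code) of how the count and popularity figures in the facet response that follows can be derived from these 16 documents. It assumes what the callouts under that response state: the Background Set is the entire collection and the Foreground Set is `age:[35 TO *]`. Following those callouts, both popularity figures are divided by the Background Set size. `Doc`, `ids` and `bucket_stats` are illustrative names.

```python
from typing import NamedTuple


class Doc(NamedTuple):
    id: str
    age: int
    state: str
    hobbies: frozenset


# The 16 sample documents; hobbies are written as space-separated words.
RAW = [
    ("01", 15, "AZ", "soccer painting cycling"),
    ("02", 22, "AZ", "swimming darts cycling"),
    ("03", 27, "AZ", "swimming frisbee painting"),
    ("04", 33, "AZ", "darts"),
    ("05", 42, "AZ", "swimming golf painting"),
    ("06", 54, "AZ", "swimming golf"),
    ("07", 67, "AZ", "golf painting"),
    ("08", 71, "AZ", "painting"),
    ("09", 14, "CO", "soccer frisbee skiing swimming skating"),
    ("10", 23, "CO", "skiing darts cycling swimming"),
    ("11", 26, "CO", "skiing golf"),
    ("12", 35, "CO", "golf frisbee painting skiing"),
    ("13", 47, "CO", "skiing darts painting skating"),
    ("14", 51, "CO", "skiing golf"),
    ("15", 64, "CO", "skating cycling"),
    ("16", 73, "CO", "painting"),
]
DOCS = [Doc(i, a, s, frozenset(h.split())) for i, a, s, h in RAW]


def ids(pred) -> set:
    """Ids of the documents that satisfy a predicate."""
    return {d.id for d in DOCS if pred(d)}


background = ids(lambda d: True)         # the entire collection
foreground = ids(lambda d: d.age >= 35)  # age:[35 TO *]


def bucket_stats(domain: set, term: set) -> dict:
    """Facet-bucket figures for the documents matching `term` within `domain`."""
    bucket = domain & term
    return {
        "count": len(bucket),
        # Bucket documents that are also in the Foreground Set, over the Background Set size.
        "foreground_popularity": len(bucket & foreground) / len(background),
        # Background documents matching the term alone, over the Background Set size.
        "background_popularity": len(term & background) / len(background),
    }


golf = ids(lambda d: "golf" in d.hobbies)
print(bucket_stats(background, golf))                          # top-level hobby bucket
print(bucket_stats(golf, ids(lambda d: d.state == "AZ")))      # state bucket nested under golf
```

Note the asymmetry in the nested case. The foreground figure is restricted to golfers, but the background figure counts every Arizona document.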
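Response to a faceted query that scores each hobby, and each state within each hobby, by
relatedness against a Foreground Set of people aged 35+ and a Background Set of the entire
collection: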
"facets":{
  "count":16,
  "hobby":{
    "buckets":[{
        "val":"golf",
        "count":6, // <1>
        "r1":{
          "relatedness":0.01225,
          "foreground_popularity":0.3125, // <2>
          "background_popularity":0.375}, // <3>
        "location":{
          "buckets":[{
              "val":"az",
              "count":3,
              "r2":{
                "relatedness":0.00496, // <4>
                "foreground_popularity":0.1875, // <6>
                "background_popularity":0.5}}, // <7>
            {
              "val":"co",
              "count":3,
              "r2":{
                "relatedness":-0.00496, // <5>
                "foreground_popularity":0.125,
                "background_popularity":0.5}}]}},
      {
        "val":"painting",
        "count":8, // <1>
        "r1":{
          "relatedness":0.01097,
          "foreground_popularity":0.375,
          "background_popularity":0.5},
        "location":{
          "buckets":[{
          ...
<1> Even though `hobbies:golf` has a lower total facet `count` than
`hobbies:painting`, it has a higher `relatedness` score. This indicates that, relative to the
Background Set (the entire collection), Golf has a stronger correlation with our Foreground
Set (people aged 35+) than Painting does.
<2> The number of documents matching `age:[35 TO *]` _and_ `hobbies:golf` is 31.25% of
the total number of documents in the Background Set.
<3> 37.5% of the documents in the Background Set match `hobbies:golf`.
<4> The state of Arizona (AZ) has a _positive_ relatedness correlation with the
_nested_ Foreground Set (people aged 35+ who play Golf) compared to the Background
Set, i.e. "People in Arizona are statistically more likely to be '35+ year old Golfers' than
the collection as a whole."
<5> The state of Colorado (CO) has a _negative_ correlation with the nested
Foreground Set, i.e. "People in Colorado are statistically less likely to be '35+ year old
Golfers' than the collection as a whole."
<6> The number of documents matching `age:[35 TO *]` _and_ `hobbies:golf` _and_
`state:AZ` is 18.75% of the total number of documents in the Background Set.
<7> 50% of the documents in the Background Set match `state:AZ`.
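The relatedness scores can be reproduced by hand. The sketch below follows what appears to be the scoring behind Solr's `relatedness()` function. It computes a z-score of the observed foreground count against the count expected from the term's background probability. It then squashes that z-score into (-1, 1) by averaging five sigmoid-like curves. Treat the exact curve constants as an assumption to check against the Solr source. With them, the function returns the four relatedness values shown in the response.

The inputs come from the sample data:

* For golf, 5 of the 9 people aged 35+ play golf, and 6 of the 16 documents mention golf.
* For painting, the corresponding counts are 6 of 9 and 8 of 16.
* For the state buckets under golf, the Foreground Set is the nested one: the 5 golfers aged 35+. Of these, 3 are in AZ and 2 in CO, and 8 of the 16 documents are in each state.

`relatedness` and `_squash` are illustrative names.

```python
import math


def _squash(x: float, offset: float, scale: float) -> float:
    # Bounded, sigmoid-like curve: (x + offset) / (scale + |x + offset|), in (-1, 1).
    return (x + offset) / (scale + abs(x + offset))


def relatedness(fg_count: int, fg_size: int, bg_count: int, bg_size: int) -> float:
    """Score a term's correlation with a foreground set, relative to a background set.

    fg_count: foreground documents containing the term
    fg_size:  size of the foreground set
    bg_count: background documents containing the term
    bg_size:  size of the background set
    """
    bg_prob = bg_count / bg_size                  # term's probability in the background
    expected = fg_size * bg_prob                  # foreground hits expected by chance
    # Binomial standard deviation; fall back to a tiny value to avoid dividing by zero.
    std_dev = math.sqrt(fg_size * bg_prob * (1 - bg_prob)) or 1e-10
    z = (fg_count - expected) / std_dev
    # Average of five shifted curves: sensitive near z = 0, still ordered for large |z|.
    score = sum(0.2 * _squash(z, offset, scale)
                for offset, scale in [(-80, 50), (-30, 30), (0, 30), (30, 30), (80, 50)])
    return round(score, 5)


print(relatedness(5, 9, 6, 16))   # 0.01225  (golf)
print(relatedness(6, 9, 8, 16))   # 0.01097  (painting)
print(relatedness(3, 5, 8, 16))   # 0.00496  (AZ within 35+ golfers)
print(relatedness(2, 5, 8, 16))   # -0.00496 (CO within 35+ golfers)
```

This also explains callout <1>. Painting is more common overall, but golf's foreground count exceeds its chance expectation by more standard deviations (z ≈ 1.12 versus z = 1.0), so golf scores higher.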
Architecture
The diagram below shows the Apache Solr architecture.
Its components show how the semantic knowledge graph implemented in Apache Solr derives
semantic relationships from the query and processes them.
Conclusions
The Semantic Knowledge Graph (SKG) has numerous applications, including automatic ontology
building, identifying trending topics over time, predictive analytics on time-series
data, root-cause analysis that surfaces concepts related to failure scenarios from free text,
data cleansing, document summarization, semantic interpretation and expansion of search
queries, recommendation systems, and many forms of anomaly detection.