Suppose D is a textual document, and K = <k1, ..., kN> represents a set of terms contained in the document. For instance:
D = "What a wonderful day, isn't it?"
K = <"wonderful","day">
My objective is to determine whether document D talks about all the words in K as a whole. For instance:
D = "The Ebola in Africa is spreading at high speed"
K = <"Ebola","Africa">
is a case in which D is strongly related to K, while:
D = "NEWS 1: Ebola is a dangerous disease that is causing thousands of deaths. Many governments are taking precautions to prevent its spread. NEWS 2: population in Africa is increasing."
K = <"Ebola","Africa">
is a case in which D is not related to K, since "Ebola" and "Africa" are mentioned in different parts of the document, in separate sentences, and are not related to each other.
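The criterion described above (all terms appearing in the same sentence, or at least close to each other) can be turned into a simple baseline. The following is only a minimal sketch, assuming single-word terms, case-insensitive matching and a naive regex sentence splitter; `sentence_related` and `min_window` are hypothetical helper names, not the API of any existing library.

```python
import re
from typing import Iterable, Optional


def split_sentences(text: str) -> list:
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(text: str) -> list:
    # Lowercased word tokens (letters, digits, underscore).
    return re.findall(r"\w+", text.lower())


def sentence_related(doc: str, terms: Iterable[str]) -> bool:
    """True if at least one sentence of `doc` contains every term."""
    wanted = {t.lower() for t in terms}
    return any(wanted <= set(tokenize(s)) for s in split_sentences(doc))


def min_window(doc: str, terms: Iterable[str]) -> Optional[int]:
    """Length (in tokens) of the shortest span of `doc` containing every
    term, or None if some term never occurs. Smaller means closer."""
    wanted = {t.lower() for t in terms}
    if not wanted:
        return 0
    toks = tokenize(doc)
    counts = {}   # occurrences of each wanted term inside the window
    have = 0      # number of distinct wanted terms inside the window
    best = None
    left = 0
    for right, tok in enumerate(toks):
        if tok in wanted:
            counts[tok] = counts.get(tok, 0) + 1
            if counts[tok] == 1:
                have += 1
        # Shrink from the left while the window still covers all terms.
        while have == len(wanted):
            size = right - left + 1
            if best is None or size < best:
                best = size
            out = toks[left]
            if out in wanted:
                counts[out] -= 1
                if counts[out] == 0:
                    have -= 1
            left += 1
    return best


if __name__ == "__main__":
    K = ["Ebola", "Africa"]
    d1 = "The Ebola in Africa is spreading at high speed"
    d2 = ("NEWS 1: Ebola is a dangerous disease that is causing thousands "
          "of deaths. Many governments are taking precautions to prevent its "
          "spread. NEWS 2: population in Africa is increasing.")
    print(sentence_related(d1, K), min_window(d1, K))
    print(sentence_related(d2, K), min_window(d2, K))
```

On the two examples above, the first document has a sentence containing both terms and a 3-token window ("Ebola in Africa"), while the NEWS document has no such sentence and a 25-token window. This only measures lexical proximity; it does not capture whether the terms are semantically linked, and a real system would probably need a proper sentence tokenizer and some normalization (e.g. stemming).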
How can I capture this notion of "relatedness" of D to K? Is there some state-of-the-art technique that can be exploited?
Thanks.