Learning Semantic Relations from Text
Preslav Nakov (Qatar Computing Research Institute, HBKU)
Diarmuid Ó Séaghdha (Vocal IQ)
Vivi Nastase (Fondazione Bruno Kessler)
Stan Szpakowicz (University of Ottawa)
Introduction
Semantic Relations
Features
Supervised Methods
Unsupervised Methods
Embeddings
Outline
1. Introduction
2. Semantic Relations
3. Features
4. Supervised Methods
5. Unsupervised Methods
6. Embeddings
7. Wrap-up
Learning Semantic Relations from Text
2 / 97
Motivation
The connection is indispensable to the expression of
thought. Without the connection, we would not be able
to express any continuous thought, and we could only
list a succession of images and ideas isolated from
each other and without any link between them.
[Tesnière, 1959]
What Is It All About?
[Diagram: the sentence "Opportunity and Curiosity find similar rocks on Mars.", annotated with relation arcs: Opportunity is_a Mars rover, Curiosity is_a Mars rover, explorer_of (rovers – Mars), located_on (rocks – Mars)]
What Is It All About? (1)
Semantic relations
matter a lot
connect up entities in a text
together with entities make up a good chunk of the
meaning of that text
are not terribly hard to recognize
What Is It All About? (2)
Semantic relations between nominals
matter even more in practice
are the target for knowledge acquisition
are key to reaching the meaning of a text
are fairly feasible to recognize
Historical Overview (1)
Capturing and describing world knowledge
Aristotle’s Organon
includes a treatise on Categories
objects in the natural world are put into categories called
τὰ λεγόμενα (ta legomena, things which are said)
organization based on the class inclusion relation
then, for 20 centuries:
other philosophers
some botanists, zoologists
in the 1970s: realization that a robust Artificial Intelligence
(AI) system needs the same kind of knowledge
capture and represent knowledge: machine-friendly
intersection with language: inevitable
Historical Overview (2)
Indian linguistic tradition
Pāṇini’s Aṣṭādhyāyī
rules describing the process of generating a Sanskrit
sentence from a semantic representation
semantics is conceptualized in terms of kārakas, semantic
relations between events and participants, similar to
semantic roles
covers noun-noun compounds comprehensively from the
perspective of word formation, but not semantics
later, commentators such as Kātyāyana and Patañjali:
compounding is only supported by the presence of a
semantic relation between entities
Historical Overview (3)
Ferdinand de Saussure
Course in General Linguistics [de Saussure, 1959]
taught 1906-1911; published in 1916
Historical Overview (4)
Ferdinand de Saussure
Course in General Linguistics: two types of relations which
“correspond to two different forms of mental activity, both
indispensable to the workings of language”
syntagmatic relations
hold in context
associative (paradigmatic) relations
come from accumulated experience
BUT no explicit list of relations was proposed
Historical Overview (5)
Ferdinand de Saussure
Syntagmatic relations hold between two or more terms in a
sequence in praesentia, in a particular context: “words as
used in discourse, strung together one after the other,
enter into relations based on the linear character of
languages – words must be arranged consecutively in a
spoken sequence. Combinations based on sequentiality
may be called syntagmas.”
Historical Overview (6)
Ferdinand de Saussure
Associative (paradigmatic) relations come from
accumulated experience and hold in absentia: “Outside the
context of discourse, words having something in common
are associated together in the memory. [. . . ] All these
words have something or other linking them. This kind of
connection is not based on linear sequence. It is a
connection in the brain. Such connections are part of that
accumulated store which is the form the language takes in
an individual’s brain.”
Historical Overview (7)
Syntagmatic vs. paradigmatic relations
[Harris, 1987]: frequently occurring instances of
syntagmatic relations may become part of our memory,
thus becoming paradigmatic
[Gardin, 1965]: instances of paradigmatic relations are
derived from accumulated syntagmatic data
This reflects current thinking on relation extraction from
open texts.
Historical Overview (8)
Predicate logic [Frege, 1879]
inherently relational formalism
e.g., the sentence “Google buys YouTube.” is represented as
buy(Google, YouTube)
Historical Overview (9)
Neo-Davidsonian logic representation
additional variables represent the event or relation
it can thus be explicitly modified and subject to
quantification
∃e InstanceOfBuying(e) ∧ agent(e, Google) ∧ patient(e, YouTube)
or perhaps
∃e InstanceOf(e, Buying) ∧ agent(e, Google) ∧ patient(e, YouTube)
existential graphs [Peirce, 1909]
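The role of the reified event variable can be seen in a small sketch: here e is an ordinary record (the key names are illustrative), so the event can be modified and quantified over just as the logic suggests:

```python
# neo-Davidsonian style: the event "e" is an explicit object,
# so modifiers can be attached to it after the fact
e = {"instance_of": "Buying", "agent": "Google", "patient": "YouTube"}

# modification of the event itself, not of a fixed-arity predicate
e["time"] = "2006"

# existential quantification: "there exists a buying event with agent Google"
events = [e]
found = any(ev["instance_of"] == "Buying" and ev["agent"] == "Google"
            for ev in events)
print(found)  # True
```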
Historical Overview (10)
The dual nature of semantic relations
in logic: predicates
used in AI to support knowledge-based agents and
inference
in graphs: arcs connecting concepts
used in NLP to represent factual knowledge
thus, mostly binary relations
in ontologies
as the target in IE
...
Historical Overview (11)
The rise of reasoning systems
[McCarthy, 1958]: logic-based reasoning, no language
early NLP systems with semantic knowledge
[Winograd, 1972]: interactive English dialogue system
[Charniak, 1972]: understanding children’s stories
conceptual shift from the “shallow” architecture of primitive
conversation systems such as ELIZA [Weizenbaum, 1966]
large-scale hand-crafted ontologies
Cyc
OpenMind Common Sense
MindPixel
Freebase – truly large-scale
Historical Overview (12)
At the cross-roads between knowledge and language
[Spärck Jones, 1964]: lexical relations found in a dictionary
can be learned automatically from text
[Quillian, 1962]: semantic network
a graph in which meaning is modelled by labelled
associations between words
vertices are concepts onto which words in a text are mapped
connections – relations between such concepts
WordNet [Fellbaum, 1998]
155,000 words (nouns, verbs, adjectives, adverbs)
a dozen semantic relations, e.g., synonymy, antonymy,
hypernymy, meronymy
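A Quillian-style semantic network can be sketched as a labelled graph over concepts with a transitive is-a lookup; the toy entries below are illustrative, not actual WordNet data:

```python
# toy semantic network: labelled arcs between concept vertices
edges = {
    ("berry", "is-a"): "fruit",
    ("fruit", "is-a"): "food",
    ("seed", "part-of"): "berry",
}

def hypernym_chain(concept):
    """Follow is-a arcs upward, collecting all hypernyms of a concept."""
    chain = []
    while (concept, "is-a") in edges:
        concept = edges[(concept, "is-a")]
        chain.append(concept)
    return chain

print(hypernym_chain("berry"))  # ['fruit', 'food']
```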
Historical Overview (13)
Automating knowledge acquisition
learning ontological relations
is-a
[Hearst, 1992]
part-of
[Berland & Charniak, 1999]
bootstrapping [Patwardhan & Riloff, 2007; Ravichandran & Hovy, 2002]
open relation extraction
no pre-specified list/type of relations
learn patterns about how relations are expressed, e.g.,
POS [Fader&al., 2011]
paths in a syntactic tree [Ciaramita&al., 2005]
sequences of high-frequency words [Davidov & Rappoport, 2008]
hard to map to “canonical” relations
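The pattern-based learning of is-a relations can be sketched with one classic Hearst pattern ("Y such as X1, X2 and X3"); the regular expression and sentence are simplified illustrations, not Hearst's full pattern set:

```python
import re

# one classic Hearst pattern: "<hypernym> such as <hyponym list>"
PATTERN = re.compile(r"(\w+) such as ([\w, ]+)")

def extract_isa(text):
    """Return (hyponym, hypernym) pairs matched by the pattern."""
    pairs = []
    for m in PATTERN.finditer(text):
        hypernym = m.group(1)
        for hyponym in re.split(r",\s*|\s+and\s+|\s+or\s+", m.group(2)):
            pairs.append((hyponym, hypernym))
    return pairs

print(extract_isa("He visited countries such as France, Spain and Italy."))
# [('France', 'countries'), ('Spain', 'countries'), ('Italy', 'countries')]
```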
Why Should We Care about Semantic Relations?
Relation learning/extraction can help
building knowledge repositories
text analysis
NLP applications
Information Extraction
Information Retrieval
Text Summarization
Machine Translation
Question Answering
Paraphrasing
Recognizing Textual Entailment
Thesaurus Construction
Semantic Network Construction
Word-Sense Disambiguation
Language Modelling
Example Application: Information Retrieval
[Cafarella&al., 2006]
list all X such that X causes cancer
list all X such that X is part of an automobile engine
list all X such that X is material for making a submarine’s hull
list all X such that X is a type of transportation
list all X such that X is produced from cork trees
Example Application: Statistical Machine Translation
[Nakov, 2008]
if the SMT system knows that
oil price hikes is translated to Portuguese as
aumento nos preços do petróleo
note: this is hard to get word-for-word!
if we further interpret/paraphrase oil price hikes as
hikes in oil prices
hikes in the prices of oil
...
then we can use the same fluent Portuguese translation for
the paraphrases
Two Perspectives on Semantic Relations
[Diagram: the sentence "Opportunity and Curiosity find similar rocks on Mars.", annotated with relation arcs: Opportunity is_a Mars rover, Curiosity is_a Mars rover, explorer_of (rovers – Mars), located_on (rocks – Mars)]
Relations between concepts
. . . arise from, and capture, knowledge about the world
. . . can be found in texts!
Relations between nominals
. . . arise from, and capture, particular events/situations expressed in texts
. . . can be found using information from knowledge bases
[Casagrande & Hale, 1967]
Asked speakers of an exotic language to give definitions for a given list of words, then extracted 13 relations from these definitions.
Relation         Example
attributive      toad - small
function         ear - hearing
operational      shirt - wear
exemplification  circular - wheel
synonymy         thousand - ten hundred
provenience      milk - cow
circularity      X is defined as X
contingency      lightning - rain
spatial          tongue - mouth
comparison       wolf - coyote
class inclusion  bee - insect
antonymy         low - high
grading          Monday - Sunday
[Chaffin & Hermann, 1984]
Asked humans to group instances of 31 semantic relations.
Found five coarser classes.
Relation         Example
contrasts        night - day
similars         car - auto
class inclusion  vehicle - car
part-whole       airplane - wing
case relations   (e.g., agent, instrument)
Semantic Relations in Noun Compounds (1)
Noun compounds (NCs)
Definition: sequences of two or more nouns that function
as a single noun, e.g.,
silkworm
olive oil
healthcare reform
plastic water bottle
colon cancer tumor suppressor protein
Semantic Relations in Noun Compounds (2)
Properties of noun compounds
Encode implicit relations: hard to interpret
taxi driver is ‘a driver who drives a taxi’
embassy driver is ‘a driver who is employed by/drives for an
embassy’
embassy building is ‘a building which houses, or belongs to,
an embassy’
Abundant: cannot be ignored
cover 4% of the tokens in the Reuters corpus
Highly productive: cannot be listed in a dictionary
60% of the NCs in BNC occur just once
Semantic Relations in Noun Compounds (3)
Noun compounds as a microcosm: representation issues
reflect those for general semantic relations
voluminous literature on their semantics
www.cl.cam.ac.uk/~do242/Resources/compound_bibliography.html
two complementary perspectives
linguistic: find the most comprehensive explanatory
representation
NLP: select the most useful representation for a particular
application
computationally tractable
giving informative output to downstream systems
Semantic Relations in Noun Compounds (4)
Do the relations in noun compounds come from a small
closed inventory?
In other words, is there a (reasonably)
small set of relations which could cover
completely what occurs in texts in the
vicinity of (simple) noun phrases?
affirmative: most linguists
early descriptive work [Grimm, 1826; Jespersen, 1942; Noreen, 1904]
generative linguistics [Levi, 1978; Li, 1971; Warren, 1978]
negative: some linguists e.g., [Downing, 1977]
[Warren, 1978] (1)
Relations arising from a comprehensive study of the Brown corpus:
a four-level hierarchy of relations
six major semantic relations
Relation        Example
Possession      family estate
Location        water polo
Purpose         water bucket
Activity-Actor  crime syndicate
Resemblance     cherry bomb
Constitute      clay bird
[Warren, 1978] (2)
A four-level hierarchy of relations
L1: Constitute
L2: Source-Result, Result-Source, Copula
L3: Adjective-Like_Modifier, Subsumptive, Attributive
L4: Animate_Head (e.g., girl friend), Inanimate_Head (e.g., house boat)
[Levi, 1978] (1)
Relations (Recoverable Deletable Predicates) which underlie all
compositional non-nominalized compounds in English
RDP     Example       Role     Traditional name
CAUSE1  tear gas      object   causative
CAUSE2  drug deaths   subject  causative
HAVE1   apple cake    object   possessive/dative
HAVE2   lemon peel    subject  possessive/dative
MAKE1   silkworm      object   productive/composit.
MAKE2   snowball      subject  productive/composit.
USE     steam iron    object   instrumental
BE      soldier ant   object   essive/appositional
IN      field mouse   object   locative
FOR     horse doctor  object   purposive/benefactive
FROM    olive oil     object   source/ablative
ABOUT   price war     object   topic
[Levi, 1978] (2)
Nominalizations
                Act                    Product                 Agent         Patient
Subjective      parental refusal       clerical errors         —             student inventions
Objective       dream analysis         musical critique        city planner  —
Multi-modifier  city land acquisition  student course ratings  —             —
Problem: spurious ambiguity
horse doctor is for (RDP)
horse healer is agent (nominalization)
[Vanderwende, 1994]
Relation    Question             Example
Subject     Who/what?            press report
Object      Whom/what?           accident report
Locative    Where?               field mouse
Time        When?                night attack
Possessive  Whose?               family estate
Whole-Part  What is it part of?  duck foot
Part-Whole  What are its parts?  daisy chain
Equative    What kind of?        flounder fish
Instrument  How?                 paraffin cooker
Purpose     What for?            bird sanctuary
Material    Made of what?        alligator shoe
Causes      What does it cause?  disease germ
Caused-by   What causes it?      drug death
Desiderata for Building a Relation Inventory
1. the inventory should have good coverage
2. relations should be disjoint, and should each describe a coherent concept
3. the class distribution should not be overly skewed or sparse
4. the concepts underlying the relations should generalize to other linguistic phenomena
5. the guidelines should make the annotation process as simple as possible
6. the categories should provide useful semantic information
(adapted from [Ó Séaghdha, 2007])
[Ó Séaghdha, 2007]
BE (identity, substance-form, similarity)
HAVE (possession, condition-experiencer, property-object,
part-whole, group-member)
IN (spatially located object, spatially located event,
temporally located object, temporally located event)
ACTOR (participant-event, participant-participant)
INST (participant-event, participant-participant)
ABOUT (topic-object, topic-collection, focus-mental activity,
commodity-charge)
e.g., tax law is topic-object, crime investigation is focus-mental activity,
and they both are also ABOUT.
[Barker & Szpakowicz, 1998]
An inventory of 20 semantic relations.
Relation     Example          Relation   Example
Agent        student protest  Possessor  company car
Beneficiary  student price    Product    automobile factory
Cause        exam anxiety     Property   blue car
Container    printer tray     Purpose    concert hall
Content      paper tray       Result     cold virus
Destination  game bus         Source     north wind
Equative     player coach     Time       morning class
Instrument   laser printer    Topic      safety standard
Located      home town
Location     lab printer
Material     water vapor
Object       horse doctor
[Nastase & Szpakowicz, 2003]
A two-level hierarchy of 31 semantic relations
Causal (4 relations)
cause: flu virus,
effect: exam anxiety, . . .
Participant (12 relations)
Agent: student protest,
Instrument: laser printer, . . .
Quality (8 relations)
Manner: stylish writing,
Measure: expensive book, . . .
Spatial (4 relations)
Direction: outgoing mail,
Location: home town, . . .
Temporal (3 relations)
Frequency: daily experience,
Time_at: morning exercise, . . .
[Girju, 2005]
A list of 21 noun compound semantic relations: a subset of the
35 general semantic relations of [Moldovan&al.,2004].
Relation            Example             Relation     Example
Possession          family estate       Manner       style performance
Attribute-Holder    quality sound       Means        bus service
Agent               crew investigation  Experiencer  disease victim
Temporal            night flight        Recipient    worker fatalities
Depiction-Depicted  image team          Measure      session day
Part-Whole          girl mouth          Theme        car salesman
Is-a                Dallas city         Result       combustion gas
Cause               malaria mosquito
Make/Produce        shoe factory
Instrument          pump drainage
Location/Space      Texas university
Purpose             migraine drug
Source              olive oil
Topic               art museum
[Tratz & Hovy, 2010]
new inventory
43 relations in 10 categories
developed through an iterative crowd-sourcing process
designed to maximize agreement between annotators
Analysis: all previous inventories have commonalities
e.g., have categories for locative, possessive, purpose, etc.
cover essentially the same semantic space
BUT differ in the exact way of partitioning that space
[Rosario, 2001]: Biomedical Relations (1)
18 biomedical noun compound relations (initially 38).
Relation                     Example
Subtype                      headaches migraine
Activity/Physical_process    virus reproduction
Produce_genetically          polyomavirus genome
Cause                        heat shock
Characteristic               drug toxicity
Defect                       hormone deficiency
Person_Afflicted             AIDS patient
Attribute_of_Clinical_Study  headache parameter
Procedure                    genotype diagnosis
Frequency/time_of            influenza season
Measure_of                   relief rate
Instrument                   laser irradiation
...                          ...
[Rosario, 2001]: Biomedical Relations (2)
18 biomedical noun compound relations (initially 38).
Relation            Example
...                 ...
Object              bowel transplantation
Purpose             headache drugs
Topic               headache questionnaire
Location            brain artery
Material            aloe gel
Defect_in_location  lung abscess
The Opposite View: No Small Set of Semantic
Relations
Much opposition to the previous work
[Zimmer, 1971]: so much variety of relations that it is
simpler to categorize the semantic relations that CANNOT
be encoded in compounds
[Downing, 1977]
plate length (“what your hair is when it drags in your food”)
“The existence of numerous novel compounds like these
guarantees the futility of any attempt to enumerate an
absolute and finite class of compounding relationships.”
Noun Compounds: Using Lexical Paraphrases (1)
Lexical items instead of abstract relations
The hidden relation in a noun compound can be made explicit
in a paraphrase.
e.g., weather report
abstract: topic
lexical: report about the weather; report forecasting the weather
Noun Compounds: Using Lexical Paraphrases (2)
Using prepositions: the idea
[Lauer, 1995] used just eight prepositions
of, for, in, at, on, from, with, about
olive oil is “oil from olives”
night flight is “flight at night”
odor spray is “spray for odors”
easy to extract from text or the Web [Lapata & Keller, 2004]
[Srikumar&Roth, 2013] 32 relations / 34 prepositions
good at boxing → activity
opened by Annie → agent
travel by road → journey
...
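The corpus-count idea behind [Lauer, 1995] and [Lapata & Keller, 2004] can be sketched as picking, among the eight prepositions, the one whose paraphrase is most frequent; the counts below are invented for illustration:

```python
# hypothetical paraphrase counts for "olive oil"; in practice these would
# come from corpus or Web n-gram queries
counts = {
    "oil of olives": 210, "oil for olives": 8, "oil in olives": 90,
    "oil at olives": 0, "oil on olives": 1, "oil from olives": 1350,
    "oil with olives": 44, "oil about olives": 0,
}

LAUER_PREPS = ["of", "for", "in", "at", "on", "from", "with", "about"]

def best_preposition(head, modifier):
    """Pick the preposition whose 'head PREP modifier' paraphrase is most frequent."""
    return max(LAUER_PREPS, key=lambda p: counts.get(f"{head} {p} {modifier}", 0))

print(best_preposition("oil", "olives"))  # from
```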
Noun Compounds: Using Lexical Paraphrases (3)
Using prepositions: the issues
prepositions are polysemous, e.g., different of
school of music
theory of computation
bell of (the) church
unnecessary distinctions, e.g., in vs. on vs. at
prayer in (the) morning
prayer at night
prayer on (a) feast day
some compounds cannot be paraphrased with
prepositions
woman driver
strange paraphrases
honey bee – is it “bee for honey”?
Noun Compounds: Using Lexical Paraphrases (4)
Using paraphrasing verbs
[Nakov, 2008]: a relation is represented as a distribution
over verbs and prepositions which occur in texts
e.g., olive oil is “oil that is extracted from olives” or “oil that
is squeezed from olives”
rich representation, close to what Downing [1977]
demanded
allows comparisons, e.g., olive oil vs. sea salt
similar: both match the paraphrase “N1 is extracted from N2”
different: salt is not squeezed from the sea
Noun Compounds: Using Lexical Paraphrases (5)
Abstract Relations vs. Prepositions vs. Verbs
Abstract relations [Nastase & Szpakowicz, 2003; Kim & Baldwin, 2005; Girju, 2007; Ó
Séaghdha & Copestake, 2007]
malaria mosquito: Cause
olive oil: Source
Prepositions [Lauer, 1995]
malaria mosquito: with
olive oil: from
Verbs [Finin, 1980; Vanderwende, 1994; Kim & Baldwin 2006; Butnariu & Veale 2008; Nakov & Hearst
2008]
malaria mosquito: carries, spreads, causes, transmits, brings, has
olive oil: comes from, is made from, is derived from
Noun Compounds: Using Lexical Paraphrases (6)
Note 1 on paraphrasing verbs
Can paraphrase a noun compound
chocolate bar: be made of, contain, be composed of, taste like
Can also express an abstract relation
MAKE2: be made of, be composed of, consist of, be manufactured from
... but can also be NC-specific
orange juice: be squeezed from
bacon pizza: be topped with
chocolate bar: taste like
Noun Compounds: Using Lexical Paraphrases (7)
Note 2 on paraphrasing verbs
Single verb
malaria mosquito: cause
olive oil: be extracted from
Multiple verbs
malaria mosquito: cause, carry, spread, transmit, bring, ...
olive oil: be extracted from, come from, be made from, ...
Distribution over verbs (SemEval-2010 Task 9)
malaria mosquito: carry (23), spread (16), cause (12), transmit (9),
bring (7), be infected with (3), infect with (3), give (2), ...
olive oil: come from (33), be made from (27), be derived from (10), be
made of (7), be pressed from (6), be extracted from (5), ...
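Comparing compounds via their verb distributions can be sketched with cosine similarity; the olive oil counts are the SemEval-2010 Task 9 figures above, while the corn syrup distribution is invented for illustration:

```python
import math

# sparse verb-count vectors; "corn syrup" counts are hypothetical
olive_oil = {"come from": 33, "be made from": 27, "be derived from": 10,
             "be made of": 7, "be pressed from": 6, "be extracted from": 5}
corn_syrup = {"be made from": 20, "come from": 12, "contain": 4}

def cosine(d1, d2):
    """Cosine similarity between two sparse verb-count vectors."""
    dot = sum(c * d2.get(v, 0) for v, c in d1.items())
    norm = lambda d: math.sqrt(sum(c * c for c in d.values()))
    return dot / (norm(d1) * norm(d2))

# compounds sharing paraphrasing verbs come out similar
print(round(cosine(olive_oil, corn_syrup), 2))
```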
Noun Compounds: Using Lexical Paraphrases (8)
Free paraphrases at SemEval-2013 Task 4 [Hendrickx & al., 2013]
e.g., for onion tears
tears from onions
tears due to cutting onion
tears induced when cutting onions
tears that onions induce
tears that come from chopping onions
tears that sometimes flow when onions are chopped
tears that raw onions give you
...
Relations between Concepts:
Semantic Relations in Ontologies
The easy ones:
is-a
part-of
The backbone of any ontology.
Relations between Concepts:
Semantic Relations in Ontologies
The easy ones?
is-a
CHOCOLATE is-a FOOD – class inclusion
TOBLERONE is-a CHOCOLATE – class membership
and also [Wierzbicka, 1984]:
CHICKEN is-a BIRD – taxonomic (is-a-kind-of)
ADORNMENT is-a DECORATION – functional (is-used-as-a-kind-of)
...
part-of
Relations between Concepts:
Semantic Relations in Ontologies
The easy ones?
is-a
part-of
[Winston & al., 1987]
Relation                   Example
component-integral object  pedal - bike
member-collection          ship - fleet
portion-mass               slice - pie
stuff-object               steel - car
feature-activity           paying - shopping
place-area                 Everglades - Florida
Relations between Concepts:
Semantic Relations in Ontologies
The easy ones?
is-a
part-of
[Winston & al., 1987]
motivation: lack of transitivity
1. Simpson’s arm is part of Simpson(’s body).
2. Simpson is part of the Philosophy Department.
3. *Simpson’s arm is part of the Philosophy Department.
component-object
is incompatible with member-collection
Relations in WordNet
Relation               Example
Synonym                day (Sense 2) / time
Antonym                day (Sense 4) / night
Hypernym               berry (Sense 2) / fruit
Hyponym                fruit (Sense 1) / berry
Member-of holonym      Germany / NATO
Has-member meronym     Germany / Sorbian
Part-of holonym        Germany / Europe
Has-part meronym       Germany / Mannheim
Substance-of holonym   wood (Sense 1) / lumber
Has-substance meronym  lumber (Sense 1) / wood
Domain - TOPIC         line (Sense 7) / military
Domain - USAGE         line (Sense 21) / channel
Domain member - TOPIC  ship / porthole
Attribute              speed (Sense 2) / fast
Derived form           speed (Sense 2) / quick
Derived form           speed (Sense 2) / accelerate
Conclusions
No consensus on a comprehensive list of relations fit for all
purposes and all domains.
Some shared properties of relations, and of relation
schemata.
Properties of Relations (1)
Useful distinctions
Ontological vs. Idiosyncratic
Binary vs. n-ary
Targeted vs. Emergent
First-order vs. Higher-order
General vs. Domain-specific
Properties of Relations (2)
Ontological vs. Idiosyncratic
Ontological
come up practically the same in numerous contexts
e.g., is-a(apple, fruit)
can be extracted with both supervised and unsupervised
methods
Idiosyncratic
highly sensitive to the context
e.g., Content-Container(apple, basket)
best extracted with supervised methods
Note: Parallel to paradigmatic vs. syntagmatic relations in the
Course in General Linguistics [de Saussure, 1959].
Properties of Relations (3)
Binary vs. n-ary
Binary
most relations
our focus here
n-ary
good for verbs that can take multiple arguments, e.g., sell
can be represented as frames
e.g., a selling event can invoke a frame covering relations
between a buyer, a seller, an object_bought and price_paid
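Such a frame can be sketched as a small record type. The SellingFrame class below is an illustrative assumption (not a standard API); its role names follow the slide, and the helper shows how an n-ary frame can be decomposed into binary relations.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Sketch of an n-ary "selling" frame with the slide's roles.  The class
# and its helper are illustrative assumptions, not an existing API.
@dataclass
class SellingFrame:
    buyer: str
    seller: str
    object_bought: str
    price_paid: Optional[str] = None  # roles may be left unfilled in text

    def binary_projections(self) -> List[Tuple[str, str, str]]:
        """Decompose the n-ary frame into binary relation instances."""
        return [
            ("bought_from", self.buyer, self.seller),
            ("bought", self.buyer, self.object_bought),
        ]

frame = SellingFrame(buyer="Sparks Ltd.", seller="Steel Ltd.",
                     object_bought="500 tons of steel")
print(frame.binary_projections())
```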
Properties of Relations (4)
Targeted vs. Emergent
Targeted
coming from a fixed inventory
e.g., {Cause, Source, Target, Time, Location}
Emergent
not fixed in advance
can be extracted using patterns over parts-of-speech
e.g., (V | V (N | Adj | Adv | Pron | Det)* PP)
can extract invented, is located in or made a deal with
could also use clustering to group similar relations
but then naming the clusters is hard
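The POS pattern above can be sketched as a regular expression over coarse tags. The tag set and the hand-tagged examples below are assumptions for illustration; a real system would obtain tags from a POS tagger.

```python
import re

# Sketch of the slide's pattern (V | V (N | Adj | Adv | Pron | Det)* PP)
# applied to a pre-tagged phrase.  Tags are joined with "_" so that
# multi-letter tags such as "PP" cannot be confused with one another.
RELATION_PATTERN = re.compile(r"V(?:(?:_(?:N|Adj|Adv|Pron|Det))*_PP)?$")

def is_relation_phrase(tagged_tokens):
    """tagged_tokens: list of (word, coarse_tag) pairs."""
    tag_string = "_".join(tag for _, tag in tagged_tokens)
    return RELATION_PATTERN.match(tag_string) is not None

print(is_relation_phrase([("invented", "V")]))              # True
print(is_relation_phrase([("made", "V"), ("a", "Det"),
                          ("deal", "N"), ("with", "PP")]))  # True
print(is_relation_phrase([("big", "Adj"), ("deal", "N")]))  # False
```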
Properties of Relations (5)
First-order vs. Higher-order
First-order
e.g., is-a(apple, fruit)
most relations
Higher-order
e.g., believes(John, is-a(apple, fruit))
can be expressed as conceptual graphs [Sowa, 1984]
important in semantic parsing [Liang & al., 2011; Lu & al., 2008]
also in biomedical event extraction [Kim & al., 2009]
e.g., “In this study we hypothesized that the
phosphorylation of TRAF2 inhibits binding to the CD40
cytoplasmic domain.”
E1: phosphorylation(Theme:TRAF2),
E2: binding(Theme1:TRAF2, Theme2:CD40,
Site:cytoplasmic domain),
E3: negative_regulation(Theme:E2, Cause:E1).
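The nested events above can be written down as plain nested structures. The dictionary layout below is an illustrative assumption, loosely following the BioNLP'09 shared-task annotation [Kim & al., 2009].

```python
# Sketch of the slide's nested (higher-order) biomedical events as plain
# Python dictionaries; the field layout is an illustrative assumption.
e1 = {"id": "E1", "type": "phosphorylation", "Theme": "TRAF2"}
e2 = {"id": "E2", "type": "binding", "Theme1": "TRAF2",
      "Theme2": "CD40", "Site": "cytoplasmic domain"}
# E3 is higher-order: its Theme and Cause are events, not entities.
e3 = {"id": "E3", "type": "negative_regulation", "Theme": e2, "Cause": e1}

def is_higher_order(event):
    """An event is higher-order if any argument is itself an event."""
    return any(isinstance(value, dict) for value in event.values())

print(is_higher_order(e1))  # False
print(is_higher_order(e3))  # True
```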
Properties of Relations (6)
General vs. Domain-specific
General
likely to be useful in processing all kinds of text or in
representing knowledge in any domain
e.g., location, possession, causation, is-a, or part-of
Domain-specific
only relevant to a specific text genre or to a narrow domain
e.g., inhibits, activates, phosphorylates for gene/protein events
Properties of Relation Schemata (1)
Useful distinctions
Coarse-grained vs. Fine-grained
Flat vs. Hierarchical
Closed vs. Open
Properties of Relation Schemata (2)
Coarse-grained vs. Fine-grained
Coarse-grained
e.g., 5 relations
Fine-grained
e.g., 30 relations
Infinite, in the extreme
every interaction between entities is a distinct relation with
unique properties
not very practical as there is no generalization
however, a distribution over paraphrases is useful
Properties of Relation Schemata (3)
Flat vs. Hierarchical
Flat
most inventories
Hierarchical
e.g., Nastase & Szpakowicz’s [2003] schema has 5
top-level and 30 second-level relations
e.g., Warren’s [1978] schema has four levels:
e.g., Possessor-Legal Belonging is a subrelation of
Possessor-Belonging, which is a subrelation of Whole-Part
under the top-level relation Possession
Properties of Relation Schemata (4)
Closed vs. Open
Closed
most inventories
Open
used for the Web
Reflects the distinction between targeted and emergent
relations.
The Focus of this Tutorial
Our focus
relations between entities mentioned in the same sentence
expressed linguistically as nominals
Terminology
Relation type
e.g., hyponymy, meronymy, container, product, location
Relation instance
e.g., “chocolate contains caffeine”
Nominal (1)
The standard definition
a phrase that behaves syntactically like a noun or a noun
phrase [Quirk & al., 1985]
Nominal (2)
Our narrower definition
a common noun (chocolate, food)
a proper noun (Godiva, Belgium)
a multi-word proper name (United Nations)
a deverbal noun (cultivation, roasting)
a deadjectival noun ([the] rich)
a base noun phrase built of a head noun with optional
premodifiers (processed food, delicious milk
chocolate)
(recursively) a sequence of nominals (cacao tree,
cacao tree growing conditions)
Some Clues for Extracting Semantic Relations (1)
Explicit clue
A phrase linking the entity mentions in a sentence
e.g., “Chocolate is a raw or processed food produced from the seed of
the tropical Theobroma cacao tree.”
issue 1: ambiguity
in may indicate a temporal relation (chocolate in the 20th
century)
but also a spatial relation (chocolate in Belgium)
issue 2: over-specification
the relation between chocolate and cultures in “Chocolate was
prized as a health food and a divine gift by the Mayan and Aztec
cultures.”
Some Clues for Extracting Semantic Relations (2)
Implicit clue
The relation can be implicit
e.g., in noun compounds
clues come from knowledge about the entities
e.g., cacao tree: CACAO are SEEDS produced by a
TREE
Some Clues for Extracting Semantic Relations (3)
Implicit clue
When an entity is an occurrence (event, activity, state)
expressed by a deverbal noun such as cultivation
The relation mirrors that between the underlying verb and
its arguments
e.g., in “the ancient Mayans cultivated chocolate”, chocolate is the
theme
thus, a theme relation in chocolate cultivation
We do not treat nominalizations separately: typically, they
can also be analyzed as normal nominals
but they are treated differently
in some linguistic theories [Levi, 1978]
in some computational linguistics work [Lapata, 2002]
Our Assumptions
Entities are given
no entity identification
no entity disambiguation
Entities in the same sentence, no coreference, no ellipsis
Not of direct interest: existing ontologies, knowledge bases
and other repositories
though useful as seed examples or training data
Learning Relations
Methods of Learning Semantic Relations
Supervised
PROs: perform better
CONs: require labeled data and feature representation
Unsupervised
PROs: scalable, suitable for open information extraction
CONs: perform worse
Learning Relations: Features
Purpose: map a pair of terms to a vector
Entity features and relational features [Turney, 2006]
Features
Entity features
. . . capture some representation of the meaning of an entity –
the arguments of a relation
Relational features
. . . directly characterize the relation – the interaction between its
arguments
Entity Features (1)
Basic entity features
The string value of the argument (possibly lemmatized or
stemmed)
Examples:
string value
individual words/stems/lemmata
PROs: often informative enough for good relation assignment
CONs: too sparse
Entity Features (2)
Background entity features
Syntactic information, e.g., grammatical role
Semantic information, e.g., semantic class
Can use task-specific inventories, e.g.,
ACE entity types
WordNet features
PROs: solve the data sparseness problem
CONs: manual resources required
Entity Features (3)
Background entity features
clusters as semantic class information
Brown clusters [Brown&al., 1992]
Clustering By Committee [Pantel & Lin, 2002]
Latent Dirichlet Allocation [Blei&al., 2003]
Entity Features (4)
Background entity features
Direct representation of co-occurrences in feature space
coordination (and/or) [Ó Séaghdha & Copestake, 2008], e.g., dog and cat
distributional representation
relational-semantic representation
Word embeddings [Nguyen & Grishman, 2014; Hashimoto&al., 2015]
Entity Features (5)
Background entity features
Distributional representation
Entity Features (6)
Background entity features
Distributional representation for the noun paper
what a paper can do: propose, say
what one can do with a paper: read, publish
typical adjectival modifiers: white, recycled
noun modifiers: toilet, consultation
nouns connected via prepositions: on environment, for
meeting, with a title
PROs: captures word meaning by aggregating all
interactions (found in a large collection of texts)
CONs: lumps together different senses
ink refers to the medium for writing
propose refers to writing/publication/document
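A distributional vector of this kind can be sketched by counting dependency-style co-occurrences. The triples below are invented toy data echoing the slide's examples for paper; a real system would extract them from a parsed corpus.

```python
from collections import Counter

# Sketch: build a sparse distributional vector for a noun from
# (word, relation, co-occurring word) triples.  The triples are toy data.
triples = [
    ("paper", "subj_of", "propose"), ("paper", "subj_of", "say"),
    ("paper", "obj_of", "read"), ("paper", "obj_of", "publish"),
    ("paper", "mod_by_adj", "white"), ("paper", "mod_by_adj", "recycled"),
    ("paper", "mod_by_noun", "toilet"), ("paper", "obj_of", "read"),
]

def distributional_vector(word, triples):
    """Aggregate all contexts of `word` into one (sense-lumping) vector."""
    return Counter((rel, other) for w, rel, other in triples if w == word)

vec = distributional_vector("paper", triples)
print(vec[("obj_of", "read")])  # 2: counts aggregate across all senses
```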
Entity Features (7)
Background entity features
Relational-semantic representation:
it uses related concepts from a semantic network or a
formal ontology
PROs: based on word senses, not on words
CONs: word-sense disambiguation required
Entity Features (8)
Background entity features
Determining the semantic class of relation arguments
Clustering
The descent of hierarchy
Iterative semantic specialization
Semantic scattering
Entity Features (9)
Background entity features
The descent of hierarchy [Rosario & Hearst, 2002]:
the same relation is assumed for all compounds from the
same hierarchies
e.g., the first noun denotes a Body Region, the second
noun denotes a Cardiovascular System:
limb vein, scalp arteries, finger capillary, forearm
microcirculation
generalization at levels 1-3 in the MeSH hierarchy
generalization done manually
90% accuracy
Entity Features (10)
Background entity features
Iterative Semantic Specialization [Girju & al., 2003]
fully automated
applied to Part-Whole
given positive and negative examples
1. generalize up in WordNet from each example
2. specialize so that there are no ambiguities
3. produce rules
Semantic Scattering [Moldovan & al., 2004]
learns a boundary (a cut)
Relational Features (1)
Relational features
characterize the relation directly
(as opposed to characterizing each argument in isolation)
Relational Features (2)
Basic relational features
model the context
words between the two arguments
words from a fixed window on either side of the arguments
a dependency path linking the arguments
an entire dependency graph
the smallest dominant subtree
Relational Features (3)
Basic relational features: examples
Relational Features (4)
Background relational features
encode knowledge about how entities typically interact in
texts beyond the immediate context, e.g.,
paraphrases which characterize a relation
patterns with placeholders
clustering to find similar contexts
Relational Features (5)
Background relational features
characterizing noun compounds using paraphrases
Nakov & Hearst [2007] extract from the Web verbs,
prepositions and coordinators connecting the arguments
“X that * Y”
“Y that * X”
“X * Y”
“Y * X”
Butnariu & Veale [2008] use the Google Web 1T n-grams
Relational Features (6)
Background relational features: example for committee member
[Nakov & Hearst, 2007]
Relational Features (7)
Background relational features
using features with placeholders: Turney [2006] mines
from the Web patterns like
“Y * causes X” for Cause (e.g., cold virus)
“Y in * early X” for Temporal (e.g., morning frost).
Relational Features (8)
Background relational features
can be distributional
Turney & Littman [2005] characterize the relation between
two words as a vector with coordinates corresponding to
the Web frequencies of 128 fixed phrases like “X for Y”
and “Y for X” (for is one of a fixed set of 64 joining
terms: such as, not the, is *, etc.)
can be used directly, or
in singular value decomposition [Turney, 2006]
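Turney & Littman's construction can be sketched as follows. Here `toy_counts` stands in for real Web frequencies, only 4 of the 64 joining terms are shown, and all values are invented for illustration.

```python
# Sketch of Turney & Littman's [2005] relational vector: for a word pair
# (X, Y), look up the frequency of each joining term in both orders.
# With 64 joining terms this yields a 128-dimensional vector.
JOINING_TERMS = ["for", "such as", "not the", "is"]  # 4 of the 64 terms

toy_counts = {"ostrich such as bird": 0, "bird such as ostrich": 12,
              "ostrich for bird": 1}  # invented stand-in for Web counts

def pattern_frequency(phrase):
    return toy_counts.get(phrase, 0)

def relation_vector(x, y):
    vec = []
    for term in JOINING_TERMS:
        vec.append(pattern_frequency(f"{x} {term} {y}"))  # "X term Y"
        vec.append(pattern_frequency(f"{y} {term} {x}"))  # "Y term X"
    return vec

print(relation_vector("ostrich", "bird"))  # [1, 0, 0, 12, 0, 0, 0, 0]
```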
Supervised Methods
Supervised relation extraction: setup
Task: given a piece of text, find instances of semantic
relations
Subtasks
argument identification (often ignored)
relation classification (core subtask)
Needed
an inventory of possible semantic relations
annotated positive/negative examples: for training, tuning
and evaluation
Data
Annotated data for learning semantic relations
small-scale / large-scale
general-purpose / domain-specific
arguments marked / not marked
additional information about the arguments (e.g., senses)
/ no additional information
Data: MUC and ACE
Relation Type: Subtypes
Physical: Located, Near
Part-Whole: Geographical, Subsidiary
Personal-Social: Business, Family, Lasting-Personal
Organization-Affiliation: Employment, Ownership, Founder, Student-Alum, Sports-Affiliation, Investor-Shareholder, Membership
Agent-Artifact: User-Owner-Inventor-Manufacturer
General Affiliation: Citizen-Resident-Religion-Ethnicity, Organization-Location-Origin
Data: MUC and ACE
The arguments of relations are tagged for type!

Employment(Person, Organization):
<PER>He</PER> had previously worked at <ORG>NBC Entertainment</ORG>.

Near(Person, Facility):
<PER>Muslim youths</PER> recently staged a half dozen rallies in front of <FAC>the embassy</FAC>.

Citizen-Resident-Religion-Ethnicity(Person, Geo-political entity):
Some <GPE>Missouri</GPE> <PER>voters</PER>. . .
Data: SemEval
a small number of relations
annotated entities
additional entity information (WordNet senses)
sentential context + mining patterns
SemEval-2007 Task 4 (1)
Semantic relations between nominals: inventory
SemEval-2007 Task 4 (2)
Semantic relations between nominals: examples
SemEval-2010 Task 8 (1)
Multi-way semantic relations between nominals: inventory
SemEval-2010 Task 8 (2)
Multi-way semantic relations between nominals: examples
Algorithms for Relation Learning (1)
Pretty much any machine learning algorithm can work, but
some are better for relation learning.
Classification with kernels is appropriate because relational
features (in particular) may have complex
structures.
Neural networks are appropriate for capturing complex
interactions and compositionality
Sequential labelling methods are appropriate because the
arguments of a relation have variable span.
Algorithms for Relation Learning (2)
Classification with kernels: overview
idea: the similarity of two instances can be computed in a
high-dimensional feature space without the need to
enumerate the dimensions of that space (e.g., using
dynamic programming)
convolution kernels: easy to combine features, e.g., entity
and relational
kernelizable classifiers: SVM, logistic regression, kNN,
Naïve Bayes
Algorithms for Relation Learning (3)
Kernels for linguistic structures
string sequences [Cancedda & al., 2003]
dependency paths [Bunescu & Mooney, 2005]
shallow parse trees [Zelenko & al., 2003]
constituent parse trees [Collins & Duffy, 2001]
dependency parse trees [Moschitti, 2006]
feature-enriched/semantic tree kernel [Plank & Moschitti,
2013; Sun & Han, 2014]
directed acyclic graphs [Suzuki & al., 2003]
Algorithms for Relation Learning (4)
Tree kernels
Similarity between two trees is the (normalized) sum of
similarities between their subtrees
Similarity between subtrees based on similarities between
roots and children (leaf nodes or subtrees)
Similarity between leaf (word) nodes can be 0/1 or based
on semantic similarity using e.g., clusters or word
embeddings [Plank & Moschitti, 2013; Nguyen & al., 2015]
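A minimal sketch of such a subtree-counting kernel, in the spirit of Collins & Duffy [2001]: this relaxed toy variant matches node labels top-down (the original additionally requires whole productions to match), and trees are nested tuples (label, child1, child2, ...).

```python
# K(T1, T2) sums, over all node pairs, the number of common subtrees
# rooted at those nodes; leaf matching is 0/1.
def nodes(tree):
    yield tree
    for child in tree[1:]:
        yield from nodes(child)

def common_subtrees(n1, n2):
    if n1[0] != n2[0] or len(n1) != len(n2):
        return 0
    if len(n1) == 1:          # matching leaves
        return 1
    product = 1
    for c1, c2 in zip(n1[1:], n2[1:]):
        product *= 1 + common_subtrees(c1, c2)
    return product

def tree_kernel(t1, t2):
    return sum(common_subtrees(a, b) for a in nodes(t1) for b in nodes(t2))

t1 = ("S", ("NP", ("N",)), ("VP", ("V",), ("NP", ("N",))))
t2 = ("S", ("NP", ("N",)), ("VP", ("V",), ("NP", ("Det",), ("N",))))
print(tree_kernel(t1, t2))  # 20
```

In practice the kernel value is normalized, K(T1, T2) / sqrt(K(T1, T1) K(T2, T2)), so that tree size does not dominate the similarity.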
Algorithms for Relation Learning (5)
Sequential labelling methods
HMMs / MEMMs / CRFs
[Bikel & al., 1999; Lafferty & al., 2001; McCallum & Li, 2003]
useful for
argument identification
e.g., born-in holds between Person and Location
relation extraction
argument order matters for some relations
Algorithms for Relation Learning (6)
Sequential labelling: argument identification
words: individual words, previous/following two words, word
substrings (prefixes, suffixes of various lengths), capitalization, digit
patterns, manual lexicons (e.g., of days, months, honorifics, stopwords,
lists of known countries, cities, companies, and so on)
labels: individual labels, previous/following two labels
combinations of words and labels
Algorithms for Relation Learning (7)
Sequential labelling: relation extraction
when one argument is known: the task becomes argument
identification
e.g., this GeneRIF is about COX-2
COX-2 expression is significantly more common in
endometrial adenocarcinoma and ovarian serous
cystadenocarcinoma, but not in cervical squamous
carcinoma, compared with normal tissue.
some relations come in order
e.g., Party, Job and Father below
Algorithms for Relation Learning (8)
Sequential labelling: relation extraction
HMMs, CRFs [Culotta & al., 2006; Bundschus & al., 2008]
Dynamic graphical model [Rosario & Hearst, 2004]
Algorithms for Relation Learning (9)
Neural networks for representing contexts
Recursive networks create a bottom-up representation for a
tree context by recursively combining
representations of siblings [Socher & al., 2012]
Convolutional networks create a representation by sliding a
window over the context and pooling the
representations at each step [Zeng & al., 2014]
Recurrent networks create a representation for a sequence
context by processing each item in the sequence
and updating the representation at each step [Li & al.,
2015]
Context representation can be augmented with traditional entity
features.
Algorithms for Relation Learning (10)
Recursive neural networks [Socher & al., 2012]
(figure: a parse tree over “smoking causes cancer”, composed bottom-up into a prediction vector)
Word vectors (can be pretrained)
Compositional vectors (RNN):
v_parent = f(W_l v_l + W_r v_r + b)
Compositional vectors and matrices (MV-RNN):
v_parent = f(W_Vl M_r v_l + W_Vr M_l v_r + b)
M_parent = W_Ml M_l + W_Mr M_r
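The RNN composition rule can be sketched in a few lines of plain Python, with f = tanh applied element-wise. The tiny dimension and random weights are illustrative assumptions; real models learn W_l, W_r and b by backpropagation.

```python
import math
import random

random.seed(0)
d = 4  # toy embedding dimension

def rand_matrix():
    return [[random.gauss(0, 1) for _ in range(d)] for _ in range(d)]

def matvec(W, v):
    return [sum(W[i][j] * v[j] for j in range(d)) for i in range(d)]

W_l, W_r = rand_matrix(), rand_matrix()
b = [0.0] * d

def compose(v_left, v_right):
    """v_parent = tanh(W_l v_l + W_r v_r + b)"""
    linear = [x + y + bi for x, y, bi in zip(matvec(W_l, v_left),
                                             matvec(W_r, v_right), b)]
    return [math.tanh(x) for x in linear]

# "smoking (causes cancer)": compose bottom-up along the parse tree
v_smoking = [random.gauss(0, 1) for _ in range(d)]
v_causes = [random.gauss(0, 1) for _ in range(d)]
v_cancer = [random.gauss(0, 1) for _ in range(d)]
v_sentence = compose(v_smoking, compose(v_causes, v_cancer))
print(len(v_sentence))  # 4
```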
Algorithms for Relation Learning (11)
Convolutional neural networks [Zeng & al., 2014; Liu & al., 2015; dos Santos & al., 2015]
(figure: convolution over an example sentence, each word paired with word and position vectors)
Word vectors (can be pretrained)
Position vectors
Window vector (length = 3) at word t:
v_{t,win} = sum_{i=1..length} (w_{i,word} v_{t,i,word} + w_{i,position} v_{t,i,position}) + b
Sentence vector (max pooling):
v_sen[i] = max_{0 ≤ t < |T|} v_{t,win}[i]
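The window-plus-pooling computation can be sketched as follows. For simplicity this toy version folds word and position information into a single input vector per token (the slide keeps separate weights for each); all sizes and values are invented assumptions.

```python
import random

random.seed(1)
n_words, d_in, d_out, win = 5, 4, 6, 3

# One combined (word + position) vector per token; invented values.
tokens = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(n_words)]
# One weight matrix per window slot i, mapping d_in -> d_out.
W = [[[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
     for _ in range(win)]
b = [0.0] * d_out

def window_vector(t):
    """v_{t,win} = sum_i W_i x_{t-1+i} + b, clamping indices at the edges."""
    v = list(b)
    for i in range(win):
        j = min(max(t - 1 + i, 0), n_words - 1)
        for r in range(d_out):
            v[r] += sum(W[i][r][c] * tokens[j][c] for c in range(d_in))
    return v

windows = [window_vector(t) for t in range(n_words)]
# Max pooling: v_sen[i] = max_t v_{t,win}[i]
v_sen = [max(wv[i] for wv in windows) for i in range(d_out)]
print(len(v_sen))  # 6
```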
Beyond Binary Relations (1)
Non-binary relations
Some relations are not binary
Purchase (Purchaser, Purchased_Entity, Price, Seller)
Previous methods generally apply
but there are some issues
Features: not easy to use the words between entity
mentions, or the dependency path between mentions, or
the least common subtree
Partial mentions
Sparks Ltd. bought 500 tons of steel from Steel Ltd.
Steel Ltd. bought 200 tons of coal.
Beyond Binary Relations (2)
Non-binary relations
Coping with partial mentions
treat partial mentions as negatives
ignore partial mentions
train a separate model for each combination of arguments
McDonald & al. (2005):
1. predict whether two entities are related to each other
2. use strong argument typing and graph-based global optimization to compose n-ary predictions
many solutions for Semantic Role Labeling
[Palmer & al., 2010]
Supervised Methods: Practical Considerations (1)
Some very general advice
Favour high-performing algorithms such as SVM, logistic
regression or CRF
(CRF only if it makes sense as a sequence-labelling problem)
entity and relational features are almost always useful
the value of background features varies across tasks
e.g., for noun compounds, background knowledge is key,
while context is not very useful
Supervised Methods: Practical Considerations (2)
Performance depends on a number of factors
the number and nature of the relations used
the distribution of those relations in data
the source of data for training and testing
the annotation procedure for data
the amount of training data available
...
Conservative conclusion: state-of-the-art systems perform
well above random or majority-class baseline.
Supervised Methods: Practical Considerations (3)
Performance at SemEval
SemEval-2007 Task 4
winning system: F=72.4%, Acc=76.3%, using resources
such as WordNet
[Beamer & al., 2007]
later: similar performance, using corpus data only
[Davidov & Rappoport, 2008; Ó Séaghdha & Copestake, 2008; Nakov & Kozareva, 2011]
SemEval-2010 Task 8
winning system: F=82.2%, Acc=77.9%, using many manual
resources
[Rink & Harabagiu, 2010]
later: improvement F=84.1%, neural network with corpus
data only
[dos Santos & al., 2015]
Supervised Methods: Practical Considerations (4)
Performance at ACE
Different task
full documents rather than single sentences
relations between specific classes of named entities
F-score
low-to-mid 70s [Jiang & Zhai, 2007; Zhou & al., 2007, 2009]
Granularity matters
moving from <10 ACE relation types to >20 relation
subtypes (on the same data!) decreases F1 by about 20%
Mining Very Large Corpora (1)
Very large corpora
examples
GigaWord (news texts)
PubMed (scientific articles)
World-Wide Web
contain massive amounts of data
cannot all be encoded to train a supervised model
Mining Very Large Corpora (2)
Very large corpora
suitable for unsupervised relation mining
useful in extracting relational knowledge
Taxonomic
e.g., What kinds of animals exist?
Ontological
e.g., Which cities are located in the United Kingdom?
Event
e.g., Which companies have bought which other companies?
needed because manual knowledge bases are inherently
incomplete, e.g., Cyc and Freebase
Mining Very Large Corpora (3)
Example
Swanson [1987] discovered a connection between
migraines and magnesium
Swanson linking
publication 1: illness A is caused by chemical B
publication 2: drug C reduces chemical B in the body
linking: connection between illness A and drug C
Mining Very Large Corpora (4)
Challenges
a lot of irrelevant information
high precision is key
a supervised model might not be feasible
new relations, not seen in training
deep features too expensive
Mining Very Large Corpora (5)
Historically important: Crafted patterns
very high precision
low recall
not a problem because of the scale of corpora
low coverage
cover only a small number of relations
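A classic crafted pattern is Hearst's [1992] "Y such as X1, X2 ... and Xn" template for mining is-a pairs. The regexes below are a deliberately crude sketch; real systems match over POS-tagged or parsed text so that noun-phrase boundaries are found reliably.

```python
import re

# Crude sketch of one Hearst pattern: a single-word hypernym followed by
# "such as" and a list of hyponyms.
PATTERN = re.compile(r"(\w+) such as ([^.]+)")

def hearst_is_a(sentence):
    pairs = []
    for m in PATTERN.finditer(sentence):
        hypernym = m.group(1)
        for hyponym in re.split(r",\s*|\s+and\s+|\s+or\s+", m.group(2)):
            if hyponym:
                pairs.append((hyponym, hypernym))
    return pairs

print(hearst_is_a("The zoo keeps animals such as zebras, gnus and otters."))
# [('zebras', 'animals'), ('gnus', 'animals'), ('otters', 'animals')]
```

Each extracted pair is a candidate is-a instance; at Web scale, aggregating counts over many sentences filters out most spurious matches.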
Mining Very Large Corpora (6)
Brief history
pioneered by Hearst (1992)
initially, taxonomic relations – the backbone of any
taxonomy or ontology
is-a: hyponymy/hypernymy
part-of: meronymy/holonymy
gradually expanded
more relations
larger scale of corpora – Web-scale now within reach
the Never-Ending Language Learner project
the Machine Reading project
Early Work: Mining Dictionaries (1)
Extracting taxonomic relations from dictionaries
popular in 1980s
[Ahlswede & Evens, 1988; Alshawi, 1987; Amsler, 1981; Chodorow & al., 1985; Ide & al., 1992;
Klavans & al., 1992]
focus on is-a
hypernymy/hyponymy
subclass/superclass
used dictionaries such as Merriam-Webster
pattern-based
Early Work: Mining Dictionaries (2)
Merriam-Webster: GROUP and related concepts
[Amsler, 1981]
GROUP 1.0A – a number of individuals related by a common factor (as physical association, community of
interests, or blood)
CLASS 1.1A – a group of the same general status or nature
TYPE 1.4A – a class, kind, or group set apart by common characteristics
KIND 1.2A – a group united by common traits or interests
KIND 1.2B – CATEGORY
CATEGORY .0A – a division used in classification
CATEGORY .0B – CLASS, GROUP, KIND
DIVISION .2A – one of the parts, sections, or groupings into which a whole is divided
*GROUPING <== W7 – a set of objects combined in a group
SET 3.5A – a group of persons or things of the same kind or having a common characteristic usu. classed
together
SORT 1.1A – a group of persons or things that have similar characteristics
SORT 1.1B - CLASS
SPECIES .1A – SORT, KIND
SPECIES .1B – a taxonomic group comprising closely related organisms potentially able to breed with one
another
Early Work: Mining Dictionaries (4)
Mining dictionaries: summary
PROs
short, focused definitions
standard language
limited vocabulary
CONs
circularity
hard to identify the key terms
group of persons
number of individuals
limited coverage
Mining Relations with Patterns (1)
Relation mining patterns
when matched against a text fragment, identify relation
instances
can involve
lexical items
wildcards
parts of speech
syntactic relations
flexible rules, e.g., as in regular expressions
...
Mining Relations with Patterns (2)
Hearst’s (1992) lexico-syntactic patterns
NP such as {NP,}∗ {(or|and)} NP
“. . . bow lute, such as Bambara ndang . . . ”
→ (bow lute, Bambara ndang)
such NP as {NP,}∗ {(or|and)} NP
“. . . works by such authors as Herrick, Goldsmith, and Shakespeare”
→ (authors, Herrick); (authors, Goldsmith); (authors, Shakespeare)
NP {, NP}∗ {,} (or|and) other NP
“. . . temples, treasuries, and other important civic buildings . . . ”
→ (important civic buildings, temples); (important civic buildings, treasuries)
NP{,} (including|especially) {NP,}∗ (or|and) NP
“. . . most European countries, especially France, England and Spain . . . ”
→ (European countries, France); (European countries, England); (European countries, Spain)
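The such-as pattern can be approximated with a regular expression over raw text; this is only a sketch, since real implementations match over NP-chunked or parsed input, and an "NP" here is simplified to a single token:

```python
import re

# One Hearst pattern, "NP such as {NP,}* {(or|and)} NP", approximated over
# plain strings with single-token stand-ins for NPs.
NP = r"[A-Za-z][\w-]*"
PATTERN = re.compile(rf"({NP}) such as ((?:{NP}, )*{NP}(?: (?:and|or) {NP})?)")

def hearst_such_as(text):
    pairs = []
    for m in PATTERN.finditer(text):
        hypernym = m.group(1)                      # the class term before "such as"
        hyponyms = re.split(r", | and | or ", m.group(2))
        pairs.extend((hypernym, h) for h in hyponyms)
    return pairs

print(hearst_such_as("We saw countries such as France, England and Spain."))
# [('countries', 'France'), ('countries', 'England'), ('countries', 'Spain')]
```

Even this toy version shows the precision/recall trade-off: it only fires on one exact surface form.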
Mining Relations with Patterns (3)
Hearst’s (1992) lexico-syntactic patterns
designed for very high precision, but low recall
only cover is-a
later, extended to other relations, e.g.,
part-of [Berland & Charniak, 1999]
protein-protein interactions
[Blaschke & al., 1999; Pustejovsky & al., 2002]
N1 inhibits N2
N2 is inhibited by N1
inhibition of N2 by N1
unclear if such patterns can be designed for all relations
Mining Relations with Patterns (4)
Hearst’s (1992) lexico-syntactic patterns
ran on Grolier’s American Academic Encyclopedia
small by today’s standards
still, large enough: 8.6 million tokens
very low recall
extracted just 152 examples (but with very high precision)
to increase recall: bootstrapping
Bootstrapping (3)
Bootstrapping
Initialization
few seed examples
e.g., for is-a
cat-animal
car-vehicle
banana-fruit
Expansion
new patterns
new instances
Several iterations
Main difficulty
semantic drift
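The initialize/expand loop can be sketched end-to-end on a toy corpus. The pattern induction here is deliberately naive (the middle string between two known terms), and the drift control that makes real systems work is omitted:

```python
# A toy, self-contained bootstrapping loop over (hyponym, hypernym) pairs.
# Patterns are the literal strings between the two terms; everything here
# is illustrative, not from any real system.
corpus = [
    "a cat is an animal", "a car is a vehicle",
    "a banana is a fruit", "a dog is an animal",
]

def induce_patterns(sentences, instances):
    pats = set()
    for x, y in instances:
        for s in sentences:
            i, j = s.find(x), s.find(y)
            if 0 <= i < j:
                pats.add(s[i + len(x):j])  # e.g. " is an "
    return pats

def match_patterns(sentences, patterns):
    found = set()
    for s in sentences:
        for p in patterns:
            if p in s:
                left, right = s.split(p, 1)
                found.add((left.split()[-1], right.split()[0]))
    return found

seeds = {("cat", "animal"), ("banana", "fruit")}
instances, patterns = set(seeds), set()
for _ in range(2):  # a couple of iterations; drift control omitted
    patterns |= induce_patterns(corpus, instances)
    instances |= match_patterns(corpus, patterns)
print(instances)  # 4 pairs, incl. ('car', 'vehicle') and ('dog', 'animal')
```

With a realistic corpus, patterns like " is a " quickly match unwanted pairs; that is exactly where semantic drift starts.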
Bootstrapping (4)
Bootstrapping
Context-dependency
not good for context-dependent relations
in one newspaper: “Lokomotiv defeated Porto.”
in a few months: “Porto defeated Lokomotiv Moscow.”
Specificity
good for specific relations such as birthdate
cannot distinguish between fine-grained relations
e.g., different kinds of Part-Whole – maybe
Component-Integral_Object, Member-Collection,
Portion-Mass, Stuff-Object, Feature-Activity and Place-Area
– would share the same patterns
Tackling Semantic Drift (1)
Example of semantic drift
Seeds: London, Paris, New York
→ Patterns: mayor of X, lives in X, ...
→ Added examples: California, Europe, ...
Tackling Semantic Drift (2)
Example: Euler diagram for four people-relations [Krause&al.,2012]
Tackling Semantic Drift (3)
Some strategies
Limit the number of iterations
Select a small number of patterns/examples per iteration
Use semantic types, e.g., the SNOWBALL system
<Organization>'s headquarters in <Location>
<Location>-based <Organization>
<Organization>, <Location>
Tackling Semantic Drift (4)
More strategies
scoring patterns/instances
specificity: prefer patterns that match fewer contexts
confidence: prefer patterns with higher precision
reliability: based on PMI
argument type checking
coupled training
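The PMI-based reliability bullet can be made concrete; the counts below are invented for illustration:

```python
import math

# PMI-style pattern scoring: prefer patterns that co-occur with
# known-correct instances more often than chance predicts.
def pmi(count_pat_inst, count_pat, count_inst, total):
    p_joint = count_pat_inst / total
    p_pat, p_inst = count_pat / total, count_inst / total
    return math.log2(p_joint / (p_pat * p_inst))

# e.g., a pattern matched 40 times, 30 of them with seed instances,
# in a sample of 1000 extraction contexts with 100 seed occurrences
score = pmi(count_pat_inst=30, count_pat=40, count_inst=100, total=1000)
print(round(score, 2))  # 2.91
```

High-PMI patterns are kept; low or negative scores flag patterns that are drifting off the target relation.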
Tackling Semantic Drift (5)
Coupled training [Carlson & al., 2010]
Used in the Never-Ending Language Learner
Distant Supervision (1)
Distant supervision
Issue with bootstrapping: starts with a small number of
seeds
Distant supervision uses a huge number
[Craven & Kumlien, 1999]
1. Get huge seed sets, e.g., from WordNet, Cyc, Wikipedia infoboxes, Freebase
2. Find contexts where they occur
3. Use these contexts to train a classifier
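The three steps reduce to a noisy labeling join between a knowledge base and a corpus; a minimal sketch with a toy KB (the second sentence becomes a false positive, exactly the noise that motivates the relaxed assumptions discussed next):

```python
# Distant-supervision labeling: any sentence mentioning both entities of a
# known (e1, relation, e2) triple becomes a (noisy) training example for
# that relation. Toy sentences and a hypothetical KB.
kb = {("Paris", "France"): "capital_of", ("Berlin", "Germany"): "capital_of"}

sentences = [
    "Paris is the capital of France.",
    "Paris and France co-hosted the event.",   # noisy: no relation expressed
    "Berlin lies in eastern Germany.",
]

labeled = [
    (s, rel)
    for s in sentences
    for (e1, e2), rel in kb.items()
    if e1 in s and e2 in s
]
print(len(labeled))  # 3 noisily labeled examples
```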
Distant Supervision (2)
Example: experiments of Mintz & al. [2009]
102 relations from Freebase, 17,000 seed instances
mapped them to Wikipedia article texts
extracted
1.8 million instances
connecting 940,000 entities
Assumption: all co-occurrences of a pair of entities
express the same relation
Riedel & al. [2010] assume that at least one context
expresses the target relation (rather than all)
Ling & al. [2013] assume that a certain percentage (which
can vary by relation) of the contexts are true positives
Distant Supervision (3)
training sentences
1. positive: with the relation
2. negative: without the relation
train a two-stage classifier:
1. identify the sentences with a relation instance
2. extract relations from these sentences
Distant Supervision (4)
False negatives
Knowledge bases used to provide distant supervision are
incomplete
1. avoid false negatives [Min&al. 2013]
2. fill in gaps [Xu&al. 2013]
Distant Supervision (5)
Distant and partial supervision
Choose representative and useful training examples to
maximize performance
1. active learning [Angeli&al. 2014]
2. infusion of labeled data [Pershina&al. 2014]
3. semantic consistency [Han & Sun, 2014]
Unsupervised Relation Extraction
Other issues with bootstrapping
uses multiple passes over a corpus
often undesirable/infeasible, e.g., on the Web
if we want to extract all relations
no seeds for all of them
Possible solution
unsupervised relation extraction
no pre-specified list of relations, seeds or patterns
Extracting is-a Relations (1)
Pantel & Ravichandran [2004]
cluster nouns using cooccurrence as in [Pantel & Lin, 2002]
Apple, Google, IBM, Oracle, Sun Microsystems, ...
extract hypernyms using patterns
Apposition (N:appo:N), e.g., . . . Oracle, a company known
for its progressive employment policies . . .
Nominal subject (-N:subj:N), e.g., . . . Apple was a hot
young company, with Steve Jobs in charge . . .
Such as (-N:such as:N), e.g., . . . companies such as IBM
must be weary . . .
Like (-N:like:N), e.g., . . . companies like Sun
Microsystems do not shy away from such challenges . . .
is-a between the hypernym and each noun in the cluster
Extracting is-a Relations (2)
[Kozareva & al., 2008]
uses a doubly-anchored pattern (DAP)
“sem-class such as term1 and *”
similar to the Hearst pattern
NP0 such as {NP1 , NP2 , . . ., (and | or)} NPn
but different
exactly two arguments after such as
and is obligatory
prevents sense mixing
cats–jaguar –puma
predators–jaguar –leopard
cars–jaguar –ferrari
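A rough regex sketch of the DAP; real systems issue this as a web query rather than a regex, and the text below is invented:

```python
import re

# Doubly-anchored pattern "<class> such as <term1> and *": fixing both the
# class and one known member disambiguates the harvested term.
def dap_matches(text, sem_class, term1):
    pat = re.compile(rf"{sem_class} such as {term1} and (\w+)")
    return pat.findall(text)

text = "big cats such as jaguar and puma; cars such as jaguar and ferrari"
print(dap_matches(text, "cats", "jaguar"))   # ['puma']
print(dap_matches(text, "cars", "jaguar"))   # ['ferrari']
```

The same ambiguous seed ("jaguar") yields different, correctly typed terms depending on the anchored class.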
Extracting is-a Relations (3)
[Kozareva & Hovy, 2010]: DAPs can yield a taxonomy
Emergent Relations (1)
Emergent relations in open relation extraction
no fixed set of relations
need to identify novel relations
use verbs, prepositions
different verbs, same relation: shot against the flu, shot to
prevent the flu
verb, but no relation: “It rains.” or “I do.”
no verb, but relation: flu shot
use clustering
string similarity
distributional similarity
Emergent Relations (2)
Clustering with distributional similarity
using paraphrases from dependency parses
[Lin & Pantel, 2001; Pasca, 2007]
e.g., DIRT for X solves Y
Y is solved by X, X resolves Y, X finds a solution to Y, X tries to solve Y, X deals with Y, Y is
resolved by X, X addresses Y, X seeks a solution to Y, X does something about Y, X
solution to Y, Y is resolved in X, Y is solved through X, X rectifies Y, X copes with Y, X
overcomes Y, X eases Y, X tackles Y, X alleviates Y, X corrects Y, X is a solution to Y, X
makes worse Y, X irons out Y
extracted shared property model
[Yates & Etzioni, 2007]
e.g., if (lacks, Mars, ozone layer) and (lacks, Red Planet,
ozone layer), then Mars and Red Planet share the property
(lacks, *, ozone layer)
Emergent Relations (3)
[Davidov & Rappoport, 2008]
Pattern template: Prefix CW1 Infix CW2 Postfix
(pets, dogs) → { such X as Y, X such as Y, Y and other X }
(phone, charger) → { buy Y accessory for X!, shipping Y for X, Y is available for X, Y are available for X, Y are available for X systems, Y for X }
These (CW1 , CW2 ) clusters are effective as background
features for supervised models.
Self-Supervised Relation Extraction (1)
Self-supervision
algorithm
1. parse a small corpus
2. extract and annotate relation instances, e.g., based on heuristics and the connecting path between entity mentions
3. train relation extractors on these instances
not guided by or assigned to any particular relation type
features: shallow lexical and POS, dependency path
applicable on the Web
used in the Machine Reading project at U Washington
Self-Supervised Relation Extraction (2)
Self-supervision
Issues with the extracted relations
not coherent
e.g., The Mark 14 was central to the torpedo scandal of the
fleet. → was central torpedo
uninformative
e.g., . . . is the author of . . . → is
too specific
e.g., is offering only modest greenhouse gas reductions
targets at
Self-Supervised Relation Extraction (3)
Self-supervision
Improving the relation quality
constraints: syntactic, positional and frequency [Fader & al., 2011]
focus on functional relations, e.g., birthplace [Lin & al., 2010]
use redundancy: the “KnowItAll hypothesis” [Downey & al., 2005,
2010] – extractions from more distinct sentences in a corpus
are more likely to be correct
high frequency is not enough though:
"Elvis killed JFK" yields 1,360 hits (on September 17, 2015)
still, "Oswald killed JFK" had 7,310 hits
Web-Scale Relation Extraction (1)
Two major large-scale knowledge acquisition projects that
harvest the Web continuously
Never-Ending Language Learner (NELL)
at Carnegie-Mellon University
http://rtw.ml.cmu.edu/rtw/
Machine Reading
at the University of Washington
http://ai.cs.washington.edu/projects/open-information-extraction
Web-Scale Relation Extraction (2)
Never-Ending Language Learner [Mohamed & al., 2011]
starting with a seed ontology
600 categories and relations
each with 20 seed examples
learns
new concepts
new concept instances
new instances of the existing relations
novel relations
approach: bootstrapping, coupled learning, manual
intervention, clustering
learned (as of September 17, 2015)
50 million confidence-scored relations (beliefs)
2,575,848 with high confidence scores
Web-Scale Relation Extraction (3)
Machine Reading at U Washington
KnowItAll [Etzioni & al., 2005] – bootstrapping using Hearst patterns
TextRunner [Banko & al., 2007] – self-supervised, specific relation
models from a small corpus, applied to a large corpus
Kylin [Wu & Weld, 2007] and WPE [Hoffmann & al., 2010] – bootstrapping
starting with Wikipedia infoboxes and associated articles
WOE [Wu & Weld, 2010] extends Kylin to open information
extraction, using part-of-speech or dependency patterns
ReVerb [Fader & al., 2011] – lexical and syntactic constraints on
potential relation expressions
OLLIE [Mausam & al., 2012] – extends WOE with better patterns
and dependencies (e.g., some relations are true for some
period of time, or are contingent upon external conditions)
Other Large-Scale Knowledge Acquisition Projects (1)
YAGO-NAGA [Hoffart&al., 2015]
harvest, search, and rank knowledge from the Web
large-scale, highly accurate, machine-processible
integration with Wikipedia and WordNet
started in 2006, several subprojects
Other Large-Scale Knowledge Acquisition Projects (2)
BabelNet [Navigli&Ponzetto, 2012]
multilingual semantic network
integrates several knowledge sources
no additional Web mining (just integration)
Unsupervised Methods: Summary
Unsupervised relation extraction
good for
large text collections or the Web
context-independent relations
methods
bootstrapping (but semantic drift)
coupled learning
distant supervision
semi-supervision
self-supervision
applications
continuous open information extraction
NELL
Machine Reading
Word Embeddings (1)
Word Embedding
What is it?
mapping words to vectors of real numbers in a
low-dimensional space
How is it done?
neural networks (e.g., CBOW, skip-gram) [Mikolov&al.2013a]
dimensionality reduction (e.g., LSA, LDA, PCA)
explicit representation (words in the context)
Why should we care?
useful for a number of NLP tasks
. . . including semantic relations
Word Embeddings (2)
Word Embeddings from a Neural LM [Bengio &al.2003]
Word Embeddings (3)
Continuous Bag of Words (“predict word”) [Mikolov &al.2013a]
Word Embeddings (4)
Skip-gram (“predict context”) [Mikolov &al.2013a]
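Before any network training, skip-gram reduces the corpus to (center, context) training pairs within a fixed window; a minimal sketch of that extraction step (window size and tokens are illustrative):

```python
# Skip-gram training pairs: each word predicts the words in a window
# around it. This shows only the pair extraction, not the training.
def skipgram_pairs(tokens, window=2):
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        pairs.extend((center, tokens[j]) for j in range(lo, hi) if j != i)
    return pairs

print(skipgram_pairs(["rovers", "find", "rocks"], window=1))
# [('rovers', 'find'), ('find', 'rovers'), ('find', 'rocks'), ('rocks', 'find')]
```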
Word Embeddings (5)
Skip-gram: projection with PCA
Word Embeddings (6)
Skip-gram: properties [Mikolov&al.2013a]
Word embeddings have a linear structure that enables
analogies via vector arithmetic
Due to training objective: input and output (before softmax)
are in a linear relationship
Word Embeddings (7)
Skip-gram: vector arithmetics
inspired by analogy problems
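The arithmetic itself, on tiny hand-made vectors (a real model would learn these from data): v(king) - v(man) + v(woman) should land nearest v(queen).

```python
import numpy as np

# Analogy by vector arithmetic over a toy, hand-made vocabulary.
vocab = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
    "apple": np.array([0.5, 0.5, 0.5]),
}

def analogy(a, b, c):
    target = vocab[a] - vocab[b] + vocab[c]
    def cos(v, w):
        return v @ w / (np.linalg.norm(v) * np.linalg.norm(w))
    # nearest neighbor by cosine, excluding the query words themselves
    return max((w for w in vocab if w not in {a, b, c}),
               key=lambda w: cos(vocab[w], target))

print(analogy("king", "man", "woman"))  # queen
```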
Word Embeddings (8)
Recurrent Neural Network Language Model (RNNLM)
[Mikolov&al.2013b]
Word Embeddings (9)
RNNLM: beyond semantic relations [Mikolov&al.2013b]
gender, number, etc.
Syntactic Word Embeddings (1)
Dependency-based embeddings [Levy&Goldberg,2014a]
Syntactic Word Embeddings (2)
Dependency- vs. word-based embeddings [Levy&Goldberg,2014a]
Words: topical
Dependencies: functional
also true for explicit representations [Lin,1998; Padó&Lapata,2007]
Example: Turing
Words: nondeterministic, non-deterministic, computability,
deterministic, finite-state
Dependencies: Pauling, Hotelling, Heting, Lessing,
Hamming
Word Embeddings: Should We Care?
Embeddings vs. Explicit Representations
embeddings are better across many tasks [Baroni&al., 2014]
semantic relatedness
synonym detection
concept categorization
selectional preferences
analogy
BUT explicit representation can be as good on analogies,
with a better objective [Levy&Goldberg,2014b]
Embeddings for Relation Extraction (1)
Recursive Neural Networks (RNN) [Socher&al., 2012]
(figure: recursive network over the parse tree of “smoking causes cancer”, with a prediction layer on top)
Word vectors (can be pretrained)
Compositional vectors (RNN):
v_parent = f(W_l v_l + W_r v_r + b)
Compositional vectors and matrices (MV-RNN):
v_parent = f(W_Vl M_r v_l + W_Vr M_l v_r + b)
M_parent = W_Ml M_l + W_Mr M_r
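A sketch of the plain-RNN composition rule v_parent = f(W_l v_l + W_r v_r + b), applied bottom-up over "smoking causes cancer"; dimensions and random weights are illustrative, and the per-node matrices of the MV-RNN variant are omitted:

```python
import numpy as np

# Recursive composition: a parent vector is a nonlinearity over a linear
# map of its two children. Weights here are random, purely for shape.
rng = np.random.default_rng(0)
d = 4
W_l, W_r = rng.standard_normal((d, d)), rng.standard_normal((d, d))
b = np.zeros(d)

def compose(v_left, v_right):
    return np.tanh(W_l @ v_left + W_r @ v_right + b)

v_causes, v_cancer = rng.standard_normal(d), rng.standard_normal(d)
v_vp = compose(v_causes, v_cancer)      # node for "causes cancer"
v_smoking = rng.standard_normal(d)
v_sent = compose(v_smoking, v_vp)       # node for "smoking (causes cancer)"
print(v_sent.shape)  # (4,)
```

The top node's vector is what feeds the relation classifier in the prediction layer.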
Embeddings for Relation Extraction (2)
MV-RNN: Matrix-Vector RNN [Socher&al., 2012]
vectors: for compositionality
matrices: for operator semantics
Embeddings for Relation Extraction (3)
MV-RNN for Relation Classification [Socher&al., 2012]
Embeddings for Relation Extraction (4)
CNN: Convolutional Deep Neural Network [Zeng&al., 2014]
Embeddings for Relation Extraction (5)
CNN (sentence level features) [Zeng&al., 2014]
WF: word vectors; PF: position vectors (distance to e1 , e2 )
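The PF part is just a relative-offset index per token, later looked up in small learned position-embedding tables; a sketch of the index computation (the sentence and entity positions are hypothetical):

```python
# Position features: each token gets its signed distance to the two marked
# entities; these indices are then mapped to learned position embeddings.
def position_features(tokens, i_e1, i_e2):
    return [(i - i_e1, i - i_e2) for i in range(len(tokens))]

tokens = ["smoking", "often", "causes", "cancer"]
print(position_features(tokens, i_e1=0, i_e2=3))
# [(0, -3), (1, -2), (2, -1), (3, 0)]
```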
Embeddings for Relation Extraction (6)
FCM: Factor-based Compositional Embed. Model [Yu&al., 2014]
Extension of the model coming at EMNLP’2015 [Gormley&al., 2015]
Embeddings for Relation Extraction (7)
FCM (continued) [Yu&al., 2014]
extension of the model at EMNLP’2015!
[Gormley&al., 2015]
Embeddings for Relation Extraction (8)
CR-CNN: Classification by Ranking CNN [dos Santos&al., 2015]
pairwise ranking loss
word, class, position, sentence embeddings
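As I understand the CR-CNN objective, it is a pairwise ranking loss that pushes the correct class score above one margin and the best competing class score below another; the margin/scale values and scores below are illustrative:

```python
import math

# Pairwise ranking loss in the CR-CNN style: m_pos/m_neg are margins for
# the correct and the top competing class, gamma sharpens the hinge.
# All numbers here are toy values, not trained scores.
def ranking_loss(s_pos, s_neg, m_pos=2.5, m_neg=0.5, gamma=2.0):
    return (math.log(1 + math.exp(gamma * (m_pos - s_pos)))
            + math.log(1 + math.exp(gamma * (m_neg + s_neg))))

print(round(ranking_loss(s_pos=3.0, s_neg=-1.0), 3))  # 0.627
```

Unlike softmax cross-entropy, only two class scores enter each update, which also makes it easy to leave the artificial Other class unscored.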
Embeddings for Relation Extraction (9)
SDP-LSTM: Shortest dependency path LSTM [Yan Xu&al., 2015]
to be presented at EMNLP’2015!
Embeddings for Relation Extraction (10)
depLCNN: Dependency CNN (w/ neg. sampling) [Kun Xu&al., 2015]
to be presented at EMNLP’2015!
Embeddings for Relation Extraction (11)
Comparison on SemEval-2010 Task 8 [Kun Xu&al., 2015]
Lessons Learned
Semantic relations
are an open class
just like concepts, they can be organized hierarchically
some are ontological, some idiosyncratic
the way we work with them depends on
the application
the method
Lessons Learned
Learning to identify or discover relations
investigate many detailed features in a (small)
fully-supervised setting, and try to port them into an open
relation extraction setting
set an inventory of targeted relations, or allow them to
emerge from the analyzed data
use (more or less) annotated data to bootstrap the learning
process
exploit resources created for different purposes for our own
ends (Wikipedia!)
Extracting Relational Knowledge from Text
The bigger picture: NLP finds knowledge in a lot of text
and then gets the deeper meaning of a little text
Manual construction of knowledge bases
PROs: accurate (insofar as people who do it do not make mistakes)
CONs: costly, inherently limited in scope
Automated knowledge acquisition
PROs: scalable, e.g., to the Web
CONs: inaccurate, e.g., due to semantic drift or
inaccuracies in the analyzed text
Learning relations
PROs: reasonably accurate
CONs: needs relation inventory and annotated training
data, does not scale to large corpora
The Future
Hot research topics and future directions
embeddings, deep learning
Web-scale relation mining
continuous, never-ending learning
distant supervision
use of large knowledge sources such as Wikipedia,
DBpedia
semi-supervised methods
combining symbolic and statistical methods
e.g., ontology acquisition using statistics
Relevant Literature is Huge! (1)
Relevant papers at EMNLP’2015
[Li&al., 2015] compare recursive (based on syntactic trees)
vs. recurrent (inspired by LMs) neural networks on four
tasks, including semantic relation extraction
[Kun Xu&al., 2015] learn robust relation representations
from shortest dependency paths through a convolution
neural network using simple negative sampling
[Yan Xu&al., 2015] use long short term memory networks
along shortest dependency paths for relation classification
[Gormley&al., 2015] propose a compositional embedding
model for relation extraction that combines (unlexicalized)
hand-crafted features with learned word embeddings
Relevant Literature is Huge! (2)
Relevant papers at EMNLP’2015
[Zeng&al., 2015] propose piecewise convolutional neural
networks for relation extraction using distant supervision
[Batista&al., 2015] use word embeddings and
bootstrapping for relation extraction
[Li&Jurafsky, 2015] propose a multi-sense embedding
model based on Chinese Restaurant Processes, applied to
a number of tasks including semantic relation identification
[D’Souza&Ng, 2015] use expanding parse trees with
sieves for spatial relation extraction
Relevant Literature is Huge! (3)
Relevant papers at EMNLP’2015
[Grycner&al., 2015] mine relational phrases and their
hypernyms
[Kloetzer&al., 2015] acquire entailment pairs of binary
relations on a large-scale
[Gupta&al., 2015] use distributional vectors for fine-grained
semantic attribute extraction
[Su&al., 2015] use bilingual correspondence recursive
autoencoder to model bilingual phrases in translation
[Qiu&al., 2015] compare syntactic and n-gram based word
embeddings for Chinese analogy detection and mining
Relevant Literature is Huge! (4)
Relevant papers at EMNLP’2015
[Luo&al, 2015] infer binary relation schemas for open
information extraction
[Petroni&al., 2015] propose context-aware open relation
extraction with factorization machines
[Augenstein&al., 2015] extract relations between
non-standard entities using distant supervision and
imitation learning
[Tuan&al., 2015] incorporate trustiness and collective
synonym/contrastive evidence into taxonomy construction
Relevant Literature is Huge! (5)
Relevant papers at EMNLP’2015
[Bovi&al., 2015] perform knowledge base relation
unification via sense embeddings and disambiguation
[Garcia-Duran&al.,2015] perform link prediction in
knowledge bases by composing relationships with
translations in the embedding space
[Zhong&al., 2015] perform link predictions in KBs and
relational fact extraction by aligning knowledge and text
embeddings by entity descriptions
[Gardner&Mitchell, 2015] extract relations using subgraph
feature selection for knowledge base completion
Relevant Literature is Huge! (6)
[Toutanova&al., 2015] learn joint embeddings of text and knowledge bases for knowledge base completion
[Luo&al., 2015] present context-dependent knowledge graph embedding for link prediction and triple classification
[Kotnis&al., 2015] extend knowledge bases with missing relations, using bridging entities
[Lin&al., 2015] embed entities and relations using a path-based representation for knowledge base completion and relation extraction
Relevant Literature is Huge! (7)
[Mitra&Baral, 2015] extract relations to automatically solve logic grid puzzles
[Seo&al., 2015] extract relations from text and visual diagrams to solve geometry problems
[Li&Clark, 2015] use semantic relations to construct background knowledge for answering elementary science questions
Relevant Literature is Huge! (8)
28 out of the 312 papers at EMNLP’2015, or 9%, are about relation extraction
topics: embeddings, various neural network types and architectures
applications: knowledge base and taxonomy enrichment, question answering, problem solving (e.g., math), machine translation
We have probably missed some relevant EMNLP’2015 papers...
... and there is much more recent work beyond EMNLP’2015
Read the Book!
doi:10.2200/S00489ED1V01Y201303HLT019
Thank you!
Questions?
Bibliography I
Thomas Ahlswede and Martha Evens.
Parsing vs. text processing in the analysis of dictionary definitions.
In Proc. 26th Annual Meeting of the Association for Computational Linguistics, Buffalo, NY, USA, pages
217–224, 1988.
Hiyan Alshawi.
Processing dictionary definitions with phrasal pattern hierarchies.
American Journal of Computational Linguistics, 13(3):195–202, 1987.
Robert Amsler.
A taxonomy for English nouns and verbs.
In Proc. 19th Annual Meeting of the Association for Computational Linguistics, Stanford University, Stanford,
CA, USA, pages 133–138, 1981.
Gabor Angeli, Julie Tibshirani, Jean Wu, and Christopher D. Manning.
Combining distant and partial supervision for relation extraction.
In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP),
pages 1556–1567, Doha, Qatar, October 2014. Association for Computational Linguistics.
URL http://www.aclweb.org/anthology/D14-1164.
Isabelle Augenstein, Andreas Vlachos, and Diana Maynard.
Extracting relations between non-standard entities using distant supervision and imitation learning.
In Proc. Conference on Empirical Methods in Natural Language Processing, pages 747–757, 2015.
Bibliography II
Michele Banko, Michael Cafarella, Stephen Sonderland, Matt Broadhead, and Oren Etzioni.
Open information extraction from the Web.
In Proc. 22nd Conference on the Advancement of Artificial Intelligence, Vancouver, BC, Canada, pages
2670–2676, 2007.
Ken Barker and Stan Szpakowicz.
Semi-automatic recognition of noun modifier relationships.
In Proc. 36th Annual Meeting of the Association for Computational Linguistics, Montréal, Canada, pages
96–102, 1998.
Marco Baroni, Georgiana Dinu, and Germán Kruszewski.
Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors.
In Proc. of the Annual Meeting of the Association for Computational Linguistics, pages 238–247, 2014.
David S. Batista, Bruno Martins, and Mário J. Silva.
Semi-supervised bootstrapping of relationship extractors with distributional semantics.
In Proc. Conference on Empirical Methods in Natural Language Processing, pages 499–504, 2015.
Brandon Beamer, Suma Bhat, Brant Chee, Andrew Fister, Alla Rozovskaya, and Roxana Girju.
UIUC: a knowledge-rich approach to identifying semantic relations between nominals.
In Proc. 4th International Workshop on Semantic Evaluations (SemEval-1), Prague, Czech Republic, pages
386–389, 2007.
Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Janvin.
A neural probabilistic language model.
J. Mach. Learn. Res., 3:1137–1155, March 2003.
Bibliography III
Matthew Berland and Eugene Charniak.
Finding parts in very large corpora.
In Proc. 37th Annual Meeting of the Association for Computational Linguistics, College Park, MD, USA,
pages 57–64, 1999.
Daniel M. Bikel, Richard Schwartz, and Ralph M. Weischedel.
An algorithm that learns what’s in a name.
Machine Learning, 34(1-3):211–231, February 1999.
URL http://dx.doi.org/10.1023/A:1007558221122.
Christian Blaschke, Miguel A. Andrade, Christos Ouzounis, and Alfonso Valencia.
Automatic extraction of biological information from scientific text: protein-protein interactions.
In Proc. 7th International Conference on Intelligent Systems for Molecular Biology (ISMB-99), Heidelberg,
Germany, 1999.
David M. Blei, Andrew Y. Ng, and Michael I. Jordan.
Latent Dirichlet allocation.
Journal of Machine Learning Research, 3:993–1022, 2003.
Peter F. Brown, Peter V. deSouza, Robert L. Mercer, Vincent J. Della Pietra, and Jenifer C. Lai.
Class-Based n-gram Models of Natural Language.
Computational Linguistics, 18:467–479, 1992.
Bibliography IV
Razvan Bunescu and Raymond J. Mooney.
A shortest path dependency kernel for relation extraction.
In Human Language Technology Conference and Conference on Empirical Methods in Natural Language
Processing (HLT-EMNLP-05), Vancouver, Canada, 2005.
Cristina Butnariu and Tony Veale.
A concept-centered approach to noun-compound interpretation.
In Proc. 22nd International Conference on Computational Linguistics, pages 81–88, Manchester, UK, 2008.
Michael Cafarella, Michele Banko, and Oren Etzioni.
Relational Web search.
Technical Report 2006-04-02, University of Washington, Department of Computer Science and Engineering,
2006.
Nicola Cancedda, Eric Gaussier, Cyril Goutte, and Jean-Michel Renders.
Word-sequence kernels.
Journal of Machine Learning Research, 3:1059–1082, 2003.
URL http://jmlr.csail.mit.edu/papers/v3/cancedda03a.html.
Andrew Carlson, Justin Betteridge, Richard C. Wang, Estevam R. Hruschka Jr., and Tom M. Mitchell.
Coupled semi-supervised learning for information extraction.
In Proc. Third ACM International Conference on Web Search and Data Mining (WSDM 2010), 2010.
Bibliography V
Joseph B. Casagrande and Kenneth Hale.
Semantic relationships in Papago folk-definition.
In Dell H. Hymes and William E. Bittle, editors, Studies in Southwestern Ethnolinguistics, pages 165–193.
Mouton, The Hague and Paris, 1967.
Roger Chaffin and Douglas J. Herrmann.
The similarity and diversity of semantic relations.
Memory & Cognition, 12(2):134–141, 1984.
Eugene Charniak.
Toward a model of children’s story comprehension.
Technical Report AITR-266 (hdl.handle.net/1721.1/6892), Massachusetts Institute of Technology, 1972.
Martin S. Chodorow, Roy Byrd, and George Heidorn.
Extracting semantic hierarchies from a large on-line dictionary.
In Proc. 23rd Annual Meeting of the Association for Computational Linguistics, Chicago, IL, USA, pages
299–304, 1985.
Massimiliano Ciaramita, Aldo Gangemi, Esther Ratsch, Jasmin Šarić, and Isabel Rojas.
Unsupervised learning of semantic relations between concepts of a molecular biology ontology.
In Proc. 19th International Joint Conference on Artificial Intelligence, Edinburgh, Scotland, pages 659–664,
2005.
Bibliography VI
Michael Collins and Nigel Duffy.
Convolution kernels for natural language.
In Proc. 15th Conference on Neural Information Processing Systems (NIPS-01), Vancouver, Canada, 2001.
URL http://books.nips.cc/papers/files/nips14/AA58.pdf.
M. Craven and J. Kumlien.
Constructing biological knowledge bases by extracting information from text sources.
In Proc. Seventh International Conference on Intelligent Systems for Molecular Biology, pages 77–86, 1999.
Dmitry Davidov and Ari Rappoport.
Classification of semantic relationships between nominals using pattern clusters.
In Proc. 46th Annual Meeting of the Association for Computational Linguistics: Human Language
Technologies, Columbus, OH, USA, pages 227–235, 2008.
Ferdinand de Saussure.
Course in General Linguistics.
Philosophical Library, New York, 1959.
Edited by Charles Bally and Albert Sechehaye. Translated from the French by Wade Baskin.
Claudio Delli Bovi, Luis Espinosa Anke, and Roberto Navigli.
Knowledge base unification via sense embeddings and disambiguation.
In Proc. Conference on Empirical Methods in Natural Language Processing, pages 726–736, 2015.
Cícero Nogueira dos Santos, Bing Xiang, and Bowen Zhou.
Classifying relations by ranking with convolutional neural networks.
In Proceedings of ACL-15, Beijing, China, 2015.
Bibliography VII
Doug Downey, Oren Etzioni, and Stephen Soderland.
A probabilistic model of redundancy in information extraction.
In Proc. 9th International Joint Conference on Artificial Intelligence, Edinburgh, UK, pages 1034–1041, 2005.
Doug Downey, Oren Etzioni, and Stephen Soderland.
Analysis of a probabilistic model of redundancy in unsupervised information extraction.
Artificial Intelligence, 174(11):726–748, 2010.
Pamela Downing.
On the creation and use of English noun compounds.
Language, 53(4):810–842, 1977.
Jennifer D’Souza and Vincent Ng.
Sieve-based spatial relation extraction with expanding parse trees.
In Proc. Conference on Empirical Methods in Natural Language Processing, pages 758–768, 2015.
Oren Etzioni, Michael Cafarella, Doug Downey, Ana-Maria Popescu, Tal Shaked, Stephen Soderland,
Daniel S. Weld, and Alexander Yates.
Unsupervised named-entity extraction from the web: an experimental study.
Artificial Intelligence, 165(1):91–134, June 2005.
ISSN 0004-3702.
Anthony Fader, Stephen Soderland, and Oren Etzioni.
Identifying relations for open information extraction.
In Proc. Conference of Empirical Methods in Natural Language Processing (EMNLP ’11), Edinburgh,
Scotland, UK, July 27-31 2011.
Bibliography VIII
Christiane Fellbaum, editor.
WordNet – An Electronic Lexical Database.
MIT Press, 1998.
Timothy Finin.
The semantic interpretation of nominal compounds.
In Proc. 1st National Conference on Artificial Intelligence, Stanford, CA, USA, 1980.
Gottlob Frege.
Begriffsschrift.
Louis Nebert, Halle, 1879.
Alberto Garcia-Duran, Antoine Bordes, and Nicolas Usunier.
Composing relationships with translations.
In Proc. Conference on Empirical Methods in Natural Language Processing, pages 286–290, 2015.
Jean Claude Gardin.
SYNTOL.
Graduate School of Library Service, Rutgers, the State University (Rutgers Series on Systems for the
Intellectual Organization of Information, Susan Artandi, ed.), New Brunswick, New Jersey, 1965.
Matt Gardner and Tom Mitchell.
Efficient and expressive knowledge base completion using subgraph feature extraction.
In Proc. Conference on Empirical Methods in Natural Language Processing. Association for Computational
Linguistics, 2015.
Bibliography IX
Roxana Girju.
Improving the Interpretation of Noun Phrases with Cross-linguistic Information.
In Proc. 45th Annual Meeting of the Association for Computational Linguistics, Prague, Czech Republic,
pages 568–575, 2007.
Roxana Girju, Adriana Badulescu, and Dan Moldovan.
Learning semantic constraints for the automatic discovery of part-whole relations.
In Proc. Human Language Technology Conference of the North American Chapter of the Association for
Computational Linguistics, Edmonton, Alberta, Canada, 2003.
Roxana Girju, Dan Moldovan, Marta Tatu, and Daniel Antohe.
On the semantics of noun compounds.
Computer Speech and Language, 19:479–496, 2005.
Matthew R. Gormley, Mo Yu, and Mark Dredze.
Improved relation extraction with feature-rich compositional embedding models.
In Proc. Conference on Empirical Methods in Natural Language Processing, pages 1774–1784, 2015.
Adam Grycner, Gerhard Weikum, Jay Pujara, James Foulds, and Lise Getoor.
RELLY: Inferring hypernym relationships between relational phrases.
In Proc. Conference on Empirical Methods in Natural Language Processing, pages 971–981, 2015.
Abhijeet Gupta, Gemma Boleda, Marco Baroni, and Sebastian Padó.
Distributional vectors encode referential attributes.
In Proc. Conference on Empirical Methods in Natural Language Processing, pages 12–21, 2015.
Bibliography X
Xianpei Han and Le Sun.
Semantic consistency: A local subspace based method for distant supervised relation extraction.
In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2:
Short Papers), pages 718–724, Baltimore, Maryland, June 2014. Association for Computational Linguistics.
URL http://www.aclweb.org/anthology/P14-2117.
Roy Harris.
Reading Saussure: A Critical Commentary on the Cours de Linguistique Générale.
Open Court, La Salle, Ill., 1987.
Kazuma Hashimoto, Pontus Stenetorp, Makoto Miwa, and Yoshimasa Tsuruoka.
Task-oriented learning of word embeddings for semantic relation classification.
arXiv preprint arXiv:1503.00095, 2015.
Marti Hearst.
Automatic acquisition of hyponyms from large text corpora.
In Proc. 15th International Conference on Computational Linguistics, Nantes, France, pages 539–545, 1992.
Johannes Hoffart, Fabian M. Suchanek, Klaus Berberich, and Gerhard Weikum.
YAGO2: A spatially and temporally enhanced knowledge base from Wikipedia.
Artif. Intell., 194:28–61, January 2013.
Raphael Hoffmann, Congle Zhang, and Daniel Weld.
Learning 5000 relational extractors.
In Proc. 48th Annual Meeting of the Association for Computational Linguistics, Uppsala, Sweden, pages
286–295, 2010.
Bibliography XI
Nancy Ide, Jean Veronis, Susan Warwick-Armstrong, and Nicoletta Calzolari.
Principles for encoding machine-readable dictionaries.
In Fifth Euralex International Congress, pages 239–246, University of Tampere, Finland, 1992.
Jing Jiang and ChengXiang Zhai.
Instance Weighting for Domain Adaptation in NLP.
In Proc. 45th Annual Meeting of the Association for Computational Linguistics, ACL ’07, pages 264–271,
Prague, Czech Republic, 2007.
URL http://www.aclweb.org/anthology/P07-1034.
Karen Spärck Jones.
Synonymy and Semantic Classification.
PhD thesis, University of Cambridge, 1964.
Su Nam Kim and Timothy Baldwin.
Automatic Interpretation of noun compounds using WordNet::Similarity.
In Proc. 2nd International Joint Conference on Natural Language Processing, Jeju Island, South Korea,
pages 945–956, 2005.
Su Nam Kim and Timothy Baldwin.
Interpreting semantic relations in noun compounds via verb semantics.
In Proc. 21st International Conference on Computational Linguistics and 44th Annual Meeting of the
Association for Computational Linguistics, Sydney, Australia, pages 491–498, 2006.
Bibliography XII
Judith L. Klavans, Martin S. Chodorow, and Nina Wacholder.
Building a knowledge base from parsed definitions.
In George Heidorn, Karen Jensen, and Steve Richardson, editors, Natural Language Processing: The
PLNLP Approach. Kluwer, New York, NY, USA, 1992.
Julien Kloetzer, Kentaro Torisawa, Chikara Hashimoto, and Jong-Hoon Oh.
Large-scale acquisition of entailment pattern pairs by exploiting transitivity.
In Proc. Conference on Empirical Methods in Natural Language Processing, pages 1649–1655, 2015.
Bhushan Kotnis, Pradeep Bansal, and Partha P. Talukdar.
Knowledge base inference using bridging entities.
In Proc. Conference on Empirical Methods in Natural Language Processing, pages 2038–2043, 2015.
Zornitsa Kozareva and Eduard Hovy.
A Semi-Supervised Method to Learn and Construct Taxonomies using the Web.
In Proc. 2010 Conference on Empirical Methods in Natural Language Processing, Cambridge, MA, USA,
pages 1110–1118, 2010.
Zornitsa Kozareva, Ellen Riloff, and Eduard Hovy.
Semantic class learning from the Web with hyponym pattern linkage graphs.
In Proc. 46th Annual Meeting of the Association for Computational Linguistics ACL-08: HLT, pages
1048–1056, 2008.
Sebastian Krause, Hong Li, Hans Uszkoreit, and Feiyu Xu.
Large-scale learning of relation-extraction rules with distant supervision from the web.
In Proc. International Conference on The Semantic Web, pages 263–278, 2012.
Bibliography XIII
John D. Lafferty, Andrew McCallum, and Fernando C. N. Pereira.
Conditional random fields: Probabilistic models for segmenting and labeling sequence data.
In Proc. Eighteenth International Conference on Machine Learning, ICML ’01, pages 282–289, San
Francisco, CA, USA, 2001. Morgan Kaufmann Publishers Inc.
ISBN 1-55860-778-1.
URL http://dl.acm.org/citation.cfm?id=645530.655813.
Maria Lapata.
The disambiguation of nominalizations.
Computational Linguistics, 28(3):357–388, 2002.
Mirella Lapata and Frank Keller.
The Web as a baseline: Evaluating the performance of unsupervised Web-based models for a range of NLP
tasks.
In Proc. Human Language Technology Conference and Conference on Empirical Methods in Natural
Language Processing, pages 121–128, Boston, USA, 2004.
Mark Lauer.
Designing Statistical Language Learners: Experiments on Noun Compounds.
PhD thesis, Macquarie University, 1995.
Judith N. Levi.
The Syntax and Semantics of Complex Nominals.
Academic Press, New York, 1978.
Bibliography XIV
Omer Levy and Yoav Goldberg.
Dependency-based word embeddings.
In Proc. 52nd Annual Meeting of the Association for Computational Linguistics, pages 302–308, 2014a.
Omer Levy and Yoav Goldberg.
Linguistic regularities in sparse and explicit word representations.
In Proc. Conference on Computational Natural Language Learning, pages 171–180, 2014b.
Jiwei Li and Dan Jurafsky.
Do multi-sense embeddings improve natural language understanding?
In Proc. Conference on Empirical Methods in Natural Language Processing, pages 1722–1732, 2015.
Jiwei Li, Thang Luong, Dan Jurafsky, and Eduard Hovy.
When are tree structures necessary for deep learning of representations?
In Proc. Conference on Empirical Methods in Natural Language Processing, pages 2304–2314, 2015.
Yang Li and Peter Clark.
Answering elementary science questions by constructing coherent scenes using background knowledge.
In Proc. Conference on Empirical Methods in Natural Language Processing, pages 2007–2012, 2015.
Dekang Lin.
An information-theoretic definition of similarity.
In Proc. International Conference on Machine Learning, pages 296–304, 1998.
Bibliography XV
Dekang Lin and Patrick Pantel.
Discovery of inference rules for question-answering.
Natural Language Engineering, 7(4):343–360, 2001.
ISSN 1351-3249.
Thomas Lin, Mausam, and Oren Etzioni.
Identifying functional relations in web text.
In Proc. 2010 Conference on Empirical Methods in Natural Language Processing, pages 1266–1276,
Cambridge, MA, October 2010.
Yankai Lin, Zhiyuan Liu, Huanbo Luan, Maosong Sun, Siwei Rao, and Song Liu.
Modeling relation paths for representation learning of knowledge bases.
In Proc. Conference on Empirical Methods in Natural Language Processing, pages 705–714, 2015.
Xiao Ling, Peter Clark, and Daniel S. Weld.
Extracting meronyms for a biology knowledge base using distant supervision.
In Proceedings of Automated Knowledge Base Construction (AKBC) 2013: The 3rd Workshop on
Knowledge Extraction at CIKM 2013, San Francisco, CA, October 27-28 2013.
Kangqi Luo, Xusheng Luo, and Kenny Zhu.
Inferring binary relation schemas for open information extraction.
In Proc. Conference on Empirical Methods in Natural Language Processing, pages 555–560, 2015a.
Yuanfei Luo, Quan Wang, Bin Wang, and Li Guo.
Context-dependent knowledge graph embedding.
In Proc. Conference on Empirical Methods in Natural Language Processing, pages 1656–1661, 2015b.
Bibliography XVI
Tuan Luu Anh, Jung-jae Kim, and See Kiong Ng.
Incorporating trustiness and collective synonym/contrastive evidence into taxonomy construction.
In Proc. Conference on Empirical Methods in Natural Language Processing, pages 1013–1022, 2015.
Mausam, Michael Schmitz, Robert Bart, Stephen Soderland, and Oren Etzioni.
Open language learning for information extraction.
In Proc. 2012 Conference on Empirical Methods in Natural Language Processing, Jeju Island, Korea, pages
523–534, 2012.
Andrew McCallum and Wei Li.
Early results for Named Entity Recognition with Conditional Random Fields, feature induction and
Web-enhanced lexicons.
In Proc. 7th Conference on Natural Language Learning at HLT-NAACL 2003 – Volume 4, CONLL ’03, pages
188–191, 2003.
doi: 10.3115/1119176.1119206.
URL http://dx.doi.org/10.3115/1119176.1119206.
John McCarthy.
Programs with common sense.
In Proc. Teddington Conference on the Mechanization of Thought Processes, 1958.
Ryan McDonald, Fernando Pereira, Seth Kulik, Scott Winters, Yang Jin, and Pete White.
Simple Algorithms for Complex Relation Extraction with Applications to Biomedical IE.
In Proc. 43rd Annual Meeting of the Association for Computational Linguistics (ACL-05), Ann Arbor, MI, 2005.
Bibliography XVII
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean.
Distributed representations of words and phrases and their compositionality.
In C.J.C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K.Q. Weinberger, editors, Advances in Neural
Information Processing Systems 26, pages 3111–3119, 2013a.
Tomas Mikolov, Wen-tau Yih, and Geoffrey Zweig.
Linguistic regularities in continuous space word representations.
In Proc. Conference of the North American Chapter of the Association for Computational Linguistics: Human
Language Technologies, pages 746–751, Atlanta, Georgia, 2013b.
Bonan Min, Ralph Grishman, Li Wan, Chang Wang, and David Gondek.
Distant supervision for relation extraction with an incomplete knowledge base.
In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational
Linguistics: Human Language Technologies, pages 777–782, Atlanta, Georgia, June 2013. Association for
Computational Linguistics.
URL http://www.aclweb.org/anthology/N13-1095.
Mike Mintz, Steven Bills, Rion Snow, and Dan Jurafsky.
Distant supervision for relation extraction without labeled data.
In Proc. Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference
on Natural Language Processing of the AFNLP: Volume 2, ACL ’09, pages 1003–1011, 2009.
Arindam Mitra and Chitta Baral.
Learning to automatically solve logic grid puzzles.
In Proc. Conference on Empirical Methods in Natural Language Processing, pages 1023–1033, 2015.
Bibliography XVIII
Thahir Mohamed, Estevam Hruschka Jr., and Tom Mitchell.
Discovering relations between noun categories.
In Proc. 2011 Conference on Empirical Methods in Natural Language Processing, Edinburgh, UK, pages
1447–1455, 2011.
Dan Moldovan, Adriana Badulescu, Marta Tatu, Daniel Antohe, and Roxana Girju.
Models for the semantic classification of noun phrases.
In Proc. HLT-NAACL Workshop on Computational Lexical Semantics, pages 60–67. Association for
Computational Linguistics, 2004.
Alessandro Moschitti.
Efficient convolution kernels for dependency and constituent syntactic trees.
Proc. 17th European Conference on Machine Learning (ECML-06), 2006.
URL http://dit.unitn.it/~moschitt/articles/ECML2006.pdf.
Preslav Nakov.
Improved Statistical Machine Translation using monolingual paraphrases.
In Proc. 18th European Conference on Artificial Intelligence, Patras, Greece, pages 338–342, 2008.
Preslav Nakov and Marti Hearst.
UCB: System description for SemEval Task #4.
In Proc. 4th International Workshop on Semantic Evaluations (SemEval-2007), pages 366–369, Prague,
Czech Republic, 2007.
Bibliography XIX
Preslav Nakov and Marti Hearst.
Solving relational similarity problems using the Web as a corpus.
In Proc. 6th Annual Meeting of the Association for Computational Linguistics: Human Language
Technologies, Columbus, OH, USA, pages 452–460, 2008.
Preslav Nakov and Zornitsa Kozareva.
Combining relational and attributional similarity for semantic relation classification.
In Proc. International Conference on Recent Advances in Natural Language Processing, Hissar, Bulgaria,
pages 323–330, 2011.
Vivi Nastase and Stan Szpakowicz.
Exploring noun-modifier semantic relations.
In Proc. 6th International Workshop on Computational Semantics, Tilburg, The Netherlands, pages 285–301,
2003.
Roberto Navigli and Simone Paolo Ponzetto.
BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic
network.
Artificial Intelligence, 193:217–250, 2012.
Thien Huu Nguyen and Ralph Grishman.
Employing word representations and regularization for domain adaptation of relation extraction.
In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2:
Short Papers), pages 68–74, Baltimore, Maryland, June 2014. Association for Computational Linguistics.
URL http://www.aclweb.org/anthology/P14-2012.
Bibliography XX
Diarmuid Ó Séaghdha and Ann Copestake.
Semantic classification with distributional kernels.
In Proc. 22nd International Conference on Computational Linguistics, pages 649–656, Manchester, UK,
2008.
Marius Paşca.
Organizing and searching the World-Wide Web of facts – step two: harnessing the wisdom of the crowds.
In 16th International World Wide Web Conference, Banff, Canada, pages 101–110, 2007.
Sebastian Padó and Mirella Lapata.
Dependency-based construction of semantic space models.
Computational Linguistics, 33(2):161–199, 2007.
Martha Palmer, Daniel Gildea, and Nianwen Xue.
Semantic Role Labeling.
Synthesis Lectures on Human Language Technologies. Morgan & Claypool, 2010.
Patrick Pantel and Dekang Lin.
Discovering word senses from text.
In Proc. 8th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Edmonton, Alberta,
Canada, pages 613–619, 2002.
Patrick Pantel and Deepak Ravichandran.
Automatically labeling semantic classes.
In Proc. Human Language Technology Conference of the North American Chapter of the Association for
Computational Linguistics, Boston, MA, USA, pages 321–328, 2004.
Bibliography XXI
Siddharth Patwardhan and Ellen Riloff.
Effective information extraction with semantic affinity patterns and relevant regions.
In Proc. 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational
Language Learning, Prague, Czech Republic, pages 717–727, 2007.
Charles Sanders Peirce.
Existential graphs (unpublished 1909 manuscript).
In Justus Buchler, editor, The philosophy of Peirce: selected writings. Harcourt, Brace & Co., 1940.
Jeffrey Pennington, Richard Socher, and Christopher Manning.
GloVe: Global vectors for word representation.
In Proc. Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543,
Doha, Qatar, 2014.
Maria Pershina, Bonan Min, Wei Xu, and Ralph Grishman.
Infusion of labeled data into distant supervision for relation extraction.
In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2:
Short Papers), pages 732–738, Baltimore, Maryland, June 2014. Association for Computational Linguistics.
URL http://www.aclweb.org/anthology/P14-2119.
Fabio Petroni, Luciano Del Corro, and Rainer Gemulla.
CORE: Context-aware open relation extraction with factorization machines.
In Proc. Conference on Empirical Methods in Natural Language Processing, pages 1763–1773, 2015.
Bibliography XXII
Barbara Plank and Alessandro Moschitti.
Embedding semantic similarity in tree kernels for domain adaptation of relation extraction.
In Proceedings of ACL-13, Sofia, Bulgaria, 2013.
James Pustejovsky, José M. Castaño, Jason Zhang, M. Kotecki, and B. Cochran.
Robust relational parsing over biomedical literature: Extracting inhibit relations.
In Proc. 7th Pacific Symposium on Biocomputing (PSB-02), Lihue, HI, USA, 2002.
Likun Qiu, Yue Zhang, and Yanan Lu.
Syntactic dependencies and distributed word representations for Chinese analogy detection and mining.
In Proc. Conference on Empirical Methods in Natural Language Processing, pages 2441–2450, 2015.
M. Ross Quillian.
A revised design for an understanding machine.
Mechanical Translation, 7:17–29, 1962.
Randolph Quirk, Sidney Greenbaum, Geoffrey Leech, and Jan Svartvik.
A comprehensive grammar of the English language.
Longman, 1985.
Deepak Ravichandran and Eduard Hovy.
Learning surface text patterns for a Question Answering system.
In Proc. 40th Annual Meeting of the Association for Computational Linguistics, Philadelphia, PA, USA, pages
41–47, 2002.
Bibliography XXIII
Sebastian Riedel, Limin Yao, and Andrew McCallum.
Modeling relations and their mentions without labeled text.
In Proc. European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD
’10), volume 6232 of Lecture Notes in Computer Science, pages 148–163. Springer, 2010.
Bryan Rink and Sanda Harabagiu.
UTD: Classifying semantic relations by combining lexical and semantic resources.
In Proc. 5th International Workshop on Semantic Evaluation, pages 256–259, Uppsala, Sweden, July 2010.
Association for Computational Linguistics.
URL http://www.aclweb.org/anthology/S10-1057.
Barbara Rosario and Marti Hearst.
Classifying the semantic relations in noun compounds via a domain-specific lexical hierarchy.
In Proc. 2001 Conference on Empirical Methods in Natural Language Processing, Pittsburgh, PA, USA,
pages 82–90, 2001.
Barbara Rosario and Marti Hearst.
The descent of hierarchy, and selection in relational semantics.
In Proc. 40th Annual Meeting of the Association for Computational Linguistics, Philadelphia, PA, USA, pages
247–254, 2002.
Minjoon Seo, Hannaneh Hajishirzi, Ali Farhadi, Oren Etzioni, and Clint Malcolm.
Solving geometry problems: Combining text and diagram interpretation.
In Proc. Conference on Empirical Methods in Natural Language Processing, pages 1466–1476, 2015.
Bibliography XXIV
Richard Socher, Brody Huval, Christopher D. Manning, and Andrew Y. Ng.
Semantic compositionality through recursive matrix-vector spaces.
In Proc. 2012 Conference on Empirical Methods in Natural Language Processing, Jeju, Korea, 2012.
Vivek Srikumar and Dan Roth.
Modeling semantic relations expressed by prepositions.
Transactions of the ACL, 2013.
Jinsong Su, Deyi Xiong, Biao Zhang, Yang Liu, Junfeng Yao, and Min Zhang.
Bilingual correspondence recursive autoencoder for statistical machine translation.
In Proc. Conference on Empirical Methods in Natural Language Processing, pages 1248–1258, 2015.
Le Sun and Xianpei Han.
A feature-enriched tree kernel for relation extraction.
In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2:
Short Papers), pages 61–67, Baltimore, Maryland, June 2014. Association for Computational Linguistics.
URL http://www.aclweb.org/anthology/P14-2011.
Jun Suzuki, Tsutomu Hirao, Yutaka Sasaki, and Eisaku Maeda.
Hierarchical directed acyclic graph kernel: Methods for structured natural language data.
In Proc. 41st Annual Meeting of the Association for Computational Linguistics (ACL-03), Sapporo, Japan,
2003.
Don R. Swanson.
Two medical literatures that are logically but not bibliographically connected.
Journal of the American Society for Information Science, 38(4):228–233, 1987.
Bibliography XXV
Diarmuid Ó Séaghdha.
Designing and evaluating a semantic annotation scheme for compound nouns.
In Proc. 4th Corpus Linguistics Conference (CL-07), Birmingham, UK, 2007.
URL www.cl.cam.ac.uk/~do242/Papers/dos_cl2007.pdf.
Diarmuid Ó Séaghdha and Ann Copestake.
Co-occurrence contexts for noun compound interpretation.
In Proc. ACL Workshop on A Broader Perspective on Multiword Expressions, pages 57–64. Association for
Computational Linguistics, 2007.
Lucien Tesnière.
Éléments de syntaxe structurale.
C. Klincksieck, Paris, 1959.
Kristina Toutanova, Danqi Chen, Patrick Pantel, Hoifung Poon, Pallavi Choudhury, and Michael Gamon.
Representing text for joint embedding of text and knowledge bases.
In Proc. Conference on Empirical Methods in Natural Language Processing, pages 1499–1509, 2015.
Stephen Tratz and Eduard Hovy.
A taxonomy, dataset, and classifier for automatic noun compound interpretation.
In Proc. 48th Annual Meeting of the Association for Computational Linguistics, pages 678–687, Uppsala,
Sweden, 2010.
Bibliography XXVI
Peter Turney.
Similarity of semantic relations.
Computational Linguistics, 32(3):379–416, 2006.
Peter Turney and Michael Littman.
Corpus-based learning of analogies and semantic relations.
Machine Learning, 60(1-3):251–278, 2005.
Lucy Vanderwende.
Algorithm for the automatic interpretation of noun sequences.
In Proc. 15th International Conference on Computational Linguistics, Kyoto, Japan, pages 782–788, 1994.
Beatrice Warren.
Semantic patterns of noun-noun compounds.
Gothenburg Studies in English 41. Acta Universitatis Gothoburgensis, Göteborg, 1978.
Joseph Weizenbaum.
ELIZA – a computer program for the study of natural language communication between man and machine.
Communications of the ACM, 9(1):36–45, 1966.
Terry Winograd.
Understanding natural language.
Cognitive Psychology, 3(1):1–191, 1972.
Bibliography XXVII
Fei Wu and Daniel S. Weld.
Autonomously semantifying Wikipedia.
In Proc. ACM 17th Conference on Information and Knowledge Management (CIKM 2008), Napa Valley, CA,
USA, pages 41–50, 2008.
Fei Wu and Daniel S. Weld.
Open information extraction using Wikipedia.
In Proc. 48th Annual Meeting of the Association for Computational Linguistics, Uppsala, Sweden, pages
118–127, 2010.
Kun Xu, Yansong Feng, Songfang Huang, and Dongyan Zhao.
Semantic relation classification via convolutional neural networks with simple negative sampling.
In Proc. Conference on Empirical Methods in Natural Language Processing, pages 536–540, 2015a.
Wei Xu, Raphael Hoffmann, Le Zhao, and Ralph Grishman.
Filling knowledge base gaps for distant supervision of relation extraction.
In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short
Papers), pages 665–670, Sofia, Bulgaria, August 2013. Association for Computational Linguistics.
URL http://www.aclweb.org/anthology/P13-2117.
Yan Xu, Lili Mou, Ge Li, Yunchuan Chen, Hao Peng, and Zhi Jin.
Classifying relations via long short term memory networks along shortest dependency paths.
In Proc. Conference on Empirical Methods in Natural Language Processing, pages 1785–1794, 2015b.
Bibliography XXVIII
Alexander Yates and Oren Etzioni.
Unsupervised resolution of objects and relations on the Web.
In Proc. Human Language Technologies 2007: The Conference of the North American Chapter of the
Association for Computational Linguistics, Rochester, NY, USA, pages 121–130, 2007.
Mo Yu, Matthew R. Gormley, and Mark Dredze.
Factor-based compositional embedding models.
In The NIPS 2014 Learning Semantics Workshop, 2014.
Mo Yu, Matthew R. Gormley, and Mark Dredze.
Combining word embeddings and feature embeddings for fine-grained relation extraction.
In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational
Linguistics: Human Language Technologies, pages 1374–1379, Denver, Colorado, May–June 2015.
Association for Computational Linguistics.
URL http://www.aclweb.org/anthology/N15-1155.
Dmitry Zelenko, Chinatsu Aone, and Anthony Richardella.
Kernel methods for relation extraction.
Journal of Machine Learning Research, 3:1083–1106, 2003.
Daojian Zeng, Kang Liu, Siwei Lai, Guangyou Zhou, and Jun Zhao.
Relation classification via convolutional deep neural network.
In Proceedings of COLING-14, Dublin, Ireland, 2014.
Bibliography XXIX
Daojian Zeng, Kang Liu, Yubo Chen, and Jun Zhao.
Distant supervision for relation extraction via piecewise convolutional neural networks.
In Proc. Conference on Empirical Methods in Natural Language Processing, pages 1753–1762, 2015.
Guo Dong Zhou, Min Zhang, Dong Hong Ji, and Qiao Ming Zhu.
Tree kernel-based relation extraction with context-sensitive structured parse tree information.
In Proc. 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational
Natural Language Learning (EMNLP-CoNLL-07), pages 728–736, Prague, Czech Republic, 2007.
Guo Dong Zhou, Long Hua Qian, and Qiao Ming Zhu.
Label propagation via bootstrapped support vectors for semantic relation extraction between named entities.
Computer Speech and Language, 23(4):464–478, 2009.
Karl E. Zimmer.
Some general observations about nominal compounds.
Working Papers on Language Universals, Stanford University, 5, 1971.