Towards Spatial Word Embeddings
Paul Mousset¹,², Yoann Pitarch¹, and Lynda Tamine¹
¹ IRIT, Université de Toulouse, CNRS, Toulouse, France
{paul.mousset,yoann.pitarch,lynda.tamine}@irit.fr
² Atos Intégration, Toulouse, France
Abstract. Leveraging the textual and spatial data provided in spatio-textual objects (e.g., tweets) has become increasingly important in real-world applications, favoured by their increasing availability over the last decades (e.g., through smartphones). In this paper, we propose a spatial retrofitting method for word embeddings that reveals both the localised similarity of word pairs and the diversity of their localised meanings. Experiments based on the semantic location prediction task show that our method achieves significant improvements over strong baselines.
Keywords: Word embeddings · Retrofitting · Spatial

1 Introduction
The last decades have witnessed an impressive increase of geo-tagged content, known as spatio-textual data or geo-texts. Spatio-textual data include Places Of Interest (POIs) with textual descriptions, geotagged posts (e.g., tweets), geotagged photos with textual tags (e.g., Instagram photos) and check-ins from location-based services (e.g., Foursquare). The interplay between text and location provides relevant opportunities for a wide range of applications such as crisis management [11] and tourism assistance [5]. This prominence also gives rise to considerable research issues underlying the matching of spatio-textual objects, which is the key step in diverse tasks such as querying geo-texts [24], linking location mentions [6,9] and semantic location prediction [3,25]. Existing solutions for matching spatio-textual objects are mainly based on combining textual and spatial features, either for building scalable object representations [24] or for designing effective object-object matching models [3,25]. The goal of our work is to explore the idea of jointly leveraging spatial and textual knowledge to build enhanced representations of textual units (namely words) that can be used at both the object representation and matching levels. The central thesis of our work is driven by two main intuitions: (1) co-occurrences of word pairs within spatio-textual objects reveal localised word similarities; for instance, dinosaur and museum are semantically related near a natural history museum, but less related near an art museum; (2) as a corollary of intuition 1, distinct meanings of the same word can be conveyed using the spatial word distribution as a source of evidence; for instance, dinosaur can refer to a prehistoric animal or, specifically in New York, to a restaurant chain. Thus, we exploit the spatial distribution of words to jointly identify semantically related word pairs as well as localised word meanings. To conceptualise our intuitions, we propose a retrofitting strategy [7,20] as a means of refining pre-trained word embeddings using spatial knowledge. We empirically validate our research intuitions and then show the effectiveness of our proposed spatial word embeddings on semantic location prediction as the downstream task.
2 Preliminaries

2.1 Definitions and Intuitions
Definition 1 (Spatio-textual object). A spatio-textual object o is a geotagged text (e.g., a POI with a descriptive text). The geotag is represented by its coordinates (lat, lon) referring to the geographic location l, denoted o.l (e.g., the physical location of a POI). We adopt a word-based vectorial representation of object o including all its textual attributes (e.g., the POI description): $o = [w_1^{(o)}, \ldots, w_m^{(o)}]$, where each word $w_i^{(o)}$ is drawn from a vocabulary $\mathcal{W}$.
Definition 2 (Spatial distance). The spatial distance between spatio-textual objects $o_i$, $o_j$ refers to the geographic distance, under a distance metric, between locations $o_i.l$ and $o_j.l$. The spatial distance between words $w_i$, $w_j$ refers to an aggregated (e.g., average) spatial object-object distance over the sets of spatio-textual objects $O_i$, $O_j$ they respectively belong to.
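For illustration, the following Python sketch (not part of the original paper) computes the Haversine object-object distance and its aggregation into a word-word spatial distance; the function names and the choice of the average as aggregate are ours.

```python
import math
from itertools import product

def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance (meters) between two (lat, lon) points."""
    r = 6371000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi, dlam = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def word_spatial_distance(objects_i, objects_j):
    """Spatial distance between two words as the average object-object
    distance over the sets O_i, O_j of (lat, lon) locations they occur in."""
    pairs = list(product(objects_i, objects_j))
    return sum(haversine(a[0], a[1], b[0], b[1]) for a, b in pairs) / len(pairs)
```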
Intuition 1. Words that occur in close spatio-textual objects tend to have similar meanings. Basically, the spatially closer two words are, with regard to the distance between their associated objects, the closer their meanings (e.g., intuitively, cup is semantically closer to football in Europe than in the USA).
Intuition 2. Let us consider a localised meaning of a word as being represented by the set of spatially similar words with respect to intuition 1. A word can convey different localised meanings depending on the geographical areas where it is spatially dense (e.g., football in Europe does not refer to the same sport as in the USA).
2.2 Problem Definition
Based on intuition 1, we conjecture that spatial signals can contribute to the building of distributed word vector representations. As previously suggested [7,20], one relevant way is to inject external knowledge into initially learned word embeddings. However, different meanings of the same word are conflated into a single embedding [10,13]. Thus, from intuition 2, we build for each word a set of embedding vectors based on its occurrence statistics over the associated spatio-textual objects. Formally, given a set of word vector representations $\hat{\mathcal{W}} = \{\hat{w}_1, \ldots, \hat{w}_n\}$, where $\hat{w}_i$ is the $k$-dimensional embedding vector built for target word $w_i \in \mathcal{W}$ using a standard neural language model (e.g., the Skip-gram model [14]), the problem is how to build for each word $w_i$ the set of associated spatial word embeddings $\hat{w}_i^s = \{\hat{w}_{i,1}^s, \ldots, \hat{w}_{i,j}^s, \ldots, \hat{w}_{i,n_i}^s\}$. Each spatial word vector $\hat{w}_{i,j}^s$, derived from an initial embedding $\hat{w}_i$, refers to the localised distributional representation of word $w_i$ over a dense spatial area, and $n_i$ is the number of distinct localised meanings of word $w_i$ derived from its spatial distribution over the spatio-textual objects $O_i$ it belongs to.
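To fix ideas, the target output $\mathcal{W}^s$ can be pictured as a table mapping each word to its $n_i$ localised senses, each pairing a barycenter with a retrofitted vector. This is our own illustrative data structure, not one prescribed by the paper.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class SpatialSense:
    """One localised meaning w^s_{i,j} of a word over a dense spatial area."""
    barycenter: tuple   # B_{i,j} as (lat, lon)
    vector: np.ndarray  # k-dimensional spatial embedding w^s_{i,j}

# W^s: each word w_i maps to its n_i localised senses
spatial_embeddings = {}  # dict[str, list[SpatialSense]]
```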
3 Methodology

3.1 Overview
Our algorithm for building the spatial word embeddings is described in Algorithm 1. For each word $w_i$, we first identify the spatio-textual objects $O_i$ it belongs to. To identify dense spatial areas of word $w_i$, we perform K-Means clustering [12]. More formally, for each word $w_i$, we determine $n_i$ spatial clusters represented by their respective barycenters $B_i = \{B_{i,1}, \ldots, B_{i,n_i}\}$, where $B_{i,j}$ is the $j$-th barycenter of word $w_i$ and $n_i$ the optimal number of clusters for word $w_i$, determined using silhouette analysis [19]. Each barycenter $B_{i,j}$ can be seen as a spatial representative of the area that gives rise to a localised meaning of word $w_i$, represented by the distributed vector $\hat{w}^s_{i,j}$. We detail in the following section the key step of building the spatial embedding $\hat{w}^s_{i,j}$ through a retrofitting process from the word embedding $\hat{w}_i$, considering both spatially neighbour words $W^+_{i,j}$ and distant words $W^-_{i,j}$ with respect to barycenter $B_{i,j}$.
Algorithm 1. Algorithm for building spatial word embeddings

Input: Vocabulary $\mathcal{W}$; set of word embeddings $\hat{\mathcal{W}} = \{\hat{w}_1, \ldots, \hat{w}_{|\mathcal{W}|}\}$; set of spatio-textual objects $O$
Output: Set of spatial word embeddings $\mathcal{W}^s = \{\hat{w}^s_{1,1}, \ldots, \hat{w}^s_{1,n_1}, \ldots, \hat{w}^s_{|\mathcal{W}|,n_{|\mathcal{W}|}}\}$

for $i \in \{1, \ldots, |\mathcal{W}|\}$ do
    $O_i$ = ExtractObjects($w_i$, $O$)
    SpatialClustering($O_i$, $B_i$, $n_i$)
end
repeat
    for $i \in \{1, \ldots, |\mathcal{W}|\}$ do
        for $j \in \{1, \ldots, n_i\}$ do
            $W^+_{i,j}$ = Neighbours($w_i$, $B_{i,j}$)
            $W^-_{i,j}$ = Distant($w_i$, $B_{i,j}$)
            $\hat{w}^s_{i,j}$ = Retrofit($\hat{w}_i$, $W^+_{i,j}$, $W^-_{i,j}$)   (see Sect. 3.2)
        end
    end
until convergence
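A possible realisation of the SpatialClustering step, assuming scikit-learn is available: for each word we run K-Means for several values of k on the (lat, lon) coordinates of its objects and keep the k maximising the mean silhouette score, mirroring the silhouette analysis [19]. Treating coordinates as Euclidean is an approximation that is reasonable at city scale; the function name and k range are ours.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def spatial_clustering(locations, k_max=10):
    """Cluster the (lat, lon) locations O_i of a word's objects.
    Returns (barycenters B_i, optimal cluster count n_i)."""
    X = np.asarray(locations, dtype=float)
    # Default: a single cluster whose barycenter is the mean location.
    best_k, best_score, best_centers = 1, -1.0, X.mean(axis=0, keepdims=True)
    for k in range(2, min(k_max, len(X) - 1) + 1):
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
        score = silhouette_score(X, km.labels_)  # silhouette analysis [19]
        if score > best_score:
            best_k, best_score, best_centers = k, score, km.cluster_centers_
    return best_centers, best_k
```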
3.2 Spatially Constrained Word Embedding
Our objective here is to learn the set of spatial word embeddings $\mathcal{W}^s$. We want the inferred word vector $\hat{w}^s_{i,j}$ (i) to be semantically close (under a distance metric) to the associated word embedding $\hat{w}_i$, (ii) to be semantically close to its spatial neighbour words $W^+_{i,j}$, and (iii) to be semantically unrelated to the spatially distant words $W^-_{i,j}$. Thus, the objective function to be minimised is given by:

$$\Psi(\mathcal{W}^s) = \sum_{i=1}^{|\mathcal{W}|} \sum_{j=1}^{n_i} \Bigg[ \alpha\, d(\hat{w}^s_{i,j}, \hat{w}_i) + \beta \sum_{w_k \in W^+_{i,j}} d(\hat{w}^s_{i,j}, \hat{w}_k) + \gamma \sum_{w_k \in W^-_{i,j}} \big(1 - d(\hat{w}^s_{i,j}, \hat{w}_k)\big) \Bigg]$$

where $d(w_i, w_j) = 1 - sim(w_i, w_j)$ is a distance derived from a similarity measure (e.g., cosine), $W^+_{i,j}$ (resp. $W^-_{i,j}$) is the set of words spatially close to (resp. distant from) the word $w_{i,j}$, i.e., words within (resp. beyond) a radius $r^+$ (resp. $r^-$) around its barycenter $B_{i,j}$, and $\alpha, \beta, \gamma \geq 0$ are hyperparameters that control the relative importance of each term. In our experimental setting, $r^+$ and $r^-$ are set to 100 and 500 meters respectively, and $\alpha = \beta = \gamma = 1$.
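As a minimal sketch of the Retrofit step, the following numpy code performs one gradient step on the $(i, j)$ term of $\Psi$ with $d(x, y) = 1 - \cos(x, y)$. The paper only states that Algorithm 1 iterates until convergence, so the use of plain gradient descent and the learning rate are our assumptions.

```python
import numpy as np

def grad_cos_dist(x, y):
    """Gradient w.r.t. x of d(x, y) = 1 - cos(x, y)."""
    nx, ny = np.linalg.norm(x), np.linalg.norm(y)
    return -(y / (nx * ny) - (x @ y) * x / (nx ** 3 * ny))

def retrofit_step(w_s, w_i, neighbours, distants,
                  alpha=1.0, beta=1.0, gamma=1.0, lr=0.05):
    """One descent step on alpha*d(w_s, w_i) + beta*sum_{+} d(w_s, w_k)
    + gamma*sum_{-} (1 - d(w_s, w_k))."""
    g = alpha * grad_cos_dist(w_s, w_i)
    for w_k in neighbours:   # W+_{i,j}: pull the spatial vector closer
        g += beta * grad_cos_dist(w_s, w_k)
    for w_k in distants:     # W-_{i,j}: the (1 - d) term flips the sign
        g -= gamma * grad_cos_dist(w_s, w_k)
    return w_s - lr * g
```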
4 Evaluation

4.1 Experimental Setup
Evaluation Task and Dataset. We consider the semantic location prediction task [3,25]. Given a tweet t, the task consists in identifying, if any, the POI p that the tweet t semantically focuses on (i.e., reviews about). Formally, the semantic location identifies a single POI p, namely the topmost $p^* \in P$ of a ranked list of candidate POIs returned by a semantic matching function. We employ a dataset of English geotagged tweets released by Zhao et al. [25]. The dataset consists of 74K POI-related tweets, collected from 09.2010 to 01.2015 in New York (NY) and Singapore (SG). Using the Foursquare API, we collected 800K POIs located in the NY and SG cities, including user-published reviews. The entire dataset consists of 238,369 distinct words, to which we applied K-Means clustering (see Sect. 3.1). As a result of the clustering, we obtained 630,732 spatial word clusters, i.e., around 2.6 local word meanings $\hat{w}^s_{i,j}$ created per word $w_i$. We notice that 166,139 (69.7%) words have only one local meaning.
Baselines, Scenarios and Metrics. We compare our approach with a set of state-of-the-art matching baseline models: (1) Dist [4]: the Haversine tweet-POI distance; (2) BM25 [18]; (3) Class [25]: a POI ranking model that combines spatial distance with a text-based language model. To evaluate the effectiveness of our approach, we inject the embeddings into the Class model as follows: (a) Class-Match (CM): we compute the cosine similarity of a pair (t, p) instead of the language model score; (b) Class-Expand (CE): we expand the tweet with the top likely similar words, following the approach proposed by Zamani and Croft [23]. For the two above scenarios, we consider either the traditional or the spatial word embeddings. Practically, for scenarios using spatial word embeddings, we use the closest local word $w_i^{(t)}$ (resp. $w_i^{(p)}$) obtained by minimising the Haversine distance between the tweet (resp. POI) location t.l (resp. p.l) and the word barycenters $B_{i,j}$. We exploit two well-known evaluation metrics, namely Acc@k [17] and Mean Reciprocal Rank (MRR) [2]. Given the semantic location task description, it is worth mentioning that low values of k are particularly relevant.
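For reference, both metrics reduce to a few lines given the 1-based rank of the relevant POI in each candidate list (None when it is absent). This is a straightforward reading of Acc@k [17] and MRR [2], not code from the paper.

```python
def acc_at_k(ranks, k):
    """Fraction of tweets whose relevant POI is ranked in the top k."""
    return sum(1 for r in ranks if r is not None and r <= k) / len(ranks)

def mrr(ranks):
    """Mean Reciprocal Rank; tweets with no retrieved POI contribute 0."""
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)
```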
4.2 Analysis of Spatially Driven Word Similarities
To validate the intuitions presented in Sect. 2.1, we first build, as shown in Fig. 1, the heat-map of the similarity values between the embedding vectors of a sample of insightful words, where the darker the cell, the more similar the pair of words. To exhibit the localised meanings of the words, we partition the dataset into two distinct subsets depending on the city the tweets were emitted from (i.e., either NY or SG). For each subset, cosine similarities are then damped by a spatial factor $f_s(w_i, w_j)$ which conveys how spatially close the words $w_i$ and $w_j$ are. Formally, $f_s(w_i, w_j)$ is defined as

$$f_s(w_i, w_j) = \exp\left\{-\frac{dist(B_i, B_j) - \mu}{\sigma}\right\}$$

where $dist(B_i, B_j)$ is the Haversine distance between the barycenters of $w_i$ and $w_j$, and $\mu$ (resp. $\sigma$) is the average distance (resp. standard deviation) between all word pairs that describe the POIs located in the city. For simplicity purposes, we consider one barycenter per word for each subset. The heat-maps of these weighted matrices are shown in Figs. 1b and c for NY and SG respectively. We can see, for instance, that the cell (restaurants, dinosaur) is darker in Fig. 1b than in Fig. 1a, while the same cell is lighter in Fig. 1c than in Fig. 1a. Generally speaking, there is no obvious reason why the words restaurants and dinosaur should be related to each other, as outlined by the similarity of their word embeddings in Fig. 1a. However, some restaurants in NY are named Dinosaur Bar-B-Que, leading to an over-representation of tweets where these two terms co-occur in NY, and hence to a stronger local semantic relation within this word pair, as revealed by Fig. 1b. This fits with our intuition 1. Besides, cross-examining Fig. 1a and its spatial variants Figs. 1b and c provides some clues on why our intuition 2 is well-founded. Indeed, we can see that the words dinosaur and museum are similar regardless of the location. Relating this observation to the previous one, we can infer that dinosaur can refer to both a museum and a restaurant specifically in NY, as revealed by the strength of its similarity with words such as burger and cheese in Fig. 1b, which is clearly less pronounced in Fig. 1c.
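Assuming the damping is multiplicative, which is how we read "damped by a spatial factor", each similarity cell is adjusted as follows; dist is the Haversine distance between the two words' barycenters and mu, sigma are the city-level mean and standard deviation of pairwise distances.

```python
import math

def damped_similarity(cos_sim, dist_bi_bj, mu, sigma):
    """Cosine similarity damped by f_s = exp(-(dist(B_i, B_j) - mu) / sigma)."""
    f_s = math.exp(-(dist_bi_bj - mu) / sigma)
    return cos_sim * f_s
```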
4.3 Effectiveness
Table 1 summarises the effectiveness results obtained on the semantic location prediction task. We compute relative changes (R-Chg) using the ratio of the geometric means of the MRR, and compute relative improvements (R-Imp), suited for non-aggregated measures, for Acc@k. Overall, we can see that the scenarios involving matching with spatial embeddings (CM-W^s and CE-W^s) significantly outperform all the compared models.

Fig. 1. Cosine similarities of traditional WE SIM (a), WE SIM damped by word-word barycenter distances in the NY dataset (b) and in the SG dataset (c)

For instance, CE-W^s displays better results
in terms of MRR, with relative changes ranging between 140.7% and 161.3% compared to the Dist, BM25 and Class models. More precisely, CE-W^s allows a more effective tweet-POI mapping: more than 48% of the tweets are associated with the relevant POI based on the top-1 result, against 43% for Dist. In addition, we can observe that while injecting embeddings (either traditional or spatial) improves the effectiveness of the Class model, the spatial embeddings achieve significantly better performance. For instance, the MRR of the CE scenario significantly increases by 119%. Specifically looking at the two scenarios involving spatial embeddings, we can notice that CE-W^s improves MRR by 128.2% and Acc@1 by 5.05% compared to CM-W^s. These results can be explained by the approach used to inject the embeddings. While in CE-W^s the spatial embedding vectors are intrinsically used to expand the tweet description before the matching, in the CM-W^s scenario they are rather used to build tweet and POI embeddings through an Idf-weighted average of embeddings, which might introduce biases in their representations. This observation clearly shows the positive impact of the intrinsic use of the spatial embeddings.
5 Related Work
A standard approach for improving traditional word embeddings is to inject external knowledge, mainly lexical resource constraints, using either an online or an offline approach [14,16]. The online approach exploits external knowledge during the learning step [8,21,22]. For instance, Yu and Dredze [22] propose the RCM model, which extends the Skip-gram objective function with semantic relations between word pairs, as provided by a lexical resource, based on the assumption that related words yield similar contexts; Xu et al. [21] generalise this idea in the RC-NET framework. The offline approach, also called retrofitting, uses external resources outside the learning step [7,15,20]. For instance, Faruqui et al. [7] propose a method for refining vector space representations by favouring related words, as provided by a lexical resource (e.g., WordNet, FrameNet), to have similar vector representations.
Table 1. Effectiveness evaluation. R-Chg: CE-W^s relative changes. R-Imp: CE-W^s relative improvements. Significant Student's t-test ∗: p < 0.05.

                                  | MRR              | Acc@1            | Acc@5
                                  | Value  R-Chg     | Value  R-Imp     | Value  R-Imp
Dist. based       | Dist          | 0.514  +140.7 ∗  | 0.430  +19.61 ∗  | 0.605  +15.45 ∗
Text based        | BM25          | 0.423  +161.3 ∗  | 0.307  +64.68 ∗  | 0.668  +4.49 ∗
Text-Dist. based  | Class         | 0.507  +159.9 ∗  | 0.401  +25.85 ∗  | 0.624  +11.79 ∗
Traditional       | CM-W          | 0.521  +128.0 ∗  | 0.413  +24.52 ∗  | 0.640  +9.06 ∗
embeddings        | CE-W          | 0.563  +119.0 ∗  | 0.470  +9.41 ∗   | 0.659  +5.94 ∗
Spatial           | CM-W^s        | 0.577  +128.2 ∗  | 0.489  +5.05 ∗   | 0.675  +3.36 ∗
embeddings        | CE-W^s        | 0.604  −         | 0.515  −         | 0.698  −
To the best of our knowledge, our work is the first attempt at retrofitting word embeddings using spatial knowledge. To tackle the meaning conflation deficiency of word embeddings [1,10,13], the general approach is to jointly learn words and their senses. For instance, Iacobacci et al. [10] first disambiguate words using the Babelfy resource, and then revise the continuous bag-of-words (CBOW) objective function to learn both word and sense embeddings.
6 Conclusion
In this paper, we introduced spatial word embeddings obtained by retrofitting traditional word embeddings. The retrofitting method leverages spatial knowledge to reveal localised semantic similarities of word pairs, as well as localised meanings of words. The experimental evaluation shows that our proposed method successfully refines pre-trained word embeddings and achieves significant improvements on the semantic location prediction task. As future work, we plan to evaluate the effectiveness of our proposed spatial word embeddings in other location-sensitive tasks, including the spatial summarisation of streaming objects such as tweets.
Acknowledgments. This research was supported by IRIT and ATOS Intégration
research program under ANRT CIFRE grant agreement #2016/403.
References
1. Cheng, J., Wang, Z., Wen, J.R., Yan, J., Chen, Z.: Contextual text understanding
in distributional semantic space. In: Proceedings of CIKM 2015, pp. 133–142 (2015)
2. Craswell, N.: Mean reciprocal rank. In: Liu, L., Özsu, M.T. (eds.) Encyclopedia of Database Systems, p. 1703. Springer, Boston (2009). https://doi.org/10.1007/978-0-387-39940-9
3. Dalvi, N., Kumar, R., Pang, B., Tomkins, A.: A translation model for matching
reviews to objects. In: Proceedings of CIKM 2009, pp. 167–176 (2009)
4. De Smith, M., Goodchild, M.F.: Geospatial Analysis: A Comprehensive Guide to Principles, Techniques and Software Tools. Matador (2007)
5. Deveaud, R., Albakour, M.D., Macdonald, C., Ounis, I.: Experiments with a venue-centric model for personalised and time-aware venue suggestion. In: Proceedings of CIKM 2015, pp. 53–62 (2015)
6. Fang, Y., Chang, M.W.: Entity linking on microblogs with spatial and temporal
signals. Trans. Assoc. Comput. Linguist. 2, 259–272 (2014)
7. Faruqui, M., Dodge, J., Jauhar, S.K., Dyer, C., Hovy, E., Smith, N.A.: Retrofitting
word vectors to semantic lexicons. In: Proceedings of NAACL 2015, pp. 1606–1615
(2015)
8. Glavaš, G., Vulić, I.: Explicit retrofitting of distributional word vectors. In: Proceedings of ACL 2018, pp. 34–45 (2018)
9. Han, J., Sun, A., Cong, G., Zhao, W.X., Ji, Z., Phan, M.C.: Linking fine-grained
locations in user comments. Trans. Knowl. Data Eng. 30(1), 59–72 (2018)
10. Iacobacci, I., Pilehvar, M.T., Navigli, R.: SensEmbed: learning sense embeddings for word and relational similarity. In: Proceedings of ACL-IJCNLP 2015, pp. 95–105 (2015)
11. Imran, M., Castillo, C., Diaz, F., Vieweg, S.: Processing social media messages in
mass emergency: a survey. ACM Comput. Surv. 47(4), 67:1–67:38 (2015)
12. MacQueen, J., et al.: Some methods for classification and analysis of multivariate
observations. In: Proceedings of BSMSP 1967, pp. 281–297 (1967)
13. Mancini, M., Camacho-Collados, J., Iacobacci, I., Navigli, R.: Embedding words
and senses together via joint knowledge-enhanced training. In: Proceedings of
CoNLL 2017, pp. 100–111 (2017)
14. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Proceedings of NIPS
2013, pp. 3111–3119 (2013)
15. Mrkšić, N., et al.: Counter-fitting word vectors to linguistic constraints. arXiv
preprint (2016)
16. Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. In: Proceedings of EMNLP 2014, pp. 1532–1543 (2014)
17. Powers, D.M.: Evaluation: from precision, recall and f-measure to ROC, informedness, markedness & correlation. J. Mach. Learn. Technol. 2, 37–63 (2011)
18. Robertson, S.E., Jones, K.S.: Relevance weighting of search terms. J. Am. Soc. Inf.
Sci. 27(3), 129–146 (1976)
19. Rousseeuw, P.J.: Silhouettes: a graphical aid to the interpretation and validation
of cluster analysis. J. Comput. Appl. Math. 20, 53–65 (1987)
20. Vulić, I., Mrkšić, N.: Specialising word vectors for lexical entailment. In: Proceedings of NAACL-HLT 2018, pp. 1134–1145 (2018)
21. Xu, C., et al.: RC-NET: a general framework for incorporating knowledge into
word representations. In: Proceedings of CIKM 2014, pp. 1219–1228 (2014)
22. Yu, M., Dredze, M.: Improving lexical embeddings with semantic knowledge. In:
Proceedings of ACL 2014, pp. 545–550 (2014)
23. Zamani, H., Croft, W.B.: Estimating embedding vectors for queries. In: Proceedings of ICTIR 2016, pp. 123–132 (2016)
24. Zhang, D., Chan, C.Y., Tan, K.L.: Processing spatial keyword query as a top-k
aggregation query. In: Proceedings of SIGIR 2014, pp. 355–364 (2014)
25. Zhao, K., Cong, G., Sun, A.: Annotating points of interest with geo-tagged tweets.
In: Proceedings of CIKM 2016, pp. 417–426 (2016)