LDL 2018 Presentation
- A sub-discipline of historical linguistics that is concerned with the development of individual words (and
other lexemes) over time and which attempts to trace their origins as far back as the record supports
- A single such history of a word (or other lexeme). We will focus on this sense in what follows.
Etymologies in the second sense are commonly found in general purpose dictionaries as well as in
more specialist works. The issue of how to properly model this kind of data as linked data is therefore
of some relevance given the current trend towards migrating retrodigitized dictionaries into the RDF
format.
Previous Work
Previous work on explicitly representing etymologies in linked data includes proposed extensions of
lemon by (Chiarcos et al., 2016) and a linked data based etymological wordnet (De Melo, 2014). See
also (Moran and Bruemmer, 2013).
However, in the current work we have also been influenced by attempts in other computational lexicon
standards to represent ‘deep’ etymological information.
These include proposals for modelling etymologies in LMF (Salmon-Alt, 2006) and especially
Bowers and Romary’s proposals for a TEI-based encoding of etymologies in (Bowers and Romary,
2016).
An example etymological entry
GIRL, a female child, young woman. (E.) ME. gerle, girle, gyrle, formerly used of
either sex, and signifying either a boy or girl. In Chaucer, C.T. 3767 (A 3769) gerl
is a young woman; but in C.T. 666 (A 664), the pl. girles means young people of
both sexes. In Will. of Palerne, 816, and King Alisander, 2802, it means ‘young
women;’ in P. Plowman, B.i.33, it means ‘boys;’ cf. B. x. 175. Answering to an AS.
form *gyr-el-, Teut. *gur-wil-, a dimin. form from Teut. base *gur-. Cf. NFries. gor,
a girl; Pomeran. goer, a child; O. Low G. gor, a child; see Bremen Wörterbuch, ii.
528. Cf. Swiss gurre, gurrli, a depreciatory term for a girl; Sanders, G. Dict. i. 609,
641; also Norw. gorre, a small child (Aasen); Swed. dial. garra, guerre (the same).
Root uncertain. Der. girl-ish, girlish-ly, girl-ish-ness, girl-hood.
The entry above is a description of the history and development of the word.
Another Example Entry
Three different hypotheses for the origin of the same word:
girl, whence girlish, derives from ME girle, varr gerle, gurle: o.o.o.: perh of C origin: cf
Ga and Ir caile, EIr cale, a girl; with Anglo-Ir girleen (dim -een), a (young) girl, cf
Ga-Ir cailin (dim -in), a girl. But far more prob, girl is of Gmc origin: Whitehall
postulates the OE etymon *gyrela or *gyrele and adduces Southern E dial girls,
primrose blossoms, and grlopp, a lout, and tentatively LG goere, a young person
(either sex). Ult, perh, related to L puer, puella, with basic idea '(young) growing
thing'.
Another Example Etymology
The English word friar has an interesting history…
modern English friar < Middle English frere, friar < Old French frere ‘brother’, also ‘member of a
religious order of brothers’ < Latin frāter ‘brother’
We have two kinds of links between the items in the etymology. The salmon-pink links mark
words that are inherited from an earlier stage of a language or from a parent language, and the blue
link marks a borrowing from one language into another.
NB. The ‘<’ symbol is commonly used in etymological sources to mean ‘is derived from’.
Etymons and Cognates
It would be useful to distinguish between the lexical entries in a lexicon and the words featured in
individual etymologies that do not belong to the languages covered by the lexicon.
E.g., an English language lexicon containing even a minimal amount of etymological information
might potentially contain thousands of French, Greek, and Latin words. An etymology might contain
form, sense, and semantic information for these words. If we encode these words as separate lexical
entries (without distinguishing them) we will end up with an English-French-Greek-Latin lexicon.
An etymon is a source word for another word. That is, it is a direct ancestor of a given word, whereas
a cognate has an ancestor in common with a word. For instance, the English word obligation has as
etymons the Latin words obligare and ligare, and the reconstructed Proto-Indo-European root *leig-. It has
cognates such as ligament and league in English, obbligare in Italian, and obrigado in Portuguese,
but apparently not ありがとう (arigato), which is a false cognate.
We will make both concepts (etymon and cognate) classes in our proposed vocabulary for etymologies.
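The etymon/cognate distinction can be sketched as a small ancestry graph. This is only an illustration: the `PARENTS` table, the `word (lang)` labels, and both function names are assumptions made for this sketch, and the edges are simplified (intermediate forms between, say, Latin ligare and English league are omitted).

```python
from typing import Dict, List, Set

# Toy "derives from" relation (hypothetical and simplified):
# each word maps to its direct source words.
PARENTS: Dict[str, List[str]] = {
    "obligation (en)": ["obligare (la)"],
    "obbligare (it)": ["obligare (la)"],
    "obrigado (pt)": ["obligare (la)"],
    "obligare (la)": ["ligare (la)"],
    "ligament (en)": ["ligare (la)"],
    "league (en)": ["ligare (la)"],
    "ligare (la)": ["*leig- (pie)"],
    "ありがとう (ja)": [],  # no source shared with the words above
}


def etymons(word: str) -> Set[str]:
    """Return every direct or indirect ancestor of `word`."""
    found: Set[str] = set()
    stack = list(PARENTS.get(word, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(PARENTS.get(current, []))
    return found


def cognates(word: str) -> Set[str]:
    """Return words that share an ancestor with `word` but are
    neither its ancestors nor its descendants."""
    own = etymons(word)
    known = set(PARENTS) | {p for ps in PARENTS.values() for p in ps}
    result: Set[str] = set()
    for other in known - {word} - own:
        theirs = etymons(other)
        if word not in theirs and own & theirs:
            result.add(other)
    return result


print(sorted(etymons("obligation (en)")))
print(sorted(cognates("obligation (en)")))
```

In this toy graph ありがとう has no ancestor in common with obligation, so it is never returned as a cognate, however similar it looks to obrigado.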
Etymologies Resemble Family Trees
Necessity of Including Uncertain Information
Time for Tempura*
Let’s introduce an example which we will model in detail. Since we’re in Japan
let’s look at the English (and Italian) word tempura which refers to a popular
Japanese dish of fried seafood prepared using a light, wheat-flour based batter.
The word comes from the Japanese ‘天麩羅’ (tenpura). But this word, like
the dish itself, was borrowed from Portuguese. However, there are two
competing etymologies. Tenpura comes from either:
- tempora, a Latin word used by Portuguese missionaries to refer to Catholic feast days
on which red meat could not be consumed, or
- tempero, a Portuguese noun meaning ‘condiment’ or ‘seasoning’, or the Portuguese
verb temperar meaning ‘to season’.
The example also shows that etymologies can potentially feature a variety of different languages,
scripts, and kinds of linguistic information (we need to assume that classes and properties exist
to model these).
It also shows the utility of modelling etymons/cognates as separate individuals (and not as ‘part of’
each etymology). How many times, for instance, will we come across the Latin word tempus in
etymologies for English words?
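The point about separate individuals can be illustrated with a rough sketch in which the two tempura hypotheses reuse the same word objects instead of each embedding its own copy. The `Word`, `WordRegistry`, and `Hypothesis` names, the language codes, and the chain layout are all assumptions made for this illustration; they are not part of any published vocabulary.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Word:
    lang: str       # language code (assumed convention for this sketch)
    form: str
    gloss: str = ""


class WordRegistry:
    """Hands out exactly one Word individual per (lang, form) pair."""

    def __init__(self) -> None:
        self._words: Dict[Tuple[str, str], Word] = {}

    def get(self, lang: str, form: str, gloss: str = "") -> Word:
        key = (lang, form)
        if key not in self._words:
            self._words[key] = Word(lang, form, gloss)
        return self._words[key]

    def __len__(self) -> int:
        return len(self._words)


@dataclass
class Hypothesis:
    label: str
    chain: List[Word]  # oldest word first


reg = WordRegistry()

# Hypothesis 1: Latin tempora, as used by Portuguese missionaries.
hyp_tempora = Hypothesis(
    "from Latin tempora",
    [reg.get("la", "tempora"), reg.get("ja", "天麩羅"), reg.get("en", "tempura")],
)

# Hypothesis 2: Portuguese tempero (related verb: temperar).
hyp_tempero = Hypothesis(
    "from Portuguese tempero",
    [reg.get("pt", "tempero", "condiment, seasoning"),
     reg.get("ja", "天麩羅"), reg.get("en", "tempura")],
)

# Both hypotheses point at the very same Japanese and English individuals.
print(hyp_tempora.chain[1] is hyp_tempero.chain[1])
print(len(reg))
```

Six chain positions resolve to only four individuals here, and the same registry would let a word such as Latin tempus be shared across every etymology that mentions it.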
Etymons, Cognates, and Etymologies
Our proposed new vocabulary, lemonEty, an extension of ontolex-lemon, features the classes Etymon
and Cognate. These are both subclasses of Lexical Entry and therefore inherit the properties of that
class while at the same time allowing for a distinction to be made with Lexical Entry.
We decided to reify the etymological shifts between different words, because of the usefulness of
attaching different kinds of information to these shifts, and have therefore created a new class called
Etymological Link.
Individuals of this class can be subtyped as ‘Inheritance’ or ‘Borrowing’ (or other). We then define
individuals of the class Etymology as consisting of a series of individuals belonging to Etymological
Link.
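The class structure described above can be mirrored in a few Python dataclasses, using the friar example. This is a sketch of the shape of the model only: the field names (`source`, `target`, `link_type`, `links`) and the `render` helper are assumptions of this illustration and are not the property names of the lemonEty vocabulary. The classification of each friar link as inheritance or borrowing is likewise my reading of the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class LinkType(Enum):
    INHERITANCE = "inheritance"
    BORROWING = "borrowing"
    OTHER = "other"


@dataclass
class LexicalEntry:
    lang: str
    form: str


@dataclass
class Etymon(LexicalEntry):
    """A source word; inherits everything from LexicalEntry."""


@dataclass
class Cognate(LexicalEntry):
    """A word sharing an ancestor with the entry being described."""


@dataclass
class EtymologicalLink:
    """A reified shift between two words, so notes can be attached to it."""
    source: LexicalEntry
    target: LexicalEntry
    link_type: LinkType
    note: str = ""


@dataclass
class Etymology:
    """A series of links, stored oldest first, ending at `entry`."""
    entry: LexicalEntry
    links: List[EtymologicalLink] = field(default_factory=list)

    def render(self) -> str:
        parts = [f"{self.entry.form} ({self.entry.lang})"]
        for link in reversed(self.links):
            parts.append(f"{link.source.form} ({link.source.lang})")
        return " < ".join(parts)


frater = Etymon("Latin", "frāter")
old_french = Etymon("Old French", "frere")
middle_english = Etymon("Middle English", "frere")
friar = LexicalEntry("English", "friar")

friar_etymology = Etymology(
    entry=friar,
    links=[
        EtymologicalLink(frater, old_french, LinkType.INHERITANCE),
        EtymologicalLink(old_french, middle_english, LinkType.BORROWING),
        EtymologicalLink(middle_english, friar, LinkType.INHERITANCE),
    ],
)

print(friar_etymology.render())
```

Because each link is an object in its own right, a confidence level or a bibliographic reference could later be attached to one step without touching the words at either end.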
lemonEty - the core of the model
The diagram on the right represents the core new
elements in the lemonEty vocabulary.
An initial version of this vocabulary has been published, but it will need to be extended/modified in
order, e.g., to deal more explicitly with ‘etymologies’ as hypotheses and to represent levels of confidence,
along with attestations and references to the secondary literature.
We have seen how etymologies are similar to family trees and these are fairly straightforward to
represent in RDF/OWL (there exists a popular OWL tutorial that uses a family tree as its guiding
example). Representing the dynamic/temporal aspect of etymologies is a little bit trickier... Luckily
I’m out of time!
Thank You!
どうもありがとう
Gramercy
Iċ þē þancie
Obrigado
Merci Beaucoup