Ilak Pos Tagging
Parts of Speech
In the West, the idea of parts of speech goes back perhaps to Aristotle (384–322 BC).
Also known as lexical categories, word classes, “tags”, or POS.
The idea, still with us, that there are 8 parts of speech comes from
Dionysius Thrax of Alexandria (c. 100 BC).
But his 8 aren’t exactly the ones we are taught today:
Thrax: noun, verb, article, adverb, preposition, conjunction, participle, pronoun
School grammar: noun, verb, adjective, adverb, preposition, conjunction, pronoun,
interjection
POS Tagging
POS TAGGING 5
English POS Tagsets
Original Brown corpus used a large set of 87 POS tags.
Most common in NLP today is the Penn Treebank set of 45 tags.
Reduced from the Brown set for use in the context of a parsed corpus (i.e.
treebank).
The C5 tagset used for the British National Corpus (BNC) has
61 tags.
Why POS
POS tell us a lot about a word (and the words near it).
E.g., adjectives are often followed by nouns,
personal pronouns are often followed by verbs,
possessive pronouns by nouns.
Pronunciation depends on POS, e.g.
object (stress on the first syllable as a noun, NN; on the second as a verb, VM), content, discount
First step in many NLP applications
Word Classes
Basic word classes: Noun, Verb, Adjective, Adverb, Preposition, …
Open vs. Closed classes
◦ Open:
  ◦ Nouns, Verbs, Adjectives, Adverbs.
  ◦ Why “open”?
◦ Closed:
  ◦ determiners: a, an, the
  ◦ pronouns: she, he, I
  ◦ prepositions: on, under, over, near, by, …
Closed vs. Open Class
Closed class categories are composed of a small, fixed set of
grammatical function words for a given language.
prepositions: on, under, over, …
particles: up, down, on, off, …
determiners: a, an, the, …
pronouns: she, who, I, …
conjunctions: and, but, or, …
auxiliary verbs: can, may, should, …
Closed vs. Open Class
Open class categories have a large number of words, and new ones are
easily invented.
New nouns: Internet, website, URL, CD-ROM, email, newsgroup,
bitmap, modem, multimedia
New verbs: download, upload, reboot, right-click, double-click, Google
New adjectives: geeky
New adverbs: chompingly
English Parts of Speech (Nouns)
Noun (person, place or thing)
Singular (NN): dog, fork
English Parts of Speech (Nouns)
Proper nouns (Penn, Philadelphia, Davidson)
English Parts of Speech (Adjectives)
Adjective (modify nouns, identify properties or qualities of nouns)
Basic (JJ): red, tall
Comparative (JJR): redder, taller
Superlative (JJS): reddest, tallest
Adjective ordering restrictions in English:
Old blue book, not Blue old book
the 44th president
a green product
a responsible investment
the dumbest, worst leader
English Parts of Speech (Adverbs)
Adverb (modify verbs)
Basic (RB): quickly
Comparative (RBR): quicker
Superlative (RBS): quickest
Unfortunately, John walked home extremely slowly yesterday
Directional/locative adverbs (here, downhill)
Degree adverbs (extremely, very, somewhat)
Manner adverbs (slowly, slinkily, delicately)
Temporal adverbs (yesterday, tomorrow)
English Parts of Speech (Determiner)
A determiner is a word that occurs together with a noun or noun phrase and serves to
express the reference of that noun or noun phrase in the context.
That is, a determiner may indicate whether the noun is referring to a
definite or indefinite element of a class, to a closer or more distant
element, to an element belonging to a specified person or thing, to a
particular number or quantity, etc.
English Parts of Speech (Preposition)
Preposition (IN): a word governing, and usually preceding, a noun or
pronoun and expressing a relation to another word or element in the
clause, as in ‘the man on the platform’, ‘she arrived after dinner’.
Ex: on, in, by, to, with
English Parts of Speech
Coordinating Conjunction (CC): a word that connects words, phrases, clauses or
sentences.
the truth of nature, and the power of giving interest
Ex: and, but, or.
Particle (RP): a particle is a function word that must be associated with
another word or phrase to impart meaning, i.e., does not have its own
lexical definition.
Ex: off (took off), up (put up)
POS tagging
POS tagging is the process of assigning each word in a
sentence a suitable tag from a given set of tags.
Tagging is the assignment of a single part-of-speech tag to
each word (and punctuation marker) in a corpus.
POS tagging
There are many parts of speech and many potential distinctions we can
draw.
To do POS tagging, we need to choose a standard set of tags to
work with.
We could pick a very coarse tag set:
N, V, Adj, Adv.
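One way to see what a coarse tag set means in practice is to collapse fine-grained tags into the four classes above. The sketch below is only illustrative: the prefix table `COARSE_BY_PREFIX` and the helper `coarsen` are hypothetical names, and the mapping assumes Penn-style tags whose first two letters identify the class.

```python
# Hypothetical prefix table: assumes Penn-style tags (NN*, VB*, JJ*, RB*).
COARSE_BY_PREFIX = {"NN": "N", "VB": "V", "JJ": "Adj", "RB": "Adv"}

def coarsen(tag: str) -> str:
    """Collapse a fine-grained tag to N, V, Adj, Adv, or Other."""
    return COARSE_BY_PREFIX.get(tag[:2], "Other")

fine_tags = ["NN", "NNS", "VBD", "JJR", "RBS", "DT"]
coarse_tags = [coarsen(t) for t in fine_tags]
print(coarse_tags)
```

Everything outside the four classes (determiners, prepositions, and so on) falls into `Other` here; a real coarse tag set would have to decide how to treat those.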
Measuring Ambiguity
How Hard is POS Tagging?
About 11% of the word types in the Brown corpus are ambiguous
with regard to part of speech
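A figure like this can be computed from any tagged corpus by counting how many word types occur with more than one tag. The sketch below is a minimal version under stated assumptions: the function name `ambiguous_type_fraction` is hypothetical, words are lowercased before counting (a choice that changes the result), and the toy corpus is invented for illustration and is not Brown corpus data.

```python
from collections import defaultdict

def ambiguous_type_fraction(tagged_tokens):
    """Return the fraction of word types seen with more than one tag."""
    tags_by_word = defaultdict(set)
    for word, tag in tagged_tokens:
        # Assumption: case is ignored when deciding what counts as one type.
        tags_by_word[word.lower()].add(tag)
    if not tags_by_word:
        return 0.0
    ambiguous = sum(1 for tags in tags_by_word.values() if len(tags) > 1)
    return ambiguous / len(tags_by_word)

# Toy corpus invented for illustration; not Brown corpus data.
toy_corpus = [
    ("the", "DT"), ("back", "NN"), ("door", "NN"),
    ("back", "VB"), ("the", "DT"), ("bill", "NN"),
]
fraction = ambiguous_type_fraction(toy_corpus)
print(f"{fraction:.2f}")
```

Note that this counts types, not tokens: ambiguous words tend to be frequent, so the share of ambiguous tokens in running text is typically higher than the share of ambiguous types.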
Penn TreeBank POS Tagset
Using the Penn Tagset
The/DT grand/JJ jury/NN commented/VBD on/IN a/DT
number/NN of/IN other/JJ topics/NNS ./.
Prepositions and subordinating conjunctions are marked IN
(“although/IN I/PRP..”)
Except the preposition/complementizer “to”, which is just marked
“TO”.
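The word/TAG notation is easy to read back into a program. The sketch below assumes whitespace-separated tokens with the tag after the last slash; `parse_tagged` is a hypothetical helper, not part of any particular toolkit.

```python
def parse_tagged(text):
    """Split 'word/TAG' tokens into (word, tag) pairs."""
    pairs = []
    for token in text.split():
        # rpartition splits on the last slash, so './.' becomes ('.', '.').
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs

sentence = ("The/DT grand/JJ jury/NN commented/VBD on/IN a/DT "
            "number/NN of/IN other/JJ topics/NNS ./.")
pairs = parse_tagged(sentence)
print(pairs[:3])
# Words tagged IN (prepositions and subordinating conjunctions).
print([word for word, tag in pairs if tag == "IN"])
```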
Process
List all possible tags for each word in the sentence.
Choose the most suitable tag sequence.
Example
“People jump high”.
People : Noun/Verb
jump : Noun/Verb
high : Noun/Verb/Adjective
We can start with probabilities.
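The first step of the process can be sketched by enumerating every tag sequence the example allows. This is a brute-force illustration only: the `candidates` table copies the tags listed for the three words, and no scoring is attempted here, since the probabilities are not given.

```python
from itertools import product

# Candidate tags per word, as listed in the example.
candidates = {
    "People": ["Noun", "Verb"],
    "jump": ["Noun", "Verb"],
    "high": ["Noun", "Verb", "Adjective"],
}

words = ["People", "jump", "high"]
# One tag choice per word, in every possible combination.
sequences = list(product(*(candidates[w] for w in words)))
print(len(sequences))  # 2 * 2 * 3 combinations
for seq in sequences[:3]:
    print(list(zip(words, seq)))
```

The number of sequences is the product of the per-word tag counts, so it grows quickly with sentence length. That is why a tagger needs a way to rank sequences, such as probabilities, rather than inspecting each one.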
How difficult is POS tagging?
JJ
NNP NNS VBD VBN .
Intrinsic flaws remained undetected .
Step1: Start with a Dictionary
she: PRP
promised: VBN, VBD
to: TO
back: VB, JJ, RB, NN
the: DT
bill: NN, VB
Etc… for the ~100,000 words of English with more than 1 tag
Step2: Assign Every Possible Tag
                   NN
                   RB
     VBN           JJ         VB
PRP  VBD       TO  VB    DT   NN
She  promised  to  back  the  bill
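Steps 1 and 2 amount to a dictionary lookup per word. The sketch below assumes a tiny dictionary covering only the six words of the example (with `promised` as VBN or VBD); `TAG_DICT` and `assign_all_tags` are hypothetical names, and unknown words are not handled.

```python
# Tag dictionary restricted to the words in the example sentence.
TAG_DICT = {
    "she": ["PRP"],
    "promised": ["VBN", "VBD"],
    "to": ["TO"],
    "back": ["VB", "JJ", "RB", "NN"],
    "the": ["DT"],
    "bill": ["NN", "VB"],
}

def assign_all_tags(words, tag_dict):
    """Return a (word, candidate_tags) pair for every word."""
    lattice = []
    for word in words:
        # Raises KeyError for unknown words; a real tagger needs a fallback.
        lattice.append((word, list(tag_dict[word.lower()])))
    return lattice

lattice = assign_all_tags("She promised to back the bill".split(), TAG_DICT)
for word, tags in lattice:
    print(f"{word:10s} {' '.join(tags)}")
```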
Step3: Write Rules to Eliminate Tags
     VBN           RB JJ      VB
PRP  VBD       TO  VB    DT   NN
She  promised  to  back  the  bill
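Rule-based elimination can be sketched as a pass over the candidate lattice that removes tags ruled out by the left context. The three rules below are illustrative assumptions written for this one sentence, not the rules of any actual tagger, and `apply_rules` is a hypothetical name.

```python
def apply_rules(lattice):
    """Prune candidate tags with simple left-context rules (illustrative only)."""
    pruned = []
    prev_tags = ["<start>"]
    for word, tags in lattice:
        tags = list(tags)
        # Rule 1 (assumed): after a pronoun, drop VBN when VBD is an option.
        if prev_tags == ["PRP"] and "VBD" in tags and "VBN" in tags:
            tags.remove("VBN")
        # Rule 2 (assumed): after TO, keep only the base verb if available.
        if prev_tags == ["TO"] and "VB" in tags:
            tags = ["VB"]
        # Rule 3 (assumed): after a determiner, drop VB if another tag remains.
        if prev_tags == ["DT"] and "VB" in tags and len(tags) > 1:
            tags.remove("VB")
        pruned.append((word, tags))
        prev_tags = tags
    return pruned

lattice = [
    ("She", ["PRP"]),
    ("promised", ["VBN", "VBD"]),
    ("to", ["TO"]),
    ("back", ["VB", "JJ", "RB", "NN"]),
    ("the", ["DT"]),
    ("bill", ["NN", "VB"]),
]
result = apply_rules(lattice)
print(" ".join(f"{word}/{tags[0]}" for word, tags in result))
```

Rules this simple will misfire on other sentences (for example, when "to" is a preposition), which is why hand-written rule sets in practice grow large and carry many exceptions.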
END