Intro NLP
1. Intro to NLP
What is NLP?
Natural Language Processing (NLP) is a field in
Artificial Intelligence (AI) devoted to creating
computers that use natural language as input
and/or output.
Why NLP?
Early days of NLP:
Machines that Can Speak
Machines that Can Speak (cont.)
• C-3PO in Star Wars
• KITT in Knight Rider
But Still Science Fiction…
Test for Intelligence – Turing Test
Alan Turing (1950) proposed a
test of a machine's ability to
carry on a human-like
conversation.
Early Conversational Programs
• ELIZA (by Joseph Weizenbaum), 1966
– Simulated a psychotherapist
– No real understanding; simple pattern matching
maps user input to canned responses
>> Hello.
How do you do. Please state your problem.
>> Men are all alike.
In what way ?
>> They're alwas bugging us about something or other.
I'm not sure I understand you fully.
>> They're always bugging us about something or other.
Can you think of a specific example ?
>> Well, my boyfriend made me come here.
Your boyfriend made you come here ?
>> He says I'm depressed much of the time.
I am sorry to hear that you are depressed.
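The exchange above can be imitated with a handful of ordered pattern/response rules. The sketch below is a minimal Python illustration, not Weizenbaum's original script: the `RULES` list and `FALLBACK` string are hypothetical and were written only to reproduce this transcript. Note how the misspelled "alwas" matches no rule and falls through to the generic reply, which reflects the lack of real understanding noted above.

```python
import re

# Hypothetical ELIZA-style rules (regex, response template), written only to
# reproduce the sample exchange; not Weizenbaum's original script.
# Rules are tried in order and the first match wins.
RULES = [
    (re.compile(r"\bhello\b", re.I), "How do you do. Please state your problem."),
    (re.compile(r"\ball alike\b", re.I), "In what way ?"),
    (re.compile(r"\balways\b", re.I), "Can you think of a specific example ?"),
    # Captured words are echoed back, turning "my ... me" into "your ... you"
    (re.compile(r"\bmy (\w+) made me (.*?)\.?$", re.I), "Your {0} made you {1} ?"),
    (re.compile(r"\bI'm depressed\b", re.I), "I am sorry to hear that you are depressed."),
]
# Canned reply used when no pattern matches
FALLBACK = "I'm not sure I understand you fully."


def respond(utterance: str) -> str:
    """Return the response of the first rule whose pattern matches."""
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            return template.format(*match.groups())
    return FALLBACK


if __name__ == "__main__":
    for line in ["Hello.", "Men are all alike.",
                 "They're alwas bugging us about something or other.",
                 "Well, my boyfriend made me come here."]:
        print(">>", line)
        print(respond(line))
```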
Modern NLP:
NLP in the Days of Big Data
Three trends:
1. An enormous amount of information is now
available in machine readable form as natural
language text (newspapers, web pages, medical
records, financial filings, product reviews, discussion
forums, etc.)
2. Conversational agents are becoming an important
form of human-computer communication
3. Much of human-human interaction is now mediated by
computers via social media
Text Analytics (cont.)
• Typically this involves the extraction of limited kinds of
semantic and pragmatic information from texts
– Entity mentions
– Concept identification
– Sentiment
• Concept Extraction
– http://aylien.com/concept-extraction/
Conversational Agents
• Combine
– Speech recognition/synthesis
– Question answering
• From the web and from structured information sources (Freebase,
DBpedia, YAGO, etc.)
– Simple agent-like abilities
• Create/edit calendar entries
• Reminders
• Directions
• Invoking/interacting with other apps
Question Answering
• Traditional information retrieval returns documents or other
resources that users must then read to satisfy their
information needs.
• Question answering, on the other hand, directly provides
an answer to an information need posed as a question.
https://www.youtube.com/watch?v=WFR3lOm_xhE
Machine Translation
• The automatic translation of texts between languages is one of the
oldest non-numerical applications in Computer Science.
• In the past 15 years or so, MT has gone from a niche academic
curiosity to a robust commercial industry.
Text Mining Applications – Unsupervised
• Text clustering
• Trend analysis
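Text clustering groups documents without any labels. As a minimal sketch, assuming documents are represented as bag-of-words vectors compared by cosine similarity, the single-pass "leader" algorithm below puts each document into the first cluster whose first member is similar enough, and otherwise starts a new cluster. The threshold of 0.3 and the four example documents are arbitrary assumptions; k-means or hierarchical clustering over TF-IDF vectors are more common choices in practice.

```python
import math
from collections import Counter


def bow(text: str) -> Counter:
    """Bag-of-words vector: lowercase whitespace tokens and their counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def leader_cluster(docs, threshold=0.3):
    """Single-pass clustering: each document joins the first cluster whose
    leader (first member) is at least `threshold` similar; otherwise it
    starts a new cluster. Returns lists of document indices."""
    vectors = [bow(d) for d in docs]
    clusters = []
    for i, v in enumerate(vectors):
        for cluster in clusters:
            if cosine(v, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Hypothetical example documents: two about markets, two about football
DOCS = [
    "stock prices fell sharply today",
    "stock market prices rose today",
    "the team won the football match",
    "football fans celebrated the match",
]

if __name__ == "__main__":
    print(leader_cluster(DOCS))  # → [[0, 1], [2, 3]]
```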
Text Mining Applications – Supervised
– Many typical predictive modeling or
classification applications can be
enhanced by incorporating textual data in
addition to traditional input variables.
• churn propensity models that include
customer center notes, website forms,
e-mails, and Twitter messages
• hospital admission prediction models
incorporating medical record notes as a
new source of information
• insurance fraud modeling using adjuster
notes
• sentiment categorization (next page)
• stylometry or forensic applications that
identify the author of a particular writing
sample
Sentiment Analysis
• The field of sentiment analysis deals with categorization (or
classification) of opinions expressed in textual documents
In the example figure, green marks positive tone and red marks negative tone;
product features and model names are highlighted in blue and brown, respectively.
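Sentiment categorization is usually treated as supervised text classification. One standard baseline is multinomial Naive Bayes with add-one (Laplace) smoothing; the sketch below implements it from scratch, assuming whitespace tokenization and a tiny made-up training set of product reviews labeled "pos" or "neg". It illustrates the technique only and is not the system behind the figure.

```python
import math
from collections import Counter, defaultdict


def train_nb(examples):
    """examples: list of (text, label). Returns document counts per label,
    word counts per label, and the training vocabulary."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for w in text.lower().split():
            word_counts[label][w] += 1
            vocab.add(w)
    return label_counts, word_counts, vocab


def classify_nb(model, text):
    """Pick the label maximizing log P(label) + sum of log P(word | label)."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)
        # Add-one smoothing: every vocabulary word gets one pseudo-count
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in text.lower().split():
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labeled reviews
TRAIN = [
    ("great battery and excellent screen", "pos"),
    ("love the camera great value", "pos"),
    ("terrible battery poor screen", "neg"),
    ("awful camera and poor value", "neg"),
]

if __name__ == "__main__":
    model = train_nb(TRAIN)
    print(classify_nb(model, "excellent camera"))  # → pos
    print(classify_nb(model, "poor battery"))      # → neg
```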
Structured + Text Data in Predictive Models
• Use of both types of data in building predictive
models.
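A common way to use both kinds of data is to concatenate the structured fields with numeric features derived from the text. The sketch below assumes a hypothetical customer record with `tenure_months`, `monthly_charge`, and free-text `notes` fields, and a small fixed vocabulary `VOCAB` chosen for illustration; the resulting vector could then be fed to any standard classifier, for example a churn propensity model.

```python
from collections import Counter

# Hypothetical fixed vocabulary; in practice it would be learned from the
# training corpus (and often weighted with TF-IDF rather than raw counts).
VOCAB = ["cancel", "complaint", "happy", "refund"]


def combined_features(record: dict) -> list:
    """Concatenate structured fields with bag-of-words counts from free text.
    The field names are illustrative assumptions, not a fixed schema."""
    structured = [float(record["tenure_months"]), float(record["monthly_charge"])]
    counts = Counter(record["notes"].lower().split())
    text_features = [float(counts[w]) for w in VOCAB]
    return structured + text_features


if __name__ == "__main__":
    record = {
        "tenure_months": 3,
        "monthly_charge": 70.0,
        "notes": "Customer asked to cancel and wants a refund",
    }
    print(combined_features(record))  # → [3.0, 70.0, 1.0, 0.0, 0.0, 1.0]
```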
1. Part-Of-Speech (POS) Tagging
• POS tagging is a process of assigning a POS or lexical
class marker to each word in a sentence (and all
sentences in a corpus).
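A simple baseline for POS tagging assigns each word the tag it received most often in a tagged training corpus. The sketch below assumes Penn Treebank tags (DT, NN, VBZ), a three-sentence toy corpus, and NN as the default tag for unknown words. Because it ignores context, it cannot tell the noun "book" from the verb "book"; practical taggers use sequence models such as HMMs, CRFs, or neural networks.

```python
from collections import Counter, defaultdict


def train_most_frequent_tag(tagged_sentences):
    """Map each (lowercased) word to the tag it received most often."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}


def tag(model, words, default="NN"):
    """Tag each word independently; unknown words get the default tag."""
    return [(w, model.get(w.lower(), default)) for w in words]


# Hypothetical toy training corpus with Penn Treebank tags
CORPUS = [
    [("The", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("A", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("The", "DT"), ("cat", "NN"), ("barks", "VBZ")],
]

if __name__ == "__main__":
    model = train_most_frequent_tag(CORPUS)
    print(tag(model, ["The", "cat", "sleeps"]))
```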
2. Named Entity Recognition (NER)
• NER is the task of processing a text and identifying the named entities
(e.g., persons, organizations, locations) in each sentence
– e.g. “U.N. official Ekeus heads for Baghdad.”
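The simplest possible NER approach is dictionary (gazetteer) lookup. The sketch below tags the example sentence using a hypothetical three-entry gazetteer with ORG, PER, and LOC labels; real NER systems are usually trained on annotated data so they can recognize names that are not in any list.

```python
import re

# Hypothetical hand-built gazetteer mapping names to entity types
GAZETTEER = {
    "U.N.": "ORG",
    "Ekeus": "PER",
    "Baghdad": "LOC",
}


def gazetteer_ner(sentence: str):
    """Return (entity, type) pairs for gazetteer names found in the sentence,
    in order of appearance."""
    found = []
    for name, etype in GAZETTEER.items():
        # re.escape makes the dots in "U.N." literal; the lookarounds stop
        # matches inside longer words.
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        for m in re.finditer(pattern, sentence):
            found.append((m.start(), name, etype))
    return [(name, etype) for _, name, etype in sorted(found)]


if __name__ == "__main__":
    print(gazetteer_ner("U.N. official Ekeus heads for Baghdad."))
```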
3. Shallow Parsing
• Shallow (or partial) parsing identifies the (base) syntactic phrases in
a sentence.
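Shallow parsing is often done with patterns over POS tags. As a sketch, assuming the sentence is already tagged with Penn Treebank tags, the function below finds base noun phrases that match an optional determiner, any number of adjectives, and one or more nouns. Only NP chunks are shown; the example sentence and the pattern are illustrative assumptions.

```python
import re


def np_chunk(tagged):
    """Return base NP chunks matching (DT)? (JJ)* (NN|NNS|NNP|NNPS)+."""
    # Encode each tag as a single symbol so a regex can scan the tag sequence:
    # D = determiner, J = adjective, N = any noun tag, O = everything else.
    symbols = "".join(
        "D" if t == "DT" else "J" if t == "JJ" else "N" if t.startswith("NN") else "O"
        for _, t in tagged
    )
    chunks = []
    for m in re.finditer(r"D?J*N+", symbols):
        chunks.append(" ".join(w for w, _ in tagged[m.start():m.end()]))
    return chunks


# Hypothetical pre-tagged sentence
TAGGED = [("The", "DT"), ("little", "JJ"), ("dog", "NN"), ("barked", "VBD"),
          ("at", "IN"), ("the", "DT"), ("black", "JJ"), ("cat", "NN")]

if __name__ == "__main__":
    print(np_chunk(TAGGED))  # → ['The little dog', 'the black cat']
```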
Source: J. Choi, CSE842, MSU
Bridgestone Sports Co. said Friday it had set up a joint venture in Taiwan with a
local concern and a Japanese trading house to produce golf clubs to be supplied
to Japan.
The joint venture, Bridgestone Sports Taiwan Co., capitalized at 20 million new
Taiwan dollars, will start production in January 1990 with production of 20,000
iron and “metal wood” clubs a month.
Template filling:
TIE-UP-1
Relationship: TIE-UP
Entities: “Bridgestone Sports Co.”, “a local concern”, “a Japanese trading house”
Joint Venture Company: “Bridgestone Sports Taiwan Co.”
Activity: ACTIVITY-1
Amount: NT$20000000
ACTIVITY-1
Activity: PRODUCTION
Company: “Bridgestone Sports Taiwan Co.”
Product: “iron and ‘metal wood’ clubs”
Start Date: DURING: January 1990
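Some template slots can be filled with hand-written patterns over the text. The sketch below fills three ACTIVITY-1 slots from the second sentence of the example; the regular expressions and the `fill_activity_template` function are hypothetical and tuned to this one sentence, whereas practical information extraction systems typically build on components such as NER and shallow parsing.

```python
import re

TEXT = ("The joint venture, Bridgestone Sports Taiwan Co., capitalized at 20 million new "
        "Taiwan dollars, will start production in January 1990 with production of 20,000 "
        "iron and \"metal wood\" clubs a month.")

MONTHS = "January|February|March|April|May|June|July|August|September|October|November|December"


def fill_activity_template(text: str) -> dict:
    """Fill a few slots using hypothetical hand-written patterns."""
    template = {}
    # "capitalized at N million new Taiwan dollars" -> amount in NT$
    amount = re.search(r"capitalized at (\d+) million new Taiwan dollars", text)
    if amount:
        template["Amount"] = "NT$" + str(int(amount.group(1)) * 1_000_000)
    # "start production in <Month> <Year>" -> start date
    start = re.search(r"start production in ((?:" + MONTHS + r") \d{4})", text)
    if start:
        template["Start Date"] = "DURING: " + start.group(1)
    # "The joint venture, <Name> Co." -> company name
    company = re.search(r"The joint venture, ([A-Z][\w ]*? Co\.)", text)
    if company:
        template["Company"] = company.group(1)
    return template


if __name__ == "__main__":
    for slot, value in fill_activity_template(TEXT).items():
        print(f"{slot}: {value}")
```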
But NLP Is Very Hard…
• Understanding natural languages is hard …
because of inherent ambiguity
• Engineering NLP systems is also hard …
because of:
– Large data resources are needed (e.g.,
grammars, dictionaries, and documents to extract
statistics from)
– Analyzing a sentence can be computationally
complex, even intractable
Ambiguity (1)
The Bottom Line
• Complete NL Understanding (thus general
intelligence) is impossible.
• But we can make incremental progress.
• We have also achieved success in limited domains.
The Big Picture Approach
All of these applications operate by exploiting
underlying regularities in human languages.
Sometimes in complex ways, sometimes in pretty trivial
ways.
Topics: Techniques
• Finite-state methods
• Context-free methods
• Probabilistic models
• Neural network models
• Supervised machine learning methods
Figure: levels of analysis: Morphological Processing, Syntactic Analysis, Semantic Interpretation