Chapter 4
ARTIFICIAL INTELLIGENCE
Natural Language Processing
INTRODUCTION
Natural Language Processing
• Natural language processing (NLP) is a branch of AI
that gives machines the ability to read,
understand, and derive meaning from human
languages.
• NLP combines the fields of linguistics and computer
science to decipher language structure and
rules, and to build models that can
comprehend, break down, and extract significant
details from text and speech data.
• For these two fields to work together, computers need to process human
language in the form of text or speech data and understand the full
meaning of the words.
• This is challenging because:
• Language can be ambiguous; for example, "bank" can mean a financial institution or the land beside a river.
• The meaning of a word varies depending on the context.
• When solving NLP problems, it is important to consider how the AI system will
learn the meaning of words and how it will recognize potential mistakes.
• Several NLP tasks break down human text and speech data in ways that
help the computer make sense of what it's ingesting:
• Speech recognition, also called speech-to-text, is the task of reliably
converting voice data into text data.
• Part of speech tagging, also called grammatical tagging, is the process
of determining the part of speech of a particular word or piece of text
based on its use and context.
• Word sense disambiguation is the selection of the meaning of a word
with multiple meanings through a process of semantic analysis that
determines which meaning makes the most sense in the given context.
• Named entity recognition identifies words or phrases as useful entities, for example
identifying "London" as a location or "Fred" as a person's name.
• Co-reference resolution is the task of identifying if and when two words
refer to the same entity.
• Sentiment analysis attempts to extract subjective qualities—attitudes,
emotions, sarcasm, confusion, suspicion—from text.
• Natural language generation is sometimes described as the opposite of
speech recognition; it's the task of putting structured information into
human language.
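Word sense disambiguation can be illustrated with a simplified version of the Lesk algorithm, which picks the sense whose dictionary definition (gloss) shares the most words with the surrounding sentence. This is a swapped-in technique chosen for illustration, not necessarily the method this chapter has in mind; the two-sense inventory for "bank" and the stopword list are hypothetical.

```python
# Hypothetical sense inventory: each sense of "bank" has a short gloss.
SENSES = {
    "bank": {
        "bank.financial": "an institution that accepts deposits of money and makes loans",
        "bank.river": "the sloping land beside a body of water such as a river",
    }
}

# Hypothetical stopword list: function words that carry little meaning on their own.
STOPWORDS = {"a", "an", "the", "of", "and", "that", "such", "as", "on",
             "to", "in", "we", "i", "my", "at", "by", "is"}

def content_words(text):
    """Lowercase, strip punctuation, and drop stopwords, returning a set of words."""
    words = (w.strip(".,;:!?").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}

def simplified_lesk(word, sentence):
    """Return the sense of `word` whose gloss overlaps most with the sentence."""
    context = content_words(sentence)
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context & content_words(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

print(simplified_lesk("bank", "I deposited money at the bank to get a loan"))
print(simplified_lesk("bank", "We sat on the bank of the river and watched the water"))
```

In the first sentence only "money" overlaps with the financial gloss; in the second, "river" and "water" overlap with the riverside gloss. Exact word matching misses related forms such as "loan" and "loans", which is one reason practical systems use lemmatization or learned representations.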
INTRODUCTION
Natural Language Processing use cases
• NLP is the driving force behind machine intelligence in many modern real-world
applications. Here are a few examples:
• Spam detection: the best spam detection technologies use NLP's text classification
capabilities to scan emails for language that often indicates spam or phishing.
• Machine translation: Google Translate is an example of widely available NLP
technology at work. A great way to test any machine translation tool is to
translate text into another language and then back into the original, and
compare the result with the source text.
• Virtual agents and chatbots: Virtual agents such as Apple's Siri and Amazon's
Alexa use speech recognition to recognize patterns in voice commands and
natural language generation to respond with appropriate action or helpful
comments.
• Social media sentiment analysis: NLP has become an essential business tool
for uncovering hidden data insights from social media channels.
• Text summarization: Text summarization uses NLP techniques to digest huge
volumes of digital text and create summaries and synopses for indexes,
research databases, or busy readers who don't have time to read full text.
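The text classification behind spam detection can be sketched with a multinomial Naive Bayes classifier, one classic technique for this task (named here as an illustration, not as the method any particular product uses). The six training messages are made up, and the tokenizer simply lowercases words and strips surrounding punctuation.

```python
import math
from collections import Counter

PUNCTUATION = ".,!?:;"

def tokenize(text):
    """Lowercase each whitespace-separated word and strip surrounding punctuation."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]

class NaiveBayesSpamFilter:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)  # documents per class, used for the priors
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for c in self.classes:
            score = math.log(self.doc_counts[c] / total_docs)  # log prior P(c)
            total_words = sum(self.word_counts[c].values())
            for w in tokenize(text):
                if w not in self.vocab:
                    continue  # skip words never seen during training
                # log P(w | c) with add-one smoothing
                score += math.log((self.word_counts[c][w] + 1) / (total_words + len(self.vocab)))
            scores[c] = score
        return max(scores, key=scores.get)

# Hypothetical training data.
train_texts = [
    "Win a free prize now",
    "Claim your free money today",
    "Urgent: verify your account password",
    "Meeting moved to Monday afternoon",
    "Please review the attached report",
    "Lunch with the team tomorrow",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

spam_filter = NaiveBayesSpamFilter().fit(train_texts, train_labels)
print(spam_filter.predict("Claim your free prize now"))
print(spam_filter.predict("Please review the meeting report"))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and add-one smoothing keeps a word seen in only one class from driving the other class's probability to zero.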
• One of the first steps in NLP is to preprocess the inputs (data) so that
the machine can better understand the intended meaning.
• This chapter focuses on data preprocessing for NLP.
DATA PREPROCESSING
Result of Sentence Segmentation:
London is the capital and most populous city of England and the United Kingdom.
Standing on the River Thames in the southeast of the island of Great Britain, London has been a major settlement for two millennia.
It was founded by the Romans, who named it Londinium.
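Sentence segmentation can be approximated with a regular expression that splits after ".", "!" or "?" when the next word starts with a capital letter. This rule is an assumption for illustration; it reproduces the three London sentences, but it would wrongly split abbreviations such as "Dr. Smith", which is why practical segmenters use abbreviation lists or trained models.

```python
import re

def segment_sentences(text):
    """Split text after '.', '!' or '?' when whitespace and an uppercase letter follow."""
    # The lookbehind keeps the punctuation with its sentence;
    # the lookahead requires a capital letter to start the next sentence.
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text.strip())
    return [p for p in parts if p]

text = ("London is the capital and most populous city of England and the United Kingdom. "
        "Standing on the River Thames in the southeast of the island of Great Britain, "
        "London has been a major settlement for two millennia. "
        "It was founded by the Romans, who named it Londinium.")

for sentence in segment_sentences(text):
    print(sentence)
```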
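DATA PREPROCESSING PIPELINE
Stemming
• Even though a stemmer is simple to use and runs very fast, there
is a danger of over-stemming: too much of a word is cut off, so words with
different meanings end up with the same stem, or the stem is not a real word:
• news -> new
• make -> mak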
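A toy suffix-stripping stemmer reproduces both over-stemming examples. The suffix list and the minimum stem length are hypothetical choices for illustration; this is not the Porter stemmer or any library's algorithm, although real stemmers also work by removing suffixes according to rules.

```python
# Hypothetical suffix rules, tried in order; the first one that matches is removed.
SUFFIXES = ["ing", "ed", "es", "s", "e"]

def toy_stem(word, min_stem=3):
    """Strip the first matching suffix, provided at least `min_stem` characters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word

for word in ["walking", "jumped", "cats", "news", "make"]:
    print(word, "->", toy_stem(word))
```

Because the rules only look at the end of the word, the stemmer cannot know that "news" is not the plural of "new", or that the final "e" in "make" is part of the root. That lack of vocabulary knowledge is what makes stemming fast and also what causes over-stemming.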
Lemmatization
• Similar to stemming, as it maps several words to one common root.
• Groups together the different inflected forms of a word so they can be
analyzed as a single item, called the lemma.
• The output of lemmatization is a proper word.
• Lemmatization is more computationally intensive (hence, slower) than stemming,
but it is more accurate.
• For example, a lemmatizer should map gone, going -> go
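A lemmatizer needs vocabulary knowledge that a stemmer lacks. The minimal sketch below assumes a tiny, hypothetical lemma dictionary; real lemmatizers rely on large lexicons and usually also on the word's part of speech.

```python
# Hypothetical lemma dictionary: inflected form -> lemma.
LEMMAS = {
    "gone": "go", "going": "go", "went": "go", "goes": "go",
    "mice": "mouse",
    "was": "be", "were": "be", "is": "be",
}

def lemmatize(word):
    """Return the dictionary lemma for `word`, or the lowercased word if it is unknown."""
    w = word.lower()
    return LEMMAS.get(w, w)

for word in ["gone", "going", "went", "mice", "news"]:
    print(word, "->", lemmatize(word))
```

Unlike a suffix-stripping stemmer, the lookup leaves "news" unchanged and maps the irregular form "went" to "go", which no suffix rule could produce.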
Result of Tagging
Named Entity Tagging
• Locates named entities in unstructured text data and classifies the entities
into predefined categories.
• Helps the machine relate to pop culture references and everyday names by
flagging the names of movies, important people, locations, and so on that
occur in the document.
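The simplest form of named entity tagging is a gazetteer lookup: a list of known names, each mapped to a predefined category. The sketch below assumes a small, hypothetical gazetteer with hypothetical category labels ("LOCATION", "GROUP") and uses greedy longest-match so that multi-word names such as "River Thames" are tagged as one entity. Practical taggers use statistical or neural models that can also recognize names missing from any list.

```python
import re

# Hypothetical gazetteer: token sequence -> predefined category.
GAZETTEER = {
    ("London",): "LOCATION",
    ("Londinium",): "LOCATION",
    ("England",): "LOCATION",
    ("United", "Kingdom"): "LOCATION",
    ("Great", "Britain"): "LOCATION",
    ("River", "Thames"): "LOCATION",
    ("Romans",): "GROUP",
}
MAX_LEN = max(len(name) for name in GAZETTEER)

def tag_entities(text):
    """Return (entity, category) pairs found by greedy longest-match lookup."""
    tokens = re.findall(r"\w+", text)
    entities, i = [], 0
    while i < len(tokens):
        # Try the longest possible span first so "River Thames" wins over shorter matches.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            span = tuple(tokens[i:i + n])
            if span in GAZETTEER:
                entities.append((" ".join(span), GAZETTEER[span]))
                i += n
                break
        else:
            i += 1  # no entity starts at this token
    return entities

print(tag_entities("It was founded by the Romans, who named it Londinium."))
```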