Week 7 Introduction


1. Elaborate on the process of Text Summarization.

Ans: Text Summarization is one of those applications of Natural Language Processing (NLP) that is bound
to have a huge impact on our lives. With ever-growing digital media and publishing, who has the time to go
through entire articles, documents, or books to decide whether they are useful?
Thankfully, this technology is already here.
Automatic Text Summarization is one of the most challenging and interesting problems in the field of
NLP. It is the process of generating a concise and meaningful summary of text from multiple
resources such as books, news articles, blog posts, research papers, emails, and tweets.
Demand for automatic text summarization systems is spiking these days thanks to the availability of
large amounts of textual data.
Text summarization can broadly be divided into two categories — Extractive Summarization and
Abstractive Summarization.
1. Extractive Summarization: These methods rely on extracting several parts, such as phrases and
sentences, from a piece of text and stacking them together to create a summary. Identifying the
right sentences for the summary is therefore of utmost importance in an extractive method.
2. Abstractive Summarization: These methods use advanced NLP techniques to generate an entirely
new summary. Some parts of this summary may not even appear in the original text.
TextRank is an extractive and unsupervised text summarization technique. Let’s take a look at the flow of
the TextRank algorithm that we will be following:

 The first step is to concatenate all the text contained in the articles
 Then split the text into individual sentences
 Next, find a vector representation (word embedding) for each sentence
 Similarities between sentence vectors are then calculated and stored in a matrix
 The similarity matrix is then converted into a graph, with sentences as vertices and similarity scores
as edges, for sentence rank calculation
 Finally, a certain number of top-ranked sentences form the final summary
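The flow above can be sketched in Python. This is a minimal, illustrative implementation, not the canonical TextRank: bag-of-words counts stand in for the word embeddings mentioned above, sentence splitting is a naive regex, and the PageRank step is a plain power iteration. All function names here are my own.

```python
import re
from collections import Counter
from math import sqrt

def split_sentences(text):
    # naive splitter; production systems use nltk or spacy sentence tokenizers
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def vectorize(sentence):
    # bag-of-words counts stand in for the word embeddings used in practice
    return Counter(re.findall(r"\w+", sentence.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def textrank_summary(text, top_n=2, d=0.85, iterations=50):
    sents = split_sentences(text)
    vecs = [vectorize(s) for s in sents]
    n = len(sents)
    # the similarity matrix doubles as the weighted edge list of the sentence graph
    sim = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    row_sums = [sum(row) for row in sim]
    scores = [1.0] * n
    for _ in range(iterations):  # power iteration of weighted PageRank
        scores = [(1 - d) + d * sum(sim[j][i] / row_sums[j] * scores[j]
                                    for j in range(n) if row_sums[j])
                  for i in range(n)]
    top = sorted(range(n), key=scores.__getitem__, reverse=True)[:top_n]
    return [sents[i] for i in sorted(top)]  # preserve original sentence order
```

Sentences that share vocabulary with many others accumulate rank and end up in the summary, which is the core idea behind the extractive approach described above.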

2. Describe the features of Natural Language Processing.


Ans: Natural Language Processing, or NLP for short, is broadly defined as the automatic manipulation of
natural language, like speech and text, by software. NLP helps computers communicate with humans in
their own language and scales up other language-related tasks.
These underlying tasks are often used in higher-level NLP capabilities, such as:

 Content categorization. A linguistic-based document summary, including search and indexing,
content alerts and duplication detection.
 Topic discovery and modelling. Accurately capture the meaning and themes in text collections, and
apply advanced analytics to text, like optimization and forecasting.
 Corpus Analysis. Understand corpus and document structure through output statistics for tasks
such as sampling effectively, preparing data as input for further models and strategizing modelling
approaches.
 Contextual extraction. Automatically pull structured information from text-based sources.
 Sentiment analysis. Identifying the mood or subjective opinions within large amounts of text,
including average sentiment and opinion mining.
 Speech-to-text and text-to-speech conversion. Transforming voice commands into written text,
and vice versa.
 Document summarization. Automatically generating synopses of large bodies of text and detecting
the languages represented in multilingual corpora (documents).
 Machine translation. Automatic translation of text or speech from one language to another.
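To make one of these capabilities concrete, the sentiment analysis task above can be illustrated with a deliberately tiny lexicon-based classifier. The word lists and function name here are invented for illustration; real systems use large lexicons (e.g. VADER) or trained models.

```python
import re

# toy lexicons; real sentiment systems use far larger resources or ML models
POSITIVE = {"good", "great", "excellent", "love", "happy"}
NEGATIVE = {"bad", "terrible", "poor", "hate", "sad"}

def sentiment(text):
    """Return 'positive', 'negative', or 'neutral' from a naive word count."""
    words = re.findall(r"\w+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Even this crude counter shows the shape of the task: map subjective words to a polarity, aggregate, and report an overall mood.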

3. Discuss the importance of Natural Language Processing (NLP) and why we need NLP.
Ans: Importance of NLP: The following is a list of some of the most researched tasks in natural language
processing. Some of these tasks have direct real-world applications, while others more commonly serve as
subtasks that are used to aid in solving larger tasks. Though natural language processing tasks are closely
intertwined, they can be subdivided into categories for convenience. A coarse division is given below.
Text and speech processing

 Optical character recognition (OCR)
 Speech recognition
 Speech segmentation
 Text-to-speech
 Word segmentation (Tokenization)
Morphological analysis
 Lemmatization
 Morphological segmentation
 Part-of-speech tagging
 Stemming
Syntactic analysis
 Grammar induction
 Sentence breaking
 Parsing
Lexical semantics
 Distributional semantics
 Named entity recognition (NER)
 Sentiment analysis
 Terminology extraction
 Word-sense disambiguation (WSD)
 Entity linking
And many more.
Why do we need NLP: Natural language processing includes many different techniques for interpreting
human language, ranging from statistical and machine learning methods to rules-based and algorithmic
approaches. We need a broad array of approaches because the text- and voice-based data varies widely, as
do the practical applications.
Basic NLP tasks include tokenization and parsing, lemmatization/stemming, part-of-speech tagging,
language detection and identification of semantic relationships. If you ever diagrammed sentences in grade
school, you’ve done these tasks manually before.
In general terms, NLP tasks break down language into shorter, elemental pieces, try to understand
relationships between the pieces and explore how the pieces work together to create meaning.
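The basic tasks just mentioned, tokenization and stemming, can be sketched in a few lines. This is a toy sketch: the regex tokenizer and suffix-stripping stemmer below are stand-ins for real tools such as NLTK's tokenizers and the Porter stemmer.

```python
import re

def tokenize(text):
    # split text into word tokens; punctuation is dropped for simplicity
    return re.findall(r"[a-zA-Z]+", text.lower())

def stem(word):
    # crude suffix stripping; a toy stand-in for Porter-style stemming
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word
```

Breaking language into these elemental pieces is exactly the first step most NLP pipelines take before exploring how the pieces work together to create meaning.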
