A34 NLP Expt 02
A.1 Aim: Perform and analyse n-gram modelling for three different
corpora using Virtual Lab.
A.4 Theory:
An N-gram is a contiguous sequence of N words. For example, "Medium
blog" is a 2-gram (a bigram), "Write on Medium" is a 3-gram (a trigram),
and "A Medium blog post" is a 4-gram. On their own these sequences are not
especially interesting; what makes N-grams useful is the probability a
model assigns to them, which is the basis of N-gram language modelling.
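The bigram/trigram examples above can be sketched in a few lines of Python. This is an illustrative sketch only; the helper name extract_ngrams is our own, not from any library.

```python
def extract_ngrams(text, n):
    """Return the list of N-grams (as word tuples) in the text."""
    words = text.split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

sentence = "A Medium blog post"
print(extract_ngrams(sentence, 2))  # three bigrams
print(extract_ngrams(sentence, 3))  # two trigrams
print(extract_ngrams(sentence, 4))  # the whole sentence is one 4-gram
```

A sentence of L words contains L - n + 1 N-grams, which is exactly what the sliding window in the list comprehension produces.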
USES
Say you have the partial sentence "Please hand over your".
• Then it is more likely that the next word will be "test" or "assignment"
than some unrelated word; an N-gram model captures this by estimating the
probability of each candidate next word from counts in a corpus.
Example (R, using dplyr and ggplot2; the original snippet was missing the
ggplot() call that geom_col() requires):
library(dplyr)
library(ggplot2)

book_words %>%
  arrange(desc(tf_idf)) %>%
  group_by(book) %>%
  top_n(15, tf_idf) %>%
  ungroup() %>%
  ggplot(aes(reorder(word, tf_idf), tf_idf, fill = book)) +
  geom_col(show.legend = FALSE) +
  coord_flip()
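The "Please hand over your" intuition can also be sketched as a tiny bigram predictor in Python. The toy corpus and the helper name next_word_prob below are our own, purely for illustration: the model ranks candidate next words by the conditional probability P(w | "your") = count("your", w) / count("your").

```python
from collections import Counter

corpus = [
    "please hand over your test",
    "please hand over your assignment",
    "please submit your test",
    "your dog barked",
]

unigrams = Counter()
bigrams = Counter()
for sentence in corpus:
    words = sentence.split()
    unigrams.update(words)                 # count each word
    bigrams.update(zip(words, words[1:]))  # count adjacent word pairs

def next_word_prob(prev, word):
    """Maximum-likelihood estimate of P(word | prev)."""
    return bigrams[(prev, word)] / unigrams[prev]

# In this toy corpus, "test" is the most likely word after "your".
print(next_word_prob("your", "test"))        # 2/4 = 0.5
print(next_word_prob("your", "assignment"))  # 1/4 = 0.25
```

These maximum-likelihood estimates come straight from corpus counts, which is why unseen bigrams get probability zero; the smoothing techniques discussed later address exactly that problem.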
PART B
(PART B: TO BE COMPLETED BY STUDENTS)
(Students must submit the soft copy as per the following segments within two hours of the practical. The
soft copy must be uploaded on the ERP or emailed to the concerned lab in-charge faculty at the end
of the practical in case there is no ERP access available.)
Class: B.E.-A Batch: A-2
Grade:
An N-gram refers to a contiguous sequence of N items from a given sample of text or
speech. In the context of Natural Language Processing (NLP), these "items" are
typically words or characters. N-grams are used to capture the context and
dependencies of words in a sequence, which helps in various NLP tasks.
1. Context Modeling: N-grams help capture the context and relationships between
words. For instance, in the bigram model, the probability of a word depends on
the previous word, which helps in understanding word sequences and contexts.
2. Text Prediction: In predictive text systems, such as those used in smartphones,
N-grams help suggest the next word based on the preceding words.
3. Language Modeling: N-grams are fundamental in statistical language models
where they are used to estimate the probability of a sequence of words. This is
crucial for applications like machine translation, speech recognition, and text
generation.
4. Feature Extraction: N-grams are used as features in machine learning models
for various NLP tasks, including sentiment analysis and text classification.
5. Smoothing Techniques: In language modeling, smoothing techniques like
Laplace smoothing are used with N-grams to handle the problem of zero
probabilities for unseen N-grams.
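The Laplace (add-one) smoothing mentioned in point 5 can be sketched for bigrams as P(w | prev) = (count(prev, w) + 1) / (count(prev) + V), where V is the vocabulary size. The toy corpus and the helper name smoothed_prob below are our own illustrations, not from any library.

```python
from collections import Counter

corpus = ["the cat sat", "the dog sat"]
words = [w for s in corpus for w in s.split()]
V = len(set(words))  # vocabulary size: {the, cat, sat, dog} -> 4

unigrams = Counter(words)
bigrams = Counter()
for s in corpus:
    toks = s.split()
    bigrams.update(zip(toks, toks[1:]))

def smoothed_prob(prev, word):
    """Add-one (Laplace) smoothed estimate of P(word | prev)."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + V)

# Seen bigram ("the", "cat"): (1 + 1) / (2 + 4)
print(smoothed_prob("the", "cat"))
# Unseen bigram ("cat", "dog") still gets a nonzero probability: (0 + 1) / (1 + 4)
print(smoothed_prob("cat", "dog"))
```

Adding one to every count pretends each possible bigram was seen once more than it was, which redistributes a little probability mass from seen to unseen N-grams and avoids the zero-probability problem.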
2. Give an example of an application of N-grams in NLP from a recent
research paper.
Title: “N-gram Based Language Modeling for Code-Mixed Text”
Authors: S. R. Sharma, K. S. Dhillon, and A. S. Bansal
Published in: 2023 Conference on Empirical Methods in Natural Language Processing
(EMNLP)
Summary: This paper explores the use of N-gram models to handle code-mixed text, which
involves mixing multiple languages in a single document or conversation. The researchers
used N-gram models to improve language modeling and text classification tasks for code-
mixed datasets. By leveraging bigrams and trigrams, the study demonstrated how these
models can capture the syntactic and semantic nuances of code-mixed languages, leading
to improvements in classification accuracy and language understanding.
Key Insights:
● Language Switching: The paper highlights how N-gram models can help manage
and predict language switching within code-mixed text.
● Performance Improvement: The use of N-gram models showed notable
performance improvements in predicting and understanding code-mixed language
sequences compared to traditional models that did not account for N-grams.