NLP Practicals
1) NLTK:
NLTK is an essential library that supports tasks such as classification, stemming, tagging, parsing,
semantic reasoning, and tokenization in Python. It is a primary tool for natural language processing,
and today it serves as an educational foundation for Python developers who are getting started with
NLP and machine learning.
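For example, sentence tokenization, one of the tasks listed above, takes only a few lines. A minimal sketch (the sample text here is made up for illustration):

import nltk
from nltk.tokenize import sent_tokenize
nltk.download('punkt')

text = "NLTK makes text processing simple. It supports tokenization, tagging, and parsing."
print(sent_tokenize(text))  # splits the string into a list of two sentences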
2) scikit-learn:
This library provides developers with a wide range of algorithms for building machine-learning
models. It offers many functions for creating bag-of-words features to tackle text classification
problems, and its strength lies in its intuitive, consistent class methods.
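For instance, a small bag-of-words text classifier can be built in a few lines. A minimal sketch (the toy documents and labels are made up for illustration):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy training data (hypothetical)
docs = ["I loved this movie", "What a great film", "Terrible acting", "I hated it"]
labels = ["pos", "pos", "neg", "neg"]

# Bag-of-words features: each document becomes a vector of word counts
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

# Train a simple classifier and score new text
clf = MultinomialNB()
clf.fit(X, labels)
print(clf.predict(vectorizer.transform(["a great movie"])))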
3) Pattern:
Pattern is a Python library designed for web mining, natural language processing, and machine
learning tasks. It provides modules for various text analysis tasks, including part-of-speech tagging,
sentiment analysis, word lemmatization, and language translation.
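A minimal sketch of Pattern's English helpers, assuming the pattern.en module is installed (note that Pattern's Python 3 support has historically been spotty):

from pattern.en import parse, sentiment, lemma

print(parse("The cats are chasing mice."))              # POS-tagged parse string
print(sentiment("Pattern makes text mining pleasant."))  # (polarity, subjectivity)
print(lemma("chasing"))                                  # -> 'chase'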
4) Polyglot:
Polyglot is a multilingual NLP library that supports over 130 languages. It offers functionalities for
tasks such as tokenization, named entity recognition, sentiment analysis, language detection, and
transliteration. Polyglot’s extensive language support makes it suitable for analyzing text data from
diverse sources.
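A minimal sketch of Polyglot's language detection and NER; this assumes the relevant models have been downloaded first (e.g. polyglot download embeddings2.en ner2.en):

from polyglot.detect import Detector
from polyglot.text import Text

# Language detection works out of the box
print(Detector("Bonjour tout le monde").language.name)

# NER requires the downloaded embedding and NER models
text = Text("Barack Obama visited Paris last week.")
print(text.entities)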
5) FastText:
FastText is a library developed by Facebook AI Research for efficient text classification and word
representation learning. It provides tools for training and utilizing word embeddings and text
classifiers based on neural network architectures.
FastText’s key feature is its ability to handle large text datasets quickly, making it suitable for
applications requiring high-speed processing, such as sentiment analysis, document classification,
and language identification in diverse languages.
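A minimal sketch using the fasttext Python package; the file names train.txt and corpus.txt are hypothetical, and each line of the training file is expected to follow FastText's "__label__<class> <text>" format:

import fasttext

# Supervised text classification from a labeled training file (hypothetical path)
model = fasttext.train_supervised(input="train.txt")
print(model.predict("I really enjoyed this movie"))

# Unsupervised word embeddings from a raw-text corpus (hypothetical path)
emb = fasttext.train_unsupervised("corpus.txt", model="skipgram")
print(emb.get_word_vector("movie")[:5])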
Pra 2
import nltk
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer, WordNetLemmatizer
nltk.download('punkt')
nltk.download('wordnet')
# Sample text
text = "The quick brown foxes were running quickly through the forest."
# Tokenization
tokens = word_tokenize(text)
print("Tokens:", tokens)
# Stemming (punctuation is skipped, matching the output below)
stemmer = PorterStemmer()
stems = [stemmer.stem(token) for token in tokens if token.isalpha()]
print("Stems:", stems)
# Lemmatization
lemmatizer = WordNetLemmatizer()
lemmas = [lemmatizer.lemmatize(token) for token in tokens]
print("Lemmatized Tokens:", lemmas)
output:
Tokens: ['The', 'quick', 'brown', 'foxes', 'were', 'running', 'quickly', 'through', 'the', 'forest', '.']
Stems: ['the', 'quick', 'brown', 'fox', 'were', 'run', 'quickli', 'through', 'the', 'forest']
Lemmatized Tokens: ['The', 'quick', 'brown', 'fox', 'were', 'running', 'quickly', 'through', 'the', 'forest', '.']
Pra 3
import re
# Sample text (hypothetical; the original sample was omitted)
text = """Meeting on 12/05/2023.
Contact john.doe@example.com or call 9876543210."""
# Example pattern for DD/MM/YYYY dates (an assumption; the original pattern was omitted)
date_pattern = r'\b\d{1,2}/\d{1,2}/\d{4}\b'
email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
phone_pattern = r'\b(?:\+?\d{1,4}[\s-]?)?\(?\d{2,4}\)?[\s-]?\d{3,4}[\s-]?\d{3,4}\b'
dates = re.findall(date_pattern, text)
emails = re.findall(email_pattern, text)
phones = re.findall(phone_pattern, text)
print("Dates:", dates)
print("Emails:", emails)
print("Phones:", phones)
output (for the sample text above):
Dates: ['12/05/2023']
Emails: ['john.doe@example.com']
Phones: ['9876543210']
Pra 4
import re
# Sample text (hypothetical; the original sample was omitted)
text = """Reach us at support@example.org or sales@example.co.in for help."""
email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
emails = re.findall(email_pattern, text)
# Print result
print("Emails:", emails)
output (for the sample text above):
Emails: ['support@example.org', 'sales@example.co.in']
Pra 5
1) What is tokenization?
Tokenization, in the realm of Natural Language Processing (NLP) and machine learning, refers to the
process of converting a sequence of text into smaller parts, known as tokens. These tokens can be as
small as characters or as long as words.
2) What is PoS tagging?
One of the core tasks in Natural Language Processing (NLP) is Parts of Speech (PoS) tagging, which
assigns each word in a text a grammatical category, such as noun, verb, adjective, or adverb.
PoS tagging is useful for machine translation, named entity recognition, and information extraction,
among other things. It also helps resolve ambiguity in words with multiple meanings and reveals a
sentence's grammatical structure.
import nltk
from nltk.tokenize import word_tokenize
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
sentence = 'With Colab you can harness the full power of popular Python libraries to analyze'
tokens = word_tokenize(sentence)
print(tokens)
pos_tags = nltk.pos_tag(tokens)
print(pos_tags)
output:
['With', 'Colab', 'you', 'can', 'harness', 'the', 'full', 'power', 'of', 'popular', 'Python', 'libraries', 'to', 'analyze']
[('With', 'IN'), ('Colab', 'NNP'), ('you', 'PRP'), ('can', 'MD'), ('harness', 'VB'), ('the', 'DT'), ('full', 'JJ'), ('power', 'NN'), ('of', 'IN'), ('popular', 'JJ'), ('Python', 'NNP'), ('libraries', 'NNS'), ('to', 'TO'), ('analyze', 'VB')]
import spacy
nlp = spacy.load('en_core_web_sm')
sentence = 'Pattern is a Python library designed for web mining, natural language processing, and machine learning tasks.'
doc = nlp(sentence)
print(doc)
for token in doc:
    print(f'{token.text}:{token.pos_}')
output:
Pattern is a Python library designed for web mining, natural language processing, and machine learning tasks.
Pattern:NOUN
is:AUX
a:DET
Python:PROPN
library:NOUN
designed:VERB
for:ADP
web:NOUN
mining:NOUN
,:PUNCT
natural:ADJ
language:NOUN
processing:NOUN
,:PUNCT
and:CCONJ
machine:NOUN
learning:NOUN
tasks:NOUN
.:PUNCT
Pra 6
import spacy
nlp = spacy.load("en_core_web_sm")
text = """The Gemini API gives you access to Gemini models created by Google DeepMind"""
doc = nlp(text)
print(doc)
output:
The Gemini API gives you access to Gemini models created by Google DeepMind
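The Pra 6 snippet stops after echoing the document. Assuming the intended task is named entity recognition (which the spaCy pipeline suggests), a minimal continuation would be:

for ent in doc.ents:
    print(ent.text, ent.label_)

Each ent is a span the model recognized as an entity, labeled with a type such as ORG or PERSON.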