NLP Practicals


Pra 1

1) Natural Language Toolkit (NLTK):

NLTK is an essential library that supports tasks such as classification, stemming, tagging, parsing,
semantic reasoning, and tokenization in Python.

It is a standard entry point for natural language processing in Python, and today it also serves as an educational foundation for developers who are new to the field and to machine learning.
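As a small illustration of the semantic-reasoning side mentioned above, here is a minimal sketch using NLTK's WordNet interface (assuming the wordnet corpus has been downloaded):

import nltk
from nltk.corpus import wordnet

nltk.download('wordnet')  # one-time corpus download

# List the first few senses ("synsets") of an ambiguous word
for syn in wordnet.synsets('bank')[:3]:
    print(syn.name(), '-', syn.definition())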

2) scikit-learn:

This library provides developers with a wide range of algorithms for building machine-learning models. For NLP, it offers utilities for creating bag-of-words features to tackle text classification problems. Its main strength is an intuitive, consistent API.
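A minimal sketch of the bag-of-words workflow with scikit-learn's CountVectorizer; the two sample sentences are made up for illustration:

from sklearn.feature_extraction.text import CountVectorizer

docs = ["the movie was great", "the movie was terrible"]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)          # sparse document-term matrix
print(vectorizer.get_feature_names_out())   # vocabulary learned from the corpus
print(X.toarray())                          # token counts per document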

3) Pattern:

Pattern is a Python library designed for web mining, natural language processing, and machine learning. It provides modules for various text analysis tasks, including part-of-speech tagging, sentiment analysis, word lemmatization, and language translation.
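A short sketch of Pattern's English module, assuming the pattern package is installed (the example strings are made up, and Pattern's releases only support certain Python versions):

from pattern.en import tag, lemma, sentiment

print(tag('The cats were sleeping'))            # part-of-speech tags
print(lemma('were'))                            # 'be'
print(sentiment('A wonderful little library'))  # (polarity, subjectivity)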

4) Polyglot Library:

Polyglot is a multilingual NLP library that supports over 130 languages. It offers functionalities for
tasks such as tokenization, named entity recognition, sentiment analysis, language detection, and
translation. Polyglot’s extensive language support makes it suitable for analyzing text data from
diverse sources.
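A sketch of Polyglot's language detection and entity extraction, assuming the package and the per-language models (fetched separately with Polyglot's downloader) are installed:

from polyglot.detect import Detector
from polyglot.text import Text

print(Detector('Bonjour tout le monde').language.code)  # detected language code, e.g. 'fr'

text = Text('George Washington lived in Virginia.')
print(text.entities)  # named entities (requires the English NER model)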

5) FastText:

FastText is a library developed by Facebook AI Research for efficient text classification and word
representation learning. It provides tools for training and utilizing word embeddings and text
classifiers based on neural network architectures.

FastText’s key feature is its ability to handle large text datasets quickly, making it suitable for
applications requiring high-speed processing, such as sentiment analysis, document classification,
and language identification in diverse languages.
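A minimal sketch with the fasttext Python bindings; corpus.txt and train.txt are hypothetical files, and the supervised training file follows fastText's __label__ prefix convention:

import fasttext

# Word vectors from a plain-text corpus (hypothetical file)
model = fasttext.train_unsupervised('corpus.txt', model='skipgram')
print(model.get_word_vector('language')[:5])

# Text classifier; each line of train.txt looks like: __label__positive great movie
clf = fasttext.train_supervised('train.txt')
print(clf.predict('great movie'))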

Pra 2

import nltk
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer
from nltk.stem import WordNetLemmatizer

nltk.download('punkt')    # tokenizer models
nltk.download('wordnet')  # lemmatizer data

# Sample text
text = "The quick brown foxes were running quickly through the forest."

# Tokenization
tokens = word_tokenize(text)
print("Tokens:", tokens)

# Stemming (crude suffix stripping; note 'quickli' in the output below)
stemmer = PorterStemmer()
stems = [stemmer.stem(token) for token in tokens if token.isalpha()]
print("Stems:", stems)

# Lemmatization (dictionary-based; defaults to the noun part of speech)
lemmatizer = WordNetLemmatizer()
lemmatized_tokens = [lemmatizer.lemmatize(token) for token in tokens]
print("Lemmatized Tokens:", lemmatized_tokens)

output:

Tokens: ['The', 'quick', 'brown', 'foxes', 'were', 'running', 'quickly', 'through', 'the', 'forest', '.']

Stems: ['the', 'quick', 'brown', 'fox', 'were', 'run', 'quickli', 'through', 'the', 'forest']

Lemmatized Tokens: ['The', 'quick', 'brown', 'fox', 'were', 'running', 'quickly', 'through', 'the', 'forest', '.']
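The lemmatizer treats every token as a noun unless told otherwise, which is why 'were' and 'running' pass through unchanged above; a part-of-speech hint changes that:

print(lemmatizer.lemmatize('running', pos='v'))  # run
print(lemmatizer.lemmatize('were', pos='v'))     # be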

Pra 3

import re

text = """
Contact us at support@example.com or sales@example.org.
Call us at +1 800-555-5555 or (123) 456-7890.
You can also reach out via email on info@example.net.
Important dates: 12/31/2024, 31-12-2024, 2024-12-31, and December 31, 2024.
"""

# Define regex patterns
date_pattern = r'\b(?:\d{2}/\d{2}/\d{4}|\d{2}-\d{2}-\d{4}|\d{4}-\d{2}-\d{2}|\d{1,2} \w{3,9} \d{4})\b'
email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
phone_pattern = r'\b(?:\+?\d{1,4}[\s-]?)?\(?\d{2,4}\)?[\s-]?\d{3,4}[\s-]?\d{3,4}\b'

# Extract data using regex
dates = re.findall(date_pattern, text)
emails = re.findall(email_pattern, text)
phones = re.findall(phone_pattern, text)

# Print extracted data
print("Dates:", dates)
print("Emails:", emails)
print("Phones:", phones)

output:

Dates: ['12/31/2024', '31-12-2024', '2024-12-31']

Emails: ['support@example.com', 'sales@example.org', 'info@example.net']

Phones: ['1 800-555-5555', '123) 456-7890']

Note: 'December 31, 2024' is missing from the dates because the pattern expects the day before the month name (e.g. '31 December 2024'), and the leading '+' and '(' are missing from the phones because \b cannot match before a non-word character.
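A variant phone pattern that keeps the leading '+' and '(' by anchoring with a lookbehind instead of \b; this is a sketch tuned to the formats in this sample text:

# (?<!\w) succeeds wherever the preceding character is not a word character,
# so unlike \b the match can start on '+' or '('
phone_pattern2 = r'(?<!\w)(?:\+\d{1,3}[\s-]?)?\(?\d{3}\)?[\s-]?\d{3}[\s-]?\d{4}'
print("Phones:", re.findall(phone_pattern2, text))
# Phones: ['+1 800-555-5555', '(123) 456-7890']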

Pra 4

import re

text = """
Contact us at support@example.com or SALES@example.org.
You can also reach out via email on Info@Example.net.
"""

# Pattern for email addresses (lowercase class; the flag below supplies case-insensitivity)
email_pattern = r'\b[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}\b'

# Find email addresses (case-insensitive)
emails = re.findall(email_pattern, text, re.IGNORECASE)

# Print result
print("Emails:", emails)

output:

Emails: ['support@example.com', 'SALES@example.org', 'Info@Example.net']
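The same matches without a flag argument, using Python's inline flag syntax:

# (?i) at the start turns on IGNORECASE for the whole expression
emails_inline = re.findall(r'(?i)\b[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}\b', text)
print("Emails:", emails_inline)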


Pra 5

1) What is a token?

Tokenization, in the realm of Natural Language Processing (NLP) and machine learning, refers to the
process of converting a sequence of text into smaller parts, known as tokens. These tokens can be as
small as characters or as long as words.

2) What is POS tagging?

One of the core tasks in Natural Language Processing (NLP) is Part-of-Speech (POS) tagging, which assigns each word in a text a grammatical category, such as noun, verb, adjective, or adverb.

3) Why do we use POS tagging?

POS tagging is useful for machine translation, named entity recognition, and information extraction, among other things. It also helps resolve ambiguity in words with multiple meanings and reveals a sentence's grammatical structure.

import nltk
from nltk.tokenize import word_tokenize

nltk.download('punkt')                        # tokenizer models
nltk.download('averaged_perceptron_tagger')   # POS tagger model

sentence = 'With Colab you can harness the full power of popular Python libraries to analyze'

tokens = word_tokenize(sentence)
print(tokens)

pos_tags = nltk.pos_tag(tokens)
print(pos_tags)

output:

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data] Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data] /root/nltk_data...
[nltk_data] Unzipping taggers/averaged_perceptron_tagger.zip.

['With', 'Colab', 'you', 'can', 'harness', 'the', 'full', 'power', 'of', 'popular', 'Python', 'libraries', 'to', 'analyze']

[('With', 'IN'), ('Colab', 'NNP'), ('you', 'PRP'), ('can', 'MD'), ('harness', 'VB'), ('the', 'DT'), ('full', 'JJ'), ('power', 'NN'), ('of', 'IN'), ('popular', 'JJ'), ('Python', 'NNP'), ('libraries', 'NNS'), ('to', 'TO'), ('analyze', 'VB')]
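To see what a Penn Treebank tag such as NNP means, NLTK ships a tag reference (needs a one-time nltk.download('tagsets')):

nltk.download('tagsets')
nltk.help.upenn_tagset('NNP')  # noun, proper, singular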
import spacy

nlp = spacy.load('en_core_web_sm')

sentence = 'Pattern is a Python library designed for web mining, natural language processing, and machine learning tasks.'

doc = nlp(sentence)
print(doc)

for token in doc:
    print(f'{token.text}:{token.pos_}')

output:

Pattern is a Python library designed for web mining, natural language processing, and machine learning tasks.

Pattern:NOUN

is:AUX

a:DET

Python:PROPN

library:NOUN

designed:VERB

for:ADP

web:NOUN

mining:NOUN

,:PUNCT

natural:ADJ

language:NOUN

processing:NOUN

,:PUNCT

and:CCONJ

machine:NOUN

learning:NOUN

tasks:NOUN

.:PUNCT
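spaCy's tag names are terse; spacy.explain turns one into a human-readable description:

print(spacy.explain('CCONJ'))  # coordinating conjunction
print(spacy.explain('ADP'))    # adposition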
Pra 6

import spacy

nlp = spacy.load("en_core_web_sm")

text = """The Gemini API gives you access to Gemini models created by Google DeepMind"""

doc = nlp(text)
print(doc)

for ent in doc.ents:
    print(f"Entity: {ent.text}, Label: {ent.label_}")

output:

The Gemini API gives you access to Gemini models created by Google DeepMind

Entity: Gemini, Label: GPE

Entity: Google DeepMind, Label: ORG
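Entity labels (and occasional quirks such as 'Gemini' being tagged GPE, a geopolitical entity) depend on the loaded model; the NER component lists its label set directly:

print(nlp.get_pipe("ner").labels)  # e.g. ('CARDINAL', 'DATE', ..., 'ORG', 'PERSON', ...)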
