NLP Subject Orientation SH23


Subject Orientation

Course Code: CSDC7013


Course Name: Natural Language Processing
(NLP)

Dr. Mahendra Pawar


1
Department Vision

• To develop a center of excellence in computer engineering and produce globally competent engineers who contribute towards the progress of the engineering community and society as a whole.

2
Department Mission

● To provide students with diversified engineering knowledge to work in a multidisciplinary environment.
● To provide a platform to cultivate research, innovation, and entrepreneurial skills.
● To produce world-class computer engineering professionals with moral values and leadership abilities for the sustainable development of society.

3
4
5
6
Pre-requisites: Theory of Computer Science, System Programming & Compiler Construction

Course Objectives: The course aims to

1. Define natural language processing and learn how to apply basic algorithms in this field.
2. Describe basic concepts and algorithmic descriptions of the main language levels: morphology, syntax, semantics, and pragmatics & discourse analysis.
3. Design and implement various language models and POS tagging techniques.
4. Design and learn NLP applications such as Information Extraction, Question Answering, Machine Translation, etc.
5. Design and implement applications based on natural language processing.

7
Course Outcomes: Students will be able to
1. Describe the field of natural language processing.
2. Design language models for word-level analysis, and syntactic, semantic and pragmatic analysis for text processing.
3. Design various language models and POS tagging techniques.
4. Design, implement and test algorithms for semantic analysis.
5. Formulate discourse segmentation and anaphora resolution.
6. Apply NLP techniques to design real-world NLP applications.

8
Textbooks:
T1. Daniel Jurafsky and James H. Martin, Speech and Language Processing, Second Edition, Prentice Hall, 2008.
T2. Christopher D. Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing, MIT Press, 1999.

9
References:
R1. T. Siddiqui and U. S. Tiwary, Natural Language Processing and Information Retrieval, Oxford University Press, 2008.

R2. Daniel M. Bikel and Imed Zitouni, Multilingual Natural Language Processing Applications: From Theory to Practice, IBM Press, 2013.

R3. Alexander Clark, Chris Fox and Shalom Lappin, The Handbook of Computational Linguistics and Natural Language Processing, John Wiley and Sons, 2012.

R4. Nitin Indurkhya and Fred J. Damerau, Handbook of Natural Language Processing, Second Edition, Chapman and Hall/CRC Press, 2010.

R5. Niel J. le Roux and Sugnet Lubbe, A Step-by-Step Tutorial: An Introduction into R Application and Programming.

R6. Steven Bird, Ewan Klein and Edward Loper, Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit, O'Reilly Media, 2009.

10
Digital References or Web:

1. http://www.cse.iitb.ac.in/~cs626-449

2. http://cse24-iiith.virtual-labs.ac.in/#

3. https://nptel.ac.in/courses/106105158

NPTEL Course: Pawan Goyal [recommended]

11
Assessment

Internal Assessment

Assessment consists of two class tests of 20 marks each. The first class test is to be conducted when approximately 40% of the syllabus is completed, and the second when an additional 40% of the syllabus is completed. The duration of each test shall be one hour.

End Semester Theory Examination:

1. The question paper will comprise a total of six questions.

2. All questions carry equal marks.

3. Questions will be mixed in nature (for example, if Q.2 has part (a) from Module 3, then part (b) will be from any module other than Module 3).

4. Only four questions need to be solved.

5. In the question paper, the weightage of each module will be proportional to the number of lecture hours mentioned in the syllabus.
12
No. of Hours, Weightage, Nature of Questions

Unit   No. of Hours   Weightage   Nature of Questions
I      3              8           Theory
II     9              23          Theory + Problems
III    10             26          Theory + Problems
IV     7              18          Theory + Problems
V      5              13          Theory
VI     5              13          Theory

Note:
Theory Questions: Do not ask only "explain" or "describe" questions. Break each question into subdivisions; otherwise students write vague answers.
Problems/Exercises: We will share exercises on our NLP WhatsApp group.

13
Module 1: Introduction to NLP (3 Hours)
Detailed Content:
Origin & history of NLP;
Language, knowledge and grammar in language processing;
Stages in NLP;
Ambiguities and their types in English and Indian regional languages;
Challenges of NLP;
Applications of NLP
References: Internet, Pawan Goyal NPTEL Course, Books T1 & R1

14
Sample Questions
15
Module 2: Word Level Analysis (9 Hours)
Detailed Content:
Basic terms: Tokenization, Stemming, Lemmatization;
Survey of English morphology, Inflectional morphology, Derivational morphology;
Regular expressions with types;
Morphological models: Dictionary lookup, Finite-state morphology;
Morphological parsing with FST (Finite State Transducer);
Lexicon-free FST: Porter Stemmer algorithm;
N-grams and their variations: Bigram, Trigram;
Simple (unsmoothed) N-grams; N-gram sensitivity to the training corpus;
Unknown words: Open versus closed vocabulary tasks;
Evaluating N-grams: Perplexity;
Smoothing: Laplace smoothing, Good-Turing discounting
References: T1 & R2, Pawan Goyal NPTEL Course

16
17
18
19
Exercises on stemming with the Porter stemmer, n-grams, k-grams, Laplace smoothing, Good-Turing discounting, FSA and FST (see the sketch below).
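
As a minimal Python sketch of the kind of computation these exercises involve (assuming NLTK is installed; the toy corpus and example words are illustrative, not from the course material):

# Porter stemming with NLTK, plus a hand-rolled bigram model with
# Laplace (add-one) smoothing on a toy corpus.
from collections import Counter
from nltk.stem import PorterStemmer   # pip install nltk

stemmer = PorterStemmer()
print([stemmer.stem(w) for w in ["caresses", "ponies", "running"]])
# -> ['caress', 'poni', 'run']

corpus = "the cat sat on the mat the cat ate".split()   # toy data
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
V = len(unigrams)                                        # vocabulary size

def laplace_bigram_prob(prev, word):
    # P(word | prev) = (count(prev, word) + 1) / (count(prev) + V)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + V)

print(laplace_bigram_prob("the", "cat"))   # seen bigram
print(laplace_bigram_prob("cat", "mat"))   # unseen bigram still gets probability mass

Perplexity and Good-Turing discounting can be worked out from the same unigram and bigram counts.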

20
Module 3: Syntax Analysis (10 Hours)
Detailed Content:
Part-of-Speech (POS) tagging; Tag set for English (Penn Treebank);
Difficulties / challenges in POS tagging;
Rule-based, Stochastic and Transformation-based tagging;
Generative model: Hidden Markov Model (HMM Viterbi) for POS tagging;
Issues in HMM POS tagging;
Discriminative models: Maximum Entropy model, Conditional Random Field (CRF);
Parsers: Top-down and Bottom-up; Modeling constituency;
Bottom-up parsers: CYK, PCFG (Probabilistic Context-Free Grammar), Shift-Reduce parser;
Top-down parsers: Earley parser, Predictive parser
References: T1, T2, Pawan Goyal NPTEL Course

21
22
Exercises on the HMM model: formation of the emission probability matrix and the state transition matrix, and the HMM Viterbi algorithm (see the sketch below).
Exercises on parsers: Bottom-up parsers: CYK, PCFG (Probabilistic Context-Free Grammar), Shift-Reduce parser; Top-down parsers: Earley parser, Predictive parser.
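
A compact Python sketch of Viterbi decoding for a toy two-tag HMM; all tags and probabilities below are made-up illustration values, not part of the course material:

# Toy Viterbi decoder for HMM POS tagging.
tags = ["N", "V"]
start_p = {"N": 0.6, "V": 0.4}                 # P(tag at sentence start)
trans_p = {"N": {"N": 0.3, "V": 0.7},          # state transition matrix
           "V": {"N": 0.8, "V": 0.2}}
emit_p = {"N": {"fish": 0.6, "sleep": 0.4},    # emission probability matrix
          "V": {"fish": 0.5, "sleep": 0.5}}

def viterbi(words):
    # table[i][t] = best probability of any tag sequence ending in tag t at position i
    table = [{t: start_p[t] * emit_p[t][words[0]] for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        table.append({})
        back.append({})
        for t in tags:
            prob, prev = max((table[i - 1][p] * trans_p[p][t] * emit_p[t][words[i]], p)
                             for p in tags)
            table[i][t], back[i][t] = prob, prev
    best = max(table[-1], key=table[-1].get)   # backtrack from the best final tag
    path = [best]
    for i in range(len(words) - 1, 0, -1):
        path.insert(0, back[i][path[0]])
    return path, table[-1][best]

print(viterbi(["fish", "sleep"]))   # -> (['N', 'V'], 0.126) for these toy numbers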
23
24
Module 4: Semantic Analysis (7 Hours)
Detailed Content:
Introduction, meaning representation; Lexical semantics;
Corpus study; Study of various language dictionaries such as WordNet and BabelNet;
Relations among lexemes & their senses: Homonymy, Polysemy, Synonymy, Hyponymy;
Semantic ambiguity; Word Sense Disambiguation (WSD);
Knowledge-based approach (Lesk's algorithm), Supervised (Naïve Bayes, Decision List),
Introduction to Semi-supervised (Yarowsky) and Unsupervised (HyperLex) methods
References: T1, T2, Pawan Goyal NPTEL Course

25
Small exercises or think questions on the following topics (see the sketch below):
Knowledge-based approach (Lesk's algorithm), Supervised (Naïve Bayes, Decision List),
Introduction to Semi-supervised (Yarowsky) and Unsupervised (HyperLex) methods
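
As a small illustration of the knowledge-based approach, NLTK ships a simplified Lesk implementation; the sketch below assumes the punkt and wordnet resources have been downloaded, and the sentence is illustrative only:

# Simplified Lesk word-sense disambiguation with NLTK's built-in implementation.
# Needs: nltk.download('punkt'), nltk.download('wordnet') (plus 'omw-1.4' on newer NLTK).
from nltk.tokenize import word_tokenize
from nltk.wsd import lesk

context = word_tokenize("I went to the bank to deposit my money")
sense = lesk(context, "bank", pos="n")    # picks the WordNet sense with most gloss overlap
print(sense, "->", sense.definition() if sense else "no sense found")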

26
Module 5: Pragmatic & Discourse Analysis (5 Hours)
Detailed Content:
Discourse: Reference resolution, Reference phenomena, Syntactic & semantic constraints on coherence;
Anaphora resolution using the Hobbs and Centering algorithms
References: T1, T2

27
Module 6: Applications of NLP (5 Hours)
Detailed Content:
Case studies (preferably in a regional language) on:
a) Machine translation;
b) Text summarization;
c) Sentiment analysis (see the sketch below);
d) Information retrieval;
e) Question Answering system
References: T1, R2; NPTEL course for (a)-(d); Sharavari Madam's paper for Question Answering
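
For case study (c), one possible English-language starting point is NLTK's lexicon-based VADER analyzer, sketched below; a regional-language case study would need different resources (for example iNLTK or a custom lexicon), so treat this only as a template:

# Minimal English sentiment-analysis sketch with NLTK's VADER analyzer.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)      # lexicon used by VADER
sia = SentimentIntensityAnalyzer()
print(sia.polarity_scores("The movie was surprisingly good!"))
# -> dict with 'neg', 'neu', 'pos' and 'compound' scores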

28
CSDL7013: Natural Language Processing Lab

29
Lab Objectives :
1. To understand the key concepts of NLP.
2. To learn various phases of NLP.
3. To design and implement various language models and POS tagging techniques.
4. To understand various NLP Algorithms
5. To learn NLP applications such as Information Extraction, Sentiment Analysis, Question answering,
Machine translation etc.
6. To design and implement applications based on natural language processing

30
Lab Outcomes:
At the end of the course, students should be able to:
1. Apply various text processing techniques.
2. Design language models for word-level analysis.
3. Model linguistic phenomena with formal grammar.
4. Design, implement and analyze NLP algorithms.
5. Apply NLP techniques to design real-world NLP applications such as machine translation, sentiment analysis, text summarization, information extraction, Question Answering systems, etc.
6. Implement proper experimental methodology for training and evaluating empirical NLP systems.

31
Suggested List of Experiments: (Select a case study (mini project) for performing the experiments)
Sr. No. Name of the Experiment
1. Study various applications of NLP and formulate the problem statement for a mini project based on a chosen real-world NLP application: [Machine Translation, Text Categorization, Text Summarization, Chat Bot, Plagiarism, Spelling & Grammar Checkers, Sentiment / Opinion Analysis, Question Answering, Personal Assistant, Tutoring Systems, etc.]
2. Apply various text preprocessing techniques to any given text: Tokenization and Filtration & Script Validation.
3. Apply various other text preprocessing techniques to any given text: Stop Word Removal, Lemmatization / Stemming (see the sketch below).
4. Perform morphological analysis and word generation for any given text.
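
A minimal sketch of the preprocessing asked for in Experiments 2 and 3, using NLTK (the sample sentence is illustrative; resource names can differ slightly across NLTK versions):

# Tokenization, stop-word removal, lemmatization and stemming with NLTK.
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer, PorterStemmer

for pkg in ("punkt", "stopwords", "wordnet"):   # newer NLTK may also need "punkt_tab"
    nltk.download(pkg, quiet=True)

text = "The children were playing happily in the gardens."
tokens = word_tokenize(text.lower())
filtered = [t for t in tokens if t.isalpha() and t not in stopwords.words("english")]

lemmatizer, stemmer = WordNetLemmatizer(), PorterStemmer()
print("tokens:  ", tokens)
print("filtered:", filtered)
print("lemmas:  ", [lemmatizer.lemmatize(t) for t in filtered])
print("stems:   ", [stemmer.stem(t) for t in filtered])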

32
Sr. No. Name of the Experiment
5. Implement an N-gram model for the given text input.
6. Study different POS taggers and perform POS tagging on the given text (see the sketch below).
7. Perform chunking for the given text input.
8. Implement a Named Entity Recognizer for the given text input (see the sketch below).
9. Implement a text-similarity recognizer for the chosen text documents.
10. Exploratory data analysis of a given text (word cloud).
Mini Project Report: For any one chosen real-world NLP application. Implementation and presentation of the mini project.
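
A minimal sketch of Experiments 6 and 8 with NLTK's default POS tagger and named-entity chunker (resource names may differ slightly across NLTK versions; the sentence is illustrative):

# POS tagging and named-entity recognition with NLTK's default models.
import nltk
from nltk import word_tokenize, pos_tag, ne_chunk

for pkg in ("punkt", "averaged_perceptron_tagger", "maxent_ne_chunker", "words"):
    nltk.download(pkg, quiet=True)

sentence = "Sundar Pichai visited Mumbai in January."
tagged = pos_tag(word_tokenize(sentence))   # e.g. [('Sundar', 'NNP'), ('Pichai', 'NNP'), ...]
print(tagged)
print(ne_chunk(tagged))                     # Tree with PERSON / GPE chunks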

iNLTK package for Indian languages: Hindi, Punjabi, Marathi, Bengali, Sanskrit

Natural Language Generation

PyTorch
33
Term Work:
1. Term work should consist of 8 experiments and a mini project.
2. The case study / mini project is to be conducted on Indian languages (preferably).
3. The final certification and acceptance of term work ensures satisfactory performance of laboratory work and minimum passing marks in term work.

34
Thank You

35
