DS SEM 8 curriculum
Course Prerequisites:
Python, Data Mining
Course Objectives: The course aims
1 To build a strong foundation in text, web and social media analytics.
2 To understand the complexities of extracting text from different data sources and analysing it.
3 To enable students to solve complex real-world problems using sentiment analysis and recommendation
systems.
Course Outcomes:
After successful completion of the course, the student will be able to:
1 Extract information from text and perform data pre-processing.
2 Apply clustering and classification algorithms on textual data and perform prediction.
3 Apply various web mining techniques to perform mining, searching and spam detection on web data.
4 Provide solutions to the emerging problems with social media using behaviour analytics and
recommendation systems.
5 Apply machine learning techniques to perform Sentiment Analysis on data from social media.
Module No. Topics Hours
1.0 Introduction 06
1.1 Introduction to Text Mining: Introduction, Algorithms for Text Mining, Future
Directions
1.2 Information Extraction from Text: Named Entity Recognition, Relation Extraction,
Unsupervised Information Extraction
1.3 Text Representation: Tokenization, Stemming, Stop Words, NER, N-gram Modelling
2.1 Text Clustering: Feature Selection and Transformation Methods, Distance-based
Clustering Algorithms, Word and Phrase-based Clustering, Probabilistic Document
Clustering
2.2 Text Classification: Feature Selection, Decision Tree Classifiers, Rule-based Classifiers,
Probabilistic Classifiers, Proximity-based Classifiers
2.3 Text Modelling: Bayesian Networks, Hidden Markov Models, Markov Random
Fields, Conditional Random Fields
3.0 Web-Mining 05
3.1 Introduction to Web-Mining: Inverted indices and Compression, Latent Semantic
Indexing, Web Search,
3.3 Web Spamming: Content Spamming, Link Spamming, Hiding Techniques, and
Combating Spam
6.4 Opinion Spam Detection: Supervised Learning, Abnormal Behaviours, Group Spam
Detection.
Total 48
Textbooks:
1 Daniel Jurafsky and James H. Martin, “Speech and Language Processing,” 3rd edition, 2020
2 Charu C. Aggarwal and ChengXiang Zhai, “Mining Text Data”, Springer Science and Business Media, 2012.
3 Bing Liu, “Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data”, Springer, Second Edition, 2011.
4 Reza Zafarani, Mohammad Ali Abbasi and Huan Liu, “Social Media Mining: An Introduction”, Cambridge
University Press, 2014.
Assessment:
Internal Assessment: (20)
1 Assessment consists of two class tests of 20 marks each.
2 The first class test is to be conducted when approximately 40% of the syllabus is completed, and the second class
test when an additional 40% of the syllabus is completed.
3 Duration of each test shall be one hour.
End Semester Theory Examination: (80)
1 The question paper will comprise a total of 06 questions, each carrying 20 marks.
2 Question No: 01 will be compulsory and based on the entire syllabus wherein 4 to 5 sub-questions
will be asked.
3 Remaining questions will be mixed in nature and randomly selected from all the modules.
4 Weightage of each module will be proportional to the number of respective lecture hours as mentioned
in the syllabus.
5 Total 04 questions need to be solved.