CCS334 - Big Data Analytics
2023
COURSE OBJECTIVES:
To understand the usage of Hadoop-related tools for big data analytics.
Introduction to big data – convergence of key trends – unstructured data – industry examples of
big data – web analytics – big data applications– big data technologies – introduction to Hadoop
– open source technologies – cloud and big data – mobile business intelligence –
crowdsourcing analytics – inter- and trans-firewall analytics.
Introduction to NoSQL – aggregate data models – key-value and document data models –
relationships – graph databases – schemaless databases – materialized views – distribution
models – master-slave replication – consistency – Cassandra – Cassandra data model –
Cassandra examples – Cassandra clients.
MapReduce workflows – unit tests with MRUnit – test data and local tests – anatomy of a
MapReduce job run – classic MapReduce – YARN – failures in classic MapReduce and YARN –
job scheduling – shuffle and sort – task execution – MapReduce types – input formats – output
formats.
Data format – analyzing data with Hadoop – scaling out – Hadoop streaming – Hadoop pipes –
design of Hadoop distributed file system (HDFS) – HDFS concepts – Java interface – data flow
– Hadoop I/O – data integrity – compression – serialization – Avro – file-based data structures –
Cassandra – Hadoop integration.
HBase – data model and implementations – HBase clients – HBase examples – praxis.
Pig – Grunt – Pig data model – Pig Latin – developing and testing Pig Latin scripts.
Hive – data types and file formats – HiveQL data definition – HiveQL data manipulation –
HiveQL queries.
30 PERIODS
COURSE OUTCOMES:
CO1: Describe big data and use cases from selected business domains.
CO2: Explain NoSQL big data management.
CO3: Install, configure, and run Hadoop and HDFS.
CO4: Perform MapReduce analytics using Hadoop.
CO5: Use Hadoop-related tools such as HBase, Cassandra, Pig, and Hive for big data analytics.
1. Downloading and installing Hadoop; understanding the different Hadoop modes, startup scripts,
and configuration files.
2. Implementation of file management tasks in Hadoop, such as adding files and directories,
retrieving files, and deleting files.
3. Implementation of matrix multiplication with Hadoop MapReduce.
4. Run a basic Word Count MapReduce program to understand the MapReduce paradigm.
5. Installation of Hive along with practice examples.
7. Installation of HBase and Thrift, along with practice examples.
8. Practice importing and exporting data from various databases.
Software Requirements:
Cassandra, Hadoop, Java, Pig, Hive and HBase.
TOTAL: 60 PERIODS
TEXT BOOKS:
1. Michael Minelli, Michelle Chambers, and Ambiga Dhiraj, "Big Data, Big Analytics: Emerging
Business Intelligence and Analytic Trends for Today's Businesses", Wiley, 2013.
2. Eric Sammer, "Hadoop Operations", O'Reilly, 2012.
3. Pramod J. Sadalage, "NoSQL Distilled", 2013.
REFERENCES: