Hadoop Map-Reduce


This document provides instructions for running a hands-on lab to perform a word count using Hadoop MapReduce on a single node Hadoop instance. It outlines downloading and extracting Hadoop, verifying the installation, downloading sample input text, running the word count MapReduce job, and viewing the output. It also includes an example of performing a word count on a smaller sample file.

Uploaded by

samiullah

10/5/23, 4:13 PM about:blank

Hands-on lab on Hadoop Map-Reduce (20 mins)

Objectives

Run a single-node Hadoop instance

Perform a word count using Hadoop Map Reduce.

Set up Single-Node Hadoop

The steps outlined in this lab use the single-node Hadoop Version 3.2.3. Hadoop is most useful when deployed in a fully distributed mode on a large cluster of
networked servers sharing a large volume of data. However, for basic understanding, we will configure Hadoop on a single node.

In this lab, we will run the WordCount example with an input text and see how the content of the input file is processed by WordCount.

1. Start a new terminal

2. Download hadoop-3.2.3.tar.gz to your Theia environment by running the following command.

    curl https://dlcdn.apache.org/hadoop/common/hadoop-3.2.3/hadoop-3.2.3.tar.gz --output hadoop-3.2.3.tar.gz

3. Extract the tar file in the current directory.

    tar -xvf hadoop-3.2.3.tar.gz

4. Navigate to the hadoop-3.2.3 directory.

    cd hadoop-3.2.3

5. Check that the hadoop command is set up. Running it without arguments displays the usage documentation for the hadoop script.

    bin/hadoop

6. Run the following command to download data.txt to your current directory.

    curl https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-BD0225EN-SkillsNetwork/labs/data/data.txt --output data.

7. Run the MapReduce application for wordcount on data.txt and store the output in the output directory.

    bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.3.jar wordcount data.txt output

This may take some time.

8. Once the word count runs successfully, you can run the following command to see the output file it has generated.

    ls output

You should see part-r-00000 and _SUCCESS, which indicates that the word count has finished.

While the job is still processing, you may see only ‘_temporary’ listed in the output directory. Wait a couple of minutes and run the command again until you see the output described above.
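Instead of re-running ls by hand, the wait can be scripted. The following is a minimal sketch, assuming a POSIX shell; wait_for_success is a hypothetical helper name, and the 5-second poll interval and 300-second default timeout are arbitrary choices for illustration.

```shell
# wait_for_success is a hypothetical helper, not part of Hadoop.
# It polls DIR every 5 seconds until a _SUCCESS file appears,
# or gives up after TIMEOUT seconds (default 300, an arbitrary choice).
wait_for_success() {
  dir="$1"
  timeout="${2:-300}"
  waited=0
  while [ ! -f "$dir/_SUCCESS" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for $dir/_SUCCESS" >&2
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
  # The job has finished: list the generated files.
  ls "$dir"
}
```

Calling `wait_for_success output` from the hadoop-3.2.3 directory would then list the output files once the job has written _SUCCESS.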

9. Run the following command to see the word count output.

    cat output/part-r-00000

The image below shows how the MapReduce wordcount happens.
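Conceptually, the word count has three phases: map emits a (word, 1) pair for every whitespace-separated token, shuffle brings identical words together, and reduce sums the counts for each word. A rough local analogue with standard Unix tools is sketched below. It only illustrates the data flow and is not how Hadoop executes the job; the function names and sample.txt are made up for this example.

```shell
# Illustrative only: a local pipeline that mirrors the map, shuffle and reduce phases.
# map_phase, shuffle_phase, reduce_phase and sample.txt are hypothetical names.

# Map: split the input on whitespace and emit "word<TAB>1" for every token.
map_phase() {
  tr -s '[:space:]' '\n' | awk 'NF { print $0 "\t1" }'
}

# Shuffle: sort by key so that identical words end up on adjacent lines.
shuffle_phase() {
  LC_ALL=C sort
}

# Reduce: sum the counts for each run of identical keys.
reduce_phase() {
  awk -F '\t' '
    NR > 1 && $1 != key { print key "\t" sum; sum = 0 }
    { key = $1; sum += $2 }
    END { if (NR > 0) print key "\t" sum }
  '
}

printf 'to be or not\nto be\n' > sample.txt
map_phase < sample.txt | shuffle_phase | reduce_phase
```

For this sample input the pipeline prints four tab-separated pairs, one per line: be 2, not 1, or 1, to 2.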


Practice Lab

1. Do a word count on a file with the following content.

    Italy Venice
    Italy Pizza
    Pizza Pasta Gelato

Hint on how to get started

- Delete the data.txt file and the output folder.

    rm data.txt
    rm -rf output

Hint on how to create a file to word count

Create data.txt with the required content. You may use the file editor to do so.
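If you prefer the terminal to the file editor, one option is a shell here-document. This is a minimal sketch, assuming a POSIX shell and that you are in the hadoop-3.2.3 directory; it overwrites any existing data.txt.

```shell
# Write the three practice lines to data.txt in the current directory.
# The quoted 'EOF' delimiter keeps the shell from expanding anything in the text.
cat > data.txt <<'EOF'
Italy Venice
Italy Pizza
Pizza Pasta Gelato
EOF

# Print the file to confirm its content.
cat data.txt
```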

Solution on how to do the word count on the file

Run the following command.

    bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.3.jar wordcount data.txt output

Sample output

The output will be as below.
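As a cross-check of what to expect, the counts for the practice content can be computed locally with standard Unix tools, independent of Hadoop. This is a sketch only: practice.txt is a hypothetical scratch file, and the pipeline is not what the Hadoop job runs internally.

```shell
# Local cross-check with standard tools (not Hadoop).
# practice.txt is a hypothetical scratch file holding the practice content.
printf 'Italy Venice\nItaly Pizza\nPizza Pasta Gelato\n' > practice.txt

# Put one word per line, sort so identical words are adjacent,
# count each run, then print "word<TAB>count".
tr -s '[:space:]' '\n' < practice.txt | LC_ALL=C sort | uniq -c | awk '{ print $2 "\t" $1 }'
```

This prints Gelato 1, Italy 2, Pasta 1, Pizza 2 and Venice 1, one tab-separated pair per line. The part-r-00000 file produced by the Hadoop job should list the same words and counts.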

Congratulations! You have:

Run a single-node Hadoop instance
Used MapReduce to do a word count

Author(s)
Lavanya T S

Contributor(s)
Aije Egwaikhide

Changelog

Date        Version  Changed by  Change Description
05-04-2022  1.3      Sourabh     Updated Hadoop version
18-01-2022  1.2      Lavanya     Changed to single node hadoop
16-07-2021  1.1      Aije        Modified multiple areas
11-07-2021  1.0      Lavanya     Created lab instructions for Word count using MapReduce
