Big Data Manual Ai


Sri Muthukumaran Institute of Technology

Chikkarayapuram, Near Mangadu, Chennai – 600 069.


Academic Year 2022-2023 / Even Semester

DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

CCS334- BIG DATA ANALYTICS LABORATORY

REGULATION 2021

LAB MANUAL
CCS334 BIG DATA ANALYTICS    L T P C: 2 0 2 3

LIST OF EXPERIMENTS:

1. Downloading and installing Hadoop; Understanding different Hadoop modes. Startup scripts,
Configuration files.
2. Hadoop Implementation of file management tasks, such as Adding files and directories, retrieving files
and Deleting files
3. Implementation of Matrix Multiplication with Hadoop Map Reduce
4. Run a basic Word Count Map Reduce program to understand Map Reduce Paradigm.
5. Installation of Hive along with practice examples.
6. Installation of HBase, Installing thrift along with Practice examples
7. Practice importing and exporting data from various databases.
Software Requirements: Cassandra, Hadoop, Java, Pig, Hive and HBase
CONTENT

NAME OF THE EXPERIMENT

1 DOWNLOADING AND INSTALLING HADOOP; UNDERSTANDING DIFFERENT HADOOP MODES. STARTUP SCRIPTS, CONFIGURATION FILES
2 HADOOP IMPLEMENTATION OF FILE MANAGEMENT TASKS
3 IMPLEMENTATION OF MATRIX MULTIPLICATION WITH HADOOP MAP REDUCE
4 WORD COUNT MAP REDUCE PROGRAM USING MAP REDUCE PARADIGM
5 INSTALLATION OF HIVE ALONG WITH PRACTICE EXAMPLES
6 INSTALLATION OF HBASE ALONG WITH PRACTICE EXAMPLES
7 HIVE EXECUTION WITH PRACTICE EXAMPLES
8 IMPORT DATA BETWEEN HDFS AND RDBMS IN SQOOP


DOWNLOADING AND INSTALLING HADOOP; UNDERSTANDING DIFFERENT HADOOP
MODES. STARTUP SCRIPTS, CONFIGURATION FILES.

AIM:
To download and install Hadoop, and to understand the different Hadoop modes, startup scripts, and
configuration files.

PROCEDURE:
1. Install Java JDK 1.8.0 under “C:\JAVA”.
2. Download the Hadoop archive and extract it.
3. Set the HADOOP_HOME environment variable on Windows 10.
4. Set the JAVA_HOME environment variable on Windows 10.
5. Add the Hadoop bin directory and the Java bin directory to the Path variable.
6. Configure Hadoop.
7. Start the Hadoop cluster.
8. Access the Hadoop NameNode and ResourceManager web interfaces.
INSTALLATION PROCEDURE:

1. Prerequisites
* Hardware requirement: at least 8GB of RAM (4GB may also work if the system has an SSD) and a quad-core CPU of at least 1.80GHz
* JRE 1.8 (offline installer for JRE)
* Java Development Kit 1.8
* Software for unzipping, such as 7-Zip or WinRAR
* The Hadoop zip download

2. Unzip and Install Hadoop
* After downloading Hadoop, unzip the hadoop-3.5.5.tar.gz file.
* To organize the Hadoop installation, create a folder and move the final extracted files into it.
* Note: while creating folders, DO NOT ADD SPACES IN THE FOLDER NAME.

3. Setting Up Environment Variables

Another important step in setting up the work environment is to set the system's environment variables.
To edit environment variables, go to Control Panel > System and click on the “Advanced system settings” link.
Alternatively, right-click on the This PC icon, click on Properties, and then click on the “Advanced
system settings” link.
The easiest way is to search for “Environment Variables” in the search bar.

INSTALLATION PROCEDURE: Pseudo-Distributed Mode (Locally)

Hadoop Installation:

Steps for Installation

1. Edit the file /home/hadoop_dev/hadoop2/etc/hadoop/core-site.xml as below:


<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>

Note: This change sets the NameNode address and port.


2. Edit the file /home/hadoop_dev/hadoop2/etc/hadoop/hdfs-site.xml as below:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>

Note: This change sets the default replication count for blocks used by HDFS.

3. We need to set up password-less login so that the master can use password-less ssh to start
the daemons on all the slaves. Check whether the ssh server is running on your host:

a. ssh localhost (enter your password; if you are able to log in, the ssh server is running)

b. If you are unable to log in in step a, install ssh as follows:

sudo apt-get install ssh

c. Set up password-less login as below:

i. ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

ii. cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

Note: Recent OpenSSH releases disable DSA keys by default. If the key is rejected, generate an RSA key
instead (-t rsa, with id_rsa in place of id_dsa).

4. We can run Hadoop jobs locally or on YARN in this mode. In this post, we will focus on
running the jobs locally.
5. Format the file system. Formatting the NameNode erases the metadata it holds about the DataNodes.
After that, the information on the DataNodes is no longer reachable and they can be reused for new data:

a. bin/hdfs namenode -format

6. Start the daemons

a. sbin/start-dfs.sh (starts the NameNode and DataNode)

You can check whether the NameNode has started successfully by using the following web
interface: http://0.0.0.0:50070 (Hadoop 3.x serves this interface on port 9870 instead).

If you are unable to see this, check the logs in the /home/hadoop_dev/hadoop2/logs folder.
7. You can check whether the daemons are running by issuing the jps command.

8. This finishes the installation of Hadoop in pseudo-distributed mode.

9. Let us run the same example we ran in the previous blog post:


i) Create a new directory on HDFS:
bin/hdfs dfs -mkdir -p /user/hadoop_dev
Copy the input files for the program to HDFS:
bin/hdfs dfs -put etc/hadoop input
ii) Run the program:
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar grep input output 'dfs[a-z.]+'
iii) View the output on HDFS:
bin/hdfs dfs -cat output/*

10. Stop the daemons when you are done executing the jobs, with the below command:
sbin/stop-dfs.sh
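
The core-site.xml and hdfs-site.xml files edited in steps 1 and 2 are plain lists of name/value properties. As a minimal sketch of that structure, the program below reads such a file with the JDK's built-in XML parser and splits fs.defaultFS into the NameNode host and port. Hadoop itself reads these files through its own Configuration class; the class name SiteXmlSketch and the method readProperties are hypothetical and used only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Illustrative only: SiteXmlSketch is a hypothetical helper, not a Hadoop class.
class SiteXmlSketch {

    // Collects every <property><name>..</name><value>..</value></property> pair.
    static Map<String, String> readProperties(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<String, String>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element property = (Element) nodes.item(i);
                String name = property.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = property.getElementsByTagName("value").item(0).getTextContent().trim();
                props.put(name, value);
            }
            return props;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse configuration XML", e);
        }
    }

    public static void main(String[] args) {
        // Same content as the core-site.xml shown in step 1.
        String coreSite = "<configuration><property>"
                + "<name>fs.defaultFS</name><value>hdfs://localhost:9000</value>"
                + "</property></configuration>";
        Map<String, String> props = readProperties(coreSite);
        URI nameNode = URI.create(props.get("fs.defaultFS"));
        System.out.println("NameNode host: " + nameNode.getHost());
        System.out.println("NameNode port: " + nameNode.getPort());
    }
}
```

The same method can be pointed at the hdfs-site.xml content to read dfs.replication.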

Hadoop Installation – Pseudo-Distributed Mode (YARN)

Steps for Installation

1. Edit the file /home/hadoop_dev/hadoop2/etc/hadoop/mapred-site.xml as below:


<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>

2. Edit the file /home/hadoop_dev/hadoop2/etc/hadoop/yarn-site.xml as below:


<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>

Note: This particular configuration tells MapReduce how to do its shuffle. In this case it uses
mapreduce_shuffle.

3. Format the NameNode:


bin/hdfs namenode -format

4. Start the daemons using the command below (the HDFS daemons must also be running; start them with
sbin/start-dfs.sh if needed):

sbin/start-yarn.sh
This starts the ResourceManager and NodeManager daemons.

Once this command is run, you can check if ResourceManager is running or not by visiting the
following URL on browser : http://0.0.0.0:8088 . If you are unable to see this, check for the logs in the
directory: /home/hadoop_dev/hadoop2/logs

5. To check whether the services are running, issue the jps command. The following shows all the services
necessary to run YARN on a single server:
$ jps
15933 Jps

15567 ResourceManager
15785 NodeManager

6. Let us run the same example as we ran before:

i) Create a new directory on HDFS:

bin/hdfs dfs -mkdir -p /user/hadoop_dev

Copy the input files for the program to HDFS:

bin/hdfs dfs -put etc/hadoop input
ii) Run the program:
bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar grep input output 'dfs[a-z.]+'
iii) View the output on hdfs:
bin/hdfs dfs -cat output/*

7. Stop the daemons when you are done executing the jobs, with the below command:
sbin/stop-yarn.sh

This completes the installation part of Hadoop.


OUTPUT:

Result:
Thus the installation of Hadoop in its different operating modes has been completed successfully.

HADOOP IMPLEMENTATION OF FILE MANAGEMENT TASKS, SUCH AS ADDING FILES
AND DIRECTORIES, RETRIEVING FILES AND DELETING FILES
AIM:
To implement file management tasks in Hadoop:

a) Adding files and directories
b) Retrieving files
c) Deleting files

PROCEDURE:
1. Add files and directories to HDFS
2. Retrieve files from HDFS
3. Delete files from HDFS
4. Copy data from NFS to HDFS
5. Verify the files
SYNTAX AND COMMANDS TO ADD, RETRIEVE AND DELETE DATA FROM HDFS

Step-1: Adding Files and Directories to HDFS

Before you can run Hadoop programs on data stored in HDFS, you'll need to put the data into
HDFS first. Let's create a directory and put a file in it. HDFS has a default working directory of
/user/$USER, where $USER is your login username. This directory isn't automatically created for you,
though, so let's create it with the mkdir command. Log in with your hadoop user.

First, start the Hadoop services by running this command in the terminal:

start-all.sh

For the purpose of illustration, we use chuck. You should substitute your user name in the example
commands.

hadoop fs -mkdir /chuck

hadoop fs -put example.txt /chuck

Step-2 :

Retrieving Files from HDFS

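The Hadoop command get copies files from HDFS back to the local filesystem, while cat prints a file's
contents to the terminal. To retrieve example.txt into the current local directory and then display it, we
can run the following commands.

hadoop fs -get /chuck/example.txt .

hadoop fs -cat /chuck/example.txt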


Step-3:
Deleting Files from HDFS
hadoop fs -rm /chuck/example.txt
● Command for creating a directory in HDFS is “hdfs dfs -mkdir /lendicse”
● Adding another directory is done through the command “hdfs dfs -mkdir /sanjay_english”
Step-4:
Copying Data from NFS to HDFS
First create a set of glossaries as a text file.
nano glossary
Put your glossary text in there.
Command for copying from a local directory is “hdfs dfs -copyFromLocal /home/hadoop/glossary
/sanjay_english”

● View the file by using the command “hdfs dfs -cat /sanjay_english/glossary”

● Command for listing items in Hadoop is “hdfs dfs -ls hdfs://localhost:9000/”

● Command for deleting an empty directory is “hdfs dfs -rmdir /lendicse”

OUTPUT:

RESULT:

Thus the implementation of file management tasks is done successfully.


IMPLEMENTATION OF MATRIX MULTIPLICATION WITH HADOOP MAP REDUCE
AIM:
To develop a MapReduce program to implement matrix multiplication.

Algorithm for Map Function

Map Function – It takes a set of data and converts it into another set of data, where individual elements are
broken down into tuples (key-value pairs).
a. For each element mij of M, produce (key, value) pairs ((i,k), (M,j,mij)) for k = 1, 2, 3, ... up to the
number of columns of N.
b. For each element njk of N, produce (key, value) pairs ((i,k), (N,j,njk)) for i = 1, 2, 3, ... up to the
number of rows of M.
c. Return the set of (key, value) pairs in which each key (i,k) has a list with the values (M,j,mij) and
(N,j,njk) for all possible values of j.
Algorithm for Reduce Function

Reduce Function – Takes the output from Map as an input and combines those data tuples into a smaller
set of tuples.
a. For each key (i,k):
b. Sort the values that begin with M by j into list M, and sort the values that begin with N by j into list N.
Multiply mij and njk for the jth value of each list.
c. Sum up mij x njk and return ((i,k), Σj mij x njk).
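
The map and reduce steps above can be traced on a small input without a cluster. The following is a minimal plain-Java sketch, assuming two 2 x 2 matrices and zero-based indices (the algorithm text counts from 1); the class MatrixMapReduceSketch and its method names are hypothetical and are not part of the Hadoop program that follows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: simulates the map, group-by-key, and reduce steps in memory.
class MatrixMapReduceSketch {

    // Map step: every mij is sent to all keys (i,k); every njk is sent to all keys (i,k).
    static Map<String, List<String>> mapPhase(double[][] m, double[][] n) {
        int rowsM = m.length;
        int colsN = n[0].length;
        Map<String, List<String>> grouped = new TreeMap<String, List<String>>();
        for (int i = 0; i < rowsM; i++) {
            for (int j = 0; j < m[i].length; j++) {
                for (int k = 0; k < colsN; k++) {
                    grouped.computeIfAbsent(i + "," + k, x -> new ArrayList<String>())
                            .add("M," + j + "," + m[i][j]);
                }
            }
        }
        for (int j = 0; j < n.length; j++) {
            for (int k = 0; k < colsN; k++) {
                for (int i = 0; i < rowsM; i++) {
                    grouped.computeIfAbsent(i + "," + k, x -> new ArrayList<String>())
                            .add("N," + j + "," + n[j][k]);
                }
            }
        }
        return grouped;
    }

    // Reduce step for one key (i,k): pair the M and N values by j and sum the products.
    static double reduce(List<String> values) {
        Map<Integer, Double> fromM = new HashMap<Integer, Double>();
        Map<Integer, Double> fromN = new HashMap<Integer, Double>();
        for (String value : values) {
            String[] parts = value.split(",");
            Map<Integer, Double> target = parts[0].equals("M") ? fromM : fromN;
            target.put(Integer.parseInt(parts[1]), Double.parseDouble(parts[2]));
        }
        double sum = 0.0;
        for (Map.Entry<Integer, Double> entry : fromM.entrySet()) {
            sum += entry.getValue() * fromN.getOrDefault(entry.getKey(), 0.0);
        }
        return sum;
    }

    // Runs map, then reduce per key, and places each result at position (i,k).
    static double[][] multiply(double[][] m, double[][] n) {
        double[][] result = new double[m.length][n[0].length];
        for (Map.Entry<String, List<String>> entry : mapPhase(m, n).entrySet()) {
            String[] ik = entry.getKey().split(",");
            result[Integer.parseInt(ik[0])][Integer.parseInt(ik[1])] = reduce(entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] m = {{1, 2}, {3, 4}};
        double[][] n = {{5, 6}, {7, 8}};
        // prints [[19.0, 22.0], [43.0, 50.0]]
        System.out.println(Arrays.deepToString(multiply(m, n)));
    }
}
```

Each key (i,k) receives one M value and one N value per j, which is why the reducer can compute a single output cell independently of all the others.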

PROCEDURE:
1. Download the hadoop jar files.
2. Creating Mapper file for Matrix Multiplication.
3. Compiling the program in particular folder.
4. Running the program in particular folder
In mathematics, matrix multiplication or the matrix product is a binary operation that produces a
matrix from two matrices. The definition is motivated by linear equations and linear transformations on
vectors, which have numerous applications in applied mathematics, physics, and engineering. In more
detail, if A is an n × m matrix and B is an m × p matrix, their matrix product AB is an n × p matrix, in which
the m entries across a row of A are multiplied with the m entries down a column of B and summed to
produce an entry of AB. When two linear transformations are represented by matrices, then the matrix
product represents the composition of the two transformations.

Program:
Download Hadoop Common Jar files :
wget https://goo.gl/G4MyHp -O hadoop-common-3.1.2.jar
Download Hadoop Mapreduce Jar File :
wget https://goo.gl/KT8yfB -O hadoop-mapreduce-client-core-3.1.2.jar

Map.java
package com.lendap.hadoop;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class Map extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        Configuration conf = context.getConfiguration();
        int m = Integer.parseInt(conf.get("m")); // number of rows of M
        int p = Integer.parseInt(conf.get("p")); // number of columns of N
        String line = value.toString();
        // Each input line is either (M, i, j, Mij) or (N, j, k, Njk).
        String[] indicesAndValue = line.split(",");
        Text outputKey = new Text();
        Text outputValue = new Text();
        if (indicesAndValue[0].equals("M")) {
            for (int k = 0; k < p; k++) {
                // key = (i,k)
                outputKey.set(indicesAndValue[1] + "," + k);
                // value = (M,j,Mij)
                outputValue.set(indicesAndValue[0] + "," + indicesAndValue[2]
                        + "," + indicesAndValue[3]);
                context.write(outputKey, outputValue);
            }
        } else {
            // (N, j, k, Njk)
            for (int i = 0; i < m; i++) {
                // key = (i,k)
                outputKey.set(i + "," + indicesAndValue[2]);
                // value = (N,j,Njk)
                outputValue.set("N," + indicesAndValue[1] + ","
                        + indicesAndValue[3]);
                context.write(outputKey, outputValue);
            }
        }
    }
}

Reduce.java
package com.lendap.hadoop;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;
import java.util.HashMap;

public class Reduce extends Reducer<Text, Text, Text, Text> {
    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        String[] value;
        // key = (i,k)
        // values = [(M/N,j,V/W),..]
        HashMap<Integer, Float> hashA = new HashMap<Integer, Float>();
        HashMap<Integer, Float> hashB = new HashMap<Integer, Float>();
        for (Text val : values) {
            value = val.toString().split(",");
            if (value[0].equals("M")) {
                hashA.put(Integer.parseInt(value[1]), Float.parseFloat(value[2]));
            } else {
                hashB.put(Integer.parseInt(value[1]), Float.parseFloat(value[2]));
            }
        }
        // n is the shared dimension: columns of M and rows of N.
        int n = Integer.parseInt(context.getConfiguration().get("n"));
        float result = 0.0f;
        float m_ij;
        float n_jk;
        for (int j = 0; j < n; j++) {
            // Missing entries are treated as zero (sparse input).
            m_ij = hashA.containsKey(j) ? hashA.get(j) : 0.0f;
            n_jk = hashB.containsKey(j) ? hashB.get(j) : 0.0f;
            result += m_ij * n_jk;
        }
        // Only non-zero cells are written, as "i,k,value".
        if (result != 0.0f) {
            context.write(null,
                    new Text(key.toString() + "," + Float.toString(result)));
        }
    }
}

MatrixMultiply.java
package com.lendap.hadoop;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class MatrixMultiply {

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: MatrixMultiply <in_dir> <out_dir>");
            System.exit(2);
        }
        Configuration conf = new Configuration();
        // M is an m-by-n matrix; N is an n-by-p matrix.
        conf.set("m", "1000");
        conf.set("n", "100");
        conf.set("p", "1000");
        // Job.getInstance replaces the deprecated Job constructor.
        Job job = Job.getInstance(conf, "MatrixMultiply");
        job.setJarByClass(MatrixMultiply.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Exit with a non-zero status if the job fails.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Compiling the program in a particular folder named operation/


javac -cp hadoop-common-3.1.2.jar:hadoop-mapreduce-client-core-3.1.2.jar:operation/:. -d operation/ Map.java
javac -cp hadoop-common-3.1.2.jar:hadoop-mapreduce-client-core-3.1.2.jar:operation/:. -d operation/ Reduce.java
javac -cp hadoop-common-3.1.2.jar:hadoop-mapreduce-client-core-3.1.2.jar:operation/:. -d operation/ MatrixMultiply.java

Packaging the compiled classes into a jar file:
jar -cvf MatrixMultiply.jar -C operation/ .

Uploading the M and N files, which contain the matrix data, to HDFS.
hadoop fs -mkdir Matrix/
hadoop fs -copyFromLocal M Matrix/
hadoop fs -copyFromLocal N Matrix/

Executing the jar file using the hadoop command, which fetches the records from HDFS and stores the
output in HDFS. Replace <out_dir> with an HDFS output directory that does not yet exist.
hadoop jar MatrixMultiply.jar com.lendap.hadoop.MatrixMultiply Matrix/ <out_dir>
Output:

Result:
Thus the MapReduce program to implement Matrix Multiplication has been successfully completed.
WORD COUNT MAP REDUCE PROGRAM USING MAP REDUCE PARADIGM

AIM:
To implement a word count MapReduce program using the MapReduce paradigm.
ALGORITHM:

1. Open Eclipse, then select File -> New -> Java Project, name it WordCount, and click Finish.
2. Create three Java classes in the project. Name them WCDriver (having the main
function), WCMapper, and WCReducer.
3. Include two reference libraries:
Right-click on the project, select Build Path, and click on Configure Build Path.
4. You can see the Add External JARs option on the right-hand side. Click on it and add the files
mentioned below.
You can find these files in /usr/lib/
1. /usr/lib/hadoop-0.20-mapreduce/hadoop-core-2.6.0-mr1-cdh5.13.0.jar
2. /usr/lib/hadoop/hadoop-common-2.6.0-cdh5.13.0.jar
PROCEDURE:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emits (word, 1) for every whitespace-separated token in a line.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer (also used as the combiner): sums the counts for each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    // Driver: args[0] is the HDFS input path, args[1] the HDFS output path.
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

OUTPUT:
Run the application:
$ bin/hadoop jar wc.jar WordCount /user/joe/wordcount/input /user/joe/wordcount/output
View the output:
$ bin/hadoop fs -cat /user/joe/wordcount/output/part-r-00000
Bye 1
Goodbye 1
Hadoop 2
Hello 2
World 2
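
To see how the map, shuffle, and reduce phases arrive at these counts, the paradigm can be traced in plain Java without Hadoop. This is only a sketch: the two input lines are an assumption chosen to reproduce the counts listed above (the manual does not show its input files), and the class WordCountSketch and its methods are hypothetical names.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Illustrative only: an in-memory walk-through of the word count phases.
class WordCountSketch {

    // Map phase: emit (word, 1) for every whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<Map.Entry<String, Integer>>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            pairs.add(new SimpleEntry<String, Integer>(itr.nextToken(), 1));
        }
        return pairs;
    }

    // Shuffle and reduce phases: group the pairs by word (sorted by key) and sum the ones.
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumed input lines, not taken from the manual.
        List<String> lines = new ArrayList<String>();
        lines.add("Hello World Bye World");
        lines.add("Hello Hadoop Goodbye Hadoop");
        for (Map.Entry<String, Integer> entry : count(lines).entrySet()) {
            System.out.println(entry.getKey() + " " + entry.getValue());
        }
    }
}
```

The sorted map stands in for the shuffle, which delivers keys to the reducer in sorted order; that is why the Hadoop output is also alphabetical.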

RESULT:
Thus a basic Word Count Map Reduce program is executed successfully to understand Map Reduce
Paradigm.
INSTALLATION OF HIVE ALONG WITH PRACTICE EXAMPLES

AIM:
Installation of Hive along with practice examples.
Prerequisites
There are some prerequisites to install hive on any machine:

1. Java Installation

2. Hadoop Installation

ALGORITHM:

Step 1:

* Verify Java is installed.

* Open the terminal and type the command.

Step 2:

* Verify Hadoop is installed.

* Open the terminal and type the command.

Install Hive:
Step 1: Download the tar file.

Step 2: Extract the file.

Step 3: Move apache files to /usr/local/hive directory.

Step 4: Set up the Hive environment by appending the following lines to ~/.bashrc file

Step 5: Execute the bashrc file.

Step 6: Hive Configuration- Edit hive-env.sh file to append this:


PROCEDURE:
Step 1:
* Verify Java is installed.
* Open the terminal and type the command:

java -version

* If Java is installed on the system, the command prints the version; otherwise it gives an error. In my
case, Java is already installed.

* If Java is not installed on your system, you can visit the link below to download and
install it.
* http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html
Java Installation

1. Extract the downloaded archive.
2. Move it to “/usr/local/”.
3. Set up PATH and JAVA_HOME variables.

Step 2:

* Verify Hadoop is installed.
* Open the terminal and type the command:

hadoop version

* If Hadoop is already installed, this command prints the version; otherwise it gives an error.
* In my case, Hadoop is already installed, and the output shows that I am working with a CDH5 machine.
* If Hadoop is not installed, download Hadoop from the Apache Software Foundation.

Hadoop Installation

1. Setup Hadoop

2. Configure Hadoop

Files required to be edited to configure Hadoop are:

* core-site.xml
* hdfs-site.xml
* yarn-site.xml
* mapred-site.xml
3. Set up the NameNode using the command:

hdfs namenode -format

4. Start dfs using the following command:

start-dfs.sh

5. Start yarn using the command:

start-yarn.sh

Install Hive:

The following points relate to the Hive installation:

* First, download the Hive release from the link below: https://apachemirror.wuchna.com/hive/

* On that page, choose the stable-2 directory.

* After opening stable-2, right-click the bin file and choose
“copy link address”.

Steps to Install Hive

Below are the steps in Hive Installation:

Step 1: Download the tar file.

http://apachemirror.wuchna.com/hive/stable-2/apache-hive-2.3.6-bin.tar.gz

Step 2: Extract the file.

sudo tar zxvf ~/Downloads/apache-hive-* -C /usr/local

Step 3: Move apache files to /usr/local/hive directory.

sudo mv /usr/local/apache-hive-* /usr/local/hive

Step 4: Set up the Hive environment by appending the following lines to ~/.bashrc file
export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$HIVE_HOME/bin
export CLASSPATH=$CLASSPATH:/usr/local/Hadoop/lib/*:.
export CLASSPATH=$CLASSPATH:/usr/local/hive/lib/*:.

Step 5: Execute the bashrc file.

$ source ~/.bashrc

Step 6: Hive configuration. Create hive-env.sh from its template:

$ cd $HIVE_HOME/conf
$ cp hive-env.sh.template hive-env.sh

Then edit hive-env.sh and append this line:

export HADOOP_HOME=/usr/local/Hadoop

* To verify that Hive is installed, use the command hive --version.

* If the command prints the Hive version, Hive is installed. In my case, it is
an older version, hence the warning.

OUTPUT :
Create the /tmp and /user/hive/warehouse directories in HDFS and give them group write permission before
verifying Hive. Use the following commands:
$ $HADOOP_HOME/bin/hadoop fs -mkdir -p /tmp
$ $HADOOP_HOME/bin/hadoop fs -mkdir -p /user/hive/warehouse
$ $HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp
$ $HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse

The following commands are used to verify Hive installation:


$ cd $HIVE_HOME
$ bin/hive
On successful installation of Hive, you get to see the following response:
Logging initialized using configuration in jar:file:/home/hadoop/hive-0.9.0/lib/hive-common-0.9.0.jar!/hive-log4j.properties
Hive history file=/tmp/hadoop/hive_job_log_hadoop_201312121621_1494929084.txt
………………….
hive>
The following sample command is executed to display all the tables:
hive> show tables;
OK
Time taken: 2.798 seconds
hive>

RESULT:

Thus Hive was installed and some practice examples were executed.

HIVE EXECUTION WITH PRACTICE EXAMPLES

AIM:
To execute Hive Commands in Cloudera
PROCEDURE:
1. To enter hive shell
hive
2. To display databases/ list the databases
Show databases;
3. Create a new database
Create database employeedb;
4. Use a database
Use employeedb;
5. Create a table
Create table employee (id bigint, name string, address string);
6. To list tables in a database
Show tables;
7. Display schema of a hive table
Describe employee;
8. Describe table with extra information
Describe formatted employee;
9. Insert record into hive table
Insert into employee values (11,'Abi','Banglore');
10. Insert multiple records into hive table:
Insert into employee values (22,'Ane','Chennai'),(33,'vino','salem'),(44,'Diana','theni');
[Insert will trigger a mapreduce job in background]
11. To check the hive warehouse directory in terminal
hadoop fs -ls /user/hive/warehouse
hadoop fs -ls /user/hive/warehouse/employeedb.db
12. To check the folder structure of employee table in
employeedb database from terminal:
hadoop fs -ls /user/hive/warehouse/employeedb.db/employee
13. Read the contents of employee table from terminal
hadoop fs -cat /user/hive/warehouse/employeedb.db/employee/*
14. Display table data with where condition (string comparison is case-sensitive):
Select * from employee where address = 'Chennai';
Select name, address from employee where address = 'Chennai';
Select name, address from employee where address = 'Chennai' and id > 22;
15. To display distinct values
Select DISTINCT address from employee;
16. To display records with order by clause
Select name, address from employee order by address;
17. To display no. of records in a table
Select count(*) from employee;
18. To display records with group by clause
Select address, count(*) from employee group by address;
Select address, count(*) as employee_count from employee group by address;
19. Display records using limit clause
Select * from employee limit 1;
20. To exit from hive shell
exit;
21. Create a new hive table with the if not exists clause
Create table if not exists menu_orders (
id bigint, product_id string, customer_id bigint, quantity int, amount double);
22. Insert records into menu_orders table
Insert into menu_orders values (11,'pho',11,3,120);
Insert into menu_orders values (12,'phon',12,4,130), (13,'phone',13,5,140);
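
The where and group by queries in steps 14 and 18 can be pictured as a filter and a grouped count over the rows inserted in steps 9 and 10. The sketch below reproduces that logic in plain Java on an in-memory copy of those four rows; it does not talk to Hive, and the class HiveQuerySketch, the nested Employee type, and the method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Illustrative only: mimics two of the Hive queries on an in-memory list.
class HiveQuerySketch {

    static final class Employee {
        final long id;
        final String name;
        final String address;

        Employee(long id, String name, String address) {
            this.id = id;
            this.name = name;
            this.address = address;
        }
    }

    // The rows inserted in steps 9 and 10.
    static List<Employee> sampleRows() {
        return Arrays.asList(
                new Employee(11, "Abi", "Banglore"),
                new Employee(22, "Ane", "Chennai"),
                new Employee(33, "vino", "salem"),
                new Employee(44, "Diana", "theni"));
    }

    // Select name from employee where address = '<address>';
    static List<String> namesAt(List<Employee> rows, String address) {
        return rows.stream()
                .filter(e -> e.address.equals(address)) // exact, case-sensitive match
                .map(e -> e.name)
                .collect(Collectors.toList());
    }

    // Select address, count(*) from employee group by address;
    static Map<String, Long> countByAddress(List<Employee> rows) {
        return rows.stream()
                .collect(Collectors.groupingBy(e -> e.address, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Employee> rows = sampleRows();
        System.out.println(namesAt(rows, "Chennai"));
        System.out.println(namesAt(rows, "chennai")); // empty: the case differs
        System.out.println(countByAddress(rows));
    }
}
```

In Hive the grouped count runs as a MapReduce job: the address plays the role of the map output key and the count is computed in the reducer.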

OUTPUT:
1. Create database

2. Creating a table
3. Display Database

4. Describe Database

RESULT:
Thus the Hive commands are executed successfully in Cloudera.
INSTALLATION OF HBASE ALONG WITH PRACTICE EXAMPLES

AIM:
Installation of HBase along with Practice examples.
ALGORITHM:

Standalone mode installation (no dependency on Hadoop system)

* This is the default mode of HBase
* It runs against the local file system
* It doesn't use Hadoop HDFS
* Only the HMaster daemon runs
* Not recommended for production environments
* Runs in a single JVM
PROGRAM:

1. Connect to hbase

hbase shell
2. To list the tables in hbase

list
3. To check whether the hbase master and region server are stopped or running

sudo service --status-all


sudo service hbase-master restart
sudo service hbase-regionserver restart
4. Exit from hbase and connect back

exit
hbase shell
5. Create a table and insert records
create 'students', 'personal_details', 'contact_details', 'marks'
list
put 'students','student1','personal_details:name','kiruba'
put 'students','student1','personal_details:email','[email protected]'
6. To see all records using scan
scan 'students'
get 'students','student1'
get 'students','student1', {COLUMN => 'personal_details'}
7. Delete email id column for student1
delete 'students','student1','personal_details:email'
scan 'students'
describe 'students'
exists 'students'
8. Drop a table
Drop is used to delete an hbase table. This operation can't be applied directly to an enabled table. Instead,
the table is first disabled and then dropped.
disable 'students'
drop 'students'
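
The put, get, scan, and delete commands above operate on a table that behaves like a sorted map of row keys, where each row holds family:qualifier cells. The following plain-Java sketch models only that shape, under the assumption that cell versions and timestamps can be ignored; it does not use the HBase client API. The class HBaseModelSketch is a hypothetical name, and the e-mail value is a made-up placeholder because the address in the manual is not legible.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: a nested sorted map standing in for one HBase table.
class HBaseModelSketch {

    // row key -> ("family:qualifier" -> value); both levels are kept sorted.
    private final Map<String, Map<String, String>> rows =
            new TreeMap<String, Map<String, String>>();

    // put '<table>', '<row>', '<family:qualifier>', '<value>'
    void put(String row, String column, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<String, String>()).put(column, value);
    }

    // get '<table>', '<row>': returns a copy of the row's cells (empty if absent).
    Map<String, String> get(String row) {
        Map<String, String> cells = rows.get(row);
        return cells == null ? new TreeMap<String, String>() : new TreeMap<String, String>(cells);
    }

    // delete '<table>', '<row>', '<family:qualifier>'
    void delete(String row, String column) {
        Map<String, String> cells = rows.get(row);
        if (cells != null) {
            cells.remove(column);
            if (cells.isEmpty()) {
                rows.remove(row); // a row with no cells no longer appears in a scan
            }
        }
    }

    // scan '<table>': all rows in row-key order.
    Map<String, Map<String, String>> scan() {
        return rows;
    }

    public static void main(String[] args) {
        HBaseModelSketch students = new HBaseModelSketch();
        students.put("student1", "personal_details:name", "kiruba");
        // Placeholder address, not the value from the manual.
        students.put("student1", "personal_details:email", "kiruba@example.com");
        System.out.println(students.scan());
        students.delete("student1", "personal_details:email");
        System.out.println(students.get("student1"));
    }
}
```

Note that the delete in step 7 must name the same row key that was used in the put commands; a delete against a row key that does not exist changes nothing.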

OUTPUT:
Create

List

Describe

Drop

Alter

RESULT:
Thus HBase was installed and the practice examples were executed.
IMPORT DATA BETWEEN HDFS AND RDBMS USING APACHE SQOOP
AIM:
To import data between HDFS and an RDBMS using Apache Sqoop.

ALGORITHM:
1 Sqoop — IMPORT Command

2 Sqoop — IMPORT Command with target directory

1 Sqoop — IMPORT Command

The import command is used to import a table from a relational database to HDFS. In our case, we are going to
import tables from MySQL databases to HDFS.

2 Sqoop — IMPORT Command with target directory


You can also import the table in a specific directory in HDFS using the below command:

sqoop import --connect jdbc:mysql://localhost/employees --username edureka --table employees --m 1 --target-dir /employees

OUTPUT:
After the code is executed, you can check the Web UI of HDFS i.e. localhost:50070 where the data is
imported.

RESULT:
Thus data was imported from an RDBMS into HDFS using Apache Sqoop.
