Java-Hadoop 2.X Setting Up


Hadoop 2.x Setup Veeraravi Kumar Singiri, +91 9986755336

Setting up the environment:


In this tutorial you will learn, step by step, how to set up a single-node Hadoop cluster, so that you can play
around with the framework and learn more about it.
This tutorial uses the following software versions:

Ubuntu Linux 12.04.3 LTS (the steps are the same for other versions)
Hadoop 2.2.0 (the steps are the same for other 2.x versions)
If you use PuTTY to access your Linux box remotely, install OpenSSH by running the following command; it also
makes configuring SSH access easier later in the installation:
sudo apt-get install openssh-server
Prerequisites:

1. Installing Java v1.7
2. Adding a dedicated Hadoop system user
3. Configuring SSH access
4. Disabling IPv6

Before installing any applications or software, make sure the package lists from all repositories and PPAs are up
to date by running this command:
sudo apt-get update

1. Installing Java v1.7:


Hadoop requires Java 1.6 or later; using the latest available version is recommended.
Note: There are multiple ways to install Java on a Linux machine; an offline and an online method are described below.

Offline:
If the machine is offline (not connected to the internet), download JDK 1.6 or later on a machine with internet
access and place it in a location such as /usr/local/.
Step 1) Download the JDK from the Oracle website (the commands below use JDK 7u75; adjust the file names if you
download a newer release such as 1.8):
(http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html)
Step 2) While it is downloading, change the owner and permissions of /usr/local/ (sudo will ask for your login
password):

sudo chown -R <username>:<groupname> /usr/local/
sudo chmod -R 777 /usr/local/

Here <username> is your system user name and <groupname> is your group; in most cases they are the same.
Step 3) Copy the downloaded Java tar file from your download directory to /usr/local/.


Step 4) Extract the tarball you downloaded:

$ cd /usr/local/
$ tar zxvf jdk-7u75-linux-x64.tar.gz

(Use the name of the Java tarball you actually downloaded. This example creates the directory /usr/local/jdk1.7.0_75.)

Steps to edit bashrc file:


Step 5) Edit the .bashrc file for your user as follows:
1. Type cd and press Enter to go to your home directory.
2. Open the file with gedit .bashrc or vi .bashrc (use vi if you are familiar with the vi editor).
If gedit does not open, run export DISPLAY=:0.0 and then try gedit .bashrc again; it opens .bashrc in a separate
text editor window.
3. Go to the end of the file and set the Java home path (use the directory created in Step 4):

export JAVA_HOME=/usr/local/jdk1.7.0_75
export PATH=$PATH:$JAVA_HOME/bin
Step 6) Save and close .bashrc, then run source so that the current shell picks up the new variables:

$ source ~/.bashrc
Step 7) Verify the Java installation by running the following command, which prints the installed version:

$ java -version
java version "1.7.0_75"
Java(TM) SE Runtime Environment (build 1.7.0_75-b13)
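If you repeat this setup on several machines, the .bashrc edit in Steps 5 and 6 can be scripted. The sketch below is an illustration, not part of the original guide: the function name add_java_home is hypothetical, and it assumes the JDK was extracted to a directory such as /usr/local/jdk1.7.0_75. It appends the two export lines only when no JAVA_HOME export is present yet, so running it twice does not duplicate them.

```shell
# Hypothetical helper (not part of the original guide): append the JAVA_HOME
# and PATH exports from Step 5 to an rc file, but only once.
# Usage: add_java_home <jdk_dir> [rc_file]   (rc_file defaults to ~/.bashrc)
add_java_home() {
  local jdk_dir="$1"
  local rc_file="${2:-$HOME/.bashrc}"

  # Refuse a directory that does not contain an executable bin/java.
  if [ ! -x "$jdk_dir/bin/java" ]; then
    echo "no executable $jdk_dir/bin/java" >&2
    return 1
  fi

  # Leave the file alone if it already exports JAVA_HOME.
  if grep -q '^export JAVA_HOME=' "$rc_file" 2>/dev/null; then
    echo "JAVA_HOME is already exported in $rc_file; not changing it"
    return 0
  fi

  # Single quotes keep $PATH and $JAVA_HOME unexpanded in the rc file.
  {
    echo "export JAVA_HOME=$jdk_dir"
    echo 'export PATH=$PATH:$JAVA_HOME/bin'
  } >> "$rc_file"
}
```

Typical use at the prompt: add_java_home /usr/local/jdk1.7.0_75 && source ~/.bashrc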

Online:

If the machine is connected to the internet, installing Java is simpler.

Step 1) Install OpenJDK with apt-get (sudo will ask for your login password):

sudo apt-get install openjdk-8-jdk ## installs the JDK
sudo apt-get install openjdk-8-jre ## installs the JRE (optional; the JDK package already depends on it)

(On older releases such as Ubuntu 12.04, openjdk-8 is not in the default repositories; install openjdk-7-jdk instead.)
Step 2) Run update-java-alternatives -l to list the installed Java versions (useful when more than one is installed)
together with their installation paths, for example:
/usr/lib/jvm/java-1.8.0-openjdk-amd64
Step 3) Update the .bashrc file as described under "Steps to edit bashrc file" above, setting JAVA_HOME to the path
reported in Step 2.
Step 4) Run java -version to verify the installation.
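Instead of copying the path from update-java-alternatives by hand, JAVA_HOME can be derived from the javac binary on your PATH by resolving its symlinks (/usr/bin/javac points through /etc/alternatives to the real JDK). The sketch below assumes GNU readlink with the -f option, as on Ubuntu; the function name jdk_home_from_javac is hypothetical. It uses javac rather than java because on JDK 8 and earlier the java binary may resolve into the jre/ subdirectory, which would give the wrong JAVA_HOME.

```shell
# Hypothetical helper: print the JDK home directory a javac binary belongs to.
# Resolves symlinks (e.g. /usr/bin/javac -> /etc/alternatives/javac -> ...)
# and strips the trailing /bin/javac.
# Usage: jdk_home_from_javac [path_to_javac]   (defaults to javac on PATH)
jdk_home_from_javac() {
  local javac_path="${1:-$(command -v javac)}"
  if [ -z "$javac_path" ]; then
    echo "javac not found on PATH (is a JDK, not only a JRE, installed?)" >&2
    return 1
  fi
  local real
  real="$(readlink -f "$javac_path")" || return 1
  # <jdk>/bin/javac -> <jdk>
  dirname "$(dirname "$real")"
}
```

For example, export JAVA_HOME="$(jdk_home_from_javac)" sets the variable for the current shell; put the resulting path into .bashrc as in Step 3.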

Alternatively, the following steps install Oracle Java on an Ubuntu machine.
a. Download the latest Oracle Java Linux version from the Oracle website with this command:
wget https://edelivery.oracle.com/otn-pub/java/jdk/7u45-b18/jdk-7u45-linux-x64.tar.gz


If the download fails, try the following command instead; it sends the cookie the Oracle download server expects,
so you do not have to supply a username and password:

wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com" \
"https://edelivery.oracle.com/otn-pub/java/jdk/7u45-b18/jdk-7u45-linux-x64.tar.gz"

b. Unpack the compressed Java binaries in the directory where you downloaded them:

sudo tar xvzf jdk-7u45-linux-x64.tar.gz

c. Create a Java directory under /usr/local/ and change into it:

sudo mkdir -p /usr/local/Java
cd /usr/local/Java

d. Copy the extracted Oracle Java directory from your download directory into /usr/local/Java:

sudo cp -r <download-dir>/jdk1.7.0_45 /usr/local/Java/

e. Open the system-wide profile file /etc/profile, where the Java variables will be added:

sudo nano /etc/profile or sudo gedit /etc/profile


f. Scroll down to the end of the file with the arrow keys and add the following lines at the end of /etc/profile:

export JAVA_HOME=/usr/local/Java/jdk1.7.0_45
export PATH=$PATH:$HOME/bin:$JAVA_HOME/bin

g. Tell your Ubuntu Linux system where the Oracle Java JDK/JRE is located, so that the new Oracle Java version is
available for use:

sudo update-alternatives --install "/usr/bin/java" "java" "/usr/local/Java/jdk1.7.0_45/bin/java" 1
sudo update-alternatives --install "/usr/bin/javac" "javac" "/usr/local/Java/jdk1.7.0_45/bin/javac" 1
sudo update-alternatives --set java /usr/local/Java/jdk1.7.0_45/bin/java
sudo update-alternatives --set javac /usr/local/Java/jdk1.7.0_45/bin/javac

These commands register the Oracle java and javac binaries and make them the system defaults.
h. Reload the system-wide /etc/profile by typing the following command:

. /etc/profile
Test whether Oracle Java was installed correctly on your system:

java -version

================JAVA INSTALLATION COMPLETED==========================

2. Adding dedicated Hadoop system user.


We will use a dedicated Hadoop user account for running Hadoop. This is not required, but it is recommended,
because it separates the Hadoop installation from other software applications and user accounts running on the
same machine.
a. Add a group:

sudo addgroup hadoop


b. Create a user and add it to the group:

sudo adduser --ingroup hadoop hduser


The command prompts for the new UNIX password and then for optional user information (full name, room number and
so on); you can press Enter to accept the defaults for the latter.
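To confirm that the user and group exist before continuing, a small check such as the following can be run at any time. It is a sketch, not part of the original guide: the function name hadoop_user_ready is hypothetical, and it assumes the lowercase names hduser and hadoop, which the later chown step also uses (Linux user and group names are case-sensitive).

```shell
# Hypothetical check: succeed only if <user> exists and is a member of <group>.
# Usage: hadoop_user_ready [user] [group]   (defaults: hduser hadoop)
hadoop_user_ready() {
  local user="${1:-hduser}" group="${2:-hadoop}"
  # id exits non-zero when the user does not exist.
  if ! id "$user" >/dev/null 2>&1; then
    echo "user $user does not exist" >&2
    return 1
  fi
  # id -nG prints all group names of the user, separated by spaces;
  # split them one per line and look for an exact match.
  if ! id -nG "$user" | tr ' ' '\n' | grep -qx "$group"; then
    echo "user $user is not in group $group" >&2
    return 1
  fi
}
```

For example, hadoop_user_ready && echo "hduser is ready".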


3. Configuring SSH access:


SSH key-based authentication is required so that the master node can log in to the slave nodes (and the secondary
node) to start and stop them, and to the local machine if you run Hadoop on it. For our single-node setup of
Hadoop, we therefore need to configure SSH access to localhost for the hduser user we created in the previous
section.
Before this step, make sure that SSH is up and running on your machine and configured to allow public key
authentication.
Next, generate an SSH key for the hduser user (hduser is the user created in this setup; it may be different on
your machine).
a. Switch to the hduser account:

su - hduser

b. Run this key generation command:

ssh-keygen -t rsa -P ""

c. When it asks for the file in which to save the key, press Enter to accept the default, which stores the key pair in

/home/hduser/.ssh


d. Enable SSH access to your local machine with this newly created key.

cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys


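e. Restrict the permissions of authorized_keys so that only hduser can read and write it:
$ chmod 0600 ~/.ssh/authorized_keys
f. The final step is to test the SSH setup by connecting to your local machine while still logged in as hduser:

ssh localhost
The first connection asks you to confirm the host key; answering yes adds localhost permanently to hduser's list of
known hosts. Type exit to close the SSH session, and exit again to leave the hduser shell.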
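Steps b to e can be combined into one repeatable function. This is a sketch under stated assumptions, not the guide's own method: the function name authorize_own_key is hypothetical, it is meant to be run as hduser, and it assumes the default key file name id_rsa. With StrictModes (the sshd default), sshd ignores authorized_keys if ~/.ssh or the file is writable by other users, which is why the sketch sets 700 and 600.

```shell
# Hypothetical helper: run as hduser to create a passphrase-less RSA key (if
# none exists) and authorize it for logins to this machine, exactly once.
# Usage: authorize_own_key [ssh_dir]   (defaults to ~/.ssh)
authorize_own_key() {
  local ssh_dir="${1:-$HOME/.ssh}"
  local pub="$ssh_dir/id_rsa.pub"
  local auth="$ssh_dir/authorized_keys"

  # Only the owner may access the .ssh directory.
  mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1

  # Same as steps b and c, without the interactive prompt for the file name.
  if [ ! -f "$pub" ]; then
    ssh-keygen -t rsa -P "" -f "$ssh_dir/id_rsa" -q || return 1
  fi

  # Append the public key only if that exact line is not already present.
  touch "$auth"
  if ! grep -qxF -f "$pub" "$auth"; then
    cat "$pub" >> "$auth"
  fi
  chmod 600 "$auth"
}
```

After running it, ssh localhost as hduser should log in without asking for a password.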

4. Disabling IPv6.
Hadoop 2.x does not support IPv6, and several Hadoop services bind to 0.0.0.0, which on Ubuntu with IPv6 enabled
can leave them listening on IPv6 addresses instead. Disable IPv6 by running the following command with root privileges:

sudo gedit /etc/sysctl.conf


Add the following lines to the end of the file and reboot the machine so that the settings take effect:

#disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
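Instead of rebooting, running sudo sysctl -p reloads /etc/sysctl.conf immediately. Either way, the kernel reports the current state in /proc/sys/net/ipv6/conf/all/disable_ipv6 (1 means disabled). The sketch below reads that file; the function name ipv6_disabled is hypothetical, and treating a missing file as "disabled" is an assumption (the file is absent when the kernel has no IPv6 support loaded).

```shell
# Hypothetical check: succeed if IPv6 is disabled according to a sysctl flag file.
# Usage: ipv6_disabled [flag_file]
#   (defaults to /proc/sys/net/ipv6/conf/all/disable_ipv6; 1 = disabled)
ipv6_disabled() {
  local flag_file="${1:-/proc/sys/net/ipv6/conf/all/disable_ipv6}"
  # No flag file: assume the kernel has no IPv6 support loaded at all.
  [ ! -e "$flag_file" ] && return 0
  [ "$(cat "$flag_file")" = "1" ]
}
```

For example: sudo sysctl -p && ipv6_disabled && echo "IPv6 is disabled"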


Hadoop Installation:
Go to the Apache downloads page and download a stable Hadoop 2.x release. The commands below use hadoop-2.6.4;
substitute the file name of the version you downloaded.
i. Download Hadoop from https://archive.apache.org/dist/hadoop/core/ and copy hadoop-2.6.4.tar.gz to /usr/local/.
ii. Go to /usr/local/ and unpack the compressed Hadoop file:

cd /usr/local/
tar xvzf hadoop-2.6.4.tar.gz
iii. Rename the extracted directory to hadoop (the configuration below expects /usr/local/hadoop) and change the
owner of all its files to the hduser user and hadoop group:

sudo mv hadoop-2.6.4 hadoop
sudo chown -R hduser:hadoop hadoop
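The configuration below expects Hadoop at /usr/local/hadoop, while the tarball unpacks to a versioned directory such as /usr/local/hadoop-2.6.4. If you prefer to keep the version in the directory name (for example to switch versions later), a symlink can provide the /usr/local/hadoop path. The sketch below is one way to do that; the function name install_hadoop is hypothetical, and it assumes the archive's top-level directory matches the file name without .tar.gz, as it does for the Apache release tarballs.

```shell
# Hypothetical helper: unpack a Hadoop tarball under <dest> and point
# <dest>/hadoop at the versioned directory via a symlink.
# Usage: install_hadoop <tarball> [dest]   (dest defaults to /usr/local)
install_hadoop() {
  local tarball="$1" dest="${2:-/usr/local}"
  local name
  # hadoop-2.6.4.tar.gz -> hadoop-2.6.4 (assumed top-level directory name)
  name="$(basename "$tarball" .tar.gz)"

  # Never replace a real (non-symlink) hadoop directory.
  if [ -e "$dest/hadoop" ] && [ ! -L "$dest/hadoop" ]; then
    echo "$dest/hadoop exists and is not a symlink" >&2
    return 1
  fi

  tar -xzf "$tarball" -C "$dest" || return 1
  if [ ! -d "$dest/$name" ]; then
    echo "expected $dest/$name after extraction" >&2
    return 1
  fi
  # -n replaces an existing symlink instead of creating a link inside its target.
  ln -sfn "$name" "$dest/hadoop"
}
```

With this layout, change ownership of the versioned directory (sudo chown -R hduser:hadoop /usr/local/hadoop-2.6.4): chown -R given the symlink itself changes only the link, not the files behind it.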

Configuring Hadoop:
The following files need to be edited to configure the single-node Hadoop cluster:
a. yarn-site.xml
b. core-site.xml
c. mapred-site.xml
d. hdfs-site.xml
e. $HOME/.bashrc
The XML files are in the Hadoop configuration directory:

cd /usr/local/hadoop/etc/hadoop


Note: Open each of the files listed above in a text editor and add the configuration shown for it.
a. yarn-site.xml:
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>

b. core-site.xml:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
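The *-site.xml files all share the same property layout, so they can also be generated from a script. The helper below is a hypothetical sketch (write_site_xml is not a Hadoop tool): it overwrites the target file, writes values verbatim (so they must not contain XML special characters such as < or &), and drops the license comment the shipped files contain. The usage line uses fs.defaultFS, the Hadoop 2 name of the property; fs.default.name, shown above, is still accepted as a deprecated alias.

```shell
# Hypothetical helper: write a Hadoop *-site.xml file from name/value pairs.
# Overwrites <file>; values are written verbatim (no XML escaping).
# Usage: write_site_xml <file> <name> <value> [<name> <value> ...]
write_site_xml() {
  local file="$1"; shift
  if [ $(( $# % 2 )) -ne 0 ]; then
    echo "expected name/value pairs" >&2
    return 1
  fi
  {
    echo '<?xml version="1.0"?>'
    echo '<configuration>'
    while [ $# -gt 0 ]; do
      printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
      shift 2
    done
    echo '</configuration>'
  } > "$file"
}
```

Example: write_site_xml /usr/local/hadoop/etc/hadoop/core-site.xml fs.defaultFS hdfs://localhost:9000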

c. mapred-site.xml:
If this file does not exist, copy mapred-site.xml.template to mapred-site.xml (cp mapred-site.xml.template mapred-site.xml).
i. Edit the mapred-site.xml file.
ii. Add the following entry to the file, then save and close it.
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
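The "copy only if it does not exist" rule matters because a second copy would silently discard the entry you just added. A guarded version of that copy might look like the sketch below; the function name ensure_mapred_site is hypothetical, and it assumes HADOOP_CONF_DIR is set (or that the directory is passed explicitly, since the .bashrc update comes later in this guide).

```shell
# Hypothetical helper: create mapred-site.xml from the shipped template only if
# it does not exist yet, so an edited configuration is never overwritten.
# Usage: ensure_mapred_site [conf_dir]   (defaults to $HADOOP_CONF_DIR)
ensure_mapred_site() {
  local conf_dir="${1:-$HADOOP_CONF_DIR}"
  if [ -z "$conf_dir" ]; then
    echo "pass the configuration directory or set HADOOP_CONF_DIR" >&2
    return 1
  fi
  if [ -f "$conf_dir/mapred-site.xml" ]; then
    echo "mapred-site.xml already present; leaving it unchanged"
    return 0
  fi
  cp "$conf_dir/mapred-site.xml.template" "$conf_dir/mapred-site.xml"
}
```

For example: ensure_mapred_site /usr/local/hadoop/etc/hadoop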


d. hdfs-site.xml:
i. Edit the hdfs-site.xml file.
ii. Create the two directories used by the NameNode and the DataNode. Run these commands as hduser; HADOOP_HOME
is not set until step e, so use the full path:

mkdir -p /usr/local/hadoop/yarn_data/hdfs/namenode
mkdir -p /usr/local/hadoop/yarn_data/hdfs/datanode
iii. Add the following entry to the file, then save and close it:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop/yarn_data/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop/yarn_data/hdfs/datanode</value>
</property>
</configuration>

e. Update $HOME/.bashrc
i. Go back to your home directory with cd and edit the .bashrc file:

cd
vi .bashrc

ii. Add the following lines to the end of the file:

# Set Hadoop-related environment variables
export HADOOP_PREFIX=/usr/local/hadoop
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
# Native Path
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_PREFIX}/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib/native"
#Java path (use the JDK directory from the Java installation step)
export JAVA_HOME='/usr/local/Java/jdk1.7.0_45'
# Add Hadoop bin/ and sbin/ directories to PATH
export PATH=$PATH:$HADOOP_HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/sbin
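A typo in one of these paths usually only shows up later as a confusing startup error. After source ~/.bashrc (or in a new terminal), a quick check like the sketch below can catch it early. The function name check_hadoop_env is hypothetical, and it relies on bash indirect expansion (${!var}), so it needs bash rather than plain sh.

```shell
# Hypothetical check (bash): confirm that key variables from ~/.bashrc are set
# and point at existing directories. Prints one line per problem found.
# Usage: check_hadoop_env
check_hadoop_env() {
  local var status=0
  for var in JAVA_HOME HADOOP_HOME HADOOP_CONF_DIR; do
    # ${!var} is the value of the variable whose name is stored in $var.
    if [ -z "${!var}" ]; then
      echo "$var is not set" >&2
      status=1
    elif [ ! -d "${!var}" ]; then
      echo "$var=${!var} is not a directory" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example: source ~/.bashrc && check_hadoop_env && echo "environment looks fine"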

Formatting and Starting/Stopping the HDFS filesystem via the NameNode:


i. The first step in starting up your Hadoop installation is formatting the Hadoop filesystem, which is implemented
on top of the local filesystem of your cluster. You need to do this only the first time you set up a Hadoop cluster.
Do not format a running Hadoop filesystem, as you will lose all the data currently in the cluster (in HDFS). To format
the filesystem (which initializes the directory specified by the dfs.namenode.name.dir property), run:

hdfs namenode -format

(The older form hadoop namenode -format still works in Hadoop 2.x but is deprecated.)
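Because formatting an existing NameNode wipes HDFS, a script that sets up the cluster should format only once. A formatted NameNode storage directory contains a current/VERSION file, so the sketch below uses that file as the guard. The function name format_namenode_once is hypothetical, and the default directory is assumed to match dfs.namenode.name.dir in the hdfs-site.xml above.

```shell
# Hypothetical guard: format the NameNode only if its storage directory has not
# been formatted yet (a formatted directory contains current/VERSION).
# Usage: format_namenode_once [name_dir]
format_namenode_once() {
  local name_dir="${1:-/usr/local/hadoop/yarn_data/hdfs/namenode}"
  if [ -f "$name_dir/current/VERSION" ]; then
    echo "NameNode directory $name_dir is already formatted; skipping" >&2
    return 0
  fi
  hdfs namenode -format
}
```

Run it as hduser, so that the formatted directory is owned by the user that starts the NameNode.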


ii. Start the Hadoop daemons by running the following commands:


Name node:

$ hadoop-daemon.sh start namenode


Data node:

$ hadoop-daemon.sh start datanode


Resource Manager:

$ yarn-daemon.sh start resourcemanager

Node Manager:

$ yarn-daemon.sh start nodemanager


Job History Server:

$ mr-jobhistory-daemon.sh start historyserver
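To confirm that all five daemons came up, the JDK's jps tool lists the running Java processes as "<pid> <main class>" lines. The sketch below (the function name missing_daemons is hypothetical) reads jps output on standard input and prints each expected daemon that is absent. Run jps as hduser, the user that started the daemons, since it only lists JVMs the current user has access to.

```shell
# Hypothetical check: read `jps` output on stdin and print every expected
# Hadoop daemon that is not running. Returns 1 if any are missing.
# Usage: jps | missing_daemons
missing_daemons() {
  local running expected missing=0
  # Keep only the second column (the main class name) of each jps line.
  running="$(awk '{print $2}')"
  for expected in NameNode DataNode ResourceManager NodeManager JobHistoryServer; do
    if ! printf '%s\n' "$running" | grep -qx "$expected"; then
      echo "$expected"
      missing=1
    fi
  done
  return "$missing"
}
```

For example: jps | missing_daemons && echo "all daemons are running"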


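iii. Stop Hadoop by running the following commands (the two stop scripts do not stop the JobHistory server, so it is
stopped separately):

mr-jobhistory-daemon.sh stop historyserver
stop-dfs.sh
stop-yarn.sh
iv. Alternatively, start Hadoop with the following scripts, which start the HDFS daemons (including the Secondary
NameNode) and the YARN daemons in one go:

start-dfs.sh
start-yarn.sh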
Hadoop Web Interfaces:
Hadoop comes with several web interfaces, which by default are available at these locations:

HDFS NameNode status and health: http://localhost:50070
HDFS Secondary NameNode status (when the Secondary NameNode is running, e.g. after start-dfs.sh): http://localhost:50090
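Two more Hadoop 2.x default web UIs are useful for the YARN side of this setup: the ResourceManager on port 8088 and the MapReduce JobHistory server on port 19888. The sketch below (the function name check_web_uis is hypothetical) asks each port for its front page with curl and prints the HTTP status code; curl reports 000 when nothing is listening, and 200 (or a redirect code such as 302) means the UI is up. It assumes curl is installed and that no HTTP proxy intercepts localhost requests.

```shell
# Hypothetical check: print "<port> <http status>" for each default web UI.
# Ports: 50070 NameNode, 50090 Secondary NameNode,
#        8088 ResourceManager, 19888 JobHistory server.
check_web_uis() {
  local port code
  for port in 50070 50090 8088 19888; do
    # -s hides progress, -o discards the page, -w prints only the status code,
    # --max-time stops a hung connection from blocking the loop.
    code="$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "http://localhost:$port/")"
    echo "$port $code"
  done
}
```

For example, check_web_uis after starting all daemons should print one line per port.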

This completes the setup of a single-node Hadoop 2.x cluster. We hope this step-by-step guide helps you set up the
same environment on your own machine.
