Java-Hadoop 2.X Setting Up
Ubuntu Linux 12.04.3 LTS (the steps are the same for any version)
Hadoop 2.2.0 (the steps are the same for any version)
If you are using PuTTY to access your Linux box remotely, install openssh-server by running the command below; this also
helps in configuring SSH access in the later part of the installation:
sudo apt-get install openssh-server
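To quickly confirm that the SSH daemon is up before continuing, something like the following should work on Ubuntu:
sudo service ssh status   # should report that ssh is running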
Prerequisites:
1. Java (JDK)
2. A dedicated Hadoop user (hduser)
3. SSH access configured for that user
4. IPv6 disabled
Offline:
If it is an offline setup (not connected to the internet), download the JDK (v1.6 or later) and place it in a desired location
such as /usr/local/.
First, download the JDK from the Oracle website (the example below uses JDK 7u75; a newer release such as JDK 8 also works):
(http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html)
While it is downloading, change the owner and permissions of /usr/local/.
Step 1) sudo chown -R <username>:<groupname> /usr/local/ (it will ask for a password; enter your login or
root password)
Note: <username> is your system username and <groupname> is the group you are using. Most of the time, the username and groupname
will be the same.
Step 2) Change the permissions of /usr/local/: sudo chmod -R 777 /usr/local/
Step 3) Copy the downloaded Java tar file from the download directory to /usr/local/.
Step 4) Extract the tarball you downloaded:
$ cd /usr/local/
$ tar zxvf jdk-7u75-linux-x64.tar.gz
Step 5) Open .bashrc in your home directory:
$ gedit .bashrc (or vi .bashrc if you are comfortable with the vi editor)
If gedit does not open, execute export DISPLAY=:0.0 and then try gedit .bashrc again; it will open .bashrc in
another window (text editor).
Add the following lines to the end of the file:
export JAVA_HOME=/usr/local/java
export PATH=$PATH:$JAVA_HOME/bin
Note: JAVA_HOME must point to the directory where the JDK was extracted (for example /usr/local/jdk1.7.0_75 from the tarball above), or rename/symlink that directory to /usr/local/java.
Step 6) Save and close the .bashrc file, then run the source command to update the environment:
$ source .bashrc
Step 7) Verify the java version by running the following command
$ java -version (it will display the version of java which you have installed)
java version "1.7.0_75"
Java(TM) SE Runtime Environment (build 1.7.0_75-b13)
Online:
Step 1) sudo apt-get install openjdk-8-jdk (it will ask for a password; enter your login or root password)
Alternatively, the steps below install Oracle Java on an Ubuntu machine.
a. Download the latest Oracle Java Linux version from the Oracle website by using this command:
wget https://edelivery.oracle.com/otn-pub/java/jdk/7u45-b18/jdk-7u45-linux-x64.tar.gz
If it fails to download, use a form of the command that avoids having to pass a username and password, as sketched below.
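A variant that commonly worked, accepting the Oracle license via a cookie header instead of credentials (same URL as above), is:
wget --no-cookies --no-check-certificate --header "Cookie: oraclelicense=accept-securebackup-cookie" https://edelivery.oracle.com/otn-pub/java/jdk/7u45-b18/jdk-7u45-linux-x64.tar.gz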
c. Create a Java directory under /usr/local/ using mkdir and change into it by using these
commands:
mkdir -p /usr/local/Java
cd /usr/local/Java
e. Edit the system PATH file /etc/profile and add the following system variables to your system path
f. Scroll down to the end of the file using your arrow keys and add the following lines to the end of your
/etc/profile file:
export JAVA_HOME=/usr/local/Java/jdk1.7.0_45
export PATH=$PATH:$HOME/bin:$JAVA_HOME/bin
g. Inform your Ubuntu Linux system where your Oracle Java JDK/JRE is located, so that the system knows the new
Oracle Java version is available for use.
This is typically done with the update-alternatives command.
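A typical sequence, assuming the JDK was placed in /usr/local/Java/jdk1.7.0_45 as above, is:
sudo update-alternatives --install "/usr/bin/java" "java" "/usr/local/Java/jdk1.7.0_45/bin/java" 1
sudo update-alternatives --install "/usr/bin/javac" "javac" "/usr/local/Java/jdk1.7.0_45/bin/javac" 1
sudo update-alternatives --set java /usr/local/Java/jdk1.7.0_45/bin/java
sudo update-alternatives --set javac /usr/local/Java/jdk1.7.0_45/bin/javac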
h. Reload your system-wide /etc/profile by typing the following command:
. /etc/profile
Test whether Oracle Java was installed correctly on your system:
$ java -version
su - hduser
c. It will ask for the file name in which to save the key; just press Enter so that the key is generated at
/home/hduser/.ssh.
d. Enable SSH access to your local machine with this newly created key, then test the connection:
ssh hduser@localhost
This will add localhost permanently to the list of known hosts. (The full sequence of commands for these steps is sketched below.)
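A minimal sketch of the whole user-creation and key-setup sequence, assuming a hadoop group and an hduser account, looks like this:
# create a dedicated Hadoop group and user (names are assumptions used throughout this guide)
sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser
# switch to hduser and generate a passwordless RSA key
su - hduser
ssh-keygen -t rsa -P ""
# authorize the new key for local logins, then test
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
ssh hduser@localhost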
4. Disabling IPv6.
We need to disable IPv6 because Ubuntu uses 0.0.0.0 for the various Hadoop configurations when IPv6 is enabled. Using a
root account, add the following lines to /etc/sysctl.conf:
#disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
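To apply the change without rebooting and verify it, run:
sudo sysctl -p
cat /proc/sys/net/ipv6/conf/all/disable_ipv6   # a value of 1 means IPv6 is disabled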
Hadoop Installation:
Go to the Apache downloads and download Hadoop version 2.2.0 (or any other stable 2.x release; the commands below use
hadoop-2.6.4.tar.gz as the example).
i. Download Hadoop from https://archive.apache.org/dist/hadoop/core/ and copy the hadoop-2.6.4.tar.gz to /usr/local/.
ii. Unpack the compressed Hadoop file by using these commands:
cd /usr/local/
tar xvzf hadoop-2.6.4.tar.gz
iii. Make sure to change the owner of all the files to the hduser user and hadoop group, as sketched below:
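Assuming the layout above, a reasonable sketch is (the /usr/local/hadoop symlink is an assumption, so that the configuration paths used later resolve):
sudo chown -R hduser:hadoop /usr/local/hadoop-2.6.4
sudo ln -s /usr/local/hadoop-2.6.4 /usr/local/hadoop   # later steps refer to /usr/local/hadoop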
Configuring Hadoop:
The following files are required for the configuration of the single-node Hadoop cluster.
a. yarn-site.xml:
b. core-site.xml
c. mapred-site.xml
d. hdfs-site.xml
e. Update $HOME/.bashrc
These files can be found in the Hadoop configuration directory:
cd /usr/local/hadoop/etc/hadoop
Note: Select each of the files listed above, open it as a text file (right-click and open with a text editor), and modify it
with the respective configuration.
a. yarn-site.xml:
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
b. core-site.xml:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
c. mapred-site.xml:
If this file does not exist, copy mapred-site.xml.template as mapred-site.xml (see the sketch after the configuration block below)
i. Edit the mapred-site.xml file
ii. Add the following entry to the file and save and quit the file.
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
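A minimal sketch of creating the file from the template, assuming the configuration directory above:
cd /usr/local/hadoop/etc/hadoop
cp mapred-site.xml.template mapred-site.xml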
d. hdfs-site.xml:
i. Edit the hdfs-site.xml file
ii. Create two directories to be used by the namenode and datanode (HADOOP_HOME is assumed to point to /usr/local/hadoop; see the .bashrc update in step e):
mkdir -p $HADOOP_HOME/yarn_data/hdfs/namenode
mkdir -p $HADOOP_HOME/yarn_data/hdfs/datanode
iii. Add the following entry to the file, then save and quit the file:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop/yarn_data/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop/yarn_data/hdfs/datanode</value>
</property>
</configuration>
e. Update $HOME/.bashrc
i. Go back to the home directory and open .bashrc, then add the Hadoop environment variables (a sketch is given below):
cd ~
vi .bashrc
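A commonly used set of variables for Hadoop 2.x, assuming the /usr/local/hadoop install path above, is:
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Run source .bashrc afterwards so the new variables take effect.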
Starting and stopping Hadoop:
v. To stop the HDFS and YARN daemons (NameNode, DataNode, ResourceManager and NodeManager), run:
stop-dfs.sh
stop-yarn.sh
vi. We can start Hadoop again by using the commands below.
start-dfs.sh
start-yarn.sh
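Before the first start, the NameNode must be formatted, and jps can be used to verify that the daemons came up; a minimal sketch, assuming the PATH settings above, is:
hdfs namenode -format   # run only once, before the first start
start-dfs.sh
start-yarn.sh
jps   # should list NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager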
Hadoop Web Interfaces:
Hadoop comes with several web interfaces which are by default available at these locations:
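For Hadoop 2.x the usual defaults are:
NameNode web UI: http://localhost:50070
ResourceManager web UI: http://localhost:8088
MapReduce JobHistory server (if started): http://localhost:19888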
With this, we are done setting up a single-node Hadoop 2.2.0 cluster. We hope this step-by-step guide helps you set up the
same environment at your end.