Hadoop


Q 11. Write a program in Eclipse to find the maximum temperature of cities in HDFS.

1. Open the Eclipse IDE available on the Cloudera VM.

2. Create an input file with the list of cities and their corresponding temperatures (a sample is shown below).
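For illustration, since the mapper below splits each line on a tab, the input file might look like this (the city names and values here are made up):
Delhi	42
Mumbai	38
Delhi	45
Chennai	40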

3. Create a new Java MapReduce project.
File > New > Project > Java Project. Name the project "MaxTemp" and click Finish.

4. Add the Hadoop libraries to the project. Right-click the MaxTemp project > select "Properties" > click "Java Build Path".
Click "Add External JARs" > File System > usr > lib > hadoop.
Select all JAR files and click OK.

5. More external libraries are needed. Click "Add External JARs" again, select all JAR files in the "client" directory, and click OK.

6. Create the Java Mapper and Reducer programs. Right-click the "src" folder of MaxTemp. Click New > Class. In the Name textbox write "MaxTemp" and click Finish.
7. In the MaxTemp.java file, write the code below.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxTemp {

    public static void main(String[] args) throws Exception {
        // Create a new job
        Job job = Job.getInstance(new Configuration());
        // Set the jar and job name so the job can be located in the distributed environment
        job.setJarByClass(MaxTemp.class);
        job.setJobName("Max Temperature");
        // Set input and output paths; the default input format is
        // TextInputFormat (each record is a line of input)
        FileInputFormat.addInputPath(job, new Path("/home/cloudera/Desktop/TempData"));
        FileOutputFormat.setOutputPath(job, new Path("/home/cloudera/Desktop/TempData1"));
        // Set Mapper and Reducer classes; taking a maximum is associative,
        // so the reducer can double as the combiner
        job.setMapperClass(MaxTempMapper.class);
        job.setCombinerClass(MaxTempReducer.class);
        job.setReducerClass(MaxTempReducer.class);
        // Set output key and value types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
MaxTempMapper.java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MaxTempMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private IntWritable max = new IntWritable();
    private Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Each input line is "city<TAB>temperature"
        StringTokenizer line = new StringTokenizer(value.toString(), "\t");
        word.set(line.nextToken());
        max.set(Integer.parseInt(line.nextToken()));
        context.write(word, max);
    }
}
MaxTempReducer.java
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxTempReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Track the maximum locally so state does not leak between keys
        int max_temp = Integer.MIN_VALUE;
        Iterator<IntWritable> itr = values.iterator();
        while (itr.hasNext()) {
            int temp = itr.next().get();
            if (temp > max_temp) {
                max_temp = temp;
            }
        }
        context.write(key, new IntWritable(max_temp));
    }
}

8. Run the program by clicking the Run button.
9. An output file with the maximum temperature for each city will be created.
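For the sample input shown earlier, the output file (part-r-00000) would contain one tab-separated line per city, sorted by key:
Chennai	40
Delhi	45
Mumbai	38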
Q 12. Unix-style HDFS Shell Commands

1. Make a directory
hadoop fs -mkdir /user/input/directory1 /user/input/directory2

2. List the contents of a directory
hadoop fs -ls /user/input/directory1

3. Upload a file from the local file system to HDFS
Syntax: hadoop fs -put <local_source> <hdfs_destination>
hadoop fs -put /home/input/sample1.txt /user/input/sample2.txt

4. Download files to the local file system
hadoop fs -get <hdfs_source> <local_destination>
hadoop fs -get /user/input/directory1/sample.txt /home/

5. View the content of a file
hadoop fs -cat /user/input/directory/sample.txt
6. Copy a file from source to destination within HDFS
hadoop fs -cp /user/input/directory/sample.txt /user/input/directory2

7. Copy a file between the local file system and HDFS
copyFromLocal is similar to the put command, except that the source is restricted to a local file reference.
Syntax: hadoop fs -copyFromLocal <local_source> <hdfs_destination>
hadoop fs -copyFromLocal /home/input/sample.txt /user/input/sample.txt
copyToLocal is similar to the get command, except that the destination is restricted to a local file reference.
Syntax: hadoop fs -copyToLocal <hdfs_source> <local_destination>

8. Remove a file or directory in HDFS
Removes the files specified as arguments; deletes a directory only when it is empty.
Syntax: hadoop fs -rm <path>
hadoop fs -rm /user/input/directory/sample.txt
Recursive version of delete (removes a directory and its contents):
hadoop fs -rmr /user/input/
9. Display the last few lines of a file.
hadoop fs -tail /user/input/directory/sample.txt
10. Display the aggregate length of a file
hadoop fs -du /user/input/directory/sample.txt
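As a quick recap, a short session chaining the commands above might look like this (the paths are illustrative):
hadoop fs -mkdir /user/input/directory1
hadoop fs -put /home/input/sample1.txt /user/input/directory1/
hadoop fs -cat /user/input/directory1/sample1.txt
hadoop fs -rm /user/input/directory1/sample1.txt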
Q 13. Write a matrix multiplication program in Eclipse for HDFS.

1. Open the Eclipse IDE available on the Cloudera VM.

2. Create a new Java MapReduce project.
File > New > Project > Java Project. Name the project "Matrix" and click Finish.
3. Add the Hadoop libraries to the project. Right-click the Matrix project > select "Properties" > click "Java Build Path".
Click "Add External JARs" > File System > usr > lib > hadoop.
Select all JAR files and click OK.


4. More external libraries are needed. Click "Add External JARs" again, select all JAR files, and click OK.
5. Create the Java Driver, Mapper, and Reducer programs. Right-click the "src" folder of Matrix. Click New > Class. Create three classes named "MatrixDriver", "MatrixMapper", and "MatrixReducer", clicking Finish for each.
6. In the MatrixDriver.java file, write the code below.
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class MatrixDriver
{
    public static void main(String[] args) throws Exception
    {
        Configuration conf = new Configuration();
        // M is an m-by-n matrix; N is an n-by-p matrix.
        conf.set("m", "2");
        conf.set("n", "2");
        conf.set("p", "2");
        Job job = Job.getInstance(conf, "MatrixMultiplication");
        job.setJarByClass(MatrixDriver.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setMapperClass(MatrixMapper.class);
        job.setReducerClass(MatrixReducer.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Wait for completion instead of submitting and exiting immediately
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
MatrixMapper.java
import java.io.IOException;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;

public class MatrixMapper extends Mapper<LongWritable, Text, Text, Text>
{
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException
    {
        Configuration conf = context.getConfiguration();
        int m = Integer.parseInt(conf.get("m"));
        int p = Integer.parseInt(conf.get("p"));
        // Each input line has the form "matrixName,row,column,value"
        String line = value.toString();
        String[] indicesAndValue = line.split(",");
        Text outputKey = new Text();
        Text outputValue = new Text();
        if (indicesAndValue[0].equals("M"))
        {
            // Emit M(i,j) once for every column k of the result matrix
            for (int k = 0; k < p; k++)
            {
                outputKey.set(indicesAndValue[1] + "," + k);
                outputValue.set("M," + indicesAndValue[2] + "," + indicesAndValue[3]);
                context.write(outputKey, outputValue);
            }
        }
        else
        {
            // Emit N(j,k) once for every row i of the result matrix
            for (int i = 0; i < m; i++)
            {
                outputKey.set(i + "," + indicesAndValue[2]);
                outputValue.set("N," + indicesAndValue[1] + "," + indicesAndValue[3]);
                context.write(outputKey, outputValue);
            }
        }
    }
}
MatrixReducer.java
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;

public class MatrixReducer extends Reducer<Text, Text, Text, Text>
{
    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException
    {
        String[] value;
        // hashA holds row i of M indexed by j; hashB holds column k of N indexed by j
        HashMap<Integer, Float> hashA = new HashMap<Integer, Float>();
        HashMap<Integer, Float> hashB = new HashMap<Integer, Float>();
        for (Text val : values)
        {
            value = val.toString().split(",");
            if (value[0].equals("M"))
            {
                hashA.put(Integer.parseInt(value[1]), Float.parseFloat(value[2]));
            }
            else
            {
                hashB.put(Integer.parseInt(value[1]), Float.parseFloat(value[2]));
            }
        }
        int n = Integer.parseInt(context.getConfiguration().get("n"));
        float result = 0.0f;
        float a_ij;
        float b_jk;
        // Dot product of row i of M with column k of N
        for (int j = 0; j < n; j++)
        {
            a_ij = hashA.containsKey(j) ? hashA.get(j) : 0.0f;
            b_jk = hashB.containsKey(j) ? hashB.get(j) : 0.0f;
            result += a_ij * b_jk;
        }
        if (result != 0.0f)
        {
            context.write(null, new Text(key.toString() + "," + Float.toString(result)));
        }
    }
}
7. Export the project as a JAR. Right-click on Matrix > select "Export" > Java > JAR file > Next. In the JAR file textbox write /home/cloudera/Matrix.jar. Click Finish and then OK.

8. View the exported JAR file. Click Applications > System Tools > Terminal.
Run cd /home/cloudera and then ls in the terminal.

9. Create an input file for the MapReduce program. In the terminal, type vi M.txt. This opens M.txt in an editor. Press Insert (or i) to start typing. To save, press Esc, then type :wq and press Enter. To view the file, run cat M.txt. (A sample file is shown below.)
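A minimal sample M.txt for the 2-by-2 matrices configured in the driver (the values here are made up); each line has the form matrixName,row,column,value:
M,0,0,1
M,0,1,2
M,1,0,3
M,1,1,4
N,0,0,5
N,0,1,6
N,1,0,7
N,1,1,8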
10. To look at the Hadoop file system, run hdfs dfs -ls /
11. To create an input directory, run hdfs dfs -mkdir /input

12. Move the input file into the Hadoop file system: hdfs dfs -put /home/cloudera/M.txt /input/

13. Run the MapReduce program using the exported JAR: hadoop jar /home/cloudera/Matrix.jar MatrixDriver /input/M.txt /outputfile

14. To view the output of the executed job, run hdfs dfs -ls /outputfile. To view the output file, run hdfs dfs -cat /outputfile/part-r-00000
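Assuming the sample M.txt above, the output should contain the entries of the product M×N, one row,column,value per line:
0,0,19.0
0,1,22.0
1,0,43.0
1,1,50.0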
