Cloud Computing


EX NO : 1 VIRTUALIZATION

AIM:

To run virtual machines of different configurations and to check how many virtual
machines can be utilized at a particular time.

REQUIREMENTS:

1. ORACLE VIRTUAL BOX


2. OPEN NEBULA SANDBOX

PROCEDURE:

1. Open VirtualBox


2. File → Import Appliance
3. Browse to the OpenNebula-Sandbox-5.0.ova file
4. Then go to Settings, select USB and choose USB 1.1
5. Then start the OpenNebula sandbox
6. Login using username: root, password: opennebula
7. Open a browser and type localhost:9869
8. Login using username: oneadmin, password: opennebula
9. Click on Instances, select VMs, then follow these steps to create a virtual machine:
a. Click the + symbol
b. Select the user oneadmin
c. Then enter the VM name, number of instances and CPU.
d. Then click on the Create button.
e. Repeat steps (c) and (d) to create more than one VM.
OUTPUT:
RESULT:

Thus virtual machines of different configurations are created successfully and it is
checked how many virtual machines can be utilized at a particular time.
EX NO: 2 ATTACH A VIRTUAL BLOCK TO A VIRTUAL MACHINE

AIM:

To attach a virtual block to a virtual machine and to check whether it holds the data even
after the release of the virtual machine.

REQUIREMENTS:

1. ORACLE VIRTUAL BOX


2. OPEN NEBULA SANDBOX

PROCEDURE:

METHOD 1:

1. Open VirtualBox


2. Power off the VM to which you want to add the virtual disk
3. Then right-click on that VM and select Settings
4. Then click on Storage and find Controller: IDE
5. At the top right, find the Add Hard Disk icon; a pop-up window is displayed
6. In that window select Create new disk, then click Next, Next and Finish
7. Then, under Attributes, set the hard disk as IDE Secondary Slave
METHOD 2:

1. Open a browser and type localhost:9869


2. Login using username: oneadmin, password: opennebula
3. Click on Instances, select VMs, then follow these steps to add a virtual block:
a. Select any one VM from the list and power off that VM
b. Then click on that VM, find the Storage tab and click on it
c. Then find the Attach Disk button
d. Click on that button; a new pop-up window is displayed
e. In that window select either Image or Volatile Disk
f. Click on the Attach button.
OUTPUT:
RESULT:
Thus a virtual block is attached to a virtual machine and it is checked whether the block holds the
data even after the release of the virtual machine.
EX NO: 3 INSTALL A C COMPILER IN THE VIRTUAL MACHINE AND
EXECUTE A SAMPLE PROGRAM

AIM:

To install a C Compiler in the Virtual Machine and execute a sample program

REQUIREMENTS:

1. ORACLE VIRTUAL BOX


2. OPEN NEBULA SANDBOX
3. UBUNTU Gt6.Ova

PROCEDURE:
STEP 1:

ubuntu_gt6 installation:

• Open VirtualBox


• File → Import Appliance
• Browse to the ubuntu_gt6.ova file
• Then go to Settings, select USB and choose USB 1.1
• Then start the ubuntu_gt6 VM
• Login using username: dinesh, password: 99425.
STEP 2:

Open the terminal

STEP 3:

//to install gcc

sudo add-apt-repository ppa:ubuntu-toolchain-r/test

sudo apt-get update

sudo apt-get install gcc-6 gcc-6-base

STEP 4:

To type a sample C program and save it

gedit hello.c
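A minimal hello.c (any simple C program will do; this one is only an illustrative example) could be:

#include <stdio.h>

int main(void)
{
    /* print a test message so the run in Step 5 produces visible output */
    printf("Hello, world!\n");
    return 0;
}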

STEP 5:

To compile and run the sample C program

gcc hello.c

./a.out
OUTPUT:
RESULT:

Thus the C Compiler in the Virtual Machine is installed and a sample program is
executed successfully.
EX NO: 4 SHOW THE VIRTUAL MACHINE MIGRATION BASED ON CERTAIN
CONDITION FROM ONE NODE TO OTHER

AIM:

To show virtual machine migration, based on a certain condition, from one node to the
other.

REQUIREMENTS:

1. ORACLE VIRTUAL BOX


2. OPEN NEBULA SANDBOX
3. UBUNTU Gt6.Ova

PROCEDURE:

STEP:1

Open Browser, type localhost:9869

STEP:2

Login using username: oneadmin, password: opennebula

STEP:3

Then follow the steps to migrate VMs

a. Click on Infrastructure
b. Select Clusters and enter the cluster name
c. Then select the Hosts tab and select all hosts
d. Then select the VNets tab and select all vnets
e. Then select the Datastores tab and select all datastores
f. And then choose Hosts under the Infrastructure tab
g. Click on the + symbol to add a new host, name the host, then click on Create.

STEP:4

On Instances, select the VMs to migrate, then follow these steps:

h. Click on the 8th icon; a drop-down list is displayed


i. Select Migrate from that list; a pop-up window is displayed
j. In that window select the target host to migrate to, then click on Migrate.
OUTPUT:

Before migration

Host:one-sandbox
After Migration:

Host:one-sandbox
RESULT:

Thus virtual machine migration, based on a certain condition, from one node to the other
has been executed successfully.
EX NO: 5 STORAGE CONTROLLER

AIM:

To install Storage Controller and to interact with it.

REQUIREMENTS:

1. ORACLE VIRTUAL BOX


2. OPEN NEBULA SANDBOX
3. UBUNTU Gt6.Ova

PROCEDURE:

Nova compute instances support the attachment and detachment of Cinder storage
volumes. This procedure details the steps involved in creating a logical volume in the cinder-
volumes volume group using the cinder command line interface.

METHOD: 1 (using Ubuntu GT6)


1. After login, plug in the USB drive
2. Right-click on the USB icon at the bottom right corner (4th icon)
3. Select your device name, e.g. JetFlash, SanDisk, etc.
4. A file explorer window opens.
5. Then perform read and write operations on the USB.

METHOD: 2

STEP: 1

Log in to the dashboard.


STEP: 2

Select the appropriate project from the drop down menu at the top left.
STEP: 3
On the Project tab, open the Compute tab and click Access & Security category.
STEP: 4
On the Access & Security tab, click the API Access category and click Download OpenStack RC File
v2.0
STEP: 5
source ./admin-openrc.sh

STEP: 6
Create Cinder Volume. Use the cinder create command to create a new volume.
$ cinder create --display_name NAME SIZE
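For example, following the syntax above, a 1 GB volume with an illustrative name could be created as:

$ cinder create --display_name demo-volume 1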

OUTPUT:

RESULT:

Thus the Storage Controller is installed and interaction with it is done successfully.
EX NO: 6 HADOOP INSTALLATION

AIM:

To install Hadoop Software and to set up the one node Hadoop cluster.

PROCEDURE:

STEP 1:INSTALLING JAVA

Step 1.1: Download the JDK tar.gz file for Ubuntu 64-bit OS

$tar zxvf jdk-8u60-linux-x64.tar.gz

$cd jdk1.8.0_60/

$pwd

/home/Downloads/jdk1.8.0_60 → copy the path of the JDK

Step 1.2:To set the environment variables for java

$ sudo nano /etc/profile

Pwd:

Add the following three lines in the middle of the file

JAVA_HOME=/home/Downloads/jdk1.8.0_60 (paste the JDK path here)

PATH=$PATH:$JAVA_HOME/bin

export PATH JAVA_HOME

Save the file by pressing ctrl+x, press y & enter

Step 1.3:source /etc/profile

Step 1.4:java -version

You will get "Java HotSpot(TM) 64-Bit Server VM" in the last line

If you are not getting this, update the Java alternatives:

#update-alternatives --install /usr/bin/java java /home/Downloads/jdk1.8.0_60/bin/java 1

#java -version
STEP 2:INSTALLING HADOOP

Step 2.1:Download latest version of hadoop tar.gz(hadoop-2.7.0)

#cd ..

#tar zxvf hadoop-2.7.0.tar.gz

#cd hadoop-2.7.0/ (to go into the directory)

#pwd

/home/Downloads/hadoop-2.7.0 → copy the path of Hadoop

Step 2.2:To set the environment variables for hadoop

#sudo nano /etc/profile

Pwd:

Add the following three lines

HADOOP_PREFIX=/home/Downloads/hadoop-2.7.0

PATH=$PATH:$HADOOP_PREFIX/bin

export PATH JAVA_HOME HADOOP_PREFIX

Save the file by pressing ctrl+x, press y & enter

Step 2.3:#source /etc/profile

#cd $HADOOP_PREFIX

#bin/hadoop version

Hadoop 2.7.0

Step 2.4: Update the Java and Hadoop paths in the Hadoop environment file

#cat /etc/profile (copy the JAVA_HOME and HADOOP_PREFIX lines)

#cd $HADOOP_PREFIX/etc/hadoop

#nano hadoop-env.sh

After the last line, paste the paths and add export in front of each:

export JAVA_HOME=/home/Downloads/jdk1.8.0_60

export HADOOP_PREFIX=/home/Downloads/hadoop-2.7.0
STEP 3: INSTALLING THE OPENSSH PACKAGE (configuring SSH)

Step 3.1:#sudo apt-get install openssh-server

Press y for all

If you get an "Unable to fetch" error:

#sudo nano /etc/resolv.conf

Add nameserver 8.8.8.8

Save the file by pressing ctrl+x, press y and enter

#sudo apt-get install openssh-server

Press y for all

Step 3.2: To establish passwordless communication between systems

#ssh localhost

#ssh-keygen

Press Enter for all prompts

#ssh-copy-id -i localhost

Press y

Yes

Pwd:

You will get "Number of key(s) added: 1"

STEP 4:TO CONFIGURE FOUR XML FILES

(to configure which file system Hadoop uses and related settings)

Step 4.1:Modify core-site.xml

$nano core-site.xml

In that file, paste


<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
Save the file by pressing ctrl+x, press y and enter

Step 4.2:To configure number of replication, modify hdfs-site.xml

$nano hdfs-site.xml

In hdfs-site.xml file, paste

<configuration>
<property>

<name>dfs.replication</name>

<value>1</value>

</property>
</configuration>

ctrl+x, press y and enter

Step 4.3:Modify mapred-site.xml

$cp mapred-site.xml.template mapred-site.xml

$nano mapred-site.xml

In that file, paste

<configuration>

<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>

</configuration>

Step 4.4:Modify yarn-site.xml

$nano yarn-site.xml

In that file, paste

<configuration>

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>

</configuration>

STEP 5:FORMAT HDFS FILE SYSTEM (via name node for first time)

#cd $HADOOP_PREFIX

#bin/hadoop namenode -format

Step 5.1:Start name node and data node(port 50070)

#sbin/start-dfs.sh

Press y, yes

If an error occurs, run ssh-add

To see the running name node and data node daemons, type

#jps [In the browser, type localhost:50070 and press Enter; the name node information is displayed]

Step 5.2:Start resource manager and node manager Daemon(port 8088)

#sbin/start-yarn.sh

#jps

Step 5.3:To stop the running process

#sbin/stop-dfs.sh

#sbin/stop-yarn.sh
OUTPUT:

RESULT:

Thus the Hadoop software is installed and the one-node Hadoop cluster is set up
successfully.
EX NO: 7 MOUNT THE ONE NODE HADOOP CLUSTER USING FUSE

AIM:
To mount the one-node Hadoop cluster using FUSE.
PROCEDURE:

STEP: 1
wget http://archive.cloudera.com/cdh5/one-click-install/trusty/amd64/cdh5-repository_1.0_all.deb

STEP: 2
sudo dpkg -i cdh5-repository_1.0_all.deb

STEP: 3
sudo apt-get update

STEP: 4
sudo apt-get install hadoop-hdfs-fuse

STEP: 5
sudo mkdir -p xyz

STEP: 6
cd hadoop-2.7.0/

STEP: 7
bin/hadoop namenode -format

STEP: 8
sbin/start-all.sh
STEP: 9
hadoop-fuse-dfs dfs://localhost:9000 /home/it08/Downloads/xyz/

STEP: 10
sudo chmod 777 /home/it08/Downloads/xyz/

STEP: 11
hadoop-fuse-dfs dfs://localhost:9000 /home/it08/Downloads/xyz/

STEP: 12
cd /home/it08/Downloads/xyz/

STEP: 13
mkdir a
ls
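STEP: 14 (optional check)
To confirm that the directory created through the FUSE mount also exists in HDFS, list it from the Hadoop side (adjust the path to where hadoop-2.7.0 was extracted):
cd ~/hadoop-2.7.0
bin/hdfs dfs -ls /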
OUTPUT:
RESULT:
Thus the one-node Hadoop cluster is mounted using FUSE successfully.
EX NO: 8 A WORD COUNT PROGRAM USING MAP-REDUCE TASKS

AIM:

To write a wordcount program to demonstrate the use of Map and Reduce tasks.

PROCEDURE:

STEP: 1
Download Hadoop-core-1.2.1.jar, which is used to compile and execute the MapReduce
program. Visit the following link http://mvnrepository.com/artifact/org.apache.hadoop/hadoop-
core/1.2.1 to download the jar. Let us assume the downloaded folder is /home/hadoop/.
STEP: 2
The following commands are used for compiling the WordCount.java program.
javac -classpath hadoop-core-1.2.1.jar -d . WordCount.java

STEP: 3

Create a jar for the program.


jar -cvf sample1.jar sample1/

STEP: 4

cd $HADOOP_PREFIX

bin/hadoop namenode -format

sbin/start-dfs.sh

sbin/start-yarn.sh

jps

STEP: 5

The following command is used to create an input directory in HDFS.


bin/hdfs dfs -mkdir /input

STEP: 6
The following command is used to copy the input file named sal.txt in the input directory of
HDFS.
bin/hdfs dfs -put /home/it08/Downloads/sal.txt /input

STEP: 7

The following command is used to run the application by taking the input files from the input
directory.
bin/hadoop jar /home/it08/Downloads/sample1.jar sample1.WordCount /input /output
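Once the job finishes, the word counts can be viewed (the exact part-file name may vary) with, for example:

bin/hdfs dfs -cat /output/part-r-00000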

PROGRAM:

WordCount.java

package sample1;
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class WordCount {
public static class TokenizerMapper
extends Mapper<Object, Text, Text, IntWritable>{

private final static IntWritable one = new IntWritable(1);


private Text word = new Text();
public void map(Object key, Text value, Context context
) throws IOException, InterruptedException {
StringTokenizer itr = new StringTokenizer(value.toString());
while (itr.hasMoreTokens()) {
word.set(itr.nextToken());
context.write(word, one);
}
}
}

public static class IntSumReducer


extends Reducer<Text,IntWritable,Text,IntWritable> {
private IntWritable result = new IntWritable();
public void reduce(Text key, Iterable<IntWritable> values,
Context context
) throws IOException, InterruptedException {
int sum = 0;
for (IntWritable val : values) {
sum += val.get();
}
result.set(sum);
context.write(key, result);
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "word count");
job.setJarByClass(WordCount.class);
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}

INPUT FILE: (sal.txt)

1949 12 12 34 23 45 34 12 23
1987 13 11 32 34 45 56 12 34
1997 12 12 12 12 12 11 34 12
1998 23 34 23 34 45 56 23 34
2000 10 11 12 23 14 13 15 16

OUTPUT:
RESULT:

Thus the word count program to demonstrate the use of Map and Reduce tasks is
executed successfully.
EX NO: 9 Convert unstructured data into NoSQL data and perform operations such as NoSQL queries with an API

AIM:

To convert unstructured data into NoSQL format and perform operations on it, such as NoSQL
queries, through an API. This experiment uses Python and MongoDB as an example of a NoSQL
database; the specifics may vary depending on the NoSQL database being used.

PROCEDURE:

Step 1: Install MongoDB and pymongo

Ensure you have MongoDB installed and running. You can install the pymongo library using
pip:

pip install pymongo

Step 2: Connect to MongoDB

from pymongo import MongoClient

# Connect to MongoDB

client = MongoClient('localhost', 27017)

db = client['your_database_name']

collection = db['your_collection_name']

Step 3: Insert Data

Assuming your unstructured data is in a list of dictionaries:

data = [

{"name": "John", "age": 25, "city": "New York"},


{"name": "Jane", "age": 30, "city": "San Francisco"},

# ... more data

]

# Insert data into MongoDB

collection.insert_many(data)

Step 4: Query Data

# Query data

result = collection.find({"age": {"$gt": 25}})

for document in result:

print(document)

Step 5: Update Data

# Update data

collection.update_one({"name": "John"}, {"$set": {"age": 26}})

Step 6: Delete Data

# Delete data

collection.delete_one({"name": "Jane"})

Step 7: Set up API (Optional)

You can expose your MongoDB operations through an API using a framework like Flask or
FastAPI. Here's a basic example using Flask:

from flask import Flask, jsonify, request

app = Flask(__name__)
@app.route('/get_data', methods=['GET'])

def get_data():

result = collection.find({}, {'_id': 0})  # exclude MongoDB's ObjectId so the documents are JSON-serializable

data = [document for document in result]

return jsonify(data)

if __name__ == '__main__':

app.run(debug=True)

This is a simple example, and you might need to add more routes for different operations.

Remember to secure your API and handle errors appropriately. Additionally, adjust the code
according to the NoSQL database you're using, as the syntax for queries and operations may
differ between databases.

RESULT:
Thus unstructured data is converted into NoSQL data and operations such as NoSQL queries
with an API are performed successfully.
EX NO: 10 K-means clustering using MapReduce

K-means clustering is an iterative algorithm that partitions a dataset into K clusters, where each
data point belongs to the cluster with the nearest mean. MapReduce is a programming model for
processing and generating large datasets that can be parallelized across a distributed cluster of
computers. Here's a high-level overview of how you might implement K-means clustering using
MapReduce:

Step-by-Step Procedure:

Initialization:

Randomly select K initial cluster centroids.

Map Phase:

Read the input data distributed across your cluster of machines.

For each data point, calculate the distance to each centroid and emit the data point with the ID of
the nearest centroid.

Mapper Output: <centroid_id, data_point>

Combine/Group Phase:

Group the emitted data points by centroid ID.

This step is typically handled automatically in MapReduce frameworks.

Reduce Phase:

For each group (centroid), calculate the new centroid by computing the mean of the data points
assigned to that cluster.

Reducer Output: <centroid_id, new_centroid>

Update Centroids:
Collect the new centroids from the reducers.

Use these new centroids as the input centroids for the next iteration.

Convergence Check:

Check for convergence by comparing the new centroids with the previous centroids.

If the centroids have not changed significantly, stop the iterations.

Iteration:

If the convergence criteria are not met, repeat the map-reduce steps with the updated centroids.

Final Output:

The final output will be the K centroids representing the clusters.

Pseudocode:

Here is a simplified pseudocode representation of the MapReduce steps:

Map(data_point):
    nearest_id = the centroid_id with the smallest calculate_distance(data_point, centroid)
    emit(nearest_id, data_point)

Reduce(centroid_id, data_points):
    new_centroid = calculate_mean(data_points)
    emit(centroid_id, new_centroid)
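As a concrete illustration of these two phases, here is a minimal, self-contained Python sketch of the same map/reduce round run locally (the sample points and the initial centroids below are made-up values chosen only for demonstration):

import math
from collections import defaultdict

def distance(p, q):
    # Euclidean distance between two points
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def map_phase(points, centroids):
    # Map: emit (nearest_centroid_id, data_point) pairs
    pairs = []
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: distance(p, centroids[i]))
        pairs.append((nearest, p))
    return pairs

def reduce_phase(pairs, old_centroids):
    # Reduce: group points by centroid id and recompute each centroid as the group mean
    groups = defaultdict(list)
    for cid, p in pairs:
        groups[cid].append(p)
    new_centroids = []
    for cid, old in enumerate(old_centroids):
        pts = groups[cid]
        if pts:
            new_centroids.append(tuple(sum(coord) / len(pts) for coord in zip(*pts)))
        else:
            new_centroids.append(old)  # keep the old centroid if no point was assigned to it
    return new_centroids

# Illustrative data: two obvious clusters in 2-D and K = 2 initial centroids
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.0)]
centroids = [(0.0, 0.0), (5.0, 5.0)]

for _ in range(5):  # a fixed number of iterations stands in for the convergence check
    centroids = reduce_phase(map_phase(points, centroids), centroids)

print(centroids)

On a real cluster each map_phase call would run in parallel on a partition of the data, and the updated centroids would be exchanged between iterations as described in the steps above.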

Notes:
The distance calculation and mean calculation functions depend on your data and application
context.

The MapReduce framework, like Hadoop or Apache Spark, handles the distribution of data and
tasks across the cluster.

The number of iterations needed for convergence can vary based on the data and initial centroids.

Keep in mind that the MapReduce paradigm might not be the most efficient for iterative
algorithms like K-means due to the overhead of repeated map and reduce phases. More modern
distributed computing frameworks, like Apache Spark, provide iterative algorithms that are more
optimized for these types of tasks.

RESULT:
Thus K-means clustering using MapReduce is executed successfully.
EX NO: 11 Page Rank Computation

PageRank computation in the context of cloud computing and big data often involves
distributing the computation across multiple nodes to handle the large-scale processing required
for web-scale graphs. PageRank is an algorithm that measures the importance of webpages in a
hyperlink graph, and it was famously used by Google to rank search engine results.

Here's a high-level overview of how PageRank computation can be implemented in a cloud
computing and big data lab environment:

1. Data Representation:

Web graphs are often represented as adjacency lists or matrices, where each node
represents a webpage, and edges represent hyperlinks.

2. Data Partitioning:

Break the web graph into smaller partitions that can be distributed across multiple nodes
in the cloud. This step is crucial for parallel processing.

3. Distributed Storage:

Store the graph data in a distributed storage system like Hadoop Distributed File System
(HDFS) or a cloud-based equivalent (e.g., Amazon S3, Google Cloud Storage).

4. Distributed Computation:

Leverage a distributed computing framework like Apache Hadoop or Apache Spark to
perform the PageRank computation in parallel across multiple nodes.
The MapReduce paradigm is commonly used for such computations. The graph is partitioned
into chunks, and each chunk is processed independently by different nodes.
5. Iterative Algorithm:

PageRank is an iterative algorithm where the scores are updated until convergence. Each
iteration involves processing the graph data and updating the PageRank scores.

6. Fault Tolerance:

Implement fault-tolerant mechanisms to handle failures that may occur in a distributed
environment. For instance, Hadoop and Spark have built-in fault tolerance features.

7. Scaling:
Cloud computing allows you to dynamically scale resources based on the workload. As
the size of the web graph grows or changes, you can allocate more resources to the
computation.

8. Optimizations:

Apply optimization techniques to enhance the efficiency of PageRank computation. This
may include using graph partitioning strategies, optimizing communication between
nodes, and employing caching mechanisms.

9. Result Aggregation:

After several iterations, the final PageRank scores need to be aggregated and presented.
This result can be stored in a distributed storage system or used for further analysis.

10. Monitoring and Visualization:

Implement monitoring tools to keep track of the progress of the computation.
Visualization tools can help in understanding the structure of the web graph and the
distribution of PageRank scores.

In summary, cloud computing and big data technologies provide the infrastructure and tools
necessary to efficiently compute PageRank on large-scale graphs. Leveraging distributed
computing frameworks and storage systems allows for the parallel processing and storage of vast
amounts of data, making it feasible to compute PageRank for the entire web graph.
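As a concrete, single-machine illustration of the iterative update described in step 5 (the distributed version would partition the graph and run the same update with Hadoop or Spark), here is a minimal Python sketch; the tiny four-page graph and the damping factor of 0.85 are illustrative choices, not values from the text:

damping = 0.85
graph = {                 # adjacency list: page -> pages it links to
    'A': ['B', 'C'],
    'B': ['C'],
    'C': ['A'],
    'D': ['A', 'C'],
}
ranks = {page: 1.0 / len(graph) for page in graph}    # uniform initial ranks

for _ in range(20):       # a fixed number of iterations stands in for the convergence check
    contributions = {page: 0.0 for page in graph}
    for page, links in graph.items():
        share = ranks[page] / len(links)              # spread this page's rank over its out-links
        for target in links:
            contributions[target] += share
    # standard damped PageRank update
    ranks = {page: (1 - damping) / len(graph) + damping * contributions[page]
             for page in graph}

for page, rank in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(rank, 4))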

RESULT:
Thus the procedure of the PageRank computation is executed successfully.
