Answerkey

KONGU ENGINEERING COLLEGE, PERUNDURAI 638 060
CONTINUOUS ASSESSMENT TEST I
Course Code : 20ADH02

Course Name : Internet of Things and Machine Learning
Answer Key
PART-A
1. (1) Business understanding
(2) Data understanding
(3) Data preparation
(4) Modelling
(5) Evaluation
(6) Deployment
2. IBM Watson IoT Platform, Microsoft Azure IoT Suite, Google Cloud IoT, Amazon AWS IoT
3. Storing real-time generated events
Running analytical queries over stored events
Performing analytics using AI/ML/DL techniques over the data to gain insights and make predictions
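The first two tasks can be sketched with Python's built-in sqlite3 module as a stand-in event store. This is an assumption for illustration only (real IoT deployments typically use a time-series database or a cloud service), and the table name, device IDs, and readings are hypothetical:

```python
import sqlite3
import time

# In-memory SQLite stands in for an IoT event store (assumption, for illustration)
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE events (device TEXT, ts REAL, temperature REAL)')

# Task 1: store events as they arrive (hypothetical device IDs and readings)
incoming = [('dev1', 21.5), ('dev2', 19.0), ('dev1', 22.5), ('dev2', 20.0)]
for device, temp in incoming:
    conn.execute('INSERT INTO events VALUES (?, ?, ?)', (device, time.time(), temp))
conn.commit()

# Task 2: run an analytical query over the stored events
rows = conn.execute(
    'SELECT device, COUNT(*), AVG(temperature) FROM events '
    'GROUP BY device ORDER BY device'
).fetchall()
print(rows)
```

The third task, AI/ML/DL analytics, would then consume query results like these as training or inference input.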
4.
5. Latency
Energy-efficiency
Privacy
Scalability
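6. from hdfs3 import HDFileSystem
# Assumed NameNode address; replace with your cluster's host and port
hdfs = HDFileSystem(host='localhost', port=8020)
with hdfs.open('/tmp/file1.txt', 'wb') as f:
    f.write(b'You are Awesome!')
with hdfs.open('/tmp/file1.txt') as f:  # default mode is 'rb'
    print(f.read())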
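7. import csv
import os
import numpy as np
import pandas as pd
data_folder, data_file = '.', 'temp.csv'  # assumed file location; adjust as needed
path = os.path.join(data_folder, data_file)
# csv module: iterate over the rows
with open(path, newline='') as csvfile:
    csvreader = csv.reader(csvfile)
    for row in csvreader:
        print(row)
# pandas: load the whole file into a DataFrame
df = pd.read_csv(path)
print(df)
# NumPy: skip the header row and load the numeric columns 2 and 3
arr = np.loadtxt(path, skiprows=1, usecols=(2, 3), delimiter=',')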
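8. model = LinearRegressor(d)  # user-defined class; d = number of input features
loss = model.fit(X_train, Y_train, 20000)  # epochs = 20000
Y_pred = model.predict(X_test)  # predict() needs input features, e.g. held-out X_test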
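The answer above uses a LinearRegressor class without defining it. The following is a hypothetical NumPy stand-in, not the original class: it assumes the constructor takes the number of features d, fit(X, Y, epochs) runs full-batch gradient descent on the mean squared error and returns the per-epoch loss, and predict(X) returns predictions. The learning rate and the synthetic data are illustrative assumptions.

```python
import numpy as np

class LinearRegressor:
    """Hypothetical stand-in: linear model y = X @ w + b trained by gradient descent."""

    def __init__(self, d, lr=0.01):
        self.w = np.zeros(d)  # one weight per input feature
        self.b = 0.0          # bias term
        self.lr = lr          # learning rate (assumed value)

    def predict(self, X):
        return X @ self.w + self.b

    def fit(self, X, Y, epochs=500):
        n = len(Y)
        loss = []
        for _ in range(epochs):
            err = self.predict(X) - Y
            loss.append(float(np.mean(err ** 2)))     # MSE before this update
            self.w -= self.lr * (2 / n) * (X.T @ err)  # d(MSE)/dw
            self.b -= self.lr * (2 / n) * err.sum()    # d(MSE)/db
        return loss

# Usage on synthetic data generated from y = 3*x1 - 2*x2 + 1
rng = np.random.default_rng(42)
X_train = rng.normal(size=(200, 2))
Y_train = X_train @ np.array([3.0, -2.0]) + 1.0
model = LinearRegressor(d=2)
loss = model.fit(X_train, Y_train, 2000)
X_test = rng.normal(size=(5, 2))
Y_pred = model.predict(X_test)
print(np.round(model.w, 3), round(model.b, 3))
```

Full-batch updates keep the sketch short. With 20000 epochs as in the answer, the same class simply runs longer.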
9. Supervised vs. Unsupervised
Supervised                                 | Unsupervised
Uses known, labeled data as input          | Uses unknown, unlabeled data as input
The number of classes is known             | The number of classes is not known
Ex: Optical Character Recognition          | Ex: Face Recognition
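The contrast in the table can be seen on toy data: a supervised classifier is trained on inputs together with their labels, while a clustering algorithm sees only the inputs. This sketch assumes scikit-learn's LogisticRegression and KMeans as representative methods and uses made-up two-cluster data. Note that KMeans still needs a chosen number of clusters (2 here), which in practice has to be estimated.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Toy data: two well-separated 2-D blobs of 50 points each (made up for illustration)
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)), rng.normal(5.0, 0.5, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)  # labels: only the supervised model gets these

# Supervised: learn a mapping from inputs to known labels
clf = LogisticRegression().fit(X, y)
train_accuracy = clf.score(X, y)

# Unsupervised: group the inputs without labels (k=2 is our choice, not learned)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
cluster_ids = km.labels_

# Cluster IDs are arbitrary, so compare with the labels up to swapping 0 and 1
agreement = max(np.mean(cluster_ids == y), np.mean(cluster_ids != y))
print(train_accuracy, agreement)
```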
10. Linear regression is a supervised learning task. It models the relationship between
a dependent variable y and one or more independent variables x as a linear function, so
that y can be predicted from x.
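For reference, the linear model and the least-squares criterion usually minimized when fitting it can be written as follows. The symbols w_j (weights), b (intercept), n (number of samples), and d (number of features) are introduced here for illustration.

```latex
\hat{y}_i = b + \sum_{j=1}^{d} w_j x_{ij},
\qquad
\mathrm{MSE}(\mathbf{w}, b) = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
```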
PART-B
11. In HDF5 files, data is organized into groups and datasets. A group is a collection of
groups or datasets. A dataset is a multidimensional homogeneous array.
PyTables:
import numpy as np
import tables
# 1. Get the numeric data
arr = np.loadtxt('temp.csv', skiprows=1, usecols=(2, 3), delimiter=',')
# 2. Open the HDF5 file
h5filename = 'pytable_demo.hdf5'
with tables.open_file(h5filename, mode='w') as h5file:
    # 3. Get the root node
    root = h5file.root
    # 4. Create a group with create_group() or a dataset with create_array(),
    #    and repeat this until all the data is stored
    h5file.create_array(root, 'global_power', arr)
# 5. Close the file: leaving the with block closes it automatically
Pandas:
import numpy as np
import pandas as pd
arr = np.loadtxt('temp.csv', skiprows=1, usecols=(2, 3), delimiter=',')
store = pd.HDFStore('hdfstore_demo.hdf5')  # HDFStore requires the PyTables package
store['global_power'] = pd.DataFrame(arr)
print(store)
store.close()
h5py:
import h5py
hdf5file = h5py.File('pytable_demo.hdf5', 'r')  # open the file written by PyTables, read-only
ds = hdf5file['/global_power']
print(ds)
for i in range(len(ds)):
    print(ds[i])  # print the rows of the stored dataset
hdf5file.close()
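To illustrate the group/dataset hierarchy described at the start of this answer, this sketch uses h5py to build a small file with nested groups and read it back. The file name hierarchy_demo.hdf5 and all node names are hypothetical.

```python
import numpy as np
import h5py

# Write: groups act like folders, datasets hold homogeneous N-dimensional arrays
with h5py.File('hierarchy_demo.hdf5', 'w') as f:
    sensors = f.create_group('sensors')  # group under the root "/"
    raw = sensors.create_group('raw')    # a group can contain other groups ...
    raw.create_dataset('power', data=np.arange(6, dtype=np.float64).reshape(3, 2))
    sensors.create_dataset('ids', data=np.array([101, 102, 103]))  # ... or datasets

# Read: list every object path, then load one dataset back into NumPy
names = []
with h5py.File('hierarchy_demo.hdf5', 'r') as f:
    f.visit(names.append)  # calls names.append(path) for each group and dataset
    power = f['sensors/raw/power'][...]
    raw_is_group = isinstance(f['sensors/raw'], h5py.Group)
print(sorted(names))
print(power.shape, power.dtype)
```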
12. Data Parallelism:
Data parallelism partitions the training data into several subsets and performs the
training on different machines in parallel. The training model in each computational
node is the same. A central server that records all the parameters is required to
synchronize the local updates, and each local computational node is then refreshed by
downloading the new parameters. This method has two main drawbacks. First, a
centralized parameter server offsets the benefits of distributed computation because of
the communication delay, especially in IoT systems. Second, it requires a replica of the
machine learning model on each computational node, so the model size is limited by the
computational resources available on each node.
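The synchronization loop described above can be simulated in one process. In this sketch (an assumption, not a distributed implementation), each "worker" is a loop iteration over one shard of the data, all workers share the same linear least-squares model, and a central step averages their gradients before the workers download the updated parameters.

```python
import numpy as np

def data_parallel_gd(X, y, n_workers=4, lr=0.1, rounds=200):
    """Synchronous data-parallel gradient descent for least-squares regression.
    Workers are simulated by looping over data shards (no real cluster)."""
    w = np.zeros(X.shape[1])  # parameters kept by the central server
    shards = list(zip(np.array_split(X, n_workers), np.array_split(y, n_workers)))
    for _ in range(rounds):
        local_grads = []
        for X_s, y_s in shards:      # every worker runs the same model on its own shard
            w_local = w.copy()       # worker downloads the current parameters
            residual = X_s @ w_local - y_s
            # Gradient of half the mean squared error on this shard
            local_grads.append(X_s.T @ residual / len(y_s))
        # Server synchronizes: average the local gradients, then update the parameters
        w = w - lr * np.mean(local_grads, axis=0)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w
w = data_parallel_gd(X, y)
print(np.round(w, 3))
```

Because every worker holds the full parameter vector w, this also shows the second drawback: the model must fit on each node.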
Model Parallelism:
Model-parallel machine learning algorithms partition the model into multiple
sub-models. For each training sample, the sub-models collaborate with one another to
perform the optimization. However, this training method is very sensitive to
communication delay, and training may fail if one computational node goes down.
Furthermore, the intermediate data exchanged between nodes is huge for training
algorithms such as stochastic gradient descent (SGD). Therefore, a model-parallel
machine learning algorithm that minimizes communication without sacrificing model size
is greatly needed.
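Model parallelism splits the parameters rather than the data. In this single-process sketch (hypothetical, for a linear model only), each simulated worker owns a slice of the weight vector. The partial predictions must be exchanged and summed at every step, which is the communication dependence described in the paragraph above.

```python
import numpy as np

def model_parallel_gd(X, y, n_parts=2, lr=0.1, rounds=200):
    """Model-parallel gradient descent for a linear model: the weight vector is
    split into n_parts sub-models, each owned by a simulated worker."""
    blocks = np.array_split(np.arange(X.shape[1]), n_parts)  # feature indices per worker
    w_parts = [np.zeros(len(cols)) for cols in blocks]       # each worker stores only its slice
    for _ in range(rounds):
        # Each sub-model computes a partial prediction for every sample ...
        partial = [X[:, cols] @ w_k for cols, w_k in zip(blocks, w_parts)]
        # ... and these intermediate results must be exchanged and summed each step
        residual = np.sum(partial, axis=0) - y
        # The shared residual is sent back; each worker updates only its own parameters
        for k, cols in enumerate(blocks):
            w_parts[k] = w_parts[k] - lr * X[:, cols].T @ residual / len(y)
    return np.concatenate(w_parts)

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 4))
true_w = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ true_w
w = model_parallel_gd(X, y)
print(np.round(w, 3))
```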
13. Hashing technique
Weight Quantization
Connection Pruning
Knowledge Distillation
Matrix Decomposition
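Two of the listed techniques, weight quantization and connection pruning, can be sketched in a few lines of NumPy. This assumes simple uniform 8-bit quantization and magnitude-based pruning. These are common textbook variants, not necessarily the exact schemes the course material refers to, and the random weights stand in for a real layer.

```python
import numpy as np

def quantize_uint8(w):
    """Uniform 8-bit weight quantization: map [w.min(), w.max()] onto 0..255."""
    w_min, w_max = float(w.min()), float(w.max())
    scale = (w_max - w_min) / 255 if w_max > w_min else 1.0  # guard against constant weights
    q = np.round((w - w_min) / scale).astype(np.uint8)
    return q, scale, w_min

def dequantize(q, scale, w_min):
    """Approximately recover the float weights from their 8-bit codes."""
    return q.astype(np.float32) * scale + w_min

def magnitude_prune(w, sparsity):
    """Connection pruning: zero out the given fraction of smallest-magnitude weights."""
    threshold = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) < threshold, 0.0, w)

rng = np.random.default_rng(0)
w = rng.normal(size=1000).astype(np.float32)  # stand-in for one layer's weights
q, scale, w_min = quantize_uint8(w)
w_hat = dequantize(q, scale, w_min)
pruned = magnitude_prune(w, 0.5)
print(w.nbytes, q.nbytes, np.max(np.abs(w - w_hat)), np.mean(pruned == 0))
```

Quantization cuts storage from 4 bytes to 1 byte per weight at the cost of a rounding error of at most half a quantization step; pruning keeps the large weights unchanged and zeroes the rest, which pays off when the zeros are stored in a sparse format.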
14.
import matplotlib.pyplot as plt
from sklearn import linear_model
from sklearn.datasets import make_regression
# Generate 50 samples with one feature and a little Gaussian noise
X, y = make_regression(n_samples=50, n_features=1, noise=0.1)
plt.scatter(X, y, color='green')
plt.title('Regression between X and y')
plt.xlabel('X (independent variable)')
plt.ylabel('y (dependent variable)')
# Fit the model to the given data
regr = linear_model.LinearRegression()
regr.fit(X, y)
print('Intercept: \n', regr.intercept_)
print('Coefficients: \n', regr.coef_)
print('\nThe Regression Equation is', regr.coef_, '* X +', regr.intercept_)
# Predict y for the training inputs and draw the fitted line
pred = regr.predict(X)
plt.plot(X, pred)
plt.show()
# score() returns the coefficient of determination R^2 (not adjusted R^2)
print("\nR Squared for Regression model:", regr.score(X, y))
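Since regr.score() returns the ordinary R², adjusted R² has to be computed from it using the number of samples n and features p: adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1). A minimal sketch; the helper adjusted_r2 is hypothetical, and random_state=0 is added only to make the data reproducible.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)

X, y = make_regression(n_samples=50, n_features=1, noise=0.1, random_state=0)
regr = LinearRegression().fit(X, y)
r2 = regr.score(X, y)          # ordinary R^2
adj = adjusted_r2(r2, *X.shape)  # X.shape is (n_samples, n_features)
print(f"R^2 = {r2:.6f}, adjusted R^2 = {adj:.6f}")
```

Adjusted R² penalizes extra features, so it never exceeds R²; with one feature and low noise the two values are nearly identical.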
