Azure Machine Learning
R:
Neural Networks correct
Reference data can be fetched on demand using ___________.
All the options correct
Small Memory Footprint can be acquired through ______________.
Decision Jungle correct
The process of determining the causes that play a role in House Price increase in a
particular area to model a House Price predictor is called ___________.
Feature engineering correct
In general, Linear Machine Learning Models are associated with __________.
Fast Training correct
A K-Means Cluster is said to be well defined if the cluster is ________.
Ellipse correct
Which of the following is false about Train Data and Test Data in Azure ML Studio?
Train data and Test data split should follow a thumb-rule of 80 : 20. correct
What is the output of Azure Data Factory pipeline that uses the
AzureMLBatchExecution activity to retrain a model?
model.ilearner file correct
Azure Data Factory can be linked with ________ service to manage big data clusters
on demand.
HD Insight correct
To retrain the predictive model and update the web service through Azure Data
Factory, which of the datasets are required?
An Azure Storage blob for the output of an AzureMLUpdateResource activity. correct
What service can be used along with Azure ML Studio Services to predict Machine
Failure in real-time through data collected from sensors?
Azure Stream Analytics correct
You create an Azure Stream Analytics job in which you want to call an Azure Machine
Learning web service that is managed in a different Azure subscription. What can be
done to use the Machine Learning Web service?
Azure ML function is added to the Stream Analytics job, specifying the URL and key
for the web service. correct
To avoid running the Import Data module each time an experiment is run, when data is
imported from Azure Blob storage we can use _____________.
Cached Results correct
Select the option that represents the correct order of following tasks: (A)
Predictive Experiment (B) Training Experiment (C) Model Evaluation (D) Data
Preprocessing (E) API Publishing
B, D, C, A, E correct
-------------------------------------------------
Azure - Machine Learning
Machine Learning on Cloud
According to Microsoft:
Machine learning is a data science technique that allows computers to use existing
data to forecast future behaviors, outcomes, and trends.
As you move forward, you will learn how to use Azure Machine Learning Studio as an
integrated, end-to-end data science and advanced analytics solution, enabling data
scientists to prepare data, develop experiments, and deploy models at cloud scale.
(3) Different models are trained and tested to create a predictive experiment model
with required accuracy.
(4) The predictive experiment encapsulates the machine learning model and the
associated data transformations, ready to be used with new data.
(5) Your predictive experiment can then be published as a web service, which client
applications and processes can call to generate predicted values.
Jupyter Notebook
Apache Spark
Docker
Kubernetes
Python
Conda
Microsoft Machine Learning Library for Apache Spark
Microsoft Cognitive Toolkit.
Azure ML is compatible with scikit-learn, TensorFlow and Spark ML
Conda acts as a package manager for compatible Python packages like scikit-learn,
matplotlib, TensorFlow, etc.
Docker and Kubernetes provide the underlying environment for the webservice
deployment.
Apache Spark and the dependent libraries help in data gathering from Big Data jobs
which can be linked with the Azure ML Studio.
Apart from these underlying components, Azure ML Studio can further be connected to
services such as Azure Storage, Azure SQL Database, etc., facilitating a single
environment for the whole Data Science life-cycle.
Other ML Tools
In addition to Azure Machine Learning Studio, there are a variety of options at
Microsoft to build, deploy, and manage your machine learning models.
It is currently in preview and provides an environment with much better support for
open frameworks in Python such as TensorFlow, scikit-learn, etc.
It supports Jupyter Notebooks, Visual Studio Code Tools for AI, Azure Batch AI and
Containerised Deployment.
Moving Forward
As you move forward with the course, you will learn to use Azure ML functionality to
clean data, create ML models and deploy them as web services.
Azure ML Platform
This topic helps you understand how to use the Azure ML platform on the web to
generate predictions by creating and deploying ML models.
Workspace
A Workspace is provisioned on an Azure subscription and can be thought of as a
playground for Machine Learning experimentation.
A Workspace contains:
- Training Experiments
- Predictive Experiments and
- a Web Service collection for a user.
To explore the Azure ML Platform, we must have an Azure subscription and must
create a workspace.
Notebooks
Azure ML platform has built-in support to execute R and Python scripts using
Jupyter Notebook.
Notebooks can be used to transform, clean, visualise data and train models
according to requirements.
Projects
Projects further enable us to easily manage various resources.
The Projects tab gives summaries of all the assets, i.e. experiments, notebooks, and
datasets used by us and added to the project.
Experiments can be copied from the gallery into a workspace and can be
explored/modified for a better understanding.
We can also share our work on experiments and web services for others to learn and
explore.
Datasets
An Azure ML experiment requires at least one dataset on which the model is created.
The data can be imported directly or into Azure Storage and used for creating a
model.
Hive or U-SQL jobs are used to clean and prepare the data for analysis.
A business process stores large volumes of data in a database or data warehouse.
This data can be stored on Azure Storage and can be easily imported to Azure ML
Workspace. Data can also be imported from Hive or Azure SQL database.
Further in the topic you will learn different types of data used in experiments and
ways to use them
Multiple files can be uploaded to the Azure ML workspace, but the maximum size of
each is less than 2 GB.
However, importing up to 10 GB of data from other sources is possible.
If you need to work with even larger volumes of data, statistical sampling
techniques must be used to sample ten gigabytes of data for training.
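One way to sample an oversized file locally before upload is to read it in chunks so the whole file never has to fit in memory. The sketch below uses pandas; the file path, fraction and chunk size are placeholders, and this is an illustrative approach rather than an Azure-prescribed one.

```python
# Illustrative sketch: fractional sampling of a large CSV read in chunks,
# so the full file never has to be loaded into memory at once.
import numpy as np
import pandas as pd

def sample_csv(path, frac=0.1, chunksize=100_000, seed=42):
    """Return roughly `frac` of the rows of the CSV at `path`."""
    rng = np.random.default_rng(seed)
    samples = []
    for chunk in pd.read_csv(path, chunksize=chunksize):
        # Keep each row independently with probability `frac`.
        mask = rng.random(len(chunk)) < frac
        samples.append(chunk[mask])
    return pd.concat(samples, ignore_index=True)
```

The sampled frame can then be saved and uploaded to the workspace in place of the full dataset.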
Training Data
Datasets are broadly classified into two types from an Azure ML perspective:
training data and reference data.
Based on the client using the service, the reference data to be used can be varied
accordingly to provide customisation and flexibility to the web service.
Data can also be imported from an on-premises SQL Server or several other online
sources using the Import Data module.
Multiple data formats such as .txt, .csv, .nh.csv, .tsv, .nh.tsv, Excel file, Azure
table, Hive table, SQL database table, .RData, etc. are supported.
Data types recognised by ML Studio are String, Integer, Double, Boolean, DateTime
and TimeSpan.
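As an illustration, a pre-upload check could map pandas dtypes onto these ML Studio types. The dataset and the dtype-to-type mapping below are assumptions made for demonstration, not an official conversion table.

```python
# Hypothetical pre-upload check: map common pandas dtypes to the types
# ML Studio recognises (String, Integer, Double, Boolean, DateTime).
import pandas as pd

df = pd.DataFrame({
    "name":   ["a", "b"],                                   # -> String
    "count":  [1, 2],                                       # -> Integer
    "score":  [0.5, 0.7],                                   # -> Double
    "active": [True, False],                                # -> Boolean
    "when":   pd.to_datetime(["2020-01-01", "2020-01-02"]), # -> DateTime
})

# dtype.kind codes: 'O' object, 'i' integer, 'f' float, 'b' bool, 'M' datetime
kind_map = {"O": "String", "i": "Integer", "f": "Double",
            "b": "Boolean", "M": "DateTime"}
studio_types = {col: kind_map.get(dtype.kind, "String")
                for col, dtype in df.dtypes.items()}
print(studio_types)
```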
For more details, refer to the following:
Import Data
Prepare Data
Scenarios for advanced analytics.
The Lifecycle
A Data Science solution involves:
Adopting ML Studio
Azure Machine Learning Studio helps through the entire lifecycle of a Data Science
solution.
We already had a look at Data Import from various sources, which can be considered
as Data Extraction.
Data Cleaning and Transformation are done based on our business problem and the
approach we take to provide a solution for it. You can see this in the videos over
the following cards.
Azure ML Studio provides these features through Filters, Scale and Reduce,
Manipulation and Sample and Split.
Filters: transforms and cleans digital data and can help in speech processing.
Manipulation: cleans missing values, metadata editing, SQL transforms, etc.
Scale and Reduce: normalization, grouping, clipping, etc.
Sample and Split: partitions and samples data.
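As a point of comparison, the 80:20 train/test thumb rule behind Sample and Split can be sketched locally with scikit-learn. The data here is synthetic and the snippet is illustrative only, not the ML Studio module itself.

```python
# Illustrative 80:20 train/test split with scikit-learn on synthetic data,
# mirroring what the Sample and Split modules do inside ML Studio.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(-1, 1)  # 100 samples, 1 feature
y = np.arange(100)

# Hold out 20% of the data for testing, per the 80:20 thumb rule.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
print(len(X_train), len(X_test))  # 80 20
```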
Check this for more info on Data Cleaning and Transformation.
Data Visualisation
Exploratory Data Analysis and Data Visualisation are facilitated by Notebooks in
the Azure ML Studio. Data Visualisation is readily available in Azure ML Studio via
a right-click on uploaded datasets.
However, Notebooks can be used to further visualise data in a required manner with
Python/R scripts, adding flexibility and functionality.
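A notebook cell for such visualisation might look like the following sketch. The dataset is synthetic and the column name is hypothetical; in a notebook, the figure would render inline via `plt.show()`.

```python
# Sketch of a notebook-style visualisation cell: summarise a column
# and plot its histogram. Data is synthetic, for illustration only.
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripted runs
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({"price": rng.normal(200_000, 50_000, 1_000)})

print(df["price"].describe())   # quick numeric summary
df["price"].hist(bins=30)
plt.xlabel("price")
plt.ylabel("count")
plt.savefig("price_hist.png")   # in a notebook, plt.show() instead
```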
Watch the following video to learn using Notebooks for data visualisation.
Machine Learning models are divided into 4 major classes for both structured and
unstructured data.
- Regression
- Classification
- Anomaly Detection
- Clustering
Go through Machine Learning Axioms and other ML courses for model selection and
feature selection.
Supervised Learning
Azure ML Platform is equipped with over 20 types of built-in, ready-to-use
supervised learning methods.
Users can also write their own Python scripts and embed them into the ML workflow
for customised and optimised models using Notebooks and supported ML libraries like
scikit-learn, TensorFlow, etc.
Move to the next cards to check out examples of various Supervised Learning
Algorithms along with concepts of Data cleaning and Transformation.
Regression Model
Watch this video to understand how to train a regression model from a sample
dataset using Azure ML Experimentation service.
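For readers without video access, an analogous regression can be sketched locally with scikit-learn. The house-area feature and prices below are synthetic assumptions, not the video's dataset, and this is not the ML Studio drag-and-drop flow.

```python
# Local sketch of training and evaluating a linear regression model
# on synthetic house-price-like data (illustrative only).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(50, 250, size=(500, 1))        # e.g. house area in m^2
y = 1000 * X[:, 0] + rng.normal(0, 5000, 500)  # price with noise

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = LinearRegression().fit(X_tr, y_tr)
mae = mean_absolute_error(y_te, model.predict(X_te))
print(f"coef={model.coef_[0]:.1f}, MAE={mae:.0f}")
```

The learned coefficient should land near the 1000-per-unit slope used to generate the data, with the mean absolute error on the same scale as the injected noise.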
Classification Model
Watch this video to understand how to train a classification model from a sample
dataset using Azure ML Experimentation service.
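Similarly, a two-class classification can be sketched locally with scikit-learn on synthetic data. This is an illustrative stand-in, not the video's dataset or the ML Studio workflow.

```python
# Local sketch of binary classification, analogous to a two-class
# experiment in ML Studio. Data is synthetic.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

clf = LogisticRegression().fit(X_tr, y_tr)
acc = accuracy_score(y_te, clf.predict(X_te))
print(f"accuracy={acc:.2f}")
```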
Moving Further
Supervised learning is the most widely used ML modelling technique for structured
data. Different algorithms have their own pros and cons; the optimal algorithm must
be selected based on our requirements.
Refer to the following links for a better understanding of supervised learning
methods w.r.t. the Azure ML platform: Feature Engineering, Algorithm Selection and
Evaluating ML Model.
Unsupervised Learning
Unsupervised Learning is used to work on unstructured and un-labelled data.
Clustering is the most commonly used method where similar data is grouped by
finding features and grouping them based on their feature set.
K-Means Clustering
Watch the video to learn using K-Means Clustering Algorithm and visualise data
after clustering.
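Outside ML Studio, the same idea can be sketched with scikit-learn's KMeans on synthetic blob data. This is illustrative only and not the dataset shown in the video.

```python
# Local sketch of K-Means clustering on synthetic blob data,
# mirroring the kind of clustering demonstrated inside ML Studio.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=0)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print(km.cluster_centers_)  # one centre per cluster
print(km.labels_[:10])      # cluster assignment per point
```

Well-separated blobs like these produce compact, clearly delineated clusters; on real data, the number of clusters usually has to be chosen by inspecting the results.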
Recommenders
Recommenders, as their name suggests, are used in sectors such as e-commerce, ads
and social platforms. They recommend related items of interest based on users'
previous interactions.
What's Next?
In this topic, you learned how to create an ML model from datasets and about the
different types of ML models supported on the Azure ML Platform.
Move over to the next topic to learn to deploy the ML models as web services.
Refer to the following links for further info on K-Means Clustering and Matchbox
Recommender.
Once the trained models are tested for accuracy and optimised, they need to be
deployed for consumption through an API.
This topic helps you learn to deploy the trained model as a web service using a
predictive experiment.
Predictive Experiment
A Predictive Experiment is created from a successful Training Experiment.
Web service input and output are the new blocks added to the experiment.
Creating a Predictive Experiment
Check the following video to help understand how to create a predictive experiment.
Webservice
A web service is created from a Predictive Experiment to consume the generated
ML model.
The API endpoints take the inputs required by the ML model and return JSON output
with predictions.
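A sketch of such a call in request-response mode follows. The URL, API key and column names are placeholders, and the body shape assumes the classic ML Studio request format (an "Inputs"/"GlobalParameters" JSON document); treat it as an outline rather than a definitive client.

```python
# Hedged sketch of calling a published scoring web service.
# API_URL and API_KEY below are placeholders, not real values.
import json
import urllib.request

API_URL = "https://<region>.services.azureml.net/.../execute?api-version=2.0"
API_KEY = "<your-api-key>"

def build_request(rows, columns):
    """Assemble the JSON body and headers for a scoring call."""
    body = {
        "Inputs": {"input1": {"ColumnNames": columns, "Values": rows}},
        "GlobalParameters": {},
    }
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + API_KEY,
    }
    return json.dumps(body).encode("utf-8"), headers

def score(rows, columns):
    """POST the rows to the service and return its JSON predictions."""
    data, headers = build_request(rows, columns)
    req = urllib.request.Request(API_URL, data=data, headers=headers)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

With real endpoint values filled in, `score([[35.2, 3]], ["age", "rooms"])` would return the service's JSON prediction payload.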
Webservice Workflow
The following workflow is adopted for building a web service with predictive
experiment:
Deploying a Webservice
Check the following video to understand how to deploy a web service.
Now that the basics of creating and deploying a web service are understood, move
over to the next topic to learn more about managing and customising web services.
Diving Deep - Web services
This topic covers the
Management
Consumption and
Customizing of Web services according to the client requirements.
You will also learn about metrics and logging regarding the usage of Web services.
Parameters can be passed to retrieve reference data from the required database to
service different clients.
Web services can be managed and monitored with respect to utilisation and activity
through logging.
Consumption
A web service can be consumed from the published API either in
Request-Response mode or
Batch mode in an asynchronous way.
API endpoints and API Keys are used according to the requirement.
The APIs are built as REST APIs and can be consumed by required client application
by passing required parameters as an HTTPS request.
Microsoft Excel along with Azure ML Plug-in can also be used to consume the
service.
Consuming APIs
The following video explains how to consume the API service using API endpoints and
also using Excel.
Parameters
Web Service can be customised to take required parameters to access additional
data.
For example, you can specify a database to fetch the data. It will be provided as
an additional parameter apart from the required parameters for a prediction.
This helps in customised client consumption such that each client will use the same
service for prediction but data output/retrieval is different for each of them.
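For illustration, such a parameter could travel alongside the normal inputs in the "GlobalParameters" section of the request body, assuming the classic ML Studio request format. The parameter name "Database Name" below is hypothetical, something a service author would define when adding the parameter.

```python
# Hedged sketch: the same request body as a normal scoring call, plus a
# hypothetical "Database Name" web service parameter in GlobalParameters,
# letting each client point the shared service at its own data source.
import json

def build_body(rows, columns, database):
    return {
        "Inputs": {"input1": {"ColumnNames": columns, "Values": rows}},
        "GlobalParameters": {"Database Name": database},
    }

body = build_body([[35.2, 3]], ["age", "rooms"], database="client_a_db")
print(json.dumps(body, indent=2))
```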
Monitoring
Azure ML Platform enables Web service management through the Usage Statistics and
Logging.
Dashboards provide an overview of the total number of requests made to the API and
their success/fail rate over a selected period of time. They also give the average
compute time and latency associated with the API.
Logging can be enabled to store more detailed JSON response files automatically on
Azure Storage, providing a detailed report of each request and response.
Moving Further
Refer to the following links for more info on Consuming Webservice, Adding
Parameters, Managing Webservice and Logging.
Move over to the next topic to learn to use these web services in Big Data
scenarios.
Launch the hands-on and use a personal Azure trial account or the 8-hour guest
trial to start working in an Azure Workspace.
Find the Step by Step hands-on instructions here. A PDF will be downloaded with
Instructions to upload and visualise data.
However, if we want to use ML models in scenarios having large sets of data i.e for
Big Data, processing must be done in batches at scheduled intervals and in optimal
times to reduce latency and promote the asynchronous way of obtaining predictions
for our data.
The Azure Data Factory and its pipeline come into play in these kinds of
scenarios.
Pipeline
A pipeline consists of a sequence of activities, which are performed using the
linked services.
Datasets define the input and output for each activity.
Azure ML in Pipeline
When Big Data batch processing is done by pipeline, predictions can be part of the
pipeline with Azure ML being used as a linked service.
Azure ML batch execution activity is used to call a predictive web service from a
pipeline.
The input dataset is passed to the web service input, and the predicted output
from the web service is returned, continuing to the next activity in the pipeline.
Other linked services are data sources, like Azure Storage or Azure SQL Database,
and compute services, like Azure HDInsight or Azure Data Lake Analytics.
We need fresh data to train the ML model to improve the model accuracy with the
changing inflow of data.
However, considering the amount of data to be handled while Big Data processes are
ongoing, retraining manually is neither efficient nor recommended.
In this situation, the Azure Data Factory Pipeline and the Azure ML Update Resource
service come to the rescue to automate the retraining of ML models.
Automating Retraining
To automate the process of retraining, Azure ML provides a feature to publish the
training experiment as a retraining web service.
The following activities can be executed in sequence, in the Azure Data Factory
Pipeline, to achieve the automation task:
Azure ML Batch Execution Activity is used to call the retraining web service to
generate a new model as a file.
The model file is passed to an Azure ML Update Resource Activity that updates the
scoring experiment replacing the existing model.
Moving Further
Refer to the following links for more info on Predictive Pipelines and Update
Resource Activity.
Now that Big Data processing for predictions has been covered, move over to the
next topic to learn to process real-time data for predictions using the web
services of the Azure ML Platform.
Streaming Process
A streaming process is data processing performed in real time.
Azure includes a number of services that you can use to implement a streaming
process for real-time data.
Input: often Event Hubs or IoT Hubs that are used to ingest real-time data at scale.
Streaming Job: used to process the data; generally, an Azure Stream Analytics query.
Output: the expected result, which could be anything from a database update to a
real-time dashboard with analysis.
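Stream Analytics queries themselves are written in a SQL-like language; as a language-neutral illustration, the following Python sketch simulates a tumbling-window average, the kind of aggregate a streaming job typically computes over sensor events. The event values are made up.

```python
# Python simulation (not Stream Analytics itself) of a tumbling-window
# average over timestamped sensor events.
from collections import defaultdict

events = [  # (timestamp in seconds, temperature reading) - made-up data
    (1, 20.0), (4, 22.0), (11, 30.0), (14, 32.0), (21, 25.0),
]

def tumbling_avg(events, window=10):
    """Average readings per fixed, non-overlapping time window."""
    buckets = defaultdict(list)
    for ts, value in events:
        buckets[ts // window].append(value)   # assign event to its window
    return {w * window: sum(v) / len(v) for w, v in sorted(buckets.items())}

print(tumbling_avg(events))  # {0: 21.0, 10: 31.0, 20: 25.0}
```

Each window is keyed by its start time; unlike a sliding window, every event lands in exactly one bucket.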