The Big Book of MLOps: Ebook


eBook

The Big Book of MLOps

A data-centric approach to establish and scale machine learning

ModelOps DataOps DevOps

JOSEPH BRADLEY | RAFI KURLANSIK | MATT THOMSON | NIALL TURBITT



Contents

CHAPTER 1: Introduction
People and process
People
Process
Why should I care about MLOps?
Guiding principles

CHAPTER 2:
Semantics of dev, staging and prod
ML deployment patterns

CHAPTER 3:
Architecture components
Data Lakehouse
Feature Store
Databricks SQL
Databricks Jobs
Reference architecture
Dev
Staging
Prod

AUTHORS:

Joseph Bradley
Lead Product Specialist

Rafi Kurlansik
Lead Product Specialist

Matt Thomson
Director, EMEA Product Specialists

Niall Turbitt
Lead Data Scientist

CHAPTER 1:

Put simply, MLOps = ModelOps + DataOps + DevOps

Note:

project to help make our customers



People and process

ML WORKFLOW AND PERSONAS



People

ML PERSONAS

Responsible for building data pipelines to process, organize and persist data sets for machine learning […] applications

Responsible for understanding the business […] data to understand if machine learning is applicable, and then training, tuning and evaluating a […]

Responsible for deploying machine learning models to […] governance, monitoring and […] practices such as continuous integration and continuous […]

Responsible for using the model to make decisions for the business or product, and responsible for the business value that the model is […]

Responsible for ensuring that data governance, data privacy and other compliance measures are adhered to across the model development and […] typically involved in day-to-[…]

Process

ML PROCESS

Data preparation

Exploratory data analysis (EDA)


Analysis is conducted by data scientists to assess statistical properties of the data available, and determine

Data scientists clean data and apply business logic and specialized transformations to engineer features for

Model training

some baseline level of performance, in addition to meeting any other technical, business or regulatory
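As a rough illustration of the validation idea in this step, the sketch below compares a candidate model's metrics against a baseline and then runs any additional named checks. The function name `passes_validation`, the metric names and the threshold values are hypothetical and do not come from the eBook; real checks would depend on a project's own technical, business or regulatory requirements.

```python
from typing import Callable, Dict, List


def passes_validation(
    metrics: Dict[str, float],
    baseline: Dict[str, float],
    extra_checks: Dict[str, Callable[[Dict[str, float]], bool]],
) -> List[str]:
    """Return the names of failed checks; an empty list means the model passes."""
    failures = []
    # Every baseline metric must be met or exceeded (higher is assumed better).
    for name, minimum in baseline.items():
        if metrics.get(name, float("-inf")) < minimum:
            failures.append(f"baseline:{name}")
    # Additional checks stand in for technical, business or regulatory rules.
    for name, check in extra_checks.items():
        if not check(metrics):
            failures.append(f"check:{name}")
    return failures


# Hypothetical candidate metrics and thresholds, for illustration only.
candidate = {"accuracy": 0.91, "latency_ms": 40.0}
failed = passes_validation(
    candidate,
    baseline={"accuracy": 0.85},
    extra_checks={"latency_under_50ms": lambda m: m["latency_ms"] < 50.0},
)
print("validation passed" if not failed else f"failed checks: {failed}")
```

Returning the list of failed checks, rather than a single boolean, makes it easier to report why a candidate was rejected.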

Deployment

Monitoring

Why should I care about MLOps?

inherent to the system itself and risk of

the company
it to destroy or delete the illegally harvested data, and all models or algorithms

to manage ML systems

Guiding principles

Just as the core purpose of ML in a business is to enable data-driven decisions and products, the core
purpose of MLOps is to ensure that those data-driven applications remain stable, are kept up to date and

reduce operational costs or risks?

to govern and reproduce results, secure the data in cloud storage and make that storage available to your

responsibilities to clarify the modular structure of your ML application



CHAPTER 2:

Semantics of dev, staging and prod

Note:

environment, human error can pose the greatest risk to business continuity, and so the least number of

organization could create distinct environments across multiple cloud accounts, multiple Databricks

ENVIRONMENT SEPARATION PATTERNS


Code

Models
While models are usually marked as dev, staging or prod according to their lifecycle phase, it is important to


and its Model Registry support managing model artifacts
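The idea of marking model versions by lifecycle phase can be pictured with a small in-memory sketch. This is not the MLflow Model Registry API: `ToyRegistry`, `register`, `transition` and `versions_in` are hypothetical names used only for illustration, and the phase labels are the dev, staging and prod phases named in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List

STAGES = ("dev", "staging", "prod")  # lifecycle phases named in the text


@dataclass
class ToyRegistry:
    """Hypothetical in-memory registry mapping model versions to a lifecycle phase."""

    stages: Dict[int, str] = field(default_factory=dict)

    def register(self) -> int:
        """Add a new version, starting in the dev phase, and return its number."""
        version = len(self.stages) + 1
        self.stages[version] = "dev"
        return version

    def transition(self, version: int, stage: str) -> None:
        """Mark an existing version with a new lifecycle phase."""
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        if version not in self.stages:
            raise KeyError(f"unknown version: {version}")
        self.stages[version] = stage

    def versions_in(self, stage: str) -> List[int]:
        """List versions currently marked with the given phase."""
        return [v for v, s in self.stages.items() if s == stage]


registry = ToyRegistry()
v1 = registry.register()
v2 = registry.register()
registry.transition(v1, "prod")
registry.transition(v2, "staging")
print(registry.versions_in("prod"), registry.versions_in("staging"))
```

Here the phase is a label attached to a model version, so several versions of the same model can sit in different phases at once.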



Data

ASSET | SEMANTICS | SEPARATED BY
[…] | […] development, testing and connections […] | […] Workspace access controls
Models | Labeled according to model lifecycle phase | […] storage permissions
Data | Labeled according to its origin […] environments | Table access controls or cloud storage permissions
[…] | […] development lifecycle phase | Git repository branches

ML deployment patterns
The fact that models and code can be managed separately results in multiple possible patterns for getting

DEPLOY MODELS

DEPLOY CODE

Deploy models

In the second pattern, the code to train models is developed in the dev environment, and this code is

learning curve for handing code off to collaborators can be steep for many data scientists, so opinionated
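One way to picture a pattern in which training code, rather than a trained model artifact, is what moves between environments: the same training function is run against each environment's own data. This is only a sketch under assumptions. The `ENV_DATA` values, the `train` and `run_training` functions and the "model" itself (a simple mean) are hypothetical placeholders, not part of the eBook.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical per-environment data; in practice each environment would
# read from its own tables or storage.
ENV_DATA: Dict[str, List[float]] = {
    "dev": [1.0, 2.0, 3.0],
    "staging": [2.0, 4.0],
    "prod": [10.0, 20.0, 30.0],
}


def train(values: List[float]) -> float:
    """Stand-in for training: the 'model' is just the mean of the data."""
    return mean(values)


def run_training(env: str) -> float:
    """Run the same training code against the data of the given environment."""
    return train(ENV_DATA[env])


# The code is identical in every environment; only the data differs.
models = {env: run_training(env) for env in ("dev", "staging", "prod")}
print(models)
```

Because the code is what gets promoted, each environment ends up with its own trained result from the same logic.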
