The Big Book of MLOps: Ebook
Contents
CHAPTER 1: In
People and process
People
Process
Why should I care about MLOps?
Guiding principles
CHAPTER 2:
Semantics of dev, staging and prod
ML deployment patterns
CHAPTER 3:
Architecture components
Data Lakehouse
Feature Store
Databricks SQL
Databricks Jobs
Reference architecture
Dev
Staging
Prod

AUTHORS:
Joseph Bradley, Lead Product Specialist
Lead Product Specialist
Matt Thomson, Director, EMEA Product Specialists
Lead Data Scientist
CHAPTER 1:
Note:
People
ML PERSONAS
Responsible for building data pipelines to process, organize and persist data sets for machine learning [...] applications
Responsible for understanding the business [...] data to understand if machine learning is applicable, and then training, tuning and evaluating a [...]
Responsible for deploying machine learning models to [...] governance, monitoring and [...] practices such as continuous integration and continuous [...]
Responsible for using the model to make decisions for the business or product, and responsible for the business value that the model is [...]
Responsible for ensuring that data governance, data privacy and other compliance measures are adhered to across the model development and [...] typically involved in day-to-[...]
Process
ML PROCESS
Data preparation
Data scientists clean data and apply business logic and specialized transformations to engineer features for
Model training
some baseline level of performance, in addition to meeting any other technical, business or regulatory
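The idea in the fragment above, that a trained model should reach some baseline level of performance and also meet other technical, business or regulatory requirements, can be sketched as a simple validation gate. This is only an illustration under stated assumptions: the function names, metric names and thresholds are hypothetical and do not come from the ebook, and every metric is assumed to be "higher is better".

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical type: a check takes the metrics and returns (ok, failure_reason).
Check = Callable[[Dict[str, float]], Tuple[bool, str]]


def validate_model(
    metrics: Dict[str, float],
    baseline: Dict[str, float],
    extra_checks: Sequence[Check] = (),
) -> Tuple[bool, List[str]]:
    """Return (passed, failure_reasons) for a candidate model.

    Assumption: every metric in `baseline` is "higher is better".
    """
    failures: List[str] = []

    # Performance gate: each baseline metric must be met or exceeded.
    for name, required in baseline.items():
        observed = metrics.get(name)
        if observed is None:
            failures.append(f"missing metric: {name}")
        elif observed < required:
            failures.append(f"{name}={observed} is below baseline {required}")

    # Any other technical, business or regulatory requirements.
    for check in extra_checks:
        ok, reason = check(metrics)
        if not ok:
            failures.append(reason)

    return (not failures, failures)


def max_latency_check(metrics: Dict[str, float]) -> Tuple[bool, str]:
    # Illustrative non-performance requirement; the 200 ms limit is made up.
    ok = metrics.get("p95_latency_ms", float("inf")) <= 200.0
    return ok, "p95 latency is above 200 ms or was not reported"
```

A caller might run `validate_model({"auc": 0.91, "p95_latency_ms": 120.0}, {"auc": 0.85}, [max_latency_check])` and only move the model toward deployment when the first element of the result is `True`.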
Deployment
Monitoring
the company
it to destroy or delete the illegally harvested data, and all models or algorithms
to manage ML systems
Guiding principles
Just as the core purpose of ML in a business is to enable data-driven decisions and products, the core
purpose of MLOps is to ensure that those data-driven applications remain stable, are kept up to date and
to govern and reproduce results, secure the data in cloud storage and make that storage available to your
CHAPTER 2:
Note:
environment, human error can pose the greatest risk to business continuity, and so the least number of
organization could create distinct environments across multiple cloud accounts, multiple Databricks
ENVIRONMENT SEPARATION PATTERNS
Code
Models
While models are usually marked as dev, staging or prod according to their lifecycle phase, it is important to
Data
ASSET | SEMANTICS | SEPARATED BY
Data | Labeled according to its origin | Table access controls or cloud storage permissions
environments
ML deployment patterns
The fact that models and code can be managed separately results in multiple possible patterns for getting
[Figure: ML deployment patterns. Panels: DEPLOY MODELS, DEPLOY CODE]
Deploy models
In the second pattern, the code to train models is developed in the dev environment, and this code is
learning curve for handing code off to collaborators can be steep for many data scientists, so opinionated