Snowflake To Lakehouse Migration Assessment 5-23
Strategy Workshop
Tech leaders are to the right of the Data Maturity Curve
From hindsight to foresight: supporting the business

[Data Maturity Curve: competitive advantage increases with Data + AI maturity]
● Clean Data
● Reports
● Ad Hoc Queries
● Data Exploration
● Predictive Modeling ("What will happen?")
● Prescriptive Analytics ("How should we respond?")
● Automated Decision Making ("Automatically make the best decision")
Why not use a cloud data warehouse?
Data Warehouses can’t fully support all your data
● Structured, semi-structured, and unstructured data live in the DATA LAKE
● Data is replicated into the warehouse
● Lock-in / proprietary format
Why not use a cloud data warehouse?
Data Warehouses are inefficient with data transformation
● Up to 6x the cost for ELT workloads
● Not optimized for Data Engineering
● Limited support for streaming
● Structured, semi-structured, and unstructured data live in the DATA LAKE
● Data is replicated into the warehouse
● Lock-in / proprietary format
Why not use a cloud data warehouse?
Pay a premium for all workloads
● Up to 6x the cost for ELT workloads
● Not optimized for Data Engineering
● Compute cost for all data access: ELT, BI Reports, Dashboards & SQL
● Structured and unstructured data live in the DATA LAKE
● Data is replicated into the warehouse
● Lock-in / proprietary format
Why not use a cloud data warehouse?
A cloud data warehouse is NOT a modern data platform
● Up to 6x the cost for ELT workloads
● Not optimized for Data Engineering
● Compute cost for all data access: ELT, BI Reports, Dashboards & SQL
● Structured and unstructured data
● Inefficient DE, ETL, and Data Sharing (Snowpipe / Snowpark, Partner Integrations) vs. Delta Lake / Live Tables / Sharing (industry-leading data engine, multiple language support)
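To show what the Delta Sharing side of that comparison looks like from a consumer's perspective, here is a minimal sketch using the open-source delta-sharing Python client. The profile file path and the share, schema, and table names are placeholders, not part of this deck.

```python
# Minimal Delta Sharing consumer sketch (placeholder profile and table names).
# Requires the open-source `delta-sharing` package: pip install delta-sharing
import delta_sharing

# A share profile file issued by the data provider.
profile = "/path/to/config.share"

# List everything the provider has shared with us.
client = delta_sharing.SharingClient(profile)
print(client.list_all_tables())

# Load one shared table directly into pandas: no replication into a warehouse.
df = delta_sharing.load_as_pandas(f"{profile}#retail_share.sales.orders")
print(df.head())
```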
[Migration approach: five phases]
● Phase 1 (Discovery): Migration discovery and consultation
● Phase 2 (Assessment): Assessment, technology mapping, design, tooling, accelerators, sizing, partners
● Phase 3 (Strategy Proposal): Migration workshop, overall migration planning and plan
● Phase 4 (TurnKey Execution): Reference implementation of a specific production use case, migration implementation
● Phase 5 (Delivery): Migration execution and support
Architectural Discovery - Snowflake
Begin with a review of the customer’s current Snowflake architecture
❏ ETL Environment
❏ Third-party Tools? (Fivetran, Talend, dbt, etc.)
❏ Data velocity
❏ Data Types
❏ BI Processes
❏ BI Tools
❏ Report/Dashboard requirements
❏ ML Use Cases
❏ Scale of use cases
❏ Model requirements (languages, libraries, compute)
❏ Use our Snowflake Profiler to understand current workloads; Partner Analyzers for a deeper dive
Snowflake Migration Scoping Questionnaire (leave behind)
Pointers for discovery of the current Snowflake architecture
The Snowflake Profiler is an important step
The Snowflake Profiler is a notebook that runs in your environment to answer core questions about your current Snowflake workloads.
Reach out to your Partner Account Manager, Sales Leader, or email [email protected] to engage your Databricks counterparts.
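As a rough illustration of the kind of information such a profiling notebook gathers, the sketch below queries Snowflake's ACCOUNT_USAGE views for warehouse credit consumption and query mix. The connection details, the 30-day window, and the choice of views are assumptions for this example, not the actual Profiler implementation.

```python
# Hypothetical workload-profiling sketch against SNOWFLAKE.ACCOUNT_USAGE.
# Assumes the snowflake-connector-python package and a role with ACCOUNT_USAGE access.
import snowflake.connector

conn = snowflake.connector.connect(
    account="your_account",      # placeholder connection values
    user="your_user",
    password="your_password",
    role="ACCOUNTADMIN",
)

CREDITS_BY_WAREHOUSE = """
    SELECT warehouse_name,
           SUM(credits_used) AS total_credits
    FROM   snowflake.account_usage.warehouse_metering_history
    WHERE  start_time >= DATEADD(day, -30, CURRENT_TIMESTAMP())
    GROUP  BY warehouse_name
    ORDER  BY total_credits DESC
"""

QUERY_MIX = """
    SELECT query_type,
           COUNT(*)                       AS query_count,
           AVG(total_elapsed_time) / 1000 AS avg_seconds
    FROM   snowflake.account_usage.query_history
    WHERE  start_time >= DATEADD(day, -30, CURRENT_TIMESTAMP())
    GROUP  BY query_type
    ORDER  BY query_count DESC
"""

cur = conn.cursor()
for name, sql in [("credits by warehouse", CREDITS_BY_WAREHOUSE),
                  ("query mix", QUERY_MIX)]:
    cur.execute(sql)
    print(f"--- {name} ---")
    for row in cur.fetchall():
        print(row)

cur.close()
conn.close()
```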
Align on Target Architecture
Designing a Well Architected Lakehouse
Guiding Principles for the Lakehouse
● Curate Data and Offer Trusted Data-as-Products
● Adopt an Organization-wide Data Governance Strategy
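To make the governance principle concrete, here is a minimal, illustrative sketch of organization-wide access control using Unity Catalog's three-level namespace. The catalog, schema, table, and group names (main, finance, revenue_gold, data_consumers) are assumptions for this example.

```python
# Illustrative Unity Catalog grants (names are placeholders), run from a
# Databricks notebook where `spark` is already defined.
statements = [
    # Let the analyst group discover and use the catalog and schema.
    "GRANT USE CATALOG ON CATALOG main TO `data_consumers`",
    "GRANT USE SCHEMA ON SCHEMA main.finance TO `data_consumers`",
    # Read-only access to a curated, trusted data product.
    "GRANT SELECT ON TABLE main.finance.revenue_gold TO `data_consumers`",
]
for stmt in statements:
    spark.sql(stmt)
```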
Cloud Data Analytics Framework
Personas: Data Engineer, ML Engineer, Data Scientist, Business Analyst, Business Partners
Capabilities: Workflow Mgmt | Ingest & Transform | Advanced Analytics, ML & AI | Data Warehouse | Data Sharing
Engine
Cloud Data Analytics Framework
Personas: Data Engineer, ML Engineer, Data Scientist, Business Analyst, Business Partners
Consumption: SQL Editor | IDE support | ETL & DS tools | Notebooks | BI Tools
● Workflow Mgmt: Workflows (Jobs, DLT)
● Ingest & Transform (batch and streaming): Auto Loader, DLT, Data Quality (DLT)
● Advanced Analytics, ML & AI: ML runtime, Model Serving
● Data Warehouse: Databricks SQL, SQL Warehouse
● Data Sharing: Delta Sharing, Data Connectors
● Engine / Runtimes: ETL runtime, Photon
● Data Governance: Unity Catalog
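As one concrete touchpoint from the Ingest & Transform layer, below is a minimal sketch of Auto Loader landing cloud files into a bronze Delta table. The paths and the table name are placeholders, not part of this framework slide.

```python
# Minimal Auto Loader sketch for a Databricks notebook, where `spark` is
# already defined. Paths and table names are placeholders.
raw_events = (
    spark.readStream.format("cloudFiles")                        # Auto Loader source
    .option("cloudFiles.format", "json")                         # incoming file format
    .option("cloudFiles.schemaLocation", "/tmp/schemas/events")  # schema tracking
    .load("/mnt/landing/events")                                 # cloud storage landing zone
)

(
    raw_events.writeStream
    .option("checkpointLocation", "/tmp/checkpoints/bronze_events")
    .trigger(availableNow=True)                                  # process the backlog, then stop
    .toTable("main.bronze.events")                               # bronze Delta table
)
```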
Databricks Lakehouse Reference Architecture
The Databricks Lakehouse Platform can support your core data workloads
[Reference architecture diagram, with an optional DW alongside the Lakehouse]
Different Data Models on the Lakehouse
The Lakehouse supports any data model
However, the right platform capabilities will ease the implementation of a particular data model
● The Databricks Lakehouse provides the technical enablers for teams to produce and consume data in a centralized or decentralized, but governed, way
● The Lakehouse is a polyglot technology that works with any data modeling concept
● The Lakehouse applies at all scales, from startups to large organizations
● The Medallion architecture can fit into whatever modeling strategy you choose (see the sketch below)
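As an illustration of how Medallion layering might look in practice, here is a minimal Delta Live Tables sketch with bronze and silver layers. The source path, table names, and quality rule are assumptions for the example, not a prescribed design.

```python
# Hypothetical Medallion sketch with Delta Live Tables (bronze -> silver).
# The source path, table names, and the expectation rule are placeholders.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Bronze: raw orders as ingested from cloud storage")
def bronze_orders():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/mnt/landing/orders")
    )

@dlt.table(comment="Silver: cleaned, de-duplicated orders")
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")  # data quality gate
def silver_orders():
    return (
        dlt.read_stream("bronze_orders")
        .withColumn("ingested_at", F.current_timestamp())
        .dropDuplicates(["order_id"])
    )
```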
[Animated diagrams: data model example with Satellite tables on the Lakehouse]
* Batch process with CDF, DLT, … ** currently lineage is restricted per …