Microsoft Fabric
James Serra
Industry Advisor
Microsoft, Federal Civilian
[email protected]
6/16/23
About Me
▪ Microsoft, Data & AI Solution Architect in Microsoft Federal Civilian
▪ At Microsoft for most of the last nine years as a Data & AI Architect, with a brief stop at EY
▪ In IT for 35 years, worked on many BI and DW projects
▪ Worked as desktop/web/database developer, DBA, BI and DW architect and developer, MDM
architect, PDW/APS developer
▪ Been perm employee, contractor, consultant, business owner
▪ Presenter at PASS Summit, SQLBits, Enterprise Data World conference, Big Data Conference
Europe, SQL Saturdays, Informatica World
▪ Blog at JamesSerra.com
▪ Former SQL Server MVP
▪ Author of book “Deciphering Data Architectures: Choosing Between a Modern Data Warehouse,
Data Fabric, Data Lakehouse, and Data Mesh”
My upcoming book; the first two chapters are available now:
Deciphering Data Architectures (oreilly.com)
Table of contents:
- Foundation
- Big data
- Types of data architectures
- Architecture Design Session
- Common data architecture concepts
- Relational Data Warehouse
- Data Lake
- Approaches to Data Stores
- Approaches to Design
- Approaches to Data Modeling
- Approaches to Data Ingestion
- Data Architectures
- Modern Data Warehouse (MDW)
- Data Fabric
- Data Lakehouse
- Data Mesh Foundation
- Data Mesh Adoption
- People, Process, and Technology
- People and process
- Technologies
- Data architectures on Microsoft Azure
Agenda
▪ What is Microsoft Fabric?
▪ Workspaces and capacities
▪ OneLake
▪ Lakehouse
▪ Data Warehouse
▪ ADF
▪ Power BI / DirectLake
▪ Resources
▪ Not covered:
▪ Real-time analytics
▪ Spark
▪ Data science
▪ Fabric capacities
▪ Billing / Pricing
▪ Reflex / Data Activator
▪ Git integration
▪ Admin monitoring
▪ Purview integration
▪ Data mesh
▪ Copilot
Microsoft Fabric does it all—in a unified solution
An end-to-end analytics platform that brings together all the data and analytics tools that
organizations need to go from the data lake to the business user
Workloads: Data Integration, Data Engineering, Data Warehouse, Data Science, Real-Time Analytics, Business Intelligence, Observability
Unified: SaaS product experience, security and governance, compute and storage, business model
Microsoft Fabric
The data platform for the era of AI
Workloads on one foundation: Data Factory, Synapse Data Engineering, Synapse Data Science, Synapse Data Warehousing, Synapse Real-Time Analytics, Power BI, Data Activator - all on OneLake, the intelligent data foundation.
Single: onboarding and trials, sign-on, navigation model, UX model, workspace organization (AI assisted), collaboration experience (shared workspaces), universal compute capacities, storage format (one data lake), data copy for all engines, security model (OneSecurity), CI/CD, monitoring hub, data hub, governance & compliance.
SaaS, "it just works": success by default ("5x5"), instant provisioning, auto-optimized, centralized administration, centralized security management, data stewards.
• Maintain visibility and control of costs with a unified consumption and cost model that provides an always-current view of spend across your end-to-end data estate
• Gain full visibility and governance over your entire analytics estate from data sources and connections to your data lake, to users and their insights
Workspaces and capacities
Company examples
Create Fabric capacity
A capacity is a dedicated set of resources reserved for exclusive use. It offers dependable,
consistent performance for your content. Each capacity offers a selection of SKUs, and
each SKU provides a different resource tier of memory and compute power. You pay
for the provisioned capacity whether you use it or not.
Once the capacity is created, it appears in the Admin portal's Capacity settings pane under the "Fabric Capacity" tab.
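Capacities are created in the Azure portal, but they are also ordinary ARM resources, so a provisioning sketch with the generic Azure SDK looks roughly like this (the provider namespace, resource type, API version, SKU name, and administration property are assumptions to verify against current docs):

from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

client = ResourceManagementClient(DefaultAzureCredential(), "<subscription-id>")

poller = client.resources.begin_create_or_update(
    resource_group_name="my-rg",
    resource_provider_namespace="Microsoft.Fabric",   # assumed provider namespace
    parent_resource_path="",
    resource_type="capacities",                       # assumed resource type
    resource_name="myfabriccapacity",
    api_version="2023-11-01",                         # assumed API version
    parameters={
        "location": "eastus",
        "sku": {"name": "F64", "tier": "Fabric"},     # assumed SKU name/tier
        "properties": {"administration": {"members": ["admin@contoso.com"]}},
    },
)
print(poller.result().id)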
Create Fabric capacity
Turning on Microsoft Fabric
Enable Microsoft Fabric for your organization - Microsoft Fabric | Microsoft Learn
Demo
OneLake
OneLake for all data
“The OneDrive for data”
OneLake
- A single, unified, logical SaaS data lake for the whole organization (no silos)
- Serves every workload: Data Factory, Synapse Data Warehousing, Synapse Data Engineering, Synapse Data Science, Synapse Real-Time Analytics, Power BI, Data Activator
- Organize data into domains
- Foundation for all Fabric data items
- Provides full and open access through industry-standard APIs and formats to any application (no lock-in)
- One copy of data, shared across workspaces and engines
- Use OneLake with existing data lakes, or use and land data directly in OneLake
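The "industry-standard APIs" include an ADLS Gen2-compatible endpoint, so the standard Azure Storage SDK works against OneLake. A minimal sketch, assuming placeholder workspace and lakehouse names:

from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# OneLake speaks the ADLS Gen2 DFS API; the workspace plays the role of the container
service = DataLakeServiceClient(
    account_url="https://onelake.dfs.fabric.microsoft.com",
    credential=DefaultAzureCredential(),
)
fs = service.get_file_system_client("MyWorkspace")

# Fabric items are top-level folders, e.g. a lakehouse's unmanaged Files area
for p in fs.get_paths(path="MyLakehouse.Lakehouse/Files"):
    print(p.name)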
OneLake Data Hub
Discover, manage and use data in one place
Transform: Notebooks & Dataflows
Lakehouse – Lakehouse mode
(Screenshot: Managed and Unmanaged areas; double-click a file to view it)
Tables - This is a virtual view of the managed area of your lake. It is the main container for
tables of all types (CSV, Parquet, Delta, managed tables, and external tables). All tables, whether
automatically or explicitly created, show up as tables under the managed area of the lakehouse.
This area can also include any type of file or folder/subfolder organization.
Files - This is a virtual view of the unmanaged area of your lake. It can contain any files and
folder/subfolder structure. The main distinction between the managed area and the unmanaged
area is the automatic Delta table detection process, which runs over any folders created in the
managed area. Any Delta-format files (Parquet + transaction log) are automatically registered as a
table and are also available from the serving layer (T-SQL).
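A minimal sketch of the distinction, assuming a Fabric notebook attached to a lakehouse (the notebook provides the spark session; file and table names are placeholders):

# Unmanaged area: land any file under Files/ - nothing gets registered
df = spark.read.option("header", True).csv("Files/raw/sales.csv")

# Managed area: write Delta under Tables/ - auto-discovery registers it as a table
df.write.format("delta").mode("overwrite").save("Tables/sales")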
Automatic Table Discovery and Registration
Lakehouse table automatic discovery and registration is a feature of the lakehouse that provides a fully managed
file-to-table experience for data engineers and data scientists. Users can drop a file into the managed area of the
lakehouse, and the file is automatically validated against the supported structured formats (currently only
Delta tables) and registered in the metastore with the necessary metadata such as column names, formats,
compression, and more. Users can then reference the file as a table and use Spark SQL syntax to interact with the
data, so there is no need to explicitly run a CREATE TABLE statement to create tables for use with SQL.
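Continuing the sketch above: because the Delta folder was written under Tables/, it is registered automatically and can be queried by name with no CREATE TABLE:

# The auto-registered table is immediately addressable in Spark SQL...
spark.sql("SELECT COUNT(*) AS n FROM sales").show()
# ...and is also exposed to T-SQL through the lakehouse's SQL endpoint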
Lakehouse – SQL endpoint mode
NOTE: “Warehouse mode” was renamed “SQL endpoint”
Visual Query
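The SQL endpoint accepts ordinary TDS connections, so any SQL Server client works. A hedged sketch with pyodbc (copy the real server name from the SQL endpoint's settings; the table name is a placeholder):

import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=<copied-from-SQL-endpoint-settings>;"
    "Database=MyLakehouse;"
    "Authentication=ActiveDirectoryInteractive;"   # Azure AD sign-in prompt
)
# Read-only T-SQL over the lakehouse's Delta tables
for row in conn.execute("SELECT TOP 10 * FROM dbo.sales"):
    print(row)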
Lakehouse – shortcuts (to lakehouse)
Workspaces and capacities accessing OneLake
(Diagram: multiple workspaces, each backed by a capacity, accessing the same lakehouse in OneLake)
Each tenant has only one OneLake, and any tenant can access files in another tenant's OneLake via shortcuts.
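A shortcut created in the lakehouse UI surfaces as an ordinary folder or table, so consuming it is no different from consuming native data. A small sketch, again from a Fabric notebook with placeholder names:

# A shortcut under Tables/ behaves like any other Delta table
df = spark.read.format("delta").load("Tables/sales_from_other_workspace")
df.show(5)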
Demo
Data Warehouse
Data warehouse
Transform: stored procedures
Synapse Data Warehouse
Infinitely scalable and open
(Diagram: multiple workspaces, each backed by a capacity, accessing the same warehouse in OneLake; as with the lakehouse, cross-tenant access goes through shortcuts)
Data Warehouse
(Screenshots: query editor in Standard View and Diagram View)
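Warehouse transforms are plain T-SQL, so the same pyodbc pattern as the SQL endpoint sketch applies; a hedged example of a stored-procedure transform (names and schema are placeholders):

import pyodbc

conn = pyodbc.connect("<same connection-string style as the SQL endpoint sketch>")
cursor = conn.cursor()
# Define a simple transform as a stored procedure in the warehouse
cursor.execute("""
    CREATE OR ALTER PROCEDURE dbo.load_daily_sales AS
    BEGIN
        DELETE FROM dbo.daily_sales;
        INSERT INTO dbo.daily_sales (order_date, total)
        SELECT order_date, SUM(amount) FROM dbo.sales GROUP BY order_date;
    END
""")
cursor.execute("EXEC dbo.load_daily_sales")
conn.commit()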
ADF
Dataflows Gen2:
- PQ UI with the power of ADF (think of it as the next version of ADF PQ)
- Power Query is now called Dataflow Gen2 (which helps in that Power Query does more than just query); Power BI Dataflows are now called Dataflows Gen1
- Scale is still Excel/PBI scale, not yet ADF cloud scale
- ADF mapping data flows do not exist in Fabric
- A mounting option is available to use ADF mapping data flows in Fabric (no option for Synapse yet); you can then make changes in Fabric (but not in ADF)
Pipelines:
- New interface, but basically the same as ADF; scalable
Q: We currently have multiple Dataflows experiences: Power BI Dataflows Gen1, Power Query Dataflows, and ADF data flows. What is the strategy in Fabric for these various experiences?
A: Our goal is to evolve over time to a single Dataflow that combines the ease of use of PBI and Power Query with the scale of ADF.
Q: What are Fabric pipelines?
A: Fabric pipelines enable powerful workflow capabilities at cloud scale. With data pipelines you can build complex workflows that refresh your dataflows, move PB-size data, and define sophisticated control flow. Use data pipelines to build complex ETL and Data Factory workflows that perform a number of different tasks at scale. Control flow capabilities are built into pipelines, letting you build workflow logic with loops and conditionals.
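Pipeline runs are usually scheduled in the UI, but as a hedged sketch, an on-demand run can also be triggered through the Fabric REST API's job-scheduler endpoint (the endpoint shape and jobType value are assumptions to verify; the IDs and token are placeholders):

import requests

token = "<AAD-bearer-token-for-api.fabric.microsoft.com>"
url = (
    "https://api.fabric.microsoft.com/v1/workspaces/<workspace-id>"
    "/items/<pipeline-item-id>/jobs/instances?jobType=Pipeline"
)
resp = requests.post(url, headers={"Authorization": f"Bearer {token}"})
resp.raise_for_status()  # expect 202 Accepted; poll the Location header for run status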
Power BI / DirectLake
For best performance, compress the data using the V-Order compression method (50%-70% more compression); data is stored this way by ADF by default.
Benefits: no more scheduled imports.
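In Fabric Spark, V-Order is likewise applied at write time via a session setting (enabled by default in Fabric runtimes; config name per the Fabric docs, df and table name are placeholders from the earlier notebook sketches):

# Ensure V-Order is applied to the Parquet files backing the Delta table
spark.conf.set("spark.sql.parquet.vorder.enabled", "true")
df.write.format("delta").mode("overwrite").save("Tables/sales_vorder")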
Should I use Fabric now?
Yes, for prototyping
Yes, if you won’t be in production for several months
You have to be OK with bugs, missing features, and possible performance issues
Don't use it if you have hundreds of terabytes
If building in Synapse now, how do you make the transition to Fabric smooth?
Do not use dedicated pools, unless needed for serving and performance
Don’t use any stored procedures to modify data in dedicated pools
Use ADF for pipelines and for Power Query, and don't use ADF mapping data flows. Don't use
Synapse pipelines or mapping data flows
Embrace the data lakehouse architecture
Resources
Microsoft Fabric webinar series: https://aka.ms/fabric-webinar-series
Data Mesh, Data Fabric, Data Lakehouse – (video from Toronto Data Professional Community on 2/15/23)
Build videos:
Build 2-day demos
Microsoft Fabric Synapse data warehouse, Q&A
Fabric notes