Azure Synapse Analytics PoC Environment


Synapse Analytics

PoC Architecture

• Overview
• Network & Connectivity
• Authentication & Authorization
• Logging, Monitoring, & Telemetry
• Best Practices
Overview

Deployment Instructions
• Log in to the Azure Cloud Shell and select Bash (https://shell.azure.com)

• @Azure:~$ git clone https://github.com/shaneochotny/Azure-Synapse-Analytics-PoC


• @Azure:~$ cd Azure-Synapse-Analytics-PoC
• @Azure:~$ bash deploySynapse.sh

What’s Deployed
• Azure Synapse Analytics Workspace
  • DW1000 Dedicated SQL Pool
• Azure Data Lake Storage Gen2
  • config container for Azure Synapse Analytics Workspace
  • data container for queried/ingested data
• Azure Log Analytics
  • Logging and telemetry for Azure Synapse Analytics
  • Logging and telemetry for Azure Data Lake Storage Gen2

What’s Configured
• Enable Result Set Caching (details)
• Create a pipeline to auto pause/resume the Dedicated SQL Pool (details)
• Feature flag to enable/disable Private Endpoints (details)
• Serverless SQL Demo Data Database
• Proper service and user permissions for Azure Synapse Analytics Workspace and Azure Data Lake Storage Gen2
• Parquet Auto Ingestion pipeline to optimize data ingestion using best practices

Azure Platform
• Azure Synapse Analytics: Synapse Analytics Workspace with a provisioned Dedicated SQL Pool for the Data Warehouse
• Azure Data Lake Storage Gen2: Storage for Synapse Analytics Workspace configuration data along with PoC data
• Azure Log Analytics: Logging, monitoring, and telemetry for Azure Synapse Analytics and Azure Data Lake Storage Gen2

Advanced Deployment: Bicep

Overview
The Bicep and Terraform deployment templates support the same options and deploy exactly the same
environment. They are best-practice examples of how to create Synapse templates using either
method. Both are provided because some people are only interested in a PoC environment, while
others are interested in example deployment templates.

Deployment Instructions
• Log in to the Azure Cloud Shell and select Bash (https://shell.azure.com)

• @Azure:~$ git clone https://github.com/shaneochotny/Azure-Synapse-Analytics-PoC


• @Azure:~$ cd Azure-Synapse-Analytics-PoC
• @Azure:~$ code Bicep/main.parameters.json
• @Azure:~$ az deployment sub create --template-file Bicep/main.bicep --parameters Bicep/main.parameters.json --name Azure-Synapse-Analytics-PoC --location eastus
• @Azure:~$ bash deploySynapse.sh

Editing main.parameters.json
• azure_region: The Azure region that Synapse and all the supporting services should be deployed into.
• resource_group_name: The resource group that Synapse and all the supporting services will be deployed into.
• synapse_sql_pool_name: Name of the Dedicated SQL Pool database.
• synapse_sql_administrator_login: Native SQL account for administration.
• synapse_sql_administrator_password: Password for the native SQL account for administration. This password is also used for the Resource Class Logins.
• synapse_azure_ad_admin_object_id: Object ID (GUID) for the Azure AD administrator of Synapse. This can also be a group, but only one value can be specified (e.g. XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX). To look up a user's Object ID: az ad user show --id "[email protected]" --query objectId --output tsv (Azure CLI versions based on Microsoft Graph return this property as id rather than objectId.)
• enable_private_endpoints: If true, create Private Endpoints for Synapse Analytics. This assumes you have the other Private Endpoint requirements configured and in place, such as virtual networks, VPN/ExpressRoute, and private DNS forwarding.
• private_endpoint_virtual_network: Name of the Virtual Network where you want to create the Private Endpoints (e.g. vnet-data-platform).
• private_endpoint_virtual_network_subnet: Name of the Subnet within the Virtual Network where you want to create the Private Endpoints (e.g. private-endpoint-subnet).
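
The parameters listed above are supplied through a parameters file. As a rough illustration only, the Python sketch below wraps plain values in the standard ARM parameters-file envelope that az deployment accepts. The repository ships its own main.parameters.json, whose exact contents are not reproduced here; the helper name build_parameters and every value shown are placeholders assumed for the example.

```python
import json


def build_parameters(values):
    """Wrap plain name/value pairs in the ARM parameters-file envelope."""
    return {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
        "contentVersion": "1.0.0.0",
        "parameters": {name: {"value": value} for name, value in values.items()},
    }


# Placeholder values only (assumptions for illustration); replace before deploying.
example_values = {
    "azure_region": "eastus",
    "resource_group_name": "my-synapse-poc-rg",
    "synapse_sql_pool_name": "DataWarehouse",
    "synapse_sql_administrator_login": "sqladminuser",
    "synapse_sql_administrator_password": "<set-a-strong-password>",
    "synapse_azure_ad_admin_object_id": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
    "enable_private_endpoints": False,
    "private_endpoint_virtual_network": "vnet-data-platform",
    "private_endpoint_virtual_network_subnet": "private-endpoint-subnet",
}

if __name__ == "__main__":
    # Print the JSON document; redirect it to a file to inspect the shape.
    print(json.dumps(build_parameters(example_values), indent=2))
```

Generating the file this way is optional; editing Bicep/main.parameters.json by hand in the Cloud Shell editor, as the steps describe, achieves the same result.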
Advanced Deployment: Terraform

Overview
The Bicep and Terraform deployment templates support the same options and deploy exactly the same
environment. They are best-practice examples of how to create Synapse templates using either
method. Both are provided because some people are only interested in a PoC environment, while
others are interested in example deployment templates.

Deployment Instructions
• Log in to the Azure Cloud Shell and select Bash (https://shell.azure.com)

• @Azure:~$ git clone https://github.com/shaneochotny/Azure-Synapse-Analytics-PoC


• @Azure:~$ cd Azure-Synapse-Analytics-PoC
• @Azure:~$ code Terraform/terraform.tfvars
• @Azure:~$ terraform -chdir=Terraform init
• @Azure:~$ terraform -chdir=Terraform plan
• @Azure:~$ terraform -chdir=Terraform apply
• @Azure:~$ bash deploySynapse.sh

Editing terraform.tfvars
• azure_region: The Azure region that Synapse and all the supporting services should be deployed into.
• resource_group_name: The resource group that Synapse and all the supporting services will be deployed into.
• synapse_sql_pool_name: Name of the Dedicated SQL Pool database.
• synapse_sql_administrator_login: Native SQL account for administration.
• synapse_sql_administrator_password: Password for the native SQL account for administration. This password is also used for the Resource Class Logins.
• synapse_azure_ad_admin_upn: UserPrincipalName (UPN) for the Azure AD administrator of Synapse. This can also be a group, but only one value can be specified (e.g. [email protected]).
• enable_private_endpoints: If true, create Private Endpoints for Synapse Analytics. This assumes you have the other Private Endpoint requirements configured and in place, such as virtual networks, VPN/ExpressRoute, and private DNS forwarding.
• private_endpoint_virtual_network: Name of the Virtual Network where you want to create the Private Endpoints (e.g. vnet-data-platform).
• private_endpoint_virtual_network_subnet: Name of the Subnet within the Virtual Network where you want to create the Private Endpoints (e.g. private-endpoint-subnet).
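
The variables above interact: the two private_endpoint_* names only matter when enable_private_endpoints is true. The Python sketch below renders simple string and boolean values in terraform.tfvars syntax and flags missing private-endpoint settings before running terraform plan. The helper names, the rule that both names must be set when the flag is true, and the sample values are assumptions made for illustration, not something the templates are known to enforce.

```python
import json

PRIVATE_ENDPOINT_SETTINGS = (
    "private_endpoint_virtual_network",
    "private_endpoint_virtual_network_subnet",
)


def to_tfvars(values):
    """Render booleans as true/false and everything else as a quoted string."""
    lines = []
    for name, value in values.items():
        if isinstance(value, bool):
            rendered = "true" if value else "false"
        else:
            # json.dumps yields a double-quoted, escaped string literal.
            rendered = json.dumps(str(value))
        lines.append(f"{name} = {rendered}")
    return "\n".join(lines) + "\n"


def missing_private_endpoint_settings(values):
    """List private-endpoint names left empty while the feature flag is on."""
    if not values.get("enable_private_endpoints", False):
        return []
    return [name for name in PRIVATE_ENDPOINT_SETTINGS if not values.get(name)]


if __name__ == "__main__":
    sample = {
        "azure_region": "eastus",
        "enable_private_endpoints": True,
        "private_endpoint_virtual_network": "vnet-data-platform",
    }
    print(to_tfvars(sample), end="")
    print("missing:", missing_private_endpoint_settings(sample))
```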
Networking & Connectivity (Public)

Azure Data Lake Storage Gen2
Authenticated connectivity via the public IP endpoints. Connectivity restrictions can be applied via the integrated firewall.

Azure Synapse Analytics
Authenticated connectivity via the public IP endpoints. Connectivity restrictions can be applied via the integrated firewall.

Azure Log Analytics
Authenticated connectivity via the public IP endpoint.
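
To make the firewall restriction mentioned above concrete, here is a small Python sketch of how an allow-list of IP ranges behaves. It assumes rules expressed as inclusive start/end IPv4 ranges; that rule shape, the function name is_allowed, and the sample addresses (taken from a documentation-only range) are assumptions for illustration and do not describe how the PoC configures its firewalls.

```python
import ipaddress


def is_allowed(client_ip, rules):
    """Return True if client_ip falls inside any inclusive (start, end) range."""
    ip = ipaddress.IPv4Address(client_ip)
    return any(
        ipaddress.IPv4Address(start) <= ip <= ipaddress.IPv4Address(end)
        for start, end in rules
    )


# Hypothetical rule set: one office range of eleven addresses.
example_rules = [("203.0.113.10", "203.0.113.20")]

if __name__ == "__main__":
    for candidate in ("203.0.113.15", "203.0.113.21"):
        print(candidate, is_allowed(candidate, example_rules))
```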
Networking & Connectivity (Private)

Private Endpoints allow platform services (PaaS), such as Azure Data Lake and Synapse Analytics, to be assigned private IP addresses. This allows traffic to route over VPN/ExpressRoute as it would to any other internal host.

The network configuration can also restrict all access from public/Internet sources.

Private DNS can override the publicly addressed hostnames with private IP addresses via Conditional Forwarding. Because of the additional requirements for networking, connectivity, and private DNS configuration, deploying this configuration for a PoC is not recommended unless those requirements are already in place.

Azure Virtual Network: Service Endpoint Subnet (10.x.x.x/172.x.x.x)
• Private Endpoint: 10.x.x.x/172.x.x.x, DNS: name.blob.core.windows.net
• Private Endpoint: 10.x.x.x/172.x.x.x, DNS: name.dfs.core.windows.net
• Private Endpoint: 10.x.x.x/172.x.x.x, DNS: name.dev.azuresynapse.net
• Private Endpoint: 10.x.x.x/172.x.x.x, DNS: name.sql.azuresynapse.net

Connectivity from on-premises to these Azure platform services can be established through the public endpoints over the Internet or ExpressRoute.

Connectivity paths shown in the diagram: Internet or ExpressRoute; Virtual Network Peering or VPN / ExpressRoute.
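
One way to confirm that private DNS is overriding the public hostnames is to resolve each endpoint from an internal host and check whether the answers fall in a private range. The Python sketch below does this with the standard library. The function names and the two placeholder resource names are assumptions, and the check covers IPv4 only.

```python
import ipaddress
import socket


def endpoint_hostnames(storage_account, workspace):
    """Build the four hostnames listed on the slide for the given names."""
    return [
        f"{storage_account}.blob.core.windows.net",
        f"{storage_account}.dfs.core.windows.net",
        f"{workspace}.dev.azuresynapse.net",
        f"{workspace}.sql.azuresynapse.net",
    ]


def is_private(address):
    """True for RFC 1918 ranges (and other non-global ranges such as loopback)."""
    return ipaddress.ip_address(address).is_private


def resolves_privately(hostname):
    """Resolve over IPv4 and report whether every returned address is private."""
    infos = socket.getaddrinfo(
        hostname, 443, family=socket.AF_INET, proto=socket.IPPROTO_TCP
    )
    return all(is_private(info[4][0]) for info in infos)


if __name__ == "__main__":
    # Placeholder names; substitute your own storage account and workspace.
    for host in endpoint_hostnames("mystorageaccount", "myworkspace"):
        try:
            print(host, "private" if resolves_privately(host) else "public")
        except socket.gaierror as error:
            print(host, "did not resolve:", error)
```

Run from a machine that uses the conditional forwarder, a "public" result suggests the forwarding rule for that zone is not taking effect.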


Authentication & Authorization

Managed Identity Authentication
Service-level Managed Identity is used for authentication to Azure Data Lake using the Storage Blob Data Contributor role.

Storage User Access
The Dedicated SQL Admin user is automatically provided the Storage Blob Data Contributor role. Additional users/groups will also need to be provided the Storage Blob Data Contributor role.

Azure Management Plane
Authentication and authorization are provided by Azure Active Directory ACLs. This allows management and configuration of the Data Lake or Synapse Analytics Workspace, but not access to the data.

Synapse Analytics Workspace Access
Authentication and authorization are provided by Azure Active Directory ACLs using several roles at the Synapse Analytics Workspace level. (details)

Dedicated SQL Admin Access
An Azure Active Directory user or group provides initial administrative access to the Dedicated SQL Pool and can be used to define further Azure Active Directory permissions. (details)

Dedicated SQL Database Access
Azure Active Directory users and groups can be assigned database-level access. (details)

Dedicated SQL Row/Column Level Security
Azure Active Directory users and groups can be restricted to filtered rows/columns. (details)
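
Granting additional users or groups the Storage Blob Data Contributor role, as described above, comes down to one role assignment per principal, scoped to the storage account. The Python sketch below only composes the corresponding az role assignment create command so it can be reviewed before running. The helper names and all identifiers are placeholders assumed for the example; the deployment script may assign roles differently.

```python
import shlex

STORAGE_BLOB_DATA_CONTRIBUTOR = "Storage Blob Data Contributor"


def storage_account_scope(subscription_id, resource_group, account_name):
    """Build the ARM resource ID used as the role assignment scope."""
    return (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{account_name}"
    )


def role_assignment_command(assignee_object_id, scope):
    """Return the az CLI argument list; nothing is executed here."""
    return [
        "az", "role", "assignment", "create",
        "--assignee", assignee_object_id,
        "--role", STORAGE_BLOB_DATA_CONTRIBUTOR,
        "--scope", scope,
    ]


if __name__ == "__main__":
    # Placeholder identifiers; replace with real values before running the output.
    scope = storage_account_scope(
        "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX", "my-synapse-poc-rg", "mystorageaccount"
    )
    command = role_assignment_command("XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX", scope)
    print(shlex.join(command))
```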
Logging, Monitoring, & Telemetry

Logs and telemetry are emitted to Log Analytics for alerting, reporting, and dashboarding.

Synapse Analytics
• RBAC Operations
• Gateway API Requests
• Serverless SQL Requests
• Integration Pipeline Runs
• Integration Activity Runs
• Integration Trigger Runs
• Dedicated SQL Requests
• Dedicated SQL Request Steps
• Dedicated SQL Execution Steps
• Dedicated SQL DMS Workers
• Dedicated SQL Waits

Storage
• Reads
• Writes
• Deletes
• Transactions
