Azure / azure-synapse-analytics-end2end
- Thursday, January 13, 2022, 00:28:57
Azure Analytics End to End with Azure Synapse - Deployment Accelerator
This is a deployment accelerator based on the reference architecture described in the Azure Architecture Center article Analytics end-to-end with Azure Synapse. It automates not only the deployment of the services covered by the reference architecture, but also the configuration and permissions required for the services to work together. The deployed architecture provides an end-to-end analytics platform capable of handling the most common use cases for most organizations.
This deployment accelerator is implemented in Azure Bicep, a domain-specific language (DSL) that uses declarative syntax to deploy Azure resources.
Before you hit the deploy button, make sure you review the details about the services deployed.
You can also use Azure CLI to deploy the services:
For a full deployment of all workloads with public endpoints use the command below:
az deployment group create --resource-group resource-group-name --template-file ./AzureAnalyticsE2E.bicep --parameters synapseSqlAdminPassword=use-complex-password-here
For a full deployment of all workloads with vNet integrated endpoints use the command below:
az deployment group create --resource-group resource-group-name --template-file ./AzureAnalyticsE2E.bicep --parameters networkIsolationMode=vNet synapseSqlAdminPassword=use-complex-password-here
You can have more control over the deployment by providing values to optional template parameters in the form of:
az deployment group create --resource-group resource-group-name --template-file ./AzureAnalyticsE2E.bicep --parameters synapseSqlAdminPassword=use-complex-password-here param1=value1 param2=value2...
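The three invocations above differ only in the parameters they pass. The following is a minimal shell sketch that wraps them in one function, assuming a logged-in Azure CLI and a shell that supports `local`. The function name `deploy_accelerator` and its argument order are illustrative and not part of the repository.

```shell
# Hypothetical helper (not part of the repository): wraps the az commands above.
# Usage: deploy_accelerator <resource-group> <sql-admin-password> [default|vNet] [param=value ...]
deploy_accelerator() {
  local rg password mode
  if [ "$#" -lt 2 ]; then
    echo "usage: deploy_accelerator <resource-group> <sql-admin-password> [default|vNet] [param=value ...]" >&2
    return 2
  fi
  rg="$1"
  password="$2"
  mode="${3:-default}"
  # Drop the positional arguments consumed above; what remains are extra parameters.
  if [ "$#" -ge 3 ]; then shift 3; else shift 2; fi

  case "$mode" in
    default) ;;
    # networkIsolationMode is only passed for vNet, matching the commands above.
    vNet) set -- "networkIsolationMode=vNet" "$@" ;;
    *) echo "unknown network isolation mode: $mode" >&2; return 1 ;;
  esac

  az deployment group create \
    --resource-group "$rg" \
    --template-file ./AzureAnalyticsE2E.bicep \
    --parameters "synapseSqlAdminPassword=$password" "$@"
}
```

For example, `deploy_accelerator resource-group-name use-complex-password-here vNet param1=value1` would run the vNet deployment with one extra template parameter. Rejecting unknown modes before calling `az` is a design choice of this sketch, so that a typo fails before any deployment is attempted.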
The target subscription for the deployment accelerator needs to have the following resource providers enabled before the deployment execution:
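The registration state of a resource provider can be checked, and the provider registered, with the Azure CLI. The sketch below assumes a logged-in Azure CLI. The function name `ensure_providers_registered` is hypothetical, and the namespaces to pass are the ones required by the accelerator, which are not reproduced in this sketch.

```shell
# Hypothetical helper: print the registration state of each resource provider
# namespace given as an argument, and register those not yet registered.
ensure_providers_registered() {
  local ns state
  for ns in "$@"; do
    state="$(az provider show --namespace "$ns" --query registrationState --output tsv)"
    echo "$ns: $state"
    if [ "$state" != "Registered" ]; then
      az provider register --namespace "$ns"
    fi
  done
}
```

A call such as `ensure_providers_registered Microsoft.Synapse` checks a single namespace. `Microsoft.Synapse` is used here only as an example argument. Registration can take a few minutes to complete, so it may be worth re-running the check before deploying.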
The deployment accelerator can be deployed in two network isolation modes: default or vNet.
Network Isolation Mode | Description |
---|---|
default | Deploys the selected components to Azure using public endpoints. |
vNet | Deploys the selected components to Azure and the additional services to support private connectivity and restricted inter-service connectivity where possible. This includes provisioning and configuration of virtual networks, managed virtual network deployments for Azure Synapse Analytics, the private endpoints for all services that support Private Link and the supporting Private DNS Zones. |
The scope of this deployment accelerator is illustrated in the diagram below.
Important: All services are deployed in a single resource group and in the same region as the resource group. Before creating the resource group that will host the workloads, check the Azure Products by Region and select a region that has all selected services available. The deployment will fail if any of the services is not available in the chosen region.
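Region availability can also be checked from the command line before creating the resource group. The following is a minimal sketch, assuming a logged-in Azure CLI. The function name `region_supports` is hypothetical, and the provider namespace and resource type you pass depend on the workloads you select.

```shell
# Hypothetical helper: succeed if a resource type is offered in a region.
# Usage: region_supports <provider-namespace> <resource-type> <region>
region_supports() {
  local ns type region wanted
  ns="$1"
  type="$2"
  region="$3"
  # Normalize to lowercase without spaces so that "West Europe" and
  # "westeurope" compare as equal.
  wanted="$(printf '%s' "$region" | tr -d ' ' | tr '[:upper:]' '[:lower:]')"
  az provider show --namespace "$ns" \
    --query "resourceTypes[?resourceType=='$type'].locations[]" --output tsv |
    tr -d ' ' | tr '[:upper:]' '[:lower:]' | grep -qxF "$wanted"
}
```

For instance, `region_supports Microsoft.Synapse workspaces westeurope && echo available` checks one service. The namespace and resource type in this call are examples only, and the check would need to be repeated for every service you plan to deploy.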
The Azure services used in the architecture above have been divided into workloads that can be conditionally deployed based on input parameters to better suit your needs.
Note: The only mandatory workload is Synapse Analytics represented in the grey box in the diagram.
The following services are part of the deployment accelerator and they will all be deployed in a single resource group and in the region where the resource group was defined.
By default, every service is provisioned at the lowest pricing tier that meets the initial deployment requirements. If you provide different values for the input parameters, review the pricing information for each service in the table below.
Important: For a fully automated deployment and configuration of Synapse Analytics and Purview the deployment accelerator makes use of post-deployment PowerShell scripts to perform data plane operations. For more details about the scripts see the deployment accelerator documentation.
If explicit names are not provided, a unique 5-letter suffix is appended to every service name to ensure name uniqueness in Azure.
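To illustrate the naming convention, the sketch below composes the default names of the mandatory services from their base names in this document and a given suffix. The function `default_names` is hypothetical. The real suffix is generated by the template, and the character set accepted here (lowercase letters and digits) is an assumption.

```shell
# Hypothetical helper: print the default names of the mandatory services
# for a given 5-character suffix, following the <base name><suffix> pattern.
default_names() {
  local suffix base
  suffix="$1"
  case "$suffix" in
    # Assumption: the suffix contains only lowercase letters and digits.
    *[!a-z0-9]*) echo "suffix may only contain lowercase letters and digits" >&2; return 1 ;;
    ?????) ;;
    *) echo "suffix must be exactly 5 characters long" >&2; return 1 ;;
  esac
  for base in azsynapsewks azwksdatalake azrawdatalake azcurateddatalake azkeyvault; do
    echo "${base}${suffix}"
  done
}
```

With this sketch, `default_names abcde` prints one name per line, starting with `azsynapsewksabcde`.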
Workload | Name | Type | Default Pricing Tier | Pricing Info | Conditional | Notes |
---|---|---|---|---|---|---|
Platform Services | az-resource group name-uami | Managed Identity | N/A | | No | Required to run post-deployment scripts. Should be deleted once deployment is complete. |
Synapse Analytics | azsynapsewkssuffix | Synapse workspace | N/A | Azure Synapse Analytics pricing | No | Default workspace deployment doesn't incur costs. |
Synapse Analytics | SparkCluster | Apache Spark pool | Small (3 nodes) | Azure Synapse Analytics pricing | Yes | |
Synapse Analytics | EnterpriseDW | Synapse SQL pool | DW100 | Azure Synapse Analytics pricing | Yes | |
Synapse Analytics | adxpoolsuffix | Data Explorer pool | Extra Small (2 nodes) | Azure Synapse Analytics pricing | Yes | |
Synapse Analytics | azwksdatalakesuffix | Storage account | Standard LRS | Azure Blob Storage pricing | No | |
Synapse Analytics | azrawdatalakesuffix | Storage account | Standard GRS | Azure Blob Storage pricing | No | |
Synapse Analytics | azcurateddatalakesuffix | Storage account | Standard GRS | Azure Blob Storage pricing | No | |
Platform Services | azkeyvaultsuffix | Key vault | Standard A | Azure Key Vault pricing | No | |
Synapse Analytics | SynapsePostDeploymentScript | Deployment Script | N/A | | No | Deployment script resources will be automatically deleted after 24 hours. |
Data Governance | azpurviewsuffix | Purview account | 1 Capacity Unit | Azure Purview pricing | Yes | |
Data Governance | PurviewPostDeploymentScript | Deployment Script | N/A | | Yes | Deployment script resources will be automatically deleted after 24 hours. |
AI | azanomalydetectorsuffix | Anomaly detector | Standard | Anomaly Detector pricing | Yes | |
AI | aztextanalyticssuffix | Language | Standard | Language Service pricing | Yes | |
AI | azmlwkssuffix | Machine learning workspace | N/A | Azure Machine Learning pricing | Yes | Default workspace deployment doesn't incur costs. |
AI | azmlstoragesuffix | Storage account | Standard LRS | Azure Blob Storage pricing | Yes | |
AI | azmlcontainerregsuffix | Container registry | Basic or Premium (see notes) | Azure Container Registry pricing | Yes | Premium service tier required for private link support |
AI | azmlappinsightssuffix | Application Insights | On-demand data ingestion charges | Azure Monitor pricing | Yes | |
Data Share | azdatasharesuffix | Data Share | On-demand data processing charges | Azure Data Share pricing | Yes | |
Streaming | azeventhubnssuffix | Event Hub namespace | Basic | Azure Event Hubs pricing | Yes | |
Streaming | aziothubsuffix | IoT Hub | Free | Azure IoT Hub pricing | Yes | |
Streaming | azstreamjobsuffix | Stream Analytics job | Standard | Azure Stream Analytics pricing | Yes | |
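The table above notes that the managed identity used by the post-deployment scripts should be deleted once deployment is complete. The following is a minimal sketch of that cleanup, assuming a logged-in Azure CLI and the default name pattern `az-<resource group name>-uami` as read from the table. The function name is hypothetical.

```shell
# Hypothetical helper: delete the user-assigned managed identity that the
# accelerator creates for its post-deployment scripts.
# Assumes the default name pattern az-<resource group name>-uami.
delete_deployment_identity() {
  local rg
  rg="$1"
  if [ -z "$rg" ]; then
    echo "usage: delete_deployment_identity <resource-group>" >&2
    return 2
  fi
  az identity delete --resource-group "$rg" --name "az-${rg}-uami"
}
```

Run `delete_deployment_identity resource-group-name` only after the deployment, including its post-deployment scripts, has finished. If you supplied an explicit name for the identity, this sketch will not find it.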
Beyond deploying the services that make up the reference architecture, this template also automates the configuration of connections and permissions between the services so that they work together properly. Every arrow in the diagram above represents a configuration step that has been automated for you, which shortens the time it takes to get to insights.
Each connection and permission listed below has been implemented following the technical documentation for the services involved. Check the reference documentation links for more information about them.
These are the service connections explicitly defined in the deployment accelerator template. These connections represent the necessary configuration for the services to be fully integrated and work well together. Note that these connections may result in implicit RBAC permissions set between resources participating in the connection that are not in the permission list below. Check the reference documentation of each service connection for more information.
Beyond the service connections created above, the deployment accelerator template defines Azure RBAC permissions between the services. These are the minimum permissions granted to each service's system-assigned managed identity (MSI) for the integration to function properly. They are set explicitly by the template, and the reason each permission exists is described in its reference documentation.
Granted To Service | Granted On Service | Permission Level | Reference Documentation |
---|---|---|---|
azsynapsewkssuffix | azwksdatalakesuffix | Storage Blob Data Contributor | Grant permissions to workspace managed identity |
azpurviewsuffix | azsynapsewkssuffix | Reader | Connect to and manage Azure Synapse Analytics workspaces in Azure Purview |
azsynapsewkssuffix | azrawdatalakesuffix, azcurateddatalakesuffix | Storage Blob Data Contributor | Grant permissions to workspace managed identity |
azsynapsewkssuffix | azmlwkssuffix | Contributor | Create a new Azure Machine Learning linked service in Synapse |
azpurviewsuffix | azrawdatalakesuffix, azcurateddatalakesuffix | Storage Blob Data Reader | Connect to Azure Data Lake Gen2 in Azure Purview |
azdatasharesuffix | azrawdatalakesuffix, azcurateddatalakesuffix | Storage Blob Data Reader | Roles and requirements for Azure Data Share |
azmlwkssuffix | azrawdatalakesuffix, azcurateddatalakesuffix | Storage Blob Data Reader | Connect to storage by using identity-based data access |
azstreamjobsuffix | azrawdatalakesuffix, azcurateddatalakesuffix | Storage Blob Data Contributor | Use Managed Identity to authenticate your Azure Stream Analytics job to Azure Blob Storage |
aziothubsuffix | azrawdatalakesuffix, azcurateddatalakesuffix | Storage Blob Data Contributor | |
azstreamjobsuffix | azeventhubnssuffix | Event Hub Data Owner | Use managed identities to access Event Hub from an Azure Stream Analytics job |
azstreamjobsuffix | aziothubsuffix | IoT Hub Data Receiver | Control access to IoT Hub by using Azure Active Directory |
azpurviewsuffix | Resource Group | Storage Blob Data Reader | Connect to and manage Azure Synapse Analytics workspaces in Azure Purview |

Granted To Service | Granted On Service | Permission Level | Reference Documentation |
---|---|---|---|
azsynapsewkssuffix | azkeyvaultsuffix | Get and List Secrets | Use Azure Key Vault secrets in pipeline activities |
azpurviewsuffix | azkeyvaultsuffix | Get and List Secrets | Credentials for source authentication in Azure Purview |
azmlwkssuffix | azsynapsewkssuffix | Synapse Apache Spark Administrator | Link Azure Synapse Analytics and Azure Machine Learning workspaces and attach Apache Spark pools |
azsynapsewkssuffix | azpurviewsuffix | Data Curator | Connect a Synapse workspace to an Azure Purview account |
azdatasharesuffix | azpurviewsuffix | Data Curator | How to connect Azure Data Share and Azure Purview |
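After a deployment, the Azure RBAC assignments listed above can be spot-checked from the command line. The sketch below, assuming a logged-in Azure CLI with permission to read role assignments, lists the roles that a Synapse workspace's managed identity holds on a storage account. The function name `synapse_roles_on_storage` is hypothetical.

```shell
# Hypothetical helper: list the Azure RBAC role names that a Synapse
# workspace's managed identity holds on a storage account.
# Usage: synapse_roles_on_storage <resource-group> <workspace> <storage-account>
synapse_roles_on_storage() {
  local rg workspace account principal scope
  rg="$1"
  workspace="$2"
  account="$3"
  # Object ID of the workspace's system-assigned managed identity.
  principal="$(az synapse workspace show --resource-group "$rg" --name "$workspace" \
    --query identity.principalId --output tsv)"
  # Full resource ID of the storage account, used as the assignment scope.
  scope="$(az storage account show --resource-group "$rg" --name "$account" \
    --query id --output tsv)"
  az role assignment list --assignee "$principal" --scope "$scope" \
    --query "[].roleDefinitionName" --output tsv
}
```

If the permissions were applied as documented, a call such as `synapse_roles_on_storage resource-group-name azsynapsewkssuffix azwksdatalakesuffix` should list Storage Blob Data Contributor among its results. The same pattern applies to the other Azure RBAC rows by swapping in the relevant identity and scope. Key Vault access policies and Synapse or Purview roles are not Azure RBAC assignments and would need different commands.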
If you choose the vNet network isolation mode, the following applies:
The following extra services will be deployed to support the private connectivity configuration:
Component | Name | Type | Optional |
---|---|---|---|
Synapse Analytics | privatelink.azuresynapse.net | Private DNS Zone | Yes |
Synapse Analytics | privatelink.dev.azuresynapse.net | Private DNS Zone | Yes |
Synapse Analytics | privatelink.sql.azuresynapse.net | Private DNS Zone | Yes |
Synapse Analytics | privatelink.dfs.core.windows.net | Private DNS Zone | Yes |
Synapse Analytics | privatelink.vaultcore.azure.net | Private DNS Zone | Yes |
AI | privatelink.api.azureml.ms | Private DNS Zone | Yes |
AI | privatelink.azurecr.io | Private DNS Zone | Yes |
AI | privatelink.file.core.windows.net | Private DNS Zone | Yes |
AI | privatelink.notebooks.azure.net | Private DNS Zone | Yes |
Data Governance | privatelink.queue.core.windows.net | Private DNS Zone | Yes |
Data Governance | privatelink.servicebus.windows.net | Private DNS Zone | Yes |
Data Governance | privatelink.blob.core.windows.net | Private DNS Zone | Yes |
Data Governance | privatelink.purview.azure.com | Private DNS Zone | Yes |
Streaming | privatelink.azure-devices.net | Private DNS Zone | Yes |
Synapse Analytics | azvnetsuffix | Virtual Network | No |
Synapse Analytics | azsynapsehubsuffix | Synapse private link hub | No |
Synapse Analytics | azsynapsewkssuffix-web | Private Endpoint | No |
Synapse Analytics | azsynapsewkssuffix-sqlserverless | Private Endpoint | No |
Synapse Analytics | azsynapsewkssuffix-sql | Private Endpoint | No |
Synapse Analytics | azsynapsewkssuffix-dev | Private Endpoint | No |
Synapse Analytics | azkeyvaultsuffix | Private Endpoint | No |
Synapse Analytics | azwksdatalakesuffix-dfs | Private Endpoint | No |
Synapse Analytics | azrawdatalakesuffix-dfs | Private Endpoint | No |
Synapse Analytics | azcurateddatalakesuffix-dfs | Private Endpoint | No |
Data Governance | azpurviewsuffix-queue | Private Endpoint | No |
Data Governance | azpurviewsuffix-portal | Private Endpoint | No |
Data Governance | azpurviewsuffix-namespace | Private Endpoint | No |
Data Governance | azpurviewsuffix-blob | Private Endpoint | No |
Data Governance | azpurviewsuffix-account | Private Endpoint | No |
AI | aztextanalyticssuffix-account | Private Endpoint | No |
AI | azanomalydetectorsuffix-account | Private Endpoint | No |
AI | azmlwkssuffix-amlworkspace | Private Endpoint | No |
AI | azmlstoragesuffix-file | Private Endpoint | No |
AI | azmlstoragesuffix-blob | Private Endpoint | No |
AI | azmlcontainerregsuffix-registry | Private Endpoint | No |
Streaming | azeventhubnssuffix-namespace | Private Endpoint | No |
Streaming | aziothubsuffix-iothub | Private Endpoint | No |
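To compare a vNet deployment against the table above, the private endpoints and Private DNS zones in the resource group can be listed with the Azure CLI. This is a minimal sketch that assumes a logged-in Azure CLI, and the function name `list_private_connectivity` is hypothetical.

```shell
# Hypothetical helper: list the names of the private endpoints and
# Private DNS zones found in a resource group.
list_private_connectivity() {
  local rg
  rg="$1"
  echo "Private endpoints:"
  az network private-endpoint list --resource-group "$rg" --query "[].name" --output tsv
  echo "Private DNS zones:"
  az network private-dns zone list --resource-group "$rg" --query "[].name" --output tsv
}
```

Calling `list_private_connectivity resource-group-name` prints both lists. Which entries appear depends on the workloads you chose to deploy.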
Beyond the extra services above required to support the network isolation mode, the following network settings are applied to the services:
Workload | Name | Type | Notes | Reference Documentation |
---|---|---|---|---|
Platform Services | azkeyvaultsuffix | Key vault | 'Allow Azure Services' required for access from Azure Purview and Azure ML | Configure Azure Key Vault networking settings |
Synapse Analytics | azsynapsewkssuffix | Synapse workspace | Managed Virtual Network enabled | Understanding Azure Synapse Private Endpoints |
Synapse Analytics | azwksdatalakesuffix | Storage account | | Configure Azure Storage firewalls and virtual networks |
Synapse Analytics | azrawdatalakesuffix | Storage account | 'Allow Azure Services' enabled only when deploying Streaming workloads with Event Hubs | Configure Azure Storage firewalls and virtual networks |
Synapse Analytics | azcurateddatalakesuffix | Storage account | 'Allow Azure Services' enabled only when deploying Streaming workloads with Event Hubs | Configure Azure Storage firewalls and virtual networks |
Data Governance | azpurviewsuffix | Purview account | | Connect to your Azure Purview and scan data sources privately and securely |
AI | azanomalydetectorsuffix | Anomaly detector | | Configure Azure Cognitive Services virtual networks |
AI | aztextanalyticssuffix | Language | | Configure Azure Cognitive Services virtual networks |
AI | azmlwkssuffix | Machine learning workspace | | Secure Azure Machine Learning workspace resources using virtual networks (VNets) |
AI | azmlstoragesuffix | Storage account | | Secure an Azure Machine Learning workspace with virtual networks |
AI | azmlcontainerregsuffix | Container registry | | Secure an Azure Machine Learning workspace with virtual networks |
Streaming | azeventhubnssuffix | Event Hub namespace | | Network security for Azure Event Hubs |
Streaming | aziothubsuffix | IoT Hub | | IoT Hub support for virtual networks with Private Link and Managed Identity |
Streaming | azstreamjobsuffix | Stream Analytics job | Stream Analytics jobs don't support vNet integration. For that you should use Stream Analytics clusters. | |
If you would like to contribute to the solution (log bugs, issues, or add code) we have details on how to do that in our CONTRIBUTING.md file.
Details on licensing for the project can be found in the LICENSE file.