Setting up Databricks on Azure with Terraform — A Complete IaC Guide (2026)
Learn how to build a fully reproducible Databricks environment on Azure with Terraform as Infrastructure as Code. From workspace provisioning and RBAC to Unity Catalog governance and CI/CD integration — everything in code, nothing manual.
Implementing Databricks IaC?
Our IaC experts will help you build a production-ready Terraform setup for Databricks on Azure
What you will learn in this guide
- Why Infrastructure as Code for Databricks?
- Architecture overview & prerequisites
- Step 1 — Terraform project structure & providers
- Step 2 — Provisioning Azure resources
- Step 3 — Creating the Databricks workspace
- Step 4 — RBAC: users, groups & roles
- Step 5 — Unity Catalog: metastore, catalogs & schemas
- Step 6 — Cluster policies & instance pools
- Step 7 — Secret scopes & Azure Key Vault integration
- Step 8 — CI/CD with GitHub Actions
- Best practices & common mistakes
- Schedule a free consultation
Why Infrastructure as Code for Databricks?
Many data teams start their Databricks environment through the Azure Portal: a workspace here, a cluster there, a few users added by hand. Three months later, nobody remembers why certain settings were configured the way they are, there is no documentation, and the acceptance environment has drifted completely out of sync with production. Infrastructure as Code (IaC) solves this.
Version control
Every infrastructure change is traceable through Git: who, what, when, and why.
Reproducible
Dev, test, and production are provisioned from the same code. No more "it only works in prod".
Rollback
Bad change? A Git revert followed by terraform apply restores the previous state.
Automation
A CI/CD pipeline rolls out new environments automatically, without manual steps.
Compliance
Security policies and RBAC are enforced through code — no ad-hoc exceptions.
Time savings
A new workspace in under 15 minutes, instead of two hours of clicking through portals.
Architecture overview & prerequisites
We build the following stack entirely with Terraform:
| Layer | Resource | Terraform Provider |
|---|---|---|
| Azure infra | Resource Group, VNet, Subnets, Key Vault, Storage Account | hashicorp/azurerm |
| Identity | Azure AD groups, Service Principal, Managed Identity | hashicorp/azuread |
| Databricks | Workspace, Cluster Policies, Instance Pools | databricks/databricks |
| Governance | Unity Catalog Metastore, Catalogs, Schemas, Grants | databricks/databricks (account) |
| RBAC | Users, groups, entitlement assignments | databricks/databricks |
| Secrets | Secret scopes, Key Vault integration | databricks/databricks |
Prerequisites
- An Azure subscription with Owner/Contributor rights
- Terraform ≥ 1.5
- Azure CLI (az login)
- A Databricks account (Premium tier, required for Unity Catalog)
- A Service Principal in Azure AD with the appropriate permissions
- A GitHub repository for the Terraform code (for CI/CD)
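The Service Principal itself can also be managed in Terraform. A minimal sketch using the azuread provider — the display name, resource names, and the Contributor role scope are illustrative assumptions, not part of this guide's setup:

```hcl
# Sketch: app registration + Service Principal for Terraform itself.
# Names are hypothetical — adjust to your own naming convention.
resource "azuread_application" "terraform" {
  display_name = "sp-databricks-terraform"
}

resource "azuread_service_principal" "terraform" {
  application_id = azuread_application.terraform.application_id
}

# Contributor on the subscription so Terraform can create the resources
# from the steps below
resource "azurerm_role_assignment" "terraform_contributor" {
  scope                = "/subscriptions/${var.azure_subscription_id}"
  role_definition_name = "Contributor"
  principal_id         = azuread_service_principal.terraform.object_id
}
```

In practice you often create this one Service Principal by hand (or with the Azure CLI) first, since Terraform needs credentials before it can manage anything.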
Step-by-step implementation
Terraform project structure & providers
Project structure
A good directory layout is the foundation of maintainable Terraform code. Use a separate directory per environment:
```
databricks-iac/
├── modules/
│   ├── workspace/        # reusable workspace module
│   ├── unity-catalog/    # metastore + catalogs
│   ├── rbac/             # users & groups
│   └── cluster-policy/   # cluster templates
├── environments/
│   ├── dev/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── terraform.tfvars
│   └── prod/
│       ├── main.tf
│       ├── variables.tf
│       └── terraform.tfvars
├── backend.tf            # Azure Blob remote state
└── providers.tf
```
Define the providers in providers.tf. You need three of them:
```hcl
terraform {
  required_version = ">= 1.5"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.100"
    }
    azuread = {
      source  = "hashicorp/azuread"
      version = "~> 2.47"
    }
    databricks = {
      source  = "databricks/databricks"
      version = "~> 1.40"
    }
  }

  # Remote state in Azure Blob Storage (see step 2)
  backend "azurerm" {
    resource_group_name  = "rg-terraform-state"
    storage_account_name = "sttfstatedatapartner"
    container_name       = "tfstate"
    key                  = "databricks-prod.tfstate"
  }
}

# Azure provider — authenticate with the Service Principal
provider "azurerm" {
  features {
    key_vault {
      purge_soft_delete_on_destroy = false
    }
  }
  subscription_id = var.azure_subscription_id
  client_id       = var.sp_client_id
  client_secret   = var.sp_client_secret
  tenant_id       = var.azure_tenant_id
}

# Azure AD provider (same Service Principal)
provider "azuread" {
  tenant_id     = var.azure_tenant_id
  client_id     = var.sp_client_id
  client_secret = var.sp_client_secret
}

# Databricks provider — workspace level
provider "databricks" {
  alias                       = "workspace"
  host                        = azurerm_databricks_workspace.main.workspace_url
  azure_workspace_resource_id = azurerm_databricks_workspace.main.id
  azure_client_id             = var.sp_client_id
  azure_client_secret         = var.sp_client_secret
  azure_tenant_id             = var.azure_tenant_id
}

# Databricks provider — account level (for Unity Catalog)
provider "databricks" {
  alias               = "account"
  host                = "https://accounts.azuredatabricks.net"
  account_id          = var.databricks_account_id
  azure_client_id     = var.sp_client_id
  azure_client_secret = var.sp_client_secret
  azure_tenant_id     = var.azure_tenant_id
}
```
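The variables referenced in the provider blocks need to be declared somewhere, typically in variables.tf. A minimal sketch (the descriptions are our own); marking the secret as sensitive keeps it out of plan and apply output:

```hcl
variable "azure_subscription_id" {
  type        = string
  description = "Target Azure subscription"
}

variable "azure_tenant_id" {
  type        = string
  description = "Azure AD tenant"
}

variable "sp_client_id" {
  type        = string
  description = "Service Principal application (client) ID"
}

variable "sp_client_secret" {
  type        = string
  sensitive   = true # redacted in plan/apply output
  description = "Service Principal secret — supply via TF_VAR_sp_client_secret"
}

variable "databricks_account_id" {
  type        = string
  description = "Databricks account ID (from accounts.azuredatabricks.net)"
}
```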
Store sp_client_secret and other sensitive values as GitHub Secrets or in Azure Key Vault — never hardcoded in your .tf files. Use TF_VAR_ environment variables in your CI/CD pipeline.
Provisioning Azure resources (Resource Group, VNet, Storage, Key Vault)
Azure Infra
Databricks on Azure requires a VNet with two subnets for VNet injection (private + public). This gives you full control over the network.
```hcl
# Resource Group
resource "azurerm_resource_group" "databricks" {
  name     = "rg-databricks-${var.environment}"
  location = var.location
  tags     = local.common_tags
}

# Virtual network with Databricks subnets
resource "azurerm_virtual_network" "databricks" {
  name                = "vnet-databricks-${var.environment}"
  address_space       = ["10.20.0.0/16"]
  location            = azurerm_resource_group.databricks.location
  resource_group_name = azurerm_resource_group.databricks.name
}

resource "azurerm_subnet" "private" {
  name                 = "snet-databricks-private"
  resource_group_name  = azurerm_resource_group.databricks.name
  virtual_network_name = azurerm_virtual_network.databricks.name
  address_prefixes     = ["10.20.1.0/24"]

  delegation {
    name = "databricks-private"
    service_delegation {
      name    = "Microsoft.Databricks/workspaces"
      actions = ["Microsoft.Network/virtualNetworks/subnets/join/action"]
    }
  }
}

resource "azurerm_subnet" "public" {
  name                 = "snet-databricks-public"
  resource_group_name  = azurerm_resource_group.databricks.name
  virtual_network_name = azurerm_virtual_network.databricks.name
  address_prefixes     = ["10.20.2.0/24"]

  delegation {
    name = "databricks-public"
    service_delegation {
      name    = "Microsoft.Databricks/workspaces"
      actions = ["Microsoft.Network/virtualNetworks/subnets/join/action"]
    }
  }
}

# Azure Key Vault for secrets
resource "azurerm_key_vault" "databricks" {
  name                       = "kv-databricks-${var.environment}"
  location                   = azurerm_resource_group.databricks.location
  resource_group_name        = azurerm_resource_group.databricks.name
  tenant_id                  = var.azure_tenant_id
  sku_name                   = "standard"
  soft_delete_retention_days = 90
  purge_protection_enabled   = true
}

# Storage Account for Unity Catalog metastore data
resource "azurerm_storage_account" "unity" {
  name                     = "stunitycatalog${var.environment}"
  resource_group_name      = azurerm_resource_group.databricks.name
  location                 = azurerm_resource_group.databricks.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
  is_hns_enabled           = true # ADLS Gen2
}

resource "azurerm_storage_container" "metastore" {
  name                 = "metastore"
  storage_account_name = azurerm_storage_account.unity.name
}
```
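The workspace resource in the next step references NSG associations for both subnets, which are not defined above. A minimal sketch of the missing pieces, using the resource names the workspace expects (the NSG name itself is an assumption):

```hcl
# Network security group shared by both Databricks subnets.
# Databricks manages the required rules itself via the subnet delegation.
resource "azurerm_network_security_group" "databricks" {
  name                = "nsg-databricks-${var.environment}"
  location            = azurerm_resource_group.databricks.location
  resource_group_name = azurerm_resource_group.databricks.name
}

resource "azurerm_subnet_network_security_group_association" "private" {
  subnet_id                 = azurerm_subnet.private.id
  network_security_group_id = azurerm_network_security_group.databricks.id
}

resource "azurerm_subnet_network_security_group_association" "public" {
  subnet_id                 = azurerm_subnet.public.id
  network_security_group_id = azurerm_network_security_group.databricks.id
}
```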
Creating the Databricks workspace
Workspace

```hcl
resource "azurerm_databricks_workspace" "main" {
  name                = "dbw-datapartner-${var.environment}"
  resource_group_name = azurerm_resource_group.databricks.name
  location            = azurerm_resource_group.databricks.location
  sku                 = "premium" # Required for Unity Catalog

  # VNet injection for network isolation
  custom_parameters {
    virtual_network_id                                   = azurerm_virtual_network.databricks.id
    private_subnet_name                                  = azurerm_subnet.private.name
    public_subnet_name                                   = azurerm_subnet.public.name
    private_subnet_network_security_group_association_id = azurerm_subnet_network_security_group_association.private.id
    public_subnet_network_security_group_association_id  = azurerm_subnet_network_security_group_association.public.id
    no_public_ip                                         = true # Secure cluster connectivity
  }

  tags = local.common_tags
}

# Outputs: workspace URL and ID for other modules
output "workspace_url" {
  value = azurerm_databricks_workspace.main.workspace_url
}

output "workspace_id" {
  value = azurerm_databricks_workspace.main.workspace_id
}
```
The premium SKU is required for Unity Catalog; use standard only for a proof of concept without governance requirements.
RBAC: assigning users, groups & roles
RBAC
Role-Based Access Control (RBAC) in Databricks works at two levels: the workspace level (admin, user, service principal) and the Unity Catalog level (grants on catalogs, schemas, and tables). You manage both entirely with Terraform.
```hcl
# ── Create Azure AD groups ─────────────────────────
resource "azuread_group" "data_engineers" {
  display_name     = "databricks-data-engineers-${var.environment}"
  security_enabled = true
}

resource "azuread_group" "data_analysts" {
  display_name     = "databricks-data-analysts-${var.environment}"
  security_enabled = true
}

resource "azuread_group" "admins" {
  display_name     = "databricks-workspace-admins"
  security_enabled = true
}

# ── Register groups in the Databricks workspace ────
resource "databricks_group" "data_engineers" {
  provider             = databricks.workspace
  display_name         = "data-engineers"
  allow_cluster_create = true
}

resource "databricks_group" "data_analysts" {
  provider             = databricks.workspace
  display_name         = "data-analysts"
  allow_cluster_create = false
}

# ── Add a user with Terraform ──────────────────────
resource "databricks_user" "engineer_example" {
  provider  = databricks.workspace
  user_name = "engineer@datapartner365.nl"
}

resource "databricks_group_member" "engineer_to_group" {
  provider  = databricks.workspace
  group_id  = databricks_group.data_engineers.id
  member_id = databricks_user.engineer_example.id
}

# ── Service Principal for pipelines ────────────────
resource "databricks_service_principal" "pipeline_sp" {
  provider             = databricks.workspace
  application_id       = var.sp_client_id
  display_name         = "sp-databricks-pipeline"
  allow_cluster_create = true
}

# ── Grant workspace admin rights ───────────────────
# Look up the built-in "admins" workspace group
data "databricks_group" "admins" {
  provider     = databricks.workspace
  display_name = "admins"
}

resource "databricks_group_member" "admin_sp" {
  provider  = databricks.workspace
  group_id  = data.databricks_group.admins.id
  member_id = databricks_service_principal.pipeline_sp.id
}
```
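Adding users one by one does not scale. A sketch of the for_each pattern, assuming a hypothetical var.users map (user email → group name) supplied via terraform.tfvars:

```hcl
# Hypothetical input, e.g. in terraform.tfvars:
# users = {
#   "engineer@datapartner365.nl" = "data-engineers"
#   "analyst@datapartner365.nl"  = "data-analysts"
# }
variable "users" {
  type    = map(string)
  default = {}
}

resource "databricks_user" "members" {
  provider  = databricks.workspace
  for_each  = var.users
  user_name = each.key
}

resource "databricks_group_member" "members" {
  provider = databricks.workspace
  for_each = var.users

  # Map the group name from tfvars to the matching managed group
  group_id = each.value == "data-engineers" ? databricks_group.data_engineers.id : databricks_group.data_analysts.id

  member_id = databricks_user.members[each.key].id
}
```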
Tip: use for_each with a var.users map to add multiple users dynamically from a terraform.tfvars file. That way you never have to touch the Terraform code when new team members join.
Unity Catalog: metastore, catalogs & schemas
Unity Catalog
Unity Catalog is the central governance layer of Databricks. With Terraform you define the full hierarchy: Metastore → Catalog → Schema → Table. Grants at every level are enforced as code.
```hcl
# ── Create the metastore (account level) ───────────
resource "databricks_metastore" "main" {
  provider      = databricks.account
  name          = "metastore-${var.environment}-westeurope"
  region        = "westeurope"
  storage_root  = "abfss://metastore@${azurerm_storage_account.unity.name}.dfs.core.windows.net/"
  force_destroy = false
}

# ── Assign the metastore to the workspace ──────────
resource "databricks_metastore_assignment" "main" {
  provider             = databricks.account
  metastore_id         = databricks_metastore.main.id
  workspace_id         = azurerm_databricks_workspace.main.workspace_id
  default_catalog_name = "main"
}

# ── Catalogs per domain ────────────────────────────
resource "databricks_catalog" "bronze" {
  provider     = databricks.workspace
  metastore_id = databricks_metastore.main.id
  name         = "bronze"
  comment      = "Raw ingested data — read-only for analysts"
}

resource "databricks_catalog" "silver" {
  provider     = databricks.workspace
  metastore_id = databricks_metastore.main.id
  name         = "silver"
  comment      = "Cleaned, curated data"
}

resource "databricks_catalog" "gold" {
  provider     = databricks.workspace
  metastore_id = databricks_metastore.main.id
  name         = "gold"
  comment      = "Business-ready data for BI and reporting"
}

# ── Create schemas ─────────────────────────────────
resource "databricks_schema" "finance" {
  provider     = databricks.workspace
  catalog_name = databricks_catalog.gold.name
  name         = "finance"
  comment      = "Finance domain data"
}

# ── Grants: RBAC at catalog/schema level ───────────
resource "databricks_grants" "bronze_read" {
  provider = databricks.workspace
  catalog  = databricks_catalog.bronze.name

  grant {
    principal  = databricks_group.data_analysts.display_name
    privileges = ["USE_CATALOG", "USE_SCHEMA", "SELECT"]
  }
}

resource "databricks_grants" "silver_engineer" {
  provider = databricks.workspace
  catalog  = databricks_catalog.silver.name

  grant {
    principal  = databricks_group.data_engineers.display_name
    privileges = ["USE_CATALOG", "USE_SCHEMA", "SELECT", "MODIFY", "CREATE_TABLE"]
  }

  grant {
    principal  = databricks_group.data_analysts.display_name
    privileges = ["USE_CATALOG", "USE_SCHEMA", "SELECT"]
  }
}
```
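On Azure, the metastore also needs credentials to reach the storage account — typically a Databricks Access Connector (managed identity) with the Storage Blob Data Contributor role. A sketch of that wiring; the resource names are assumptions:

```hcl
# Managed identity the metastore uses to access ADLS Gen2
resource "azurerm_databricks_access_connector" "unity" {
  name                = "dac-unity-${var.environment}"
  resource_group_name = azurerm_resource_group.databricks.name
  location            = azurerm_resource_group.databricks.location

  identity {
    type = "SystemAssigned"
  }
}

# Allow the connector to read/write the metastore container
resource "azurerm_role_assignment" "unity_storage" {
  scope                = azurerm_storage_account.unity.id
  role_definition_name = "Storage Blob Data Contributor"
  principal_id         = azurerm_databricks_access_connector.unity.identity[0].principal_id
}

# Register the connector as the metastore's default storage credential
resource "databricks_metastore_data_access" "root" {
  provider     = databricks.account
  metastore_id = databricks_metastore.main.id
  name         = "metastore-root-credential"
  is_default   = true

  azure_managed_identity {
    access_connector_id = azurerm_databricks_access_connector.unity.id
  }
}
```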
Cluster policies & instance pools
Cost Control
Cluster policies defined in Terraform prevent users from creating oversized or misconfigured clusters. Instance pools provide fast cluster startup and cost savings.
```hcl
# Instance pool — pre-warmed nodes for fast cluster startup
resource "databricks_instance_pool" "standard" {
  provider                              = databricks.workspace
  instance_pool_name                    = "pool-standard-ds3v2"
  min_idle_instances                    = 0
  max_capacity                          = 20
  idle_instance_autotermination_minutes = 10
  node_type_id                          = "Standard_DS3_v2"

  azure_attributes {
    availability       = "SPOT_WITH_FALLBACK_AZURE"
    spot_bid_max_price = 100
  }
}

# Cluster policy — engineers: max 4 workers, autoscaling, auto-terminate
resource "databricks_cluster_policy" "engineers" {
  provider = databricks.workspace
  name     = "policy-data-engineers"
  definition = jsonencode({
    "autoscale.min_workers"   = { "type" = "fixed", "value" = 1 },
    "autoscale.max_workers"   = { "type" = "range", "maxValue" = 4, "defaultValue" = 2 },
    "autotermination_minutes" = { "type" = "fixed", "value" = 30 },
    "instance_pool_id"        = { "type" = "fixed", "value" = databricks_instance_pool.standard.id },
    "spark_version"           = { "type" = "allowlist", "values" = ["15.4.x-scala2.12", "14.3.x-scala2.12"] },
    "data_security_mode"      = { "type" = "fixed", "value" = "USER_ISOLATION" } # Required for Unity Catalog
  })
}

# Let the group use the policy (via databricks_permissions, CAN_USE)
resource "databricks_permissions" "engineers_policy" {
  provider          = databricks.workspace
  cluster_policy_id = databricks_cluster_policy.engineers.id

  access_control {
    group_name       = databricks_group.data_engineers.display_name
    permission_level = "CAN_USE"
  }
}
```
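For completeness, a sketch of a cluster created under this policy, with values inside the policy's bounds (the cluster name is an assumption):

```hcl
resource "databricks_cluster" "engineering" {
  provider                = databricks.workspace
  cluster_name            = "cluster-engineering"
  spark_version           = "15.4.x-scala2.12" # on the policy's allowlist
  instance_pool_id        = databricks_instance_pool.standard.id
  policy_id               = databricks_cluster_policy.engineers.id
  autotermination_minutes = 30
  data_security_mode      = "USER_ISOLATION"

  autoscale {
    min_workers = 1
    max_workers = 4
  }
}
```

Because the node type comes from the instance pool, no node_type_id is needed on the cluster itself.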
Secret scopes & Azure Key Vault integration
Secrets
Link Databricks secret scopes to Azure Key Vault so that secrets (API keys, database passwords) never appear in notebooks or code. Databricks fetches the values automatically through the scope name.
```hcl
# Key Vault-backed secret scope
resource "databricks_secret_scope" "keyvault" {
  provider = databricks.workspace
  name     = "kv-scope"

  keyvault_metadata {
    resource_id = azurerm_key_vault.databricks.id
    dns_name    = azurerm_key_vault.databricks.vault_uri
  }
}

# Store secrets in Key Vault with Terraform
resource "azurerm_key_vault_secret" "db_password" {
  name         = "db-connection-password"
  value        = var.db_password # from a TF_VAR_ env variable
  key_vault_id = azurerm_key_vault.databricks.id
}

# ACL: only data engineers can read the scope
resource "databricks_secret_acl" "engineers" {
  provider   = databricks.workspace
  principal  = databricks_group.data_engineers.display_name
  permission = "READ"
  scope      = databricks_secret_scope.keyvault.name
}
```
In a notebook you retrieve the secret like this:

```python
# The secret value is NEVER shown in logs or output — Databricks redacts it
password = dbutils.secrets.get(scope="kv-scope", key="db-connection-password")
```
CI/CD with GitHub Actions — automated deployment
GitOps
The power of IaC only fully pays off with a CI/CD pipeline. Every pull request triggers a terraform plan so you can see what is about to change. After a merge to main, terraform apply runs automatically.
```yaml
name: Terraform Databricks IaC

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

env:
  TF_VAR_sp_client_id: ${{ secrets.AZURE_CLIENT_ID }}
  TF_VAR_sp_client_secret: ${{ secrets.AZURE_CLIENT_SECRET }}
  TF_VAR_azure_tenant_id: ${{ secrets.AZURE_TENANT_ID }}
  TF_VAR_azure_subscription_id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
  TF_VAR_databricks_account_id: ${{ secrets.DATABRICKS_ACCOUNT_ID }}

jobs:
  terraform:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: environments/prod
    steps:
      - uses: actions/checkout@v4

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "1.7.0"

      - name: Terraform Init
        run: terraform init

      - name: Terraform Validate
        run: terraform validate

      - name: Terraform Plan (on PR)
        if: github.event_name == 'pull_request'
        run: terraform plan -no-color

      - name: Terraform Apply (on push to main)
        if: github.ref == 'refs/heads/main' && github.event_name == 'push'
        run: terraform apply -auto-approve
```
The azurerm backend locks the state during runs; use terraform force-unlock only when a crashed pipeline has left the state locked.
Best practices & common mistakes
| Situation | Wrong approach | Right approach (IaC) |
|---|---|---|
| Onboarding a new employee | Manually in the Databricks UI | databricks_user + PR → CI/CD deploy |
| Creating a cluster | UI clicks, no restrictions | Cluster policy via Terraform, spot instances + pool |
| Storing secrets | Hardcoded in a notebook | Key Vault secret scope via Terraform |
| Assigning permissions | Per individual user | Azure AD groups + databricks_grants |
| Environments | One workspace for everything | Separate workspaces per env, reusable modules |
| State management | Local terraform.tfstate | Azure Blob remote backend + state locking |
The core Terraform commands in this workflow:
- terraform init — download providers & configure the backend
- terraform plan -out=tfplan — preview the changes
- terraform apply tfplan — apply the changes
- terraform state list — show all managed resources
- terraform import — import existing resources into state
Schedule a free consultation
Want to set up Databricks on Azure with Terraform but not sure where to start? Schedule a free 30-minute call — I will look at your situation and give you concrete advice right away.