Setting up Databricks on Azure with Terraform — A Complete IaC Guide (2026)
Learn how to build a fully reproducible Databricks environment on Azure with Terraform as Infrastructure as Code. From workspace provisioning and RBAC to Unity Catalog governance and CI/CD integration — everything in code, nothing manual.
Implementing Databricks IaC?
Our IaC experts will help you build a production-ready Terraform setup for Databricks on Azure
What you will learn in this guide
- Why Infrastructure as Code for Databricks?
- Architecture overview & prerequisites
- Step 1 — Terraform project structure & providers
- Step 2 — Provisioning Azure resources
- Step 3 — Creating the Databricks workspace
- Step 4 — RBAC: users, groups & roles
- Step 5 — Unity Catalog: metastore, catalogs & schemas
- Step 6 — Cluster policies & instance pools
- Step 7 — Secret scopes & Azure Key Vault integration
- Step 8 — CI/CD with GitHub Actions
- Best practices & common mistakes
- Schedule a free consultation
Why Infrastructure as Code for Databricks?
Many data teams start their Databricks environment through the Azure Portal: a workspace here, a cluster there, a few users added by hand. Three months later, nobody remembers why certain settings were configured the way they are, there is no documentation, and the acceptance environment has drifted completely out of sync with production. Infrastructure as Code (IaC) solves this.
Version control
Every infrastructure change is traceable through Git: who, what, when, and why.
Reproducible
Dev, test, and production are provisioned from the same code. No more "it only works in prod".
Rollback
Bad change? A Git revert followed by terraform apply restores the previous state.
Automation
A CI/CD pipeline rolls out new environments automatically, without manual steps.
Compliance
Security policies and RBAC are enforced through code — no ad-hoc exceptions.
Time savings
A new workspace in under 15 minutes, instead of two hours of clicking through portals.
Architecture overview & prerequisites
We build the following stack entirely with Terraform:
| Layer | Resource | Terraform Provider |
|---|---|---|
| Azure infra | Resource Group, VNet, Subnets, Key Vault, Storage Account | hashicorp/azurerm |
| Identity | Azure AD groups, Service Principal, Managed Identity | hashicorp/azuread |
| Databricks | Workspace, Cluster Policies, Instance Pools | databricks/databricks |
| Governance | Unity Catalog Metastore, Catalogs, Schemas, Grants | databricks/databricks (account) |
| RBAC | Users, groups, entitlement assignments | databricks/databricks |
| Secrets | Secret scopes, Key Vault integration | databricks/databricks |
Prerequisites
- An Azure subscription with Owner/Contributor rights
- Terraform ≥ 1.5
- Azure CLI (az login)
- A Databricks account (Premium tier, required for Unity Catalog)
- A Service Principal in Azure AD with the appropriate permissions
- A GitHub repository for the Terraform code (for CI/CD)
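The Service Principal itself can also be managed in Terraform. A minimal sketch using the azuread provider — the display name, resource names, and the Contributor role scope are illustrative assumptions, not part of this guide's setup:

```hcl
# Sketch: app registration + Service Principal for Terraform itself.
# Names are hypothetical — adjust to your own naming convention.
resource "azuread_application" "terraform" {
  display_name = "sp-databricks-terraform"
}

resource "azuread_service_principal" "terraform" {
  application_id = azuread_application.terraform.application_id
}

# Contributor on the subscription so Terraform can create the resources
# from the steps below
resource "azurerm_role_assignment" "terraform_contributor" {
  scope                = "/subscriptions/${var.azure_subscription_id}"
  role_definition_name = "Contributor"
  principal_id         = azuread_service_principal.terraform.object_id
}
```

In practice you often create this one Service Principal by hand (or with the Azure CLI) first, since Terraform needs credentials before it can manage anything.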
Step-by-step implementation
Terraform project structure & providers
Project structure
A good directory layout is the foundation of maintainable Terraform code. Use a separate directory per environment:
```
databricks-iac/
├── modules/
│   ├── workspace/        # reusable workspace module
│   ├── unity-catalog/    # metastore + catalogs
│   ├── rbac/             # users & groups
│   └── cluster-policy/   # cluster templates
├── environments/
│   ├── dev/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── terraform.tfvars
│   └── prod/
│       ├── main.tf
│       ├── variables.tf
│       └── terraform.tfvars
├── backend.tf            # Azure Blob remote state
└── providers.tf
```
Define the providers in providers.tf. You need three of them:
```hcl
terraform {
  required_version = ">= 1.5"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.100"
    }
    azuread = {
      source  = "hashicorp/azuread"
      version = "~> 2.47"
    }
    databricks = {
      source  = "databricks/databricks"
      version = "~> 1.40"
    }
  }

  # Remote state in Azure Blob Storage (see step 2)
  backend "azurerm" {
    resource_group_name  = "rg-terraform-state"
    storage_account_name = "sttfstatedatapartner"
    container_name       = "tfstate"
    key                  = "databricks-prod.tfstate"
  }
}

# Azure provider — authenticate with the Service Principal
provider "azurerm" {
  features {
    key_vault {
      purge_soft_delete_on_destroy = false
    }
  }
  subscription_id = var.azure_subscription_id
  client_id       = var.sp_client_id
  client_secret   = var.sp_client_secret
  tenant_id       = var.azure_tenant_id
}

# Azure AD provider (same Service Principal)
provider "azuread" {
  tenant_id     = var.azure_tenant_id
  client_id     = var.sp_client_id
  client_secret = var.sp_client_secret
}

# Databricks provider — workspace level
provider "databricks" {
  alias                       = "workspace"
  host                        = azurerm_databricks_workspace.main.workspace_url
  azure_workspace_resource_id = azurerm_databricks_workspace.main.id
  azure_client_id             = var.sp_client_id
  azure_client_secret         = var.sp_client_secret
  azure_tenant_id             = var.azure_tenant_id
}

# Databricks provider — account level (for Unity Catalog)
provider "databricks" {
  alias               = "account"
  host                = "https://accounts.azuredatabricks.net"
  account_id          = var.databricks_account_id
  azure_client_id     = var.sp_client_id
  azure_client_secret = var.sp_client_secret
  azure_tenant_id     = var.azure_tenant_id
}
```
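The variables referenced in the provider blocks need to be declared somewhere, typically in variables.tf. A minimal sketch (the descriptions are our own); marking the secret as sensitive keeps it out of plan and apply output:

```hcl
variable "azure_subscription_id" {
  type        = string
  description = "Target Azure subscription"
}

variable "azure_tenant_id" {
  type        = string
  description = "Azure AD tenant"
}

variable "sp_client_id" {
  type        = string
  description = "Service Principal application (client) ID"
}

variable "sp_client_secret" {
  type        = string
  sensitive   = true # redacted in plan/apply output
  description = "Service Principal secret — supply via TF_VAR_sp_client_secret"
}

variable "databricks_account_id" {
  type        = string
  description = "Databricks account ID (from accounts.azuredatabricks.net)"
}
```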
Store sp_client_secret and other sensitive values as GitHub Secrets or in Azure Key Vault — never hardcoded in your .tf files. Use TF_VAR_ environment variables in your CI/CD pipeline.
Provisioning Azure resources (Resource Group, VNet, Storage, Key Vault)
Azure Infra
Databricks on Azure requires a VNet with two subnets for VNet injection (private + public). This gives you full control over the network.
```hcl
# Resource Group
resource "azurerm_resource_group" "databricks" {
  name     = "rg-databricks-${var.environment}"
  location = var.location
  tags     = local.common_tags
}

# Virtual network with Databricks subnets
resource "azurerm_virtual_network" "databricks" {
  name                = "vnet-databricks-${var.environment}"
  address_space       = ["10.20.0.0/16"]
  location            = azurerm_resource_group.databricks.location
  resource_group_name = azurerm_resource_group.databricks.name
}

resource "azurerm_subnet" "private" {
  name                 = "snet-databricks-private"
  resource_group_name  = azurerm_resource_group.databricks.name
  virtual_network_name = azurerm_virtual_network.databricks.name
  address_prefixes     = ["10.20.1.0/24"]

  delegation {
    name = "databricks-private"
    service_delegation {
      name    = "Microsoft.Databricks/workspaces"
      actions = ["Microsoft.Network/virtualNetworks/subnets/join/action"]
    }
  }
}

resource "azurerm_subnet" "public" {
  name                 = "snet-databricks-public"
  resource_group_name  = azurerm_resource_group.databricks.name
  virtual_network_name = azurerm_virtual_network.databricks.name
  address_prefixes     = ["10.20.2.0/24"]

  delegation {
    name = "databricks-public"
    service_delegation {
      name    = "Microsoft.Databricks/workspaces"
      actions = ["Microsoft.Network/virtualNetworks/subnets/join/action"]
    }
  }
}

# Azure Key Vault for secrets
resource "azurerm_key_vault" "databricks" {
  name                       = "kv-databricks-${var.environment}"
  location                   = azurerm_resource_group.databricks.location
  resource_group_name        = azurerm_resource_group.databricks.name
  tenant_id                  = var.azure_tenant_id
  sku_name                   = "standard"
  soft_delete_retention_days = 90
  purge_protection_enabled   = true
}

# Storage Account for Unity Catalog metastore data
resource "azurerm_storage_account" "unity" {
  name                     = "stunitycatalog${var.environment}"
  resource_group_name      = azurerm_resource_group.databricks.name
  location                 = azurerm_resource_group.databricks.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
  is_hns_enabled           = true # ADLS Gen2
}

resource "azurerm_storage_container" "metastore" {
  name                 = "metastore"
  storage_account_name = azurerm_storage_account.unity.name
}
```
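The workspace resource in the next step references NSG associations for both subnets, which are not defined above. A minimal sketch of the missing pieces, using the resource names the workspace expects (the NSG name itself is an assumption):

```hcl
# Network security group shared by both Databricks subnets.
# Databricks manages the required rules itself via the subnet delegation.
resource "azurerm_network_security_group" "databricks" {
  name                = "nsg-databricks-${var.environment}"
  location            = azurerm_resource_group.databricks.location
  resource_group_name = azurerm_resource_group.databricks.name
}

resource "azurerm_subnet_network_security_group_association" "private" {
  subnet_id                 = azurerm_subnet.private.id
  network_security_group_id = azurerm_network_security_group.databricks.id
}

resource "azurerm_subnet_network_security_group_association" "public" {
  subnet_id                 = azurerm_subnet.public.id
  network_security_group_id = azurerm_network_security_group.databricks.id
}
```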
Creating the Databricks workspace
Workspace

```hcl
resource "azurerm_databricks_workspace" "main" {
  name                = "dbw-datapartner-${var.environment}"
  resource_group_name = azurerm_resource_group.databricks.name
  location            = azurerm_resource_group.databricks.location
  sku                 = "premium" # Required for Unity Catalog

  # VNet injection for network isolation
  custom_parameters {
    virtual_network_id                                   = azurerm_virtual_network.databricks.id
    private_subnet_name                                  = azurerm_subnet.private.name
    public_subnet_name                                   = azurerm_subnet.public.name
    private_subnet_network_security_group_association_id = azurerm_subnet_network_security_group_association.private.id
    public_subnet_network_security_group_association_id  = azurerm_subnet_network_security_group_association.public.id
    no_public_ip                                         = true # Secure cluster connectivity
  }

  tags = local.common_tags
}

# Outputs: workspace URL and ID for other modules
output "workspace_url" {
  value = azurerm_databricks_workspace.main.workspace_url
}

output "workspace_id" {
  value = azurerm_databricks_workspace.main.workspace_id
}
```
The premium SKU is required for Unity Catalog; use standard only for a proof of concept without governance requirements.
RBAC: assigning users, groups & roles
RBAC
Role-Based Access Control (RBAC) in Databricks works at two levels: the workspace level (admin, user, service principal) and the Unity Catalog level (grants on catalogs, schemas, and tables). You manage both entirely with Terraform.
```hcl
# ── Create Azure AD groups ─────────────────────────
resource "azuread_group" "data_engineers" {
  display_name     = "databricks-data-engineers-${var.environment}"
  security_enabled = true
}

resource "azuread_group" "data_analysts" {
  display_name     = "databricks-data-analysts-${var.environment}"
  security_enabled = true
}

resource "azuread_group" "admins" {
  display_name     = "databricks-workspace-admins"
  security_enabled = true
}

# ── Register groups in the Databricks workspace ────
resource "databricks_group" "data_engineers" {
  provider             = databricks.workspace
  display_name         = "data-engineers"
  allow_cluster_create = true
}

resource "databricks_group" "data_analysts" {
  provider             = databricks.workspace
  display_name         = "data-analysts"
  allow_cluster_create = false
}

# ── Add a user with Terraform ──────────────────────
resource "databricks_user" "engineer_example" {
  provider  = databricks.workspace
  user_name = "engineer@datapartner365.nl"
}

resource "databricks_group_member" "engineer_to_group" {
  provider  = databricks.workspace
  group_id  = databricks_group.data_engineers.id
  member_id = databricks_user.engineer_example.id
}

# ── Service Principal for pipelines ────────────────
resource "databricks_service_principal" "pipeline_sp" {
  provider             = databricks.workspace
  application_id       = var.sp_client_id
  display_name         = "sp-databricks-pipeline"
  allow_cluster_create = true
}

# ── Grant workspace admin rights ───────────────────
# Look up the built-in "admins" workspace group
data "databricks_group" "admins" {
  provider     = databricks.workspace
  display_name = "admins"
}

resource "databricks_group_member" "admin_sp" {
  provider  = databricks.workspace
  group_id  = data.databricks_group.admins.id
  member_id = databricks_service_principal.pipeline_sp.id
}
```
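Adding users one by one does not scale. A sketch of the for_each pattern, assuming a hypothetical var.users map (user email → group name) supplied via terraform.tfvars:

```hcl
# Hypothetical input, e.g. in terraform.tfvars:
# users = {
#   "engineer@datapartner365.nl" = "data-engineers"
#   "analyst@datapartner365.nl"  = "data-analysts"
# }
variable "users" {
  type    = map(string)
  default = {}
}

resource "databricks_user" "members" {
  provider  = databricks.workspace
  for_each  = var.users
  user_name = each.key
}

resource "databricks_group_member" "members" {
  provider = databricks.workspace
  for_each = var.users

  # Map the group name from tfvars to the matching managed group
  group_id = each.value == "data-engineers" ? databricks_group.data_engineers.id : databricks_group.data_analysts.id

  member_id = databricks_user.members[each.key].id
}
```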
Tip: use for_each with a var.users map to add multiple users dynamically from a terraform.tfvars file. That way you never have to touch the Terraform code when new team members join.
Unity Catalog: metastore, catalogs & schemas
Unity Catalog
Unity Catalog is the central governance layer of Databricks. With Terraform you define the full hierarchy: Metastore → Catalog → Schema → Table. Grants at every level are enforced as code.
```hcl
# ── Create the metastore (account level) ───────────
resource "databricks_metastore" "main" {
  provider      = databricks.account
  name          = "metastore-${var.environment}-westeurope"
  region        = "westeurope"
  storage_root  = "abfss://metastore@${azurerm_storage_account.unity.name}.dfs.core.windows.net/"
  force_destroy = false
}

# ── Assign the metastore to the workspace ──────────
resource "databricks_metastore_assignment" "main" {
  provider             = databricks.account
  metastore_id         = databricks_metastore.main.id
  workspace_id         = azurerm_databricks_workspace.main.workspace_id
  default_catalog_name = "main"
}

# ── Catalogs per domain ────────────────────────────
resource "databricks_catalog" "bronze" {
  provider     = databricks.workspace
  metastore_id = databricks_metastore.main.id
  name         = "bronze"
  comment      = "Raw ingested data — read-only for analysts"
}

resource "databricks_catalog" "silver" {
  provider     = databricks.workspace
  metastore_id = databricks_metastore.main.id
  name         = "silver"
  comment      = "Cleaned, curated data"
}

resource "databricks_catalog" "gold" {
  provider     = databricks.workspace
  metastore_id = databricks_metastore.main.id
  name         = "gold"
  comment      = "Business-ready data for BI and reporting"
}

# ── Create schemas ─────────────────────────────────
resource "databricks_schema" "finance" {
  provider     = databricks.workspace
  catalog_name = databricks_catalog.gold.name
  name         = "finance"
  comment      = "Finance domain data"
}

# ── Grants: RBAC at catalog/schema level ───────────
resource "databricks_grants" "bronze_read" {
  provider = databricks.workspace
  catalog  = databricks_catalog.bronze.name

  grant {
    principal  = databricks_group.data_analysts.display_name
    privileges = ["USE_CATALOG", "USE_SCHEMA", "SELECT"]
  }
}

resource "databricks_grants" "silver_engineer" {
  provider = databricks.workspace
  catalog  = databricks_catalog.silver.name

  grant {
    principal  = databricks_group.data_engineers.display_name
    privileges = ["USE_CATALOG", "USE_SCHEMA", "SELECT", "MODIFY", "CREATE_TABLE"]
  }

  grant {
    principal  = databricks_group.data_analysts.display_name
    privileges = ["USE_CATALOG", "USE_SCHEMA", "SELECT"]
  }
}
```
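On Azure, the metastore also needs credentials to reach the storage account — typically a Databricks Access Connector (managed identity) with the Storage Blob Data Contributor role. A sketch of that wiring; the resource names are assumptions:

```hcl
# Managed identity the metastore uses to access ADLS Gen2
resource "azurerm_databricks_access_connector" "unity" {
  name                = "dac-unity-${var.environment}"
  resource_group_name = azurerm_resource_group.databricks.name
  location            = azurerm_resource_group.databricks.location

  identity {
    type = "SystemAssigned"
  }
}

# Allow the connector to read/write the metastore container
resource "azurerm_role_assignment" "unity_storage" {
  scope                = azurerm_storage_account.unity.id
  role_definition_name = "Storage Blob Data Contributor"
  principal_id         = azurerm_databricks_access_connector.unity.identity[0].principal_id
}

# Register the connector as the metastore's default storage credential
resource "databricks_metastore_data_access" "root" {
  provider     = databricks.account
  metastore_id = databricks_metastore.main.id
  name         = "metastore-root-credential"
  is_default   = true

  azure_managed_identity {
    access_connector_id = azurerm_databricks_access_connector.unity.id
  }
}
```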
Cluster policies & instance pools
Cost Control
Cluster policies defined in Terraform prevent users from creating oversized or misconfigured clusters. Instance pools provide fast cluster startup and cost savings.
```hcl
# Instance pool — pre-warmed nodes for fast cluster startup
resource "databricks_instance_pool" "standard" {
  provider                              = databricks.workspace
  instance_pool_name                    = "pool-standard-ds3v2"
  min_idle_instances                    = 0
  max_capacity                          = 20
  idle_instance_autotermination_minutes = 10
  node_type_id                          = "Standard_DS3_v2"

  azure_attributes {
    availability       = "SPOT_WITH_FALLBACK_AZURE"
    spot_bid_max_price = 100
  }
}

# Cluster policy — engineers: max 4 workers, autoscaling, auto-terminate
resource "databricks_cluster_policy" "engineers" {
  provider = databricks.workspace
  name     = "policy-data-engineers"
  definition = jsonencode({
    "autoscale.min_workers"   = { "type" = "fixed", "value" = 1 },
    "autoscale.max_workers"   = { "type" = "range", "maxValue" = 4, "defaultValue" = 2 },
    "autotermination_minutes" = { "type" = "fixed", "value" = 30 },
    "instance_pool_id"        = { "type" = "fixed", "value" = databricks_instance_pool.standard.id },
    "spark_version"           = { "type" = "allowlist", "values" = ["15.4.x-scala2.12", "14.3.x-scala2.12"] },
    "data_security_mode"      = { "type" = "fixed", "value" = "USER_ISOLATION" } # Required for Unity Catalog
  })
}

# Let the group use the policy (via databricks_permissions, CAN_USE)
resource "databricks_permissions" "engineers_policy" {
  provider          = databricks.workspace
  cluster_policy_id = databricks_cluster_policy.engineers.id

  access_control {
    group_name       = databricks_group.data_engineers.display_name
    permission_level = "CAN_USE"
  }
}
```
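For completeness, a sketch of a cluster created under this policy, with values inside the policy's bounds (the cluster name is an assumption):

```hcl
resource "databricks_cluster" "engineering" {
  provider                = databricks.workspace
  cluster_name            = "cluster-engineering"
  spark_version           = "15.4.x-scala2.12" # on the policy's allowlist
  instance_pool_id        = databricks_instance_pool.standard.id
  policy_id               = databricks_cluster_policy.engineers.id
  autotermination_minutes = 30
  data_security_mode      = "USER_ISOLATION"

  autoscale {
    min_workers = 1
    max_workers = 4
  }
}
```

Because the node type comes from the instance pool, no node_type_id is needed on the cluster itself.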
Secret scopes & Azure Key Vault integration
Secrets
Link Databricks secret scopes to Azure Key Vault so that secrets (API keys, database passwords) never appear in notebooks or code. Databricks fetches the values automatically through the scope name.
```hcl
# Key Vault-backed secret scope
resource "databricks_secret_scope" "keyvault" {
  provider = databricks.workspace
  name     = "kv-scope"

  keyvault_metadata {
    resource_id = azurerm_key_vault.databricks.id
    dns_name    = azurerm_key_vault.databricks.vault_uri
  }
}

# Store secrets in Key Vault with Terraform
resource "azurerm_key_vault_secret" "db_password" {
  name         = "db-connection-password"
  value        = var.db_password # from a TF_VAR_ env variable
  key_vault_id = azurerm_key_vault.databricks.id
}

# ACL: only data engineers can read the scope
resource "databricks_secret_acl" "engineers" {
  provider   = databricks.workspace
  principal  = databricks_group.data_engineers.display_name
  permission = "READ"
  scope      = databricks_secret_scope.keyvault.name
}
```
In a notebook you retrieve the secret like this:

```python
# The secret value is NEVER shown in logs or output — Databricks redacts it
password = dbutils.secrets.get(scope="kv-scope", key="db-connection-password")
```
CI/CD with GitHub Actions — automated deployment
GitOps
The power of IaC only fully pays off with a CI/CD pipeline. Every pull request triggers a terraform plan so you can see what is about to change. After a merge to main, terraform apply runs automatically.
```yaml
name: Terraform Databricks IaC

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

env:
  TF_VAR_sp_client_id: ${{ secrets.AZURE_CLIENT_ID }}
  TF_VAR_sp_client_secret: ${{ secrets.AZURE_CLIENT_SECRET }}
  TF_VAR_azure_tenant_id: ${{ secrets.AZURE_TENANT_ID }}
  TF_VAR_azure_subscription_id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
  TF_VAR_databricks_account_id: ${{ secrets.DATABRICKS_ACCOUNT_ID }}

jobs:
  terraform:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: environments/prod
    steps:
      - uses: actions/checkout@v4

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "1.7.0"

      - name: Terraform Init
        run: terraform init

      - name: Terraform Validate
        run: terraform validate

      - name: Terraform Plan (on PR)
        if: github.event_name == 'pull_request'
        run: terraform plan -no-color

      - name: Terraform Apply (on push to main)
        if: github.ref == 'refs/heads/main' && github.event_name == 'push'
        run: terraform apply -auto-approve
```
The azurerm backend locks the state during runs; use terraform force-unlock only when a crashed pipeline has left the state locked.
Best practices & common mistakes
| Situation | Wrong approach | Right approach (IaC) |
|---|---|---|
| Onboarding a new employee | Manually in the Databricks UI | databricks_user + PR → CI/CD deploy |
| Creating a cluster | UI clicks, no restrictions | Cluster policy via Terraform, spot instances + pool |
| Storing secrets | Hardcoded in a notebook | Key Vault secret scope via Terraform |
| Assigning permissions | Per individual user | Azure AD groups + databricks_grants |
| Environments | One workspace for everything | Separate workspaces per env, reusable modules |
| State management | Local terraform.tfstate | Azure Blob remote backend + state locking |
The core Terraform commands in this workflow:
- terraform init — download providers & configure the backend
- terraform plan -out=tfplan — preview the changes
- terraform apply tfplan — apply the changes
- terraform state list — show all managed resources
- terraform import — import existing resources into state
Schedule a free consultation
Want to set up Databricks on Azure with Terraform but not sure where to start? Schedule a free 30-minute call — I will look at your situation and give you concrete advice right away.