DataPartner365 · Step-by-step guide · 2026

Setting up IaC for Databricks in an Azure tenant

A complete step-by-step guide to setting up Infrastructure as Code (Terraform) for a Databricks workspace in Azure, from Service Principal to Unity Catalog and CI/CD pipeline.

Terraform · Azure · Databricks · Unity Catalog · GitHub Actions · Free

1 - Prerequisites & tooling

What you need before you start
Tool | Version | Required? | Purpose
Terraform | ≥ 1.6 | Required | IaC engine
Azure CLI | ≥ 2.55 | Required | Azure authentication & management
Git | ≥ 2.40 | Required | Version control
VS Code + Terraform extension | Latest | Optional | IDE with syntax highlighting
Databricks CLI | ≥ 0.200 | Optional | Manual checks and debugging
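ℹ️ You need an Azure subscription with at least Contributor rights at resource group level, and Application Administrator in Entra ID to create a Service Principal. Note that creating the resource group itself (step 2.1) requires rights at subscription level, and assigning the Contributor role to the Service Principal (step 2.2) requires Owner or User Access Administrator on that scope, because Contributor cannot create role assignments.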

Verify the installation

bash
# Check versions
terraform --version     # → Terraform v1.6+
az --version            # → azure-cli 2.55+
git --version           # → git version 2.40+

# Log in to Azure and select the right subscription
az login
az account set --subscription "<your-subscription-id>"
az account show         # confirm the active subscription

2 - Prepare the Azure environment

Resource Group, Service Principal, Key Vault and Terraform state storage

2.1 - Create the resource group

bash - azure cli
RG_NAME="rg-databricks-prod"
LOCATION="westeurope"

az group create \
  --name $RG_NAME \
  --location $LOCATION \
  --tags Environment=Production Project=DataPlatform ManagedBy=Terraform

2.2 - Service Principal for Terraform

bash - azure cli
SP_NAME="sp-terraform-databricks"
SUBSCRIPTION_ID=$(az account show --query id -o tsv)

# Create with Contributor rights on the resource group
az ad sp create-for-rbac \
  --name $SP_NAME \
  --role Contributor \
  --scopes /subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RG_NAME

# Save the output; you need these values:
# appId    → ARM_CLIENT_ID
# password → ARM_CLIENT_SECRET
# tenant   → ARM_TENANT_ID
# The subscription ID is not in this output; use $SUBSCRIPTION_ID → ARM_SUBSCRIPTION_ID
⚠️ Save the password immediately; it is shown only once. Store it in Azure Key Vault or GitHub Secrets, never in code.

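2.3 - Key Vault for secrets

bash
# Key Vault names are globally unique; adjust the name if it is already taken
az keyvault create \
  --name "kv-databricks-prod" \
  --resource-group $RG_NAME \
  --location $LOCATION \
  --enable-rbac-authorization true

# Store the SP secret in Key Vault
# (with RBAC authorization you need a data-plane role such as
#  Key Vault Secrets Officer on the vault to write secrets)
az keyvault secret set \
  --vault-name "kv-databricks-prod" \
  --name "terraform-sp-secret" \
  --value "<sp-password>"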

2.4 - Remote state storage (Terraform backend)

bash
SA_NAME="stterraformstate001"   # globally unique name

az storage account create \
  --name $SA_NAME \
  --resource-group $RG_NAME \
  --location $LOCATION \
  --sku Standard_LRS \
  --min-tls-version TLS1_2

az storage container create \
  --name "tfstate" \
  --account-name $SA_NAME

3 - Set up the Terraform project

Directory structure, providers and backend configuration

Recommended directory structure

file structure
databricks-iac/
├── main.tf              # Main resources
├── variables.tf         # Input variables
├── outputs.tf           # Output values
├── providers.tf         # Provider configuration
├── backend.tf           # Remote state
├── terraform.tfvars     # Values (do not commit!)
├── .gitignore
└── modules/
    ├── workspace/       # Databricks workspace module
    ├── unity-catalog/   # Unity Catalog module
    └── clusters/        # Cluster policies module

providers.tf

terraform hcl
terraform {
  required_version = ">= 1.6"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.90"
    }
    databricks = {
      source  = "databricks/databricks"
      version = "~> 1.38"
    }
  }
}

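provider "azurerm" {
  features {}
  # Authentication via environment variables:
  # ARM_CLIENT_ID, ARM_CLIENT_SECRET, ARM_TENANT_ID, ARM_SUBSCRIPTION_ID
  #
  # The SP from step 2.2 is scoped to one resource group. If automatic resource
  # provider registration fails for lack of subscription-level rights, set:
  # skip_provider_registration = true
}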

provider "databricks" {
  host = azurerm_databricks_workspace.main.workspace_url
  azure_workspace_resource_id = azurerm_databricks_workspace.main.id
}

backend.tf

terraform hcl
terraform {
  backend "azurerm" {
    resource_group_name  = "rg-databricks-prod"
    storage_account_name = "stterraformstate001"
    container_name       = "tfstate"
    key                  = "databricks.terraform.tfstate"
  }
}

.gitignore

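.gitignore
.terraform/
*.tfstate
*.tfstate.backup
# May contain secrets, never commit. Keep comments on their own line:
# .gitignore does not support trailing comments after a pattern.
*.tfvars
Do commit .terraform.lock.hcl, so every run uses the same provider versions. Use terraform.tfvars.example as a template (without real values) that you also commit, so colleagues know which variables are needed.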
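As an illustration, such a template could look like the sketch below. It assumes the variable names declared in variables.tf in step 4 and simply repeats their defaults as placeholders; replace them with whatever your own variables.tf declares.

```hcl
# terraform.tfvars.example
# Copy to terraform.tfvars and adjust. This file holds no secrets and is safe to commit.
location       = "westeurope"
resource_group = "rg-databricks-prod"
environment    = "prod"
workspace_name = "dbw-dataplatform-prod"
sku            = "premium" # premium is required for Unity Catalog
```

Credentials stay out of this file entirely; they are passed through the ARM_* environment variables.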

4 - Deploy the Databricks workspace

VNet injection, private endpoints and creating the workspace

variables.tf

terraform hcl
variable "location"        { default = "westeurope" }
variable "resource_group"  { default = "rg-databricks-prod" }
variable "environment"     { default = "prod" }
variable "workspace_name"  { default = "dbw-dataplatform-prod" }
variable "sku"             { default = "premium" }  # premium is required for Unity Catalog
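Because Unity Catalog depends on the Premium SKU, it can help to let Terraform reject any other value at plan time. The sketch below is one possible way to do that; it is meant to replace the one-line sku declaration above (a variable can only be declared once), and the description and error text are illustrative.

```hcl
# Stricter variant of the "sku" variable: fails at plan time if the value
# would make Unity Catalog unavailable.
variable "sku" {
  type        = string
  default     = "premium"
  description = "Pricing tier of the Databricks workspace"

  validation {
    condition     = var.sku == "premium"
    error_message = "Unity Catalog requires sku = \"premium\"."
  }
}
```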

main.tf - Workspace resource

terraform hcl
resource "azurerm_databricks_workspace" "main" {
  name                = var.workspace_name
  resource_group_name = var.resource_group
  location            = var.location
  sku                 = var.sku   # "premium" for Unity Catalog

  # Managed resource group for Databricks-managed resources
  managed_resource_group_name = "rg-databricks-managed-prod"

  tags = {
    Environment = var.environment
    ManagedBy   = "Terraform"
    Project     = "DataPlatform"
  }
}

output "workspace_url" {
  value = azurerm_databricks_workspace.main.workspace_url
}

output "workspace_id" {
  value = azurerm_databricks_workspace.main.id
}
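The subtitle of this step mentions VNet injection, which the workspace resource above does not configure. The following is a minimal sketch of what that could look like, not a hardened network design: the resource names and address ranges are hypothetical, outbound connectivity and private endpoints are left out, and the workspace resource shown here replaces the one above (with only the custom_parameters block added and tags omitted for brevity) rather than sitting next to it.

```hcl
# Hypothetical names and address ranges; adjust to your own network plan.
resource "azurerm_virtual_network" "dbx" {
  name                = "vnet-databricks-prod"
  resource_group_name = var.resource_group
  location            = var.location
  address_space       = ["10.10.0.0/16"]
}

resource "azurerm_network_security_group" "dbx" {
  name                = "nsg-databricks-prod"
  resource_group_name = var.resource_group
  location            = var.location
}

# Two subnets delegated to Databricks: "public" (host) and "private" (container)
resource "azurerm_subnet" "dbx" {
  for_each = {
    public  = "10.10.1.0/24"
    private = "10.10.2.0/24"
  }

  name                 = "snet-databricks-${each.key}"
  resource_group_name  = var.resource_group
  virtual_network_name = azurerm_virtual_network.dbx.name
  address_prefixes     = [each.value]

  delegation {
    name = "databricks"
    service_delegation {
      name = "Microsoft.Databricks/workspaces"
      actions = [
        "Microsoft.Network/virtualNetworks/subnets/join/action",
        "Microsoft.Network/virtualNetworks/subnets/prepareNetworkPolicies/action",
        "Microsoft.Network/virtualNetworks/subnets/unprepareNetworkPolicies/action",
      ]
    }
  }
}

# Attach the same NSG to both subnets
resource "azurerm_subnet_network_security_group_association" "dbx" {
  for_each                  = azurerm_subnet.dbx
  subnet_id                 = each.value.id
  network_security_group_id = azurerm_network_security_group.dbx.id
}

# Replaces the workspace resource above; only custom_parameters is new
resource "azurerm_databricks_workspace" "main" {
  name                        = var.workspace_name
  resource_group_name         = var.resource_group
  location                    = var.location
  sku                         = var.sku
  managed_resource_group_name = "rg-databricks-managed-prod"

  custom_parameters {
    virtual_network_id  = azurerm_virtual_network.dbx.id
    public_subnet_name  = azurerm_subnet.dbx["public"].name
    private_subnet_name = azurerm_subnet.dbx["private"].name

    public_subnet_network_security_group_association_id  = azurerm_subnet_network_security_group_association.dbx["public"].id
    private_subnet_network_security_group_association_id = azurerm_subnet_network_security_group_association.dbx["private"].id
  }
}
```

Passing the NSG association IDs into custom_parameters makes Terraform wait until both subnets have their security group attached before it creates the workspace.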

Deploy

bash
# Export the SP credentials as environment variables
export ARM_CLIENT_ID="<appId>"
export ARM_CLIENT_SECRET="<password>"
export ARM_TENANT_ID="<tenant>"
export ARM_SUBSCRIPTION_ID="<subscriptionId>"

terraform init      # download providers + initialize the backend
terraform plan      # review what will be created
terraform apply     # execute (confirm with 'yes')

5 - Configure Unity Catalog

Metastore, catalogs, schemas and permissions via Terraform
ℹ️ Unity Catalog requires a Premium SKU workspace. The metastore is created at account level (one per region). If you already have a metastore for the region, do not create a new one in step 5.1; only assign the existing metastore to the workspace.
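Since the metastore lives at account level, some setups add a second, aliased Databricks provider that points at the account console instead of a workspace. The sketch below shows the general shape under a few assumptions: the variable databricks_account_id and the group name are hypothetical, and the Service Principal behind the ARM_* variables is assumed to be an account admin in Databricks.

```hcl
# Assumption: the account ID is supplied via terraform.tfvars or TF_VAR_databricks_account_id
variable "databricks_account_id" {
  type        = string
  description = "Databricks account ID"
}

# Second, aliased provider that talks to the account console instead of a workspace
provider "databricks" {
  alias      = "account"
  host       = "https://accounts.azuredatabricks.net"
  account_id = var.databricks_account_id
}

# Hypothetical account-level group, to show how a resource selects the alias
resource "databricks_group" "data_platform_admins" {
  provider     = databricks.account
  display_name = "data-platform-admins"
}
```

Resources without a provider argument keep using the default, workspace-level provider from providers.tf.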

5.1 - Storage for Unity Catalog (ADLS Gen2)

terraform hcl
resource "azurerm_storage_account" "unity" {
  name                     = "stunity001"   # must be globally unique
  resource_group_name      = var.resource_group
  location                 = var.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
  is_hns_enabled           = true   # required for ADLS Gen2
  min_tls_version          = "TLS1_2"
}

resource "azurerm_storage_container" "unity" {
  name                  = "unity-catalog"
  storage_account_name  = azurerm_storage_account.unity.name
  container_access_type = "private"
}

resource "databricks_metastore" "main" {
  name          = "metastore-prod"
  storage_root  = "abfss://unity-catalog@${azurerm_storage_account.unity.name}.dfs.core.windows.net/"
  region        = var.location
  force_destroy = false
}

resource "databricks_metastore_assignment" "main" {
  metastore_id = databricks_metastore.main.id
  workspace_id = azurerm_databricks_workspace.main.workspace_id
}
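The configuration above creates the storage and the metastore, but does not yet give Unity Catalog an identity with which to reach that storage. A common way to close that gap is an Access Connector with a managed identity; the sketch below assumes that approach. The connector and credential names are hypothetical, and the role assignment requires the Terraform Service Principal to be allowed to create role assignments (Owner or User Access Administrator), which goes beyond the Contributor role from step 2.2.

```hcl
# Managed identity that Unity Catalog uses to reach the storage account
resource "azurerm_databricks_access_connector" "unity" {
  name                = "dbac-unity-prod" # hypothetical name
  resource_group_name = var.resource_group
  location            = var.location

  identity {
    type = "SystemAssigned"
  }
}

# Give that identity read/write access to the Unity Catalog storage account
resource "azurerm_role_assignment" "unity_storage" {
  scope                = azurerm_storage_account.unity.id
  role_definition_name = "Storage Blob Data Contributor"
  principal_id         = azurerm_databricks_access_connector.unity.identity[0].principal_id
}

# Register the connector as the default storage credential of the metastore
resource "databricks_metastore_data_access" "unity" {
  metastore_id = databricks_metastore.main.id
  name         = "unity-access-connector" # hypothetical name
  is_default   = true

  azure_managed_identity {
    access_connector_id = azurerm_databricks_access_connector.unity.id
  }

  depends_on = [
    databricks_metastore_assignment.main,
    azurerm_role_assignment.unity_storage,
  ]
}
```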

5.2 - Create catalogs and schemas

terraform hcl
resource "databricks_catalog" "bronze" {
  name       = "bronze"
  comment    = "Raw incoming data, unmodified"
  depends_on = [databricks_metastore_assignment.main]
}

resource "databricks_catalog" "silver" {
  name       = "silver"
  comment    = "Cleaned and validated data"
  depends_on = [databricks_metastore_assignment.main]
}

resource "databricks_catalog" "gold" {
  name       = "gold"
  comment    = "Business-ready data for reporting"
  depends_on = [databricks_metastore_assignment.main]
}

# Schemas per catalog
resource "databricks_schema" "bronze_raw" {
  catalog_name = databricks_catalog.bronze.name
  name         = "raw"
}

resource "databricks_schema" "silver_clean" {
  catalog_name = databricks_catalog.silver.name
  name         = "clean"
}

resource "databricks_schema" "gold_reporting" {
  catalog_name = databricks_catalog.gold.name
  name         = "reporting"
}
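This step also promises permissions, which the catalogs and schemas above do not yet have. A minimal sketch with databricks_grants could look as follows. The group names "data-engineers" and "data-analysts" are assumptions and must already exist; note that databricks_grants is authoritative for the object it manages, so grants made outside Terraform on the same catalog are overwritten.

```hcl
# Engineers can read and write raw data in bronze
resource "databricks_grants" "bronze" {
  catalog = databricks_catalog.bronze.name

  grant {
    principal  = "data-engineers" # hypothetical group
    privileges = ["USE_CATALOG", "USE_SCHEMA", "SELECT", "MODIFY", "CREATE_TABLE"]
  }
}

# Analysts get read-only access to gold; engineers maintain it
resource "databricks_grants" "gold" {
  catalog = databricks_catalog.gold.name

  grant {
    principal  = "data-engineers" # hypothetical group
    privileges = ["USE_CATALOG", "USE_SCHEMA", "SELECT", "MODIFY", "CREATE_TABLE"]
  }

  grant {
    principal  = "data-analysts" # hypothetical group
    privileges = ["USE_CATALOG", "USE_SCHEMA", "SELECT"]
  }
}
```

Privileges granted on a catalog are inherited by its schemas and tables, so separate schema-level grants are only needed for exceptions.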

6 - Clusters & policies via Terraform

Cluster policies, instance pools and service principals

6.1 - Cluster policy for jobs

terraform hcl
resource "databricks_cluster_policy" "jobs" {
  name = "Jobs Cluster Policy"

  definition = jsonencode({
    "spark_version" : {
      "type" : "allowlist",
      "values" : ["14.3.x-scala2.12", "15.4.x-scala2.12"],
      "defaultValue" : "15.4.x-scala2.12"
    },
    "node_type_id" : {
      "type" : "allowlist",
      "values" : ["Standard_DS3_v2", "Standard_DS4_v2"]
    },
    "autotermination_minutes" : {
      "type" : "fixed",
      "value" : 30,
      "hidden" : true
    },
    "data_security_mode" : {
      "type" : "fixed",
      "value" : "SINGLE_USER"
    }
  })
}
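The checklist in step 8 also mentions a policy for interactive clusters, which is not shown above. One possible shape is sketched below; the worker limit and the choice of shared access mode (USER_ISOLATION) are assumptions rather than values from this guide, while the runtime versions and node types are copied from the jobs policy.

```hcl
resource "databricks_cluster_policy" "interactive" {
  name = "Interactive Cluster Policy"

  definition = jsonencode({
    # Only all-purpose (interactive) clusters may use this policy
    "cluster_type" : { "type" : "fixed", "value" : "all-purpose" },
    "spark_version" : {
      "type" : "allowlist",
      "values" : ["14.3.x-scala2.12", "15.4.x-scala2.12"],
      "defaultValue" : "15.4.x-scala2.12"
    },
    "node_type_id" : {
      "type" : "allowlist",
      "values" : ["Standard_DS3_v2", "Standard_DS4_v2"]
    },
    # Users may pick a shorter timeout, but never more than 30 minutes
    "autotermination_minutes" : {
      "type" : "range",
      "minValue" : 10,
      "maxValue" : 30,
      "defaultValue" : 30
    },
    # Assumed cost guard: at most 4 workers when autoscaling
    "autoscale.max_workers" : { "type" : "range", "maxValue" : 4 },
    "data_security_mode" : { "type" : "fixed", "value" : "USER_ISOLATION" }
  })
}
```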

6.2 - Service Principal for pipelines

terraform hcl
resource "databricks_service_principal" "pipeline_sp" {
  application_id = "<aad-app-id>"
  display_name   = "sp-databricks-pipelines"
  active         = true
}

# Group referenced by the membership below; the display name is an assumption
resource "databricks_group" "data_engineers" {
  display_name = "data-engineers"
}

resource "databricks_group_member" "pipeline_sp_admin" {
  group_id  = databricks_group.data_engineers.id
  member_id = databricks_service_principal.pipeline_sp.id
}

7 - CI/CD pipeline with GitHub Actions

Automatic plan on PR, apply on merge to main

Set up GitHub Secrets

Add these secrets in GitHub → Settings → Secrets and variables → Actions:

Secret name | Value
ARM_CLIENT_ID | Service Principal appId
ARM_CLIENT_SECRET | Service Principal password
ARM_TENANT_ID | Azure tenant ID
ARM_SUBSCRIPTION_ID | Azure subscription ID

.github/workflows/terraform.yml

yaml - github actions
name: Terraform Databricks IaC

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

env:
  ARM_CLIENT_ID:       ${{ secrets.ARM_CLIENT_ID }}
  ARM_CLIENT_SECRET:   ${{ secrets.ARM_CLIENT_SECRET }}
  ARM_TENANT_ID:       ${{ secrets.ARM_TENANT_ID }}
  ARM_SUBSCRIPTION_ID: ${{ secrets.ARM_SUBSCRIPTION_ID }}

jobs:
  terraform:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.6.6

      - name: Terraform Init
        run: terraform init

      - name: Terraform Format Check
        run: terraform fmt -check

      - name: Terraform Validate
        run: terraform validate

      - name: Terraform Plan
        run: terraform plan -no-color
        if: github.event_name == 'pull_request'

      - name: Terraform Apply
        run: terraform apply -auto-approve
        if: github.ref == 'refs/heads/main' && github.event_name == 'push'
Add branch protection rules on main that require the Terraform job to pass before merging. This prevents a faulty config from going live.

8 - Full checklist

Print it out and tick off the items as you implement
Azure environment
  • Resource Group created
  • Service Principal created with Contributor rights
  • SP credentials stored in Key Vault
  • Storage Account for Terraform state created
  • ADLS Gen2 storage created for Unity Catalog
Terraform project
  • providers.tf configured (azurerm + databricks provider)
  • backend.tf configured with remote state
  • .gitignore set up (no .tfvars or state in Git)
  • terraform init ran successfully
  • terraform plan shows the expected resources
Databricks workspace
  • Workspace created with Premium SKU
  • terraform apply completed successfully
  • Workspace reachable via its URL
Unity Catalog
  • Metastore created (or existing one reused)
  • Metastore assigned to the workspace
  • Bronze / Silver / Gold catalogs created
  • Schemas created per catalog
  • Permissions set correctly
Clusters & security
  • Cluster policies created for jobs and interactive
  • Service Principal for pipelines configured
  • Autotermination configured (max 30 min)
CI/CD pipeline
  • GitHub Secrets configured (ARM_* variables)
  • GitHub Actions workflow created
  • Branch protection on main configured
  • Test PR created, plan runs successfully
  • Merged to main, apply runs successfully