Fully automated creation of an AAD-integrated Kubernetes cluster with Terraform

Introduction

To run your Kubernetes cluster in Azure integrated with Azure Active Directory as your identity provider is a best practice in terms of security and compliance. You can give (and remove – when people are leaving your organisation) fine-grained permissions to your team members, to resources and/or namespaces as they need them. Sounds good? Well, you have to do a lot of manual steps to create such a cluster. If you don’t believe me, follow the official documentation 🙂 https://docs.microsoft.com/en-us/azure/aks/azure-ad-integration.

So, we developers are known to be lazy folks…then how can this automatically be achieved e.g. with Terraform (which is one of the most popular tools out there to automate the creation/management of your cloud resources)? It took me a while to figure out, but here’s a working example how to create an AAD integrated AKS cluster with “near-zero” manual work.

The rest of this blog post will guide you through the complete Terraform script which can be found on my GitHub account.

Create the cluster

To work with Terraform (TF), it is best-practice to store the Terraform state not on you workstation as other team members also need the state-information to be able to work on the same environment. So, first…let’s create a storage account in your Azure subscription to store the TF state.

Basic setup

With the commands below, we will be creating a resource group in Azure, a basic storage account and a corresponding container where the TF state will be put in.

# Resource Group

$ az group create --name tf-rg --location westeurope

# Storage Account

$ az storage account create -n tfstatestac -g tf-rg --sku Standard_LRS

# Storage Account Container

$ az storage container create -n tfstate --account-name tfstatestac --account-key `az storage account keys list -n tfstatestac -g tf-rg --query "[0].value" -otsv`

Terraform Providers + Resource Group

Of course, we need a few Terraform providers for our example. First and foremost, we need the Azure and also the Azure Active Directory resource providers.

One of the first things we need is – as always in Azure – a resource group where we will be the deploying our AKS cluster to.

provider "azurerm" {
  version = "=1.38.0"
}

provider "azuread" {
  version = "~> 0.3"
}

terraform {
  backend "azurerm" {
    resource_group_name  = "tf-rg"
    storage_account_name = "tfstatestac"
    container_name       = "tfstate"
    key                  = "org.terraform.tfstate"
  }
}

data "azurerm_subscription" "current" {}

# Resource Group creation
resource "azurerm_resource_group" "k8s" {
  name     = "${var.rg-name}"
  location = "${var.location}"
}

AAD Applications for K8s server / client components

To be able to integrate AKS with Azure Active Directory, we need to register two applications in the directory. The first AAD application is the server component (Kubernetes API) that provides user authentication. The second application is the client component (e.g. kubectl) that’s used when you’re prompted by the CLI for authentication.

We will assign certain permissions to these two applications, that need “admin consent”. Therefore, the Terraform script needs to be executed by someone who is able to grant that for the whole AAD.

# AAD K8s Backend App

resource "azuread_application" "aks-aad-srv" {
  name                       = "${var.clustername}srv"
  homepage                   = "https://${var.clustername}srv"
  identifier_uris            = ["https://${var.clustername}srv"]
  reply_urls                 = ["https://${var.clustername}srv"]
  type                       = "webapp/api"
  group_membership_claims    = "All"
  available_to_other_tenants = false
  oauth2_allow_implicit_flow = false
  required_resource_access {
    resource_app_id = "00000003-0000-0000-c000-000000000000"
    resource_access {
      id   = "7ab1d382-f21e-4acd-a863-ba3e13f7da61"
      type = "Role"
    }
    resource_access {
      id   = "06da0dbc-49e2-44d2-8312-53f166ab848a"
      type = "Scope"
    }
    resource_access {
      id   = "e1fe6dd8-ba31-4d61-89e7-88639da4683d"
      type = "Scope"
    }
  }
  required_resource_access {
    resource_app_id = "00000002-0000-0000-c000-000000000000"
    resource_access {
      id   = "311a71cc-e848-46a1-bdf8-97ff7156d8e6"
      type = "Scope"
    }
  }
}

resource "azuread_service_principal" "aks-aad-srv" {
  application_id = "${azuread_application.aks-aad-srv.application_id}"
}

resource "random_password" "aks-aad-srv" {
  length  = 16
  special = true
}

resource "azuread_application_password" "aks-aad-srv" {
  application_object_id = "${azuread_application.aks-aad-srv.object_id}"
  value                 = "${random_password.aks-aad-srv.result}"
  end_date              = "2024-01-01T01:02:03Z"
}

# AAD AKS kubectl app

resource "azuread_application" "aks-aad-client" {
  name       = "${var.clustername}client"
  homepage   = "https://${var.clustername}client"
  reply_urls = ["https://${var.clustername}client"]
  type       = "native"
  required_resource_access {
    resource_app_id = "${azuread_application.aks-aad-srv.application_id}"
    resource_access {
      id   = "${azuread_application.aks-aad-srv.oauth2_permissions.0.id}"
      type = "Scope"
    }
  }
}

resource "azuread_service_principal" "aks-aad-client" {
  application_id = "${azuread_application.aks-aad-client.application_id}"
}

The important parts regarding the permissions are highlighted above. If you wonder what these “magic permission GUIDs” stand for, here’s a list of what will be assigned.

Microsoft Graph (AppId: 00000003-0000-0000-c000-000000000000) Persmissions

GUIDPermission
7ab1d382-f21e-4acd-a863-ba3e13f7da61Read directory data (Application Permission)
06da0dbc-49e2-44d2-8312-53f166ab848aRead directory data (Delegated Permission)
e1fe6dd8-ba31-4d61-89e7-88639da4683dSign in and read user profile

Windows Azure Active Directory (AppId: 00000002-0000-0000-c000-000000000000) Permissions

GUIDPermission
311a71cc-e848-46a1-bdf8-97ff7156d8e6Sign in and read user profile

After a successful run of the Terraform script, it will look like that in the portal.

AAD applications
Server app permissions

By the way, you can query the permissions of the applications (MS Graph/Azure Active Directory) mentioned above. Here’s a quick sample for one of the MS Graph permissions:

$ az ad sp show --id 00000003-0000-0000-c000-000000000000 | grep -A 6 -B 3 06da0dbc-49e2-44d2-8312-53f166ab848a
    
{
      "adminConsentDescription": "Allows the app to read data in your organization's directory, such as users, groups and apps.",
      "adminConsentDisplayName": "Read directory data",
      "id": "06da0dbc-49e2-44d2-8312-53f166ab848a",
      "isEnabled": true,
      "type": "Admin",
      "userConsentDescription": "Allows the app to read data in your organization's directory.",
      "userConsentDisplayName": "Read directory data",
      "value": "Directory.Read.All"
}

Cluster Admin AAD Group

Now that we have the script for the applications we need to integrate our cluster with Azure Active Directory, let’s also add a default AAD group for our cluster admins.

# AAD K8s cluster admin group / AAD

resource "azuread_group" "aks-aad-clusteradmins" {
  name = "${var.clustername}clusteradmin"
}

Service Principal for AKS Cluster

Last but not least, before we can finally create the Kubernetes cluster, a service principal is required. That’s basically the technical user Kubernetes uses to interact with Azure (e.g. acquire a public IP at the Azure load balancer). We will assign the role “Contributor” (for the whole subscription – please adjust to your needs!) to that service principal.

# Service Principal for AKS

resource "azuread_application" "aks_sp" {
  name                       = "${var.clustername}"
  homepage                   = "https://${var.clustername}"
  identifier_uris            = ["https://${var.clustername}"]
  reply_urls                 = ["https://${var.clustername}"]
  available_to_other_tenants = false
  oauth2_allow_implicit_flow = false
}

resource "azuread_service_principal" "aks_sp" {
  application_id = "${azuread_application.aks_sp.application_id}"
}

resource "random_password" "aks_sp_pwd" {
  length  = 16
  special = true
}

resource "azuread_service_principal_password" "aks_sp_pwd" {
  service_principal_id = "${azuread_service_principal.aks_sp.id}"
  value                = "${random_password.aks_sp_pwd.result}"
  end_date             = "2024-01-01T01:02:03Z"
}

resource "azurerm_role_assignment" "aks_sp_role_assignment" {
  scope                = "${data.azurerm_subscription.current.id}"
  role_definition_name = "Contributor"
  principal_id         = "${azuread_service_principal.aks_sp.id}"

  depends_on = [
    azuread_service_principal_password.aks_sp_pwd
  ]
}

Create the AKS cluster

Everything is now ready for the provisioning of the cluster. But hey, we created the AAD applications, but haven’t granted admin consent?! We can also do this via our Terrform script and that’s what we will be doing before finally creating the cluster.

Azure is sometimes a bit too fast in sending a 200 and signalling that a resource is ready. In the background, not all services have already access to e.g. newly created applications. So it happens, that things fail although they shouldn’t 🙂 Therefore, we simply wait a few seconds and give AAD time to distribute application information, before kicking off the cluster creation.

# K8s cluster

# Before giving consent, wait. Sometimes Azure returns a 200, but not all services have access to the newly created applications/services.

resource "null_resource" "delay_before_consent" {
  provisioner "local-exec" {
    command = "sleep 60"
  }
  depends_on = [
    azuread_service_principal.aks-aad-srv,
    azuread_service_principal.aks-aad-client
  ]
}

# Give admin consent - SP/az login user must be AAD admin

resource "null_resource" "grant_srv_admin_constent" {
  provisioner "local-exec" {
    command = "az ad app permission admin-consent --id ${azuread_application.aks-aad-srv.application_id}"
  }
  depends_on = [
    null_resource.delay_before_consent
  ]
}
resource "null_resource" "grant_client_admin_constent" {
  provisioner "local-exec" {
    command = "az ad app permission admin-consent --id ${azuread_application.aks-aad-client.application_id}"
  }
  depends_on = [
    null_resource.delay_before_consent
  ]
}

# Again, wait for a few seconds...

resource "null_resource" "delay" {
  provisioner "local-exec" {
    command = "sleep 60"
  }
  depends_on = [
    null_resource.grant_srv_admin_constent,
    null_resource.grant_client_admin_constent
  ]
}

# Create the cluster

resource "azurerm_kubernetes_cluster" "aks" {
  name                = "${var.clustername}"
  location            = "${var.location}"
  resource_group_name = "${var.rg-name}"
  dns_prefix          = "${var.clustername}"

  default_node_pool {
    name            = "default"
    type            = "VirtualMachineScaleSets"
    node_count      = 2
    vm_size         = "Standard_B2s"
    os_disk_size_gb = 30
    max_pods        = 50
  }
  service_principal {
    client_id     = "${azuread_application.aks_sp.application_id}"
    client_secret = "${random_password.aks_sp_pwd.result}"
  }
  role_based_access_control {
    azure_active_directory {
      client_app_id     = "${azuread_application.aks-aad-client.application_id}"
      server_app_id     = "${azuread_application.aks-aad-srv.application_id}"
      server_app_secret = "${random_password.aks-aad-srv.result}"
      tenant_id         = "${data.azurerm_subscription.current.tenant_id}"
    }
    enabled = true
  }
  depends_on = [
    azurerm_role_assignment.aks_sp_role_assignment,
    azuread_service_principal_password.aks_sp_pwd
  ]
}

Assign the AAD admin group to be cluster-admin

When the cluster is finally created, we need to assign the Kubernetes cluster role cluster-admin to our AAD cluster admin group. We simply get access to the Kubernetes cluster by adding the Kubernetes Terraform provider. Because we already have a working integration with AAD, we need to use the admin credentials of our cluster! But that will be the last time, we will ever need them again.

To be able to use the admin credentials, we point the Kubernetes provider to use kube_admin_config which is automatically provided for us.

In the last step, we bind the cluster role to the fore-mentioned AAD cluster group id.

# Role assignment

# Use ADMIN credentials
provider "kubernetes" {
  host                   = "${azurerm_kubernetes_cluster.aks.kube_admin_config.0.host}"
  client_certificate     = "${base64decode(azurerm_kubernetes_cluster.aks.kube_admin_config.0.client_certificate)}"
  client_key             = "${base64decode(azurerm_kubernetes_cluster.aks.kube_admin_config.0.client_key)}"
  cluster_ca_certificate = "${base64decode(azurerm_kubernetes_cluster.aks.kube_admin_config.0.cluster_ca_certificate)}"
}

# Cluster role binding to AAD group

resource "kubernetes_cluster_role_binding" "aad_integration" {
  metadata {
    name = "${var.clustername}admins"
  }
  role_ref {
    api_group = "rbac.authorization.k8s.io"
    kind      = "ClusterRole"
    name      = "cluster-admin"
  }
  subject {
    kind = "Group"
    name = "${azuread_group.aks-aad-clusteradmins.id}"
  }
  depends_on = [
    azurerm_kubernetes_cluster.aks
  ]
}

Run the Terraform script

We now have discussed all the relevant parts of the script, it’s time to let the Terraform magic happen 🙂 Run the script via…

$ terraform init

# ...and then...

$ terraform apply

Access the Cluster

When the script has finished, it’s time to access the cluster and try to logon. First, let’s do the “negativ check” and try to access it without having been added as cluster admin (AAD group member).

After downloading the user credentials and querying the cluster nodes, the OAuth 2.0 Device Authorization Grant flow kicks in and we need to authenticate against our Azure directory (as you might know it from logging in with Azure CLI).

$ az aks get-credentials --resource-group <RESOURCE_GROUP> -n <CLUSTER_NAME>

$ kubectl get nodes
To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code DP9JA76WS to authenticate.

Error from server (Forbidden): nodes is forbidden: User "593736cb-1f95-4f23-bfbd-75891886b05f" cannot list resource "nodes" in API group "" at the cluster scope

Great, we get the expected authorization error!

Now add a user from the Azure Active Directory to the AAD admin group in the portal. Navigate to “Azure Active Directory” –> “Groups” and select your cluster-admin group. On the left navigation, select “Members” and add e.g. your own Azure user.

Now go back to the command line and try again. One last time, download the user credentials with az aks get-credentials (it will simply overwrite the former entry in you .kubeconfig to make sure we get the latest information from AAD).

$ az aks get-credentials --resource-group <RESOURCE_GROUP> -n <CLUSTER_NAME>

$ kubectl get nodes
To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code ASGRA765S to authenticate.

NAME                              STATUS   ROLES   AGE   VERSION
aks-default-41331054-vmss000000   Ready    agent   18m   v1.13.12
aks-default-41331054-vmss000001   Ready    agent   18m   v1.13.12

Wrap Up

So, that’s all we wanted to achieve! We have created an AKS cluster with fully-automated Azure Active Directory integration, added a default AAD group for our Kubernetes admins and bound it to the “cluster-admin” role of Kubernetes – all done by a Terraform script which can now be integrated with you CI/CD pipeline to create compliant and AAD-secured AKS clusters (as many as you want ;)).

Well, we also could have added a user to the admin group, but that’s the only manual step in our scenario…but hey, you would have needed to do it anyway 🙂

You can find the complete script including the variables.tf file on my Github account. Feel free to use it in your own projects.

House-Keeping

To remove all of the provisioned resources (service principals, AAD groups, Kubernetes service, storage accounts etc.) simply…

$ terraform destroy

# ...and then...

$ az group delete -n tf-rg

Author: Christian Dennig

developer, html5, js / nodejs, c#, xamarin. software architect. cloud enthusiast...I work at Microsoft, opinions are totally my own.

2 thoughts

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.