Development

    • DEVELOPMENT
    • DEVELOPMENT
    • DEVELOPMENT
    • DEVELOPMENT
    • DEVELOPMENT
    • DEVELOPMENT
    • RESILIENCY
    • SECURITY
  • When pods need access to other Azure services, such as Cosmos DB, Key Vault, or Blob Storage, they need access credentials. These credentials could be defined in the container image or injected as a Kubernetes secret, but they must be manually created and assigned. Often, the credentials are reused across pods and aren't regularly rotated. Managed identities for Azure resources (currently implemented as an associated AKS open source project) let you automatically request access to services through Azure AD. You don't manually define credentials for pods; instead, they request an access token in real time and can use it to access only their assigned services.

    • SECURITY
    • SECURITY
    • RESOURCES
  • You should also disable mounting the service account token by default, by setting automountServiceAccountToken to false.
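
As a minimal sketch of the setting mentioned above, the following builds a Pod manifest with the token mount turned off and prints it as JSON (which kubectl also accepts). The helper name, pod name, and image are hypothetical placeholders.

```python
import json


def pod_without_token(name: str, image: str) -> dict:
    """Build a Pod manifest that does not mount the service account token."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Opt this pod out of the default service account token mount.
            "automountServiceAccountToken": False,
            "containers": [{"name": name, "image": image}],
        },
    }


if __name__ == "__main__":
    # "demo" and "example/app:1.0" are placeholder values for illustration.
    print(json.dumps(pod_without_token("demo", "example/app:1.0"), indent=2))
```

The same field can also be set on a ServiceAccount, so that every pod using that account inherits the behavior.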

    • RESOURCES
    • SECURITY
    • RESOURCES
    • SECURITY
    • SECURITY
    • SECURITY
  • Use a tool that can restrict builds with enough granularity not to break development. Not all Critical CVEs are the same: you might block builds on Critical or High vulnerabilities that have a vendor fix, but allow builds to continue when a Critical vulnerability is still 'Open'.
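
The gating rule described above could be sketched as follows. This is an illustration only, not the behavior of any particular scanner: the Vulnerability type, its fields, and the CVE identifiers are hypothetical, and 'Open' is assumed here to mean that no vendor fix is available yet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vulnerability:
    cve_id: str
    severity: str        # e.g. "Critical", "High", "Medium", "Low"
    fix_available: bool  # assumption: False corresponds to an 'Open' finding


# Severities that block a build, but only when a vendor fix exists.
BLOCKING_SEVERITIES = {"Critical", "High"}


def blocking_findings(findings):
    """Return the findings that should stop the build."""
    return [
        v for v in findings
        if v.severity in BLOCKING_SEVERITIES and v.fix_available
    ]


def build_allowed(findings) -> bool:
    """A build passes when no fixable Critical/High finding remains."""
    return not blocking_findings(findings)
```

With this rule, a Critical finding without a fix is reported but does not stop the pipeline, while the same finding with a published fix does.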

    • SECURITY
  • Identify an image that runs as 'root', or that opens port 80 or 22, before it gets deployed
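
A pre-deployment check of this kind might look like the sketch below. It assumes the input is the "Config" object from the JSON printed by `docker inspect` for an image (with its "User" and "ExposedPorts" keys); the function names and the risky-port set are illustrative choices.

```python
# Ports called out in the text above.
RISKY_PORTS = {22, 80}


def runs_as_root(config: dict) -> bool:
    """An empty user, "0", or "root" means the container runs as root."""
    user = (config.get("User") or "").split(":")[0]
    return user in ("", "0", "root")


def risky_exposed_ports(config: dict) -> list:
    """Return exposed ports that are in RISKY_PORTS, sorted ascending."""
    ports = []
    for spec in (config.get("ExposedPorts") or {}):
        port = int(spec.split("/")[0])  # keys look like "80/tcp"
        if port in RISKY_PORTS:
            ports.append(port)
    return sorted(ports)


def image_findings(config: dict) -> list:
    """Collect human-readable findings for one image config."""
    findings = []
    if runs_as_root(config):
        findings.append("runs as root")
    for port in risky_exposed_ports(config):
        findings.append(f"exposes port {port}")
    return findings
```

A pipeline step could fail the deployment whenever `image_findings` returns a non-empty list.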

    • SECURITY
    • DEVELOPMENT
    • DEVELOPMENT
    • DEVELOPMENT
    • RESILIENCY
    • DEVELOPMENT
    • RESILIENCY

Image management

    • SECURITY
    • SECURITY
    • SECURITY
  • At build time, the image is checked against the threshold you set, but a new issue may be discovered later while the image sits in the registry. You need to ensure that the image cannot be deployed until the issue is remediated.

    • SECURITY
    • SECURITY
    • NETWORK
    • SECURITY
  • 'Distroless' images are bare-bones versions of common base images. They contain the bare minimum needed to execute a binary. The shell and other developer utilities have been removed so that if an attacker gains control of your container, they can't do much of anything.

    • SECURITY

Cluster setup

        • SECURITY
        • SECURITY
        • SECURITY
        • SECURITY
        • OPERATION
        • SECURITY
        • OPERATION
        • OPERATION
        • SECURITY
      • By using a private cluster, you can ensure that network traffic between your API server and your node pools remains on the private network only. Because the API server has a private address, you need to set up a private connection to access it for administration or deployment, such as a 'jumpbox' (e.g., Azure Bastion).

        • SECURITY
      • You can enable the cluster autoscaler per node pool, but you can define only one autoscaler profile, which is shared by all node pools.

        • RESOURCES
        • FINOPS
      • This is a trade-off: on one hand, easier management but a bigger blast radius; on the other, higher replication and lower impact but poorer resource optimization.

        • RESOURCES
        • RESOURCES
      • It can be done with the following CLI: az aks disable-addons --addons kube-dashboard --resource-group RG_NAME --name CLUSTER_NAME

          • SECURITY
        • For AKS to pull images from Azure Container Registry (ACR), it needs the ACR credentials, including the password. To avoid saving the password in the cluster, you can activate the ACR integration on a new or existing AKS cluster using a service principal (SPN) or a managed identity.

          • SECURITY
        • Be careful: by using a proximity placement group (PPG) on a node pool, you reduce the average SLA of your application, since its nodes no longer rely on availability zones.

          • SECURITY
          • SECURITY
        • If you have multiple AKS clusters in different regions, use Traffic Manager to control how traffic flows to the applications that run in each cluster. Azure Traffic Manager is a DNS-based traffic load balancer that can distribute network traffic across regions. Use Traffic Manager to route users based on cluster response time or based on geography. It can be used to improve app availability with automatic failover

          • NETWORK
          • RESILIENCY
          • MULTI-TENANCY
        • As a best practice, placing a container registry in each region where images are run allows network-close operations, enabling fast, reliable image layer transfers. Geo-replication enables an Azure container registry to function as a single registry, serving multiple regions with multi-master regional registries.

          • RESILIENCY

        Disaster Recovery

          • DEVOPS
          • RESILIENCY
          • RESILIENCY
          • RESILIENCY
          • NETWORK
          • RESILIENCY
          • STORAGE
        • Customers who need an SLA to meet compliance requirements, or who need to extend an SLA to their end users, should enable this feature. Customers with critical workloads may also benefit from a higher uptime SLA. Using the Uptime SLA feature with Availability Zones enables higher availability for the Kubernetes API server.

          • RESILIENCY
          • RESILIENCY

        Storage

          • STORAGE
        • Different types and sizes of nodes are available. Each node (underlying VM) size provides a different amount of core resources such as CPU and memory. These VM sizes have a maximum number of disks that can be attached. Storage performance also varies between VM sizes for the maximum local and attached disk IOPS (input/output operations per second). If your applications require Azure Disks as their storage solution, plan for and choose an appropriate node VM size. The amount of CPU and memory isn't the only factor when you choose a VM size. The storage capabilities are also important.

          • STORAGE
          • STORAGE
        • Understand the limitations of the different approaches to data backups, and whether you need to quiesce your data prior to a snapshot. Data backups don't necessarily let you restore your application environment or cluster deployment.

          • STORAGE
          • RESILIENCY
        • Service state refers to the in-memory or on-disk data that a service requires to function. State includes the data structures and member variables that the service reads and writes. Depending on how the service is architected, the state might also include files or other resources that are stored on the disk. For example, the state might include the files a database uses to store data and transaction logs.

          • STORAGE
          • RESILIENCY

        Network

        • While Kubenet is the default Kubernetes network plugin, the Container Networking Interface (CNI) is a vendor-neutral protocol that lets the container runtime make requests to a network provider. The Azure CNI assigns IP addresses to pods and nodes, and provides IP address management (IPAM) features as you connect to existing Azure virtual networks. Each node and pod resource receives an IP address in the Azure virtual network, and no additional routing is needed to communicate with other resources or services.

          • NETWORK
        • As an example, with Azure CNI you need one IP address for each node, plus one spare for the extra node created during a cluster upgrade, and one IP address for each pod, which can add up to hundreds of IP addresses.
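
The arithmetic above can be sketched as a small helper. It assumes that every node, including the spare upgrade node, reserves addresses for its maximum pod count; the function name and the sample values (50 nodes, 30 pods per node) are illustrative only.

```python
def required_ips(node_count: int, max_pods_per_node: int) -> int:
    """Estimate the IP addresses a node pool needs under the stated assumption."""
    # One extra node is created temporarily during a cluster upgrade.
    nodes_with_spare = node_count + 1
    # One address per node, plus one per potential pod on each node.
    return nodes_with_spare + nodes_with_spare * max_pods_per_node


if __name__ == "__main__":
    # 51 node addresses + 51 * 30 pod addresses = 1581
    print(required_ips(node_count=50, max_pods_per_node=30))
```

Sizing the subnet from this number up front avoids running out of addresses when the cluster scales or upgrades.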

          • NETWORK
          • NETWORK
          • NETWORK
          • NETWORK
          • SECURITY
          • NETWORK
          • SECURITY
        • Network policy is a Kubernetes feature that lets you control the traffic flow between pods. You can choose to allow or deny traffic based on settings such as assigned labels, namespace, or traffic port. Network policies give you a cloud-native way to control the flow of traffic. As pods are dynamically created in an AKS cluster, the required network policies can be automatically applied. Don't use Azure network security groups to control pod-to-pod traffic; use network policies instead.
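
For illustration, a label-based policy of the kind described above could be generated like this. The helper builds a standard networking.k8s.io/v1 NetworkPolicy as a dictionary; the helper name, the "app" label key, and the sample values are assumptions made for the example.

```python
import json


def allow_from_label(namespace: str, target_app: str,
                     source_app: str, port: int) -> dict:
    """Allow ingress to target_app pods only from source_app pods on one TCP port."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {
            "name": f"allow-{source_app}-to-{target_app}",
            "namespace": namespace,
        },
        "spec": {
            # Pods this policy applies to.
            "podSelector": {"matchLabels": {"app": target_app}},
            "policyTypes": ["Ingress"],
            "ingress": [{
                # Only pods carrying the source label may connect.
                "from": [{"podSelector": {"matchLabels": {"app": source_app}}}],
                "ports": [{"protocol": "TCP", "port": port}],
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(allow_from_label("shop", "backend", "frontend", 8080), indent=2))
```

Because the selectors match labels rather than addresses, the policy keeps applying as pods are created and destroyed.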

          • NETWORK
          • MULTI-TENANCY
          • SECURITY
          • NETWORK
          • NETWORK
          • SECURITY
          • NETWORK
          • SECURITY

        Resource Management

        • Resource requests and limits are placed in the pod specification. The Kubernetes scheduler uses the requests at deployment time to find an available node in the cluster. Developers can forget to set them, however, and their pods can then affect other applications by over-consuming cluster resources.
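
Since forgotten requests and limits are the risk named above, a simple check could flag them before deployment. The sketch below inspects a pod spec given as a dictionary (the usual `spec.containers[].resources` layout); the function name and the decision to require both CPU and memory for requests and limits are assumptions.

```python
# Every container is expected to declare these four values.
REQUIRED = (
    ("requests", "cpu"),
    ("requests", "memory"),
    ("limits", "cpu"),
    ("limits", "memory"),
)


def missing_resources(pod_spec: dict) -> dict:
    """Map each container name to the resource settings it does not declare."""
    problems = {}
    for container in pod_spec.get("containers", []):
        resources = container.get("resources") or {}
        missing = [
            f"{kind}.{res}"
            for kind, res in REQUIRED
            if res not in (resources.get(kind) or {})
        ]
        if missing:
            problems[container["name"]] = missing
    return problems
```

An empty result means every container declares all four values; anything else can be reported back to the developer.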

          • RESILIENCY
          • RESOURCE MANAGEMENT
          • MULTI-TENANCY
          • RESILIENCY
          • RESOURCE MANAGEMENT
          • MULTI-TENANCY
        • When you specify limits for CPU and memory, each takes a different action when it reaches the specified limit. With CPU limits, the container is throttled from using more than its specified limit. With memory limits, the pod is restarted if it reaches its limit. The pod might be restarted on the same host or a different host within the cluster.

          • RESILIENCY
          • RESOURCE MANAGEMENT
          • MULTI-TENANCY
        • At some point in time, Kubernetes might need to evict pods from a host. There are two types of evictions: voluntary and involuntary disruptions. Involuntary disruptions can be caused by hardware failure, network partitions, kernel panics, or a node being out of resources. Voluntary evictions can be caused by performing maintenance on the cluster, the Cluster Autoscaler deallocating nodes, or updating pod templates. To minimize the impact on your application, you can set a PodDisruptionBudget to ensure uptime of the application when pods need to be evicted. A PodDisruptionBudget lets you set a policy on the minimum number of available pods, or the maximum number of unavailable pods, during voluntary eviction events. An example of a voluntary eviction is draining a node to perform maintenance on it.
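
A PodDisruptionBudget using the minimum-available form could be sketched as below, together with the simple arithmetic for how many voluntary evictions it leaves room for. The helper names and the "app" label key are hypothetical; the manifest uses the policy/v1 API.

```python
def pod_disruption_budget(name: str, app_label: str, min_available: int) -> dict:
    """Build a PodDisruptionBudget that keeps min_available pods running."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name},
        "spec": {
            "minAvailable": min_available,
            "selector": {"matchLabels": {"app": app_label}},
        },
    }


def allowed_disruptions(healthy_pods: int, min_available: int) -> int:
    """Number of pods that may be voluntarily evicted right now."""
    return max(0, healthy_pods - min_available)
```

For example, with three healthy replicas and a minimum of two, a node drain may evict one pod at a time; with only two healthy replicas the drain has to wait.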

          • RESILIENCY
          • RESOURCE MANAGEMENT
          • RESILIENCY
          • RESOURCE MANAGEMENT

        Windows

          • WINDOWS
          • WINDOWS
          • WINDOWS
          • WINDOWS
          • WINDOWS
          • WINDOWS
          • WINDOWS

        Cluster Maintenance

          • OPERATION
          • SECURITY
        • Using automation and this method ensures that all your nodes are consistently up to date with the latest features, fixes, and patches, without having to upgrade the Kubernetes version. An alternative is to use Kured to reboot nodes with pending reboots, but that only patches the operating system, not the AKS layer.

          • OPERATION
          • SECURITY
          • WINDOWS
          • OPERATION
          • SECURITY
          • OPERATION
          • OPERATION
          • OPERATION
          • OPERATION
          • SECURITY
        • Typically, to use Prometheus, you need to set up and manage a Prometheus server with a store. By integrating with Azure Monitor, a Prometheus server is not required. You just need to expose the Prometheus metrics endpoint through your exporters or pods (application), and the containerized agent for Azure Monitor for containers can scrape the metrics for you.
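
As a rough illustration of what a scraper reads from a metrics endpoint, the sketch below renders values in the Prometheus text exposition format. The helper and the metric name are hypothetical, and a real application would normally use a Prometheus client library and serve this text over HTTP rather than format it by hand.

```python
def render_metrics(metrics: dict) -> str:
    """Render {name: (type, value)} pairs as Prometheus exposition text."""
    lines = []
    for name, (metric_type, value) in sorted(metrics.items()):
        # A TYPE comment line precedes each metric's sample line.
        lines.append(f"# TYPE {name} {metric_type}")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "app_requests_total" is a placeholder metric name.
    print(render_metrics({"app_requests_total": ("counter", 3)}), end="")
```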

          • OPERATION
          • OPERATION
          • OPERATION
          • OPERATION
          • OPERATION
          • OPERATION
          • SECURITY
        • On Azure, you can for instance use Azure Arc for Kubernetes, or use the GitOps add-on for AKS directly.

          • OPERATION
          • OPERATION
          • OPERATION
          • NETWORK
          • MULTI-TENANCY

        Report and navigation

        • 0/33 ✓ high priority
        • 0/46 ✓ medium priority
        • 0/20 ✓ low priority