Cluster operator and developer best practices to build and manage applications on Azure Kubernetes Service (AKS)
Building and running applications successfully in Azure Kubernetes Service (AKS) requires understanding and implementation of some key concepts, including:
- Multi-tenancy and scheduler features.
- Cluster and pod security.
- Business continuity and disaster recovery.
The AKS product group, engineering teams, and field teams (including global black belts (GBBs)) contributed to, wrote, and grouped the following best practices and conceptual articles. Their purpose is to help cluster operators and developers better understand the concepts above and implement the appropriate features.
Cluster operator best practices
If you're a cluster operator, work with application owners and developers to understand their needs. Then, you can use the following best practices to configure your AKS clusters to fit your needs.
An important practice that you should include as part of your application development and deployment process is remembering to follow commonly used deployment and testing patterns. Testing your application before deployment is an important step to ensure its quality, functionality, and compatibility with the target environment. It can help you identify and fix any errors, bugs, or issues that might affect the performance, security, or usability of the application or underlying infrastructure.
Multi-tenancy
- Best practices for cluster isolation
- Includes multi-tenancy core components and logical isolation with namespaces.
- Best practices for basic scheduler features
- Includes using resource quotas and pod disruption budgets.
- Best practices for advanced scheduler features
- Includes using taints and tolerations, node selectors and affinity, and inter-pod affinity and anti-affinity.
- Best practices for authentication and authorization
- Includes integration with Microsoft Entra ID, using Kubernetes role-based access control (Kubernetes RBAC), using Azure RBAC, and pod identities.
Security
- Best practices for cluster security and upgrades
- Includes securing access to the API server, limiting container access, and managing upgrades and node reboots.
- Best practices for container image management and security
- Includes securing the image and runtimes and automated builds on base image updates.
- Best practices for pod security
- Includes securing access to resources, limiting credential exposure, and using pod identities and digital key vaults.
Network and storage
- Best practices for network connectivity
- Includes different network models, using ingress and web application firewalls (WAF), and securing node SSH access.
- Best practices for storage and backups
- Includes choosing the appropriate storage type and node size, dynamically provisioning volumes, and data backups.
Running enterprise-ready workloads
- Best practices for business continuity and disaster recovery
- Includes using region pairs, multiple clusters with Azure Traffic Manager, and geo-replication of container images.
Developer best practices
If you're a developer or application owner, you can simplify your development experience and define required application performance features.
- Best practices for application developers to manage resources
- Includes defining pod resource requests and limits, configuring development tools, and checking for application issues.
- Best practices for pod security
- Includes securing access to resources, limiting credential exposure, and using pod identities and digital key vaults.
- Best practices for deployment and cluster reliability
- Includes deployment, cluster, and node pool level best practices.
Kubernetes and AKS concepts
The following conceptual articles cover some of the fundamental features and components for clusters in AKS: