Curated documentation updates, feature announcements, community blogs, release highlights, and more.
February continued with a strong focus on AI/ML workloads, VM convergence, and platform security hardening. This edition covers new AKS Engineering Blog posts on scaling Ray workloads, deploying KubeVirt, and autoscaling KAITO inference with KEDA. The release notes bring meaningful behavioral changes including LocalDNS defaulting on for Kubernetes 1.35+, new security annotations on nodes, and important Windows Server retirement timelines. Documentation updates span storage, networking, identity bindings, and Agentic CLI support.
Let's dive in.
Instant access snapshot support for Premium SSD v2 and Ultra Disk: New documentation covers how to use instant access snapshots with Premium SSD v2 and Ultra Disk in AKS. This enables faster volume snapshot and restore operations for performance-critical stateful workloads.
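The snapshot workflow uses the standard CSI snapshot API. The sketch below shows the general shape, assuming the Azure Disk CSI driver; the class and PVC names are placeholders, and the exact parameters that control instant access behavior are in the linked documentation, not shown here.

```yaml
# Sketch only: csi-azuredisk-vsc and pvc-premiumv2-data are placeholder names.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-azuredisk-vsc
driver: disk.csi.azure.com
deletionPolicy: Delete
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: data-snapshot
spec:
  volumeSnapshotClassName: csi-azuredisk-vsc
  source:
    persistentVolumeClaimName: pvc-premiumv2-data   # a Premium SSD v2 or Ultra Disk PVC
```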
NVMe storage best practices for AKS: The documentation for emptyDir volumes was updated to clarify NVMe-based usage patterns and best practices. This is relevant for teams using NVMe-backed local storage for ephemeral or scratch workloads.
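For scratch workloads on NVMe-backed nodes, the typical pattern is an emptyDir volume with a size limit so the kubelet evicts the pod before the local disk fills. A minimal sketch, assuming a node pool named nvmepool backed by an NVMe-capable VM size (both names are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: scratch-worker
spec:
  nodeSelector:
    kubernetes.azure.com/agentpool: nvmepool   # placeholder: NVMe-backed node pool
  containers:
    - name: worker
      image: mcr.microsoft.com/azurelinux/base/core:3.0
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: scratch
          mountPath: /scratch
  volumes:
    - name: scratch
      emptyDir:
        sizeLimit: 100Gi   # cap usage so eviction triggers before the disk fills
```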
Container comparison article added to AKS docs: A new article comparing container options was added to the AKS documentation. It helps teams evaluate Azure Container Apps, AKS, and other container hosting options side by side.
NPM retirement documentation: Documentation was updated to reflect the retirement of Azure Network Policy Manager (NPM). Teams relying on NPM should plan their migration to Cilium-based or Calico-based network policies.
Ubuntu 24.04 containerd 2.0 and version updates: The Ubuntu 24.04 documentation was refreshed with containerd 2.0 details and version consistency fixes. This is important for teams planning OS upgrades.
Identity bindings updates for .NET, Go, and JavaScript: The identity bindings documentation was expanded with language-specific guidance for Go, JavaScript, and .NET. This makes it easier for application developers to adopt the new identity bindings model.
LocalDNS Preferred mode and node reimage impact: The LocalDNS documentation now clarifies Preferred mode behavior and the impact of node reimage on DNS configuration. This is important context given that LocalDNS now defaults to enabled on Kubernetes 1.35+.
Node auto-provisioning disk encryption and metrics: Documentation was updated to cover disk encryption support and metrics for node auto-provisioning (NAP). This fills a gap for security-conscious teams using NAP.
AKS MCP (Model Context Protocol) documentation: New documentation introduces the AKS MCP server and Agentic CLI. This enables AI-assisted cluster management and troubleshooting through natural language interfaces.
Managed Gateway API content updates: The Managed Gateway API documentation received a content development review and editorial updates, including coverage of the new ability to enable Gateway API CRDs without first installing a supported gateway implementation (see the release notes below).

ACStor v2 documentation: The Azure Container Storage documentation was updated to reflect ACStor v2. This brings updated guidance for storage pool management and volume provisioning.
Windows Annual Channel retirement notice: A retirement notice was added for Windows Server Annual Channel (Preview), scheduled for May 15, 2026. Teams using Annual Channel should migrate to LTSC.
API Server VNET Integration – new regions: API Server VNET Integration is now available in eastus2, eastus3, and belgiumcentral. This expands the geographic availability for teams requiring private API server access through VNET integration.
HTTP Proxy and Custom CA support in NAP clusters: HTTP Proxy and Custom Certificate Authority (CA) are now supported in Node Auto-Provisioning (NAP) enabled clusters. This removes a gap for enterprises requiring proxy-based egress control and custom certificate chains.
LocalDNS enabled by default on Kubernetes 1.35+: LocalDNS is now enabled by default for clusters running Kubernetes 1.35 or newer. This is a significant operational change — teams should validate DNS resolution behavior before upgrading to 1.35.
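A quick way to validate resolution behavior before and after the upgrade is a throwaway pod that runs a lookup against the cluster DNS. A minimal sketch (image and query are just examples):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: dns-check
spec:
  restartPolicy: Never
  containers:
    - name: dns-check
      image: busybox:1.36
      command: ["nslookup", "kubernetes.default.svc.cluster.local"]
```

Checking the pod's logs shows which nameserver answered; inspecting /etc/resolv.conf inside a workload pod reveals whether queries are being routed through the node-local resolver.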
Security patch timestamp annotation on nodes: Nodes are now annotated with kubernetes.azure.com/security-patch-timestamp during security VHD reboot upgrades. This gives operators a unified way to verify when the last security patch was applied to each node.
NSG management changes for Application Gateway for Containers: AKS no longer creates or updates Network Security Groups on subnets delegated for Application Gateway for Containers. This improves reliability in policy-managed environments but may require teams to manage NSG rules directly.
AKS Automatic – nodes/proxy defense hardening: AKS Automatic has added multiple layers of defense against remote code execution via nodes/proxy permissions, including a ValidatingAdmissionPolicy blocking nodes/proxy grants and an authorization policy denying nodes/proxy by default.
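AKS has not published the exact policy text, but the general mechanism is a ValidatingAdmissionPolicy with a CEL expression that rejects RBAC objects granting nodes/proxy. The following is an illustrative sketch of that technique, not the actual AKS Automatic policy:

```yaml
# Illustrative only — not the AKS Automatic policy itself.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: deny-nodes-proxy-grants
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
      - apiGroups: ["rbac.authorization.k8s.io"]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["clusterroles", "roles"]
  validations:
    - expression: >-
        !(has(object.rules) && object.rules.exists(r,
          has(r.resources) && r.resources.exists(res, res == 'nodes/proxy')))
      message: "Granting nodes/proxy is not permitted on this cluster."
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: deny-nodes-proxy-grants-binding
spec:
  policyName: deny-nodes-proxy-grants
  validationActions: ["Deny"]
```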
Deployment Safeguards probe policy relaxed on AKS Automatic: Deployment Safeguards no longer denies workloads that are missing startup, liveness, or readiness probes on AKS Automatic clusters. The policy now warns instead of denying.
Gateway API CRDs without gateway implementation: Gateway API CRDs can now be enabled directly without first requiring a supported gateway implementation such as the Managed Istio service mesh add-on. This reduces friction for teams adopting Gateway API incrementally.
Windows Server 2019 retirement – March 1, 2026: Windows Server 2019 is scheduled for retirement on March 1, 2026. After that date, AKS will no longer produce new node images or provide security patches. Teams must transition to Windows Server 2022 or newer.
Windows Server Annual Channel retirement – May 15, 2026: Windows Server Annual Channel (Preview) will be retired on May 15, 2026. Teams should transition to the Long Term Servicing Channel (LTSC).
Istio add-on revision asm-1-25 deprecated: Istio-based service mesh add-on revision asm-1-25 has been deprecated. Revision asm-1-28 is now supported.
Scaling Anyscale Ray Workloads on AKS: This post covers running Anyscale's managed Ray service on AKS with multi-cluster multi-region GPU capacity aggregation, unified BlobFuse2 storage for ML/AI pipelines, and automated service principal authentication. It is especially relevant for teams running distributed training and inference at scale across GPU-constrained regions.
Deploying KubeVirt on AKS: This post walks through deploying KubeVirt on AKS for running virtual machines alongside containerized workloads. It covers prerequisites including nested virtualization support, KubeVirt operator installation with AKS-specific node placement, and VM migration using Forklift. This is a key reference for organizations with legacy VM workloads exploring Kubernetes-based unified infrastructure management.
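To make the convergence idea concrete, a VM in KubeVirt is just another Kubernetes object. A minimal sketch, assuming the KubeVirt operator is already installed and the node pool uses a VM size that supports nested virtualization (the demo disk image is KubeVirt's public CirrOS sample):

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: testvm
spec:
  runStrategy: Always             # keep the VM running
  template:
    spec:
      domain:
        resources:
          requests:
            memory: 1Gi
        devices:
          disks:
            - name: rootdisk
              disk:
                bus: virtio
      volumes:
        - name: rootdisk
          containerDisk:
            image: quay.io/kubevirt/cirros-container-disk-demo
```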
Autoscale KAITO inference workloads on AKS using KEDA: This post introduces the new KAITO InferenceSet CRD and KEDA KAITO Scaler for event-driven autoscaling of LLM inference workloads. It shows how to configure both time-based and metric-based scaling (using vLLM metrics like num_requests_waiting) to dynamically scale GPU inference instances. This directly addresses GPU cost optimization for production AI workloads.
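The post uses the dedicated KAITO Scaler, whose exact configuration is described there. For orientation, the general KEDA shape for scaling on a vLLM queue-depth metric looks like the sketch below, using a generic Prometheus trigger; the deployment name, Prometheus address, and threshold are all placeholders, not the KAITO Scaler's actual schema:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: vllm-inference-scaler
spec:
  scaleTargetRef:
    name: vllm-inference          # placeholder: the inference Deployment
  minReplicaCount: 1
  maxReplicaCount: 8
  cooldownPeriod: 300             # scale down slowly; GPU nodes are costly to churn
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090   # placeholder address
        query: sum(vllm:num_requests_waiting)
        threshold: "10"
```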
Accelerating AKS Upgrades with Fleet Manager: Finding the Right Balance: This post explores the trade-offs between speed and safety when orchestrating AKS upgrades at scale using Azure Fleet Manager. It covers update runs, stages, and groups, and explains why reducing the number of stages widens the blast radius, while increasing parallel update groups can hit capacity constraints.
Seamless Migrations From Self Hosted Nginx Ingress To The AKS App Routing Add-On: With the upstream Nginx Ingress controller retiring in March 2026, this post walks through a zero-downtime migration to the AKS App Routing add-on. It covers running both controllers in parallel with separate IngressClasses and cutting over DNS without disrupting production traffic.
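The parallel-controller pattern boils down to a second Ingress object that targets the App Routing controller's IngressClass while the original nginx Ingress keeps serving traffic. A sketch (host and service names are placeholders; webapprouting.kubernetes.azure.com is the add-on's IngressClass):

```yaml
# Duplicate of the existing Ingress, pointed at the App Routing controller.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-approuting           # runs alongside the existing nginx Ingress
spec:
  ingressClassName: webapprouting.kubernetes.azure.com
  rules:
    - host: www.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web
                port:
                  number: 80
```

Once the new Ingress has a public IP and serves traffic correctly, DNS can be cut over and the old controller retired.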
Regional Endpoints for Geo-Replicated Azure Container Registries (Private Preview): Regional endpoints now let teams target specific ACR geo-replicated regions directly, bypassing Azure-managed routing. This enables predictable regional affinity for AKS clusters, client-side failover strategies, and easier troubleshooting of image pull behavior.
Beyond iptables: Scaling AKS Networking with nftables and Project Calico: This post explains the transition from iptables to nftables in AKS Ubuntu 24.04 nodes. It covers how Project Calico adapts to nftables-based dataplane rules and what operators need to know about compatibility, performance gains, and troubleshooting in the new networking stack.
Hardening Spring Boot Health Probes on AKS: How to Prevent Restart Storms Before They Start: This post dives into common misconfiguration patterns with Spring Boot liveness and readiness probes on AKS that can trigger cascading pod restarts under load. It provides guidance on tuning probe timing, separating health check dependencies, and avoiding restart storms in production.
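The core of the fix is separating liveness from readiness (Spring Boot exposes distinct actuator endpoints for each) and using a startup probe to absorb slow JVM boot instead of an aggressive liveness probe. A container-spec sketch under those assumptions (port and timings are examples, not the post's exact values):

```yaml
livenessProbe:
  httpGet:
    path: /actuator/health/liveness    # must not depend on downstream services
    port: 8080
  periodSeconds: 10
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /actuator/health/readiness   # may reflect dependency health
    port: 8080
  periodSeconds: 5
  failureThreshold: 3
startupProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  periodSeconds: 5
  failureThreshold: 30                 # allow ~150s for JVM startup before restarting
```

Keeping downstream dependencies out of the liveness check is what prevents a struggling dependency from triggering a fleet-wide restart storm.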
AKS Tenant Migration: Considerations and Approach: A detailed guide on migrating AKS clusters between Azure AD tenants. It covers planning considerations, resource group moves, identity reconfiguration, and step-by-step migration procedures for clusters with managed identities and RBAC dependencies.
Reference Architecture for Highly Available Multi-Region Azure Kubernetes Service (AKS): This reference architecture post outlines a multi-region AKS deployment pattern for high availability, covering Azure Front Door for global load balancing, cross-region state management, and failover strategies for mission-critical workloads.
From Ingress to Gateway API: A Pragmatic Path Forward: This post covers the evolution from Kubernetes Ingress to the Gateway API, explaining the practical benefits and migration path for AKS users. It covers HTTPRoute, TLSRoute, and how the new API model enables more expressive traffic management.
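For readers new to the model, an HTTPRoute attaches to a Gateway and expresses routing rules that plain Ingress struggles with. A minimal sketch (gateway, host, and service names are placeholders):

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: web-route
spec:
  parentRefs:
    - name: my-gateway            # placeholder: an existing Gateway
  hostnames:
    - "www.example.com"
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /api
      backendRefs:
        - name: api-svc
          port: 8080
    - backendRefs:                # default rule: everything else
        - name: web-svc
          port: 80
```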
Retina 1.0 Is Now Available: Retina, the open-source cloud-native container networking observability platform, has reached 1.0. This release includes production-ready distributed packet captures, flow-level metrics with Hubble integration, and support for both AKS and self-managed Kubernetes clusters.
Troubleshooting Disk Latency with Burak Ok – AKS Troubleshooting Series: This episode covers diagnosing and resolving disk latency issues in AKS clusters, including identifying bottlenecks in Azure Managed Disks and NVMe storage, and practical tooling for measuring I/O performance.
AKS Community Call – US & Europe (Feb 2026): The February 2026 Community Call covers announcements, community content showcase, a feature deep dive on AKS networking best practices, product roadmap updates, and open Q&A with the AKS team.
Troubleshooting OOM failures with Claudio Godoy – AKS Troubleshooting Series: This episode of the AKS Troubleshooting Series covers how to diagnose and resolve out-of-memory (OOM) failures in AKS clusters. It walks through practical scenarios and tooling for identifying memory pressure at both the pod and node level.
Troubleshooting DNS Issues with Qasim Sarfaraz – AKS Troubleshooting Series: This episode walks through diagnosing DNS resolution failures in AKS, covering CoreDNS debugging, pod-level DNS configuration, and common pitfalls with custom DNS and LocalDNS setups.
February was defined by three clear themes: AI/ML workload maturity, platform security hardening, and operational convergence.
The KAITO InferenceSet with KEDA integration, Ray on AKS with Anyscale, and managed GPU profiles all point to AKS becoming a first-class platform for AI/ML at scale. The investment in event-driven autoscaling for inference workloads directly addresses GPU cost optimization — a top concern for every team running LLMs in production.
On the security front, the nodes/proxy hardening in AKS Automatic, new security patch timestamp annotations, and extensive CVE remediation across Cilium, Konnectivity, and egress gateway components show a consistent push toward defense-in-depth.
The KubeVirt support signals that AKS is also evolving as a convergence platform for teams managing both containerized and VM-based workloads. Combined with the new MCP/Agentic CLI documentation, AKS is expanding its operational surface in meaningful ways.
For platform teams: pay close attention to the LocalDNS default change on Kubernetes 1.35+ and the Windows Server 2019 retirement deadline of March 1, 2026. Both require proactive planning.