{"id":13400,"date":"2026-02-25T12:37:56","date_gmt":"2026-02-25T12:37:56","guid":{"rendered":"https:\/\/cheesecakelabs.com\/blog\/"},"modified":"2026-04-02T00:42:08","modified_gmt":"2026-04-02T00:42:08","slug":"kubernetes-autoscaling","status":"publish","type":"post","link":"https:\/\/cheesecakelabs.com\/blog\/kubernetes-autoscaling\/","title":{"rendered":"Production-Grade Kubernetes Autoscaling: Custom Metrics, Prometheus, Celery &amp; Cluster Node Scaling"},"content":{"rendered":"\n<p>Autoscaling is one of the most powerful promises of Kubernetes, but also one of the most misunderstood. Many teams rely on default CPU \u2014 or memory \u2014 based autoscaling, only to discover that their cluster <strong>does<\/strong> <strong>not<\/strong> <strong>scale<\/strong> <strong>when<\/strong> <strong>it<\/strong> <strong>should<\/strong>, especially for asynchronous workloads (e.g., Celery workers, Kafka consumers, ETL services, etc.). <\/p>\n\n\n\n<p>In this guide, we\u2019ll walk through a <strong>production-proven autoscaling strategy<\/strong> that goes far beyond basic resource metrics. You\u2019ll learn how to combine:  <\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Custom application metrics;<\/strong><\/li>\n\n\n\n<li><strong>Prometheus<\/strong> for scraping;<\/li>\n\n\n\n<li><strong>Prometheus Adapter<\/strong> for exposing metrics to Kubernetes;<\/li>\n\n\n\n<li><strong>Horizontal Pod Autoscalers (HPA)<\/strong> for pod-level scaling;<\/li>\n\n\n\n<li><strong>AKS (or any cloud)<\/strong> <strong>Node Autoscaler<\/strong> for infrastructure-level scaling;<\/li>\n<\/ul>\n\n\n\n<p>This is the exact architecture we use to scale Celery-based workloads under high demand, and it has proven to be resilient, predictable, and cost-efficient.<\/p>\n\n\n\n<p>Whether you&#8217;re building search pipelines, background processors, event-driven systems, or AI inference services, this tutorial provides <strong>a definitive scaling blueprint<\/strong> for Kubernetes. 
<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Why traditional autoscaling falls short<\/strong><\/h2>\n\n\n\n<p>Kubernetes\u2019s default horizontal autoscaling relies on <strong>CPU<\/strong> and <strong>memory<\/strong>, but with asynchronous or queue-driven systems, those metrics don\u2019t reflect real pressure.<\/p>\n\n\n\n<p>For example:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Your Celery queue grows from 200 \u2192 6,000 tasks;<\/li>\n\n\n\n<li>Workers are fully idle between tasks;<\/li>\n\n\n\n<li>CPU remains at 10\u201330%;<\/li>\n\n\n\n<li>Kubernetes thinks everything is fine.<\/li>\n<\/ul>\n\n\n\n<p>Meanwhile, users wait longer and longer.<\/p>\n\n\n\n<p><strong>The truth is:<\/strong> To autoscale correctly, you need <strong>semantic metrics<\/strong> that describe the real workload, such as queue depth, worker utilization, or event lag. In this tutorial, we\u2019ll scale <strong>based on what actually matters<\/strong>, not what the kernel reports.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><strong>Also read:<\/strong> <a href=\"https:\/\/cheesecakelabs.com\/blog\/cross-platform-migration-why-it-works\/\" target=\"_blank\" rel=\"noreferrer noopener\">Cross-Platform Migration: Why It Works<\/a><\/p>\n<\/blockquote>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Architecture overview<\/strong><\/h2>\n\n\n\n<p>Autoscaling will follow this flow:<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large\"><img decoding=\"async\" width=\"1200\" height=\"800\" src=\"https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2026\/02\/architeture-kuberneste-1200x800.jpg\" alt=\"Kubernetes autoscaling architecture overview\" class=\"wp-image-13404\" srcset=\"https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2026\/02\/architeture-kuberneste-1200x800.jpg 1200w, https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2026\/02\/architeture-kuberneste-600x400.jpg 600w, 
https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2026\/02\/architeture-kuberneste-768x512.jpg 768w, https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2026\/02\/architeture-kuberneste-900x600.jpg 900w, https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2026\/02\/architeture-kuberneste-760x507.jpg 760w, https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2026\/02\/architeture-kuberneste.jpg 1536w\" sizes=\"(max-width: 1200px) 100vw, 1200px\" \/><\/figure>\n<\/div>\n\n\n<p>We will implement each layer step by step.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>1. Exposing application metrics using Celery as an example<\/strong><\/h3>\n\n\n\n<p>First, expose the metrics your autoscaler will use.<br>For Celery, we typically output:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>celery_queue_depth<\/li>\n\n\n\n<li>celery_workers_busy_ratio<\/li>\n<\/ul>\n\n\n\n<p>Example endpoint structure (Python\/Flask style):<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-1\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\">@app.route(\"\/metrics\/celery\")\ndef celery_metrics():\n    queue_depth = get_queue_depth()\n    busy_ratio = get_busy_workers_ratio()\n    # Prometheus text format: one metric per line, no leading whitespace\n    body = (\n        f\"celery_queue_depth {queue_depth}\\n\"\n        f\"celery_workers_busy_ratio {busy_ratio}\\n\"\n    )\n    return body, 200, {\"Content-Type\": \"text\/plain\"}<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-1\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>Expose it inside Kubernetes via a Service:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-2\" data-shcb-language-name=\"YAML\" data-shcb-language-slug=\"yaml\"><span><code class=\"hljs language-yaml\">apiVersion: v1\nkind: Service\nmetadata:\n  name: worker-metrics\nspec:\n  selector:\n    app: my-worker\n  ports:\n    - name: metrics\n      port: 8000\n      targetPort: 8000<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-2\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">YAML<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">yaml<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h2 class=\"wp-block-heading\"><strong>2. 
Scraping metrics with Prometheus<\/strong><\/h2>\n\n\n\n<p>Prometheus must scrape the metrics endpoint.<\/p>\n\n\n\n<p>Add the following to your Prometheus <strong>values.yaml<\/strong>:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-3\" data-shcb-language-name=\"YAML\" data-shcb-language-slug=\"yaml\"><span><code class=\"hljs language-yaml\">extraScrapeConfigs: |\n  - job_name: 'worker-celery-metrics'\n    metrics_path: \/metrics\/celery\n    scrape_interval: 15s\n    static_configs:\n      - targets:\n        - 'worker-metrics.default.svc.cluster.local:8000'<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-3\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">YAML<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">yaml<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>This ensures Prometheus scrapes celery_queue_depth and celery_workers_busy_ratio every 15 seconds.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>3. Exposing metrics via Prometheus Adapter<\/strong><\/h2>\n\n\n\n<p>Kubernetes Horizontal Pod Autoscalers cannot consume Prometheus metrics directly. 
To bridge this gap, the <strong>Prometheus Adapter<\/strong> is deployed as a <strong>separate component in the cluster<\/strong>, usually via <a href=\"https:\/\/github.com\/prometheus-community\/helm-charts\/tree\/main\/charts\/prometheus-adapter\" target=\"_blank\" rel=\"noreferrer noopener\">its own Helm chart<\/a> and configuration file, commonly in a shared namespace such as monitoring.<\/p>\n\n\n\n<p>The adapter connects to Prometheus, runs predefined queries, and then exposes the results through the Kubernetes <strong>External Metrics API<\/strong> (external.metrics.k8s.io). These metric-mapping rules are defined exclusively in the Prometheus Adapter configuration and are what make custom application metrics, such as Celery queue depth or worker utilization, available for autoscaling.<\/p>\n\n\n\n<p><strong>Example adapter configuration:<\/strong><\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-4\" data-shcb-language-name=\"YAML\" data-shcb-language-slug=\"yaml\"><span><code class=\"hljs language-yaml\">rules:\n  external:\n    - seriesQuery: 'celery_queue_depth{job=\"worker-celery-metrics\"}'\n      name:\n        as: 'celery_queue_depth'\n      metricsQuery: 'avg(celery_queue_depth)'\n    - seriesQuery: 'celery_workers_busy_ratio{job=\"worker-celery-metrics\"}'\n      name:\n        as: 'celery_workers_busy_ratio'\n      metricsQuery: 'avg(celery_workers_busy_ratio)'<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-4\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">YAML<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">yaml<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>Once deployed, these metrics can be queried by Kubernetes and referenced directly by Horizontal Pod Autoscalers. They can be inspected with:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-5\" data-shcb-language-name=\"Bash\" data-shcb-language-slug=\"bash\"><span><code class=\"hljs language-bash\">kubectl get --raw \"\/apis\/external.metrics.k8s.io\/v1beta1\" | jq<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-5\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Bash<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">bash<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>You should see both celery_queue_depth and celery_workers_busy_ratio in the returned resource list.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>4. Kubernetes HPA: Autoscaling pods based on real workload<\/strong><\/h2>\n\n\n\n<p>Once custom metrics are exposed through the Prometheus Adapter, Kubernetes can use them to make autoscaling decisions. 
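<\/p>\n\n\n\n<p>Before looking at the configuration, it helps to know the arithmetic behind an HPA decision. For an External metric with an AverageValue target, the controller aims to keep the metric total divided by the replica count at or below the target, clamped to the configured replica bounds. The sketch below is illustrative only (the real controller also applies tolerances and stabilization windows), and the function name is ours, not a Kubernetes API:<\/p>\n\n\n<pre class=\"wp-block-code\"><span><code class=\"hljs\">import math\n\ndef desired_replicas(metric_total, target_average, min_replicas, max_replicas):\n    # Aim for metric_total / replicas &lt;= target_average, then clamp\n    # to the minReplicas / maxReplicas bounds from the HPA spec.\n    raw = math.ceil(metric_total / target_average)\n    return max(min_replicas, min(max_replicas, raw))\n\ndesired_replicas(60, 10, 2, 10)    # a 60-task queue asks for 6 pods\ndesired_replicas(6000, 10, 2, 10)  # a spike is capped at maxReplicas<\/code><\/span><\/pre>\n\n\n<p>With a queue depth of 60 and an averageValue of \"10\", that yields 6 workers. 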
This is done through the <strong>Horizontal Pod Autoscaler (HPA)<\/strong>, which adjusts the number of pod replicas based on real workload pressure rather than CPU or memory usage alone.<\/p>\n\n\n\n<p>In this setup, the HPA is configured to scale worker pods using external metrics such as queue depth and worker utilization. These metrics reflect how much work the system is actually processing, making scaling decisions more accurate and responsive.<\/p>\n\n\n\n<p>Example HPA configuration (using Helm):<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-6\" data-shcb-language-name=\"YAML\" data-shcb-language-slug=\"yaml\"><span><code class=\"hljs language-yaml\">autoscaling:\n  enabled: true\n  minReplicas: 2\n  maxReplicas: 10\n  metrics:\n    - type: External\n      external:\n        metric:\n          name: celery_queue_depth\n        target:\n          type: AverageValue\n          averageValue: \"10\"\n    - type: External\n      external:\n        metric:\n          name: celery_workers_busy_ratio\n        target:\n          type: AverageValue\n          averageValue: \"1\"<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-6\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">YAML<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">yaml<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>With this configuration:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Pods scale up as the queue grows beyond the target depth;<\/li>\n\n\n\n<li>Pods scale up when workers approach full utilization;<\/li>\n\n\n\n<li>Pods scale down when the system becomes idle.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Tuning HPA behavior and why this matters<\/strong><\/h3>\n\n\n\n<p>When scaling based on external or workload-driven metrics, it is important to also configure <strong>HPA behavior parameters<\/strong>. 
Without them, the autoscaler may react too aggressively to short-lived metric spikes, leading to rapid scale-up and scale-down cycles (\u201cflapping\u201d).<\/p>\n\n\n\n<p>By defining stabilization windows and scaling policies, you can ensure that scaling decisions are <strong>smoother, more predictable, and aligned with sustained workload trends rather than transient noise<\/strong>.<\/p>\n\n\n\n<p><strong>Example behavior configuration:<\/strong><\/p>\n\n\n<pre class=\"wp-block-code\"><span><code class=\"hljs\">behavior:\n  scaleUp:\n    stabilizationWindowSeconds: 300\n    policies:\n      - type: Pods\n        value: 1\n        periodSeconds: 60\n  scaleDown:\n    stabilizationWindowSeconds: 300\n    policies:\n      - type: Pods\n        value: 1\n        periodSeconds: 60<\/code><\/span><\/pre>\n\n\n<p>This configuration limits how frequently replicas can be added or removed and gives the system time to absorb changes in demand before making further adjustments. In production environments, especially when autoscaling from queue-based metrics, proper HPA behavior tuning is essential to avoid instability and unnecessary resource churn.<\/p>\n\n\n\n<p>This is <strong>semantic autoscaling<\/strong>: scaling driven by business logic rather than raw resource counters.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><strong>Also read: <\/strong><a href=\"https:\/\/cheesecakelabs.com\/blog\/nearshore-staff-augmentation\/\" target=\"_blank\" rel=\"noreferrer noopener\">Nearshore Staff Augmentation: A Guide For Your Business<\/a><\/p>\n<\/blockquote>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>5. 
Node-level autoscaling (AKS or any cloud provider)<\/strong><\/h2>\n\n\n\n<p>Pod autoscaling only works when the cluster has enough compute capacity. To automatically add or remove nodes, a <strong>cluster autoscaler<\/strong> must be enabled. This can be implemented using the platform\u2019s preferred solution (such as the native Kubernetes Cluster Autoscaler, <strong>Karpenter<\/strong>, or a cloud-provider managed autoscaler) and configured either via infrastructure-as-code tools or directly through the cloud console. In this guide, node autoscaling is enabled using <strong>Terraform<\/strong>, but the same concepts apply regardless of the tooling or autoscaler implementation used.<\/p>\n\n\n\n<p>Generic Terraform example:<\/p>\n\n\n<pre class=\"wp-block-code\"><span><code class=\"hljs\">resource \"azurerm_kubernetes_cluster\" \"example\" {\n  name                = \"autoscaling-cluster\"\n  location            = \"eastus\"\n  resource_group_name = azurerm_resource_group.example.name\n  dns_prefix          = \"example\"\n\n  default_node_pool {\n    name                = \"default\"\n    vm_size             = \"Standard_D4s_v3\"\n    node_count          = 2\n    enable_auto_scaling = true\n    min_count           = 1\n    max_count           = 5\n    mode                = \"System\"\n  }\n\n  identity {\n    type = \"SystemAssigned\"\n  }\n}<\/code><\/span><\/pre>\n\n\n<p>Behavior:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>If pods cannot be scheduled \u2192 Add a new node;<\/li>\n\n\n\n<li>If nodes stay underutilized \u2192 Remove nodes;<\/li>\n<\/ul>\n\n\n\n<p>This ensures:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Your HPA never stalls;<\/li>\n\n\n\n<li>You only pay for what you use;<\/li>\n<\/ul>\n\n\n\n<p>This is essential for any scalable production Kubernetes system.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>6. 
Validating the setup<\/strong><\/h2>\n\n\n\n<p>Check HPA decisions:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-8\" data-shcb-language-name=\"Bash\" data-shcb-language-slug=\"bash\"><span><code class=\"hljs language-bash\">kubectl describe hpa -n default<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-8\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Bash<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">bash<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>Check external metrics:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-9\" data-shcb-language-name=\"Bash\" data-shcb-language-slug=\"bash\"><span><code class=\"hljs language-bash\">kubectl get --raw \"\/apis\/external.metrics.k8s.io\/v1beta1\/namespaces\/default\/celery_queue_depth\"<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-9\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Bash<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">bash<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>Check pending pods:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-10\" data-shcb-language-name=\"Bash\" data-shcb-language-slug=\"bash\"><span><code class=\"hljs language-bash\">kubectl get pods -A | grep Pending<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-10\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Bash<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">bash<\/span><span 
class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>Check node autoscaler actions:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-11\" data-shcb-language-name=\"HTML, XML\" data-shcb-language-slug=\"xml\"><span><code class=\"hljs language-xml\">kubectl get nodes\n\nkubectl describe node <span class=\"hljs-tag\">&lt;<span class=\"hljs-name\">name<\/span>&gt;<\/span><\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-11\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">HTML, XML<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">xml<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h2 class=\"wp-block-heading\"><strong>Conclusion: A scalable, production-proven autoscaling strategy<\/strong><\/h2>\n\n\n\n<p>Autoscaling Kubernetes isn\u2019t just about turning on HPAs. Real-world autoscaling requires <strong>application-aware metrics<\/strong>, <strong>smart decision-making,<\/strong> and <strong>infrastructure elasticity<\/strong>.<\/p>\n\n\n\n<p>By combining:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Custom metrics<\/li>\n\n\n\n<li>Prometheus<\/li>\n\n\n\n<li>Prometheus Adapter<\/li>\n\n\n\n<li>Kubernetes HPA<\/li>\n\n\n\n<li>Node autoscaling<\/li>\n<\/ul>\n\n\n\n<p>You build a cluster that reacts to <strong>real demand<\/strong>, scales smoothly under pressure, and minimizes cost during idle periods.<\/p>\n\n\n\n<p>If your applications rely on queues, background processing, or any asynchronous workloads, this scaling strategy is not just ideal. 
It\u2019s essential.<\/p>\n\n\n\n<p>This is the <strong>definitive way to autoscale Kubernetes<\/strong>.<\/p>\n\n\n\n<p>If your engineering team wants help implementing this pattern or wants to scale more advanced workloads (AI, search pipelines, ETL, etc.), feel free to reach out!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Autoscaling is one of the most powerful promises of Kubernetes, but also one of the most misunderstood. Many teams rely on default CPU \u2014 or memory \u2014 based autoscaling, only to discover that their cluster does not scale when it should, especially for asynchronous workloads (e.g., Celery workers, Kafka consumers, ETL services, etc.). In this [&hellip;]<\/p>\n","protected":false},"author":92,"featured_media":13401,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[432],"tags":[305,1362,1363],"class_list":["post-13400","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-engineering","tag-tag-development","tag-kubernetes","tag-prometheus"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.1.1 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Kubernetes Autoscaling: Custom Metrics, Prometheus, Celery &amp; Cluster Node Scaling<\/title>\n<meta name=\"description\" content=\"Discover how to effectively implement Kubernetes Autoscaling with custom metrics and strategies for resilient workload handling.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/cheesecakelabs.com\/blog\/kubernetes-autoscaling\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Kubernetes Autoscaling: Custom Metrics, Prometheus, Celery &amp; Cluster Node Scaling\" \/>\n<meta 
property=\"og:description\" content=\"Discover how to effectively implement Kubernetes Autoscaling with custom metrics and strategies for resilient workload handling.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/cheesecakelabs.com\/blog\/kubernetes-autoscaling\/\" \/>\n<meta property=\"og:site_name\" content=\"Cheesecake Labs\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/cheesecakelabs\" \/>\n<meta property=\"article:published_time\" content=\"2026-02-25T12:37:56+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-04-02T00:42:08+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2026\/02\/Kubernetes-Autoscaling.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1525\" \/>\n\t<meta property=\"og:image:height\" content=\"677\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Cheesecake Labs\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@cheesecakelabs\" \/>\n<meta name=\"twitter:site\" content=\"@cheesecakelabs\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/cheesecakelabs.com\/blog\/kubernetes-autoscaling\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/cheesecakelabs.com\/blog\/kubernetes-autoscaling\/\"},\"author\":{\"name\":\"Jo\u00e3o Victor Alhadas\"},\"headline\":\"Production-Grade Kubernetes Autoscaling: Custom Metrics, Prometheus, Celery &amp; Cluster Node Scaling\",\"datePublished\":\"2026-02-25T12:37:56+00:00\",\"dateModified\":\"2026-04-02T00:42:08+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/cheesecakelabs.com\/blog\/kubernetes-autoscaling\/\"},\"wordCount\":1007,\"image\":{\"@id\":\"https:\/\/cheesecakelabs.com\/blog\/kubernetes-autoscaling\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2026\/02\/Kubernetes-Autoscaling.jpg\",\"keywords\":[\"development\",\"kubernetes\",\"prometheus\"],\"articleSection\":[\"Engineering\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/cheesecakelabs.com\/blog\/kubernetes-autoscaling\/\",\"url\":\"https:\/\/cheesecakelabs.com\/blog\/kubernetes-autoscaling\/\",\"name\":\"Kubernetes Autoscaling: Custom Metrics, Prometheus, Celery & Cluster Node Scaling\",\"isPartOf\":{\"@id\":\"https:\/\/cheesecakelabs.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/cheesecakelabs.com\/blog\/kubernetes-autoscaling\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/cheesecakelabs.com\/blog\/kubernetes-autoscaling\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2026\/02\/Kubernetes-Autoscaling.jpg\",\"datePublished\":\"2026-02-25T12:37:56+00:00\",\"dateModified\":\"2026-04-02T00:42:08+00:00\",\"author\":{\"@type\":\"person\",\"name\":\"Jo\u00e3o Victor Alhadas\"},\"description\":\"Discover how to 