Auto-scaling Spring Boot Microservices in Kubernetes with Prometheus and KEDA

Mehmet Ozkaya
7 min read · Feb 14, 2023

In this article, we will perform auto-scaling Spring Boot Microservices in Kubernetes with Prometheus and KEDA using custom metrics.

See References

This is part of a series of articles; you can check the previous article, “Monitor Spring Boot Custom Metrics with deploying Kubernetes using Prometheus”. In this article, we will focus on auto-scaling our Spring Boot microservices with Kubernetes and KEDA. The image above shows a summary of how things work end to end; each step will be discussed in detail in this article.

I have just published a new course — Design Microservices Architecture with Patterns & Principles.

Background

In this tutorial series, we will learn how to horizontally auto-scale Spring Boot microservice applications using Prometheus custom metrics and KEDA, the Kubernetes Event-driven Autoscaler. Here are the three main articles we are going to follow:

  1. Monitor Spring Boot Custom Metrics with Micrometer and Prometheus using Docker
  2. Monitor Custom Metrics with deploying Kubernetes using Prometheus
  3. Auto-scaling Kubernetes apps with Prometheus and KEDA (this article)

Prerequisites

As you can see in the first image, there are some prerequisites for monitoring a Spring Boot application.

In the previous article, we containerized and deployed a Spring Boot application to Kubernetes. Now we will focus on auto-scaling it with KEDA using Helm charts. Let’s start with an introduction to Prometheus and KEDA.

KEDA and Prometheus

Prometheus is an open-source systems monitoring and alerting toolkit which is a part of the Cloud Native Computing Foundation. Prometheus scrapes metrics from various sources and stores them as time-series data; tools like Grafana or other API consumers can be used to visualize the collected data.

KEDA supports the concept of Scalers which act as a bridge between KEDA and an external system. A Scaler implementation is specific to a target system and fetches relevant data from it, which is then used by KEDA to help drive auto-scaling. There is support for many scalers (Kafka, Redis, etc.), including one for Prometheus. This means that you can leverage KEDA to auto-scale your Kubernetes Deployments using Prometheus metrics as the criteria.

Install KEDA by deploying its Helm chart

Deploying KEDA with Helm is very simple:

1. Add Helm repo

helm repo add kedacore https://kedacore.github.io/charts

2. Update Helm repo

helm repo update

3. Install the keda Helm chart

kubectl create namespace keda
helm install keda kedacore/keda --namespace keda

KEDA and its components are installed in the keda namespace. To confirm:

kubectl get pods -n keda

Wait for the KEDA operator Pod to reach the Running state before you proceed.

KEDA Prometheus ScaledObject

As explained previously, a Scaler implementation acts as a bridge between KEDA and the external system from which metrics need to be fetched. A ScaledObject is a custom resource that needs to be deployed in order to sync a Deployment with an event source (Prometheus in this case). It contains information on which Deployment to scale, metadata about the event source (e.g. connection string secret, queue name), the polling interval, the cooldown period, etc. The ScaledObject results in a corresponding autoscaling resource (an HPA definition) that scales the Deployment.

When a ScaledObject gets deleted, the corresponding HPA definition is cleaned up.

KEDA offers many triggers that can scale our application; here we will use the Prometheus trigger. Below is the ScaledObject definition for our example, which uses the Prometheus scaler.

In a new file called scaled-object.yaml add the following content:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: prometheus-scaledobject
spec:
  scaleTargetRef:
    name: demoapp
  pollingInterval: 15
  cooldownPeriod: 30
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus-kube-prometheus-prometheus.default.svc.cluster.local:9090
      metricName: order_books_total
      threshold: "5"
      query: sum(rate(order_books_total[1m]))

Notice the following:

  • It targets a Deployment named demoapp
  • The trigger type is prometheus. The Prometheus serverAddress is specified along with the metricName, the threshold and the PromQL query (sum(rate(order_books_total[1m]))) to be used
  • As per pollingInterval, KEDA will poll the Prometheus target every fifteen seconds. A minimum of one Pod will be maintained (minReplicaCount) and the number of Pods will not exceed the maxReplicaCount (ten in this example)
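Under the hood, the HPA that KEDA creates for this ScaledObject applies the standard Kubernetes scaling formula: desiredReplicas = ceil(currentReplicas × currentMetricValue / targetValue). Here is a minimal sketch of that arithmetic (illustrative only, not KEDA's actual source code):

```java
public class ScaleMath {
    // Standard HPA formula: desired = ceil(currentReplicas * current / target)
    static int desiredReplicas(int currentReplicas, double currentAvg, double target) {
        return (int) Math.ceil(currentReplicas * currentAvg / target);
    }

    public static void main(String[] args) {
        // e.g. 1 pod seeing 12 orders/s against our threshold of 5 -> scale to 3 pods
        System.out.println(desiredReplicas(1, 12.0, 5.0)); // prints 3
    }
}
```

This is why lowering the threshold (as we do later in this article) makes the Deployment scale out at a lower request rate.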

Deploy the Application

First of all, we should deploy our Spring Boot application and Prometheus:

kubectl apply -f demoapp.yaml
kubectl apply -f service_monitor.yaml
kubectl get svc
kubectl port-forward service/demoapp 8080:8080
kubectl port-forward service/prometheus-kube-prometheus-prometheus 9090:9090

Deploy the KEDA auto-scale config

We need to create the ScaledObject in our K8s cluster.

kubectl apply -f scaled-object.yaml

Check KEDA operator logs:

KEDA_POD_NAME=$(kubectl get pods -n keda -o=jsonpath='{.items[0].metadata.name}')
kubectl logs $KEDA_POD_NAME -n keda

You should see:

time="2019-10-15T09:38:28Z" level=info msg="Watching ScaledObject: default/prometheus-scaledobject"
time="2019-10-15T09:38:28Z" level=info msg="Created HPA with namespace default and name keda-hpa-go-prom-app"

This will provision an HPA in your namespace which you can check with:

kubectl get hpa

but since ScaledObject is a custom resource, you can also query it directly with kubectl:

kubectl get scaledobject.keda.sh/prometheus-scaledobject

NAME                      SCALETARGETKIND      SCALETARGETNAME   MIN   MAX   TRIGGERS     AUTHENTICATION   READY   ACTIVE   FALLBACK   AGE
prometheus-scaledobject   apps/v1.Deployment   demoapp           1     10    prometheus                    True    False    False      64s

We can see that our prometheus-scaledobject is ready, so let’s scale our application! Remember that our application scales on the metric order_books_total and our threshold is only 5, so we should be able to reach that threshold. For this we can use a simple tool called hey.
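Before generating load, it is worth noting what the PromQL query sum(rate(order_books_total[1m])) actually measures: rate() computes the per-second increase of the counter over the 1-minute window, and sum() adds it up across all pods. Conceptually (ignoring counter resets and the extrapolation that real Prometheus performs), it boils down to:

```java
public class RateSketch {
    // Simplified view of PromQL rate(): the per-second increase of a
    // counter between two samples taken a given number of seconds apart.
    static double rate(double earlier, double later, double seconds) {
        return (later - earlier) / seconds;
    }

    public static void main(String[] args) {
        // counter went from 100 to 160 orders over 12 seconds -> 5 orders/s,
        // which is exactly our scaling threshold
        System.out.println(rate(100, 160, 12)); // prints 5.0
    }
}
```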

Autoscaling in action

We will use hey, a utility program to generate load.

Run the application (make sure that it is up and running on 8080)

kubectl port-forward service/demoapp 8080:8080

In another terminal watch the pods

kubectl get pods -w

Put load on the application (do this continuously, until the pods scale up to the maximum of 10)

hey -n 10000 -m POST http://localhost:8080/books

Or, if you use Windows, you can use PowerShell to send 10K requests. Open a PowerShell CLI and run:

for ($i = 1; $i -le 10000; $i++)
{
    Write-Host $i
    Invoke-WebRequest -Uri http://localhost:8080/books -Method POST
}

It can take a minute before the application actually starts scaling, but after a while you should have 10 pods up and running! Now let’s also look at the scale-down process: stop putting load on the application and just watch the pods. This is basically how KEDA works.

In another terminal watch the pods

kubectl get pods -w

Also, you will see that the Deployment will be scaled out by the HPA and new Pods will be spun up.

Check the HPA to confirm the same,

According to our current configuration, it seems that every 1K requests cause KEDA to create one additional pod. But we can change the scaled-object.yaml file with the values below:

threshold: "3"
query: sum(rate(order_books_total[2m]))

With these values, an additional pod is created for roughly every 600 requests. See the result below:

kubectl get hpa

NAME                   REFERENCE                TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
keda-hpa-go-prom-app   Deployment/go-prom-app   1830m/3 (avg)   1         10        6          4m22s
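In the TARGETS column, 1830m/3 (avg) uses Kubernetes quantity notation: the m suffix means milli-units, so the current average is 1.83 per pod against a target of 3. A quick sketch of decoding the milli-suffix (a hypothetical helper, not part of kubectl):

```java
public class QuantityParse {
    // Decode a Kubernetes-style quantity string such as "1830m",
    // where the "m" suffix denotes milli-units (thousandths).
    static double parse(String q) {
        if (q.endsWith("m")) {
            return Double.parseDouble(q.substring(0, q.length() - 1)) / 1000.0;
        }
        return Double.parseDouble(q);
    }

    public static void main(String[] args) {
        System.out.println(parse("1830m")); // prints 1.83
    }
}
```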

The idea is the same in reverse: if the load does not sustain, the Deployment is scaled down to the point where only a single Pod is running.

After peak load, KEDA automatically scales the pods back down; you can see the terminating pods once the load stops.

To Clean Up

You can follow the commands below:

# Delete KEDA
kubectl delete namespace keda

# Delete the app, Prometheus server and KEDA scaled object
kubectl delete -f .

Conclusion

KEDA allows you to auto scale your Kubernetes Deployments (to/from zero) based on data from external metrics such as Prometheus metrics, queue length in Redis, consumer lag of a Kafka topic, etc. It does all the heavy lifting of integrating with the external source as well as exposing its metrics via a Metrics server for the Horizontal Pod Auto-scaler.

Source Code

Get the source code from GitHub: clone or fork the repository, and if you like it, don’t forget to give it a star. If you find an issue or have a question, you can open an issue directly on the repository.

Step by Step Design Architectures w/ Course

I have just published a new course — Design Microservices Architecture with Patterns & Principles.

In this course, we’re going to learn how to design microservices architecture using design patterns, principles and best practices. We will start by designing a monolith and evolve it into event-driven microservices step by step, using the right architecture design patterns and techniques.

References

https://www.stackstalk.com/2022/03/monitor-spring-boot-app.html

https://tanzu.vmware.com/developer/guides/spring-prometheus/

https://tanzu.vmware.com/developer/guides/observability-prometheus-grafana-p1/

https://itnext.io/tutorial-auto-scale-your-kubernetes-apps-with-prometheus-and-keda-c6ea460e4642

https://djamaile.dev/blog/using-keda-and-prometheus/


Mehmet Ozkaya

Software Architect | Udemy Instructor | AWS Community Builder | Cloud-Native and Serverless Event-driven Microservices https://github.com/mehmetozkaya