Auto-scaling Spring Boot Microservices in Kubernetes with Prometheus and KEDA
In this article, we will auto-scale Spring Boot microservices in Kubernetes with Prometheus and KEDA, using custom metrics.
This article is part of a series; you can check the previous article, “Monitor Spring Boot Custom Metrics with deploying Kubernetes using Prometheus”. Here we will focus on auto-scaling our Spring Boot microservices with Kubernetes and KEDA. The image above summarizes how things work end to end; each of these steps is discussed in detail in this article.
Background
In this tutorial series, we will learn how to horizontally auto-scale Spring Boot microservice applications using Prometheus custom metrics and KEDA, the Kubernetes Event-driven Autoscaler. Here are the three main articles that we are going to follow:
- Monitor Spring Boot Custom Metrics with Micrometer and Prometheus using Docker
- Monitor Custom Metrics with deploying Kubernetes using Prometheus
- Auto-scaling Kubernetes apps with Prometheus and KEDA (this article)
Prerequisites
As you can see from the first image, we have some prerequisites for monitoring a Spring Boot application. Those are:
- Spring Boot — Java applications
- Docker — https://docs.docker.com/desktop/install/windows-install/
- Minikube — https://minikube.sigs.k8s.io/docs/start/
- Helm Charts — https://helm.sh/docs/intro/install/
- KEDA — https://keda.sh/docs/2.8/deploy/#helm
- Hey — https://github.com/rakyll/hey
In the previous article, we containerized and deployed the Spring Boot application into Kubernetes. Now we will focus on auto-scaling with KEDA using Helm charts. Let's start with an introduction to Prometheus and KEDA.
KEDA and Prometheus
Prometheus is an open-source systems monitoring and alerting toolkit that is part of the Cloud Native Computing Foundation. Prometheus scrapes metrics from various sources and stores them as time-series data; tools like Grafana or other API consumers can be used to visualize the collected data.
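As a reminder of where our custom metric comes from: in the previous article, the demo application exposed a Micrometer counter that Prometheus scrapes as order_books_total. A minimal sketch of such an endpoint is shown below (the class and endpoint names are assumptions; the real implementation is in the previous article of this series):
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class BookController {

    private final Counter orderBooksCounter;

    public BookController(MeterRegistry registry) {
        // Micrometer exports the counter "order.books" to Prometheus as order_books_total
        this.orderBooksCounter = Counter.builder("order.books")
                .description("Number of book orders")
                .register(registry);
    }

    // Each POST /books increments the counter that KEDA will later scale on
    @PostMapping("/books")
    public ResponseEntity<Void> orderBook() {
        orderBooksCounter.increment();
        return ResponseEntity.ok().build();
    }
}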
KEDA supports the concept of Scalers, which act as a bridge between KEDA and an external system. A Scaler implementation is specific to a target system and fetches relevant data from it, which is then used by KEDA to help drive auto-scaling. There is support for multiple scalers (including Kafka, Redis, etc.), among them Prometheus. This means that you can leverage KEDA to auto-scale your Kubernetes Deployments using Prometheus metrics as the criteria.
Install KEDA by deploying its Helm chart
Deploying KEDA with Helm is very simple:
1. Add Helm repo
helm repo add kedacore https://kedacore.github.io/charts
2. Update Helm repo
helm repo update
3. Install the keda Helm chart
kubectl create namespace keda
helm install keda kedacore/keda --namespace keda
KEDA and its components are installed in the keda namespace. To confirm:
kubectl get pods -n keda
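You should see something like the following (the pod name suffixes will differ in your cluster):
NAME                                               READY   STATUS    RESTARTS   AGE
keda-operator-7b8d9c6f5d-x2l9k                     1/1     Running   0          40s
keda-operator-metrics-apiserver-5f6b7c8d9e-q4w7r   1/1     Running   0          40s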
Wait for the KEDA operator Pod to reach the Running state before you proceed.
KEDA Prometheus ScaledObject
As explained previously, a Scaler implementation acts as a bridge between KEDA and the external system from which metrics need to be fetched. ScaledObject is a custom resource that needs to be deployed in order to sync a Deployment with an event source (Prometheus in this case). It contains information on which Deployment to scale, metadata on the event source (e.g. connection string secret, queue name), polling interval, cooldown period, etc. The ScaledObject results in a corresponding autoscaling resource (an HPA definition) to scale the Deployment. When a ScaledObject gets deleted, the corresponding HPA definition is cleaned up.
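By default, KEDA names the generated HPA keda-hpa-<scaledobject-name>, so once the ScaledObject below is applied you should be able to inspect the HPA directly (assuming the default naming is in effect):
kubectl get hpa keda-hpa-prometheus-scaledobject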
KEDA offers many triggers that can scale our application, but of course we will use the Prometheus trigger. Here is the ScaledObject definition for our example, which uses the Prometheus scaler.
In a new file called scaled-object.yaml, add the following content:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: prometheus-scaledobject
spec:
  scaleTargetRef:
    name: demoapp
  pollingInterval: 15
  cooldownPeriod: 30
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-kube-prometheus-prometheus.default.svc.cluster.local:9090
        metricName: order_books_total
        threshold: "5"
        query: sum(rate(order_books_total[1m]))
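Note that serverAddress must be the in-cluster DNS name of your Prometheus service, which depends on the Helm release name you chose when installing Prometheus; you can verify it with:
kubectl get svc | grep prometheus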
Notice the following:
- It targets a Deployment named demoapp
- The trigger type is prometheus. The Prometheus serverAddress is specified along with the metricName, the threshold, and the PromQL query (sum(rate(order_books_total[1m]))) to be used (see the quick calculation after this list)
- As per pollingInterval, KEDA will poll the Prometheus target every fifteen seconds. A minimum of one Pod will be maintained (minReplicaCount), and the maximum number of Pods will not exceed the maxReplicaCount (ten in this example)
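To make the scaling math concrete: with the default AverageValue metric type, the HPA that KEDA generates roughly computes desiredReplicas = ceil(queryResult / threshold). A quick illustrative calculation (the numbers are made up):
query result: sum(rate(order_books_total[1m])) = 12 orders/second
threshold:    5
desired pods: ceil(12 / 5) = 3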
Deploy the Application
First of all, we should deploy our Spring Boot application and Prometheus (a sketch of the ServiceMonitor is shown after the commands):
kubectl apply -f demoapp.yaml
kubectl apply -f service_monitor.yaml
kubectl get svc
kubectl port-forward service/demoapp 8080:8080
kubectl port-forward service/prometheus-kube-prometheus-prometheus 9090:9090
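For reference, the service_monitor.yaml from the previous article looks roughly like this (the labels, port name, and selector here are assumptions; adjust them to your setup):
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: demoapp-service-monitor
  labels:
    release: prometheus            # must match the Prometheus operator's serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: demoapp                 # must match the labels on the demoapp Service
  endpoints:
    - port: http
      path: /actuator/prometheus   # Spring Boot Actuator's Prometheus endpoint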
Deploy the KEDA auto-scale config
We need to create the ScaledObject in our K8s cluster.
kubectl apply -f scaled-object.yaml
Check KEDA operator logs:
KEDA_POD_NAME=$(kubectl get pods -n keda -o=jsonpath='{.items[0].metadata.name}')
kubectl logs $KEDA_POD_NAME -n keda
You should see:
time="2019-10-15T09:38:28Z" level=info msg="Watching ScaledObject: default/prometheus-scaledobject"
time="2019-10-15T09:38:28Z" level=info msg="Created HPA with namespace default and name keda-hpa-go-prom-app"
This will provision an HPA in your namespace which you can check with:
kubectl get hpa
Because ScaledObject is a custom resource (CRD), you can also query it directly with kubectl:
kubectl get scaledobject.keda.sh/prometheus-scaledobject
NAME                      SCALETARGETKIND      SCALETARGETNAME   MIN   MAX   TRIGGERS     AUTHENTICATION   READY   ACTIVE   FALLBACK   AGE
prometheus-scaledobject   apps/v1.Deployment   demoapp           1     10    prometheus                    True    False    False      64s
We can see that our prometheus-scaledobject is ready, so let's scale our application! Remember, our application scales on the order_books_total metric and our threshold is only 5, so we should be able to reach that threshold. For this we can use a simple tool called hey.
Autoscaling in action
We will use hey, a utility program to generate load.
Run the application (make sure that it is up and running on 8080)
kubectl port-forward service/demoapp 8080:8080
In another terminal, watch the pods:
kubectl get pods -w
Put load on the application (do this continuously, until there are 10 pods):
hey -n 10000 -m POST http://localhost:8080/books
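hey can also generate sustained load for a fixed duration rather than a fixed request count, which makes it easier to keep the request rate above the threshold while you watch the pods (for example, 50 concurrent workers for three minutes):
hey -z 3m -c 50 -m POST http://localhost:8080/books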
Or, if you are on Windows, you can use PowerShell to send 10K requests. Open a PowerShell CLI and run:
for ($i = 1; $i -le 10000; $i++) {
    Write-Host $i
    Invoke-WebRequest -Uri http://localhost:8080/books -Method POST
}
It can take a minute before the application actually starts scaling, but after a while you should have 10 pods up and running! Now let's also look at the scale-down process: stop putting load on the application and just watch the pods. This is basically how KEDA works.
In another terminal, watch the pods:
kubectl get pods -w
You will also see that the Deployment is scaled out by the HPA and new Pods are spun up.
Check the HPA to confirm the same:
According to our current configuration, it seems KEDA creates roughly one additional pod per 1K requests. But if we change the scaled-object.yaml file with the values below:
        threshold: "3"
        query: sum(rate(order_books_total[2m]))
This will create an additional pod per roughly 600 requests. See the result below:
kubectl get hpa
NAME                               REFERENCE           TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
keda-hpa-prometheus-scaledobject   Deployment/demoapp  1830m/3 (avg)   1         10        6          4m22s
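A note on the TARGETS column: the HPA reports metric quantities in Kubernetes milli-units, so this output reads as:
current per-pod average: 1830m = 1.83 orders/second
target (threshold):      3
Since 1.83 is below the target of 3, no further scale-up is needed at this moment.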
So the idea is the same: if the load does not sustain, the Deployment is scaled back down to the point where only a single Pod is running. Note that the cooldownPeriod in the ScaledObject only applies when scaling back to zero replicas; scaling down between maxReplicaCount and minReplicaCount is governed by the HPA's default stabilization window, which is why the scale-down takes a few minutes.
After peak load, KEDA automatically scales the pods back down; you can see the terminating pods once the load stops.
To Clean Up
You can follow the commands below:
# Delete KEDA
kubectl delete namespace keda

# Delete the app, Prometheus server and KEDA scaled object
kubectl delete -f .
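Since KEDA was installed with Helm, you can also uninstall the release cleanly before deleting the namespace:
helm uninstall keda --namespace keda
kubectl delete namespace keda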
Conclusion
KEDA allows you to auto-scale your Kubernetes Deployments (to/from zero) based on data from external metrics such as Prometheus metrics, queue length in Redis, consumer lag of a Kafka topic, etc. It does all the heavy lifting of integrating with the external source, as well as exposing its metrics via a metrics server for the Horizontal Pod Autoscaler.
Source Code
Get the source code from GitHub. Clone or fork the repository, and if you like it, don't forget to give it a star. If you find a problem or have a question, you can open an issue directly on the repository.
Step by Step Design Architectures w/ Course
I have just published a new course — Design Microservices Architecture with Patterns & Principles.
In this course, we're going to learn how to design a microservices architecture using design patterns, principles, and best practices. We will start by designing a monolith and evolve it, step by step, into event-driven microservices, using the right architecture design patterns and techniques.