Auto-scaling Spring Boot Microservices in Kubernetes with Prometheus and KEDA
In this article, we will auto-scale Spring Boot microservices in Kubernetes using Prometheus custom metrics and KEDA.
This is part of a series of articles; you can check the previous article, “Monitor Spring Boot Custom Metrics with deploying Kubernetes using Prometheus”. In this article, we will focus on auto-scaling our Spring Boot microservices using Kubernetes and KEDA. The image above summarizes how things work end to end; each of these steps will be discussed in detail in this article.
Background
In this tutorial series, we will learn how to horizontally auto-scale Spring Boot microservice applications using Prometheus custom metrics and KEDA, the Kubernetes Event-driven Autoscaler. Here are the three main articles that we are going to follow:
- Monitor Spring Boot Custom Metrics with Micrometer and Prometheus using Docker
- Monitor Custom Metrics with deploying Kubernetes using Prometheus
- Auto-scaling Kubernetes apps with Prometheus and KEDA (this article)
Prerequisites
As you can understand from the first image, we have some prerequisites for monitoring the Spring Boot application:
- Spring Boot — Java applications
- Docker — https://docs.docker.com/desktop/install/windows-install/
- Minikube — https://minikube.sigs.k8s.io/docs/start/
- Helm Charts — https://helm.sh/docs/intro/install/
- KEDA — https://keda.sh/docs/2.8/deploy/#helm
- Hey — https://github.com/rakyll/hey
In the previous article, we containerized and deployed the Spring Boot application to Kubernetes. Now we will focus on auto-scaling it with KEDA, installed using Helm charts. Let's start with an introduction to Prometheus and KEDA.
KEDA and Prometheus
Prometheus is an open-source systems monitoring and alerting toolkit that is part of the Cloud Native Computing Foundation. Prometheus scrapes metrics from various sources and stores them as time-series data; tools like Grafana or other API consumers can be used to visualize the collected data.
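In the Kubernetes setup from the previous article, Prometheus (via the Prometheus Operator) typically discovers the Spring Boot app's scrape target through a ServiceMonitor resource. A rough sketch of what such a manifest may look like — the name, labels, and port are assumptions and must match your own demoapp Service:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: demoapp-monitor
  labels:
    release: prometheus   # must match the Prometheus operator's serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: demoapp        # assumed label on the demoapp Service
  endpoints:
  - port: http            # assumed name of the Service port
    path: /actuator/prometheus   # Spring Boot Actuator's Prometheus endpoint
    interval: 15s
```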
KEDA supports the concept of Scalers, which act as a bridge between KEDA and an external system. A Scaler implementation is specific to a target system and fetches relevant data from it, which KEDA then uses to drive auto-scaling. There is support for multiple scalers (including Kafka, Redis, etc.), among them Prometheus. This means you can leverage KEDA to auto-scale your Kubernetes Deployments using Prometheus metrics as the criteria.
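To make the PromQL trigger used later in this article concrete: rate(counter[1m]) is the per-second increase of a counter over the window, and sum(...) adds the rates across all pods. A rough sketch of the arithmetic with hypothetical sample values:

```shell
#!/bin/sh
# Two hypothetical samples of a Prometheus counter, taken 60 seconds apart:
v1=100     # counter value at time t
v2=400     # counter value at time t+60s
window=60  # seconds
# rate(counter[1m]) is approximately (v2 - v1) / window, per scraped pod
echo "approx rate: $(( (v2 - v1) / window )) req/s"
# prints: approx rate: 5 req/s
```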
Install KEDA with deploying Helm charts
Deploying KEDA with Helm is very simple:
1. Add the Helm repo:

helm repo add kedacore https://kedacore.github.io/charts

2. Update the Helm repo:

helm repo update

3. Install the KEDA Helm chart:

kubectl create namespace keda
helm install keda kedacore/keda --namespace keda

KEDA and its components are installed in the keda namespace. To confirm:

kubectl get pods -n keda

Wait for the KEDA operator Pod to reach the Running state before you proceed.
KEDA Prometheus ScaledObject
As explained previously, a Scaler implementation acts as a bridge between KEDA and the external system from which metrics need to be fetched. ScaledObject is a custom resource that needs to be deployed in order to sync a Deployment with an event source (Prometheus in this case). It contains information on which Deployment to scale, metadata on the event source (e.g. connection string secret, queue name), the polling interval, the cooldown period, etc. The ScaledObject results in a corresponding autoscaling resource (an HPA definition) that scales the Deployment.
When a ScaledObject gets deleted, the corresponding HPA definition is cleaned up.
KEDA offers many triggers that can scale our application; here we will use the Prometheus trigger. Below is the ScaledObject definition for our example.
In a new file called scaled-object.yaml add the following content:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: prometheus-scaledobject
spec:
  scaleTargetRef:
    name: demoapp
  pollingInterval: 15
  cooldownPeriod: 30
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus-kube-prometheus-prometheus.default.svc.cluster.local:9090
      metricName: order_books_total
      threshold: "5"
      query: sum(rate(order_books_total[1m]))

Notice the following:
- It targets a Deployment named demoapp
- The trigger type is prometheus. The Prometheus serverAddress is specified along with the metricName, the threshold, and the PromQL query (sum(rate(order_books_total[1m]))) to be used
- As per pollingInterval, KEDA polls the Prometheus target every fifteen seconds. A minimum of one Pod will be maintained (minReplicaCount) and the number of Pods will not exceed maxReplicaCount (ten in this example)
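Note that the HPA that KEDA creates uses the default scale-down behavior. If you need to slow down scale-in (for example, to avoid flapping under bursty load), the ScaledObject spec also accepts an optional advanced section. A hedged sketch, with illustrative values and field names from KEDA's v1alpha1 API:

```yaml
spec:
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 300  # wait 5 minutes of low load before scaling in
          policies:
          - type: Percent
            value: 50          # remove at most 50% of the pods
            periodSeconds: 60  # per minute
```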
Deploy the Application
First of all, we should deploy our Spring Boot application and Prometheus:

kubectl apply -f demoapp.yaml
kubectl apply -f service_monitor.yaml
kubectl get svc
kubectl port-forward service/demoapp 8080:8080
kubectl port-forward service/prometheus-kube-prometheus-prometheus 9090:9090

Deploy the KEDA auto-scale config
We need to create the ScaledObject in our K8s cluster.
kubectl apply -f scaled-object.yaml

Check the KEDA operator logs:
KEDA_POD_NAME=$(kubectl get pods -n keda -o=jsonpath='{.items[0].metadata.name}')
kubectl logs $KEDA_POD_NAME -n keda

You should see:
time="2019-10-15T09:38:28Z" level=info msg="Watching ScaledObject: default/prometheus-scaledobject"
time="2019-10-15T09:38:28Z" level=info msg="Created HPA with namespace default and name keda-hpa-prometheus-scaledobject"

This provisions an HPA in your namespace, which you can check with:
kubectl get hpa

Because ScaledObject is a custom resource (CRD), you can also query it directly with kubectl:
kubectl get scaledobject.keda.sh/prometheus-scaledobject
NAME                      SCALETARGETKIND      SCALETARGETNAME   MIN   MAX   TRIGGERS     AUTHENTICATION   READY   ACTIVE   FALLBACK   AGE
prometheus-scaledobject   apps/v1.Deployment   demoapp           1     10    prometheus                    True    False    False      64s

We can see that our prometheus-scaledobject is ready, so let's scale our application! Remember, our application scales on the order_books_total metric and our threshold is only 5, so we should be able to reach it. For this we can use a simple tool called hey.
Autoscaling in action
We will use hey, a utility program to generate load.
Run the application (make sure that it is up and running on port 8080):

kubectl port-forward service/demoapp 8080:8080

In another terminal, watch the pods:

kubectl get pods -w

Put load on the application (do this continuously until the pod count stops growing; maxReplicaCount caps it at 10):

hey -n 10000 -m POST http://localhost:8080/books

Or, if you are on Windows, you can use PowerShell to send 10K requests. Open a PowerShell CLI and run:
$i=1
for(;$i -le 10000;$i++)
{
Write-Host $i
Invoke-WebRequest -Uri http://localhost:8080/books -Method POST
}

It can take a minute before the application actually starts scaling. After a while you should have 10 pods up and running! Now let's also look at the scale-down process: stop putting load on the application and just watch the pods. This is basically how KEDA works.
In another terminal, watch the pods:

kubectl get pods -w

You will see the Deployment scaled out by the HPA and new Pods spun up while the load lasts. Check the HPA to confirm the same.
According to our current configuration, it seems every 1K requests create one additional pod via KEDA. But if we change the scaled-object.yaml file to the values below:
threshold: "3"
query: sum(rate(order_books_total[2m]))

This will create an additional pod for about every 600 requests. See the result below:
kubectl get hpa

NAME                               REFERENCE            TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
keda-hpa-prometheus-scaledobject   Deployment/demoapp   1830m/3 (avg)   1         10        6          4m22s

So the idea is the same: if the load does not sustain, the Deployment is scaled back down to the point where only a single Pod is running.
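The TARGETS column 1830m/3 (avg) means the per-pod average of the metric is currently 1.83 against a target of 3. The HPA's scaling rule is desiredReplicas = ceil(currentReplicas × currentAverage / target); plugging in the numbers from the output above (a sketch, using Kubernetes milli-units to stay in integer arithmetic):

```shell
#!/bin/sh
# HPA arithmetic for an average-value target, matching the output above:
# 6 replicas, current average 1830m (= 1.83), target 3 per pod.
replicas=6
current_avg_milli=1830
target_milli=3000
total=$(( replicas * current_avg_milli ))   # total metric value across pods
# desiredReplicas = ceil(total / target), done as integer ceiling division
desired=$(( (total + target_milli - 1) / target_milli ))
echo "desired replicas: $desired"
# prints: desired replicas: 4
```

Since 4 is below the current 6 replicas, the HPA would scale the Deployment in once its stabilization window passes.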
After peak load, KEDA automatically scales the pods back down; you can see the terminating pods once the load stops.
To Clean Up
You can follow the commands below:
# Delete KEDA
kubectl delete namespace keda

# Delete the app, Prometheus server and the KEDA scaled object
kubectl delete -f .

Conclusion
KEDA allows you to auto scale your Kubernetes Deployments (to/from zero) based on data from external metrics such as Prometheus metrics, queue length in Redis, consumer lag of a Kafka topic, etc. It does all the heavy lifting of integrating with the external source as well as exposing its metrics via a Metrics server for the Horizontal Pod Auto-scaler.
Source Code
Get the source code from GitHub: clone or fork the repository, and if you like it, don't forget to give it a star. If you find a problem or have a question, you can open an issue directly on the repository.
Step by Step Design Architectures w/ Course
I have just published a new course — Design Microservices Architecture with Patterns & Principles.
In this course, we're going to learn how to design a microservices architecture using design patterns, principles, and best practices. We will start by designing a monolith and evolve it into event-driven microservices step by step, using the right architecture design patterns and techniques.
References
https://www.stackstalk.com/2022/03/monitor-spring-boot-app.html
https://tanzu.vmware.com/developer/guides/spring-prometheus/
https://tanzu.vmware.com/developer/guides/observability-prometheus-grafana-p1/
https://itnext.io/tutorial-auto-scale-your-kubernetes-apps-with-prometheus-and-keda-c6ea460e4642
