Say I have 100 running pods with an HPA set to min=100, max=150. Then I change the HPA to min=50, max=105 (i.e. max is still above the current pod count). Should k8s immediately start new pods when I change the HPA? I wouldn't think it does, but I seem to have observed this today.
CodePudding user response:
First, as mentioned in the comments, in your specific case some pods will be terminated if the usage metrics are below the utilization target; no new pods will be created.
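To make the first point concrete, here is a minimal sketch of the replica calculation described in the Kubernetes HPA documentation (desired = ceil(current * currentMetric / targetMetric), then clamped to min/max). It is an illustration only: the function name and the utilization numbers are assumptions, and it ignores tolerance, pod readiness and the stabilization window.

```python
import math

def desired_replicas(current, current_util, target_util, min_replicas, max_replicas):
    """Simplified HPA calculation: scale by the metric ratio, then clamp to the bounds."""
    raw = math.ceil(current * current_util / target_util)
    return max(min_replicas, min(max_replicas, raw))

# Hypothetical load: 100 pods at 40% utilization against a 50% target.
print(desired_replicas(100, 40, 50, 100, 150))  # old bounds: held at min, prints 100
print(desired_replicas(100, 40, 50, 50, 105))   # new bounds: prints 80

# Hypothetical load above target: the count would rise, capped at the new max.
print(desired_replicas(100, 60, 50, 50, 105))   # prints 105
```

Under these assumptions, lowering min lets the count drop when usage is below target, and new pods would only appear if usage were above target.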
Second, it's absolutely normal that it takes some time to scale down replicas, because the stabilizationWindowSeconds parameter is set to 300 by default:
    behavior:
      scaleDown:
        stabilizationWindowSeconds: 300
        policies:
        - type: Percent
          value: 100
          periodSeconds: 15
So, if an HPA has been running with min=100, max=150 for a long time and you change it to min=50, max=105, then after 300 seconds (5 minutes) your replicas will be scaled down to whatever the metrics recommend, but no lower than 50.
A good explanation of how exactly stabilizationWindowSeconds works is in this document:
Story 5: Stabilization before scaling down
This mode is used when the user expects a lot of flapping or does not want to scale down pods too early expecting some late load spikes.
Create an HPA with the following behavior:
    behavior:
      scaleDown:
        stabilizationWindowSeconds: 600
        policies:
        - type: Pods
          value: 5
i.e., the algorithm will:
- gather recommendations for 600 seconds (default: 300 seconds)
- pick the largest one
- scale down no more than 5 pods per minute
Example for CurReplicas = 10 and an HPA controller cycle of once per minute:
- First 9 minutes the algorithm will do nothing except gathering recommendations. Let's imagine that we have the following recommendations
recommendations = [10, 9, 8, 9, 9, 8, 9, 8, 9]
- On the 10th minute, we'll add one more recommendation (let it be 8):
recommendations = [10, 9, 8, 9, 9, 8, 9, 8, 9, 8]
Now the algorithm picks the largest one, 10. Hence it will not change the number of replicas.
- On the 11th minute, we'll add one more recommendation (let it be 7) and remove the first one to keep the same number of recommendations:
recommendations = [9, 8, 9, 9, 8, 9, 8, 9, 8, 7]
The algorithm picks the largest value, 9, and changes the number of replicas 10 -> 9.
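The walkthrough above can be replayed with a short simulation. This is only a sketch of the "keep a window of recommendations and pick the largest" idea, not the controller's actual code: the function name is made up for illustration, the window is counted in cycles rather than seconds, and the 5-pods-per-minute policy is left out because it never becomes the limiting factor in this example.

```python
from collections import deque

def simulate_scale_down(recommendations, window_size, current):
    """Replay one recommendation per cycle; return the replica count after each cycle."""
    window = deque(maxlen=window_size)  # oldest recommendation drops out automatically
    history = []
    for rec in recommendations:
        window.append(rec)
        # Scale down only as far as the largest recommendation still in the window.
        current = min(current, max(window))
        history.append(current)
    return history

# Recommendations from the example: 9 gathered, then 8 (10th minute), then 7 (11th minute).
recs = [10, 9, 8, 9, 9, 8, 9, 8, 9, 8, 7]
print(simulate_scale_down(recs, window_size=10, current=10))
# prints [10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 9]
```

The count stays at 10 for as long as the initial recommendation of 10 is inside the window and drops to 9 only on the 11th cycle, matching the story.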
Another thing is that the behaviour depends on which Kubernetes version, which autoscaling apiVersion and which Kubernetes distribution you are using. It can vary; check this topic on GitHub with bug reports.
If you want scale-down to happen immediately (not recommended in production), you can set the following:
    behavior:
      scaleDown:
        stabilizationWindowSeconds: 0
        policies:
        - type: Percent
          value: 100
          periodSeconds: 1
