k8s HPA change inits new pods when reducing max to above current pod count?

Say I have 100 running pods with an HPA set to min=100, max=150. Then I change the HPA to min=50, max=105 (i.e. max is still above the current pod count). Should k8s immediately initialize new pods when I change the HPA? I wouldn't think it does, but I seem to have observed this today.

CodePudding user response：

First, as mentioned in the comments, in your specific case no new pods will be created. Some pods will be terminated if the usage metrics are below the utilization target.
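To make this concrete, here is a minimal sketch of the replica calculation described in the Kubernetes HPA documentation: desired = ceil(current * currentMetric / targetMetric), clamped to [min, max]. The function name and the sample CPU numbers are assumptions for illustration, and the sketch ignores pod readiness handling and the scale-down stabilization window.

```python
import math

def desired_replicas(current, metric, target, min_replicas, max_replicas, tolerance=0.1):
    """Simplified HPA recommendation; not the controller's actual code."""
    ratio = metric / target
    # Inside the tolerance band (0.1 by default) the replica count is left alone.
    if abs(ratio - 1.0) <= tolerance:
        raw = current
    else:
        raw = math.ceil(current * ratio)
    # The result is always clamped to the HPA's min/max bounds.
    return max(min_replicas, min(max_replicas, raw))

# Hypothetical load: 100 pods at 30% CPU against a 60% target.
print(desired_replicas(100, 30, 60, 100, 150))  # 100, held up by min=100
print(desired_replicas(100, 30, 60, 50, 105))   # 50, the new min allows scale-down
# Hypothetical load above target: the new max caps the result.
print(desired_replicas(100, 90, 60, 50, 105))   # 105
```

In this model new pods appear only when the metric is above the target, and in that case the old limits (max=150) would already have allowed a scale-up before the change.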

Second, it is completely normal that scaling down takes some time, because the stabilizationWindowSeconds parameter for scaleDown defaults to 300:

behavior:
  scaleDown:
    stabilizationWindowSeconds: 300
    policies:
    - type: Percent
      value: 100
      periodSeconds: 15

So, if an HPA has been running with min=100, max=150 for a long time and you change it to min=50, max=105, you can expect the replicas to be scaled down toward 50 after about 300 seconds (5 minutes), provided the usage metrics stay below the target.

A good explanation of how stabilizationWindowSeconds works is in this document:

Story 5: Stabilization before scaling down

This mode is used when the user expects a lot of flapping or does not want to scale down pods too early expecting some late load spikes.

Create an HPA with the following behavior:

behavior:
  scaleDown:
    stabilizationWindowSeconds: 600
    policies:
    - type: Pods
      value: 5

i.e., the algorithm will:

gather recommendations for 600 seconds (default: 300 seconds)

pick the largest one

scale down no more than 5 pods per minute

Example for CurReplicas = 10 and an HPA controller cycle of once per minute:

For the first 9 minutes the algorithm will do nothing except gather recommendations. Let's imagine that we have the following recommendations:

recommendations = [10, 9, 8, 9, 9, 8, 9, 8, 9]

On the 10th minute, we'll add one more recommendation (let it be 8):

recommendations = [10, 9, 8, 9, 9, 8, 9, 8, 9, 8]

Now the algorithm picks the largest one, 10. Hence it will not change the number of replicas.

On the 11th minute, we'll add one more recommendation (let it be 7) and remove the first one to keep the same number of recommendations:

recommendations = [9, 8, 9, 9, 8, 9, 8, 9, 8, 7]

The algorithm picks the largest value, 9, and changes the number of replicas 10 -> 9.
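The walkthrough above can be reproduced with a small simulation. This is only a sketch: the function name is made up, and it assumes one recommendation per minute so that a 600-second window holds 10 entries, whereas the real controller tracks timestamps rather than a fixed-size list.

```python
from collections import deque

def stabilized_targets(recommendations, window_size):
    """Yield the scale-down target after each new recommendation."""
    # A bounded deque drops the oldest recommendation automatically.
    window = deque(maxlen=window_size)
    for rec in recommendations:
        window.append(rec)
        # Scale-down stabilization picks the largest recent recommendation.
        yield max(window)

# Recommendations from the example, one per minute.
recs = [10, 9, 8, 9, 9, 8, 9, 8, 9, 8, 7]
targets = list(stabilized_targets(recs, window_size=10))

print(targets[9])   # minute 10 -> 10, no change
print(targets[10])  # minute 11 -> 9, scale 10 -> 9
```

The initial 10 keeps the target pinned until it falls out of the window, which is why nothing happens before the 11th minute.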

Also, the behaviour depends on which Kubernetes version, which autoscaling apiVersion, and which Kubernetes distribution you are using. It can vary; check this topic on GitHub with bug reports.

If you want scale-down to happen immediately (not recommended in production), you can set up the following:

behavior:
    scaleDown:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 1
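To illustrate what the policies in these snippets permit, here is a simplified sketch of the per-period scale-down limit. The function name and the tuple representation of policies are assumptions for illustration; the sketch assumes the default selectPolicy (Max) and does not model how the controller tracks the replica count at the start of each period.

```python
import math

def scale_down_floor(period_start_replicas, policies):
    """Lowest replica count the policies allow within one period (simplified)."""
    if not policies:
        raise ValueError("at least one policy is required")
    floors = []
    for kind, value in policies:
        if kind == "Percent":
            # Remove at most `value` percent of the replicas per period.
            floors.append(math.ceil(period_start_replicas * (1 - value / 100)))
        elif kind == "Pods":
            # Remove at most `value` pods per period.
            floors.append(period_start_replicas - value)
        else:
            raise ValueError(f"unknown policy type: {kind}")
    # With selectPolicy Max, the policy permitting the largest change wins.
    return max(0, min(floors))

print(scale_down_floor(100, [("Percent", 100)]))  # 0: no effective rate limit
print(scale_down_floor(10, [("Pods", 5)]))        # 5: at most 5 pods removed
```

With Percent 100 the policy itself does not limit anything, so in the question's scenario the effective floor is the HPA's own min of 50.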

Also check: