Which mechanism does Canary Deployment strategy use?

I am new to K8s and came across several deployment strategies. I found the theory on the Canary Deployment strategy a little unclear.

I understand that in Canary strategy the whole aim is to first test the new version on a subset of the current pod instances and then, if successful, upgrade the remaining instances.

My doubt is regarding the upgrading of the remaining instances when the tests are successful:

Does the upgrade follow a Recreate type mechanism where all the pods are killed and new ones are created thus resulting in downtime? (OR)
Does it follow a rolling update mechanism where the remaining pods are updated one by one thus avoiding any downtime? (OR)
Is it like blue/green, where we already have a set of pods with newer version and we just swap thus requiring additional resources? (OR)
It can follow any one of these mechanisms as per specification.

CodePudding user response：

In a canary deployment, the new version of the application is gradually rolled out to the Kubernetes cluster while receiving only a small share of live traffic. A subset of live users connects to the new version while the rest keep using the previous version.

The small share of live traffic sent to the new version acts as an early warning for potential problems in the new code. As confidence grows, you gradually increase the canary traffic, so more users connect to the updated version. In the end, all live traffic goes to the canaries, and the canary version becomes the new “production version”.

The big advantage of the canary strategy is that deployment issues can be detected very early, while they still affect only a small subset of the application's users. If something goes wrong with a canary, the production version is still present and all traffic can simply be reverted to it.
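To make the "increase gradually, revert on trouble" idea concrete, here is a minimal sketch in Python. The schedule of weights and the is_healthy callback are hypothetical and only for illustration; in a real cluster the weight change would be applied by whatever routes your traffic (a service mesh, an ingress controller, etc.).

```python
from typing import Callable, Sequence


def run_canary(weights: Sequence[int], is_healthy: Callable[[int], bool]) -> int:
    """Walk through canary traffic weights (percent) and return the final weight.

    Returns 100 if every step passed, or 0 if a step failed and all traffic
    was sent back to the production version.
    """
    for weight in weights:
        # A real rollout would reconfigure the router here, then observe.
        if not is_healthy(weight):
            return 0  # revert: production takes all traffic again
    return 100  # the canary has become the new production version


# Hypothetical schedule and health checks, for illustration only.
schedule = [5, 25, 50, 100]
print(run_canary(schedule, lambda w: True))    # every step passes
print(run_canary(schedule, lambda w: w < 50))  # fails at the 50% step
```

The first call ends with the canary taking all traffic, the second one reverts to production as soon as the 50% step reports a problem.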

Canary releases are based on the following assumptions:

Multiple versions of your application can exist together at the same time, getting live traffic.
If you don’t use some kind of sticky session mechanism, some customers might hit a production server in one request and a canary server in another.
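One common way to get the "sticky" behaviour mentioned above is to derive the routing decision from a stable attribute of the user. The following is only a sketch under that assumption: the function name and the use of a user ID are my own choices, not something Kubernetes provides out of the box.

```python
import hashlib


def routes_to_canary(user_id: str, canary_percent: int) -> bool:
    """Deterministically map a user to the canary or the production version."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket in 0..99
    return bucket < canary_percent


# The same user gets the same answer on every request for a given percentage.
first = routes_to_canary("user-42", 10)
second = routes_to_canary("user-42", 10)
print(first == second)
```

Because the bucket depends only on the user ID, a user who is already on the canary stays on it when the percentage is increased.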

Consider Istio if you want to do canary deployments. The following link may be helpful: https://istio.io/latest/blog/2017/0.1-canary/

CodePudding user response：

In addition to P Ekambaram's comments:

Does the upgrade follow a Recreate type mechanism where all the pods are killed and new ones are created thus resulting in downtime?

Not normally. It depends on how you choose to set up your canary release. If you have 10 pods and you want to test on 20% of your traffic, you would add two pods of the new version and terminate two of the old pods, making sure the new pods carry the same labels as the old pods so that they receive traffic. Normally this is done with a second deployment, or with a tool such as Spinnaker that manages the ramping up/down.
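The arithmetic behind that example can be sketched as follows. This assumes traffic is spread evenly over all pods behind the same labels, so the share of canary pods roughly equals the share of canary traffic; split_replicas is a hypothetical helper, not a Kubernetes API.

```python
import math
from typing import Tuple


def split_replicas(total: int, canary_percent: float) -> Tuple[int, int]:
    """Return (stable_replicas, canary_replicas) for a fixed total pod count."""
    if total < 1 or not 0 <= canary_percent <= 100:
        raise ValueError("total must be >= 1 and canary_percent within 0..100")
    # Round up so that a non-zero percentage always yields at least one canary pod.
    canary = math.ceil(total * canary_percent / 100)
    return total - canary, canary


print(split_replicas(10, 20))  # (8, 2)
```

With few pods the granularity is coarse (3 pods cannot give you a 10% canary), which is one reason people move to weight-based routing for small percentages.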

Does it follow a rolling update mechanism where the remaining pods are updated one by one thus avoiding any downtime?

Again, it depends. But normally, yes: you would ramp up the number of new-version pods and ramp down the number of old-version pods while keeping an eye on a chosen metric (for example, the number of 5xx errors if the deployment is a web service). Depending on the business and the service, it may be "check 20% of traffic with the new version and, if all is good, do the remaining 80% in one go".
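As a rough illustration of "check the metric, then do the rest in one go", the decision could look like the sketch below. The request counts and the tolerance value are made-up numbers; where they come from (Prometheus, load balancer logs, etc.) is up to your setup.

```python
def error_rate(errors_5xx: int, total_requests: int) -> float:
    """Fraction of requests that ended in a 5xx response."""
    return errors_5xx / total_requests if total_requests else 0.0


def decide_rollout(canary_rate: float, stable_rate: float,
                   tolerance: float = 0.01) -> int:
    """Return the next canary percentage: 100 to promote, 0 to roll back."""
    if canary_rate > stable_rate + tolerance:
        return 0  # canary is noticeably worse: send everything back to stable
    return 100  # canary looks fine: move the remaining traffic in one go


# Hypothetical measurements taken while the canary serves 20% of traffic.
canary = error_rate(errors_5xx=3, total_requests=1000)
stable = error_rate(errors_5xx=10, total_requests=4000)
print(decide_rollout(canary, stable))  # 100
```

A more cautious rollout would return an intermediate percentage instead of jumping straight to 100.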

Is it like blue/green, where we already have a set of pods with newer version and we just swap thus requiring additional resources?

Blue/Green normally means having an entire set of pods of the new version on hand and flipping the DNS to the new version as a "big bang" switch.

You can do a canary deployment using weighted DNS, but that is unreliable because of the various DNS caching layers that you can't control.

It can follow any one of these mechanisms as per specification.

Yes, it's entirely up to you (and/or the product/service owner) to decide how you want the application deployed.