{"id":30020,"date":"2025-01-15T08:56:14","date_gmt":"2025-01-15T07:56:14","guid":{"rendered":"https:\/\/sii.pl\/blog\/?p=30020"},"modified":"2025-01-16T15:44:14","modified_gmt":"2025-01-16T14:44:14","slug":"advanced-deployment-options-with-kubernetes-and-argo-rollouts","status":"publish","type":"post","link":"https:\/\/sii.pl\/blog\/en\/advanced-deployment-options-with-kubernetes-and-argo-rollouts\/","title":{"rendered":"Advanced deployment options with Kubernetes and Argo Rollouts"},"content":{"rendered":"\n<p>A regular Deployment resource in Kubernetes provides us with 2 deployment strategies that we can specify in the <strong><em>.spec.strategy.type<\/em><\/strong> field \u2013 <strong><em>RollingUpdate<\/em><\/strong> (the default option) and <strong><em>Recreate<\/em><\/strong>, and that&#8217;s basically everything we can use in Kubernetes out of the box. This may be enough for some scenarios, especially if we just want to get things done and set up a Minimum Viable Product as fast as possible.<\/p>\n\n\n\n<p>What if we need a much more sophisticated deployment method? There are countless deployment strategies:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Blue-Green,<\/li>\n\n\n\n<li>Canary,<\/li>\n\n\n\n<li>Big Bang,<\/li>\n\n\n\n<li>Feature Toggle, and so on\u2026<\/li>\n<\/ul>\n\n\n\n<p>Obviously, we can also use hybrids of those methods, so there is much more to explore than what K8s provides by default. 
But how can we leverage those deployment strategies <strong>without writing complex Bash scripts, without complex configuration of load balancers and multiple environments (or K8s clusters) in our cloud, and without very complicated routing configuration of our K8s Ingress?<\/strong><\/p>\n\n\n\n<p>There is a much simpler, Kubernetes-friendly solution that enables various mature deployment mechanisms even with the simplest K8s+cloud setup you can imagine \u2013 the name of this tool is Argo Rollouts!<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong><strong>First, a reminder of the basics<\/strong><\/strong><\/h2>\n\n\n\n<p>Before explaining what Argo Rollouts is and what it gives us with the Canary strategy, let&#8217;s recall how a regular Kubernetes Deployment (with the default <strong><em>RollingUpdate<\/em><\/strong> strategy) behaves during an update.<\/p>\n\n\n\n<p>Let&#8217;s take the Deployment definition below as an example:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\napiVersion: apps\/v1\nkind: Deployment\nmetadata:\n  name: nginx-deployment\n  labels:\n    app: nginx\nspec:\n  replicas: 10\n  selector:\n    matchLabels:\n      app: nginx\n  template:\n    metadata:\n      labels:\n        app: nginx\n    spec:\n      containers:\n      - name: nginx\n        image: my-image:v1\n        ports:\n        - containerPort: 80\n  strategy: # Field added for better clarity\n    type: RollingUpdate\n    rollingUpdate:\n      maxSurge: 10%\n      maxUnavailable: 10%\n<\/pre><\/div>\n\n\n<p>This Deployment uses the default strategy type, <strong><em>RollingUpdate<\/em><\/strong>, which I defined explicitly in the <strong><em>.spec.strategy<\/em><\/strong> field for clarity. Note that <strong><em>maxSurge<\/em><\/strong> and <strong><em>maxUnavailable<\/em><\/strong> default to 25%; the 10% values were chosen for this example.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>The process<\/strong><\/h3>\n\n\n\n<p>When we create this Deployment and then update 
its image to a new version, it starts spinning up new pods and removing the pods with the older image version at the same time, in a sequence of batches; the whole process looks like this:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Initially, we have 10 pods, all running the v1 image.<\/li>\n\n\n\n<li>We update the image tag to v2, which triggers a Rolling Update paced by the defined <strong><em>maxSurge<\/em><\/strong> and <strong><em>maxUnavailable<\/em><\/strong> values.<\/li>\n\n\n\n<li>A new ReplicaSet is created that will spin up pods with the v2 image.<\/li>\n\n\n\n<li>Simultaneously \u2013 the new ReplicaSet creates 2 pods with the v2 image, and the old ReplicaSet terminates 1 of its pods. The new ReplicaSet can only create 2 pods at this point because <strong><em>maxSurge<\/em><\/strong> is set to 10% (of the <strong><em>.spec.replicas<\/em><\/strong> count), which caps the total at 11 replicas (9 old replicas in <strong>Running<\/strong> state and 2 new replicas in <strong>ContainerCreating<\/strong> state). 
The old ReplicaSet can terminate only a single pod at this point because <strong><em>maxUnavailable<\/em><\/strong> is set to 10% (at least 9 replicas need to be in the <strong>Running<\/strong> state).<\/li>\n\n\n\n<li>Right after the replicas from the new ReplicaSet turn from <strong>ContainerCreating<\/strong> to <strong>Running<\/strong> state, the old ReplicaSet terminates 2 of its replicas, and at the same time the new ReplicaSet creates another 2 replicas, so we have 11 replicas in total (7 old replicas in <strong>Running<\/strong> state, 2 new replicas in <strong>Running<\/strong> state, and 2 new replicas in <strong>ContainerCreating<\/strong> state).<\/li>\n\n\n\n<li>All consecutive batches are similar to the 5th step until 100% of the pods in the Deployment run the desired v2 (new) version and 0% run the previous v1 (old) version.<\/li>\n\n\n\n<li>After the last batch, the old ReplicaSet is kept or removed based on the <strong><em>.spec.revisionHistoryLimit<\/em><\/strong> field (with the default value of 10, it stays, though scaled down to 0 replicas).<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\"><strong><strong>Gradual updates<\/strong><\/strong><\/h3>\n\n\n\n<p>This is a very handy feature, especially compared to obsolete deployment methods from the pre-Kubernetes era. With Rolling Updates, we avoid downtime (as long as readiness probes accurately report when a pod is ready to serve traffic). Instead, we get gradual updates, where an increasingly large share of our pods is replaced by the newer version. 
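<\/p>\n\n\n\n<p>Both fields accept either a percentage or an absolute number of pods. For percentages, Kubernetes rounds <strong><em>maxSurge<\/em><\/strong> up and <strong><em>maxUnavailable<\/em><\/strong> down, so with the default values of 25% and 10 replicas, an update may run up to 13 pods in total while keeping at least 8 of them available. As a sketch (the values below are our own choice, not the ones from the example above), a more conservative strategy block that adds one pod at a time and never drops below the desired replica count could look like this:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\n...\n  strategy:\n    type: RollingUpdate\n    rollingUpdate:\n      maxSurge: 1        # absolute value: at most 10 + 1 = 11 pods exist at any time\n      maxUnavailable: 0  # an old pod is removed only after a new one is available\n<\/pre><\/div>\n\n\n<p>The two fields cannot both be 0, because the update could then never make progress; Kubernetes rejects such a Deployment.<\/p>\n\n\n\n<p>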
We can even set <strong><em>maxUnavailable<\/em><\/strong> to 0 so that we don&#8217;t lose any capacity (during the deployment of a new version, we always have at least as many running pods as before the update started).<\/p>\n\n\n\n<p>That&#8217;s fine, but what if we need a much more sophisticated deployment strategy that allows much smarter logic in our deployment process?<\/p>\n\n\n\n<p>Now, we can finally get into Argo Rollouts!<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Argo Rollouts and Canary Deployment<\/strong><\/h2>\n\n\n\n<p>Argo Rollouts is an open-source tool that provides a Kubernetes controller and a set of CRDs (such as Rollout) for advanced deployment capabilities. Like Argo CD, it is part of the Argo project. In this blog post, I focus only on the most standard usage of the Canary Deployment strategy, but Argo Rollouts also supports Blue-Green deployments and many more Canary features, so feel free to refer to the <a href=\"https:\/\/argo-rollouts.readthedocs.io\/en\/stable\/\" target=\"_blank\" rel=\"noopener\" title=\"\" rel=\"nofollow\" >official documentation<\/a> while or after reading this post.<\/p>\n\n\n\n<p>With Canary Deployment, you have much more control over rolling out a new version than with a regular Rolling Update. In a Rolling Update, the process is straightforward and continuous \u2013 we just gradually replace old replicas with new ones at the same pace. <strong>This is not the case in Canary Deployment.<\/strong><\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong><strong>Canary Deployment and miners&#8217; canaries<\/strong><\/strong><\/h3>\n\n\n\n<p>The name &#8220;Canary Deployment&#8221; comes from an old practice of coal miners, who used canaries as an early warning system for harmful gases like carbon monoxide (CO) and methane (CH4). The birds alerted the miners to danger before the miners themselves could notice it. 
Like coal miners, software engineers want to make sure that a new area (app version) is safe before it is used on a larger scale.<\/p>\n\n\n\n<p>Instead of deploying a new version with no control once the deployment is underway, we can use Canary Deployment to initially spin up only a couple of pods with the new version, so that only some users are served by it. Then, if everything is successfully validated (tests pass and users are satisfied with the change), we can fully update to the new version (possibly with some additional useful steps along the way).<\/p>\n\n\n\n<p>You can, e.g., initially replace 10% of your pods with the new version and serve 10% of your production traffic to it. Then we can have time waits or a manual gate and, in the meantime, run automated tests that look for issues with the new version. Then, if the tests are successful, the right amount of time passes, or the manual gate is approved, we scale to, e.g., 30% of pods (and routed traffic) on the new version and 70% on the old one, wait for some time, scale to half of the pods, and finally scale to 100% of pods on the new version.<\/p>\n\n\n\n<p>This is only one of countless possible setups that you can configure with the Canary Deployment approach and Argo Rollouts. Argo Rollouts provides a wide range of features to meet your deployment process needs. 
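<\/p>\n\n\n\n<p>As an illustration, the scenario described above (10% of the pods, a gate, then 30%, 50%, and finally 100%) could be expressed as a list of Rollout steps roughly like this. It is only a sketch: the wait durations are placeholders, and the Rollout resource and its fields are explained in detail later in this post.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\n...\n  strategy:\n    canary:\n      steps:\n        - setWeight: 10           # about 10% of the pods run the new version\n        - pause: {}               # manual gate: wait until the Rollout is promoted\n        - setWeight: 30\n        - pause: {duration: 15m}  # placeholder wait time\n        - setWeight: 50\n        - pause: {duration: 15m}  # placeholder wait time\n        # after the last step, the Rollout moves to 100% of the new version\n<\/pre><\/div>\n\n\n<p>Without a traffic-routing integration, <strong><em>setWeight<\/em><\/strong> is approximated by the ratio of new to old pods, so with 10 replicas a weight of 10 corresponds to a single pod.<\/p>\n\n\n\n<p>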
You can, e.g., control the amount of traffic with 2 K8s Services and a K8s Ingress controller, which isolates the versions better and makes the traffic split independent of the number of pods (e.g., 10% of pods on the new version but only 5% of traffic routed to them).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong><strong>What to choose?<\/strong><\/strong><\/h3>\n\n\n\n<p>As you can see, <strong>Canary Deployment is a much more capable deployment strategy than a Rolling Update<\/strong>. I&#8217;m not saying it&#8217;s better or worse. Rarely is something in engineering simply better or worse. If one of your apps running in K8s doesn&#8217;t need Canary Deployment because, e.g., it is relatively simple, has very frequent and minor updates, and doesn&#8217;t need comprehensive validation on each update, then Canary Deployment would be overengineering, and you should stay with the default Rolling Update.<\/p>\n\n\n\n<p>However, if your app can benefit from the Canary Deployment approach, you should definitely consider implementing it, especially with Argo Rollouts.<\/p>\n\n\n\n<p>If you&#8217;re a DevOps Engineer or anybody interested in modern cloud-native technologies and containerization, there&#8217;s a high chance that you have already heard about Argo CD. If so, good, because it will help you get an idea of what Argo Rollouts is. 
The two tools serve very different purposes, but they operate in a fundamentally similar way \u2013 both are open-source projects written in Go that we install in our K8s cluster (each with a dedicated CLI tool) to get a new Kubernetes controller and a set of CRDs for deployment-related purposes.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong><strong>Argo Rollouts core concept \u2013 a Rollout<\/strong><\/strong><\/h2>\n\n\n\n<p>The most important concept in Argo Rollouts is a CRD called &#8220;Rollout.&#8221; This &#8220;new&#8221; object is not as new as it may seem, because a Rollout is basically a regular K8s Deployment with a number of useful deployment capabilities built on top of it.<\/p>\n\n\n\n<p>Let&#8217;s see an example definition of a Rollout object:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\napiVersion: argoproj.io\/v1alpha1\nkind: Rollout\nmetadata:\n  name: nginx-rollout\n  labels:\n    app: nginx\nspec:\n  replicas: 10\n  selector:\n    matchLabels:\n      app: nginx\n  template:\n    metadata:\n      labels:\n        app: nginx\n    spec:\n      containers:\n      - name: nginx\n        image: my-image:v1\n        ports:\n        - containerPort: 80\n  strategy:\n    canary:\n      maxSurge: &#039;10%&#039;\n      maxUnavailable: &#039;10%&#039;\n      steps:\n        - setWeight: 30\n        - pause:\n            duration: 30m\n        - setWeight: 40\n        - pause:\n            duration: 1h\n        - pause: {} # No duration: wait until the Rollout is promoted manually\n        # We don&#039;t need to specify the line below explicitly because it&#039;s the default behavior\n        # - setWeight: 100\n<\/pre><\/div>\n\n\n<h3 class=\"wp-block-heading\"><strong>Rollout and Deployment<\/strong><\/h3>\n\n\n\n<p>As you can see, the Rollout definition is almost the same as the Deployment; there are only 3 differences:<\/p>\n\n\n\n<ul 
class=\"wp-block-list\">\n<li><strong><em>.apiVersion:<\/em><\/strong> \u2013 we need to use <strong><em>argoproj.io\/v1alpha1<\/em><\/strong> instead of <strong><em>apps\/v1<\/em><\/strong>,<\/li>\n\n\n\n<li><strong><em>.kind:<\/em><\/strong> \u2013 we need to use <strong><em>Rollout<\/em><\/strong> instead of <strong><em>Deployment<\/em><\/strong>,<\/li>\n\n\n\n<li><strong><em>.spec.strategy:<\/em><\/strong> \u2013 instead of setting the strategy in the <strong><em>.spec.strategy.type<\/em><\/strong> field and optionally configuring the Rolling Update under <strong><em>.spec.strategy.rollingUpdate<\/em><\/strong>, we choose between <strong><em>canary<\/em><\/strong> and <strong><em>blueGreen<\/em><\/strong> and specify the configuration options under the chosen field.<\/li>\n<\/ul>\n\n\n\n<p>So basically, everything is the same as in a Deployment; the only major difference is the deployment strategy configuration. That&#8217;s great news for everybody who doesn&#8217;t have time to write a fancy CRD manifest from scratch, especially since this custom resource replaces a Deployment, which is undoubtedly one of the most crucial objects in K8s clusters.<\/p>\n\n\n\n<p>With Argo Rollouts, you just install the tool (we will get to this in a moment), change <strong><em>.apiVersion<\/em><\/strong> and <strong><em>.kind<\/em><\/strong>, add, e.g., the <strong><em>.spec.strategy.canary<\/em><\/strong> field with <strong>{}<\/strong> as the value, and that&#8217;s it! 
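<\/p>\n\n\n\n<p>Applied to the Deployment from the beginning of this post, such a minimal conversion might look like the sketch below (comments mark the fields that changed; everything else is copied unchanged):<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\napiVersion: argoproj.io\/v1alpha1  # was: apps\/v1\nkind: Rollout                     # was: Deployment\nmetadata:\n  name: nginx-deployment\n  labels:\n    app: nginx\nspec:\n  replicas: 10\n  selector:\n    matchLabels:\n      app: nginx\n  template:\n    metadata:\n      labels:\n        app: nginx\n    spec:\n      containers:\n      - name: nginx\n        image: my-image:v1\n        ports:\n        - containerPort: 80\n  strategy:\n    canary: {}                    # was: type: RollingUpdate (plus rollingUpdate options)\n<\/pre><\/div>\n\n\n<p>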
You don&#8217;t need to put anything inside the .spec.strategy.canary field: with no steps defined, the Rollout simply performs a rolling update, behaving essentially like a normal Deployment.<\/p>\n\n\n\n<p>But if you decide to install Argo Rollouts, you will certainly want to use the features that a Rollout provides, <strong>so don&#8217;t leave this field empty<\/strong> \ud83d\ude09<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong><em>.spec.strategy<\/em> field<\/strong><\/h3>\n\n\n\n<p>Now let&#8217;s explain what is happening under the <strong><em>.spec.strategy<\/em><\/strong> field in the earlier example (the relevant part is repeated below):<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\n...\n  strategy:\n    canary:\n      maxSurge: &#039;10%&#039;\n      maxUnavailable: &#039;10%&#039;\n      steps:\n        - setWeight: 30\n        - pause:\n            duration: 30m\n        - setWeight: 40\n        - pause:\n            duration: 1h\n        - pause: {}\n<\/pre><\/div>\n\n\n<p>First, we specify that we want to use the Canary Deployment strategy; then we optionally specify <strong><em>maxSurge<\/em><\/strong> and <strong><em>maxUnavailable<\/em><\/strong>, which work exactly as in a regular Deployment. In this case, <strong><em>maxSurge<\/em><\/strong> ensures that there are never more than 11 replicas in total in the <strong><em>Running<\/em><\/strong> or <strong><em>ContainerCreating<\/em><\/strong> state, and <strong><em>maxUnavailable<\/em><\/strong> ensures that at least 9 replicas are always running.<\/p>\n\n\n\n<p>Then, we specify the deployment steps that define the Rollout&#8217;s behavior when updating to a new version.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong><strong>Updating a pod to a new version<\/strong><\/strong><\/h3>\n\n\n\n<p>Here&#8217;s a step-by-step explanation of what happens when we update the pod template to a new 
version:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Initially, we have 10 replicas in a single ReplicaSet, all using the v1 image tag.<\/li>\n\n\n\n<li>We update the pod image to v2.<\/li>\n\n\n\n<li><strong><u>First step<\/u><\/strong> is performed &#8211; 1 replica of the old ReplicaSet (revision 1) is terminated, and at the same time a new ReplicaSet (revision 2) is created with 2 replicas in ContainerCreating state, so we have 9 running replicas (all from the old ReplicaSet) and 11 replicas in <strong><em>Running<\/em><\/strong> or <strong><em>ContainerCreating<\/em><\/strong> state in total \u2013 exactly what we expect from our <strong><em>maxUnavailable<\/em><\/strong> and <strong><em>maxSurge<\/em><\/strong> settings.<\/li>\n\n\n\n<li>Right after one of the replicas from the new ReplicaSet turns from <strong><em>ContainerCreating<\/em><\/strong> to <strong><em>Running<\/em><\/strong> state, another replica is created in the new ReplicaSet, and at the same time another replica from the old ReplicaSet is terminated, so we have 9 running replicas (1 from the new ReplicaSet and 8 from the old ReplicaSet) and 11 replicas in <strong><em>Running<\/em><\/strong> or <strong><em>ContainerCreating<\/em><\/strong> state in total.<\/li>\n\n\n\n<li>Right after another replica from the new ReplicaSet turns from <strong><em>ContainerCreating<\/em><\/strong> to <strong><em>Running<\/em><\/strong> state, another replica from the old ReplicaSet is terminated, so we have 9 running replicas (2 from the new ReplicaSet and 7 from the old ReplicaSet) and 10 replicas in <strong><em>Running<\/em><\/strong> or <strong><em>ContainerCreating<\/em><\/strong> state in total (the new ReplicaSet has already reached its target of 3 replicas, so no further replica is created).<\/li>\n\n\n\n<li>Right after the last replica from the new ReplicaSet turns from <strong><em>ContainerCreating<\/em><\/strong> to <strong><em>Running<\/em><\/strong> state, no further replicas need to be terminated (the old ReplicaSet is already at its target of 7), so we have 10 running replicas (3 from the new 
ReplicaSet and 7 from the old ReplicaSet), all in <strong><em>Running<\/em><\/strong> state, so the first step is complete &#8211; 30% of the running replicas come from the new ReplicaSet, and roughly 30% of the traffic is routed to them (assuming a single Service that load-balances across all of the Rollout&#8217;s pods).<\/li>\n\n\n\n<li><strong><u>Second step<\/u><\/strong> is performed \u2013 the Rollout waits for 30 minutes (no changes in the number of replicas).<\/li>\n\n\n\n<li><strong><u>Third step<\/u><\/strong> is performed \u2013 1 replica of the old ReplicaSet is terminated, and at the same time, 1 replica is created in the new ReplicaSet (in <strong><em>ContainerCreating<\/em><\/strong> state).<\/li>\n\n\n\n<li>When that replica turns from <strong><em>ContainerCreating<\/em><\/strong> to <strong><em>Running<\/em><\/strong> state, the third step ends: 40% of the running replicas come from the new ReplicaSet, and roughly 40% of the traffic is routed to them.<\/li>\n\n\n\n<li><strong><u>Fourth step<\/u><\/strong> is performed \u2013 the Rollout waits for 1 hour (no changes in the number of replicas).<\/li>\n\n\n\n<li><strong><u>Fifth step<\/u><\/strong> is performed \u2013 the Rollout waits for a promotion (a manual approval). This is how we implement a manual gate in our deployment process. This step is the last one defined in this manifest, so it is the last step before the Rollout moves to 100% new replicas. Now, we should go to our application, probably perform some tests, and make sure that we really want to update to the new version. 
If everything looks fine, we can promote the Rollout, e.g., with the <strong><em>kubectl argo rollouts promote nginx-rollout<\/em><\/strong> command (I will show you how to install it in a moment).<\/li>\n\n\n\n<li>Finally, the last step is performed \u2013 the new ReplicaSet is scaled up to 100% of the replicas and the old ReplicaSet is scaled down to 0 (still respecting the <strong><em>maxSurge<\/em><\/strong> and <strong><em>maxUnavailable<\/em><\/strong> fields). This step is performed whether or not we specify it.<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>The setup process<\/strong><\/h3>\n\n\n\n<p>That was a simple example of Rollout usage. Now, let&#8217;s go through the setup process so you can test this example on your own cluster!<\/p>\n\n\n\n<p>First, run these 2 commands to install the Argo Rollouts controller and CRDs:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\nkubectl create namespace argo-rollouts\nkubectl apply -n argo-rollouts -f https:\/\/github.com\/argoproj\/argo-rollouts\/releases\/latest\/download\/install.yaml\n<\/pre><\/div>\n\n\n<p>Now you can already deploy your first Rollout, but it&#8217;s a good idea to install a few more things first.<\/p>\n\n\n\n<p>First, I highly recommend that you <a href=\"https:\/\/argo-rollouts.readthedocs.io\/en\/stable\/installation\/#kubectl-plugin-installation\" target=\"_blank\" rel=\"noopener\" title=\"\" rel=\"nofollow\" >install the Argo Rollouts plugin for kubectl<\/a>, which lets you promote Rollouts, visualize the update process, open the <a href=\"https:\/\/argo-rollouts.readthedocs.io\/en\/stable\/dashboard\/\" target=\"_blank\" rel=\"noopener\" title=\"\" rel=\"nofollow\" >Argo Rollouts Dashboard<\/a>, and generally work more efficiently with Rollout objects. 
Moreover, you may want to install <a href=\"https:\/\/argo-rollouts.readthedocs.io\/en\/stable\/installation\/#shell-auto-completion\" target=\"_blank\" rel=\"noopener\" title=\"\" rel=\"nofollow\" >shell auto-completion for Argo Rollouts<\/a>.<\/p>\n\n\n\n<p>Once all the needed software is installed, it&#8217;s worth experimenting with Argo Rollouts yourself. You can use the example shown above, the one I will show you in a moment, or examples available online (the <a href=\"https:\/\/github.com\/argoproj\/argo-rollouts\/tree\/master\/examples\" target=\"_blank\" rel=\"noopener\" title=\"\" rel=\"nofollow\" >official GitHub examples<\/a> are a good starting point).<\/p>\n\n\n\n<p>I recommend keeping at least 2 terminal windows open at the same time \u2013 one for running commands such as <strong><em>kubectl apply<\/em><\/strong>, and a second one running the <strong><em>kubectl argo rollouts get rollout nginx-rollout --watch<\/em><\/strong> command, so you can see everything that happens with your Rollout (how the deployment progresses).<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong><strong>Automated tests integrated into the deployment process<\/strong><\/strong><\/h2>\n\n\n\n<p>Now, let&#8217;s get into a much more interesting example that shows one of the most crucial benefits of Argo Rollouts \u2013 <strong>automated tests that are triggered during the deployment process<\/strong>. Manual gates, time waits, and the flexibility of setting the number of pods updated at each step are all very useful features, but they are not as game-changing as the part of Argo Rollouts that we will cover now.<\/p>\n\n\n\n<p><strong>Argo Rollouts allows us to define really comprehensive tests<\/strong> based, e.g., on metrics from your monitoring solution (e.g. 
Prometheus), which test the new version (revision) of your Rollout and, based on the test results, decide whether to continue the deployment or roll back to the previous revision \u2013 <strong>all fully automated<\/strong>!<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong><strong>Two manifests<\/strong><\/strong><\/h3>\n\n\n\n<p>Let&#8217;s see an example with 2 manifests: one with a Rollout and the second one with a new resource \u2013 AnalysisTemplate. An AnalysisTemplate defines how to perform a canary analysis: which metrics to measure, how often to measure them, and which values are considered successful or failed.<\/p>\n\n\n\n<p>rollout.yml (with the <strong><em>.spec<\/em><\/strong> field simplified for better readability):<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\napiVersion: argoproj.io\/v1alpha1\nkind: Rollout\nmetadata:\n  name: guestbook\nspec:\n# ...\n  strategy:\n    canary:\n      analysis:\n        templates:\n        - templateName: success-rate\n        startingStep: 2 # Delay starting analysis run until setWeight: 40%\n        args:\n        - name: service-name\n          value: guestbook-svc.default.svc.cluster.local\n      steps:\n      - setWeight: 20\n      - pause: {duration: 10m}\n      - setWeight: 40\n      - pause: {duration: 10m}\n      - setWeight: 60\n      - pause: {duration: 10m}\n      - setWeight: 80\n      - pause: {duration: 10m}\n<\/pre><\/div>\n\n\n<p>analysis-template.yml:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\napiVersion: argoproj.io\/v1alpha1\nkind: AnalysisTemplate\nmetadata:\n  name: success-rate\nspec:\n  args:\n  - name: service-name\n  metrics:\n  - name: success-rate\n    interval: 5m\n    # NOTE: Prometheus queries return results in the form of a vector.\n    # So it is common to access the index 0 of the returned array to obtain 
the value\n    successCondition: result&#x5B;0] &gt;= 0.95\n    failureLimit: 3\n    provider:\n      prometheus:\n        address: http:\/\/prometheus.example.com:9090\n        query: |\n          sum(irate(\n            istio_requests_total{reporter=&quot;source&quot;,destination_service=~&quot;{{args.service-name}}&quot;,response_code!~&quot;5.*&quot;}&#x5B;5m]\n          )) \/\n          sum(irate(\n            istio_requests_total{reporter=&quot;source&quot;,destination_service=~&quot;{{args.service-name}}&quot;}&#x5B;5m]\n          ))\n<\/pre><\/div>\n\n\n<p>In the Rollout definition, the only new parts are the reference to the AnalysisTemplate used by our Rollout and the Service FQDN passed to it as an argument. The AnalysisTemplate itself, however, is a completely new resource. In analysis-template.yml, <strong>we specify the configuration<\/strong> of the AnalysisTemplate, such as:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>the interval,<\/li>\n\n\n\n<li>the success condition,<\/li>\n\n\n\n<li>the failure limit,<\/li>\n\n\n\n<li>and the provider config.<\/li>\n<\/ul>\n\n\n\n<p>Note that to use this example, you need a running Prometheus instance, and the query assumes that Istio request metrics (<strong><em>istio_requests_total<\/em><\/strong>) are being collected.<\/p>\n\n\n\n<p>Our AnalysisTemplate starts testing at step index 2 (the <strong><em>setWeight: 40<\/em><\/strong> step, since steps are counted from 0) and executes the Prometheus query (PromQL expression) every 5 minutes to check whether the success rate is at least 95%. If more than 3 measurements fail (a <strong><em>failureLimit<\/em><\/strong> of 3 tolerates up to 3 failures), the new revision (ReplicaSet) is rolled back (scaled down to 0%), the previous revision is scaled back up to 100%, and the whole Rollout stays in <strong><em>Degraded<\/em><\/strong> state (until we update the Rollout again).<\/p>\n\n\n\n<p><strong>That&#8217;s amazing functionality<\/strong> \u2013 absolutely zero manual effort, and our deployment is performed automatically with continuous tests. 
If something is wrong, we simply go back to the previous version (which, most likely, is a working one)! Imagine how much you can do with those tests.<\/p>\n\n\n\n<p>You can look for:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>any metrics (latency, success rate, etc.),<\/li>\n\n\n\n<li>any logs,<\/li>\n\n\n\n<li>and results from your app.<\/li>\n<\/ul>\n\n\n\n<p><strong>The possibilities are practically endless.<\/strong><\/p>\n\n\n\n<p>Prometheus metrics alone can give you very useful information about potential problems with a new version of your app, but you are not limited to Prometheus \u2013 you can use Datadog, New Relic, AWS CloudWatch, and even custom K8s Jobs or HTTP requests that check specific measurements!<\/p>\n\n\n\n<p>AnalysisTemplate is the real power of Argo Rollouts, and I&#8217;d almost call it the most useful feature of this tool. So even if you decided to adopt Argo Rollouts for some of its other features, I highly recommend going down the rabbit hole of <a href=\"https:\/\/argo-rollouts.readthedocs.io\/en\/stable\/features\/analysis\/\" target=\"_blank\" rel=\"noopener\" title=\"\" rel=\"nofollow\" >Analysis in Argo Rollouts<\/a>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Argo Rollouts and Argo CD<\/strong><\/h2>\n\n\n\n<p>There is one thing left that is definitely worth mentioning \u2013 how do Argo Rollouts and Argo CD work together when used on the same K8s workloads?<\/p>\n\n\n\n<p>Both tools come from the Argo project, so, as you might expect, they integrate well. 
Each of them can be used as a standalone tool, but many modern K8s setups use Argo CD anyway, and then, depending on your desired deployment strategy, you can add Argo Rollouts too.<\/p>\n\n\n\n<p>Of course, everything is clear in the case of a successful deployment \u2013 the pod image is updated in the remote repository -&gt; Argo CD notices that and triggers synchronization -&gt; Argo Rollouts performs the update (successfully) -&gt; the new version is running in the cluster.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong><strong>Are we in danger of an endless loop?<\/strong><\/strong><\/h3>\n\n\n\n<p><strong>What about the situation when the AnalysisTemplate fails and the Rollout aborts the deployment because the new image version has a bug?<\/strong> Will we end up in an endless loop where Argo CD keeps trying to sync the state from the remote repo while Argo Rollouts keeps failing over and over again? Fortunately, that won&#8217;t happen. Argo CD is aware of the <strong><em>Degraded<\/em><\/strong> state of Rollouts, and it won&#8217;t take any further action when this state occurs; instead of overriding anything, it simply shows the <strong><em>Out of Sync<\/em><\/strong> status.<\/p>\n\n\n\n<p>That&#8217;s great, but what should we do in such a situation? There are at least a few ways to handle it.<\/p>\n\n\n\n<p>I believe that if you&#8217;re already using the GitOps approach in your SDLC, you should stick to the values of this philosophy and use the <strong><em>git revert<\/em><\/strong> command to revert the commit that introduced the issue in the latest version. 
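<\/p>\n\n\n\n<p>Reverting in Git only helps if Argo CD syncs the Rollout manifests from the repository automatically. A minimal Argo CD Application with automated sync might look like the sketch below; the application name, repository URL, and path are hypothetical placeholders:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\napiVersion: argoproj.io\/v1alpha1\nkind: Application\nmetadata:\n  name: nginx-app        # hypothetical name\n  namespace: argocd      # namespace where Argo CD is installed (argocd by default)\nspec:\n  project: default\n  source:\n    repoURL: https:\/\/github.com\/example\/k8s-manifests.git  # hypothetical repository\n    targetRevision: main\n    path: nginx          # hypothetical directory containing the Rollout manifest\n  destination:\n    server: https:\/\/kubernetes.default.svc\n    namespace: default\n  syncPolicy:\n    automated: {}        # sync automatically whenever the Git state changes\n<\/pre><\/div>\n\n\n<p>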
So after you push, your Git repo reflects the desired state, Argo CD notices the change and automatically updates the cluster, and you end up with a repository and a K8s cluster that are in sync again, and a Rollout that is stable and healthy (though running the previous version).<\/p>\n\n\n\n<p>Alternatively, you may simply push a new commit with a fix instead of using git revert, but this obviously assumes that you know how to fix the issue, have already fixed it, and have a new container image ready to use. Often, when your Rollout fails during deployment, you first want to roll back to the previous version (using <strong>git revert<\/strong>) to minimize app disruption as quickly as possible, and only then fix the issue and try to deploy again.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><a href=\"https:\/\/sii.pl\/oferty-pracy\/\" target=\"_blank\" rel=\"noreferrer noopener\"><img decoding=\"async\" width=\"737\" height=\"170\" src=\"https:\/\/sii.pl\/blog\/wp-content\/uploads\/2025\/01\/praca-k-EN.jpg\" alt=\"job offers\" class=\"wp-image-30089\" srcset=\"https:\/\/sii.pl\/blog\/wp-content\/uploads\/2025\/01\/praca-k-EN.jpg 737w, https:\/\/sii.pl\/blog\/wp-content\/uploads\/2025\/01\/praca-k-EN-300x69.jpg 300w\" sizes=\"(max-width: 737px) 100vw, 737px\" \/><\/a><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Summary<\/strong><\/h2>\n\n\n\n<p>Argo Rollouts can be an absolute game-changer if you need a more advanced deployment strategy (like Canary) than the default Rolling Update of a Kubernetes Deployment. Furthermore, keep in mind that probably the most crucial advantages of Argo Rollouts are its automated tests and automatic rollbacks during updates, so don&#8217;t forget about the power of those features.<\/p>\n\n\n\n<p>Moreover, remember that adding this tool to your cluster shouldn&#8217;t be too complicated. 
The documentation is clear and contains a couple of good examples, and the DevOps community has already created many great guides on Argo CD. You also shouldn&#8217;t have a problem finding the right GitHub issue or Stack Overflow question if you encounter a problem while working with Argo Rollouts.<\/p>\n\n\n\n<p>Last but not least \u2013 always remember to analyze and really make sure that you actually need such a tool. Don&#8217;t try to implement something that you don&#8217;t actually need \u2013 overengineering is one of the greatest traps for every engineer (not only DevOps engineers), so always try to question your requirements instead of creating or accepting work that doesn&#8217;t bring the actual value (business value or some other sort).<\/p>\n\n\n\n<p>Nevertheless, if you see real benefits in Canary deployment for your scenario then Argo Rollouts is waiting for you!<\/p>\n\n\n\n<p>***<\/p>\n\n\n\n<p>If you are interested in the tools used in IT, be sure to also take <a href=\"https:\/\/sii.pl\/blog\/en\/all\/tools\/\" target=\"_blank\" rel=\"noopener\" title=\"\">a look at other articles by our experts<\/a> \ud83d\ude42<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A regular Deployment resource in Kubernetes provides us with 2 deployment strategies that we can specify in .spec.strategy.type field \u2013 &hellip; <a class=\"continued-btn\" href=\"https:\/\/sii.pl\/blog\/en\/advanced-deployment-options-with-kubernetes-and-argo-rollouts\/\">Continued<\/a><\/p>\n","protected":false},"author":692,"featured_media":30018,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_editorskit_title_hidden":false,"_editorskit_reading_time":0,"_editorskit_is_block_options_detached":false,"_editorskit_block_options_position":"{}","inline_featured_image":false,"footnotes":""},"categories":[1320],"tags":[2773,2771,1590,1526,1372],"class_list":["post-30020","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-hard-development","tag-container","tag-argo-rollouts-en","tag-tools","tag-guidebook","tag-kubernetes-en"],"acf":[],"aioseo_notices":[],"republish_history":[],"featured_media_url":"https:\/\/sii.pl\/blog\/wp-content\/uploads\/2025\/01\/Zaawansowane-opcje-wdrazania-z-Kubernetes-i-Argo-Rollouts.jpg","category_names":["Hard
development"],"_links":{"self":[{"href":"https:\/\/sii.pl\/blog\/en\/wp-json\/wp\/v2\/posts\/30020"}],"collection":[{"href":"https:\/\/sii.pl\/blog\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sii.pl\/blog\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sii.pl\/blog\/en\/wp-json\/wp\/v2\/users\/692"}],"replies":[{"embeddable":true,"href":"https:\/\/sii.pl\/blog\/en\/wp-json\/wp\/v2\/comments?post=30020"}],"version-history":[{"count":2,"href":"https:\/\/sii.pl\/blog\/en\/wp-json\/wp\/v2\/posts\/30020\/revisions"}],"predecessor-version":[{"id":30092,"href":"https:\/\/sii.pl\/blog\/en\/wp-json\/wp\/v2\/posts\/30020\/revisions\/30092"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/sii.pl\/blog\/en\/wp-json\/wp\/v2\/media\/30018"}],"wp:attachment":[{"href":"https:\/\/sii.pl\/blog\/en\/wp-json\/wp\/v2\/media?parent=30020"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sii.pl\/blog\/en\/wp-json\/wp\/v2\/categories?post=30020"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sii.pl\/blog\/en\/wp-json\/wp\/v2\/tags?post=30020"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}