Argo Rollouts: progressive delivery for Kubernetes

Install

kubectl create namespace argo-rollouts
kubectl apply -n argo-rollouts -f https://github.com/argoproj/argo-rollouts/releases/latest/download/install.yaml

# Install the kubectl plugin
curl -LO https://github.com/argoproj/argo-rollouts/releases/latest/download/kubectl-argo-rollouts-linux-amd64
chmod +x kubectl-argo-rollouts-linux-amd64
sudo mv kubectl-argo-rollouts-linux-amd64 /usr/local/bin/kubectl-argo-rollouts

# Dashboard
kubectl argo rollouts dashboard
# Serves the UI at http://localhost:3100

The first Rollout (canary strategy)

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app
spec:
  replicas: 10
  strategy:
    canary:
      steps:
        - setWeight: 10           # 10% of traffic to new version
        - pause: { duration: 5m } # wait 5 min, watch metrics
        - setWeight: 25
        - pause: { duration: 5m }
        - setWeight: 50
        - pause: { duration: 10m }
        - setWeight: 100          # fully promoted

  selector:
    matchLabels: { app: my-app }
  template:
    metadata: { labels: { app: my-app } }
    spec:
      containers:
        - name: my-app
          image: ghcr.io/myorg/my-app:v1.0
          ports: [ { containerPort: 8080 } ]

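Update the image to ghcr.io/myorg/my-app:v1.1 with kubectl argo rollouts set image my-app my-app=ghcr.io/myorg/my-app:v1.1, or with kubectl edit. Plain kubectl set image does not support the Rollout kind. Argo Rollouts then walks the steps: it creates canary pods, shifts 10% of traffic to them, waits, and increases the weight. Without a traffic router, the 10% is approximated by pod count. A pause step with no duration (pause: {}) halts the rollout indefinitely until it is promoted manually.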
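Here is a sketch of such a manual gate, reusing the my-app Rollout above. The empty pause holds the rollout at 10% until someone promotes it. After that, the timed steps run on their own.

```yaml
strategy:
  canary:
    steps:
      - setWeight: 10
      - pause: {}                # no duration: wait here until "kubectl argo rollouts promote my-app"
      - setWeight: 50
      - pause: { duration: 10m } # timed: resumes automatically after 10 minutes
      - setWeight: 100
```

Mixing the two styles is common. A human signs off on the first, riskiest exposure, and the timed pauses handle the rest.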

kubectl argo rollouts: the CLI

# Watch a rollout's progress
kubectl argo rollouts get rollout my-app --watch

# Promote: resume from the current pause and continue to the next step
kubectl argo rollouts promote my-app

# Fully promote, skipping all remaining steps, pauses and analysis
kubectl argo rollouts promote my-app --full

# Abort: shift traffic back to the stable ReplicaSet and scale down the canary (the spec still names the new image)
kubectl argo rollouts abort my-app

# Restart all pods (useful after config changes)
kubectl argo rollouts restart my-app

Automatic analysis: rollback on metric regression

The killer feature. Define an AnalysisTemplate that queries a metric provider; reference it from the rollout:

apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate-prometheus
spec:
  args:
    - name: service-name
  metrics:
    - name: success-rate
      interval: 1m
      count: 5                 # take 5 measurements, then complete; with interval alone the metric runs until it fails
      successCondition: result[0] >= 0.95
      failureLimit: 3
      provider:
        prometheus:
          address: http://prometheus:9090
          query: |
            sum(rate(
              http_requests_total{service="{{args.service-name}}", code!~"5.."}[2m]
            )) /
            sum(rate(
              http_requests_total{service="{{args.service-name}}"}[2m]
            ))

Then reference in the rollout's canary steps:

strategy:
  canary:
    steps:
      - setWeight: 10
      - analysis:
          templates: [ { templateName: success-rate-prometheus } ]
          args:
            - name: service-name
              value: my-app
      - setWeight: 50
      - analysis:
          templates: [ { templateName: success-rate-prometheus } ]
          args: [ { name: service-name, value: my-app } ]
      - setWeight: 100

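At each analysis step, Argo Rollouts queries Prometheus every minute. Each result below 95% counts as a failed measurement. failureLimit: 3 tolerates up to three failures, so the fourth one fails the analysis. The failures need not be consecutive. The rollout is then aborted automatically and traffic returns to the stable version.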
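To make the gate stricter, tighten the metric's counters. The sketch below assumes the same Prometheus provider block as success-rate-prometheus, elided here. count bounds the run to five one-minute measurements, and failureLimit: 0 fails the analysis on the first bad measurement.

```yaml
metrics:
  - name: success-rate
    interval: 1m
    count: 5                   # five measurements, then the analysis step completes
    successCondition: result[0] >= 0.95
    failureLimit: 0            # a single failed measurement fails the analysis
    # provider: same prometheus block (address + query) as success-rate-prometheus
```

The trade-off: a lower failureLimit catches a regression sooner, but one noisy measurement is also enough to abort a healthy canary.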

Supported providers: Prometheus, Datadog, New Relic, Wavefront, CloudWatch, Graphite, InfluxDB, Kayenta and SkyWalking. There are also two generic ones: Job, which runs a Kubernetes Job (exit 0 passes, non-zero fails), and Web, which makes an HTTP request and checks the response.
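The blue-green example below references a smoke-tests template that this tutorial does not define. Here is a minimal sketch using the Job provider. It assumes the preview Service answers HTTP on port 80 and the app exposes a /healthz endpoint; both are hypothetical. The metric passes if curl exits 0 and fails otherwise.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: smoke-tests
spec:
  args:
    - name: service-name                 # e.g. my-app-preview, passed in by the Rollout
  metrics:
    - name: smoke
      provider:
        job:
          spec:
            backoffLimit: 0              # one attempt; a failed Job fails the metric
            template:
              spec:
                restartPolicy: Never
                containers:
                  - name: smoke
                    image: curlimages/curl:latest   # any image with curl works
                    # /healthz is a hypothetical endpoint; --fail turns HTTP errors into a non-zero exit
                    command: ["curl", "--fail", "--silent", "--show-error", "http://{{args.service-name}}/healthz"]
```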

Blue-green strategy

strategy:
  blueGreen:
    activeService: my-app-active
    previewService: my-app-preview
    autoPromotionEnabled: false
    prePromotionAnalysis:
      templates: [ { templateName: smoke-tests } ]
      args: [ { name: service-name, value: my-app-preview } ]
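    postPromotionAnalysis:
      templates: [ { templateName: success-rate-prometheus } ]
      args: [ { name: service-name, value: my-app-active } ]   # the template's arg has no default, so it must be passed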

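The new version's full ReplicaSet comes up alongside the old one. Clients hitting my-app-active still reach the old version, while my-app-preview routes to the new one. Pre-promotion analysis (the smoke tests) runs against the preview. Because autoPromotionEnabled is false, the rollout also waits for a manual kubectl argo rollouts promote my-app before switching the active Service's selector to the new ReplicaSet. Post-promotion analysis then verifies the live version, and a failure switches traffic back to the previous ReplicaSet automatically.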
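Both Services must exist before the Rollout is applied. Here is a minimal sketch, assuming the container port 8080 from the first example. Each Service selects app: my-app, and Argo Rollouts injects a rollouts-pod-template-hash into each selector to pin it to the right ReplicaSet.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app-active
spec:
  selector: { app: my-app }        # Argo Rollouts adds rollouts-pod-template-hash here
  ports: [ { port: 80, targetPort: 8080 } ]
---
apiVersion: v1
kind: Service
metadata:
  name: my-app-preview
spec:
  selector: { app: my-app }        # pinned to the new ReplicaSet by the controller
  ports: [ { port: 80, targetPort: 8080 } ]
```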

Traffic shaping with a service mesh

For weighted-traffic canary (not pod-count canary), Argo Rollouts integrates with traffic routers:

# Use Istio for traffic split
strategy:
  canary:
    canaryService: my-app-canary
    stableService: my-app-stable
    trafficRouting:
      istio:
        virtualServices:
          - name: my-app-vs
    steps:
      - setWeight: 5         # 5% of traffic to canary (Istio routes it)
      - pause: { duration: 2m }
      - setWeight: 25
      - pause: { duration: 2m }
      - setWeight: 50
      - pause: { duration: 10m }
      - setWeight: 100

Without a traffic router, weight is approximated by pod count (5% = 1 pod of 20). With Istio, Linkerd (see that tutorial), Cilium or NGINX Ingress as the router, traffic is split precisely regardless of the pod ratio. That precision is what makes small weights meaningful: a 1% canary by pod count would need at least 100 replicas.
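The my-app-vs VirtualService referenced above must already route to both Services. Here is a minimal sketch, with a hypothetical hostname and gateway. At each setWeight step, Argo Rollouts rewrites the two weight fields.

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-app-vs
spec:
  hosts: [ my-app.example.com ]    # hypothetical external hostname
  gateways: [ my-app-gateway ]     # hypothetical Istio Gateway
  http:
    - name: primary                # with one HTTP route, the Rollout can omit "routes"
      route:
        - destination: { host: my-app-stable }   # must match stableService
          weight: 100
        - destination: { host: my-app-canary }   # must match canaryService
          weight: 0
```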

Experiment CRD

For A/B tests independent of the rollout:

apiVersion: argoproj.io/v1alpha1
kind: Experiment
metadata:
  name: feature-flag-test
spec:
  duration: 1h
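  templates:
    - name: variant-a
      replicas: 1
      selector: { matchLabels: { app: my-app, variant: a } }
      template:
        metadata: { labels: { app: my-app, variant: a } }
        spec:
          containers:
            - name: my-app
              image: ghcr.io/myorg/my-app:v1.1
              env: [ { name: FEATURE_FLAG_A, value: "true" } ]   # hypothetical flag variable
    - name: variant-b
      replicas: 1
      selector: { matchLabels: { app: my-app, variant: b } }
      template:
        metadata: { labels: { app: my-app, variant: b } }
        spec:
          containers:
            - name: my-app
              image: ghcr.io/myorg/my-app:v1.1
              env: [ { name: FEATURE_FLAG_A, value: "false" } ]  # hypothetical flag variable
  analyses:
    - name: success-rate
      templateName: success-rate-prometheus
      args: [ { name: service-name, value: my-app } ]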

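Runs both variants side by side for an hour while the referenced analysis runs, and a failed analysis fails the Experiment. To compare the variants rather than just gate them, the metric query has to tell their traffic apart, for example via the variant pod label. Useful for feature-flag-style behavior comparison at the infrastructure level.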

The dashboard

The Argo Rollouts dashboard is served at http://localhost:3100 by kubectl argo rollouts dashboard. It shows every rollout's current step, traffic weight, pod counts and recent analysis results, with manual promote and abort buttons. Useful as the "live ops view" during a deploy.

Integration with Argo CD

Argo CD (see that tutorial) recognizes Rollout resources, and the App view shows progressive-delivery state alongside other resources. Together, Argo CD watches the Git repo and applies a changed Rollout manifest, and Argo Rollouts walks the canary steps with metric-based gates. The result is end-to-end GitOps progressive delivery.
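Here is a minimal sketch of the Git side. It assumes a hypothetical repository https://github.com/myorg/my-app-deploy whose k8s/ directory holds the Rollout manifest. Argo CD syncs every commit, so changing the image tag in Git is what starts a new canary.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd                # namespace where Argo CD is installed
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/my-app-deploy   # hypothetical repository
    targetRevision: main
    path: k8s                      # directory containing the Rollout manifest
  destination:
    server: https://kubernetes.default.svc
    namespace: my-app              # hypothetical target namespace
  syncPolicy:
    automated: {}                  # apply changes from Git without a manual sync
```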

When to use Argo Rollouts

You're running real production traffic in K8s and "deploy at midnight and watch the dashboards" isn't good enough.
You have Prometheus / Datadog / etc. metrics that meaningfully indicate canary health.
You want auto-rollback as a safety net for "the metric drops; don't ship."
You have a service mesh or ingress that can do precise weighted routing.

When it's overkill

For batch jobs / cron jobs, progressive delivery doesn't apply.
For very-low-traffic services (where statistics aren't significant in canary windows), standard Deployments are simpler.
For small fleets (1-2 pods), the canary granularity isn't useful without weighted-traffic routing.

Argo Rollouts vs Flagger

Flagger (Flux's progressive-delivery tool) is the main alternative; similar feature set; tighter Flux CD integration. Pick by which GitOps tool (Argo CD or Flux) you're already using.