Install
kubectl create namespace argo-rollouts
kubectl apply -n argo-rollouts -f https://github.com/argoproj/argo-rollouts/releases/latest/download/install.yaml
# Install the kubectl plugin
curl -LO https://github.com/argoproj/argo-rollouts/releases/latest/download/kubectl-argo-rollouts-linux-amd64
chmod +x kubectl-argo-rollouts-linux-amd64
sudo mv kubectl-argo-rollouts-linux-amd64 /usr/local/bin/kubectl-argo-rollouts
# Dashboard
kubectl argo rollouts dashboard
# Opens http://localhost:3100
The first Rollout (canary strategy)
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app
spec:
  replicas: 10
  strategy:
    canary:
      steps:
      - setWeight: 10              # 10% of traffic to new version
      - pause: { duration: 5m }    # wait 5 min, watch metrics
      - setWeight: 25
      - pause: { duration: 5m }
      - setWeight: 50
      - pause: { duration: 10m }
      - setWeight: 100             # fully promoted
  selector:
    matchLabels: { app: my-app }
  template:
    metadata: { labels: { app: my-app } }
    spec:
      containers:
      - name: my-app
        image: ghcr.io/myorg/my-app:v1.0
        ports: [ { containerPort: 8080 } ]
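To ship ghcr.io/myorg/my-app:v1.1, update the image with kubectl argo rollouts set image my-app my-app=ghcr.io/myorg/my-app:v1.1 or with kubectl edit (plain kubectl set image does not handle the Rollout custom resource). Argo Rollouts then walks the steps: it spawns canary pods, shifts 10% of traffic, waits, and increases the weight. A pause step with no duration: holds the rollout until it is manually promoted.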
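As a minimal sketch of the indefinite pause mentioned above, the fragment below replaces the first timed pause with an empty one. The weights and durations are illustrative; the only new element compared with the manifest above is the pause: {} form.

```yaml
strategy:
  canary:
    steps:
    - setWeight: 10
    - pause: {}                  # no duration: waits until someone runs `kubectl argo rollouts promote my-app`
    - setWeight: 50
    - pause: { duration: 10m }   # timed pause: resumes on its own after 10 minutes
    - setWeight: 100
```

A manual gate like this is a reasonable default for the first step of a risky release, with timed pauses for the later steps.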
kubectl argo rollouts: the CLI
# Watch a rollout's progress
kubectl argo rollouts get rollout my-app --watch
# Promote: resume a paused rollout and move on to the next step
kubectl argo rollouts promote my-app
# Promote skipping all remaining steps
kubectl argo rollouts promote my-app --full
# Abort the update and shift traffic back to the stable version
kubectl argo rollouts abort my-app
# Restart all pods (useful after config changes)
kubectl argo rollouts restart my-app
Automatic analysis: rollback on metric regression
The killer feature. Define an AnalysisTemplate that queries a metric provider; reference it from the rollout:
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate-prometheus
spec:
  args:
  - name: service-name
  metrics:
  - name: success-rate
    interval: 1m
    successCondition: result[0] >= 0.95
    failureLimit: 3
    provider:
      prometheus:
        address: http://prometheus:9090
        query: |
          sum(rate(
            http_requests_total{service="{{args.service-name}}", code!~"5.."}[2m]
          )) /
          sum(rate(
            http_requests_total{service="{{args.service-name}}"}[2m]
          ))
Then reference it in the rollout's canary steps:
strategy:
  canary:
    steps:
    - setWeight: 10
    - analysis:
        templates: [ { templateName: success-rate-prometheus } ]
        args:
        - name: service-name
          value: my-app
    - setWeight: 50
    - analysis:
        templates: [ { templateName: success-rate-prometheus } ]
        args: [ { name: service-name, value: my-app } ]
    - setWeight: 100
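At each analysis step, Argo Rollouts queries Prometheus every minute. A measurement fails when the success rate is below 95%. failureLimit: 3 tolerates up to three failed measurements (they need not be consecutive); one more fails the analysis, which aborts the rollout and shifts traffic back to the stable version automatically.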
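One detail worth sketching: as far as I understand the metric spec, a metric with interval but no count keeps measuring indefinitely, so an inline analysis step only finishes when it fails or is stopped. The variant below adds a count so the step can pass after a fixed number of measurements. The template name and the count value are assumptions for illustration, not part of the original setup.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate-prometheus-bounded   # hypothetical name for this variant
spec:
  args:
  - name: service-name
  metrics:
  - name: success-rate
    interval: 1m
    count: 5                              # assumption: take five measurements, then complete
    successCondition: result[0] >= 0.95
    failureLimit: 3                       # more than 3 failed measurements fails the run
    provider:
      prometheus:
        address: http://prometheus:9090
        query: |
          sum(rate(
            http_requests_total{service="{{args.service-name}}", code!~"5.."}[2m]
          )) /
          sum(rate(
            http_requests_total{service="{{args.service-name}}"}[2m]
          ))
```

With five measurements and a limit of three, the run fails only if four or five measurements miss the threshold; tighten failureLimit if that is too lenient for your service.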
Supported providers: Prometheus, Datadog, New Relic, Wavefront, CloudWatch, Graphite, InfluxDB, Kayenta, SkyWalking, plus generic Job (run a container that exits 0/non-zero) and Web (HTTP probe).
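To make the generic Job provider concrete, here is a rough sketch of a template that runs a one-shot container and treats exit code 0 as success. The template name, the curl image, port 8080, and the /healthz path are all assumptions chosen for illustration; substitute whatever check your service actually exposes.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: smoke-tests                # placeholder name
spec:
  args:
  - name: service-name
  metrics:
  - name: smoke
    provider:
      job:
        spec:                      # a regular Kubernetes Job spec
          backoffLimit: 0          # do not retry: one failed run fails the metric
          template:
            spec:
              restartPolicy: Never
              containers:
              - name: smoke
                image: curlimages/curl:latest   # assumption: any image with curl works
                # -f makes curl exit non-zero on HTTP errors; the URL is hypothetical
                command: [ curl, -fsS, "http://{{args.service-name}}:8080/healthz" ]
```

Because the metric has no interval or count, it is measured once, which suits a pass/fail smoke test.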
Blue-green strategy
strategy:
  blueGreen:
    activeService: my-app-active
    previewService: my-app-preview
    autoPromotionEnabled: false
    prePromotionAnalysis:
      templates: [ { templateName: smoke-tests } ]
      args: [ { name: service-name, value: my-app-preview } ]
    postPromotionAnalysis:
      templates: [ { templateName: success-rate-prometheus } ]
      # the template declares service-name without a default, so it must be supplied here too
      args: [ { name: service-name, value: my-app } ]
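Argo Rollouts spawns the new version's full ReplicaSet alongside the old one. Clients hitting my-app-active still see the old version; clients hitting my-app-preview see the new one. Pre-promotion smoke tests run against the preview. Once they pass, the rollout pauses for a manual promote, because autoPromotionEnabled is false (set it to true to promote automatically). Promotion swaps the active Service's selector to the new ReplicaSet, post-promotion analysis verifies it, and a failure there switches the selector back to the old version.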
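The two Services named in the strategy have to exist before the rollout can use them. A minimal sketch, assuming the pods carry the app: my-app label and listen on 8080 as in the Rollout above (the Service port itself is an assumption):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app-active
spec:
  selector:
    app: my-app            # Argo Rollouts adds a pod-template-hash selector at runtime
  ports:
  - port: 8080             # assumption: expose the same port the container listens on
    targetPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: my-app-preview
spec:
  selector:
    app: my-app
  ports:
  - port: 8080
    targetPort: 8080
```

Both Services start with the same selector; the controller narrows each one to a specific ReplicaSet, which is how active and preview end up pointing at different versions.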
Traffic shaping with a service mesh
For weighted-traffic canary (not pod-count canary), Argo Rollouts integrates with traffic routers:
# Use Istio for traffic split
strategy:
  canary:
    canaryService: my-app-canary
    stableService: my-app-stable
    trafficRouting:
      istio:
        virtualServices:
        - name: my-app-vs
    steps:
    - setWeight: 5               # 5% of traffic to canary (Istio routes it)
    - pause: { duration: 2m }
    - setWeight: 25
    - pause: { duration: 2m }
    - setWeight: 50
    - pause: { duration: 10m }
    - setWeight: 100
Without a traffic router, weight is approximated by pod count (5% = 1 pod of 20). With Istio, Linkerd (see that tutorial), Cilium, or nginx-ingress, traffic is split precisely regardless of pod ratio. That precision is necessary for small-weight tests: a 1% canary cannot be expressed as a pod ratio at ordinary replica counts.
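The Istio configuration above refers to a VirtualService named my-app-vs that you create yourself; Argo Rollouts then rewrites its route weights at each step. A minimal sketch, assuming clients address the app by the host my-app and that Services named my-app-stable and my-app-canary already exist (the host and the route name are assumptions):

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: my-app-vs
spec:
  hosts:
  - my-app                     # assumption: the hostname clients use
  http:
  - name: primary              # hypothetical route name
    route:
    - destination:
        host: my-app-stable
      weight: 100              # starting point; the controller adjusts both weights
    - destination:
        host: my-app-canary
      weight: 0
```

With a single HTTP route, the Rollout can reference the VirtualService by name alone; with several routes you would also list which ones it should manage.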
Experiment CRD
For A/B tests independent of the rollout:
apiVersion: argoproj.io/v1alpha1
kind: Experiment
metadata:
  name: feature-flag-test
spec:
  duration: 1h
  templates:
  - name: variant-a
    replicas: 1
    template: { ... pod template with feature-flag-a=true ... }
  - name: variant-b
    replicas: 1
    template: { ... pod template with feature-flag-a=false ... }
  analyses:
  - name: success-rate
    templateName: success-rate-prometheus
Runs both variants in parallel for an hour while the referenced analysis measures them. Useful for feature-flag-style behavior comparison at the infrastructure level.
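For reference, a fully spelled-out Experiment might look roughly like the sketch below. Everything the original leaves elided is invented here for illustration: the FEATURE_FLAG_A environment variable, the variant labels, and the image tag are hypothetical, and the analysis args assume the success-rate-prometheus template with its service-name argument.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Experiment
metadata:
  name: feature-flag-test-sketch      # hypothetical name
spec:
  duration: 1h
  templates:
  - name: variant-a
    replicas: 1
    selector:
      matchLabels: { app: my-app, variant: a }
    template:
      metadata:
        labels: { app: my-app, variant: a }
      spec:
        containers:
        - name: my-app
          image: ghcr.io/myorg/my-app:v1.1
          env:
          - name: FEATURE_FLAG_A      # hypothetical flag variable
            value: "true"
  - name: variant-b
    replicas: 1
    selector:
      matchLabels: { app: my-app, variant: b }
    template:
      metadata:
        labels: { app: my-app, variant: b }
      spec:
        containers:
        - name: my-app
          image: ghcr.io/myorg/my-app:v1.1
          env:
          - name: FEATURE_FLAG_A
            value: "false"
  analyses:
  - name: success-rate
    templateName: success-rate-prometheus
    args:
    - name: service-name              # the template requires this argument
      value: my-app
```

To compare the variants rather than measure them together, the Prometheus query would need to filter on something that distinguishes them, such as the variant label assumed here.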
The dashboard
The Argo Rollouts dashboard (served on port 3100 by kubectl argo rollouts dashboard) shows every rollout's current step, traffic weight, pod counts, and recent analysis results, and offers manual promote / abort buttons. Useful as the "live ops view" during a deploy.
Integration with Argo CD
Argo CD (see that tutorial) recognizes Rollout resources; the App view shows progressive-delivery state alongside other resources. Combined: Argo CD watches the Git repo, applies a Rollout manifest change; Argo Rollouts walks the canary steps with metric-based gates. End-to-end GitOps progressive delivery.
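The Argo CD side of that pairing can be sketched as an Application pointing at the directory that holds the Rollout manifest. The repository URL, path, and target namespace below are hypothetical placeholders; the sync policy is one reasonable choice, not a requirement.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/my-app-deploy.git   # hypothetical repo
    targetRevision: main
    path: k8s                                             # hypothetical directory with the Rollout YAML
  destination:
    server: https://kubernetes.default.svc
    namespace: default                                    # assumption: where the Rollout runs
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```

With automated sync, merging an image bump in Git is what starts the canary; promotion and rollback stay with Argo Rollouts.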
When to use Argo Rollouts
- You're running real production traffic in K8s and "deploy at midnight and watch the dashboards" isn't good enough.
- You have Prometheus / Datadog / etc. metrics that meaningfully indicate canary health.
- You want auto-rollback as a safety net for "the metric drops; don't ship."
- You have a service mesh or ingress that can do precise weighted routing.
When it's overkill
- For batch jobs / cron jobs, progressive delivery doesn't apply.
- For very-low-traffic services (where statistics aren't significant in canary windows), standard Deployments are simpler.
- For small fleets (1-2 pods), the canary granularity isn't useful without weighted-traffic routing.
Argo Rollouts vs Flagger
Flagger (Flux's progressive-delivery tool) is the main alternative, with a similar feature set and tighter Flux CD integration. Pick based on which GitOps tool (Argo CD or Flux) you're already using.