operations, engineering

Release Checklist

Create go/no-go release checklists with pre-deploy verification, staged rollout steps, monitoring checkpoints, rollback triggers, and stakeholder communication plans.

release, deployment, checklist, go-no-go, rollout, rollback

Works well with agents

DevOps Engineer Agent, Open Source Maintainer Agent, Product Operations Agent, Release Manager Agent

Works well with skills

Runbook Writing, Ticket Writing

release-checklist/examples/payment-service-v2.md
# Release Checklist: payment-service v2.4.0

**Release date**: 2026-03-25
**Release manager**: @jchen
**On-call engineer**: @mpatel

## Scope Inventory

| Change | Owner | Feature flag | Touches shared infra |
|--------|-------|--------------|----------------------|
| Stripe SDK upgrade (v12 -> v14) | @lnguyen | No | Yes (payment gateway) |
| Idempotency key refactor | @akim | `idempotency_v2` | Yes (Redis cache layer) |
| PCI audit logging enhancement | @jchen | No | No |
| Retry policy tuning (3 -> 5 retries, exponential backoff) | @mpatel | `retry_v2` | No |

## Risk Classification: HIGH

- Stripe SDK major version upgrade: breaking API changes in webhook signature verification
- Idempotency refactor touches every write path in the payment flow
- Revenue-critical service: $2.3M daily transaction volume

**Required**: Dedicated rollback runbook reviewed and approved before go/no-go.

## Dependency Check

- [ ] Stripe API v2024-12 compatibility verified in staging
- [ ] Redis 7.2 cluster healthy in all target environments
- [ ] Feature flags `idempotency_v2` and `retry_v2` configured (default: off) in production
- [ ] PCI audit log sink verified in compliance-logging service
- [ ] Secrets rotated: Stripe API keys in Vault (`payment-service/prod/stripe`)

## Go/No-Go

| Criteria | Owner | Status |
|----------|-------|--------|
| All CI checks pass on `release/v2.4.0` branch | @lnguyen | ___ |
| Staging smoke tests pass (50 test transactions) | @akim | ___ |
| Stripe webhook signature verification passes on staging | @lnguyen | ___ |
| Rollback runbook reviewed by SRE | @mpatel | ___ |
| On-call engineer confirmed and available through release window | @jchen | ___ |
| Stakeholders notified: product, support, finance | @jchen | ___ |
| PCI compliance officer sign-off on audit log changes | @jchen | ___ |

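The go/no-go table reads as an all-or-nothing gate: one blank or failed row blocks the release. A minimal Python sketch of that rule follows. The `Criterion` type and the `"pass"` status value are assumptions for illustration; the checklist itself only shows `___` placeholders.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Criterion:
    description: str
    owner: str
    status: str  # assumed values: "pass", "fail", or "" while still blank


def go_no_go(criteria: List[Criterion]) -> Tuple[str, List[Criterion]]:
    """Return ("GO", []) only when every criterion is marked "pass"."""
    if not criteria:
        # An empty checklist proves nothing, so treat it as a block.
        return "NO-GO", []
    blockers = [c for c in criteria if c.status.strip().lower() != "pass"]
    return ("GO" if not blockers else "NO-GO"), blockers


criteria = [
    Criterion("All CI checks pass on release/v2.4.0 branch", "@lnguyen", "pass"),
    Criterion("Rollback runbook reviewed by SRE", "@mpatel", ""),
]
decision, blockers = go_no_go(criteria)
print(decision)
for c in blockers:
    print(f"blocked by {c.owner}: {c.description}")
```

Returning the blocking rows alongside the decision makes it clear which owner to chase before the release window opens.
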
## Staged Rollout Plan

| Stage | Traffic % | Bake time | Metric thresholds |
|-------|-----------|-----------|-------------------|
| Canary | 1% | 30 min | Error rate < 0.05%, P99 < 450ms, zero failed charges |
| Partial | 10% | 1 hour | Error rate < 0.03%, no new error signatures |
| Half | 50% | 2 hours | Error rate < 0.02%, revenue per tx within 1% of baseline |
| Full | 100% | 4 hours soak | All thresholds hold, zero duplicate charges |

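The rollout table can be encoded as data so that promotion decisions are checked mechanically. The sketch below models only the error-rate threshold and the bake time. The `Stage` type and `may_promote` helper are hypothetical, and the 0.02% value for the Full stage is an assumption, since the table only says "all thresholds hold".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    traffic_pct: int
    bake_minutes: int
    max_error_rate_pct: float


# Values copied from the rollout table; Full's threshold is an assumption.
STAGES = [
    Stage("Canary", 1, 30, 0.05),
    Stage("Partial", 10, 60, 0.03),
    Stage("Half", 50, 120, 0.02),
    Stage("Full", 100, 240, 0.02),
]


def may_promote(stage: Stage, error_rate_pct: float, minutes_in_stage: int) -> bool:
    """Promote only after the full bake time, with error rate strictly below the threshold."""
    return (
        minutes_in_stage >= stage.bake_minutes
        and error_rate_pct < stage.max_error_rate_pct
    )


print(may_promote(STAGES[0], 0.02, 30))      # True
print(sum(s.bake_minutes for s in STAGES))   # 450
```

The bake times alone add up to 450 minutes, so the release window and on-call coverage need to span at least 7.5 hours even if every stage passes on the first check.
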
## Monitoring Checkpoints

At each rollout stage, verify:

- [ ] **Error rates**: `https://grafana.internal/d/payment-svc` (canary vs. baseline cohort)
- [ ] **Latency**: P50, P95, P99 on `payment.process` span (baseline: P50=120ms, P99=380ms)
- [ ] **Duplicate charges**: `SELECT count(*) FROM charges WHERE is_duplicate = true AND created_at > now() - interval '1 hour'` (must be 0)
- [ ] **Stripe webhook success rate**: Grafana panel `stripe-webhooks` (must remain > 99.9%)
- [ ] **Redis connection pool**: `payment-redis-pool-utilization` alert (must remain < 80%)

## Rollback Triggers

Initiate rollback immediately if any condition is met:

- 5xx error rate > 0.1% for 3+ minutes
- Any duplicate charge detected
- Stripe webhook verification failure rate > 0.5%
- P99 latency > 1000ms (2.6x baseline)
- Redis connection pool utilization > 95%

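The triggers above can be expressed as a single check over a metrics snapshot. This is a sketch under stated assumptions: the `Snapshot` fields are hypothetical names, and "3+ minutes" is approximated as three consecutive one-minute samples above the limit.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Snapshot:
    error_rate_5xx_pct_per_minute: List[float]  # oldest first, one sample per minute (assumed)
    duplicate_charges: int
    webhook_failure_rate_pct: float
    p99_latency_ms: float
    redis_pool_utilization_pct: float


def rollback_reasons(s: Snapshot) -> List[str]:
    """Return every trigger that fired; any non-empty result means roll back."""
    reasons = []
    last3 = s.error_rate_5xx_pct_per_minute[-3:]
    if len(last3) == 3 and all(rate > 0.1 for rate in last3):
        reasons.append("5xx error rate > 0.1% for 3+ minutes")
    if s.duplicate_charges > 0:
        reasons.append("duplicate charge detected")
    if s.webhook_failure_rate_pct > 0.5:
        reasons.append("Stripe webhook verification failure rate > 0.5%")
    if s.p99_latency_ms > 1000:
        reasons.append("P99 latency > 1000ms")
    if s.redis_pool_utilization_pct > 95:
        reasons.append("Redis connection pool utilization > 95%")
    return reasons


spike = Snapshot([0.02, 0.15, 0.03], 0, 0.1, 410.0, 62.0)
print(rollback_reasons(spike))  # [] because the 5xx spike lasted one minute only
```

Collecting all fired triggers, instead of stopping at the first, gives the incident timeline a complete picture of what was wrong at the moment of rollback.
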
**Rollback procedure**:

1. Set feature flags `idempotency_v2` and `retry_v2` to `off`
2. Roll back deployment: `kubectl rollout undo deployment/payment-service -n payments`
3. Verify metrics return to baseline within 10 minutes
4. Notify #payments-incidents channel with timeline
5. Page finance-oncall if any duplicate charges were processed

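Steps 1 and 2 of the procedure are the scriptable ones, and a dry-run mode lets the script be rehearsed during runbook review. The sketch below is one possible shape, not the team's actual tooling: `set_flag` is a hypothetical callable standing in for whatever feature-flag client is in use, and only the `kubectl` command comes from the checklist.

```python
import shlex
import subprocess

FLAGS = ["idempotency_v2", "retry_v2"]
UNDO_CMD = "kubectl rollout undo deployment/payment-service -n payments"


def rollback(set_flag, run=subprocess.run, dry_run=True):
    """Turn both flags off, then undo the deployment. Returns a log of the actions.

    set_flag(name, enabled) is a hypothetical hook for the flag service.
    With dry_run=True nothing is executed; the log shows what would happen.
    """
    log = []
    for flag in FLAGS:
        if not dry_run:
            set_flag(flag, False)
        log.append(f"flag {flag} -> off")
    if not dry_run:
        # check=True raises CalledProcessError if kubectl exits non-zero.
        run(shlex.split(UNDO_CMD), check=True)
    log.append(UNDO_CMD)
    return log


actions = rollback(set_flag=lambda name, enabled: None)
for action in actions:
    print(action)
```

The sketch keeps the checklist's order, flags first and deployment second. Steps 3 to 5 (verifying metrics, notifying the channel, paging finance) stay with the on-call engineer.
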
## Post-Release

- [ ] Metrics stable at 100% for 4 hours, smoke tests pass
- [ ] Support team briefed on new retry behavior (customers may see longer processing times)
- [ ] Finance team notified of PCI audit log format changes
- [ ] Feature flags `idempotency_v2` and `retry_v2` scheduled for cleanup (target: v2.5.0)
- [ ] Retrospective scheduled for 2026-03-27 (high-risk release)

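For the support briefing, it may help to quantify how much longer a failing payment can take once `retry_v2` raises the retry count from 3 to 5. The arithmetic below is illustrative only: the 200 ms base delay, the doubling factor, and the absence of jitter or a delay cap are all assumptions, since the checklist does not state the backoff parameters.

```python
def backoff_delays_ms(retries, base_ms=200, factor=2):
    """Delay before each retry under plain exponential backoff (assumed parameters)."""
    return [base_ms * factor**i for i in range(retries)]


old = backoff_delays_ms(3)  # [200, 400, 800]
new = backoff_delays_ms(5)  # [200, 400, 800, 1600, 3200]
print(sum(old), sum(new))   # 1400 6200
```

Under these assumptions, the worst-case time spent waiting between attempts grows from 1.4 s to 6.2 s, which is the kind of figure support can quote when customers ask about slower processing.
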