Release Checklist
Create go/no-go release checklists with pre-deploy verification, staged rollout steps, monitoring checkpoints, rollback triggers, and stakeholder communication plans.
Tags: release, deployment, checklist, go-no-go, rollout, rollback
release-checklist/payment-service-v2.md
# Release Checklist: payment-service v2.4.0

**Release date**: 2026-03-25
**Release manager**: @jchen
**On-call engineer**: @mpatel

## Scope Inventory

| Change | Owner | Feature flag | Touches shared infra |
|--------|-------|-------------|---------------------|
| Stripe SDK upgrade (v12 -> v14) | @lnguyen | No | Yes — payment gateway |
| Idempotency key refactor | @akim | `idempotency_v2` | Yes — Redis cache layer |
| PCI audit logging enhancement | @jchen | No | No |
| Retry policy tuning (3 -> 5 retries, exponential backoff) | @mpatel | `retry_v2` | No |

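The retry tuning row is the change customers are most likely to notice, so a rough estimate of the added wait can help when briefing support. The sketch below compares the worst-case cumulative backoff for 3 and 5 retries. The checklist only says "exponential backoff": the 0.5 s base delay, the doubling factor, and the absence of jitter or a delay cap are assumptions made for illustration, not values taken from payment-service.

```python
from typing import List


def backoff_delays(retries: int, base_seconds: float = 0.5, factor: float = 2.0) -> List[float]:
    """Delay before each retry: base, base*factor, base*factor**2, ...

    base_seconds and factor are hypothetical; the checklist does not state them.
    """
    return [base_seconds * factor ** attempt for attempt in range(retries)]


def worst_case_wait(retries: int, **kwargs: float) -> float:
    """Total time spent waiting if every retry is needed."""
    return sum(backoff_delays(retries, **kwargs))


old_wait = worst_case_wait(3)  # 0.5 + 1 + 2
new_wait = worst_case_wait(5)  # 0.5 + 1 + 2 + 4 + 8
print(f"3 retries: {old_wait:.1f}s, 5 retries: {new_wait:.1f}s")
# prints "3 retries: 3.5s, 5 retries: 15.5s"
```

Under these assumed parameters the two extra retries account for most of the worst-case wait, which is why the real base delay and any cap are worth confirming before the support briefing.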
## Risk Classification: HIGH

- Stripe SDK major version upgrade — breaking API changes in webhook signature verification
- Idempotency refactor touches every write path in the payment flow
- Revenue-critical service: $2.3M daily transaction volume

**Required**: Dedicated rollback runbook reviewed and approved before go/no-go.

## Dependency Check

- [ ] Stripe API v2024-12 compatibility verified in staging
- [ ] Redis 7.2 cluster healthy in all target environments
- [ ] Feature flags `idempotency_v2` and `retry_v2` configured (default: off) in production
- [ ] PCI audit log sink verified in compliance-logging service
- [ ] Secrets rotated: Stripe API keys in Vault (`payment-service/prod/stripe`)

## Go/No-Go

| Criteria | Owner | Status |
|----------|-------|--------|
| All CI checks pass on `release/v2.4.0` branch | @lnguyen | ___ |
| Staging smoke tests pass (50 test transactions) | @akim | ___ |
| Stripe webhook signature verification passes on staging | @lnguyen | ___ |
| Rollback runbook reviewed by SRE | @mpatel | ___ |
| On-call engineer confirmed and available through release window | @jchen | ___ |
| Stakeholders notified: product, support, finance | @jchen | ___ |
| PCI compliance officer sign-off on audit log changes | @jchen | ___ |

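The table above implies an all-or-nothing decision. A minimal sketch of that rule follows, assuming a blank status (`___`) blocks the release in the same way as an explicit failure; the checklist does not state this, and the `Criterion` type and `go_no_go` function are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Criterion:
    description: str
    owner: str
    passed: Optional[bool] = None  # None mirrors a blank "___" status cell


def go_no_go(criteria: List[Criterion]) -> Tuple[str, List[Criterion]]:
    """Return ("GO", []) only when every criterion is explicitly passed.

    Assumption: blank (None) and failed (False) entries both block the release.
    """
    blockers = [c for c in criteria if c.passed is not True]
    return ("GO" if not blockers else "NO-GO", blockers)


criteria = [
    Criterion("All CI checks pass on release/v2.4.0 branch", "@lnguyen", True),
    Criterion("Rollback runbook reviewed by SRE", "@mpatel", True),
    Criterion("PCI compliance officer sign-off on audit log changes", "@jchen"),
]
decision, blockers = go_no_go(criteria)
print(decision, [f"{c.owner}: {c.description}" for c in blockers])
```

Returning the blockers alongside the decision gives the release manager the list of owners to chase before the window opens.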
## Staged Rollout Plan

| Stage | Traffic % | Bake time | Metric thresholds |
|-------|----------|-----------|-------------------|
| Canary | 1% | 30 min | Error rate < 0.05%, P99 < 450ms, zero failed charges |
| Partial | 10% | 1 hour | Error rate < 0.03%, no new error signatures |
| Half | 50% | 2 hours | Error rate < 0.02%, revenue per tx within 1% of baseline |
| Full | 100% | 4 hours soak | All thresholds hold, zero duplicate charges |

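The rollout table can be encoded as data so that promotion between stages is checked rather than eyeballed. The sketch below covers only the bake time and error-rate columns; the other thresholds (P99, failed charges, error signatures, revenue per transaction, duplicate charges) are left out for brevity. Reusing the Half-stage error limit for the Full stage is an assumption, since the table says only "all thresholds hold". `Stage`, `may_promote`, and `minimum_rollout_minutes` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Stage:
    name: str
    traffic_pct: int
    bake_minutes: int
    max_error_rate_pct: float


# Values copied from the rollout table. The Full row has no explicit error
# limit; 0.02% (the Half-stage limit) is an assumption.
STAGES: Tuple[Stage, ...] = (
    Stage("Canary", 1, 30, 0.05),
    Stage("Partial", 10, 60, 0.03),
    Stage("Half", 50, 120, 0.02),
    Stage("Full", 100, 240, 0.02),
)


def may_promote(stage: Stage, baked_minutes: float, error_rate_pct: float) -> bool:
    """True once the bake time has elapsed and the error rate is under the stage limit."""
    return baked_minutes >= stage.bake_minutes and error_rate_pct < stage.max_error_rate_pct


def minimum_rollout_minutes(stages: Sequence[Stage] = STAGES) -> int:
    """Lower bound on rollout duration: the sum of all bake times."""
    return sum(s.bake_minutes for s in stages)


print(may_promote(STAGES[0], baked_minutes=30, error_rate_pct=0.04))  # True
print(minimum_rollout_minutes())  # 450
```

The bake times add up to 450 minutes (7.5 hours), so the release window and on-call availability need to cover at least that long even if no stage is repeated.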
## Monitoring Checkpoints

At each rollout stage, verify:

- [ ] **Error rates**: `https://grafana.internal/d/payment-svc` — canary vs. baseline cohort
- [ ] **Latency**: P50, P95, P99 on `payment.process` span — baseline: P50=120ms, P99=380ms
- [ ] **Duplicate charges**: `SELECT count(*) FROM charges WHERE is_duplicate = true AND created_at > now() - interval '1 hour'` — must be 0
- [ ] **Stripe webhook success rate**: Grafana panel `stripe-webhooks` — must remain > 99.9%
- [ ] **Redis connection pool**: `payment-redis-pool-utilization` alert — must remain < 80%

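Three of the checkpoints above carry an explicit pass limit and can be expressed as a small check. The sketch assumes the readings have already been collected by hand or by some exporter; `CheckpointReading` and `checkpoint_failures` are hypothetical names. Error rate and latency are omitted because this section gives baselines for them rather than limits.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CheckpointReading:
    duplicate_charges: int             # result of the charges query
    webhook_success_pct: float         # stripe-webhooks panel
    redis_pool_utilization_pct: float  # payment-redis-pool-utilization


def checkpoint_failures(reading: CheckpointReading) -> List[str]:
    """Return the checkpoints that fail; an empty list means all three pass."""
    failures = []
    if reading.duplicate_charges != 0:
        failures.append("duplicate charges must be 0")
    if not reading.webhook_success_pct > 99.9:
        failures.append("Stripe webhook success rate must remain > 99.9%")
    if not reading.redis_pool_utilization_pct < 80:
        failures.append("Redis connection pool utilization must remain < 80%")
    return failures


healthy = CheckpointReading(0, 99.97, 42.0)
degraded = CheckpointReading(0, 99.8, 85.0)
print(checkpoint_failures(healthy))
print(checkpoint_failures(degraded))
```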
## Rollback Triggers

Initiate rollback immediately if any condition is met:

- 5xx error rate > 0.1% for 3+ minutes
- Any duplicate charge detected
- Stripe webhook verification failure rate > 0.5%
- P99 latency > 1000ms (2.6x baseline)
- Redis connection pool utilization > 95%

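Four of the triggers above are instantaneous, while the 5xx trigger needs a sustained breach. The sketch below shows one way to evaluate both kinds. It assumes the 5xx rate arrives as time-ordered `(unix_seconds, percent)` samples, and it reads "3+ minutes" as a continuous run of breaching samples spanning at least 180 seconds. `Snapshot`, `sustained_above`, and `rollback_reasons` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Snapshot:
    duplicate_charges: int
    webhook_failure_pct: float
    p99_ms: float
    redis_pool_pct: float


def sustained_above(samples: Sequence[Tuple[float, float]], limit: float, min_seconds: float) -> bool:
    """True if the value stays above limit for a continuous run spanning min_seconds.

    samples must be (unix_seconds, value) pairs in time order.
    """
    run_start: Optional[float] = None
    for ts, value in samples:
        if value > limit:
            if run_start is None:
                run_start = ts
            if ts - run_start >= min_seconds:
                return True
        else:
            run_start = None  # any dip back under the limit resets the run
    return False


def rollback_reasons(five_xx_samples: Sequence[Tuple[float, float]], snap: Snapshot) -> List[str]:
    """Return every trigger that is currently met; any entry means roll back."""
    reasons = []
    if sustained_above(five_xx_samples, limit=0.1, min_seconds=180):
        reasons.append("5xx error rate > 0.1% for 3+ minutes")
    if snap.duplicate_charges > 0:
        reasons.append("duplicate charge detected")
    if snap.webhook_failure_pct > 0.5:
        reasons.append("Stripe webhook verification failure rate > 0.5%")
    if snap.p99_ms > 1000:
        reasons.append("P99 latency > 1000ms")
    if snap.redis_pool_pct > 95:
        reasons.append("Redis connection pool utilization > 95%")
    return reasons


spike = [(0, 0.2), (60, 0.2), (120, 0.05), (180, 0.2)]     # dips at 120 s
outage = [(0, 0.2), (60, 0.3), (120, 0.2), (180, 0.25)]    # breaching throughout
ok = Snapshot(duplicate_charges=0, webhook_failure_pct=0.0, p99_ms=390, redis_pool_pct=40)
print(rollback_reasons(spike, ok))
print(rollback_reasons(outage, ok))
```

Resetting the run on any dip keeps a brief spike from triggering a rollback. That is a design choice, and a real alerting rule may smooth the signal differently.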
**Rollback procedure**:

1. Set feature flags `idempotency_v2` and `retry_v2` to `off`
2. Roll back deployment: `kubectl rollout undo deployment/payment-service -n payments`
3. Verify metrics return to baseline within 10 minutes
4. Notify #payments-incidents channel with timeline
5. Page finance-oncall if any duplicate charges were processed

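The procedure above is ordered, and its last step is conditional. A small sketch can make both properties explicit for a runbook tool. It only builds the step list and the argument vector for the `kubectl` command quoted in step 2; it does not execute anything. The feature-flag system has no CLI named in this checklist, so that step stays a manual instruction. `rollback_steps` and `rollback_argv` are hypothetical names.

```python
import shlex
from typing import List

# Command quoted verbatim from step 2 of the procedure.
ROLLBACK_COMMAND = "kubectl rollout undo deployment/payment-service -n payments"


def rollback_steps(duplicate_charges_processed: bool) -> List[str]:
    """Ordered operator steps; flags go off before the deployment is rolled back."""
    steps = [
        "Set feature flags idempotency_v2 and retry_v2 to off",
        f"Run: {ROLLBACK_COMMAND}",
        "Verify metrics return to baseline within 10 minutes",
        "Notify #payments-incidents channel with timeline",
    ]
    if duplicate_charges_processed:
        steps.append("Page finance-oncall")
    return steps


def rollback_argv() -> List[str]:
    """Split the command into an argument list, e.g. for subprocess.run(argv, check=True)."""
    return shlex.split(ROLLBACK_COMMAND)


for number, step in enumerate(rollback_steps(duplicate_charges_processed=True), start=1):
    print(f"{number}. {step}")
print(rollback_argv())
```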
## Post-Release

- [ ] Metrics stable at 100% for 4 hours, smoke tests pass
- [ ] Support team briefed on new retry behavior (customers may see longer processing times)
- [ ] Finance team notified of PCI audit log format changes
- [ ] Feature flags `idempotency_v2` and `retry_v2` scheduled for cleanup (target: v2.5.0)
- [ ] Retrospective scheduled for 2026-03-27 (high-risk release)