Data Migration Plan

Plan and execute data migrations with phased rollout — dual-write, backfill, cutover, and verification stages with rollback triggers and data integrity checks at every step.

# Data Migration Plan

## Before you start

Gather the following from the user. If any is missing, ask before proceeding:

1. **Current schema** — Tables, columns, relationships, and storage engine. Include indexes and constraints.
2. **Target schema** — The desired end state. What changes: renamed columns, split tables, new relationships, type changes?
3. **Data volume** — Row counts for affected tables, total size on disk, and growth rate.
4. **Downtime tolerance** — Zero-downtime required? Maintenance window available? If so, how long?
5. **Rollback requirements** — How quickly must you revert? Is data loss during rollback acceptable? What is the point-of-no-return?

If the user says "just rename a column," push back: What reads from this column? Are there cached references? Is the column in any API contract, event payload, or downstream ETL?
## Migration plan template

Use the following structure. Every phase includes explicit entry criteria, steps, and exit criteria.

### Pre-Migration Assessment

Before writing any migration code:

- Catalog every service, job, and query that reads or writes the affected tables
- Identify foreign key dependencies and cascade behavior
- Measure current query performance on affected tables (baseline P50/P99)
- Estimate migration duration: run the backfill query against a production-sized replica
- Confirm backup and restore procedures work — test the restore, not just the backup

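The duration estimate in the last step can be sketched as a small timing harness. Everything here is illustrative: `run_batch` is a hypothetical callable that executes one backfill batch against the replica.

```python
import time

def estimate_backfill_seconds(total_rows, run_batch, batch_size=5000, sample_batches=5):
    """Time a few sample batches against a production-sized replica and
    extrapolate to the whole table. run_batch(offset, limit) is assumed
    to execute one backfill batch and block until it commits."""
    elapsed = 0.0
    for i in range(sample_batches):
        start = time.monotonic()
        run_batch(i * batch_size, batch_size)
        elapsed += time.monotonic() - start
    seconds_per_row = elapsed / (sample_batches * batch_size)
    return seconds_per_row * total_rows
```

Run it against a replica, never the primary, and at a realistic time of day, since I/O contention changes the numbers.
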
### Phase 1: Dual-Write

**Goal**: New writes go to both old and new schema without affecting reads.

1. Deploy schema changes additively (new columns, new tables) — never drop or rename yet
2. Update write paths to populate both old and new locations
3. Add logging to confirm dual-write consistency (compare old vs. new on every write)
4. **Integrity check**: Query for rows where old and new values diverge. Count must be zero before proceeding.

**Rollback trigger**: If the dual-write error rate exceeds 0.1%, disable the new write path and revert the deployment.

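The consistency logging in step 3 can be sketched as a wrapper around the two write paths. All names here are hypothetical: `write_old`, `write_new`, and the read-back callables stand in for your real persistence code.

```python
import logging

log = logging.getLogger("dual_write")

def dual_write(record, write_old, write_new, read_old, read_new):
    """Persist to both schemas, then read both back and compare.
    Returns True when the two paths agree; divergences are logged so
    the phase-exit integrity check has something to query."""
    write_old(record)
    try:
        write_new(record)
    except Exception:
        # Surface the failure so the 0.1% error-rate rollback trigger can fire.
        log.exception("new-schema write failed for id=%r", record.get("id"))
        raise
    old_row, new_row = read_old(record["id"]), read_new(record["id"])
    if old_row != new_row:
        log.warning("divergence id=%r old=%r new=%r", record["id"], old_row, new_row)
    return old_row == new_row
```
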
### Phase 2: Backfill

**Goal**: Migrate all historical data from old schema to new schema.

1. Run backfill in batches (1,000–10,000 rows per batch, tuned to avoid lock contention)
2. Throttle to keep replica lag under your threshold (e.g., < 5 seconds)
3. Use idempotent upserts so the backfill can be safely restarted at any point
4. Log progress: batch number, rows processed, elapsed time, estimated time remaining
5. **Integrity check**: After backfill completes, run a full-table comparison — old source vs. new target. Row count must match. Spot-check 1,000 random rows for value equality.

**Rollback trigger**: If backfill causes replica lag > 30 seconds or error rate spikes, pause and reduce batch size. If data corruption is detected, stop and revert to Phase 1 state.

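The loop above can be sketched as a keyset-paginated driver. `fetch_batch`, `upsert_batch`, and `replica_lag` are assumed callbacks onto your actual database access; the driver itself only handles ordering, throttling, and restartability.

```python
import time

def run_backfill(fetch_batch, upsert_batch, replica_lag,
                 batch_size=5000, max_lag_seconds=5.0, start_after=0):
    """Copy rows old -> new in idempotent, restartable batches.

    fetch_batch(last_id, limit): rows ordered by primary key with id > last_id.
    upsert_batch(rows): idempotent upsert (INSERT ... ON CONFLICT DO UPDATE).
    replica_lag(): current replication lag in seconds, used for throttling.
    Returns (rows_processed, last_id) so a crash can resume from last_id.
    """
    last_id, processed = start_after, 0
    while True:
        while replica_lag() > max_lag_seconds:
            time.sleep(1.0)  # back off until replicas catch up
        rows = fetch_batch(last_id, batch_size)
        if not rows:
            return processed, last_id
        upsert_batch(rows)
        last_id = rows[-1]["id"]  # checkpoint: safe restart point
        processed += len(rows)
```

Keyset pagination (`id > last_id`) rather than `OFFSET` keeps per-batch cost flat as the scan advances, and the checkpoint makes restarts trivial.
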
### Phase 3: Cutover

**Goal**: Switch reads from old schema to new schema.

1. Deploy read path changes behind a feature flag — route a small percentage of reads to new schema first (1%, 10%, 50%, 100%)
2. Compare read results between old and new paths during the ramp (shadow reads or dual-read with comparison logging)
3. Monitor latency, error rate, and correctness at each ramp stage
4. At 100%, hold for a soak period (minimum 24 hours in production) before proceeding
5. **Integrity check**: P99 latency on new read path must be within 10% of the old path baseline.

**Rollback trigger**: If error rate or latency degrades at any ramp stage, set feature flag back to 0% immediately.

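The percentage ramp in step 1 is usually implemented with deterministic bucketing rather than per-request randomness, so a given entity sees a consistent read path throughout the ramp. A minimal sketch (in practice the ramp value would come from your feature-flag service):

```python
import hashlib

def reads_new_schema(entity_id, ramp_percent):
    """Deterministically assign entity_id to a bucket 0-99. Raising
    ramp_percent (1 -> 10 -> 50 -> 100) only ever adds entities to the
    new path; no entity flip-flops between schemas mid-ramp."""
    digest = hashlib.sha256(str(entity_id).encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:2], "big") % 100
    return bucket < ramp_percent
```
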
### Phase 4: Verification

**Goal**: Confirm the migration is complete and correct before cleanup.

1. Run final consistency check: compare row counts and checksums between old and new schema
2. Verify all downstream consumers (reports, analytics, ETL jobs, APIs) produce correct output
3. Confirm no service is still reading from or writing to the old schema (check query logs)
4. Document any data that was intentionally dropped, transformed, or defaulted during migration
5. Get sign-off from the data owner before proceeding to cleanup

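The consistency check in step 1 can be sketched as an order-independent checksum plus a random spot-check, mirroring the Phase 2 exit criteria. This assumes both schemas can be projected into comparable row dicts; at real scale you would push the checksum into SQL rather than pull every row into memory.

```python
import hashlib
import random

def table_checksum(rows):
    """Order-independent checksum: hash each row's canonical form and
    XOR the digests, so physical row order does not matter."""
    acc = 0
    for row in rows:
        canonical = repr(sorted(row.items())).encode("utf-8")
        acc ^= int.from_bytes(hashlib.sha256(canonical).digest()[:8], "big")
    return acc

def verify_migration(old_rows, new_rows, key="id", sample_size=1000):
    """Row counts first, then checksums, then a random per-row spot-check."""
    if len(old_rows) != len(new_rows):
        return False, "row count mismatch"
    if table_checksum(old_rows) != table_checksum(new_rows):
        return False, "checksum mismatch"
    new_by_key = {r[key]: r for r in new_rows}
    for row in random.sample(old_rows, min(sample_size, len(old_rows))):
        if new_by_key.get(row[key]) != row:
            return False, f"value mismatch at {key}={row[key]}"
    return True, "ok"
```
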
### Phase 5: Cleanup

**Goal**: Remove old schema and dual-write code.

1. Remove dual-write code paths and feature flags
2. Drop old columns or tables (after a grace period — typically 1–2 release cycles)
3. Update documentation, runbooks, and schema diagrams
4. Archive the migration scripts for audit trail

**Do not skip cleanup.** Leftover dual-write code becomes a maintenance burden and a source of subtle bugs.

### Rollback Plan

Document the full rollback procedure upfront, not during an incident:

- **Phase 1 rollback**: Revert deployment, remove new columns/tables. No data loss.
- **Phase 2 rollback**: Stop backfill, truncate new tables. Dual-write code still active — disable it.
- **Phase 3 rollback**: Flip feature flag to route all reads back to old schema. Immediate.
- **Phase 4+ rollback**: If the old schema has already been dropped, recovery requires a restore from backup. Dropping the old schema is the point of no return — do not cross it until verification has signed off.

## Quality checklist

Before delivering the migration plan, verify:

- [ ] Every phase has explicit entry criteria, steps, and exit criteria
- [ ] Rollback triggers are defined with specific thresholds (not "if something goes wrong")
- [ ] Data integrity checks exist at every phase boundary
- [ ] Backfill is idempotent and restartable
- [ ] Downstream consumers are cataloged and their migration path is documented
- [ ] The rollback plan covers every phase, including the point of no return
- [ ] Migration duration has been estimated against production-scale data

## Common mistakes to avoid

- **Skipping the dual-write phase**. Going straight from old schema to new schema forces a big-bang cutover. Dual-write lets you validate incrementally and roll back safely.
- **Non-idempotent backfills**. If your backfill crashes at row 500,000, you must restart from the beginning unless each batch can be safely reapplied. Always use `INSERT ... ON CONFLICT DO UPDATE` or an equivalent idempotent upsert.
- **Ignoring downstream consumers**. The migration is not done when the database is updated. Every ETL job, cache layer, search index, and analytics pipeline that reads the old schema needs a migration path too.
- **No production-scale test**. A backfill that takes 2 minutes on staging may take 14 hours on production. Always benchmark against a production-sized dataset before scheduling the migration window.
- **Cleaning up too early**. Dropping the old schema the day after cutover leaves no safety net. Keep the old schema readable for at least one full release cycle.

