engineeringdata

Data Migration Plan

Plan and execute data migrations with phased rollout — dual-write, backfill, cutover, and verification stages with rollback triggers and data integrity checks at every step.

migrationdataschemarollbackdatabaseETL

Works well with agents

Data Engineer AgentDatabase Architect Agent

Works well with skills

System Design DocumentTicket Writing
data-migration-plan/
    • postgres-to-new-schema.md3.6 KB
  • SKILL.md6.9 KB
data-migration-plan/examples/postgres-to-new-schema.md
postgres-to-new-schema.md
Markdown
1# Migration Plan: User Accounts Schema Restructuring
2 
3## Overview
4 
5Restructure the `users` table into separate `accounts` and `profiles` tables to support multi-tenant access. PostgreSQL 15, ~8.2M rows, zero-downtime requirement.
6 
7## Pre-Migration Assessment
8 
9- **Affected services**: auth-service, billing-api, admin-dashboard, analytics-etl, email-worker
10- **Foreign keys**: `orders.user_id`, `sessions.user_id`, `audit_logs.actor_id` (12 tables total)
11- **Baseline performance**: P50 = 2ms, P99 = 18ms on `users` table lookups
12- **Estimated backfill duration**: 47 minutes (tested against production replica at 10k rows/batch)
13- **Backup verified**: Full pg_dump restore tested on 2026-03-12 — 22 minutes to restore
14 
15## Phase 1: Dual-Write (Days 1-3)
16 
17**Entry criteria**: All CI checks pass, backup verified within 7 days.
18 
191. Deploy additive schema: create `accounts` and `profiles` tables alongside `users`
202. Update auth-service and billing-api write paths to insert into both `users` and `accounts`/`profiles`
213. Deploy dual-write consistency logger — compare row-by-row on every write
22 
23**Integrity check**: `SELECT count(*) FROM accounts a LEFT JOIN users u ON a.legacy_user_id = u.id WHERE u.id IS NULL;` must return 0.
24 
25**Rollback trigger**: Dual-write error rate > 0.1% over any 10-minute window. Action: revert auth-service deploy, drop `accounts` and `profiles` tables.
26 
27## Phase 2: Backfill (Day 4)
28 
29**Entry criteria**: Dual-write running 48+ hours with zero divergences.
30 
311. Run backfill in batches of 5,000 rows with 200ms delay between batches
322. Monitor replica lag — pause if lag exceeds 5 seconds
333. Use `INSERT INTO accounts (...) SELECT ... FROM users ON CONFLICT (legacy_user_id) DO UPDATE`
34 
35**Integrity check**: Row count match + spot-check 2,000 random rows for field equality:
36```sql
37SELECT count(*) FROM users u
38JOIN accounts a ON a.legacy_user_id = u.id
39WHERE u.email != a.email OR u.created_at != a.created_at;
40-- Must return 0
41```
42 
43**Rollback trigger**: Replica lag > 30s sustained, or any data mismatch detected. Action: pause backfill, investigate, reduce batch size to 1,000.
44 
45## Phase 3: Cutover (Days 5-9)
46 
47**Entry criteria**: Backfill complete, 100% row parity confirmed.
48 
49| Stage | Read traffic % | Duration | Gate |
50|-------|---------------|----------|------|
51| Canary | 1% | 4 hours | Error rate < 0.01%, P99 < 22ms |
52| Partial | 10% | 12 hours | Error rate < 0.01%, P99 < 20ms |
53| Majority | 50% | 24 hours | Error rate < 0.005% |
54| Full | 100% | 48 hours soak | P99 within 10% of baseline |
55 
56**Rollback trigger**: Error rate or latency exceeds gate thresholds. Action: set feature flag `use_new_accounts_schema` to false — immediate revert to `users` table reads.
57 
58## Phase 4: Verification (Days 10-12)
59 
601. Final consistency check: row counts, checksums on email and created_at columns
612. Verify analytics-etl output matches pre-migration baseline report (< 0.01% variance)
623. Confirm no queries hit `users` table directly (check `pg_stat_user_tables` for seq/idx scans)
634. Sign-off from data-eng lead and product-eng lead
64 
65## Phase 5: Cleanup (Days 20-25)
66 
671. Remove dual-write code from auth-service and billing-api
682. Drop `users` table after 2 release cycles (grace period through Day 25)
693. Update schema diagrams, ERD docs, and runbooks
704. Archive migration scripts in `migrations/2026-03-archive/`
71 
72## Rollback Summary
73 
74| Phase | Rollback action | Data loss risk |
75|-------|----------------|----------------|
76| 1 — Dual-Write | Revert deploy, drop new tables | None |
77| 2 — Backfill | Stop backfill, truncate new tables | None |
78| 3 — Cutover | Flip feature flag to 0% | None |
79| 4+ — Post-cleanup | Restore from backup (point of no return) | Up to RPO window |
80 
AgentsSkillsCompaniesJobsForumBlogFAQAbout

©2026 ai-directory.company

·Privacy·Terms·Cookies·