engineering

Database Migration Plan

Plan database schema migrations (DDL changes) with backward-compatible strategies, migration scripts, rollback procedures, and zero-downtime deployment patterns. Covers column additions, type changes, index management, and table restructuring.

databasemigrationschemaddlzero-downtimerollbackpostgresqlmysql

Works well with agents

Database Architect Agent Database Migrator Agent DevOps Engineer Agent

Works well with skills

Release Checklist System Design Document

$ npx skills add The-AI-Directory-Company/(…) --skill database-migration-plan

database-migration-plan/

SKILL.md

Markdown

1
2	# Database Migration Plan
3
4	## Before you start
5
6	Gather the following from the user:
7
8	1. What schema change is needed? (Add column, rename column, change type, add/drop index, create/drop table, split table)
9	2. Which database engine? (PostgreSQL, MySQL, SQL Server, or engine-agnostic)
10	3. Is zero-downtime required? (Can the application tolerate maintenance windows?)
11	4. What is the table size? (Row count and data size — this determines strategy)
12	5. Are there dependent applications? (Other services reading/writing this table)
13	6. What ORM or migration tool is in use? (Prisma, Drizzle, Knex, Flyway, Alembic, ActiveRecord, raw SQL)
14
15	If the user says "add a column to the users table," push back: "How many rows? Is the app deployed with zero-downtime requirements? Is the column nullable or does it need a default? Other services reading this table?"
16
17	This skill covers schema-level DDL changes. For moving data between systems or ETL pipelines, see the `data-migration-plan` skill instead.
18
19	## Procedure
20
21	### Step 1: Classify the migration risk
22
23	\| Change type \| Lock risk \| Data risk \| Typical approach \|
24	\|------------\|-----------\|-----------\|------------------\|
25	\| Add nullable column \| Low \| None \| Single migration \|
26	\| Add column with default \| Medium (engine-dependent) \| None \| Check engine behavior \|
27	\| Add index \| Medium-High \| None \| CREATE INDEX CONCURRENTLY \|
28	\| Drop column \| Low \| High (data loss) \| Multi-phase \|
29	\| Rename column \| High \| Medium \| Multi-phase with alias \|
30	\| Change column type \| High \| High \| Multi-phase with dual-write \|
31	\| Drop table \| Low \| High (data loss) \| Multi-phase with grace period \|
32	\| Split/merge tables \| High \| High \| Multi-phase with dual-write \|
33
34	For medium or high risk changes, use the multi-phase approach in Step 3.
35
36	### Step 2: Write the migration script
37
38	Every migration must have an up and down script:
39
40	```sql
41	-- Migration: 20240115_001_add_status_to_orders
42	-- Description: Add status column to orders table for fulfillment tracking
43
44	-- UP
45	ALTER TABLE orders ADD COLUMN status VARCHAR(32) NULL;
46
47	-- DOWN
48	ALTER TABLE orders DROP COLUMN status;
49	```
50
51	Rules for migration scripts:
52
53	- Name with timestamp prefix and sequential number: `YYYYMMDD_NNN_description`
54	- One logical change per migration file — never bundle unrelated changes
55	- DOWN script must be tested and produce the exact prior schema
56	- Include comments stating what the migration does and why
57	- For large tables, include estimated execution time
58
59	### Step 3: Multi-phase migration pattern
60
61	For changes that require zero-downtime on production tables:
62
63	1. Expand — Add the new structure alongside the old (e.g., `ALTER TABLE ADD COLUMN status_new NULL`).
64	2. Dual-write — Deploy code that writes to both old and new. Reads prefer new, fall back to old.
65	3. Backfill — Migrate existing data in batches (5,000-50,000 rows per commit). Monitor replication lag between batches.
66	4. Cutover — Switch all reads and writes to the new structure. Verify no queries reference the old column in production logs.
67	5. Contract — Drop the old structure (`ALTER TABLE DROP COLUMN`).
68
69	Each phase is a separate deployment. Never combine phases into one release.
70
71	### Step 4: Handle engine-specific concerns
72
73	PostgreSQL: `ADD COLUMN` with non-volatile default is metadata-only in PG 11+. Use `CREATE INDEX CONCURRENTLY` (cannot run inside a transaction). `ALTER COLUMN TYPE` rewrites the table on large tables — use multi-phase.
74
75	MySQL (InnoDB): Use `ALGORITHM=INPLACE, LOCK=NONE` where supported. Adding a column with default rewrites the table in MySQL < 8.0.12 — use `pt-online-schema-change` or `gh-ost` for large tables.
76
77	### Step 5: Write the rollback plan
78
79	For each migration phase, document:
80
81	```
82	Phase: [N]
83	Rollback trigger: [What condition triggers rollback]
84	Rollback steps:
85	1. [Exact SQL or deployment step]
86	2. [Verify step]
87	Time estimate: [How long rollback takes]
88	Data impact: [What data is lost on rollback, if any]
89	```
90
91	Rollback must be tested in staging before production deployment. If the DOWN migration involves data loss, document what is lost and whether it is recoverable from backups.
92
93	### Step 6: Define the verification plan
94
95	After each phase, verify three areas:
96
97	- Schema: Run `\d table_name` (PG) or `DESCRIBE table_name` (MySQL). Compare against expected state.
98	- Data: Row count matches pre-migration. Null check on new columns shows 0 after backfill. Spot check sample rows.
99	- Application: Health checks return 200, error rates unchanged, query latency p99 within baseline.
100
101	## Quality checklist
102
103	Before delivering the migration plan, verify:
104
105	- [ ] Every migration has both UP and DOWN scripts
106	- [ ] High-risk changes use the multi-phase expand/migrate/contract pattern
107	- [ ] Backfill runs in batches with commit points, not one massive UPDATE
108	- [ ] Rollback plan exists for each phase with explicit triggers and time estimates
109	- [ ] Engine-specific locking behavior is accounted for
110	- [ ] Verification queries are included for schema, data integrity, and performance
111	- [ ] The plan distinguishes between schema changes (this skill) and data movement (data-migration-plan)
112
113	## Common mistakes
114
115	- Running ALTER TABLE on a large table without checking lock behavior. A table-rewriting ALTER on a 100M-row table locks writes for minutes. Check your engine version's online DDL support first.
116	- Combining expand and contract in one deployment. If the new schema has a bug, you cannot roll back without data loss. Always separate by at least one deployment cycle.
117	- Backfilling without batching. `UPDATE orders SET x = y` on 50M rows creates a massive transaction, bloats WAL/binlog, and may cause replication lag or OOM. Batch in chunks of 5,000-50,000.
118	- Skipping the DOWN migration. "We will never roll back" is not a plan. Write and test the rollback script before running the forward migration.
119	- Forgetting dependent services. Another team's service may query the column you are renaming. Check for cross-service dependencies before dropping or renaming.
120	- Not monitoring replication lag during backfill. On replicated databases, large writes increase replication lag. Pause between batches if lag exceeds your threshold.
121