engineering
Database Migration Plan
Plan database schema migrations (DDL changes) with backward-compatible strategies, migration scripts, rollback procedures, and zero-downtime deployment patterns. Covers column additions, type changes, index management, and table restructuring.
databasemigrationschemaddlzero-downtimerollbackpostgresqlmysql
Works well with agents
Works well with skills
$ npx skills add The-AI-Directory-Company/(…) --skill database-migration-planSKILL.md
Markdown
| 1 | |
| 2 | # Database Migration Plan |
| 3 | |
| 4 | ## Before you start |
| 5 | |
| 6 | Gather the following from the user: |
| 7 | |
| 8 | 1. **What schema change is needed?** (Add column, rename column, change type, add/drop index, create/drop table, split table) |
| 9 | 2. **Which database engine?** (PostgreSQL, MySQL, SQL Server, or engine-agnostic) |
| 10 | 3. **Is zero-downtime required?** (Can the application tolerate maintenance windows?) |
| 11 | 4. **What is the table size?** (Row count and data size — this determines strategy) |
| 12 | 5. **Are there dependent applications?** (Other services reading/writing this table) |
| 13 | 6. **What ORM or migration tool is in use?** (Prisma, Drizzle, Knex, Flyway, Alembic, ActiveRecord, raw SQL) |
| 14 | |
| 15 | If the user says "add a column to the users table," push back: "How many rows? Is the app deployed with zero-downtime requirements? Is the column nullable or does it need a default? Other services reading this table?" |
| 16 | |
| 17 | This skill covers **schema-level DDL changes**. For moving data between systems or ETL pipelines, see the `data-migration-plan` skill instead. |
| 18 | |
| 19 | ## Procedure |
| 20 | |
| 21 | ### Step 1: Classify the migration risk |
| 22 | |
| 23 | | Change type | Lock risk | Data risk | Typical approach | |
| 24 | |------------|-----------|-----------|------------------| |
| 25 | | Add nullable column | Low | None | Single migration | |
| 26 | | Add column with default | Medium (engine-dependent) | None | Check engine behavior | |
| 27 | | Add index | Medium-High | None | CREATE INDEX CONCURRENTLY | |
| 28 | | Drop column | Low | High (data loss) | Multi-phase | |
| 29 | | Rename column | High | Medium | Multi-phase with alias | |
| 30 | | Change column type | High | High | Multi-phase with dual-write | |
| 31 | | Drop table | Low | High (data loss) | Multi-phase with grace period | |
| 32 | | Split/merge tables | High | High | Multi-phase with dual-write | |
| 33 | |
| 34 | For medium or high risk changes, use the multi-phase approach in Step 3. |
| 35 | |
| 36 | ### Step 2: Write the migration script |
| 37 | |
| 38 | Every migration must have an **up** and **down** script: |
| 39 | |
| 40 | ```sql |
| 41 | -- Migration: 20240115_001_add_status_to_orders |
| 42 | -- Description: Add status column to orders table for fulfillment tracking |
| 43 | |
| 44 | -- UP |
| 45 | ALTER TABLE orders ADD COLUMN status VARCHAR(32) NULL; |
| 46 | |
| 47 | -- DOWN |
| 48 | ALTER TABLE orders DROP COLUMN status; |
| 49 | ``` |
| 50 | |
| 51 | Rules for migration scripts: |
| 52 | |
| 53 | - Name with timestamp prefix and sequential number: `YYYYMMDD_NNN_description` |
| 54 | - One logical change per migration file — never bundle unrelated changes |
| 55 | - DOWN script must be tested and produce the exact prior schema |
| 56 | - Include comments stating what the migration does and why |
| 57 | - For large tables, include estimated execution time |
| 58 | |
| 59 | ### Step 3: Multi-phase migration pattern |
| 60 | |
| 61 | For changes that require zero-downtime on production tables: |
| 62 | |
| 63 | 1. **Expand** — Add the new structure alongside the old (e.g., `ALTER TABLE ADD COLUMN status_new NULL`). |
| 64 | 2. **Dual-write** — Deploy code that writes to both old and new. Reads prefer new, fall back to old. |
| 65 | 3. **Backfill** — Migrate existing data in batches (5,000-50,000 rows per commit). Monitor replication lag between batches. |
| 66 | 4. **Cutover** — Switch all reads and writes to the new structure. Verify no queries reference the old column in production logs. |
| 67 | 5. **Contract** — Drop the old structure (`ALTER TABLE DROP COLUMN`). |
| 68 | |
| 69 | Each phase is a separate deployment. Never combine phases into one release. |
| 70 | |
| 71 | ### Step 4: Handle engine-specific concerns |
| 72 | |
| 73 | **PostgreSQL:** `ADD COLUMN` with non-volatile default is metadata-only in PG 11+. Use `CREATE INDEX CONCURRENTLY` (cannot run inside a transaction). `ALTER COLUMN TYPE` rewrites the table on large tables — use multi-phase. |
| 74 | |
| 75 | **MySQL (InnoDB):** Use `ALGORITHM=INPLACE, LOCK=NONE` where supported. Adding a column with default rewrites the table in MySQL < 8.0.12 — use `pt-online-schema-change` or `gh-ost` for large tables. |
| 76 | |
| 77 | ### Step 5: Write the rollback plan |
| 78 | |
| 79 | For each migration phase, document: |
| 80 | |
| 81 | ``` |
| 82 | Phase: [N] |
| 83 | Rollback trigger: [What condition triggers rollback] |
| 84 | Rollback steps: |
| 85 | 1. [Exact SQL or deployment step] |
| 86 | 2. [Verify step] |
| 87 | Time estimate: [How long rollback takes] |
| 88 | Data impact: [What data is lost on rollback, if any] |
| 89 | ``` |
| 90 | |
| 91 | Rollback must be tested in staging before production deployment. If the DOWN migration involves data loss, document what is lost and whether it is recoverable from backups. |
| 92 | |
| 93 | ### Step 6: Define the verification plan |
| 94 | |
| 95 | After each phase, verify three areas: |
| 96 | |
| 97 | - **Schema:** Run `\d table_name` (PG) or `DESCRIBE table_name` (MySQL). Compare against expected state. |
| 98 | - **Data:** Row count matches pre-migration. Null check on new columns shows 0 after backfill. Spot check sample rows. |
| 99 | - **Application:** Health checks return 200, error rates unchanged, query latency p99 within baseline. |
| 100 | |
| 101 | ## Quality checklist |
| 102 | |
| 103 | Before delivering the migration plan, verify: |
| 104 | |
| 105 | - [ ] Every migration has both UP and DOWN scripts |
| 106 | - [ ] High-risk changes use the multi-phase expand/migrate/contract pattern |
| 107 | - [ ] Backfill runs in batches with commit points, not one massive UPDATE |
| 108 | - [ ] Rollback plan exists for each phase with explicit triggers and time estimates |
| 109 | - [ ] Engine-specific locking behavior is accounted for |
| 110 | - [ ] Verification queries are included for schema, data integrity, and performance |
| 111 | - [ ] The plan distinguishes between schema changes (this skill) and data movement (data-migration-plan) |
| 112 | |
| 113 | ## Common mistakes |
| 114 | |
| 115 | - **Running ALTER TABLE on a large table without checking lock behavior.** A table-rewriting ALTER on a 100M-row table locks writes for minutes. Check your engine version's online DDL support first. |
| 116 | - **Combining expand and contract in one deployment.** If the new schema has a bug, you cannot roll back without data loss. Always separate by at least one deployment cycle. |
| 117 | - **Backfilling without batching.** `UPDATE orders SET x = y` on 50M rows creates a massive transaction, bloats WAL/binlog, and may cause replication lag or OOM. Batch in chunks of 5,000-50,000. |
| 118 | - **Skipping the DOWN migration.** "We will never roll back" is not a plan. Write and test the rollback script before running the forward migration. |
| 119 | - **Forgetting dependent services.** Another team's service may query the column you are renaming. Check for cross-service dependencies before dropping or renaming. |
| 120 | - **Not monitoring replication lag during backfill.** On replicated databases, large writes increase replication lag. Pause between batches if lag exceeds your threshold. |
| 121 |