CI/CD Pipeline Design
Design CI/CD pipeline configurations for GitHub Actions, GitLab CI, and CircleCI. Covers build, test, and deploy stages with caching strategies, parallelization, environment promotion, and secret management.
Tags: ci-cd, github-actions, gitlab-ci, circleci, pipeline, deployment, devops
$ npx skills add The-AI-Directory-Company/(…) --skill ci-cd-pipeline-design

# CI/CD Pipeline Design

## Before you start

Gather the following from the user:

1. **Which CI/CD platform?** (GitHub Actions, GitLab CI, CircleCI, or platform-agnostic)
2. **What language/runtime?** (Node.js, Python, Go, Java, Rust, multi-language)
3. **What needs to happen?** (Lint, test, build, deploy — which of these?)
4. **Where does it deploy?** (Vercel, AWS, GCP, Kubernetes, static hosting)
5. **What is the branching strategy?** (Trunk-based, GitFlow, feature branches)
6. **What is slow today?** (If optimizing an existing pipeline, what takes the longest?)

If the user says "set up CI/CD," push back: "For which platform, language, and deployment target? I also need your branching strategy to design the trigger rules."

## Procedure

### Step 1: Define trigger rules

Map git events to pipeline runs:

```yaml
# GitHub Actions example
on:
  pull_request:
    branches: [main] # Run tests on PRs to main
  push:
    branches: [main] # Deploy on merge to main
    tags: ["v*"]     # Deploy releases on version tags
  workflow_dispatch: # Manual trigger for ad-hoc runs
```

Rules:
- PRs trigger lint + test + build (no deploy)
- Merge to main triggers lint + test + build + deploy to staging
- Tags trigger deploy to production
- Never auto-deploy to production on push to main

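For GitLab CI, the same trigger rules can be sketched with `workflow:rules` and per-job `rules` (a sketch; the job names are illustrative, not part of the original pipeline):

```yaml
# GitLab CI equivalent of the trigger rules above (sketch)
workflow:
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event" # run pipeline on MRs
    - if: $CI_COMMIT_BRANCH == "main"                  # run pipeline on merges to main
    - if: $CI_COMMIT_TAG =~ /^v/                       # run pipeline on v* tags

deploy-staging:
  rules:
    - if: $CI_COMMIT_BRANCH == "main"   # staging only on main

deploy-prod:
  rules:
    - if: $CI_COMMIT_TAG =~ /^v/
      when: manual                      # keep production behind a manual gate
```

The `workflow:rules` block decides whether a pipeline runs at all; the per-job `rules` decide which jobs run within it.
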
### Step 2: Design the stage graph

Organize jobs into stages with dependency edges:

```
[Lint] ──┐
         ├──> [Build] ──> [Deploy Staging] ──> [Smoke Test] ──> [Deploy Prod]
[Test] ──┘
```

For each stage, document:

| Stage | Trigger | Depends on | Timeout | Runs on |
|-------|---------|------------|---------|---------|
| Lint | PR + push | None | 5 min | ubuntu-latest |
| Test | PR + push | None | 15 min | ubuntu-latest |
| Build | PR + push | Lint + Test pass | 10 min | ubuntu-latest |
| Deploy staging | Push to main | Build | 10 min | ubuntu-latest |
| Smoke test | After staging deploy | Deploy staging | 5 min | ubuntu-latest |
| Deploy prod | Tag v* or manual | Smoke test pass | 10 min | ubuntu-latest |

Run Lint and Test in parallel. They have no dependency on each other.

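In GitHub Actions, the stage graph maps onto `needs:` edges. A minimal sketch (job bodies reduced to placeholder steps):

```yaml
# Stage graph expressed as GitHub Actions `needs:` edges (sketch)
jobs:
  lint:
    runs-on: ubuntu-latest
    timeout-minutes: 5
    steps:
      - run: echo "lint"        # placeholder
  test:
    runs-on: ubuntu-latest
    timeout-minutes: 15
    steps:
      - run: echo "test"        # placeholder
  build:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    needs: [lint, test]         # Lint and Test run in parallel; Build waits for both
    steps:
      - run: echo "build"       # placeholder
```

Jobs with no `needs:` entry start immediately, which is what makes Lint and Test parallel by default.
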
### Step 3: Configure caching

Caching is the single highest-impact optimization. For each ecosystem, cache the dependency directory with a key based on the lockfile hash:

- **Node.js:** Cache the package manager's store (`~/.npm` for npm, the pnpm store for pnpm) rather than `node_modules`, key on `hashFiles('pnpm-lock.yaml')` or the equivalent lockfile
- **Python:** Cache `~/.cache/pip`, key on `hashFiles('requirements.txt')`
- **Go:** Cache `~/go/pkg/mod` + `~/.cache/go-build`, key on `hashFiles('go.sum')`
- **Docker:** Use `cache-from: type=gha` and `cache-to: type=gha,mode=max`

Always set `restore-keys` for partial cache hits. Monitor cache hit rates — below 80% means the key strategy needs adjustment.

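A minimal `actions/cache` sketch for a pnpm project, assuming pnpm's default store path on Linux runners (verify with `pnpm store path` for your setup):

```yaml
# Dependency cache with lockfile-based key and restore-keys fallback (sketch)
- uses: actions/cache@v4
  with:
    path: ~/.local/share/pnpm/store   # assumed default pnpm store on Linux
    key: pnpm-${{ runner.os }}-${{ hashFiles('**/pnpm-lock.yaml') }}
    restore-keys: |
      pnpm-${{ runner.os }}-          # partial hit: reuse the latest store
```

On a partial hit the store is restored and only changed dependencies are fetched, which is why `restore-keys` matters even when the lockfile changed.
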
### Step 4: Implement test parallelization

Split tests across parallel runners to reduce wall-clock time:

**GitHub Actions matrix:**
```yaml
test:
  strategy:
    matrix:
      shard: [1, 2, 3, 4]
  steps:
    - run: npx jest --shard=${{ matrix.shard }}/4
```

**CircleCI parallelism:**
```yaml
test:
  parallelism: 4
  steps:
    - run: |
        TESTS=$(circleci tests glob "**/*.test.ts" | circleci tests split --split-by=timings)
        npx jest $TESTS
```

Split by timing data when available. Fall back to file count splitting. Rebalance shards when test distribution becomes uneven.

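GitLab CI (also covered by this skill) can express the same sharding with its `parallel` keyword and the predefined `CI_NODE_INDEX`/`CI_NODE_TOTAL` variables; a sketch:

```yaml
# GitLab CI test sharding (sketch)
test:
  parallel: 4
  script:
    # CI_NODE_INDEX is 1-based, matching Jest's 1-based --shard syntax
    - npx jest --shard=$CI_NODE_INDEX/$CI_NODE_TOTAL
```
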
### Step 5: Configure secrets and environment variables

Use the platform's secret store (`${{ secrets.X }}` in GitHub Actions, CI/CD variables in GitLab). Rules:
- Never echo secrets in logs — use `add-mask` or equivalent
- Scope secrets per environment (staging vs production)
- For OIDC-capable targets (AWS, GCP), use identity federation instead of static credentials
- Document secret rotation procedure in the pipeline README

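As an example of identity federation, AWS OIDC in GitHub Actions can be sketched as follows (the role ARN and region are placeholders):

```yaml
# OIDC federation to AWS instead of static keys (sketch)
permissions:
  id-token: write   # allow the job to request an OIDC token
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/ci-deploy # placeholder ARN
          aws-region: us-east-1
```

The job receives short-lived credentials scoped to that role, so there is nothing static to rotate or leak.
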
### Step 6: Add deploy gates

Between staging and production, require at least one gate:

- **Smoke test**: Automated health check hitting critical endpoints on staging
- **Manual approval**: Required for production deploys (GitHub: `environment` with reviewers; GitLab: `when: manual`)
- **Canary verification**: Deploy to a subset, verify metrics, then proceed

```yaml
# GitHub Actions environment protection
deploy-prod:
  environment:
    name: production
    url: https://app.example.com
  needs: [smoke-test]
```

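The GitLab CI equivalent of this gate uses `when: manual` (a sketch; the deploy script is a placeholder):

```yaml
# GitLab CI manual production gate (sketch)
deploy-prod:
  stage: deploy
  environment:
    name: production
    url: https://app.example.com
  needs: ["smoke-test"]
  when: manual                    # pipeline pauses until a reviewer clicks play
  script:
    - ./deploy.sh production      # placeholder deploy command
```
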
### Step 7: Handle artifacts

Build once, deploy the same artifact to all environments. Upload build output as a pipeline artifact with a retention policy. Download in deploy jobs. The artifact deployed to production must be byte-identical to what was tested in staging — never rebuild per environment.

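A build-once sketch with GitHub Actions artifacts, assuming the build writes to `dist/` (the path and artifact name are illustrative):

```yaml
# Build once, reuse the same bytes in every deploy job (sketch)
build:
  runs-on: ubuntu-latest
  steps:
    - run: npm run build
    - uses: actions/upload-artifact@v4
      with:
        name: app-dist
        path: dist/
        retention-days: 14        # explicit retention policy

deploy-staging:
  needs: [build]
  runs-on: ubuntu-latest
  steps:
    - uses: actions/download-artifact@v4
      with:
        name: app-dist
        path: dist/
    # deploy the exact artifact that was built and tested
```
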
## Quality checklist

Before delivering the pipeline configuration, verify:

- [ ] PRs run lint + test + build but never deploy
- [ ] Lint and test run in parallel where possible
- [ ] Caching is configured for dependencies with lockfile-based keys
- [ ] Secrets use the platform's secret store, never hardcoded
- [ ] Production deploy requires a gate (manual approval or automated canary)
- [ ] Build artifacts are created once and reused across environments
- [ ] Timeouts are set on every job to prevent hung pipelines
- [ ] The pipeline YAML is syntactically valid for the target platform

## Common mistakes

- **Deploying to production on push to main.** Always gate production deploys behind manual approval or automated verification. A broken merge should not reach users automatically.
- **No caching.** Installing dependencies from scratch on every run wastes 2-5 minutes. Cache aggressively and monitor hit rates.
- **Rebuilding per environment.** Building separately for staging and production means you are deploying untested artifacts to production. Build once, deploy everywhere.
- **Sequential lint and test.** These have no dependency on each other. Run them in parallel to cut pipeline time.
- **Secrets in pipeline YAML.** Even "temporary" secrets in YAML files end up in git history. Use the platform's secret management from day one.
- **No timeout on jobs.** A hung test suite or a stuck deploy can block the pipeline for hours. Set explicit timeouts on every job.
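
The timeout rule costs one line per job in GitHub Actions (the 15-minute value is illustrative; pick a ceiling slightly above the job's normal runtime):

```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    timeout-minutes: 15   # kill the job instead of blocking the pipeline for hours
```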
| 150 |