CI/CD Pipeline Design
Design CI/CD pipeline configurations for GitHub Actions, GitLab CI, and CircleCI. Covers build, test, and deploy stages with caching strategies, parallelization, environment promotion, and secret management.
Tags: ci-cd, github-actions, gitlab-ci, circleci, pipeline, deployment, devops
$ npx skills add The-AI-Directory-Company/(…) --skill ci-cd-pipeline-design

# CI/CD Pipeline Design

## Before you start

Gather the following from the user:

1. **Which CI/CD platform?** (GitHub Actions, GitLab CI, CircleCI, or platform-agnostic)
2. **What language/runtime?** (Node.js, Python, Go, Java, Rust, multi-language)
3. **What needs to happen?** (Lint, test, build, deploy — which of these?)
4. **Where does it deploy?** (Vercel, AWS, GCP, Kubernetes, static hosting)
5. **What is the branching strategy?** (Trunk-based, GitFlow, feature branches)
6. **What is slow today?** (If optimizing an existing pipeline, what takes the longest?)

If the user says "set up CI/CD," push back: "For which platform, language, and deployment target? I also need your branching strategy to design the trigger rules."

## Procedure

### Step 1: Define trigger rules

Map git events to pipeline runs:

```yaml
# GitHub Actions example
on:
  pull_request:
    branches: [main] # Run tests on PRs to main
  push:
    branches: [main] # Deploy on merge to main
    tags: ["v*"]     # Deploy releases on version tags
  workflow_dispatch: # Manual trigger for ad-hoc runs
```

Rules:
- PRs trigger lint + test + build (no deploy)
- Merge to main triggers lint + test + build + deploy to staging
- Tags trigger deploy to production
- Never auto-deploy to production on push to main

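For GitLab CI, the same trigger rules can be sketched with `workflow:rules` and per-job `rules` (a sketch; the job names are illustrative, not part of the original pipeline):

```yaml
# GitLab CI equivalent of the trigger rules above (sketch)
workflow:
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event" # run pipeline on MRs
    - if: $CI_COMMIT_BRANCH == "main"                  # run pipeline on merges to main
    - if: $CI_COMMIT_TAG =~ /^v/                       # run pipeline on v* tags

deploy-staging:
  rules:
    - if: $CI_COMMIT_BRANCH == "main"   # staging only on main

deploy-prod:
  rules:
    - if: $CI_COMMIT_TAG =~ /^v/
      when: manual                      # keep production behind a manual gate
```

The `workflow:rules` block decides whether a pipeline runs at all; the per-job `rules` decide which jobs run within it.
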
### Step 2: Design the stage graph

Organize jobs into stages with dependency edges:

```
[Lint] ──┐
         ├──> [Build] ──> [Deploy Staging] ──> [Smoke Test] ──> [Deploy Prod]
[Test] ──┘
```

For each stage, document:

| Stage | Trigger | Depends on | Timeout | Runs on |
|-------|---------|------------|---------|---------|
| Lint | PR + push | None | 5 min | ubuntu-latest |
| Test | PR + push | None | 15 min | ubuntu-latest |
| Build | PR + push | Lint + Test pass | 10 min | ubuntu-latest |
| Deploy staging | Push to main | Build | 10 min | ubuntu-latest |
| Smoke test | After staging deploy | Deploy staging | 5 min | ubuntu-latest |
| Deploy prod | Tag v* or manual | Smoke test pass | 10 min | ubuntu-latest |

Run Lint and Test in parallel. They have no dependency on each other.

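In GitHub Actions, the stage graph maps onto `needs:` edges. A minimal sketch (job bodies reduced to placeholder steps):

```yaml
# Stage graph expressed as GitHub Actions `needs:` edges (sketch)
jobs:
  lint:
    runs-on: ubuntu-latest
    timeout-minutes: 5
    steps:
      - run: echo "lint"        # placeholder
  test:
    runs-on: ubuntu-latest
    timeout-minutes: 15
    steps:
      - run: echo "test"        # placeholder
  build:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    needs: [lint, test]         # Lint and Test run in parallel; Build waits for both
    steps:
      - run: echo "build"       # placeholder
```

Jobs with no `needs:` entry start immediately, which is what makes Lint and Test parallel by default.
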
### Step 3: Configure caching

Caching is the single highest-impact optimization. For each ecosystem, cache the dependency directory with a key based on the lockfile hash:

- **Node.js:** Cache the package manager's store (`~/.npm` for npm, the pnpm store for pnpm) rather than `node_modules`, key on `hashFiles('pnpm-lock.yaml')` or the equivalent lockfile
- **Python:** Cache `~/.cache/pip`, key on `hashFiles('requirements.txt')`
- **Go:** Cache `~/go/pkg/mod` + `~/.cache/go-build`, key on `hashFiles('go.sum')`
- **Docker:** Use `cache-from: type=gha` and `cache-to: type=gha,mode=max`

Always set `restore-keys` for partial cache hits. Monitor cache hit rates — below 80% means the key strategy needs adjustment.

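A minimal `actions/cache` sketch for a pnpm project, assuming pnpm's default store path on Linux runners (verify with `pnpm store path` for your setup):

```yaml
# Dependency cache with lockfile-based key and restore-keys fallback (sketch)
- uses: actions/cache@v4
  with:
    path: ~/.local/share/pnpm/store   # assumed default pnpm store on Linux
    key: pnpm-${{ runner.os }}-${{ hashFiles('**/pnpm-lock.yaml') }}
    restore-keys: |
      pnpm-${{ runner.os }}-          # partial hit: reuse the latest store
```

On a partial hit the store is restored and only changed dependencies are fetched, which is why `restore-keys` matters even when the lockfile changed.
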
### Step 4: Implement test parallelization

Split tests across parallel runners to reduce wall-clock time:

**GitHub Actions matrix:**
```yaml
test:
  strategy:
    matrix:
      shard: [1, 2, 3, 4]
  steps:
    - run: npx jest --shard=${{ matrix.shard }}/4
```

**CircleCI parallelism:**
```yaml
test:
  parallelism: 4
  steps:
    - run: |
        TESTS=$(circleci tests glob "**/*.test.ts" | circleci tests split --split-by=timings)
        npx jest $TESTS
```

Split by timing data when available. Fall back to file count splitting. Rebalance shards when test distribution becomes uneven.

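GitLab CI (also covered by this skill) can express the same sharding with its `parallel` keyword and the predefined `CI_NODE_INDEX`/`CI_NODE_TOTAL` variables; a sketch:

```yaml
# GitLab CI test sharding (sketch)
test:
  parallel: 4
  script:
    # CI_NODE_INDEX is 1-based, matching Jest's 1-based --shard syntax
    - npx jest --shard=$CI_NODE_INDEX/$CI_NODE_TOTAL
```
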
### Step 5: Configure secrets and environment variables

Use the platform's secret store (`${{ secrets.X }}` in GitHub Actions, CI/CD variables in GitLab). Rules:
- Never echo secrets in logs — use `add-mask` or equivalent
- Scope secrets per environment (staging vs production)
- For OIDC-capable targets (AWS, GCP), use identity federation instead of static credentials
- Document secret rotation procedure in the pipeline README

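As an example of identity federation, AWS OIDC in GitHub Actions can be sketched as follows (the role ARN and region are placeholders):

```yaml
# OIDC federation to AWS instead of static keys (sketch)
permissions:
  id-token: write   # allow the job to request an OIDC token
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/ci-deploy # placeholder ARN
          aws-region: us-east-1
```

The job receives short-lived credentials scoped to that role, so there is nothing static to rotate or leak.
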
### Step 6: Add deploy gates

Between staging and production, require at least one gate:

- **Smoke test**: Automated health check hitting critical endpoints on staging
- **Manual approval**: Required for production deploys (GitHub: `environment` with reviewers; GitLab: `when: manual`)
- **Canary verification**: Deploy to a subset, verify metrics, then proceed

```yaml
# GitHub Actions environment protection
deploy-prod:
  environment:
    name: production
    url: https://app.example.com
  needs: [smoke-test]
```

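The GitLab CI equivalent of this gate uses `when: manual` (a sketch; the deploy script is a placeholder):

```yaml
# GitLab CI manual production gate (sketch)
deploy-prod:
  stage: deploy
  environment:
    name: production
    url: https://app.example.com
  needs: ["smoke-test"]
  when: manual                    # pipeline pauses until a reviewer clicks play
  script:
    - ./deploy.sh production      # placeholder deploy command
```
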
### Step 7: Handle artifacts

Build once, deploy the same artifact to all environments. Upload build output as a pipeline artifact with a retention policy. Download in deploy jobs. The artifact deployed to production must be byte-identical to what was tested in staging — never rebuild per environment.

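A build-once sketch with GitHub Actions artifacts, assuming the build writes to `dist/` (the path and artifact name are illustrative):

```yaml
# Build once, reuse the same bytes in every deploy job (sketch)
build:
  runs-on: ubuntu-latest
  steps:
    - run: npm run build
    - uses: actions/upload-artifact@v4
      with:
        name: app-dist
        path: dist/
        retention-days: 14        # explicit retention policy

deploy-staging:
  needs: [build]
  runs-on: ubuntu-latest
  steps:
    - uses: actions/download-artifact@v4
      with:
        name: app-dist
        path: dist/
    # deploy the exact artifact that was built and tested
```
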
## Quality checklist

Before delivering the pipeline configuration, verify:

- [ ] PRs run lint + test + build but never deploy
- [ ] Lint and test run in parallel where possible
- [ ] Caching is configured for dependencies with lockfile-based keys
- [ ] Secrets use the platform's secret store, never hardcoded
- [ ] Production deploy requires a gate (manual approval or automated canary)
- [ ] Build artifacts are created once and reused across environments
- [ ] Timeouts are set on every job to prevent hung pipelines
- [ ] The pipeline YAML is syntactically valid for the target platform

## Common mistakes

- **Deploying to production on push to main.** Always gate production deploys behind manual approval or automated verification. A broken merge should not reach users automatically.
- **No caching.** Installing dependencies from scratch on every run wastes 2-5 minutes. Cache aggressively and monitor hit rates.
- **Rebuilding per environment.** Building separately for staging and production means you are deploying untested artifacts to production. Build once, deploy everywhere.
- **Sequential lint and test.** These have no dependency on each other. Run them in parallel to cut pipeline time.
- **Secrets in pipeline YAML.** Even "temporary" secrets in YAML files end up in git history. Use the platform's secret management from day one.
- **No timeout on jobs.** A hung test suite or a stuck deploy can block the pipeline for hours. Set explicit timeouts on every job.
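
The timeout rule costs one line per job in GitHub Actions (the 15-minute value is illustrative; pick a ceiling slightly above the job's normal runtime):

```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    timeout-minutes: 15   # kill the job instead of blocking the pipeline for hours
```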
| 150 |