Debugging Guide

Step-by-step debugging methodology for systematic bug finding — covers reproduction, isolation, hypothesis testing, root cause analysis, and fix verification.

debugging, troubleshooting, bug-fixing, methodology

Works well with agents

Debugger Agent, QA Engineer Agent, SRE Engineer Agent

Works well with skills

Bug Report Writing, Incident Postmortem, Test Plan Writing

$ npx skills add The-AI-Directory-Company/(…) --skill debugging-guide

# Debugging Guide

## Before you start

Gather the following before investigating:

1. What is the expected behavior? What should happen.
2. What is the actual behavior? What happens instead (exact error message, wrong output, crash).
3. When did it start? Was it always broken, or did it work before a specific change?
4. What changed recently? Deployments, config changes, dependency updates, data migrations.
5. Who is affected? All users, specific accounts, specific environments.
6. Can you reproduce it? Consistent or intermittent? Steps to trigger?

Do not start guessing at fixes until you can reproduce the bug or have clear evidence of the root cause.

## Procedure

### 1. Reproduce the bug

Before anything else, make the bug happen on demand.

- Follow the exact steps from the bug report
- Use the same environment (OS, browser, API version) as the reporter
- If the bug is intermittent, identify the conditions that increase its likelihood (load, timing, specific data)
- If you cannot reproduce it, gather more data (logs, screenshots, network traces) before proceeding

A bug you cannot reproduce is a bug you cannot verify as fixed.
30	### 2. Isolate the scope
31
32	Narrow down where the bug lives:
33
34	- Layer: Is it frontend, backend, database, infrastructure, or third-party?
35	- Component: Which module, service, or function?
36	- Input: Which specific inputs trigger the bug? Which inputs do NOT trigger it?
37
38	Techniques for isolation:
39	- Binary search: Comment out or bypass half the code path. Does the bug persist? Narrow to the half that matters.
40	- Minimal reproduction: Strip away everything unrelated until you have the smallest code/input that triggers the bug.
41	- Environment comparison: Does it happen in staging but not local? Diff the configs.
42	- Git bisect: If it worked before, use `git bisect` to find the exact commit that introduced it.
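
The binary-search and minimal-reproduction techniques can be combined to shrink a failing input. The sketch below uses a simplified halving strategy, not a full delta-debugging implementation. `shrink_failing_input` and `triggers_bug` are hypothetical names, and the predicate stands in for "run the code and check whether the bug appears".

```python
from typing import Callable, List, Sequence


def shrink_failing_input(
    items: Sequence[int],
    triggers_bug: Callable[[Sequence[int]], bool],
) -> List[int]:
    """Repeatedly keep whichever half of the input still triggers the bug.

    Stops when neither half fails on its own, so the result is no larger
    than the original but is not guaranteed to be minimal.
    """
    current = list(items)
    if not triggers_bug(current):
        raise ValueError("the starting input does not trigger the bug")
    while len(current) > 1:
        middle = len(current) // 2
        first_half, second_half = current[:middle], current[middle:]
        if triggers_bug(first_half):
            current = first_half
        elif triggers_bug(second_half):
            current = second_half
        else:
            # The bug needs elements from both halves; stop halving here.
            break
    return current


def triggers_bug(values: Sequence[int]) -> bool:
    """Hypothetical bug: processing fails whenever a negative number is present."""
    return any(value < 0 for value in values)


if __name__ == "__main__":
    failing_input = [4, 8, 15, -16, 23, 42]
    print(shrink_failing_input(failing_input, triggers_bug))  # [-16]
```

The same halving idea applies to commits, config keys, or code paths: each round discards the half that is not needed to trigger the failure.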

### 3. Form a hypothesis

State your hypothesis explicitly before testing it:

```
HYPOTHESIS: [what you think is wrong]
EVIDENCE: [what makes you think so]
TEST: [how to confirm or disprove it]
PREDICTION: [what you expect to see if the hypothesis is correct]
```

One hypothesis at a time. If you test multiple changes simultaneously, you will not know which one mattered.

### 4. Test the hypothesis

Run the test you defined. Compare the result to your prediction.

- Prediction matches: Your hypothesis is likely correct. Proceed to fix.
- Prediction does not match: Your hypothesis is wrong. Do not force it. Return to step 2 with new information.
- Partial match: There may be multiple contributing factors. Isolate further.
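
The hypothesis template and the three outcomes can also be kept as a small record, which makes it harder to skip the prediction. This is a minimal sketch under stated assumptions: `Hypothesis`, `Outcome`, and `evaluate` are hypothetical names, and splitting the prediction into separate observations is one possible way to detect a partial match, not something the template requires.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Set


class Outcome(Enum):
    MATCH = "proceed to fix"
    NO_MATCH = "return to step 2 with new information"
    PARTIAL = "isolate further"


@dataclass
class Hypothesis:
    statement: str          # HYPOTHESIS
    evidence: str           # EVIDENCE
    test: str               # TEST
    predictions: List[str]  # PREDICTION, split into separate observations

    def __post_init__(self) -> None:
        if not self.predictions:
            raise ValueError("a hypothesis needs at least one prediction")

    def evaluate(self, observed: Set[str]) -> Outcome:
        """Compare what was observed with what was predicted."""
        confirmed = [p for p in self.predictions if p in observed]
        if len(confirmed) == len(self.predictions):
            return Outcome.MATCH
        if not confirmed:
            return Outcome.NO_MATCH
        return Outcome.PARTIAL


if __name__ == "__main__":
    hypothesis = Hypothesis(
        statement="Sessions outlive deleted accounts",
        evidence="500 errors appear only for recently deleted user IDs",
        test="Delete a test account, then reuse its session token",
        predictions=["request returns 500", "log shows null user"],
    )
    print(hypothesis.evaluate({"request returns 500"}).name)  # PARTIAL
```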

### 5. Find the root cause

The first change that makes the symptoms disappear does not necessarily address the root cause. Ask:

- Why does this input cause this behavior? (not just "what" happens)
- Is this a symptom of a deeper issue? (fixing the symptom may leave the real bug)
- Are there other code paths with the same underlying flaw?

Use the "5 Whys" technique:
1. Why did the request fail? The response was 500.
2. Why was it 500? An unhandled null reference.
3. Why was the value null? The database query returned no rows.
4. Why were there no rows? The user ID was from a deleted account.
5. Why was a deleted account ID used? The session was not invalidated on deletion.

Root cause: sessions are not invalidated when accounts are deleted.
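
To illustrate the difference between patching the symptom and fixing the root cause in the example above, here is a minimal sketch with in-memory stores. Every name in it (`accounts`, `sessions`, `delete_account`, `handle_request`) is hypothetical, and the status codes are assumptions chosen for the illustration.

```python
from typing import Dict, Optional

# Hypothetical in-memory stores, used only for illustration.
accounts: Dict[int, str] = {1: "alice"}
sessions: Dict[str, int] = {"token-abc": 1}  # session token -> user ID


def delete_account(user_id: int) -> None:
    """Root-cause fix: remove the account and every session that points to it."""
    accounts.pop(user_id, None)
    stale_tokens = [token for token, owner in sessions.items() if owner == user_id]
    for token in stale_tokens:
        del sessions[token]


def handle_request(token: str) -> int:
    """Return an HTTP-style status code for a request made with `token`."""
    user_id: Optional[int] = sessions.get(token)
    if user_id is None:
        return 401  # no valid session, so the caller must log in again
    # A null check on the account alone would only treat the symptom.
    # With sessions invalidated on deletion, a live session always
    # points at an existing account, so this lookup succeeds.
    _ = accounts[user_id]
    return 200


if __name__ == "__main__":
    print(handle_request("token-abc"))  # 200
    delete_account(1)
    print(handle_request("token-abc"))  # 401
```

The fix lives in `delete_account`, where the invalid state was created, and not in the request handler, where the failure surfaced.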

### 6. Implement and verify the fix

1. Write a test that reproduces the bug (it should fail before the fix)
2. Apply the minimal fix: change the least amount of code possible
3. Run the reproduction test (it should pass now)
4. Run the full test suite and confirm there are no regressions
5. Check related code paths for the same pattern
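
The sequence above can be sketched with Python's `unittest` module. The function `page_count` and its bug are hypothetical, invented to show a test that fails before the fix and passes after it.

```python
import unittest


def page_count(total_items: int, page_size: int) -> int:
    """Hypothetical function under repair.

    Before the fix this returned total_items // page_size, which dropped
    the final partial page. The minimal fix is ceiling division.
    """
    return (total_items + page_size - 1) // page_size


class PageCountRegressionTest(unittest.TestCase):
    def test_partial_last_page_is_counted(self):
        # Reproduces the reported bug: the old code returned 2 here.
        self.assertEqual(page_count(21, 10), 3)

    def test_exact_multiple_is_unchanged(self):
        # Guards against regressions in the case that already worked.
        self.assertEqual(page_count(20, 10), 2)

    def test_no_items_means_no_pages(self):
        self.assertEqual(page_count(0, 10), 0)
```

Saved to a file, these tests can be run with `python -m unittest`. The first test is the reproduction; the other two cover behavior that already worked, so the fix cannot change it unnoticed.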

### 7. Document the finding

Record for future reference:

```
BUG: [one-line summary]
ROOT CAUSE: [what was actually wrong]
FIX: [what was changed]
RELATED: [other areas that might have the same issue]
PREVENTION: [what would have caught this earlier: test, lint rule, type constraint]
```

## Quality checklist

- [ ] The bug is reproducible with a defined set of steps
- [ ] The root cause is identified, not just the symptom
- [ ] A test exists that fails before the fix and passes after
- [ ] The fix changes the minimum necessary code
- [ ] Related code paths were checked for the same pattern
- [ ] The full test suite passes with no regressions
- [ ] The fix is documented with root cause and prevention notes

## Common mistakes

- Fixing symptoms instead of root causes. Adding a null check can hide the bug. Ask WHY the value is null in the first place.
- Changing multiple things at once. If you change three things and the bug disappears, you do not know which change fixed it. Change one thing at a time.
- Skipping reproduction. Fixing a bug you cannot reproduce means you cannot verify the fix works. Reproduce first, always.
- Blaming the environment. "It works on my machine" is not a diagnosis. If it fails in production, the difference between your machine and production IS the bug.
- Stopping at the first fix that works. The first fix may mask the real issue. Verify the root cause before declaring victory.
- Not checking for related instances. If a bug exists in one place, the same pattern likely exists elsewhere. Search the codebase for similar code.
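
Searching for related instances can be as simple as a regular-expression scan over the source tree. The sketch below is one possible approach. `find_pattern` is a hypothetical helper, and the example pattern (direct lookups in a `sessions` mapping) is an assumption chosen for illustration.

```python
import re
from pathlib import Path
from typing import Iterator, Tuple


def find_pattern(
    root: Path, pattern: str, suffix: str = ".py"
) -> Iterator[Tuple[Path, int, str]]:
    """Yield (file, line number, line) for each line under `root` matching `pattern`."""
    regex = re.compile(pattern)
    for path in sorted(root.rglob(f"*{suffix}")):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                yield path, number, line.strip()


if __name__ == "__main__":
    # Hypothetical pattern: direct lookups in a `sessions` mapping.
    for path, number, line in find_pattern(Path("."), r"sessions\[.+\]"):
        print(f"{path}:{number}: {line}")
```

A text search only finds code that looks the same. Instances of the flaw written in a different shape still need a manual review of the related code paths.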