# Codebase Exploration

## Before you start

Gather the following from the user. If anything is missing, ask before proceeding:

1. **What is the repository?** — URL or local path to the codebase
2. **What is the goal?** — Bug fix, feature addition, general understanding, onboarding, or audit
3. **What do you already know?** — Language, framework, or prior context (even partial)
4. **What is the scope?** — Entire repo, a specific subsystem, or a single feature flow
5. **What is the time budget?** — Quick orientation (30 min) or deep mapping (hours)
## Exploration procedure

### 1. Read the Project Manifest

Start with the files that declare what the project is and how it runs:

- `README.md`, `CONTRIBUTING.md`, `CLAUDE.md` — stated architecture, setup, conventions
- `package.json`, `Cargo.toml`, `pyproject.toml`, `go.mod` — language, dependencies, scripts
- `Dockerfile`, `docker-compose.yml`, `.env.example` — runtime environment and services
- CI config (`.github/workflows/`, `.gitlab-ci.yml`) — build steps reveal the dependency graph

Record: language, framework, build tool, test runner, deployment target.
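
As a quick sketch of this step, the following Python walks a repo root and records which manifest files are actually present. The grouping labels are our own shorthand (not a standard taxonomy), and the demo repo with a lone `package.json` is invented for illustration:

```python
import json
import tempfile
from pathlib import Path

# Files that typically declare what a project is and how it runs.
# The group names ("docs", "package", "runtime") are our own shorthand.
MANIFEST_FILES = {
    "docs": ["README.md", "CONTRIBUTING.md", "CLAUDE.md"],
    "package": ["package.json", "Cargo.toml", "pyproject.toml", "go.mod"],
    "runtime": ["Dockerfile", "docker-compose.yml", ".env.example"],
}

def find_manifests(root: Path) -> dict:
    """Return the manifest files actually present in the repo root."""
    return {
        group: [name for name in names if (root / name).exists()]
        for group, names in MANIFEST_FILES.items()
    }

# Demo against a throwaway repo containing only a package.json.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "package.json").write_text(
        json.dumps({"name": "demo", "scripts": {"test": "jest"}})
    )
    print(find_manifests(root))
    # {'docs': [], 'package': ['package.json'], 'runtime': []}
```

Whichever package manifest turns up first usually pins down the language and test runner in one read.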
| 26 | |
| 27 | ### 2. Map the Directory Structure |
| 28 | |
| 29 | Run a shallow tree (depth 2-3) and classify each top-level directory: |
| 30 | |
| 31 | - **Entry points**: `src/index.*`, `app/`, `cmd/`, `main.*` |
| 32 | - **Configuration**: config files, env schemas, feature flags |
| 33 | - **Domain logic**: models, services, use-cases, controllers |
| 34 | - **Data access**: repositories, queries, migrations, ORM schemas |
| 35 | - **API surface**: routes, handlers, resolvers, RPC definitions |
| 36 | - **Shared utilities**: libs, helpers, utils, common |
| 37 | - **Tests**: test directories, fixture files, factories |
| 38 | |
| 39 | Sketch a layer diagram: entry point -> routing -> handlers -> domain -> data access -> external services. |
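
The classification above can be given a mechanical first pass. This sketch labels top-level directories from a heuristic name-to-role table; the `ROLE_HINTS` mapping is an assumption, not a standard, and anything it cannot label still needs manual inspection:

```python
import tempfile
from pathlib import Path

# Heuristic role labels for common top-level directory names.
# Purely illustrative; extend per repo and verify by reading the code.
ROLE_HINTS = {
    "src": "entry points / domain logic",
    "cmd": "entry points",
    "config": "configuration",
    "models": "domain logic",
    "migrations": "data access",
    "routes": "API surface",
    "utils": "shared utilities",
    "tests": "tests",
}

def classify_top_level(root: Path) -> dict:
    """Label each top-level directory with a guessed responsibility."""
    return {
        p.name: ROLE_HINTS.get(p.name, "unclassified -- inspect manually")
        for p in sorted(root.iterdir())
        if p.is_dir()
    }

# Demo against an invented repo layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("src", "migrations", "tests", "scripts"):
        (root / name).mkdir()
    for folder, role in classify_top_level(root).items():
        print(f"{folder:12} {role}")
```

The "unclassified" bucket is the point: those folders are where the real exploration time should go.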

### 3. Trace the Primary Data Flow

Pick the most important user action (e.g., "user signs up", "order is placed") and trace it end-to-end:

1. Find the route or entry point that handles it
2. Follow the handler into service/domain logic
3. Identify every database query, API call, or side effect
4. Note the response path back to the caller
5. Record each file touched and its role in the flow

This single trace reveals naming conventions, error handling patterns, and the project's layering strategy.
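
To make the tracing idea concrete, here is a minimal static trace over a toy three-layer codebase (every file name and function here is invented). It follows a route handler down through imports, recording each file touched, which is exactly the artifact step 5 asks for:

```python
import re

# A toy three-layer codebase inlined as {path: source} so the trace is
# reproducible. The signup flow and its names are illustrative only.
FILES = {
    "routes/users.py": (
        "from services.users import create_user\n"
        "def post_signup(req):\n    return create_user(req.json)\n"
    ),
    "services/users.py": (
        "from db.users import insert_user\n"
        "def create_user(data):\n    return insert_user(data)\n"
    ),
    "db/users.py": "def insert_user(data):\n    return {'id': 1, **data}\n",
}

def defining_file(symbol):
    """Find the file whose source defines `symbol` as a function."""
    pattern = re.compile(rf"^def {re.escape(symbol)}\(", re.M)
    for path, src in FILES.items():
        if pattern.search(src):
            return path
    return None

def trace(symbol):
    """Follow `symbol` through call sites, recording each file touched."""
    touched = []
    while symbol is not None:
        path = defining_file(symbol)
        if path is None:
            break
        touched.append(path)
        # Next hop: the first imported name this layer calls into.
        callees = re.findall(r"from \S+ import (\w+)", FILES[path])
        symbol = callees[0] if callees else None
    return touched

print(trace("post_signup"))
# ['routes/users.py', 'services/users.py', 'db/users.py']
```

Real tracing uses your editor's go-to-definition rather than regexes, but the output to keep is the same ordered list of files and roles.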

### 4. Identify Patterns and Conventions

Look for recurring structural patterns across 3-5 files of the same type:

- **Naming**: How are files, functions, variables, and types named?
- **Error handling**: Exceptions, result types, error codes, or error boundaries?
- **State management**: Global store, context, dependency injection, or passed parameters?
- **Authentication/authorization**: Middleware, decorators, guards, or inline checks?
- **Testing style**: Unit-heavy, integration-heavy, or end-to-end? Mocks or real dependencies?

Document each pattern with a concrete file reference.
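
One way to check a convention across several files is to count competing patterns rather than trust a single example. A minimal sketch, over invented sources, that decides whether a codebase leans on exceptions or result tuples for error handling:

```python
import re

# Invented service-layer sources standing in for 3-5 real files.
SOURCES = {
    "services/a.py": "def f():\n    raise ValueError('bad')\n",
    "services/b.py": "def g():\n    return None, 'not found'\n",
    "services/c.py": "def h():\n    raise KeyError('missing')\n",
}

def count_pattern(pattern):
    """Total regex matches across all sampled files."""
    return sum(len(re.findall(pattern, src)) for src in SOURCES.values())

raises = count_pattern(r"\braise\b")
result_tuples = count_pattern(r"return None, ")
verdict = "exceptions" if raises > result_tuples else "result tuples"
print(raises, result_tuples, verdict)  # 2 1 exceptions
```

The minority match (`services/b.py` here) is worth noting too: it is either legacy code or a deliberate exception, and the summary should say which.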

### 5. Map External Dependencies

Identify every external system the codebase communicates with:

- Databases and caches (connection strings, ORM config)
- Third-party APIs (HTTP clients, SDK imports)
- Message queues or event buses
- File storage (S3, local disk)
- Authentication providers

For each, note: what module owns the integration, how errors are handled, and whether there is a fallback.
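
A rough first pass at this inventory can be automated by scanning for well-known SDK imports and connection-string environment variables. The hint list and toy sources below are illustrative assumptions, not an exhaustive detector:

```python
import re

# Invented module sources; app/storage.py and app/db.py each touch an
# external system, app/pricing.py is pure domain logic.
SOURCES = {
    "app/storage.py": "import boto3\nclient = boto3.client('s3')\n",
    "app/db.py": "import os\nDSN = os.environ['DATABASE_URL']\n",
    "app/pricing.py": "def price(x):\n    return x * 1.2\n",
}

# A few SDK names that strongly imply an external integration.
SDK_HINTS = ("boto3", "stripe", "redis", "psycopg2")

def external_touchpoints():
    """Map each file to the SDKs and env vars that suggest an integration."""
    hits = {}
    for path, src in SOURCES.items():
        sdks = [m for m in re.findall(r"^import (\w+)", src, re.M)
                if m in SDK_HINTS]
        env = re.findall(r"os\.environ\['([A-Z_]+)'\]", src)
        if sdks or env:
            hits[path] = sdks + env
    return hits

print(external_touchpoints())
# {'app/storage.py': ['boto3'], 'app/db.py': ['DATABASE_URL']}
```

Each hit is a starting point for the ownership and fallback questions above, not an answer to them.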

### 6. Locate the Test Suite

Find where tests live and assess coverage:

- Run the test command from the manifest (e.g., `npm test`, `pytest`)
- Identify which areas have dense coverage and which have none
- Check for test utilities, factories, or fixtures that reveal domain assumptions
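
Coverage gaps can be surfaced with a simple name-matching pass. This sketch assumes a `src/` plus `tests/test_<module>.py` layout, which is one common convention among many; adapt the globs to the repo's actual structure:

```python
import tempfile
from pathlib import Path

def coverage_gaps(root: Path):
    """List source modules with no matching test_<name>.py file."""
    tests = {p.name for p in (root / "tests").glob("test_*.py")}
    return sorted(
        p.name for p in (root / "src").glob("*.py")
        if f"test_{p.name}" not in tests
    )

# Demo against an invented repo: auth.py is tested, billing.py is not.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src").mkdir()
    (root / "tests").mkdir()
    for name in ("auth.py", "billing.py"):
        (root / "src" / name).touch()
    (root / "tests" / "test_auth.py").touch()
    print(coverage_gaps(root))  # ['billing.py']
```

Name matching overstates coverage (a matching file may be thin), so treat its output as a worst-case gap list to verify by reading the tests.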

### 7. Produce the Exploration Summary

Deliver a structured summary:

| Section | Content |
|---------|---------|
| Stack | Language, framework, runtime, key libraries |
| Architecture | Layer diagram or description |
| Entry points | Main files that start the application |
| Primary data flow | Step-by-step trace of the core user action |
| Patterns | Naming, error handling, state, auth conventions |
| External deps | Every external system and its integration module |
| Test coverage | Where tests exist, where they are missing |
| Risks/concerns | Dead code, circular deps, missing docs, unclear ownership |
## Quality checklist

Before delivering the exploration summary, verify:

- [ ] The project manifest was read and the stack is identified correctly
- [ ] Directory structure is classified by responsibility, not just listed
- [ ] At least one end-to-end data flow is traced with specific file references
- [ ] Patterns are documented with concrete examples, not guessed
- [ ] External dependencies are enumerated with owning modules
- [ ] Test coverage gaps are identified
- [ ] The summary is structured and scannable, not a wall of text
## Common mistakes

- **Jumping straight to code without reading the manifest.** The README, package manager config, and CI files answer half your questions in 5 minutes.
- **Listing files instead of classifying them.** A directory listing is not understanding. Every folder should have a role label.
- **Stopping at the surface layer.** Reading route definitions without tracing into handlers and data access misses the actual architecture.
- **Assuming conventions from one file.** Check at least 3 files of the same type before declaring a pattern. One file might be an exception.
- **Ignoring the test suite.** Tests are executable documentation. They reveal intended behavior, edge cases, and which parts the team considers important.
- **Producing an unstructured brain dump.** The output should be a reference someone can scan in 2 minutes, not a narrative essay.