# Codebase Exploration

## Before you start

Gather the following from the user. If anything is missing, ask before proceeding:

1. **What is the repository?** — URL or local path to the codebase
2. **What is the goal?** — Bug fix, feature addition, general understanding, onboarding, or audit
3. **What do you already know?** — Language, framework, or prior context (even partial)
4. **What is the scope?** — Entire repo, a specific subsystem, or a single feature flow
5. **What is the time budget?** — Quick orientation (30 min) or deep mapping (hours)
## Exploration procedure

### 1. Read the Project Manifest

Start with the files that declare what the project is and how it runs:

- `README.md`, `CONTRIBUTING.md`, `CLAUDE.md` — stated architecture, setup, conventions
- `package.json`, `Cargo.toml`, `pyproject.toml`, `go.mod` — language, dependencies, scripts
- `Dockerfile`, `docker-compose.yml`, `.env.example` — runtime environment and services
- CI config (`.github/workflows/`, `.gitlab-ci.yml`) — build steps reveal the dependency graph

Record: language, framework, build tool, test runner, deployment target.
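
As a quick sketch of this step, the following Python walks a repo root and records which manifest files are actually present. The grouping labels are our own shorthand (not a standard taxonomy), and the demo repo with a lone `package.json` is invented for illustration:

```python
import json
import tempfile
from pathlib import Path

# Files that typically declare what a project is and how it runs.
# The group names ("docs", "package", "runtime") are our own shorthand.
MANIFEST_FILES = {
    "docs": ["README.md", "CONTRIBUTING.md", "CLAUDE.md"],
    "package": ["package.json", "Cargo.toml", "pyproject.toml", "go.mod"],
    "runtime": ["Dockerfile", "docker-compose.yml", ".env.example"],
}

def find_manifests(root: Path) -> dict:
    """Return the manifest files actually present in the repo root."""
    return {
        group: [name for name in names if (root / name).exists()]
        for group, names in MANIFEST_FILES.items()
    }

# Demo against a throwaway repo containing only a package.json.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "package.json").write_text(
        json.dumps({"name": "demo", "scripts": {"test": "jest"}})
    )
    print(find_manifests(root))
    # {'docs': [], 'package': ['package.json'], 'runtime': []}
```

Whichever package manifest turns up first usually pins down the language and test runner in one read.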
| 26 | |
| 27 | ### 2. Map the Directory Structure |
| 28 | |
| 29 | Run a shallow tree (depth 2-3) and classify each top-level directory: |
| 30 | |
| 31 | - **Entry points**: `src/index.*`, `app/`, `cmd/`, `main.*` |
| 32 | - **Configuration**: config files, env schemas, feature flags |
| 33 | - **Domain logic**: models, services, use-cases, controllers |
| 34 | - **Data access**: repositories, queries, migrations, ORM schemas |
| 35 | - **API surface**: routes, handlers, resolvers, RPC definitions |
| 36 | - **Shared utilities**: libs, helpers, utils, common |
| 37 | - **Tests**: test directories, fixture files, factories |
| 38 | |
| 39 | Sketch a layer diagram: entry point -> routing -> handlers -> domain -> data access -> external services. |
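
The classification above can be given a mechanical first pass. This sketch labels top-level directories from a heuristic name-to-role table; the `ROLE_HINTS` mapping is an assumption, not a standard, and anything it cannot label still needs manual inspection:

```python
import tempfile
from pathlib import Path

# Heuristic role labels for common top-level directory names.
# Purely illustrative; extend per repo and verify by reading the code.
ROLE_HINTS = {
    "src": "entry points / domain logic",
    "cmd": "entry points",
    "config": "configuration",
    "models": "domain logic",
    "migrations": "data access",
    "routes": "API surface",
    "utils": "shared utilities",
    "tests": "tests",
}

def classify_top_level(root: Path) -> dict:
    """Label each top-level directory with a guessed responsibility."""
    return {
        p.name: ROLE_HINTS.get(p.name, "unclassified -- inspect manually")
        for p in sorted(root.iterdir())
        if p.is_dir()
    }

# Demo against an invented repo layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("src", "migrations", "tests", "scripts"):
        (root / name).mkdir()
    for folder, role in classify_top_level(root).items():
        print(f"{folder:12} {role}")
```

The "unclassified" bucket is the point: those folders are where the real exploration time should go.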

### 3. Trace the Primary Data Flow

Pick the most important user action (e.g., "user signs up", "order is placed") and trace it end-to-end:

1. Find the route or entry point that handles it
2. Follow the handler into service/domain logic
3. Identify every database query, API call, or side effect
4. Note the response path back to the caller
5. Record each file touched and its role in the flow

This single trace reveals naming conventions, error handling patterns, and the project's layering strategy.
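
To make the tracing idea concrete, here is a minimal static trace over a toy three-layer codebase (every file name and function here is invented). It follows a route handler down through imports, recording each file touched, which is exactly the artifact step 5 asks for:

```python
import re

# A toy three-layer codebase inlined as {path: source} so the trace is
# reproducible. The signup flow and its names are illustrative only.
FILES = {
    "routes/users.py": (
        "from services.users import create_user\n"
        "def post_signup(req):\n    return create_user(req.json)\n"
    ),
    "services/users.py": (
        "from db.users import insert_user\n"
        "def create_user(data):\n    return insert_user(data)\n"
    ),
    "db/users.py": "def insert_user(data):\n    return {'id': 1, **data}\n",
}

def defining_file(symbol):
    """Find the file whose source defines `symbol` as a function."""
    pattern = re.compile(rf"^def {re.escape(symbol)}\(", re.M)
    for path, src in FILES.items():
        if pattern.search(src):
            return path
    return None

def trace(symbol):
    """Follow `symbol` through call sites, recording each file touched."""
    touched = []
    while symbol is not None:
        path = defining_file(symbol)
        if path is None:
            break
        touched.append(path)
        # Next hop: the first imported name this layer calls into.
        callees = re.findall(r"from \S+ import (\w+)", FILES[path])
        symbol = callees[0] if callees else None
    return touched

print(trace("post_signup"))
# ['routes/users.py', 'services/users.py', 'db/users.py']
```

Real tracing uses your editor's go-to-definition rather than regexes, but the output to keep is the same ordered list of files and roles.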

### 4. Identify Patterns and Conventions

Look for recurring structural patterns across 3-5 files of the same type:

- **Naming**: How are files, functions, variables, and types named?
- **Error handling**: Exceptions, result types, error codes, or error boundaries?
- **State management**: Global store, context, dependency injection, or passed parameters?
- **Authentication/authorization**: Middleware, decorators, guards, or inline checks?
- **Testing style**: Unit-heavy, integration-heavy, or end-to-end? Mocks or real dependencies?

Document each pattern with a concrete file reference.
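
One way to check a convention across several files is to count competing patterns rather than trust a single example. A minimal sketch, over invented sources, that decides whether a codebase leans on exceptions or result tuples for error handling:

```python
import re

# Invented service-layer sources standing in for 3-5 real files.
SOURCES = {
    "services/a.py": "def f():\n    raise ValueError('bad')\n",
    "services/b.py": "def g():\n    return None, 'not found'\n",
    "services/c.py": "def h():\n    raise KeyError('missing')\n",
}

def count_pattern(pattern):
    """Total regex matches across all sampled files."""
    return sum(len(re.findall(pattern, src)) for src in SOURCES.values())

raises = count_pattern(r"\braise\b")
result_tuples = count_pattern(r"return None, ")
verdict = "exceptions" if raises > result_tuples else "result tuples"
print(raises, result_tuples, verdict)  # 2 1 exceptions
```

The minority match (`services/b.py` here) is worth noting too: it is either legacy code or a deliberate exception, and the summary should say which.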

### 5. Map External Dependencies

Identify every external system the codebase communicates with:

- Databases and caches (connection strings, ORM config)
- Third-party APIs (HTTP clients, SDK imports)
- Message queues or event buses
- File storage (S3, local disk)
- Authentication providers

For each, note: what module owns the integration, how errors are handled, and whether there is a fallback.
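
A rough first pass at this inventory can be automated by scanning for well-known SDK imports and connection-string environment variables. The hint list and toy sources below are illustrative assumptions, not an exhaustive detector:

```python
import re

# Invented module sources; app/storage.py and app/db.py each touch an
# external system, app/pricing.py is pure domain logic.
SOURCES = {
    "app/storage.py": "import boto3\nclient = boto3.client('s3')\n",
    "app/db.py": "import os\nDSN = os.environ['DATABASE_URL']\n",
    "app/pricing.py": "def price(x):\n    return x * 1.2\n",
}

# A few SDK names that strongly imply an external integration.
SDK_HINTS = ("boto3", "stripe", "redis", "psycopg2")

def external_touchpoints():
    """Map each file to the SDKs and env vars that suggest an integration."""
    hits = {}
    for path, src in SOURCES.items():
        sdks = [m for m in re.findall(r"^import (\w+)", src, re.M)
                if m in SDK_HINTS]
        env = re.findall(r"os\.environ\['([A-Z_]+)'\]", src)
        if sdks or env:
            hits[path] = sdks + env
    return hits

print(external_touchpoints())
# {'app/storage.py': ['boto3'], 'app/db.py': ['DATABASE_URL']}
```

Each hit is a starting point for the ownership and fallback questions above, not an answer to them.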

### 6. Locate the Test Suite

Find where tests live and assess coverage:

- Run the test command from the manifest (e.g., `npm test`, `pytest`)
- Identify which areas have dense coverage and which have none
- Check for test utilities, factories, or fixtures that reveal domain assumptions
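
Coverage gaps can be surfaced with a simple name-matching pass. This sketch assumes a `src/` plus `tests/test_<module>.py` layout, which is one common convention among many; adapt the globs to the repo's actual structure:

```python
import tempfile
from pathlib import Path

def coverage_gaps(root: Path):
    """List source modules with no matching test_<name>.py file."""
    tests = {p.name for p in (root / "tests").glob("test_*.py")}
    return sorted(
        p.name for p in (root / "src").glob("*.py")
        if f"test_{p.name}" not in tests
    )

# Demo against an invented repo: auth.py is tested, billing.py is not.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src").mkdir()
    (root / "tests").mkdir()
    for name in ("auth.py", "billing.py"):
        (root / "src" / name).touch()
    (root / "tests" / "test_auth.py").touch()
    print(coverage_gaps(root))  # ['billing.py']
```

Name matching overstates coverage (a matching file may be thin), so treat its output as a worst-case gap list to verify by reading the tests.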

### 7. Produce the Exploration Summary

Deliver a structured summary:

| Section | Content |
|---------|---------|
| Stack | Language, framework, runtime, key libraries |
| Architecture | Layer diagram or description |
| Entry points | Main files that start the application |
| Primary data flow | Step-by-step trace of the core user action |
| Patterns | Naming, error handling, state, auth conventions |
| External deps | Every external system and its integration module |
| Test coverage | Where tests exist, where they are missing |
| Risks/concerns | Dead code, circular deps, missing docs, unclear ownership |
## Quality checklist

Before delivering the exploration summary, verify:

- [ ] The project manifest was read and the stack is identified correctly
- [ ] Directory structure is classified by responsibility, not just listed
- [ ] At least one end-to-end data flow is traced with specific file references
- [ ] Patterns are documented with concrete examples, not guessed
- [ ] External dependencies are enumerated with owning modules
- [ ] Test coverage gaps are identified
- [ ] The summary is structured and scannable, not a wall of text
## Common mistakes

- **Jumping straight to code without reading the manifest.** The README, package manager config, and CI files answer half your questions in 5 minutes.
- **Listing files instead of classifying them.** A directory listing is not understanding. Every folder should have a role label.
- **Stopping at the surface layer.** Reading route definitions without tracing into handlers and data access misses the actual architecture.
- **Assuming conventions from one file.** Check at least 3 files of the same type before declaring a pattern. One file might be an exception.
- **Ignoring the test suite.** Tests are executable documentation. They reveal intended behavior, edge cases, and which parts the team considers important.
- **Producing an unstructured brain dump.** The output should be a reference someone can scan in 2 minutes, not a narrative essay.