Run effective sprint retrospectives that produce actionable improvements — with facilitation formats, data gathering techniques, and experiment-based follow-through.
Tags: retrospective, agile, sprint, continuous-improvement, facilitation
sprint-retrospective/SKILL.md

# Sprint Retrospective

## Before you start

Gather the following from the facilitator. If anything is missing, ask before proceeding:

1. **Sprint duration and dates** — What period are we reflecting on?
2. **Team size and composition** — How many participants? Are there remote members?
3. **Sprint goal and outcome** — Was the goal met, partially met, or missed?
4. **Key events** — Any incidents, launches, scope changes, or team changes during the sprint?
5. **Previous action items** — What did the team commit to last retro? Were they completed?
6. **Known tensions** — Any interpersonal or process friction the facilitator is aware of?

## Retrospective template

### 1. Review Previous Action Items (5 minutes)

Start every retro by reviewing last sprint's commitments. For each item:

```
Action Item: [Description]
Owner: [Name]
Status: Completed / In Progress / Not Started / Abandoned
Result: [What happened — measurable outcome if available]
```

If more than half of previous action items were not completed, address that pattern before gathering new data. Chronic non-completion means the team is overcommitting or under-prioritizing retro outcomes.
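When action items are tracked as data, the "more than half not completed" rule above is easy to check automatically. A minimal sketch in Python; the `ActionItem` class and function names are illustrative, not part of the template:

```python
from dataclasses import dataclass

# Status values from the review template above.
COMPLETED, IN_PROGRESS, NOT_STARTED, ABANDONED = (
    "Completed", "In Progress", "Not Started", "Abandoned"
)

@dataclass
class ActionItem:
    description: str
    owner: str
    status: str
    result: str = ""

def completion_rate(items: list[ActionItem]) -> float:
    """Fraction of previous action items that were completed."""
    if not items:
        return 1.0  # nothing was committed, so nothing is overdue
    done = sum(1 for item in items if item.status == COMPLETED)
    return done / len(items)

def chronic_non_completion(items: list[ActionItem]) -> bool:
    """True when more than half of the items were not completed,
    i.e. the pattern the template says to address before new data."""
    return completion_rate(items) < 0.5
```

Note that exactly half completed does not trip the flag; the rule fires only when *more* than half went unfinished.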

### 2. Choose a Facilitation Format

Select one format based on the team's current needs:

**Start/Stop/Continue** — Best for teams new to retros or when things are generally stable.
- Start: What should we begin doing?
- Stop: What should we stop doing?
- Continue: What is working and should not change?

**4Ls (Liked, Learned, Lacked, Longed For)** — Best when the team needs to reflect on growth. Four quadrants: what went well, what was discovered, what was missing, what was wished for.

**Mad/Sad/Glad** — Best when there is emotional tension or team morale is a concern. Surfaces feelings before jumping to process fixes.

**Timeline** — Best after a complex sprint with many events. Plot events chronologically, then annotate with energy levels and observations.
Rotate formats every 2-3 sprints to prevent staleness; never use the same format more than 3 times in a row.

### 3. Gather Data (10 minutes)

Rules for data gathering:

- Silent writing first — 5 minutes of individual brainstorming before any discussion
- One observation per sticky note or card — no compound statements
- Facts over feelings where possible: "Deploys took 45 minutes on average" beats "deploys were slow"
- Include specific examples: "The payments API had 3 unplanned outages" not "reliability was bad"

### 4. Group and Discuss (15 minutes)

Cluster related observations into themes. Common theme categories:

- **Process**: Planning, estimation, standup effectiveness, deployment workflow
- **Technical**: Code quality, testing, tooling, tech debt, architecture
- **Communication**: Cross-team coordination, documentation, handoffs, meetings
- **People**: Workload distribution, skill gaps, onboarding, morale

For each theme, facilitate discussion with these questions:
- What is the root cause, not just the symptom?
- Is this within our control to change?
- How would we know if it got better?

### 5. Vote and Prioritize (5 minutes)

Each team member gets 3 votes (dot voting). Vote on themes, not individual observations.

Take the top 2-3 themes only. Teams that try to fix everything fix nothing.
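Tallying dot votes and cutting to the top themes is mechanical, so digital retro boards often do it for you. A minimal sketch, assuming votes arrive as one entry per dot placed:

```python
from collections import Counter

def top_themes(votes: list[str], limit: int = 3) -> list[tuple[str, int]]:
    """Tally dot votes per theme and keep at most `limit` themes,
    ordered by vote count (highest first)."""
    return Counter(votes).most_common(limit)

# Example: a 4-person team, 3 dots each (12 votes total).
votes = (
    ["deploy pipeline"] * 5
    + ["flaky tests"] * 4
    + ["standup length"] * 2
    + ["docs"] * 1
)
print(top_themes(votes))
# "docs" drops out: only the top 3 themes carry forward to experiments.
```

Keeping `limit` at 3 enforces the "top 2-3 themes only" rule from the step above.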

### 6. Define Experiments (10 minutes)

Convert each prioritized theme into a concrete experiment using this format:

```
Theme: [The problem area]
Hypothesis: If we [specific change], then [expected outcome], measured by [metric].
Experiment: [Exact action to take]
Owner: [Single person responsible — not "the team"]
Duration: [How long to run the experiment — usually 1-2 sprints]
Success Criteria: [How we will know it worked]
```

**Good experiment**: "If we add a 15-minute deploy verification step after each release, then we will catch issues before users do, measured by reducing user-reported post-deploy bugs from 4/sprint to 1/sprint. Owner: Sarah. Duration: 2 sprints."

**Bad experiment**: "We should deploy better." (No hypothesis, no metric, no owner, no timeline.)
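The template's rules (every field filled, a single named owner) can be linted if experiments are stored as records. A sketch under that assumption; the field names and `lint_experiment` helper are illustrative, not a required schema:

```python
def lint_experiment(exp: dict) -> list[str]:
    """Return problems with an experiment record; an empty list means
    it satisfies the checks implied by the template above."""
    problems = []
    for field in ("theme", "hypothesis", "experiment", "owner",
                  "duration", "success_criteria"):
        if not exp.get(field, "").strip():
            problems.append(f"missing {field}")
    # The owner must be one accountable person, never a collective.
    if exp.get("owner", "").strip().lower() in {"the team", "everyone", "team"}:
        problems.append("owner must be a single person, not the team")
    return problems

bad = {"theme": "Deploys", "experiment": "deploy better", "owner": "the team"}
print(lint_experiment(bad))
```

Run against the "bad experiment" from above, this reports the missing hypothesis, duration, and success criteria, plus the collective owner.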

### 7. Document and Share

Record the retro output covering: sprint name/dates, participants, sprint goal status, previous action item outcomes, top themes with discussion summaries, and new experiments in the full format from Step 6.

Share with the team within 24 hours. Stale retro notes lose their impact.
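Assembling the share-out document is a good candidate for automation so the 24-hour deadline never slips. A minimal sketch; the function signature is illustrative and covers only a subset of the fields listed above:

```python
def retro_notes(sprint: str, participants: list[str], goal_status: str,
                themes: list[str], experiments: list[str]) -> str:
    """Assemble the Step 7 share-out document as markdown."""
    lines = [
        f"# Retro: {sprint}",
        f"Participants: {', '.join(participants)}",
        f"Sprint goal: {goal_status}",
        "## Top themes",
        *[f"- {theme}" for theme in themes],
        "## Experiments",
        *[f"- {exp}" for exp in experiments],
    ]
    return "\n".join(lines)

print(retro_notes("Sprint 14", ["Ana", "Ben"], "partially met",
                  ["deploy pipeline", "flaky tests"],
                  ["add a 15-minute deploy verification step"]))
```

Generating the skeleton right at the end of the meeting means only the discussion summaries remain to be written by hand.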

## Quality checklist

Before closing the retrospective, verify:

- [ ] Previous action items were reviewed with explicit status updates
- [ ] Data gathering included silent individual writing before group discussion
- [ ] Themes are based on specific observations, not vague feelings
- [ ] No more than 3 experiments were committed to
- [ ] Each experiment has a single owner (not "the team"), a duration, and a measurable success criterion
- [ ] The retro document is written and ready to share within 24 hours
- [ ] The team agreed on when experiments will be reviewed (usually next retro)

## Common mistakes

- **Skipping the review of previous action items.** If last sprint's commitments are never revisited, the team learns that retro outcomes do not matter. Always start with accountability.
- **Letting one voice dominate.** Silent writing before discussion ensures introverts contribute. If one person talks for 5 minutes straight, the facilitator must redirect.
- **Turning themes into blame.** "Deploys were slow because DevOps did not prioritize our tickets" is blame. "Our deploy pipeline averages 45 minutes — what can WE change?" is actionable.
- **Committing to too many actions.** Three experiments is the maximum. Teams that commit to seven items complete zero. Fewer commitments, higher follow-through.
- **Vague action items without owners.** "Improve documentation" will not happen. "Sarah will write a deployment runbook for the payments service by March 28" will.
- **Running the same format every sprint.** Repetition causes autopilot. Rotate formats to surface different kinds of observations.
| 119 |