Hiring Rubric
Build structured hiring rubrics — defining evaluation dimensions, behavioral interview questions, scoring criteria, and calibration processes that maximize signal and minimize bias.
Tags: hiring, interviews, rubrics, recruiting, evaluation
$ npx skills add The-AI-Directory-Company/(…) --skill hiring-rubric

SKILL.md
# Hiring Rubric

## Before you start

Gather the following from the user:

1. **What role are you hiring for?** (Title, level, team)
2. **What does success look like at 6 months?** (3-5 concrete outcomes the hire should achieve)
3. **What are the must-have vs. nice-to-have skills?** (Technical and non-technical)
4. **How many interview rounds?** (Phone screen, technical, system design, behavioral, culture)
5. **Who is on the interview panel?** (Names and which dimensions they'll evaluate)

If the user says "we need a strong engineer," push back: "Strong at what? Backend systems? Cross-team collaboration? Debugging production issues? Define 3-4 specific capabilities that matter most for this role."

## Hiring rubric template

### 1. Role Profile

Write a concise profile that anchors the rubric to actual job needs, not a generic job description. Include: role title, team, level, reporting line, and 2-3 concrete 6-month goals the hire should achieve.

### 2. Evaluation Dimensions

Define 4-6 dimensions. Each dimension must be independent — avoid overlap. Assign each a weight reflecting its importance to the role.

```
| Dimension              | Weight | Assessed In          |
|------------------------|--------|----------------------|
| Technical depth        | 30%    | Technical interview  |
| System design          | 25%    | Design interview     |
| Problem-solving        | 20%    | Technical interview  |
| Communication          | 15%    | All rounds           |
| Collaboration          | 10%    | Behavioral interview |
```
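
To make the weighting concrete, a candidate's overall score is just the weighted sum of their per-dimension scores. A minimal sketch — the dimension names and weights mirror the example table above, and the candidate's scores are made up for illustration:

```python
# Weighted overall score: sum of (weight x dimension score).
# Dimensions and weights mirror the example table; nothing here
# is part of the rubric template itself.
WEIGHTS = {
    "technical_depth": 0.30,
    "system_design": 0.25,
    "problem_solving": 0.20,
    "communication": 0.15,
    "collaboration": 0.10,
}

def overall_score(scores: dict[str, int]) -> float:
    """Combine 1-4 dimension scores into a weighted 1-4 overall score."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9, "weights must total 100%"
    return sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)

candidate = {
    "technical_depth": 4,
    "system_design": 3,
    "problem_solving": 3,
    "communication": 4,
    "collaboration": 2,
}
print(round(overall_score(candidate), 2))  # 3.35
```

Keeping the weights in one place also makes the "weights total 100%" checklist item below mechanically verifiable.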

### 3. Scoring Criteria

For each dimension, define what a 1 through 4 looks like. Avoid vague language — describe observable behaviors.

```
Dimension: Technical Depth

4 - Strong Hire: Solves the problem correctly with clean, production-quality
                 code. Identifies edge cases proactively. Discusses tradeoffs
                 of their approach without prompting.

3 - Hire:        Solves the problem with minor issues. Handles most edge
                 cases when prompted. Can articulate why they chose their
                 approach.

2 - Weak:        Reaches a partial solution with significant guidance.
                 Misses important edge cases. Struggles to compare
                 alternative approaches.

1 - No Hire:     Cannot make meaningful progress on the problem. Shows
                 gaps in fundamentals expected at this level.
```

Write a scoring rubric like this for every dimension. Use concrete behaviors, not personality traits.

### 4. Interview Questions

For each dimension, provide 2-3 questions with follow-ups. For technical dimensions, use role-specific coding or debugging problems with follow-ups like "How would you test this in production?" For behavioral dimensions, use "Tell me about a time..." questions that elicit past behavior rather than hypotheticals, with follow-ups exploring outcomes and lessons learned.

### 5. Scorecard

Create a standardized scorecard that every interviewer fills out within 24 hours of the interview.

```
Candidate: _______________     Interviewer: _______________
Role:      _______________     Date:        _______________
Round:     _______________

| Dimension              | Score (1-4) | Evidence (required)           |
|------------------------|-------------|-------------------------------|
| Technical depth        |             |                               |
| System design          |             |                               |
| Problem-solving        |             |                               |
| Communication          |             |                               |
| Collaboration          |             |                               |

Overall recommendation: [ ] Strong Hire  [ ] Hire  [ ] Weak  [ ] No Hire

Key strengths:
Key concerns:
```

The "Evidence" column is mandatory. A score without a specific observation is not valid.
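
The evidence rule is easy to enforce mechanically if scorecards are collected as structured data. A small sketch — the field names (`score`, `evidence`) are illustrative, not part of the template:

```python
# Validate a submitted scorecard: every scored dimension must carry
# written evidence, per the rule that a score without a specific
# observation is not valid. Field names are illustrative.
def validate_scorecard(card: dict[str, dict]) -> list[str]:
    """Return a list of problems; an empty list means the card is valid."""
    problems = []
    for dim, entry in card.items():
        score = entry.get("score")
        evidence = entry.get("evidence", "").strip()
        if score is not None and not 1 <= score <= 4:
            problems.append(f"{dim}: score {score} outside 1-4 scale")
        if score is not None and not evidence:
            problems.append(f"{dim}: score given without evidence")
    return problems

card = {
    "technical_depth": {"score": 4, "evidence": "Spotted the off-by-one before running the code."},
    "communication": {"score": 3, "evidence": ""},
}
print(validate_scorecard(card))  # ['communication: score given without evidence']
```

Rejecting a card at submission time is far cheaper than discovering the missing evidence in the debrief.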

### 6. Calibration Process

Before interviews begin, have all interviewers score the same mock interview independently, then compare. Align on what a "3" looks like for each dimension with a concrete example. During the process, interviewers submit scorecards before the debrief — the most junior interviewer presents first to prevent anchoring.
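
One way to turn the mock-interview comparison into a concrete check: flag any dimension where independent scores spread by more than one point, since that signals the panel has not yet aligned on what each level means. The scores below are made up for illustration:

```python
# Calibration check: after everyone scores the same mock interview,
# flag dimensions where scores spread by more than one point --
# those need re-alignment before real interviews start.
# Scores are illustrative.
mock_scores = {
    "technical_depth": [3, 3, 4],
    "system_design": [2, 4, 3],
    "communication": [3, 3, 3],
}

def needs_realignment(scores_by_dim: dict[str, list[int]],
                      max_range: int = 1) -> list[str]:
    """Return dimensions whose score range exceeds max_range."""
    return [dim for dim, s in scores_by_dim.items() if max(s) - min(s) > max_range]

print(needs_realignment(mock_scores))  # ['system_design']
```

A one-point spread is a reasonable default threshold on a 4-point scale; tighten it if the panel keeps disagreeing on borderline candidates.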

## Quality checklist

Before using the rubric, verify:

- [ ] Every dimension maps to a real job requirement, not a generic trait
- [ ] Scoring criteria describe observable behaviors at each level
- [ ] No two dimensions overlap significantly (test: could a candidate score high on dimension A and low on dimension B?)
- [ ] Behavioral questions ask about past experiences, not hypothetical scenarios
- [ ] The scorecard requires written evidence for every score
- [ ] Weights total 100% and reflect actual role priorities
- [ ] The calibration process is documented and scheduled before interviews begin
- [ ] At least one dimension assesses collaboration or communication

## Common mistakes to avoid

- **Generic dimensions.** "Technical skills" is too broad. "Ability to design fault-tolerant distributed systems" is specific enough to evaluate. Tie every dimension to what the person will actually do in the role.
- **Hypothetical interview questions.** "What would you do if..." invites rehearsed answers. "Tell me about a time when..." surfaces real behavior. Always prefer behavioral questions for non-technical dimensions.
- **Missing the evidence requirement.** Without mandatory evidence, scorecards become gut-feel ratings. Require interviewers to write down the specific moment or statement that justified their score.
- **Anchoring in debriefs.** If the hiring manager shares their opinion first, everyone adjusts toward it. Have everyone submit scores before the meeting, and have the most junior interviewer present first.
- **Overweighting technical skills.** A brilliant engineer who can't communicate or collaborate will slow the team down. Ensure at least 20-25% of the weight covers non-technical dimensions.