
Experiment Design

Design rigorous A/B tests and product experiments — defining hypotheses, choosing metrics, calculating sample sizes, setting stopping rules, and writing analysis plans that avoid common statistical pitfalls.

a-b-testing · experiments · hypothesis · sample-size · statistics · product-analytics

Works well with agents

AI Engineer Agent · Data Scientist Agent · Growth Engineer Agent · Pricing Strategist Agent · Product Analyst Agent · Product Operations Agent · Prompt Engineer Agent

Works well with skills

Metrics Framework · ML Model Evaluation · PRD Writing · Pricing Analysis · Prompt Engineering Guide
$ npx skills add The-AI-Directory-Company/(…) --skill experiment-design
experiment-design/
  • examples/onboarding-flow-test.md (3.3 KB)
  • SKILL.md (8.7 KB)

experiment-design/examples/onboarding-flow-test.md (Markdown)
# Experiment Design: Guided Onboarding Flow

## Hypothesis

If we replace the current self-serve onboarding (5-step form) with an interactive guided tour that demonstrates core features in context, then the 14-day activation rate will increase by at least 5 percentage points, because users who experience value during onboarding are more likely to complete setup and return within the first two weeks.

## Metrics

| Role | Metric | Baseline | MDE | Direction |
|------|--------|----------|-----|-----------|
| Primary | 14-day activation rate (user completes 2+ core actions within 14 days of signup) | 32% | +5pp | Increase |
| Guardrail | Onboarding completion rate | 68% | -5pp | Must not decrease |
| Guardrail | Support ticket rate (first 14 days) | 3.1% | +1pp | Must not increase |
| Guardrail | Time-to-complete onboarding | 4.2 min | +2 min | Must not increase |

## Sample Size Calculation

```
Baseline rate: 32%
Minimum detectable effect: +5pp (absolute) -> 37%
Significance level (alpha): 0.05 (two-sided)
Power (1 - beta): 0.80
Sample size per variant: ~1,530 users
Total sample: ~3,060 users
Tool: Evan Miller sample size calculator (two-proportion z-test)
```
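The figures above can be sanity-checked in code. Below is a minimal sketch of the standard pooled-variance formula for a two-sided two-proportion z-test, using only the Python standard library; it lands near 1,420 per variant, and calculators that add a continuity correction report somewhat larger sizes, closer to the ~1,530 quoted above:

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_per_variant(p1: float, p2: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-variant n for a two-sided two-proportion z-test (pooled variance)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)            # 0.84 for power = 0.80
    p_bar = (p1 + p2) / 2                # pooled proportion under H0
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

n = sample_size_per_variant(0.32, 0.37)
print(n, 2 * n)  # roughly 1,418 per variant, ~2,836 total
```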

## Randomization

- **Unit**: User-level (new signups only)
- **Method**: Hash-based assignment on user ID (deterministic)
- **Stickiness**: User ID hash persists across sessions; variant stored in `experiments.assignments` table
- **Exclusions**: Users on Enterprise plans (they receive white-glove onboarding)

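Deterministic hash-based assignment of the kind described above can be sketched as follows; the experiment key and variant names here are illustrative, not taken from the actual pipeline:

```python
import hashlib

EXPERIMENT_KEY = "guided-onboarding-v1"  # hypothetical experiment identifier

def assign_variant(user_id: str, variants=("control", "treatment")) -> str:
    """Sticky assignment: hashing the same user ID always yields the same variant."""
    digest = hashlib.sha256(f"{EXPERIMENT_KEY}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

# Deterministic across sessions: no stored state is needed to re-derive the variant
assert assign_variant("user-1042") == assign_variant("user-1042")
```

Salting the hash with the experiment key keeps assignments independent across concurrent experiments; the derived variant can then be persisted to the assignments table for analysis.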
## Runtime Estimation

```
Daily eligible signups: ~180 users
Sample needed: 3,060 users
Estimated runtime: 17 days (to reach sample size)
Recommended minimum: 21 days (3 full weeks for day-of-week coverage)
Maximum runtime: 35 days (cap to avoid novelty decay)
```
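The runtime figures follow from simple arithmetic, clamped between the minimum and maximum; as a sketch:

```python
from math import ceil

daily_signups = 180          # eligible new users per day
total_sample = 3060          # from the power calculation
min_days, max_days = 21, 35  # full-week coverage floor, novelty-decay cap

days_to_fill = ceil(total_sample / daily_signups)
planned_runtime = min(max(days_to_fill, min_days), max_days)
print(days_to_fill, planned_runtime)  # 17 days to fill sample, 21-day planned run
```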

## Stopping Rules

- **No peeking**: Fixed-horizon design. Do not evaluate primary metric significance until day 21.
- **Stop for harm**: If the support ticket rate exceeds 5% (rolling 3-day average) in the treatment group, halt the experiment and revert all users to control.
- **No early stopping for success**: Even if results look significant at day 10, run to the planned horizon.
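The stop-for-harm rule can run as an automated daily check; a minimal sketch (the daily rates below are hypothetical):

```python
def should_halt(daily_ticket_rates: list[float],
                threshold: float = 0.05, window: int = 3) -> bool:
    """True if the rolling `window`-day mean support-ticket rate in the
    treatment group exceeds the harm threshold."""
    if len(daily_ticket_rates) < window:
        return False  # not enough days yet to form a rolling average
    recent = daily_ticket_rates[-window:]
    return sum(recent) / window > threshold

print(should_halt([0.031, 0.033, 0.030]))         # False: near the 3.1% baseline
print(should_halt([0.032, 0.055, 0.061, 0.058]))  # True: 3-day mean ~5.8%
```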

## Holdout Group

- **Size**: 5% of eligible traffic withheld from the winning variant after rollout
- **Duration**: 6 weeks post-rollout
- **Purpose**: Confirm the activation lift is durable and not a novelty effect

## Analysis Plan

1. **Primary analysis**: Two-proportion z-test on the 14-day activation rate. Report the point estimate, 95% CI, and p-value.
2. **Guardrail checks**: One-sided tests confirming no degradation beyond thresholds.
3. **Pre-registered segments**:
   - Free vs. Pro plan signups
   - Mobile vs. desktop first session
4. **SRM check**: Chi-square test on the variant assignment ratio. Threshold: p < 0.001 indicates a pipeline issue; invalidate results.
5. **Post-hoc** (labeled exploratory): Funnel analysis on which guided tour steps have the highest drop-off.
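Steps 1 and 4 can both be run with the standard library alone. A sketch with hypothetical activation counts; with two variants the SRM chi-square has one degree of freedom, so its p-value reduces to a two-sided normal tail probability on sqrt(chi2):

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ztest(x1: int, n1: int, x2: int, n2: int):
    """Two-sided z-test on the difference in proportions (pooled SE)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    return p2 - p1, 2 * (1 - NormalDist().cdf(abs(z)))

def srm_ok(n_control: int, n_treatment: int) -> bool:
    """Sample-ratio-mismatch check against an expected 50/50 split."""
    expected = (n_control + n_treatment) / 2
    chi2 = ((n_control - expected) ** 2 + (n_treatment - expected) ** 2) / expected
    p_value = 2 * (1 - NormalDist().cdf(sqrt(chi2)))  # df = 1
    return p_value >= 0.001  # below 0.001 -> assignment pipeline issue

lift, p = two_proportion_ztest(490, 1530, 566, 1530)  # hypothetical counts
print(f"lift={lift:+.3f}, p={p:.4f}, srm_ok={srm_ok(1530, 1530)}")
```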

## Decision Framework

| Outcome | Action |
|---------|--------|
| Primary wins, guardrails pass | Ship to 100%, maintain holdout |
| Primary wins, guardrail fails | Investigate the degraded guardrail, iterate on treatment |
| Primary neutral | Kill the experiment, extract learnings from funnel analysis |
| Primary loses | Kill the experiment, post-mortem on hypothesis |