Discovery GSEO
Find, evaluate, and prioritize keyword and content opportunities for both search engine ranking and AI platform citation — using live browser automation, SERP analysis, competitor intelligence, community listening, and GEO-aware scoring to produce a prioritized content plan that drives traffic and AI visibility.
Tags: SEO, GEO, GSEO, keyword-research, discovery, competitor-analysis, SERP-analysis, content-planning, Playwright, browser-automation
$ npx skills add The-AI-Directory-Company/(…) --skill discovery-gseo

discovery-gseo/
- analyze-serp-live.py (8.9 KB)
- build-topic-clusters.py (7.6 KB)
- classify-intent-live.py (8.0 KB)
- competitor-gap-analysis.py (5.3 KB)
- evaluate-keywords.py (7.1 KB)
- extract-paa.py (6.4 KB)
- find-quick-wins.py (7.0 KB)
- harvest-autocomplete.py (7.9 KB)
- prioritize-opportunities.py (9.2 KB)
- probe-ai-discovery.py (5.5 KB)
- scrape-community-keywords.py (6.8 KB)
- scrape-related-searches.py (5.9 KB)
- SKILL.md (32.3 KB)

# Discovery GSEO

Discovery GSEO is the upstream layer that determines *what* to build before the execution skills take over. Without good discovery, much of the downstream SEO and GEO (generative engine optimization) effort is wasted: targeting the wrong keywords, creating content for topics nobody searches for, missing opportunities competitors are already winning.

This is skill 0 of 5 in the SGEO series: **discovery-gseo** > technical-sgeo > on-page-sgeo > content-sgeo > off-page-sgeo.

The output of this skill is a **prioritized content plan** — keywords mapped to pages, scored on 4 dimensions (including GEO opportunity), ordered by impact. Everything else in the SGEO pipeline flows from this plan.

## Tool discovery

Before gathering project details, confirm which tools are available. Ask the user directly — do not assume access to any external service.

**Browser automation (primary for this skill):**
- [ ] Playwright MCP (live SERP interaction, Autocomplete harvesting, PAA expansion, screenshots)
- [ ] Chrome DevTools MCP (alternative browser control, performance analysis)

**Free tools (no API key required):**
- [ ] WebFetch (fetch any public URL)
- [ ] WebSearch (search engine queries)
- [ ] Google PageSpeed Insights API (CWV data, no key needed for basic usage)
- [ ] Google Rich Results Test (structured data validation)

**Credentialed tools (API key, OAuth, or MCP required; paid except Search Console):**
- [ ] Google Search Console API (free, but requires OAuth; critical for Phase 10 quick wins)
- [ ] DataForSEO MCP (SERP data, keyword metrics, backlinks, AI visibility)
- [ ] Ahrefs API (keyword research, competitor gap analysis, backlink data)
- [ ] Semrush API (largest keyword database, Keyword Gap, Keyword Magic Tool)

**The agent must:**
1. Check for Playwright MCP or Chrome DevTools MCP FIRST — many scripts in this skill depend on browser automation for live SERP data
2. If no browser automation is available, the browser-dependent scripts fall back to WebSearch + WebFetch, but output quality is reduced (no live Autocomplete, no PAA expansion, no visual SERP analysis)
3. Present the full checklist to the user and record what is available
4. Pass the inventory to scripts as context

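
One way to hold the inventory from the steps above is a small record that scripts can read. This sketch is an assumption, not the skill's actual interface. The tool keys (`playwright_mcp`, `google_search_console_api`, and so on) and the `as_context()` shape are hypothetical. The record checks Playwright before Chrome DevTools and reports the reduced WebSearch + WebFetch mode when neither is present.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical tool keys; they mirror the checklist, not any real config format.
BROWSER_TOOLS = ("playwright_mcp", "chrome_devtools_mcp")  # checked in this order


@dataclass
class ToolInventory:
    available: dict[str, bool] = field(default_factory=dict)

    def has(self, name: str) -> bool:
        return self.available.get(name, False)

    def browser_tool(self) -> str | None:
        """First available browser-automation tool, Playwright before Chrome DevTools."""
        for name in BROWSER_TOOLS:
            if self.has(name):
                return name
        return None

    def as_context(self) -> dict:
        """Summary handed to scripts: which path to use and what is unavailable."""
        browser = self.browser_tool()
        return {
            "browser_tool": browser,
            "mode": "browser" if browser else "websearch_webfetch_fallback",
            "quick_wins_possible": self.has("google_search_console_api"),
            "unavailable": sorted(k for k, ok in self.available.items() if not ok),
        }
```
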
## Before you start

Gather the following from the user. If anything is missing, ask before proceeding:

1. **What is your business/product?** (Product category, target market, value proposition)
2. **What is your site URL?** (Existing site for quick win analysis, or "new site" if starting from scratch)
3. **Who are your known competitors?** (3-5 domains — business competitors AND SEO competitors)
4. **What is your current SEO status?** (Brand new / some content / established — determines whether Phase 10 is applicable)
5. **Do you have Google Search Console access?** (Critical for Phase 10 quick wins)
6. **What is the target country/region?** (For localized SERP analysis and volume data)
7. **What is your budget for tools?** (None / small / moderate / significant — determines which paid tool paths are available)
8. **How important is AI visibility?** (Determines weight of GEO scoring in prioritization — low / medium / high)

If the user says "I just want to find keywords," push back: "Keywords without intent classification, competitor validation, and prioritization scoring produce a random list, not a strategy. Which phase do you want to start from?"

## Phase 1: Generate seed keywords

> **Scripts:** Run `scripts/probe-ai-discovery.py --queries 'What are the best [category] tools?'` to validate seeds against AI visibility.
> **References:** See `references/seed-generation.md` for the 5-source mental model and GEO seed validation methodology.

Seed keywords are your starting points. They are the obvious, broad terms that describe your product, service, or topic. You do not need tools for this — just structured thinking.

**Ask yourself these 5 questions and write down every answer:**

1. **"What do I sell or offer?"** — Product category, service type, problem solved. If you sell invoicing software: "invoicing software", "invoice generator", "billing software."
2. **"What problem does my product solve?"** — Pain points. "How to manage invoices", "track payments", "send payment reminders."
3. **"How would a stranger describe what I do?"** — Plain language, no jargon. A prospective customer of a CI/CD platform might search "automate code deployment", not your product name.
4. **"What alternatives or competitors exist?"** — "[Competitor] alternative", "[Competitor A] vs [Competitor B]", "[category] comparison."
5. **"What technologies, methods, or concepts are relevant?"** — Integrations, workflows, adjacent technologies.

**Target:** 15-30 seeds, 1-4 words each. Do not worry about volume or competition yet.

**Example seed list (invoicing SaaS):**
```
invoicing software, invoice generator, online invoicing, billing software,
send invoices online, freelancer invoice tool, invoice template, recurring billing,
payment reminder, invoice tracking, accounting software, [CompetitorA] alternative,
[CompetitorB] vs [CompetitorC], small business invoicing, automated invoicing
```

**GEO seed layer — the discovery step that separates this from traditional keyword research:**

Probe AI platforms with broad questions about your space before relying solely on search data:

- Ask ChatGPT: "What are the best tools for [category]?", "How do I [problem]?"
- Ask Perplexity: the same queries. Its different citation sources reveal a different view of the market landscape.
- Ask Gemini: the same queries, for yet another AI perspective.

Record which brands AI mentions, which topics it covers, and what language it uses. These are GEO-validated seeds — AI already discusses them, so content targeting these topics has citation potential from day one.

Run `scripts/probe-ai-discovery.py --queries queries.txt` to automate this GEO validation. Keywords that appear in AI responses get a GEO-validated flag in your seed list.

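
Here is a minimal sketch of the GEO-validated flag. It assumes you already have the raw answer text from each AI platform, either pasted by hand or collected by a probe. A seed counts as validated when its phrase appears in at least one answer. Exact phrase matching after normalization is a deliberate simplification; `probe-ai-discovery.py` may match differently. `flag_geo_validated` is a hypothetical name.

```python
import re
from dataclasses import dataclass


@dataclass
class Seed:
    keyword: str
    geo_validated: bool = False
    mentioned_by: tuple = ()  # platforms whose answer contained the seed phrase


def _normalize(text: str) -> str:
    """Lowercase, turn punctuation into spaces, and pad so matches land on word boundaries."""
    return " " + re.sub(r"[^a-z0-9]+", " ", text.lower()).strip() + " "


def flag_geo_validated(seeds: list, responses: dict) -> list:
    """responses maps a platform name (e.g. "chatgpt") to the raw answer text it returned."""
    answers = {platform: _normalize(text) for platform, text in responses.items()}
    flagged = []
    for keyword in seeds:
        needle = _normalize(keyword)
        hits = tuple(sorted(p for p, text in answers.items() if needle in text))
        flagged.append(Seed(keyword, bool(hits), hits))
    return flagged
```
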
**Output:** Seed keyword list with GEO validation flags (15-30 seeds).

## Phase 2: Expand keyword universe

> **Scripts:** Run `scripts/harvest-autocomplete.py --seeds 'seed1,seed2,seed3'` to harvest Google Autocomplete suggestions via live browser. Run `scripts/extract-paa.py --seeds 'seed1,seed2'` to expand People Also Ask boxes.
> **References:** See `references/keyword-expansion.md` for tool-by-tool guides and browser automation expansion techniques.

Take each seed and expand it using keyword research tools. The goal is to go from 15-30 seeds to 200-1000+ raw keyword ideas.

**Google Keyword Planner (free, volume ranges):**
1. Google Ads > Tools > Keyword Planner > "Discover new keywords."
2. Enter one seed at a time. Set target country.
3. Download results as CSV. Repeat for each seed.
4. Limitations: it shows volume ranges (e.g., 1K-10K), not exact numbers, and its competition metric reflects advertiser competition, not organic keyword difficulty (KD).

**Paid tools (brief — see reference for full walkthrough):**
- Ahrefs Keywords Explorer: Matching terms, Related terms, Questions, and Also rank for reports.
- Semrush Keyword Magic Tool: 20B+ keyword database, long-tail discovery, auto-grouping.
- Ubersuggest: keyword ideas, questions, related terms. Lifetime deal (~$120) is strong value.

**Browser-automated expansion (primary for this skill):**

Run `scripts/harvest-autocomplete.py --seeds 'invoicing software,billing software,invoice generator'` — this types each seed into Google with a-z letter suffixes and question prefixes (how/what/why/best), capturing every Autocomplete suggestion. This is data that WebSearch cannot replicate: real-time, localized, intent-rich suggestions.

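
The query pattern just described (the seed, the seed plus each letter a-z, and the seed behind how/what/why/best) can be sketched as plain string generation. The sketch only builds the partial queries to type. It leaves out the browser step that reads the Autocomplete dropdown. Both function names are hypothetical.

```python
import string

QUESTION_PREFIXES = ("how", "what", "why", "best")


def autocomplete_queries(seed: str) -> list:
    """Partial queries to type for one seed: the seed, seed + a..z, and question prefixes."""
    seed = " ".join(seed.split())  # collapse stray whitespace
    queries = [seed]
    queries += [f"{seed} {letter}" for letter in string.ascii_lowercase]
    queries += [f"{prefix} {seed}" for prefix in QUESTION_PREFIXES]
    return queries


def dedupe_suggestions(raw: list) -> list:
    """Case- and whitespace-insensitive de-duplication that keeps first-seen order."""
    seen, out = set(), []
    for suggestion in raw:
        key = " ".join(suggestion.lower().split())
        if key and key not in seen:
            seen.add(key)
            out.append(key)
    return out
```
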
Run `scripts/extract-paa.py --seeds 'invoicing software,billing software'` — this searches Google for each seed, then click-expands every People Also Ask question, capturing 20-50+ questions per seed with answer snippets and source URLs.

**GEO expansion step:** Review your expanded list. Flag any keywords that match topics mentioned by AI platforms in Phase 1. These keywords have dual value — organic search traffic plus AI citation potential.

**Output:** Raw spreadsheet with 200-1000+ keyword ideas, each with at least a volume estimate.

## Phase 3: Spy on competitors

> **Scripts:** Run `scripts/competitor-gap-analysis.py --domain yourdomain.com --competitors comp1.com,comp2.com`. Run `scripts/probe-ai-discovery.py` with competitor-relevant queries for GEO gap analysis.
> **References:** See `references/competitor-intelligence.md` for gap analysis methodology and GEO competitor analysis.

Competitor analysis is the single highest-ROI activity in the discovery process. Other websites have done the research for you.

**Step 1: Identify SEO competitors.**
Search Google for 5-10 of your seed keywords. Note which domains appear repeatedly in the top 10. These are your SEO competitors — even if they sell different products. Pick 3-5, mixing direct product competitors and content competitors (blogs, review sites, aggregators).

**Step 2: Keyword gap analysis.**
Find keywords competitors rank for that you do not.
- Ahrefs: Competitive Analysis > Content Gap. Enter your domain + 3-5 competitors. Export "Missing" keywords.
- Semrush: Keyword Gap tool. Compare up to 5 domains. Use the "Missing" and "Weak" tabs.
- Free approximation: `scripts/competitor-gap-analysis.py` uses WebSearch `site:competitor.com` to catalog their pages and compares them against your sitemap.

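
Once you have ranking positions, the Missing/Weak split used by the gap tools above reduces to set operations. This is a sketch under assumptions. Positions come from whatever export you have. Only competitor rankings inside the top 20 count; the `top_n` cutoff is an arbitrary choice. "Weak" here means you rank below every competitor that ranks for the keyword, which may not match any vendor's exact definition. `keyword_gap` is a hypothetical name.

```python
from collections import Counter


def keyword_gap(yours: dict, competitors: dict, top_n: int = 20) -> dict:
    """yours: keyword -> your position; competitors: domain -> {keyword -> position}.

    Position 1 is the top result, so a larger number means a worse ranking.
    """
    # Keep only competitor rankings inside the top_n cutoff.
    ranked = {
        domain: {kw: pos for kw, pos in kws.items() if pos <= top_n}
        for domain, kws in competitors.items()
    }
    # How many competitors rank for each keyword.
    coverage = Counter(kw for kws in ranked.values() for kw in kws)

    # Missing: competitors rank, you do not. Keywords shared by more competitors come first.
    missing = sorted((kw for kw in coverage if kw not in yours), key=lambda kw: (-coverage[kw], kw))

    # Weak: you rank, but below every competitor that ranks for it.
    weak = sorted(
        kw for kw in coverage
        if kw in yours and yours[kw] > max(kws[kw] for kws in ranked.values() if kw in kws)
    )
    return {"missing": missing, "weak": weak}
```
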
**Step 3: Analyze why competitors rank.**
For the most interesting gap keywords, visit the ranking pages. Ask: what type of page is it? How deep is the content? What angle do they take? What is weak or outdated? Your goal is to understand the format Google rewards, then create something better.

**Step 4: Top pages analysis.**
Use Ahrefs Site Explorer > Top Pages or Semrush > Organic Research > Pages to find which competitor pages drive the most traffic. Export the results and add the most relevant keywords to your master spreadsheet.

**GEO competitor analysis:**
For your target queries, check which competitors get cited by AI platforms. Run `scripts/probe-ai-discovery.py --queries target-queries.txt --brand YourBrand`. Compare: who does AI cite? What do their cited pages look like — structure, data density, author attribution? Where are the gaps — queries where AI cites weak or outdated sources you can replace?

**Output:** Gap keywords added to master spreadsheet + GEO citation gap analysis.

## Phase 4: Mine Google for free ideas

> **Scripts:** Run `scripts/harvest-autocomplete.py`, `scripts/extract-paa.py --seeds 'seed1,seed2'`, and `scripts/scrape-related-searches.py --seeds 'seed1,seed2'`.
> **References:** See `references/browser-automation-guide.md` for Playwright technique details — the exact tool calls for Autocomplete extraction, PAA expansion, and Related Searches chaining.

Google reveals what people search for through dynamic SERP features. Browser automation extracts this data systematically.

**Technique 1: Autocomplete harvesting.**
Type each seed followed by each letter a-z, then each seed preceded by a question prefix (how/what/why/best). Run `scripts/harvest-autocomplete.py` — this uses Playwright to interact with live Google Autocomplete, capturing real-time localized suggestions.

**Technique 2: PAA expansion.**
Search each seed, click-expand all PAA boxes. Each click reveals 2-3 new questions. Run `scripts/extract-paa.py` to capture 20-50+ questions per seed with answer snippets and source URLs. Each question is a potential H2 heading or standalone article.

**Technique 3: Related Searches chaining.**
Scroll to the bottom of Google SERPs, extract Related Searches, click each for 2nd-level expansion. Run `scripts/scrape-related-searches.py` for 2-level deep chaining. This uncovers lateral keyword ideas that tools miss.

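
Two-level Related Searches chaining is a bounded breadth-first search. In this sketch `fetch_related` is a hypothetical stand-in for the browser scrape that returns one SERP's Related Searches. Passing it in as a function keeps the traversal testable without a browser. Each keyword records the level at which it was first found, so the seeds sit at level 0.

```python
from collections import deque
from typing import Callable, Dict, List


def chain_related_searches(
    seeds: List[str],
    fetch_related: Callable[[str], List[str]],
    depth: int = 2,
) -> Dict[str, int]:
    """Expand seeds through Related Searches up to `depth` levels; returns keyword -> level."""
    found = {" ".join(s.lower().split()): 0 for s in seeds}
    queue = deque(found.items())  # (query, level) pairs still to expand
    while queue:
        query, level = queue.popleft()
        if level >= depth:
            continue  # deepest level reached: keep it, but do not fetch its neighbours
        for related in fetch_related(query):
            key = " ".join(related.lower().split())
            if key not in found:  # first sighting wins, so each keyword is fetched once
                found[key] = level + 1
                queue.append((key, level + 1))
    return found
```
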
**Technique 4: AnswerThePublic.**
3 free searches per day at answerthepublic.com. Enter your broadest seed keywords. Export the questions, prepositions, and comparisons data.

**Technique 5: GSC mining (existing sites only).**
GSC > Performance > Queries tab. Sort by impressions descending. Find keywords with high impressions but low CTR (title/description needs improvement) and keywords at positions 8-20 (striking distance to page 1).

**Output:** Question-based and long-tail keywords added to master spreadsheet.

## Phase 5: Listen to communities

> **Scripts:** Run `scripts/scrape-community-keywords.py --topic 'your category' --platforms reddit,hn` to extract keyword candidates from community discussions.
> **References:** See `references/community-listening.md` for platform-by-platform guide and language pattern extraction methodology.

Keyword tools rely on historical search data. Communities show you what people are asking right now, in their own language, before tools catch up.

**Where to look:**
- **Reddit:** Search for your category in relevant subreddits. Read titles and comments of popular posts. Watch for repeated questions, complaints about tools, comparison language.
- **Hacker News:** Search hn.algolia.com for your category. Read "Ask HN" threads for tool recommendations.
- **Twitter/X:** Search for your category, competitor names, problem descriptions.
- **Quora:** Questions mirror Google searches directly. Search your topic.
- **Product Hunt:** Similar products — read comments for needs, comparisons, frustrations.
- **Industry forums/Discord/Slack:** Niche communities where your customers congregate.

**What to capture — language patterns, not keyword data:**
Turn observations into keyword candidates:
- "I need a way to automatically send payment reminders" → "automatic payment reminder software"
- "Is there a Stripe alternative that handles invoicing too?" → "Stripe alternative with invoicing"
- "How do I set up recurring billing without a developer?" → "recurring billing no code"

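
Part of the capture step can be pre-filtered mechanically before a person rewrites titles into keywords. The sketch tags post titles that match question, comparison, or pain-point wording, and it surfaces titles that repeat. The regular expressions are illustrative heuristics. They are not the patterns `scrape-community-keywords.py` uses, and both function names are hypothetical.

```python
import re
from collections import Counter

# Illustrative patterns only; tune them per community.
PATTERNS = {
    "question": re.compile(r"^(how|what|why|which|is there|can|should|does)\b|\?\s*$", re.I),
    "comparison": re.compile(r"\b(vs|versus|alternatives?|compared to)\b", re.I),
    "pain_point": re.compile(r"\b(i need|struggling|frustrat\w*|hate|without a developer)\b", re.I),
}


def tag_titles(titles):
    """Map each title to the pattern names it matches; titles with no match are dropped."""
    tagged = {}
    for title in titles:
        tags = [name for name, rx in PATTERNS.items() if rx.search(title)]
        if tags:
            tagged[title] = tags
    return tagged


def repeated_titles(titles, min_count=2):
    """Titles that recur after lowercasing and stripping punctuation, most frequent first."""
    counts = Counter(" ".join(re.sub(r"[^\w\s]", " ", t.lower()).split()) for t in titles)
    return [t for t, n in counts.most_common() if n >= min_count]
```
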
Community-sourced keywords often show zero volume in tools because the volume is too low to register. Do not ignore them — low volume with perfect intent can be more valuable than high volume with vague intent.

**GEO dimension:** Community language maps directly to AI query language. The questions people ask on Reddit are often the same questions they ask ChatGPT, and pain points described in forums closely mirror the problems people describe to AI assistants. Community-sourced keywords have dual value: organic search traffic AND AI citation opportunity.

**Output:** Community-sourced keyword candidates added to master spreadsheet.

## Phase 6: Evaluate and filter keywords

> **Scripts:** Run `scripts/evaluate-keywords.py --keywords raw-keywords.txt --max-kd 50` to enrich your keyword list with volume/KD estimates and GEO scores. Run `scripts/probe-ai-discovery.py` to assess AI citation potential for your keywords.
> **References:** See `references/evaluation-and-scoring.md` for volume/KD interpretation scales, GEO score methodology, and filtering rules.

You now have a large, messy spreadsheet with hundreds or thousands of keywords from all five sources. Time to evaluate and filter.

**The 3 SEO metrics for every keyword:**

- **Volume:** Monthly searches. 10,000+ = high/competitive. 1,000-10,000 = medium/sweet spot for mid-authority. 100-1,000 = low-medium/sweet spot for new sites. <100 = still valuable with strong intent. 0 = below what tools can measure, though real people may still search it.
- **Keyword Difficulty (KD):** 0-100 scale. 0-20 = easy (new sites can rank in weeks). 21-40 = medium (needs content + some backlinks). 41-60 = hard (needs strong authority). 61+ = very hard (dominated by major brands). Always verify by manually checking the SERP — KD is an estimate.
- **Search Intent:** Covered in depth in Phase 7.

**GEO score — the 4th metric unique to this skill:**
For each keyword, assess AI citation potential on a 1-3 scale:
- **3 = High opportunity:** AI answers this query and cites weak or few sources. You can create content that replaces those citations. This is the discovery equivalent of finding an easy-KD keyword.
- **2 = Moderate opportunity:** AI answers and cites strong, authoritative sources. Hard to displace existing citations but still worth targeting for AI visibility.
- **1 = Low opportunity:** AI does not answer this query or defers entirely to search. SEO-only value — still worth targeting for organic traffic.

Run `scripts/probe-ai-discovery.py --queries keywords-to-check.txt` to assess GEO scores. This is the backbone of GEO-aware discovery.

**Filtering rules:**
- Remove irrelevant keywords — if the searcher would never become your customer, cut it
- Remove branded competitor navigational terms (keep "alternative" and "vs" variants)
- Remove keywords above your KD threshold (around KD 40-50 for new sites; adjust based on authority)
- Remove duplicates and near-duplicates (keep higher-volume version — Google understands synonyms)

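
The volume and KD bands and the filtering rules can be sketched as follows. Several assumptions apply. Relevance and competitor-navigational status are judged by hand and stored as flags. `max_kd` defaults to 40, the low end of the suggested new-site threshold. Near-duplicates are detected by comparing sorted word sets, a crude and hypothetical stand-in for real synonym handling. All names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Keyword:
    keyword: str
    volume: int          # monthly searches; 0 when tools cannot measure it
    kd: int              # keyword difficulty estimate, 0-100
    geo_score: int       # 1-3, as defined above
    relevant: bool = True                   # could this searcher become a customer?
    competitor_navigational: bool = False   # e.g. "[competitor] login"


def volume_band(volume: int) -> str:
    if volume >= 10_000:
        return "high"
    if volume >= 1_000:
        return "medium"
    if volume >= 100:
        return "low-medium"
    return "low" if volume > 0 else "unmeasured"


def kd_band(kd: int) -> str:
    if kd <= 20:
        return "easy"
    if kd <= 40:
        return "medium"
    if kd <= 60:
        return "hard"
    return "very hard"


def dedupe_key(keyword: str) -> str:
    """Crude near-duplicate key: the same words in any order count as one keyword."""
    return " ".join(sorted(keyword.lower().split()))


def filter_keywords(rows, max_kd: int = 40):
    """Apply the filtering rules, keeping the higher-volume row among near-duplicates."""
    kept = {}
    for row in rows:
        if not row.relevant or row.competitor_navigational or row.kd > max_kd:
            continue
        key = dedupe_key(row.keyword)
        if key not in kept or row.volume > kept[key].volume:
            kept[key] = row
    return sorted(kept.values(), key=lambda r: r.volume, reverse=True)
```
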
**Spreadsheet structure after filtering:**

```
| Keyword | Volume | KD | Intent | GEO Score | Source | Priority | Target URL | Status |
```

**Output:** Filtered, enriched keyword spreadsheet with volume, KD, and GEO scores.

## Phase 7: Classify search intent

> **Scripts:** Run `scripts/classify-intent-live.py --keywords keywords.txt` to search each keyword via Playwright, analyze SERP composition, classify intent, and flag AI-answerable queries.
> **References:** See `references/evaluation-and-scoring.md` for the 4 intent types with signals, content types, and business value.

Search intent is the most important and most overlooked step. If you create the wrong content type for a keyword, you will not rank — regardless of content quality.

**The 4 intent types:**

- **Informational — "I want to learn."** Signals: "how to", "what is", "guide", "tutorial." Content: blog posts, guides, tutorials. Business value: top-of-funnel, builds authority.
- **Navigational — "I want a specific site."** Signals: brand names, "login", "pricing." Content: your own brand pages. Not worth targeting for other brands.
- **Commercial — "I'm comparing options."** Signals: "best", "vs", "review", "alternative." Content: comparison pages, reviews, "best of" lists. Business value: high — close to purchase decision.
- **Transactional — "I'm ready to act."** Signals: "buy", "free trial", "download", "pricing." Content: product pages, pricing pages, free tools. Business value: highest — direct conversion.

**The SERP test — how to determine intent reliably:**
1. Open an incognito browser window.
2. Search for the keyword.
3. Look at the top 5-10 results.
4. Match the format: blog posts = informational. Product pages = transactional. Comparison articles = commercial.

Google has already figured out the intent. If every result for "invoice generator" is a free online tool, the intent is transactional. A blog post targeting that keyword will never rank.

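
The SERP test can be approximated by a majority vote over the formats of the top results. Keyword wording serves as a weaker fallback when no SERP data is available. The format labels and signal lists are assumptions: for example, "pricing" is left out of the signals because the list above uses it for both navigational and transactional intent. All function names are hypothetical. This is not how `classify-intent-live.py` is known to work.

```python
from collections import Counter

# Format labels are hypothetical; map whatever your SERP parser reports onto them.
FORMAT_TO_INTENT = {
    "blog_post": "informational",
    "guide": "informational",
    "comparison": "commercial",
    "review": "commercial",
    "best_of_list": "commercial",
    "product_page": "transactional",
    "pricing_page": "transactional",
    "free_tool": "transactional",
    "brand_homepage": "navigational",
}

# Keyword wording signals, checked in this order; the first match wins.
SIGNALS = (
    ("transactional", ("buy", "free trial", "download")),
    ("commercial", ("best", " vs ", "review", "alternative")),
    ("informational", ("how to", "what is", "guide", "tutorial")),
)


def intent_from_serp(result_formats, top_n=10):
    """Majority vote over the formats of the top results; None when nothing is recognised."""
    votes = Counter(FORMAT_TO_INTENT[f] for f in result_formats[:top_n] if f in FORMAT_TO_INTENT)
    return votes.most_common(1)[0][0] if votes else None


def intent_from_keyword(keyword):
    padded = f" {keyword.lower()} "  # padding lets " vs " match at either end
    for intent, words in SIGNALS:
        if any(w in padded for w in words):
            return intent
    return None


def classify(keyword, result_formats=()):
    # The SERP is the stronger signal; keyword wording is only a fallback.
    return intent_from_serp(list(result_formats)) or intent_from_keyword(keyword) or "unknown"
```
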
Run `scripts/classify-intent-live.py` to automate this — it searches each keyword via Playwright, analyzes the SERP composition (content types, SERP features), and classifies intent programmatically.

**AI-answerable flag (GEO):**
Does Google show an AI Overview for this keyword? Does Perplexity give a direct answer? If yes, GEO optimization is critical for this keyword: optimizing for AI citation (via content-sgeo and on-page-sgeo) is required, not optional, to capture visibility.

**Output:** Every keyword in your spreadsheet now has an intent classification and AI-answerable flag.

## Phase 8: Group into topic clusters

> **Scripts:** Run `scripts/build-topic-clusters.py --keywords evaluated-keywords.json` to group keywords by semantic similarity, identify pillars, and generate a cluster map.
> **References:** See `references/evaluation-and-scoring.md` for cluster architecture and the "own page" test.

Individual keywords are not a strategy. Group them into clusters that build topical authority.

**Cluster structure:**
- **Pillar page:** Comprehensive, long-form page covering a broad topic. Highest volume, broadest scope per cluster. Example: "The Complete Guide to Online Invoicing."
- **Supporting pages:** Shorter, specific pages covering subtopics in depth. Link back to the pillar. Examples: "How to Write a Professional Invoice", "Invoice Payment Terms Explained."
- **Internal links:** Every supporting page links to the pillar, and the pillar links to every supporting page. This helps signal to Google that you are an authority on the topic.

**How to group:**
1. Identify natural themes in your keyword list.
2. Group keywords sharing the same root topic.
3. Per group: identify the head keyword (highest volume, broadest) — this is the pillar target.
4. Remaining keywords become supporting pages or H2 sections within the pillar.

**When does a keyword deserve its own page?**
- It has a distinct search intent from others in the group.
- The topic is deep enough for 800+ words of dedicated content.
- The SERP shows standalone pages ranking (not subsections of larger pages).

Otherwise, target it as an H2 section within a larger page.

**Example cluster (invoicing SaaS, with GEO scores):**

**Pillar: "Online Invoicing" (volume: 5,400, KD: 45, GEO: 2)**
- "How to Create a Professional Invoice" (vol: 6,600, KD: 35, GEO: 3) — guide [create first: highest GEO]
- "Invoice Payment Terms: Net 30, Net 60" (vol: 1,900, KD: 12, GEO: 2) — blog post
- "Recurring Invoice Software Comparison" (vol: 800, KD: 22, GEO: 3) — comparison [create second: high GEO]
- "Invoice Template Free Download" (vol: 3,200, KD: 28, GEO: 1) — free tool
- "FreshBooks vs Wave for Freelancers" (vol: 1,100, KD: 20, GEO: 2) — comparison

**GEO citation mapping:** Within each cluster, rank supporting topics by GEO score. Highest GEO-score topics are created first — they get cited by AI sooner, building AI visibility for the entire cluster.

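
The grouping steps and the GEO-first ordering of supporting pages can be sketched as below. The sketch assumes each keyword already carries a theme label from step 1 and an `own_page` flag from the "own page" test, both decided by a person or another tool. Step 3 is applied literally, so the highest volume wins. You may need to override the pillar by hand when a broader but lower-volume topic makes the better umbrella. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Kw:
    keyword: str
    theme: str       # natural theme from step 1, assigned before clustering
    volume: int
    geo: int         # GEO score, 1-3
    own_page: bool   # passed the "own page" test


@dataclass
class Cluster:
    pillar: Kw
    supports: list = field(default_factory=list)  # standalone pages, highest GEO first
    sections: list = field(default_factory=list)  # H2 sections inside the pillar


def build_clusters(keywords):
    by_theme = {}
    for kw in keywords:  # step 2: group keywords sharing the same root topic
        by_theme.setdefault(kw.theme, []).append(kw)

    clusters = {}
    for theme, kws in by_theme.items():
        pillar = max(kws, key=lambda k: k.volume)  # step 3: head keyword = highest volume
        rest = [k for k in kws if k is not pillar]
        # Step 4 plus GEO citation mapping: standalone supports ordered by GEO, then volume.
        supports = sorted((k for k in rest if k.own_page), key=lambda k: (-k.geo, -k.volume))
        sections = [k for k in rest if not k.own_page]
        clusters[theme] = Cluster(pillar, supports, sections)
    return clusters


def internal_links(cluster):
    """(from, to) pairs: the pillar links to every support and every support links back."""
    links = []
    for s in cluster.supports:
        links += [(cluster.pillar.keyword, s.keyword), (s.keyword, cluster.pillar.keyword)]
    return links
```
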
**Output:** Keywords grouped into clusters with pillar/support assignments and GEO priority ordering.

## Phase 9: Prioritize and build content plan

> **Scripts:** Run `scripts/prioritize-opportunities.py --clusters clusters.json` to apply 4-dimension scoring and generate a tiered content plan with publishing schedule.
> **References:** See `references/evaluation-and-scoring.md` for the complete 4-dimension framework and tier definitions.

You have clusters, but you cannot create everything at once. Prioritize using 4 dimensions.

**4-dimension scoring framework (max 12 points):**

**Dimension 1 — Business Value (1-3):**
- 3 = Directly relates to your product. Searcher could become a paying customer. "[category] software", "[competitor] alternative."
- 2 = Indirectly related. Builds awareness with target audience. "How to [solve problem]."
- 1 = Tangentially related. Drives traffic but weak revenue connection.

**Dimension 2 — Ranking Feasibility (1-3):**
- 3 = KD under 20 AND you have relevant existing content or expertise.
- 2 = KD 20-40 OR requires building backlinks.
- 1 = KD over 40 OR dominated by major brands.

**Dimension 3 — Traffic Potential (1-3):**
- 3 = Volume over 1,000/month.
- 2 = Volume 200-1,000/month.
- 1 = Volume under 200/month.

**Dimension 4 — GEO Opportunity (1-3):**
- 3 = AI answers this query and cites weak/few sources you can replace. Fastest GEO win.
- 2 = AI answers but cites strong sources. Worth targeting for AI presence, harder to displace.
- 1 = AI does not answer this query. SEO-only value.

**Total = Business Value + Feasibility + Traffic + GEO Opportunity (max 12)**

**Tiers:**
- **10-12 = Golden.** Do these first. High business value, achievable difficulty, meaningful traffic, strong GEO opportunity.
- **7-9 = Strong.** Do these second. Solid opportunities requiring more effort.
- **4-6 = Moderate.** Do these eventually. Lower priority, still worth creating over time.
- **1-3 = Skip.** Not worth the effort right now.

**Example scoring (form builder SaaS; BV = Business Value, RF = Ranking Feasibility, TP = Traffic Potential, GEO = GEO Opportunity):**

| Keyword | BV | RF | TP | GEO | Total | Tier |
|---------|----|----|----|----|-------|------|
| Typeform alternative | 3 | 2 | 3 | 3 | 11 | Golden |
| NPS survey template | 3 | 3 | 2 | 2 | 10 | Golden |
| form builder with payment | 3 | 3 | 1 | 3 | 10 | Golden |
| how to create online form | 2 | 2 | 3 | 3 | 10 | Golden |
| best online form builder | 3 | 1 | 2 | 2 | 8 | Strong |
| conditional logic form | 2 | 3 | 1 | 1 | 7 | Strong |

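
The 4-dimension framework translates directly into code. Business Value and GEO Opportunity are supplied as judgments. Ranking Feasibility and Traffic Potential are derived from KD and volume. The framework does not say how to score a KD under 20 without existing content, so this sketch assigns it 2. That choice, the flag names, and the `Opportunity` class are assumptions. Every dimension scores at least 1, so valid totals run from 4 to 12. The Skip branch is therefore reached only for out-of-range input. As one example, a keyword with Business Value 3, KD 30, 2,400 monthly searches, and GEO 3 scores 11 (Golden), matching the first table row.

```python
from dataclasses import dataclass


@dataclass
class Opportunity:
    keyword: str
    business_value: int             # Dimension 1, judged by hand (1-3)
    kd: int                         # keyword difficulty estimate, 0-100
    volume: int                     # monthly searches
    geo: int                        # Dimension 4, from AI probing (1-3)
    has_expertise: bool = False     # relevant existing content or expertise
    needs_backlinks: bool = False
    major_brands: bool = False      # SERP dominated by major brands


def feasibility(o: Opportunity) -> int:  # Dimension 2
    if o.kd > 40 or o.major_brands:
        return 1
    if o.kd < 20 and o.has_expertise and not o.needs_backlinks:
        return 3
    return 2  # KD 20-40, needs backlinks, or easy KD without supporting content


def traffic(o: Opportunity) -> int:  # Dimension 3
    if o.volume > 1_000:
        return 3
    return 2 if o.volume >= 200 else 1


def tier(total: int) -> str:
    if total >= 10:
        return "Golden"
    if total >= 7:
        return "Strong"
    if total >= 4:
        return "Moderate"
    return "Skip"


def score(o: Opportunity) -> tuple:
    total = o.business_value + feasibility(o) + traffic(o) + o.geo
    return total, tier(total)
```
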
**Content calendar template:**

```
| Priority | Keyword | Volume | KD | Intent | GEO Score | Page Type | Target URL | Cluster | Publish By |
```

**Monthly cadence:**
- Month 1: 3-5 pages targeting golden-tier keywords. Pillar pages first, then highest-GEO supports.
- Month 2: 3-5 supporting pages. Internal linking between published pages.
- Month 3: Optimize Month 1 content with GSC data. Continue publishing supports.
- Ongoing: 2-4 new pages/month. Update existing content quarterly. Re-run discovery quarterly.

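
The cadence can be turned into a first-pass schedule. Sort pages by tier, then pillars before supports, then GEO score, and fill each month up to a capacity. The capacities are assumptions chosen within the stated ranges; the text gives no figure for month 3. The sketch does not model the optimization work planned for month 3. The dict-based page shape and the `schedule` name are hypothetical.

```python
TIER_ORDER = {"Golden": 0, "Strong": 1, "Moderate": 2}


def schedule(pages, first_months=(4, 4, 3), ongoing=3):
    """pages: dicts with keyword, tier, is_pillar, geo, total. Returns {month: [keywords]}.

    Skip-tier pages are left out. Capacities are pages per month, first for months 1..n,
    then `ongoing` for every later month.
    """
    if min(first_months, default=ongoing) <= 0 or ongoing <= 0:
        raise ValueError("monthly capacities must be positive")
    queue = sorted(
        (p for p in pages if p["tier"] in TIER_ORDER),
        # Better tier first, pillars before supports (False sorts first), then higher GEO and total.
        key=lambda p: (TIER_ORDER[p["tier"]], not p["is_pillar"], -p["geo"], -p["total"]),
    )
    plan, month, start = {}, 1, 0
    while start < len(queue):
        cap = first_months[month - 1] if month <= len(first_months) else ongoing
        plan[month] = [p["keyword"] for p in queue[start:start + cap]]
        start += cap
        month += 1
    return plan
```
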
**Output:** Prioritized content plan with 4-dimension scores, tiers, page types, target URLs, clusters, and publishing schedule.

## Phase 10: Find quick wins in existing data

> **Scripts:** Run `scripts/find-quick-wins.py --domain yourdomain.com` to identify striking-distance keywords, low-CTR pages, variant keywords, and GEO citation gaps.
> **References:** See `references/evaluation-and-scoring.md` for quick win identification methodology.

This phase applies only to sites with existing traffic. If your site is brand new, skip this and return after 2-3 months of content publishing and GSC data collection.

Quick wins are the fastest path to more traffic — they optimize what is already partially working.

**Quick win type 1 — Striking distance (positions 8-20):**
GSC > Performance > Queries. Filter positions 8-20. Sort by impressions descending. These keywords are on the edge of page 1. You already rank — Google considers your content relevant. A small improvement pushes you onto page 1, where traffic increases dramatically.
Action: improve content depth for the ranking page, add internal links from other pages, optimize the title tag.

**Quick win type 2 — High impressions, low CTR:**
Filter for CTR below 2-3%, sort by impressions descending. Google shows your page, but people do not click. Your title or meta description is not compelling.
Action: rewrite title to be specific with a clear benefit. Rewrite meta description with active voice and value proposition. Check if a SERP feature pushes your result down.

**Quick win type 3 — Variant keywords:**
Check which keywords your pages rank for that you did not deliberately target. Sometimes a page ranks for a variant keyword with higher volume or better intent than the original target.
Action: shift primary target to the higher-value variant. At minimum, add a section covering the variant and include it in the title or an H2.

**GEO quick wins — keywords where you rank but AI does not cite you:**
For your top-ranking keywords, run `scripts/probe-ai-discovery.py` to check if AI platforms cite your pages. If you rank on page 1 but are not cited by AI, you have a GEO gap. Apply on-page-sgeo and content-sgeo to optimize content structure for AI citation — direct-answer formatting, data density, author attribution, structured data.

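
The three GSC-based checks and the GEO gap check reduce to simple filters over query rows. The row fields mirror the Clicks, Impressions, and Position columns of the GSC Performance report, with CTR derived from clicks and impressions. Several values are assumptions: the 500-impression floor, the 2% CTR cutoff (the low end of the range above), page 1 as position 10 or better, and impressions as a proxy for a variant's value. `cited_queries` stands for whatever set of queries an AI probe found you cited for. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GscRow:
    query: str
    page: str
    clicks: int
    impressions: int
    position: float  # average position

    @property
    def ctr(self) -> float:
        return self.clicks / self.impressions if self.impressions else 0.0


def striking_distance(rows, lo=8, hi=20):
    """Type 1: average position 8-20, most impressions first."""
    return sorted((r for r in rows if lo <= r.position <= hi), key=lambda r: r.impressions, reverse=True)


def low_ctr(rows, max_ctr=0.02, min_impressions=500):
    """Type 2: shown often but rarely clicked, most impressions first."""
    hits = (r for r in rows if r.impressions >= min_impressions and r.ctr < max_ctr)
    return sorted(hits, key=lambda r: r.impressions, reverse=True)


def variant_candidates(rows, targets):
    """Type 3: queries that earn a page more impressions than its intended target query."""
    by_page = {}
    for r in rows:
        by_page.setdefault(r.page, []).append(r)
    out = []
    for page, page_rows in by_page.items():
        target = targets.get(page)
        if target is None:
            continue  # no declared target for this page, nothing to compare against
        target_impressions = next((r.impressions for r in page_rows if r.query == target), 0)
        out += [r.query for r in page_rows if r.query != target and r.impressions > target_impressions]
    return out


def geo_gaps(rows, cited_queries, max_position=10):
    """GEO quick wins: page-1 queries where no AI platform cited you."""
    return [r.query for r in rows if r.position <= max_position and r.query not in cited_queries]
```
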
| 383 | **Output:** Quick win list with specific actions per keyword, including GEO citation gaps. |
| 384 | |
| 385 | ## Available scripts |
| 386 | |
| 387 | Run these scripts to automate discovery tasks. Each outputs JSON. Scripts marked with [browser] use Playwright MCP for live browser data. If Playwright is unavailable, they fall back to WebSearch/WebFetch with reduced output quality. All browser scripts accept `--no-browser` to force the fallback path. |

| Script | What it does | Run it when |
|--------|-------------|-------------|
| `harvest-autocomplete.py` [browser] | Types each seed followed by each letter a-z into Google and captures all Autocomplete suggestions | Phase 2: expanding the keyword universe |
| `extract-paa.py` [browser] | Searches Google, expands People Also Ask (PAA) boxes, captures 20-50+ questions per seed | Phases 2 and 4: expanding and mining keywords |
| `scrape-related-searches.py` [browser] | Extracts Related Searches, chaining 2 levels deep | Phase 4: mining Google for free ideas |
| `analyze-serp-live.py` [browser] | Flagship script: full SERP analysis covering organic results, SERP features, AI Overview, and a screenshot | Phase 7: SERP-based intent classification |
| `competitor-gap-analysis.py` | Compares the user's domain against competitors to find keyword gaps | Phase 3: competitor analysis |
| `scrape-community-keywords.py` [browser] | Searches Reddit/HN/forums, extracts question titles and pain-point language | Phase 5: community listening |
| `probe-ai-discovery.py` | GEO backbone: tests queries on AI platforms, records citations, finds citation gaps | Phases 1, 3, 6, and 10: GEO validation throughout |
| `evaluate-keywords.py` | Enriches raw keywords with volume, KD, and GEO scores; filters noise | Phase 6: evaluation and filtering |
| `classify-intent-live.py` [browser] | Live SERP analysis: determines intent plus an AI-answerable flag per keyword | Phase 7: intent classification |
| `build-topic-clusters.py` | Groups keywords by similarity, identifies pillars, generates a cluster map | Phase 8: topic clustering |
| `prioritize-opportunities.py` | 4-dimension scoring (max 12); outputs a tiered content plan with a schedule | Phase 9: prioritization |
| `find-quick-wins.py` | Finds GSC striking-distance keywords, high-impression/low-CTR queries, and GEO citation gaps | Phase 10: quick wins |
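The table can also be read as a run order. The sketch below encodes the table's phase assignments and [browser] markers as data and sorts them by phase. It is only a planning aid: it does not invoke the scripts or know their arguments. The `PHASES`, `BROWSER_SCRIPTS`, and `run_order` names are illustrative.

```python
# Phase assignments and [browser] markers copied from the scripts table.
PHASES = {
    "harvest-autocomplete.py": [2],
    "extract-paa.py": [2, 4],
    "scrape-related-searches.py": [4],
    "analyze-serp-live.py": [7],
    "competitor-gap-analysis.py": [3],
    "scrape-community-keywords.py": [5],
    "probe-ai-discovery.py": [1, 3, 6, 10],
    "evaluate-keywords.py": [6],
    "classify-intent-live.py": [7],
    "build-topic-clusters.py": [8],
    "prioritize-opportunities.py": [9],
    "find-quick-wins.py": [10],
}
BROWSER_SCRIPTS = {
    "harvest-autocomplete.py",
    "extract-paa.py",
    "scrape-related-searches.py",
    "analyze-serp-live.py",
    "scrape-community-keywords.py",
    "classify-intent-live.py",
}


def run_order():
    """Return (phase, script, uses_browser) tuples, sorted by phase then name."""
    steps = [
        (phase, script, script in BROWSER_SCRIPTS)
        for script, phases in PHASES.items()
        for phase in phases
    ]
    return sorted(steps)


if __name__ == "__main__":
    for phase, script, uses_browser in run_order():
        marker = " [browser]" if uses_browser else ""
        print(f"Phase {phase}: {script}{marker}")
```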

## Quality checklist

Before delivering the content plan, verify:

- [ ] All 10 phases completed (or deliberately skipped, with the reason documented)
- [ ] Seed keywords validated against both search data and AI platform probing
- [ ] Competitor gap analysis includes both SEO keyword gaps and GEO citation gaps
- [ ] Every keyword has a volume estimate, KD estimate, intent classification, and GEO score
- [ ] Intent classification verified with a live SERP test, not just heuristic guessing
- [ ] Keywords grouped into topic clusters with pillar/support assignments
- [ ] Content plan uses 4-dimension scoring (max 12), with GEO as a real dimension rather than a token checkbox
- [ ] Quick wins identified (for existing sites with GSC data)
- [ ] Output is a prioritized content plan, not a raw keyword list
- [ ] Plan feeds clearly into content-sgeo (what to write) and on-page-sgeo (how to optimize)
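Several of these checklist items can be checked mechanically before delivery. This is a minimal sketch. It assumes each plan entry is a record with the hypothetical fields shown below, and that the total score is the sum of the 4 scoring dimensions. It checks only the per-keyword items: required fields, cluster and page assignment, and a total score between 0 and 12. The judgment-based items are left to you.

```python
from dataclasses import dataclass
from typing import List, Optional

# Per-keyword fields the checklist requires (hypothetical field names).
REQUIRED_FIELDS = ("volume", "kd", "intent", "geo_score")


@dataclass
class PlanEntry:
    """One keyword row in the content plan (hypothetical shape)."""
    keyword: str
    page: Optional[str] = None
    cluster: Optional[str] = None
    volume: Optional[int] = None
    kd: Optional[int] = None
    intent: Optional[str] = None
    geo_score: Optional[int] = None
    total_score: Optional[int] = None  # assumed: sum of the 4 scoring dimensions


def checklist_problems(entries, max_total=12) -> List[str]:
    """Return human-readable problems; an empty list means these checks pass."""
    problems = []
    for e in entries:
        # Use `is None` so a legitimate score of 0 is not flagged as missing.
        missing = [f for f in REQUIRED_FIELDS if getattr(e, f) is None]
        if missing:
            problems.append(f"{e.keyword}: missing {', '.join(missing)}")
        if e.cluster is None:
            problems.append(f"{e.keyword}: not assigned to a topic cluster")
        if e.page is None:
            problems.append(f"{e.keyword}: not mapped to a page")
        if e.total_score is None or not 0 <= e.total_score <= max_total:
            problems.append(f"{e.keyword}: total score missing or outside 0-{max_total}")
    return problems


if __name__ == "__main__":
    plan = [  # illustrative values, not real data
        PlanEntry("invoice software for freelancers", page="/invoicing",
                  cluster="invoicing", volume=880, kd=32, intent="commercial",
                  geo_score=2, total_score=10),
        PlanEntry("invoice tips", volume=90, total_score=14),
    ]
    for problem in checklist_problems(plan):
        print(problem)
```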

## Common mistakes to avoid

1. **Targeting keywords that are too broad.** "Software" is not a keyword; it is a category. "Invoice software for freelancers" is a keyword.
2. **Ignoring search intent.** If every SERP result is a blog post and you create a product page, you will not rank. Match the content type Google rewards.
3. **Obsessing over search volume.** A keyword with 50 monthly searches and perfect buyer intent can be worth more than a keyword with 10,000 searches and vague informational intent.
4. **Never checking the actual SERP.** Tools provide estimates; the SERP shows what Google actually rewards. Always search your most important target keywords manually.
5. **Targeting the same keyword with multiple pages (cannibalization).** Each keyword or keyword cluster maps to exactly one page. If two pages compete for the same keyword, Google may rank neither of them well.
6. **Skipping competitor analysis.** This is often the single highest-ROI discovery activity. Competitors' rankings show which keywords already drive traffic in your space, so use that work.
7. **Creating content without a target keyword.** Every page needs a clear primary keyword. If you cannot identify one, the page lacks a strategic purpose.
8. **Giving up too early.** SEO results typically take 3-6 months. Publish in January and check rankings in July, not February.
9. **Only looking at your own data.** GSC shows keywords you already rank for. The biggest opportunities are keywords where you have zero presence, and only competitor analysis and expansion tools reveal those.
10. **Doing keyword research once and never again.** Search behavior evolves, competitors publish new content, and new questions emerge. Re-run discovery quarterly.
11. **Ignoring GEO opportunity in prioritization.** A keyword where AI cites weak sources is a faster win than one where AI cites authoritative sources you cannot displace. GEO opportunity is a real competitive dimension, not a nice-to-have.
12. **Not validating seeds against AI platforms.** What AI platforms recommend in your space can be a leading indicator of search trends. If AI platforms discuss a topic, searchers are likely asking about it too, often before traditional keyword tools register the volume.
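Mistake 5 is easy to catch before publishing. This is a minimal sketch. It assumes the content plan can be flattened into `(keyword, page)` pairs, and the `find_cannibalization` helper is a hypothetical name. It flags any keyword that more than one page targets, comparing keywords case-insensitively.

```python
from collections import defaultdict


def find_cannibalization(keyword_page_pairs):
    """Return {keyword: sorted pages} for keywords that more than one page targets.

    keyword_page_pairs: iterable of (keyword, page URL) from the content plan.
    Keywords are normalized to lowercase with surrounding whitespace removed.
    """
    pages = defaultdict(set)
    for keyword, page in keyword_page_pairs:
        pages[keyword.strip().lower()].add(page)
    return {kw: sorted(ps) for kw, ps in pages.items() if len(ps) > 1}


if __name__ == "__main__":
    plan = [  # illustrative plan entries
        ("Invoice software for freelancers", "/invoicing"),
        ("invoice software for freelancers", "/blog/best-invoice-tools"),
        ("freelance invoice template", "/templates"),
    ]
    print(find_cannibalization(plan))
```

This catches only exact keyword collisions. Near-duplicates that belong to the same cluster still need the cluster map from Phase 8.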