business, communication

Discovery GSEO

Find, evaluate, and prioritize keyword and content opportunities for both search engine ranking and AI platform citation — using live browser automation, SERP analysis, competitor intelligence, community listening, and GEO-aware scoring to produce a prioritized content plan that drives traffic and AI visibility.

SEO, GEO, GSEO, keyword-research, discovery, competitor-analysis, SERP-analysis, content-planning, Playwright, browser-automation

Works well with agents

SEO Specialist Agent, Marketing Strategist Agent, Content Strategist Agent, Growth Engineer Agent, Product Marketing Manager Agent

Works well with skills

Technical SGEO Setup, On-Page SGEO Optimization, Content SGEO Strategy, Off-Page SGEO Authority, Technical SEO Audit, Content Calendar, Go-to-Market Plan

$ npx skills add The-AI-Directory-Company/(…) --skill discovery-gseo

2	# Discovery GSEO
3
4	Discovery GSEO is the upstream layer that determines what to build before the execution skills take over. Without good discovery, all downstream SEO and GEO effort is wasted — targeting the wrong keywords, creating content for topics nobody searches for, missing opportunities competitors are already winning.
5
This is the first of five skills (skill 0) in the SGEO series: discovery-gseo > technical-sgeo > on-page-sgeo > content-sgeo > off-page-sgeo.
7
8	The output of this skill is a prioritized content plan — keywords mapped to pages, scored on 4 dimensions (including GEO opportunity), ordered by impact. Everything else in the SGEO pipeline flows from this plan.
9
10	## Tool discovery
11
12	Before gathering project details, confirm which tools are available. Ask the user directly — do not assume access to any external service.
13
14	Browser automation (primary for this skill):
15	- [ ] Playwright MCP (live SERP interaction, Autocomplete harvesting, PAA expansion, screenshots)
16	- [ ] Chrome DevTools MCP (alternative browser control, performance analysis)
17
18	Free tools (no API key required):
19	- [ ] WebFetch (fetch any public URL)
20	- [ ] WebSearch (search engine queries)
21	- [ ] Google PageSpeed Insights API (CWV data, no key needed for basic usage)
22	- [ ] Google Rich Results Test (structured data validation)
23
Paid or authenticated tools (API key, OAuth, or MCP required):
- [ ] Google Search Console API (free, but requires OAuth; critical for Phase 10 quick wins)
- [ ] DataForSEO MCP (SERP data, keyword metrics, backlinks, AI visibility)
- [ ] Ahrefs API (keyword research, competitor gap analysis, backlink data)
- [ ] Semrush API (largest keyword database, Keyword Gap, Keyword Magic Tool)
29
30	The agent must:
31	1. Check for Playwright MCP or Chrome DevTools MCP FIRST — many scripts in this skill depend on browser automation for live SERP data
32	2. If no browser automation available: all scripts fall back to WebSearch + WebFetch, but output quality is reduced (no live Autocomplete, no PAA expansion, no visual SERP analysis)
33	3. Present the full checklist to the user and record what is available
34	4. Pass the inventory to scripts as context
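
The inventory step above can be sketched as a small record that later scripts consult. This is a minimal illustration, not the skill's actual implementation: the tool identifiers and the `ToolInventory` class are hypothetical, and only the `--no-browser` flag comes from this document.

```python
from dataclasses import dataclass, field

# Hypothetical identifiers for the two browser options in the checklist.
BROWSER_TOOLS = frozenset({"playwright-mcp", "chrome-devtools-mcp"})


@dataclass
class ToolInventory:
    """Tools the user has confirmed are available (never assumed)."""
    available: set = field(default_factory=set)

    @property
    def has_browser(self) -> bool:
        # True when at least one browser automation option was confirmed.
        return bool(self.available & BROWSER_TOOLS)

    def mode(self) -> str:
        # "fallback" means WebSearch + WebFetch with reduced output quality.
        return "browser" if self.has_browser else "fallback"

    def extra_args(self) -> list:
        # Force the fallback path when no browser tool is present.
        return [] if self.has_browser else ["--no-browser"]


inventory = ToolInventory({"websearch", "webfetch"})
print(inventory.mode(), inventory.extra_args())
```

Recording the inventory once and deriving the flags from it keeps every script call consistent with what the user actually confirmed.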
35
36	## Before you start
37
38	Gather the following from the user. If anything is missing, ask before proceeding:
39
40	1. What is your business/product? (Product category, target market, value proposition)
41	2. What is your site URL? (Existing site for quick win analysis, or "new site" if starting from scratch)
42	3. Who are your known competitors? (3-5 domains — business competitors AND SEO competitors)
43	4. What is your current SEO status? (Brand new / some content / established — determines whether Phase 10 is applicable)
44	5. Do you have Google Search Console access? (Critical for Phase 10 quick wins)
45	6. What is the target country/region? (For localized SERP analysis and volume data)
46	7. What is your budget for tools? (None / small / moderate / significant — determines which paid tool paths are available)
47	8. How important is AI visibility? (Determines weight of GEO scoring in prioritization — low / medium / high)
48
49	If the user says "I just want to find keywords," push back: "Keywords without intent classification, competitor validation, and prioritization scoring produce a random list, not a strategy. Which phase do you want to start from?"
50
51	## Phase 1: Generate seed keywords
52
53	> Scripts: Run `scripts/probe-ai-discovery.py --queries 'What are the best [category] tools?'` to validate seeds against AI visibility.
54	> References: See `references/seed-generation.md` for the 5-source mental model and GEO seed validation methodology.
55
56	Seed keywords are your starting points. They are the obvious, broad terms that describe your product, service, or topic. You do not need tools for this — just structured thinking.
57
58	Ask yourself these 5 questions and write down every answer:
59
60	1. "What do I sell or offer?" — Product category, service type, problem solved. If you sell invoicing software: "invoicing software", "invoice generator", "billing software."
61	2. "What problem does my product solve?" — Pain points. "How to manage invoices", "track payments", "send payment reminders."
62	3. "How would a stranger describe what I do?" — Plain language, no jargon. A CI/CD platform user might search "automate code deployment", not your product name.
63	4. "What alternatives or competitors exist?" — "[Competitor] alternative", "[Competitor A] vs [Competitor B]", "[category] comparison."
64	5. "What technologies, methods, or concepts are relevant?" — Integrations, workflows, adjacent technologies.
65
66	Target: 15-30 seeds, 1-4 words each. Do not worry about volume or competition yet.
67
68	Example seed list (invoicing SaaS):
```
invoicing software, invoice generator, online invoicing, billing software,
send invoices online, freelancer invoice tool, invoice template, recurring billing,
payment reminder, invoice tracking, accounting software, [CompetitorA] alternative,
[CompetitorB] vs [CompetitorC], small business invoicing, automated invoicing
```
75
76	GEO seed layer — the discovery step that separates this from traditional keyword research:
77
78	Probe AI platforms with broad questions about your space before relying solely on search data:
79
80	- Ask ChatGPT: "What are the best tools for [category]?", "How do I [problem]?"
81	- Ask Perplexity: same queries — different citation sources reveal different market landscape
82	- Ask Gemini: same queries — yet another AI perspective
83
84	Record which brands AI mentions, which topics it covers, and what language it uses. These are GEO-validated seeds — AI already discusses them, so content targeting these topics has citation potential from day one.
85
86	Run `scripts/probe-ai-discovery.py --queries queries.txt` to automate this GEO validation. Keywords that appear in AI responses get a GEO-validated flag in your seed list.
87
88	Output: Seed keyword list with GEO validation flags (15-30 seeds).
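
As a rough sketch of what a GEO-validated flag could look like, the function below marks a seed as validated when its phrase appears in any recorded AI response. The plain substring match is an assumption made for illustration; how `probe-ai-discovery.py` actually matches keywords is not specified here.

```python
def flag_geo_validated(seeds, ai_responses):
    """Return {seed: bool}, True when the seed phrase appears in any AI response.

    Assumption: case-insensitive substring matching is good enough for a first pass.
    """
    corpus = " ".join(ai_responses).lower()
    return {seed: seed.lower() in corpus for seed in seeds}


seeds = ["invoicing software", "recurring billing", "payment reminder"]
responses = [
    "Popular invoicing software for freelancers includes several options.",
    "Most tools support recurring billing and online payments.",
]
flags = flag_geo_validated(seeds, responses)
print(flags)
```

Seeds that come back `False` are not discarded. They simply carry no GEO-validated flag into Phase 2.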
89
90	## Phase 2: Expand keyword universe
91
92	> Scripts: Run `scripts/harvest-autocomplete.py --seeds 'seed1,seed2,seed3'` to harvest Google Autocomplete suggestions via live browser. Run `scripts/extract-paa.py --seeds 'seed1,seed2'` to expand People Also Ask boxes.
93	> References: See `references/keyword-expansion.md` for tool-by-tool guides and browser automation expansion techniques.
94
95	Take each seed and expand it using keyword research tools. The goal is to go from 15-30 seeds to 200-1000+ raw keyword ideas.
96
97	Google Keyword Planner (free, volume ranges):
98	1. Google Ads > Tools > Keyword Planner > "Discover new keywords."
99	2. Enter one seed at a time. Set target country.
100	3. Download results as CSV. Repeat for each seed.
101	4. Limitation: shows volume ranges (1K-10K), not exact numbers. Ad competition metric, not organic KD.
102
103	Paid tools (brief — see reference for full walkthrough):
104	- Ahrefs Keywords Explorer: Matching terms, Related terms, Questions, Also rank for.
105	- Semrush Keyword Magic Tool: 20B+ keyword database, long-tail discovery, auto-grouping.
106	- Ubersuggest: keyword ideas, questions, related terms. Lifetime deal (~$120) is strong value.
107
108	Browser-automated expansion (primary for this skill):
109
110	Run `scripts/harvest-autocomplete.py --seeds 'invoicing software,billing software,invoice generator'` — this types each seed into Google with a-z letter suffixes and question prefixes (how/what/why/best), capturing every Autocomplete suggestion. This is data that WebSearch cannot replicate — real-time, localized, intent-rich suggestions.
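
To make the expansion pattern concrete, here is a minimal sketch of the query list such a harvest would type for one seed. The exact query shapes ("seed + letter", "prefix + seed") are assumptions; the real script may order or combine them differently.

```python
import string

# Question prefixes named in the text above.
QUESTION_PREFIXES = ("how", "what", "why", "best")


def autocomplete_queries(seed):
    """Build the a-z suffix and question-prefix variants for one seed."""
    suffixed = [f"{seed} {letter}" for letter in string.ascii_lowercase]
    prefixed = [f"{prefix} {seed}" for prefix in QUESTION_PREFIXES]
    return suffixed + prefixed


queries = autocomplete_queries("invoicing software")
print(len(queries), queries[0], queries[-1])
```

Each seed yields 30 typed queries under these assumptions, so a 20-seed list means roughly 600 Autocomplete lookups.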
111
112	Run `scripts/extract-paa.py --seeds 'invoicing software,billing software'` — searches Google for each seed, then click-expands every People Also Ask question, capturing 20-50+ questions per seed with answer snippets and source URLs.
113
114	GEO expansion step: Review your expanded list. Flag any keywords that match topics mentioned by AI platforms in Phase 1. These keywords have dual value — organic search traffic plus AI citation potential.
115
116	Output: Raw spreadsheet with 200-1000+ keyword ideas, each with at least a volume estimate.
117
118	## Phase 3: Spy on competitors
119
120	> Scripts: Run `scripts/competitor-gap-analysis.py --domain yourdomain.com --competitors comp1.com,comp2.com`. Run `scripts/probe-ai-discovery.py` with competitor-relevant queries for GEO gap analysis.
121	> References: See `references/competitor-intelligence.md` for gap analysis methodology and GEO competitor analysis.
122
123	Competitor analysis is the single highest-ROI activity in the discovery process. Other websites have done the research for you.
124
125	Step 1: Identify SEO competitors.
126	Search Google for 5-10 of your seed keywords. Note which domains appear repeatedly in the top 10. These are your SEO competitors — even if they sell different products. Pick 3-5, mixing direct product competitors and content competitors (blogs, review sites, aggregators).
127
128	Step 2: Keyword gap analysis.
129	Find keywords competitors rank for that you do not.
130	- Ahrefs: Competitive Analysis > Content Gap. Enter your domain + 3-5 competitors. Export "Missing" keywords.
131	- Semrush: Keyword Gap tool. Compare up to 5 domains. "Missing" tab + "Weak" tab.
132	- Free approximation: `scripts/competitor-gap-analysis.py` uses WebSearch `site:competitor.com` to catalog their pages and compares against your sitemap.
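
At its core, the gap analysis above is a set difference: keywords competitors rank for, minus keywords you rank for. The sketch below assumes you already have keyword lists per domain (from an export or a crawl); the function name and data shapes are hypothetical.

```python
def keyword_gap(own_keywords, competitor_keywords):
    """Map each keyword you are missing to the competitors that rank for it."""
    own = {k.strip().lower() for k in own_keywords}
    gap = {}
    for domain, keywords in competitor_keywords.items():
        for keyword in keywords:
            normalized = keyword.strip().lower()
            if normalized not in own:
                gap.setdefault(normalized, set()).add(domain)
    return gap


gap = keyword_gap(
    ["invoice template", "online invoicing"],
    {
        "comp1.com": ["Invoice Template", "recurring billing"],
        "comp2.com": ["recurring billing", "payment reminder"],
    },
)
print(gap)
```

Keywords that several competitors share, such as "recurring billing" here, are usually the first gap candidates worth reviewing.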
133
134	Step 3: Analyze why competitors rank.
135	For the most interesting gap keywords, visit the ranking pages. Ask: what type of page is it? How deep is the content? What angle do they take? What is weak or outdated? Your goal is to understand the format Google rewards, then create something better.
136
137	Step 4: Top pages analysis.
138	Use Ahrefs Site Explorer > Top Pages or Semrush > Organic Research > Pages to find which competitor pages drive the most traffic. Export and add the most relevant keywords.
139
140	GEO competitor analysis:
141	For your target queries, check which competitors get cited by AI platforms. Run `scripts/probe-ai-discovery.py --queries target-queries.txt --brand YourBrand`. Compare: who does AI cite? What do their cited pages look like — structure, data density, author attribution? Where are the gaps — queries where AI cites weak or outdated sources you can replace?
142
143	Output: Gap keywords added to master spreadsheet + GEO citation gap analysis.
144
145	## Phase 4: Mine Google for free ideas
146
147	> Scripts: Run `scripts/harvest-autocomplete.py`, `scripts/extract-paa.py --seeds 'seed1,seed2'`, and `scripts/scrape-related-searches.py --seeds 'seed1,seed2'`.
148	> References: See `references/browser-automation-guide.md` for Playwright technique details — the exact tool calls for Autocomplete extraction, PAA expansion, and Related Searches chaining.
149
150	Google reveals what people search for through dynamic SERP features. Browser automation extracts this data systematically.
151
152	Technique 1: Autocomplete harvesting.
153	Type seeds + a-z suffixes, question prefixes (how/what/why/best). Run `scripts/harvest-autocomplete.py` — this uses Playwright to interact with live Google Autocomplete, capturing real-time localized suggestions.
154
155	Technique 2: PAA expansion.
156	Search each seed, click-expand all PAA boxes. Each click reveals 2-3 new questions. Run `scripts/extract-paa.py` to capture 20-50+ questions per seed with answer snippets and source URLs. Each question is a potential H2 heading or standalone article.
157
158	Technique 3: Related Searches chaining.
159	Scroll to the bottom of Google SERPs, extract Related Searches, click each for 2nd-level expansion. Run `scripts/scrape-related-searches.py` for 2-level deep chaining. This uncovers lateral keyword ideas that tools miss.
160
161	Technique 4: AnswerThePublic.
162	3 free searches per day at answerthepublic.com. Enter your broadest seed keywords. Export the questions, prepositions, and comparisons data.
163
164	Technique 5: GSC mining (existing sites only).
165	GSC > Performance > Queries tab. Sort by impressions descending. Find keywords with high impressions but low CTR (title/description needs improvement) and keywords at positions 8-20 (striking distance to page 1).
166
167	Output: Question-based and long-tail keywords added to master spreadsheet.
168
169	## Phase 5: Listen to communities
170
171	> Scripts: Run `scripts/scrape-community-keywords.py --topic 'your category' --platforms reddit,hn` to extract keyword candidates from community discussions.
172	> References: See `references/community-listening.md` for platform-by-platform guide and language pattern extraction methodology.
173
174	Keyword tools rely on historical search data. Communities show you what people are asking right now, in their own language, before tools catch up.
175
176	Where to look:
177	- Reddit: Search for your category in relevant subreddits. Read titles and comments of popular posts. Watch for repeated questions, complaints about tools, comparison language.
178	- Hacker News: Search hn.algolia.com for your category. Read "Ask HN" threads for tool recommendations.
179	- Twitter/X: Search for your category, competitor names, problem descriptions.
180	- Quora: Questions mirror Google searches directly. Search your topic.
181	- Product Hunt: Similar products — read comments for needs, comparisons, frustrations.
182	- Industry forums/Discord/Slack: Niche communities where your customers congregate.
183
184	What to capture — language patterns, not keyword data:
185	Turn observations into keyword candidates:
186	- "I need a way to automatically send payment reminders" → "automatic payment reminder software"
187	- "Is there a Stripe alternative that handles invoicing too?" → "Stripe alternative with invoicing"
188	- "How do I set up recurring billing without a developer?" → "recurring billing no code"
189
190	Community-sourced keywords often show zero volume in tools because the volume is too low to register. Do not ignore them — low volume with perfect intent can be more valuable than high volume with vague intent.
191
192	GEO dimension: Community language maps directly to AI query language. The questions people ask on Reddit are the same questions they ask ChatGPT. Pain points described in forums are the same problems people describe to AI assistants. Community-sourced keywords have dual value: organic search traffic AND AI citation opportunity.
193
194	Output: Community-sourced keyword candidates added to master spreadsheet.
195
196	## Phase 6: Evaluate and filter keywords
197
198	> Scripts: Run `scripts/evaluate-keywords.py --keywords raw-keywords.txt --max-kd 50` to enrich your keyword list with volume/KD estimates and GEO scores. Run `scripts/probe-ai-discovery.py` to assess AI citation potential for your keywords.
199	> References: See `references/evaluation-and-scoring.md` for volume/KD interpretation scales, GEO score methodology, and filtering rules.
200
201	You now have a large, messy spreadsheet with hundreds or thousands of keywords from all five sources. Time to evaluate and filter.
202
203	The 3 SEO metrics for every keyword:
204
- Volume: Monthly searches. 10,000+ = high and competitive. 1,000-10,000 = medium, the sweet spot for mid-authority sites. 100-1,000 = low-medium, the sweet spot for new sites. Under 100 = still valuable with strong intent. 0 = below what tools can measure, though real people may still search it.
206	- Keyword Difficulty (KD): 0-100 scale. 0-20 = easy (new sites rank in weeks). 21-40 = medium (needs content + some backlinks). 41-60 = hard (needs strong authority). 61+ = very hard (dominated by major brands). Always verify by manually checking the SERP — KD is an estimate.
207	- Search Intent: Covered in depth in Phase 7.
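
The interpretation scales above can be expressed as two small lookup functions. This is a sketch only: the source ranges share their boundary values (1,000 and 10,000), so assigning each boundary to the higher band is an assumption made here.

```python
def volume_band(volume):
    """Label monthly search volume using the scale above."""
    if volume >= 10_000:
        return "high"
    if volume >= 1_000:
        return "medium"
    if volume >= 100:
        return "low-medium"
    if volume > 0:
        return "low"
    return "unmeasured"  # tools report 0; demand may still exist


def kd_band(kd):
    """Label Keyword Difficulty (0-100) using the scale above."""
    if kd <= 20:
        return "easy"
    if kd <= 40:
        return "medium"
    if kd <= 60:
        return "hard"
    return "very hard"


print(volume_band(5400), kd_band(45))
```

The labels are a reading aid for the spreadsheet. The manual SERP check still overrides whatever band a tool's estimate falls into.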
208
209	GEO score — the 4th metric unique to this skill:
210	For each keyword, assess AI citation potential on a 1-3 scale:
211	- 3 = High opportunity: AI answers this query and cites weak or few sources. You can create content that replaces those citations. This is the discovery equivalent of finding an easy-KD keyword.
212	- 2 = Moderate opportunity: AI answers and cites strong, authoritative sources. Hard to displace existing citations but still worth targeting for AI visibility.
213	- 1 = Low opportunity: AI does not answer this query or defers entirely to search. SEO-only value — still worth targeting for organic traffic.
214
215	Run `scripts/probe-ai-discovery.py --queries keywords-to-check.txt` to assess GEO scores. This is the backbone of GEO-aware discovery.
216
217	Filtering rules:
218	- Remove irrelevant keywords — if the searcher would never become your customer, cut it
219	- Remove branded competitor navigational terms (keep "alternative" and "vs" variants)
220	- Remove keywords above your KD threshold (KD >40-50 for new sites, adjust based on authority)
221	- Remove duplicates and near-duplicates (keep higher-volume version — Google understands synonyms)
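
Three of the four filtering rules are mechanical and can be sketched in code; relevance still needs human judgment. The row shape, the `competitor_brands` parameter, and the exact-match duplicate handling below are assumptions for illustration, and near-duplicates (synonyms, word-order variants) are not detected.

```python
def filter_keywords(rows, max_kd=50, competitor_brands=()):
    """Apply the KD threshold, competitor-brand, and duplicate rules."""
    keep_markers = ("alternative", " vs ")
    seen = set()
    kept = []
    # Highest volume first, so the first copy of a duplicate is the one kept.
    for row in sorted(rows, key=lambda r: r["volume"], reverse=True):
        keyword = " ".join(row["keyword"].lower().split())
        if keyword in seen:
            continue
        if row["kd"] > max_kd:
            continue
        branded = any(brand.lower() in keyword for brand in competitor_brands)
        if branded and not any(marker in keyword for marker in keep_markers):
            continue  # navigational competitor term
        seen.add(keyword)
        kept.append(row)
    return kept


rows = [
    {"keyword": "Invoice Generator", "volume": 1000, "kd": 30},
    {"keyword": "invoice  generator", "volume": 500, "kd": 25},
    {"keyword": "freshbooks login", "volume": 9000, "kd": 10},
    {"keyword": "freshbooks alternative", "volume": 800, "kd": 35},
    {"keyword": "accounting software", "volume": 20000, "kd": 80},
]
kept = filter_keywords(rows, max_kd=50, competitor_brands=["FreshBooks"])
print([row["keyword"] for row in kept])
```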
222
223	Spreadsheet structure after filtering:
224
```
| Keyword | Volume | KD | Intent | GEO Score | Source | Priority | Target URL | Status |
```
228
229	Output: Filtered, enriched keyword spreadsheet with volume, KD, and GEO scores.
230
231	## Phase 7: Classify search intent
232
233	> Scripts: Run `scripts/classify-intent-live.py --keywords keywords.txt` to search each keyword via Playwright, analyze SERP composition, classify intent, and flag AI-answerable queries.
234	> References: See `references/evaluation-and-scoring.md` for the 4 intent types with signals, content types, and business value.
235
236	Search intent is the most important and most overlooked step. If you create the wrong content type for a keyword, you will not rank — regardless of content quality.
237
238	The 4 intent types:
239
240	- Informational — "I want to learn." Signals: "how to", "what is", "guide", "tutorial." Content: blog posts, guides, tutorials. Business value: top-of-funnel, builds authority.
241	- Navigational — "I want a specific site." Signals: brand names, "login", "pricing." Content: your own brand pages. Not worth targeting for other brands.
242	- Commercial — "I'm comparing options." Signals: "best", "vs", "review", "alternative." Content: comparison pages, reviews, "best of" lists. Business value: high — close to purchase decision.
243	- Transactional — "I'm ready to act." Signals: "buy", "free trial", "download", "pricing." Content: product pages, pricing pages, free tools. Business value: highest — direct conversion.
244
245	The SERP test — how to determine intent with certainty:
246	1. Open an incognito browser window.
247	2. Search for the keyword.
248	3. Look at the top 5-10 results.
249	4. Match the format: blog posts = informational. Product pages = transactional. Comparison articles = commercial.
250
Google has already figured out the intent. If every result for "invoice generator" is a free online tool, the intent is transactional, and a blog post targeting that keyword is very unlikely to rank.
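
The SERP test amounts to a majority vote over the formats of the top results. The sketch below assumes each result has already been labeled with a format by hand or by a crawler; the label names and the mapping table are hypothetical, and it is not a description of how `classify-intent-live.py` works internally.

```python
from collections import Counter

# Hypothetical format labels mapped to the intent types described above.
FORMAT_TO_INTENT = {
    "blog_post": "informational",
    "product_page": "transactional",
    "free_tool": "transactional",
    "comparison": "commercial",
}


def classify_intent(result_formats):
    """Return the intent implied by most of the top results, or 'unknown'."""
    intents = [FORMAT_TO_INTENT[f] for f in result_formats if f in FORMAT_TO_INTENT]
    if not intents:
        return "unknown"
    return Counter(intents).most_common(1)[0][0]


top_results = ["free_tool", "free_tool", "free_tool", "blog_post", "free_tool"]
print(classify_intent(top_results))
```

A split vote (for example, half blog posts and half comparison pages) signals mixed intent and deserves a manual look rather than an automatic label.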
252
253	Run `scripts/classify-intent-live.py` to automate this — it searches each keyword via Playwright, analyzes the SERP composition (content types, SERP features), and classifies intent programmatically.
254
255	AI-answerable flag (GEO):
256	Does Google show an AI Overview for this keyword? Does Perplexity give a direct answer? If yes, GEO optimization is critical for this keyword — optimizing for AI citation (via content-sgeo and on-page-sgeo) is not optional, it is required to capture visibility.
257
258	Output: Every keyword in your spreadsheet now has an intent classification and AI-answerable flag.
259
260	## Phase 8: Group into topic clusters
261
262	> Scripts: Run `scripts/build-topic-clusters.py --keywords evaluated-keywords.json` to group keywords by semantic similarity, identify pillars, and generate a cluster map.
263	> References: See `references/evaluation-and-scoring.md` for cluster architecture and the "own page" test.
264
265	Individual keywords are not a strategy. Group them into clusters that build topical authority.
266
267	Cluster structure:
268	- Pillar page: Comprehensive, long-form page covering a broad topic. Highest volume, broadest scope per cluster. Example: "The Complete Guide to Online Invoicing."
269	- Supporting pages: Shorter, specific pages covering subtopics in depth. Link back to the pillar. Examples: "How to Write a Professional Invoice", "Invoice Payment Terms Explained."
270	- Internal links: Every support links to the pillar. The pillar links to every support. This signals to Google that you are an authority on the topic.
271
272	How to group:
273	1. Identify natural themes in your keyword list.
274	2. Group keywords sharing the same root topic.
3. Per group: identify the head keyword (broadest scope, usually the highest volume). This is the pillar target.
276	4. Remaining keywords become supporting pages or H2 sections within the pillar.
277
278	When does a keyword deserve its own page?
279	- It has a distinct search intent from others in the group.
280	- The topic is deep enough for 800+ words of dedicated content.
281	- The SERP shows standalone pages ranking (not subsections of larger pages).
282	Otherwise, target it as an H2 section within a larger page.
283
284	Example cluster (invoicing SaaS, with GEO scores):
285
286	Pillar: "Online Invoicing" (volume: 5,400, KD: 45, GEO: 2)
287	- "How to Create a Professional Invoice" (vol: 6,600, KD: 35, GEO: 3) — guide [create first: highest GEO]
288	- "Invoice Payment Terms: Net 30, Net 60" (vol: 1,900, KD: 12, GEO: 2) — blog post
289	- "Recurring Invoice Software Comparison" (vol: 800, KD: 22, GEO: 3) — comparison [create second: high GEO]
290	- "Invoice Template Free Download" (vol: 3,200, KD: 28, GEO: 1) — free tool
291	- "FreshBooks vs Wave for Freelancers" (vol: 1,100, KD: 20, GEO: 2) — comparison
292
293	GEO citation mapping: Within each cluster, rank supporting topics by GEO score. Highest GEO-score topics are created first — they get cited by AI sooner, building AI visibility for the entire cluster.
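
Using the example cluster above, the GEO-first ordering can be sketched as a sort. Breaking ties by volume is an assumption added here; it happens to reproduce the "create first" and "create second" annotations in the example.

```python
# (title, monthly volume, GEO score) taken from the example cluster above.
supports = [
    ("How to Create a Professional Invoice", 6600, 3),
    ("Invoice Payment Terms: Net 30, Net 60", 1900, 2),
    ("Recurring Invoice Software Comparison", 800, 3),
    ("Invoice Template Free Download", 3200, 1),
    ("FreshBooks vs Wave for Freelancers", 1100, 2),
]


def publishing_order(pages):
    """Sort by GEO score descending, then by volume descending (assumed tie-break)."""
    return sorted(pages, key=lambda page: (-page[2], -page[1]))


ordered = publishing_order(supports)
for title, volume, geo in ordered:
    print(geo, volume, title)
```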
294
295	Output: Keywords grouped into clusters with pillar/support assignments and GEO priority ordering.
296
297	## Phase 9: Prioritize and build content plan
298
299	> Scripts: Run `scripts/prioritize-opportunities.py --clusters clusters.json` to apply 4-dimension scoring and generate a tiered content plan with publishing schedule.
300	> References: See `references/evaluation-and-scoring.md` for the complete 4-dimension framework and tier definitions.
301
302	You have clusters, but you cannot create everything at once. Prioritize using 4 dimensions.
303
304	4-dimension scoring framework (max 12 points):
305
306	Dimension 1 — Business Value (1-3):
307	- 3 = Directly relates to your product. Searcher could become a paying customer. "[category] software", "[competitor] alternative."
308	- 2 = Indirectly related. Builds awareness with target audience. "How to [solve problem]."
309	- 1 = Tangentially related. Drives traffic but weak revenue connection.
310
311	Dimension 2 — Ranking Feasibility (1-3):
312	- 3 = KD under 20 AND you have relevant existing content or expertise.
313	- 2 = KD 20-40 OR requires building backlinks.
- 1 = KD above 40 OR dominated by major brands.
315
316	Dimension 3 — Traffic Potential (1-3):
317	- 3 = Volume over 1,000/month.
318	- 2 = Volume 200-1,000/month.
319	- 1 = Volume under 200/month.
320
321	Dimension 4 — GEO Opportunity (1-3):
322	- 3 = AI answers this query and cites weak/few sources you can replace. Fastest GEO win.
323	- 2 = AI answers but cites strong sources. Worth targeting for AI presence, harder to displace.
324	- 1 = AI does not answer this query. SEO-only value.
325
326	Total = Business Value + Feasibility + Traffic + GEO Opportunity (max 12)
327
328	Tiers:
329	- 10-12 = Golden. Do these first. High business value, achievable difficulty, meaningful traffic, strong GEO opportunity.
330	- 7-9 = Strong. Do these second. Solid opportunities requiring more effort.
331	- 4-6 = Moderate. Do these eventually. Lower priority, still worth creating over time.
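- 1-3 = Skip. Not worth the effort right now. (With every dimension scored at least 1, the minimum total is 4, so in practice this tier stays empty.)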
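
The scoring arithmetic and tier cut-offs above can be sketched as follows. The function names are hypothetical, and the dimension scores themselves are still assigned by judgment; the code only sums them and maps the total to a tier.

```python
def tier(total):
    """Map a 4-dimension total to the tier names defined above."""
    if total >= 10:
        return "Golden"
    if total >= 7:
        return "Strong"
    if total >= 4:
        return "Moderate"
    return "Skip"


def score_keyword(business_value, feasibility, traffic, geo):
    """Return (total, tier) for four scores, each on the 1-3 scale."""
    scores = (business_value, feasibility, traffic, geo)
    if any(score not in (1, 2, 3) for score in scores):
        raise ValueError("each dimension must be scored 1, 2, or 3")
    total = sum(scores)
    return total, tier(total)


print(score_keyword(3, 2, 3, 3))  # scores for "Typeform alternative" in the example
```

Rejecting out-of-range scores keeps a data-entry slip from silently inflating a keyword into the Golden tier.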
333
334	Example scoring (form builder SaaS):
335
| Keyword | BV | RF | TP | GEO | Total | Tier |
|---------|----|----|----|----|-------|------|
| Typeform alternative | 3 | 2 | 3 | 3 | 11 | Golden |
| NPS survey template | 3 | 3 | 2 | 2 | 10 | Golden |
| form builder with payment | 3 | 3 | 1 | 3 | 10 | Golden |
| how to create online form | 2 | 2 | 3 | 3 | 10 | Golden |
| best online form builder | 3 | 1 | 2 | 2 | 8 | Strong |
| conditional logic form | 2 | 3 | 1 | 1 | 7 | Strong |
344
345	Content calendar template:
346
```
| Priority | Keyword | Volume | KD | Intent | GEO Score | Page Type | Target URL | Cluster | Publish By |
```
350
351	Monthly cadence:
352	- Month 1: 3-5 pages targeting golden-tier keywords. Pillar pages first, then highest-GEO supports.
353	- Month 2: 3-5 supporting pages. Internal linking between published pages.
354	- Month 3: Optimize Month 1 content with GSC data. Continue publishing supports.
355	- Ongoing: 2-4 new pages/month. Update existing content quarterly. Re-run discovery quarterly.
356
357	Output: Prioritized content plan with 4-dimension scores, tiers, page types, target URLs, clusters, and publishing schedule.
358
359	## Phase 10: Find quick wins in existing data
360
361	> Scripts: Run `scripts/find-quick-wins.py --domain yourdomain.com` to identify striking-distance keywords, low-CTR pages, variant keywords, and GEO citation gaps.
362	> References: See `references/evaluation-and-scoring.md` for quick win identification methodology.
363
364	This phase applies only to sites with existing traffic. If your site is brand new, skip this and return after 2-3 months of content publishing and GSC data collection.
365
366	Quick wins are the fastest path to more traffic — they optimize what is already partially working.
367
368	Quick win type 1 — Striking distance (positions 8-20):
369	GSC > Performance > Queries. Filter positions 8-20. Sort by impressions descending. These keywords are on the edge of page 1. You already rank — Google considers your content relevant. A small improvement pushes you onto page 1, where traffic increases dramatically.
370	Action: improve content depth for the ranking page, add internal links from other pages, optimize the title tag.
371
372	Quick win type 2 — High impressions, low CTR:
373	Filter for CTR below 2-3%, sort by impressions descending. Google shows your page, but people do not click. Your title or meta description is not compelling.
374	Action: rewrite title to be specific with a clear benefit. Rewrite meta description with active voice and value proposition. Check if a SERP feature pushes your result down.
375
376	Quick win type 3 — Variant keywords:
377	Check which keywords your pages rank for that you did not deliberately target. Sometimes a page ranks for a variant keyword with higher volume or better intent than the original target.
378	Action: shift primary target to the higher-value variant. At minimum, add a section covering the variant and include it in the title or an H2.
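
The first two quick win types reduce to filters over exported GSC query rows. The sketch below assumes rows with `query`, `position`, `impressions`, and `ctr` (as a fraction) fields; the `min_impressions` cut-off is a hypothetical parameter, since the text only says "high impressions".

```python
def striking_distance(rows, lo=8, hi=20):
    """Queries ranking at positions 8-20, highest impressions first."""
    hits = [row for row in rows if lo <= row["position"] <= hi]
    return sorted(hits, key=lambda row: row["impressions"], reverse=True)


def low_ctr(rows, max_ctr=0.03, min_impressions=1000):
    """Queries shown often but rarely clicked, highest impressions first."""
    hits = [
        row for row in rows
        if row["impressions"] >= min_impressions and row["ctr"] < max_ctr
    ]
    return sorted(hits, key=lambda row: row["impressions"], reverse=True)


rows = [
    {"query": "invoice template", "position": 9.2, "impressions": 4000, "ctr": 0.011},
    {"query": "recurring billing", "position": 15.0, "impressions": 900, "ctr": 0.04},
    {"query": "online invoicing", "position": 3.1, "impressions": 7000, "ctr": 0.015},
    {"query": "net 30 terms", "position": 25.0, "impressions": 50, "ctr": 0.0},
]
print([row["query"] for row in striking_distance(rows)])
print([row["query"] for row in low_ctr(rows)])
```

A query can land in both lists, as "invoice template" does here. Those are usually the best candidates because a content improvement and a title rewrite compound.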
379
380	GEO quick wins — keywords where you rank but AI does not cite you:
381	For your top-ranking keywords, run `scripts/probe-ai-discovery.py` to check if AI platforms cite your pages. If you rank on page 1 but are not cited by AI, you have a GEO gap. Apply on-page-sgeo and content-sgeo to optimize content structure for AI citation — direct-answer formatting, data density, author attribution, structured data.
382
383	Output: Quick win list with specific actions per keyword, including GEO citation gaps.
384
385	## Available scripts
386
387	Run these scripts to automate discovery tasks. Each outputs JSON. Scripts marked with [browser] use Playwright MCP for live browser data. If Playwright is unavailable, they fall back to WebSearch/WebFetch with reduced output quality. All browser scripts accept `--no-browser` to force the fallback path.
388
| Script | What it does | Run it when |
|--------|-------------|-------------|
| `harvest-autocomplete.py` [browser] | Types seeds + a-z in Google, captures all Autocomplete suggestions | Phase 2, 4: expanding and mining keywords |
| `extract-paa.py` [browser] | Searches Google, expands PAA boxes, captures 20-50+ questions per seed | Phase 2, 4: expanding and mining keywords |
| `scrape-related-searches.py` [browser] | Extracts Related Searches, chains 2 levels deep | Phase 4: mining Google for free ideas |
| `analyze-serp-live.py` [browser] | Flagship: full SERP analysis with organic results, features, AI Overview, screenshot | Phase 7: SERP-based intent classification |
| `competitor-gap-analysis.py` | Compares user domain vs competitors for keyword gaps | Phase 3: competitor analysis |
| `scrape-community-keywords.py` [browser] | Searches Reddit/HN/forums, extracts question titles and pain point language | Phase 5: community listening |
| `probe-ai-discovery.py` | GEO backbone: tests queries on AI platforms, records citations, finds gaps | Phase 1, 3, 6, 10: GEO validation throughout |
| `evaluate-keywords.py` | Enriches raw keywords with volume/KD/GEO scores, filters noise | Phase 6: evaluation and filtering |
| `classify-intent-live.py` [browser] | Live SERP analysis: determines intent + AI-answerable flag per keyword | Phase 7: intent classification |
| `build-topic-clusters.py` | Groups keywords by similarity, identifies pillars, generates cluster map | Phase 8: topic clustering |
| `prioritize-opportunities.py` | 4-dimension scoring (max 12), outputs tiered content plan with schedule | Phase 9: prioritization |
| `find-quick-wins.py` | GSC striking distance + high-impression/low-CTR + GEO citation gaps | Phase 10: quick wins |
403
404	## Quality checklist
405
406	Before delivering the content plan, verify:
407
408	- [ ] All 10 phases completed (or consciously skipped with documented reason)
409	- [ ] Seed keywords validated against both search data AND AI platform probing
410	- [ ] Competitor gap analysis includes both SEO gaps and GEO citation gaps
411	- [ ] Every keyword has volume estimate, KD estimate, intent classification, and GEO score
412	- [ ] Intent classification verified via live SERP test (not just heuristic guessing)
413	- [ ] Keywords grouped into topic clusters with pillar/support assignments
414	- [ ] Content plan uses 4-dimension scoring (max 12) with GEO as a real dimension, not a token checkbox
415	- [ ] Quick wins identified (if existing site with GSC data)
416	- [ ] Output is a prioritized content plan, not a raw keyword list
417	- [ ] Plan feeds clearly into content-sgeo (what to write) and on-page-sgeo (how to optimize)
418
419	## Common mistakes to avoid
420
421	1. Targeting keywords that are too broad. "Software" is not a keyword — it is a category. "Invoice software for freelancers" is a keyword.
422	2. Ignoring search intent. If every SERP result is a blog post and you create a product page, you will not rank. Match the content type Google rewards.
423	3. Obsessing over search volume. A keyword with 50 monthly searches and perfect buyer intent can be worth more than a keyword with 10,000 searches and vague informational intent.
424	4. Never checking the actual SERP. Tools provide data. The SERP provides truth. Always manually search your most important target keywords.
425	5. Targeting the same keyword with multiple pages (cannibalization). Each keyword or keyword cluster maps to exactly one page. If two pages compete, Google may rank neither.
426	6. Skipping competitor analysis. This is the single highest-ROI activity. Other sites have validated which keywords drive traffic. Use their work.
427	7. Creating content without a target keyword. Every page needs a clear primary keyword. If you cannot identify one, the page lacks strategic purpose.
428	8. Giving up too early. SEO results take 3-6 months. Publish in January, check rankings in July. Not February.
429	9. Only looking at your own data. GSC shows keywords you already rank for. The biggest opportunities are keywords you have zero presence for — only competitor analysis and expansion tools reveal those.
430	10. Doing keyword research once and never again. Search behavior evolves. Competitors publish new content. New questions emerge. Re-run discovery quarterly.
431	11. Ignoring GEO opportunity in prioritization. A keyword where AI cites weak sources is a faster win than one where AI cites authoritative sources you cannot displace. GEO opportunity is a real competitive dimension, not a nice-to-have.
432	12. Not validating seeds against AI platforms. What AI recommends in your space is a leading indicator of search trends. If AI platforms discuss a topic, searchers are asking about it too — often before traditional keyword tools register the volume.
433
