engineeringdata

Prompt Engineering Guide

Design, test, and optimize LLM prompts systematically — with evaluation frameworks, chain-of-thought patterns, output formatting, and iteration methodology for reliable AI outputs.

promptsLLMevaluationchain-of-thoughtAIoptimization

Works well with agents

AI Engineer Agent ML Engineer Agent Prompt Engineer Agent

Works well with skills

Experiment Design Test Plan Writing

prompt-engineering-guide/

customer-support-classifier.md

Markdown

1	# Prompt Engineering — Support Ticket Classifier, Helios SaaS
2
3	## Task Definition
4
5	Goal: Classify incoming support tickets into one of 6 categories for automatic queue routing.
6	Model: Claude 3.5 Sonnet. Volume: ~800 tickets/day, budget $0.01/ticket max.
7
8	### System Prompt (v4 — final)
9
10	```
11	You are a support ticket classifier for Helios, a project management SaaS product.
12	Given a support ticket (subject and body), classify it into exactly one category.
13
14	Categories:
15	- billing: Payments, invoices, subscriptions, refunds, pricing
16	- bug: Something broken, errors, crashes, unexpected behavior
17	- feature_request: Requests for new functionality
18	- account: Login, password resets, SSO, permissions
19	- integration: API questions, webhooks, third-party connections
20	- how_to: Usage questions, workflow guidance
21
22	Rules:
23	- If a ticket spans two categories, choose the one requiring action first.
24	- Broken integrations are "bug" not "integration."
25	- If uncertain, respond with "unknown".
26
27	Respond with JSON only:
28	{"category": "string", "confidence": "high \| medium \| low", "reasoning": "1 sentence"}
29	```
30
31	## Few-Shot Examples
32
33	```
34	Ticket: "Can't export CSV anymore" / "Export button throws 500 error since yesterday."
35	Output: {"category": "bug", "confidence": "high", "reasoning": "Specific error on previously working feature."}
36
37	Ticket: "Add Gantt chart view" / "Would love a timeline view for projects."
38	Output: {"category": "feature_request", "confidence": "high", "reasoning": "Requesting new functionality."}
39
40	Ticket: "Slack notifications stopped" / "Integration was fine until last week, nothing changed."
41	Output: {"category": "bug", "confidence": "high", "reasoning": "Previously working integration is broken."}
42
43	Ticket: "How do I set up recurring tasks?" / "I want tasks to recreate every Monday."
44	Output: {"category": "how_to", "confidence": "high", "reasoning": "Asking how to use an existing feature."}
45	```
46
47	## Evaluation Framework
48
49	Test set: 50 manually labeled tickets, stratified by category.
50
51	\| Metric \| Method \| Threshold \|
52	\|--------\|--------\|-----------\|
53	\| Format compliance \| JSON schema validation \| 100% \|
54	\| Accuracy \| Exact match vs. human labels \| > 90% \|
55	\| High-confidence accuracy \| Accuracy on "high" confidence only \| > 95% \|
56	\| Latency (p95) \| API response time \| < 2s \|
57	\| Cost per ticket \| Token count * price \| < $0.01 \|
58
59	## Iteration Log
60
61	\| Version \| Change \| Accuracy \| Key Issue \|
62	\|---------\|--------\|----------\|-----------\|
63	\| v1 \| Category list only, no examples \| 74% \| Confused integration/bug (6 errors), inconsistent JSON (3 errors) \|
64	\| v2 \| Added 3 few-shot examples \| 84% \| how_to/feature_request confusion on "Can I do X?" tickets \|
65	\| v3 \| Added how_to example + broken-integration rule \| 90% \| 2 billing/account edge cases, 2 ambiguous bug/feature \|
66	\| v4 \| Added Slack edge case + confidence field \| 92% \| 4 remaining errors are genuine ambiguities (60% human agreement) \|
67
68	v4 accepted. High-confidence accuracy: 96%. Format compliance: 100%. Avg cost: $0.003/ticket. Avg latency: 680ms.
69
70	## Production Configuration
71
72	- Temperature: 0 (deterministic)
73	- Max tokens: 150
74	- Fallback: "low" confidence tickets route to human triage
75	- Monitoring: Alert if any category exceeds 2x its 30-day average volume
76	- Re-evaluation: Monthly with 10 fresh test tickets added per cycle
77