Advanced Prompting Techniques
Advanced techniques steer model behavior through the prompt alone, without touching weights or architecture. This chapter covers Chain-of-Thought reasoning and Few-Shot learning, both supported by academic research and deployed in production systems.
Chain-of-Thought Prompting
CoT leverages the transformer's sequential generation: each output token requires another forward pass, so forcing the model to produce intermediate reasoning steps buys it more computation, more effective "thinking time", before it commits to an answer. Research from Google (Wei et al., 2022) reports accuracy improvements of 30-50% on arithmetic and logical reasoning tasks.
Why Standard Prompts Fail on Multi-Step Tasks
Problem: Calculate ROI for a marketing campaign with multiple channels, different attribution models, and time-delayed conversions. With a standard prompt, the model emits a final number without showing its work; if the number is wrong, debugging is impossible. CoT forces the reasoning into the open.
Standard Prompt
"Calculate the marketing ROI. Channel A spent $10k, generated 50 conversions. Channel B spent $15k, generated 80 conversions. Avg customer value: $500. Attribution: last-touch."
Output: "ROI is 150%" - No intermediate steps. Cannot verify if attribution model was applied correctly.
CoT Prompt
Same input + "Calculate step-by-step: 1) Revenue per channel 2) Cost per channel 3) Profit per channel 4) ROI formula 5) Final answer."
Output includes: "Channel A: 50 conv * $500 = $25k revenue. Cost: $10k. Profit: $15k..." Full audit trail.
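To make the audit trail concrete, here is the same calculation as a short Python check (the figures come from the example above; last-touch attribution credits each conversion to exactly one channel):

```python
# Verify the CoT arithmetic for the two-channel example above.
AVG_CUSTOMER_VALUE = 500  # dollars per conversion, last-touch attribution

channels = {"A": (10_000, 50), "B": (15_000, 80)}  # name: (spend, conversions)

total_spend = total_profit = 0
for name, (spend, conversions) in channels.items():
    revenue = conversions * AVG_CUSTOMER_VALUE
    profit = revenue - spend
    total_spend += spend
    total_profit += profit
    print(f"Channel {name}: revenue ${revenue:,}, profit ${profit:,}, "
          f"ROI {profit / spend:.0%}")

print(f"Blended ROI: {total_profit / total_spend:.0%}")
# Channel A: revenue $25,000, profit $15,000, ROI 150%
# Channel B: revenue $40,000, profit $25,000, ROI 167%
# Blended ROI: 160%
```

Note that the blended ROI is 160%, not 150%: the standard prompt's single number matches only Channel A. An opaque answer can silently be a per-channel figure, which is exactly the failure mode the step-by-step trail exposes.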
CoT Implementation Patterns
- Zero-shot CoT: Add "Think step-by-step" or "Show your work." Effective for straightforward multi-step problems. No examples needed.
- Structured CoT: Provide explicit steps: "1) Extract variables 2) Apply formula 3) Verify units 4) State answer." Use for domain-specific reasoning (financial analysis, scientific calculations).
- Self-consistency: Generate 3-5 CoT paths and select the most common answer. This reduces errors from any single flawed reasoning path and is used in production for high-stakes decisions (a minimal sketch follows this list).
- Verification step: End with "Review your reasoning for errors." Model catches 20-30% of its own mistakes.
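A minimal self-consistency loop, sketched in Python. `call_model` stands in for whatever LLM client you use, and the prompt is assumed to instruct the model to end with an `Answer: <value>` line; both conventions are assumptions of this sketch, not any provider's API:

```python
from collections import Counter

def call_model(prompt: str, temperature: float = 0.8) -> str:
    """Stand-in for your LLM client. Sampling temperature > 0 is what
    makes the reasoning paths diverge from one another."""
    raise NotImplementedError("wire up your provider's API here")

def extract_answer(completion: str, marker: str = "Answer:") -> str:
    # Keep only what follows the final answer marker.
    return completion.rsplit(marker, 1)[-1].strip()

def self_consistent_answer(prompt: str, n_paths: int = 5) -> str:
    # Sample several independent CoT completions, then majority-vote.
    answers = [extract_answer(call_model(prompt)) for _ in range(n_paths)]
    return Counter(answers).most_common(1)[0][0]
```

The nonzero temperature matters: with greedy decoding, all paths collapse into one and the vote is meaningless.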
Few-Shot Learning
Few-shot learning supplies task-specific examples directly in the prompt. The model's output distribution shifts to match the examples without any parameter updates. This is in-context learning: the behavior change lasts for that session only. Research suggests 3-5 high-quality examples can match fine-tuned model performance on narrow tasks.
Use Case: Classification with Edge Cases
Task: Categorize support tickets into Bug/Feature/Billing. The challenge is ambiguous cases: is "Feature doesn't work as expected" a bug report or a feature request? Instructions alone can't cover every edge case; examples demonstrate the boundary decisions.
Few-Shot Implementation
"Categorize support tickets: Bug, Feature, Billing. Examples:
Input: "Export crashes with 500 error"
Category: Bug
Reasoning: System error, existing feature broken
Input: "Salesforce integration would be useful"
Category: Feature
Reasoning: New capability request
Input: "Dashboard doesn't show data the way I expected"
Category: Feature
Reasoning: Not a bug, user wants different behavior
Note the third example: it demonstrates the boundary between bugs and features.
Including the reasoning lines increases accuracy by 15-20% (internal testing). The model learns not just the what, but the why. This is few-shot CoT: the two techniques combined.
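In code, it helps to keep the examples as data and assemble the prompt at call time, so the example set can grow without touching the template. A minimal sketch, with a record layout and function names that are assumptions of this sketch rather than any library convention:

```python
# (text, category, reasoning) triples; the third entry carries the "why".
EXAMPLES = [
    ("Export crashes with 500 error", "Bug",
     "System error, existing feature broken"),
    ("Salesforce integration would be useful", "Feature",
     "New capability request"),
    ("Dashboard doesn't show data the way I expected", "Feature",
     "Not a bug, user wants different behavior"),
]

def build_prompt(ticket: str) -> str:
    lines = ["Categorize support tickets: Bug, Feature, Billing. Examples:", ""]
    for text, category, reasoning in EXAMPLES:
        # Identical delimiters on every example: format consistency matters.
        lines += [f"Input: {text}", f"Category: {category}",
                  f"Reasoning: {reasoning}", ""]
    lines += [f"Input: {ticket}", "Category:"]
    return "\n".join(lines)
```

Ending the prompt with a bare `Category:` cues the model to continue the exact pattern the examples established.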
Few-Shot Engineering
- Example selection: Choose edge cases over obvious ones. Model already handles "Invoice question -> Billing." Show ambiguous cases that define boundaries.
- Example ordering: The last example has the strongest influence (a recency effect). Place the example most similar to the expected input last.
- Format consistency: Exact delimiter consistency matters. "Input:" vs "Input -" changes pattern matching. Use same structure across all examples.
- Optimal count: 3-5 examples for most tasks. More examples improve accuracy with diminishing returns. Context window limits apply (examples use tokens).
- Dynamic few-shot: For production, retrieve the stored examples most similar to each incoming input and build the prompt on the fly. This is retrieval-augmented prompting; see the sketch after this list.
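Here is a sketch of that retrieval step. Cosine-similarity ranking is the standard approach, but `embed` is a placeholder for whatever sentence-embedding model you already run, and the data layout is this sketch's assumption:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    raise NotImplementedError("wire up your embedding model here")

def retrieve_examples(query: str, example_bank: list, k: int = 3) -> list:
    """example_bank: list of (example, embedding) pairs, embeddings precomputed."""
    q = embed(query)
    scored = [
        (float(q @ e) / (np.linalg.norm(q) * np.linalg.norm(e)), example)
        for example, e in example_bank  # cosine similarity per stored example
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Most similar example goes LAST in the prompt (recency effect above).
    return [example for _, example in reversed(scored[:k])]
```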
Technique Composition
CoT and few-shot are composable. Few-shot CoT, providing examples that include reasoning, outperforms either technique alone; the paper "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models" (Wei et al., 2022) demonstrates this empirically.
Other advanced combinations: Self-consistency + Few-shot (generate multiple few-shot reasoning paths), Structured CoT + Examples (provide reasoning template + filled examples). Choose based on task complexity and accuracy requirements.
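Composition can be literal. Reusing the hypothetical `build_prompt` and `call_model` from the earlier sketches, self-consistency + few-shot comes down to a few lines:

```python
from collections import Counter

def classify_ticket(ticket: str, n_paths: int = 5) -> str:
    # Few-shot prompt + several sampled completions + majority vote.
    completions = [call_model(build_prompt(ticket)) for _ in range(n_paths)]
    votes = Counter(
        (c.strip().splitlines() or [""])[0]  # category is the first line
        for c in completions
    )
    return votes.most_common(1)[0][0]
```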
Production Considerations
Cost: CoT increases token usage 2-3x (both prompt and completion). For high-volume applications, evaluate cost vs accuracy tradeoff.
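A back-of-envelope estimate makes the tradeoff concrete; the price, volume, and token counts below are placeholders to replace with your provider's rates and your own traffic:

```python
# Rough monthly cost: plain prompting vs CoT at ~2.5x token usage.
PRICE_PER_1K_TOKENS = 0.01   # placeholder rate; check your provider's pricing
REQUESTS_PER_MONTH = 1_000_000
TOKENS_PER_REQUEST = 400     # prompt + completion, plain prompt (assumed)
COT_MULTIPLIER = 2.5         # mid-range of the 2-3x figure above

plain = REQUESTS_PER_MONTH * TOKENS_PER_REQUEST / 1000 * PRICE_PER_1K_TOKENS
cot = plain * COT_MULTIPLIER
print(f"plain: ${plain:,.0f}/mo  CoT: ${cot:,.0f}/mo  delta: ${cot - plain:,.0f}/mo")
# plain: $4,000/mo  CoT: $10,000/mo  delta: $6,000/mo
```

If the accuracy gain is worth more than the monthly delta, CoT pays for itself; if not, reserve it for the high-stakes subset of traffic.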
Latency: Longer completions mean higher latency. For real-time applications (<200ms), consider smaller models with few-shot instead of large models with CoT.
Reliability: CoT makes errors visible and debuggable. For high-stakes decisions (financial, legal, medical), the audit trail justifies cost.
Next: Domain Applications
The next chapters apply these techniques to specific domains: content creation, business analysis, and education. Each includes production-tested workflows and troubleshooting guides.
Chapter 4: Content Creation