Prompt Engineering
Prompt engineering is the art and science of crafting inputs (prompts) to foundation models to elicit the most accurate, relevant, and useful outputs. It is a critical skill for the AIF-C01 exam.
Exam Tip: The exam will test your understanding of different prompting techniques and when to use each one. Know the differences between zero-shot, few-shot, and chain-of-thought prompting.
Zero-Shot Prompting
- What: Asking the model to perform a task without providing any examples
- How: Simply describe the task in the prompt and let the model use its pre-trained knowledge
- When to Use:
- The task is straightforward and well-understood
- You don't have examples to provide
- You're testing the model's baseline capability
- Strengths:
- Simplest approach — no data preparation needed
- Works well for common, well-defined tasks
- Limitations:
- May not produce desired format or style
- Less accurate for domain-specific or nuanced tasks
Example:
Prompt: "Classify the following text as positive, negative, or neutral: 'The product arrived on time and works great!'"
Output: "Positive"
Exam Tip: Zero-shot = no examples given. If the question asks about prompting without providing reference examples, it's zero-shot.
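The pattern above can be sketched in a few lines. This is a minimal illustration of building a zero-shot prompt string; `build_zero_shot_prompt` is a hypothetical helper, not an AWS API.

```python
# Zero-shot: describe the task directly, provide no examples, and let
# the model rely on its pre-trained knowledge.
# build_zero_shot_prompt is a hypothetical helper for illustration only.

def build_zero_shot_prompt(task: str, text: str) -> str:
    """Combine a task description and the input text into one prompt."""
    return f"{task}\n\nText: {text}"

prompt = build_zero_shot_prompt(
    "Classify the following text as positive, negative, or neutral.",
    "The product arrived on time and works great!",
)
print(prompt)
```

Note that the prompt contains only the task description and the input, with no labeled examples.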
Few-Shot Prompting (One-Shot, Single-Shot)
- What: Providing the model with one or more examples of the desired input-output behavior before asking it to perform the task
- Variants:
- One-Shot (Single-Shot): Provide exactly one example
- Few-Shot: Provide 2-6 examples (typically 3-5)
- How: Include example input-output pairs in the prompt, followed by the actual query
- When to Use:
- You need the model to follow a specific format or style
- The task is domain-specific or uncommon
- Zero-shot isn't producing acceptable results
- Strengths:
- Significantly improves accuracy for specific tasks
- Teaches the model the desired format without fine-tuning
- Quick to implement — no training required
- Limitations:
- Uses more tokens (costs more, uses context window)
- Quality depends on example selection
- May not generalize to edge cases
Example (Few-Shot):
Prompt:
"Classify the sentiment of each review:
Review: 'Great battery life!' → Positive
Review: 'Screen cracked after one day.' → Negative
Review: 'It's okay, nothing special.' → Neutral
Review: 'The camera quality is amazing!' → "
Output: "Positive"
Exam Tip: Few-shot = examples provided in the prompt. If the question mentions providing examples to guide model behavior without training, it's few-shot prompting. One-shot = 1 example, few-shot = multiple examples.
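The few-shot example above can be assembled programmatically from labeled pairs. This is a minimal sketch; the arrow separator and example set are illustrative formatting choices, not a required convention.

```python
# Few-shot: prepend labeled input-output pairs so the model infers the
# desired format and task before completing the final query.

EXAMPLES = [
    ("Great battery life!", "Positive"),
    ("Screen cracked after one day.", "Negative"),
    ("It's okay, nothing special.", "Neutral"),
]

def build_few_shot_prompt(examples, query: str) -> str:
    lines = ["Classify the sentiment of each review:"]
    for review, label in examples:
        lines.append(f"Review: '{review}' → {label}")
    lines.append(f"Review: '{query}' → ")  # model completes the label
    return "\n".join(lines)

prompt = build_few_shot_prompt(EXAMPLES, "The camera quality is amazing!")
print(prompt)
```

Passing a one-element example list to the same function would make this a one-shot prompt.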
Chain of Thought (CoT)
- What: Prompting the model to show its reasoning step-by-step before arriving at a final answer
- How: Include instructions like "Think step by step" or provide examples with explicit reasoning chains
- When to Use:
- Complex reasoning tasks (math, logic, multi-step problems)
- Tasks requiring analysis before a conclusion
- When you need to verify the model's reasoning process
- Strengths:
- Dramatically improves accuracy on complex reasoning tasks
- Makes the model's logic transparent and verifiable
- Reduces errors in multi-step problems
- Limitations:
- Uses more tokens (longer responses)
- Slower response time
- May over-explain simple tasks
Example:
Prompt: "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now? Think step by step."
Output:
"Step 1: Roger starts with 5 tennis balls.
Step 2: He buys 2 cans of tennis balls.
Step 3: Each can has 3 balls, so 2 × 3 = 6 new balls.
Step 4: Total = 5 + 6 = 11 tennis balls.
Answer: 11"
Exam Tip: Chain of thought = "show your work". If the question involves complex reasoning, math, or multi-step logic, CoT prompting improves accuracy. The trigger phrase is "think step by step."
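Applying the trigger phrase can be as simple as a wrapper that appends the reasoning instruction. A minimal sketch, with one common phrasing (exact wording varies):

```python
# Chain of thought: append an explicit reasoning instruction so the
# model works through intermediate steps before answering.

def with_chain_of_thought(question: str) -> str:
    return f"{question}\nThink step by step, then state the final answer."

prompt = with_chain_of_thought(
    "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?"
)

# The arithmetic the model's steps should reproduce: 5 + 2 * 3 = 11
expected_answer = 5 + 2 * 3
```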
Tree of Thoughts
- What: An advanced prompting technique where the model explores multiple reasoning paths simultaneously, evaluates each path, and selects the best one
- How: The model generates multiple possible next steps, evaluates them, and backtracks if a path seems wrong — like a tree search
- When to Use:
- Highly complex problems with multiple possible approaches
- Strategic planning and decision-making tasks
- Tasks where the first approach may not be optimal
- Strengths:
- Handles complex problems better than linear CoT
- Can recover from wrong initial reasoning paths
- More thorough exploration of solution space
- Limitations:
- Very token-intensive (expensive)
- Slower response time
- Overkill for simple tasks
- Relationship to CoT: Tree of Thoughts extends CoT by exploring multiple reasoning branches rather than a single linear chain
Exam Tip: Tree of Thoughts = multiple reasoning paths explored and evaluated. It's an advanced version of CoT. If the question mentions complex strategic problems requiring exploration of alternatives, think Tree of Thoughts.
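The branch-evaluate-select loop can be illustrated with a toy search. Real Tree of Thoughts uses the model itself to propose and score candidate "thoughts"; here, numeric steps toward a target and a distance heuristic stand in for both, purely to show the control flow.

```python
# Toy Tree of Thoughts loop: expand several candidate next steps from
# each kept path, score every candidate, prune to the best few (the
# "beam"), and repeat. The scorer stands in for model self-evaluation.

TARGET = 10

def generate_candidates(path):
    """Propose possible next steps from the current partial solution."""
    current = path[-1]
    return [path + [current + step] for step in (1, 2, 3)]

def score(path):
    """Heuristic value of a partial path (closer to TARGET is better)."""
    return -abs(TARGET - path[-1])

def tree_of_thoughts(start, depth=4, beam_width=2):
    frontier = [[start]]
    for _ in range(depth):
        # Branch: expand every kept path, then prune to the best few.
        candidates = [c for path in frontier for c in generate_candidates(path)]
        frontier = sorted(candidates, key=score, reverse=True)[:beam_width]
    return max(frontier, key=score)

best = tree_of_thoughts(0)
print(best)
```

Unlike linear CoT, several partial paths survive each round, so a promising early branch that turns bad can be dropped in favor of an alternative.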
Retrieval-Augmented Generation (RAG)
- What: A prompting/architecture pattern that retrieves relevant information from external sources and adds it to the prompt before generation
- How:
- Convert user query to embedding
- Search vector database for relevant documents
- Add retrieved documents to the prompt as context
- Model generates response grounded in retrieved information
- When to Use:
- Model needs access to proprietary/current information
- You want to reduce hallucinations
- You need citations/sources for responses
- Knowledge changes frequently
- Strengths:
- Grounds responses in factual data
- No model re-training needed when information changes
- Can provide source citations
- Dramatically reduces hallucinations
- Limitations:
- Quality depends on retrieval accuracy
- Adds latency (retrieval step)
- Requires maintaining a knowledge base
Exam Tip: RAG = retrieve then generate. It's both a prompting pattern and an architectural pattern. In the context of prompt engineering, RAG modifies what goes INTO the prompt by adding retrieved context.
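The retrieve-then-generate flow above can be sketched end to end. Real systems use an embedding model and a vector database; here, word-overlap scoring stands in for embedding similarity, and the document set is invented, purely for illustration.

```python
# Toy RAG pipeline: "retrieve" the most relevant document, then build a
# prompt that grounds the model's answer in that retrieved context.
import re

DOCUMENTS = [
    "Our return policy allows returns within 30 days of purchase.",
    "Standard shipping takes 3-5 business days.",
    "All laptops include a one-year hardware warranty.",
]

def tokens(text: str) -> set:
    """Lowercased word set (stand-in for an embedding)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, docs, top_k: int = 1):
    """Rank docs by word overlap (stand-in for vector similarity search)."""
    q = tokens(query)
    return sorted(docs, key=lambda d: len(q & tokens(d)), reverse=True)[:top_k]

def build_rag_prompt(query: str, docs) -> str:
    context = "\n".join(retrieve(query, docs))
    return (
        "Answer only using the context below.\n"
        f"Context: {context}\n"
        f"Question: {query}"
    )

prompt = build_rag_prompt("How long is the hardware warranty on laptops?", DOCUMENTS)
print(prompt)
```

The model then generates from this augmented prompt, which is why updating the document store, not the model, is enough when knowledge changes.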
Prompt Templates
- What: Pre-defined, reusable prompt structures with placeholder variables that get filled in at runtime
- How: Define a template with variables (e.g., {{context}}, {{question}}) and fill them programmatically
- When to Use:
- Standardize prompt format across an application
- Ensure consistency in prompt structure
- Simplify prompt management at scale
- Benefits:
- Consistency across all invocations
- Easier to version and manage
- Separation of prompt structure from content
- A/B testing different templates
Example:
Template:
"You are a helpful {{role}} assistant.
Context: {{context}}
Question: {{question}}
Answer in {{format}} format."
Filled:
"You are a helpful medical assistant.
Context: Patient reports chest pain and shortness of breath...
Question: What are the possible conditions?
Answer in bullet point format."
Exam Tip: Prompt templates = reusable prompts with variables. They're about operationalizing prompt engineering for production applications.
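Filling the template above at runtime can be sketched as follows. The `{{variable}}` syntax mirrors the example; `fill_template` is a hypothetical helper, not a specific library's API.

```python
# Prompt template: a reusable structure with {{placeholders}} that are
# replaced programmatically at runtime.

TEMPLATE = (
    "You are a helpful {{role}} assistant.\n"
    "Context: {{context}}\n"
    "Question: {{question}}\n"
    "Answer in {{format}} format."
)

def fill_template(template: str, variables: dict) -> str:
    """Substitute each {{name}} placeholder with its value."""
    for name, value in variables.items():
        template = template.replace("{{" + name + "}}", value)
    return template

prompt = fill_template(TEMPLATE, {
    "role": "medical",
    "context": "Patient reports chest pain and shortness of breath.",
    "question": "What are the possible conditions?",
    "format": "bullet point",
})
print(prompt)
```

Keeping `TEMPLATE` in one place is what makes versioning and A/B testing different templates practical.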
Best Practices
General Best Practices
- Be Specific: Clearly state what you want — vague prompts produce vague outputs
- Provide Context: Give the model relevant background information
- Specify Format: Tell the model how you want the output structured (JSON, bullets, table)
- Set Constraints: Define boundaries (word count, language, tone)
- Use System Prompts: Define the model's role and behavior ("You are a helpful assistant...")
- Iterate: Start simple, test, and refine the prompt based on outputs
Reducing Hallucinations
- Use RAG to ground responses in factual data
- Include instructions like "Only answer based on the provided context"
- Add "If you don't know, say 'I don't know'" to the prompt
- Use Amazon Bedrock Guardrails contextual grounding checks
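The prompt-level instructions above can be baked into a reusable wrapper. A minimal sketch; the exact wording of the instructions is illustrative.

```python
# Anti-hallucination wrapper: constrain the model to the provided
# context and give it an explicit "I don't know" escape hatch.

def grounded_prompt(context: str, question: str) -> str:
    return (
        "Only answer based on the provided context. "
        "If the answer is not in the context, say 'I don't know.'\n"
        f"Context: {context}\n"
        f"Question: {question}"
    )

prompt = grounded_prompt(
    "The store opens at 9 AM on weekdays.",
    "What time does the store open on Saturday?",
)
print(prompt)
```

Here the context does not cover Saturday, so a well-behaved model should answer "I don't know" rather than guess.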
Improving Output Quality
- Temperature: Lower values (0.0-0.3) for factual/deterministic tasks; higher values (0.7-1.0) for creative tasks
- Top-P: Control the diversity of token selection
- Max Tokens: Set appropriate limits for response length
- Stop Sequences: Define where the model should stop generating
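The parameters above might be set as follows. The key names follow the style of Amazon Bedrock's Converse API `inferenceConfig` (exact names vary by model provider when using InvokeModel), and the stop sequence is an illustrative example.

```python
# Inference parameters tuned for a factual/deterministic task, using
# Converse API-style field names (provider-specific APIs may differ).

factual_config = {
    "temperature": 0.2,               # low randomness for factual output
    "topP": 0.9,                      # nucleus sampling cutoff
    "maxTokens": 512,                 # cap response length
    "stopSequences": ["\n\nHuman:"],  # stop before a new turn begins
}

# Creative tasks reuse the same config with higher temperature.
creative_config = {**factual_config, "temperature": 0.9}
```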
Prompt Structure
[System/Role Definition]
[Context/Background Information]
[Task Instructions]
[Examples (if few-shot)]
[Input Data]
[Output Format Specification]
[Constraints/Rules]
Exam Tip: The exam may ask which inference parameter to adjust for a specific goal:
- More creative output → Increase temperature
- More factual/deterministic output → Decrease temperature
- Longer responses → Increase max tokens
- More diverse word choice → Increase top-p