Amazon Bedrock
Amazon Bedrock is a fully managed service that makes high-performing foundation models (FMs) from leading AI companies available through a single API. It provides a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI built in.
Exam Tip: Amazon Bedrock is the central service for the AIF-C01 exam. Nearly every generative AI question will involve Bedrock in some capacity. Know its features, integrations, and limitations deeply.
Foundation Models (FMs)
Foundation models are large-scale AI models trained on vast amounts of data that can be adapted to a wide range of downstream tasks. Amazon Bedrock provides access to multiple FMs from different providers.
Amazon Titan
- Provider: Amazon (first-party)
- Capabilities: Text generation, embeddings, image generation, multimodal
- Key Models:
- Titan Text — General-purpose text generation, summarization, Q&A
- Titan Embeddings — Convert text to vector representations for search and RAG
- Titan Image Generator — Text-to-image generation with watermarking
- Titan Multimodal Embeddings — Embeddings for both text and images
- Key Differentiator: Built by Amazon, fully integrated with AWS services, supports watermark detection for AI-generated images
- Use Cases: Enterprise text generation, semantic search, content creation
Exam Tip: Titan Embeddings is the go-to model for building Knowledge Bases and RAG workflows in Bedrock. If the question mentions vector embeddings or semantic search, think Titan Embeddings.
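The embedding workflow above can be sketched as a request/response round trip. This is a minimal sketch, not a definitive implementation: the model ID and the `inputText`/`embedding` field names are assumptions drawn from the Titan Embeddings documentation, so verify them against the current API reference.

```python
import json

# Placeholder model ID for Titan Text Embeddings (verify the current version).
MODEL_ID = "amazon.titan-embed-text-v2:0"

def build_embedding_request(text: str) -> str:
    """Build the JSON body for an InvokeModel call to Titan Embeddings."""
    return json.dumps({"inputText": text})

def parse_embedding_response(body: str) -> list:
    """Extract the embedding vector from a Titan Embeddings response body."""
    return json.loads(body)["embedding"]

# The actual boto3 call would look roughly like:
#   client = boto3.client("bedrock-runtime")
#   resp = client.invoke_model(modelId=MODEL_ID,
#                              body=build_embedding_request("hello"))
#   vector = parse_embedding_response(resp["body"].read())

# Parsing a sample response body (values are illustrative):
sample_response = json.dumps({"embedding": [0.1, -0.2, 0.3],
                              "inputTextTokenCount": 1})
vector = parse_embedding_response(sample_response)
```

The returned vector is what gets stored in a vector database for semantic search and RAG.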
Claude (Anthropic)
- Provider: Anthropic
- Capabilities: Text generation, analysis, coding, math, reasoning, vision
- Key Models: Claude 3.5 Sonnet, Claude 3 Opus, Claude 3 Haiku
- Key Differentiator: Known for safety, helpfulness, and honesty; excels at complex reasoning, long context windows (up to 200K tokens), and following nuanced instructions
- Strengths:
- Long document analysis (200K context window)
- Complex multi-step reasoning
- Code generation and review
- Vision/multimodal understanding
- Use Cases: Document analysis, code generation, customer service, research
Exam Tip: If the question mentions long documents, complex reasoning, or safe AI outputs, Claude is likely the answer.
Llama (Meta)
- Provider: Meta
- Capabilities: Text generation, coding, reasoning
- Key Models: Llama 3.1, Llama 3.2
- Key Differentiator: Open-source foundation; strong performance on benchmarks; available for fine-tuning
- Strengths:
- Open-weight model (transparent)
- Strong coding capabilities
- Community-driven improvements
- Available for customization
- Use Cases: Code generation, text generation, chatbots, research
Exam Tip: If the question mentions open-source or open-weight models, think Llama.
Stable Diffusion (Stability AI)
- Provider: Stability AI
- Capabilities: Image generation, image editing, image-to-image
- Key Differentiator: Leading text-to-image model; supports inpainting, outpainting, and style transfer
- Strengths:
- High-quality image generation
- Fine-grained control over image output
- Supports negative prompts for exclusion
- Use Cases: Marketing content, product visualization, creative design, media
Exam Tip: If the question is about image generation from text prompts, the answer is Stable Diffusion or Titan Image Generator; Stable Diffusion is the third-party option.
Amazon Nova
- Provider: Amazon (first-party)
- Capabilities: Text, image, and video generation; multimodal understanding
- Key Models:
- Nova Micro — Text-only, lowest latency, most cost-effective
- Nova Lite — Multimodal (text, image, video input), cost-effective
- Nova Pro — Best balance of accuracy, speed, and cost for a wide range of tasks
- Nova Canvas — Image generation
- Nova Reel — Video generation
- Key Differentiator: Amazon's latest generation of FMs; optimized for speed, cost, and accuracy; supports video generation
- Strengths:
- State-of-the-art multimodal capabilities
- Cost-effective across model tiers
- Native video generation (Nova Reel)
- Built-in content safety
Exam Tip: Amazon Nova is the newest Amazon FM family. Know the tiering: Micro (text-only, cheapest) → Lite (multimodal, budget) → Pro (balanced). Nova Canvas = images, Nova Reel = videos.
Model Selection and Evaluation
Choosing the right FM depends on your use case, requirements, and constraints.
Selection Criteria
| Criteria | Considerations |
|---|---|
| Task Type | Text generation, summarization, code, image, embeddings, multimodal |
| Performance | Accuracy, quality of output, reasoning capability |
| Latency | Response time requirements (real-time vs. batch) |
| Cost | Per-token pricing, throughput requirements |
| Context Window | Maximum input length (tokens) the model can process |
| Language Support | Multilingual capabilities needed |
| Customization | Need for fine-tuning or continued pre-training |
| Safety | Content filtering, toxicity prevention requirements |
| Compliance | Data residency, regulatory requirements |
Model Evaluation in Bedrock
- Automatic Evaluation — Use built-in metrics to evaluate model quality:
- Accuracy — How correct the model's outputs are
- Robustness — How consistent outputs are across similar inputs
- Toxicity — How well the model avoids harmful content
- Human Evaluation — Set up human review workflows to assess quality
- Model Evaluation Jobs — Run evaluation jobs to compare models side-by-side against your specific use case
- Benchmarks — Compare models using standard benchmarks (MMLU, HellaSwag, etc.)
Exam Tip: Bedrock's Model Evaluation lets you compare FMs for YOUR specific task. Use automatic evaluation for scalable testing, human evaluation for subjective quality assessment.
Model Customization
Amazon Bedrock allows you to customize foundation models with your own data to improve performance for your specific use case.
Fine-Tuning (Supervised)
- What: Train the model on labeled examples (input-output pairs) to improve performance on specific tasks
- How: Provide a training dataset of prompt-completion pairs in JSONL format
- When to Use:
- You have a specific task with consistent format
- You have labeled training data
- You want to improve quality for a narrow use case
- Data Requirements: Labeled dataset with examples of desired behavior
- Supported Models: Select Titan, Llama, and other models
- Result: A custom model version stored in your account
Exam Tip: Fine-tuning uses labeled data (input → expected output pairs). If the question says "the company has labeled training data" and wants better task-specific performance, fine-tuning is the answer.
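A fine-tuning dataset is just a JSONL file of prompt-completion pairs, one JSON object per line. The sketch below assumes the `prompt`/`completion` field names used by Titan text models; confirm the exact schema for your chosen model before submitting a customization job.

```python
import json

# Hypothetical labeled examples for a sentiment-classification fine-tune.
examples = [
    {"prompt": "Classify sentiment: 'Great product!'", "completion": "positive"},
    {"prompt": "Classify sentiment: 'Arrived broken.'", "completion": "negative"},
]

def to_jsonl(records) -> str:
    """Serialize records as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(r) for r in records)

jsonl = to_jsonl(examples)
# This file would be uploaded to S3 and referenced when creating the
# fine-tuning job in Bedrock.
```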
Continued Pre-training
- What: Train the model on unlabeled domain-specific data to teach it new knowledge
- How: Provide raw text data (documents, articles, manuals) without labels
- When to Use:
- You want the model to understand domain-specific terminology (medical, legal, financial)
- You have a large corpus of unlabeled domain text
- The model needs to learn new facts/concepts not in its original training data
- Data Requirements: Large corpus of unlabeled domain text
- Key Difference from Fine-Tuning: No labels needed; teaches the model new knowledge rather than new behavior
Exam Tip: Continued pre-training uses unlabeled data to teach domain knowledge. Fine-tuning uses labeled data to teach task behavior. This distinction is critical for the exam.
Reinforcement Learning from Human Feedback (RLHF)
- What: A training technique where humans rate model outputs, and the model is trained to produce outputs that humans prefer
- How It Works:
- Model generates multiple responses to a prompt
- Human reviewers rank the responses (best to worst)
- A reward model is trained from human preferences
- The FM is optimized using the reward model to produce preferred outputs
- When to Use:
- You want outputs that align more closely with human preferences
- You need to reduce harmful or biased outputs
- You want more helpful, harmless, and honest responses
- Key Concept: RLHF is how models like Claude are trained to be safe and helpful
Exam Tip: RLHF is about human preferences guiding model behavior. If the question mentions aligning model outputs with human values or reducing unsafe outputs, think RLHF.
Model Distillation
- What: Creating a smaller, faster model (student) that mimics the behavior of a larger, more capable model (teacher)
- How: The teacher model generates training data/labels, and the student model is trained to replicate those outputs
- When to Use:
- You need lower latency or lower cost without significant quality loss
- You want to deploy a model at the edge or in resource-constrained environments
- You want to reduce inference costs while maintaining quality
- Benefits:
- Smaller model size → faster inference
- Lower cost per inference
- Maintains much of the teacher model's quality
- Available in Bedrock: Amazon Bedrock supports model distillation as a customization option
Exam Tip: Model distillation = large model teaches small model. If the question mentions reducing latency/cost while keeping quality, think distillation.
Knowledge Bases
Amazon Bedrock Knowledge Bases allow you to connect FMs to your company's private data sources for more accurate and relevant responses.
- What: A fully managed RAG (Retrieval-Augmented Generation) solution
- How It Works:
- You point Knowledge Bases to your data sources (S3, web crawlers, Confluence, SharePoint, Salesforce)
- Bedrock automatically chunks, embeds, and indexes your data into a vector database
- At query time, relevant chunks are retrieved and provided as context to the FM
- Supported Data Sources: Amazon S3, Web Crawler, Confluence, SharePoint, Salesforce, custom data sources
- Vector Databases Supported:
- Amazon OpenSearch Serverless (default, fully managed)
- Amazon Aurora PostgreSQL (pgvector extension)
- Pinecone
- Redis Enterprise Cloud
- MongoDB Atlas
- Chunking Strategies:
- Fixed-size chunking — Split by character count
- Default chunking — Automatic, balanced chunks
- No chunking — Each document is one chunk
- Hierarchical chunking — Parent-child chunk relationships
- Semantic chunking — Split by meaning/topic boundaries
- Embedding Models: Amazon Titan Embeddings, Cohere Embed
Exam Tip: Knowledge Bases = managed RAG. The key workflow is: Data → Chunk → Embed → Store in Vector DB → Retrieve at query time → Augment FM prompt. Know the supported vector databases.
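Of the chunking strategies listed above, fixed-size chunking is the easiest to picture. Knowledge Bases performs chunking for you; this toy sketch only illustrates the idea, including why an overlap is often used so context isn't lost at chunk boundaries.

```python
def fixed_size_chunks(text: str, size: int, overlap: int = 0) -> list:
    """Split text into chunks of `size` characters, overlapping by `overlap`."""
    if size <= overlap:
        raise ValueError("chunk size must exceed overlap")
    step = size - overlap
    # Each chunk starts `step` characters after the previous one, so
    # consecutive chunks share `overlap` characters.
    return [text[i:i + size] for i in range(0, len(text), step) if text[i:i + size]]

chunks = fixed_size_chunks("abcdefghij", size=4, overlap=2)
# → ["abcd", "cdef", "efgh", "ghij", "ij"]
```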
Retrieval-Augmented Generation (RAG)
RAG is a technique that enhances FM responses by retrieving relevant information from external knowledge sources before generating a response.
How RAG Works
- User Query → User asks a question
- Retrieval → The query is converted to an embedding and used to search a vector database for relevant documents
- Augmentation → Retrieved documents are added to the prompt as context
- Generation → The FM generates a response using both the prompt and the retrieved context
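The four steps above can be sketched end to end with a stub embedder and an in-memory index. Real systems use an embedding model (e.g., Titan Embeddings) and a vector database; the character-frequency "embedding" here is purely illustrative.

```python
import math

def embed(text: str) -> list:
    """Stub embedding: character-frequency vector (illustration only)."""
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    return [text.lower().count(ch) for ch in alphabet]

def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

documents = [
    "Our refund policy allows returns within 30 days.",
    "The warehouse ships orders every weekday.",
]
index = [(doc, embed(doc)) for doc in documents]           # index the docs

def rag_prompt(query: str, top_k: int = 1) -> str:
    q_vec = embed(query)                                   # 2. embed the query
    ranked = sorted(index, key=lambda d: cosine(q_vec, d[1]), reverse=True)
    context = "\n".join(doc for doc, _ in ranked[:top_k])  # 3. augment
    return f"Context:\n{context}\n\nQuestion: {query}"     # 4. send to the FM

prompt = rag_prompt("What is the refund policy?")
```

The augmented prompt grounds the FM's answer in the retrieved document rather than in its parametric memory.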
RAG Data Sources
- Structured Data: Databases, spreadsheets, CSV files
- Unstructured Data: PDFs, Word documents, web pages, emails
- Semi-structured Data: JSON, XML, log files
- Real-time Data: APIs, web crawlers, live feeds
Vector Databases
Vector databases store embeddings (numerical representations of text/images) and enable similarity search:
| Vector Database | Type | Key Feature |
|---|---|---|
| OpenSearch Serverless | Managed | Default for Bedrock; fully managed, auto-scaling |
| Aurora PostgreSQL | Managed | pgvector extension; SQL-compatible |
| Pinecone | Third-party | Purpose-built for vectors; high performance |
| Redis Enterprise | Third-party | In-memory; ultra-low latency |
| MongoDB Atlas | Third-party | Document DB with vector search |
RAG Use Cases
- Enterprise Q&A: Answer questions from company documents
- Customer Support: Provide accurate answers from knowledge articles
- Legal Research: Search and cite relevant case law or regulations
- Medical Assistance: Reference medical literature for clinical decisions
- Technical Documentation: Help developers find relevant code/docs
Exam Tip: RAG solves the hallucination problem by grounding FM responses in factual data. If the question says "the model needs access to company/proprietary data" or "reduce hallucinations," RAG is the answer.
Agents
Amazon Bedrock Agents enable FMs to take actions by connecting to external systems and APIs. Agents can break down complex tasks, call APIs, and execute multi-step workflows.
How Agents Work
- User sends a natural language request
- The Agent uses an FM to understand the request and create an execution plan
- The Agent calls Action Groups (APIs/Lambda functions) to fulfill the request
- The Agent can also query Knowledge Bases for information
- The Agent orchestrates multiple steps and returns a final response
Action Groups
- What: Define the actions an Agent can take by connecting to Lambda functions or APIs
- Configuration:
- Define an OpenAPI schema describing available actions
- Point to a Lambda function that executes the action
- Or use Return of Control to let your application handle the action execution
- Examples: Create a ticket, look up order status, schedule a meeting, query a database
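An action group's OpenAPI schema tells the agent what actions exist and what parameters they take. The sketch below (as a Python dict) is hypothetical: the path, `operationId`, and parameter are invented for illustration, but the overall OpenAPI 3.0 shape is what an action group expects.

```python
# Hypothetical action group schema for an order-status lookup backed by Lambda.
order_status_schema = {
    "openapi": "3.0.0",
    "info": {"title": "Order API", "version": "1.0.0"},
    "paths": {
        "/orders/{orderId}": {
            "get": {
                # The agent reads operationId/description to decide when
                # to invoke this action.
                "operationId": "getOrderStatus",
                "description": "Look up the status of an order by its ID",
                "parameters": [{
                    "name": "orderId",
                    "in": "path",
                    "required": True,
                    "schema": {"type": "string"},
                }],
                "responses": {
                    "200": {"description": "Order status returned"}
                },
            }
        }
    },
}
```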
Agent Configuration
- Instructions: Natural language instructions telling the agent its role and behavior
- Foundation Model: Choose which FM powers the agent's reasoning
- Knowledge Bases: Attach one or more Knowledge Bases for information retrieval
- Action Groups: Define available actions via Lambda or API schemas
- Guardrails: Attach guardrails to control agent behavior
- Session Management: Agents maintain conversation context within a session
- Prompt Templates: Customize the prompts used at each orchestration step
Exam Tip: Agents = FMs that can take actions. Key components: FM (brain) + Action Groups (hands) + Knowledge Bases (memory). If the question involves an AI that needs to call APIs or perform multi-step tasks, Agents is the answer.
Guardrails
Amazon Bedrock Guardrails help you implement safeguards for your generative AI applications.
- What: Configurable policies that filter and control FM inputs and outputs
- Capabilities:
- Content Filters: Block harmful content across categories (hate, insults, sexual, violence, misconduct) with configurable thresholds (NONE, LOW, MEDIUM, HIGH)
- Denied Topics: Define specific topics the model should refuse to discuss
- Word Filters: Block specific words, phrases, or profanity
- Sensitive Information Filters (PII): Detect and redact personally identifiable information (names, SSNs, credit cards, etc.)
- Contextual Grounding Check: Verify that model responses are grounded in the provided source information (reduces hallucinations)
- How They Work: Applied to both input (user prompts) and output (model responses)
- Integration: Can be attached to FM invocations, Agents, and Knowledge Bases
- ApplyGuardrail API: Can be used independently of model calls to filter any text
Exam Tip: Guardrails are the primary mechanism for responsible AI in Bedrock. If the question asks about blocking harmful content, PII redaction, or preventing the model from discussing certain topics, Guardrails is the answer.
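The standalone ApplyGuardrail API mentioned above takes the text to check, a direction (input or output), and the guardrail to apply. This sketch only builds the call's parameters; the guardrail ID/version are placeholders and the exact parameter shape is an assumption to verify against the boto3 reference.

```python
def guardrail_params(text: str, source: str = "INPUT") -> dict:
    """Assemble parameters for a standalone ApplyGuardrail call (sketch)."""
    return {
        "guardrailIdentifier": "gr-EXAMPLE123",  # placeholder guardrail ID
        "guardrailVersion": "1",                 # placeholder version
        "source": source,                        # "INPUT" or "OUTPUT"
        "content": [{"text": {"text": text}}],
    }

params = guardrail_params("My SSN is 123-45-6789")
# The real call would be:
#   boto3.client("bedrock-runtime").apply_guardrail(**params)
# and the response indicates whether content was blocked or anonymized
# (e.g., the SSN above redacted by a PII filter).
```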
Prompt Management
Amazon Bedrock Prompt Management helps you create, manage, and version prompts for your generative AI applications.
- Prompt Templates: Create reusable prompt templates with variables
- Prompt Versioning: Track changes to prompts over time
- Prompt Flows: Build visual workflows that chain multiple prompts and model calls together
- Variables: Use placeholders in prompts that get filled at runtime
- A/B Testing: Compare different prompt versions for quality
Exam Tip: Prompt Management is about operationalizing prompt engineering — versioning, reuse, and workflow orchestration.
Model Invocation
Invoke Model API
- `InvokeModel` — Synchronous: send a prompt, get a complete response
- `InvokeModelWithResponseStream` — Streaming: get the response token by token as it's generated
- Converse API — A unified API that works across all models with a consistent format (recommended for multi-turn conversations)
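The Converse API's unified format represents a conversation as a list of role-tagged messages, each holding a list of content blocks. This sketch builds such a request; the model ID is a placeholder, and the field names follow the Converse API's message shape.

```python
def converse_request(history: list, model_id: str) -> dict:
    """Build a Converse API request from (role, text) turns (sketch)."""
    return {
        "modelId": model_id,
        "messages": [
            # Each turn becomes a role plus a list of content blocks.
            {"role": role, "content": [{"text": text}]}
            for role, text in history
        ],
        "inferenceConfig": {"maxTokens": 512, "temperature": 0.2},
    }

req = converse_request(
    [("user", "Summarize our refund policy."),
     ("assistant", "Returns are accepted within 30 days."),
     ("user", "What about damaged items?")],
    model_id="anthropic.claude-3-haiku-20240307-v1:0",  # placeholder model ID
)
# boto3.client("bedrock-runtime").converse(**req) would send this request;
# the same shape works across model providers.
```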
Batch Inference
- What: Process large volumes of prompts in bulk, asynchronously
- How: Submit a batch job with prompts stored in S3; results written to S3
- When to Use:
- Processing thousands/millions of prompts
- Non-time-sensitive workloads
- Cost optimization (batch pricing is cheaper)
- Benefits: Up to 50% cost savings compared to on-demand inference
Exam Tip: Batch inference = large-scale, async, cheaper. If the question mentions processing many prompts at once or cost optimization for non-real-time tasks, batch inference is the answer.
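The batch input file in S3 is JSONL: one record per prompt, each pairing an ID with a model-specific request body. The `recordId`/`modelInput` field names below follow the batch inference docs but should be verified, and the Titan-style `inputText` body varies by model.

```python
import json

prompts = ["Summarize document A.", "Summarize document B."]

lines = [
    json.dumps({
        "recordId": f"rec-{i:04d}",          # your correlation ID per prompt
        "modelInput": {"inputText": p},      # model-specific body (Titan-style)
    })
    for i, p in enumerate(prompts)
]
batch_input = "\n".join(lines)
# Upload this file to S3, then submit the batch job pointing at it;
# results land in the output S3 location keyed by recordId.
```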
Provisioned Throughput
- What: Reserve dedicated model inference capacity for consistent, predictable performance
- When to Use:
- You have consistent, high-volume inference workloads
- You need guaranteed latency (no throttling)
- You're running a custom model (fine-tuned or continued pre-training)
- Commitment: 1-month or 6-month terms (with discounts)
- Key Concept: Measured in Model Units — a fixed amount of throughput capacity
Exam Tip: Provisioned throughput = reserved capacity for consistent performance. If the question says "predictable latency" or "high throughput" or "custom model in production," think Provisioned Throughput.
Pricing
| Pricing Model | How It Works | Best For |
|---|---|---|
| On-Demand | Pay per input/output token processed | Variable, unpredictable workloads |
| Batch | Submit batch jobs; up to 50% cheaper than on-demand | Large-scale, non-time-sensitive processing |
| Provisioned Throughput | Reserve capacity with 1-month or 6-month commitment | Consistent, high-volume production workloads |
| Model Customization | Pay for training (per token processed during training) | Fine-tuning, continued pre-training |
| Knowledge Bases | Pay for storage, embeddings, and retrieval queries | RAG applications |
Exam Tip: Know the three pricing tiers: On-Demand (most flexible, highest cost per token), Batch (cheapest, async), Provisioned (predictable, committed). The exam may ask which pricing model fits a given scenario.
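A quick arithmetic sketch shows why the batch discount matters at scale. The per-token prices below are made up for illustration; real Bedrock prices vary by model and region.

```python
# Hypothetical on-demand prices (dollars per token).
INPUT_PRICE = 0.0008 / 1000    # per input token
OUTPUT_PRICE = 0.0016 / 1000   # per output token
BATCH_DISCOUNT = 0.5           # batch is up to 50% cheaper than on-demand

def on_demand_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request at on-demand rates."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# 1M requests of ~500 input / 200 output tokens each:
od = on_demand_cost(500, 200) * 1_000_000     # → $720.00
batch = od * (1 - BATCH_DISCOUNT)             # → $360.00
```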
Integration with CloudWatch
- Metrics: Monitor Bedrock API calls, latency, errors, throttling
- `InvocationCount` — Number of model invocations
- `InvocationLatency` — Time to process requests
- `InvocationClientErrors` (4xx) and `InvocationServerErrors` (5xx)
- `InvocationThrottles` — Number of throttled requests
- Logs: Enable model invocation logging to capture:
- Full request and response payloads
- Input/output tokens
- Model ID, request ID, timestamps
- Log destinations: CloudWatch Logs or S3
- Alarms: Set alarms on metrics (e.g., alert when error rate exceeds threshold)
- Dashboards: Create visual dashboards for FM usage monitoring
Exam Tip: CloudWatch = operational monitoring for Bedrock. If the question asks about monitoring model performance, tracking usage, or alerting on errors, CloudWatch is the answer.
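An alarm on the metrics above can be defined as a parameter set for CloudWatch's PutMetricAlarm. The namespace and metric name match the list above; the dimension name, threshold, and SNS topic ARN are placeholders/assumptions to verify.

```python
alarm_params = {
    "AlarmName": "bedrock-throttling",
    "Namespace": "AWS/Bedrock",
    "MetricName": "InvocationThrottles",
    # Dimension name assumed; check the Bedrock CloudWatch metrics docs.
    "Dimensions": [{"Name": "ModelId", "Value": "amazon.titan-text-express-v1"}],
    "Statistic": "Sum",
    "Period": 300,                 # evaluate over 5-minute windows
    "EvaluationPeriods": 1,
    "Threshold": 10,               # alert if >10 throttles in a window
    "ComparisonOperator": "GreaterThanThreshold",
    # Placeholder SNS topic for notifications:
    "AlarmActions": ["arn:aws:sns:us-east-1:123456789012:ops-alerts"],
}
# boto3.client("cloudwatch").put_metric_alarm(**alarm_params) would create it.
```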
Integration with CloudTrail
- What: Records all API calls made to Amazon Bedrock as events
- What It Captures:
- Who made the API call (IAM identity)
- When the call was made (timestamp)
- What action was performed (API name)
- What resources were affected
- Source IP address
- Use Cases:
- Security auditing: Who accessed which model and when
- Compliance: Prove regulatory compliance with usage logs
- Forensics: Investigate unauthorized or unusual API activity
- Governance: Track model usage across the organization
Exam Tip: CloudTrail = who did what, when for Bedrock API calls. If the question is about auditing, compliance, or security investigation of Bedrock usage, CloudTrail is the answer. CloudTrail ≠ CloudWatch — CloudTrail is audit, CloudWatch is monitoring.
VPC Endpoints and PrivateLink
- What: Access Amazon Bedrock APIs privately from your VPC without traversing the public internet
- How: Create a VPC Interface Endpoint (powered by AWS PrivateLink) for Bedrock
- Benefits:
- Security: Data never leaves the AWS network
- Compliance: Meet regulatory requirements for private connectivity
- Performance: Lower latency, more reliable connectivity
- Endpoint Types:
- `bedrock` — For control plane operations (manage models, create endpoints)
- `bedrock-runtime` — For data plane operations (invoke models, run inference)
- `bedrock-agent` — For Agent build-time operations
- `bedrock-agent-runtime` — For Agent runtime operations
- Security: Use VPC Endpoint Policies to control which actions/resources can be accessed through the endpoint
Exam Tip: If the question mentions "private access to Bedrock," "no internet," or "data must not traverse the public internet," the answer is VPC Endpoints (PrivateLink). Remember to create endpoints for BOTH `bedrock` (control plane) and `bedrock-runtime` (data plane).