Amazon Bedrock
Amazon Bedrock is a fully managed service that makes high-performing foundation models (FMs) from leading AI companies available through a single API. It provides a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI built in.
Exam Tip: Amazon Bedrock is the central service for the AIF-C01 exam. Nearly every generative AI question will involve Bedrock in some capacity. Know its features, integrations, and limitations deeply.
Foundation Models (FMs)
Foundation models are large-scale AI models trained on vast amounts of data that can be adapted to a wide range of downstream tasks. Amazon Bedrock provides access to multiple FMs from different providers.
Amazon Titan
- Provider: Amazon (first-party)
- Capabilities: Text generation, embeddings, image generation, multimodal
- Key Models:
- Titan Text — General-purpose text generation, summarization, Q&A
- Titan Embeddings — Convert text to vector representations for search and RAG
- Titan Image Generator — Text-to-image generation with watermarking
- Titan Multimodal Embeddings — Embeddings for both text and images
- Key Differentiator: Built by Amazon, fully integrated with AWS services, supports watermark detection for AI-generated images
- Use Cases: Enterprise text generation, semantic search, content creation
Exam Tip: Titan Embeddings is the go-to model for building Knowledge Bases and RAG workflows in Bedrock. If the question mentions vector embeddings or semantic search, think Titan Embeddings.
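The embedding workflow above can be sketched as a request/response round trip. This is a minimal sketch, not a definitive implementation: the model ID and the `inputText`/`embedding` field names are assumptions drawn from the Titan Embeddings documentation, so verify them against the current API reference.

```python
import json

# Placeholder model ID for Titan Text Embeddings (verify the current version).
MODEL_ID = "amazon.titan-embed-text-v2:0"

def build_embedding_request(text: str) -> str:
    """Build the JSON body for an InvokeModel call to Titan Embeddings."""
    return json.dumps({"inputText": text})

def parse_embedding_response(body: str) -> list:
    """Extract the embedding vector from a Titan Embeddings response body."""
    return json.loads(body)["embedding"]

# The actual boto3 call would look roughly like:
#   client = boto3.client("bedrock-runtime")
#   resp = client.invoke_model(modelId=MODEL_ID,
#                              body=build_embedding_request("hello"))
#   vector = parse_embedding_response(resp["body"].read())

# Parsing a sample response body (values are illustrative):
sample_response = json.dumps({"embedding": [0.1, -0.2, 0.3],
                              "inputTextTokenCount": 1})
vector = parse_embedding_response(sample_response)
```

The returned vector is what gets stored in a vector database for semantic search and RAG.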
Claude (Anthropic)
- Provider: Anthropic
- Capabilities: Text generation, analysis, coding, math, reasoning, vision
- Key Models: Claude 3.5 Sonnet, Claude 3 Opus, Claude 3 Haiku
- Key Differentiator: Known for safety, helpfulness, and honesty; excels at complex reasoning, long context windows (up to 200K tokens), and following nuanced instructions
- Strengths:
- Long document analysis (200K context window)
- Complex multi-step reasoning
- Code generation and review
- Vision/multimodal understanding
- Use Cases: Document analysis, code generation, customer service, research
Exam Tip: If the question mentions long documents, complex reasoning, or safe AI outputs, Claude is likely the answer.
Llama (Meta)
- Provider: Meta
- Capabilities: Text generation, coding, reasoning
- Key Models: Llama 3.1, Llama 3.2
- Key Differentiator: Open-source foundation; strong performance on benchmarks; available for fine-tuning
- Strengths:
- Open-weight model (transparent)
- Strong coding capabilities
- Community-driven improvements
- Available for customization
- Use Cases: Code generation, text generation, chatbots, research
Exam Tip: If the question mentions open-source or open-weight models, think Llama.
Stable Diffusion (Stability AI)
- Provider: Stability AI
- Capabilities: Image generation, image editing, image-to-image
- Key Differentiator: Leading text-to-image model; supports inpainting, outpainting, and style transfer
- Strengths:
- High-quality image generation
- Fine-grained control over image output
- Supports negative prompts for exclusion
- Use Cases: Marketing content, product visualization, creative design, media
Exam Tip: If the question is about image generation from text prompts, the answer is Stable Diffusion or Titan Image Generator; Stable Diffusion is the third-party option.
Amazon Nova
- Provider: Amazon (first-party)
- Capabilities: Text, image, and video generation; multimodal understanding
- Key Models:
- Nova Micro — Text-only, lowest latency, most cost-effective
- Nova Lite — Multimodal (text, image, video input), cost-effective
- Nova Pro — Best balance of accuracy, speed, and cost for a wide range of tasks
- Nova Canvas — Image generation
- Nova Reel — Video generation
- Key Differentiator: Amazon's latest generation of FMs; optimized for speed, cost, and accuracy; supports video generation
- Strengths:
- State-of-the-art multimodal capabilities
- Cost-effective across model tiers
- Native video generation (Nova Reel)
- Built-in content safety
Exam Tip: Amazon Nova is the newest Amazon FM family. Know the tiering: Micro (text-only, cheapest) → Lite (multimodal, budget) → Pro (balanced). Nova Canvas = images, Nova Reel = videos.
Model Selection and Evaluation
Choosing the right FM depends on your use case, requirements, and constraints.
Selection Criteria
| Criteria | Considerations |
|---|---|
| Task Type | Text generation, summarization, code, image, embeddings, multimodal |
| Performance | Accuracy, quality of output, reasoning capability |
| Latency | Response time requirements (real-time vs. batch) |
| Cost | Per-token pricing, throughput requirements |
| Context Window | Maximum input length (tokens) the model can process |
| Language Support | Multilingual capabilities needed |
| Customization | Need for fine-tuning or continued pre-training |
| Safety | Content filtering, toxicity prevention requirements |
| Compliance | Data residency, regulatory requirements |
Model Evaluation in Bedrock
- Automatic Evaluation — Use built-in metrics to evaluate model quality:
- Accuracy — How correct the model's outputs are
- Robustness — How consistent outputs are across similar inputs
- Toxicity — How well the model avoids harmful content
- Human Evaluation — Set up human review workflows to assess quality
- Model Evaluation Jobs — Run evaluation jobs to compare models side-by-side against your specific use case
- Benchmarks — Compare models using standard benchmarks (MMLU, HellaSwag, etc.)
Exam Tip: Bedrock's Model Evaluation lets you compare FMs for YOUR specific task. Use automatic evaluation for scalable testing, human evaluation for subjective quality assessment.
Model Customization
Amazon Bedrock allows you to customize foundation models with your own data to improve performance for your specific use case.
Fine-Tuning (Supervised)
- What: Train the model on labeled examples (input-output pairs) to improve performance on specific tasks
- How: Provide a training dataset of prompt-completion pairs in JSONL format
- When to Use:
- You have a specific task with consistent format
- You have labeled training data
- You want to improve quality for a narrow use case
- Data Requirements: Labeled dataset with examples of desired behavior
- Supported Models: Select Titan, Llama, and other models
- Result: A custom model version stored in your account
Exam Tip: Fine-tuning uses labeled data (input → expected output pairs). If the question says "the company has labeled training data" and wants better task-specific performance, fine-tuning is the answer.
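A fine-tuning dataset is just a JSONL file of prompt-completion pairs, one JSON object per line. The sketch below assumes the `prompt`/`completion` field names used by Titan text models; confirm the exact schema for your chosen model before submitting a customization job.

```python
import json

# Hypothetical labeled examples for a sentiment-classification fine-tune.
examples = [
    {"prompt": "Classify sentiment: 'Great product!'", "completion": "positive"},
    {"prompt": "Classify sentiment: 'Arrived broken.'", "completion": "negative"},
]

def to_jsonl(records) -> str:
    """Serialize records as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(r) for r in records)

jsonl = to_jsonl(examples)
# This file would be uploaded to S3 and referenced when creating the
# fine-tuning job in Bedrock.
```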
Continued Pre-training
- What: Train the model on unlabeled domain-specific data to teach it new knowledge
- How: Provide raw text data (documents, articles, manuals) without labels
- When to Use:
- You want the model to understand domain-specific terminology (medical, legal, financial)
- You have a large corpus of unlabeled domain text
- The model needs to learn new facts/concepts not in its original training data
- Data Requirements: Large corpus of unlabeled domain text
- Key Difference from Fine-Tuning: No labels needed; teaches the model new knowledge rather than new behavior
Exam Tip: Continued pre-training uses unlabeled data to teach domain knowledge. Fine-tuning uses labeled data to teach task behavior. This distinction is critical for the exam.
Reinforcement Learning from Human Feedback (RLHF)
- What: A training technique where humans rate model outputs, and the model is trained to produce outputs that humans prefer
- How It Works:
- Model generates multiple responses to a prompt
- Human reviewers rank the responses (best to worst)
- A reward model is trained from human preferences
- The FM is optimized using the reward model to produce preferred outputs
- When to Use:
- You want outputs that align more closely with human preferences
- You need to reduce harmful or biased outputs
- You want more helpful, harmless, and honest responses
- Key Concept: RLHF is how models like Claude are trained to be safe and helpful
Exam Tip: RLHF is about human preferences guiding model behavior. If the question mentions aligning model outputs with human values or reducing unsafe outputs, think RLHF.
Model Distillation
- What: Creating a smaller, faster model (student) that mimics the behavior of a larger, more capable model (teacher)
- How: The teacher model generates training data/labels, and the student model is trained to replicate those outputs
- When to Use:
- You need lower latency or lower cost without significant quality loss
- You want to deploy a model at the edge or in resource-constrained environments
- You want to reduce inference costs while maintaining quality
- Benefits:
- Smaller model size → faster inference
- Lower cost per inference
- Maintains much of the teacher model's quality
- Available in Bedrock: Amazon Bedrock supports model distillation as a customization option
Exam Tip: Model distillation = large model teaches small model. If the question mentions reducing latency/cost while keeping quality, think distillation.
Knowledge Bases
Amazon Bedrock Knowledge Bases allow you to connect FMs to your company's private data sources for more accurate and relevant responses.
- What: A fully managed RAG (Retrieval-Augmented Generation) solution
- How It Works:
- You point Knowledge Bases to your data sources (S3, web crawlers, Confluence, SharePoint, Salesforce)
- Bedrock automatically chunks, embeds, and indexes your data into a vector database
- At query time, relevant chunks are retrieved and provided as context to the FM
- Supported Data Sources: Amazon S3, Web Crawler, Confluence, SharePoint, Salesforce, custom data sources
- Vector Databases Supported:
- Amazon OpenSearch Serverless (default, fully managed)
- Amazon Aurora PostgreSQL (pgvector extension)
- Pinecone
- Redis Enterprise Cloud
- MongoDB Atlas
- Chunking Strategies:
- Fixed-size chunking — Split by character count
- Default chunking — Automatic, balanced chunks
- No chunking — Each document is one chunk
- Hierarchical chunking — Parent-child chunk relationships
- Semantic chunking — Split by meaning/topic boundaries
- Embedding Models: Amazon Titan Embeddings, Cohere Embed
Exam Tip: Knowledge Bases = managed RAG. The key workflow is: Data → Chunk → Embed → Store in Vector DB → Retrieve at query time → Augment FM prompt. Know the supported vector databases.
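Of the chunking strategies listed above, fixed-size chunking is the easiest to picture. Knowledge Bases performs chunking for you; this toy sketch only illustrates the idea, including why an overlap is often used so context isn't lost at chunk boundaries.

```python
def fixed_size_chunks(text: str, size: int, overlap: int = 0) -> list:
    """Split text into chunks of `size` characters, overlapping by `overlap`."""
    if size <= overlap:
        raise ValueError("chunk size must exceed overlap")
    step = size - overlap
    # Each chunk starts `step` characters after the previous one, so
    # consecutive chunks share `overlap` characters.
    return [text[i:i + size] for i in range(0, len(text), step) if text[i:i + size]]

chunks = fixed_size_chunks("abcdefghij", size=4, overlap=2)
# → ["abcd", "cdef", "efgh", "ghij", "ij"]
```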
Retrieval-Augmented Generation (RAG)
RAG is a technique that enhances FM responses by retrieving relevant information from external knowledge sources before generating a response.
How RAG Works
- User Query → User asks a question
- Retrieval → The query is converted to an embedding and used to search a vector database for relevant documents
- Augmentation → Retrieved documents are added to the prompt as context
- Generation → The FM generates a response using both the prompt and the retrieved context
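The four steps above can be sketched end to end with a stub embedder and an in-memory index. Real systems use an embedding model (e.g., Titan Embeddings) and a vector database; the character-frequency "embedding" here is purely illustrative.

```python
import math

def embed(text: str) -> list:
    """Stub embedding: character-frequency vector (illustration only)."""
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    return [text.lower().count(ch) for ch in alphabet]

def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

documents = [
    "Our refund policy allows returns within 30 days.",
    "The warehouse ships orders every weekday.",
]
index = [(doc, embed(doc)) for doc in documents]           # index the docs

def rag_prompt(query: str, top_k: int = 1) -> str:
    q_vec = embed(query)                                   # 2. embed the query
    ranked = sorted(index, key=lambda d: cosine(q_vec, d[1]), reverse=True)
    context = "\n".join(doc for doc, _ in ranked[:top_k])  # 3. augment
    return f"Context:\n{context}\n\nQuestion: {query}"     # 4. send to the FM

prompt = rag_prompt("What is the refund policy?")
```

The augmented prompt grounds the FM's answer in the retrieved document rather than in its parametric memory.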
RAG Data Sources
- Structured Data: Databases, spreadsheets, CSV files
- Unstructured Data: PDFs, Word documents, web pages, emails
- Semi-structured Data: JSON, XML, log files
- Real-time Data: APIs, web crawlers, live feeds
Vector Databases
Vector databases store embeddings (numerical representations of text/images) and enable similarity search:
| Vector Database | Type | Key Feature |
|---|---|---|
| OpenSearch Serverless | Managed | Default for Bedrock; fully managed, auto-scaling |
| Aurora PostgreSQL | Managed | pgvector extension; SQL-compatible |
| Pinecone | Third-party | Purpose-built for vectors; high performance |
| Redis Enterprise | Third-party | In-memory; ultra-low latency |
| MongoDB Atlas | Third-party | Document DB with vector search |
RAG Use Cases
- Enterprise Q&A: Answer questions from company documents
- Customer Support: Provide accurate answers from knowledge articles
- Legal Research: Search and cite relevant case law or regulations
- Medical Assistance: Reference medical literature for clinical decisions
- Technical Documentation: Help developers find relevant code/docs
Exam Tip: RAG solves the hallucination problem by grounding FM responses in factual data. If the question says "the model needs access to company/proprietary data" or "reduce hallucinations," RAG is the answer.
Agents
Amazon Bedrock Agents enable FMs to take actions by connecting to external systems and APIs. Agents can break down complex tasks, call APIs, and execute multi-step workflows.
How Agents Work
- User sends a natural language request
- The Agent uses an FM to understand the request and create an execution plan
- The Agent calls Action Groups (APIs/Lambda functions) to fulfill the request
- The Agent can also query Knowledge Bases for information
- The Agent orchestrates multiple steps and returns a final response
Action Groups
- What: Define the actions an Agent can take by connecting to Lambda functions or APIs
- Configuration:
- Define an OpenAPI schema describing available actions
- Point to a Lambda function that executes the action
- Or use Return of Control to let your application handle the action execution
- Examples: Create a ticket, look up order status, schedule a meeting, query a database
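An action group's OpenAPI schema tells the agent what actions exist and what parameters they take. The sketch below (as a Python dict) is hypothetical: the path, `operationId`, and parameter are invented for illustration, but the overall OpenAPI 3.0 shape is what an action group expects.

```python
# Hypothetical action group schema for an order-status lookup backed by Lambda.
order_status_schema = {
    "openapi": "3.0.0",
    "info": {"title": "Order API", "version": "1.0.0"},
    "paths": {
        "/orders/{orderId}": {
            "get": {
                # The agent reads operationId/description to decide when
                # to invoke this action.
                "operationId": "getOrderStatus",
                "description": "Look up the status of an order by its ID",
                "parameters": [{
                    "name": "orderId",
                    "in": "path",
                    "required": True,
                    "schema": {"type": "string"},
                }],
                "responses": {
                    "200": {"description": "Order status returned"}
                },
            }
        }
    },
}
```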
Agent Configuration
- Instructions: Natural language instructions telling the agent its role and behavior
- Foundation Model: Choose which FM powers the agent's reasoning
- Knowledge Bases: Attach one or more Knowledge Bases for information retrieval
- Action Groups: Define available actions via Lambda or API schemas
- Guardrails: Attach guardrails to control agent behavior
- Session Management: Agents maintain conversation context within a session
- Prompt Templates: Customize the prompts used at each orchestration step
Exam Tip: Agents = FMs that can take actions. Key components: FM (brain) + Action Groups (hands) + Knowledge Bases (memory). If the question involves an AI that needs to call APIs or perform multi-step tasks, Agents is the answer.
Guardrails
Amazon Bedrock Guardrails help you implement safeguards for your generative AI applications.
- What: Configurable policies that filter and control FM inputs and outputs
- Capabilities:
- Content Filters: Block harmful content across categories (hate, insults, sexual, violence, misconduct) with configurable thresholds (NONE, LOW, MEDIUM, HIGH)
- Denied Topics: Define specific topics the model should refuse to discuss
- Word Filters: Block specific words, phrases, or profanity
- Sensitive Information Filters (PII): Detect and redact personally identifiable information (names, SSNs, credit cards, etc.)
- Contextual Grounding Check: Verify that model responses are grounded in the provided source information (reduces hallucinations)
- How They Work: Applied to both input (user prompts) and output (model responses)
- Integration: Can be attached to FM invocations, Agents, and Knowledge Bases
- ApplyGuardrail API: Can be used independently of model calls to filter any text
Exam Tip: Guardrails are the primary mechanism for responsible AI in Bedrock. If the question asks about blocking harmful content, PII redaction, or preventing the model from discussing certain topics, Guardrails is the answer.
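The standalone ApplyGuardrail API mentioned above takes the text to check, a direction (input or output), and the guardrail to apply. This sketch only builds the call's parameters; the guardrail ID/version are placeholders and the exact parameter shape is an assumption to verify against the boto3 reference.

```python
def guardrail_params(text: str, source: str = "INPUT") -> dict:
    """Assemble parameters for a standalone ApplyGuardrail call (sketch)."""
    return {
        "guardrailIdentifier": "gr-EXAMPLE123",  # placeholder guardrail ID
        "guardrailVersion": "1",                 # placeholder version
        "source": source,                        # "INPUT" or "OUTPUT"
        "content": [{"text": {"text": text}}],
    }

params = guardrail_params("My SSN is 123-45-6789")
# The real call would be:
#   boto3.client("bedrock-runtime").apply_guardrail(**params)
# and the response indicates whether content was blocked or anonymized
# (e.g., the SSN above redacted by a PII filter).
```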
Prompt Management
Amazon Bedrock Prompt Management helps you create, manage, and version prompts for your generative AI applications.
- Prompt Templates: Create reusable prompt templates with variables
- Prompt Versioning: Track changes to prompts over time
- Prompt Flows: Build visual workflows that chain multiple prompts and model calls together
- Variables: Use placeholders in prompts that get filled at runtime
- A/B Testing: Compare different prompt versions for quality
Exam Tip: Prompt Management is about operationalizing prompt engineering — versioning, reuse, and workflow orchestration.
Model Invocation
Invoke Model API
- `InvokeModel` — Synchronous: send a prompt, get a complete response
- `InvokeModelWithResponseStream` — Streaming: get the response token by token as it's generated
- Converse API — A unified API that works across all models with a consistent format (recommended for multi-turn conversations)
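The Converse API's unified format represents a conversation as a list of role-tagged messages, each holding a list of content blocks. This sketch builds such a request; the model ID is a placeholder, and the field names follow the Converse API's message shape.

```python
def converse_request(history: list, model_id: str) -> dict:
    """Build a Converse API request from (role, text) turns (sketch)."""
    return {
        "modelId": model_id,
        "messages": [
            # Each turn becomes a role plus a list of content blocks.
            {"role": role, "content": [{"text": text}]}
            for role, text in history
        ],
        "inferenceConfig": {"maxTokens": 512, "temperature": 0.2},
    }

req = converse_request(
    [("user", "Summarize our refund policy."),
     ("assistant", "Returns are accepted within 30 days."),
     ("user", "What about damaged items?")],
    model_id="anthropic.claude-3-haiku-20240307-v1:0",  # placeholder model ID
)
# boto3.client("bedrock-runtime").converse(**req) would send this request;
# the same shape works across model providers.
```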
Batch Inference
- What: Process large volumes of prompts in bulk, asynchronously
- How: Submit a batch job with prompts stored in S3; results written to S3
- When to Use:
- Processing thousands/millions of prompts
- Non-time-sensitive workloads
- Cost optimization (batch pricing is cheaper)
- Benefits: Up to 50% cost savings compared to on-demand inference
Exam Tip: Batch inference = large-scale, async, cheaper. If the question mentions processing many prompts at once or cost optimization for non-real-time tasks, batch inference is the answer.
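The batch input file in S3 is JSONL: one record per prompt, each pairing an ID with a model-specific request body. The `recordId`/`modelInput` field names below follow the batch inference docs but should be verified, and the Titan-style `inputText` body varies by model.

```python
import json

prompts = ["Summarize document A.", "Summarize document B."]

lines = [
    json.dumps({
        "recordId": f"rec-{i:04d}",          # your correlation ID per prompt
        "modelInput": {"inputText": p},      # model-specific body (Titan-style)
    })
    for i, p in enumerate(prompts)
]
batch_input = "\n".join(lines)
# Upload this file to S3, then submit the batch job pointing at it;
# results land in the output S3 location keyed by recordId.
```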
Provisioned Throughput
- What: Reserve dedicated model inference capacity for consistent, predictable performance
- When to Use:
- You have consistent, high-volume inference workloads
- You need guaranteed latency (no throttling)
- You're running a custom model (fine-tuned or continued pre-training)
- Commitment: 1-month or 6-month terms (with discounts)
- Key Concept: Measured in Model Units — a fixed amount of throughput capacity
Exam Tip: Provisioned throughput = reserved capacity for consistent performance. If the question says "predictable latency" or "high throughput" or "custom model in production," think Provisioned Throughput.
Pricing
| Pricing Model | How It Works | Best For |
|---|---|---|
| On-Demand | Pay per input/output token processed | Variable, unpredictable workloads |
| Batch | Submit batch jobs; up to 50% cheaper than on-demand | Large-scale, non-time-sensitive processing |
| Provisioned Throughput | Reserve capacity with 1-month or 6-month commitment | Consistent, high-volume production workloads |
| Model Customization | Pay for training (per token processed during training) | Fine-tuning, continued pre-training |
| Knowledge Bases | Pay for storage, embeddings, and retrieval queries | RAG applications |
Exam Tip: Know the three pricing tiers: On-Demand (most flexible, highest cost per token), Batch (cheapest, async), Provisioned (predictable, committed). The exam may ask which pricing model fits a given scenario.
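A quick arithmetic sketch shows why the batch discount matters at scale. The per-token prices below are made up for illustration; real Bedrock prices vary by model and region.

```python
# Hypothetical on-demand prices (dollars per token).
INPUT_PRICE = 0.0008 / 1000    # per input token
OUTPUT_PRICE = 0.0016 / 1000   # per output token
BATCH_DISCOUNT = 0.5           # batch is up to 50% cheaper than on-demand

def on_demand_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request at on-demand rates."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# 1M requests of ~500 input / 200 output tokens each:
od = on_demand_cost(500, 200) * 1_000_000     # → $720.00
batch = od * (1 - BATCH_DISCOUNT)             # → $360.00
```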
Integration with CloudWatch
- Metrics: Monitor Bedrock API calls, latency, errors, throttling
- `InvocationCount` — Number of model invocations
- `InvocationLatency` — Time to process requests
- `InvocationClientErrors` (4xx) and `InvocationServerErrors` (5xx)
- `InvocationThrottles` — Number of throttled requests
- Logs: Enable model invocation logging to capture:
- Full request and response payloads
- Input/output tokens
- Model ID, request ID, timestamps
- Log destinations: CloudWatch Logs or S3
- Alarms: Set alarms on metrics (e.g., alert when error rate exceeds threshold)
- Dashboards: Create visual dashboards for FM usage monitoring
Exam Tip: CloudWatch = operational monitoring for Bedrock. If the question asks about monitoring model performance, tracking usage, or alerting on errors, CloudWatch is the answer.
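An alarm on the metrics above can be defined as a parameter set for CloudWatch's PutMetricAlarm. The namespace and metric name match the list above; the dimension name, threshold, and SNS topic ARN are placeholders/assumptions to verify.

```python
alarm_params = {
    "AlarmName": "bedrock-throttling",
    "Namespace": "AWS/Bedrock",
    "MetricName": "InvocationThrottles",
    # Dimension name assumed; check the Bedrock CloudWatch metrics docs.
    "Dimensions": [{"Name": "ModelId", "Value": "amazon.titan-text-express-v1"}],
    "Statistic": "Sum",
    "Period": 300,                 # evaluate over 5-minute windows
    "EvaluationPeriods": 1,
    "Threshold": 10,               # alert if >10 throttles in a window
    "ComparisonOperator": "GreaterThanThreshold",
    # Placeholder SNS topic for notifications:
    "AlarmActions": ["arn:aws:sns:us-east-1:123456789012:ops-alerts"],
}
# boto3.client("cloudwatch").put_metric_alarm(**alarm_params) would create it.
```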
Integration with CloudTrail
- What: Records all API calls made to Amazon Bedrock as events
- What It Captures:
- Who made the API call (IAM identity)
- When the call was made (timestamp)
- What action was performed (API name)
- What resources were affected
- Source IP address
- Use Cases:
- Security auditing: Who accessed which model and when
- Compliance: Prove regulatory compliance with usage logs
- Forensics: Investigate unauthorized or unusual API activity
- Governance: Track model usage across the organization
Exam Tip: CloudTrail = who did what, when for Bedrock API calls. If the question is about auditing, compliance, or security investigation of Bedrock usage, CloudTrail is the answer. CloudTrail ≠ CloudWatch — CloudTrail is audit, CloudWatch is monitoring.
VPC Endpoints and PrivateLink
- What: Access Amazon Bedrock APIs privately from your VPC without traversing the public internet
- How: Create a VPC Interface Endpoint (powered by AWS PrivateLink) for Bedrock
- Benefits:
- Security: Data never leaves the AWS network
- Compliance: Meet regulatory requirements for private connectivity
- Performance: Lower latency, more reliable connectivity
- Endpoint Types:
- `bedrock` — For control plane operations (manage models, create endpoints)
- `bedrock-runtime` — For data plane operations (invoke models, run inference)
- `bedrock-agent` — For Agent build-time operations
- `bedrock-agent-runtime` — For Agent runtime operations
- Security: Use VPC Endpoint Policies to control which actions/resources can be accessed through the endpoint
Exam Tip: If the question mentions "private access to Bedrock," "no internet," or "data must not traverse the public internet," the answer is VPC Endpoints (PrivateLink). Remember to create endpoints for BOTH `bedrock` (control plane) and `bedrock-runtime` (data plane).