Amazon SageMaker
Amazon SageMaker is a fully managed, end-to-end machine learning platform that provides every tool needed to build, train, and deploy ML models. It is the most comprehensive ML service on AWS.
Exam Tip: SageMaker has MANY components. The exam tests whether you know which SageMaker feature to use for each ML task (data prep, labeling, training, deployment, monitoring). Know each component's purpose.
SageMaker Studio
Unified Interface
- What: A single web-based IDE for all ML development activities
- Features:
- Integrated development environment for ML
- Unified access to ALL SageMaker features
- Collaboration features for teams
- Git integration
- Visual experiment tracking
Notebooks
- JupyterLab notebooks: Interactive coding environment for data exploration and model prototyping
- Pre-configured environments: Python, R, and ML libraries pre-installed
- Managed infrastructure: Automatically provisions and scales compute
- Kernel management: Switch between different compute instances and ML frameworks
SageMaker Data Wrangler
Data Preparation
- What: Visual tool for data preparation and transformation — no coding required
- Capabilities:
- Import data from S3, Athena, Redshift, Snowflake, Databricks, and more
- 300+ built-in data transformations
- Visual data flow builder (drag-and-drop)
- Export to SageMaker Pipelines, notebooks, or Python scripts
- Quick model training to validate data quality
Data Quality Tool
- Automatically assess data quality
- Identify missing values, duplicates, outliers
- Generate data quality reports
- Recommend transformations for common data issues
Exam Tip: Data Wrangler = visual, no-code data preparation. If the question mentions preparing/transforming data for ML without coding, think Data Wrangler.
SageMaker Ground Truth
Data Labeling
- What: Managed service for creating labeled training datasets using human annotators
- How:
- Define labeling task (image classification, text classification, bounding boxes, etc.)
- Use Amazon Mechanical Turk, private workforce, or third-party vendors
- Built-in workflow management for labeling tasks
- Supported Tasks:
- Image classification, object detection, semantic segmentation
- Text classification, named entity recognition
- Video frame labeling
- 3D point cloud labeling
- Audio classification
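The labeling workflow above maps onto the `create_labeling_job` API in boto3. Below is a minimal sketch of that request; every name, S3 path, and ARN is a hypothetical placeholder, and a real request needs additional fields (UI template, pre-annotation and consolidation Lambda ARNs) that are omitted here.

```python
# Partial sketch of a Ground Truth labeling job request
# (boto3 create_labeling_job). All names/paths/ARNs are placeholders.
labeling_job_request = {
    "LabelingJobName": "product-image-labels",
    "LabelAttributeName": "category",
    "InputConfig": {
        "DataSource": {
            "S3DataSource": {
                "ManifestS3Uri": "s3://my-bucket/manifests/input.manifest"
            }
        }
    },
    "OutputConfig": {"S3OutputPath": "s3://my-bucket/labels/"},
    "RoleArn": "arn:aws:iam::123456789012:role/GroundTruthRole",
    "HumanTaskConfig": {
        # Workforce choice: Mechanical Turk, a private team, or a vendor
        "WorkteamArn": "arn:aws:sagemaker:us-east-1:123456789012:workteam/private-crowd/my-team",
        "TaskTitle": "Classify product images",
        "TaskDescription": "Choose the category that best fits each image",
        "NumberOfHumanWorkersPerDataObject": 3,  # consolidate 3 answers per item
        "TaskTimeLimitInSeconds": 300,
    },
}

# A real call would look like (not executed here):
# boto3.client("sagemaker").create_labeling_job(**labeling_job_request)
```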
Ground Truth Plus
- What: A premium, turnkey data labeling service where AWS expert workforce handles the entire labeling process
- Difference from Ground Truth: With Plus, you provide the data and AWS manages the entire labeling workflow (project setup, workforce management, quality control)
- Best For: Organizations that don't want to manage labeling workflows
Exam Tip: Ground Truth = data labeling. If the question involves creating labeled datasets for training ML models, the answer is Ground Truth. Ground Truth Plus = fully managed (AWS handles everything).
SageMaker Feature Store
- What: A centralized, managed repository for storing, sharing, and managing ML features
- Features:
- Online Store: Low-latency feature retrieval for real-time inference
- Offline Store: Historical feature data for training (stored in S3)
- Feature versioning and lineage tracking
- Feature sharing across teams and models
- Batch or streaming ingestion
- Use Case: Ensure consistent features between training and inference (avoid training-serving skew)
Exam Tip: Feature Store = central feature repository. If the question mentions reusing features across models or ensuring feature consistency between training and inference, think Feature Store.
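The online/offline split above can be sketched as a `create_feature_group` request (boto3). Feature names, the S3 path, and the role ARN are hypothetical placeholders.

```python
# Sketch of a feature group with both online and offline stores
# (boto3 create_feature_group). Names and paths are placeholders.
feature_group_request = {
    "FeatureGroupName": "customer-features",
    "RecordIdentifierFeatureName": "customer_id",
    "EventTimeFeatureName": "event_time",  # required; enables point-in-time queries
    "FeatureDefinitions": [
        {"FeatureName": "customer_id", "FeatureType": "String"},
        {"FeatureName": "event_time", "FeatureType": "String"},
        {"FeatureName": "avg_order_value", "FeatureType": "Fractional"},
        {"FeatureName": "orders_last_30d", "FeatureType": "Integral"},
    ],
    # Online store: low-latency lookups at inference time
    "OnlineStoreConfig": {"EnableOnlineStore": True},
    # Offline store: historical feature data in S3 for training
    "OfflineStoreConfig": {
        "S3StorageConfig": {"S3Uri": "s3://my-bucket/feature-store/"}
    },
    "RoleArn": "arn:aws:iam::123456789012:role/FeatureStoreRole",
}
# boto3.client("sagemaker").create_feature_group(**feature_group_request)
```

Because training and inference read the same definitions, this is what prevents training-serving skew.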
SageMaker Training
Training Jobs
- What: Managed infrastructure for training ML models at any scale
- How:
- Specify training algorithm (built-in or custom container)
- Point to training data in S3
- Choose instance type and count
- SageMaker provisions infrastructure, runs training, stores model artifacts in S3
- Instance Types: CPU, GPU (P3, P4, G4, G5), and AWS Trainium (Trn1) instances purpose-built for ML training
- Spot Training: Use Spot Instances for up to 90% cost savings on training
- Managed Warm Pools: Keep instances warm between training jobs to reduce startup time
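The steps above (algorithm, data in S3, instance choice) correspond to a `create_training_job` request. This sketch also shows the Spot Training flag; the image URI, bucket paths, and role are hypothetical placeholders.

```python
# Sketch of a managed spot training job request (boto3 create_training_job).
# Image URI, S3 paths, and role ARN are placeholders.
training_job_request = {
    "TrainingJobName": "xgboost-churn-2024-01-01",
    "AlgorithmSpecification": {
        "TrainingImage": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-algo:latest",
        "TrainingInputMode": "File",
    },
    "RoleArn": "arn:aws:iam::123456789012:role/SageMakerTrainingRole",
    "InputDataConfig": [{
        "ChannelName": "train",
        "DataSource": {"S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": "s3://my-bucket/train/",
        }},
    }],
    "OutputDataConfig": {"S3OutputPath": "s3://my-bucket/artifacts/"},
    "ResourceConfig": {
        "InstanceType": "ml.g5.xlarge",  # single GPU instance
        "InstanceCount": 1,
        "VolumeSizeInGB": 50,
    },
    # Spot training: MaxWaitTimeInSeconds must be >= MaxRuntimeInSeconds,
    # to allow time for Spot interruptions and retries
    "EnableManagedSpotTraining": True,
    "StoppingCondition": {
        "MaxRuntimeInSeconds": 3600,
        "MaxWaitTimeInSeconds": 7200,
    },
}
# boto3.client("sagemaker").create_training_job(**training_job_request)
```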
Distributed Training
- What: Train large models across multiple instances
- Strategies:
- Data Parallelism: Split training data across instances; each instance has a copy of the model
- Model Parallelism: Split the model across instances when the model is too large for a single instance
- Supported Frameworks: TensorFlow, PyTorch, MXNet, Hugging Face
SageMaker Autopilot
AutoML
- What: Automatically builds, trains, and tunes ML models with no ML expertise required
- How:
- Provide a tabular dataset (CSV or Parquet)
- Specify the target column
- Autopilot automatically:
- Analyzes data and identifies problem type
- Generates multiple ML pipelines (feature engineering + algorithms)
- Trains and tunes multiple models
- Ranks models by performance
- Provides full visibility into the generated code and models
- Transparency: Unlike black-box AutoML, Autopilot provides all generated notebooks and code for review
- Problem Types: Classification, regression, time-series forecasting
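The flow above (dataset in, target column specified, models out) can be sketched as a `create_auto_ml_job` request. The dataset path, job name, and role are hypothetical; `ProblemType` and the objective metric are optional and inferred if omitted.

```python
# Sketch of an Autopilot job request (boto3 create_auto_ml_job).
# Paths, names, and the role ARN are placeholders.
autopilot_request = {
    "AutoMLJobName": "churn-autopilot",
    "InputDataConfig": [{
        "DataSource": {"S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": "s3://my-bucket/churn.csv",
        }},
        "TargetAttributeName": "churned",  # the column to predict
    }],
    "OutputDataConfig": {"S3OutputPath": "s3://my-bucket/autopilot/"},
    "ProblemType": "BinaryClassification",   # optional; Autopilot can infer it
    "AutoMLJobObjective": {"MetricName": "F1"},
    "RoleArn": "arn:aws:iam::123456789012:role/AutopilotRole",
}
# boto3.client("sagemaker").create_auto_ml_job(**autopilot_request)
```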
Exam Tip: Autopilot = AutoML (automatic model building). If the question mentions automatically building the best ML model from data without ML expertise, or the ability to see the generated code, think Autopilot.
SageMaker JumpStart
Pre-trained Models
- What: A model hub with hundreds of pre-trained models ready for deployment or fine-tuning
- Sources: Hugging Face, PyTorch Hub, TensorFlow Hub, and Amazon models
- Capabilities:
- One-click deployment of pre-trained models
- Fine-tuning with your data (transfer learning)
- Solution templates for common ML tasks
Foundation Models
- Access to foundation models (similar to Bedrock but within SageMaker)
- Deploy FMs on SageMaker infrastructure
- Fine-tune FMs with SageMaker training jobs
- Full control over model hosting infrastructure
Exam Tip: JumpStart = model zoo / pre-trained model hub. If the question mentions quickly deploying or fine-tuning a pre-trained model within SageMaker, think JumpStart. Key difference from Bedrock: JumpStart gives you more control over infrastructure; Bedrock is fully managed.
SageMaker Canvas
No-Code ML
- What: Build ML models using a visual, no-code interface — designed for business analysts
- No ML expertise required
- How:
- Import data
- Select target column
- Canvas automatically builds and trains models
- Make predictions through a visual interface
- Built-in Integrations: Connect to data in S3, Redshift, and local files
- Model Types: Classification, regression, time-series forecasting, text and image analysis
- Sharing: Share models with data scientists for review via SageMaker Studio
Exam Tip: Canvas = no-code ML for business users. If the question mentions business analysts building ML models without coding, the answer is Canvas.
SageMaker Clarify
Bias Detection
- Detect bias in training data and model predictions
- Pre-training bias metrics: Check data for imbalances before training
- Post-training bias metrics: Check model predictions for unfair bias
- Bias Types Detected: Class imbalance, label bias, feature bias, prediction bias
Explainability
- Explain model predictions using SHAP (SHapley Additive exPlanations) values
- Show which features contributed most to each prediction
- Generate feature importance reports
- Supports tabular, NLP, and computer vision models
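A Clarify processing job is driven by an analysis configuration file; the sketch below shows a trimmed version covering both bias metrics and SHAP explainability. Column names are hypothetical, and a complete config typically includes more fields (e.g. an explicit SHAP baseline).

```python
# Sketch of a Clarify analysis configuration (the JSON a Clarify
# processing job reads). Column names are hypothetical placeholders.
clarify_analysis_config = {
    "dataset_type": "text/csv",
    "headers": ["age", "income", "gender", "approved"],
    "label": "approved",
    # Bias analysis: fairness with respect to a sensitive facet
    "facet": [{"name_or_index": "gender"}],
    "methods": {
        "pre_training_bias": {"methods": "all"},    # data imbalances
        "post_training_bias": {"methods": "all"},   # prediction bias
        # Explainability: SHAP feature attributions per prediction
        "shap": {
            "num_samples": 100,
            "agg_method": "mean_abs",
        },
    },
}
```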
Exam Tip: Clarify = bias detection + explainability. If the question asks about detecting bias in ML models or explaining why a model made a specific prediction, the answer is Clarify. This is a KEY service for responsible AI questions.
SageMaker Model Monitor
Model Quality Monitoring
- Continuously monitor deployed models for quality degradation
- Detect when model accuracy drops below acceptable thresholds
- Compare current predictions against baseline
- Alert when intervention is needed
Data Drift Detection
- Detect when input data distribution changes over time (data drift)
- Types of drift monitored:
- Data Quality Drift: Changes in data statistics (mean, std, missing values)
- Model Quality Drift: Changes in prediction accuracy
- Bias Drift: Changes in fairness metrics over time
- Feature Attribution Drift: Changes in feature importance
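The four drift types above correspond to the `MonitoringType` values of a monitoring schedule. A trimmed sketch of the request (boto3 `create_monitoring_schedule`, newer `MonitoringType`-based form) follows; the schedule and job-definition names are hypothetical, and the job definition itself is created separately.

```python
# Sketch of an hourly data-quality monitoring schedule
# (boto3 create_monitoring_schedule). Names are placeholders.
monitoring_schedule_request = {
    "MonitoringScheduleName": "churn-endpoint-data-quality",
    "MonitoringScheduleConfig": {
        # Compare captured endpoint traffic against the baseline every hour
        "ScheduleConfig": {"ScheduleExpression": "cron(0 * ? * * *)"},
        "MonitoringJobDefinitionName": "churn-data-quality-job",  # created separately
        # One of: DataQuality, ModelQuality, ModelBias, ModelExplainability
        "MonitoringType": "DataQuality",
    },
}
# boto3.client("sagemaker").create_monitoring_schedule(**monitoring_schedule_request)
```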
Exam Tip: Model Monitor = post-deployment monitoring. If the question asks about monitoring a deployed model's performance or detecting data drift in production, think Model Monitor.
SageMaker Debugger
- What: Debug and profile ML training jobs in real-time
- Capabilities:
- Monitor training metrics (loss, accuracy) in real-time
- Detect training issues (vanishing gradients, overfitting, exploding gradients)
- Profile hardware utilization (CPU, GPU, memory, I/O)
- Automatic issue detection with built-in rules
- Generate profiling reports
SageMaker Experiments
- What: Track, organize, and compare ML experiments
- Capabilities:
- Log hyperparameters, metrics, and artifacts for each training run
- Compare experiments side-by-side
- Visualize training curves
- Organize experiments into groups
- Reproduce past experiments
SageMaker Pipelines
CI/CD for ML
- What: Build automated, repeatable ML workflows (MLOps)
- How: Define a Directed Acyclic Graph (DAG) of ML steps
- Steps Include:
- Data processing
- Model training
- Model evaluation
- Conditional logic (deploy if accuracy > threshold)
- Model registration
Pipeline Steps
| Step Type | Description |
|---|---|
| Processing | Data preprocessing and transformation |
| Training | Model training |
| Tuning | Hyperparameter optimization |
| Transform | Batch inference |
| Model | Create/register model |
| Condition | Branching logic (if-then-else) |
| Callback | Wait for external signal |
| Lambda | Run AWS Lambda functions |
| Quality Check | Data/model quality validation |
| Clarify Check | Bias and explainability checks |
| Fail | Mark pipeline as failed |
Exam Tip: Pipelines = ML CI/CD (MLOps). If the question involves automating the ML workflow from data prep to deployment, think SageMaker Pipelines.
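Under the hood, a pipeline is a JSON DAG definition. Below is a heavily trimmed skeleton showing the step types from the table above; step `Arguments` are elided and names are hypothetical. In practice the definition is generated by the SageMaker Python SDK (`sagemaker.workflow.pipeline.Pipeline`) rather than written by hand.

```python
# Skeleton of a SageMaker pipeline definition (the JSON DAG the service
# executes). Arguments are elided; step names are placeholders.
pipeline_definition = {
    "Version": "2020-12-01",
    "Steps": [
        {"Name": "Preprocess", "Type": "Processing", "Arguments": {}},
        {"Name": "Train", "Type": "Training", "Arguments": {}},
        {"Name": "Evaluate", "Type": "Processing", "Arguments": {}},
        # Branching: register/deploy only if accuracy clears the threshold
        {"Name": "CheckAccuracy", "Type": "Condition", "Arguments": {}},
    ],
}
```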
SageMaker Model Registry
Model Versioning
- What: Central catalog for managing ML model versions
- Capabilities:
- Track model versions with metadata
- Model approval workflow (PendingManualApproval → Approved or Rejected)
- Deploy approved models to endpoints
- Integration with SageMaker Pipelines for automated registration
- Track model lineage (training data, algorithm, hyperparameters)
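The approval workflow above can be sketched with `create_model_package` and `update_model_package` (boto3). The group name, image URI, and artifact path are hypothetical placeholders.

```python
# Sketch of registering a model version, then approving it after review
# (boto3 create_model_package / update_model_package). Names are placeholders.
register_request = {
    "ModelPackageGroupName": "churn-models",
    "ModelApprovalStatus": "PendingManualApproval",  # start of the workflow
    "InferenceSpecification": {
        "Containers": [{
            "Image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-algo:latest",
            "ModelDataUrl": "s3://my-bucket/artifacts/model.tar.gz",
        }],
        "SupportedContentTypes": ["text/csv"],
        "SupportedResponseMIMETypes": ["text/csv"],
    },
}

# sm = boto3.client("sagemaker")
# arn = sm.create_model_package(**register_request)["ModelPackageArn"]
# After human review, flip the status so deployment can proceed:
# sm.update_model_package(ModelPackageArn=arn, ModelApprovalStatus="Approved")
```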
SageMaker Inference
Real-time Inference
- Deploy models as persistent endpoints for low-latency, synchronous predictions
- Auto-scaling based on traffic
- A/B testing with traffic splitting
Batch Transform
- Process large datasets asynchronously (no persistent endpoint needed)
- Pay only for compute time used
- Results stored in S3
Serverless Inference
- What: Auto-scaling inference endpoints that scale to zero when not in use
- When to Use: Intermittent or unpredictable traffic
- Benefits: Pay only when requests are made (no idle costs)
- Limitation: Cold start latency when scaling from zero
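Serverless inference is configured on the endpoint config rather than the model. A sketch of the request (boto3 `create_endpoint_config`) follows; the config and model names are hypothetical placeholders.

```python
# Sketch of an endpoint config using serverless inference
# (boto3 create_endpoint_config). Names are placeholders.
endpoint_config_request = {
    "EndpointConfigName": "churn-serverless",
    "ProductionVariants": [{
        "VariantName": "AllTraffic",
        "ModelName": "churn-model",
        "ServerlessConfig": {
            "MemorySizeInMB": 2048,  # 1024-6144, in 1 GB increments
            "MaxConcurrency": 5,     # concurrent invocations before throttling
        },
    }],
}
# boto3.client("sagemaker").create_endpoint_config(**endpoint_config_request)
```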
Exam Tip: Know the three inference options: Real-time (persistent, low-latency), Batch (async, large datasets), Serverless (auto-scaling to zero, intermittent traffic).
SageMaker Edge Manager
- Deploy and manage ML models on edge devices
- Monitor model performance on edge
- Over-the-air model updates
- Supports devices like IoT sensors, cameras, industrial equipment
- Note: AWS has announced the end of support for Edge Manager (2024); verify current availability before relying on it
SageMaker Model Cards
Model Documentation
- What: Create standardized documentation for ML models
- Contents:
- Model purpose and intended use
- Training details (algorithm, data, hyperparameters)
- Evaluation results and metrics
- Ethical considerations
- Known limitations
- Purpose: Responsible AI — ensure transparency and accountability
Exam Tip: Model Cards = model documentation for responsible AI. If the question asks about documenting model purpose, limitations, and ethical considerations, think Model Cards.
SageMaker Model Dashboard
- Centralized view of all deployed models
- Monitor model health, performance, and compliance
- Track model lineage and versions
- Identify models that need attention
SageMaker Role Manager
- Simplify creation of IAM roles for SageMaker with pre-built permissions
- Role templates for common ML personas (data scientist, ML engineer, ML operations)
- Least-privilege access by default
MLflow on SageMaker
- What: Run open-source MLflow on SageMaker managed infrastructure
- Track experiments, package models, and deploy using the familiar MLflow interface
- Managed servers (no infrastructure to manage)
- Integration with SageMaker training and deployment
Network Isolation Mode
- Run SageMaker training and inference in complete network isolation
- No outbound network access from training containers
- Training data is reachable only from S3 through VPC endpoints
- Prevents data exfiltration during training
- Ensures sensitive training data stays within the VPC
- Required for some compliance scenarios
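The settings above are just two fields on a training job request. This fragment shows them in isolation (other required `create_training_job` fields omitted); the security group and subnet IDs are hypothetical placeholders.

```python
# Fragment of a boto3 create_training_job request showing only the
# isolation-related fields. IDs below are placeholders.
isolation_settings = {
    # Container gets no outbound network access during training
    "EnableNetworkIsolation": True,
    "VpcConfig": {
        # Training ENIs live in your VPC; S3 is reached via a VPC endpoint
        "SecurityGroupIds": ["sg-0123456789abcdef0"],
        "Subnets": ["subnet-0123456789abcdef0"],
    },
}
```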
Exam Tip: Network Isolation = no internet access for training containers. If the question mentions preventing data exfiltration or ensuring models train without internet access, think Network Isolation Mode.
Quick Reference Table
| SageMaker Component | Purpose | Key Exam Keyword |
|---|---|---|
| Studio | Unified ML IDE | Development environment |
| Data Wrangler | Visual data prep | No-code data transformation |
| Ground Truth | Data labeling | Human labeling workforce |
| Feature Store | Feature management | Reusable features, consistency |
| Autopilot | AutoML | Automatic model building |
| JumpStart | Model hub | Pre-trained models |
| Canvas | No-code ML | Business analysts |
| Clarify | Bias & explainability | Responsible AI, SHAP values |
| Model Monitor | Drift detection | Production monitoring |
| Debugger | Training debugging | Training issues, profiling |
| Pipelines | ML CI/CD | MLOps, workflow automation |
| Model Registry | Model versioning | Model catalog, approval workflow |
| Model Cards | Documentation | Responsible AI, transparency |
| Edge Manager | Edge deployment | IoT, edge devices |
| Network Isolation | Security | No internet, data protection |