Amazon SageMaker
Amazon SageMaker is a fully managed, end-to-end machine learning platform that provides every tool needed to build, train, and deploy ML models. It is the most comprehensive ML service on AWS.
Exam Tip: SageMaker has MANY components. The exam tests whether you know which SageMaker feature to use for each ML task (data prep, labeling, training, deployment, monitoring). Know each component's purpose.
SageMaker Studio
Unified Interface
- What: A single web-based IDE for all ML development activities
- Features:
- Integrated development environment for ML
- Unified access to ALL SageMaker features
- Collaboration features for teams
- Git integration
- Visual experiment tracking
Notebooks
- JupyterLab notebooks: Interactive coding environment for data exploration and model prototyping
- Pre-configured environments: Python, R, and ML libraries pre-installed
- Managed infrastructure: Automatically provisions and scales compute
- Kernel management: Switch between different compute instances and ML frameworks
SageMaker Data Wrangler
Data Preparation
- What: Visual tool for data preparation and transformation — no coding required
- Capabilities:
- Import data from S3, Athena, Redshift, Snowflake, Databricks, and more
- 300+ built-in data transformations
- Visual data flow builder (drag-and-drop)
- Export to SageMaker Pipelines, notebooks, or Python scripts
- Quick model training to validate data quality
Data Quality Tool
- Automatically assess data quality
- Identify missing values, duplicates, outliers
- Generate data quality reports
- Recommend transformations for common data issues
Exam Tip: Data Wrangler = visual, no-code data preparation. If the question mentions preparing/transforming data for ML without coding, think Data Wrangler.
SageMaker Ground Truth
Data Labeling
- What: Managed service for creating labeled training datasets using human annotators
- How:
- Define labeling task (image classification, text classification, bounding boxes, etc.)
- Use Amazon Mechanical Turk, private workforce, or third-party vendors
- Built-in workflow management for labeling tasks
- Supported Tasks:
- Image classification, object detection, semantic segmentation
- Text classification, named entity recognition
- Video frame labeling
- 3D point cloud labeling
- Audio classification
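The labeling workflow above maps onto the `create_labeling_job` API in boto3. Below is a minimal sketch of that request; every name, S3 path, and ARN is a hypothetical placeholder, and a real request needs additional fields (UI template, pre-annotation and consolidation Lambda ARNs) that are omitted here.

```python
# Partial sketch of a Ground Truth labeling job request
# (boto3 create_labeling_job). All names/paths/ARNs are placeholders.
labeling_job_request = {
    "LabelingJobName": "product-image-labels",
    "LabelAttributeName": "category",
    "InputConfig": {
        "DataSource": {
            "S3DataSource": {
                "ManifestS3Uri": "s3://my-bucket/manifests/input.manifest"
            }
        }
    },
    "OutputConfig": {"S3OutputPath": "s3://my-bucket/labels/"},
    "RoleArn": "arn:aws:iam::123456789012:role/GroundTruthRole",
    "HumanTaskConfig": {
        # Workforce choice: Mechanical Turk, a private team, or a vendor
        "WorkteamArn": "arn:aws:sagemaker:us-east-1:123456789012:workteam/private-crowd/my-team",
        "TaskTitle": "Classify product images",
        "TaskDescription": "Choose the category that best fits each image",
        "NumberOfHumanWorkersPerDataObject": 3,  # consolidate 3 answers per item
        "TaskTimeLimitInSeconds": 300,
    },
}

# A real call would look like (not executed here):
# boto3.client("sagemaker").create_labeling_job(**labeling_job_request)
```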
Ground Truth Plus
- What: A premium, turnkey data labeling service where AWS expert workforce handles the entire labeling process
- Difference from Ground Truth: With Plus, you provide the data and AWS manages the entire labeling workflow (project setup, workforce management, quality control)
- Best For: Organizations that don't want to manage labeling workflows
Exam Tip: Ground Truth = data labeling. If the question involves creating labeled datasets for training ML models, the answer is Ground Truth. Ground Truth Plus = fully managed (AWS handles everything).
SageMaker Feature Store
- What: A centralized, managed repository for storing, sharing, and managing ML features
- Features:
- Online Store: Low-latency feature retrieval for real-time inference
- Offline Store: Historical feature data for training (stored in S3)
- Feature versioning and lineage tracking
- Feature sharing across teams and models
- Batch or streaming ingestion
- Use Case: Ensure consistent features between training and inference (avoid training-serving skew)
Exam Tip: Feature Store = central feature repository. If the question mentions reusing features across models or ensuring feature consistency between training and inference, think Feature Store.
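The online/offline split above can be sketched as a `create_feature_group` request (boto3). Feature names, the S3 path, and the role ARN are hypothetical placeholders.

```python
# Sketch of a feature group with both online and offline stores
# (boto3 create_feature_group). Names and paths are placeholders.
feature_group_request = {
    "FeatureGroupName": "customer-features",
    "RecordIdentifierFeatureName": "customer_id",
    "EventTimeFeatureName": "event_time",  # required; enables point-in-time queries
    "FeatureDefinitions": [
        {"FeatureName": "customer_id", "FeatureType": "String"},
        {"FeatureName": "event_time", "FeatureType": "String"},
        {"FeatureName": "avg_order_value", "FeatureType": "Fractional"},
        {"FeatureName": "orders_last_30d", "FeatureType": "Integral"},
    ],
    # Online store: low-latency lookups at inference time
    "OnlineStoreConfig": {"EnableOnlineStore": True},
    # Offline store: historical feature data in S3 for training
    "OfflineStoreConfig": {
        "S3StorageConfig": {"S3Uri": "s3://my-bucket/feature-store/"}
    },
    "RoleArn": "arn:aws:iam::123456789012:role/FeatureStoreRole",
}
# boto3.client("sagemaker").create_feature_group(**feature_group_request)
```

Because training and inference read the same definitions, this is what prevents training-serving skew.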
SageMaker Training
Training Jobs
- What: Managed infrastructure for training ML models at any scale
- How:
- Specify training algorithm (built-in or custom container)
- Point to training data in S3
- Choose instance type and count
- SageMaker provisions infrastructure, runs training, stores model artifacts in S3
- Instance Types: CPU, GPU (P3, P4, G4, G5), and AWS Trainium (Trn1) instances purpose-built for ML training
- Spot Training: Use Spot Instances for up to 90% cost savings on training
- Managed Warm Pools: Keep instances warm between training jobs to reduce startup time
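The steps above (algorithm, data in S3, instance choice) correspond to a `create_training_job` request. This sketch also shows the Spot Training flag; the image URI, bucket paths, and role are hypothetical placeholders.

```python
# Sketch of a managed spot training job request (boto3 create_training_job).
# Image URI, S3 paths, and role ARN are placeholders.
training_job_request = {
    "TrainingJobName": "xgboost-churn-2024-01-01",
    "AlgorithmSpecification": {
        "TrainingImage": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-algo:latest",
        "TrainingInputMode": "File",
    },
    "RoleArn": "arn:aws:iam::123456789012:role/SageMakerTrainingRole",
    "InputDataConfig": [{
        "ChannelName": "train",
        "DataSource": {"S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": "s3://my-bucket/train/",
        }},
    }],
    "OutputDataConfig": {"S3OutputPath": "s3://my-bucket/artifacts/"},
    "ResourceConfig": {
        "InstanceType": "ml.g5.xlarge",  # single GPU instance
        "InstanceCount": 1,
        "VolumeSizeInGB": 50,
    },
    # Spot training: MaxWaitTimeInSeconds must be >= MaxRuntimeInSeconds,
    # to allow time for Spot interruptions and retries
    "EnableManagedSpotTraining": True,
    "StoppingCondition": {
        "MaxRuntimeInSeconds": 3600,
        "MaxWaitTimeInSeconds": 7200,
    },
}
# boto3.client("sagemaker").create_training_job(**training_job_request)
```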
Distributed Training
- What: Train large models across multiple instances
- Strategies:
- Data Parallelism: Split training data across instances; each instance has a copy of the model
- Model Parallelism: Split the model across instances when the model is too large for a single instance
- Supported Frameworks: TensorFlow, PyTorch, MXNet, Hugging Face
SageMaker Autopilot
AutoML
- What: Automatically builds, trains, and tunes ML models with no ML expertise required
- How:
- Provide a tabular dataset (CSV or Parquet)
- Specify the target column
- Autopilot automatically:
- Analyzes data and identifies problem type
- Generates multiple ML pipelines (feature engineering + algorithms)
- Trains and tunes multiple models
- Ranks models by performance
- Provides full visibility into the generated code and models
- Transparency: Unlike black-box AutoML, Autopilot provides all generated notebooks and code for review
- Problem Types: Classification, regression, time-series forecasting
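The flow above (dataset in, target column specified, models out) can be sketched as a `create_auto_ml_job` request. The dataset path, job name, and role are hypothetical; `ProblemType` and the objective metric are optional and inferred if omitted.

```python
# Sketch of an Autopilot job request (boto3 create_auto_ml_job).
# Paths, names, and the role ARN are placeholders.
autopilot_request = {
    "AutoMLJobName": "churn-autopilot",
    "InputDataConfig": [{
        "DataSource": {"S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": "s3://my-bucket/churn.csv",
        }},
        "TargetAttributeName": "churned",  # the column to predict
    }],
    "OutputDataConfig": {"S3OutputPath": "s3://my-bucket/autopilot/"},
    "ProblemType": "BinaryClassification",   # optional; Autopilot can infer it
    "AutoMLJobObjective": {"MetricName": "F1"},
    "RoleArn": "arn:aws:iam::123456789012:role/AutopilotRole",
}
# boto3.client("sagemaker").create_auto_ml_job(**autopilot_request)
```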
Exam Tip: Autopilot = AutoML (automatic model building). If the question mentions automatically building the best ML model from data without ML expertise, or the ability to see the generated code, think Autopilot.
SageMaker JumpStart
Pre-trained Models
- What: A model hub with hundreds of pre-trained models ready for deployment or fine-tuning
- Sources: Hugging Face, PyTorch Hub, TensorFlow Hub, and Amazon models
- Capabilities:
- One-click deployment of pre-trained models
- Fine-tuning with your data (transfer learning)
- Solution templates for common ML tasks
Foundation Models
- Access to foundation models (similar to Bedrock but within SageMaker)
- Deploy FMs on SageMaker infrastructure
- Fine-tune FMs with SageMaker training jobs
- Full control over model hosting infrastructure
Exam Tip: JumpStart = model zoo / pre-trained model hub. If the question mentions quickly deploying or fine-tuning a pre-trained model within SageMaker, think JumpStart. Key difference from Bedrock: JumpStart gives you more control over infrastructure; Bedrock is fully managed.
SageMaker Canvas
No-Code ML
- What: Build ML models using a visual, no-code interface — designed for business analysts
- No ML expertise required
- How:
- Import data
- Select target column
- Canvas automatically builds and trains models
- Make predictions through a visual interface
- Built-in Integrations: Connect to data in S3, Redshift, and local files
- Model Types: Classification, regression, time-series forecasting, text and image analysis
- Sharing: Share models with data scientists for review via SageMaker Studio
Exam Tip: Canvas = no-code ML for business users. If the question mentions business analysts building ML models without coding, the answer is Canvas.
SageMaker Clarify
Bias Detection
- Detect bias in training data and model predictions
- Pre-training bias metrics: Check data for imbalances before training
- Post-training bias metrics: Check model predictions for unfair bias
- Bias Types Detected: Class imbalance, label bias, feature bias, prediction bias
Explainability
- Explain model predictions using SHAP (SHapley Additive exPlanations) values
- Show which features contributed most to each prediction
- Generate feature importance reports
- Supports tabular, NLP, and computer vision models
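A Clarify processing job is driven by an analysis configuration file; the sketch below shows a trimmed version covering both bias metrics and SHAP explainability. Column names are hypothetical, and a complete config typically includes more fields (e.g. an explicit SHAP baseline).

```python
# Sketch of a Clarify analysis configuration (the JSON a Clarify
# processing job reads). Column names are hypothetical placeholders.
clarify_analysis_config = {
    "dataset_type": "text/csv",
    "headers": ["age", "income", "gender", "approved"],
    "label": "approved",
    # Bias analysis: fairness with respect to a sensitive facet
    "facet": [{"name_or_index": "gender"}],
    "methods": {
        "pre_training_bias": {"methods": "all"},    # data imbalances
        "post_training_bias": {"methods": "all"},   # prediction bias
        # Explainability: SHAP feature attributions per prediction
        "shap": {
            "num_samples": 100,
            "agg_method": "mean_abs",
        },
    },
}
```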
Exam Tip: Clarify = bias detection + explainability. If the question asks about detecting bias in ML models or explaining why a model made a specific prediction, the answer is Clarify. This is a KEY service for responsible AI questions.
SageMaker Model Monitor
Model Quality Monitoring
- Continuously monitor deployed models for quality degradation
- Detect when model accuracy drops below acceptable thresholds
- Compare current predictions against baseline
- Alert when intervention is needed
Data Drift Detection
- Detect when input data distribution changes over time (data drift)
- Types of drift monitored:
- Data Quality Drift: Changes in data statistics (mean, std, missing values)
- Model Quality Drift: Changes in prediction accuracy
- Bias Drift: Changes in fairness metrics over time
- Feature Attribution Drift: Changes in feature importance
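The four drift types above correspond to the `MonitoringType` values of a monitoring schedule. A trimmed sketch of the request (boto3 `create_monitoring_schedule`, newer `MonitoringType`-based form) follows; the schedule and job-definition names are hypothetical, and the job definition itself is created separately.

```python
# Sketch of an hourly data-quality monitoring schedule
# (boto3 create_monitoring_schedule). Names are placeholders.
monitoring_schedule_request = {
    "MonitoringScheduleName": "churn-endpoint-data-quality",
    "MonitoringScheduleConfig": {
        # Compare captured endpoint traffic against the baseline every hour
        "ScheduleConfig": {"ScheduleExpression": "cron(0 * ? * * *)"},
        "MonitoringJobDefinitionName": "churn-data-quality-job",  # created separately
        # One of: DataQuality, ModelQuality, ModelBias, ModelExplainability
        "MonitoringType": "DataQuality",
    },
}
# boto3.client("sagemaker").create_monitoring_schedule(**monitoring_schedule_request)
```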
Exam Tip: Model Monitor = post-deployment monitoring. If the question asks about monitoring a deployed model's performance or detecting data drift in production, think Model Monitor.
SageMaker Debugger
- What: Debug and profile ML training jobs in real-time
- Capabilities:
- Monitor training metrics (loss, accuracy) in real-time
- Detect training issues (vanishing gradients, overfitting, exploding gradients)
- Profile hardware utilization (CPU, GPU, memory, I/O)
- Automatic issue detection with built-in rules
- Generate profiling reports
SageMaker Experiments
- What: Track, organize, and compare ML experiments
- Capabilities:
- Log hyperparameters, metrics, and artifacts for each training run
- Compare experiments side-by-side
- Visualize training curves
- Organize experiments into groups
- Reproduce past experiments
SageMaker Pipelines
CI/CD for ML
- What: Build automated, repeatable ML workflows (MLOps)
- How: Define a Directed Acyclic Graph (DAG) of ML steps
- Steps Include:
- Data processing
- Model training
- Model evaluation
- Conditional logic (deploy if accuracy > threshold)
- Model registration
Pipeline Steps
| Step Type | Description |
|---|---|
| Processing | Data preprocessing and transformation |
| Training | Model training |
| Tuning | Hyperparameter optimization |
| Transform | Batch inference |
| Model | Create/register model |
| Condition | Branching logic (if-then-else) |
| Callback | Wait for external signal |
| Lambda | Run AWS Lambda functions |
| Quality Check | Data/model quality validation |
| Clarify Check | Bias and explainability checks |
| Fail | Mark pipeline as failed |
Exam Tip: Pipelines = ML CI/CD (MLOps). If the question involves automating the ML workflow from data prep to deployment, think SageMaker Pipelines.
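Under the hood, a pipeline is a JSON DAG definition. Below is a heavily trimmed skeleton showing the step types from the table above; step `Arguments` are elided and names are hypothetical. In practice the definition is generated by the SageMaker Python SDK (`sagemaker.workflow.pipeline.Pipeline`) rather than written by hand.

```python
# Skeleton of a SageMaker pipeline definition (the JSON DAG the service
# executes). Arguments are elided; step names are placeholders.
pipeline_definition = {
    "Version": "2020-12-01",
    "Steps": [
        {"Name": "Preprocess", "Type": "Processing", "Arguments": {}},
        {"Name": "Train", "Type": "Training", "Arguments": {}},
        {"Name": "Evaluate", "Type": "Processing", "Arguments": {}},
        # Branching: register/deploy only if accuracy clears the threshold
        {"Name": "CheckAccuracy", "Type": "Condition", "Arguments": {}},
    ],
}
```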
SageMaker Model Registry
Model Versioning
- What: Central catalog for managing ML model versions
- Capabilities:
- Track model versions with metadata
- Model approval workflow (PendingManualApproval → Approved or Rejected)
- Deploy approved models to endpoints
- Integration with SageMaker Pipelines for automated registration
- Track model lineage (training data, algorithm, hyperparameters)
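The approval workflow above can be sketched with `create_model_package` and `update_model_package` (boto3). The group name, image URI, and artifact path are hypothetical placeholders.

```python
# Sketch of registering a model version, then approving it after review
# (boto3 create_model_package / update_model_package). Names are placeholders.
register_request = {
    "ModelPackageGroupName": "churn-models",
    "ModelApprovalStatus": "PendingManualApproval",  # start of the workflow
    "InferenceSpecification": {
        "Containers": [{
            "Image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-algo:latest",
            "ModelDataUrl": "s3://my-bucket/artifacts/model.tar.gz",
        }],
        "SupportedContentTypes": ["text/csv"],
        "SupportedResponseMIMETypes": ["text/csv"],
    },
}

# sm = boto3.client("sagemaker")
# arn = sm.create_model_package(**register_request)["ModelPackageArn"]
# After human review, flip the status so deployment can proceed:
# sm.update_model_package(ModelPackageArn=arn, ModelApprovalStatus="Approved")
```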
SageMaker Inference
Real-time Inference
- Deploy models as persistent endpoints for low-latency, synchronous predictions
- Auto-scaling based on traffic
- A/B testing with traffic splitting
Batch Transform
- Process large datasets asynchronously (no persistent endpoint needed)
- Pay only for compute time used
- Results stored in S3
Serverless Inference
- What: Auto-scaling inference endpoints that scale to zero when not in use
- When to Use: Intermittent or unpredictable traffic
- Benefits: Pay only when requests are made (no idle costs)
- Limitation: Cold start latency when scaling from zero
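Serverless inference is configured on the endpoint config rather than the model. A sketch of the request (boto3 `create_endpoint_config`) follows; the config and model names are hypothetical placeholders.

```python
# Sketch of an endpoint config using serverless inference
# (boto3 create_endpoint_config). Names are placeholders.
endpoint_config_request = {
    "EndpointConfigName": "churn-serverless",
    "ProductionVariants": [{
        "VariantName": "AllTraffic",
        "ModelName": "churn-model",
        "ServerlessConfig": {
            "MemorySizeInMB": 2048,  # 1024-6144, in 1 GB increments
            "MaxConcurrency": 5,     # concurrent invocations before throttling
        },
    }],
}
# boto3.client("sagemaker").create_endpoint_config(**endpoint_config_request)
```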
Exam Tip: Know the three inference options: Real-time (persistent, low-latency), Batch (async, large datasets), Serverless (auto-scaling to zero, intermittent traffic).
SageMaker Edge Manager
- Deploy and manage ML models on edge devices
- Monitor model performance on edge
- Over-the-air model updates
- Supports devices like IoT sensors, cameras, industrial equipment
- Note: AWS has announced the end of support for Edge Manager (2024); verify current availability before relying on it
SageMaker Model Cards
Model Documentation
- What: Create standardized documentation for ML models
- Contents:
- Model purpose and intended use
- Training details (algorithm, data, hyperparameters)
- Evaluation results and metrics
- Ethical considerations
- Known limitations
- Purpose: Responsible AI — ensure transparency and accountability
Exam Tip: Model Cards = model documentation for responsible AI. If the question asks about documenting model purpose, limitations, and ethical considerations, think Model Cards.
SageMaker Model Dashboard
- Centralized view of all deployed models
- Monitor model health, performance, and compliance
- Track model lineage and versions
- Identify models that need attention
SageMaker Role Manager
- Simplify creation of IAM roles for SageMaker with pre-built permissions
- Role templates for common ML personas (data scientist, ML engineer, ML operations)
- Least-privilege access by default
MLflow on SageMaker
- What: Run open-source MLflow on SageMaker managed infrastructure
- Track experiments, package models, and deploy using the familiar MLflow interface
- Managed servers (no infrastructure to manage)
- Integration with SageMaker training and deployment
Network Isolation Mode
- Run SageMaker training and inference in complete network isolation
- No outbound network access from training containers
- Training data is reachable only from S3 through VPC endpoints
- Prevents data exfiltration during training
- Ensures sensitive training data stays within the VPC
- Required for some compliance scenarios
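The settings above are just two fields on a training job request. This fragment shows them in isolation (other required `create_training_job` fields omitted); the security group and subnet IDs are hypothetical placeholders.

```python
# Fragment of a boto3 create_training_job request showing only the
# isolation-related fields. IDs below are placeholders.
isolation_settings = {
    # Container gets no outbound network access during training
    "EnableNetworkIsolation": True,
    "VpcConfig": {
        # Training ENIs live in your VPC; S3 is reached via a VPC endpoint
        "SecurityGroupIds": ["sg-0123456789abcdef0"],
        "Subnets": ["subnet-0123456789abcdef0"],
    },
}
```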
Exam Tip: Network Isolation = no internet access for training containers. If the question mentions preventing data exfiltration or ensuring models train without internet access, think Network Isolation Mode.
Quick Reference Table
| SageMaker Component | Purpose | Key Exam Keyword |
|---|---|---|
| Studio | Unified ML IDE | Development environment |
| Data Wrangler | Visual data prep | No-code data transformation |
| Ground Truth | Data labeling | Human labeling workforce |
| Feature Store | Feature management | Reusable features, consistency |
| Autopilot | AutoML | Automatic model building |
| JumpStart | Model hub | Pre-trained models |
| Canvas | No-code ML | Business analysts |
| Clarify | Bias & explainability | Responsible AI, SHAP values |
| Model Monitor | Drift detection | Production monitoring |
| Debugger | Training debugging | Training issues, profiling |
| Pipelines | ML CI/CD | MLOps, workflow automation |
| Model Registry | Model versioning | Model catalog, approval workflow |
| Model Cards | Documentation | Responsible AI, transparency |
| Edge Manager | Edge deployment | IoT, edge devices |
| Network Isolation | Security | No internet, data protection |