
Amazon SageMaker

"End-to-end machine learning platform for building, training, and deploying ML models at scale."


Amazon SageMaker is a fully managed, end-to-end machine learning platform that provides every tool needed to build, train, and deploy ML models. It is the most comprehensive ML service on AWS.

Exam Tip: SageMaker has MANY components. The exam tests whether you know which SageMaker feature to use for each ML task (data prep, labeling, training, deployment, monitoring). Know each component's purpose.


SageMaker Studio

Unified Interface

  • What: A single web-based IDE for all ML development activities
  • Features:
    • Integrated development environment for ML
    • Unified access to ALL SageMaker features
    • Collaboration features for teams
    • Git integration
    • Visual experiment tracking

Notebooks

  • JupyterLab notebooks: Interactive coding environment for data exploration and model prototyping
  • Pre-configured environments: Python, R, and ML libraries pre-installed
  • Managed infrastructure: Automatically provisions and scales compute
  • Kernel management: Switch between different compute instances and ML frameworks

SageMaker Data Wrangler

Data Preparation

  • What: Visual tool for data preparation and transformation — no coding required
  • Capabilities:
    • Import data from S3, Athena, Redshift, Snowflake, Databricks, and more
    • 300+ built-in data transformations
    • Visual data flow builder (drag-and-drop)
    • Export to SageMaker Pipelines, notebooks, or Python scripts
    • Quick model training to validate data quality

Data Quality Tool

  • Automatically assess data quality
  • Identify missing values, duplicates, outliers
  • Generate data quality reports
  • Recommend transformations for common data issues
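
The checks above can be illustrated with a short stdlib sketch. This is purely conceptual — Data Wrangler performs these checks visually with no code — and the dataset and 2-standard-deviation outlier rule are illustrative assumptions:

```python
from collections import Counter
from statistics import mean, stdev

# Toy dataset: None marks a missing value.
ages = [34, 29, None, 41, 29, 250, None, 38]

missing = sum(1 for v in ages if v is None)
present = [v for v in ages if v is not None]
duplicates = sum(c - 1 for c in Counter(present).values() if c > 1)

# Flag values more than 2 standard deviations from the mean as outliers.
mu, sigma = mean(present), stdev(present)
outliers = [v for v in present if abs(v - mu) > 2 * sigma]

print({"missing": missing, "duplicates": duplicates, "outliers": outliers})
# → {'missing': 2, 'duplicates': 1, 'outliers': [250]}
```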

Exam Tip: Data Wrangler = visual, no-code data preparation. If the question mentions preparing/transforming data for ML without coding, think Data Wrangler.


SageMaker Ground Truth

Data Labeling

  • What: Managed service for creating labeled training datasets using human annotators
  • How:
    • Define labeling task (image classification, text classification, bounding boxes, etc.)
    • Use Amazon Mechanical Turk, private workforce, or third-party vendors
    • Built-in workflow management for labeling tasks
  • Supported Tasks:
    • Image classification, object detection, semantic segmentation
    • Text classification, named entity recognition
    • Video frame labeling
    • 3D point cloud labeling
    • Audio classification

Ground Truth Plus

  • What: A premium, turnkey data labeling service in which an AWS-managed expert workforce handles the entire labeling process
  • Difference from Ground Truth: With Plus, you provide the data and AWS manages the entire labeling workflow (project setup, workforce management, quality control)
  • Best For: Organizations that don't want to manage labeling workflows

Exam Tip: Ground Truth = data labeling. If the question involves creating labeled datasets for training ML models, the answer is Ground Truth. Ground Truth Plus = fully managed (AWS handles everything).


SageMaker Feature Store

  • What: A centralized, managed repository for storing, sharing, and managing ML features
  • Features:
    • Online Store: Low-latency feature retrieval for real-time inference
    • Offline Store: Historical feature data for training (stored in S3)
    • Feature versioning and lineage tracking
    • Feature sharing across teams and models
    • Batch or streaming ingestion
  • Use Case: Ensure consistent features between training and inference (avoid training-serving skew)
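
Writing a feature to the store comes down to a PutRecord call on the feature-store runtime API. The sketch below shows the shape of that request; the feature group name and values are hypothetical, and in practice you would send it via `boto3.client("sagemaker-featurestore-runtime").put_record(**request)`:

```python
# Hypothetical record: every feature value is sent as a string.
record = [
    {"FeatureName": "customer_id", "ValueAsString": "C-1042"},
    {"FeatureName": "avg_order_value", "ValueAsString": "87.50"},
    {"FeatureName": "event_time", "ValueAsString": "2024-01-15T09:30:00Z"},
]

request = {
    "FeatureGroupName": "customer-features",  # hypothetical group name
    "Record": record,
}

# The same record lands in the online store (low-latency reads) and,
# asynchronously, in the offline store in S3 for training queries --
# which is what keeps training and inference features consistent.
```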

Exam Tip: Feature Store = central feature repository. If the question mentions reusing features across models or ensuring feature consistency between training and inference, think Feature Store.


SageMaker Training

Training Jobs

  • What: Managed infrastructure for training ML models at any scale
  • How:
    • Specify training algorithm (built-in or custom container)
    • Point to training data in S3
    • Choose instance type and count
    • SageMaker provisions infrastructure, runs training, stores model artifacts in S3
  • Instance Types: CPU, GPU (P3, P4, G4, G5), and ML-optimized instances
  • Spot Training: Use Spot Instances for up to 90% cost savings on training
  • Managed Warm Pools: Keep instances warm between training jobs to reduce startup time
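
The training-job flow above maps to a single CreateTrainingJob request. This sketch shows its shape with managed Spot training enabled; bucket names, the image URI, and the role ARN are placeholders, and the dict would be sent via `boto3.client("sagemaker").create_training_job(**job)`:

```python
job = {
    "TrainingJobName": "demo-xgb-2024-01-15",
    "AlgorithmSpecification": {
        "TrainingImage": "<account>.dkr.ecr.<region>.amazonaws.com/xgboost:latest",
        "TrainingInputMode": "File",
    },
    "RoleArn": "arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    "InputDataConfig": [{
        "ChannelName": "train",
        "DataSource": {"S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": "s3://my-bucket/train/",
        }},
    }],
    "OutputDataConfig": {"S3OutputPath": "s3://my-bucket/artifacts/"},
    "ResourceConfig": {
        "InstanceType": "ml.m5.xlarge",
        "InstanceCount": 1,
        "VolumeSizeInGB": 50,
    },
    # Spot training: MaxWaitTimeInSeconds must be >= MaxRuntimeInSeconds,
    # and a checkpoint location lets an interrupted job resume.
    "EnableManagedSpotTraining": True,
    "CheckpointConfig": {"S3Uri": "s3://my-bucket/checkpoints/"},
    "StoppingCondition": {
        "MaxRuntimeInSeconds": 3600,
        "MaxWaitTimeInSeconds": 7200,
    },
}
```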

Distributed Training

  • What: Train large models across multiple instances
  • Strategies:
    • Data Parallelism: Split training data across instances; each instance has a copy of the model
    • Model Parallelism: Split the model across instances when the model is too large for a single instance
  • Supported Frameworks: TensorFlow, PyTorch, MXNet, Hugging Face
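
Data parallelism can be sketched in a few lines of plain Python. This is a conceptual illustration only (real distributed training uses the framework's all-reduce primitives, not a loop): each worker holds a full model copy, computes a gradient on its own data shard, and the gradients are averaged before every worker applies the same update.

```python
def shard(data, n_workers):
    """Round-robin split of the dataset across workers."""
    return [data[i::n_workers] for i in range(n_workers)]

def local_gradient(w, xs):
    # Toy gradient for a 1-D least-squares fit of y = w*x where y = 2*x.
    return sum(2 * (w * x - 2 * x) * x for x in xs) / len(xs)

data = [1.0, 2.0, 3.0, 4.0]
w = 0.0
for _ in range(50):
    grads = [local_gradient(w, xs) for xs in shard(data, 2)]
    w -= 0.05 * sum(grads) / len(grads)   # averaged (all-reduce-style) step

print(round(w, 3))  # → 2.0
```

Model parallelism is the opposite split: the *parameters* are partitioned across instances, and activations flow between them during the forward and backward passes.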

SageMaker Autopilot

AutoML

  • What: Automatically builds, trains, and tunes ML models with no ML expertise required
  • How:
    • Provide a tabular dataset (CSV)
    • Specify the target column
    • Autopilot automatically:
      1. Analyzes data and identifies problem type
      2. Generates multiple ML pipelines (feature engineering + algorithms)
      3. Trains and tunes multiple models
      4. Ranks models by performance
      5. Provides full visibility into the generated code and models
  • Transparency: Unlike black-box AutoML, Autopilot provides all generated notebooks and code for review
  • Problem Types: Classification, regression, time-series forecasting
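
The steps above reduce to a single CreateAutoMLJob request. A hedged sketch of its shape follows; bucket, role, and column names are placeholders, and the dict would be sent via `boto3.client("sagemaker").create_auto_ml_job(**automl_job)`:

```python
automl_job = {
    "AutoMLJobName": "churn-autopilot-demo",
    "InputDataConfig": [{
        "DataSource": {"S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": "s3://my-bucket/churn.csv",
        }},
        "TargetAttributeName": "churned",   # the column to predict
    }],
    "OutputDataConfig": {"S3OutputPath": "s3://my-bucket/autopilot/"},
    "RoleArn": "arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    # If ProblemType is omitted, Autopilot infers it from the target column.
    "ProblemType": "BinaryClassification",
    "AutoMLJobObjective": {"MetricName": "F1"},
}
```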

Exam Tip: Autopilot = AutoML (automatic model building). If the question mentions automatically building the best ML model from data without ML expertise, or the ability to see the generated code, think Autopilot.


SageMaker JumpStart

Pre-trained Models

  • What: A model hub with hundreds of pre-trained models ready for deployment or fine-tuning
  • Sources: Hugging Face, PyTorch Hub, TensorFlow Hub, and Amazon models
  • Capabilities:
    • One-click deployment of pre-trained models
    • Fine-tuning with your data (transfer learning)
    • Solution templates for common ML tasks

Foundation Models

  • Access to foundation models (similar to Bedrock but within SageMaker)
  • Deploy FMs on SageMaker infrastructure
  • Fine-tune FMs with SageMaker training jobs
  • Full control over model hosting infrastructure

Exam Tip: JumpStart = model zoo / pre-trained model hub. If the question mentions quickly deploying or fine-tuning a pre-trained model within SageMaker, think JumpStart. Key difference from Bedrock: JumpStart gives you more control over infrastructure; Bedrock is fully managed.


SageMaker Canvas

No-Code ML

  • What: Build ML models using a visual, no-code interface — designed for business analysts
  • No ML expertise required
  • How:
    • Import data
    • Select target column
    • Canvas automatically builds and trains models
    • Make predictions through a visual interface
  • Built-in Integrations: Connect to data in S3, Redshift, and local files
  • Model Types: Classification, regression, time-series forecasting, text and image analysis
  • Sharing: Share models with data scientists for review via SageMaker Studio

Exam Tip: Canvas = no-code ML for business users. If the question mentions business analysts building ML models without coding, the answer is Canvas.


SageMaker Clarify

Bias Detection

  • Detect bias in training data and model predictions
  • Pre-training bias metrics: Check data for imbalances before training
  • Post-training bias metrics: Check model predictions for unfair bias
  • Bias Types Detected: Class imbalance, label bias, feature bias, prediction bias
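
Two of Clarify's pre-training bias metrics are simple enough to compute by hand, which helps make them concrete. The toy dataset below is illustrative; the "facet" is the sensitive attribute (here, group "A" vs "B"), and label 1 is the favorable outcome:

```python
rows = [  # (group, label) pairs
    ("A", 1), ("A", 1), ("A", 0), ("A", 1), ("A", 0), ("A", 1),
    ("B", 0), ("B", 1), ("B", 0),
]
n_a = sum(1 for g, _ in rows if g == "A")
n_b = sum(1 for g, _ in rows if g == "B")

# Class Imbalance (CI): how unevenly the two groups are represented.
ci = (n_a - n_b) / (n_a + n_b)

# Difference in Proportions of Labels (DPL): gap in favorable-label rates.
pos_a = sum(1 for g, y in rows if g == "A" and y == 1) / n_a
pos_b = sum(1 for g, y in rows if g == "B" and y == 1) / n_b
dpl = pos_a - pos_b

print(f"CI={ci:.2f}  DPL={dpl:.2f}")  # → CI=0.33  DPL=0.33
```

Both metrics are near zero for a balanced, fair dataset; values far from zero flag the data for review before training.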

Explainability

  • Explain model predictions using SHAP (SHapley Additive exPlanations) values
  • Show which features contributed most to each prediction
  • Generate feature importance reports
  • Supports tabular, NLP, and computer vision models

Exam Tip: Clarify = bias detection + explainability. If the question asks about detecting bias in ML models or explaining why a model made a specific prediction, the answer is Clarify. This is a KEY service for responsible AI questions.


SageMaker Model Monitor

Model Quality Monitoring

  • Continuously monitor deployed models for quality degradation
  • Detect when model accuracy drops below acceptable thresholds
  • Compare current predictions against baseline
  • Alert when intervention is needed

Data Drift Detection

  • Detect when input data distribution changes over time (data drift)
  • Types of drift monitored:
    • Data Quality Drift: Changes in data statistics (mean, std, missing values)
    • Model Quality Drift: Changes in prediction accuracy
    • Bias Drift: Changes in fairness metrics over time
    • Feature Attribution Drift: Changes in feature importance
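
The core idea behind data-quality drift can be sketched in a few lines: capture baseline statistics at training time, then compare live traffic against them. This is a conceptual illustration only (Model Monitor does this with baseline statistics and constraints files, not user code), and the 3-standard-deviation threshold is an assumption:

```python
from statistics import mean

baseline = {"mean": 50.0, "std": 5.0}  # captured when the model was trained

def drifted(values, baseline, threshold=3.0):
    """Flag drift when the live mean moves more than `threshold`
    baseline standard deviations away from the baseline mean."""
    return abs(mean(values) - baseline["mean"]) > threshold * baseline["std"]

print(drifted([49, 52, 48, 51], baseline))   # → False (stable traffic)
print(drifted([80, 85, 78, 82], baseline))   # → True  (distribution shifted)
```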

Exam Tip: Model Monitor = post-deployment monitoring. If the question asks about monitoring a deployed model's performance or detecting data drift in production, think Model Monitor.


SageMaker Debugger

  • What: Debug and profile ML training jobs in real-time
  • Capabilities:
    • Monitor training metrics (loss, accuracy) in real-time
    • Detect training issues (vanishing gradients, overfitting, exploding gradients)
    • Profile hardware utilization (CPU, GPU, memory, I/O)
    • Automatic issue detection with built-in rules
    • Generate profiling reports

SageMaker Experiments

  • What: Track, organize, and compare ML experiments
  • Capabilities:
    • Log hyperparameters, metrics, and artifacts for each training run
    • Compare experiments side-by-side
    • Visualize training curves
    • Organize experiments into groups
    • Reproduce past experiments

SageMaker Pipelines

CI/CD for ML

  • What: Build automated, repeatable ML workflows (MLOps)
  • How: Define a Directed Acyclic Graph (DAG) of ML steps
  • Steps Include:
    • Data processing
    • Model training
    • Model evaluation
    • Conditional logic (deploy if accuracy > threshold)
    • Model registration

Pipeline Steps

Step Type     | Description
Processing    | Data preprocessing and transformation
Training      | Model training
Tuning        | Hyperparameter optimization
Transform     | Batch inference
Model         | Create/register model
Condition     | Branching logic (if-then-else)
Callback      | Wait for external signal
Lambda        | Run AWS Lambda functions
Quality Check | Data/model quality validation
Clarify Check | Bias and explainability checks
Fail          | Mark pipeline as failed
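
A pipeline is ultimately a JSON definition describing the DAG of typed steps. The sketch below shows that general shape with a Condition step gating model registration on an evaluation metric; the step names, threshold, and argument details are illustrative assumptions, not the exact schema:

```python
pipeline = {
    "Version": "2020-12-01",
    "Steps": [
        {"Name": "Preprocess", "Type": "Processing", "Arguments": {}},
        {"Name": "Train", "Type": "Training", "Arguments": {}},
        {"Name": "Evaluate", "Type": "Processing", "Arguments": {}},
        {
            "Name": "CheckAccuracy",
            "Type": "Condition",
            "Arguments": {
                # Register the model only if accuracy clears the bar.
                "Conditions": [{"Type": "GreaterThan", "Threshold": 0.90}],
                "IfSteps": [{"Name": "RegisterModel", "Type": "RegisterModel"}],
                "ElseSteps": [{"Name": "Stop", "Type": "Fail"}],
            },
        },
    ],
}

print([s["Type"] for s in pipeline["Steps"]])
# → ['Processing', 'Training', 'Processing', 'Condition']
```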

Exam Tip: Pipelines = ML CI/CD (MLOps). If the question involves automating the ML workflow from data prep to deployment, think SageMaker Pipelines.


SageMaker Model Registry

Model Versioning

  • What: Central catalog for managing ML model versions
  • Capabilities:
    • Track model versions with metadata
    • Model approval workflow (Pending → Approved → Rejected)
    • Deploy approved models to endpoints
    • Integration with SageMaker Pipelines for automated registration
    • Track model lineage (training data, algorithm, hyperparameters)

SageMaker Inference

Real-time Inference

  • Deploy models as persistent endpoints for low-latency, synchronous predictions
  • Auto-scaling based on traffic
  • A/B testing with traffic splitting

Batch Transform

  • Process large datasets asynchronously (no persistent endpoint needed)
  • Pay only for compute time used
  • Results stored in S3

Serverless Inference

  • What: Auto-scaling inference endpoints that scale to zero when not in use
  • When to Use: Intermittent or unpredictable traffic
  • Benefits: Pay only when requests are made (no idle costs)
  • Limitation: Cold start latency when scaling from zero
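
The sketch below shows the shape of a CreateEndpointConfig request for a serverless variant; endpoint and model names are placeholders, and the dict would be sent via `boto3.client("sagemaker").create_endpoint_config(**endpoint_config)`:

```python
endpoint_config = {
    "EndpointConfigName": "churn-serverless-config",
    "ProductionVariants": [{
        "VariantName": "AllTraffic",
        "ModelName": "churn-model-v3",
        # With ServerlessConfig present, no instance type is specified --
        # the endpoint scales with traffic, down to zero when idle.
        "ServerlessConfig": {
            "MemorySizeInMB": 2048,   # memory allocated per invocation
            "MaxConcurrency": 20,     # concurrent invocations before throttling
        },
    }],
}
```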

Exam Tip: Know the three inference options: Real-time (persistent, low-latency), Batch (async, large datasets), Serverless (auto-scaling to zero, intermittent traffic).


SageMaker Edge Manager

  • Deploy and manage ML models on edge devices
  • Monitor model performance on edge
  • Over-the-air model updates
  • Supports devices like IoT sensors, cameras, industrial equipment

SageMaker Model Cards

Model Documentation

  • What: Create standardized documentation for ML models
  • Contents:
    • Model purpose and intended use
    • Training details (algorithm, data, hyperparameters)
    • Evaluation results and metrics
    • Ethical considerations
    • Known limitations
  • Purpose: Responsible AI — ensure transparency and accountability

Exam Tip: Model Cards = model documentation for responsible AI. If the question asks about documenting model purpose, limitations, and ethical considerations, think Model Cards.


SageMaker Model Dashboard

  • Centralized view of all deployed models
  • Monitor model health, performance, and compliance
  • Track model lineage and versions
  • Identify models that need attention

SageMaker Role Manager

  • Simplify creation of IAM roles for SageMaker with pre-built permissions
  • Role templates for common ML personas (data scientist, ML engineer, ML operations)
  • Least-privilege access by default

MLflow on SageMaker

  • What: Run open-source MLflow on SageMaker managed infrastructure
  • Track experiments, package models, and deploy using the familiar MLflow interface
  • Managed servers (no infrastructure to manage)
  • Integration with SageMaker training and deployment

Network Isolation Mode

  • Run SageMaker training and inference in complete network isolation
  • No outbound network access from training containers
  • Data only accessible from S3 via VPC endpoints
  • Prevents data exfiltration during training
  • Ensures sensitive training data stays within the VPC
  • Required for some compliance scenarios

Exam Tip: Network Isolation = no internet access for training containers. If the question mentions preventing data exfiltration or ensuring models train without internet access, think Network Isolation Mode.


Quick Reference Table

SageMaker Component | Purpose               | Key Exam Keyword
Studio              | Unified ML IDE        | Development environment
Data Wrangler       | Visual data prep      | No-code data transformation
Ground Truth        | Data labeling         | Human labeling workforce
Feature Store       | Feature management    | Reusable features, consistency
Autopilot           | AutoML                | Automatic model building
JumpStart           | Model hub             | Pre-trained models
Canvas              | No-code ML            | Business analysts
Clarify             | Bias & explainability | Responsible AI, SHAP values
Model Monitor       | Drift detection       | Production monitoring
Debugger            | Training debugging    | Training issues, profiling
Pipelines           | ML CI/CD              | MLOps, workflow automation
Model Registry      | Model versioning      | Model catalog, approval workflow
Model Cards         | Documentation         | Responsible AI, transparency
Edge Manager        | Edge deployment       | IoT, edge devices
Network Isolation   | Security              | No internet, data protection