AWS Batch
AWS Batch enables developers, scientists, and engineers to easily and efficiently run hundreds of thousands of batch computing jobs on AWS. It dynamically provisions the optimal quantity and type of compute resources (e.g., CPU or memory-optimized instances) based on the specific volume and resource requirements of the batch jobs submitted.
Key Concepts
- Job: A unit of work (e.g., a shell script, a Linux executable, or a Docker container image) that you submit to AWS Batch.
- Job Definition: A blueprint for the job (like a template). It specifies the Docker image, vCPUs, memory, IAM roles, and environment variables.
- Job Queue: Where you submit jobs. Queues can have priorities. Jobs wait here until they are scheduled onto a compute environment.
- Compute Environment: The actual compute resources (EC2 instances or Fargate) that run the jobs.
  - Managed: AWS provisions and scales the compute capacity for you.
  - Unmanaged: You provision and manage your own EC2 instances (less common).
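The Job Definition / Job Queue / submit flow above can be sketched as plain request payloads. This is a hedged illustration: the queue name, image URI, and job names are hypothetical placeholders, and the dicts mirror the shapes accepted by the boto3 Batch client's `register_job_definition` and `submit_job` calls.

```python
# Sketch of AWS Batch request payloads; all names and the ECR image URI
# below are hypothetical placeholders, not real resources.

def make_job_definition(name: str, image: str, vcpus: int, memory_mib: int) -> dict:
    """Payload shaped like batch.register_job_definition (container job)."""
    return {
        "jobDefinitionName": name,
        "type": "container",
        "containerProperties": {
            "image": image,
            "resourceRequirements": [
                {"type": "VCPU", "value": str(vcpus)},
                {"type": "MEMORY", "value": str(memory_mib)},  # MiB
            ],
            "environment": [{"name": "STAGE", "value": "dev"}],
        },
    }

def make_submit_job(name: str, queue: str, job_def: str) -> dict:
    """Payload shaped like batch.submit_job."""
    return {"jobName": name, "jobQueue": queue, "jobDefinition": job_def}

job_def = make_job_definition(
    "render-frame",
    "123456789012.dkr.ecr.us-east-1.amazonaws.com/render:latest",  # placeholder
    vcpus=4,
    memory_mib=8192,
)
job = make_submit_job("render-frame-001", "high-priority-queue", "render-frame")
# With boto3 (not run here):
#   batch = boto3.client("batch")
#   batch.register_job_definition(**job_def)
#   batch.submit_job(**job)
```

The job waits in `high-priority-queue` until the scheduler places it on an attached compute environment.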
Features
- Fully Managed: No need to install and manage batch computing software or server clusters.
- Cost-Optimized: Can use Spot Instances to reduce costs by up to 90% versus On-Demand pricing.
- Container Support: Native support for Docker containers.
- Multi-Node Parallel Jobs: Run jobs that span multiple EC2 instances (tightly coupled HPC workloads).
- Event-Driven: Jobs can be submitted in response to EventBridge (formerly CloudWatch Events) rules or S3 upload events (e.g., via a Lambda function that calls SubmitJob).
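The event-driven pattern above is often wired as a small Lambda function that turns an S3 upload notification into a Batch job submission. A minimal sketch, assuming a hypothetical `transcode-queue` job queue and `transcode-job-def` job definition; the payload-extraction is kept as a pure function so it can be exercised without AWS:

```python
import re

def batch_params_from_s3_event(event: dict) -> dict:
    """Turn a standard S3 PutObject notification record into submit_job kwargs."""
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]
    # Batch job names allow only letters, digits, hyphens, and underscores.
    safe_key = re.sub(r"[^A-Za-z0-9_-]", "-", key)
    return {
        "jobName": f"transcode-{safe_key}",
        "jobQueue": "transcode-queue",         # hypothetical queue name
        "jobDefinition": "transcode-job-def",  # hypothetical job definition
        "parameters": {"bucket": bucket, "key": key},
    }

def handler(event, context):
    """Lambda entry point (sketch): submit one Batch job per uploaded object."""
    import boto3  # provided by the Lambda runtime
    return boto3.client("batch").submit_job(**batch_params_from_s3_event(event))
```

The `parameters` dict can be referenced as `Ref::bucket` / `Ref::key` placeholders in the job definition's command, so the container knows which object to process.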
Exam Tips
- Batch vs Lambda:
  - Use Lambda for short-lived (< 15 min), real-time, event-driven tasks.
  - Use Batch for long-running, complex, compute-intensive tasks (e.g., DNA sequencing, rendering, risk analysis).
- Docker: Batch is essentially a wrapper around ECS (Elastic Container Service) to orchestrate batch jobs.
- Spot Instances: In exam scenarios, pair AWS Batch with Spot Instances to cut costs on retryable, interruption-tolerant workloads.
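The managed + Spot combination from the tips above can be sketched as a compute environment payload, shaped like the boto3 `create_compute_environment` call. All resource names, subnet/security-group IDs, and role names here are placeholders, not real resources:

```python
# Hedged sketch of a managed Spot compute environment payload; every
# identifier below (subnets, security groups, roles) is a placeholder.

def make_spot_compute_env(name: str, max_vcpus: int) -> dict:
    return {
        "computeEnvironmentName": name,
        "type": "MANAGED",  # AWS scales capacity for you
        "computeResources": {
            "type": "SPOT",  # Spot pricing: up to ~90% below On-Demand
            "allocationStrategy": "SPOT_CAPACITY_OPTIMIZED",
            "minvCpus": 0,            # scale to zero when the queue is empty
            "maxvCpus": max_vcpus,
            "instanceTypes": ["optimal"],  # let Batch pick instance types
            "subnets": ["subnet-aaaa1111"],        # placeholder
            "securityGroupIds": ["sg-bbbb2222"],   # placeholder
            "instanceRole": "ecsInstanceRole",     # placeholder
        },
        "serviceRole": "AWSBatchServiceRole",      # placeholder
    }

env = make_spot_compute_env("spot-batch-env", max_vcpus=256)
# With boto3 (not run here):
#   boto3.client("batch").create_compute_environment(**env)
```

`minvCpus: 0` plus Spot is the cost-optimized exam answer: capacity scales to zero when idle, and interrupted jobs can be retried via the job definition's retry strategy.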
Common Use Cases
- Financial risk modeling.
- Genomic analysis.
- Media transcoding.
- Visual effects rendering.