Amazon Redshift
Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse that makes it simple and cost-effective to analyze your data using standard SQL and your existing Business Intelligence (BI) tools. Its performance comes from columnar storage and Massively Parallel Processing (MPP).
Key Concepts
- Columnar Storage: Data is stored by columns rather than rows. Ideal for analytical queries (OLAP) which often aggregate data over a few columns (e.g., "Sum of Sales").
- MPP (Massively Parallel Processing): Distributes data and query execution across multiple nodes.
- Redshift Serverless: Runs the data warehouse without clusters to manage. Capacity is provisioned and scaled automatically in seconds.
- Redshift Spectrum: Query data directly in S3 (data lake) without loading it into Redshift tables.
- Zero-ETL: Replicates data from sources such as Aurora, RDS, and DynamoDB into Redshift without building and maintaining complex ETL pipelines.
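
The columnar storage and MPP ideas above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not how Redshift is implemented: the sample rows, the three-node cluster size, and the helper names (`node_for`, `distribute`, `total_sales`) are all hypothetical. Rows are spread across nodes by a distribution key, each node keeps one list per column, and a "Sum of Sales" query reads only the sales column on each node before a leader combines the partial results.

```python
from collections import defaultdict

# Hypothetical sample data: (region, product, sales) rows.
ROWS = [
    ("east", "widget", 120),
    ("west", "gadget", 80),
    ("east", "gadget", 50),
    ("north", "widget", 200),
    ("west", "widget", 30),
]

NUM_NODES = 3  # assumed cluster size, for illustration only


def node_for(key: str, num_nodes: int) -> int:
    """Pick a node from a toy stable hash of the distribution key.

    Redshift's real hashing is internal; this stands in for it.
    """
    return sum(key.encode("utf-8")) % num_nodes


def distribute(rows, num_nodes):
    """Spread rows across nodes; each node stores one list per column."""
    nodes = [defaultdict(list) for _ in range(num_nodes)]
    for region, product, sales in rows:
        store = nodes[node_for(region, num_nodes)]
        store["region"].append(region)
        store["product"].append(product)
        store["sales"].append(sales)
    return nodes


def total_sales(nodes):
    """Each node sums only its own 'sales' column; the leader adds the partials."""
    partials = [sum(store["sales"]) for store in nodes]
    return sum(partials), partials


nodes = distribute(ROWS, NUM_NODES)
total, partials = total_sales(nodes)
print("per-node partial sums:", partials)
print("total sales:", total)
```

Because the aggregate touches only the `sales` lists, the `region` and `product` columns are never read, which is the intuition behind columnar storage suiting OLAP queries. The per-node partial sums stand in for the parallel work that MPP spreads across nodes.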
Node Types
- RA3 Nodes: Separate compute from storage (Redshift Managed Storage), so each can scale independently. Best for most workloads.
- Dense Compute (DC): SSD-based. Best for high performance on smaller datasets.
- Dense Storage (DS): Legacy, HDD-based. Best for large storage needs at lower cost.
Exam Tips
- "Data Warehouse" / "OLAP" / "SQL Analytics": The answer is Redshift.
- "Columnar Storage": Redshift feature for performance.
- Single-AZ: A Redshift cluster runs in a single AZ by default. If that AZ goes down, the cluster is unavailable, though its data is backed up to S3. Note: Multi-AZ deployments are now available for RA3 clusters, but exam questions often still treat classic Redshift as Single-AZ.
- Redshift Spectrum: Keyword "Query data in S3 without loading it".
- BI Tools: Integrates with QuickSight, Tableau, and Power BI.
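
As a rough illustration of the Redshift Spectrum tip above, the SQL below (held in Python strings so it can be passed to whatever database client you use) registers an external schema and table over files in S3 and then queries them in place. It is a minimal sketch, assuming an AWS Glue Data Catalog and Parquet files: the schema, database, table, bucket, and IAM role ARN are all placeholder names, not values from these notes.

```python
# All identifiers below (schema, database, table, bucket, role ARN) are
# hypothetical placeholders chosen for illustration.

EXTERNAL_SCHEMA_SQL = """
CREATE EXTERNAL SCHEMA spectrum_demo
FROM DATA CATALOG
DATABASE 'demo_db'
IAM_ROLE 'arn:aws:iam::123456789012:role/ExampleSpectrumRole'
CREATE EXTERNAL DATABASE IF NOT EXISTS;
"""

# The table definition only points at the S3 location; no data is loaded.
EXTERNAL_TABLE_SQL = """
CREATE EXTERNAL TABLE spectrum_demo.sales (
    region  VARCHAR(32),
    product VARCHAR(64),
    amount  INTEGER
)
STORED AS PARQUET
LOCATION 's3://example-bucket/sales/';
"""

# A normal SQL query; the files are scanned in S3 at query time.
QUERY_SQL = """
SELECT region, SUM(amount) AS total_sales
FROM spectrum_demo.sales
GROUP BY region;
"""


def spectrum_statements():
    """Return the statements in the order they would be run."""
    return [s.strip() for s in (EXTERNAL_SCHEMA_SQL, EXTERNAL_TABLE_SQL, QUERY_SQL)]


for statement in spectrum_statements():
    print(statement, end="\n\n")
```

Note that no `COPY` or `INSERT` appears anywhere: the data stays in S3, which is the "without loading it" part of the keyword.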
Common Use Cases
- Enterprise Data Warehousing.
- Big Data Analytics.
- Log Analysis.
- Migration from on-premises solutions such as Teradata and Oracle DW.