Amazon Redshift

"Petabyte-scale data warehouse."

Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing Business Intelligence (BI) tools. It uses Columnar Storage and Massively Parallel Processing (MPP).

Key Concepts

  • Columnar Storage: Data is stored by columns rather than rows. Ideal for analytical queries (OLAP) which often aggregate data over a few columns (e.g., "Sum of Sales").
  • MPP (Massively Parallel Processing): Distributes data and query execution across multiple nodes.
  • Redshift Serverless: Automatically provisions and scales capacity in seconds.
  • Redshift Spectrum: Query data directly in S3 (data lake) without loading it into Redshift tables.
  • Zero-ETL: Replicates data from sources such as Aurora, RDS, and DynamoDB into Redshift in near real time, without building and maintaining ETL pipelines.
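
The first two concepts above can be illustrated with a toy model in plain Python. This is only a sketch of the general idea, not how Redshift is implemented: the sample rows, the function names (to_columns, distribute, parallel_sum_sales), and the round-robin placement are all assumptions made for illustration. Each "slice" holds a share of the rows, sums only its own sales column, and a leader step adds up the partial results.

```python
# Toy data: each row is (region, product, sales). Values are made up.
ROWS = [
    ("east", "widget", 100),
    ("west", "gadget", 250),
    ("east", "gadget", 75),
    ("north", "widget", 300),
    ("west", "widget", 125),
]


def to_columns(rows):
    """Pivot row-oriented records into one list per column (columnar layout)."""
    region, product, sales = zip(*rows)
    return {
        "region": list(region),
        "product": list(product),
        "sales": list(sales),
    }


def distribute(rows, num_slices):
    """Spread rows across slices round-robin (hypothetical placement rule)."""
    slices = [[] for _ in range(num_slices)]
    for i, row in enumerate(rows):
        slices[i % num_slices].append(row)
    return slices


def parallel_sum_sales(rows, num_slices=2):
    """Each slice sums its own 'sales' column; a leader combines the partials."""
    partials = []
    for slice_rows in distribute(rows, num_slices):
        if not slice_rows:
            partials.append(0)  # an empty slice contributes nothing
            continue
        # Only the 'sales' column is read; 'region' and 'product' are untouched.
        partials.append(sum(to_columns(slice_rows)["sales"]))
    return partials, sum(partials)


if __name__ == "__main__":
    partials, total = parallel_sum_sales(ROWS, num_slices=2)
    print("partial sums per slice:", partials)
    print("total sales:", total)
```

In this sketch the aggregate touches a single column list per slice, which is the intuition behind why a query like "Sum of Sales" suits columnar storage, and the per-slice partial sums mirror how MPP splits one query into work that can run side by side.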

Node Types

  • RA3 Nodes: Separate compute and storage. Scale each independently. Best for most workloads.
  • Dense Compute (DC): Best for high performance with less data (SSD based).
  • Dense Storage (DS): (Legacy) Best for large storage needs at lower cost (HDD based).

Exam Tips

  • "Data Warehouse" / "OLAP" / "SQL Analytics": The answer is Redshift.
  • "Columnar Storage": Redshift feature for performance.
  • Single-AZ: By default, a Redshift cluster runs in a single AZ. If that AZ goes down, the cluster is unavailable until it is restored, although the data remains backed up to S3. Note: Multi-AZ deployments are now available for RA3 clusters, but exam questions often treat classic Redshift as Single-AZ.
  • Redshift Spectrum: Keyword "Query data in S3 without loading it".
  • BI Tools: Integrates with QuickSight, Tableau, and Power BI.

Common Use Cases

  • Enterprise Data Warehousing.
  • Big Data Analytics.
  • Log Analysis.
  • Migration from on-premises solutions such as Teradata and Oracle data warehouses.