What is Amazon CloudWatch?
Amazon CloudWatch is a monitoring and observability service built for DevOps engineers, developers, site reliability engineers (SREs), and IT managers. It provides data and actionable insights to monitor your applications, respond to system-wide performance changes, and optimize resource utilization. It collects monitoring and operational data in the form of logs, metrics, and events.
Key Concepts
1. Metrics
- A metric is a time-ordered set of data points published to CloudWatch.
- Examples: CPUUtilization for EC2, ThrottledRequests for DynamoDB.
- Standard Monitoring: Metrics sent every 5 minutes (free for EC2).
- Detailed Monitoring: Metrics sent every 1 minute (paid).
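To make the two monitoring intervals concrete, here is a minimal sketch that builds a GetMetricStatistics request for CPUUtilization. The helper names (`cpu_stats_request`, `fetch_cpu_stats`) and the instance ID in the usage note are hypothetical. `boto3` is imported only inside the function that actually calls AWS, so the request builder runs without credentials.

```python
from datetime import datetime, timedelta, timezone


def cpu_stats_request(instance_id, detailed=False, hours=1, now=None):
    """Build keyword arguments for CloudWatch's GetMetricStatistics call."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        # 60 s matches detailed monitoring, 300 s matches standard monitoring.
        "Period": 60 if detailed else 300,
        "Statistics": ["Average", "Maximum"],
    }


def fetch_cpu_stats(instance_id, detailed=False):
    """Call AWS and return the datapoints, oldest first (needs credentials)."""
    import boto3  # lazy import: only required when actually calling AWS

    client = boto3.client("cloudwatch")
    response = client.get_metric_statistics(
        **cpu_stats_request(instance_id, detailed)
    )
    return sorted(response["Datapoints"], key=lambda d: d["Timestamp"])
```

Calling `cpu_stats_request("i-0123456789abcdef0", detailed=True)` yields a request with a 60-second period. Asking for 60-second data on an instance without detailed monitoring will generally return fewer datapoints than expected.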
2. Alarms
- Watch a specific metric and trigger an action based on the value relative to a threshold over time.
- Actions:
- Send notification (SNS topic -> Email/SMS).
- Auto Scaling actions (Scale out/in).
- EC2 actions (Stop/Terminate/Reboot).
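The phrase "relative to a threshold over time" can be illustrated with a small, simplified model of alarm evaluation. This is only a sketch: the function name `alarm_state` is hypothetical, it only handles the greater-than comparison, and real CloudWatch also has configurable handling of missing data. The three state names are the ones CloudWatch uses.

```python
def alarm_state(datapoints, threshold, evaluation_periods=3,
                datapoints_to_alarm=None):
    """Return "ALARM", "OK", or "INSUFFICIENT_DATA" for the latest periods.

    datapoints: metric values, oldest first, one per period.
    datapoints_to_alarm: how many of the last `evaluation_periods` values
    must breach the threshold (defaults to all of them).
    """
    needed = datapoints_to_alarm or evaluation_periods
    window = datapoints[-evaluation_periods:]
    if len(window) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    breaching = sum(1 for value in window if value > threshold)
    return "ALARM" if breaching >= needed else "OK"


if __name__ == "__main__":
    cpu = [50, 85, 90, 95]  # percent CPU, one value per period
    print(alarm_state(cpu, threshold=80))
```

A single spike does not trip the alarm under this model. The threshold must be breached in enough of the evaluated periods, which is what keeps alarms from firing on brief noise.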
3. Logs
- Centralized location to store, monitor, and access log files from EC2 instances, CloudTrail, Route 53, and other sources.
- CloudWatch Logs Insights: Interactive search and analysis of log data.
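As an illustration of Logs Insights, the sketch below builds a query that returns the 20 most recent log lines containing "ERROR". The helper names and the log group name in the usage note are hypothetical. The query uses the Logs Insights commands `fields`, `filter`, `sort`, and `limit`, and `run_query` polls until the query leaves the Scheduled or Running state.

```python
import time

# Logs Insights query: newest 20 events whose message contains "ERROR".
ERROR_QUERY = """
fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 20
""".strip()


def insights_request(log_group, minutes=60, now=None):
    """Build keyword arguments for the CloudWatch Logs StartQuery call."""
    end = int(now if now is not None else time.time())
    return {
        "logGroupName": log_group,
        "startTime": end - minutes * 60,  # epoch seconds
        "endTime": end,
        "queryString": ERROR_QUERY,
    }


def run_query(log_group, minutes=60, poll_seconds=1.0):
    """Start the query and wait for it to finish (needs AWS credentials)."""
    import boto3  # lazy import: only required when actually calling AWS

    logs = boto3.client("logs")
    query_id = logs.start_query(**insights_request(log_group, minutes))["queryId"]
    while True:
        result = logs.get_query_results(queryId=query_id)
        if result["status"] not in ("Scheduled", "Running"):
            return result
        time.sleep(poll_seconds)
```

For example, `run_query("/my-app/production")` would search the last hour of a log group with that (assumed) name.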
4. Dashboards
- Customizable home pages in the CloudWatch console to monitor your resources in a single view.
Exam Tips
[!IMPORTANT] CloudWatch vs. CloudTrail:
- CloudWatch monitors Performance (CPU, Logs, and Memory via the CloudWatch agent, since EC2 does not report memory by default).
- CloudTrail tracks API Calls (Who did what and when for auditing).
[!WARNING] If you want to stop an EC2 instance automatically when CPU is low (to save money), use a CloudWatch Alarm with an EC2 stop action.
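A minimal sketch of such an alarm, assuming a 5% average CPU threshold sustained for 30 minutes (both values are illustrative, as are the helper names). The stop action uses the `arn:aws:automate:<region>:ec2:stop` ARN format that CloudWatch accepts for EC2 actions.

```python
def low_cpu_stop_alarm(instance_id, region, threshold=5.0, minutes=30):
    """Build PutMetricAlarm arguments that stop an idle EC2 instance."""
    period = 300  # seconds; matches standard (5-minute) monitoring
    return {
        "AlarmName": f"stop-idle-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": period,
        # Number of consecutive periods that must stay below the threshold.
        "EvaluationPeriods": max(1, minutes * 60 // period),
        "Threshold": threshold,
        "ComparisonOperator": "LessThanThreshold",
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:stop"],
    }


def create_alarm(params, region):
    """Create or update the alarm in AWS (needs credentials)."""
    import boto3  # lazy import: only required when actually calling AWS

    boto3.client("cloudwatch", region_name=region).put_metric_alarm(**params)
```

With the defaults, the instance is stopped only after six consecutive 5-minute periods of low CPU, so a short idle gap does not shut it down.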
Common Use Cases
- Monitor Infrastructure: Track CPU usage, disk I/O, and network traffic of EC2 instances.
- Application Monitoring: Monitor response times and error rates.
- Log Aggregation: Collect logs from multiple servers into one central place.
- Billing Alarms: Set an alarm to notify you if your estimated AWS charges exceed a certain dollar amount.
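The billing alarm use case can be sketched as follows. The helper names and the SNS topic ARN in the usage note are hypothetical. Billing metrics live in the `AWS/Billing` namespace in us-east-1, and this assumes billing alerts have already been enabled in the account's billing preferences.

```python
BILLING_REGION = "us-east-1"  # billing metrics are published only in this Region


def billing_alarm(limit_usd, sns_topic_arn):
    """Build PutMetricAlarm arguments for an estimated-charges alarm."""
    return {
        "AlarmName": f"estimated-charges-over-{limit_usd:g}-usd",
        "Namespace": "AWS/Billing",
        "MetricName": "EstimatedCharges",
        "Dimensions": [{"Name": "Currency", "Value": "USD"}],
        "Statistic": "Maximum",
        "Period": 21600,  # 6 hours; billing data is updated only a few times a day
        "EvaluationPeriods": 1,
        "Threshold": float(limit_usd),
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # SNS topic that sends the email/SMS
    }


def create_billing_alarm(limit_usd, sns_topic_arn):
    """Create or update the alarm in AWS (needs credentials)."""
    import boto3  # lazy import: only required when actually calling AWS

    client = boto3.client("cloudwatch", region_name=BILLING_REGION)
    client.put_metric_alarm(**billing_alarm(limit_usd, sns_topic_arn))
```

For example, `create_billing_alarm(50, "arn:aws:sns:us-east-1:123456789012:billing-alerts")` would notify that topic once estimated charges pass 50 USD.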