Understanding CloudWatch Alarms

An alarm watches a single metric over a specified time period and performs one or more specified actions.
It initiates these actions on your behalf.
An alarm takes action based on the value of the metric relative to a threshold over a number of time periods.
An action can be a notification sent to an SNS topic or an Auto Scaling policy.
Alarms can be added to dashboards.
Actions are invoked only for sustained state changes.
Always select a period greater than or equal to the frequency of the metric being monitored.
A maximum of 5000 alarms can be created per Region in an AWS account.
To create or update an alarm, use the PutMetricAlarm API action.
Alarm names must contain only ASCII characters.
List currently configured alarms using DescribeAlarms (mon-describe-alarms).
Disable or enable alarm actions using DisableAlarmActions and EnableAlarmActions.
Test an alarm by setting it to any state using SetAlarmState (mon-set-alarm-state).
View an alarm’s history using DescribeAlarmHistory (mon-describe-alarm-history).
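The test-and-inspect steps above can be sketched in Python. This is only a sketch: it assumes `client` is a boto3 CloudWatch client (for example `boto3.client("cloudwatch")`), and the helper name `exercise_alarm` is hypothetical.

```python
def exercise_alarm(client, alarm_name):
    """Force an alarm into ALARM, then return its state-change history.

    `client` is assumed to be a boto3 CloudWatch client; the function
    name is illustrative and not part of any AWS SDK.
    """
    # SetAlarmState: the forced state is temporary. CloudWatch returns the
    # alarm to its real state at the next evaluation.
    client.set_alarm_state(
        AlarmName=alarm_name,
        StateValue="ALARM",
        StateReason="Manual test of alarm actions",
    )
    # DescribeAlarmHistory: ask only for state transitions.
    response = client.describe_alarm_history(
        AlarmName=alarm_name,
        HistoryItemType="StateUpdate",
    )
    return response["AlarmHistoryItems"]
```

Passing the client in as an argument keeps the helper easy to try against a stub before pointing it at a real account.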
CloudWatch saves alarm history for two weeks.
The number of evaluation periods multiplied by the length of each period must not exceed one day (86,400 seconds).
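The naming and one-day rules above can be checked before calling PutMetricAlarm. The following is a minimal sketch: the helper name `build_alarm_params` and its defaults are assumptions, and it reads the one-day rule as "at most 86,400 seconds". The returned keys are PutMetricAlarm request parameters.

```python
def build_alarm_params(name, namespace, metric, threshold,
                       period=300, evaluation_periods=3, actions=()):
    """Validate the alarm settings and return PutMetricAlarm parameters."""
    # Alarm names must contain only ASCII characters.
    if not name.isascii():
        raise ValueError("alarm name must contain only ASCII characters")
    # Evaluation periods x period length is capped at one day.
    if period * evaluation_periods > 86400:
        raise ValueError("evaluation window exceeds one day")
    return {
        "AlarmName": name,
        "Namespace": namespace,
        "MetricName": metric,
        "Statistic": "Average",
        "Period": period,                      # seconds
        "EvaluationPeriods": evaluation_periods,
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": list(actions),         # e.g. an SNS topic ARN
    }


params = build_alarm_params("high-cpu", "AWS/EC2", "CPUUtilization", 80)
# With a boto3 CloudWatch client (assumed):
# client.put_metric_alarm(**params)
```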
The following permissions are required to create or change a CloudWatch alarm:
- For alarms with EC2 actions
  - iam:CreateServiceLinkedRole
  - iam:GetPolicy
  - iam:GetPolicyVersion
  - iam:GetRole
- For alarms on EC2 instance status metrics
  - ec2:DescribeInstanceStatus
  - ec2:DescribeInstances
- For alarms with stop actions
  - ec2:StopInstances
- For alarms with terminate actions
  - ec2:TerminateInstances
- No specific permissions are needed for alarms with recover actions.
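The permissions listed above could be gathered into a single IAM policy document. This sketch is illustrative only: the statement IDs and the use of `"Resource": "*"` are assumptions, and a real policy would normally be scoped more tightly.

```python
import json

# Actions taken from the list above, grouped by alarm type.
ALARM_PERMISSIONS = {
    "AlarmsWithEc2Actions": [
        "iam:CreateServiceLinkedRole",
        "iam:GetPolicy",
        "iam:GetPolicyVersion",
        "iam:GetRole",
    ],
    "AlarmsOnEc2StatusMetrics": [
        "ec2:DescribeInstanceStatus",
        "ec2:DescribeInstances",
    ],
    "AlarmsWithStopActions": ["ec2:StopInstances"],
    "AlarmsWithTerminateActions": ["ec2:TerminateInstances"],
}


def build_policy(permissions=ALARM_PERMISSIONS):
    """Return an IAM policy document with one Allow statement per group."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            # Resource "*" is an assumption made to keep the sketch short.
            {"Sid": sid, "Effect": "Allow", "Action": actions, "Resource": "*"}
            for sid, actions in permissions.items()
        ],
    }


policy_json = json.dumps(build_policy(), indent=2)
```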

CloudWatch Monitoring

CloudWatch can be used to monitor:
- EC2 instances
- Auto Scaling groups
- ELBs
- Route53 Health Checks
- EBS Volumes
- Storage Gateways
- CloudFront
- DynamoDB
- Other AWS services
- Logs generated by applications and services
By default, EC2 reports instance metrics at 5-minute intervals.
EC2 reports instance metrics at 1-minute intervals if the ‘detailed monitoring’ option is enabled on the instance.
CloudWatch monitors the following by default:
- CPU
- Network
- Disk
- Status Checks
The RAM utilization metric
- is a custom metric
- has to be published manually from EC2 instances in order to be tracked.
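Publishing RAM utilization as a custom metric could look roughly like this. This is a sketch under assumptions: the metric name `MemoryUtilization`, the namespace `Custom/System`, and the helper name are hypothetical. The dictionary follows the shape of a PutMetricData entry.

```python
from datetime import datetime, timezone


def memory_metric_datum(total_kb, available_kb, instance_id):
    """Build one PutMetricData entry for memory utilization in percent."""
    used_percent = 100.0 * (total_kb - available_kb) / total_kb
    return {
        "MetricName": "MemoryUtilization",   # hypothetical metric name
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Timestamp": datetime.now(timezone.utc),
        "Value": round(used_percent, 2),
        "Unit": "Percent",
    }


datum = memory_metric_datum(8000, 2000, "i-0123456789abcdef0")
# With a boto3 CloudWatch client (assumed):
# client.put_metric_data(Namespace="Custom/System", MetricData=[datum])
```

The memory figures would come from the instance itself, for example from `/proc/meminfo` on Linux.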
2 types of Status Checks:
- System Status Checks (Physical Host):
  - Checks the underlying physical host
  - Checks for loss of network connectivity
  - Checks for loss of system power
  - Checks for software issues on the physical host
  - Checks for hardware issues on the physical host
  - Resolution: stop the instance and start it again (this moves it to a different physical host)
- Instance Status Checks
  - Checks the VM itself
  - Checks for failed system status checks
  - Checks for mis-configured networking or startup configs
  - Checks for exhausted memory
  - Checks for corrupted file systems
  - Checks for an incompatible kernel
  - Troubleshooting: reboot the instance or make changes in the instance's operating system
CloudWatch metrics are saved for 2 weeks only, by default.
Use the GetMetricStatistics API action to retrieve data covering more than 2 weeks.
Data from a terminated EC2 or ELB instance can still be retrieved for up to 2 weeks after termination.
Depending on the service, default metrics are reported at 1-minute or 3-5 minute intervals.
The minimum granularity for custom metrics is 1 minute.
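A GetMetricStatistics request for a two-week window might be assembled as follows. This is a minimal sketch: the helper name, the choice of the `CPUUtilization` metric, and the hourly period are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def two_week_cpu_request(instance_id, end=None):
    """Return GetMetricStatistics parameters for the last two weeks."""
    end = end or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": end - timedelta(weeks=2),
        "EndTime": end,
        # Hourly aggregation keeps the response small:
        # 14 days x 24 hours = 336 data points.
        "Period": 3600,
        "Statistics": ["Average"],
    }


request = two_week_cpu_request("i-0123456789abcdef0")
# With a boto3 CloudWatch client (assumed):
# datapoints = client.get_metric_statistics(**request)["Datapoints"]
```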
Alarms can be created to monitor any CloudWatch metric in your account.
Alarms can cover EC2 CPU, ELB latency, or even charges on the AWS bill.
The following can be specified in an alarm:
- Actions to take
- Lambda functions or SNS notifications to trigger when a threshold is crossed

Alarm states

OK – metric is within the threshold.
ALARM – metric is outside the threshold.
INSUFFICIENT_DATA – the alarm has just started or the metric is not available.

Each data point reported to CloudWatch is classified as

Not breaching (within the threshold)
Breaching (violating the threshold)
Missing

For each alarm, missing data points can be treated as

notBreaching – missing data points are treated as good and within the threshold
breaching – missing data points are treated as bad and breaching the threshold
ignore – the current alarm state is maintained
missing – missing data points are not considered when the alarm evaluates whether to change state
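The four treatments can be compared with a small simulation. This is a deliberately simplified model, not CloudWatch's actual evaluation logic: it assumes a "greater than threshold" comparison, requires every evaluated point to breach before returning ALARM, and uses `None` for a missing data point. The function name is hypothetical.

```python
def evaluate_alarm(datapoints, threshold, treat_missing="missing",
                   current_state="INSUFFICIENT_DATA"):
    """Return OK, ALARM or INSUFFICIENT_DATA for the given data points."""
    if treat_missing not in {"notBreaching", "breaching", "ignore", "missing"}:
        raise ValueError("unknown treat_missing value: %r" % treat_missing)

    # ignore: hold the present state whenever a point is missing.
    if treat_missing == "ignore" and any(v is None for v in datapoints):
        return current_state

    breaching_flags = []
    for value in datapoints:
        if value is None:
            if treat_missing == "breaching":
                breaching_flags.append(True)    # counted as bad
            elif treat_missing == "notBreaching":
                breaching_flags.append(False)   # counted as good
            # missing: the point is left out of the evaluation
        else:
            breaching_flags.append(value > threshold)

    if not breaching_flags:
        return "INSUFFICIENT_DATA"              # nothing left to evaluate
    return "ALARM" if all(breaching_flags) else "OK"
```

For example, `evaluate_alarm([90, None], 80, "breaching")` returns `"ALARM"`, while the same data with `"notBreaching"` returns `"OK"`.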
