Understanding CloudWatch Alarms
- An alarm watches a single metric over a specified time period and performs one or more specified actions.
- It initiates those actions on your behalf.
- The action is taken based on the value of the metric relative to a threshold over a number of time periods.
- An action can be a notification sent to an SNS topic or an Auto Scaling policy.
- Alarms can be added to dashboards.
- Alarms invoke actions for sustained state changes only.
- Always select a period greater than or equal to the frequency of the metric being monitored.
- A maximum of 5000 alarms can be created per Region in an AWS account.
- To create or update an alarm, use the PutMetricAlarm API action.
- Alarm names must contain only ASCII characters.
- List currently configured alarms with DescribeAlarms (mon-describe-alarms).
- Disable or enable alarm actions with DisableAlarmActions and EnableAlarmActions.
- Test an alarm by setting it to any state using SetAlarmState (mon-set-alarm-state).
- View alarm’s history using DescribeAlarmHistory (mon-describe-alarm-history).
- CloudWatch saves alarm history for two weeks.
- The number of evaluation periods for an alarm multiplied by the length of each evaluation period must not exceed one day.
- The following permissions are required to create or change a CloudWatch alarm:
  - For alarms with EC2 actions
    - iam:CreateServiceLinkedRole
    - iam:GetPolicy
    - iam:GetPolicyVersion
    - iam:GetRole
  - For alarms on EC2 instance status metrics
    - ec2:DescribeInstanceStatus
    - ec2:DescribeInstances
  - For alarms with stop actions
    - ec2:StopInstances
  - For alarms with terminate actions
    - ec2:TerminateInstances
  - No specific permissions are needed for alarms with recover actions.
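
As a rough illustration, the permissions listed above could be collected into a single IAM policy document. This is only a sketch: the `Sid` values, the grouping names, and the `"*"` resource are assumptions made for the example, and a real policy would normally be scoped more narrowly.

```python
import json

# Actions taken from the list above, grouped by the kind of alarm that needs them.
# The group names are hypothetical labels, reused as statement IDs.
ALARM_PERMISSIONS = {
    "Ec2Actions": [
        "iam:CreateServiceLinkedRole",
        "iam:GetPolicy",
        "iam:GetPolicyVersion",
        "iam:GetRole",
    ],
    "Ec2StatusMetrics": ["ec2:DescribeInstanceStatus", "ec2:DescribeInstances"],
    "StopActions": ["ec2:StopInstances"],
    "TerminateActions": ["ec2:TerminateInstances"],
}


def build_alarm_policy(groups):
    """Return an IAM policy document (as a dict) covering the selected groups."""
    statements = [
        {
            "Sid": sid,
            "Effect": "Allow",
            "Action": ALARM_PERMISSIONS[sid],
            "Resource": "*",  # assumption: narrow this in a real policy
        }
        for sid in groups
    ]
    return {"Version": "2012-10-17", "Statement": statements}


policy = build_alarm_policy(["Ec2Actions", "StopActions"])
print(json.dumps(policy, indent=2))
```

Alarms with recover actions need no entry here, since the list above states that they require no specific permissions.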
 
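
Some of the alarm constraints in this section (ASCII-only names, an evaluation window of at most one day) can be checked locally before calling PutMetricAlarm. The sketch below builds a parameter dictionary using the PutMetricAlarm field names; the helper name, the alarm name, the threshold, and the SNS topic ARN are all hypothetical, and the statistic and comparison operator are arbitrary choices for the example.

```python
MAX_EVALUATION_WINDOW_SECONDS = 24 * 60 * 60  # one day


def build_alarm_params(name, namespace, metric_name, threshold,
                       period_seconds, evaluation_periods, alarm_actions):
    """Validate the constraints above and return PutMetricAlarm parameters."""
    if not name.isascii():
        raise ValueError("alarm name must contain only ASCII characters")
    window = period_seconds * evaluation_periods
    if window > MAX_EVALUATION_WINDOW_SECONDS:
        raise ValueError(f"evaluation window of {window}s exceeds one day")
    return {
        "AlarmName": name,
        "Namespace": namespace,
        "MetricName": metric_name,
        "Statistic": "Average",
        "Period": period_seconds,
        "EvaluationPeriods": evaluation_periods,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": alarm_actions,
    }


params = build_alarm_params(
    name="high-cpu-example",  # hypothetical alarm name
    namespace="AWS/EC2",
    metric_name="CPUUtilization",
    threshold=80.0,
    period_seconds=300,
    evaluation_periods=3,
    alarm_actions=["arn:aws:sns:us-east-1:123456789012:example-topic"],  # placeholder ARN
)
print(params)
# With boto3 installed and credentials configured, these could be passed as:
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```

Keeping the validation separate from the API call means the limits can be tested without AWS credentials.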
CloudWatch Monitoring
- CloudWatch can be used to monitor
  - EC2 instances
  - Auto Scaling groups
  - ELBs
  - Route 53 health checks
  - EBS volumes
  - Storage Gateways
  - CloudFront
  - DynamoDB
  - other AWS services
  - logs generated by applications and services
 
- By default, EC2 sends instance metrics to CloudWatch at 5-minute intervals.
- With the ‘detailed monitoring’ option enabled on an instance, metrics are sent at 1-minute intervals.
- CloudWatch monitors the following by default:
  - CPU
  - Network
  - Disk
  - Status Checks
 
- RAM utilization
  - is a custom metric, not collected by default
  - has to be published manually from EC2 instances in order to be tracked
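
One possible way to produce such a custom metric on a Linux instance is to read `/proc/meminfo` and shape the result as a PutMetricData data point. The sketch below is an assumption-laden illustration: the metric name `MemoryUtilization`, the instance ID, and the sample text are hypothetical, and it only builds the data point rather than sending it.

```python
def memory_used_percent(meminfo_text):
    """Compute used-memory percentage from /proc/meminfo-style text (Linux)."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if rest.strip():
            fields[key.strip()] = int(rest.split()[0])  # values are in kB
    total = fields["MemTotal"]
    available = fields["MemAvailable"]
    return 100.0 * (total - available) / total


def build_memory_datum(instance_id, percent):
    """Shape one data point using the PutMetricData field names."""
    return {
        "MetricName": "MemoryUtilization",  # hypothetical metric name
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Unit": "Percent",
        "Value": percent,
    }


# Sample text standing in for the contents of /proc/meminfo.
sample = (
    "MemTotal:       8000000 kB\n"
    "MemFree:        1000000 kB\n"
    "MemAvailable:   2000000 kB\n"
)
datum = build_memory_datum("i-0123456789abcdef0", memory_used_percent(sample))
print(datum["Value"])  # 75.0
# With boto3 available, the data point could be published under a namespace
# of your choosing (the name below is hypothetical):
#   boto3.client("cloudwatch").put_metric_data(
#       Namespace="Custom/EC2", MetricData=[datum])
```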
 
- There are 2 types of status checks:
  - System Status Checks (physical host)
    - Check the underlying physical host
    - Check for loss of network connectivity
    - Check for loss of system power
    - Check for software issues on the physical host
    - Check for hardware issues on the physical host
    - Resolution: stop the instance and start it again (this typically moves it to a different physical host)
  - Instance Status Checks
    - Check the VM itself
    - Check for failed system status checks
    - Check for misconfigured networking or startup configuration
    - Check for exhausted memory
    - Check for corrupted file systems
    - Check for an incompatible kernel
    - Troubleshooting: reboot the instance or make changes in the instance's operating system
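
The two check types and their suggested first steps can be sketched as a small lookup over one entry of a DescribeInstanceStatus response. The function name and the returned strings are hypothetical; the `SystemStatus` and `InstanceStatus` fields and the `impaired` value follow the shape of that API's response.

```python
def suggest_remediation(status_entry):
    """Map one DescribeInstanceStatus entry to the first step suggested above."""
    system = status_entry["SystemStatus"]["Status"]
    instance = status_entry["InstanceStatus"]["Status"]
    if system == "impaired":
        # Host-level problem: a stop/start typically moves the instance to another host.
        return "stop and start the instance"
    if instance == "impaired":
        # Problem inside the VM: reboot or fix the operating system configuration.
        return "reboot the instance or fix its OS configuration"
    return "no action suggested"


entry = {
    "InstanceId": "i-0123456789abcdef0",  # placeholder instance ID
    "SystemStatus": {"Status": "impaired"},
    "InstanceStatus": {"Status": "ok"},
}
print(suggest_remediation(entry))
```

The system check is tested first because a failed system status check can also cause the instance check to fail.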
 
 
- CloudWatch metrics are retained for 2 weeks only, by default.
- To keep data for longer than 2 weeks, retrieve it with the GetMetricStatistics API and store it elsewhere.
- Data from a terminated EC2 instance or a deleted ELB can still be retrieved for up to 2 weeks after termination.
- Depending on the service, default metrics are reported at 1-minute or 3-5 minute intervals.
- The minimum granularity for custom metrics is 1 minute.
- Alarms can be created to monitor any CloudWatch metric in the account.
- Alarms can cover EC2 CPU, ELB latency, or even charges on the AWS bill.
- Actions can be specified in an alarm
  - for example, triggering Lambda functions or sending SNS notifications when a threshold is crossed
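
For the GetMetricStatistics point above, one practical detail is that a single call returns at most 1,440 data points, so exporting two weeks of 1-minute data takes several calls. The sketch below only computes the time windows; the function name and the dates are hypothetical, and the actual API call is left as a comment.

```python
from datetime import datetime, timedelta, timezone

MAX_DATAPOINTS_PER_CALL = 1440  # limit for a single GetMetricStatistics call


def request_windows(start, end, period_seconds):
    """Yield (start, end) pairs, each small enough for a single call."""
    step = timedelta(seconds=period_seconds * MAX_DATAPOINTS_PER_CALL)
    cursor = start
    while cursor < end:
        window_end = min(cursor + step, end)
        yield cursor, window_end
        cursor = window_end


end = datetime(2024, 1, 15, tzinfo=timezone.utc)  # arbitrary example date
start = end - timedelta(days=14)
windows = list(request_windows(start, end, period_seconds=60))
print(len(windows))  # 14 one-day windows for 1-minute data
# Each pair could then be used as StartTime / EndTime in a call such as
#   boto3.client("cloudwatch").get_metric_statistics(
#       Namespace=..., MetricName=..., Period=60, Statistics=["Average"],
#       StartTime=window_start, EndTime=window_end)
```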
 

An alarm has three possible states
- OK – the metric is within the threshold.
- ALARM – the metric is outside the threshold.
- INSUFFICIENT_DATA – the alarm has just started, the metric is not available, or not enough data exists to determine the alarm state.
Each data point reported to CloudWatch is classified as one of
- Not breaching (within the threshold)
- Breaching (violating the threshold)
- Missing

Missing data points can be treated, per alarm, in one of the following ways
- notBreaching – missing data points are treated as good and within the threshold.
- breaching – missing data points are treated as bad and breaching the threshold.
- ignore – the current alarm state is maintained.
- missing – missing data points are not considered when the alarm evaluates whether to change state.
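
To make the interaction between alarm states and missing-data treatment concrete, here is a deliberately simplified model. It is not CloudWatch's actual evaluation algorithm: it assumes a "greater than threshold" comparison, requires every data point in the window to breach before returning ALARM, and represents a missing data point as `None`. The function name is hypothetical.

```python
def evaluate_alarm(datapoints, threshold, treat_missing, current_state):
    """Return the next alarm state for one evaluation window (simplified)."""
    if treat_missing not in ("notBreaching", "breaching", "ignore", "missing"):
        raise ValueError(f"unknown treat_missing value: {treat_missing}")

    if treat_missing == "notBreaching":
        # Missing points count as within the threshold.
        flags = [False if v is None else v > threshold for v in datapoints]
    elif treat_missing == "breaching":
        # Missing points count as violating the threshold.
        flags = [True if v is None else v > threshold for v in datapoints]
    else:
        # "ignore" and "missing": only real data points are evaluated.
        flags = [v > threshold for v in datapoints if v is not None]
        if not flags:
            # Nothing to evaluate: hold the state, or report insufficient data.
            return current_state if treat_missing == "ignore" else "INSUFFICIENT_DATA"

    return "ALARM" if all(flags) else "OK"


window = [90.0, None, 95.0]  # middle data point is missing
print(evaluate_alarm(window, 80.0, "notBreaching", "OK"))      # OK
print(evaluate_alarm(window, 80.0, "breaching", "OK"))         # ALARM
print(evaluate_alarm([None, None], 80.0, "ignore", "ALARM"))   # ALARM
print(evaluate_alarm([None, None], 80.0, "missing", "OK"))     # INSUFFICIENT_DATA
```

The same window produces different states purely because of the missing-data setting, which is why the setting should match how the metric is reported.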