CloudWatch

Amazon CloudWatch monitors

  • AWS resources
  • applications running on AWS

CloudWatch

  • Collects and tracks metrics for AWS resources and applications.
  • The CloudWatch home page displays metrics about every AWS service in use.
  • Custom dashboards can be created to display metrics.
  • Alarms can be configured to monitor metrics and send notifications when needed.
  • Alarms can automatically make changes to the monitored resources when a threshold is breached.

Access CloudWatch by

  • Amazon CloudWatch console
  • AWS CLI
  • CloudWatch API
  • AWS SDKs

CloudWatch Namespaces

A CloudWatch namespace

  • Is a container for CloudWatch metrics.
  • Metrics in different namespaces are isolated from each other.
  • There is no default namespace.
  • A namespace must be specified for each data point published to CloudWatch.
  • The namespace name is provided when a metric is created.
  • Names must contain valid XML characters and be fewer than 256 characters in length.
  • Possible characters are: alphanumeric characters (0-9A-Za-z), period (.), hyphen (-), underscore (_), forward slash (/), hash (#), and colon (:).
  • AWS namespaces follow the naming convention AWS/service, for example AWS/EC2.
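
The naming rules above can be checked before publishing data. The following is a minimal sketch, assuming only the character set and length limit listed in these notes; the function names are hypothetical, and CloudWatch itself may accept or reject names this check does not anticipate.

```python
import re

# Characters listed in the notes: 0-9 A-Z a-z . - _ / # :
_ALLOWED = re.compile(r"[0-9A-Za-z.\-_/#:]+")


def is_valid_namespace(name: str) -> bool:
    """Return True if name uses only the listed characters and is under 256 long."""
    return len(name) < 256 and _ALLOWED.fullmatch(name) is not None


def is_aws_namespace(name: str) -> bool:
    """AWS-owned namespaces follow the AWS/service convention."""
    return name.startswith("AWS/")
```

For example, is_valid_namespace("MyApp/Orders") passes the check, while a name containing an exclamation mark does not.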

CloudWatch Dimensions

A dimension

  • is a name/value pair
  • part of the identity of a metric.
  • A metric can be assigned up to 30 dimensions (older documentation lists a limit of 10)
  • Used to describe a characteristic of a metric
  • Also used to filter the results that CloudWatch returns
  • For a few AWS services, such as EC2, CloudWatch can aggregate data across dimensions
  • Example – Server=Production,Domain=City01

CloudWatch Statistics

  • Statistics are metric data aggregations over specified periods of time.
  • Aggregations are made using the namespace, metric name, dimensions, and the data point unit of measure, within the specified time period.

Available statistics

  • Minimum – The lowest value observed during the period. Useful for identifying low activity.
  • Maximum – The highest value observed during the period. Useful for identifying high activity.
  • Sum – All values submitted for the matching metric, added together. Useful for determining total activity.
  • Average – The value of Sum / SampleCount during the period.
  • SampleCount – The count (number) of data points used for the statistical calculation.
  • pNN.NN – The value of the specified percentile, with up to two decimal places (for example, p95.45). Not available for metrics that include negative values.
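
To make the relationship between these statistics concrete, here is a small sketch that computes them for the data points of one period. The function name is hypothetical, and the percentile uses the nearest-rank method as an assumption; these notes do not say which percentile algorithm CloudWatch uses.

```python
import math
from typing import Dict, List


def summarize(values: List[float], percentile: float = 95.0) -> Dict[str, float]:
    """Compute the statistics listed above for the data points of one period."""
    if not values:
        raise ValueError("at least one data point is required")
    ordered = sorted(values)
    sample_count = len(ordered)
    total = sum(ordered)
    # Nearest-rank percentile (an assumption): the smallest value such that
    # at least `percentile` percent of the points are at or below it.
    rank = max(1, math.ceil(percentile / 100.0 * sample_count))
    return {
        "Minimum": ordered[0],
        "Maximum": ordered[-1],
        "Sum": total,
        "SampleCount": sample_count,
        "Average": total / sample_count,  # Sum / SampleCount
        f"p{percentile:g}": ordered[rank - 1],
    }
```

Calling summarize([10, 20, 30, 40], percentile=50) gives a Sum of 100, a SampleCount of 4, and an Average of 25.0.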

CloudWatch Metrics

A CloudWatch metric

  • Is a time-ordered set of data points published to CloudWatch.
  • Can be thought of as a variable to monitor, with the data points representing its values over time.
  • Many AWS services generate data points and send metrics to CloudWatch
  • Custom metrics can also be sent to CloudWatch
  • Data points can be added in any order and at any rate
  • Statistics about the data points can be retrieved as an ordered set of time-series data.
  • Metrics exist only in the Region in which they were created
  • Metrics cannot be deleted, but they expire automatically after 15 months if no new data is published to them.
  • Data points expire on a rolling basis; as new data points come in, data older than 15 months is dropped.
  • A metric is uniquely defined by its
    • name
    • namespace
    • zero or more dimensions.
  • Each data point in a metric has a time stamp and (optionally) a unit of measure.
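
Because a metric is identified by its name, namespace, and dimensions together, two data points belong to the same metric only if all three match. A minimal sketch of that identity as a hashable key follows; the type alias and function name are hypothetical.

```python
from typing import Dict, FrozenSet, Optional, Tuple

# (namespace, metric name, set of (dimension name, dimension value) pairs)
MetricKey = Tuple[str, str, FrozenSet[Tuple[str, str]]]


def metric_key(
    namespace: str, name: str, dimensions: Optional[Dict[str, str]] = None
) -> MetricKey:
    """Build a hashable identity for a metric; dimension order does not matter."""
    return (namespace, name, frozenset((dimensions or {}).items()))
```

With this key, the same metric name published with Server=Production and with Server=Test yields two distinct metrics, and a metric with no dimensions is distinct from both.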

CloudWatch Metrics Time Stamps

  • Each metric data point must be associated with a time stamp.
  • The time stamp can be up to two weeks in the past and up to two hours into the future.
  • If no time stamp is given, CloudWatch creates one based on the time the data point was received.
  • Time stamps are dateTime objects
  • Coordinated Universal Time (UTC) is recommended
  • Statistics retrieved from CloudWatch have all time values in UTC
  • CloudWatch alarms check metrics based on the current time in UTC.
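
The time stamp rules above can be sketched as a client-side check. This is an illustration only, assuming the two-week and two-hour limits from these notes; the function name and the now parameter (added so the check can be exercised with a fixed clock) are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_PAST = timedelta(weeks=2)    # oldest accepted time stamp
MAX_FUTURE = timedelta(hours=2)  # furthest accepted time stamp


def resolve_timestamp(ts: Optional[datetime], now: Optional[datetime] = None) -> datetime:
    """Return the UTC time stamp to attach to a data point, or raise ValueError."""
    now = now or datetime.now(timezone.utc)
    if ts is None:
        # No time stamp supplied: fall back to the time of receipt.
        return now
    if ts.tzinfo is None:
        raise ValueError("use timezone-aware time stamps, preferably UTC")
    ts = ts.astimezone(timezone.utc)
    if ts < now - MAX_PAST or ts > now + MAX_FUTURE:
        raise ValueError("time stamp outside the accepted range")
    return ts
```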

CloudWatch Metrics Retention

CloudWatch retains metric data as follows:

  • Data points with a period of less than 60 seconds are available for 3 hours. These are high-resolution custom metrics.
  • Data points with a period of 60 seconds (1 minute) are available for 15 days
  • Data points with a period of 300 seconds (5 minutes) are available for 63 days
  • Data points with a period of 3600 seconds (1 hour) are available for 455 days (15 months)
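
The retention tiers can be expressed as a simple lookup. This sketch encodes only the four tiers listed above; how it maps in-between periods (for example 120 seconds) to the next-lower tier is an assumption, and the function name is hypothetical.

```python
from datetime import timedelta


def retention_for_period(period_seconds: int) -> timedelta:
    """Return how long data at the given period is kept, per the tiers above."""
    if period_seconds < 1:
        raise ValueError("period must be at least 1 second")
    if period_seconds < 60:
        return timedelta(hours=3)   # high-resolution data
    if period_seconds < 300:
        return timedelta(days=15)   # 1-minute data
    if period_seconds < 3600:
        return timedelta(days=63)   # 5-minute data
    return timedelta(days=455)      # 1-hour data (15 months)
```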

CloudWatch Metrics Units

  • Each statistic has a unit of measure.
  • A few example metric units are
    • Bytes
    • Seconds
    • Count
    • Percent.
  • A unit can be specified when creating a custom metric
  • If not specified, CloudWatch uses None as the unit.
  • CloudWatch attaches no significance to a unit internally
  • Metric data points that specify a unit of measure are aggregated separately.
  • When statistics are requested without specifying a unit, CloudWatch aggregates all data points of the same unit together.
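
To illustrate why units matter during aggregation, here is a small sketch that sums data points per unit, so values reported in Seconds never mix with values reported in Bytes. The function name and the (value, unit) tuple layout are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def sum_by_unit(points: Iterable[Tuple[float, Optional[str]]]) -> Dict[str, float]:
    """Sum data point values separately for each unit of measure."""
    totals: Dict[str, float] = defaultdict(float)
    for value, unit in points:
        # Data points published without a unit are grouped under "None".
        totals[unit or "None"] += value
    return dict(totals)
```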

CloudWatch Metrics Periods

  • A period is the length of time associated with a specific CloudWatch statistic.
  • Periods are defined in seconds; valid values are 1, 5, 10, 30, or any multiple of 60.
  • For a period of six minutes, use 360 as the period value.
  • Varying the period length changes how the data is aggregated
  • Sub-minute periods are supported only for custom metrics with a storage resolution of 1 second
  • Retrieving statistics requires
    • period
    • start time
    • end time
  • The default values for the start time and end time return the last hour’s worth of statistics.
  • For statistics aggregated over the entire hour, specify a period of 3600.
  • Aggregated statistics are stamped with the time corresponding to the beginning of the period.
  • Periods are also important for CloudWatch alarms.
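
The period rules lend themselves to a short sketch: one helper checks whether a period value is valid, another finds the beginning of the period that a time stamp falls into. Aligning periods to Unix-epoch boundaries is an assumption made here for illustration, and both function names are hypothetical.

```python
from datetime import datetime, timezone


def is_valid_period(seconds: int) -> bool:
    """Valid periods are 1, 5, 10, 30, or any positive multiple of 60."""
    return seconds in (1, 5, 10, 30) or (seconds > 0 and seconds % 60 == 0)


def period_start(ts: datetime, period_seconds: int) -> datetime:
    """Return the start of the period containing ts (assumes epoch alignment)."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % period_seconds, tz=timezone.utc)
```

A data point stamped 12:07:42 UTC with a 300-second period would be aggregated into the period stamped 12:05:00 UTC.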

CloudWatch Metrics Aggregation

  • CloudWatch aggregates statistics according to the specified period length
  • Any number of data points can be published with the same or similar time stamps; CloudWatch aggregates them according to the specified period length.
  • CloudWatch does not aggregate data across Regions.
  • For large datasets, a pre-aggregated dataset (statistic set) can be published instead of individual data points
  • A statistic set gives CloudWatch the Min, Max, Sum, and SampleCount for a number of data points.
  • CloudWatch does not differentiate metrics by source.
  • Metrics with the same namespace and dimensions are treated as a single metric, even if published from different sources
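
A statistic set can be built locally and merged with others before publishing. The class below is a minimal sketch with hypothetical names; only the four output keys (SampleCount, Sum, Minimum, Maximum) are taken from the StatisticValues structure of the PutMetricData API.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class StatisticSet:
    minimum: float
    maximum: float
    total: float
    sample_count: int

    @classmethod
    def from_values(cls, values: Iterable[float]) -> "StatisticSet":
        """Pre-aggregate raw data points into one statistic set."""
        data = list(values)
        if not data:
            raise ValueError("at least one data point is required")
        return cls(min(data), max(data), sum(data), len(data))

    def merge(self, other: "StatisticSet") -> "StatisticSet":
        """Combine two sets, e.g. from different sources of the same metric."""
        return StatisticSet(
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
            self.total + other.total,
            self.sample_count + other.sample_count,
        )

    def as_statistic_values(self) -> Dict[str, float]:
        """Return the set using the StatisticValues field names."""
        return {
            "SampleCount": self.sample_count,
            "Sum": self.total,
            "Minimum": self.minimum,
            "Maximum": self.maximum,
        }
```

Note that an Average can still be derived from a merged set as Sum / SampleCount, but percentiles cannot, because the individual values are no longer available.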

CloudWatch Alarms

  • An alarm watches a single metric over a specified time period and performs specified actions
  • It initiates actions on your behalf.
  • Actions are taken based on the value of the metric relative to a threshold over a number of time periods.
  • An action can be a notification sent to an SNS topic or an Auto Scaling policy.
  • Alarms can be added to dashboards.
  • Actions are invoked for sustained state changes only.
  • Always select a period greater than or equal to the frequency of the metric being monitored.
  • Up to 5000 alarms can be created per Region in an AWS account.
  • To create or update an alarm, use the PutMetricAlarm API action
  • Alarm names must contain only ASCII characters.
  • List currently configured alarms with DescribeAlarms (mon-describe-alarms).
  • Disable or enable alarm actions with DisableAlarmActions and EnableAlarmActions
  • Test an alarm by setting it to any state with SetAlarmState (mon-set-alarm-state).
  • View an alarm’s history with DescribeAlarmHistory (mon-describe-alarm-history).
  • CloudWatch saves alarm history for two weeks.
  • The number of evaluation periods for an alarm multiplied by the length of each evaluation period must not exceed one day.
  • The following permissions are required to create or change a CloudWatch alarm
    • For alarms with EC2 actions
      • iam:CreateServiceLinkedRole
      • iam:GetPolicy
      • iam:GetPolicyVersion
      • iam:GetRole
    • For alarms on EC2 instance status metrics
      • ec2:DescribeInstanceStatus
      • ec2:DescribeInstances
    • For alarms with stop actions
      • ec2:StopInstances
    • For alarms with terminate actions
      • ec2:TerminateInstances
    • No specific permissions are needed for alarms with recover actions.
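
The permissions listed above can be collected into an IAM policy document. This is a sketch under stated assumptions: the function name is hypothetical, Resource is left as "*" for brevity, and only the actions named in these notes are included (the permission to call the alarm API itself is not covered here).

```python
import json
from typing import Any, Dict


def alarm_policy(stop: bool = False, terminate: bool = False) -> Dict[str, Any]:
    """Build an IAM policy document with the permissions listed above."""
    actions = [
        # Alarms with EC2 actions
        "iam:CreateServiceLinkedRole",
        "iam:GetPolicy",
        "iam:GetPolicyVersion",
        "iam:GetRole",
        # Alarms on EC2 instance status metrics
        "ec2:DescribeInstanceStatus",
        "ec2:DescribeInstances",
    ]
    if stop:
        actions.append("ec2:StopInstances")
    if terminate:
        actions.append("ec2:TerminateInstances")
    return {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": actions, "Resource": "*"}],
    }


if __name__ == "__main__":
    print(json.dumps(alarm_policy(stop=True), indent=2))
```

In practice the Resource element would normally be narrowed to specific instances or roles rather than left open.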

CloudWatch Monitoring

  • CloudWatch can be used to monitor
    • EC2 instances
    • Auto Scaling groups
    • ELBs
    • Route53 Health Checks
    • EBS Volumes
    • Storage Gateways
    • CloudFront
    • DynamoDB
    • Other AWS services
    • logs generated by applications and services.
  • By default, EC2 instances are monitored at 5-minute intervals (basic monitoring)
  • With the ‘detailed monitoring’ option set on an instance, it is monitored at 1-minute intervals
  • CloudWatch monitors the following EC2 metrics by default
    • CPU
    • Network
    • Disk
    • Status Checks
  • RAM utilization metric
    • is a custom metric
    • has to be added manually to EC2 instances for tracking.
  • 2 types of Status Checks:
    • System Status Checks (Physical Host):
      • Checks the underlying physical host
      • Checks for loss of network connectivity
      • Checks for loss of system power
      • Checks for software issues on the physical host
      • Checks for hardware issues on the physical host
      • To resolve, stop the instance and start it again (this moves it to a different physical host)
    • Instance Status Checks
      • Checks the VM itself
      • Checks for failed system status checks
      • Checks for mis-configured networking or startup configs
      • Checks for exhausted memory
      • Checks for corrupted file systems
      • Checks for an incompatible kernel
      • To troubleshoot, reboot the instance or make changes in the instance operating system
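  • CloudWatch metrics were originally retained for 2 weeks only; they are now retained for up to 15 months, as described under CloudWatch Metrics Retention
  • Use the GetMetricStatistics API action to retrieve metric data
  • Data from a terminated EC2 instance or deleted ELB remains available after termination for the same retention period
  • Depending on the service, default metrics are published at 1-minute, 3-minute, or 5-minute intervals
  • The minimum granularity for standard custom metrics is 1 minute; high-resolution custom metrics go down to 1 second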
  • Alarms can be created to monitor any CloudWatch metric in the account
  • Alarms can cover EC2 CPU utilization, ELB latency, or even changes in the AWS bill
  • An alarm specifies
    • the actions to take
    • for example, triggering Lambda functions or SNS notifications when a threshold is breached

An alarm has three states

  • OK – The metric is within the threshold.
  • ALARM – The metric is outside the threshold.
  • INSUFFICIENT_DATA – The alarm has just started, the metric is not available, or not enough data is available to determine the alarm state.

Each data point reported to CloudWatch is classified as

  • Not breaching (within the threshold)
  • Breaching (violating the threshold)
  • Missing
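
The states and data point classes above can be tied together in a simplified evaluation sketch. It assumes a "greater than" comparison and requires every data point in the evaluation window to breach before entering ALARM; real alarms offer other comparisons and configurable handling of missing data, which is not modeled here. The function names are hypothetical.

```python
from typing import List, Optional


def classify(value: Optional[float], threshold: float) -> str:
    """Classify one data point as breaching, not breaching, or missing."""
    if value is None:
        return "missing"
    return "breaching" if value > threshold else "not breaching"


def alarm_state(
    recent: List[Optional[float]], threshold: float, evaluation_periods: int
) -> str:
    """Derive an alarm state from the most recent data points (simplified)."""
    if evaluation_periods < 1:
        raise ValueError("evaluation_periods must be at least 1")
    window = recent[-evaluation_periods:]
    labels = [classify(v, threshold) for v in window]
    if len(window) < evaluation_periods or all(l == "missing" for l in labels):
        return "INSUFFICIENT_DATA"
    if all(l == "breaching" for l in labels):
        return "ALARM"  # sustained breach across the whole window
    return "OK"
```

Requiring the whole window to breach reflects the point made earlier that actions are invoked for sustained state changes only.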

Logging CloudWatch API Calls with CloudTrail

  • CloudWatch is integrated with CloudTrail
  • CloudTrail provides a record of actions taken by a user, role, or AWS service
  • CloudTrail captures API calls made by or on behalf of the AWS account.
  • The calls captured include
    • calls from the CloudWatch console
    • code calls to the CloudWatch API operations.
  • Once a trail is created, CloudTrail events are delivered continuously to an S3 bucket
  • CloudWatch actions logged in CloudTrail log files are
    • DeleteAlarms
    • DeleteDashboards
    • DescribeAlarmHistory
    • DescribeAlarms
    • DescribeAlarmsForMetric
    • DisableAlarmActions
    • EnableAlarmActions
    • GetDashboard
    • ListDashboards
    • PutDashboard
    • PutMetricAlarm
    • SetAlarmState

CloudTrail

  • It is a web service that records API activity in an AWS account.
  • It is enabled on an AWS account when the account is created.
  • Activity occurring in the AWS account is recorded in a CloudTrail event.
  • Activity from the past 90 days can be viewed, searched, and downloaded from the event history view
  • It logs information on
    • who made a request
    • the services used
    • the actions performed
    • parameters for the actions
    • the response elements returned by the AWS service.
  • Can deliver logs to a specified CloudWatch Logs log group.
  • Logs provide specific information on what occurred in the AWS account.
  • Focuses on AWS API calls made in the AWS account.
  • Helps in meeting compliance and regulatory standards.
  • Usually delivers an event within 15 minutes of the API call.
  • It helps enable governance, compliance, and operational and risk auditing.
  • CloudTrail records actions taken by a user, role, or AWS service
  • Events cover actions taken in
    • AWS Management Console
    • AWS Command Line Interface
    • AWS SDKs and APIs.
  • A trail is a configuration that delivers event details to a specified S3 bucket
  • A trail is used for archival and for analysis of changes to AWS resources
  • Create a trail with
    • CloudTrail console
    • AWS CLI
    • CloudTrail API
  • Types of trails
    • A trail that applies to all regions – records events in each region. This is the default in the console.
    • A trail that applies to one region – records the events in that region only. This is the default with the AWS CLI or CloudTrail API.
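
The information CloudTrail logs (who, which service, which action, parameters, response) maps onto fields of an event record. The sketch below pulls those fields out of a record dictionary. The field names follow the CloudTrail record format; the function name and the sample record values are made up for illustration.

```python
from typing import Any, Dict


def summarize_event(record: Dict[str, Any]) -> Dict[str, Any]:
    """Extract the who/what/where details from one CloudTrail event record."""
    identity = record.get("userIdentity", {})
    return {
        "who": identity.get("arn", identity.get("type", "unknown")),
        "service": record.get("eventSource"),
        "action": record.get("eventName"),
        "parameters": record.get("requestParameters"),
        "response": record.get("responseElements"),
    }


# Hypothetical sample record, not taken from a real log file.
sample = {
    "eventTime": "2024-01-15T12:00:00Z",
    "eventSource": "monitoring.amazonaws.com",
    "eventName": "PutMetricAlarm",
    "userIdentity": {"type": "IAMUser", "arn": "arn:aws:iam::123456789012:user/example"},
    "requestParameters": {"alarmName": "example-alarm"},
    "responseElements": None,
}

if __name__ == "__main__":
    print(summarize_event(sample))
```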

CloudWatch Logs

  • Monitor existing system, application, and custom logs in real time.
  • Send existing logs to CloudWatch, create patterns to look for in the logs, and alert when these patterns are found.
  • Free agents are available for Ubuntu, Amazon Linux, and Windows.
  • Purpose
    • Monitor logs from EC2 instances in real time (for example, track the number of errors in application logs and send a notification if it exceeds a threshold)
    • Monitor AWS CloudTrail logged events (API activity such as manual EC2 instance termination)
    • Archive log data (change the log retention setting to delete expired data automatically)
  • Log Events – A record stored in CloudWatch Logs, consisting of a timestamp and a message.
  • Log Streams – A sequence of log events that share the same source (for example, an Apache access log). Empty log streams older than 2 months may be deleted automatically.
  • Log Groups – A group of log streams that share the same settings for
    • retention
    • monitoring
    • access control
  • Metric Filters – Define how the service extracts metric observations from events and turns them into data points for a CloudWatch metric.
  • Retention Settings – Define how long events are kept. Expired logs are deleted automatically.
  • Log group retention can be set from 1 day to 10 years.
  • CloudWatch Log Filters: filter log data pushed to CloudWatch. A filter does not apply to existing log data; it works only on data arriving after the filter is created, and returns only the first 50 results.
  • A metric filter contains 1. Filter Pattern 2. Metric Name 3. Metric Namespace 4. Metric Value
  • Modify the rsyslog configuration (/etc/rsyslog.d/50-default.conf) to remove auth on line number 9, then run sudo service rsyslog restart
  • Real-Time Log Processing: requires subscription filters; applicable to Amazon Kinesis Streams, AWS Lambda, and Amazon Kinesis Firehose
  • The aws kinesis command is used to create and describe a stream. It can also list the stream ARN. Then update the permissions.json file with the ARNs of the stream and role.
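
The idea behind a metric filter (pattern, metric name, metric namespace, metric value) can be sketched in a few lines. A plain substring match stands in for the real filter pattern syntax, which is richer; the function names and the event layout with "timestamp" and "message" keys are assumptions for illustration.

```python
from typing import Any, Dict, Iterable, List


def apply_metric_filter(
    events: Iterable[Dict[str, Any]],
    term: str,
    metric_name: str,
    metric_namespace: str,
    metric_value: float = 1.0,
) -> List[Dict[str, Any]]:
    """Emit one metric data point for every log event whose message contains term."""
    return [
        {
            "Namespace": metric_namespace,
            "MetricName": metric_name,
            "Timestamp": event["timestamp"],
            "Value": metric_value,
        }
        for event in events
        if term in event["message"]
    ]


def exceeds_threshold(data_points: List[Dict[str, Any]], threshold: float) -> bool:
    """Return True if the summed metric value is above the threshold."""
    return sum(point["Value"] for point in data_points) > threshold
```

This mirrors the use case of tracking the number of errors in application logs and notifying when the count exceeds a threshold.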

Advanced tasks with CloudTrail log files

  • Create multiple trails per region.
  • Monitor CloudTrail log files by sending them to CloudWatch Logs
  • Share log files between accounts.
  • Develop log processing applications in Java with the CloudTrail Processing Library.
  • Validate log files to verify that they have not changed after delivery by CloudTrail.

To receive CloudTrail log files from multiple regions

  • Sign in to the AWS Management Console and open the CloudTrail console at https://console.aws.amazon.com/cloudtrail/.
  • Choose “Trails”, and then select a trail name.
  • Click the pencil icon next to “Apply trail to all regions”, and then select “Yes”.
  • Choose Save. The original trail is replicated across all AWS regions, and CloudTrail delivers log files from all regions to the specified S3 bucket.
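
The console steps above have an API counterpart: the CloudTrail UpdateTrail action accepts an IsMultiRegionTrail flag. The following is a minimal sketch, assuming a boto3 CloudTrail client is passed in by the caller; the helper name and the trail name in the comment are hypothetical.

```python
from typing import Any


def apply_trail_to_all_regions(cloudtrail_client: Any, trail_name: str) -> Any:
    """Turn an existing single-region trail into a trail for all regions."""
    return cloudtrail_client.update_trail(Name=trail_name, IsMultiRegionTrail=True)


# Possible usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   apply_trail_to_all_regions(boto3.client("cloudtrail"), "example-trail")
```

Passing the client in as a parameter keeps the helper free of credentials and makes it easy to exercise with a stand-in object.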