Monitoring and Log Processing
CloudWatch
Amazon CloudWatch monitors
- AWS resources
- applications running on AWS
CloudWatch
- collects and tracks metrics for AWS resources and applications.
- The CloudWatch home page displays metrics about every AWS service in use.
- Custom dashboards can be created to display metrics
- Alarms can be configured to monitor metrics and send notifications when needed
- Alarms can automatically make changes to the monitored resources when a metric crosses a threshold
Access CloudWatch by
- Amazon CloudWatch console – https://console.aws.amazon.com/cloudwatch/
- AWS CLI
- CloudWatch API
- AWS SDKs
CloudWatch Namespaces
A CloudWatch namespace
- Is a container for CloudWatch metrics.
- Metrics in different namespaces are isolated from each other.
- There is no default namespace.
- A namespace must be specified for each data point published to CloudWatch.
- Provide the namespace name when creating a metric.
- Namespace names must contain valid XML characters.
- Names must be fewer than 256 characters in length.
- Possible characters are: alphanumeric characters (0-9A-Za-z), period (.), hyphen (-), underscore (_), forward slash (/), hash (#), and colon (:).
- AWS namespaces follow the naming convention AWS/service (for example, AWS/EC2)
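The naming rules above can be checked before publishing a data point. The following is a minimal sketch that only encodes the length and character rules listed in these notes; the function name is_valid_namespace is made up for illustration and is not part of any AWS SDK.

```python
import re

# Characters the notes above list as allowed in a namespace name.
_NAMESPACE_RE = re.compile(r"[0-9A-Za-z.\-_/#:]+")
MAX_NAMESPACE_LENGTH = 255  # "fewer than 256 characters"


def is_valid_namespace(name: str) -> bool:
    """Return True if `name` satisfies the length and character rules above."""
    if not 0 < len(name) <= MAX_NAMESPACE_LENGTH:
        return False
    return _NAMESPACE_RE.fullmatch(name) is not None


print(is_valid_namespace("AWS/EC2"))          # True
print(is_valid_namespace("MyApp/Orders#v2"))  # True
print(is_valid_namespace("bad namespace!"))   # False
```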
CloudWatch Dimensions
A dimension
- is a name/value pair
- is part of the identity of a metric
- a metric can be given a maximum of 10 dimensions
- is used to describe a characteristic of a metric
- is also used to filter the results that CloudWatch returns
- For a few AWS services, such as EC2, CloudWatch can aggregate data across dimensions
- Example – Server=Production,Domain=City01
CloudWatch Statistics
- Statistics are aggregations of metric data over specified periods of time.
- Aggregations use the namespace, metric name, dimensions, and the data point unit of measure, within the specified time period.
Available statistics
- Minimum – the lowest value observed during a period; indicates low activity
- Maximum – the highest value observed during a period; indicates high activity
- Sum – all values submitted for the matching metric added together; indicates total activity
- Average – the value of Sum / SampleCount during a period.
- SampleCount – the count (number) of data points used for the statistical calculation.
- pNN.NN – the value of the specified percentile, with up to 2 decimal places, such as p95.45. Not available for metrics that include negative values.
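To make the relationship between these statistics concrete, here is a small sketch that computes them locally for one period's worth of data points. It is plain Python, not a CloudWatch API call, and the function name summarize is an assumption for illustration. Percentiles are left out because the notes do not say how CloudWatch interpolates them.

```python
def summarize(values):
    """Compute the basic statistics listed above for one period of data points."""
    if not values:
        raise ValueError("need at least one data point")
    total = sum(values)
    count = len(values)
    return {
        "Minimum": min(values),
        "Maximum": max(values),
        "Sum": total,
        "SampleCount": count,
        "Average": total / count,  # Average is Sum / SampleCount
    }


stats = summarize([2, 4, 6, 8])
print(stats["Sum"], stats["SampleCount"], stats["Average"])  # 20 4 5.0
```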
A CloudWatch metric
- Is a set of time-ordered data points sent to CloudWatch.
- Can be thought of as a variable whose value changes over time and has to be monitored.
- Data points are generated by all AWS services
- AWS services send metrics to CloudWatch
- Custom metrics can also be sent to CloudWatch
- Data points can be added in any order and at any rate
- Statistics about data points are retrieved as an ordered set of time-series data.
- Metrics exist only in the Region in which they were created
- Metrics cannot be deleted.
- Metrics expire automatically after 15 months if no new data is published to them.
- Data points expire on a rolling basis; as new data points come in, data older than 15 months is dropped.
- A metric is uniquely defined by its
- name
- namespace
- zero or more dimensions.
- Each data point in a metric has a time stamp, and (optionally) a unit of measure.
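Because a metric is identified by its namespace, name, and dimensions together, two data points belong to the same metric only if all three match. The sketch below models that identity as a hashable value. The class and helper names are hypothetical, and the namespace "MyApp" and metric name "Latency" are made-up examples.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class MetricIdentity:
    """Namespace + name + dimensions; the order of dimensions does not matter."""
    namespace: str
    name: str
    dimensions: FrozenSet[Tuple[str, str]] = frozenset()


def metric_id(namespace, name, **dimensions):
    """Build an identity from keyword-style dimensions (name=value pairs)."""
    return MetricIdentity(namespace, name, frozenset(dimensions.items()))


a = metric_id("MyApp", "Latency", Server="Production", Domain="City01")
b = metric_id("MyApp", "Latency", Domain="City01", Server="Production")
c = metric_id("MyApp", "Latency", Server="Production")

print(a == b)  # True: same dimensions, different order
print(a == c)  # False: a different dimension set is a different metric
```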
CloudWatch Metrics Time Stamps
- Each metric data point must be associated with a time stamp.
- The time stamp can be up to two weeks in the past and up to two hours in the future.
- If no time stamp is given, CloudWatch creates one based on the time the data point was received.
- Time stamps are dateTime objects
- Coordinated Universal Time (UTC) is recommended
- CloudWatch specifies time values in UTC
- CloudWatch alarms check metrics against the current time in UTC.
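The time stamp rules above can be sketched as a small client-side check. This is an illustration only: the function name resolve_timestamp is hypothetical, and rejecting naive datetimes is a design choice of this sketch rather than documented CloudWatch behavior.

```python
from datetime import datetime, timedelta, timezone

MAX_PAST = timedelta(weeks=2)    # up to two weeks in the past
MAX_FUTURE = timedelta(hours=2)  # up to two hours in the future


def resolve_timestamp(timestamp=None, now=None):
    """Return a UTC time stamp for a data point, or raise if it is out of range."""
    now = now or datetime.now(timezone.utc)
    if timestamp is None:
        # No time stamp given: fall back to the time of receipt.
        return now
    if timestamp.tzinfo is None:
        raise ValueError("use a timezone-aware datetime, preferably UTC")
    if not (now - MAX_PAST <= timestamp <= now + MAX_FUTURE):
        raise ValueError("time stamp outside the accepted range")
    return timestamp.astimezone(timezone.utc)


now = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)  # fixed clock for the example
print(resolve_timestamp(now - timedelta(days=13), now=now))  # 2024-01-02 12:00:00+00:00
```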
CloudWatch Metrics Retention
CloudWatch retains metric data as follows:
- Data points with a period of less than 60 seconds are available for 3 hours. These are high-resolution custom metrics.
- Data points with a period of 60 seconds (1 minute) are available for 15 days
- Data points with a period of 300 seconds (5 minutes) are available for 63 days
- Data points with a period of 3600 seconds (1 hour) are available for 455 days (15 months)
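The retention schedule above reads naturally as a lookup. The sketch below treats the listed periods as tier boundaries, so a period of 120 seconds falls in the 1-minute tier. That treatment of in-between values is an assumption of this example, as is the function name retention_for_period.

```python
from datetime import timedelta


def retention_for_period(period_seconds: int) -> timedelta:
    """Map a data point period to how long the notes say it stays available."""
    if period_seconds <= 0:
        raise ValueError("period must be positive")
    if period_seconds < 60:
        return timedelta(hours=3)   # high-resolution data
    if period_seconds < 300:
        return timedelta(days=15)   # 1-minute data
    if period_seconds < 3600:
        return timedelta(days=63)   # 5-minute data
    return timedelta(days=455)      # 1-hour data (15 months)


for period in (1, 60, 300, 3600):
    print(period, retention_for_period(period))
```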
CloudWatch Metrics Units
- Each statistic has a unit of measure.
- Example metric units include
- Bytes
- Seconds
- Count
- Percent
- A unit can be specified when creating a custom metric
- If not specified, CloudWatch uses None as the unit.
- CloudWatch attaches no significance to a unit internally
- Metric data points that specify a unit of measure are aggregated separately.
- When statistics are requested without specifying a unit, CloudWatch aggregates all data points of the same unit together.
CloudWatch Metrics Periods
- A period is the length of time associated with a specific CloudWatch statistic.
- Periods are defined in seconds; valid values are 1, 5, 10, 30, or any multiple of 60.
- For a period of six minutes, use 360 as the period value.
- Varying the period changes how the data is aggregated
- Sub-minute periods are supported only for custom metrics with a storage resolution of 1 second
- Retrieving statistics requires a
- period
- start time
- end time
- The default values for the start time and end time return the last hour’s worth of statistics.
- For statistics aggregated over the entire hour, specify a period of 3600.
- Aggregated statistics are stamped with the time corresponding to the beginning of the period.
- Periods are also important for CloudWatch alarms.
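The period rules above can be illustrated with a local sketch that validates a period and groups data points into buckets stamped with the start of each period. Aligning buckets to multiples of the period counted from time zero is an assumption of this example, and both function names are hypothetical.

```python
from collections import defaultdict


def is_valid_period(seconds: int) -> bool:
    """Valid periods per the notes: 1, 5, 10, 30, or any multiple of 60."""
    return seconds in (1, 5, 10, 30) or (seconds > 0 and seconds % 60 == 0)


def bucket_sum(points, period):
    """Sum (epoch_seconds, value) pairs per period, keyed by period start."""
    if not is_valid_period(period):
        raise ValueError(f"invalid period: {period}")
    buckets = defaultdict(float)
    for ts, value in points:
        start = ts - (ts % period)  # stamp with the beginning of the period
        buckets[start] += value
    return dict(sorted(buckets.items()))


points = [(0, 1.0), (59, 2.0), (60, 3.0), (359, 4.0), (360, 5.0)]
print(bucket_sum(points, 360))  # {0: 10.0, 360: 5.0}
print(bucket_sum(points, 60))   # {0: 3.0, 60: 3.0, 300: 4.0, 360: 5.0}
```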
CloudWatch Metrics Aggregation
- CloudWatch aggregates statistics according to the specified period length
- Publish as many data points as needed with the same or similar time stamps; CloudWatch aggregates them according to the specified period length.
- CloudWatch does not aggregate data across Regions.
- For large datasets, publish a pre-aggregated dataset called a statistic set
- A statistic set gives CloudWatch the Min, Max, Sum, and SampleCount for a number of data points.
- CloudWatch does not differentiate the source of a metric.
- A metric with the same namespace and dimensions is treated as a single metric, even if it is published from different sources
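A statistic set works because Min, Max, Sum, and SampleCount can be combined without the raw data points. The sketch below shows that property locally: two sources pre-aggregate their own values, and merging the sets gives the same result as aggregating everything at once. The class name StatisticSet here is a local illustration, not an SDK type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StatisticSet:
    minimum: float
    maximum: float
    total: float
    sample_count: int

    @classmethod
    def from_values(cls, values):
        """Pre-aggregate raw data points into a statistic set."""
        values = list(values)
        if not values:
            raise ValueError("need at least one data point")
        return cls(min(values), max(values), sum(values), len(values))

    def merge(self, other):
        """Combine two sets as if their data points were one dataset."""
        return StatisticSet(
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
            self.total + other.total,
            self.sample_count + other.sample_count,
        )

    @property
    def average(self):
        return self.total / self.sample_count


host_a = StatisticSet.from_values([1, 2, 3])
host_b = StatisticSet.from_values([10, 20])
merged = host_a.merge(host_b)
print(merged)          # StatisticSet(minimum=1, maximum=20, total=36, sample_count=5)
print(merged.average)  # 7.2
```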
CloudWatch Alarms
A CloudWatch alarm
- Watches a single metric over a specified time period and performs specified actions based on the value of the metric relative to a threshold.
- Initiates actions on your behalf.
- An action can be a notification sent to SNS or an Auto Scaling policy.
- Alarms can be added to dashboards.
- Actions are invoked for sustained state changes only.
- Always select a period greater than or equal to the frequency of the metric being monitored.
- A maximum of 5000 alarms can be created per Region in an AWS account.
- To create or update an alarm, use the PutMetricAlarm API action
- Alarm names must contain only ASCII characters.
- List currently configured alarms with DescribeAlarms (mon-describe-alarms).
- Disable or enable alarm actions with DisableAlarmActions and EnableAlarmActions
- Test an alarm by setting it to any state with SetAlarmState (mon-set-alarm-state).
- View an alarm’s history with DescribeAlarmHistory (mon-describe-alarm-history).
- CloudWatch saves alarm history for two weeks.
- The number of evaluation periods multiplied by the length of each evaluation period must not exceed one day.
- The following permissions are required to create or change a CloudWatch alarm
- For alarms with EC2 actions
- iam:CreateServiceLinkedRole
- iam:GetPolicy
- iam:GetPolicyVersion
- iam:GetRole
- For alarms on EC2 instance status metrics
- ec2:DescribeInstanceStatus
- ec2:DescribeInstances
- For alarms with stop actions
- ec2:StopInstances
- For alarms with terminate actions
- ec2:TerminateInstances
- No specific permissions are needed for alarms with recover actions.
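The permissions listed above could be collected into an IAM policy document. The sketch below builds one as a plain Python dictionary. The grouping keys, the function name build_policy, and the use of "Resource": "*" are assumptions for illustration; a real policy would normally scope resources more tightly.

```python
import json

# Actions taken from the list above, grouped by alarm type.
# Alarms with recover actions need no specific permissions, so they have no entry.
ALARM_PERMISSIONS = {
    "ec2_actions": [
        "iam:CreateServiceLinkedRole",
        "iam:GetPolicy",
        "iam:GetPolicyVersion",
        "iam:GetRole",
    ],
    "instance_status_metrics": ["ec2:DescribeInstanceStatus", "ec2:DescribeInstances"],
    "stop_actions": ["ec2:StopInstances"],
    "terminate_actions": ["ec2:TerminateInstances"],
}


def build_policy(*alarm_kinds):
    """Return a policy document allowing the actions for the given alarm types."""
    actions = sorted({a for kind in alarm_kinds for a in ALARM_PERMISSIONS[kind]})
    if not actions:
        raise ValueError("select at least one alarm type")
    return {
        "Version": "2012-10-17",
        "Statement": [
            # Resource "*" is a placeholder assumption, not a recommendation.
            {"Effect": "Allow", "Action": actions, "Resource": "*"}
        ],
    }


print(json.dumps(build_policy("stop_actions", "terminate_actions"), indent=2))
```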
CloudWatch Monitoring
- CloudWatch can be used to monitor
- EC2 instances
- Autoscaling Groups
- ELBs
- Route53 Health Checks
- EBS Volumes
- Storage Gateways
- CloudFront
- DynamoDB
- Other AWS services
- logs generated by applications and services.
- By default, EC2 instances are monitored at 5-minute intervals
- EC2 instances are monitored at 1-minute intervals if the ‘detailed monitoring’ option is enabled on the instance
- CloudWatch monitors the following by default
- CPU
- Network
- Disk
- Status Checks
- RAM utilization metric
- is a custom metric
- has to be added manually to EC2 instances for tracking.
- 2 types of Status Checks:
- System Status Checks (Physical Host):
- Checks the underlying physical host
- Checks for loss of network connectivity
- Checks for loss of system power
- Checks for software issues on the physical host
- Checks for hardware issues on the physical host
- To resolve, stop the instance and start it again (it will move to a different physical host)
- Instance Status Checks
- Checks the VM itself
- Checks for failed system status checks
- Checks for mis-configured networking or startup configs
- Checks for exhausted memory
- Checks for corrupted file systems
- Checks for an incompatible kernel
- To troubleshoot, reboot the instance or modify the instance operating system
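- CloudWatch metric data is retained according to the retention schedule above, up to 15 months for 1-hour data points
- Use the GetMetricStatistics API to retrieve stored data points, for example to archive them before they expire
- Data from a terminated EC2 or ELB instance remains retrievable after termination until it expires
- Depending on the service, default metrics are reported at 1-minute or 3- to 5-minute intervals
- The minimum granularity for standard custom metrics is 1 minute; high-resolution custom metrics go down to 1 second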
- Alarms can be created to monitor any CloudWatch metric in the account
- Alarms can cover EC2 CPU, ELB latency, or even charges on the AWS bill
- The following can be specified in an alarm
- actions to take
- triggering Lambda functions or SNS notifications when a threshold is crossed
An alarm has the following states
- OK – the metric is within the threshold.
- ALARM – the metric is outside the threshold.
- INSUFFICIENT_DATA – the alarm has just started, the metric is not available, or there is not enough data to determine the state
A data point reported to CloudWatch is classified as
- Not breaching (within the threshold)
- Breaching (violating the threshold)
- Missing
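The alarm states and data point classes above can be tied together in a simplified evaluator. This is a sketch under stated assumptions, not CloudWatch's actual algorithm: it assumes a "greater than threshold" comparison, requires every data point in the window to breach before entering ALARM (a sustained change), and reports INSUFFICIENT_DATA only when the window is short or entirely missing. The names State and evaluate are hypothetical.

```python
from enum import Enum


class State(Enum):
    OK = "OK"
    ALARM = "ALARM"
    INSUFFICIENT_DATA = "INSUFFICIENT_DATA"


def evaluate(datapoints, threshold, evaluation_periods):
    """Evaluate the most recent periods; None stands for a missing data point."""
    if evaluation_periods < 1:
        raise ValueError("evaluation_periods must be at least 1")
    window = datapoints[-evaluation_periods:]
    present = [v for v in window if v is not None]
    if len(window) < evaluation_periods or not present:
        return State.INSUFFICIENT_DATA
    # Breaching means above the threshold in this sketch.
    if len(present) == evaluation_periods and all(v > threshold for v in present):
        return State.ALARM
    return State.OK


print(evaluate([50, 85, 90, 95], threshold=80, evaluation_periods=3))  # State.ALARM
print(evaluate([85, 70, 95], threshold=80, evaluation_periods=3))      # State.OK
print(evaluate([None, None, None], threshold=80, evaluation_periods=3))  # State.INSUFFICIENT_DATA
```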
CloudWatch API Logging with CloudTrail
- CloudWatch is integrated with CloudTrail
- CloudTrail provides a record of actions taken by a user, role, or AWS service
- CloudTrail captures API calls made by or on behalf of an AWS account.
- The calls captured include
- calls from the CloudWatch console
- code calls to the CloudWatch API operations.
- After a trail is created, CloudTrail events are delivered continuously to an S3 bucket
- The CloudWatch actions logged in CloudTrail log files are
- DeleteAlarms
- DeleteDashboards
- DescribeAlarmHistory
- DescribeAlarms
- DescribeAlarmsForMetric
- DisableAlarmActions
- EnableAlarmActions
- GetDashboard
- ListDashboards
- PutDashboard
- PutMetricAlarm
- SetAlarmState
CloudTrail
- It is a web service that records API activity in an AWS account.
- It is enabled on an AWS account when the account is created.
- All activity occurring in the AWS account is recorded in a CloudTrail event.
- Activity from the past 90 days can be viewed, searched, and downloaded from the event history view
- It logs information on
- who made a request
- the services used
- the actions performed
- parameters for the actions
- the response elements returned by the AWS service.
- Stores logs in a specified log group.
- Logs provide specific information on what occurred in the AWS account.
- Focuses on the AWS API calls made in the AWS account.
- Helps in meeting compliance and regulatory standards.
- Usually delivers an event within 15 minutes of the API call.
- Helps enable governance, compliance, and operational and risk auditing.
- CloudTrail records actions taken by a user, role, or AWS service
- Events cover all actions in
- AWS Management Console
- AWS Command Line Interface
- AWS SDKs and APIs.
- A trail is a configuration that delivers event details to a specified S3 bucket
- Trails are used for archiving events and for analyzing changes to AWS resources
- Create a trail with the
- CloudTrail console
- AWS CLI
- CloudTrail API
- Types of trails
- A trail that applies to all regions – records events in each region. This is the default in the console.
- A trail that applies to one region – records the events in that region only. This is the default with the AWS CLI or CloudTrail API.
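The notes above say a CloudTrail event records who made a request, the service used, the action, its parameters, and the response elements. The sketch below pulls those fields out of a single event record. The sample record is hand-written and hypothetical (user name, instance ID, and time are made up), and summarize_event is an illustrative helper, not part of any AWS library.

```python
import json

# Hypothetical, hand-written record for illustration; not real log output.
SAMPLE_EVENT = json.loads("""
{
  "eventTime": "2024-01-01T12:00:00Z",
  "eventSource": "ec2.amazonaws.com",
  "eventName": "TerminateInstances",
  "userIdentity": {"type": "IAMUser", "userName": "example-user"},
  "requestParameters": {"instancesSet": {"items": [{"instanceId": "i-0123456789abcdef0"}]}},
  "responseElements": null
}
""")


def summarize_event(event):
    """Extract the who / service / action / parameters / response fields."""
    identity = event.get("userIdentity") or {}
    return {
        "who": identity.get("userName") or identity.get("type", "unknown"),
        "service": event.get("eventSource"),
        "action": event.get("eventName"),
        "parameters": event.get("requestParameters"),
        "response": event.get("responseElements"),
    }


summary = summarize_event(SAMPLE_EVENT)
print(summary["who"], summary["service"], summary["action"])
# example-user ec2.amazonaws.com TerminateInstances
```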
CloudWatch Logs
- Monitor existing system, application, and custom logs in real time.
- Send existing logs to CloudWatch; create patterns to look for in the logs; alert when these patterns are found.
- Free agents are available for Ubuntu, Amazon Linux, and Windows.
- Purpose
- Monitor logs from EC2 instances in real time (track the number of errors in application logs and send a notification if it exceeds a threshold)
- Monitor AWS CloudTrail logged events (API activity such as manual EC2 instance termination)
- Archive log data (change the log retention setting to delete old data automatically)
- Log Events – a record stored in CloudWatch Logs, consisting of a timestamp and a message.
- Log Streams – a sequence of log events that share the same source (for example, Apache access logs; they are automatically deleted after 2 months).
- Log Groups – a group of log streams that share the same settings for
- retention
- monitoring
- access control
- Metric Filters – define how a service extracts metric observations from events and turns them into data points for a CloudWatch metric.
- Retention Settings – settings for how long to keep events. Expired logs are deleted automatically.
- Log group retention can be set from 1 day to 10 years.
- CloudWatch Log Filters: filter log data pushed to CloudWatch; they do not work on existing log data, only on data pushed after the filter is created, and they return only the first 50 results.
- A metric filter contains 1. Filter Pattern 2. Metric Name 3. Metric Namespace 4. Metric Value
- Modify rsyslog (/etc/rsyslog.d/50-default.conf) and remove auth on line number 9, then run sudo service rsyslog restart
- Real-Time Log Processing: requires subscription filters and is applicable to AWS Kinesis Streams, AWS Lambda, and AWS Kinesis Firehose
- The aws kinesis command is used to create and describe a stream. The command can also list the stream ARN. Then update the permissions.json file with the ARNs of the stream and role.
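The metric filter described above (filter pattern, metric name, metric namespace, metric value) can be sketched locally to show how matching log events turn into data points. This example treats the filter pattern as a plain term that must appear in the message, which is a deliberate simplification of the real pattern syntax. The class, function, and sample log lines are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricFilter:
    filter_pattern: str    # simplified: a term that must appear in the message
    metric_name: str
    metric_namespace: str
    metric_value: float = 1.0


def apply_filter(metric_filter, log_events):
    """Turn matching (timestamp_ms, message) log events into metric data points."""
    return [
        (timestamp, metric_filter.metric_value)
        for timestamp, message in log_events
        if metric_filter.filter_pattern in message
    ]


error_filter = MetricFilter("ERROR", "ErrorCount", "MyApp/Logs")
events = [
    (1000, "INFO request served"),
    (2000, "ERROR database timeout"),
    (3000, "ERROR disk full"),
]
data_points = apply_filter(error_filter, events)
print(data_points)                       # [(2000, 1.0), (3000, 1.0)]
print(sum(v for _, v in data_points))    # 2.0
```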
Advanced tasks with CloudTrail log files
- Create multiple trails per region.
- Use CloudWatch Logs to monitor CloudTrail log files.
- Share log files between accounts.
- Develop log processing applications in Java using the CloudTrail Processing Library.
- Validate log files to verify that they have not changed after delivery by CloudTrail.
To receive CloudTrail log files from multiple regions
- Sign in to the AWS Management Console and open the CloudTrail console at https://console.aws.amazon.com/cloudtrail/.
- Choose “Trails”, and then select a trail name.
- Click the pencil icon next to “Apply trail to all regions”, and then select “Yes”.
- Choose Save. The original trail is replicated across all AWS regions, and CloudTrail delivers log files from all regions to the S3 bucket.