Monitoring and Log Processing
CloudWatch
Amazon CloudWatch monitors
- AWS resources
- applications running on AWS
CloudWatch
- Collects and tracks metrics for AWS resources and applications.
- The CloudWatch home page displays metrics about every AWS service in use.
- Custom dashboards can be created to display metrics.
- Alarms can be configured to monitor metrics and send notifications when needed.
- Alarms can also automatically make changes to the monitored resources when a threshold is breached.
Access CloudWatch by
- Amazon CloudWatch console – https://console.aws.amazon.com/cloudwatch/
- AWS CLI
- CloudWatch API
- AWS SDKs
CloudWatch Namespaces
A CloudWatch namespace
- Is a container for CloudWatch metrics.
- Isolates metrics: metrics in different namespaces are kept separate from each other.
- Has no default; a namespace must be specified for each data point published to CloudWatch.
- Is named when a metric is created.
- Namespace names must contain valid XML characters and be fewer than 256 characters in length.
- Possible characters are: alphanumeric characters (0-9A-Za-z), period (.), hyphen (-), underscore (_), forward slash (/), hash (#), and colon (:).
- AWS namespaces follow the naming convention AWS/service, for example AWS/EC2.
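The naming rules above can be checked locally before a data point is published. The following is a minimal sketch, assuming the rules exactly as listed in this section; `is_valid_namespace` is a hypothetical helper and not part of any AWS SDK.

```python
import re

# Allowed characters as listed above: 0-9 A-Z a-z . - _ / # :
_NAMESPACE_RE = re.compile(r"[0-9A-Za-z.\-_/#:]+")
MAX_NAMESPACE_LENGTH = 255  # "fewer than 256 characters"


def is_valid_namespace(name: str) -> bool:
    """Return True if name satisfies the rules listed in this section."""
    if not name or len(name) > MAX_NAMESPACE_LENGTH:
        return False
    return _NAMESPACE_RE.fullmatch(name) is not None


print(is_valid_namespace("AWS/EC2"))         # follows the AWS/service convention
print(is_valid_namespace("MyApp/Orders"))    # a custom namespace
print(is_valid_namespace("bad namespace!"))  # space and "!" are not allowed
```

A check like this only catches naming mistakes early; the service itself remains the authority on what it accepts.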
CloudWatch Dimensions
A dimension
- Is a name/value pair.
- Is part of the identity of a metric.
- A metric can be assigned up to 30 dimensions.
- Describes a characteristic of a metric.
- Is also used to filter the results that CloudWatch returns.
- For a few AWS services, such as EC2, CloudWatch can aggregate data across dimensions.
- Example – Server=Production,Domain=City01
CloudWatch Statistics
- Statistics are aggregations of metric data over specified periods of time.
- Aggregations are made using the namespace, metric name, dimensions, and data point unit of measure, within the specified time period.
Available statistics
- Minimum – the lowest value observed during the period. Useful for spotting low activity.
- Maximum – the highest value observed during the period. Useful for spotting high activity.
- Sum – all values submitted for the matching metric added together. Useful for measuring total activity.
- Average – the value of Sum / SampleCount during the period.
- SampleCount – the count (number) of data points used for the statistical calculation.
- pNN.NN – the value of the specified percentile, with up to 2 decimal places (for example, p95.45). Not available for metrics that include negative values.
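To make the relationship between these statistics concrete, here is a small sketch that computes the five basic statistics for the data points falling in one period. The function name and the sample values are made up for illustration; percentiles are left out.

```python
from typing import Dict, List


def basic_statistics(values: List[float]) -> Dict[str, float]:
    """Compute the five basic statistics for the data points of one period."""
    if not values:
        raise ValueError("at least one data point is required")
    total = sum(values)
    count = len(values)
    return {
        "Minimum": min(values),
        "Maximum": max(values),
        "Sum": total,
        "SampleCount": count,
        "Average": total / count,  # Average is Sum / SampleCount
    }


stats = basic_statistics([2.0, 4.0, 9.0])
print(stats)
```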
A CloudWatch metric
- Is a set of time-ordered data points published to CloudWatch.
- Can be thought of as a variable to monitor, with the data points as its values over time.
- AWS services send metrics to CloudWatch.
- Custom metrics can also be sent to CloudWatch.
- Data points can be added in any order and at any rate.
- Statistics about the data points can be retrieved as an ordered set of time-series data.
- Metrics exist only in the Region in which they were created.
- Metrics cannot be deleted, but they expire automatically after 15 months if no new data is published to them.
- Data points expire on a rolling basis; as new data points come in, data older than 15 months is dropped.
- A metric is uniquely defined by a name, a namespace, and zero or more dimensions.
- Each data point in a metric has a time stamp and (optionally) a unit of measure.
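The identity rule above (name, namespace, and dimensions together define a metric) and the "any order in, ordered series out" behaviour can be sketched with a small in-memory model. `MetricStore` is a hypothetical stand-in written for this example, not a CloudWatch client.

```python
from collections import defaultdict
from datetime import datetime, timezone


def metric_key(namespace, name, dimensions=None):
    """Build the identity of a metric: namespace, name, and dimension set."""
    return (namespace, name, frozenset((dimensions or {}).items()))


class MetricStore:
    """Hypothetical in-memory stand-in for a metrics backend."""

    def __init__(self):
        self._points = defaultdict(list)

    def put(self, namespace, name, timestamp, value, dimensions=None):
        # Data points may arrive in any order.
        key = metric_key(namespace, name, dimensions)
        self._points[key].append((timestamp, value))

    def series(self, namespace, name, dimensions=None):
        # Retrieval returns the points as a time-ordered series.
        key = metric_key(namespace, name, dimensions)
        return sorted(self._points.get(key, []), key=lambda p: p[0])


store = MetricStore()
t1 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
t2 = datetime(2024, 1, 1, 12, 5, tzinfo=timezone.utc)
store.put("MyApp", "Latency", t2, 120.0, {"Server": "Production"})
store.put("MyApp", "Latency", t1, 95.0, {"Server": "Production"})
store.put("MyApp", "Latency", t1, 40.0, {"Server": "Beta"})  # a different metric

production = store.series("MyApp", "Latency", {"Server": "Production"})
print([value for _, value in production])
```

Changing any dimension value produces a different key, which is why the `Server=Beta` point does not show up in the `Server=Production` series.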
CloudWatch Metrics Time Stamps
- Each metric data point must be associated with a time stamp.
- The time stamp can be up to two weeks in the past and up to two hours in the future.
- If no time stamp is provided, CloudWatch creates one based on the time the data point was received.
- Time stamps are dateTime objects.
- Coordinated Universal Time (UTC) is recommended for time stamps.
- Statistics retrieved from CloudWatch have all times in UTC.
- CloudWatch alarms check metrics based on the current time in UTC.
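The time stamp rules above can be mirrored on the client side. This is a sketch under the assumption that the two-week and two-hour limits are inclusive; `resolve_timestamp` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

MAX_PAST = timedelta(weeks=2)    # up to two weeks in the past
MAX_FUTURE = timedelta(hours=2)  # up to two hours in the future


def resolve_timestamp(timestamp=None, now=None):
    """Return a UTC time stamp for a data point, or raise ValueError."""
    now = now or datetime.now(timezone.utc)
    if timestamp is None:
        # No time stamp given: fall back to the time of receipt.
        return now
    if timestamp.tzinfo is None:
        raise ValueError("use a timezone-aware datetime, preferably UTC")
    if not (now - MAX_PAST <= timestamp <= now + MAX_FUTURE):
        raise ValueError("time stamp is outside the accepted range")
    return timestamp.astimezone(timezone.utc)


received = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
print(resolve_timestamp(received - timedelta(days=3), now=received))
```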
CloudWatch Metrics Retention
CloudWatch retains metric data as follows:
- Data points with a period of less than 60 seconds are available for 3 hours. These are high-resolution custom metrics.
- Data points with a period of 60 seconds (1 minute) are available for 15 days.
- Data points with a period of 300 seconds (5 minutes) are available for 63 days.
- Data points with a period of 3600 seconds (1 hour) are available for 455 days (15 months).
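The retention tiers above can be expressed as a simple lookup. This is only a sketch: the list covers periods below 60 seconds and exactly 60, 300, and 3600 seconds, so the handling of in-between periods (for example 120 seconds) is an assumption made here for illustration.

```python
from datetime import timedelta


def retention_for_period(period_seconds: int) -> timedelta:
    """Map a data point period to how long the data stays available."""
    if period_seconds <= 0:
        raise ValueError("period must be positive")
    if period_seconds < 60:
        return timedelta(hours=3)   # high-resolution data
    if period_seconds < 300:        # assumption: grouped with the 60 s tier
        return timedelta(days=15)
    if period_seconds < 3600:       # assumption: grouped with the 300 s tier
        return timedelta(days=63)
    return timedelta(days=455)      # 15 months


for period in (1, 60, 300, 3600):
    print(period, retention_for_period(period))
```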
CloudWatch Metrics Units
- Each statistic has a unit of measure.
- Example units include Bytes, Seconds, Count, and Percent.
- A unit can be specified when a custom metric is created.
- If no unit is specified, CloudWatch uses None as the unit.
- CloudWatch attaches no significance to a unit internally.
- Metric data points that specify a unit of measure are aggregated separately.
- When statistics are requested without specifying a unit, CloudWatch aggregates all data points of the same unit together.
CloudWatch Metrics Periods
- A period is the length of time associated with a specific CloudWatch statistic.
- Periods are defined in seconds; valid values are 1, 5, 10, 30, or any multiple of 60.
- For example, a period of six minutes is specified as 360.
- Varying the period length changes how the data is aggregated.
- Sub-minute periods are supported only for custom metrics with a storage resolution of 1 second.
- Retrieving statistics requires a period, a start time, and an end time.
- The default values for the start time and end time return the last hour’s worth of statistics.
- For statistics aggregated over the entire hour, specify a period of 3600.
- Aggregated statistics are stamped with the time corresponding to the beginning of the period.
- Periods are also important for CloudWatch alarms.
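The period rules above lend themselves to a short sketch: one function checks whether a period value is valid, another finds the beginning of the period that a data point falls into. Aligning periods to multiples of the period length since the Unix epoch is an assumption made for this example; both function names are hypothetical.

```python
def is_valid_period(period: int) -> bool:
    """Valid periods are 1, 5, 10, 30, or any multiple of 60 seconds."""
    return period in (1, 5, 10, 30) or (period > 0 and period % 60 == 0)


def period_start(epoch_seconds: int, period: int) -> int:
    """Return the time stamp of the beginning of the containing period."""
    if not is_valid_period(period):
        raise ValueError(f"invalid period: {period}")
    return epoch_seconds - (epoch_seconds % period)


print(is_valid_period(360))     # six minutes
print(is_valid_period(45))      # neither sub-minute value nor multiple of 60
print(period_start(3725, 3600)) # beginning of the hour containing second 3725
```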
CloudWatch Metrics Aggregation
- CloudWatch aggregates statistics according to the specified period length.
- Any number of data points can be published with the same or similar time stamps; CloudWatch aggregates them according to the specified period length.
- CloudWatch does not aggregate data across Regions.
- For large datasets, a pre-aggregated dataset (statistic set) can be published instead of individual data points.
- A statistic set provides the Min, Max, Sum, and SampleCount for a number of data points.
- CloudWatch does not differentiate metrics by their source.
- A metric with the same namespace and dimensions is treated as a single metric, even when published from different sources.
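A statistic set, as described above, carries just enough information (Min, Max, Sum, SampleCount) to be combined with other sets without the raw data points. The sketch below illustrates that property; the `StatisticSet` class and its field names are invented for this example.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class StatisticSet:
    minimum: float
    maximum: float
    total: float       # the Sum statistic
    sample_count: int

    @classmethod
    def from_values(cls, values: Iterable[float]) -> "StatisticSet":
        """Pre-aggregate raw data points into one statistic set."""
        data = list(values)
        if not data:
            raise ValueError("a statistic set needs at least one value")
        return cls(min(data), max(data), sum(data), len(data))

    def merge(self, other: "StatisticSet") -> "StatisticSet":
        """Combine two sets, e.g. the same metric published by two sources."""
        return StatisticSet(
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
            self.total + other.total,
            self.sample_count + other.sample_count,
        )

    @property
    def average(self) -> float:
        return self.total / self.sample_count


host_a = StatisticSet.from_values([1.0, 2.0, 3.0])
host_b = StatisticSet.from_values([10.0, 20.0])
combined = host_a.merge(host_b)
print(combined, combined.average)
```

Merging the two sets gives the same result as aggregating all five raw values at once, which is why pre-aggregation loses nothing for these four statistics.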
CloudWatch Alarms
A CloudWatch alarm
- Watches a single metric over a specified time period and performs the specified actions.
- Initiates actions on your behalf.
- Takes action based on the value of the metric relative to a threshold over a number of time periods.
- The action can be a notification sent to an SNS topic or an Auto Scaling policy.
- Alarms can be added to dashboards.
- Actions are invoked for sustained state changes only.
- Always select a period greater than or equal to the frequency of the metric to be monitored.
- Up to 5000 alarms can be created per Region in an AWS account.
- To create or update an alarm, use the PutMetricAlarm API action.
- Alarm names must contain only ASCII characters.
- List the currently configured alarms with DescribeAlarms (mon-describe-alarms).
- Disable or enable alarm actions with DisableAlarmActions and EnableAlarmActions.
- Test an alarm by setting it to any state with SetAlarmState (mon-set-alarm-state).
- View an alarm’s history with DescribeAlarmHistory (mon-describe-alarm-history).
- CloudWatch saves alarm history for two weeks.
- The number of evaluation periods for an alarm multiplied by the length of each evaluation period must not exceed one day.
- The following permissions are required to create or change a CloudWatch alarm:
- For alarms with EC2 actions: iam:CreateServiceLinkedRole, iam:GetPolicy, iam:GetPolicyVersion, iam:GetRole
- For alarms on EC2 instance status metrics: ec2:DescribeInstanceStatus, ec2:DescribeInstances
- For alarms with stop actions: ec2:StopInstances
- For alarms with terminate actions: ec2:TerminateInstances
- No specific permissions are needed for alarms with recover actions.
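Two of the alarm rules listed in this section (ASCII-only alarm names, and evaluation periods multiplied by period length staying within one day) can be checked before calling PutMetricAlarm. The sketch below only builds a parameter dictionary and makes no AWS call; `build_alarm_request` is a hypothetical helper, the default values are arbitrary, and treating exactly one day as allowed is an assumption.

```python
SECONDS_PER_DAY = 86_400


def build_alarm_request(name, namespace, metric_name, threshold,
                        period=300, evaluation_periods=3):
    """Validate alarm settings and return PutMetricAlarm-style parameters."""
    if not name.isascii():
        raise ValueError("alarm names must contain only ASCII characters")
    if evaluation_periods * period > SECONDS_PER_DAY:
        raise ValueError("evaluation periods x period must fit in one day")
    return {
        "AlarmName": name,
        "Namespace": namespace,
        "MetricName": metric_name,
        "Statistic": "Average",
        "Period": period,
        "EvaluationPeriods": evaluation_periods,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
    }


request = build_alarm_request("high-cpu", "AWS/EC2", "CPUUtilization", 80.0)
print(request)
```

With an SDK such as boto3, a dictionary like this would typically be passed as keyword arguments to the CloudWatch client's `put_metric_alarm` call, together with any alarm actions.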
CloudWatch Monitoring
- CloudWatch can be used to monitor
- EC2 instances
- Auto Scaling groups
- ELBs
- Route 53 health checks
- EBS volumes
- Storage Gateways
- CloudFront
- DynamoDB
- Other AWS services
- Logs generated by applications and services
- By default, EC2 instances are monitored at 5-minute intervals.
- EC2 instances are monitored at 1-minute intervals if the ‘detailed monitoring’ option is enabled on the instance.
- For EC2 instances, CloudWatch monitors the following by default: CPU, Network, Disk, and Status Checks.
- RAM utilization is a custom metric and has to be added manually to EC2 instances for tracking.
- There are 2 types of status checks:
- System Status Checks (physical host): check the underlying physical host for loss of network connectivity, loss of system power, software issues on the physical host, and hardware issues on the physical host. To resolve a failure, stop the instance and start it again (this switches physical hosts).
- Instance Status Checks: check the VM itself for failed system status checks, misconfigured networking or startup configuration, exhausted memory, corrupted file systems, and an incompatible kernel. To troubleshoot, reboot the instance or make changes to the instance operating system.
- CloudWatch metric data is retained for up to 15 months, at the resolutions listed under CloudWatch Metrics Retention.
- Use the GetMetricStatistics API action to retrieve the retained data.
- Data from a terminated EC2 instance or a deleted ELB remains available for the same retention period.
- Depending on the service, default metrics are published at 1-minute, 3-minute, or 5-minute intervals.
- The minimum granularity for standard-resolution custom metrics is 1 minute; high-resolution custom metrics can go down to 1 second.
- Alarms can be created to monitor any CloudWatch metric in the account.
- Alarms can cover metrics such as EC2 CPU utilization, ELB latency, or even charges on the AWS bill.
- An alarm can specify actions, such as triggering Lambda functions or SNS notifications, to take when a threshold is crossed.
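An alarm has the following states
- OK – the metric is within the defined threshold.
- ALARM – the metric is outside the defined threshold.
- INSUFFICIENT_DATA – the alarm has just started, the metric is not available, or not enough data is available to determine the alarm state.
A data point reported to CloudWatch is classified as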
- Not breaching (within the threshold)
- Breaching (violating the threshold)
- Missing
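The three alarm states and the three data point classifications can be tied together in a deliberately simplified sketch. It assumes a "greater than threshold" comparison, requires every recent data point to be breaching before reporting ALARM, and treats a mix of missing and present points as OK; real alarm evaluation has more options than this. All names here are hypothetical.

```python
def classify(value, threshold):
    """Classify one data point; None stands for a missing data point."""
    if value is None:
        return "missing"
    return "breaching" if value > threshold else "not breaching"


def alarm_state(values, threshold, evaluation_periods):
    """Derive a simplified alarm state from the most recent data points."""
    recent = values[-evaluation_periods:]
    labels = [classify(v, threshold) for v in recent]
    if len(labels) < evaluation_periods or all(l == "missing" for l in labels):
        return "INSUFFICIENT_DATA"
    if all(l == "breaching" for l in labels):
        return "ALARM"  # sustained breach across all evaluated periods
    return "OK"


print(alarm_state([50, 95, 97, 99], threshold=90, evaluation_periods=3))
print(alarm_state([95, 50, 99], threshold=90, evaluation_periods=3))
print(alarm_state([None, None, None], threshold=90, evaluation_periods=3))
```

Requiring all evaluated periods to breach is one way to model the point that actions are taken for sustained state changes rather than single spikes.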
Logging CloudWatch API Calls with CloudTrail
- CloudWatch is integrated with CloudTrail.
- CloudTrail provides a record of actions taken by a user, role, or AWS service.
- CloudTrail captures API calls made by or on behalf of the AWS account.
- The captured calls include calls from the CloudWatch console and code calls to the CloudWatch API operations.
- After a trail is created, CloudTrail events are delivered continuously to an S3 bucket.
- The CloudWatch actions logged in CloudTrail log files are
- DeleteAlarms
- DeleteDashboards
- DescribeAlarmHistory
- DescribeAlarms
- DescribeAlarmsForMetric
- DisableAlarmActions
- EnableAlarmActions
- GetDashboard
- ListDashboards
- PutDashboard
- PutMetricAlarm
- SetAlarmState
CloudTrail
- CloudTrail is a web service that records API activity in an AWS account.
- It is enabled on an AWS account when the account is created.
- Activity occurring in the AWS account is recorded as a CloudTrail event.
- Activity from the past 90 days can be viewed, searched, and downloaded from the event history view.
- It logs information on
- who made a request
- the services used
- the actions performed
- parameters for the actions
- the response elements returned by the AWS service.
- Logs can be stored in a specified log group.
- Logs provide specific information on what occurred in the AWS account.
- It focuses on the AWS API calls made in the AWS account.
- It usually delivers an event within 15 minutes of the API call.
- It helps enable governance, compliance, and operational and risk auditing, and helps in meeting compliance and regulatory standards.
- CloudTrail records actions taken by a user, role, or AWS service.
- Events cover actions taken in the
- AWS Management Console
- AWS Command Line Interface
- AWS SDKs and APIs.
- A trail is a configuration that delivers event details to a specified S3 bucket.
- Trails are used to archive events and to analyze changes to AWS resources.
- A trail can be created with the
- CloudTrail console
- AWS CLI
- CloudTrail API
- Types of trails
- A trail that applies to all Regions – records events in each Region. This is the default in the console.
- A trail that applies to one Region – records the events in that Region only. This is the default with the AWS CLI or CloudTrail API.
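The kind of information CloudTrail logs (who made a request, which service was used, which action was performed) can be pulled out of a log file with a few lines of code. The sketch below parses a hand-written sample in the JSON shape of a CloudTrail log file, a top-level `Records` array; the user name, instance ID, and alarm name in the sample are invented, and `summarize` is a hypothetical helper.

```python
import json

# Invented sample data in the shape of a CloudTrail log file.
SAMPLE_LOG = """
{"Records": [
  {"eventTime": "2024-01-01T12:00:00Z",
   "eventSource": "ec2.amazonaws.com",
   "eventName": "TerminateInstances",
   "awsRegion": "us-east-1",
   "userIdentity": {"type": "IAMUser", "userName": "alice"},
   "requestParameters": {"instancesSet": {"items": [{"instanceId": "i-0123456789abcdef0"}]}},
   "responseElements": null},
  {"eventTime": "2024-01-01T12:05:00Z",
   "eventSource": "monitoring.amazonaws.com",
   "eventName": "PutMetricAlarm",
   "awsRegion": "us-east-1",
   "userIdentity": {"type": "AssumedRole"},
   "requestParameters": {"alarmName": "high-cpu"},
   "responseElements": null}
]}
"""


def summarize(log_text):
    """Return (who, service, action) for each record in a log file."""
    records = json.loads(log_text)["Records"]
    return [
        (
            record["userIdentity"].get("userName", "unknown"),
            record["eventSource"],
            record["eventName"],
        )
        for record in records
    ]


for who, service, action in summarize(SAMPLE_LOG):
    print(f"{who} called {action} on {service}")
```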
CloudWatch Logs
- Monitor existing system, application, and custom logs in real time.
- Send existing logs to CloudWatch, create patterns to look for in the logs, and alert when these patterns are found.
- Free agents are available for Ubuntu, Amazon Linux, and Windows.
- Purpose
- Monitor logs from EC2 instances in real time (for example, track the number of errors in application logs and send a notification if it exceeds a threshold).
- Monitor AWS CloudTrail logged events (API activity such as manual EC2 instance termination).
- Archive log data (change the log retention setting to delete old data automatically).
- Log Events – a record stored in CloudWatch Logs, consisting of a time stamp and the message to store.
- Log Streams – a sequence of log events that share the same source, such as Apache access logs (these are automatically deleted after 2 months).
- Log Groups – a group of log streams that share the same retention, monitoring, and access control settings.
- Metric Filters – define how the service extracts metric observations from events and turns them into data points for a CloudWatch metric.
- Retention Settings – define how long log events are kept. Expired log events are deleted automatically.
- Log group retention can be set from 1 day to 10 years.
- CloudWatch Log Filters: filter log data pushed to CloudWatch. They do not work on existing log data, only on data pushed after the filter is created, and only return the first 50 results. A metric filter contains 1. Filter Pattern 2. Metric Name 3. Metric Namespace 4. Metric Value
- Modify the rsyslog configuration (/etc/rsyslog.d/50-default.conf) to remove auth on line number 9, then run sudo service rsyslog restart.
- Real-Time Log Processing: requires subscription filters, and works with Amazon Kinesis Streams, AWS Lambda, and Amazon Kinesis Firehose.
- The aws kinesis command is used to create and describe a stream. The command can also list the stream ARN. Then update the permissions.json file with the ARNs of the stream and the role.
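The metric filter idea described in this section (a filter pattern that turns matching log events into data points for a metric, which can then be compared to a threshold) can be sketched as follows. The pattern matching here is a plain substring test, which is a simplification of the real filter pattern syntax; the class, the sample log events, and the threshold are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

LogEvent = Tuple[int, str]  # (time stamp in epoch seconds, message)


@dataclass
class MetricFilter:
    filter_pattern: str       # simplified: a plain term to look for
    metric_name: str
    metric_namespace: str
    metric_value: float = 1.0

    def extract(self, events: Iterable[LogEvent]) -> List[Tuple[int, float]]:
        """Turn matching log events into (time stamp, value) data points."""
        return [
            (timestamp, self.metric_value)
            for timestamp, message in events
            if self.filter_pattern in message
        ]


events = [
    (1700000000, "INFO request served in 12 ms"),
    (1700000005, "ERROR database connection refused"),
    (1700000009, "ERROR timeout while calling payment service"),
]

error_filter = MetricFilter("ERROR", "ErrorCount", "MyApp/Logs")
data_points = error_filter.extract(events)
error_count = sum(value for _, value in data_points)

ERROR_THRESHOLD = 1  # hypothetical threshold for sending a notification
print(error_count, error_count > ERROR_THRESHOLD)
```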
Advanced tasks with CloudTrail log files
- Create multiple trails per Region.
- Monitor CloudTrail log files with CloudWatch Logs.
- Share log files between accounts.
- Develop log processing applications in Java with the CloudTrail Processing Library.
- Validate log files to verify that they have not changed after delivery by CloudTrail.
To receive CloudTrail log files from multiple regions
- Sign in to the AWS Management Console and open the CloudTrail console at https://console.aws.amazon.com/cloudtrail/.
- Choose “Trails”, and then select a trail name.
- Click the pencil icon next to “Apply trail to all regions”, and then select “Yes”.
- Choose Save. The original trail is replicated across all AWS Regions, and CloudTrail delivers log files from all Regions to the S3 bucket.