The goals of monitoring are:

  • Provide information on the health of running services
  • Record data for issue diagnosis
  • Alert predictively on potential outages
  • Develop, apply, and verify mitigations
  • Provide data for trend analysis
  • Build permanent fixes
  • Reconfigure and continue monitoring

Monitoring metrics

  • Detection Time – time (in minutes) taken to detect and report an anomaly
  • Monitoring Misses – outages that monitoring did not detect
  • Noise – monitoring events qualified as incidents and later closed as noise
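As a rough illustration, the three metrics above could be computed from a list of incident records. This is a minimal sketch, not a reference implementation: the `Incident` record, its field names, and the choice to report detection time as a mean over detected, non-noise incidents are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Incident:
    """Hypothetical incident record; the field names are assumptions."""
    started_at: datetime               # when the anomaly began
    detected_at: Optional[datetime]    # when monitoring reported it; None if never detected
    is_outage: bool                    # True if the incident was a real outage
    closed_as_noise: bool = False      # True if it was later closed as noise


def mean_detection_minutes(incidents: List[Incident]) -> Optional[float]:
    """Mean Detection Time in minutes over detected, non-noise incidents."""
    delays = [
        (i.detected_at - i.started_at).total_seconds() / 60
        for i in incidents
        if i.detected_at is not None and not i.closed_as_noise
    ]
    return sum(delays) / len(delays) if delays else None


def monitoring_misses(incidents: List[Incident]) -> int:
    """Number of outages that monitoring did not detect."""
    return sum(1 for i in incidents if i.is_outage and i.detected_at is None)


def noise_count(incidents: List[Incident]) -> int:
    """Number of events raised as incidents and later closed as noise."""
    return sum(1 for i in incidents if i.closed_as_noise)


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 12, 0)
    sample = [
        Incident(t0, t0 + timedelta(minutes=10), is_outage=True),
        Incident(t0, t0 + timedelta(minutes=20), is_outage=True),
        Incident(t0, None, is_outage=True),  # a monitoring miss
        Incident(t0, t0 + timedelta(minutes=5), is_outage=False, closed_as_noise=True),
    ]
    print(mean_detection_minutes(sample), monitoring_misses(sample), noise_count(sample))
```

Tracking these three numbers over time gives a simple view of whether monitoring is getting faster, more complete, and less noisy.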