Amazon Web Services (AWS) has introduced an enhanced incident reporting service designed to help customers document problems with their cloud-hosted resources. The announcement comes just days after a major outage affected numerous customers, making the case for such a tool.
The new feature is an upgrade to CloudWatch, which AWS markets as an essential tool for real-time monitoring of applications and resources hosted on its platform. On Wednesday, AWS announced that CloudWatch can now generate interactive incident reports, letting users compile a comprehensive post-incident analysis within minutes.
According to AWS, the service automatically collects and correlates telemetry data, user inputs, and the actions taken during an investigation to produce a concise incident report. The report draws on critical operational telemetry, service configurations, and investigation findings, and includes detailed summaries, event timelines, impact assessments, and actionable recommendations.
AWS says these reports are intended to help users recognize patterns, implement preventive measures, and continuously improve their operations through structured post-incident analysis.
The recent DynamoDB outage, which disrupted multiple online services, underscores the relevance of the new tool. Many AWS customers, and AWS itself, would have benefited greatly from having this feature during that incident.

In a related development, Datadog, a competitor in the observability space, launched a free service that provides status updates on numerous major SaaS platforms and several AWS services. Datadog said it detected the DynamoDB failure 32 minutes before AWS's first public communication about the issue, a detail that made the timing of AWS's CloudWatch announcement look less than ideal.
