AWS Announces Amazon DevOps Guru
Today at AWS re:Invent, Amazon Web Services, Inc. (AWS), an Amazon.com, Inc. company (NASDAQ: AMZN), announced Amazon DevOps Guru, a fully managed operations service that uses machine learning to make it easier for developers to improve application availability by automatically detecting operational issues and recommending specific remediation actions.

Amazon DevOps Guru applies machine learning informed by years of Amazon.com and AWS operational excellence to automatically collect and analyze data such as application metrics, logs, events, and traces, identifying behaviors that deviate from normal operating patterns (e.g. under-provisioned compute capacity, database I/O over-utilization, or memory leaks). When Amazon DevOps Guru identifies anomalous application behavior (e.g. increased latency, elevated error rates, or resource constraints) that could cause outages or service disruptions, it alerts developers with issue details (e.g. the resources involved, the issue timeline, and related events) via Amazon Simple Notification Service (SNS) and partner integrations such as Atlassian Opsgenie and PagerDuty, helping them quickly understand the potential impact and likely causes of the issue, along with specific recommendations for remediation.

Developers can use these remediation suggestions to reduce time to resolution when issues arise and to improve application availability and reliability, with no manual setup or machine learning expertise required. There are no upfront costs or commitments with Amazon DevOps Guru; customers pay only for the data the service analyzes. To get started with Amazon DevOps Guru, visit https://aws.amazon.com/devops-guru
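The SNS alerting flow described above can be sketched with the AWS SDK for Python (boto3), which exposes a `devopsguru` client with an `add_notification_channel` call. This is a minimal sketch, not an official setup guide; the topic ARN and region below are placeholder assumptions.

```python
def build_sns_channel_config(topic_arn: str) -> dict:
    """Build the Config payload that the boto3 'devopsguru' client's
    add_notification_channel call accepts for SNS delivery."""
    return {"Sns": {"TopicArn": topic_arn}}


def register_alert_topic(topic_arn: str, region: str = "us-east-1") -> str:
    """Register an SNS topic so DevOps Guru insight notifications are
    delivered to it. Requires valid AWS credentials; the region is an
    assumption. boto3 is imported lazily so the request-building helper
    above remains usable without the SDK installed."""
    import boto3  # AWS SDK for Python (third-party)

    client = boto3.client("devopsguru", region_name=region)
    response = client.add_notification_channel(
        Config=build_sns_channel_config(topic_arn)
    )
    return response["Id"]


# Example (placeholder ARN): the request payload sent to DevOps Guru
# would look like {"Sns": {"TopicArn": "arn:aws:sns:...:ops-alerts"}}.
config = build_sns_channel_config("arn:aws:sns:us-east-1:123456789012:ops-alerts")
```

Once the channel is registered, any insight DevOps Guru surfaces is published to the topic, so existing SNS subscribers (email, chat hooks, or an incident-management integration) receive the issue details without extra wiring.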
As more organizations move to cloud-based application deployment and microservice architectures to globally scale their businesses and operations without the limitations of on-premises deployments, applications have become increasingly distributed, and developers need more automated practices to maintain application availability and reduce the time and effort spent detecting, debugging, and resolving operational issues. Application downtime caused by faulty code or configuration changes, unbalanced container clusters, or resource exhaustion (e.g. CPU, memory, or disk) inevitably leads to bad customer experiences and lost revenue.

Companies invest considerable money and developer time to deploy multiple monitoring tools, often managed separately, and then have to develop and maintain custom alerts for common issues like spikes in load balancer errors or drops in application request rates. Setting thresholds to identify and alert when application resources behave abnormally is difficult to get right, involves manual setup, and requires thresholds that must be continually updated as application usage changes (e.g. an unusually large number of requests during the holiday shopping season). If a threshold is set too high, developers don't see alarms until operational performance is severely impacted. If it is set too low, developers get too many false positives, which ultimately get ignored.

Even when developers are alerted to a potential operational issue, identifying the root cause can still prove difficult. Using existing tools, developers often struggle to triangulate the root cause of an operational issue from graphs and alarms, and even when they find it, they are often left without a means to fix it.
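The static-threshold problem described above can be illustrated with a toy sketch (this is not how DevOps Guru works internally, which the announcement does not detail): a band derived from a rolling mean and standard deviation adapts as traffic grows, whereas a fixed threshold starts firing on normal growth long before any real anomaly. The traffic numbers and parameters below are invented for illustration.

```python
from statistics import mean, stdev


def adaptive_alerts(series, window=5, k=3.0):
    """Flag points more than k rolling standard deviations above the
    rolling mean of the previous `window` points."""
    alerts = []
    for i in range(window, len(series)):
        recent = series[i - window:i]
        mu, sigma = mean(recent), stdev(recent)
        # max(sigma, 1e-9) guards against a zero-variance window.
        if series[i] > mu + k * max(sigma, 1e-9):
            alerts.append(i)
    return alerts


# Request rates ramp up steadily (e.g. holiday traffic), then one
# genuine spike at the end.
traffic = [100, 105, 110, 116, 122, 128, 135, 141, 148, 155, 400]

# A fixed threshold tuned for the old baseline fires on normal growth:
fixed_alerts = [i for i, v in enumerate(traffic) if v > 130]  # [6, 7, 8, 9, 10]

# The adaptive band tracks the ramp and flags only the real spike:
spikes = adaptive_alerts(traffic)  # [10]
```

The same trade-off runs in the other direction: raising the fixed threshold to 300 would silence the false positives but also delay alerting until performance is already severely degraded, which is exactly the dilemma the passage describes.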
Each troubleshooting attempt is a cold start in which teams must spend hours or days identifying the problem, leading to tedious, time-consuming work that slows resolution of operational failures and can prolong application disruptions.