AWS DevOps Agent

AWS DevOps Agent features


Autonomous incident response


Automated investigations

AWS DevOps Agent integrates with ticketing and alarming systems such as ServiceNow to automatically launch investigations from incident tickets, accelerating incident response within your existing workflows and reducing mean time to resolution (MTTR).

Incident coordination

You can also initiate and guide investigations through interactive chat. AWS DevOps Agent acts as a member of your operations team, working directly within collaboration tools such as ServiceNow and Slack to share findings and coordinate the response. When needed, you can create an AWS Support case directly from an investigation, giving AWS Support experts immediate context for faster resolution.

Correlated alarm insights

AWS DevOps Agent automatically triages incidents and correlates related alarms to identify when they originate from the same event. Teams can see immediately which alarms are related and which require separate investigation, which reduces noise and lets them focus on the most critical issues first.

Root cause analysis

AWS DevOps Agent integrates with observability tools, code repositories, and CI/CD pipelines to correlate and analyze telemetry, code, and deployment data, and it shares the hypotheses it explored, its observations, and its root cause findings. Through systematic investigation, AWS DevOps Agent identifies the root cause of issues stemming from system changes, input anomalies, resource limits, component failures, and dependency issues across your entire environment.

Detailed mitigation plans

Once AWS DevOps Agent has identified the root cause, it provides detailed mitigation plans that include actions to resolve the incident, validate success, and revert a change if needed. AWS DevOps Agent also provides agent-ready instructions that another frontier agent can implement; for example, code improvements that the Kiro autonomous agent can carry out.

Continuously improving investigations

AWS DevOps Agent enhances its investigation capabilities by reviewing past investigations to create learned investigation skills. The learned investigation skill draws on past investigations to learn how to triage events and produce root cause analyses and mitigation plans more quickly and accurately, improving over time.

Example use cases

Through systematic investigation of alarms stemming from system changes, input anomalies, resource limits, component failures, and dependency issues across your entire stack, AWS DevOps Agent guides DevOps teams with targeted mitigation steps, reducing mean time to resolution (MTTR) from hours to minutes. For example:

System changes: If an incident is caused by Amazon DynamoDB throttling and high latency because a recent code change uses the table inefficiently, AWS DevOps Agent may recommend rolling back the change as an immediate mitigation.
System changes: If an incident is caused by Amazon SNS subscription errors due to a filter policy mismatch following a code deployment, AWS DevOps Agent may recommend rolling back the code change that altered the message structure as an immediate mitigation to restore message flow.
Input anomalies: If an incident is caused by AWS Lambda throttling notification processing because high traffic exceeds concurrency limits, AWS DevOps Agent may recommend increasing those concurrency limits as an immediate mitigation.
Input anomalies: If an incident is caused by Amazon SNS message publish failures due to messages exceeding the size limit, AWS DevOps Agent may recommend adding message size validation before publishing to Amazon SNS as an immediate mitigation.
Resource limits: If an incident is caused by API throttling due to exceeded rate limits, AWS DevOps Agent may recommend raising the rate and burst limits as an immediate mitigation.
Resource limits: If an incident is caused by Amazon DynamoDB throttling due to exceeded write capacity, AWS DevOps Agent may recommend increasing the table's write capacity as an immediate mitigation.
Component failures: If an incident is caused by performance degradation from cold start latency, AWS DevOps Agent may recommend increasing provisioned concurrency as an immediate mitigation.
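To illustrate the message size validation mitigation above, the following Python sketch checks a message's UTF-8 encoded size against the 256 KB (262,144-byte) Amazon SNS payload limit before calling `publish`. It assumes the caller passes a boto3 SNS client, or any object with a compatible `publish(TopicArn=..., Message=...)` method. The function and exception names are hypothetical, and the check measures only the message body, although message attributes may also count toward the limit.

```python
# Sketch: validate Amazon SNS message size before publishing.
# Assumption: only the UTF-8 encoded message body is measured here;
# message attributes are ignored (they may also count toward the limit).

SNS_MAX_MESSAGE_BYTES = 256 * 1024  # 262,144 bytes


class MessageTooLargeError(ValueError):
    """Raised when a message exceeds the assumed SNS size limit."""


def message_size_bytes(message: str) -> int:
    # The limit is in bytes, so measure the encoded size rather than the
    # character count; multi-byte characters take more than one byte.
    return len(message.encode("utf-8"))


def validate_message(message: str, limit: int = SNS_MAX_MESSAGE_BYTES) -> None:
    size = message_size_bytes(message)
    if size > limit:
        raise MessageTooLargeError(f"message is {size} bytes; limit is {limit} bytes")


def publish_validated(client, topic_arn: str, message: str) -> dict:
    """Validate locally, then publish with a boto3-style SNS client.

    `client` is assumed to be boto3.client("sns") or any object exposing
    publish(TopicArn=..., Message=...). Oversized messages are rejected
    before any network call is made.
    """
    validate_message(message)
    return client.publish(TopicArn=topic_arn, Message=message)
```

Failing fast on the client side turns an opaque publish error into a clear, local exception that the caller can log, truncate, or redirect (for example, by storing a large payload elsewhere and publishing a reference to it).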

Proactive incident prevention


Targeted recommendations

AWS DevOps Agent analyzes patterns across historical incidents to provide actionable recommendations that strengthen four key areas: observability, infrastructure optimization, deployment pipeline enhancement, and application resilience. For example, AWS DevOps Agent can identify testing gaps that would have prevented an issue from reaching production. Recommendations also include agent-ready specs, so you can hand off the work of updating application or infrastructure code to your coding agent or a colleague. This drives continuous improvement without the need to manage a backlog.

Early issue detection

AWS DevOps Agent identifies gaps in observability coverage and opportunities to fine-tune your alarms, reducing mean time to detection (MTTD) so you can identify issues before they become larger problems. For example, after identifying that detection of recent failures took too long, AWS DevOps Agent may recommend implementing monitoring and anomaly detection closer to the error source to reduce detection time and prevent extended outages.

Continuous learning

Through a learning loop driven by your team's feedback on its recommendations, AWS DevOps Agent continually refines those recommendations to align with your operational priorities, delivering increasingly relevant suggestions tailored to your organization's needs.

Continuous service improvements

AWS DevOps Agent analyzes patterns across historical incidents to provide targeted recommendations that prevent future outages and strengthen system resilience. By evaluating real incidents, it delivers specific, actionable improvements that reduce both the frequency and the impact of similar issues in four key areas: observability, infrastructure optimization, deployment pipeline enhancement, and application resilience.

Example use cases

Observability improvement: AWS DevOps Agent may recommend adjusting alarm thresholds for critical authentication systems from 15 failures over 20 minutes to 3 failures within 5 minutes, reducing detection time and preventing extended integration outages.
Observability improvement: AWS DevOps Agent may recommend implementing targeted Amazon CloudWatch metric filters to track anomalous "Access Denied" patterns after IAM role changes, enabling faster detection than the previous alarm provided.
Infrastructure improvement: After determining that the Amazon DynamoDB table schema doesn't match the service's main access pattern, forcing inefficient full table scans, AWS DevOps Agent may recommend creating a Global Secondary Index (GSI) with the frequently queried attribute as the partition key. This would replace Scan operations with Query operations, reducing latency from 2,500-3,500 ms to under 100 ms and preventing throttling.
Infrastructure improvement: AWS DevOps Agent's analysis shows the application has adequate resources but is constrained by a single-pod bottleneck, where all requests queue to one instance during traffic spikes. AWS DevOps Agent may recommend adding a Horizontal Pod Autoscaler to the Kubernetes cluster, which automatically scales the service horizontally based on demand and distributes the load across multiple pods.
Deployment pipeline improvement: After analyzing failed Amazon ECS deployments, AWS DevOps Agent may recommend enabling automatic rollbacks and monitoring deployment states with Amazon EventBridge. Together, these changes would quickly detect and address task health check failures, preventing disruption to customer transactions.
Deployment pipeline improvement: After analyzing deployment failures, AWS DevOps Agent may recommend mandatory pre-deployment validation of Amazon Managed Service for Prometheus connectivity for Amazon ECS task definitions. This would reduce failed deployments by catching connectivity issues before tasks are rolled out.
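The alarm threshold example above ("15 failures over 20 minutes" tightened to "3 failures within 5 minutes") can be expressed in Amazon CloudWatch as a `Sum` statistic over one evaluation period equal to the window, compared with `GreaterThanOrEqualToThreshold`. The Python sketch below only builds keyword arguments for boto3's `put_metric_alarm` and does not call AWS; the alarm name, namespace (`Example/Auth`), and metric name (`AuthFailures`) are hypothetical placeholders.

```python
# Sketch: express "N failures within a W-minute window" as keyword arguments
# for boto3's cloudwatch.put_metric_alarm. Names used below are hypothetical.


def failure_alarm_params(
    alarm_name: str,
    namespace: str,
    metric_name: str,
    failures: int,
    window_minutes: int,
) -> dict:
    if failures < 1 or window_minutes < 1:
        raise ValueError("failures and window_minutes must be positive")
    if window_minutes > 1440:
        # Period * EvaluationPeriods may not exceed one day.
        raise ValueError("window_minutes must be at most 1440")
    return {
        "AlarmName": alarm_name,
        "Namespace": namespace,
        "MetricName": metric_name,
        # Sum the failure count over a single period spanning the whole window.
        "Statistic": "Sum",
        "Period": window_minutes * 60,  # seconds; always a multiple of 60
        "EvaluationPeriods": 1,
        "Threshold": float(failures),
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Periods with no failure data should not trigger the alarm.
        "TreatMissingData": "notBreaching",
    }


# Before and after, using the thresholds from the example.
old = failure_alarm_params("auth-failures", "Example/Auth", "AuthFailures", 15, 20)
new = failure_alarm_params("auth-failures", "Example/Auth", "AuthFailures", 3, 5)

# To apply (requires AWS credentials and boto3):
#   boto3.client("cloudwatch").put_metric_alarm(**new)
```

One design caveat: a single 300-second period evaluates consecutive 5-minute buckets, so three failures split across two buckets would not trigger the alarm. Using `Period=60` with `EvaluationPeriods=5` and `DatapointsToAlarm=3` instead means "3 of 5 one-minute datapoints breach the threshold", which is a different condition.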
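For the Amazon ECS deployment example above, a minimal sketch of "automatic rollbacks plus EventBridge monitoring" is to enable the ECS deployment circuit breaker with rollback and create an EventBridge rule that matches failed service deployments. The Python below builds keyword arguments for boto3's `ecs.update_service` and `events.put_rule`; the cluster, service, and rule names are hypothetical. Two assumptions to check before use: sending `deploymentConfiguration` may affect other settings in that structure (such as `maximumPercent`), and `put_rule` alone routes events nowhere until you attach a target with `put_targets`.

```python
import json


def circuit_breaker_update(cluster: str, service: str) -> dict:
    """Keyword arguments for ecs.update_service that enable the deployment
    circuit breaker with automatic rollback to the last healthy deployment."""
    return {
        "cluster": cluster,
        "service": service,
        "deploymentConfiguration": {
            "deploymentCircuitBreaker": {"enable": True, "rollback": True}
        },
    }


def failed_deployment_rule(rule_name: str) -> dict:
    """Keyword arguments for events.put_rule matching failed ECS deployments."""
    pattern = {
        "source": ["aws.ecs"],
        "detail-type": ["ECS Deployment State Change"],
        "detail": {"eventName": ["SERVICE_DEPLOYMENT_FAILED"]},
    }
    # put_rule expects the event pattern as a JSON string.
    return {"Name": rule_name, "EventPattern": json.dumps(pattern)}


def apply_deployment_safeguards(ecs_client, events_client, cluster: str, service: str) -> None:
    # Clients are assumed to be boto3.client("ecs") and boto3.client("events");
    # calling this against real clients requires AWS credentials.
    ecs_client.update_service(**circuit_breaker_update(cluster, service))
    events_client.put_rule(**failed_deployment_rule(f"{service}-deployment-failed"))
```

Keeping the request construction separate from the API calls makes the configuration easy to review or test without touching a live account.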

On-Demand SRE Task Handling


Always-Available Operations Teammate

Ask AWS DevOps Agent any operational question and get immediate, contextual answers grounded in your actual infrastructure, without navigating between consoles or monitoring tools. Beyond Q&A, you can create, save, and share custom charts and reports, such as daily ops health summaries or 4xx error trends, that help you track operational metrics and communicate insights to your team.

Built-In Integrations

AWS DevOps Agent offers built-in integrations with your existing tools, including observability tools (CloudWatch, Dynatrace, Datadog, New Relic, Splunk), code repositories and CI/CD pipelines (GitHub, GitLab, Azure DevOps), and ticketing and collaboration tools (ServiceNow, PagerDuty, Slack). These integrations let it quickly identify root causes, proactively prevent future incidents, and give you on-demand, contextual answers about your environment.

Custom tool integrations

Connect to private or remote Model Context Protocol (MCP) servers to integrate with additional tools, including proprietary systems, specialized platforms, customer-managed version control systems, and internal infrastructure documentation. This enables AWS DevOps Agent to securely access your internal tools, data, and workflows to deliver more accurate insights and automate actions using real context from your organization.

Application Mapping

AWS DevOps Agent learns your environment by automatically discovering applications, their component services, and the resources that make up those services. Using its topology skill, the agent looks across all configured tools, incorporates user input, and builds a detailed understanding of your application resources, relationships, and key flows. It maps these relationships into a dynamic, continuously updated topology that gives you an accurate, high-level view of your applications. By correlating this live resource map with telemetry, code, and deployment data, AWS DevOps Agent builds a deep understanding of your environment, enabling faster incident resolution, proactive prevention of future issues, and context-aware answers grounded in how your applications actually run.

Extensible agent skills

Add reusable, modular skills that AWS DevOps Agent can invoke to execute tasks consistently and reliably. Customer- and partner-defined skills let you extend the agent's capabilities to fit your environment. For example, you can define a skill that enables AWS DevOps Agent to query on-premises database logs by providing knowledge of log locations, naming conventions, and query strategies. By passing institutional knowledge to the agent, you can support everything from service discovery and log analytics to incident response runbooks and team ownership information.

Next steps

Console

Try AWS DevOps Agent

Documentation

Get started with AWS DevOps Agent

Blog

Read how AWS DevOps Agent acts as your operations teammate across AWS, multicloud, and on-prem environments

Web

Find out how DevOps Agent credits are included with your AWS Support Plan

