PERF07-BP02 - Use monitoring solutions to understand where performance is most critical
Implementation Guidance
Use monitoring data to find where performance is most critical, so that architecture and operational choices rest on evidence rather than assumptions. Define clear criteria, collect current-state data, and compare options against business outcomes and risk tolerance.
To answer the question “How do you monitor your resources to ensure they are performing?”, define measurable outcomes, assign owners, and review progress regularly. Integrate this practice into delivery and operations processes so improvements persist as workloads and requirements evolve.
Key Steps
- Define decision criteria and scope:
  - Document which performance questions monitoring must answer in your environment, including where performance is most critical
  - Set quantitative and qualitative criteria for comparing options
  - Identify stakeholders who approve final decisions
- Perform analysis with objective evidence:
  - Collect metrics, constraints, and dependency data from current operations
  - Compare alternatives against reliability, performance, and risk outcomes
  - Record tradeoffs and assumptions in an architecture decision log
- Operationalize and revisit decisions:
  - Implement selected decisions with explicit owner accountability
  - Define review triggers for demand, risk, or architecture changes
  - Update standards and patterns based on observed outcomes
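The analysis step above can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed method: it assumes latency samples have already been exported from your monitoring tool, and the component names, targets, and the scoring rule (p99 overshoot multiplied by request count) are hypothetical choices made for the example.

```python
from dataclasses import dataclass
from statistics import quantiles
from typing import List, Tuple


@dataclass
class ComponentSamples:
    name: str                  # hypothetical component label
    latencies_ms: List[float]  # one latency sample per request in the window
    target_p99_ms: float       # hypothetical performance target for the component


def p99(samples: List[float]) -> float:
    # quantiles(n=100) returns 99 cut points; the last one is the 99th percentile.
    return quantiles(samples, n=100, method="inclusive")[-1]


def rank_by_criticality(components: List[ComponentSamples]) -> List[Tuple[str, float, float]]:
    """Order components by traffic-weighted p99 overshoot, worst first."""
    scored = []
    for c in components:
        observed = p99(c.latencies_ms)
        overshoot = max(0.0, observed - c.target_p99_ms)
        # Assumed scoring rule: a breach matters more when more requests are affected.
        score = overshoot * len(c.latencies_ms)
        scored.append((c.name, observed, score))
    return sorted(scored, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    window = [
        ComponentSamples("search", [50.0] * 100, target_p99_ms=200.0),
        ComponentSamples("checkout", [100.0] * 98 + [900.0, 950.0], target_p99_ms=300.0),
    ]
    for name, observed, score in rank_by_criticality(window):
        print(f"{name}: p99={observed:.1f} ms, score={score:.0f}")
```

Components at the top of the ranking are candidates for deeper tracing and for an entry in the architecture decision log. Components with a score of zero are meeting their targets in the sampled window.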
Risk / Impact
Level of risk if not implemented: Medium
Impact: Without this best practice, workloads typically accumulate inefficiencies and execution drift that increase failure probability over time. Problems often surface during traffic spikes, major releases, or dependency failures.
Benefits of implementation:
- More predictable operational and engineering outcomes
- Better alignment between architecture decisions and business goals
- Continuous improvement through measurable feedback loops
AWS Services to Consider
Amazon CloudWatch
Collects metrics, logs, and alarms that support operational insight and performance management.
AWS X-Ray
Traces distributed requests to identify latency sources and dependency failures.
Amazon EventBridge
Routes events and triggers automation workflows for rapid operational response.
AWS Systems Manager
Provides automation, inventory, and operational runbooks for day-2 management.
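As an illustration of how Amazon CloudWatch from the list above might be used, the sketch below builds the parameters for a p99 latency alarm. It assumes the workload sits behind an Application Load Balancer and that alarm notifications go to an Amazon SNS topic; neither is stated in this guidance. The alarm name, load balancer dimension value, threshold, and topic ARN are all hypothetical placeholders, and the evaluation settings are example values to tune for your workload.

```python
from typing import Any, Dict


def build_latency_alarm(
    alarm_name: str,
    load_balancer_dimension: str,
    threshold_seconds: float,
    sns_topic_arn: str,
) -> Dict[str, Any]:
    """Return keyword arguments for the CloudWatch put_metric_alarm call."""
    return {
        "AlarmName": alarm_name,
        "Namespace": "AWS/ApplicationELB",
        "MetricName": "TargetResponseTime",  # reported in seconds
        "Dimensions": [{"Name": "LoadBalancer", "Value": load_balancer_dimension}],
        "ExtendedStatistic": "p99",          # percentile instead of a plain average
        "Period": 60,                        # one-minute datapoints
        "EvaluationPeriods": 5,
        "DatapointsToAlarm": 3,              # 3 of the last 5 minutes must breach
        "Threshold": threshold_seconds,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # assumed: no traffic is not an incident
        "AlarmActions": [sns_topic_arn],
    }


def create_alarm(alarm_kwargs: Dict[str, Any]) -> None:
    # boto3 is imported lazily so the builder can be used and tested without the SDK.
    import boto3

    boto3.client("cloudwatch").put_metric_alarm(**alarm_kwargs)


if __name__ == "__main__":
    kwargs = build_latency_alarm(
        alarm_name="checkout-p99-latency",                        # hypothetical
        load_balancer_dimension="app/example-alb/0123456789abcdef",  # hypothetical
        threshold_seconds=0.5,
        sns_topic_arn="arn:aws:sns:us-east-1:111122223333:example-topic",  # hypothetical
    )
    print(kwargs)
```

Keeping the parameter builder separate from the API call makes the alarm definition easy to review and version alongside other standards. Calling create_alarm requires AWS credentials and permission to create alarms.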