PERF04-BP03 - Collect and review database performance metrics
Implementation Guidance
Collecting and reviewing database performance metrics lets teams detect, diagnose, and prioritize issues before customer impact grows. Establish baseline signals, ownership, and escalation rules so that telemetry leads to action.
This practice supports the question “How do you select your database solution?”. Define measurable outcomes, assign owners, and review execution regularly. Build the practice into delivery and operations processes so that improvements persist as workloads and requirements evolve.
Key Steps
- Define monitoring model and ownership:
  - Map database performance goals to concrete signals and target thresholds
  - Assign response owners for each alert or KPI breach
  - Define severity levels based on customer and business impact
- Implement telemetry and response paths:
  - Instrument logs, metrics, and traces at critical system boundaries
  - Create dashboards and alerts tied to runbooks and escalation policies
  - Integrate incident workflows with monitoring events
- Tune and govern continuously:
  - Review false positives, blind spots, and missed detections regularly
  - Refine thresholds and alert logic using historical trend data
  - Use post-incident findings to improve monitoring coverage
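The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the `MetricRule` type, the signal name `read_latency_ms`, the owner `db-oncall`, and the choice of the 95th percentile for tuning are all assumptions made for the example. It shows a signal with thresholds and an owner, a severity classification, and a warning threshold recomputed from historical samples.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass(frozen=True)
class MetricRule:
    """One monitored signal with its thresholds and response owner (hypothetical type)."""
    name: str           # signal name, e.g. "read_latency_ms" (illustrative)
    warn_at: float      # value at or above which the signal is a warning
    critical_at: float  # value at or above which the signal is critical
    owner: str          # team or rotation that responds to a breach


def classify(rule: MetricRule, value: float) -> str:
    """Map a sampled value to a severity level for the given rule."""
    if value >= rule.critical_at:
        return "critical"
    if value >= rule.warn_at:
        return "warning"
    return "ok"


def refine_warn_threshold(rule: MetricRule, history: list[float],
                          percentile: int = 95) -> MetricRule:
    """Return a copy of the rule whose warning threshold follows historical data.

    The new threshold is the chosen percentile of past samples, capped at the
    critical threshold so that warning never exceeds critical.
    Requires at least two samples and a percentile between 1 and 99.
    """
    cut_points = quantiles(history, n=100, method="inclusive")  # 99 cut points
    new_warn = min(cut_points[percentile - 1], rule.critical_at)
    return MetricRule(rule.name, new_warn, rule.critical_at, rule.owner)


# Example: a latency rule owned by a hypothetical on-call rotation.
rule = MetricRule("read_latency_ms", warn_at=20.0, critical_at=50.0, owner="db-oncall")
severity = classify(rule, 35.0)  # falls between the warning and critical thresholds
```

Keeping the owner on the rule itself means every breach already names who responds, and recomputing `warn_at` from recent history is one way to reduce false positives without touching the critical threshold.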
Risk / Impact
Level of risk if not implemented: High
Impact: If this best practice is missing, teams are more likely to experience preventable incidents, delayed recovery, and inconsistent change outcomes. Control gaps and weak visibility can increase customer impact during high-pressure events.
Benefits of implementation:
- Reduced operational risk through repeatable controls
- Faster detection and response during incidents
- Stronger auditability and decision traceability
AWS Services to Consider
Amazon RDS
Runs managed relational databases with backups, patching, and scaling options.
Amazon Aurora
Provides high-performance relational database engines with fast failover.
Amazon DynamoDB
Delivers serverless key-value performance at scale with low-latency access.
Amazon ElastiCache
Improves read and response performance using in-memory caching.
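As a starting point for collecting metrics from the services above, the following sketch lists a few Amazon CloudWatch metrics per service and builds the parameters for a metric query. The selection of metrics, the `STARTER_METRICS` table, and the `build_metric_query` helper are assumptions for illustration; check the CloudWatch documentation for each service and engine before relying on a specific metric or dimension. The returned dictionary uses the parameter names of the CloudWatch GetMetricStatistics API, so it could be passed to an SDK client, but the code itself makes no AWS calls.

```python
from datetime import datetime, timedelta, timezone

# Namespace, identifying dimension, and a few starter metrics per service.
# This is an illustrative subset, not a complete or recommended list.
STARTER_METRICS = {
    "rds": ("AWS/RDS", "DBInstanceIdentifier",
            ["CPUUtilization", "DatabaseConnections", "ReadLatency", "WriteLatency"]),
    "aurora": ("AWS/RDS", "DBClusterIdentifier",
               ["VolumeReadIOPs", "VolumeWriteIOPs"]),
    "dynamodb": ("AWS/DynamoDB", "TableName",
                 ["ConsumedReadCapacityUnits", "ConsumedWriteCapacityUnits",
                  "ReadThrottleEvents", "WriteThrottleEvents"]),
    "elasticache": ("AWS/ElastiCache", "CacheClusterId",
                    ["CPUUtilization", "CacheHits", "CacheMisses", "Evictions"]),
}


def build_metric_query(service, resource_id, metric, minutes=60, period=300, now=None):
    """Build GetMetricStatistics-style parameters for one metric (hypothetical helper).

    `minutes` is the look-back window and `period` is the datapoint
    granularity in seconds. `now` can be injected to make the result repeatable.
    """
    namespace, dimension, metrics = STARTER_METRICS[service]
    if metric not in metrics:
        raise ValueError(f"{metric!r} is not in the starter list for {service!r}")
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": namespace,
        "MetricName": metric,
        "Dimensions": [{"Name": dimension, "Value": resource_id}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": period,
        "Statistics": ["Average", "Maximum"],
    }


# Example: one hour of read latency for a hypothetical instance named "orders-db".
query = build_metric_query("rds", "orders-db", "ReadLatency")
```

With boto3, for example, the dictionary could be unpacked as `cloudwatch.get_metric_statistics(**query)`; that step is left out here so the sketch runs without credentials.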