OPS03 - How does your organizational culture support your business outcomes?

Best Practices

This question includes the following best practices:

Key Concepts

Strategy and Governance

Learning organization: An organization that treats incidents, near misses, and day-to-day operations as sources of learning and feeds what it learns back into standards, runbooks, and training. Assign an owner for capturing lessons and review regularly whether they changed how teams work.

Psychological safety: Team members can report mistakes, escalate concerns, and question decisions without fear of punishment or ridicule. Gauge it through regular feedback from engineers, especially those on call, and act on the results.

Operational ownership: Each workload has a clearly identified team that is accountable for running it in production, including its alarms, runbooks, and on-call response. Review ownership records regularly so that no component is left without an owner.

Operational Execution

Blameless analysis: Post-incident reviews examine the systemic conditions that allowed a failure to occur rather than assigning fault to individuals. Each review should end with follow-up actions that have an owner and a measurable outcome.

Continuous improvement: Teams make small, frequent improvements to systems and processes based on operational feedback. Set a measurable target for each improvement and review whether it was met.

Customer focus: Operational health is judged by its effect on customers, not only by internal system metrics. Track customer impact by service and review the trend against expected business outcomes.

Implementation Approach

1. Set cultural principles

  • Define explicit engineering and operational values
  • Require outcome-oriented post-incident reviews
  • Establish blameless communication expectations
  • Tie operational excellence goals to performance objectives

2. Embed daily practices

  • Run regular operational reviews with product teams
  • Normalize small, reversible changes in production
  • Share incident learnings across teams
  • Create forums for proposing improvement experiments

3. Measure cultural health

  • Track change failure rate and recovery metrics such as mean time to restore service
  • Measure participation in retrospectives and game days
  • Capture feedback from on-call engineers
  • Monitor customer impact trends by service
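
The first bullet above can be made concrete with a small calculation. The sketch below is illustrative only: the `Deployment` record and its field names are hypothetical and do not come from any AWS API. It assumes each production change is logged with a failure flag and, for failed changes, the times the problem was detected and service was restored.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Deployment:
    """Hypothetical record of one production change (field names are assumptions)."""
    service: str
    failed: bool
    detected_at: Optional[datetime] = None
    restored_at: Optional[datetime] = None


def change_failure_rate(deployments: List[Deployment]) -> float:
    """Fraction of deployments that caused a failure in production."""
    if not deployments:
        return 0.0
    return sum(d.failed for d in deployments) / len(deployments)


def mean_time_to_restore(deployments: List[Deployment]) -> Optional[timedelta]:
    """Average detection-to-restoration time over failed deployments
    that have both timestamps; None if there are no such deployments."""
    durations = [
        d.restored_at - d.detected_at
        for d in deployments
        if d.failed and d.detected_at and d.restored_at
    ]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)


# Made-up sample data for illustration.
t0 = datetime(2024, 1, 1, 12, 0)
history = [
    Deployment("checkout", failed=False),
    Deployment("checkout", failed=True, detected_at=t0,
               restored_at=t0 + timedelta(minutes=30)),
    Deployment("search", failed=False),
    Deployment("search", failed=True, detected_at=t0,
               restored_at=t0 + timedelta(minutes=90)),
]

print(change_failure_rate(history))   # 0.5
print(mean_time_to_restore(history))  # 1:00:00
```

Reviewing these two numbers per service over time, rather than as single snapshots, tends to show more clearly whether cultural changes are having an effect.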

4. Reinforce and evolve

  • Reward behaviors that improve reliability and flow
  • Retire process steps that add friction without value
  • Update training based on recurring failure patterns
  • Continuously align culture with business priorities

AWS Services to Consider

AWS Systems Manager Incident Manager

Helps teams prepare response plans and escalation paths ahead of time and track the incident timeline during response.

Amazon CloudWatch

Collects metrics, logs, alarms, and dashboards so teams can detect issues early and track operational outcomes.
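
As one way to track an operational outcome in CloudWatch, a team could publish a custom metric such as change failure rate. The sketch below builds the arguments for the CloudWatch `PutMetricData` API and sends them with boto3. The namespace `Example/OperationalHealth`, the metric name, and the `Service` dimension are assumptions chosen for illustration; publishing requires boto3 and valid AWS credentials.

```python
from typing import Any, Dict


def build_metric_payload(service: str, change_failure_rate: float) -> Dict[str, Any]:
    """Build keyword arguments for CloudWatch PutMetricData.

    The namespace, metric name, and dimension name are illustrative assumptions.
    """
    if not 0.0 <= change_failure_rate <= 1.0:
        raise ValueError("change_failure_rate must be between 0 and 1")
    return {
        "Namespace": "Example/OperationalHealth",  # hypothetical namespace
        "MetricData": [
            {
                "MetricName": "ChangeFailureRate",  # hypothetical metric name
                "Dimensions": [{"Name": "Service", "Value": service}],
                "Value": change_failure_rate * 100.0,  # reported as a percentage
                "Unit": "Percent",
            }
        ],
    }


def publish(payload: Dict[str, Any]) -> None:
    """Send the payload to CloudWatch (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder works without boto3 installed

    boto3.client("cloudwatch").put_metric_data(**payload)


payload = build_metric_payload("checkout", 0.25)
print(payload["MetricData"][0]["Value"])  # 25.0
```

Keeping the payload builder separate from the network call makes the metric definition easy to review and test without an AWS account.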

Amazon EventBridge

Routes events between services and triggers automated responses for operational events.
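
For example, an EventBridge rule can react whenever a CloudWatch alarm enters the `ALARM` state. The sketch below expresses such an event pattern as a Python dictionary and serializes it to the JSON that a rule expects. The `matches` helper is a deliberately simplified local check written for this example (exact-value lists and nested objects only); it is not EventBridge's matching engine, and the alarm name in the sample event is made up.

```python
import json
from typing import Any, Dict

# Event pattern for CloudWatch alarms that transition into the ALARM state.
ALARM_PATTERN: Dict[str, Any] = {
    "source": ["aws.cloudwatch"],
    "detail-type": ["CloudWatch Alarm State Change"],
    "detail": {"state": {"value": ["ALARM"]}},
}


def pattern_json() -> str:
    """Serialize the pattern to JSON for use in a rule definition."""
    return json.dumps(ALARM_PATTERN, sort_keys=True)


def matches(pattern: Dict[str, Any], event: Dict[str, Any]) -> bool:
    """Simplified local check: exact-value lists and nested objects only."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Trimmed-down sample event; the alarm name is hypothetical.
sample_event = {
    "source": "aws.cloudwatch",
    "detail-type": "CloudWatch Alarm State Change",
    "detail": {"alarmName": "example-alarm", "state": {"value": "ALARM"}},
}

print(matches(ALARM_PATTERN, sample_event))  # True
```

A rule with this pattern could target a notification topic or an automation runbook, so that the first response step does not depend on someone noticing a dashboard.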

AWS Well-Architected Tool

Captures workload reviews, risks, and improvement plans so teams can continuously track architecture quality.

AWS Fault Injection Service

Runs controlled chaos experiments to validate resilience and recovery mechanisms.

Common Challenges and Solutions

Challenge: Blame-oriented incident response

Solution: Adopt blameless post-incident templates focused on systemic causes and measurable follow-up actions.
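
One way to make such a template enforceable is to represent it as structured data and check it before a review is closed. The sketch below is a minimal illustration, not a prescribed AWS format: the `PostIncidentReview` and `ActionItem` classes, their fields, and the validation rules are all assumptions.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class ActionItem:
    """Hypothetical follow-up action from a review."""
    description: str
    owner: str
    due: date
    success_measure: str  # how the team will know the action worked


@dataclass
class PostIncidentReview:
    """Hypothetical blameless review record; note there is no 'who caused it' field."""
    title: str
    customer_impact: str
    contributing_factors: List[str] = field(default_factory=list)
    actions: List[ActionItem] = field(default_factory=list)

    def problems(self) -> List[str]:
        """Return reasons the review is not ready to close (empty list if ready)."""
        issues = []
        if not self.contributing_factors:
            issues.append("no systemic contributing factors recorded")
        if not self.actions:
            issues.append("no follow-up actions recorded")
        for action in self.actions:
            if not action.success_measure.strip():
                issues.append(f"action '{action.description}' has no success measure")
        return issues


# Made-up example review.
review = PostIncidentReview(
    title="Checkout latency spike",
    customer_impact="Elevated checkout latency for 30 minutes",
    contributing_factors=["Connection pool limit not covered by load tests"],
    actions=[
        ActionItem(
            description="Add pool saturation alarm",
            owner="payments-team",
            due=date(2024, 2, 1),
            success_measure="",
        )
    ],
)

print(review.problems())
# ["action 'Add pool saturation alarm' has no success measure"]
```

The record deliberately asks for contributing factors and measurable actions and has no place to name an individual at fault, which nudges the discussion toward systems.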

Challenge: Resistance to change

Solution: Start with small improvements, publish results, and scale approaches that show clear operational benefit.

Challenge: Limited feedback loops

Solution: Use regular retrospectives and transparent metrics to turn operational insights into backlog items.