Reliability

Questions

REL01: How do you manage service quotas and constraints?

6 best practices

REL02: How do you plan your network topology?

5 best practices

REL02-BP01: Use highly available network connectivity for your workload public endpoints
REL02-BP02: Provision redundant connectivity between private networks in the cloud and on-premises environments
REL02-BP03: Ensure IP subnet allocation accounts for expansion and availability
REL02-BP04: Prefer hub-and-spoke topologies over many-to-many mesh
REL02-BP05: Enforce non-overlapping private IP address ranges in all private address spaces where they are connected
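
As an illustration of REL02-BP05, the sketch below checks a set of private address ranges for overlaps before they are connected. It uses only Python's standard ipaddress module. The function name find_overlaps, the network names, and the CIDR blocks are hypothetical and chosen for the example; they do not come from AWS tooling or from any real address plan.

```python
import ipaddress
from itertools import combinations


def find_overlaps(cidrs):
    """Return name pairs whose CIDR blocks overlap.

    `cidrs` maps a network name to a CIDR string. Names and ranges
    are illustrative assumptions, not values from a real environment.
    """
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in cidrs.items()}
    return [
        (name_a, name_b)
        for (name_a, net_a), (name_b, net_b) in combinations(networks.items(), 2)
        if net_a.overlaps(net_b)
    ]


# Hypothetical address plan: two VPCs and one on-premises range.
plan = {
    "vpc-prod": "10.0.0.0/16",
    "vpc-dev": "10.1.0.0/16",
    "on-prem": "10.0.128.0/17",
}

for name_a, name_b in find_overlaps(plan):
    print(f"Overlap: {name_a} and {name_b}")
```

In this plan the on-premises range falls inside the production VPC range, so that pair is reported. A check of this kind could run whenever a new range is proposed, before any peering or gateway attachment is created.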

REL03: How do you design your workload service architecture?

3 best practices

REL04: How do you design interactions in a distributed system to prevent failures?

4 best practices

REL05: How do you design interactions in a distributed system to mitigate or withstand failures?

7 best practices

REL06: How do you monitor workload resources?

7 best practices

REL07: How do you design your workload to adapt to changes in demand?

4 best practices

REL08: How do you implement change?

5 best practices

REL09: How do you back up data?

4 best practices

REL10: How do you use fault isolation to protect your workload?

3 best practices

REL11: How do you design your workload to withstand component failures?

7 best practices

REL12: How do you test reliability?

5 best practices

REL13: How do you plan for disaster recovery?

5 best practices

The Reliability pillar encompasses the ability of a workload to perform its intended function correctly and consistently when it is expected to. This includes the ability to operate and test the workload through its total lifecycle.

AWS Services for Reliability

Amazon CloudWatch

Monitors your AWS resources and the applications you run on AWS in real time.

AWS Auto Scaling

Monitors your applications and automatically adjusts capacity to maintain steady, predictable performance.

Amazon RDS

Makes it easy to set up, operate, and scale a relational database in the cloud with high availability.

AWS Elastic Disaster Recovery

Minimizes downtime and data loss with fast, reliable recovery of on-premises and cloud-based applications.

AWS Backup

Centrally manages and automates backups across AWS services.

Elastic Load Balancing

Automatically distributes incoming application traffic across multiple targets.

Amazon Route 53

Provides a highly available and scalable cloud Domain Name System (DNS) web service.