SUS04 - How do you take advantage of data access and usage patterns to support your sustainability goals?
Best Practices
This question includes the following best practices:
- SUS04-BP01: Implement a data classification policy
- SUS04-BP02: Use technologies that support data access and storage patterns
- SUS04-BP03: Use policies to manage the lifecycle of your datasets
- SUS04-BP04: Use elasticity and automation to expand block storage or file system
- SUS04-BP05: Remove unneeded or redundant data
- SUS04-BP06: Use shared file systems or storage to access common data
- SUS04-BP07: Minimize data movement across networks
- SUS04-BP08: Back up data only when difficult to recreate
Key Concepts
Sustainability Design Foundations
Data access optimization: Match storage and retrieval mechanisms to how data is actually accessed, so infrequently read data does not occupy high-performance resources. Measure access frequency per dataset and review placement decisions against it regularly.
Storage lifecycle design: Plan each dataset's transitions from creation to deletion, moving data to colder, lower-impact storage tiers as access frequency declines. Define the transition points up front and automate them rather than relying on manual cleanup.
Data minimization: Store only the data the workload needs, for only as long as it is needed. A smaller footprint reduces the storage, backup, and replication resources the workload consumes.
Operational Sustainability Controls
Query efficiency: Structure queries and data layouts (partitioning, columnar formats, pruning) so each query scans only the data it needs. Track data scanned per query as an efficiency signal.
Data movement reduction: Process data close to where it is stored, and cache or co-locate shared datasets so the same bytes are not transferred across networks repeatedly.
Retention governance: Define retention periods per data class and enforce them with automated deletion, so expired data is removed rather than accumulating indefinitely.
Implementation Approach
1. Understand data usage
- Classify data by access frequency and criticality
- Identify hot, warm, and cold datasets
- Map expensive data movement paths
- Define retention and deletion requirements
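The classification step above can be sketched as a small function that buckets datasets into hot, warm, and cold tiers by days since last access. The thresholds and dataset names here are illustrative assumptions, not recommendations; in practice the last-access timestamps would come from your own access telemetry.

```python
from datetime import datetime, timedelta, timezone

# Illustrative thresholds: hot < 30 days since last access, warm < 90, else cold.
HOT_DAYS, WARM_DAYS = 30, 90

def classify(last_accessed: datetime, now: datetime) -> str:
    """Bucket a dataset by days since its last recorded access."""
    age_days = (now - last_accessed).days
    if age_days < HOT_DAYS:
        return "hot"
    if age_days < WARM_DAYS:
        return "warm"
    return "cold"

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
datasets = {
    "orders": now - timedelta(days=3),          # hypothetical datasets
    "clickstream-2023": now - timedelta(days=200),
    "monthly-report": now - timedelta(days=45),
}
tiers = {name: classify(ts, now) for name, ts in datasets.items()}
```

The resulting tier map can then drive the placement and lifecycle decisions in the next step.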
2. Optimize storage and access
- Place hot data on performant tiers and archive cold data
- Use lifecycle policies for automated transitions
- Cache repeated reads and precompute frequent aggregations
- Reduce redundant data copies across environments
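As one way to automate the transitions above, an S3 lifecycle configuration can move objects to colder storage classes and expire them on a schedule. This sketch only builds the configuration dictionary; the prefix, day counts, and bucket name are illustrative assumptions, and applying the policy (commented out) requires boto3 and AWS credentials.

```python
def lifecycle_rules(prefix: str) -> dict:
    """Build an S3 lifecycle configuration for objects under a prefix.

    Day counts are illustrative: transition to STANDARD_IA at 30 days,
    to GLACIER at 90, and delete after one year.
    """
    return {
        "Rules": [
            {
                "ID": f"tier-{prefix.rstrip('/')}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 365},
            }
        ]
    }

config = lifecycle_rules("logs/")

# Applying it (requires boto3 and credentials; bucket name is a placeholder):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-bucket", LifecycleConfiguration=config)
```

Driving the day counts from measured access telemetry, rather than fixed guesses, keeps the tiering aligned with real usage.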
3. Improve processing efficiency
- Tune queries and partition strategies
- Process data incrementally rather than rescanning full datasets
- Run batch processing during efficient windows
- Use compression and efficient formats for analytics
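Incremental processing, mentioned above, can be sketched with a simple watermark: each run processes only records newer than the last high-water mark instead of rescanning the full dataset. The record shape and the doubling transformation are placeholders for illustration.

```python
def process_incremental(records, watermark):
    """Process only records newer than the watermark.

    records: list of (timestamp, value) pairs; returns (results, new_watermark).
    """
    fresh = [r for r in records if r[0] > watermark]
    results = [value * 2 for _, value in fresh]  # placeholder transformation
    new_watermark = max((ts for ts, _ in fresh), default=watermark)
    return results, new_watermark

records = [(1, 10), (2, 20), (3, 30)]
out1, wm = process_incremental(records, watermark=0)   # first run: all records
records.append((4, 40))
out2, wm = process_incremental(records, watermark=wm)  # next run: only the new one
```

The same idea scales up to partitioned analytics jobs, where the watermark selects which partitions to read at all.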
4. Govern and refine continuously
- Audit retention policy adherence
- Monitor cost and performance of data workflows
- Retire stale datasets and unused pipelines
- Update access patterns as application behavior changes
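A retention audit from step 4 can be sketched as a check that flags datasets older than their assigned retention period. Dataset names, retention values, and dates are illustrative assumptions; datasets with no assigned retention are flagged immediately here so they surface for review.

```python
from datetime import datetime

# Illustrative retention policy, in days per dataset.
retention_days = {"audit-logs": 365, "temp-exports": 7}

def expired(datasets, retention, now):
    """Return names of datasets whose age exceeds their retention period.

    datasets maps name -> creation date; unknown names default to 0 days
    of retention so they are flagged for review rather than kept silently.
    """
    return sorted(
        name for name, created in datasets.items()
        if (now - created).days > retention.get(name, 0)
    )

stale = expired(
    {"audit-logs": datetime(2023, 1, 1), "temp-exports": datetime(2024, 5, 29)},
    retention_days,
    datetime(2024, 6, 1),
)
```

Running such a check on a schedule, and alarming on its output, turns retention policy from documentation into an enforced control.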
AWS Services to Consider
Amazon S3
Delivers highly durable object storage with storage classes and lifecycle controls for performance and cost optimization.
AWS Glue
Builds and automates data cataloging and ETL pipelines to improve data processing efficiency.
Amazon Athena
Runs serverless SQL queries on data in S3 for analytics and operational reporting.
Amazon EMR
Runs scalable big data frameworks for batch and streaming data workloads.
Amazon CloudWatch
Collects metrics, logs, alarms, and dashboards so teams can detect issues early and track operational outcomes.
Common Challenges and Solutions
Challenge: Cold data kept on high-performance tiers
Solution: Automate tiering and lifecycle policies based on access telemetry.
Challenge: Large repeated full-table scans
Solution: Adopt partitioning, pruning, and incremental processing techniques.
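Partition pruning can be illustrated with a sketch that selects only the partition paths inside a query's date range, so everything outside the range is never read. The date-keyed layout and S3 paths are assumptions for illustration.

```python
def prune(partitions, start, end):
    """Return partition paths whose date key falls in [start, end].

    partitions maps 'YYYY-MM-DD' -> storage path; string comparison works
    because the key format sorts lexicographically by date.
    """
    return [path for day, path in sorted(partitions.items()) if start <= day <= end]

partitions = {
    "2024-06-01": "s3://logs/dt=2024-06-01/",
    "2024-06-02": "s3://logs/dt=2024-06-02/",
    "2024-06-09": "s3://logs/dt=2024-06-09/",
}
paths = prune(partitions, "2024-06-01", "2024-06-07")
```

Query engines such as Athena apply the same idea automatically when the partition key appears in the WHERE clause.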
Challenge: Data sprawl across environments
Solution: Use governance controls and retention enforcement to remove unnecessary copies.
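One mechanical way to surface the redundant copies described above is to group files by content hash and report any hash with more than one path. The file paths and contents here are illustrative; at scale the hashes would come from object metadata or an inventory report rather than reading every object.

```python
import hashlib

def find_duplicates(files):
    """Group files by content hash; return hash -> paths with more than one copy.

    files maps path -> bytes content.
    """
    by_hash = {}
    for path, data in files.items():
        by_hash.setdefault(hashlib.sha256(data).hexdigest(), []).append(path)
    return {h: sorted(paths) for h, paths in by_hash.items() if len(paths) > 1}

files = {
    "dev/report.csv": b"a,b\n1,2\n",   # hypothetical environment copies
    "prod/report.csv": b"a,b\n1,2\n",
    "prod/other.csv": b"x\n",
}
dups = find_duplicates(files)
```

Duplicate reports like this feed the governance process: each group of copies is either consolidated to a shared location or one copy is justified and documented.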