# SEC07-BP01: Understand your data classification scheme
Data classification provides a way to categorize organizational data based on levels of sensitivity and criticality to help you determine appropriate protection and retention controls. Develop a data classification scheme that is aligned to your organization's risk tolerance and regulatory requirements. Ensure that your classification scheme is well-documented, consistently applied, and regularly reviewed.
## Implementation guidance
Understanding your data classification scheme is the foundation of an effective data protection strategy. A well-designed classification scheme enables you to apply appropriate security controls, meet regulatory requirements, and make informed decisions about data handling throughout its lifecycle.
Key steps for implementing this best practice:
- Define data classification levels:
  - Establish clear classification categories based on sensitivity and business impact
  - Define criteria for each classification level
  - Align classifications with regulatory and compliance requirements
  - Consider business impact of unauthorized disclosure, modification, or loss
  - Document classification definitions and examples
- Identify data types and sources:
  - Catalog all types of data your organization processes
  - Identify data sources and collection points
  - Map data flows and processing activities
  - Document data ownership and stewardship responsibilities
  - Understand data dependencies and relationships
- Establish classification criteria and procedures:
  - Create decision trees and guidelines for classification
  - Define roles and responsibilities for data classification
  - Establish processes for initial classification and re-classification
  - Implement quality assurance and validation procedures
  - Create training materials and awareness programs
- Align with regulatory and compliance requirements:
  - Map classification levels to regulatory frameworks
  - Understand data residency and sovereignty requirements
  - Identify cross-border data transfer restrictions
  - Document compliance obligations for each classification level
  - Establish audit and reporting procedures
- Implement classification governance:
  - Establish data governance committees and roles
  - Create policies and procedures for data classification
  - Implement approval workflows for classification changes
  - Establish exception handling and escalation procedures
  - Review and update classification schemes regularly
- Enable classification automation and tooling:
  - Implement automated data discovery and classification tools
  - Integrate classification with data management systems
  - Use metadata and tagging for classification tracking
  - Implement policy enforcement based on classification
  - Establish monitoring and reporting capabilities
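The tagging and policy-enforcement steps above can be sketched as a small check that compares a resource's metadata with the controls its classification tag calls for. This is a minimal illustration under stated assumptions, not a prescribed implementation: the `Resource` type, the `data-classification` tag key, and the control flags are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field

# Hypothetical tag key; substitute your organization's own convention.
CLASSIFICATION_TAG = "data-classification"

# Minimum controls assumed for each level (illustrative, not authoritative).
REQUIRED_CONTROLS = {
    "public": {"encrypted": False, "private": False},
    "internal": {"encrypted": False, "private": True},
    "confidential": {"encrypted": True, "private": True},
    "restricted": {"encrypted": True, "private": True},
}


@dataclass
class Resource:
    """Simplified stand-in for a tagged data store."""
    name: str
    tags: dict = field(default_factory=dict)
    encrypted: bool = False
    private: bool = True


def find_violations(resource: Resource) -> list[str]:
    """Return the policy violations for one resource (empty list if compliant)."""
    level = resource.tags.get(CLASSIFICATION_TAG, "").lower()
    if level not in REQUIRED_CONTROLS:
        # Untagged or unknown levels are reported rather than silently passed.
        return [f"{resource.name}: missing or unknown classification tag"]

    required = REQUIRED_CONTROLS[level]
    violations = []
    if required["encrypted"] and not resource.encrypted:
        violations.append(f"{resource.name}: {level} data must be encrypted")
    if required["private"] and not resource.private:
        violations.append(f"{resource.name}: {level} data must not be public")
    return violations
```

Reporting untagged resources as violations, rather than skipping them, is a deliberate choice in this sketch: it keeps unclassified data visible in monitoring instead of letting it pass by default.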
## Implementation examples

### Example 1: Data classification scheme definition

```markdown
# Data Classification Policy

## 1. Purpose and Scope

This policy establishes the framework for classifying data based on its sensitivity, value, and criticality to the organization. It applies to all employees, contractors, and third parties who handle organizational data.

## 2. Data Classification Levels

### 2.1 Public Data

Definition: Information that can be freely shared with the public without harm to the organization.

Criteria:
- No competitive disadvantage if disclosed
- Already publicly available or intended for public release
- No regulatory restrictions on disclosure

Examples:
- Marketing materials and brochures
- Public website content
- Press releases
- Published research papers

Handling Requirements:
- Access Control: None required
- Encryption: Not required
- Storage: Standard business practices
- Transmission: No special requirements
- Disposal: Standard disposal methods

### 2.2 Internal Data

Definition: Information intended for use within the organization that could cause minor harm if disclosed externally.

Criteria:
- Limited competitive impact if disclosed
- Intended for internal business operations
- No regulatory restrictions

Examples:
- Internal policies and procedures
- Employee directories (non-sensitive)
- General business correspondence
- Training materials

Handling Requirements:
- Access Control: Organization members only
- Encryption: Recommended for external transmission
- Storage: Secure internal systems
- Transmission: Encrypted when sent externally
- Disposal: Secure disposal methods

### 2.3 Confidential Data

Definition: Sensitive information that could cause significant harm to the organization if disclosed to unauthorized parties.

Criteria:
- Significant competitive disadvantage if disclosed
- Contains personal information
- Proprietary business information

Examples:
- Financial reports and budgets
- Customer information and lists
- Employee personal information
- Vendor contracts
- Strategic business plans

Handling Requirements:
- Access Control: Need-to-know basis with manager approval
- Encryption: Required for storage and transmission
- Storage: Encrypted systems with access controls
- Transmission: Encrypted channels only
- Disposal: Certified secure destruction

### 2.4 Restricted Data

Definition: Highly sensitive information that could cause severe harm if disclosed and is subject to regulatory requirements.

Criteria:
- Severe harm if disclosed
- Subject to regulatory compliance requirements
- Legal or contractual obligations for protection

Examples:
- Social Security Numbers
- Payment card information
- Health records
- Trade secrets
- Legally privileged communications

Handling Requirements:
- Access Control: Explicit authorization required
- Encryption: Strong encryption mandatory
- Storage: Highly secure systems with audit logging
- Transmission: Encrypted with additional controls
- Disposal: Certified destruction with audit trail

## 3. Classification Procedures

### 3.1 Initial Classification

- Data owner reviews data content and context
- Applies classification criteria and decision tree
- Documents classification rationale
- Obtains required approvals
- Applies classification labels and controls

### 3.2 Classification Review

- Annual review of all classified data
- Event-driven review for significant changes
- Quality assurance sampling and validation
- Update classification as needed

### 3.3 Reclassification

- Triggered by changes in sensitivity, regulations, or business context
- Requires approval from data governance committee
- Documentation of rationale for change
- Update of all related systems and controls

## 4. Roles and Responsibilities

### 4.1 Data Owner

- Assign initial data classification
- Approve access requests
- Review classification periodically
- Ensure compliance with handling requirements

### 4.2 Data Steward

- Implement classification decisions
- Monitor data usage and access
- Report classification issues
- Maintain classification documentation

### 4.3 Data Custodian

- Apply technical controls based on classification
- Implement data handling procedures
- Maintain audit logs and monitoring
- Execute secure disposal procedures

### 4.4 All Employees

- Follow data handling requirements
- Report suspected classification errors
- Complete required training
- Comply with access controls

## 5. Compliance and Enforcement

### 5.1 Monitoring

- Regular audits of classification compliance
- Automated monitoring where possible
- Incident reporting and investigation
- Metrics and reporting to management

### 5.2 Violations

- Immediate investigation of violations
- Corrective actions and remediation
- Disciplinary actions as appropriate
- Process improvements to prevent recurrence

## 6. Training and Awareness

### 6.1 Required Training

- Annual data classification training for all employees
- Role-specific training for data handlers
- New employee orientation on data classification
- Regular updates on policy changes

### 6.2 Awareness Programs

- Regular communications about data classification
- Examples and case studies
- Recognition of good practices
- Incident lessons learned

## 7. Policy Review and Updates

This policy will be reviewed annually and updated as needed to reflect:
- Changes in business requirements
- New regulatory requirements
- Technology changes
- Lessons learned from incidents

## 8. Related Documents

- Data Governance Policy
- Information Security Policy
- Privacy Policy
- Incident Response Procedures
- Data Retention Policy

Policy Owner: Chief Data Officer
Approved By: Executive Committee
Effective Date: January 1, 2024
Next Review: January 1, 2025
```

### Example 2: Automated data classification with Amazon Macie

### Example 4: Classification decision tree and workflow
```python
class DataClassificationDecisionTree:
    """Decision tree for automated data classification."""

    def __init__(self):
        self.classification_rules = {
            'regulatory_data': {
                'pii': 'CONFIDENTIAL',
                'phi': 'RESTRICTED',
                'pci': 'RESTRICTED',
                'financial': 'CONFIDENTIAL'
            },
            'business_impact': {
                'high': 'RESTRICTED',
                'medium': 'CONFIDENTIAL',
                'low': 'INTERNAL',
                'none': 'PUBLIC'
            },
            'sensitivity_indicators': {
                'ssn': 'RESTRICTED',
                'credit_card': 'RESTRICTED',
                'medical': 'RESTRICTED',
                'financial_account': 'CONFIDENTIAL',
                'employee_id': 'CONFIDENTIAL',
                'customer_info': 'CONFIDENTIAL'
            }
        }

    def classify_data(self, data_attributes):
        """
        Classify data based on attributes and decision tree logic.

        Args:
            data_attributes (dict): Dictionary containing data attributes
                - content_type: Type of content
                - contains_pii: Boolean indicating PII presence
                - regulatory_scope: List of applicable regulations
                - business_impact: Impact level if disclosed
                - sensitivity_indicators: List of sensitive data types found

        Returns:
            dict: Classification result with level and rationale
        """
        classification_scores = {
            'PUBLIC': 0,
            'INTERNAL': 1,
            'CONFIDENTIAL': 2,
            'RESTRICTED': 3
        }
        # The highest score reached by any rule determines the final level.
        max_score = 0
        classification_rationale = []

        # Check regulatory requirements
        if data_attributes.get('regulatory_scope'):
            for regulation in data_attributes['regulatory_scope']:
                if regulation.lower() in ['gdpr', 'hipaa', 'pci-dss']:
                    max_score = max(max_score, classification_scores['RESTRICTED'])
                    classification_rationale.append(f"Subject to {regulation} regulation")
                elif regulation.lower() in ['sox', 'ferpa']:
                    max_score = max(max_score, classification_scores['CONFIDENTIAL'])
                    classification_rationale.append(f"Subject to {regulation} regulation")

        # Check for PII
        if data_attributes.get('contains_pii'):
            max_score = max(max_score, classification_scores['CONFIDENTIAL'])
            classification_rationale.append("Contains personally identifiable information")

        # Check sensitivity indicators
        if data_attributes.get('sensitivity_indicators'):
            for indicator in data_attributes['sensitivity_indicators']:
                if indicator in self.classification_rules['sensitivity_indicators']:
                    required_level = self.classification_rules['sensitivity_indicators'][indicator]
                    max_score = max(max_score, classification_scores[required_level])
                    classification_rationale.append(f"Contains {indicator}")

        # Check business impact (defaults to 'low' when not supplied)
        business_impact = data_attributes.get('business_impact', 'low')
        if business_impact in self.classification_rules['business_impact']:
            required_level = self.classification_rules['business_impact'][business_impact]
            max_score = max(max_score, classification_scores[required_level])
            classification_rationale.append(f"Business impact level: {business_impact}")

        # Determine final classification by mapping the score back to its level
        final_classification = 'PUBLIC'
        for level, score in classification_scores.items():
            if score == max_score:
                final_classification = level
                break

        return {
            'classification': final_classification,
            # Numeric rank (0-3) of the final level; despite the key name,
            # this is not a statistical confidence value.
            'confidence_score': max_score,
            'rationale': classification_rationale,
            'recommended_controls': self.get_recommended_controls(final_classification)
        }

    def get_recommended_controls(self, classification):
        """Get recommended security controls for classification level."""
        controls = {
            'PUBLIC': {
                'access_control': 'None required',
                'encryption': 'Not required',
                'monitoring': 'Standard logging',
                'retention': 'Business requirements'
            },
            'INTERNAL': {
                'access_control': 'Organization members only',
                'encryption': 'Recommended for transmission',
                'monitoring': 'Access logging',
                'retention': 'Standard retention policy'
            },
            'CONFIDENTIAL': {
                'access_control': 'Need-to-know with approval',
                'encryption': 'Required for storage and transmission',
                'monitoring': 'Comprehensive access logging',
                'retention': 'Minimum required retention'
            },
            'RESTRICTED': {
                'access_control': 'Explicit authorization required',
                'encryption': 'Strong encryption mandatory',
                'monitoring': 'Full audit logging and monitoring',
                'retention': 'Strict retention limits with audit trail'
            }
        }
        # Unknown levels fall back to the INTERNAL controls.
        return controls.get(classification, controls['INTERNAL'])


# Example usage and workflow
def classification_workflow_example():
    """Example of classification workflow."""
    classifier = DataClassificationDecisionTree()

    # Example data attributes for different scenarios
    test_cases = [
        {
            'name': 'Customer Database',
            'attributes': {
                'content_type': 'database',
                'contains_pii': True,
                'regulatory_scope': ['GDPR'],
                'business_impact': 'high',
                'sensitivity_indicators': ['customer_info', 'financial_account']
            }
        },
        {
            'name': 'Marketing Brochure',
            'attributes': {
                'content_type': 'document',
                'contains_pii': False,
                'regulatory_scope': [],
                'business_impact': 'none',
                'sensitivity_indicators': []
            }
        },
        {
            'name': 'Employee Records',
            'attributes': {
                'content_type': 'database',
                'contains_pii': True,
                'regulatory_scope': [],
                'business_impact': 'medium',
                'sensitivity_indicators': ['ssn', 'employee_id']
            }
        }
    ]

    print("Data Classification Results:")
    print("=" * 50)
    for test_case in test_cases:
        result = classifier.classify_data(test_case['attributes'])
        print(f"\nData: {test_case['name']}")
        print(f"Classification: {result['classification']}")
        print(f"Confidence Score: {result['confidence_score']}")
        print(f"Rationale: {'; '.join(result['rationale'])}")
        print("Recommended Controls:")
        for control_type, control_desc in result['recommended_controls'].items():
            print(f"  - {control_type.replace('_', ' ').title()}: {control_desc}")


if __name__ == "__main__":
    classification_workflow_example()
```
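The classifier in this example takes `sensitivity_indicators` as an input but does not say how they are found. One possible approach, sketched here under simplifying assumptions, is pattern matching over text samples. The regular expressions and the `detect_indicators` and `luhn_valid` helpers are illustrative names only; the patterns cover US-style Social Security Numbers and payment card numbers and will miss many real-world formats.

```python
import re

# Illustrative patterns only; expect false positives and false negatives.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_valid(number: str) -> bool:
    """Check a candidate card number with the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    # Double every second digit from the right, subtracting 9 if it exceeds 9.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return len(digits) >= 13 and total % 10 == 0


def detect_indicators(text: str) -> list[str]:
    """Return indicator names in the form the decision tree's rules use."""
    indicators = []
    if SSN_PATTERN.search(text):
        indicators.append("ssn")
    # Only digit runs that pass the checksum are treated as card numbers.
    if any(luhn_valid(m.group()) for m in CARD_PATTERN.finditer(text)):
        indicators.append("credit_card")
    return indicators
```

The returned list can be passed as the `sensitivity_indicators` attribute. The checksum step is there to reduce noise from order numbers and other long digit strings that merely look like card numbers.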
## AWS services to consider
<div class="aws-service">
<div class="aws-service-content">
<h4>Amazon Macie</h4>
<p>Uses machine learning and pattern matching to discover and protect your sensitive data in AWS. Automatically identifies personally identifiable information (PII) and provides detailed classification findings.</p>
</div>
</div>
<div class="aws-service">
<div class="aws-service-content">
<h4>AWS Resource Groups</h4>
<p>Helps you organize your AWS resources using tags. Enables grouping and management of resources based on data classification and other criteria.</p>
</div>
</div>
<div class="aws-service">
<div class="aws-service-content">
<h4>Amazon S3</h4>
<p>Object storage service with built-in tagging capabilities. Supports object-level and bucket-level tags for data classification and automated policy enforcement.</p>
</div>
</div>
<div class="aws-service">
<div class="aws-service-content">
<h4>AWS Config</h4>
<p>Enables you to assess, audit, and evaluate the configurations of your AWS resources. Helps track data storage configurations and ensure compliance with classification policies.</p>
</div>
</div>
<div class="aws-service">
<div class="aws-service-content">
<h4>AWS CloudTrail</h4>
<p>Records API calls for your account and delivers log files to you. Provides audit trails for data access and classification activities across your AWS environment.</p>
</div>
</div>
<div class="aws-service">
<div class="aws-service-content">
<h4>AWS Systems Manager</h4>
<p>Gives you visibility and control of your infrastructure on AWS. Provides automation capabilities for applying classification-based policies and controls.</p>
</div>
</div>
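Because Amazon S3 supports object-level tags, a classification label can travel with each object. The sketch below only builds a tag payload in the `TagSet` shape that the boto3 `put_object_tagging` call accepts and does not contact AWS. The tag keys `DataClassification` and `DataOwner`, the helper name, and the bucket and key in the comment are assumptions made for illustration.

```python
# Levels taken from the example policy; adjust to your own scheme.
ALLOWED_LEVELS = ("PUBLIC", "INTERNAL", "CONFIDENTIAL", "RESTRICTED")


def build_classification_tagging(level: str, owner: str) -> dict:
    """Build an S3 tagging payload carrying a validated classification level."""
    normalized = level.strip().upper()
    if normalized not in ALLOWED_LEVELS:
        # Rejecting unknown labels keeps typos out of the tag data.
        raise ValueError(f"Unknown classification level: {level!r}")
    return {
        "TagSet": [
            {"Key": "DataClassification", "Value": normalized},
            {"Key": "DataOwner", "Value": owner},
        ]
    }


# Applying the tags would look roughly like this. It requires boto3 and AWS
# credentials, and the bucket and key shown are hypothetical:
#
#   import boto3
#   boto3.client("s3").put_object_tagging(
#       Bucket="example-bucket",
#       Key="reports/q1.csv",
#       Tagging=build_classification_tagging("confidential", "finance-team"),
#   )
```

Note that `put_object_tagging` replaces an object's entire tag set, so a real workflow would likely read and merge existing tags before writing.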
## Benefits of understanding your data classification scheme
- **Appropriate protection**: Enables application of security controls proportionate to data sensitivity and business value
- **Regulatory compliance**: Helps meet legal and regulatory requirements for data protection and privacy
- **Risk management**: Provides foundation for data-related risk assessment and mitigation strategies
- **Resource optimization**: Allows efficient allocation of security resources based on data criticality
- **Incident response**: Enables prioritized response to data security incidents based on classification levels
- **Data governance**: Supports effective data governance and stewardship programs
- **Cost optimization**: Helps optimize storage and protection costs based on data value and requirements
## Related resources
<div class="related-resources">
<h2>Related Resources</h2>
<ul>
<li><a href="https://docs.aws.amazon.com/wellarchitected/latest/framework/sec_data_classification_identify_data.html">AWS Well-Architected Framework - Understand your data classification scheme</a></li>
<li><a href="https://docs.aws.amazon.com/macie/latest/user/what-is-macie.html">Amazon Macie User Guide</a></li>
<li><a href="https://docs.aws.amazon.com/config/latest/developerguide/WhatIsConfig.html">AWS Config Developer Guide</a></li>
<li><a href="https://aws.amazon.com/blogs/security/how-to-use-amazon-macie-to-preview-sensitive-data-in-s3-buckets/">How to use Amazon Macie to preview sensitive data in S3 buckets</a></li>
<li><a href="https://aws.amazon.com/blogs/security/how-to-implement-data-classification-and-protection-using-aws-services/">How to implement data classification and protection using AWS services</a></li>
<li><a href="https://www.nist.gov/privacy-framework">NIST Privacy Framework</a></li>
</ul>
</div>