Large-Scale Secrets Migration
HashiCorp Vault to AWS Secrets Manager for a Fortune 100 Financial Services Institution
Executive Summary
Architected and implemented the core Lambda migration system that successfully transferred 62,800+ secrets from HashiCorp Vault to AWS Secrets Manager for a Fortune 100 financial services institution. The project followed the Pareto principle: 85% of secrets were migrated in 4 months using standard processes, while the remaining 15% required custom solutions for complex scenarios such as Active Directory authentication. The solution included white glove user support for secret identification, code migration assistance, and just-in-time access controls with zero standing privileges.
The Challenge
Business Context
A Fortune 100 financial services institution needed to migrate away from HashiCorp Vault because of financial concerns with its operating model. The organization faced a strict 12-month timeline to complete the migration to AWS Secrets Manager.
Technical Constraints
- Scale: 62,800+ secrets across multiple vault namespaces and paths
- Security: Zero-trust requirements with no standing privileges
- Compliance: Highly regulated financial services environment
- Metadata: Existing secrets lacked proper tagging and categorization
- User Knowledge Gap: Many users didn't know what their secrets were used for
- Code Migration: Applications needed updates to use AWS SDK instead of Vault API
- Auditability: Every migration action needed comprehensive logging
Technical Solution
Architecture Overview
We designed a just-in-time access Lambda architecture that ensured zero standing privileges while maintaining security and auditability throughout the migration process.
Core Components
Migration Lambda
Orchestrated the entire migration process with built-in error handling and rollback capabilities.
Admin Role
Granted just-in-time permissions to the reader role and revoked its own token on completion or failure.
Reader Role
Received temporary path-specific read access to HashiCorp Vault secrets.
User Intake System
Microservice + SQS queue for user-provided metadata, with white glove support for validation.
Metadata Engine
Combined automated parsing with user input to enhance secret metadata and AWS tags.
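The merge step can be illustrated with a minimal sketch. The path convention (`namespace/team/application/secret-name`), the function names, and the `migrated-from` tag are assumptions made for illustration, not the production schema. The only fixed point is the output shape, a list of `Key`/`Value` pairs, which is how AWS Secrets Manager accepts tags.

```python
from __future__ import annotations


def parse_path_metadata(vault_path: str) -> dict[str, str]:
    """Derive metadata from the path, assuming <namespace>/<team>/<application>/<name>."""
    parts = [p for p in vault_path.strip("/").split("/") if p]
    keys = ["namespace", "team", "application"]
    # The last segment is the secret name itself, so it is not treated as metadata.
    return dict(zip(keys, parts[:-1]))


def build_tags(vault_path: str, user_metadata: dict[str, str]) -> list[dict[str, str]]:
    """Merge parsed metadata with user input; non-empty user values take precedence."""
    merged = parse_path_metadata(vault_path)
    merged.update({k: v for k, v in user_metadata.items() if v})
    merged["migrated-from"] = "hashicorp-vault"  # hypothetical provenance tag
    return [{"Key": k, "Value": str(v)} for k, v in sorted(merged.items())]
```

In this sketch, user-supplied values override parsed ones because the people who own a secret usually know its application better than the path does; blank answers are dropped rather than written as empty tags.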
Security Model
🔒 Zero Standing Privileges Architecture
1. Pre-migration: No roles have access to vault paths
2. JIT Access Grant: Admin role grants reader role permission to specific vault path
3. Secret Read: Lambda uses reader role to access only the designated secret
4. Migration: Secret copied to AWS with enhanced metadata and tags
5. Access Revocation: Reader role permissions immediately revoked from vault path
6. Token Cleanup: Admin role revokes its own token
This approach ensured that even if a bad actor gained access to the Lambda or its roles, they could not re-access previously migrated vault paths, as permissions were permanently revoked after each migration.
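The six steps can be sketched as a grant, read, write, revoke sequence wrapped in `try`/`finally`, so that revocation runs whether the migration succeeds or fails. This is an illustrative model only: `InMemoryVault`, `migrate_one`, and the dictionary standing in for AWS Secrets Manager are hypothetical stand-ins, not the production code or a real Vault client.

```python
class InMemoryVault:
    """Hypothetical stand-in for Vault that tracks what the reader role may read."""

    def __init__(self, secrets):
        self._secrets = dict(secrets)
        self.reader_paths = set()       # step 1: no access before migration
        self.admin_token_active = True  # one admin token per migration run

    def grant_reader(self, path):
        if not self.admin_token_active:
            raise PermissionError("admin token already revoked")
        self.reader_paths.add(path)

    def revoke_reader(self, path):
        self.reader_paths.discard(path)

    def revoke_admin_token(self):
        self.admin_token_active = False

    def read_as_reader(self, path):
        if path not in self.reader_paths:
            raise PermissionError(f"reader role has no access to {path}")
        return self._secrets[path]


def migrate_one(vault, aws_store, path, tags):
    """Migrate a single secret; access is revoked on success and on failure."""
    try:
        vault.grant_reader(path)                          # step 2: JIT grant
        value = vault.read_as_reader(path)                # step 3: read one secret
        aws_store[path] = {"value": value, "tags": tags}  # step 4: copy with tags
    finally:
        vault.revoke_reader(path)                         # step 5: revoke path access
        vault.revoke_admin_token()                        # step 6: token cleanup
```

After `migrate_one` returns or raises, a second call to `read_as_reader` for the same path fails with `PermissionError`, which is the property the security model depends on.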
Implementation Strategy
Phased Approach & The Pareto Distribution
📊 The 80/20 Rule in Action
Our migration exemplified the Pareto principle: 85% of secrets followed standard patterns and were migrated in 4 months, while the remaining 15% required custom solutions for complex scenarios such as Active Directory authentication, or white glove support for tricky migrations.
Discovery & User Engagement (Months 1-3)
Catalogued all vault paths, launched the user intake system for metadata collection, and provided white glove support to help users identify secret purposes and plan code migrations.
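Cataloguing can be pictured as a recursive walk over the key listings. The sketch below assumes a caller-supplied `list_keys(prefix)` function (a hypothetical name) that returns the child keys under a prefix, with folders marked by a trailing `/`, which is how Vault's KV list operation reports them. It is not the discovery tooling used in the project.

```python
from __future__ import annotations

from typing import Callable, Iterator


def walk_secret_paths(
    list_keys: Callable[[str], list[str]], prefix: str = ""
) -> Iterator[str]:
    """Yield the full path of every secret under prefix, depth first."""
    for key in list_keys(prefix):
        full_path = prefix + key
        if key.endswith("/"):
            # A trailing slash marks a folder: descend into it.
            yield from walk_secret_paths(list_keys, full_path)
        else:
            # Anything else is a leaf, i.e. an actual secret.
            yield full_path
```

Passing the lister in as a function keeps the walk independent of any particular Vault client and makes it easy to exercise against a small in-memory tree before running it against real namespaces.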
Pilot & Standard Migration (Months 4-7)
Executed pilot with 500 secrets, then scaled to migrate 85% of all secrets using standard processes. Provided multi-language code examples and migration guidance.
Complex Cases & Cleanup (Months 8-12)
Developed custom solutions for Active Directory secrets, legacy system integrations, and edge cases. Deleted unused secrets identified during the discovery process.
Technical Challenges Overcome
Design Decision: Single-Threaded by Design
Security Requirement: Multi-threading was intentionally avoided to maintain JIT access principles.
Rationale: We required that the Lambda execution role never have access to more than one vault path at a time, which kept each migration isolated and minimized the blast radius of any potential compromise.
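A minimal sketch of that constraint follows: paths are processed strictly one after another, and a guard refuses to open a second grant while one is still active. `GrantTracker` and `migrate_sequentially` are hypothetical names used for illustration, and the `migrate` callable stands in for the real per-secret work.

```python
from contextlib import contextmanager


class GrantTracker:
    """Hypothetical guard that allows at most one granted vault path at a time."""

    def __init__(self):
        self.active = set()
        self.max_concurrent = 0

    @contextmanager
    def grant(self, path):
        if self.active:
            raise RuntimeError("another vault path is still granted")
        self.active.add(path)
        self.max_concurrent = max(self.max_concurrent, len(self.active))
        try:
            yield path
        finally:
            self.active.discard(path)  # always release before the next path


def migrate_sequentially(paths, tracker, migrate):
    """Process paths one by one; a failure is recorded and the loop continues."""
    results = {}
    for path in paths:
        with tracker.grant(path):
            try:
                migrate(path)
                results[path] = "ok"
            except Exception as exc:  # keep going; the grant is still released
                results[path] = f"failed: {exc}"
    return results
```

The trade-off is throughput: a sequential loop is slower than a worker pool, but the number of readable paths at any instant never exceeds one.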
Challenge: Authentication Reliability
Issue: Vault authentication could occasionally fail due to network issues or temporary service unavailability.
Solution: Implemented custom retry logic with exponential backoff specifically for Vault authentication, ensuring robust connectivity while maintaining security principles.
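A minimal sketch of retry with exponential backoff is shown below. The attempt count, delays, jitter, and the choice of `ConnectionError` and `TimeoutError` as retryable errors are illustrative assumptions, not the values used in production; `operation` stands in for the Vault login call.

```python
import random
import time


def retry_with_backoff(
    operation,
    *,
    attempts=5,
    base_delay=0.5,
    max_delay=8.0,
    retry_on=(ConnectionError, TimeoutError),
    sleep=time.sleep,
):
    """Call operation(); on a transient error wait base_delay * 2**attempt and retry."""
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add up to 50% random jitter so retries do not synchronize.
            sleep(delay + random.uniform(0, delay / 2))
```

Only the error types listed in `retry_on` are retried. Anything else, such as a rejected credential, propagates immediately, so retries cannot mask a genuine authentication failure.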
Challenge: User Knowledge Gap
Issue: Many users didn't know what their secrets were used for or how to migrate their code.
Solution: Built a user intake system with an SQS queue for metadata collection and provided white glove support, including code migration examples in multiple programming languages (Go, Java, Python, Node.js).
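A before-and-after pair of the kind such guidance might contain is sketched below in Python. It assumes the application read a KV version 2 secret through the `hvac` client and that the migrated secret is stored in Secrets Manager as a JSON string of the same key/value pairs; the function names and the default mount point are illustrative. Both functions take an already-constructed client (for example `hvac.Client(...)` or `boto3.client("secretsmanager")`), so the sketch itself imports neither library.

```python
import json


def get_secret_from_vault(vault_client, path, mount_point="secret"):
    """Before: read a KV v2 secret through an hvac client."""
    response = vault_client.secrets.kv.v2.read_secret_version(
        path=path, mount_point=mount_point
    )
    # KV v2 wraps the payload twice: response["data"]["data"] holds the key/values.
    return response["data"]["data"]


def get_secret_from_aws(secrets_client, secret_id):
    """After: read the same secret through a boto3 Secrets Manager client."""
    response = secrets_client.get_secret_value(SecretId=secret_id)
    # Assumes the migration stored the key/value pairs as a JSON string.
    return json.loads(response["SecretString"])
```

Because both functions return a plain dictionary, calling code that does `creds["password"]` needs only the lookup call replaced, not its downstream logic.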
Challenge: Active Directory Integration
Issue: AD-related secrets required custom authentication logic and additional security safeguards.
Solution: Developed specialized migration patterns with enhanced access controls and validation specific to directory service authentication.
Challenge: Error Recovery
Issue: Network failures and API errors required robust retry mechanisms.
Solution: Implemented idempotent operations with comprehensive logging and automated retry logic for transient failures.
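One way to make the write step idempotent is a create-or-update call that can be repeated safely after a partial failure. The sketch below is an assumption about how this could look, not the project's code: `upsert_secret` is a hypothetical helper, and `client` is expected to behave like a boto3 Secrets Manager client (`create_secret`, `get_secret_value`, `put_secret_value`, and `client.exceptions.ResourceExistsException`).

```python
import logging

logger = logging.getLogger("migration")


def upsert_secret(client, name, secret_string, tags):
    """Create the secret, or update it only if the stored value differs."""
    try:
        client.create_secret(Name=name, SecretString=secret_string, Tags=tags)
        outcome = "created"
    except client.exceptions.ResourceExistsException:
        # A previous attempt already created it; avoid writing a duplicate version.
        current = client.get_secret_value(SecretId=name)["SecretString"]
        if current == secret_string:
            outcome = "unchanged"
        else:
            client.put_secret_value(SecretId=name, SecretString=secret_string)
            outcome = "updated"
    # Log the outcome only, never the secret value.
    logger.info("secret=%s outcome=%s", name, outcome)
    return outcome
```

Running the function twice with the same input returns `created` and then `unchanged`, so a retried migration leaves the target in the same state as a single successful run. The sketch does not reconcile tags on an existing secret.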
Results & Impact
Business Impact
- Cost Savings: Eliminated HashiCorp Vault licensing fees, reducing operating costs
- Operational Efficiency: Improved secret management with native AWS integration and automated rotation capabilities
- Security Posture: Enhanced metadata and tagging enabled better governance and compliance reporting
- Knowledge Transfer: White glove support process educated 4,600+ engineers enterprise-wide on modern secret management practices
- Risk Reduction: Eliminated the dependency on an additional external vendor, improving supply chain security
Key Learnings & Takeaways
✅ What Worked Well
- Just-in-time access with single-path isolation limited the blast radius of any compromise
- Security-first design decisions (no multi-threading) proved correct
- The Pareto principle (80/20 rule) guided resource allocation
- White glove user support was critical for adoption
- User intake system improved metadata quality significantly
- Custom solutions for complex cases paid dividends
🔄 Future Improvements
- LLM-powered secret analysis to reduce reliance on user input for metadata
- Intelligent pattern recognition to automatically categorize secret types
- Real-time progress dashboard for stakeholder visibility
- Automated rollback capabilities for failed migrations
- Integration with existing CI/CD pipelines for secret updates
Technical Skills Demonstrated
Cloud Architecture
- AWS Lambda
- IAM Roles & Policies
- Secrets Manager
- CloudWatch Logging
Security Engineering
- Zero-trust Architecture
- Just-in-time Access
- Secret Management
- Compliance Frameworks
System Design
- Large-scale Migration
- Error Handling
- Rate Limit Management
- Monitoring & Alerting