
Building a Unified Energy Data Lake Foundation for Enterprise-Scale Analytics
Protagona LLC partnered with a leading energy management organization to validate a medallion-style data lake architecture on AWS, delivering a production-grade Bronze and Silver layer implementation and a clear roadmap for unifying six disparate operational data sources.
Industry
Energy
Teams & Services
Data Architecture, Data Engineering, Engagement Management, Cloud Infrastructure
Tech & Tools
AWS Lake Formation, Amazon S3, Amazon RDS for PostgreSQL, AWS Lambda, AWS Glue, AWS IAM, Amazon VPC, Terraform, Jira, Confluence
The Vision
A leader in energy conservation, this organization partners with K-12 school districts, universities, and public institutions across North America to reduce energy waste and lower operating costs. That mission generates a significant volume of operational and measurement data across multiple systems — and the ability to analyze that data at scale is central to delivering results for clients. The organization recognized that unlocking the full value of their data required moving beyond siloed systems toward a unified, governed data platform. This engagement represented a pivotal first step: proving that a modern, cloud-native data lake architecture could bring their most critical data sources together in a reliable, scalable, and analytically ready form that their team could own and extend.
The Goal
The goal was to validate a medallion-style data lake architecture on AWS using the organization's energy measurement data from a PostgreSQL source as the proof of concept. Success meant delivering a working Bronze and Silver layer implementation, a prioritized backlog for full-scale expansion across all six data sources, and a target architecture blueprint that the client team could own and extend independently or in continued partnership with Protagona.
The Challenge
The organization's data landscape spans six distinct source systems, each with its own structure, schema conventions, and integration requirements. Before committing to a full-scale build, they needed confidence that a single architectural pattern could accommodate that diversity without creating technical debt or governance gaps. The proof of concept had to be rigorous enough to serve as a genuine template — not just a demonstration — which meant addressing data type conformance, storage optimization, cleaning and normalization logic, and IAM and network security from the outset.
Adding complexity, the engagement needed to satisfy AWS MAP program requirements alongside the technical deliverables, meaning cloud maturity and modernization assessments had to be completed in parallel with architecture design and implementation. Scoping a time-boxed engagement that could validate the concept, produce production-grade artifacts, and transfer operational knowledge to the client team, all within a 60-day window, required disciplined prioritization and close collaboration with stakeholders in every sprint.
The Solution
Protagona designed a two-tier medallion data lake architecture on AWS, anchored by AWS Lake Formation for centralized governance and Amazon S3 as the durable storage foundation. The Bronze layer was purpose-built to land raw data from the organization's PostgreSQL energy measurement source in a reliable, schema-consistent form. Event-driven Lambda functions, backed by a connector-based ingestion framework, handled extraction, type conformance, and initial transformation directly into Iceberg-formatted tables. This layer was designed for auditability and replayability, ensuring raw data could always be traced back to its origin without modification.
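To make the idea of type conformance at the Bronze layer concrete, the following is a minimal sketch in plain Python, not the engagement's actual Lambda code. The column names (meter_id, reading_kwh, measured_at), the lineage fields (_source, _extracted_at), and the rule that naive timestamps are treated as UTC are all assumptions chosen for illustration.

```python
from datetime import datetime, timezone
from decimal import Decimal


def to_utc_iso(value: datetime) -> str:
    """Render a timestamp as an ISO-8601 string in UTC."""
    # Assumption: naive timestamps from the source are already in UTC.
    if value.tzinfo is None:
        value = value.replace(tzinfo=timezone.utc)
    return value.astimezone(timezone.utc).isoformat()


# Hypothetical Bronze schema: column name -> function that conforms the value.
BRONZE_SCHEMA = {
    "meter_id": str,
    "reading_kwh": float,      # e.g. PostgreSQL NUMERIC arrives as Decimal
    "measured_at": to_utc_iso,
}


def conform_row(raw: dict, source: str, extracted_at: datetime) -> dict:
    """Cast one extracted row to the Bronze schema and attach lineage fields.

    Values are cast but never cleaned or filtered, so the row stays
    traceable to its origin. Missing or NULL columns become None.
    """
    row = {}
    for column, cast in BRONZE_SCHEMA.items():
        value = raw.get(column)
        row[column] = None if value is None else cast(value)
    # Lineage metadata (hypothetical names) to support audit and replay.
    row["_source"] = source
    row["_extracted_at"] = to_utc_iso(extracted_at)
    return row


# Usage with a made-up row, as a PostgreSQL driver might return it.
example = conform_row(
    {"meter_id": 42, "reading_kwh": Decimal("12.50"),
     "measured_at": datetime(2024, 1, 1, 12, 0)},
    source="postgres.measurements",
    extracted_at=datetime.now(timezone.utc),
)
print(example)
```

Keeping the casts in a single schema mapping is one way to make the same function reusable for other sources: a new source would supply its own mapping rather than new logic.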
The Silver layer applied cleaning, normalization, and storage optimization on top of the Bronze data, producing analytics-ready datasets that downstream consumers could query without application-specific knowledge. Transformation logic and compaction rules were designed to be source-agnostic by pattern, so the same architectural approach can be extended to the remaining five data sources in a subsequent phase. IAM roles and Lake Formation permissions were structured to support both current consumption needs and future multi-team governance requirements.
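As an illustration of what cleaning and normalization on top of Bronze data can look like, here is a minimal sketch in plain Python. The specific rules (dropping missing or negative readings, normalizing meter identifiers, keeping the most recently extracted record per meter and timestamp) and the field names are hypothetical examples, not the rules used in the engagement.

```python
def to_silver(bronze_rows):
    """Return cleaned, de-duplicated rows from an iterable of Bronze rows.

    Each row is a dict with hypothetical fields: meter_id, measured_at,
    reading_kwh, and _extracted_at (ISO-8601 UTC strings for timestamps).
    Input rows are not modified, so Bronze stays an untouched record.
    """
    latest = {}
    for row in bronze_rows:
        reading = row.get("reading_kwh")
        # Cleaning: skip rows with a missing or physically invalid reading.
        if reading is None or reading < 0:
            continue
        # Normalization: trim and upper-case the meter identifier.
        meter_id = row["meter_id"].strip().upper()
        key = (meter_id, row["measured_at"])
        previous = latest.get(key)
        # De-duplication: keep the most recently extracted version.
        # ISO-8601 strings in the same UTC format compare chronologically.
        if previous is None or row["_extracted_at"] > previous["_extracted_at"]:
            latest[key] = {**row, "meter_id": meter_id}
    return sorted(latest.values(),
                  key=lambda r: (r["meter_id"], r["measured_at"]))


# Usage with made-up rows: a duplicate reading and an invalid one.
rows = [
    {"meter_id": " m-1 ", "measured_at": "2024-01-01T00:00:00+00:00",
     "reading_kwh": 10.0, "_extracted_at": "2024-01-02T00:00:00+00:00"},
    {"meter_id": "M-1", "measured_at": "2024-01-01T00:00:00+00:00",
     "reading_kwh": 11.0, "_extracted_at": "2024-01-03T00:00:00+00:00"},
    {"meter_id": "M-2", "measured_at": "2024-01-01T00:00:00+00:00",
     "reading_kwh": -5.0, "_extracted_at": "2024-01-02T00:00:00+00:00"},
]
for row in to_silver(rows):
    print(row)
```

Because the function only depends on a key, a validity check, and a recency field, the same pattern could be parameterized for other sources, which is the sense in which the logic is source-agnostic by pattern.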
The team operated in weekly agile sprints throughout, delivering a prioritized backlog with epics, user stories, and acceptance criteria alongside a full target architecture blueprint. AWS MAP deliverables were completed in parallel. At the close of the engagement, the client team received a complete operational handoff: artifacts, documentation, and the knowledge needed to drive the next phase independently.

