How a FinTech Organization Built an AI-Ready Data Foundation on AWS

A growing FinTech organization needed a centralized data hub to securely connect every platform across its enterprise while meeting strict regulatory requirements and laying the groundwork for future AI and ML capabilities. The engagement delivered a compliant, production-grade data environment on AWS, built to scale from day one.

Industry

Financial Services

Teams & Services

Data Engineering, Cloud Infrastructure, Security & Compliance, DevOps

Tech & Tools

AWS Organizations, Amazon VPC, AWS Transit Gateway, AWS PrivateLink, AWS KMS, Amazon S3, AWS CloudTrail, VPC Flow Logs, Amazon Inspector, Amazon Route 53, Databricks on AWS, Databricks Unity Catalog, Okta SCIM, Terraform, GitHub Actions, AWS IAM, AWS Secrets Manager, MOVEit SFTP

Key Data Points

Multi-Account AWS Foundation and FedRAMP Controls Established
Live Data Source Connectivity Activated
On-Premises Pipeline Enabled and CI/CD Handoff Completed

The Vision

A growing financial institution, establishing independent data infrastructure while operating within a parent bank's ecosystem, needed more than a standard analytics warehouse. Leadership had a broader ambition: a central data hub that would eventually connect every SaaS platform, on-premises system, and cloud service the bank relies on. Databricks was selected not just for its data engineering capabilities, but for its machine learning and AI potential, giving the institution a platform that could scale from initial ingestion pipelines into advanced modeling as it matures. That ambition required getting the foundation exactly right from day one, with enterprise-grade security, regulatory compliance, and the architectural flexibility to absorb new data sources as the bank grows.

The Goal

The project had three concrete objectives: deploy a production-grade Databricks environment on AWS with FedRAMP Moderate compliance controls in place; establish live connectivity to the bank's initial data sources including its online banking platform, CRM, and banking core; and connect to on-premises databases through secure VPN-routed pipelines to support data loads into Workday. Every architectural decision had to support future AI and ML workloads without requiring a rebuild.

The Challenge

The technical complexity came from two directions simultaneously. First, FedRAMP Moderate requirements had to be met across a multi-account AWS environment and a Databricks deployment, with no public endpoints, customer-managed encryption everywhere, centralized audit logging, and identity governance enforced at the platform level rather than bolted on afterward.
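Controls like these are also verifiable in code. The following is a minimal spot-check sketch using boto3, assuming configured AWS credentials; the bucket name is a hypothetical placeholder, and the checks are indicative rather than a substitute for actual compliance tooling.

```python
# Minimal sketch: spot-check two FedRAMP-style controls with boto3.
# The bucket name is hypothetical; credentials are assumed to be configured.
import boto3

s3 = boto3.client("s3")
cloudtrail = boto3.client("cloudtrail")

BUCKET = "fintech-curated-data"  # hypothetical bucket name

# Data at rest must use a customer-managed KMS key, not SSE-S3 defaults.
enc = s3.get_bucket_encryption(Bucket=BUCKET)
rule = enc["ServerSideEncryptionConfiguration"]["Rules"][0][
    "ApplyServerSideEncryptionByDefault"
]
assert rule["SSEAlgorithm"] == "aws:kms" and rule.get("KMSMasterKeyID"), \
    f"{BUCKET} is not encrypted with a customer-managed key"

# Audit logging must be centralized: at least one multi-region trail.
trails = cloudtrail.describe_trails()["trailList"]
assert any(t.get("IsMultiRegionTrail") for t in trails), "No multi-region CloudTrail found"
print("Encryption and audit-logging checks passed")
```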

Second, the initial data sources each presented different connectivity requirements. The banking core's business intelligence service had no API, requiring SFTP-based file transfer. On-premises operational data was accessible only through read-only database replicas exposed via a DMZ server, with views wrapping queries across linked servers spanning Oracle, Great Plains, IBS, and an external Workday tenant. Reaching that data required routing Databricks compute through the bank's VPN tunnels over AWS networking infrastructure, with IP addressing coordinated across both non-production and production environments. Designing a single platform architecture that accommodated all of these ingestion patterns while remaining compliant, auditable, and extensible was the central challenge.
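To make that access pattern concrete, the sketch below shows what a VPN-routed read of one of those DMZ views could look like from a Databricks notebook. The host, database, view, and secret names are all hypothetical; `spark` and `dbutils` are the ambient objects a Databricks runtime provides.

```python
# Illustrative sketch: reading a read-only DMZ view over JDBC through the
# VPN-routed path. Hostname, database, view, and secret names are hypothetical.
dmz_view_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://dmz-replica.bank.internal:1433;databaseName=Reporting")
    .option("dbtable", "dbo.vw_core_accounts")  # view wrapping linked-server queries
    .option("user", dbutils.secrets.get("onprem", "dmz-user"))
    .option("password", dbutils.secrets.get("onprem", "dmz-password"))
    .option("numPartitions", 4)  # parallel pulls over the VPN tunnel
    .option("partitionColumn", "account_id")
    .option("lowerBound", "0")
    .option("upperBound", "1000000")
    .load()
)
```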

The Solution

The team designed a hub-and-spoke AWS network architecture with Databricks workspaces deployed across two dedicated accounts, one for non-production and one for production, both governed under a centralized AWS Organizations structure. No public subnets were used in the data plane. All egress traffic routes through a Transit Gateway for centralized inspection, and PrivateLink endpoints ensure data never traverses the public internet. Customer-managed KMS keys encrypt all data at rest, while VPC Flow Logs, CloudTrail, and Databricks audit logs feed a centralized SIEM for continuous monitoring.
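Guardrails like "no public subnets in the data plane" can be checked mechanically. The following is a minimal boto3 sketch, assuming configured AWS credentials; the VPC ID is a placeholder.

```python
# Sketch: guardrail checks over the data-plane VPC. The VPC ID is a placeholder.
import boto3

ec2 = boto3.client("ec2")
VPC_ID = "vpc-0123456789abcdef0"  # hypothetical data-plane VPC

# No subnet in the data plane may auto-assign public IPs.
subnets = ec2.describe_subnets(
    Filters=[{"Name": "vpc-id", "Values": [VPC_ID]}]
)["Subnets"]
offenders = [s["SubnetId"] for s in subnets if s.get("MapPublicIpOnLaunch")]
assert not offenders, f"Public-IP subnets found: {offenders}"

# PrivateLink interface endpoints keep service traffic off the public internet.
endpoints = ec2.describe_vpc_endpoints(
    Filters=[{"Name": "vpc-id", "Values": [VPC_ID]}]
)["VpcEndpoints"]
interface_services = sorted(
    e["ServiceName"] for e in endpoints if e["VpcEndpointType"] == "Interface"
)
print("Interface endpoints present for:", interface_services)
```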

Databricks Unity Catalog serves as the governance layer across both environments, with identity managed through Okta-to-Databricks SCIM synchronization. Cluster policies enforce node type restrictions, autoscaling boundaries, and runtime versions to prevent configuration drift.
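As a rough illustration of what such a policy looks like, the sketch below creates one through the Databricks Cluster Policies REST API. The policy name, node types, worker bound, and runtime version are illustrative values, not the engagement's actual configuration.

```python
# Sketch: create a cluster policy via the Databricks REST API.
# All names and values below are illustrative, not the real configuration.
import json
import os

import requests

host = os.environ["DATABRICKS_HOST"]    # e.g. https://<workspace>.cloud.databricks.com
token = os.environ["DATABRICKS_TOKEN"]

# Policy definition pinning node types, autoscaling bounds, and runtime version.
definition = {
    "node_type_id": {"type": "allowlist", "values": ["i3.xlarge", "i3.2xlarge"]},
    "autoscale.max_workers": {"type": "range", "maxValue": 8},
    "spark_version": {"type": "fixed", "value": "13.3.x-scala2.12"},
}

resp = requests.post(
    f"{host}/api/2.0/policies/clusters/create",
    headers={"Authorization": f"Bearer {token}"},
    json={"name": "governed-etl-clusters", "definition": json.dumps(definition)},
)
resp.raise_for_status()
print("Created policy:", resp.json()["policy_id"])
```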

For data ingestion, the team implemented API-based connectivity for Q2 (the online banking platform) and HubSpot (the CRM), SFTP file transfer for the banking core, and VPN-routed pipelines to the parent bank's on-premises DMZ for batch ingestion across Oracle, Great Plains, IBS, and Workday. A streaming ingestion architecture was also scoped and documented to support future real-time data, AI, and ML use cases.
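As an example of the SFTP pattern, the sketch below pulls a banking-core extract and lands it in S3 for downstream processing, using paramiko as a stand-in client. The host, paths, key location, and bucket name are hypothetical; in practice the credentials would be read from AWS Secrets Manager rather than a local file.

```python
# Sketch: SFTP pull from the banking core's BI service into an S3 landing zone.
# Host, paths, key location, and bucket are hypothetical placeholders.
import boto3
import paramiko

transport = paramiko.Transport(("sftp.core-bi.example.com", 22))
transport.connect(
    username="etl_svc",
    pkey=paramiko.RSAKey.from_private_key_file("/dbfs/keys/etl_svc"),  # hypothetical path
)
sftp = paramiko.SFTPClient.from_transport(transport)

local_path = "/tmp/daily_core_extract.csv"
sftp.get("/outbound/daily_core_extract.csv", local_path)
sftp.close()
transport.close()

# Land the extract in S3 for pickup by the downstream ingestion pipelines.
boto3.client("s3").upload_file(
    local_path, "fintech-landing-zone", "core/daily_core_extract.csv"
)
```

Files landed this way can then flow through the same governed Databricks ingestion path as the API-based sources.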
