Untangling the Multi-Cloud Data Estate


The Context

A prominent financial services institution had aggressively expanded through acquisitions over a five-year period. The consequence of this growth was a deeply fragmented technical landscape.

The parent company operated primarily on Azure. The acquired entities operated on AWS and GCP. Each cloud environment contained its own data warehouse, its own ETL pipelines, and its own definition of core business entities (e.g., "Customer," "Transaction").

The business was demanding unified reporting and had ambitious plans for Generative AI. However, data scientists were spending 80% of their time manually reconciling data across the three clouds, and regulatory reporting required an army of analysts to stitch together spreadsheets at every month-end close.

The Diagnostic Discovery

The client's initial brief was a classic "lift-and-shift" mandate: Migrate everything to Azure.

I was engaged to design the migration architecture. However, an initial diagnostic review of the data estate revealed a critical flaw in the brief:

  1. Data Gravity: Several of the acquired entities ran specialized, high-volume operational systems on AWS that would cost tens of millions of dollars to rewrite for Azure.
  2. Business Disruption: A forced migration of the legacy data warehouses would freeze all new feature development for the acquired entities for at least two years.
  3. The Real Problem: The executives did not actually need the data to live in one cloud. They simply needed to be able to query the data as if it lived in one place.

The Architectural Redesign

I advised the leadership team to abandon the monolithic "lift-and-shift" strategy. The goal was not cloud homogeneity; the goal was data interoperability.

Instead of moving the data to a single cloud, I designed a Logical Data Fabric architecture.

  • Federated Compute: We introduced a cloud-agnostic data platform capable of querying data where it resided. Instead of building massive, fragile ETL pipelines to copy data from AWS and GCP into Azure, the platform pushed the compute down to the source clouds.
  • Unified Semantic Layer: We built a single semantic layer on top of the federated compute. This meant that a data scientist or an AI agent could issue a single query, such as SELECT * FROM global_customers, and the platform would automatically translate, route, and execute it across all three clouds simultaneously.
  • Governed Access: We implemented a unified access control plane, ensuring that regardless of which cloud the data lived in, regulatory and privacy policies were enforced consistently.
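The three pillars above can be sketched in a few lines of Python. This is a minimal illustration of the routing pattern only, not the client's actual platform: the catalog, table names, sample rows, and masking rule are all hypothetical stand-ins.

```python
# Sketch of a logical data fabric's query routing: a hypothetical catalog
# maps one logical table to physical sources in each cloud, the query fans
# out to all of them, and a uniform policy is applied to the unioned result.
from concurrent.futures import ThreadPoolExecutor

# Hypothetical catalog: logical entity -> per-cloud physical sources.
CATALOG = {
    "global_customers": [
        {"cloud": "azure", "table": "dw.customers"},
        {"cloud": "aws",   "table": "warehouse.cust_master"},
        {"cloud": "gcp",   "table": "analytics.customers"},
    ]
}

# Hypothetical governance policy: columns masked consistently everywhere.
MASKED_COLUMNS = {"ssn", "tax_id"}


def run_on_source(source: dict, columns: list[str]) -> list[dict]:
    """Stand-in for pushing compute down to the source cloud.

    A real fabric would dispatch SQL to the engine that owns
    source["table"]; canned rows keep this sketch self-contained.
    """
    sample = {"azure": [{"id": 1, "ssn": "111-22-3333", "region": "EU"}],
              "aws":   [{"id": 2, "ssn": "444-55-6666", "region": "US"}],
              "gcp":   [{"id": 3, "ssn": "777-88-9999", "region": "APAC"}]}
    rows = sample[source["cloud"]]
    return [{c: row.get(c) for c in columns} for row in rows]


def query_logical_table(table: str, columns: list[str]) -> list[dict]:
    """Fan the query out to every source cloud, then union and govern."""
    sources = CATALOG[table]
    with ThreadPoolExecutor() as pool:
        partials = pool.map(lambda s: run_on_source(s, columns), sources)
    unioned = [row for part in partials for row in part]
    # Governed access: the policy is enforced at the fabric layer,
    # regardless of which cloud each row came from.
    for row in unioned:
        for col in MASKED_COLUMNS & row.keys():
            row[col] = "***MASKED***"
    return unioned


for row in query_logical_table("global_customers", ["id", "ssn", "region"]):
    print(row)
```

The key design point is that no data is copied into a central warehouse ahead of time: each source answers for itself, and governance is applied once, at the seam where the results meet.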

The Outcome

By challenging the initial assumption that all data must physically reside in one cloud, we fundamentally changed the trajectory of the transformation.

  1. Time to Value: The organization achieved unified reporting in six months, rather than the three years estimated for a physical migration.
  2. Risk Reduction: The acquired entities were able to continue their business-as-usual operations without the massive disruption of a forced cloud migration.
  3. AI Readiness: The unified semantic layer provided a clean, governed foundation for their Generative AI initiatives, allowing AI agents to reason over data from the entire global enterprise.