Unify data sharing across Snowflake, Databricks, and Fabric for a lakehouse trifecta

This post examines strategies to reduce data duplication and enhance performance by enabling shared access to data at rest across Snowflake, Databricks, and Microsoft Fabric. These platforms are widely adopted in modern data ecosystems and each offers lakehouse capabilities that decouple compute from storage. In environments where multiple platforms coexist, sharing storage while orchestrating compute workloads across different engines can significantly reduce data duplication. This approach not only streamlines data management but also drives cost efficiency and can improve query performance.

This post is the first in a three-part series focusing on interoperability amongst Snowflake, Databricks, and Microsoft Fabric. The following list will be updated with URLs as the posts are published:

  1. Unify data sharing across Snowflake, Databricks, and Fabric for a lakehouse trifecta (this article)
  2. Snowflake and Microsoft Fabric integration connectivity options (coming soon)
  3. Databricks and Microsoft Fabric integration connectivity options (coming soon)

While consolidating data solutions onto a single platform is often ideal, many large organizations operate across multiple cloud environments due to team autonomy, legacy investments, or strategic diversification. Consider a scenario where Team B initiates a project to analyze the impact of supply chain disruptions on product sales. They manage and fund their analytics workloads on Cloud Platform B, which houses the sales data. However, the curated supply chain data resides on Cloud Platform A, owned by a separate team. With lakehouse architectures that support cross-platform data sharing, Team B can query data from Cloud Platform A using compute resources exclusively on Cloud Platform B. This model avoids unnecessary data replication, keeps compute isolated to the platform that consumes the data, reduces storage costs, and enables efficient cross-cloud analytics without compromising performance:

Figure 1.1 – Compatible modern lakehouse architectures can restrict compute to one platform when sharing data across platforms

When using a modern cross-platform lakehouse architecture as in Figure 1.1 above, the following benefits are possible (a short code sketch after the list makes the scenario concrete):

  1. Cost containment by platform – Data from Cloud Platform A can be used by Cloud Platform B without incurring compute costs in Cloud Platform A. In the real world, a team with in-demand data in Cloud Platform A can share large volumes of data with many other teams without incurring additional compute costs.
  2. Minimize data duplication – With a lakehouse architecture using data sharing amongst compatible platforms, the replication of identical data can usually be reduced overall.
  3. Performance benefits – Sharing data amongst lakehouse cloud platforms can potentially reduce latency by eliminating unnecessary data copy steps.
  4. Platform flexibility – Large organizations won’t need to standardize on a single platform. Value can be realized faster amongst existing teams with diverse platforms. Data can also be integrated faster after mergers and acquisitions. Vendor lock-in risks can be avoided.
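
To make the Figure 1.1 scenario concrete, below is a minimal PySpark sketch of Team B's side of the arrangement: a Databricks notebook on Cloud Platform B authenticates to Team A's storage, reads the curated supply chain Delta table in place, and joins it with local sales data. This is an illustrative sketch, not a prescribed implementation; it assumes Team A has granted a Team B service principal read access to its ADLS Gen2 container, and every account, secret, and table name is a placeholder.

```python
# Minimal sketch of the Figure 1.1 scenario, run from Team B's Databricks
# workspace. Assumes Team A granted a Team B service principal read access
# to its ADLS Gen2 container; all names and secrets below are placeholders.

# Standard ABFS OAuth settings for reaching Team A's storage account.
storage = "teamastorage.dfs.core.windows.net"  # hypothetical account
spark.conf.set(f"fs.azure.account.auth.type.{storage}", "OAuth")
spark.conf.set(
    f"fs.azure.account.oauth.provider.type.{storage}",
    "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
)
spark.conf.set(f"fs.azure.account.oauth2.client.id.{storage}", "<app-id>")
spark.conf.set(
    f"fs.azure.account.oauth2.client.secret.{storage}",
    dbutils.secrets.get(scope="team-b", key="team-a-sp-secret"),
)
spark.conf.set(
    f"fs.azure.account.oauth2.client.endpoint.{storage}",
    "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
)

# Read Team A's curated Delta table in place -- no copy is made.
supply_chain = spark.read.format("delta").load(
    "abfss://curated@teamastorage.dfs.core.windows.net/supply_chain/disruptions"
)

# Join with the sales data that already lives on Team B's platform.
sales = spark.read.table("sales.orders")
impact = sales.join(supply_chain, "product_id")
display(impact)
```

Only the bytes read cross the platform boundary here; the supply chain table stays where Team A maintains it, and Team A incurs no compute costs for Team B's query.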

Before listing out the options for data sharing amongst Snowflake, Databricks, and Microsoft Fabric, please note the following:

  • I limited the options in this article to data sharing using lakehouse architectures. SQL endpoints (all three platforms have them), third-party connectivity options, and other compute-to-compute options were left off this list. Fabric mirroring of Snowflake was included because it creates a lakehouse table as a carbon-copy mirror of a Snowflake table. Upcoming posts listed above will cover other options beyond lakehouse storage.
  • I focused on lakehouse integration where the files are interoperable versions of Delta Parquet or Iceberg. Other file formats can move across platforms, but metadata compatibility is key to minimizing data duplication for analytic scenarios.
  • At the time of writing this article, some of the options are still in Preview. I’ll try to update these articles as the status changes.
  • I left off options that are not “out of the box” for the three cloud platforms. For example, some customers will write Delta Parquet files to Azure Data Lake using Fabric and then reuse the files with Databricks. Other customers have successfully used Apache Iceberg change data capture tooling to shift Snowflake Iceberg tables to Fabric.
  • There are important details about the connectivity options left out of this article for the sake of simplicity. For example, options are impacted if Snowflake or Databricks runs in a cloud other than Azure, or when private endpoint and private link capabilities are enabled on the platforms. If I covered every nuanced scenario, this article would become a book.
  • I consulted colleagues to confirm the accuracy of this list, but if anything is misstated or missing, please let me know and I will make corrections.
  • These articles are not an attempt to compare or rank these three cloud platforms. At the time of writing this article, I am a Microsoft employee, and all three products are fully supported on Microsoft Azure.

The diagram below in Figure 1.2 may not initially be easy on the eyes, but if you follow each of the Lakehouse Connectivity Options one at a time, you can walk through the different ways to share lakehouse data amongst Snowflake, Databricks, and Microsoft Fabric:

Figure 1.2 – Options for sharing lakehouse storage amongst Snowflake, Databricks, and Microsoft Fabric.

Figure 1.3 below lists each of the nine options above with availability status, potential use case scenarios, and details about where the data physically resides:

Figure 1.3 – Details about the methods for sharing data amongst Snowflake, Databricks, and Microsoft Fabric

For options 1-7 in Figure 1.3 above, the following table lists each feature along with a URL to learn more about the capability:

| Feature | Reference URL |
| --- | --- |
| Fabric mirroring of Snowflake DB (copies metadata & data) | Microsoft Fabric Mirrored Databases From Snowflake – Microsoft Fabric \| Microsoft Learn |
| Snowflake write Iceberg to Fabric | CREATE EXTERNAL VOLUME \| Snowflake Documentation |
| Fabric shortcut to Snowflake Iceberg | Use Iceberg tables with OneLake – Microsoft Fabric \| Microsoft Learn |
| Fabric shortcut to Databricks unmanaged Delta Parquet | Unify data sources with OneLake shortcuts – Microsoft Fabric \| Microsoft Learn |
| Fabric mirroring of Databricks Unity Catalog (just metadata) | Microsoft Fabric Mirrored Catalog From Azure Databricks – Microsoft Fabric \| Microsoft Learn |
| Snowflake read table from Fabric as Iceberg | New in OneLake: Access your Delta Lake tables as Iceberg automatically (Preview) \| Microsoft Fabric Blog |
| Databricks read Fabric Delta Parquet via Managed Identity | Integrate OneLake with Azure Databricks – Microsoft Fabric \| Microsoft Learn |
Figure 1.4 – Reference links for the first seven options listed in Figures 1.2 and 1.3
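
To make a couple of these options more tangible, here is a hedged sketch of option 2 (Snowflake writes Iceberg to Fabric) using the Snowflake Python connector. The SQL shapes follow the CREATE EXTERNAL VOLUME and CREATE ICEBERG TABLE documentation linked in Figure 1.4, but the OneLake URL, workspace, lakehouse, table, and credentials are placeholder assumptions; consult the linked docs for the exact values your tenant requires.

```python
# pip install snowflake-connector-python
import snowflake.connector

# Connection details are placeholders.
conn = snowflake.connector.connect(
    account="<account>", user="<user>", password="<password>",
    warehouse="<warehouse>", database="SUPPLY_CHAIN", schema="PUBLIC",
)
cur = conn.cursor()

# External volume pointing at a Fabric lakehouse folder in OneLake.
# The azure:// URL below is illustrative -- verify the exact form in the
# CREATE EXTERNAL VOLUME documentation before using it.
cur.execute("""
CREATE OR REPLACE EXTERNAL VOLUME onelake_vol
  STORAGE_LOCATIONS = ((
    NAME = 'onelake'
    STORAGE_PROVIDER = 'AZURE'
    STORAGE_BASE_URL = 'azure://onelake.blob.fabric.microsoft.com/MyWorkspace/MyLakehouse.Lakehouse/Files/iceberg'
    AZURE_TENANT_ID = '<tenant-id>'
  ))
""")

# Snowflake-managed Iceberg table whose data and metadata files land in
# OneLake, where Fabric can reach them without a second copy.
cur.execute("""
CREATE OR REPLACE ICEBERG TABLE disruptions (
    event_id INT,
    product_id STRING,
    event_date DATE
)
  CATALOG = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 'onelake_vol'
  BASE_LOCATION = 'disruptions'
""")
```

And a sketch of option 7 (Databricks reads Fabric Delta Parquet): once a cluster can authenticate to OneLake, as covered in the Integrate OneLake with Azure Databricks article, reading a Fabric table is a plain Delta read over OneLake's ABFS endpoint. Workspace, lakehouse, and table names here are hypothetical.

```python
# Read a Fabric lakehouse table in place over OneLake's ABFS endpoint
# (assumes the cluster is set up to authenticate to OneLake; all names
# are placeholders).
orders = spark.read.format("delta").load(
    "abfss://MyWorkspace@onelake.dfs.fabric.microsoft.com/"
    "MyLakehouse.Lakehouse/Tables/sales_orders"
)
orders.show()
```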

Notice that I left options 8-9 off of the URL reference table in Figure 1.4 because I was unable to locate official documentation pages from Databricks or Snowflake regarding those capabilities. If anyone has those links, let me know and I’ll add them to the table.

Per the links at the beginning of this article, I will follow up this post with two more posts about 1) Fabric/Snowflake and 2) Fabric/Databricks integration, covering connectivity options beyond shared lakehouse storage.

In summary, solutions built on Snowflake, Databricks, or Fabric can share data across the platforms using lakehouse architecture tools to minimize data duplication, reduce data latency, and optimize costs without consolidating on a single platform.
