When working in the field of data and analytics, have you ever heard concerns about maintaining a single source of truth? How about strategic data replication causing many sources of truth? In this video I attempt to explain the history of these misconceptions and lay out a strategy for a data ecosystem that is highly performant and cost optimized, has redundancy, keeps end users satisfied, and prepares your organization for AI data needs. I also draw some fun analogies to cryptocurrency (such as Bitcoin) and biology:
Greg Beaumont's Data & Analytics Blog
Data, Analytics, AI, Business Intelligence posts and videos
-
For the past two years I’ve been working with my colleague Inder Rana to build out a GitHub repo that can be leveraged by Fabric users for training and testing purposes. In the most recent update we have added a new Fabric Data Agent that works with the semantic model using Direct Lake mode. The new AI Instructions for the Direct Lake Semantic Model will also work with Power BI Copilot (see examples for Power BI Copilot standalone version in the video below!). Alongside the existing Lakehouse/Warehouse Fabric Data Agent, you can now compare and contrast both versions.
A video reviewing the new Semantic Model Fabric Data Agent capabilities in the GitHub repo can be watched below or by clicking this link:
The updated GitHub repo that deploys 275 million rows of real healthcare data to your Fabric environment for an end-to-end demo and testing solution can be found at this link: fabric-samples-healthcare/analytics-bi-directlake-starschema at main · isinghrana/fabric-samples-healthcare
If you have already deployed the GitHub repo, you can add the new Semantic Model Data Agent by following these instructions: https://github.com/isinghrana/fabric-samples-healthcare/blob/main/analytics-bi-directlake-starschema/docs/5-CreateAISkill.md#optimize-semantic-model-for-power-bi-copilot-and-fabric-data-agent
The GitHub repo can deploy all of the following Fabric components in a single quick install script:
- Lakehouse
- Notebooks
- Data Factory Pipeline
- Semantic Model (optimized for Power BI Copilot and Fabric Data Agents)
- Power BI Report
- Data Agent for Lakehouse
The following components can be deployed manually after running the quick install script:
- Warehouse
- Data Agent for Semantic Model
I previously recorded a video to review best practices for Semantic Models with AI for my YouTube channel which can be viewed below or by clicking here:

-


Figure 2.0 – Snowflake and Fabric / Power BI integration options

Back in September I published an article reviewing options for sharing lakehouse-format data amongst Fabric, Snowflake, and Databricks. The article can be found by clicking here. This article expands on connectivity options between Snowflake and Fabric / Power BI, covering both traditional SQL endpoint connectivity and newer lakehouse integration options. Interoperability between Snowflake and Fabric / Power BI is a hot topic these days; Vice Presidents from both companies even co-authored an article on the topic at this link.
Each of the seven options in Figure 2.0 above will be explained along with potential pros and cons for various use cases. After taking a deep dive into these options while preparing this article, my opinion is that all of them have scenarios for which they make sense, although some should be reserved for niche architectural use cases. Project personas (IT vs. business), data latency requirements, report and AI agent rendering time expectations, cost management priorities, and cloud egress costs are among the many factors influencing the connectivity choices for your projects.
Please note that the views expressed in this article are my own and do not necessarily reflect the views of my employer. I am not comparing Fabric / Power BI versus Snowflake, but rather discussing these options as ways to make them work better together. Also, I plan to write a similar article for Fabric / Power BI integration options with Databricks in the near future.
For each of the seven options for Fabric / Power BI connectivity and integration in Figure 2.0 above, I have added a section below with a diagram and some examples of how the option might benefit a project:
1. Fabric / Power BI read Snowflake SQL endpoint

Figure 2.1 – Fabric / Power BI query the Snowflake SQL endpoint

Power BI reading the Snowflake SQL endpoint has been generally available and supported for many years, and based on what I see in my daily work, most solutions built with Power BI and Snowflake use this method of connectivity. Fabric tools and Power BI Semantic Models can query Snowflake in a similar manner to SQL Server, Oracle, Teradata, Redshift, etc. I further break this connectivity method into three modes: Import Mode, Direct Query Mode, and Composite Mode. Import Mode also resembles Fabric tools such as Dataflows, Dataflows Gen 2, and Pipelines querying the Snowflake SQL endpoint in iterative batches.
1.1 Import Mode
Power BI Semantic Model Import Mode queries the Snowflake SQL endpoint on a schedule or if triggered by an API. The results of the query are cached in a compressed columnar database of a Fabric Semantic Model that is optimized for a reduced data storage footprint, high query complexity, and high query concurrency. When architected correctly, I’ve seen Import Mode Semantic Models perform well with hundreds of millions and sometimes billions of rows in a table.
Fabric tools such as Dataflows, Dataflows Gen 2, and Pipelines also query the Snowflake SQL endpoint in a similar manner, so I have included them in this mode.
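The "triggered by an API" refresh path mentioned above can be sketched with the Power BI REST API's dataset-refresh endpoint. This is a hedged sketch: the workspace GUID, dataset GUID, and token below are placeholders, and the request is only composed here, not sent.

```python
# Sketch: composing the Power BI REST API call that triggers an Import Mode
# semantic model refresh. The GUIDs and token are placeholders; sending the
# request requires a valid Microsoft Entra access token.
import json

API_ROOT = "https://api.powerbi.com/v1.0/myorg"

def build_refresh_request(workspace_id: str, dataset_id: str, token: str):
    """Compose the POST that asks the Power BI service to refresh a dataset."""
    url = f"{API_ROOT}/groups/{workspace_id}/datasets/{dataset_id}/refreshes"
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    # notifyOption controls refresh-failure email notifications
    body = json.dumps({"notifyOption": "MailOnFailure"})
    return url, headers, body

url, headers, body = build_refresh_request("<workspace-guid>", "<dataset-guid>", "<token>")
print(url)
```

In practice the composed request would be sent with any HTTP client, and a scheduler (or a Fabric Pipeline) would call it whenever upstream Snowflake data changes.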
When is an Import Mode Semantic Model a good choice?
Snowflake Administrators can serve up data to users via a SQL endpoint. There’s no need to grant end users access to build anything in Snowflake, and Snowflake doesn’t need to do anything in Fabric / Power BI. Fabric / Power BI users can still do transformations with Power Query or Dataflows / Dataflows Gen 2. Using Fabric Pipelines to ingest tables, views, or custom queries from Snowflake is an option for SQL query based ingestion into Fabric.
What are the risks of an Import Mode Semantic Model?
Data latency can be an issue if data from Snowflake needs to be less than thirty minutes old. Large or complex queries can also take longer to run on Snowflake and can increase costs in pay-per-query scenarios with Snowflake. If many semantic models are hitting the same Snowflake tables, duplicate queries could compound the problem and increase total costs. Also, Import Mode Semantic Models can reach performance limits with a combination of extreme query complexity and extremely large volumes of data.
1.2 Direct Query Mode
Direct Query Mode will not create a cached columnar database in the Fabric Semantic Model. The Fabric Semantic Model will convert queries to SQL and send them directly to Snowflake for processing.
When is a Direct Query Mode Semantic Model a good choice?
Direct Query Mode to Snowflake should be a niche option reserved for either 1) use cases requiring extreme low latency for data freshness, or 2) extremely large tables of data (too large for Import Mode) with simple query logic.
What are the risks of a Direct Query Mode Semantic Model?
Every visual on a report, and every query from an AI data agent, will send a SQL query back to Snowflake. If there are 15 visuals on the screen, clicking a filter value will send 15 queries. If 10 people are using the report concurrently, there will be (15 visuals x 10 people) 150 potentially complex queries running concurrently on Snowflake. Costs will usually be higher on the Snowflake side of the architecture, and reports may not render as quickly as in Import Mode.
1.3 Composite Mode
Composite Mode is an option to use both Import Mode and Direct Query Mode in the same Fabric / Power BI Semantic Model. Composite Mode Semantic Models offer the best of both worlds, and are also ideal candidates for advanced capabilities such as Aggregations to improve query performance on large and complex data models. I wrote an article about Composite Mode Semantic Models and Aggregations a few years ago at this link. Composite Mode also supports Direct Lake Mode Semantic Models as sources, which are covered as part of options 2-4 below.
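The Aggregations idea can be illustrated with a toy query router (this is not engine internals, just the concept): a query whose grouping columns and measure are covered by a cached summary table is served from the Import Mode cache, and anything at a finer grain falls through to Direct Query.

```python
# Toy sketch of Aggregations-style routing in a Composite Mode model.
# The table shape and column names are made up for illustration.

AGG_TABLE = {
    "group_by": {"Year", "Region"},       # grain of the cached summary table
    "measures": {"SalesAmount": "SUM"},   # pre-aggregated measures it holds
}

def route_query(group_by: set, measure: str) -> str:
    """Decide which tier can answer the query."""
    covered = group_by <= AGG_TABLE["group_by"] and measure in AGG_TABLE["measures"]
    return "import-mode aggregation cache" if covered else "direct query to Snowflake"

print(route_query({"Year"}, "SalesAmount"))                # summary grain: cache hit
print(route_query({"Year", "CustomerId"}, "SalesAmount"))  # detail grain: passthrough
```

The design payoff is that the vast majority of dashboard queries hit the small cached table, while only drill-downs to detail rows generate Snowflake queries.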
When is a Composite Mode Semantic Model a good choice?
Composite Mode Semantic Models are good options when a complex data model is not a good fit for either Import Mode or Direct Query Mode alone. Extremely large fact tables mixed with smaller fact tables and dimensions are usually a good fit for Composite Mode.
What are the risks of a Composite Mode Semantic Model?
Individual tables within a Composite Mode Semantic Model have the same inherent risks as Import and Direct Query Mode. Also, the complexity of the design requires strong knowledge of data modeling best practices.
2. Fabric Mirroring of Snowflake

Figure 2.2 – Fabric OneLake can mirror tables from Snowflake using Change Data Capture (CDC)

With a large Power BI environment or numerous AI Data Agents, queries between Snowflake and Fabric / Power BI can be complex with extremely high concurrency. Even for Import Mode Semantic Models using the Snowflake SQL endpoint, numerous semantic models refreshing frequently can result in high compute costs and duplicate data moving across networks. Fabric OneLake has a capability called Mirroring which can be used to optimize costs and query performance when connecting to Snowflake. Mirroring connects to a Snowflake table's change data capture (CDC) mechanism to pull updates into OneLake as incrementally updated delta parquet tables. Fabric / Power BI tools and reports can then query the copy of the table, which has been moved over the network from Snowflake once, as opposed to each query crossing the network separately. The Fabric compute and storage costs for Mirroring are also free up to the limits described at this link.
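The incremental pattern Mirroring relies on can be sketched with a toy change-feed merge. The operation names and row shapes below are illustrative only, not Fabric's actual CDC wire format; the point is that only deltas cross the network instead of a full table per refresh.

```python
# Conceptual sketch of CDC-style mirroring: a feed of insert/update/delete
# events keyed by primary key is merged into the mirrored copy.

def apply_change_feed(mirror: dict, changes: list) -> dict:
    """Merge CDC events into the mirrored table (keyed by primary key)."""
    for op, key, row in changes:
        if op in ("insert", "update"):
            mirror[key] = row
        elif op == "delete":
            mirror.pop(key, None)
    return mirror

mirror = {1: {"status": "open"}, 2: {"status": "open"}}
feed = [
    ("update", 1, {"status": "closed"}),
    ("insert", 3, {"status": "open"}),
    ("delete", 2, None),
]
apply_change_feed(mirror, feed)
print(sorted(mirror))  # keys remaining after the merge
```

This also makes the "Type 1 full refresh" caveat below concrete: if every row changes on every load, the change feed is as large as the table and the incremental benefit disappears.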
When is Mirroring a good choice?
- Numerous Fabric / Power BI items are querying the same table in Snowflake
- The Snowflake table supports CDC (is not a view)
- The Snowflake table is updated periodically with small changes or in small batches (not a Type 1 table with full refreshes)
- Cost savings for both Snowflake and Fabric compute are a goal
- Direct Lake Semantic Models are used for Power BI or AI Fabric Data Agents
What are the risks of Mirroring?
- Snowflake views and custom queries are not supported, only tables.
- Type 1 (full refresh) Snowflake tables, or tables that have bulk daily updates, are not good candidates for Mirroring
- If Mirroring breaks or needs to be reset, there may be a delay as the historical data re-populates
- Security must be reconfigured on the new mirrored tables in Fabric / Power BI
3. Snowflake write Iceberg to Fabric

Figure 2.3 – Snowflake writes Iceberg tables to Fabric OneLake

Tools in Fabric / Power BI are optimized to run on delta parquet or Iceberg tables in OneLake. Snowflake natively supports the Iceberg format and can write Iceberg tables to external storage. Writing those Iceberg tables to OneLake enables Fabric / Power BI tools to query them directly without crossing between the platforms every time a query runs. The Snowflake team can write the table to Fabric and keep it updated, and then all the compute and traffic from Fabric / Power BI can be contained within Fabric.
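A hedged sketch of the Snowflake-side setup, paraphrased from Snowflake's CREATE EXTERNAL VOLUME documentation: an external volume pointing at OneLake, then an Iceberg table written to it. Every angle-bracket value is a placeholder, the exact OneLake URL shape should be verified against the docs, and in practice these statements would be executed through a Snowflake connection rather than printed.

```python
# Sketch only: the two Snowflake DDL statements behind "Snowflake write
# Iceberg to Fabric". Placeholders (<...>) are hypothetical; verify syntax
# against Snowflake's CREATE EXTERNAL VOLUME / CREATE ICEBERG TABLE docs.

CREATE_VOLUME = """
CREATE EXTERNAL VOLUME onelake_vol
  STORAGE_LOCATIONS = ((
    NAME = 'fabric_onelake'
    STORAGE_PROVIDER = 'AZURE'
    STORAGE_BASE_URL = 'azure://onelake.dfs.fabric.microsoft.com/<workspace>/<lakehouse>.Lakehouse/Files/'
    AZURE_TENANT_ID = '<tenant-guid>'
  ));
"""

CREATE_TABLE = """
CREATE ICEBERG TABLE sales_iceberg
  CATALOG = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 'onelake_vol'
  BASE_LOCATION = 'sales_iceberg/'
AS SELECT * FROM sales;  -- a table, a view, or a custom query
"""

# In real use these would be run via a Snowflake cursor; here we just show
# the first line of each statement.
for stmt in (CREATE_VOLUME, CREATE_TABLE):
    print(stmt.strip().splitlines()[0])
```

Because the CTAS can select from a table, a view, or a custom query, this option sidesteps the "tables only" limitation of Mirroring noted in option 2.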
When is Snowflake writing Iceberg to Fabric a good choice?
- Numerous Fabric / Power BI items are querying the same table in Snowflake
- Snowflake data can be written from a table, a view, or a custom query
- The Snowflake environment can be configured to update the table in Fabric as needed. Less frequent updates will reduce total compute usage
- Data is sent from Snowflake to Fabric by the Snowflake team, so end users do not need permission to query Snowflake
- Cost savings for both Snowflake and Fabric compute are a goal
- Direct Lake Semantic Models are used for Power BI or AI Fabric Data Agents
What are the risks of Snowflake writing Iceberg to Fabric?
- Refreshes are triggered from Snowflake, not by end users in Fabric / Power BI
- Security must be reconfigured on the new Iceberg tables in Fabric / Power BI
4. Fabric Shortcut to Snowflake Iceberg

Figure 2.4 – Fabric OneLake can shortcut to Snowflake Iceberg tables

In addition to Snowflake writing Iceberg tables to Fabric OneLake, Fabric OneLake can also shortcut to Iceberg tables in the Snowflake environment. This scenario is similar to Snowflake writing Iceberg to Fabric (the previous option), but the storage remains outside of Fabric and is accessed by OneLake as needed.
When is Fabric Shortcut to Snowflake Iceberg a good choice?
- Numerous Fabric / Power BI items are querying the same table in Snowflake
- Snowflake data can be written from a table, a view, or a custom query
- The Snowflake environment can be configured to update the Iceberg table as needed. Less frequent updates will reduce total compute usage
- Only the Fabric team setting up shortcuts will need access to Snowflake
- Cost savings for both Snowflake and Fabric compute are a goal
- Direct Lake Semantic Models are used for Power BI or AI Fabric Data Agents
What are the risks of Fabric Shortcut to Snowflake Iceberg?
- Refreshes to the Iceberg tables are configured in Snowflake, not by end users in Fabric / Power BI
- Security must be reconfigured on the new Iceberg tables in Fabric / Power BI
- Fabric OneLake shortcuts will still pull the data across the network from Snowflake when the data is needed for a query or Direct Lake Semantic Model.
5. Snowflake Read Fabric SQL endpoint

Figure 2.5 – Snowflake can read the Fabric SQL endpoint

What if the Snowflake team needs a secure and supported means to query data from Fabric OneLake into Snowflake? For example, business users may load data into OneLake from Excel, flat files, or Power Platform tools. Snowflake can query the Fabric SQL endpoint to pull that data into Snowflake using SQL queries. The Fabric SQL endpoint should be a niche option for Snowflake users, and the next option on this list will likely be a better choice (6. Snowflake read table from Fabric as Iceberg).
When is Snowflake reading the Fabric SQL endpoint a good choice?
- Business users have uploaded curated data to Fabric, and the data is needed for projects in Snowflake
- Projects need a secure platform for business users to upload data (Fabric OneLake) as a source for Snowflake
- Data from sources that natively integrate with Fabric OneLake is needed for Snowflake projects
What are the risks of Snowflake reading the Fabric SQL endpoint?
- Connecting to OneLake and reading tables as Iceberg is probably a better option for most use cases (next option below)
- Authentication to Fabric with this option requires using managed identity
6. Snowflake read table from Fabric as Iceberg

Figure 2.6 – Snowflake can read tables from Fabric OneLake as Iceberg

Just as Fabric can shortcut to Snowflake and read Iceberg tables, Snowflake can also connect to Fabric OneLake and read tables as Iceberg. In most scenarios, Snowflake reading tables from Fabric OneLake as Iceberg should be the best method to get data from Fabric into Snowflake from the perspective of cost and scalability.
When is Snowflake read table from Fabric as Iceberg a good choice?
- Business users have uploaded curated data to Fabric, and the data is needed for projects in Snowflake
- Projects need a secure platform for business users to upload data (Fabric OneLake) as a source for Snowflake
- Data from other sources that natively integrate with Fabric OneLake is needed for Snowflake projects
- The Snowflake team will use Snowflake compute to work with data from Fabric
What are the risks of Snowflake read table from Fabric as Iceberg?
Compared to the other options for getting Fabric data into Snowflake, I have not identified any notable risks.
7. Snowflake ingest Fabric Real-Time data as Iceberg

Figure 2.7 – Fabric Real-Time data can be ingested into Snowflake for cold path storage and analytics

Fabric Real-Time Intelligence is an area of great potential growth for analytics and AI. Data can be pushed into Fabric (rather than pulled) and used for alerting and agentic operations. Real-Time Intelligence is often referred to as the hot path of a lambda architecture. For the cold path of a lambda architecture, the streaming data is stored for deep analytics and machine learning purposes. Many data science and analytics teams use Snowflake for storing historical data. Snowflake can ingest Fabric Eventhouse (Real-Time) data via OneLake as Iceberg for cold path storage.
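The hot path / cold path split described above can be sketched with a toy event router. The threshold, field names, and in-memory lists are stand-ins for Fabric alerting and Eventhouse storage, invented purely for illustration.

```python
# Toy lambda-architecture sketch: each event is evaluated immediately on the
# hot path (alerting) and appended to cold-path storage for later analytics.

ALERT_THRESHOLD = 100.0
alerts: list = []      # stands in for Fabric real-time alerting
cold_path: list = []   # stands in for Eventhouse data surfaced to Snowflake as Iceberg

def ingest(event: dict) -> None:
    if event["value"] > ALERT_THRESHOLD:   # hot path: react now
        alerts.append(event["sensor"])
    cold_path.append(event)                # cold path: retain everything

for e in [{"sensor": "A", "value": 42.0}, {"sensor": "B", "value": 250.0}]:
    ingest(e)

print(alerts)          # only the out-of-range sensor triggers an alert
print(len(cold_path))  # all events are retained for historical analytics
```

The key design point is that the cold path keeps every event regardless of alerting outcome, which is exactly the historical data set Snowflake teams want for machine learning.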
When is Snowflake ingest Fabric Real-Time data as Iceberg a good choice?
- Fabric Real-Time Intelligence tools are used for lambda hot path AI, alerting and analytics but cold path data is stored and used in Snowflake
- Snowflake teams need access to data that is easily streamed into Fabric from Fabric-friendly sources
What are the risks of Snowflake ingest Fabric Real-Time data as Iceberg?
Compared to the other options for getting Fabric Real-Time data into Snowflake, I have not identified any notable risks.
Summary of Options

- Fabric read Snowflake SQL endpoint – https://learn.microsoft.com/en-us/fabric/data-factory/connector-snowflake-overview and https://learn.microsoft.com/en-us/power-bi/connect-data/service-connect-snowflake
- Fabric mirroring of Snowflake DB (copies metadata & data) – https://learn.microsoft.com/en-us/fabric/mirroring/snowflake
- Snowflake write Iceberg to Fabric – https://docs.snowflake.com/en/sql-reference/sql/create-external-volume#label-create-external-volume-onelake
- Fabric shortcut to Snowflake Iceberg – https://learn.microsoft.com/en-us/fabric/onelake/onelake-iceberg-tables
- Snowflake read Fabric SQL endpoint – https://learn.microsoft.com/en-us/fabric/data-warehouse/query-warehouse
- Snowflake read table from Fabric as Iceberg – https://blog.fabric.microsoft.com/en-us/blog/new-in-onelake-access-your-delta-lake-tables-as-iceberg-automatically?ft=All
- Snowflake ingest Fabric Real-Time data as Iceberg – Query Fabric OneLake Delta tables from Snowflake – Microsoft Fabric | Microsoft Learn

Closing Thoughts
All of these seven options for connectivity between Snowflake and Fabric will likely have use cases in the real world. In my opinion, the following options will be the most popular due to performance, cost, and administrative considerations:
- 3. Snowflake write Iceberg to Fabric – High query complexity and user concurrency are the primary drivers of unnecessary costs when using Power BI with Snowflake. This option will allow Snowflake teams to write data (tables, views, custom queries) to OneLake on a schedule or when triggered, and then hundreds of queries can hit OneLake for AI and reporting use cases. In theory, this method should minimize cross-platform traffic and reduce total costs while offering fast query response times to end users.
- 1.3 Composite Mode Semantic Models – Leveraging both Import Mode and Direct Lake Mode offers the best of both worlds, and can be used in combination with options #2 and #3 above.
- 6. Snowflake read table from Fabric as Iceberg – When Snowflake users need data from OneLake, this option should offer the best cost and performance.
- 7. Snowflake ingest Fabric Real-Time data as Iceberg – Similar to #6, but in the context of pulling data from the Fabric EventHouse for cold path storage and analytics in Snowflake.
Feedback and suggestions are welcome via my LinkedIn page or Twitter.
-

We are currently experiencing a generational surge in AI use cases and transformation. But will AI be able to connect directly to your raw data, perform all the necessary transformations, apply business-friendly names to fields, and add accurate logic to solutions? Does data engineering and architecture still matter?
The video embedded at the bottom of this article is a recorded version of the presentation that I gave at SQL Saturday 2025 Minnesota. In February I will be presenting an updated version of this presentation at the Power BI & Fabric Summit, which is organized by Radacad. You can register for the upcoming Summit at this link: https://fabricsummit.com/product/PowerBI-Fabric-Summit-2026. I expect these tools to evolve quite a bit over the next few months!
A well-known limitation of AI—specifically large language models (LLMs)—is that they are not fundamentally designed to perform accurate math. While newer LLMs can handle some mathematical tasks, query speeds are often slow, and the compute costs can be high. Translating the specialized context of natural language questions into precise logic also presents challenges. For example, if a business user asks, “Show me total sales for the year,” what exactly does “year” mean? Is it a calendar year, a fiscal year, or year-to-date?
Now imagine how much more complex the math becomes with a question like, “Show average sales for blue and red widgets for customers in the East, excluding store holidays.” Traditional best practices known by data professionals for decades provide a solid foundation for accurate math in AI-driven applications. These practices will continue to evolve as we design data architectures optimized for AI. Microsoft Fabric semantic models are a powerful tool for building that logic in a way that provides both context for accurate calculations and fast, efficient query performance.
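The "year" ambiguity can be made concrete with a small sketch. The July 1 fiscal-year start below is an invented assumption; the point is that a semantic model encodes one such definition once, so every downstream AI query inherits the same "good math."

```python
# Sketch: the same question ("total sales for the year") maps to three
# different date ranges. The fiscal-year start of July 1 is a made-up
# assumption for illustration.
from datetime import date

def year_range(interpretation: str, today: date) -> tuple:
    if interpretation == "calendar":
        return date(today.year, 1, 1), date(today.year, 12, 31)
    if interpretation == "fiscal":  # assume FY starts July 1
        start_year = today.year if today.month >= 7 else today.year - 1
        return date(start_year, 7, 1), date(start_year + 1, 6, 30)
    if interpretation == "ytd":
        return date(today.year, 1, 1), today
    raise ValueError(interpretation)

today = date(2025, 9, 27)
for interp in ("calendar", "fiscal", "ytd"):
    print(interp, *year_range(interp, today))
```

Three interpretations, three different totals: without a shared definition baked into the model, an LLM is left to guess which one the business user meant.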
If you’re a data professional with skills in dimensional modeling, query optimization, ETL/ELT, RLS/OLS—your expertise is now more crucial than ever for AI solutions that require “good math.” This video explores strategic reasons for using semantic models as the foundation for AI when querying structured data. It walks through a use case that begins with 275 million rows of raw data, demonstrates how to model the data for AI, leverages tools in Fabric semantic models to prepare the data, and then serves it to AI tools and agents using Fabric Data Agents, Power BI Copilot, Azure AI Foundry, and Microsoft 365 Copilot. GitHub repo to (no code) deploy the demo in the video to Fabric with 275M rows of real data: https://github.com/isinghrana/fabric-samples-healthcare/tree/main/analytics-bi-directlake-starschema
Here are a few helpful links:
Official Microsoft Blog article on this topic: Power BI semantic models as accelerators for AI-enabled consumption | Microsoft Power BI Blog | Microsoft Power BI
Related article on this topic from a Partner: AI in Power BI: Time to pay attention – SQLBI
-


For many years, SQL Saturday conferences around the world have provided fantastic educational opportunities for professionals specializing in Microsoft tools such as Fabric, Power BI, and SQL Server. SQL Saturday Minnesota is coming up on September 27, 2025, and I was fortunate to be selected as a presenter. I’ll be presenting on best practices for Fabric and Power BI Semantic Models when designing them for use with Copilots and agentic AI tools such as AI Foundry and Copilot Studio.
Hint: Best practices for AI with Semantic Models is going to be a hot topic in coming years and an area where dimensional modeling experts will be in high demand!
My session is scheduled to precede the event raffle at the end of the day, and will be tailored for all audiences from beginners to advanced users. Many other Microsoft experts will also be presenting throughout the day:
Title: Fabric Semantic Models are the Foundation for good math with AI
Date: Saturday, 27 Sep 2025 3:15 pm – 4:15 pm CST (60 minutes)
Location: St. Paul College, Saint Paul, Minnesota, United States
Registration link: https://sqlsaturday.com/2025-09-27-sqlsaturday1124/
Description:
We are currently experiencing a generational surge in AI use cases and transformation. But will AI be able to connect directly to your raw data, perform all the necessary transformations, apply business-friendly names to fields, and add accurate logic to solutions? Does data engineering and architecture still matter? A well-known limitation of AI—specifically large language models (LLMs)—is that they are not fundamentally designed to perform accurate math. While newer LLMs can handle some mathematical tasks, query speeds are often slow, and the compute costs can be high. Translating the specialized context of natural language questions into precise logic also presents challenges. For example, if a business user asks, “Show me total sales for the year,” what exactly does “year” mean? Is it a calendar year, a fiscal year, or year-to-date? Now imagine how much more complex the math becomes with a question like, “Show average sales for blue and red widgets for customers in the East, excluding store holidays.”
Traditional best practices known by data professionals for decades provide a solid foundation for accurate math in AI-driven applications. These practices will continue to evolve as we design data architectures optimized for AI. Microsoft Fabric semantic models are a powerful tool for building that logic in a way that provides both context for accurate calculations and fast, efficient query performance. If you’re a data professional with skills in dimensional modeling, query optimization, ETL/ELT, RLS/OLS—your expertise is now more crucial than ever for AI solutions that require “good math.”
This presentation will explore strategic reasons for using semantic models as the foundation for AI when querying structured data. We’ll walk through a use case that begins with 275 million rows of raw data, demonstrates how to model the data for AI, leverages tools in Fabric semantic models to prepare the data, and then serves it to AI tools and agents using Fabric Data Agents, Power BI Copilot, Azure AI Foundry, and Microsoft 365 Copilot.
Again, here is the registration link and it would be great to see you there!: https://sqlsaturday.com/2025-09-27-sqlsaturday1124/
-


This post examines strategies to reduce data duplication and enhance performance by enabling shared access to data at rest across Snowflake, Databricks, and Microsoft Fabric. These platforms are widely adopted in modern data ecosystems and each offers lakehouse capabilities that decouple compute from storage. In environments where multiple platforms coexist, sharing storage while orchestrating compute workloads across different engines can significantly reduce data duplication. This approach not only streamlines data management but also drives cost efficiency and can improve query performance.
This post is the first in a three part series focusing on interoperability amongst Snowflake, Databricks, and Microsoft Fabric. The following list will be updated with urls as the posts are published:
- Unify data sharing across Snowflake, Databricks, and Fabric for a lakehouse trifecta (this article)
- Snowflake and Microsoft Fabric integration connectivity options (coming soon)
- Databricks and Microsoft Fabric integration connectivity options (coming soon)
While consolidating data solutions onto a single platform is often ideal, many large organizations operate across multiple cloud environments due to team autonomy, legacy investments, or strategic diversification. Consider a scenario where Team B initiates a project to analyze the impact of supply chain disruptions on product sales. They manage and fund their analytics workloads on Cloud Platform B, which houses the sales data. However, the curated supply chain data resides on Cloud Platform A, owned by a separate team. With lakehouse architectures that support cross-platform data sharing, Team B can query data from Cloud Platform A using compute resources exclusively on Cloud Platform B. This model avoids unnecessary data replication, isolates compute to the platform using the data, reduces storage costs, and enables efficient cross-cloud analytics without compromising performance:

Figure 1.1 – Compatible modern lakehouse architectures can restrict compute to one platform when sharing data across platforms

When using a modern lakehouse cross-platform architecture as in Figure 1.1 above, the following benefits are possible:
- Cost containment by platform – Data from Cloud Platform A can be used by Cloud Platform B without incurring compute costs in Cloud Platform A. In the real world, a team with in-demand data in Cloud Platform A can share large volumes of data with many other teams without incurring additional costs.
- Minimize data duplication – With a lakehouse architecture using data sharing amongst compatible platforms, the replication of identical data can usually be reduced overall.
- Performance benefits – Sharing data amongst lakehouse cloud platforms can potentially reduce latency by eliminating unnecessary data copy steps.
- Platform flexibility – Large organizations won’t need to standardize on a single platform. Value can be realized faster amongst existing teams with diverse platforms. Data can also be integrated faster after mergers and acquisitions. Vendor lock-in risks can be avoided.
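The duplication-versus-sharing trade-off in the list above can be put in back-of-envelope terms. The table size and consumer count below are illustrative only, not measurements from any real environment.

```python
# Back-of-envelope sketch: N consuming teams copying a shared table versus
# referencing it in place via interoperable lakehouse formats.

def total_storage_gb(table_gb: float, consumers: int, zero_copy: bool) -> float:
    """Storage footprint of one shared table across platforms."""
    copies = 1 if zero_copy else 1 + consumers  # original plus per-team replicas
    return table_gb * copies

# Four teams consuming one 500 GB curated table:
print(total_storage_gb(500, consumers=4, zero_copy=False))  # replicate to each team
print(total_storage_gb(500, consumers=4, zero_copy=True))   # share the data at rest
```

The storage savings compound across every widely-used table, and the replicated case also carries the refresh pipelines and network egress needed to keep each copy current.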
Before listing out the options for data sharing amongst Snowflake, Databricks, and Microsoft Fabric please note the following:
- I limited the options in this article to data sharing using lakehouse architectures. SQL endpoints (all three platforms have them), third party connectivity options, and other compute-to-compute options were left off this list. Fabric mirroring of Snowflake was included because it creates a lakehouse table as a carbon copy mirror of a Snowflake table. Upcoming posts listed above will cover other options beyond lakehouse storage.
- I focused on lakehouse integration where the files are interoperable versions of delta parquet or iceberg. Other file formats can move across platforms, but metadata compatibility is key to minimizing data duplication for analytic scenarios.
- At the time of writing this article, some of the options are still in Preview. I’ll try to update these articles as the status changes.
- I left off options that are not “out of the box” for the three cloud platforms. For example, some customers will write delta parquet files to Azure Data Lake using Fabric and then reuse the files with Databricks. Other customers have successfully used Apache Iceberg change data capture tooling to shift Snowflake Iceberg tables to Fabric.
- There are important details about the connectivity options left out of this article for the sake of simplicity. For example options are impacted if Snowflake or Databricks are in another cloud other than Azure, or when private endpoint and private link capabilities are enabled on the platforms. If I covered every nuanced scenario, this article would become a book.
- I consulted colleagues to confirm the accuracy of this list, but if anything is mis-stated or missing please let me know and I will make corrections.
- These articles are not an attempt to compare or rank these three cloud platforms. At the time of writing this article I am a Microsoft employee, and all three products are fully supported on Microsoft Azure.
The diagram below in Figure 1.2 may not initially be easy on the eyes, but if you follow each of the Lakehouse Connectivity Options one-at-a-time you can walk through the different ways to share lakehouse data amongst Snowflake, Databricks, and Microsoft Fabric:

Figure 1.2 – Options for sharing lakehouse storage amongst Snowflake, Databricks, and Microsoft Fabric

Figure 1.3 below lists each of the nine options above with details about availability status, potential use case scenarios, and where the data physically resides:

Figure 1.3 – Details about the methods for sharing data amongst Snowflake, Databricks, and Microsoft Fabric

For options 1-7 in Figure 1.3 above, the following table lists examples of where each feature might add value, along with a URL to learn more about the capability:
- Fabric mirroring of Snowflake DB (copies Metadata & Data): Microsoft Fabric Mirrored Databases From Snowflake – Microsoft Fabric | Microsoft Learn
- Snowflake write Iceberg to Fabric: CREATE EXTERNAL VOLUME | Snowflake Documentation
- Fabric shortcut to Snowflake Iceberg: Use Iceberg tables with OneLake – Microsoft Fabric | Microsoft Learn
- Fabric shortcut to Databricks unmanaged Delta Parquet: Unify data sources with OneLake shortcuts – Microsoft Fabric | Microsoft Learn
- Fabric mirroring of Databricks Unity Catalog (just Metadata): Microsoft Fabric Mirrored Catalog From Azure Databricks – Microsoft Fabric | Microsoft Learn
- Snowflake read table from Fabric as Iceberg: New in OneLake: Access your Delta Lake tables as Iceberg automatically (Preview) | Microsoft Fabric Blog | Microsoft Fabric
- Databricks read Fabric Delta Parquet via Managed Identity: Integrate OneLake with Azure Databricks – Microsoft Fabric | Microsoft Learn

Figure 1.4 – Reference links for the first seven options listed in Figures 1.2 and 1.3

Notice that I left options 8-9 off of the reference table in Figure 1.4 because I was unable to locate official documentation pages from Databricks or Snowflake regarding those capabilities. If anyone has those links, let me know and I'll add them to the table.
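To make the last option above more concrete: when Databricks reads Fabric Delta Parquet, it ultimately addresses a table through OneLake's ABFS-compatible endpoint. Below is a minimal Python sketch of how such a path is typically constructed; the workspace, lakehouse, and table names are hypothetical placeholders, and the Spark read is shown only as a comment because it requires a Databricks cluster configured with an identity that has access to the Fabric workspace.

```python
def onelake_table_path(workspace: str, lakehouse: str, table: str) -> str:
    """Build the abfss:// URI for a Delta table stored in a Fabric Lakehouse.

    OneLake exposes workspaces as filesystem-like containers, with each
    Lakehouse item holding its managed tables under a Tables/ folder.
    """
    return (
        f"abfss://{workspace}@onelake.dfs.fabric.microsoft.com/"
        f"{lakehouse}.Lakehouse/Tables/{table}"
    )

# Hypothetical names for illustration only
path = onelake_table_path("MyWorkspace", "MyLakehouse", "medicare_part_d")
print(path)

# On a Databricks cluster with managed identity or service principal access
# to the workspace, the table could then be read directly as Delta:
#   df = spark.read.format("delta").load(path)
```

The same URI shape also works from other engines that speak the ABFS protocol, which is what makes this "one copy, many readers" pattern possible.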
Per the links at the beginning of this article, I will follow this post with two more posts about 1) Fabric/Snowflake and 2) Fabric/Databricks integration that include connectivity options beyond shared lakehouse storage.
In summary, solutions built on Snowflake, Databricks, or Fabric can share data across the platforms using lakehouse architecture tools to minimize data duplication, reduce data latency, and optimize costs without consolidating on a single platform.
-


I have been collaborating with my colleague Inder Rana on a GitHub repository for an end-to-end Microsoft Fabric solution for some time, as it provides a simple means to learn about Fabric with real open source healthcare data. This most recent update gives non-technical people who have never used Fabric a way to install a Fabric solution built to best practices in less than an hour for learning, testing, and exploration.
The new Quick Setup Github guide is available at this link: fabric-samples-healthcare/analytics-bi-directlake-starschema/quick-setup.md at main · isinghrana/fabric-samples-healthcare
The most recent updates enable you to install the end-to-end solution using a single Fabric Notebook that can be downloaded from the GitHub repo, uploaded to Fabric, and then run in a blank Workspace. The full process takes less than an hour. The brilliant automation to enable this new deployment option was entirely the work of Inder Rana. You can network with and follow Inder on LinkedIn and Medium:
https://linkedin.com/in/singhinderjit
https://isinghrana.medium.com/
In less than one hour a single Fabric Notebook creates ELT Spark Notebooks, a Lakehouse with 275 rows of real public data, a Data Factory Pipeline, a Semantic Model, and a Power BI report. You can add a Fabric Data Agent from the following link at the GitHub site: fabric-samples-healthcare/analytics-bi-directlake-starschema/docs/5-CreateAISkill.md at main · isinghrana/fabric-samples-healthcare
The dataset used is the publicly available Medicare Part D Prescribers – by Provider and Drug, sourced from the Centers for Medicare & Medicaid Services (CMS): https://data.cms.gov/provider-summary-by-type-of-service/medicare-part-d-prescribers/medicare-part-d-prescribers-by-provider-and-drug
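For anyone who wants to pull the source data outside of the Fabric pipeline, data.cms.gov also exposes its datasets through a public data API that supports paging. The sketch below builds a paged request URL; treat the endpoint shape and the size/offset parameter names as assumptions to verify against the CMS data API documentation, and note the dataset identifier is a placeholder.

```python
from urllib.parse import urlencode

# Assumed base of the public CMS data API (verify against data.cms.gov docs)
CMS_API = "https://data.cms.gov/data-api/v1/dataset"

def page_url(dataset_id: str, size: int = 5000, offset: int = 0) -> str:
    """Build one paged request URL for a CMS open dataset (assumed schema)."""
    query = urlencode({"size": size, "offset": offset})
    return f"{CMS_API}/{dataset_id}/data?{query}"

# "<dataset-uuid>" is a placeholder; the real id comes from the dataset page
url = page_url("<dataset-uuid>", size=1000, offset=2000)
print(url)

# Fetching a page would then look like:
#   import urllib.request, json
#   rows = json.loads(urllib.request.urlopen(url).read())
```

Looping over increasing offsets until the API returns an empty page is the usual way to download the full dataset incrementally.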
Below is a video in which Inder walks you through the process of deploying the new Quick Install: -
I’ve been sharing insights on data technologies through blogs, videos, and podcasts for over 15 years. Over time, my content has been published across a variety of platforms, some of which have since become outdated or difficult to access. To make things simpler, I’ve created this new blog as a centralized hub where you can find all of my past work, along with new content moving forward. It’s designed to be a single, reliable source for everything I’ve created, past, present, and future, in the world of data.
Why am I consolidating links to my old content at this site?
- With advances in AI, a consolidated record of my contributions to public content could be useful as a source of reference for my LLM projects and testing.
- A few of the blogs that I used in the past have been deleted or broken. I have been able to link to most (sadly not all) archived versions of those articles via the Wayback Machine internet archive. Many data diagram images and old OLAP MDX code snippets are forever lost.
- All of my future content, even if it is published to other sites, will be discoverable on this new site.
Where do the 80+ posts of historical content come from?
- My YouTube channel – https://www.youtube.com/@insightsoutliers
- My previous self-hosted blog titled Insights & Outliers (soon to be deprecated)
- A deprecated blog site dedicated to Open Data analytics titled Open Data Bits
- Articles and videos published to the Microsoft Healthcare and Life Sciences Blog and YouTube channel – https://techcommunity.microsoft.com/category/healthcareandlifesciences/blog/healthcareandlifesciencesblog
- Blog and YouTube channel from a former employer (citations with links provided to the original sites/videos for the full content)
- Deprecated blog from a former employer. Citations and links to archived versions of articles at the Wayback Machine are provided – https://web.archive.org/
Feedback and suggestions are always welcome!
-

Query Fabric Data Agents from Azure AI Foundry for Agentic Solutions
One of the early criticisms of Large Language Models (LLMs) was an inability to do good math when answering questions. Newer LLMs can handle reasoning and mathematical tasks well, but at a high compute cost, and query times can be slow. Microsoft Fabric offers easy-to-use SaaS data tools that can scale to very large volumes of data and run traditional relational database queries. While custom RAG (Retrieval-Augmented Generation) models have been able to map natural language queries to structured databases, most implementations have been high-code custom designs.
Microsoft Fabric is a vast suite of tools that are easy to use, but choosing the right tool for the right purpose can require some training and knowledge. My teammate Inder Rana and I created a GitHub repo that uses 250 million rows of real CMS open source Medicare Part D healthcare data, which can be found at this link: fabric-samples-healthcare/analytics-bi-directlake-starschema at main · isinghrana/fabric-samples-healthcare. The entire GitHub repo can be deployed in a few hours without requiring a coding background.
In the video below, I show you how to connect an Azure AI Foundry Agent to a Microsoft Fabric Data Agent which can be created with the GitHub repo. At the conclusion of the build, you can start asking questions of the AI Foundry Agent and getting mathematically correct answers from 250M rows of data!
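When wiring an Azure AI Foundry agent to a Fabric Data Agent, the Foundry project references the Fabric connection by an ARM-style resource id. The sketch below builds that id and shows the FabricTool pattern from Microsoft's azure-ai-projects samples as comments; treat the exact SDK module paths and the connection-id format as assumptions to verify against current Azure AI Foundry documentation, and all names here are hypothetical placeholders.

```python
def fabric_connection_id(subscription: str, resource_group: str,
                         project: str, connection: str) -> str:
    """Assumed ARM-style id of a Fabric connection registered in an
    Azure AI Foundry project (verify the format against current docs)."""
    return (
        f"/subscriptions/{subscription}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.MachineLearningServices"
        f"/workspaces/{project}/connections/{connection}"
    )

# Placeholder values for illustration only
conn_id = fabric_connection_id("<sub-id>", "<rg>", "<project>", "<fabric-conn>")
print(conn_id)

# With the azure-ai-projects SDK, the pattern from Microsoft's samples looks
# roughly like this (not runnable here; class names may differ by SDK version):
#   from azure.ai.projects import AIProjectClient
#   from azure.identity import DefaultAzureCredential
#   fabric = FabricTool(connection_id=conn_id)  # Fabric Data Agent tool
#   agent = client.agents.create_agent(
#       model="gpt-4o",
#       name="medicare-agent",
#       instructions="Answer questions using the Fabric Data Agent.",
#       tools=fabric.definitions,
#   )
```

Once the agent is created, questions routed through the Fabric tool are answered by the Fabric Data Agent querying the underlying semantic model or lakehouse, which is what produces mathematically correct answers over the 250M-row dataset.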
-

Are you tired of watching Fabric demos where you can’t use the tools hands-on and push them to the limits? Would you like to test out Microsoft Fabric with 250 million rows of real data? Maybe you need a solid demo, or you have a real Healthcare use case for 10 years of #OpenData CMS Medicare Part D data? This session will walk you through the steps to easily deploy a solution using Fabric Notebooks, OneLake (Lakehouse), Warehouse, Pipelines, a Direct Lake Semantic Model, and Power BI, with everything optimized for use with the Fabric Copilots. All you need is a Fabric Workspace and a Power BI Pro license. You can even spin up an F64, deploy the solution from GitHub (link here: https://github.com/isinghrana/fabric-… ), and then pause the capacity while not in use. Inder Rana and Greg Beaumont will walk you through the process of installing the solution, end-to-end, without writing any code. The entire 250M row medallion Lakehouse architecture will even be fully deployed during the session, end-to-end from the CMS servers to a Gold Lakehouse in Fabric. Register for monthly events such as this one at this link: https://aka.ms/HLSPowerBI