Greg Beaumont's Data & Analytics Blog

Webinar for Real-World Fabric: Ingest 250M Rows of Healthcare Data in Under 2 Hours

January 15, 2025

Are you tired of watching Fabric demos where you can’t use the tools hands-on and push them to their limits? Would you like to test out Microsoft Fabric with 250 million rows of real data? Maybe you need a solid demo, or you have a real Healthcare use case for 10 years of #OpenData CMS Medicare Part D data? This session will walk you through the steps to easily deploy a solution using Fabric Notebooks, OneLake (Lakehouse), Warehouse, Pipelines, a Direct Lake Semantic Model, and Power BI, with everything optimized for use with the Fabric Copilots. All you need is a Fabric Workspace and a Power BI Pro license. You can even spin up an F64, deploy the solution from GitHub (link here: https://github.com/isinghrana/fabric-… ), and then pause the capacity while not in use. Inder Rana and Greg Beaumont will walk you through installing the solution, end-to-end, without writing any code. The entire 250M row medallion Lakehouse architecture will even be fully deployed during the session, from the CMS servers to a Gold Lakehouse in Fabric. Register for monthly events such as this one at this link: https://aka.ms/HLSPowerBI
Fabric AI Skill – example AI Agent for your Data using Healthcare data

January 9, 2025

Interested in trying out the new Microsoft Fabric AI Skill? Fabric AI Skill is a new Generative AI capability in preview for Fabric that functions as the basis for a SaaS AI Agent in Fabric using natural language to query your data. An AI Skill module has been added to the GitHub repo created along with my colleague Inderjit Rana.

Why would you want to use AI Skill? AI Skill uses a pattern similar to retrieval-augmented generation (RAG) with a large language model (LLM): context about your data and examples of proper SQL queries give the LLM a basis for generating accurate queries that then run against your data. Example questions include “What was total costs in 2022 for Dallas, Texas?” or “Show me the top cities in Florida based on total beneficiaries?”
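To make the pattern concrete, here is a minimal Python sketch of how context and example queries might be combined into a single prompt for an LLM. This is only an illustration of the general idea, not how AI Skill is implemented internally. The `SqlExample` class, the `build_prompt` function, and the table and column names (`fact_costs`, `dim_geography`, `total_cost`) are all hypothetical and do not come from the repo.

```python
from dataclasses import dataclass


@dataclass
class SqlExample:
    """One worked example: a natural language question and its SQL answer."""
    question: str
    sql: str


def build_prompt(data_context: str, examples: list[SqlExample], question: str) -> str:
    """Combine data context, example queries, and the new question into one prompt."""
    parts = ["Translate the question into a SQL query.", "", "Data context:", data_context, ""]
    for example in examples:
        parts += [f"Question: {example.question}", f"SQL: {example.sql}", ""]
    # The prompt ends with an open "SQL:" slot for the model to complete.
    parts += [f"Question: {question}", "SQL:"]
    return "\n".join(parts)


# Hypothetical table and column names, used for illustration only.
context = "fact_costs(year, geo_key, total_cost); dim_geography(geo_key, city, state)"
examples = [
    SqlExample(
        "What was total costs in 2022 for Dallas, Texas?",
        "SELECT SUM(f.total_cost) FROM fact_costs f "
        "JOIN dim_geography g ON g.geo_key = f.geo_key "
        "WHERE f.year = 2022 AND g.city = 'Dallas' AND g.state = 'Texas'",
    )
]
prompt = build_prompt(context, examples, "Show me the top cities in Florida based on total beneficiaries?")
print(prompt)
```

The more representative the example questions and queries are, the more the model has to work from when it sees a new question about the same data.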

If you’ve already deployed the 250M row Lakehouse / Warehouse / Direct Lake solution from the repo, the new AI Skill content can be deployed in about 5-10 minutes. If you haven’t deployed the rest of the repo, steps in the repo documentation will guide you through the deployment process which should take less than an hour and result in an optimized Lakehouse and/or Warehouse that is ready for Fabric AI Skills. All that you need to deploy the entire solution is a Fabric Workspace that uses a Fabric or Premium capacity with the Fabric tools enabled.

I’m currently working on an in-depth article to review use cases for AI Skill, but for now you can get started by using the GitHub repo at this link: https://github.com/isinghrana/fabric-samples-healthcare/tree/main/analytics-bi-directlake-starschema

A video that demos the AI Skill module and walks through the deployment process can be found here: https://youtu.be/ftout8UX4lg
SQL Stored Procedures in Fabric Warehouse offer Blazing Speed and Power at Scale

November 6, 2024

If you come from a SQL background the Fabric Warehouse is made for you.

This article discusses Stored Procedures (SPROCs) in the Fabric Warehouse and how they can benefit customers with SQL skills, and it provides a video that demonstrates the speed and power of SPROCs in the Fabric Warehouse.

Before Fabric Pipelines and Dataflows, before Azure Data Factory, and before SQL Server Integration Services (SSIS), there were Stored Procedures on SQL Server. For data professionals who are literate in SQL, Stored Procedures are a simple and reliable way to perform tasks such as creating and updating tables, running queries for data transformations, and more.

In the Healthcare industry segment, especially with Providers, both on-premises and cloud SQL Server deployments are still a popular and reliable platform for data and analytics. Many data professionals working in these environments write and think about the SQL language more frequently than they do spoken languages.

A great deal of the press and media for Microsoft Fabric has focused on Spark Notebooks (deservedly so, Spark Notebooks are fantastic). While you can write SQL in Spark Notebooks, the most popular language for Notebooks is Python which is very different from the SQL Server world many of us grew up embracing. I’ve had some customers tell me that “Fabric looks promising, but our SQL experts will need to re-skill for Spark Notebooks. Also, our SQL Views and Stored Procedures will need to be rewritten.” The Fabric Warehouse, often overlooked by the press and media, is designed in the spirit of a SQL Server environment. For some use cases, it provides a great path for moving to Fabric without re-skilling and re-writing queries.

My colleague Inder Rana and I recently created a GitHub Repo for deploying 249 million rows of real CMS data to a Fabric environment for testing and demos. The end-to-end solution takes about 20 minutes to deploy using fully automated scripts in Spark Notebooks, and it includes instructions to deploy the Power BI components. As an alternative to the Spark Notebooks, there is also an option to deploy the Gold layer of the medallion architecture to the Fabric Warehouse using Stored Procedures. You can try out the GitHub Repo from this link: fabric-samples-healthcare/analytics-bi-directlake-starschema at main · gregbeaumont/fabric-samples-healthcare

Initially, we offered the Stored Procedures option as an alternative to Spark Notebooks for the purposes of 1) demonstrating different options in Fabric, 2) providing a path that SQL experts would appreciate, and 3) comparing semantic model query performance in the Warehouse versus the Lakehouse. When testing, the Stored Procedures often ran twice as fast as the Spark Notebooks! The results were surprising at first, and after validating that the Warehouse Stored Procedures are indeed very powerful, I was inspired to create this video and blog post.

If you come from a SQL background and you prefer to write SQL, or if you are assessing migration of analytic workloads to Fabric but you don’t want to convert your Stored Procedures and Views to Python, the Fabric Warehouse is a powerful option and worth the effort to evaluate.

Below is a link to the video that walks through deployment of Stored Procedures that transform a flattened table of data (249M rows) into a star schema with four dimensions and a fact table in the Fabric Warehouse that are optimized for Power BI query performance:
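Separately from the video, the general idea of turning a flattened table into a star schema can be sketched in a few lines. The example below is an illustration only: it uses Python's built-in `sqlite3` module with SQLite SQL instead of T-SQL Stored Procedures in the Fabric Warehouse, it builds two dimensions where the repo builds four, and every table name, column name, and data value (`flat_claims`, `dim_geo`, `dim_drug`, `fact_cost`) is made up for the sketch.

```python
import sqlite3


def build_star_schema(conn: sqlite3.Connection) -> None:
    """Create a tiny flat table, then derive two dimensions and a fact table from it."""
    conn.executescript(
        """
        -- Hypothetical flattened source table with made-up rows.
        CREATE TABLE flat_claims (year INTEGER, city TEXT, state TEXT, drug TEXT, total_cost REAL);
        INSERT INTO flat_claims VALUES
            (2021, 'Dallas', 'TX', 'DrugA', 100.0),
            (2022, 'Dallas', 'TX', 'DrugA', 150.0),
            (2022, 'Miami',  'FL', 'DrugB',  80.0);

        -- Dimension tables: one row per distinct value, each with a surrogate key.
        CREATE TABLE dim_geo (geo_key INTEGER PRIMARY KEY AUTOINCREMENT, city TEXT, state TEXT);
        INSERT INTO dim_geo (city, state)
            SELECT DISTINCT city, state FROM flat_claims ORDER BY state, city;

        CREATE TABLE dim_drug (drug_key INTEGER PRIMARY KEY AUTOINCREMENT, drug TEXT);
        INSERT INTO dim_drug (drug)
            SELECT DISTINCT drug FROM flat_claims ORDER BY drug;

        -- Fact table: measures plus the surrogate keys that point at the dimensions.
        CREATE TABLE fact_cost AS
            SELECT f.year, g.geo_key, d.drug_key, f.total_cost
            FROM flat_claims f
            JOIN dim_geo g ON g.city = f.city AND g.state = f.state
            JOIN dim_drug d ON d.drug = f.drug;
        """
    )


def total_cost(conn: sqlite3.Connection, year: int, city: str) -> float:
    """Query the star schema: total cost for one city in one year."""
    row = conn.execute(
        "SELECT SUM(f.total_cost) FROM fact_cost f "
        "JOIN dim_geo g ON g.geo_key = f.geo_key "
        "WHERE f.year = ? AND g.city = ?",
        (year, city),
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
build_star_schema(conn)
print(total_cost(conn, 2022, "Dallas"))
```

In a Warehouse, each of the dimension and fact steps would typically live in its own Stored Procedure so that it can be rerun and orchestrated independently.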
Updated Fabric GitHub Repo for 250M rows of CMS Healthcare data

October 14, 2024
Reposted from the Microsoft Healthcare & Life Sciences Blog at this link: https://techcommunity.microsoft.com/blog/healthcareandlifesciencesblog/updated-fabric-github-repo-for-250m-rows-of-cms-healthcare-data/4270097

Last year I teamed up with my colleague Inder Rana to build and release a GitHub repo for using CMS Medicare Part D data within Microsoft Fabric. The repo is intended to provide an example of an end-to-end analytics solution in Fabric that can be easily deployed by anyone with a Fabric environment. We have updated the analytics solution with some valuable improvements:
- The ELT (extract, load, and transform) process, end-to-end from CMS to the Gold layer of the Lakehouse, now takes less than 20 minutes to run with increased automation.
- The repo now contains logic to import new data for the year 2022 so that the solution contains 10 years of data (2013-2022) and nearly 250 million rows.
- There are two simple options to move the data from the CMS servers to the Gold layer in less than 20 minutes:
  1. Spark Notebooks orchestrated with a Pipeline, or
  2. Spark Notebooks and SQL Stored Procedures to move the data to the Gold layer. Option 2 lands the Gold layer in the Fabric Warehouse for those of you who come from a SQL versus a Python background.
The updated GitHub repo can be found at the link below. Please give us a “Star” if you find it useful! fabric-samples-healthcare/analytics-bi-directlake-starschema at main · isinghrana/fabric-samples-healthcare

The first option, using three Spark Notebooks with a single Pipeline, is reviewed in the video below.
Video: AI Podcast with a Simulation of Benjamin Franklin

September 22, 2024

What if students and lifelong learners could have an AI chat bot simulating a historical figure for educational conversations? This video is a synthetic podcast that uses Microsoft Azure OpenAI and Azure AI text to speech to create an AI simulation of Benjamin Franklin. With Azure AI, tools exist for anyone to easily create an educational AI simulation of a historical figure. The synthetic podcast walks through some examples of questions that don’t always have answers in books, such as “What would Ben Franklin have thought of the modern world?” or “What questions might Benjamin Franklin have asked Nikola Tesla?”
Moving Fabric (Power BI) Workspaces from Premium or Free Trial to new F SKUs

June 26, 2024
Reposted from the Microsoft Healthcare & Life Sciences Blog at this link: https://techcommunity.microsoft.com/blog/healthcareandlifesciencesblog/moving-fabric-power-bi-workspaces-from-premium-or-free-trial-to-new-f-skus/4176482

Do you have a Microsoft Fabric Free Trial that expires soon, or are you planning to move your Power BI Workspaces using Premium capacity (P SKUs) to Fabric capacity (F SKUs)? Numerous Power BI and Fabric customers are currently faced with these scenarios. As long as the Fabric F SKU is either 1) in the same Azure Region as your Power BI Premium P SKU, or 2) in the same Azure Region as your Power BI (Fabric) tenant for the Fabric Free Trial, the process should be a point-and-click exercise for a Workspace.

I’ve created the following two videos that are intended to provide some guidance on this topic for customers:
1. Create a Microsoft Fabric Node in Azure and Attach to a Workspace – The first video walks through the process of creating a Pay as You Go (PAYGO) Fabric F SKU node in Azure. The Fabric node is then surfaced in the Fabric (Power BI) Admin portal, additional administrators are added, and an example of assigning a new Workspace to it is provided. The video is embedded below and also linked here: https://youtu.be/Cbb-jA7Hqe0
2. Moving Fabric (Power BI) Workspace from Free Trial or P SKU to F SKU – The second video demonstrates how to move a Workspace from either the Fabric Free Trial or Power BI Premium capacity to the new Fabric capacity created in the first video. As long as the Workspace is currently in the same Azure Region as the new Fabric capacity, you’ll see that the process is rather straightforward. The video is embedded below and also linked here: https://youtu.be/q2zAf2f7wmA
Video 1 – Create a Microsoft Fabric Node in Azure and Attach to a Workspace

Video 2 – Moving Fabric (Power BI) Workspace from Free Trial or P SKU to F SKU
Video: Connect SSMS (SQL Server Management Studio) to Fabric Warehouse

April 8, 2024

The Microsoft Fabric Warehouse provides a familiar interface for users with SQL skills, even though it is based on Data Lakehouse technology. This video demonstrates how SQL Server Management Studio (SSMS) can be connected to the Fabric Warehouse to provide a user experience similar to that of SQL Server. Microsoft Entra ID is used to ensure that inbound traffic to Fabric is secure. You can recreate the demo with a Fabric solution built on open source data that can be deployed from GitHub: fabric-samples-healthcare/analytics-bi-directlake-starschema at main · isinghrana/fabric-samples-healthcare
Video: Spreadsheets to Lakehouse – A History of Microsoft Fabric

February 22, 2024

Over the course of the last year I’ve had customers and colleagues ask “What exactly is Microsoft Fabric?”, and decided there could be value in creating a presentation that takes a deep dive into that topic. The video below is a recorded version of a presentation I gave at SQL Saturday Minnesota 2023 reviewing a few decades of the data and analytics tools leading up to Fabric.

Most of you probably know that Power BI and Azure Synapse components are baked into Microsoft Fabric, but what else is in Fabric? What led up to Fabric? How do tools such as SQL Server, Azure ML, Purview, Excel, and Azure OpenAI fit in? What contributions (if any) were made by older tools such as ProClarity, Panorama, Datazen, and PerformancePoint? What are the differences between a Semantic Model, a Dataset, a Tabular Model, a Vertipaq Engine, a Cube and a Multidimensional Model? What are the operational benefits of Fabric compared to older architectures?

If you have experience with these older tools, or if you’d like to learn the history of Fabric to better understand the contemporary context and the future, hopefully you find the presentation valuable. “The Evolutionary History of Microsoft Fabric – Spreadsheets to Lakehouse” can be viewed below or at the following link: https://youtu.be/JCZnv3RhTJQ
Add LLM Prompts to Reports using Power BI Copilot for Microsoft Fabric

January 11, 2024

Interested in learning more about Power BI Copilot for Microsoft Fabric? I’ve published a new video walking through the Power BI Narrative visual with Copilot that provides a no-code (SaaS) mechanism for report developers to embed Azure OpenAI (Copilot) prompts into their reports.

There are a few great videos out there on the web for building and editing reports using Power BI Copilot, but the new Copilot Narrative visual (still in preview at time of recording) deserves more attention. LLM prompts can be added to the visual, and they can be re-run every time an end user filters a report. Are you switching your filters from “Florida in December” to “Maine in January,” and would you like to enhance the report with some external demographic data that ties to the data from your Power BI Semantic Model? All you need to do is push a button for a new narrative.

Also, because report developers can store prompts in the visual, you can instruct the Azure OpenAI LLM that is powering Copilot to add URLs and citations for the data that was used in the response.

The demo in the video uses over 220 million rows of data from the GitHub repo that I put together with Inderjit Rana for customers to try out Microsoft Fabric and the Power BI Direct Lake connector, and you can recreate it yourself at this link: fabric-samples-healthcare/analytics-bi-directlake-starschema at main · isinghrana/fabric-samples-healthcare
Video: Copilot for Data Factory in Microsoft Fabric for a Fiscal Date Table

December 4, 2023

At the time of this recording, Copilot for Data Factory is a preview capability for Microsoft Fabric Dataflows Generation 2. Copilot for Data Factory in Fabric enables no-code data transformations such as joins, metadata changes, filtering, and more. Natural language entered into the Copilot is interpreted using OpenAI LLM technology and used to generate code within Dataflows. This example uses Copilot for content generation, which goes beyond transformation. Fiscal Date Tables usually take some time, thought, and code to create for Business Intelligence tools. This example creates the basis for a Fiscal Date table from scratch.
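For readers who have not built one by hand, the sketch below shows roughly what the basis of a Fiscal Date table involves. It is written in Python purely for illustration and is not the code that Copilot generates inside a Dataflow. The function name `fiscal_date_rows`, the column names, the July fiscal year start, and the convention of labeling a fiscal year by the calendar year in which it ends are all assumptions made for this example.

```python
from datetime import date, timedelta


def fiscal_date_rows(start: date, end: date, fiscal_start_month: int = 7) -> list[dict]:
    """Return one row per day with fiscal year, quarter, and month columns.

    Assumption: a fiscal year is labeled by the calendar year in which it ends,
    so with a July start, 2023-07-01 falls in fiscal year 2024.
    """
    rows = []
    day = start
    while day <= end:
        # Months elapsed since the start of the fiscal year (0 through 11).
        offset = (day.month - fiscal_start_month) % 12
        if fiscal_start_month > 1 and day.month >= fiscal_start_month:
            fiscal_year = day.year + 1
        else:
            fiscal_year = day.year
        rows.append(
            {
                "date": day,
                "fiscal_year": fiscal_year,
                "fiscal_quarter": offset // 3 + 1,
                "fiscal_month": offset + 1,
            }
        )
        day += timedelta(days=1)
    return rows


# Two days that straddle the assumed fiscal year boundary.
for row in fiscal_date_rows(date(2023, 6, 30), date(2023, 7, 1)):
    print(row)
```

Organizations differ on the start month and on how fiscal years are labeled, so both would need to be adjusted to match your own calendar.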
