SQL Saturday Atlanta 2024

On April 30, 2024April 26, 2024 By Kristyna FerrisIn Conference Recaps, General, Microsoft Fabric, Microsoft SQL Server, Speaking, SQL Saturday

Hey there happy coders! Last weekend I had the pleasure of speaking at the SQL Saturday Atlanta event in Georgia! It was an awesome time of seeing data friends and getting to make some new friends. If you live near a SQL Saturday event and are looking for a great way to learn new things, I can’t recommend SQL Saturday’s enough. They are free and an excellent way to meet people who can help you face challenges in a new way. Below are my notes from various sessions attended as well as the materials from my own session. Enjoy!

Link to the event – https://sqlsaturday.com/2024-04-20-sqlsaturday1072/#schedule

My session – Real-Time Analytics in Fabric

Thank you so much to everyone who came out to see my first ever session on Real-Time Analytics in Fabric! We had a couple of glitches in the Logic App creation, but had an excellent time troubleshooting together as a group. Please check out my GitHub for the slide deck as well as all the code used in the demonstration. Happy coding!

Link to my GitHub – https://github.com/Anytsirk12/DataOnWheels/tree/main/Real-Time%20Analytics%20in%20Fabric

Introduction to SQL Server Essential Concepts by Bradley Ball

Goal is to understand the core things about databases to enable deeper understanding.

ACID = atomicity, consistency, isolation, and durability.
Atomicity = either all of it commits or none of it commits. Consistency = my data must be in a consistent state before a transaction completes. Isolation = transaction must operate independently from other transactions. Durability = complex logging that is the transaction log. The log has all the commit history and ensures accurate data.

Transaction isolation levels – serializable, read committed (SQL server default), read uncommitted, repeatable read, snapshot isolation. You can set some of these at the db level. Serializable = blocks anything trying to get data on the same page, you can set this at the transaction level. Read committed = I can’t read data if a transaction is currently occurring. Read uncommitted = a dirty read, grabs data that isn’t committed. Repeatable read = nobody uses this lol. It’s a shared lock that holds for a longer period of time than the typical micro second. Shared lock means everyone can read the data. Snapshot isolation = Oracle and postgres use this. Every thing is an append and update. If I have 4 transactions, 1 update and 3 read, usually update would block reads, but this would redirect readers to a copy of the data in the TempDB at the point in time they requested to read it (aka before the update commits).

DMV = dynamic management view. Bradley used one that allows you to see active sessions in your database. We can see that a read query is blocked by an uncommitted transaction. We can see the wait_type = LCK_M_S and the blocking_session_id which is our uncommitted transaction. To get around this, he can run the read script and get a dirty read by setting the isolation level to read uncommitted. To unblock the original request, he can use ROLLBACK TRANSACTION to allow it to unlock that page of data.

How does SQL Server work on the inside? We have a relational engine and a storage engine. User interacts with a SNI which translates the request to the relational engine. User > SNI > Relational Engine [command parser > optimizer (if not in planned cache otherwise goes straight to storage engine) > query executer] > Storage Engine [access methods (knows where all data is) > buffer manager (checks the data cache but if not found then goes to the disk and pulls that into the buffer pool data cache). This gets extremely complicated for other processes like in-memory OLTP. The SQL OS is what orchestrates all these items.

SQL OS – pre-emptive scheduling (operating system) & cooperative pre-emptive scheduling (accumulates wait stats to identify why something is running slower).

Locks, latches, waits. Locks are like a stop light (row, page, and table escalation). If you lock a row, it will lock a page. Latches are who watches the locks/watchmen. It’s a lock for locks. Waits are cooperative scheduling. If a query takes too long, it will give up it’s place in line willingly. That creates a signal wait which signals there’s too much lined up.

SQL data hierarchy. Records are a row. Records are on a data page (8 k). Extents are 8 pages (64 k). It’s faster to read extents than pages. Allocation bit maps are 1s and 0s that signify data on a data page that enables even faster data reads – allows governing on 400 GB of data on 1 8KB page. IAM chains and allocation units allows quick navigation of pages. Every table is divided into in row data, row overflow data (larger than 8064 k), and lob data (large object like VARCHAR max and images).

Allocation units are made of 3 types:
1. IN_ROW_DATA (also known as HoBTs or Heap or B-Trees)
2. LOB_DATA (also known as LOBs or large object data)
3. ROW_OVERFLOW_DATA (also known as SLOBs, small large object data)

Heaps vs Tables. Oracle stores data as a heap which is super fast to insert. In SQL, these have bad performance due to clustered indexes and inserting new data. This is very situational. A table is either heap or clustered index, cannot be both. But heaps can have non-clustered indexes.

B-Tree allows you to get to the record with less reads by following a logic tree (think h is before j so we don’t need to read records after j). Heaps create a 100% table scan without a clustered index. Adding the clustered index dropped that significantly to only 1 read instead of the 8000 reads.

Recovery models – full, bulk logged, simple (on-prem). In the cloud everything is full by default. Full means everything is backed up. Bulk means you can’t recover the data but you can rerun the input process. Simple means you can get a snapshot but you can’t do any point in time restore. This will largely be determined by any SLAs you have.

Transaction log. This will constantly be overwritten. Your log should be at least 2.5 as large as your largest cluster. DBCC SQLPERF(logspace) will get you all the space available for logs in the various dbs. Selecting from the log is always not recommended since it creates a lock and logs are always running, so don’t do this in prod lol. Rebuilding indexes will grow your transaction log massively. To free up space in the transaction log, you have to a backup log operation which is why those are super important.

Fun tip, when creating a table you can put DEFAULT ‘some value’ at the end of a column name to provide it a default value if one is not provided. Pretty cool.

You can use file group or piecemeal restores to restore hot data much faster then go back and restore older, cold data afterward. To restore, you must have zero locks on the db. While restoring, the database is not online. Note, if you do a file group restore, you cannot query data that is in a unrestored file group so queries like SELECT * will not work.

Tales from the field has a ton of YouTube videos on these subjects as well.

Lessons Learned using Fabric Data Factory dataflow by Belinda Allen

What are dataflows? Dataflows are a low-code interface tool for ingesting data from hundreds of data sources, transforming your data using 300+ data transformations. The goal is to allow for more people to manipulate and use data within your organization. At the heart, it’s Power Query.

Why start with dataflows as a citizen developer? It’s power query and you know that. It’s low-code data transformation. Excellent for migrating Power BI reports to Fabric.

Lots of great discussion about when it makes sense to use a dataflow gen2.

You can copy and paste power query from Power BI by going into the advanced editor OR you can hold shift and select all the queries you want then ctrl c then go to power query for a dataflow gen2 in the online service and hit ctrl v and it will populate with all your tables! Pretty neat. You can also make your relationships within the online portal.

DBA Back to Basics: Getting Started with Performance Tuning by John Sterrett

For the code visit: https://johnsterrett.com/blog-series/sql-server-performance-root-cause-analysis/

Goal of today’s session – arm anyone who is new to performance tuning with processes and sills to solve common problems.

Basic query runtime. SQL has something called wait stats that tells you what caused the query to be slow. When you run a massive query, it will go into a suspended state which will require reading from disc instead of from memory cache (T1). After that, you’re in a runable state (T2). Finally, you get to run it (T3).

Basic bottlenecks = memory, disk, CPU, network, locking blocking & deadlocks. Adding memory is typically the fastest way to improve performance.

Identify performance problems happening right now:

EXEC sp_whoisactive. This is an open source script that gives you insight into who’s running what right now. You can get this from https://whoisactive.com. The cool thing about this is there are more ways to run it than just EXEC sp_whoisactive. Identify what’s consuming the most CPU from the column. There’s also some parameters you can use like @sort_order. EXEC sp_whoIsActive @sort_order = ‘[CPU] DESC’, @get_task_info = 2. The task info parameter will give more information in a wait_info column. The best command is exec sp_whoIsActive @help = 1. This provides ALL the documentation on what it does. Adam (the creator) also has a 30 day blog series on everything it can do for you! One option to make things run faster is to kill the process causing the issue lol.

How to handle blocking. You can do explicit transactions with BEGIN TRANSACTION which will lock the table. At the end, you need to either COMMIT or ROLLBACK or else that lock holds. SQL uses pessimistic locking as default so it won’t let you read data that’s locked – it will simply wait and spin until that lock is removed. You can use exec sp_whoisactive @get_plans = 1 to get the execution plan. Be careful, the wait_info can be deceptive since the thing that takes the most time may not be the problem. It may be blocked by something else, check the blocking_session_id to ve sure. Also check the status and open_tran_count to see if something is sleeping and not committed. Keep in mind that the sql_text will only show you the last thing that ran in that session. SO if you run a select in the same session (query window) as the original update script, it won’t be blocked and can run and THAT query will show up in the who is active which can be super confusing. To resolve this issue, you can use ROLLBACK in that session to drop that UPDATE statement.

To find blocking queries use EXEC sp_whoIsActive @find_block_leaders = 1, @sort_order = ‘[blocked_session_count] DESC’.

Identifying top offenders over the long term:

There’s a feature in SQL 2016 forward called Query Store which persists data for you even after you restart data. It’s essentially a black box for SQL. Query Store is on by default in SQL 2022 and online servers. It’s available for express edition as well. Be sure to triple check this is on, because if you migrated servers it will keep the original settings from the old server. If you right click on the DB, you can navigate to query store and turn it on via Operation Mode (Requested) to Read write. Out of the box is pretty good, but you can adjust how often it refreshes and for how much history. To see if it’s enabled, you should see Query Store as a folder under the db in SSMS.

Under query store, you can select Top Resource Consuming queries. There’s lots of configuration options including time interval and what metric. SQL Server 2017 and newer have a Query Wait Statistics report as well to see what was causing pain. It’ll show you what queries were running in the blocking session. You won’t get who ran the query from query store, but you can write sp_whoisactive to a table that automatically loops (very cool). This will have overhead on top of your db, so be mindful of that.

Intro to execution plans:

Keep in mind, SQL’s goal is to get you a “good enough” plan, not necessarily the best plan. Follow the thick lines. That’s where things are happening. Cost will tell you the percentage of the total time taken.

Key lookups. It’s a fancy way to say you have an index, so we can skip the table and go straight to the data you have indexed. BUT if there’s a nest loop, then there’s an additional columns in the select statement so it’s doing that key lookup for every value. More indexes can make your select statements worse if it’s using the wrong index that isn’t best for your query.

Index tuning process.
1. Identify tables in query
2. Identify columns being selected
3. Identify filters (JOIN and WHERE)
4. Find total rows for each table in the query
5. Find selectivity (rows with filter/table rows)
6. Enable statistics io, time, and the actual execution plan
7. Run the query and document your findings
8. Review existed indexes for filters and columns selected
9. Add index for lowest selectivity adding the selected columns as included columns
10. Run the query again and document findings
11. Compare findings with baseline (step 7)
12. Repeat last 5 steps as needed

To see existing indexes, you can run sp_help ‘tableName’. In the example, there’s an index key on OnlineSalesKey but that field is not used in our filter context (joins and where statements) in the query. Order of fields in indexes do matter because it looks in that order.

Brent Ozar made a SP you can use called sp_blitzIndex that will give you a ton of info on an index for a table including stats, usage, and compression. It also includes Create TSQL and Drop TSQL for that index to alter the table.

To turn on stats, use SET STATISTICS IO, TIME ON at the beginning of the query. Be sure to also include the actual execution plan (estimated doesn’t always match what actually happened). Now we can benchmark. Use SET STATISTICS IO OFF and SET STATISTICS TIME OFF. Create an non clustered index with our filter context columns.

Derby City Data Days

On April 15, 2024April 14, 2024 By Kristyna FerrisIn General, Microsoft Fabric, Microsoft SQL Server, Power BI, Speaking, SQL Saturday

It was awesome to see the Kentucky data community come out for the first Derby City Data Days in Louisville, KY! Bringing together communities from Ohio, Tennessee, and Kentucky, the Derby City Data Days event was an excellent follow-up to Data Tune in March and deepened relationships and networks made at the Nashville event. In case you missed it, below are my notes from the sessions I attended as well as the resources for my session. Be sure to check out these speakers if you see them speaking at a conference near you!

Schedule

Building Self-Service Data Models in Power BI by John Ecken

What and why: we need to get out of the way of business insights. If you try to build one size fits all, it fits none. Make sure you keep your data models simple and streamlined.

Security is paramount to a self-service data model. RLS is a great option so folks only see their own data. You can provide access to the underlying data model for read and build which enables them to create their own reports off the data they have access to. If you give your user contributor access, then RLS will go away for that user. Keep in mind, business users need pro license OR need to be in a premium workspace.

One really great option is for people to use the analyze in Excel option to interact with the most popular BI tool – Excel. This allows them to build pivot tables that can refresh whenever needed. You can also directly connect to Power BI datasets from their organization! You can set up the display field option as well to get information about the record you connect to. Pretty slick! With this, security still applies from RLS which is awesome.

Data modeling basics – clean up your model by hiding or removing unnecessary columns (ie sorting columns). Relationships matter. Configure your data types intentionally. Appropriate naming is vital to business user success. Keep in mind where to do your transformations – SQL vs DAX (think Roche’s Maxum). Be sure to default your aggregations logically as well (year shouldn’t be summed up).

Power BI Measures – creations, quick create, measure context, time-based functions. Whenever possible, make explicit measures (using DAX) and hide the column that it was created off of so people utilize the measure you intended. Make sure you add descriptions, synonyms (for Copilot and QA), featured tables, and folders of measures. The functionality of featured tables makes it wise to use folders of measures within your fact tables.

John likes to use LOOKUP to pull dims back into the fact table so he ends up with as few tables as possible. There are drawbacks to this such as slower performance and model bloat, but the goal is for end users who don’t have data modeling experience or understanding. Not sure I agree with this method since it’s not scalable at all and destroys the purpose of a data model. Make sure you hide columns you don’t want end users to interact with.

To turn on feature table, go to the model view then go to Properties pane and toggle that is featured table button. It will require a description, the label that will populate, and the key column (cannot be hidden) that the user will put in excel as a reference for the business user to call records off of.

The PIVOT() TSQL Operations by Jeff Foushee

GitHub: https://github.com/jbfoushee/MyPresentations/tree/main/TSQL_Pivot_Operators

Be sure to look at his GitHub for the awesome source code!

Come to Lousiville on May 9th to see a presentation on JSON and TSQL.

The goal of this is to avoid FULL OUTER JOINs. This is extremely unscalable since maintenance would be terrible. We will avoid this by using pivot. Pivot means less rows, more columns. PIVOT promotes data to column headers.

You get to decide how the tuple that’s created on the pivot is aggregated (count, min, max, sum, avg, etc.). Exactly one aggregate can be applied, one column can be aggregates, and one columns values can be promoted into the column header.

PIVOT ( SUM(Col1) FOR [ID] IN ([ID_value_1], [ID_value_2], etc.)
SUM = the aggregate, ID = the column that will become more columns, the IN values = the column values from ID that will be promoted into the column header.

Time for a 3 column pivot. For this, we are doing a two column pivot and ignoring one of the fields. You can even pivot on computed fields but make sure you include the values in that inclusion clause. Be careful about adding unnecessary data.

How do you manage the VTCs (the column values that end up as column headers)? Option 1 – don’t. Option 2 – explicitly request the ones of interest and provision for future extras. Option 3 – use dynamic SQL! You can use cursor, XML, etc. Check out his ppt deck from github for code samples!

An n-column PIVOT works by essentially creating a 2-column pivot (at the end of the day it’s only two columns that ever get pivoted) and knowing which you want split into new columns.

Ugly side of PIVOT = lookups. The more fields you need to add in from additional tables, the worse performance will be. Your best option there would be to do a group by, the pivot. Another limitation is you can’t use a function in your pivot aggregation (SUM() vs SUM() *10). Get your raw data clean then pivot.

Time for UNPIVOT!

Unpivot = convert horizontal data to vertical. Less columns, more rows. Unpivot demotes column headers back into data.

Be very very careful with your data type your about to create. Remember lowest common denominator, all the values must be able to fit in one common datatype without overflow, truncation, or collation.

UNPIVOT( newColPropertyValue FOR newColPropertyName IN ([originalCol1], [originalCol2],etc.)

You need to make sure all your original columns have the same datatype. NULLs get automatically dropped out. If they are needed, you can convert them using an ISNULL function to a string value or int value depending on your need.

There’s also an option for XML-based unpivot.

Cross Apply = acquires a subset of data for each row found by the outer query. Behaves like an inner join, if no subset is found then the outer row disappears from the result set. Outer Apply is similar but it’s more like a left join. Cross Apply does keep your NULL values. You can also use a STRING_SPLIT with cross apply.

Multi-Unpivot normalizes hard-core denormalized data. You just add more UNPIVOT lines after your initial FROM statement! Make sure you have a WHERE statement to drop any records that don’t align at a column level. Something like WHERE LEFT(element1,8) = LEFT(element2, 8).

My Session – Time for Power BI To Git CI/CD

Thanks to everyone that attended my session! Had some great questions and conversations! Here’s the link to my github with the slide deck from today: https://github.com/Anytsirk12/DataOnWheels/tree/main/Power%20BI%20CICD

Medallion Architecture for Fabric by Steve Hughes

This session was awesome! We were able to watch a series of Fabric 5 minute videos and had an amazing discussion between them about options for building out a Fabric infrastructure using medallion techniques. Check out Steve’s YouTube channel for his Fabric 5 playlist and to learn more about his experience working with ALS.

SQL Saturday ATL – BI Recap

On February 13, 2024February 12, 2024 By Kristyna FerrisIn General, Speaking, SQL Saturday

First of all, wow! What an incredible event! Loved seeing everyone and meeting so many wonderful new people. Thank you to everyone who organized, spoke, and attended this event! Below are my notes from the sessions I attended as well as a link to my github with all the resources from my talk. Thank you again to all those who came to my session, you were all an amazing crowd and I really enjoyed learning with you all!

Link to the event and speakers: https://sqlsaturday.com/2024-02-10-sqlsaturday1071/#schedule

Introduction to Fabric Lakehouse by Shabnam Watson

For SQL Server users, a lakehouse will be used for landing zone and the data warehouse will be used for silver and gold layers. Currently, no demonstrated difference in price/performance between lakehouse and warehouse. Raw/Bronze = likely a lot of small files. Use with a lakehouse. Lots of ways to do (or not do) medallion architecture.

Cost is now capacity based. If you go over your capacity, there’s some smoothing but if you’re still over then they will throttle performance until smoothing catches up with your existing capacity. There is no per user license for Fabric items. F SKUs can be purchased on Azure portal, but P SKUs are through M365 licenses. F2 jobs will not be killed in the short term and the experience will match a F64. Allows you to play around for a cheaper price.

Parquet file formats are not deletable, so Delta tables are key. They build on top of parquet to soft delete (it toggles that parquet file as inactive). You can time travel because of that, but there is a cost to that since it requires a lot of storage. Fabric engines write Delta with V-order (Vertipaq). Data is automatically optimized for PBI VertiPaq engine. For example, Databricks writing to a datalake will run slower in PBI than a Fabric notebook writing to that datalake since it’ll be optimized.

Lakehouse works directly with files in the lake (workspace). Data ingestion options: Low Code/No Code (pipelines, dataflows gen2, shortcuts) and code first (notebooks, pyspark, sparkSQL). Data transformation uses the same tools. Data orchestration is through pipelines or notebooks.

Data load considerations
Use Case (Recommendation)
Small file upload from local machine (use local file upload)
Small data (dataflows)
Large data (Copy tool in pipelines)
Complex data transformations (Notebook code)

Use pipelines to orchestrate, not to transform. Use dataflows or notebooks to transform data. Notebooks can optimize your data files into larger files for faster runs.

Power BI import mode will still be fastest and direct query will still be more real time, but direct lake will allow for near real time and nearly as fast as import since there are no more scan operations. Limitations with direct lake – web modeling only (desktop is coming), it will fall back to direct query when capacity runs low which will slow your reports.

You can tie down security for the sql objects that are query-able via the SQL endpoint. You can query cross-workspace, so you can lock down the editing of data sources in a different workspace from where people consume it. Spark runs much faster in Fabric than in Synapse because the clusters are always up in Fabric so it doesn’t need to spin up a cluster to run your workload. Lakehouse land requires notebooks to interact with it. You can toggle to the SQL analytics endpoint within the service to build relationships, create measures, stored procedures, views, etc. Visual query allows you to drag and drop and right tables to generate sql behind the scenes and allows you to save that sql as a view. You can also manipulate the data using M queries. You can use CICD currently with semantic models, report, and notebooks.

How to Weave DataOps into Microsoft Fabric by John Kerski

Goal is to make production easier by combining principals from DevOps, Agile, and Lean manufacturing (every transformation steps needs to be tested for quality). If you check early, you’ll beat the customer. John has a video on this on YouTube as well.

GitHub: https://github.com/kerski
GitHub for this subject: https://github.com/kerski/fabric-dataops-patterns

Principals:

Make it reproducible using version control
Quality is paramount (TEST

Pattern 1 – DAX Query View Testing Pattern

Prerequisites: dax query view, PBIP format, Azure DevOps/Git integration (Visual Studio Code)

Use a naming convention for the tabs because we can get this out from the pbip file later. Recommended to use Measure.Tests and Column.Tests. You’ll also want to apply a schema like – TestName, ExpectedValue, ActualValue, Passed. John has a template for these tests, it’s essentially a union of rows. When you build the test cases, use the performance analyzer to grab the dax from a KPI card visual that has your testing values.

Good examples of this are making sure your date table has the correct number of days or your yearly sales are within a threshold. This only really works if you have expected values. This can be achieved by having a static dataset that eliminates variables of error to ensure your DAX is or is not the issue.

You can also test if your table still has unique values, or the % of null values in a table. DAX query view has a quick query feature to get the column features. AMAZING. You can now use the INFO.TABLES() DAX function that came out in December to grab the schema and see if it’s changing (which can break visuals). This is also vital if a data type changes and it breaks the relationship. You do need to hard code and define the schema of the table (easily create this using another script).

If you save this in the PBIP file, within the .dataset folder all your DAX queries are stored in a folder called DAX Queries! That means you can easily see if a test has changed during the PR and also easily add these tests to other models. May be wise to have sample set of tests with instructions for set up that you can easily copy/paste into other existing dataset folders.

You’ll need your workplace governance set up (dev, test, prod). Standarize your schema and naming conventions for the tests (name.environment.test). Build tests – test calcs, content, and schema.

Pattern 2 – Gen2, Notebook Automated Testing Pattern

Devops stores DAX queries > Dataflow Gen 2 can call those queries > store results in Lakehouse. Leverage notebook to orchestrate and output back into Lakehouse for PBI report to consume.

Push the pbip into DevOps git repo so DevOps has the DAX queries. You do need to have environment variables (workspace guid, branch) called out in a config file (John has instructions in his github). You can use Azure DevOps Services REST APIs to grab the items in a project/repo from a dataflow. Dataflow is called API – Azure DevOps DAX Queries – Gen 2. Lakehouse consumes the data from dataflow then we can have the notebook use Semantic Link to query a dataset using the DAX queries. You do need to have the public library for semantic link preloaded instead of using pip install. John’s sample contains easy variables to set for the branches.

Not using data factory in Fabric to call the APIs since there are a number of APIs missing for that product at the moment, but you can use a pipeline to run the dataflow and notebook in a scheduled fashion. You can work with DevOps pipeline to call this as part of a PR requirement.

Data activator could be used to trigger alerts. Be sure someone is actually monitoring the report tool since email is not super reliable.

Modernizing ETL: Leveraging Azure Databricks for Enhanced Data Integration by Joshua Higginbotham

GitHub: https://github.com/Jhiggin/Common_ETL_Patterns_Using_DataBricks/tree/main

Not everything here is transferable to Fabric, but a lot of it is.

What is Azure Databricks? Unified data analytics platform. Contains data engineering, analytics, ML, AI.

Common Architecture: ingest using ADF > store in ADLS > prep and train in Databricks > Azure Synapse and AAS to model and serve.
Microsoft recommends ADF does extraction (E) and databricks for transformation (T), but there are reasons to do more of the E within databricks.

Delta Lake Archicture: bronze/raw > silver/filtered > gold/business-level aggregates. Delta lake allows you to incrementally improve quality of data until it’s ready for consumption. Silver could contain a datalake, lots of ways to do this medallion process.

New to Databricks? Check out DataCamp and MS learning pathways. Josh can provide some resources for his favorite people to follow.

You can deploy databricks as a set environment which can make it easy to avoid egress charges or latency that would be caused by having the tenant in a region different from the data.

You have to set up access to the workspace and the items within it. Recommended using security groups for this. You can link this to source control (git hub and Azure DevOps) and do everything including PRs within the databricks UI.

Compute has lots of options. All-purpose can do everything, but may not be fastest nor cheapest option. Summary tells you how many workers, run time, memory, cores, and features that are enabled. Photon enabled is much faster. Tags are super useful for assigning accounting details so you can assign spend to teams. Don’t mess with your configuration unless you know what you’re doing. One useful config is azure data lake storage credential passthrough to pass creds through, but now it’s recommended to use unity catalog. SQL warehouses now have a serverless option that can allow you to have quick spin ups and auto shut downs when not in use to save money. Policies set up limits for scale or who can use what compute.

A mount point can tell databricks where the data is stored within ADLS. You can mount it once and can make your code much more readable and secure without the full ADLS path. dbutils.secrets.get allows you to grab information from keyvault without any access to the actual values.

Recommendation – explicitly set your schema, don’t allow databricks to guess.

Recommendation – don’t worry about partitioning delta lake until you reach 7 million records since the gains are minimal.

When you use delta tables, you can look at the delta log to see changes and time travel.
You can use %sql DESCRIBE DETAIL table to see meta data for a table.

You can use Partner Connet to open data in Power BI desktop. Keep in mind that if it’s too large it will connect with direct query which means your cluster will always be up and costing you a lot of money.

%sql DESCRIBE HISTORY will let you see what the data looked like as of a certain date. Very dependant on how much data you’re storing/pruning.

You can create CLONE s off of a delta table, so you can clone the table as of a certain time. This can also create snapshots.

Does Clone persist with vacuum? Shallow clone is a metadata copy, deep is default which also copies the full data. Unsure if either are impacted by vacuum, but the idea with clone is that it’s not persisted typically.

Configuration_Driven_Development_Demo is another notebook he walked us through (see github). He has a config file in yaml that can config for a certain client (file path and schema for the table and transformations). It does a lot of things including mapping SQL data types to spark data types. This pipeline is for more repeatable patterns for ETL.

Power BI and External Tools: This is the way! by Jason Romans

Goal is to make you a jedi of Power BI.

History of tools – back in the day you would start with a project in visual studio. Behind the scenes it was mostly editing the JSON file. Tabular Editor came out as a glorified json editor.

SSMS (SQL Server Management System): dependable and very useful for SQL. No DAX formatting, no column names displayed. It can process (full refresh, recalculate, etc) premium datasets and you can schedule that using sql jobs. Can’t connect to PBI desktop. Great for a dba.

Azure Data Studio (ADS): shiny but not the tool you’re looking for. Cannot connect to analysis services nor pbi desktop.

DAX studio: an analyst. Has intellisense, links to dax guide, dax formatter, performance tuning. You can see hidden date tables. Select “Server Timings” in DAX studio then run a query to get all the server timings to understand where it’s spending it’s time. You can also use DAX studio to compare two versions of a measure to see which runs faster.

Tabular Editor: the builder or tinkerer. There is a free and paid version. Create and edit measures & calculation groups. You can also do C# scripting and automation (my favorite!) and command line integration! Can help with source control. Tabular Editor github has tons of easy to use scripts you can use for common BI tasks (Tabular Editor GitHub and Useful Scripts Blog). The best practice analyzer can easily apply fixes and script out a fix.

PBI Explorer – navigator. Able to explore the changes. Pbip format is new and compares projects using vscode. Gitignore file is in pbip to tell git to ignore the data itself and only bring in metadata. PBI explorer can visualize the report differences. You don’t need the project file format to use PBI explorer. To run it, you have to run the executable – not located in the external tools tab in power bi desktop. It can actually visualize the changes. There’s additional functionality to even explorer and analyze the visuals.

Bravo – https://bravo.bi/. Jack of all trades. This one seems to fill in the gaps. It can help optimize data model by looking at unused columns, size, cardinality. Allows the adding of custom date tables. Exports data. First screen analyzes the model for you. Also allows you to format dax. You’ll get a warning for two things on date tables – 1 is if you still have auto date/time enabled, 2 is if you have a Date or Holiday table already in the measure. If you give the bravo date table a new name, it will resolve issue number two. The bravo date table is a DAX calculated table.

Semantic Link – A Magic Cauldron to Unlock Power BI with Python by Stephanie Bruno & Stacey Rudoy

Blog – https://data-witches.com/

Content at – bit.ly/ILDpres

Check out fabric.guru for the inspiration behind this session. Here’s the specific blog post relating to semantic link and a data catalog (https://fabric.guru/fabric-semantic-link-and-use-cases#heading-what-about-power-bi-developers).

Semantic link is a Fabric feature announced in Dec 2023. https://learn.microsoft.com/en-us/fabric/data-science/semantic-link-overview

Features – we can evaluate DAX queries, get metadata, and take actions like kicking off a refresh.

Two ways to install – pip install semantic-link or create an environment. When you create an environment, you can add libraries and configure Spark properties. Once created, you can set a notebook to use it.

Why use it? Document the tenant (semantic model catalog), save snapshot of semantic model, download usage metrics.

Data Quality – there’s a find dependency script that allows you to see what columns rely on others. You can also use the Great Expectations library.

For REST APIs, use the FabricRestClient to call the Fabric REST endpoints.

Usage metrics! With semantic link, you can loop through all the workspaces and query the model to dump the data directly into a Datalake! You cannot connect to the Admin monitoring workspace using semantic link (not supported currently). So excited to be able to loop through all the workspaces one has access to!

Before running a notebook in Fabric, you need to create an environment. Add a python library for semantic-library. Then save and publish and you can use this within the notebook. If you don’t use environment, you’ll have to pip install semantic link. There is a learn module about the semantic link library (sempy fabric read table). If you call datatsets() without anything in the parenthesis it will grab datasets within the workspaces that the notebook lives in now. You can also plot relationships.

You can also query the DMVs if needed.

Ooo there’s an option for read_table so you don’t need to evaluate a dax expression to return a Fabric dataframe. To send to lakehouse all you need to use is to_lakehouse_table function and it creates or appends data to a table in lakehouse. Keep in mind, you’ll want to remove spaces for lakehouse functionality. You can only use to_lakehouse_table if your dataframe is a Fabric dataframe. Otherwise, you’ll needto use write.format(“delta”).mode(“overwrite”).

Phil Seamark has a great blog about this as well: https://dax.tips/2023/12/05/visualize-power-bi-refresh-using-sempy/.

Power BI – Performing Data Quality Checks Using Python & SQL by me 🙂

Abstract:

Picture this, you have a report in Power BI that someone passes off to you for data quality checks. Simple enough until they clarify they need every measure checked against the source database.
There are a few ways to make sure the measures match what is in the source data system, but you need something easily repeatable and self documenting. In this presentation, I will walk through how to use python and excel to perform our data quality checks in one batch and document any differences. In order to do that, we are going to build a python script that can run Power BI REST APIs, connect to a SQL Server, and read/write to Excel.
By the end of this session, attendees will have a plug and play tool to check data from SQL against Power BI.

Link to GitHub with Power Point and final scripts: https://github.com/Anytsirk12/DataOnWheels/tree/main/SQL%20Saturday%20ATL%20-%20BI%202024

SQL Saturday Atlanta 2023 Recap

On February 27, 2023February 26, 2023 By Kristyna FerrisIn General, SQL Saturday

This past weekend, I had the pleasure of attending and speaking at SQL Saturday Atlanta! If you’re in the area, I highly recommend connecting with the local user group and getting to know fellow data nerds near you. Thank you to everyone who was able to make it out, it was great to see such a large in person SQL Saturday event post-pandemic! Big shout out to the volunteers, organizers, and sponsors who made it all happen. Thank you all for your time, hard work, and commitment that will lead to rebuilding the data community. This was the first conference I felt confident enough in my presentation to attend a session in every timeslot! Below are some takeaways I took from the sessions I attended, but there were a lot of other incredible sessions and I recommend checking out the schedule for any that interest you and reaching out to the speaker.

Attended Session Takeaways

Practical Use Cases for Composite Models (Kevin Arnold)
- Never thought of using composite models for pulling in an enterprise model with a thin slice specific to the use case of the report. Genius method for maintaining a focused enterprise model while meeting the needs of your end users.
- Perspectives can be used with personalized visuals instead of hiding columns, so end users are not overwhelmed by column and measure options.
- Field parameters can also be used/created by end users for a cultivated experience that meets their business needs without impacting the larger audience of your enterprise model. If you haven’t heard of them (I hadn’t), highly recommend checking out this link.
Planning Steps for a Power BI Report (Belinda Allen)
- Always ask stakeholders what their experience is with Power BI, it will help put all their questions and assumptions in context.
- Ask your stakeholder for the scope of success. If they can’t define what success is for the project, you have the wrong person or the wrong client.
- Show nothing in a needs gathering session. Listen and take notes. Similar to watching a movie before reading a book, it will severely limit the imagination necessary for an impactful report.
- Ask who is maintaining the data currently and who will continue to do so.
- Check out PowerBI.tips Podcast.
- Ask if they want data access or data analytics. This will let you know if a visual report is a waste of resources for them and/or if paginated report or something similar better fits their needs.
- Check out Chris Wagner’s blog, he has a great slide deck for a wireframing session with success owner after the needs gathering session.
- Host office hours or something similar to foster on-going user growth
- After project, always ask if we achieved defined success benchmarks. Try to give them a concrete ROI (ie x hours saved = x $ saved based on average salary).
- Linktr.ee/msbelindaallen
Introduction to Azure Synapse Studio Development Tools (Russel Loski)
- Synapse workspace can allow you to create T-SQL and python notebooks off items in Azure Data Lake Storage like csv and parquet files.
- Notebooks allow markdown to be side-by-side with code
- ctrl + space will bring up snippets to use within a notebook
- No indexing since it’s serverless, prepare for some wait time.
- We can promote column headers using a variable HEADER_ROW = TRUE
DataOps 101 – A Better Way to Develop and Deliver Data Analytics (John Kerski)
- Check out the The DataOps Manifesto – Read The 18 DataOps Principles
- Principles are repeatable and adaptable to new technologies
- Make everything reproducible
- Versioning and automated testing are keys to building sustainable solutions
- Check out the DataOps Cookbook and pbi-tools
Power BI Performance in 6 demos (Patrick LeBlanc & Adam Saxton from Guy in a Cube)
- To reduce the “other” line item in performance analyzer, limit the number of visual objects on a page.
- Optimize DAX when the line item is over 120 milliseconds.
- SLA for page loads is 5 seconds.
- Using drop downs in your slicer will delay DAX from running for that visual object. That deferred execution aids in speeding up the initial load.
- Tooltips run DAX behind the scenes on the initial load for visuals (you can see this by copying out the DAX query into the DAX Studio). To delay this execution until it’s needed, use a tooltip page.
- If the storage engine in DAX Studio is over 20 milliseconds, there’s opportunity to optimize.
- Variables limit the number of times a sub-measure will be run and speed up DAX queries behind visuals.
- Keep in mind while performance tuning, Power BI desktop has 3 caches – visual, report, and analysis services engine. You can clear all caches within the desktop tool except visual. To clear that cache, you need to close and reopen the PBIX file.

My Session

I cannot express enough how grateful I am for everyone who was able to make it to my session! To have so many established professionals in the field approach me afterwards telling me how well it went was a dream come true. If you’re interested in reviewing the slides and code, please check out my GitHub folder for all you will need to recreate the demo we went through. Miss it? No worries! I’ll be presenting this topic at SQLBits and the Power BI Cruise, so come join me! I’m also open to presenting at various user groups, feel free to reach out to me at kristyna@dataonwheels.com.

Again, thank you so much to everyone who made this weekend possible and please connect with me on LinkedIN and Twitter! I’d love to stay connected!

Effectively Integrating FHIR Data from Azure Health Services

On December 13, 2022December 13, 2022 By SteveIn 3Cloud, ADLS, Azure, Azure Data Studio, FHIR, JSON, Jupyter Notebook, Microsoft Azure, SQL Saturday, Synapse, T-SQL1 Comment

This blog is intended to be a follow up from the SQL Saturday 2022 in Oregon & SW Washington. In that session I presented an introduction to FHIR and JSON data produced from the Azure Health Services API’s.

With the recent updated mandates in the healthcare environment in the United States, Microsoft has continued to expand its capability to support the FHIR standard for integrating healthcare data. While the standard is well documented and Microsoft’s capabilities are expansive, it falls on data professionals to interpret that data and build meaningful reports and produce meaningful insights from the data as it is collected and integrated across environments. This requires a good working knowledge of JSON in SQL to manipulate complex data models. In the session, we did a short review of the FHIR standard and the overall implementation of FHIR in Azure. From there we reviewed the resulting data in the data lake and in Synapse. That was followed up with an overview into the heart of complex SQL using JSON functions in Synapse. Whether or not you are active in healthcare today, this will be an enlightening session on how to use JSON SQL functions within the Azure SQL platforms.

What is FHIR and why should you care?

FHIR stands for Fast Healthcare Interoperability Resources. this is the latest specification for interoperability in healthcare produced by HL7. To be clear the word fast has nothing to do with performance, but more about the ability to implement and integrate data quickly. With the latest regulations around the world in health care, this standard is the established standard for integrating healthcare data and we’ll continue to be on the forefront of this work. If you do any work in health care, you will need to understand FHIR because you will likely run across data formatted to the standard from many different sources.

FHIR is very well documented. In many ways when the standard is properly followed the JSON documents or other supported formats are effectively self-documenting. It is commonly understood that the core FHIR specification handles about 80% of the use cases in healthcare. It is designed to be flexible so that it can support specialized needs within regions or healthcare areas. For example, in the US there is a need to support race and ethnicity. The U.S. Core Implementation Guide provides guidance on the specification enhancements to support this need for U.S. healthcare organizations. You will find similar support for other countries as well as specific implementations for healthcare vendors such as Epic.

Neither the notebook, the presentation, or this blog is expected to be and exhaustive coverage of FHIR. before we move on to some of the other implementation pieces, it is important to understand one key aspect of FHIR is the basic building block called a resource. A resource is the core exchangeable content within the specification. All resources share the following characteristics:

A common way to define and represent the resource including data types and patterns
A common set of metadata which can be discovered easily
A human readable part

For more detailed information on the supported resources and other details around FHIR implementation, you should visit the following website:

Azure Health Services and the FHIR API

I will not be digging into a lot of the health care services information nor the FHIR support within Azure in this post. The important things to understand is that Microsoft has made a concerted effort to support this specification which includes technology and architectures for the extraction of data from various healthcare systems which will then use the FHIR APIs to standardize that extracted data into the FHIR spec typically in JSON files in the data lake. Because of the standardized format, Microsoft is able to supply a set of common schemas that can be used in serverless synapse to create external tables and views to accelerate the implementation and usage of data produced from the APIs. It is from this starting point that we are able to start working with the data in reporting and analytics solutions.

At this point I want to put a plug in for the company I work for. If you're interested in learning how Azure health services and the FHIR specification can be implemented at your company, we have FHIR Quick Start and FHIR Data Blueprint solutions. These solutions have been used by many other customers to achieve high levels of integration in their health care data estate. If you're interested in learning more, please reach out to us at: https://3cloudsolutions.com/get-started/

Working with the data from the FHIR API using JSON in SQL

As noted in the previous section, Azure Health Services comes with setup serverless tables and views to be used with the extracted data. However due to the complexity of FHIR, there are a number of columns within those tables and views which still contain JSON snippets. For example, there is one field for name which has several objects and arrays to support the specification. You cannot simply select the name from the table and use that as you move forward. There are many different fields like this throughout the data. For the rest of this blog and in the notebook, we will work through a number of scenarios to build a view of the patient resource that can be used for simple reporting. This view will contain a few JSON functions from SQL Server and solve simple to complex scenarios in the illustration.

The functions we will be using:

ISJSON
JSON_VALUE
OPENJSON

In addition to these functions, we will also be using the CROSS APPLY operator in SQL to join our data with relational data.

The examples in the notebook are built on the tables resulting from working with the Azure FHIR API. I am unable to provide a sample of the data to use with the set of information in the notebook currently. However, the SQL will work if you have your own FHIR implementation and a Patient resource to work with. rather than rewrite the entire contents of the notebook in the blog post, here is a link to the notebook.

If you plan to implement this in the same way, you will need Azure Data Lake, Azure Synapse serverless, and Azure Data Studio. the notebook can be opened in Azure Data Studio. If you are unfamiliar with working with notebooks inside of Azure Data Studio, you are not alone. Check out this post which discusses how to implement your first notebook in Azure Data Studio.

Building our view and SQL with JSON functions

If you decide not to open the notebook but are curious what the view looks like here is a finished product that we created in the notebook.

SELECT TOP (20) p.resourceType + '/' +  p.id as PatientResourceID
    , p.resourceType as ResourceType
    , p.id as ResourceID 
    , cast(p.[meta.versionId] as int) as VersionID 
    , cast(p.[meta.lastUpdated] as DATETIME2(7)) as LastUpdated 
    , JSON_VALUE(p.[name], '$[0].family') as LastName
    , JSON_VALUE(p.[name], '$[0].given[0]') as FirstName
    , cast(p.active as bit) as IsActive
    , p.gender as Gender 
    , CAST(p.birthDate as date) as BirthDate
    , CASE WHEN p.[maritalStatus.coding] is null THEN NULL
           WHEN  JSON_VALUE(p.[maritalStatus.coding], '$[0].system') = 'http://terminology.hl7.org/CodeSystem/v3-MaritalStatus' 
                    THEN JSON_VALUE(p.[maritalStatus.coding], '$[0].code')
           ELSE NULL
           END as MaritalStatus 
    , CASE WHEN JSON_VALUE(p.[address], '$[0].use') = 'home' THEN JSON_VALUE(p.[address], '$[0].state')
            WHEN JSON_VALUE(p.[address], '$[1].use') = 'home' THEN JSON_VALUE(p.[address], '$[1].state')
            WHEN JSON_VALUE(p.[address], '$[2].use') = 'home' THEN JSON_VALUE(p.[address], '$[2].state')
            WHEN JSON_VALUE(p.[address], '$[3].use') = 'home' THEN JSON_VALUE(p.[address], '$[3].state')
            ELSE NULL
            END as HomeStateOrProvince
    , e.Ethnicity
    , r.Race
FROM fhir.Patient p
INNER JOIN (SELECT id, max([meta.versionId]) as currentVersion FROM fhir.Patient GROUP BY id) cp
    ON p.[meta.versionId] = cp.currentVersion
    AND p.id = cp.id
LEFT JOIN 
    (SELECT p.id
        , CASE WHEN JSON_VALUE(ext.value,'$.extension[0].url') = 'ombCategory'
            THEN
            CASE WHEN JSON_VALUE(ext.value, '$.extension[1].valueString') IS NOT NULL  THEN JSON_VALUE(ext.value, '$.extension[1].valueString')
                    WHEN JSON_VALUE(ext.value, '$.extension[0].valueString') IS NOT    NULL THEN JSON_VALUE(ext.value, '$.extension[0].valueString')
                    ELSE JSON_VALUE(ext.value, '$.extension[0].valueCoding.display')
                    END
            ELSE JSON_VALUE(ext.value, '$.valueCodeableConcept.coding[0].display')
            END AS Ethnicity 
        FROM 
        (
            SELECT fp.id, fp.extension FROM fhir.Patient fp
            INNER JOIN (SELECT id, max([meta.versionId]) as currentVersion FROM fhir.Patient GROUP BY id) cp
                ON fp.[meta.versionId] = cp.currentVersion
                AND fp.id = cp.id
            WHERE ISJSON(fp.extension) =1
        ) p 
        CROSS APPLY 
            OPENJSON(p.extension,'$'
            ) as ext
        WHERE JSON_VALUE(ext.value,'$.url') = 'http://hl7.org/fhir/us/core/StructureDefinition/us-core-ethnicity'
    ) e on e.id = p.id 
LEFT JOIN 
    (SELECT p.id
        , CASE WHEN JSON_VALUE(ext.value,'$.extension[0].url') = 'ombCategory'
            THEN
            CASE WHEN JSON_VALUE(ext.value, '$.extension[3].valueString') IS NOT NULL THEN JSON_VALUE(ext.value, '$.extension[3].valueString')
                    WHEN JSON_VALUE(ext.value, '$.extension[2].valueString') IS NOT NULL THEN JSON_VALUE(ext.value, '$.extension[2].valueString')
                    WHEN JSON_VALUE(ext.value, '$.extension[1].valueString') IS NOT NULL THEN JSON_VALUE(ext.value, '$.extension[1].valueString')
                    WHEN JSON_VALUE(ext.value, '$.extension[0].valueString') IS NOT NULL THEN JSON_VALUE(ext.value, '$.extension[0].valueString')
                    ELSE JSON_VALUE(ext.value, '$.extension[0].valueCoding.display')
                    END
            ELSE JSON_VALUE(ext.value, '$.valueCodeableConcept.coding[0].display')
            END AS Race 
        FROM 
        (
            SELECT fp.id, fp.extension FROM fhir.Patient fp
            INNER JOIN (SELECT id, max([meta.versionId]) as currentVersion FROM fhir.Patient GROUP BY id) cp
                ON fp.[meta.versionId] = cp.currentVersion
                AND fp.id = cp.id
            WHERE ISJSON(fp.extension) =1
        ) p 
        CROSS APPLY 
            OPENJSON(p.extension,'$'
            ) as ext
        WHERE JSON_VALUE(ext.value,'$.url') = 'http://hl7.org/fhir/us/core/StructureDefinition/us-core-race'
    ) as r on r.id = p.id

Here is a sample of the results from that view:

PatientResourceID

ResourceType

ResourceID

VersionID

LastUpdated

LastName

FirstName

IsActive

Gender

BirthDate

MaritalStatus

HomeStateOrProvince

Ethnicity

Race

Patient/d8af7bfa-5008-4a0f-85d1-0af3448a31dd

Patient

d8af7bfa-5008-4a0f-85d1-0af3448a31dd

2022-05-31 18:07:03.2150000

DUCK

DONALD

male

1965-07-14

NULL

Patient/78cf7725-a0e1-44a4-94d4-055482781afb

Patient

78cf7725-a0e1-44a4-94d4-055482781afb

2022-05-31 18:07:30.7490000

Gretzky

Wayne

NULL

1990-05-31

NULL

Patient/9e909e52-61a1-be50-1878-a12ef8c36346

Patient

9e909e52-61a1-be50-1878-a12ef8c36346

2022-05-31 18:39:58.1780000

EVERYMAN

ADAM

NULL

male

1988-08-18

NULL

Non Hispanic or Latino

White+Asian

Patient/585f3cc0-c727-4989-9214-a7a7b60a2ade

Patient

585f3cc0-c727-4989-9214-a7a7b60a2ade

2022-05-31 13:14:57.0640000

DUCK

DONALD

male

1965-07-15

NULL

Patient/29a819c4-f553-8189-2354-9441b86d37ef

Patient

29a819c4-f553-8189-2354-9441b86d37ef

2022-05-18 15:18:40.1560000

FORD

ELAINE

NULL

female

1992-03-10

NULL

Patient/d5fe6802-a680-e762-8f43-9659340b00ac

Patient

d5fe6802-a680-e762-8f43-9659340b00ac

2022-05-18 14:39:52.2550000

EVERYMAN

ADAM

NULL

male

1961-06-15

NULL

Patient/4d661053-a8d0-148c-7023-54508fd04a52

Patient

4d661053-a8d0-148c-7023-54508fd04a52

2022-05-21 13:48:24.9720000

EVERYMAN

sam

NULL

male

1966-05-07

NULL

Not Hispanic or Latino

White

Wrapping it up

As you can see, understanding the specification well enough to build a complex SQL statement using JSON functions is required to work within FHIR effectively. Due to the complex nature of the nested JSON, you may not be able to reconcile this in tools such as power BI. Being able to build this out in SQL guarantees that you have provided you will report writers and analysts with a solid result set which can be used with confidence.

Resources summary:

Link to notebook
Link to FHIR spec
Link to 3Cloud FHIR
Link to Azure Health Services

Data on Wheels – Kristyna Ferris & Steve Hughes

Category: SQL Saturday

SQL Saturday Atlanta 2024

My session – Real-Time Analytics in Fabric

Introduction to SQL Server Essential Concepts by Bradley Ball

Lessons Learned using Fabric Data Factory dataflow by Belinda Allen

DBA Back to Basics: Getting Started with Performance Tuning by John Sterrett

Derby City Data Days

Building Self-Service Data Models in Power BI by John Ecken

The PIVOT() TSQL Operations by Jeff Foushee

My Session – Time for Power BI To Git CI/CD

Medallion Architecture for Fabric by Steve Hughes

SQL Saturday ATL – BI Recap

SQL Saturday Atlanta 2023 Recap

Attended Session Takeaways

My Session

Effectively Integrating FHIR Data from Azure Health Services

What is FHIR and why should you care?

Azure Health Services and the FHIR API

Working with the data from the FHIR API using JSON in SQL

Building our view and SQL with JSON functions

Wrapping it up

My session – Real-Time Analytics in Fabric

Introduction to SQL Server Essential Concepts by Bradley Ball

Lessons Learned using Fabric Data Factory dataflow by Belinda Allen

DBA Back to Basics: Getting Started with Performance Tuning by John Sterrett

Share this:

Building Self-Service Data Models in Power BI by John Ecken

The PIVOT() TSQL Operations by Jeff Foushee

My Session – Time for Power BI To Git CI/CD

Medallion Architecture for Fabric by Steve Hughes

Share this:

Share this:

Attended Session Takeaways

My Session

Share this:

What is FHIR and why should you care?

Azure Health Services and the FHIR API

Working with the data from the FHIR API using JSON in SQL

Building our view and SQL with JSON functions

Wrapping it up

Share this: