Using Power BI with HDInsight Part 1: Power Query and Files

With the rise of HDInsight and other Hadoop-based tools, it is valuable to understand how Power BI can help you take advantage of those big data investments. If you need to set up a cluster to work with, check out my previous posts on Setting Up an HDInsight Cluster and Loading Data Into Your New HDInsight Cluster. These posts show how to do this with no scripting required. If you prefer to script, there are a number of resources with sample scripts for doing the same work.

In this article, I will focus on using Power Query to get data from the Hadoop file structure in HDInsight. I will be using Excel 2013 with the Power Query Add-In, along with the restaurant data I loaded as described in the previous posts. If you need to create a cluster and load data, those posts walk through the whole process.

Connecting to HDInsight

First, open a new Excel workbook and click the Power Query tab. Once there, you can find the Azure HDInsight source in the From Other Sources dropdown. Select that option to open the following dialog:

[Image: Microsoft Azure HDInsight dialog in Power Query]

You will need your storage account name in order to continue, followed by the storage account key. Once you have added the key, the Navigator opens on the right side of Excel.

[Image: Navigator pane showing the cluster and default container]

It should show the name of your cluster and the default container name. Double-click the container name to open the Power Query window, which shows all the files available in the container. Even though the data is organized into folders, the view lists every file. If you have a large number of files and don't want to scroll to find them, you can click the down arrow on the Folder Path column and use the text filter to find the folder you are looking for.

[Image: Folder Path text filter in the Power Query window]

Now I have the files I want to use in Power Query. If you click a Binary link, it will open a copy of that single file; however, this is not how we want to work with the data since we have multiple files. (If you did this, remove the steps after the Filtered Rows step in the Applied Steps section.) At this point, the files I uploaded are showing.

[Image: Uploaded files listed in Power Query]

In order to work with all the files together, we need to use the Combine Binaries command.

[Image: Combine Binaries option]

This merges all the files into a single dataset. You can now do any data shaping you would like using standard Power Query methods, such as updating the column names and the data types. You can even add columns, such as a Total Amount column. Here is what my final query looks like, including the steps I applied. Be sure to give your query a meaningful name. (Note that Power Query realized my Transaction Date column was a date data type and changed it for me.)

[Image: Final query with applied steps]
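
For reference, here is a hedged sketch of what the M behind a query like this might look like. The storage URL, folder name, and column names are placeholders based on my data, and the exact function names Power Query generates (such as HdInsight.Files) may differ by version:

let
    // Connect to the cluster's storage container (placeholder URL)
    Source = HdInsight.Files("https://mystorageaccount.blob.core.windows.net/mycontainer"),
    // Keep only the files in the restaurant-data-files folder
    FilteredRows = Table.SelectRows(Source, each Text.Contains([Folder Path], "restaurant-data-files")),
    // Combine Binaries: merge the daily files into one stream and parse as CSV
    Combined = Csv.Document(Binary.Combine(FilteredRows[Content]), null, ","),
    Renamed = Table.RenameColumns(Combined,
        {{"Column1", "Ticket Number"}, {"Column2", "Transaction Date"},
         {"Column3", "Hour of Day"}, {"Column4", "Seat Number"},
         {"Column5", "App Amount"}, {"Column6", "Entree Amount"},
         {"Column7", "Non Alcoholic Amount"}, {"Column8", "Alcoholic Amount"}}),
    Typed = Table.TransformColumnTypes(Renamed,
        {{"Ticket Number", Int64.Type}, {"Transaction Date", type date},
         {"Hour of Day", Int64.Type}, {"Seat Number", Int64.Type},
         {"App Amount", type number}, {"Entree Amount", type number},
         {"Non Alcoholic Amount", type number}, {"Alcoholic Amount", type number}}),
    // Add the Total Amount column mentioned above
    AddedTotal = Table.AddColumn(Typed, "Total Amount",
        each [App Amount] + [Entree Amount] + [Non Alcoholic Amount] + [Alcoholic Amount], type number)
in
    AddedTotal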

Now click Close & Load and your data will be loaded into Excel. You can also have the query load the data into the Power Pivot data model to do additional work with it.

The next post will walk through using Power BI Desktop to load data using a Hive query.

Uploading Files to an HDInsight Cluster (No Scripting Required)

As I noted in my first post, I am not a fan of scripting. In that post we set up a cluster without using scripts to do so. Now we are going to look at how to upload files without scripts. While this will work for our demo and learning purposes, I would encourage you to use scripting for production-level loads or if you need to upload a large number of files. Not being a fan of scripting does not mean it isn't often the better overall tool. However, when I am trying to learn new functionality or work with a system using other tools (in this case Power BI), I find that methods such as these help me become productive sooner.

Prepping to Load Data Into Your New HDInsight Cluster

A key difference between standard Hadoop and HDInsight is file management. With HDInsight, you can load files into Azure Storage and have them consumed by the HDInsight cluster. Keeping with the No Scripting Required mantra, we will be using a graphical interface to load files into Azure Storage. There are a number of options out there; you will need one of them installed. For our example, we will be using the freeware version of CloudBerry Explorer for Azure Blob Storage. Once you have your tool of choice installed, you are ready to get some files.

At this point, you need some files to load. I am using some data I created for another demo. My data is in seven files of daily receipts for my restaurant for a week in March. Once you have your data, we can load it into the cluster.

Loading Data Into Your New HDInsight Cluster

As noted above, the next steps use CloudBerry Explorer to load our data. In this case, I simply copied the folder containing my files over to Azure Storage once I had connected the tool to Azure.

[Image: CloudBerry Explorer copying the folder to Azure Storage]

Once that is done, we will look at working with the data in Hadoop and with Hive.

Creating an External Hive Table and Querying It

You can create two types of tables using Hive – internal and external. An internal table loads the data into a Hive database. An external table applies a schema to the data without moving it. I will be creating an external table. I like this concept because it applies schema to the files that have been uploaded and allows other tools to interact with that data using HiveQL. When you drop an external table, the data remains because the table represents structure only.
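
As a minimal sketch of the difference (hypothetical table and folder names; the real DDL for my data appears in the next section):

-- Managed (internal) table: Hive owns the storage; DROP TABLE removes the data too
CREATE TABLE managed_sales (id INT);
-- External table: schema only; DROP TABLE leaves the underlying files in place
CREATE EXTERNAL TABLE external_sales (id INT) LOCATION 'wasb:///some-folder';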

In order to help everyone through this (in particular me), the next sections walk through the steps I took to create my table and select data from it. (This is not a detailed look at Hive, but rather a focus on the process of making HDInsight data available using HiveQL.)

Understanding the Files

The first step was to document the structure of the data in the files. Here is the data that I had in each of the files in column order:

  • Ticket Number – int
  • Ticket Date – date
  • Hour of the Day – int
  • Seat Number – int
  • App Amount – int
  • Entrée Amount – int
  • Non Alcoholic Amount – int
  • Alcoholic Amount – int

My structure was fairly simple, and each file represented a single day.
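
To make the format concrete, a hypothetical line from one of the daily files might look like this (comma-delimited, no header row, columns in the order listed above):

1001,2015-03-02,11,1,5,18,3,7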

Creating the Table

Now that I had the structure, I needed to work out the table DDL. (Reference: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/Drop/TruncateTable) Here is the syntax that I came up with.

DROP TABLE RestaurantSales;
CREATE EXTERNAL TABLE RestaurantSales (ticketnumber int, ticketdate string, hourofday int, seat int, appamount int, entreeamount int, nonalcoholamount int, alcoholamount int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE LOCATION 'wasb:///restaurant-data-files';

The first statement drops the existing table if it exists. Unlike SQL Server, no error is thrown if the table does not already exist, so there is no need to check for existence.

The second statement creates the table. One quick note on the data types: in my first attempt, the ticket date was declared as a date, but the query returned NULLs in that column, so I changed it to string. As noted earlier, this is an external table, which means it only applies schema; with schema-on-read, values that cannot be interpreted as the declared type come back as NULL rather than raising an error. The fields are terminated by a comma (','). The next part is significant because it is HDInsight-specific syntax. The location is prefixed with wasb, which tells HDInsight we are using Azure Blob Storage for the files. The three forward slashes mean we are using the default container. When I uploaded the data, I added a folder called restaurant-data-files which holds the files for HDInsight. The schema will be applied to all the files in that folder, and if new files are uploaded, they become part of the query results as well.

You can run this in the Hive Editor in the HDInsight Console discussed in the previous blog post. You can track your jobs there and see when it completes.
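
For reference, if your files lived in a container other than the default, the fully qualified wasb location includes the container and storage account names. A hedged example with placeholder names:

STORED AS TEXTFILE LOCATION 'wasb://mycontainer@mystorageaccount.blob.core.windows.net/restaurant-data-files';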

Querying the Table

The final step is checking to see if it worked. I opened a new Hive Editor window and executed the following statement:

select * from RestaurantSales;

Voila! Once the job completed, I was able to click the session link and review the results.
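
With the table in place, you can go beyond SELECT *. As a quick sketch using the columns from the DDL above, this query totals receipts by day:

SELECT ticketdate,
       SUM(appamount + entreeamount + nonalcoholamount + alcoholamount) AS totalsales
FROM RestaurantSales
GROUP BY ticketdate;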

Hopefully you were equally successful creating your first Hive table and query with your data. The next two posts will talk about using Power BI to interact with HDInsight data.

Exploring the Microsoft Azure HDInsight Query Console (No Scripting Required)

As I noted in my previous post, I am not a fan of scripting. In that post we set up a cluster without using scripts to do so. Before we get to the steps to load our own data, I wanted to take a moment and explore the HDInsight Query Console, or Dashboard (according to the link on the Azure Portal). This console is part of an HDInsight cluster; if you need to create a cluster, check out my previous post. In keeping with my desire not to script, this console provides some insights into my cluster as well as some tools to explore its contents.

I am going to highlight the areas of most interest to me, so this is by no means exhaustive. I encourage you to explore more and comment on anything compelling that I may have missed. Also, with the rate of change in Azure, it is likely some of this will change sooner rather than later. Hopefully it will only get better.

Getting Started Gallery

I would guess that this is the section most likely to change since it already has “Coming Soon” buttons. Each of these scenarios uses a step-by-step approach to help you set up a job common to HDInsight or Hadoop. When you click a scenario button, you get a list of steps to walk through to complete the solution. Here is the start page from Azure Website Log Analysis.

[Image: Azure Website Log Analysis start page]

In this scenario, you load log files and eventually use Hive to organize and use the data in Excel. This is a great way to learn about using HDInsight to solve some specific problems. Currently the scenarios are grouped into solutions with Azure data and solutions using sample data. When this article was created, the following scenarios were available:

  • Microsoft Azure Website Log Analysis Solution (uses Azure Data)
  • Microsoft Azure Storage Analytics Solution (uses Azure Data)
  • Sensor Data Analysis (uses sample data)
  • Twitter Trend Analysis (uses sample data)
  • Website Log Analysis (uses sample data)
  • Mahout Movie Recommendation (uses sample data)

Hive Editor

The next section is an online Hive editor which lets you create Hive queries. It comes with a quick query against the hivesampletable, which is included in your cluster when it is created. Go ahead and submit this query. You will see the editor create the job and track its status.

[Image: Hive Editor showing job status]

Click View Details to see the job information and the results of the query. You can download the results and logs from the details page. You can also create additional queries. For instance, if you want to see the structure of the table, you can use the following Hive query:

SHOW CREATE TABLE hivesampletable;
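
If you only want the column names and types without the full DDL, DESCRIBE provides a more compact view:

DESCRIBE hivesampletable;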

You can find more info on the Hive Query Language at https://cwiki.apache.org/confluence/display/Hive/LanguageManual.

Job History

The Job History page shows all of the jobs you have run. You can also get to the Job Details from previously run jobs here like you did from the Hive Editor window.

File Browser

The File Browser page lets you explore the files and structure in your HDInsight cluster. You click the name to drill down into the contents of your cluster. When you first open the page, your cluster name is shown in the window. Here is the basic structure from a new cluster:

  • Cluster
    • Containers (e.g. $logs)
      • Directories (e.g. example, app-logs)
        • Files (these may be multiple directories deep)

Once you get to the file location, you can download the file.

Hadoop UI and Yarn UI Pages

The Hadoop UI page shows information about the Hadoop cluster you have created. It includes an overview and some specifics on datanodes and snapshots.

The Yarn UI gives some more information about jobs and cluster metrics related to your MapReduce jobs. If you are interested in learning more about Yarn, check out this site: http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html.

This console is a great way to become more familiar with Hadoop, HDInsight, and Hive. While it does not expose every capability of the platform, it is a good no-scripting starting point. The next post will discuss how to load a simple set of data into Hadoop for analysis beyond the samples.

Excel BI Tip #28: Using member expressions in the CUBEVALUE Function

As I mentioned in my original post, Exploring Excel 2013 as Microsoft’s BI Client, I will be posting tips regularly about using Excel 2013 and later.  Much of the content will be a result of my daily interactions with business users and other BI devs.  In order to not forget what I learn or discover, I write it down … here.  I hope you too will discover something new you can use.  Enjoy!

CUBEVALUE and CUBEMEMBER

If you have not looked at the cube functions in Excel, check out two of my previous posts: Excel BI Tip #18: Using Cube Functions to Break Out of Pivot Tables and Excel BI Tip #27: CUBESET and CUBESETCOUNT Functions. Cube functions allow us to apply MDX functionality in Excel beyond pivot tables and charts. In this post, I will show how you can expand the CUBEVALUE function by using member expressions or CUBEMEMBER functions directly in its syntax. Tip #18 demonstrates how to use cell references in these functions instead.

Typically, you would use these functions separately. For example, CUBEMEMBER is used in column and row headers in a typical solution. Because of this, CUBEVALUE usually refers to a series of cells and slicers.

CUBEMEMBER used as a column header:

=CUBEMEMBER("AdvWorks","[Measures].[Customer Count]")

CUBEVALUE referring to this member:

=CUBEVALUE("AdvWorks",$D8,F$7,Slicer_Commute_Distance)

The snapshot below shows how it looks in Excel when the cell with the CUBEVALUE function is selected:

[Image: CUBEVALUE formula selected in Excel]

Using Member Expressions and CUBEMEMBER

The method illustrated above is how the cells are built when a pivot table has been converted to formulas. However, if you look closely, the CUBEVALUE function also accepts member expressions and CUBEMEMBER functions directly. The next few examples show how you can use members inline to get more functionality.

=CUBEVALUE("AdvWorks","[Measures].[Customer Count]",CUBEMEMBER("AdvWorks","[Customer].[Gender].[Female]"))

Or, using the member expression directly as text:

=CUBEVALUE("AdvWorks","[Measures].[Customer Count]","[Customer].[Gender].[Female]")

You can use any MDX expression that returns a member as one of the expressions in the function. This includes functions like StrToMember, which allows you to create dynamic expressions such as one based on the current date. Using member expressions allows for even more customization without dependencies on reference cells.
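
As an illustration, here is a hedged sketch that builds the year member by concatenating text in Excel; the AdvWorks connection and the Adventure Works date hierarchy are assumptions carried over from the examples above, and since Adventure Works data is static you would substitute a year that actually exists in the cube:

=CUBEVALUE("AdvWorks","[Measures].[Customer Count]","[Date].[Calendar].[Calendar Year].&[" & YEAR(NOW()) & "]")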

Adding a Slicer

Finally, you can add a slicer reference to the CUBEVALUE function.

=CUBEVALUE("AdvWorks","[Measures].[Customer Count]","[Customer].[Gender].[Female]",Slicer_State_Province)

Excel BI Tip #27: CUBESET and CUBESETCOUNT Functions

As I mentioned in my original post, Exploring Excel 2013 as Microsoft’s BI Client, I will be posting tips regularly about using Excel 2013 and later.  Much of the content will be a result of my daily interactions with business users and other BI devs.  In order to not forget what I learn or discover, I write it down … here.  I hope you too will discover something new you can use.  Enjoy!

CUBE Function Overview

In Tip #18, I talked about the CUBEMEMBER and CUBEVALUE functions, which can be used to refer to cube data directly. In this tip, we will look at CUBESET and CUBESETCOUNT. These functions return sets that can be supplied to the CUBEVALUE function, just like the CUBEMEMBER function. They allow you to create custom sets for calculations such as trailing 12 months, top 10 performers, or the number of customers with sales in 2009. Keep in mind that the advantage of using the cube functions with SSAS data is that they let you go beyond the limitations of pivot tables by working with MDX directly.

CUBESET Function

The CUBESET function is designed to return a valid MDX set from the cube or tabular model. The construction of the formula is CUBESET("connection","set expression","caption","sort order","sort by"). Only the connection and set expression are required; however, I highly recommend using the caption. The caption is what will show in the cell containing the formula. If you don't include one, the cell will appear empty, so you will need to mark it in some way. Before I started using captions, I color coded the cell to make sure I would not overwrite it. In a nutshell, use a caption to make your solution more usable.

Simple Set Example

In this example, we are going to pick a list of college level Education attribute members from the Adventure Works sample cube. Education is an attribute hierarchy that is part of the Customer dimension. Here is the MDX for that set:

{[Customer].[Education].&[Bachelors], [Customer].[Education].&[Graduate Degree]}

Here is how I used that in Excel with the CUBESET function (AdvWorks is my connection name).

=CUBESET("AdvWorks","{[Customer].[Education].&[Bachelors], [Customer].[Education].&[Graduate Degree]}","Completed College")

While this is a simple example, any MDX set will work in the expression. If you need a dynamic set such as Trailing 12 Months or a Top 10 list, it will work just as well. Work out the set in SQL Server Management Studio first, then put it into a CUBESET formula in a cell and refer to that cell when building your CUBEVALUE.
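
For example, here is a hedged sketch of a Top 10 set built with the MDX TOPCOUNT function; the hierarchy and measure names assume the Adventure Works sample cube:

=CUBESET("AdvWorks","TOPCOUNT([Product].[Product].[Product].Members,10,[Measures].[Internet Sales Amount])","Top 10 Products")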

CUBESETCOUNT Function

The CUBESETCOUNT function counts the members in a set returned by a CUBESET function. In MDX, we use .Count or Count() to determine the number of members in a set; that syntax does not work with the CUBESET function, and thus we have CUBESETCOUNT. This is a fairly simple function to use, but it is very powerful if you are building dynamic set counts based on reference data in your spreadsheet.

Counting a Set of Customers with Sales for the Selected Period

In this example, we have created a set which returns customers with orders in 2010. We will count this set using the CUBESETCOUNT function. In Excel, we could use the NOW() function to get the current year or some other dynamic value to build the set; however, Adventure Works is a static data set, so I am using a fixed year in the example.

nonempty( {([Customer].[Customer].[Customer],[Measures].[Internet Order Quantity])},[Date].[Calendar].[Calendar Year].&[2010])

This can be put into a CUBESET function as follows:

=CUBESET("AdvWorks","nonempty( {([Customer].[Customer].[Customer],[Measures].[Internet Order Quantity])},[Date].[Calendar].[Calendar Year].&[2010])")

Next, we put the CUBESET function into the CUBESETCOUNT function.

=CUBESETCOUNT( CUBESET("AdvWorks","nonempty( {([Customer].[Customer].[Customer],[Measures].[Internet Order Quantity])},[Date].[Calendar].[Calendar Year].&[2010])"))

This can now be used as a value for other calculations. For example, you could show the average sale amount for customers who purchased something this year or even the average purchase amount this year for those who ordered last year to see if your repeat customers are buying more.
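
As a hedged sketch of that idea, the formula below divides 2010 Internet sales by the count of customers who ordered in 2010, reusing the set from the earlier example (the measure names assume the Adventure Works sample cube):

=CUBEVALUE("AdvWorks","[Measures].[Internet Sales Amount]","[Date].[Calendar].[Calendar Year].&[2010]")/CUBESETCOUNT(CUBESET("AdvWorks","nonempty( {([Customer].[Customer].[Customer],[Measures].[Internet Order Quantity])},[Date].[Calendar].[Calendar Year].&[2010])","Customers with 2010 Orders"))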

Wrap Up

Using these functions with other cube functions increases the flexibility of your dashboard solution and allows you to create many more creative visualizations. Remember to use captions with any CUBESET functions you plan to reference in your formulas, and remember that CUBESETCOUNT needs a CUBESET result to count.
