SQLBits 2024 Recap

Had an absolutely amazing time at SQLBits this year! It was lovely to see all my data friends again, and I had the opportunity to introduce my husband to everyone as well! In case you missed a session, or are curious about what I learned at these conferences, below are my notes from the various sessions I attended.

Here is a link to my GitHub that contains the slides and code from my presentation on creating a M-agical Date Table. Thank you to everyone who attended! https://github.com/Anytsirk12/DataOnWheels/tree/main/SQLBits%202024

What’s new in Power Query in Power BI, Fabric and Excel? by Chris Webb

Power query online is coming to PBI desktop and will include diagram & schema views, the query plan, and step folding indicators! Currently in private preview, coming to public preview later this year.

Power query in Excel. Nested data types now work (very cool, check this out if you don't know what data types are in Excel). Python in Excel can now reference power query queries, which means we don't have to load data into the worksheet first! That allows analysis of even more data because you can transform it upstream of the worksheet itself. Very cool.

Example:

import seaborn as sns

mysales = xl("Sales")

sns.barplot(mysales, x="Product", y="sales", ci=None)

Limitations – can’t use data in the current worksheet.

Power query is in Excel online! Later this year, you will be able to connect to data sources that require authentication (like SQL Server!).

Power query can be exported as a template! This will create a .pqt (power query template). You can’t import a template file in Excel, but you can in Dataflows Gen2 in Fabric.

Fabric Dataflow Gen2 – data is not stored in the dataflow itself. We now have Power Query copilot! It's in preview, but it's coming. You will need an F64 to use it.

Paginated reports are getting power query!! It will be released to public preview very soon. All the data sources available in Power BI's power query will be supported, which allows snowflake and big query connections annnddd excel! You can even pass parameters to the power query get data step: create a parameter within the power query editor, then go to the paginated report parameters and create the parameter there, and finally go to dataset properties and add a parameter statement that sets those two equal to each other.
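For context, a power query parameter is just an M expression tagged with metadata, and that's what the paginated report parameter gets bound to. Here's a minimal sketch of one as it appears in the advanced editor (the parameter name, value, and type are hypothetical):

//A hypothetical text parameter in the advanced editor
"Europe" meta [
    IsParameterQuery = true,
    Type = "Text",
    IsParameterQueryRequired = true
]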

Rapid PBI Dev with ChatGPT, Tips & Tricks by Pedro Reis

Scenario 1: DAX with SVG & CSS.
SVG – The problem is to create an SVG that can have conditional formatting and doesn't fill the entire cell. ChatGPT does best with some working code, so give it a sample of the SVG you want and ask ChatGPT to make it dynamic. The demo uses GPT-4 so we can include images if needed. He uses Kerry's repo to get the code sample that we put into GPT (KerryKolosko.com/portfolio/progress-callout/).
CSS – uses the HTML Content (lite) visual (it's a certified visual). The prompt asks GPT to update the measure to increase and decrease the size of the visual every 2 seconds. Pretty awesome looking. He also asks it to move certain directions for certain soccer players (Ronaldo right and Neymar zigzag).

Scenario 2: Data modeling and analysis. He uses Vertabelo to visualize a data model, and it creates a SQL script to generate the model. He passes a screenshot of the data model schema to ChatGPT, and it was able to identify that a PK was missing and a data type was incorrect (int for company name instead of varchar). Super neat. The image does need to be a good resolution. He also put data from Excel into GPT to have it identify issues like blanks and data quality problems (mismatched data types in the same column). It could grab invalid entries and some outliers, but it's not very reliable right now.

Scenario 3: Report design assessment. He grabbed four screenshots from a Power BI challenge and asked GPT to judge the submissions with a 1-10 rating based on the screenshots and to create the criteria to evaluate the reports. Very neat! Then he asked it to suggest improvements. It wasn't very specific at first, so he asked for specific items like alternatives to donut charts and other issues that lowered the score.

Scenario 4: Troubleshooting. He asks it to act as a Power BI expert. It was able to take questions he found on the forum, but sometimes it hallucinates answers, and it's not trained on visual calculations yet.

Scenario 5: Knowledge improvement. He asked GPT to create questions to test his knowledge of Power BI models, and it even confirmed the correct answers. It also created a Word doc as a shareable document that can be used to test others in the future.

Streamlining Power BI with PowerShell by Sander Stad and Linda Torrang

Data-Masterminds/PSPowerBITools is an open-source module that is in the PowerShell Gallery as well as on GitHub. GitHub link: https://github.com/Data-Masterminds/PSPowerBITools.

There's a module called MicrosoftPowerBIMgmt that is used with a service account to wrap around the REST API and allow easier calls of those APIs through PowerShell. When running PowerShell, to see the results in a grid, add " | Out-GridView" to the end of your command and it creates a very nice view of the data. All demos are available on GitHub. The first demo grabs all the orphaned workspaces, then assigns an admin. Their module includes more information like the number of users, reports, datasets, etc. They also have a much easier command that will grab all the activity events from a given number of days (it will handle the loop for you).

Another great use case is moving workspaces from a P SKU capacity to an F SKU capacity automatically. The admin management module doesn't work for this, but the PSPowerBITools module can! To see help, use "Get-Help" in front of the command to get details and examples. For more details, add "-Detailed" at the end. Adding "-ShowWindow" at the end will pop open a separate window to see the response (note – it doesn't work with -Detailed but does contain the details). To export, use this script: $workspaces | Export-Csv -NoClobber -NoTypeInformation -Path C:\Temp\export_workspaces.csv.

The goal is to use the module for more ad hoc admin tasks like removing unused reports, personal workspaces, data sources, and licenses.

New to PowerShell like me? Check out this article on how to automate PowerShell scripts to run on your laptop: https://www.technewstoday.com/how-to-automate-powershell-scripts/.

Using ALLEXCEPT or ALL-VALUES by Marco Russo

ALL: removes all the filters from expanded table/columns specified.
REMOVEFILTERS: like ALL (it’s the same function). We use this for code readability.
ALLEXCEPT: removes filters from expanded table in the first argument, keeping the filter in the following table/columns arguments.

The following produce the same result:
ALLEXCEPT(customer, customer[state], customer[country])
REMOVEFILTERS(customer[customerkey], customer[name], customer[city])
The nice thing about ALLEXCEPT is that it’s future proof against additional fields being added to the table.

But is it the correct tool?

Demo – goal is to get % of continent sales by getting country or state or city sales / total continent sales

To get that total continent sales, we need sales amount in a different filter context. The simplest way to do this is using ALLEXCEPT since we always only want the continent field from the customer table.
CALCULATE([Sales Amount], ALLEXCEPT( customer, customer[continent]))

This works if we include fields from the same table – customer. If we pull in fields from another table, it won't work as we want because it will be filtered by a different table's filter context. To get rid of that, you can add REMOVEFILTERS('table') for each one, OR we can change the ALLEXCEPT to use the fact table.
CALCULATE([Sales Amount], ALLEXCEPT( Sales, customer[continent]))

One problem is if a user drops the continent from the visual, the filter context no longer includes continent so the ALLEXCEPT will no longer work properly as it will function as an ALL.

To fix this, use REMOVEFILTERS and VALUES. VALUES captures the continent value in the current filter context, then REMOVEFILTERS removes the filter context on customer. Lastly, the captured value is applied back into the filter context, so we end up with what we want: only continent is filtered. This works quickly because we know each country will only have one continent.

CALCULATE( [Sales Amount], REMOVEFILTERS( Customer ), VALUES ( Customer[Continent] ) )

Performance difference? Yes – ALLEXCEPT will be faster, but the REMOVEFILTERS and VALUES should not be too much slower.

Analytics at the speed of direct lake by Phillip Seamark and Patrick LeBlanc

Phil did a 100 minute version of this session on Wednesday.

What is direct lake? We will evaluate it over three categories – time to import data, model size, and query speed. Direct lake mode takes care of the bad things from both import and direct query modes. Direct lake only works with one data source. Data is paged into the semantic model on demand, triggered by queries. Tables can have resident and non-resident columns. Column data can get evicted for many reasons. Direct lake falls back to SQL Server for suitable sub-queries, and "framing" provides data consistency in Power BI reports. A direct lake model starts life with no data in memory; data is only pulled in as triggered by the DAX query, and it stays in the model for hours once pulled. We only grab the columns we need, which saves a ton of memory.

Limitations – no calculated columns or tables. No composite models. Calc groups & field parameters are allowed. Can’t be used with views, can only be used with security defined in the semantic model (you can have RLS on the semantic model itself). Not all data types are supported. If any of these are true, Power BI will fall back to Direct Query mode.

SKU requirements: only PBI premium P and F skus. No PPU nor pro license.

Why parquet? The column-oriented format is optimised for data storage and retrieval, with efficient data compression and encoding for data in bulk. Parquet is also available in many languages including rust, java, c++, python, etc. It is the lingua franca of data storage formats. It's enhanced with V-Order inside of Fabric, which gives you extra compression. Analysis Services can read, and even prefers, that form of compression, so we don't need to burn compute on re-compressing. V-Order is still within open-source standards, so this doesn't lock you into Fabric only. Microsoft has been working on V-Order since 2009 as part of Analysis Services. Row groups within parquet files correspond directly to the partitions/segments in a semantic model. Power BI only keeps one dictionary, but parquet has a different dictionary id for each row group/partition. When falling back to direct query, low cardinality is key because most of the compute will go to the dictionary values that need to be called across the various partitions. This is why it's VITAL to only keep data you need; for example, drop or separate times from datetimes and limit decimals to 2 places whenever possible.

You can only use a warehouse OR a data lake. So if you need tables from both, make a shortcut from the data lake to the warehouse and just use the warehouse. You can only create a direct lake model in the online service model viewer or in Tabular editor. You cannot create it in PBI desktop today.

Microsoft data demo. One file was 880 GB in CSV > 268 GB in Parquet > 84 GB with Parquet with V-ORDER.

Sempy is a Python library that allows you to interact with semantic models in Python.

In the demo, Phil is using the sempy and semantic-link libraries. There are lots of advantages to using the online service for building the semantic model – no loading of data onto our machines, which makes authoring much faster. After building this, you can run a DMV in DAX Studio. Next, create a measure/visual that has a simple max of a column. That will create a query plan to reach out to parquet and get the entire column of data needed for that action, then filter as needed. After that, the column will be in memory. Unlike direct query, direct lake will load data into the model and keep it for a bit until eviction. Direct lake will also evict pages of columns as needed when other fields are queried. You can run DAX queries against the model using a Python notebook, and you can see via DMV what happened.

Framing. Framing is a point-in-time way of tracking what data can be queried by direct lake. Why is this important? Data consistency for PBI reports, since delta lake data is transient for many reasons. The ETL process ingests data to delta lake tables, and framing creates one version of truth of the data at that point in time. You can do time traveling because of this, but keep in mind that default models reframe pretty regularly. If you have a big important report to manage, you need to make sure to include reframing in the ETL process so it reloads the data properly. This can be triggered via notebook, service, API, etc. You can also do this in TMSL via SSMS, which gives you more granular table control.

DAX to SQL fallback. Each capacity has guardrails around it. If you have more than a certain number of rows in a table, then it will fall back to direct query automatically. Optimization of parquet files will be key to stop it from falling back.

Deep (Sky)diving into Data Factory in Microsoft Fabric by Jeroen Luitwieler

Data Pipeline Performance – how the copy activity scales.

Pipeline processing – less memory & less total duration. There's no need to load everything into memory and then write; reads and writes are done in parallel.

Producer and consumer design. Data is partitioned for multiple concurrent connections (even for single large files). Full node utilization. Data can be partitioned differently between source and sink to avoid any starving/idle connections.

Partitions come from a variety of sources including physical partitions on the DB side, dynamic partitions from different queries, multiple files, and multiple parts from a single file.

Copy Parallelism concepts: ITO (Intelligent Throughput Optimization), Parallel copy, max concurrent connections. ITO – A measure that represents the power used which is a combo of CPU, memory, and network resource allocation used for a single copy activity. Parallel copy – max number of threads within the copy activity that read from source and write to sink in parallel. Max concurrent connections – the upper limit of concurrent connections established to the data store during the activity run (helps avoid throttling).

Metadata-driven copy activity. Build large-scale copy pipelines with a metadata-driven approach in copy inside Fabric Data pipelines. You can't parameterize pipelines yet, but it is coming. The use case is to copy data from multiple SQL tables and land them as CSV files in Fabric. A control table holds the names of the objects (database, schema, table name, destination file name, etc.). The design uses a lookup to get this control table, passes it through a ForEach loop, and iterates through it to copy the tables into Fabric.

Dataflows Gen2 – performance optimization and AI-infused experiences. 4 performance principles – delegate to the most capable resource, sometimes you have to do the most expensive thing first, divide and conquer, and be lazy: do as little work as possible.

Query folding – known as query delegation, push down, remote/distributed query evaluation. Wherever possible, the script in Power Query Editor should be translated to a native query. The native query is then executed by the underlying data source.
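As a hedged sketch of what folding looks like (the server, database, and table names here are hypothetical), the filter and column selection below can fold into a single native SELECT against SQL Server – you can confirm this in the Power Query editor by right-clicking a step and checking whether "View Native Query" is available:

let
    //Connect to a hypothetical SQL Server database
    Source = Sql.Database("myserver", "AdventureWorks"),
    Sales = Source{[Schema = "dbo", Item = "Sales"]}[Data],
    //Both steps below fold into the native query as a WHERE clause and a column list
    Filtered = Table.SelectRows(Sales, each [Amount] > 100),
    Trimmed = Table.SelectColumns(Filtered, {"OrderID", "Amount"})
in
    Trimmed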

Staging – load data into Fabric storage (the staging lakehouse) as a first step. The staged data can then be referenced by downstream queries that benefit from SQL compute over the staged data. The staged data is referenced using the SQL endpoint on the lakehouse. Staging tips – data sources without query folding, like files, are great candidates for staging. For faster authoring, have one dataflow for staging and another for transformations.

Partitioning – a way to break down a big query into smaller pieces and run in parallel. Fast copy does this in the background.

Lazy evaluation – power query only evaluates and executes the necessary steps to get the final output. It checks step dependencies then applies query optimization to make the query as efficient as possible.
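Here's a tiny sketch of that laziness (the ten-minute delay is contrived for illustration): because the query output below never references the expensive step, Power Query never evaluates it.

let
    Source = #table({"A"}, {{1}, {2}}),
    //This step would wait ten minutes if it were ever evaluated...
    Expensive = Function.InvokeAfter(() => Table.RowCount(Source), #duration(0, 0, 10, 0)),
    //...but the output never references it, so it never runs
    Output = Source
in
    Output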

AI-infused experiences – column by example, table by example, fuzzy merge, data profiling, column pair suggestions. Table by example from the web is pretty awesome. You can extract data from any HTML page and have it figure out how to turn that HTML into the table you want to see. It basically allows you to screen scrape as part of your ETL process, and it will refresh if any changes are made on the website. Table by example for a text/CSV file is also amazing; it can figure out how to take a human-readable file and split it into multiple columns. Game changers!

Website Analytics in my pocket using Microsoft Fabric by Catherine Wilhelmsen

Catherine built her own website and blog hosted on Cloudflare. Cloudflare has an API that pulls all the stats. To build this, she worked with the Cloudflare GraphQL API. It has one single endpoint for all API calls, and queries are passed using a JSON object.

Data is returned as a json object. She built the orchestration inside of Fabric Data Factory.

The first portion sets the date; it checks whether she wants to use today's date or a custom date.

After that, she accesses the data using a copy data activity. The last step copies that data into a lakehouse. The file name is dynamic so she can capture the date the data was pulled.

The destination is a Lakehouse where she can build a report off of it. The best part of the copy data is the ability to map that nested JSON response into destination columns within the lakehouse. As a bonus for herself, she made a mobile layout so she can open it on her phone and keep track of it on the go.

Next steps for her project – see country statistics, rank content by popularity, compare statistics across time periods.

Calculation Groups & C# the perfect couple by Paulina Jedrzejewska

Love story between calculation groups and C#.

All slides are available on her github: https://github.com/pjedrzejewska

What makes it a perfect couple? We can add C# to our calc group dev to make it much more flexible.

Calc groups don't allow you to create a table with customizable sorting of the measures, or to mix in measures that don't use the calc group within the same visual. For example, if I have previous year and current year in my calc group, it would display all of my current year measures together and all the prior year measures together, but I may want to see current year sales right next to prior year sales instead of having current year sales, revenue, cost, etc. and then the prior year versions much further away. To solve this, we will create separate measures for each time intelligence option. But why do that by hand when we can automate it?

What can we automate? Creating & deleting measures, renaming objects, moving objects to folders, formatting objects.

Why automate? Works through a bunch of objects in seconds, reusable, homogeneous structure and naming, and no more tedious tasks for you!

For this demo, we will use Tabular Editor 2, PBI Desktop, Code Editor (visual studio code). She uses a .cs file to script out the code. All the columns in the fact table that will be turned into a measure have a prefix of “KPI”.

column.DaxObjectFullName will give you the full table, column reference which allows you to reference that dynamically within C# code.

Calculation groups are essentially a template for your measures. It’s a feature in data modeling that allows you to take a calculation item and use it across multiple measures. These are very useful for time intelligence, currency conversion, dynamic measure formatting, and dynamic aggregations.

When creating a calc group, always be sure to include one calculation item for the actual measure value by using SELECTEDMEASURE(). You can also reference one calculation item inside another as a filter.

Now that we have a calc group, the C# can loop through every base measure and through all the calculation group items and generate our new measures off of that. Remember all the code is in her github 🙂

Another great use case for C# is to add descriptions to all the measures.

This allows for self-documented code.

Power BI Cruise Recap – Deep Data Dives

Thank you so much to those of you who were able to attend this three-hour, C# packed session! The Power BI Cruise was an incredible conference, and I cannot recommend it enough if you’re looking for an opportunity to dig deep on Power BI topics with other passionate individuals. From the first ever Power BI bingo to a three hour session on TMDL from Mathias himself, this was a trip that will stick with me for a while. In this post, I’ll include notes from my session and the sessions I attended plus links to the GitHub. Let me know if anything really piques your interest and you’d like to see more detailed blog posts about it!

  1. My Session – Power BI, C#, and TMDL!
  2. TMDL & Source Control – Mathias Thierbach
  3. Hacking the Visuals – Stepan Resl
  4. Ask Me Anything – Jeroen (Jay) ter Heerdt

Huge shout out to the other speakers and of course the incredible organizers who stuck with this amazing vision through the pandemic and made it happen. Thank you all so much for making this event full of passion and unparalleled learning. I don’t know how you all made this happen without sponsors, but it was truly an incredible event. So thank you again Asgeir, Erik, Johan, and Just!

My Session – Power BI, C#, and TMDL!

Now you may have noticed I give this session – Power BI Meets Programmability – quite often. Bringing programmability to the world of Power BI is something I greatly enjoy, but this session was truly unique. Having three hours to take a deep dive into what C# can do for Power BI development was the opportunity of a lifetime. With two more hours than usual, we explored creating calculation groups in this session for the first time, as well as automatically generating a set of measures based on data types. Not only that, but the attendees were good sports and stuck around for an extra half hour to create calculation groups in C# using TMDL! I plan on doing a very detailed blog on this in the future, but the GitHub folder for this conference (link below) contains TMDL code live coded by the creator himself – Mathias Thierbach! The script contains a section that will create a TMDL version of your data model, allow you to interact with TMDL files within your solution, and publish those changes back to the data model. It blew my mind, and I hope it blows yours as well!

Additional Resources

The final code can be found here: https://github.com/Anytsirk12/DataOnWheels/tree/main/Power%20BI%20Cruise.

TMDL & Source Control – Mathias Thierbach

Speaking of TMDL, this conference hosted the incredible Mathias Thierbach for a session on TMDL & source control. Below are my notes from that session, but if you ever have a chance to hear him speak on the subject, I highly recommend it. The notes are in bullet-point format for now; as I dig further into TMDL, I look forward to writing more detailed blog posts about this incredible language and what we can do with it.

  • Key goals of TMDL: readable, editable, and allows collaboration
  • There will be a VS Code extension for TMDL coming soon (we were given a special preview in the session) that will have syntax highlighting for both DAX and M as well as TMDL
  • Shortcut to get to the TOM docs: goto.pbi.tools/tom-docs or pbi.onl/tools
  • TMDL is completely case insensitive!
    • Default export of .tmd file is camel case
  • Boolean properties: you only have to say the Boolean property to enforce the default value
    • To explicitly set it, syntax is:
      property: true/false
  • No matter how many times you serialize TMDL, the order of the objects will always be the same
    • Tables: partitions, calc groups, columns, hierarchies…
    • You can reorder objects to your specifications (aka no longer alphabetical)
  • Meta data IsParameterQuery = true makes it a parameter
  • Ordinal is not a table property in TOM, but you can create one in TMDL! It won’t show up in this order in Desktop…yet. But it will allow consistent order. It’s a weak ordinal, in cases of conflict it won’t cause any errors. It should default to alphabetically on conflict.
  • Indentation for expressions is not white space sensitive (DAX and M)
    • It finds a shared whitespace to the left to create indentation in the final model
  • Indentation only matters for nested objects
  • To fold levels in VS Code use Ctrl + k + [level_number]
  • To unfold levels in VS Code use Ctrl + k + j to unfold all levels
  • Unicode is supported
  • Shared expressions = parameters
  • In Preview 1 (what we have today), perspectives can be created but can't be read back
    • Will be fixed in Preview 2
  • 4 previews total, goal is for GA at end of the year
  • tableName.columnName etc.
    • If there are spaces, use single quotes as a delimiter
  • Description is added by using /// text on top of the object declaration. Can be multi-line and line break is maintained.
  • No comment support but you can comment within M and DAX
    • Triple quotes ('''[formula]''') will be a safe way to mark M/DAX expression sections, likely coming in Preview 3 or 4 later this year
  • Language spec will be published
  • Default properties are available to find in Rui’s documentation
    • Default property is what is after the = for the object
    • This can be found on the main TMDL overview page

How to create a TMDL folder from Tabular Editor (needs version 2.18.1 or later)

Open the folder you’ve created within VS Code.

Hacking the Visuals – Stepan Resl

Hacking = use various available means to get better insights from data.

This session absolutely blew my mind. Stepan does an incredible job showing the art of the possible by hacking into expression properties within tabular editor as well as calculation group formatting. We had an awesome time connecting at this conference and I hope to see him again at future conferences! In the meantime, below is a link to his GitHub as well as a blog post that covers a lot of the same material as his session. Be prepared to have your mind blown on this one.

Stepan’s GitHub for this session:

https://github.com/tirnovar/public-speaking/tree/main/Power%20BI/Power%20BI%20Cruise%20-%202023

Additional blog post by Stepan on this subject: https://datameerkat.com/conditional-formatting-calculation-groups

  • For bar charts – to easily highlight what is over target, change the color within the bar at the target with grey below the target
  • Visual Vocabulary = a great reference for chart types
    • Ft.com/vocabulary
  • \ used in a format string will escape the wildcard characters
  • Keep in mind audience, green is not positive in Japan
  • topProductByLocation = TOPN(1, ALLSELECTED(Products[ProductName]),[# Total Quantity])
  • # sold quantity by Top Product = VAR _product = [topProductByLocation] RETURN CALCULATE([# Total Quantity], Products[ProductName] = _product)
    • This allows us to use a measure as a filter by precalculating it in a variable
  • The best whitespace option is to use UNICHAR(8203), a zero-width space that renders as a very, very small dot
  • Color picker: Just Color Picker
  • Use smart text in subtitles in charts to tell people when we hit a target or how close we are
  • You can use format to change numbers to text with a thousands delimiter! See the subtitle measure
  • You can use ; within format to do an if statement! BUT don’t forget to escape any special characters using \
    • Example: "Sales target was set to " & FORMAT(_target,"#,,,,,,,,,,#")
      & " and we have sold " & FORMAT(_sold,"#,,,,,,,,,,#")
      & " which means we " & FORMAT(_sold-_target,"\have;\haven't") & " fulfilled our target."
    • In this case "h" is a special character that is trying to get an hour value and will show 0 if you don't escape it
  • If you create conditional formatting in bar charts, the colors will stay if you switch to a line chart even though conditional colors are not available in the UI
    • Same thing happens if you have an x constant line in a line chart and switch to bar chart
  • You can customize spacing below titles and subtitles.

I’ve added the final PBIX from this session to the GitHub, keep in mind a lot of the features that make it possible are only visible through tabular editor: https://github.com/Anytsirk12/DataOnWheels/blob/main/Power%20BI%20Cruise/Hacking%20the%20Visuals.pbix

Ask Me Anything – Jeroen (Jay) ter Heerdt

Meeting Jay and chatting during various free time on the ship was definitely a highlight of the trip. He’s incredibly knowledgeable on Power BI (especially the elusive DAX language), and has a deep understanding of the Microsoft strategy on new feature creation. It was a pleasure getting to laugh and brainstorm together, and I can’t wait to hang out again at future conferences. Below are some questions that were asked during the AMA session as well as answers provided and links I found while Jay was talking.

  • Why can’t we use RLS and USERELATIONSHIP() together?
    • RLS blocks USERELATIONSHIP() since USERELATIONSHIP() can (but doesn’t always) go around the RLS relationship and creates a security risk
  • Different people own different visuals which is why the options and customizations vary so widely
    • Miguel Myers did a session on the future of visuals in Power BI. Huge plans with him on board to own visuals and align everything
    • There is a LinkedIn group called PBI Core Vision – put comments there on what can be improved in the future (it's also on Twitter and LinkedIn)
  • When we change the hierarchy name or order via XMLA, it now breaks the visual. Can we get this to update in the visual in the service with adjusted level orders
    • SEND TO JAY
    • Also ask him about Top N + Others
  • Tableau is the primary competition for PBI, Looker (from Google) is up and coming, and third is Qlik because it's still getting new clients in Europe
    • These are taken into consideration when making a case for new items
    • Cannot blatantly copy items
    • The agreement with Tableau names specific people who cannot even look at Tableau (they can only look as a user, not at the debugger nor the dll files)
  • In the future, hoping to have functions that could be shared within a community for DAX
  • Could we get Rulers or aligned grid lines in PBI Desktop?
    • Rosie or Ree-ann would own this
    • Can we customize grids?
    • Not part of visual team, it’s placed under on-boarding
  • Advancements to API?
    • Hardening of the PBIX format will help with getting better Scanner APIs that go deeper (visual-level and page-level details on what's available within the tenant)
  • DataZen got killed and unsure why
  • Jay would investigate how to get rid of filter direction in DAX
  • Why can I not put a measure name in smart narratives?
    • You can do this by creating a shape and using fx to put in the text you want

Power BI: Dynamically Removing Errors From Columns in M

If you have excel data, or user-entered data, you have likely experienced frustration from repeatedly seeing the error message below upon hitting “Close & Apply” or refreshing your data model. It can be extremely frustrating to see failed refresh emails come in every time a user types in a text value in a number field or adds a formula to the report that results in #N/A.

End users that are not trained in data governance but are actively involved in maintaining a data set can easily make data entry mistakes or create inconsistent data types within columns. For example, you may have a column in the dataset called “Sales” and instead of 0, someone may type “None” or “NA”. When this gets loaded into Power BI, Power BI will not know how to convert the text value “None” to a number, and it will throw an error on the refresh of the report.

One way to mitigate the impact of user-entered data is to replace errors with null values. This is not ideal since it doesn’t fix the data entry issues, but it does enable reports to still be refreshed and used while the data issues are addressed. In Power Query, you can manually replace the errors with null by going to the “Transform” tab then selecting the drop down for “Replace Values” and choosing “Replace Errors”.

After selecting “Replace Errors”, type null (all lowercase) into the value field to replace your errors with a null value that will allow the report to refresh no matter the data type.
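For a single column, the step that selection generates looks something like this (assuming a column named "Sales" and a previous step named #"Changed Type"):

= Table.ReplaceErrorValues(#"Changed Type", {{"Sales", null}})

That's fine for one or two columns, but scaling it to every column in every query is where the dynamic approach below comes in.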

As you can imagine, it can get quite tedious to apply this step to every column in every query in your report. Good news! I have a query that you can add to the end of your applied steps that will make sure every single column in your query will replace errors with nulls. Let’s walk through it.

Final Query:

let
    Source = Excel.Workbook(File.Contents("C:\Users\KristynaHughes\OneDrive - DataOnWheels\GitHub\AdventureWorks Sales w Errors.xlsx"), null, true),
    Sales_data_Sheet = Source{[Item="Sales_data",Kind="Sheet"]}[Data],
    #"Promoted Headers" = Table.PromoteHeaders(Sales_data_Sheet, [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"SalesOrderLineKey", Int64.Type}, {"ResellerKey", Int64.Type}, {"CustomerKey", Int64.Type}, {"ProductKey", Int64.Type}, {"OrderDateKey", Int64.Type}, {"DueDateKey", Int64.Type}, {"ShipDateKey", Int64.Type}, {"SalesTerritoryKey", Int64.Type}, {"Order Quantity", Int64.Type}, {"Unit Price", type number}, {"Extended Amount", type number}, {"Unit Price Discount Pct", Int64.Type}, {"Product Standard Cost", type number}, {"Total Product Cost", type number}, {"Sales Amount", type number}, {"Large Sales", type number}}),
    //ColumnsInTable grabs all the column names in your existing table dynamically
    ColumnsInTable = Table.ColumnNames(#"Changed Type"), 
    //Converting this list of columns to a table lets us add a column for what we will replace errors with
    #"Converted to Table" = Table.FromList(ColumnsInTable, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    //This adds a column to show what we will replace errors with
    #"Added ReplacementColumn" = Table.AddColumn(#"Converted to Table", "Replacement", each null),
    #"Transposed Table" = Table.Transpose(#"Added ReplacementColumn"),
    //Transforms the table back into a list so we can use it in our replace error step
    ColumnList = Table.ToColumns(#"Transposed Table"),
    //Takes our last step in our original table called Changed Type and replaces all errors in all our columns with null
    #"Replaced Errors" = Table.ReplaceErrorValues(#"Changed Type",ColumnList)
in
    #"Replaced Errors"

In the example above, #"Changed Type" is the last step in my original query, but it resulted in some errors in my table (see screenshot below).

In order to apply a replace-errors step to all the columns in the table, we need to understand a bit more about the Table.ReplaceErrorValues function. This function takes two arguments: (table as table, errorReplacement as list). The table portion will be the name of our last step, because that is the final table we want to manipulate (#"Changed Type" in this example). Each entry in the error replacement list requires two values – the column name and the value we want to replace errors with. To build the list, follow the steps below.

  1. Table.ColumnNames(#"Changed Type") : this function will look at the last step in your query (#"Changed Type" in this example) and generate a list of the columns

  2. Table.FromList(ColumnsInTable, Splitter.SplitByNothing(), null, null, ExtraValues.Error) : this function converts our list of columns into a table. The reason we are converting a list to a table and then back to a list is so we can easily add in a column of nulls to our list.

  3. Table.AddColumn(#"Converted to Table", "Replacement", each null) : this function adds a column called "Replacement" that contains the null values we want to replace our errors with.

  4. Table.Transpose(#"Added ReplacementColumn") : transposing the table will make it easier to put our values into a list appropriately. We need the final result to be a list pairing each column name with its replacement null value.

  5. Table.ToColumns(#"Transposed Table") : this function flips our transposed table back into a list of lists that we can use in our final function. By clicking on a cell in our list of lists, you can see that each inner list is comprised of a column name and the replacement value. Pretty handy for our replace-errors function.

  6. Table.ReplaceErrorValues(#"Changed Type", ColumnList) : time to replace our errors. Notice that we have #"Changed Type" as our first argument. This forces Power Query to grab our original last step and use that version of our table for the function. ColumnList is the name of the step that contains our list of lists with the column names and null values inside.

And there ya have it! To use the query above, you'll need to swap #"Changed Type" with the name of the final step in your original query. It's pretty handy when dealing with user-entered data, but keep in mind the ideal solution would be to put more data governance around the original data entry. For example, a simple application that forces people to enter the correct data type, or a drop-down selection with valid options (a great option for zip codes or states), would prevent data entry issues in the first place and raise the quality of your data.

Additional Resources

Power BI: Calculating Network Days in M Custom Columns

Knowing the days between events is a fairly common reporting request, because a lot of reporting is created to track SLAs (service level agreements) and other KPIs (key performance indicators). While getting the days between two dates is fairly easy to achieve, stakeholders tend to follow up and ask how many week days there are between two timed events. For example, a company may have an SLA to ship an order within three week days of the order being placed, or else a discount is applied to the order. In this case, I would highly recommend that the company have software that calculates these days in the background and stores the actual week days between order date and ship date in a database. Unfortunately, many companies create policies like this without considering future reporting needs, and these values have to be calculated on the backend.

Please note, this blog will not cover how to calculate business days (week days that are non-holidays) because that should be done upstream of Power BI, in either an application or a SQL stored proc that can get/set holidays dynamically (the date of Easter changes every year). International clients often have different holidays, which can infinitely complicate the definition of a business day. Also, many coding languages have built-in functions to get business days while Power BI does not (sadly, the Excel NETWORKDAYS() function is not currently in Power BI). So for this, we will stick to excluding weekends and getting a count of the week days between two events.

The Final Equation

To make it easier to read, SD means start date and ED means end date. SWD means the work week day of the start date, and EWD means the work week day of the end date. For example, if the order date was May 2nd 2022 and the ship date was May 4th 2022, then SD = 5/2/2022, SWD = Monday/day 2 of the week, ED = 5/4/2022, and EWD = Wednesday/day 4 of the week. But let's say the SD was May 7th, which is a Saturday. We would adjust this to be Friday/day 6 of the week. I'll put code below to show both how to adjust weekends to the Friday before and how to adjust to the Monday after, but the GitHub file aligns weekends to the Friday before.

Week days between = ( if SWD > EWD then (5 + EWD - SWD) else (EWD - SWD) ) + ( 5 * ( Actual Days Between / 7 rounded down ) )

In plain English: the week days between two dates starts by looking at whether the start date is earlier in the week than the end date. If the start date is later in the week than the end date, then add five to the difference between their days of the week (for example, a start day of Wednesday to an end day of Monday would be 5 + 2 - 4). If the start date is earlier in the week than the end date, then just take the difference between those two days of the week (start day Monday to end day Wednesday would be 4 - 2).

Next, take the actual days between your end and start date including weekends, divide that by seven, then round down. That gives you the number of full weeks between your two dates. Multiply that number by 5 to represent the five week days within each of those weeks.

Finally, add those two numbers together to get your week days between the start and end dates.
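As a quick sanity check, here's a throwaway M query that evaluates the equation for the May 2022 example above, plus a hypothetical longer span (Wednesday May 4th to Monday May 16th):

let
    //SWD = Monday (2), EWD = Wednesday (4), 2 actual days between
    ShortSpan = (if 2 > 4 then 5 + 4 - 2 else 4 - 2) + 5 * Number.RoundDown(2 / 7),
    //SWD = Wednesday (4), EWD = Monday (2), 12 actual days between
    LongSpan = (if 4 > 2 then 5 + 2 - 4 else 2 - 4) + 5 * Number.RoundDown(12 / 7)
in
    [ShortSpan = ShortSpan, LongSpan = LongSpan] //returns 2 and 8

Counting the long span by hand (Thursday and Friday, the five week days of the following week, and the final Monday) also gives 8, so the equation holds.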

Building the Variables

To create that final equation, we will need to build a few extra columns in the Power Query Editor. For each one, we will click on “Transform Data”, go to the “Add Column” tab, then select “Custom Column”. This will open a window to type in some custom M code (yay another coding language!).

Don’t worry though, I have ya covered with the code you need for this demo. To start, identify your columns with your start date and end date. For this demo, we will have Start Date = Order Date and End Date = Ship Date. The code below will use StartDate and EndDate for these fields, but the GitHub file will use Order Date and Ship Date.

The code below will turn Saturday and Sunday into Friday:

Start Date Work Weekday = 
Date.DayOfWeek(
(if (Date.DayOfWeek([StartDate],Day.Sunday)+1) = 1 
then Date.AddDays([StartDate],-2) else
if (Date.DayOfWeek([StartDate],Day.Sunday)+1) = 7 
then Date.AddDays([StartDate],-1) else [StartDate])
,Day.Sunday) +1

End Date Work Weekday = 
Date.DayOfWeek(
(if (Date.DayOfWeek([EndDate],Day.Sunday)+1) = 1 
then Date.AddDays([EndDate],-2) else
if (Date.DayOfWeek([EndDate],Day.Sunday)+1) = 7 
then Date.AddDays([EndDate],-1) else [EndDate])
,Day.Sunday) +1

The code below will turn Saturday and Sunday into Monday:

Start Date Work Weekday = 
Date.DayOfWeek(
(if (Date.DayOfWeek([StartDate],Day.Sunday)+1) = 1 
then Date.AddDays([StartDate],1) else
if (Date.DayOfWeek([StartDate],Day.Sunday)+1) = 7 
then Date.AddDays([StartDate],2) else [StartDate])
,Day.Sunday) +1

End Date Work Weekday = 
Date.DayOfWeek(
(if (Date.DayOfWeek([EndDate],Day.Sunday)+1) = 1 
then Date.AddDays([EndDate],1) else
if (Date.DayOfWeek([EndDate],Day.Sunday)+1) = 7 
then Date.AddDays([EndDate],2) else [EndDate])
,Day.Sunday) +1

Here’s the formula with some comments added in:

Work Weekday = 
Date.DayOfWeek( //sets us up to get the day of the week at the end of the formula
(if (Date.DayOfWeek([Date],Day.Sunday)+1) = 1 //if it's Sunday 
then Date.AddDays([Date],-2) else //then go back two days to get Friday's date
if (Date.DayOfWeek([Date],Day.Sunday)+1) = 7 //if it's Saturday
then Date.AddDays([Date],-1) else [Date]) //then go back one day to get Friday's date, otherwise give me the date
,Day.Sunday) +1 //now give me the day of the week for the date left from the previous if statements

Alrighty, the next building variable we will need for our equation is the actual days (including weekends).

Days Between = Duration.Days([EndDate]-[StartDate])

In our case though, there is a possibility that the orders have not shipped yet. To handle rows where the EndDate or StartDate is null, use the equation below:

Days Between = if [EndDate] is null then null else
if [StartDate] is null then null else
Duration.Days([EndDate]-[StartDate])

Okay great, now we can put it all together using our new fields!

Weekdays Between = 
( if [Start Date Work Weekday] > [End Date Work Weekday] 
then ( 5 + [End Date Work Weekday] - [Start Date Work Weekday] )
 else ( [End Date Work Weekday] - [Start Date Work Weekday] )) 
+ ( 5 * ( Number.RoundDown ( [Days Between] / 7 )))

Boom all done! You made it through a lot of complicated M code, nice work! Below is the M used in the GitHub file so you can see how it will look in the advanced editor.

let
    Source = Excel.Workbook(File.Contents("C:\Users\KristynaHughes\OneDrive - DataOnWheels\GitHub\AdventureWorks Sales.xlsx"), null, true),
    Sales_data_Sheet = Source{[Item="Sales_data",Kind="Sheet"]}[Data],
    #"Promoted Headers" = Table.PromoteHeaders(Sales_data_Sheet, [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"SalesOrderLineKey", Int64.Type}, {"ResellerKey", Int64.Type}, {"CustomerKey", Int64.Type}, {"ProductKey", Int64.Type}, {"OrderDateKey", Int64.Type}, {"DueDateKey", Int64.Type}, {"ShipDateKey", Int64.Type}, {"SalesTerritoryKey", Int64.Type}, {"Order Quantity", Int64.Type}, {"Unit Price", type number}, {"Extended Amount", type number}, {"Unit Price Discount Pct", Int64.Type}, {"Product Standard Cost", type number}, {"Total Product Cost", type number}, {"Sales Amount", type number}}),
    #"Added Order Date" = Table.AddColumn(#"Changed Type", "Order Date", each Date.From( Text.Middle(Text.From([OrderDateKey]),4,2)&"/"& Text.End(Text.From([OrderDateKey]),2)&"/"& Text.Start(Text.From([OrderDateKey]),4)),type date),
    #"Added Ship Date" = Table.AddColumn(#"Added Order Date", "Ship Date", each Date.From( Text.Middle(Text.From([ShipDateKey]),4,2)&"/"& Text.End(Text.From([ShipDateKey]),2)&"/"& Text.Start(Text.From([ShipDateKey]),4)), type date),
    #"Added Order Weekday" = Table.AddColumn(#"Added Ship Date", "Order Date Weekday", each Date.DayOfWeek(
(if (Date.DayOfWeek([Order Date],Day.Sunday)+1) = 1 
then Date.AddDays([Order Date],-2) else
if (Date.DayOfWeek([Order Date],Day.Sunday)+1) = 7 
then Date.AddDays([Order Date],-1) else [Order Date])
,Day.Sunday) +1,Int64.Type),
    #"Added Ship Weekday" = Table.AddColumn(#"Added Order Weekday", "Ship Date Weekday", each Date.DayOfWeek(
(if (Date.DayOfWeek([Ship Date],Day.Sunday)+1) = 1 
then Date.AddDays([Ship Date],-2) else
if (Date.DayOfWeek([Ship Date],Day.Sunday)+1) = 7 
then Date.AddDays([Ship Date],-1) else [Ship Date])
,Day.Sunday) +1,Int64.Type),
    #"Added Days Between" = Table.AddColumn(#"Added Ship Weekday", "Days Between Order and Ship", each if [Ship Date] is null then null else if [Order Date] is null then null else Duration.Days([Ship Date] - [Order Date]),Int64.Type),
    #"Added Weekdays Between" = Table.AddColumn(#"Added Days Between", "Weekdays From Order to Ship Date", each if [Days Between Order and Ship] is null then null else
( if [Order Date Weekday] > [Ship Date Weekday] then (5 + [Ship Date Weekday] - [Order Date Weekday] ) else ( [Ship Date Weekday] - [Order Date Weekday] )) + (5 * (Number.RoundDown([Days Between Order and Ship]/7))),Int64.Type)
in
    #"Added Weekdays Between"

Additional Resources

Connecting to Azure Blobs in Power BI

The step-by-step process below walks through connecting to data housed in Azure Blob Storage from Power BI using a SAS token. There are many ways to grab your data from Blob Storage, but this is the most efficient, scalable, and secure way that I found (with some security restrictions from watchful DBAs).

Resources Needed:

  • Base URL for container
  • SAS Token (must have read AND list permissions)
    • Check out the link in resources for a tutorial on generating SAS Tokens.
  • File Path (should end with .csv)
  • Power BI Desktop

Notes:

  • You can skip ahead to the sample M script if you have all your elements. Simply swap out the BaseURL, SASToken, and FilePath and you’re good to go. Otherwise, feel free to walk through the steps below to gain a deeper understanding of the process.
  • Make sure your Base URL ends with a “/”, your SAS Token starts with “?”, and your file path ends with “.csv”
  • Keep the double quotes around each parameter value; this forces Power BI to recognize it as text.

Process:

  1. In Power BI Desktop, go to Get Data and select the Web option.
  2. Switch to the advanced view and put the base URL in the first box.
  3. Put the SAS token in the second box.
  4. In a third box (click add part to get the third one), put “&restype=container&comp=list” (this will allow you to list all the blobs in that container).
  5. Expand the blob down then filter the name on the file path.
  6. Create a custom column to create the entire URL for the file (M code samples are below).
    • FileURL = BaseURL & [Name] & SASToken
  7. Create another custom column to access the web contents of your FileURL column.
    • BinaryURLContents = Web.Contents([FileURL])
  8. Remove all columns except the BinaryURLContents.
  9. Click on the “Binary” value and watch Power BI expand out your CSV file.
  10. Manipulate data from there as needed.

Final M Code:

let
    BaseURL = "BASE_URL_HERE"
    ,SASToken = "SAS_TOKEN_HERE"
    ,FilePath = "FILE_NAME_HERE_(Note do not include section of the URL from Base URL)"
    ,Source = Xml.Tables(Web.Contents(Text.From(BaseURL) &Text.From(SASToken) & "&restype=container&comp=list")),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Attribute:ServiceEndpoint", type text}, {"Attribute:ContainerName", type text}}),
    #"Removed Other Columns" = Table.SelectColumns(#"Changed Type",{"Blobs"}),
    #"Expanded Blobs" = Table.ExpandTableColumn(#"Removed Other Columns", "Blobs", {"Blob"}, {"Blob"}),
    #"Expanded Blob" = Table.ExpandTableColumn(#"Expanded Blobs", "Blob", {"Name", "Properties", "OrMetadata"}, {"Name", "Properties", "OrMetadata"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded Blob", each ([Name] = Text.From(FilePath))),
    #"Added Custom" = Table.AddColumn(#"Filtered Rows", "FileURL", each BaseURL &  [Name] &  SASToken),
    #"Added Custom1" = Table.AddColumn(#"Added Custom", "BinaryURLContents", each Web.Contents([FileURL])),
    #"Removed Other Columns1" = Table.SelectColumns(#"Added Custom1",{"BinaryURLContents"}),
    BinaryURLContents = #"Removed Other Columns1"{0}[BinaryURLContents],
    #"Imported CSV" = Csv.Document(BinaryURLContents,[Delimiter=",", Columns=24, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Promoted Headers" = Table.PromoteHeaders(#"Imported CSV", [PromoteAllScalars=true])
in
    #"Promoted Headers"
//Use this query to validate your file path
let
    Source = Xml.Tables(Web.Contents("BASE URL" & "SAS TOKEN" & "&restype=container&comp=list")),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"NextMarker", type text}, {"Attribute:ServiceEndpoint", type text}, {"Attribute:ContainerName", type text}}),
    #"Removed Other Columns" = Table.SelectColumns(#"Changed Type",{"Blobs"}),
    #"Expanded Blobs" = Table.ExpandTableColumn(#"Removed Other Columns", "Blobs", {"Blob"}, {"Blob"}),
    #"Expanded Blob" = Table.ExpandTableColumn(#"Expanded Blobs", "Blob", {"Name", "Properties", "OrMetadata"}, {"Name", "Properties", "OrMetadata"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded Blob", each [Name] = "FILE PATH")
in
    #"Filtered Rows"

Additional Resources: