Storytelling with Data – Enertia User Conference 2022

Thank you to everyone who attended my session at the Enertia User Conference in Las Vegas earlier this month! It was a blast meeting everyone, and I can’t wait to see your BI solutions come to life. Big shout out to my dad, Steve Hughes, for introducing me to this amazingly talented group of people, and to Fernando Salazar, for welcoming me into the Enertia community. Below is all the research used in my presentation on Storytelling with Data. Please use this link for the GitHub folder containing the presentation, sample data used, and a sample report that we worked on together. Happy coding!

In 2015, the average company spent $7.4 million (USD) a year on data-driven initiatives (IDG Big Data & Analytics Survey). In 2021, the average company expected to spend $12.3 million (USD) on data initiatives in 2022, and 55% expected their budgets to increase (Data & Analytics Study). Data and analytics budgets have grown nearly 40% in five years, yet Foundry found in their 2022 study that “lack of appropriate skill sets is a top challenge” (2022 Study). It’s clear that data is a priority for many organizations, but without the proper training on how to use data to deliver insights, that data sits like a box of nails without a hammer.

To turn our millions of dollars worth of data into actionable insights, we need to tell a story with it. Why a story? Storytelling has been and is a vital way humans understand and share the world around them. If numbers were as interesting and memorable as stories, our classic works of art would be the Pythagorean theorem instead of the Iliad and Odyssey. Listening to a story engages multiple parts of the brain, which is why listening to your great aunt’s story for hours is exhausting, but also why stories are more memorable. Here are the parts of the brain engaged when you hear a story:

  • Wernicke’s area (language)
  • Amygdala (emotions)
  • Mirror neurons (empathy)

The hippocampus (short-term memory storage) is more likely to push stories into long-term memory than numbers because stories engage multiple areas of the brain. This is why so many math classes include word problems. Thankfully, visualizing your data doesn’t need to feel like a math test.

There are a few main elements to be successful at data storytelling:

  1. Data
  2. Story
  3. Visualizations

Data powers the visualizations that can communicate your story clearly and memorably. Similar to a picture book, a good report will quickly communicate a story and a great report will help you change that story through actionable insights. So how do we go from numbers to a story? From math class to lit? Through connecting visuals and numbers to elements of a story. Every story contains a few common elements: characters, setting, conflict, and resolution.

For example, imagine you have seen a downward trend in sales from Texas for smart cars. Your characters would be your previous and current customers in Texas, the setting is the timeline of the downward trend and the customers across the country, the conflict may be that news stations in Texas ran a story about hackers getting into smart cars, and the resolution may be to advertise a security patch through local news stations in Texas. There may not be a conflict in your story (maybe you’re exceeding your goals), so that section can be skipped in favor of focusing on what is going well.

Always remember that applying analytical techniques to managerial problems requires both art and science.

Jan Hammond, Harvard Business School Professor

Creating Data-Driven Decisions

Storytelling with data can leave an audience asking questions such as “how do we make our sales go up?” or “what is xyz branch doing to increase profit percentages higher than everyone else’s?” This is where the resolution portion of the story comes into play. A lot of good reports will “tell the news”; that is, they display the current state of affairs retrospectively. To generate a true return on investment (ROI) from our data, we need to use it to proactively drive decisions.

For example, in 2008 Starbucks closed hundreds of locations. Howard Schultz returned as CEO and declared they would use data to place their stores strategically going forward. Starbucks now consults with an analytics company called Esri to analyze retail locations for various variables that are proven to drive coffee shop traffic and overall success (reference).

The key to building a data-driven culture is to avoid deceiving your audience. Visualizations are a powerful tool, but they can be used to trick the viewer into a disproportionate understanding of the data. For example, The New York Times ran a graphic in 1983 showing the mandated fuel economy standards changing from 18 to 27.5 miles per gallon, an increase of 53%. The line in the graphic, however, grew by 783%! A good way to build trust in your data is to avoid the Lie Factor.

The representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the quantities represented. The Lie Factor = the ratio of the size of an effect shown in the graphic to the size of the effect in the data.

Tufte
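
As a quick gut check, you can compute the Lie Factor for the fuel economy example above straight from the numbers already cited; a minimal sketch in Python (the 53% and 783% figures come from the paragraph above):

# Lie Factor = (size of effect shown in the graphic) / (size of effect in the data)
data_effect = (27.5 - 18) / 18          # mandated fuel economy change, roughly 53%
graphic_effect = 7.83                   # the change as drawn in the 1983 graphic, 783%
lie_factor = graphic_effect / data_effect
print(f"Lie Factor: {lie_factor:.1f}")  # ~14.8, far above the honest value of 1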

This all sounds well and good, but how much money can using data instead of your gut actually save you? According to a study by Harvard Business Review, Fortune 1000 executives have seen the most value from aiming to decrease expenses, with finding new avenues for innovation coming in second. Decreasing expenses is great: it allows your company to lower the overhead costs of operation and increase profit margin. However, there is only so much overhead you can eliminate, which is why finding new avenues for innovation will be the long-term ROI of using big data.

Building a Data-Driven Culture

One of the biggest blockers to using big data effectively is adoption by the larger business. According to HBR, more than half of Americans depend on their gut to make decisions even when there is evidence that disproves their theory, while data-driven organizations are 3x more likely to report significant improvements in decision-making. There are many ways to become more data-driven, even in your daily life. HBR summarizes a few easy steps to take to begin building a culture of cultivating insights from big data:

  1. Look for patterns everywhere
  2. Tie every decision back to data
  3. Visualize the meaning behind data
  4. Consider furthering your education to learn more data analyst strategies

Thankfully, this approach does not need to be all or nothing. It can be as simple as looking for patterns in your personal life, like spending more money on ice cream deliveries after watching sad movies. A simple cost saving would be to stock up on ice cream in advance and avoid the delivery fees. Finding a new source of innovation may look like watching romantic comedies instead of sad movies, or switching to popcorn, which is much more cost-effective.

Additional Resources

Fast Fingers – Function Keypads

This is the third in the series of tools and technologies that I use to deal with the loss of functionality in my hands and arms. Check out this article for the lead up to this series.

Setting the stage

The issue I’m dealing with involves muscle atrophy in my hands and arms. As a result, I’ve lost a lot of strength in my hands and arms, including my fingers. Some of the unplanned impacts include the inability to type successfully at times and a diminished amount of time I can spend typing. I had previously used a Logitech split keyboard, which I loved. I considered myself a good typist and used to be able to type and code very effectively. With the onset of the atrophy, I encountered situations where my hands would stop working. I would be typing and then I couldn’t type anymore. Some of it is related to physical exhaustion or fatigue from the effort required given my condition. I also experience a situation where my fingers curl, making it nearly impossible to type on a keyboard. The first time this happened was the first time I was concerned about my career. As I noted in a previous article, I am using voice to text for the bulk of my typing, including this blog post. However, voice to text does not work that great for coding, and frankly I have issues with any multi-key functions that require my hand to stretch across the keyboard.

Discovering a solution

I was watching a show with my wife and daughter when an ad came up for the Quick Keys solution from Xencelabs. This was part of a video editing package including a tablet and pens. I was intrigued because I had not seen a solution that allowed me to program keys with text. It also had a wheel on it that could be used for other tasks. I looked it up and found I could buy just the Quick Keys device.

As I started doing more research into what this tool did, I realized that the space I needed to look at more closely was video editing and streaming. That space has a series of tools which support macro keys used to optimize applications, shortcut keys, and game actions. The variety of these tools is substantial. Shortly thereafter, while at church working with the tech team, I saw a Stream Deck. This was even cooler because each of the buttons has a programmable LCD screen behind it. Now I knew what to look for and started determining what I wanted to do as I moved forward.

Xencelabs Quick Keys

I purchased the Xencelabs Quick Keys device first. It has 40 programmable functions and a physical dial that I was able to program.

Xencelabs Quick Keys

I programmed some basic functionality that I really like to have available with the ease of pushing a button close to me, such as delete, backspace, a shortcut for speech to text, undo, redo, cut, @, and Ctrl. This generic set of functions, in addition to the copy, paste, double-click, and dictation shortcuts I had on my mouse, applied to most of the applications I worked in. I next set up a screen, reached by pushing a function button on the device, to support specific functionality within Microsoft Word. The big one I needed to have in there was a shortcut to change case, as Microsoft dictation does not have capitalization capability at this time. I also added Home and End along with a couple of other functions to be helpful.

The Quick Keys device has five customizable screens of eight buttons each. So, I used the first one for my generic set as noted above. My second one was for Word. I added a third screen that contained the web addresses of common locations I needed to go to, such as the Azure portal and my blog. This allowed me to open a browser, push a button, and go to that site easily. What I quickly discovered was that I was going to need more functionality for this to be effective in the long term. Before I move on to the next solution, a couple of other things I did on this device: I used the physical wheel for moving the cursor and for volume control, as it too has five settings.

My default screen setup with Quick Keys

Elgato Stream Deck

Because I had a device already, I wanted to research the Stream Deck before purchasing it. One thing I quickly noticed is that it is a favorite tool among streamers and has not had a significant upgrade because it just works. The device has been solid, easily programmable, and customizable in a multitude of environments. If you go to YouTube, you will find a number of streamers, gamers, and content creators walking through demos of how to set up and use the Stream Deck. It has a lot of built-in functionality for a variety of editing and streaming tools.

To start with, I was unsure if this would be a good solution for me, as most of what I needed was not what they were using it for. So, I dug in. What I discovered was that the deck was highly programmable, with effectively an unlimited number of options. I thought I’d give it a try.

I purchased the 15 key Stream Deck pictured here:

Elgato Stream Deck

Once I got the Stream Deck and installed the software to program it, I quickly realized there are a number of add-ons for the Stream Deck to support programming and Windows functionality. These are add-ons that give you shortcuts to things like locking your computer, something that Quick Keys could not do. I added these in, as well as some icon sets, because icons are cool. Once I had this in place, I programmed my initial set of functionality to enhance what I was doing with Quick Keys. Because I already had Quick Keys, those two screens provided the generic starting point for most functionality I would need outside of specific programs like Word. This also allows me to keep Quick Keys on the generic set of common functions and use the Stream Deck to be more responsive to specific programs and needs.

I have taken the Stream Deck and programmed it for a couple of specific use cases I am really happy with. Let’s start with the first one, which is Word. The Stream Deck can detect the application you’re in and set the keys up with specific profiles that you create. In my case, whenever I am in Microsoft Word, the Stream Deck has the editing keys and other functionality that make working with documents easier. Because I still have basic keys sitting on the Quick Keys solution, I’m able to have a combined set of 23 function keys readily available without switching screens.

Default screen on my Stream Deck
My Word profile

I have also set up a similar set of functionality when working with Microsoft Outlook. While I’m still working through which functions make the most sense for me to work with in each scenario and if I need more than one screen, the amount of functionality available at my fingertips as a result of programming these two devices is substantial. It makes it easier for me to work through a variety of commands without struggling with the keyboard. I am using the Word functionality as I edit this blog post after getting the content created via voice to text.

Optimizing functions for code

Now for the more interesting use case. As part of my job, I occasionally still need to do some coding. In this case, I was working with T-SQL code. I needed to create some tables, add some keys, and work with data. Coding is one of the most typing-intensive activities I do, and it is one where voice to text does not help me. So, how can the Stream Deck help me out? It turns out you can actually send keystrokes through both devices. However, the capacity of the Stream Deck is substantially larger than that of Quick Keys (500 versus 24). More importantly, with the unlimited number of keys that can be programmed, combined with folders that let you group commands together, the Stream Deck was the natural choice. I created a folder on my screen for the work called SQL. In there I created a folder for CREATE commands and will likely add more as I go along. In the main SQL folder, I have commands such as SELECT, FROM, WHERE, INNER JOIN, and similar common commands used when working with data in SQL. While these may seem at first glance like short commands, I must call out that my goal is to reduce the amount of typing I do as much as possible. When I added the CREATE commands, I included the full syntax for creating a table, where I just had to fill in the name and field list. I also added a folder that gave me the most-used data types so I wouldn’t have to type those either. I plan to add the syntax for primary and foreign keys and for indexes in the future. My point here is that I was able to reduce the amount of typing required by 30 to 50% depending on what I was doing. This reduced strain on my hands and allowed me to be more productive for a longer period of time.

Here are some screenshots from the Stream Deck programming surface to show you how I set it up for SQL so far.

My SQL default menu
My SQL create menu

I really enjoy the Stream Deck because some of it is just fun. Part of the fun is finding icons you can use, including GIFs, on the screen. I’m planning to continue extending what I’m doing with my Stream Deck.

Wrapping it up

Finding these tools has been extremely important to maintaining productivity in my work. What I’m learning so far is that the tools I’m discovering are beneficial not only for me but also for others who might want to build out shortcuts for things they can’t remember or to make working generally easier. These tools are not without cost, but the increased productivity is seriously worth it. And frankly, it makes my setup at work look really cool. Hopefully you find this information helpful, or it could be helpful to someone else. Feel free to pass it on.

SQL Saturday Baton Rouge Recap

Thank you to everyone who came out and saw me present at SQL Saturday in Baton Rouge! It was a blast to see smiling faces and not be the only one laughing at my jokes. HUGE thank you to the team of volunteers who made this event possible and to the sponsors for enabling such an amazing event.

My presentation of Power BI Meets Programmability had 58 people in attendance, the largest crowd of the whole weekend! Out of that came many great conversations about the potential shift in IT teams from separating reporting teams from application development to playing to each other’s strengths to deliver sustainable and scalable reporting to the end-users of your application. As promised, below is all the code used in the session. Thank you again to everyone who attended, and happy coding!

using System;
using Microsoft.AnalysisServices.Tabular;

namespace TOM_Testing_Live
{
        internal class Program
        {
        static void Main(string[] args)
        {
//-------------- Connect to PBI Premium Workspace ---------------- //

            // create the connect string - powerbi://api.powerbi.com/v1.0/myorg/WORKSPACE_NAME
            string workspaceConnection = "powerbi://api.powerbi.com/v1.0/myorg/Happy%20Coding";
            string connectString = $"DataSource={workspaceConnection};";

            // connect to the Power BI workspace referenced in connect string
            Server server = new Server();
            server.Connect(connectString);


//---------------- List out current state before we make changes -------------------//

            // enumerate through datasets in workspace to display their names
            foreach (Database database in server.Databases)
            {
                Console.WriteLine($"ID : {database.ID}, Name : {database.Name}, CompatibilityLevel: {database.CompatibilityLevel}, Last Updated : {database.LastSchemaUpdate}");
            }

            // enumerate through tables in one database (use the database ID from previous step)
            Model model = server.Databases["5d161e6b-697a-49b6-a3e6-7d19b940a8fd"].Model;

            //if you don't specify a database, it will only grab models from the first database in the list
            foreach (Table table in model.Tables)
            {
                Console.WriteLine($"Table : {table.Name} IsHidden? : {table.IsHidden}");
            }

            // Specify a single table in the dataset
            Table table_product = model.Tables["Product"];

            // List out the columns in the product table
            foreach (Column column in table_product.Columns)
            {
                Console.WriteLine($"Columns: {column.Name}");
            }

            // List out the measures in the product table
            foreach (Measure productmeasures in table_product.Measures)
            {
                Console.WriteLine($"Measures: {productmeasures.Name}");
            }

            // List of translations on the model
            foreach (Culture culture in model.Cultures)
            {
                Console.WriteLine($"Existing Culture: {culture.Name}");
            }

            // List out the hierarchies in the product table
            foreach (Hierarchy hierarchy in table_product.Hierarchies)
            {
                Console.WriteLine($"Hierarchies: {hierarchy.Name}, Lineage Tag = {hierarchy.LineageTag}");
            }

            // List out the levels in our category hierarchy
            if (table_product.Hierarchies.ContainsName("Category Hierarchy"))
                {
                Hierarchy hierarchy_category_ = table_product.Hierarchies["Category Hierarchy"];
                foreach (Level level_categoryhierarchy in hierarchy_category_.Levels)
                {
                    Console.WriteLine($"Category Hierarchy Level {level_categoryhierarchy.Ordinal}: {level_categoryhierarchy.Name} Lineage Tag: {level_categoryhierarchy.LineageTag} from {level_categoryhierarchy.Column.Name}");
                }
            }
            else
                {
                Console.WriteLine($"No Category Hierarchy");
                }


        
 //----------------Deleting columns and measures ----------------//
   /*         
           // Deleting a column if it exists
             if (table_product.Columns.ContainsName("Testing"))
             //this looks to see if there is a column already named "Testing"
             {
                 Console.WriteLine($"Column Exists");
                 table_product.Columns.Remove("Testing"); //if the column exists, this will remove it
                 Console.WriteLine($"Column Deleted");
             }
             else
             {
                 Console.WriteLine($"Column Does Not Exist");
             } 

             // Deleting a measure if it exists
             if (table_product.Measures.ContainsName("Test Measure"))
             //this looks to see if there is a measure already named "Test Measure"
             {
                  Console.WriteLine($"Measure Exists");
                  table_product.Measures.Remove("Test Measure"); //if the measure exists, this will remove it
                  Console.WriteLine($"Measure Deleted");
              }
              else
              {
                  Console.WriteLine($"Measure Does Not Exist");
              }

*/
//--------------- Adding columns and measures --------------------//            

              // Adding our column if it doesn't already exist
                if (table_product.Columns.ContainsName("Testing")) //this looks to see if there is a column already named "Testing"
                {
                    Console.WriteLine($"Column Exists");
                    //table_product.Columns.Remove("Testing"); //if the column exists, this will remove it
                    //Console.WriteLine($"Column Deleted");
                   // Column column_testing = new DataColumn() //this will add back the deleted column
                    //{
                  //      Name = "Testing",
                  //      DataType = DataType.String,
                  //      SourceColumn = "Product"
                  //  };
                  //  table_product.Columns.Add(column_testing);
                  //  Console.WriteLine($"Column Re-created!");
                }
                else
                {
                    Column column_testing = new DataColumn() //this will add the column
                    {
                        Name = "Testing",  //name your column for Power BI
                        DataType = DataType.String, //set the data type
                        SourceColumn = "Product" //this must match the name of the column your source 
                    };
                    table_product.Columns.Add(column_testing);
                    Console.WriteLine($"Column Created!");
                }

                //Get the partition sources for each table
                foreach (DataSource datasource in model.DataSources)
                {
                    Console.WriteLine($"Datasource : {datasource.Name}");

                };

                // Add a measure if it doesn't already exist in a specified table called product
                if (table_product.Measures.ContainsName("Test Measure"))
                {
                    Measure measure = table_product.Measures["Test Measure"];
                    measure.Expression = "\"Hello there\""; //you can update an existing measure using this script
                    Console.WriteLine($"Measure Exists");
                }
                else
                {
                    Measure measure = new Measure()
                    {
                        Name = "Test Measure",
                        Expression = "\"Hello World\"" //you can also use DAX here
                    };
                    table_product.Measures.Add(measure);
                    Console.WriteLine($"Measure Added");
                }


//------------------- Manipulating Hierarchies ---------------------//

            //Editing an existing hierarchy originally called Category Hierarchy
            //Note: this indexes the hierarchy by its renamed value; on a first run against a fresh model, index by "Category Hierarchy" instead
            Hierarchy hierarchy_category = table_product.Hierarchies["Category Hierarchy Rename"];
            {
                hierarchy_category.Name = "Category Hierarchy Rename"; //this renames the hierarchy, note the lineage tag will remain unchanged
                Console.WriteLine($"Category Hierarchy Renamed");
             }
            
            //Editing an existing hierarchy level 
              Level level_Category = hierarchy_category.Levels.FindByLineageTag("ca792793-d3c3-4b5d-9bee-2b46c01833bb");
              Level level_Subcategory = hierarchy_category.Levels.FindByLineageTag("4304e645-bb8f-4d7e-b25c-f629be2110d8");
              {
                  level_Category.Name = "Category";
                  level_Category.Ordinal = 0;
                  level_Subcategory.Name = "Subcategory";
                  level_Subcategory.Ordinal = 1;
                  Console.WriteLine($"Category Hierarchy Levels Renamed & Reordered");
              }

            //Adding a new level to the hierarchy if it doesn't already exist
              if (hierarchy_category.Levels.ContainsName("Model"))
              {
                  Console.WriteLine($"Hierarchy Level Exists");
              }
              else
              {
                  Level level_Model = new Level()
                  {
                      Name = "Model",
                      Ordinal = 2,
                      Column = table_product.Columns.Find("Model")
                  };
                  hierarchy_category.Levels.Add(level_Model);
                  Console.WriteLine($"Hierarchy Level Added");
              }
          
 
             //Add a new hierarchy if it doesn't already exist
               if (table_product.Hierarchies.ContainsName("New Hierarchy"))
               {
                   Console.WriteLine($"New Hierarchy Exists");
               }
               else
               {
                   Hierarchy hiearchy_new = new Hierarchy()
                   {
                       Name = "New Hierarchy",
                   };
                   table_product.Hierarchies.Add(hiearchy_new);
                   Console.WriteLine($"Hierarchy Added");
             //Creating levels to add to the new hierarchy
                   Level level_one = new Level()
                   {
                       Name = "Model",
                       Ordinal = 0,
                       Column = table_product.Columns.Find("Model")
                   };
                   Level level_two = new Level()
                   {
                       Name = "Product",
                       Ordinal = 1,
                       Column = table_product.Columns.Find("Product")
                   };
                   hiearchy_new.Levels.Add(level_one);
                   hiearchy_new.Levels.Add(level_two);
                   Console.WriteLine($"Levels added to new hiearchy");
               };

         
//-------------------------- Translations ------------------------------//

//Translations can be used to rename existing columns without rebuilding the model. This also updates any visuals that use that column.          

            // List of translations on the model
            foreach (Culture culture in model.Cultures)
            {
                Console.WriteLine($"Existing Culture: {culture.Name}");
            }
            // Let's get a list of the existing translations within the en_US culture
            Culture enUsCulture = model.Cultures.Find("en-US");

            foreach (ObjectTranslation objectTranslation in enUsCulture.ObjectTranslations)
            {
                Console.WriteLine($"Translated Object: {objectTranslation.Value}");
            }
            // Narrow down what column within this culture/language you would like to add the translation to
            MetadataObject dataColumn = table_product.Columns.Find("SKU"); //this needs to always be the original column name within the data model.
            ObjectTranslation proposedTranslation = enUsCulture.ObjectTranslations[dataColumn, TranslatedProperty.Caption];

            // Only one translation per entity per culture.
            if (proposedTranslation != null)
            {
               Console.WriteLine($"Translation Exists for this Culture & Column combo");
               enUsCulture.ObjectTranslations.Remove(proposedTranslation); //need to remove the existing translation to overwrite it
               ObjectTranslation overwriteTranslation = new ObjectTranslation()
               {
                   Object = dataColumn,
                   Property = TranslatedProperty.Caption,
                   Value = "Cool Stuff"
               };
               enUsCulture.ObjectTranslations.Add(overwriteTranslation);
            }
            else
            {
               ObjectTranslation newTranslation = new ObjectTranslation()
               {
                   Object = dataColumn,
                   Property = TranslatedProperty.Caption,
                   Value = "Total Rad"
               };
               enUsCulture.ObjectTranslations.Add(newTranslation);
            }             

           // List out the translations to see what they are now that we have run the script    
           foreach (ObjectTranslation objectTranslation in enUsCulture.ObjectTranslations)
           {
               Console.WriteLine($"Final Translated Object: {objectTranslation.Value}");
           }

//------------------- List out end state --------------------------------//

            // List out the columns in the product table one more time to make sure our column is added
            foreach (Column column in table_product.Columns)
            {
                Console.WriteLine($"Columns: {column.Name}");
            }         

            // List out the measures in the product table one more time to make sure our measure is added
            foreach (Measure productmeasures in table_product.Measures)
            {
                Console.WriteLine($"Measures: {productmeasures.Name}");
            }

            // List out the hierarchies in the product table
            foreach (Hierarchy hierarchy in table_product.Hierarchies)
            {
                Console.WriteLine($"Hierarchies: {hierarchy.Name}, Lineage Tag: {hierarchy.LineageTag}");
                foreach (Level level_hierarchy in hierarchy.Levels)
                {
                    Console.WriteLine($"Level {level_hierarchy.Ordinal}: {level_hierarchy.Name}, Lineage Tag: {level_hierarchy.LineageTag} from {level_hierarchy.Column.Name}");
                }
            }

//-------------- Refresh our version of the data model then push/save changes back to the model in the PBI service -------------//
            table_product.RequestRefresh(RefreshType.Full);
            model.RequestRefresh(RefreshType.Full);
            model.SaveChanges(); 
            //make sure this is the last line! Note, this line will not work unless your dataset has proper data source credentials connected in the service

            Console.WriteLine($"Script Complete!");


        }
    }
}

Power BI: Data Quality Checks Using Python & SQL

Picture this: you have a report in Power BI that someone passes off to you for data quality checks. There are a few ways to make sure your measures match what is in the source data system, but for this demo we are going to use Python and Excel to perform our data quality checks in one batch. To do that, we are going to build a Python script that can run Power BI REST APIs, connect to a SQL Server, and connect to Excel to grab the formulas and push the quality check results back into Excel for final review. For a sample Excel file and the final Python script, please refer to my GitHub.

To build this, you’ll need a few pieces:

  1. A way to run Power BI REST APIs
    • Requires an Azure AD app – if you can’t make one, you will need someone in your company to create one and give you its client ID and credentials to use it with
  2. Ability to run python scripts (Visual Studio Code will be used in this demo)
  3. Access to a Power BI dataset that is published to the Power BI service (Premium is not needed)
  4. Access to a SQL Server
  5. A list of DAX functions that you want to test in Excel
  6. A list of SQL queries that should have the same result as your DAX functions in Excel

Authentication

To start, you’ll need to register an application for Power BI within Azure. First, create & register your app in the Azure portal. This link will take you through the manual process step-by-step, including how to add permissions so the application can access the PBI service. You’ll need to give your app the delegated permission of Dataset.ReadWrite.All or Dataset.Read.All, as well as Tenant.Read.All or Tenant.ReadWrite.All, which will require an admin to grant (check the required scope in the API documentation). One important limitation: the user account you use to access the Azure AD app must not have MFA enabled. If that makes your IT security team squeamish, remind them that the app can be given read-only access to only Power BI’s metadata. No proprietary data is at risk, and write back is not necessary to accomplish our goals.

Let’s test out running the REST API using Python scripts! To run this in Python, we will need to generate an access token, then call the REST API using that token.

Adding Libraries in Python

If you have not used some of the libraries that will be imported for this demo, you will need to run “pip install [library name]” in your command prompt. For this demo, that means requests, adal, pymssql, pandas, openpyxl, and pypiwin32 (json and os ship with Python). If you don’t have pip, follow the directions on this link and/or modify your Python installation to include pip. Newer versions often have it enabled by default, but triple check that you have added Python to your environment variables, or else you will get errors saying Python isn’t installed when you try to run commands in the command prompt.

Getting the REST API Token

Alrighty, let’s go to Visual Studio Code and build out the authentication portion of our Python. We are going to use the adal library to make a REST API call that will get the token for us. Make sure to put single or double quotes around your Client ID, Username, and Password to format them as text. Keep in mind, our final code will get these values from the Excel file instead of hard coding them into our code. That means you can also get these values from a separate Excel file if your IT team would prefer to keep them out of the final code (which is highly recommended).

import adal
authority_url = 'https://login.windows.net/common'
resource_url = 'https://analysis.windows.net/powerbi/api'
client_id = <INSERT CLIENT ID>
username = <INSERT USERNAME>
password = <INSERT PASSWORD>
context = adal.AuthenticationContext(authority=authority_url,
                                     validate_authority=True,
                                     api_version=None)
token = context.acquire_token_with_username_password(resource=resource_url,
                                                     client_id=client_id,
                                                     username=username,
                                                     password=password)
access_token = token.get('accessToken')
print(access_token)

Awesome, now we have a way to generate an access token to use for our future API calls!

Alrighty, let’s try this out with a simple REST API call – getting a list of the workspaces (called groups in the API for some reason) in my tenant. I only have a couple of workspaces, but this will let us test out passing our token to get back a response. Add the following lines to the end of the code above:

import requests #needed to call the REST API; this is part of the final script's imports

get_workspaces_url = 'https://api.powerbi.com/v1.0/myorg/groups'
header = {'Authorization': f'Bearer {access_token}'}
r = requests.get(url=get_workspaces_url, headers=header)
r.raise_for_status()
print(r.text)

You should see something similar to the screenshot below. You’ll notice there are a lot of elements in this response, but we can look at the name section and confirm that the code is working as expected. Exciting stuff!

Connecting to a SQL Server database

To connect to SQL, we are going to use the pymssql library. See below for the format we will need to connect to our SQL Server database and run a sample query.

#SQL Server Authentication
import pymssql 

sql_server = 'server.database.windows.net'
sql_user = 'Sample'
sql_password= 'Password'
sql_database = 'DatabaseName'

sql_con = pymssql.connect(sql_server,sql_user,sql_password,sql_database)
sql_cursor = sql_con.cursor(as_dict=True)

#execute SQL script
sql_cursor.execute('SELECT TOP 1 SalesKey FROM dbo.FactSales')
for row in sql_cursor: 
    print(row)

Your result will be a dictionary with the column name as the key and the query result as the value. This will be important when we try to compare it to our Power BI DAX query result.

Comparing DAX Query with SQL Query

Alrighty now to the main event! Running a DAX query against our Power BI data model and comparing it to a SQL query to ensure our measure is working as expected. For this demo, I’ll keep it pretty simple. I have a measure in my dataset, Sales Amount = SUM(FactSales[SalesAmount]). This should equal the result from the SQL query SELECT SUM(SalesAmount) ‘Sales Amount’ FROM FactSales. If you have errors using the execute queries REST API, please review the limitations section of the Microsoft Execute Queries Documentation.

Python Code to Run SQL Script

#SQL Script Execution
sql_cursor.execute('SELECT SUM(SalesAmount) as SalesAmount FROM dbo.FactSales')
for row in sql_cursor: 
    print(row["SalesAmount"])

Python Code to Run PBI Execute Queries REST API

#PBI Execute Queries 
import json #used to read the JSON response; this is part of the final script's imports

pbi_dataset_id = '10c3f357-9b2a-42f4-8438-748295b47b9b'
execute_queries_url = f'https://api.powerbi.com/v1.0/myorg/datasets/{pbi_dataset_id}/executeQueries'
dax_query_json_body = {
                "queries":
                        [
                            {
                            "query": 'EVALUATE ROW("Sales_Amount", \'_Measures\'[Sales Amount])'
                            }
                        ]
                        
                }
eq = requests.post(url=execute_queries_url, headers=header,json=dax_query_json_body)
eq.raise_for_status()
query_json = eq.text.encode().decode('utf-8-sig')
print("JSON Response ", json.loads(query_json))

Awesome! Looks like now I can visually confirm that my Power BI DAX measure returns the same result as my SQL query. But what if I want to do more than one at a time? In the next section, we will flip the script to use an Excel file as our input and our output. That way, all you would need to do is have two columns for input – one with the SQL query and one with the corresponding DAX formula. Then the output would be a third column that returns the difference between the two (ideally, 0).

Use Excel File as Input and Output

To start, we are going to build an Excel template to use with this script. For ease of use, we are going to use formatted tables so end users can append new measures as needed (a copy is available on our GitHub for download). We will make two sheets: one for credentials and one for the data quality checks. On the credentials sheet, make one table for the SQL credentials and the PBI credentials. I’m going to name the table “Creds”.

On the Quality Check tab, we will make one more table that has a column for Power BI DAX formula, the SQL query, the result of the DAX formula, the result of the SQL query, and the difference between the two. Let’s avoid using spaces in our column and table names to keep our coding simple. We’ll call this table “Quality”.

Alrighty, now to pull these values into Python! Let’s start with grabbing our creds and pulling them in. For this section, we need to import the openpyxl library and use it to pull in our tables. To start, we are going to grab our file, then go to the worksheet named “Credentials”. From there, we will see what tables are in that worksheet.

#Grab data from our excel file
from openpyxl import load_workbook

#Read file
wb = load_workbook(r"C:\Users\KristynaHughes\Downloads\PythonDataQualityChecker.xlsx") 
#we need the r at the beginning since "\" is a special character in python
sheets = wb.sheetnames
print(sheets) #see the sheets in our workbook
#access specific sheet called Credentials
cred_ws = wb['Credentials']
print(cred_ws.title)
cred_tables = cred_ws.tables.values()
for table in cred_tables:
    print(table.displayName)

Next, we are going to use a mapping dictionary to create an object that contains a dataframe for each Excel table. If you’re not familiar with Python, a dataframe is essentially Python’s version of a SQL view or Excel table that can store and manipulate tabular data. Once we get our two tables into dataframes, we can start to pull out values as inputs for our earlier equations.

Don’t worry, at the end of this blog is the entire final script.

import pandas #used to build the dataframe; this is part of the final script's imports

cred_mapping = {} #this will help us build a dataframe from our table
for entry, data_boundary in cred_ws.tables.items():
    #parse the data within the ref boundary
    data = cred_ws[data_boundary]
    #extract the data 
    #the inner list comprehension gets the values for each cell in the table
    content = [[cell.value for cell in ent] #ent makes our code cleaner by handling nested data sources
               for ent in data
          ]
    header = content[0]
    #the contents excluding the header (aka column names)
    rest = content[1:]
    #create dataframe with the column names
    #and pair table name with dataframe
    cred_df = pandas.DataFrame(rest, columns = header)
    cred_mapping[entry] = cred_df
print(cred_df)

server = cred_df._get_value(0,"SQL_Server")
db = cred_df._get_value(0,"SQL_Database")
print(server)
print(db)

Sweet, now we can get our SQL_Server and SQL_Database values directly from the spreadsheet instead of hard coding them in Python! To code out the rest of the variables, simply copy the code above, then replace “server” and “SQL_Server” with the variable name and column name that you want to return.
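
For reference, here is that pattern extended to the remaining credential columns, matching the variable pulls used in the final script at the end of this post (the column names must match the headers in your Creds table):

sqluser = str(cred_df._get_value(0,"SQL_User"))
sqlpassword = str(cred_df._get_value(0,"SQL_Password"))
pbiclientid = str(cred_df._get_value(0,"PBI_ClientID"))
pbiusername = str(cred_df._get_value(0,"PBI_Username"))
pbipassword = str(cred_df._get_value(0,"PBI_Password"))
pbidatasetid = str(cred_df._get_value(0,"PBI_DatasetID"))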

Alrighty, now we need to pull in our data quality spreadsheet and set up some for loops that can help iterate through the list of Power BI and SQL queries and return some outputs to our Excel sheet.

Note: Your SQL query will need to have every column named or else you will get an error when you run the cursor over it. Also, for this section we will need to import a couple more libraries. If you don’t already have win32com installed, you’ll need to run pip install pypiwin32 in your command prompt.

This script is pretty awesome to see work because we will finally get some values back into our Excel file! The first section will use the win32com library to open Excel in edit mode instead of the read-only mode we have been using. After we open the worksheet to edit it, we will grab our SQL queries, then iterate through them to get the results from each query and put them back into Excel. If that doesn’t get the nerd in you excited, I’m not sure what will!

from win32com.client import Dispatch; import os
#Using the win32com library, we can open excel in edit mode
xl = Dispatch("Excel.Application") #opens excel for us to edit
xl.Visible = True
edit_wb = xl.Workbooks.Open(excel_file) #excel_file is the same workbook path we passed to load_workbook earlier
edit_qc_sh = edit_wb.Worksheets("Quality Check")

#Running SQL query and putting result back into Excel
sql_df = quality_df.dropna() #quality_df is built from the "Quality Check" table the same way cred_df was; this removes blank rows so we don't run blank sql queries
sql_queries = sql_df.loc[:,"SQL_Query"] 
#the first argument (:) signifies which rows we want to index (: indicates all columns), and the second argument lets us index the column we want
#print(sql_queries) #this returns the rows in our sql query column
rownumber = 1
for query in sql_queries:
    sql_cursor.execute(str(query))
    #print (sql_cursor.fetchall()[0])
    for row in sql_cursor:
        #print(row) #this is our dictionary created from the sql query result
        key = list(row.items())[0]
        rownumber += 1 #puts our sql responses in the right rows
        sql_result = key[1]
        quality_df.at[rownumber,"SQL_Result"] = sql_result #this will put our results into the proper cell in our dataframe
        edit_qc_sh.Range(f"D{rownumber}").Value = sql_result #this will put our results in the right excel cell
        #print(key[1]) #returns just our result values
        
edit_wb.Save() #saving our values back to excel

Okay, time to do the same thing with our DAX queries and wrap this project up. You may have noticed that the DAX queries in the sample data are not exactly the same as what you would use in your measure. What we are doing is telling DAX how we want the data back; in this case we want it to return one row for us (hence the EVALUATE ROW at the beginning) based on measures I have in a table called “_Measures”. If you aren’t sure how to make the DAX query work for this API, feel free to put the measure you want to test in a card visual, then use the performance analyzer to get the query. A good way to see if your DAX query will work is to test it out in DAX Studio.

Python time! We are going to tweak our code from earlier by renaming our variable “header” to “pbiheader” so we can reference it more clearly in our JSON request code. Again, feel free to skip to the end of this post for the final code if you have any questions (or just need this to work, fast). First, we will need all the pieces to execute our POST request. Once we have the URL, the DAX query, and the JSON body, we can iterate through the list of queries from our Excel file.

#Running queries and putting results back into Excel
qdf = quality_df.dropna() #removing any blank rows so we don't run blank sql queries lol
sqlrownumber = 1
pbirownumber = -1

#Running SQL Queries
sql_queries = qdf.loc[:,"SQL_Query"] 
for query in sql_queries:
    sql_cursor.execute(str(query))
    #print (sql_cursor.fetchall()[0])
    for row in sql_cursor:
        #print(row) #this is our dictionary created from the sql query result
        key = list(row.items())[0] #gets the first item from the row dictionary
        sqlrownumber += 1 #puts our sql responses in the right rows
        sql_result = key[1] #grabs just the result of our query
        quality_df.at[sqlrownumber,"SQL_Result"] = sql_result #this will put our results into the proper cell in our dataframe
        edit_qc_sh.Range(f"D{sqlrownumber}").Value = sql_result #this will put our results in the right excel cell
        #print(key[1]) #returns just our result values
        
#Running PBI Queries
pbi_queries = quality_df.loc[:,"PBI_DAX_Query"]
execute_queries_url = f'https://api.powerbi.com/v1.0/myorg/datasets/{pbidatasetid}/executeQueries'
for items in pbi_queries:
    pbirownumber += 1
    pbiexcelrownumber = pbirownumber+2
    list_pbiquery = list(pbi_queries.items())[pbirownumber]
    #print(list_pbiquery)
    item_pbiquery = list_pbiquery[1]
    dax_query_json_body = {
                "queries":
                        [
                            {
                            "query": item_pbiquery
                            }
                        ]  
                }
    pbi_response = requests.post(url=execute_queries_url, headers=pbiheader,json=dax_query_json_body)
    query_json = pbi_response.text.encode().decode('utf-8-sig') #allows us to read the json response
    pbi_values = query_json.replace("}]}]}]}","") #drops the trailing encoding from json
    #print(query_json)
    pbi_result = pbi_values.split(":")[4] #grabs just the result of our query
    print(pbi_result) #grabs just the result of our query
    quality_df.at[pbiexcelrownumber,"PBI_Result"] = pbi_result #this will put our results into the proper cell in our dataframe
    edit_qc_sh.Range(f"C{pbiexcelrownumber}").Value = pbi_result #this will put our results in the right excel cell

edit_wb.Save() #saving our values back to excel

Pretty great! Now we have a functioning data quality testing tool where Excel holds all our inputs and outputs and all we need to do in Python is hit Run.
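
One piece the script above leaves to Excel is the difference column itself. If you would rather populate it from Python too, here is a minimal sketch; it assumes the Difference column sits in column E of the Quality Check sheet, that the data starts on row 2, and that both results can be cast to float (none of which is defined above), so adjust it to match your template:

#Hedged sketch: write the DAX-versus-SQL difference into an assumed "Difference" column (column E)
start_row = 2 #assumed first data row under the table header
for offset in range(len(quality_df)):
    excel_row = start_row + offset
    pbi_value = edit_qc_sh.Range(f"C{excel_row}").Value
    sql_value = edit_qc_sh.Range(f"D{excel_row}").Value
    try:
        diff = float(pbi_value) - float(sql_value) #ideally 0 when the measure matches the SQL result
    except (TypeError, ValueError):
        diff = None #leave the cell blank if either result is missing or non-numeric
    edit_qc_sh.Range(f"E{excel_row}").Value = diff

edit_wb.Save() #persist the new column values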

Final Script

'''---------------README------------------
Full instructions available: https://dataonwheels.wordpress.com/?s=Power+BI%3A+Data+Quality+Checks+Using+Python+%26+SQL
The goal of this script is to compare DAX measures to corresponding SQL queries. 
To accomplish this, it uses an excel file containing authentication variables and query values.
This idea came from seeing the Execute Queries REST API in Power BI and wanting to use python to perform our data quality checks efficiently.
To connect to a Power BI Data Model, we need to pass an authentication token generated through an Azure AD App. 
To run a DAX query and SQL query, we need to loop through our excel file then put the queries into the API and SQL calls respectively.
Finally, this script takes those query results and puts it back into the excel file.

I have left some print() lines commented out, feel free to uncomment them to troubleshoot or walk through the code step-by-step.

You will need to swap out the excel_file variable with your excel file path. Other than that, the rest of the variables are managed inside your excel file. 
Once you have your variables and queries in excel, hit run and you're good to go. 
'''

#---------------------------------------#
#       Import libraries needed  
#---------------------------------------#
import requests
import adal
import json
import pymssql 
import pandas
import openpyxl
from win32com.client import Dispatch; import os

#---------------------------------------#
#    Build out authentication steps  
#---------------------------------------#

#----- Authentication Variables from Excel -----#

#Grab authentication data from our excel file
from openpyxl import load_workbook
#we need the r at the beginning since "\" is a special character in python
excel_file = r"C:\Users\KristynaHughes\Downloads\PythonDataQualityChecker.xlsx"
wb = load_workbook(excel_file) 
sheets = wb.sheetnames
#print(sheets) #see the sheets in our workbook
cred_ws = wb['Credentials'] #access specific sheet called Credentials
cred_mapping = {} #this will help us build a dataframe from our table
for entry, data_boundary in cred_ws.tables.items():
    #parse the data within the ref boundary
    data = cred_ws[data_boundary]
    #extract the data 
    #the inner list comprehension gets the values for each cell in the table
    content = [[cell.value for cell in ent] #ent makes our code cleaner by handling nested data sources
               for ent in data
          ]
    header = content[0] #the column names (aka the header row)
    rest = content[1:] #the table contents excluding the header
    cred_df = pandas.DataFrame(rest, columns = header) #this is the dataframe we will use to get our creds
    cred_mapping[entry] = cred_df
#Use the dataframe to set up credential variables we can use later in the script
sqlserver = str(cred_df._get_value(0,"SQL_Server"))
sqldb = str(cred_df._get_value(0,"SQL_Database"))
sqluser = str(cred_df._get_value(0,"SQL_User"))
sqlpassword = str(cred_df._get_value(0,"SQL_Password"))
pbiclientid = str(cred_df._get_value(0,"PBI_ClientID"))
pbiusername = str(cred_df._get_value(0,"PBI_Username"))
pbipassword = str(cred_df._get_value(0,"PBI_Password"))
pbidatasetid = str(cred_df._get_value(0,"PBI_DatasetID"))
#check to make sure your variables are correct by uncommenting the next line
#print(sqlserver,sqldb,sqluser,sqlpassword,pbiclientid,pbiusername,pbipassword,pbidatasetid)


#----- Power BI REST API Authentication -----#

authority_url = 'https://login.windows.net/common'
resource_url = 'https://analysis.windows.net/powerbi/api'
context = adal.AuthenticationContext(authority=authority_url,
                                     validate_authority=True,
                                     api_version=None)
token = context.acquire_token_with_username_password(resource=resource_url,                                                     
                                                     username=pbiusername,
                                                     password=pbipassword,
                                                     client_id=pbiclientid)
access_token = token.get('accessToken')
pbiheader = {'Authorization': f'Bearer {access_token}'}
#try out the get workspaces REST API using our access token by uncommenting the next few lines
#get_workspaces_url = 'https://api.powerbi.com/v1.0/myorg/groups'
#r = requests.get(url=get_workspaces_url, headers=pbiheader)
#r.text will give us the response text for our get request, pretty neat!
#print(r.text)


#----- SQL Server Authentication -----#

try:
    sql_con = pymssql.connect(sqlserver,sqluser,sqlpassword,sqldb)
    sql_cursor = sql_con.cursor(as_dict=True)
except Exception as e:
    raise Exception(e)


#---------------------------------------#
#  Build out data quality check steps 
#---------------------------------------#

#----- Read excel to get quality check queries into a dataframe -----#

quality_ws = wb['Quality Check']
quality_mapping = {} #this will help us build a dataframe from our table
for entry, data_boundary in quality_ws.tables.items(): #grabs data dynamically from our table
    data = quality_ws[data_boundary] #parse the data within the ref boundary
    #the inner list comprehension gets the values for each cell in the table
    content = [[cell.value for cell in ent] #ent makes our code cleaner by handling nested data sources
               for ent in data
          ]
    header = content[0] #the column names (aka the header row)
    rest = content[1:] #the table contents excluding the header
    qualitydf = pandas.DataFrame(rest, columns = header) #this is the dataframe we will use to get our quality check queries
    quality_df = qualitydf.fillna(' ') #replaces empty (NaN) cells with a blank space so the loops below don't choke on them
    quality_mapping[entry] = quality_df
#print(quality_df)


#----- Open excel file in edit mode -----#

xl = Dispatch("Excel.Application") #opens excel for us to edit
xl.Visible = True
edit_wb = xl.Workbooks.Open(excel_file)
edit_qc_sh = edit_wb.Worksheets("Quality Check")

#----- Set variables to use in our iterators -----#

qdf = quality_df.dropna() #removing any blank rows so we don't run blank sql queries lol
sqlrownumber = 1
pbirownumber = -1

#----- Run SQL queries and put results back into excel -----#

sql_queries = qdf.loc[:,"SQL_Query"] 
for query in sql_queries:
    sql_cursor.execute(str(query))
    #print (sql_cursor.fetchall()[0])
    for row in sql_cursor:
        #print(row) #this is our dictionary created from the sql query result
        key = list(row.items())[0] #gets the first item from the row dictionary
        sqlrownumber += 1 #puts our sql responses in the right rows
        sql_result = key[1] #grabs just the result of our query
        quality_df.at[sqlrownumber,"SQL_Result"] = sql_result #this will put our results into the proper cell in our dataframe
        edit_qc_sh.Range(f"D{sqlrownumber}").Value = sql_result #this will put our results in the right excel cell
        #print(key[1]) #returns just our result values

#----- Run PBI DAX queries and put results back into excel -----#

pbi_queries = quality_df.loc[:,"PBI_DAX_Query"]
#print(pbi_queries)
execute_queries_url = f'https://api.powerbi.com/v1.0/myorg/datasets/{pbidatasetid}/executeQueries'
for items in pbi_queries:
    pbirownumber += 1
    pbiexcelrownumber = pbirownumber+2
    list_pbiquery = list(pbi_queries.items())[pbirownumber]
    #print(list_pbiquery)
    item_pbiquery = list_pbiquery[1]
    dax_query_json_body = {
                "queries":
                        [
                            {
                            "query": item_pbiquery
                            }
                        ]  
                }
    pbi_response = requests.post(url=execute_queries_url, headers=pbiheader,json=dax_query_json_body)
    query_json = pbi_response.text.encode().decode('utf-8-sig') #strips the byte order mark so we can read the json response
    pbi_values = query_json.replace("}]}]}]}","") #drops the trailing json brackets so we can isolate the value
    #print(query_json)
    pbi_result = pbi_values.split(":")[4] #grabs just the result of our query
    #print(pbi_result) #returns just our result value
    quality_df.at[pbiexcelrownumber,"PBI_Result"] = pbi_result #this will put our results into the proper cell in our dataframe
    edit_qc_sh.Range(f"C{pbiexcelrownumber}").Value = pbi_result #this will put our results in the right excel cell

#----- Save our changes back to the excel file -----#

edit_wb.Save() #saving our values back to excel
print("All done")

Additional Resources:

This was my first experience with Python, and below are all the links that helped me build out this demo and learn more about how Python operates. Hope you find these as useful as I did!

Power BI: Dynamically Removing Errors From Columns in M

If you have Excel data, or any user-entered data, you have likely experienced the frustration of repeatedly seeing the error message below upon hitting “Close & Apply” or refreshing your data model. Failed refresh emails can roll in every time a user types a text value into a number field or adds a formula that results in #N/A.

End users who are not trained in data governance but are actively involved in maintaining a data set can easily make data entry mistakes or create inconsistent data types within columns. For example, you may have a column in the dataset called “Sales” and, instead of 0, someone types “None” or “NA”. When this gets loaded into Power BI, Power BI will not know how to convert the text value “None” to a number and will throw an error on the refresh of the report.
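To make the failure mode concrete, here is a minimal sketch (using a made-up two-row table rather than the conference sample data) of how a text value in a numeric column becomes an error cell:

let
    //hypothetical user-entered data: someone typed "None" into the numeric Sales column
    Source = #table({"Sales"}, {{100}, {"None"}}),
    //converting the column to a number turns the "None" cell into an error, which is what breaks the refresh
    #"Changed Type" = Table.TransformColumnTypes(Source, {{"Sales", type number}})
in
    #"Changed Type"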

One way to mitigate the impact of user-entered data is to replace errors with null values. This is not ideal since it doesn’t fix the data entry issues, but it does enable reports to still be refreshed and used while the data issues are addressed. In Power Query, you can manually replace the errors with null by going to the “Transform” tab then selecting the drop down for “Replace Values” and choosing “Replace Errors”.

After selecting “Replace Errors”, type null (all lowercase) into the value field to replace your errors with a null value that will allow the report to refresh no matter the data type.
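Behind the scenes, that UI action writes a Table.ReplaceErrorValues step into your query. As a rough sketch, assuming the previous step is named #”Changed Type” and the “Sales” column from the earlier example is selected, the generated step looks something like this:

#"Replaced Errors" = Table.ReplaceErrorValues(#"Changed Type", {{"Sales", null}})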

As you can imagine, it can get quite tedious to apply this step to every column in every query in your report. Good news! I have a query that you can add to the end of your applied steps that will replace errors with nulls in every single column of your query. Let’s walk through it.

Final Query:

let
    Source = Excel.Workbook(File.Contents("C:\Users\KristynaHughes\OneDrive - DataOnWheels\GitHub\AdventureWorks Sales w Errors.xlsx"), null, true),
    Sales_data_Sheet = Source{[Item="Sales_data",Kind="Sheet"]}[Data],
    #"Promoted Headers" = Table.PromoteHeaders(Sales_data_Sheet, [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"SalesOrderLineKey", Int64.Type}, {"ResellerKey", Int64.Type}, {"CustomerKey", Int64.Type}, {"ProductKey", Int64.Type}, {"OrderDateKey", Int64.Type}, {"DueDateKey", Int64.Type}, {"ShipDateKey", Int64.Type}, {"SalesTerritoryKey", Int64.Type}, {"Order Quantity", Int64.Type}, {"Unit Price", type number}, {"Extended Amount", type number}, {"Unit Price Discount Pct", Int64.Type}, {"Product Standard Cost", type number}, {"Total Product Cost", type number}, {"Sales Amount", type number}, {"Large Sales", type number}}),
    //ColumnsInTable grabs all the column names in your existing table dynamically
    ColumnsInTable = Table.ColumnNames(#"Changed Type"), 
    //Converting this list of columns to a table lets us add a column for what we will replace errors with
    #"Converted to Table" = Table.FromList(ColumnsInTable, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    //This adds a column to show what we will replace errors with
    #"Added ReplacementColumn" = Table.AddColumn(#"Converted to Table", "Replacement", each null),
    #"Transposed Table" = Table.Transpose(#"Added ReplacementColumn"),
    //Transforms the table back into a list so we can use it in our replace error step
    ColumnList = Table.ToColumns(#"Transposed Table"),
    //Takes our last step in our original table called Changed Type and replaces all errors in all our columns with null
    #"Replaced Errors" = Table.ReplaceErrorValues(#"Changed Type",ColumnList)
in
    #"Replaced Errors"

In the example above, #”Changed Type” is the last step in my original query, but the type conversion resulted in some errors in my table (see screenshot below).

In order to apply a replaced errors step to all the columns in the table, we need to understand a bit more about the Table.ReplaceErrorValues function. This function takes two arguments: (table as table, errorReplacement as list). The table argument will be the name of our last step, because that is the final table we want to manipulate (#”Changed Type” in this example). The errorReplacement list contains one entry per column, and each entry requires two values: the column name and the value we want to replace errors with.
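For a sense of what that list looks like when written by hand, here is a minimal sketch using two of the column names from the query above; every error in those two columns would become null:

#"Replaced Errors" = Table.ReplaceErrorValues(#"Changed Type", {{"Sales Amount", null}, {"Order Quantity", null}})

Hard-coding each column this way works, but it breaks as soon as a column is added or renamed. To build the full list dynamically instead, follow the steps below.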

1. Table.ColumnNames(#”Changed Type”) : this function looks at the last step in your query (#”Changed Type” in this example) and generates a list of the column names.

2. Table.FromList(ColumnsInTable, Splitter.SplitByNothing(), null, null, ExtraValues.Error) : this function converts our list of columns into a table. The reason we convert the list to a table and then back to a list is so we can easily add a column of nulls alongside the column names.

3. Table.AddColumn(#”Converted to Table”, “Replacement”, each null) : this function adds a column called “Replacement” that contains the null values we want to replace our errors with.

4. Table.Transpose(#”Added ReplacementColumn”) : transposing the table makes it easier to pair the values up correctly. We need each final entry to be a two-item list containing the column name followed by the replacement null value.

5. Table.ToColumns(#”Transposed Table”) : this function flips our transposed table into a list of lists that we can use in our final function. By clicking on a cell in that list of lists, you can see that each inner list is made up of a column name and the replacement value (see the sketch after this list). Pretty handy for our replaced errors function.

6. Table.ReplaceErrorValues(#”Changed Type”, ColumnList) : time to replace our errors. Notice that we have #”Changed Type” as our first argument. This forces Power Query to grab our original last step and use that version of our table for the function. ColumnList is the name of the step that contains our list of lists with the column names and null values inside.
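As mentioned in step 5, here is a small self-contained sketch of the list-building pipeline that uses a made-up two-column table in place of #”Changed Type”; the final ColumnList step evaluates to exactly the list of lists that Table.ReplaceErrorValues needs:

let
    //hypothetical stand-in for #"Changed Type" with just two columns
    Demo = #table({"Sales Amount", "Order Quantity"}, {{100, 2}, {"None", 1}}),
    ColumnsInTable = Table.ColumnNames(Demo), //{"Sales Amount", "Order Quantity"}
    #"Converted to Table" = Table.FromList(ColumnsInTable, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Added ReplacementColumn" = Table.AddColumn(#"Converted to Table", "Replacement", each null),
    #"Transposed Table" = Table.Transpose(#"Added ReplacementColumn"),
    ColumnList = Table.ToColumns(#"Transposed Table") //{{"Sales Amount", null}, {"Order Quantity", null}}
in
    ColumnList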

And there ya have it! To use the query above, you’ll need to swap #”Changed Type” with the name of the final step in your original query (see the sketch below). This approach is pretty handy when dealing with user-entered data, but keep in mind the ideal solution is to put more data governance around the original data entry. For example, a simple application that forces people to enter the correct data type, or a drop-down with valid options (a great option for zip codes or states), would prevent data entry issues in the first place and raise the quality of your data.
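To make the swap concrete: if your final step were named #”Cleaned Data” (a hypothetical name), only the two references to #”Changed Type” change, as in this sketch:

    ColumnsInTable = Table.ColumnNames(#"Cleaned Data"),
    //...the Converted to Table, Added ReplacementColumn, and Transposed Table steps stay exactly the same...
    #"Replaced Errors" = Table.ReplaceErrorValues(#"Cleaned Data", ColumnList)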

Additional Resources