Dynamically Unpivot columns in SQL

Picture this: your data ingestion team has created a table that splits the sales for each month and year into separate columns. At first glance, you may think “what’s the big deal? Should be pretty easy, right? All I need to do is unpivot these columns in Power BI and I’m good to go.” So you go that route, and the report works for one month. Next month, you get an urgent email from your stakeholders saying they can’t see this month’s numbers. That’s when you realize that this table will grow a new column every month, which means any report you build on it needs a schema refresh every single month. Unfortunately, Power BI will not pick up new columns from a table once the report is published to the online service. The only way for Power Query to unpivot the new columns is for you to open the report in Power BI Desktop, go to Power Query, and refresh the preview to pull in all the columns in that table.

If you’re like me, you may not have the time (or the will) to manually download, fix, and re-upload this report each month. But have no fear! SQL is here! The UNPIVOT function in SQL can feel like a black box designed only for SQL black belts, but we will break it down together and create a dynamic unpivot script.

What does unpivot in SQL accomplish? Unpivot takes multiple columns and converts them into rows. The general syntax is as follows:

SELECT 
originalColumn1, originalColumnHeaders, originalValues
FROM 
(SELECT originalColumn1, originalColumn2, originalColumn3, originalColumn4 FROM dbo.Table) T 
UNPIVOT (originalValues FOR originalColumnHeaders IN (originalColumn2, originalColumn3, originalColumn4)) AS UP

Let’s break this down.

The first SELECT statement contains the column we don’t want to unpivot plus the two names of what we want our final columns to be. For example, using our table below, we want to keep the country column as is, but we want to unpivot those sales months and the sales values. So, we may name the originalColumnHeaders as SalesMonth and originalValues as Sales. You can use whatever you’d like, but it has to match what’s in the UNPIVOT statement.

The SELECT after FROM tells the query where to get data from. By default, SQL requires this to be hard-coded, but don’t worry, we have a way for that to be dynamic.

The UNPIVOT clause requires us to provide two new column names. The first is what you want the column full of the old values to be called; in our case, this will be Sales. The second, after the FOR keyword, is what you want the column full of the old column headers to be called (aka SalesMonth).

The hard-coded version of our example would look like this:

SELECT 
country, SalesMonth, Sales
FROM 
(SELECT country, sales_jan_2024, sales_feb_2024, sales_mar_2024, sales_apr_2024, sales_may_2024 FROM dbo.PivotedSales) T 
UNPIVOT (Sales FOR SalesMonth IN (sales_jan_2024, sales_feb_2024, sales_mar_2024, sales_apr_2024, sales_may_2024)) AS UP

The result of this code looks pretty good! Now we can easily pull the month and year out of SalesMonth and have some data we can trend. But we still have that pesky hard-coding problem.
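
As a quick aside, here is one way you could pull the month and year out of SalesMonth. This is just a sketch that assumes the sales_mon_yyyy column naming used in this demo; adjust the string handling if your columns are named differently.

SELECT 
	U.country, 
	U.SalesMonth,
	U.Sales,
	PARSENAME(REPLACE(U.SalesMonth, '_', '.'), 2) AS SalesMonthName, --middle segment of sales_jan_2024, e.g. 'jan'
	PARSENAME(REPLACE(U.SalesMonth, '_', '.'), 1) AS SalesYear       --last segment of sales_jan_2024, e.g. '2024'
FROM 
	(SELECT country, sales_jan_2024, sales_feb_2024, sales_mar_2024, sales_apr_2024, sales_may_2024 FROM dbo.PivotedSales) T 
	UNPIVOT (Sales FOR SalesMonth IN (sales_jan_2024, sales_feb_2024, sales_mar_2024, sales_apr_2024, sales_may_2024)) AS U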

To get around the hard coding, we have to do some SQL jujitsu. We will use a couple of variables that can be populated dynamically, then execute the SQL stored in a variable using sp_executesql. This is a bit easier to explain with in-code comments, so feel free to look through the query below and read the comments that go along with it for details on how it works.


--Use this script to create the table for the demo
/* --Comment this line out to run the create table code
DROP TABLE IF EXISTS dbo.PivotedSales
CREATE TABLE dbo.PivotedSales (
	country VARCHAR(250),
	sales_jan_2024 DECIMAL(17,3),
	sales_feb_2024 DECIMAL(17,3),
	sales_mar_2024 DECIMAL(17,3),
	sales_apr_2024 DECIMAL(17,3),
	sales_may_2024 DECIMAL(17,3)
);
INSERT INTO dbo.PivotedSales (country, sales_jan_2024, sales_feb_2024, sales_mar_2024, sales_apr_2024, sales_may_2024)
VALUES 
	('South Africa', 111.22, 222.33, 333.44, 444.55, 555.66),
	('Canada', 112.22, 227.33, 332.44, 400.55, 500.66),
	('United States', 113.22, 228.33, 330.44, 401.55, 501.66),
	('Mexico', 114.22, 229.33, 334.44, 404.55, 504.66),
	('Ireland', 115.22, 230.33, 335.44, 409.55, 509.66),
	('Germany', 116.22, 231.33, 336.44, 499.55, 599.66),
	('South Africa', 1011.22, 2022.33, 3303.44, 4044.55, 5505.66),
	('Canada', 1102.22, 2027.33, 3302.44, 4000.55, 5000.66),
	('United States', 1103.22, 2280.33, 3030.44, 4001.55, 5001.66),
	('Mexico', 1104.22, 2209.33, 3034.44, 4004.55, 5004.66),
	('Ireland', 1105.22, 2300.33, 3305.44, 4009.55, 5009.66),
	('Germany', 1106.22, 2310.33, 3036.44, 4909.55, 5909.66);

--*/

--Original table
SELECT * FROM PivotedSales


--Set up parameters
;DECLARE @UnpivotColumns NVARCHAR(MAX), @FilterPrefix NVARCHAR(MAX) 
-- FilterPrefix is the prefix that all the columns we want to unpivot share. This will allow us to dynamically grab any new columns as they get created
SET @FilterPrefix = 'sales%' --Note: you can adjust this to be a suffix or a middle string if needed by moving the % (wildcard)


--This section sets our @UnpivotColumns variable to be a comma-separated list of the columns in our table that match the FilterPrefix
SELECT @UnpivotColumns = STRING_AGG(CONVERT(NVARCHAR(MAX), C.name),',')
FROM sys.columns C 
INNER JOIN sys.types T ON T.user_type_id = C.user_type_id
WHERE C.object_id = OBJECT_ID('dbo.PivotedSales') --this ensures we only get columns from our table
	AND C.name LIKE @FilterPrefix --this makes sure only columns that match the filter prefix are returned
	AND (T.name = 'decimal' OR T.name LIKE '%int%') --this ensures we only grab columns with a decimal or int type. You can adjust this if needed

SELECT @UnpivotColumns AS 'Unpivot Columns'


--This section creates a dynamic SQL statement using the comma separated list we just generated
DECLARE @ExecuteSQL NVARCHAR(MAX)
SET @ExecuteSQL = '
SELECT Country, SalesMonth, Sales
FROM 
	(SELECT Country, ' + @UnpivotColumns + ' FROM dbo.PivotedSales) P
	UNPIVOT
	(Sales FOR SalesMonth IN (' + @UnpivotColumns + ')) as S'
SELECT @ExecuteSQL AS 'Dynamic SQL Script' --this will show you the SQL statement we've generated


--Finally, we will use the system stored proc sp_executesql to execute our dynamic sql script
EXECUTE sp_executesql @ExecuteSQL

Now that is a lot of code, so be sure to look at each chunk and adjust it for your use case. Happy coding, folks!

Huge credit for this solution goes to Colin Fitzgerald and Miranda Lochner! Thank you both for sharing this with me!

SQL Bits 2023 Recap

I had an amazing time adventuring to Wales with other members of the data community last week! It was my first international conference, and I had heard so many incredible things about it, but still it blew away all my expectations. I’m so honored to have met so many incredible people throughout the week! Thank you to new friends and old for making it a conference to remember.

Below are the links to GitHub for my session as well as some notes from the sessions I attended. Disclaimer: these notes are not even close to the real thing. Be sure to check out these sessions if you see them being held at your local user group or a SQL Saturday in the future! I tend to take very bullet-point-oriented notes to organize my thoughts; hopefully these spark your interest and guide you to professionals who know about specific technologies within the data realm.

Check out the agenda and feel free to reach out to folks with topics you’re interested in! They may have their slide deck easily accessible to folks who may not make it to the conference.

Power BI Meets Programmability with Yours Truly

Thank you so much to everyone who made it out to my session!

It was incredible to have 48 people in person and another 21 virtual! It was a humbling experience to see that room fill up and know they were all there to learn something new and exciting to help with their day-to-day work. Can’t wait to look through the feedback and learn where to tweak my presentation to make an even bigger impact in the future!

  • Github with slide deck, sample C# code, and sample PBIX file (with Excel file that powers it): Github Link
  • Category of blog posts related to TOM and C#: https://dataonwheels.wordpress.com/category/c-tom-and-power-bi/
  • Fun tip someone showed me after the session:
    If you use /* and */ to comment out chunks of code (like I do frequently during the session), you can use /* and --*/ so that you only need to comment out the /* to run that block. Pretty neat!
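
    For example, in T-SQL the pattern looks something like this (a tiny sketch of the trick, not taken from the session). With the opening /* in place, the block below is skipped; put two dashes in front of that opening marker and the whole block runs, because the closing --*/ then acts as an ordinary line comment.

    /* --put two dashes in front of this opening marker to turn the block below on
    SELECT 'this block is currently commented out' AS Demo;
    --*/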

Supercharge Power BI with Azure Synapse Analytics with Mathias Halkjaer

Limitations of PBI

  • Source load minimization doesn’t exist in most cases (lots of queries against the same source)
  • Out of sight transformation layer
  • Not best tool for data warehouse tasks
  • No logs or output from quality checks
  • Doesn’t handle humongous data very well

Enchantments needed for PBI from Synapse

  • Time-travel (historical data & change tracking)
  • Enlarge (massive scale)
  • Counter spell (reverse ETL)
  • Wish (supports multiple programming languages)
  • Divination (AI)
  • Regenerate (CI/CD and source control)

Azure Synapse

  • A toolbox full of tools: serverless sql, data warehouse, spark engine, etc.

Data Lake

  • Landing zone for data
  • Cheap
  • Redundant
  • Quick and dirty analytics
  • Performance
  • Most efficient way to load data into data warehouse
  • Easily consumed

Report Design & Psychology (Quick Tips) with Jon Lunn

Dual process theory: intuitive vs attentive

You can mentally process up to 15 items intuitively. Past that it gets pushed to attentive.

Things can move from system 2 (attentive) to system 1 (intuitive) with repetition (aka driving or typing).

Layout – left to right, top to bottom. Move most important KPI to top left. Remove distractions (excess colors, 3d visuals, pie charts, etc). Titles in upper left make it easier to know what’s contained within the visual.

Tip – to visualize dots use dice patterns to move it to a system 1 process.

Ratios (how many, relative) vs percentages (how likely, chance risk probability). Both are good options to contextualize a number more. We usually understand ratios much better because how many is more intuitive than how likely.

Intuitive systems are more prone to errors or bias. If you want to hide something, go for percentage.

Use the power of 10’s (aka 8 out of 10 preferred it). Round and use a ratio to move it into intuitive process.

Power BI Datamarts What, How, Why with Marthe Moengen

You can manage RLS roles within a datamart and join tables via a GUI similar to the PBI Desktop modeling view. In preview only. You can use a SQL endpoint to connect to your datamart.

Why? Access data through a SQL endpoint. Do simple aggregation using SQL script instead of M. There are potential challenges since this is in preview.

It keeps getting easier to move Oracle and Open Source workloads to Azure with David Levy

  • Cloud migration phases:
    • Discover
    • Assess
    • Migrate
    • Optimize
    • Secure & Manage
  • Azure Migrate: central hub of tools for datacenter migration
    • Primary tool for discover and assess
    • Azure Database Migration Service – Gen 2 has much more control for end users to say how many resources that item gets
  • Use Oracle Assessments in Azure Data Studio to get recommendations for sku size
  • Oracle databases in Oracle cloud can use OCI Interconnect to pull data back and forth as Azure IaaS
  • A lot of people move to PostgreSQL to pay less and get open source resources
  • Migration strategy
    • Rehost
    • Refactor
    • Rearchitect
    • Replace
  • No better way to run software than by SaaS. Why invest the time and resources when the vendor can maintain this for you and allow you to focus on your business?
  • The Oracle Assessments in Azure Data Studio can see all the database details, SKU recommendations, and Feature Compatibility. Each section has a migration effort attached, very nice
  • Azure is a great place for PostgreSQL:
    • High availability
    • Maintained
    • The PostgreSQL migration extension can run an assessment in ADS and understand the migration readiness
    • Integrated Assessment
    • Easy set up and config in ADS
  • To migrate PostgreSQL
    • pg_dump scripts out the PostgreSQL schema
    • psql allows creating databases and tables
    • Use the PostgreSQL to Azure Database for PostgreSQL Online Migration Wizard
    • You can start cutover with pending changes
  • Azure Database for MySQL
    • Flexible server is now available, allows mission critical apps to use zone redundancy and fine-grain maintenance scheduling
    • Azure Database Migration Service (ADMS)
      • Migrates schema and data
      • Preview = replicate changes
      • You can do this as an online migration
  • Optimizing your environment
    • Costs during & after migration
      • During: Azure TCO Calculator, Azure Migrate, Azure Hybrid Benefit & Reserved Instances, and join Azure Migration & Modernization Program
        • Do reservations early, you’ll lose money otherwise
        • AMMP = Azure Migration & Modernization Program, provides coaching with best practice
    • Make sure you get good data for trouble times so your migrated instance is ready for the worst case scenario
    • Execute iteratively

5 Things You Can Do to build better looking reports – James McGillivray

  • Design is subjective
  • No clear end point
  • No one solution
  • Design is often seen as less important
  • Blog: jimbabwe.co.za
  • Tip Techniques
    • Grid Layout
      • Pro: left brain, less layout tweaking happens, looks professional
      • Con: can look boxy,
      • Gutters = white space between visuals
      • Margins = white space along edge of report
    • Maximize Real Estate
      • Pro: bigger visuals = more value, fewer visuals = less distractions
      • Con: often hard to convince stakeholders, all questions not answered by default
    • Beautiful Colors
      • Pro: use color to convey meaning, highlight data, show continuous data, qualitative sequential divergent
      • Con: many pitfalls, accessibility concerns, can be polarising
    • Consider the Audience
      • Pro: most important elements first, remove unneeded elements, hierarchy view (summary on top, grouped middle, detail on bottom)

Identifying and Preventing Unauthorized Power BI Gateways with Angela Henry

  • The problem: available to all, gateways are not only for PBI, results in chaos
  • How do we find gateways?
    • Manage connections & gateways. Lets us identify gateways. Need to be an Azure AD Global admin, PBI service admin, or Gateway Admin
  • Reason to restrict: governance

Out of Your SSMS World, Jupyter Notebooks in Azure Data Studio – Louisville Data Technology Group, Feb 2023

It was a lot of fun to speak at the Louisville Data Technology Group in February. Sheila and I presented on Jupyter notebooks in Azure Data Studio. The session was very fun, with a lot of interesting interaction from individuals who were looking at both developer and administration tooling within Azure Data Studio, as well as learning how to use Jupyter notebooks, most of them for the first time.

Steve presenting in Louisville

The start of the session was a general introduction to Jupyter notebooks and Jupyter books. You can find the short slide deck here. I think the key thought from the introduction was the fact that Jupyter notebooks have been around for a long time and have often been used in data science as well as data engineering with Python. For example, my first exposure to notebooks was working with Databricks and more of a data engineering workload. One interesting note is that Jupyter books appear to be a nonstandard or, as far as I can tell, hard-to-understand component. Jupyter books are in fact a folder structure used to organize the contents, including various markdown files, subfolders, and notebooks. Jupyter books allow you to store and organize your content and even share it in an organized way.

My first real exposure to Jupyter notebooks in a functional way was to create a platform on which my wife could help with presentations in a simpler manner than just using SQL Server Management Studio. As a result, I began to dig into how Jupyter notebooks could help us during presentations. We have since used Jupyter notebooks at two different SQL Saturdays and presented on how to use notebooks in this session at this user group. You can read about my first experience with notebooks in this post.

As part of our presentation at the Louisville Data Technology Group, my wife and I worked on a step-by-step walkthrough of the demo. I’ve made some updates to the instructions to hopefully help any of you recreate the demo that we did during the presentation. You can find that step-by-step guide here. Besides the demo instructions, a completed sample notebook is also stored in that GitHub location.

Questions from the group

Can we mix Python and SQL in an SQL kernel notebook?

This is not possible at this time. Currently the notebook attaches to a single kernel, and while there is an option to change what type of code is in the cell, the only option available when you click SQL in the lower right corner is SQL.

When working with a SQL notebook, does it create one or more sessions for each cell that is used?

Because we’re working with Azure Data Studio, each notebook or file will create a new session when connected. In our case, each notebook will have its own session, and the queries will run within that session for that single notebook. If you open separate notebooks, you will get a separate session for each notebook to operate in.

In a SQL tab on Azure Data Studio, can you use the same charting functions with your result sets?

We were able to demonstrate this during the user group meeting. The charting and export functions that are available with the results of a notebook code cell execution are also available for results from a traditional SQL execution, in the same spot alongside the result set.

What is the best way to share notebooks with your team?

During our demo, we illustrated how to connect to remote Jupyter books. That, however, is a great approach for content you want to share with the general public. If you are working with a team and are managing a set of code in notebooks, the preferred approach would be to use GitHub. This would allow each of you to clone the repo, commit your changes back to the notebook, and retrieve updates made by other team members.

Converting existing SQL files to notebooks

If you open a .sql file in Azure Data Studio, you have the option to convert it to a notebook. Typically, this will take comments and try to put them into text cells, and it will separate your code the best it can into code cells that make sense. Be aware that it’s not always consistent, and you will likely want to run through the resulting notebook to verify that it looks the way you want. If you want to be proactive, you can use markdown formatting in your comments, which will then be converted to proper markdown when the file is converted to a notebook.

/*
# This is an example of a header 
Here is an example of **bolded text**
*/

When converted, the comment above becomes a markdown text cell with a header and bolded text.

It is also possible to convert a notebook to SQL; the reverse process produces a .sql file with commented-out markdown alongside the code.
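
As a rough sketch (the exact output can vary by version), a notebook containing the markdown cell above plus a single code cell might export to a .sql file along these lines; the SELECT is just a placeholder code cell:

/*
# This is an example of a header
Here is an example of **bolded text**
*/
SELECT TOP 10 name FROM sys.tables;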

Power BI: Data Quality Checks Using Python & SQL

Picture this: you have a report in Power BI that someone passes off to you for data quality checks. There are a few ways to make sure your measures match what is in the source data system, but for this demo we are going to use Python and Excel to perform our data quality checks in one batch. To do that, we are going to build a Python script that can run Power BI REST APIs, connect to a SQL Server, and connect to Excel to grab the queries and push the quality check results back into Excel for final review. To find a sample Excel file and the final Python script, please refer to my GitHub.

To build this, you’ll need a few pieces:

  1. A way to run Power BI REST APIs
    • Requires an Azure AD App – if you can’t make one then you will need someone in your company to create one and give you the client ID for it and credentials to use it with
  2. Ability to run python scripts (Visual Studio Code will be used in this demo)
  3. Access to a Power BI dataset that is published to the Power BI service (Premium is not needed)
  4. Access to a SQL Server
  5. A list of DAX functions that you want to test in Excel
  6. A list of SQL queries that should have the same result as your DAX functions in Excel

Authentication

To start, you’ll need to register an application for Power BI within Azure. First, create & register your app in the Azure portal. This link will take you through the manual process step by step, including how to add permissions so the application can access the PBI service. You’ll need to give your app the delegated permission of Dataset.ReadWrite.All or Dataset.Read.All as well as Tenant.Read.All or Tenant.ReadWrite.All, which will require an admin to grant (check the required scope in the API documentation). One important limitation for accessing the Azure AD app: the user account you authenticate with cannot have MFA enabled. If that makes your IT security team squeamish, remind them that the app can be given read-only access to only Power BI’s metadata. No proprietary data is at risk, and write back is not necessary to accomplish our goals.

Let’s test out running the REST API using Python scripts! To run this in Python, we will need to generate an access token, then call the REST API using that token.

Adding Libraries in Python

If you have not used some of the libraries that will be imported for this demo, you will need to run “pip install [library name]” in your command prompt. If you don’t have pip, follow the directions in this link and/or modify your Python installation to include pip. Newer versions often have this enabled by default, but triple check that you have added Python to your environment variables or else you will get errors saying Python isn’t installed when you try to run commands in the command prompt.
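
For reference, the third-party libraries imported in the final script at the end of this post (json and os are built in) can be installed in one go, assuming pip is available on your path:

pip install requests adal pymssql pandas openpyxl pypiwin32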

Getting the REST API Token

Alrighty, let’s go to Visual Studio Code and build out the authentication portion of our Python. We are going to use the adal library to make a REST API call that will get the token for us. Make sure to put single or double quotes around your Client ID, Username, and Password to format them as text. Keep in mind, our final code will get these values from the Excel file instead of hard coding them into the script. That means you can also keep these values in a separate Excel file if your IT team would prefer to keep them out of the final code (which is highly recommended).

import adal
authority_url = 'https://login.windows.net/common'
resource_url = 'https://analysis.windows.net/powerbi/api'
client_id = <INSERT CLIENT ID>
username = <INSERT USERNAME>
password = <INSERT PASSWORD>
context = adal.AuthenticationContext(authority=authority_url,
                                     validate_authority=True,
                                     api_version=None)
token = context.acquire_token_with_username_password(resource=resource_url,
                                                     client_id=client_id,
                                                     username=username,
                                                     password=password)
access_token = token.get('accessToken')
print(access_token)

Awesome, now we have a way to generate an access token to use for our future API calls!

Alrighty, let’s try this out with a simple REST API call: getting a list of the workspaces (called groups in the API for some reason) in my tenant. I only have a couple of workspaces, but this will let us test passing our token and getting back a response. Add the following lines to the end of the code above (note the extra import for the requests library):

import requests #needed for the GET request below

get_workspaces_url = 'https://api.powerbi.com/v1.0/myorg/groups'
header = {'Authorization': f'Bearer {access_token}'}
r = requests.get(url=get_workspaces_url, headers=header)
r.raise_for_status()
print(r.text)

You should see a JSON response listing your workspaces. You’ll notice there are a lot of elements in this response, but we can look at the name fields and confirm that the code is working as expected. Exciting stuff!

Connecting to a SQL Server database

To connect to SQL, we are going to use the pymssql library. See below for the format we will need to connect to our SQL Server database and run a sample query.

#SQL Server Authentication
import pymssql 

sql_server = 'server.database.windows.net'
sql_user = 'Sample'
sql_password= 'Password'
sql_database = 'DatabaseName'

sql_con = pymssql.connect(sql_server,sql_user,sql_password,sql_database)
sql_cursor = sql_con.cursor(as_dict=True)

#execute SQL script
sql_cursor.execute('SELECT TOP 1 SalesKey FROM dbo.FactSales')
for row in sql_cursor: 
    print(row)

Your result will be a dictionary with the column name as the key and the cell value as the value. This will be important when we try to compare it to our Power BI DAX query result.
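
For example, the sample query above would print a single dictionary keyed by the column name, something like the following (the value shown here is made up):

{'SalesKey': 12345}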

Comparing DAX Query with SQL Query

Alrighty now to the main event! Running a DAX query against our Power BI data model and comparing it to a SQL query to ensure our measure is working as expected. For this demo, I’ll keep it pretty simple. I have a measure in my dataset, Sales Amount = SUM(FactSales[SalesAmount]). This should equal the result from the SQL query SELECT SUM(SalesAmount) ‘Sales Amount’ FROM FactSales. If you have errors using the execute queries REST API, please review the limitations section of the Microsoft Execute Queries Documentation.

Python Code to Run SQL Script

#SQL Script Execution
sql_cursor.execute('SELECT SUM(SalesAmount) as SalesAmount FROM dbo.FactSales')
for row in sql_cursor: 
    print(row["SalesAmount"])

Python Code to Run PBI Execute Queries REST API

#PBI Execute Queries 
pbi_dataset_id = '10c3f357-9b2a-42f4-8438-748295b47b9b'
execute_queries_url = f'https://api.powerbi.com/v1.0/myorg/datasets/{pbi_dataset_id}/executeQueries'
dax_query_json_body = {
                "queries":
                        [
                            {
                            "query": 'EVALUATE ROW("Sales_Amount", \'_Measures\'[Sales Amount])'
                            }
                        ]
                        
                }
eq = requests.post(url=execute_queries_url, headers=header,json=dax_query_json_body)
eq.raise_for_status()
query_json = eq.text.encode().decode('utf-8-sig')
print("JSON Response ", json.loads(query_json))

Awesome! Looks like now I can visually confirm that my Power BI DAX measure returns the same result as my SQL query. But what if I want to do more than one at a time? In the next section, we will flip the script to use an Excel file as our input and our output. That way, all you would need to do is have two columns for input – one with the SQL query and one with the corresponding DAX formula. Then the output would be a third column that returns the difference between the two (ideally, 0).

Use Excel File as Input and Output

To start, we are going to build an Excel template to use with this script. For ease of use, we are going to use formatted tables so end users can append new measures as needed (a copy is available on our GitHub for download). We will make two sheets: one for credentials and one for the data quality checks. On the credentials sheet, make one table for the SQL and PBI credentials. I’m going to name the table “Creds”.

On the Quality Check tab, we will make one more table with columns for the Power BI DAX formula, the SQL query, the result of the DAX formula, the result of the SQL query, and the difference between the two. Let’s avoid using spaces in our column and table names to keep our coding simple. We’ll call this table “Quality”.

Alrighty, now to pull these values into Python! Let’s start with grabbing our creds. For this section, we need to import the openpyxl library and use it to pull in our tables. To start, we are going to open our file, go to the worksheet named “Credentials”, and see what tables are in that worksheet.

#Grab data from our excel file
from openpyxl import load_workbook

#Read file
wb = load_workbook(r"C:\Users\KristynaHughes\Downloads\PythonDataQualityChecker.xlsx") 
#we need the r at the beginning since "\" is a special character in python
sheets = wb.sheetnames
print(sheets) #see the sheets in our workbook
#access specific sheet called Credentials
cred_ws = wb['Credentials']
print(cred_ws.title)
cred_tables = cred_ws.tables.values()
for table in cred_tables:
    print(table.displayName)

Next, we are going to build a mapping that pairs each Excel table with a dataframe. If you’re not familiar with Python, a dataframe is essentially Python’s version of a SQL view or Excel table that can store and manipulate tabular data. Once we get our tables into dataframes, we can start to pull out values as inputs for our earlier code.

Don’t worry, at the end of this blog is the entire final script.

import pandas #gives us dataframes to hold the table contents

cred_mapping = {} #this will help us build a dataframe from our table
for entry, data_boundary in cred_ws.tables.items():
    #parse the data within the ref boundary
    data = cred_ws[data_boundary]
    #extract the data 
    #the inner list comprehension gets the values for each cell in the table
    content = [[cell.value for cell in ent] #ent makes our code cleaner by handling nested data sources
               for ent in data
          ]
    header = content[0]
    #the contents excluding the header (aka column names)
    rest = content[1:]
    #create dataframe with the column names
    #and pair table name with dataframe
    cred_df = pandas.DataFrame(rest, columns = header)
    cred_mapping[entry] = cred_df
print(cred_df)

server = cred_df._get_value(0,"SQL_Server")
db = cred_df._get_value(0,"SQL_Database")
print(server)
print(db)

Sweet, now we can get our SQL_Server and SQL_Database values directly from the spreadsheet instead of hard coding them in Python! To code out the rest of the variables, simply copy the code above and replace “server” and “SQL_Server” with the variable name and column name that you want to return, as shown below.
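
As a quick sketch, the remaining variables could look like the following, assuming the column names used in the sample Creds table (the same names referenced in the final script at the end of this post):

sql_user = cred_df._get_value(0,"SQL_User")
sql_password = cred_df._get_value(0,"SQL_Password")
pbi_client_id = cred_df._get_value(0,"PBI_ClientID")
pbi_username = cred_df._get_value(0,"PBI_Username")
pbi_password = cred_df._get_value(0,"PBI_Password")
pbi_dataset_id = cred_df._get_value(0,"PBI_DatasetID")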

Alrighty, now we need to pull in our data quality spreadsheet and set up some for loops that can help iterate through the list of Power BI and SQL queries and return some outputs to our Excel sheet.

Note: Your SQL query will need to have every column named or else you will get an error when you run the cursor over it. Also, for this section we will need to import a couple more libraries. If you don’t already have win32com installed, you’ll need to run pip install pypiwin32 in your command prompt.

This script is pretty awesome to see work because we will finally get some values back into our Excel file! The first section will use the win32com library to essentially open excel in edit mode instead of read only like we have been doing. After we open the worksheet to edit it, we will grab our SQL queries then iterate through them to get the results from the query and put it back into Excel. If that doesn’t get the nerd in you excited, I’m not sure what will!

from win32com.client import Dispatch; import os
#Using the win32com library, we can open excel in edit mode
xl = Dispatch("Excel.Application") #opens excel for us to edit
xl.Visible = True
edit_wb = xl.Workbooks.Open(excel_file) #excel_file is the workbook path we loaded with openpyxl earlier (see the final script)
edit_qc_sh = edit_wb.Worksheets("Quality Check")

#Running SQL query and putting result back into Excel
sql_df = quality_df.dropna() #quality_df comes from the Quality Check table, built the same way as cred_df (see the final script); dropna removes blank rows so we don't run empty sql queries
sql_queries = sql_df.loc[:,"SQL_Query"] 
#the first argument (:) signifies which rows we want to index (: indicates all columns), and the second argument lets us index the column we want
#print(sql_queries) #this returns the rows in our sql query column
rownumber = 1
for query in sql_queries:
    sql_cursor.execute(str(query))
    #print (sql_cursor.fetchall()[0])
    for row in sql_cursor:
        #print(row) #this is our dictionary created from the sql query result
        key = list(row.items())[0]
        rownumber += 1 #puts our sql responses in the right rows
        sql_result = key[1]
        quality_df.at[rownumber,"SQL_Result"] = sql_result #this will put our results into the proper cell in our dataframe
        edit_qc_sh.Range(f"D{rownumber}").Value = sql_result #this will put our results in the right excel cell
        #print(key[1]) #returns just our result values
        
edit_wb.Save() #saving our values back to excel

Okay, time to do the same thing with our DAX queries and wrap this project up. You may have noticed that the DAX queries in the sample data are not exactly what you would use in your measure. What we are doing is telling DAX how we want the data back; in this case, we want it to return one row (hence the EVALUATE ROW at the beginning) based on measures I have in a table called “_Measures”. If you aren’t sure how to write the DAX query for this API, feel free to put the measure you want to test in a card visual, then use the performance analyzer to get the query. A good way to see if your DAX query will work is to test it out in DAX Studio.

Python time! We are going to tweak our code from earlier by renaming our variable “header” to “pbiheader” so we can reference it more clearly in our JSON request code. Again, feel free to skip to the end of this post for the final code if you have any questions (or just need this to work, fast). First, we will need all the pieces to execute our POST call. Once we have the URL, the DAX query, and the JSON body, we can iterate through the list of queries from our Excel file.

#Running queries and putting results back into Excel
qdf = quality_df.dropna() #removing any blank rows so we don't run blank sql queries lol
sqlrownumber = 1
pbirownumber = -1

#Running SQL Queries
sql_queries = qdf.loc[:,"SQL_Query"] 
for query in sql_queries:
    sql_cursor.execute(str(query))
    #print (sql_cursor.fetchall()[0])
    for row in sql_cursor:
        #print(row) #this is our dictionary created from the sql query result
        key = list(row.items())[0] #gets the first item from the row dictionary
        sqlrownumber += 1 #puts our sql responses in the right rows
        sql_result = key[1] #grabs just the result of our query
        quality_df.at[sqlrownumber,"SQL_Result"] = sql_result #this will put our results into the proper cell in our dataframe
        edit_qc_sh.Range(f"D{sqlrownumber}").Value = sql_result #this will put our results in the right excel cell
        #print(key[1]) #returns just our result values
        
#Running PBI Queries
pbi_queries = quality_df.loc[:,"PBI_DAX_Query"]
execute_queries_url = f'https://api.powerbi.com/v1.0/myorg/datasets/{pbidatasetid}/executeQueries'
for items in pbi_queries:
    pbirownumber += 1
    pbiexcelrownumber = pbirownumber+2
    list_pbiquery = list(pbi_queries.items())[pbirownumber]
    #print(list_pbiquery)
    item_pbiquery = list_pbiquery[1]
    dax_query_json_body = {
                "queries":
                        [
                            {
                            "query": item_pbiquery
                            }
                        ]  
                }
    pbi_response = requests.post(url=execute_queries_url, headers=pbiheader,json=dax_query_json_body)
    query_json = pbi_response.text.encode().decode('utf-8-sig') #allows us to read the json response
    pbi_values = query_json.replace("}]}]}]}","") #drops the trailing encoding from json
    #print(query_json)
    pbi_result = pbi_values.split(":")[4] #grabs just the result of our query
    print(pbi_result) #grabs just the result of our query
    quality_df.at[pbiexcelrownumber,"PBI_Result"] = pbi_result #this will put our results into the proper cell in our dataframe
    edit_qc_sh.Range(f"C{pbiexcelrownumber}").Value = pbi_result #this will put our results in the right excel cell

edit_wb.Save() #saving our values back to excel

Pretty great! Now we have a functioning data quality testing tool where Excel holds all our inputs and outputs and all we need to do in Python is hit Run.

Final Script

'''---------------README------------------
Full instructions available: https://dataonwheels.wordpress.com/?s=Power+BI%3A+Data+Quality+Checks+Using+Python+%26+SQL
The goal of this script is to compare DAX measures to corresponding SQL queries. 
To accomplish this, it uses an excel file containing authentication variables and query values.
This idea came from seeing the Execute Queries REST API in Power BI and wanting to use python to perform our data quality checks efficiently.
To connect to a Power BI Data Model, we need to pass an authentication token generated through an Azure AD App. 
To run a DAX query and SQL query, we need to loop through our excel file then put the queries into the API and SQL calls respectively.
Finally, this script takes those query results and puts it back into the excel file.

I have left some print() lines commented out, feel free to uncomment them to troubleshoot or walk through the code step-by-step.

You will need to swap out the excel_file variable with your excel file path. Other than that, the rest of the variables are managed inside your excel file. 
Once you have your variables and queries in excel, hit run and you're good to go. 
'''

#---------------------------------------#
#       Import libraries needed  
#---------------------------------------#
import requests
import adal
import json
import pymssql 
import pandas
import openpyxl
from win32com.client import Dispatch; import os

#---------------------------------------#
#    Build out authentication steps  
#---------------------------------------#

#----- Authentication Variables from Excel -----#

#Grab authentication data from our excel file
from openpyxl import load_workbook
#we need the r at the beginning since "\" is a special character in python
excel_file = r"C:\Users\KristynaHughes\Downloads\PythonDataQualityChecker.xlsx"
wb = load_workbook(excel_file) 
sheets = wb.sheetnames
#print(sheets) #see the sheets in our workbook
cred_ws = wb['Credentials'] #access specific sheet called Credentials
cred_mapping = {} #this will help us build a dataframe from our table
for entry, data_boundary in cred_ws.tables.items():
    #parse the data within the ref boundary
    data = cred_ws[data_boundary]
    #extract the data 
    #the inner list comprehension gets the values for each cell in the table
    content = [[cell.value for cell in ent] #ent makes our code cleaner by handling nested data sources
               for ent in data
          ]
    header = content[0] #the header row (aka column names)
    rest = content[1:] #the contents excluding the header
    cred_df = pandas.DataFrame(rest, columns = header) #create the dataframe with the column names; this is what we will use to get our creds
    cred_mapping[entry] = cred_df
#Use the dataframe to set up credential variables we can use later in the script
sqlserver = str(cred_df._get_value(0,"SQL_Server"))
sqldb = str(cred_df._get_value(0,"SQL_Database"))
sqluser = str(cred_df._get_value(0,"SQL_User"))
sqlpassword = str(cred_df._get_value(0,"SQL_Password"))
pbiclientid = str(cred_df._get_value(0,"PBI_ClientID"))
pbiusername = str(cred_df._get_value(0,"PBI_Username"))
pbipassword = str(cred_df._get_value(0,"PBI_Password"))
pbidatasetid = str(cred_df._get_value(0,"PBI_DatasetID"))
#check to make sure your variables are correct by uncommenting the next line
#print(sqlserver,sqldb,sqluser,sqlpassword,pbiclientid,pbiusername,pbipassword,pbidatasetid)


#----- Power BI REST API Authentication -----#

authority_url = 'https://login.windows.net/common'
resource_url = 'https://analysis.windows.net/powerbi/api'
context = adal.AuthenticationContext(authority=authority_url,
                                     validate_authority=True,
                                     api_version=None)
token = context.acquire_token_with_username_password(resource=resource_url,                                                     
                                                     username=pbiusername,
                                                     password=pbipassword,
                                                     client_id=pbiclientid)
access_token = token.get('accessToken')
pbiheader = {'Authorization': f'Bearer {access_token}'}
#try out the get workspaces REST API using our access token by uncommenting the next few lines
#get_workspaces_url = 'https://api.powerbi.com/v1.0/myorg/groups'
#r = requests.get(url=get_workspaces_url, headers=pbiheader)
#r.text will give us the response text for our get request, pretty neat!
#print(r.text)


#----- SQL Server Authentication -----#

try:
    sql_con = pymssql.connect(sqlserver,sqluser,sqlpassword,sqldb)
    sql_cursor = sql_con.cursor(as_dict=True)
except Exception as e:
    raise Exception(e)


#---------------------------------------#
#  Build out data quality check steps 
#---------------------------------------#

#----- Read excel to get quality check queries into a dataframe -----#

quality_ws = wb['Quality Check']
quality_mapping = {} #this will help us build a dataframe from our table
for entry, data_boundary in quality_ws.tables.items(): #grabs data dynamically from our table
    data = quality_ws[data_boundary] #parse the data within the ref boundary
    #the inner list comprehension gets the values for each cell in the table
    content = [[cell.value for cell in ent] #ent makes our code cleaner by handling nested data sources
               for ent in data
          ]
    header = content[0] #the header row (aka column names)
    rest = content[1:] #the contents excluding the header
    qualitydf = pandas.DataFrame(rest, columns = header) #this is the dataframe we will use to get our quality check queries
    quality_df = qualitydf.fillna(' ') #replaces any empty cells with a blank space
    quality_mapping[entry] = quality_df
#print(quality_df)


#----- Open excel file in edit mode -----#

xl = Dispatch("Excel.Application") #opens excel for us to edit
xl.Visible = True
edit_wb = xl.Workbooks.Open(excel_file)
edit_qc_sh = edit_wb.Worksheets("Quality Check")

#----- Set variables to use in our iterators -----#

qdf = quality_df.dropna() #removing any blank rows so we don't run blank sql queries lol
sqlrownumber = 1
pbirownumber = -1

#----- Run SQL queries and put results back into excel -----#

sql_queries = qdf.loc[:,"SQL_Query"] 
for query in sql_queries:
    sql_cursor.execute(str(query))
    #print (sql_cursor.fetchall()[0])
    for row in sql_cursor:
        #print(row) #this is our dictionary created from the sql query result
        key = list(row.items())[0] #gets the first item from the row dictionary
        sqlrownumber += 1 #puts our sql responses in the right rows
        sql_result = key[1] #grabs just the result of our query
        quality_df.at[sqlrownumber,"SQL_Result"] = sql_result #this will put our results into the proper cell in our dataframe
        edit_qc_sh.Range(f"D{sqlrownumber}").Value = sql_result #this will put our results in the right excel cell
        #print(key[1]) #returns just our result values

#----- Run PBI DAX queries and put results back into excel -----#

pbi_queries = quality_df.loc[:,"PBI_DAX_Query"]
#print(pbi_queries)
execute_queries_url = f'https://api.powerbi.com/v1.0/myorg/datasets/{pbidatasetid}/executeQueries'
for items in pbi_queries:
    pbirownumber += 1
    pbiexcelrownumber = pbirownumber+2
    list_pbiquery = list(pbi_queries.items())[pbirownumber]
    #print(list_pbiquery)
    item_pbiquery = list_pbiquery[1]
    dax_query_json_body = {
                "queries":
                        [
                            {
                            "query": item_pbiquery
                            }
                        ]  
                }
    pbi_response = requests.post(url=execute_queries_url, headers=pbiheader,json=dax_query_json_body)
    query_json = pbi_response.text.encode().decode('utf-8-sig') #allows us to read the json response
    pbi_values = query_json.replace("}]}]}]}","") #drops the trailing encoding from json
    #print(query_json)
    pbi_result = pbi_values.split(":")[4] #grabs just the result of our query
    #print(pbi_result) #grabs just the result of our query
    quality_df.at[pbiexcelrownumber,"PBI_Result"] = pbi_result #this will put our results into the proper cell in our dataframe
    edit_qc_sh.Range(f"C{pbiexcelrownumber}").Value = pbi_result #this will put our results in the right excel cell

#----- Save our changes back to the excel file -----#

edit_wb.Save() #saving our values back to excel
print("All done")

Additional Resources:

This was my first experience with Python, and below are all the links that helped me build out this demo and learn more about how Python operates. Hope you find these as useful as I did!

Last Non-NULL Date in SQL Server

The simplest of requests are often the most difficult to execute. For example, a finance team needs to know every time a customer did not invoice for 90 days in the past 2 years. The simplicity of the ask is deceiving. Tracking differences across multiple dimensions (customer and invoice date in this case) and accounting for NULL values in the changing dimension (i.e., the days when a customer did not invoice) appears hopeless without the support of a CRM code change. But have no fear, complicated SQL is here!

Testing Scenario: the business would like you to create a customer attrition report. To do this, you need to find gaps in invoice dates per customer and determine when and how many customers go “inactive” and are “reactivated” in the past two years. A customer is deemed “inactive” whenever there are greater than 90 days since the last invoice. This can occur multiple times in one year, so a customer can be “reactivated” multiple times in one year.

Resources Needed:

  1. SQL Server Access to the needed data elements
    • In this scenario, this consists of invoice date by customer. You can swap this out for any other date range or any other unique ID.
  2. Business logic
    • In this scenario, activations in a year = any time a customer has invoiced for the first time in a 90-day period. You can swap the customer field for any dimension such as sales rep, carrier, business segment, etc. You can also swap out invoice date for any date field such as creation date, pickup date, paid date, delivery date, etc.
  3. Start and End dates
  4. Ability to use CTE’s/Temp Tables
    • This really comes into play if you are trying to create a Direct Query based report in Power BI or using any other reporting tools that do not allow calling Temp Tables. If you hit this limitation, then you will need to leverage a database/code solution instead of the method below.

Notes:

  • If your SQL Server instance is newer than 2016, then you will not need to use the custom date temp table and can use IGNORE NULLS within the MAX OVER statement (see the alternative line in the final SQL code below).
  • The process below lays out each portion of the final query, but feel free to skip ahead to the end for the final SQL statement if you don’t need each section explained.

Process:

  1. Set up parameters
    • DECLARE @StartDate DATE = '2019-01-01'
      DECLARE @EndDate DATE = GETDATE()
      DECLARE @ActivationRange INT = 90 --notates how many days can be between invoice dates before a customer is deemed "inactive".
  2. Create a date/calendar table. Check with your DBA’s first to make sure they haven’t already created something similar that you can use, all you need is a list of sequential calendar dates with no gaps.
    • ;WITH cte AS (
      SELECT @StartDate AS myDate
      UNION ALL
      SELECT DATEADD(day,1,myDate) as myDate
      FROM cte
      WHERE DATEADD(day,1,myDate) <= @EndDate
      )
      SELECT myDate 'CalendarDate'
      INTO #Calendar
      FROM cte
      OPTION (MAXRECURSION 0) --this works around the usual 100 recursion row limit
  3. If you need to partition by a dimension other than date, such as customer in this scenario, you will need to create a table to grab that dimension’s values as well. After this, you’ll need to create a bridge table that will have a value for every date in your range and every customer (or other dimension) value as well.
    • --Customer Table
      SELECT DISTINCT
      DA.AccountsKey
      ,DA.CompanyID
      ,DA.CompanyName
      ,MIN(FSF.InvoiceDateKey) 'FirstInvoiceDate'
      INTO #Companies
      FROM DimAccount DA
      JOIN ShipmentFacts FSF ON FSF.AccountKey = DA.AccountsKey
      WHERE FSF.InvoiceDateKey IS NOT NULL
      GROUP BY
      DA.AccountsKey
      ,DA.CompanyID
      ,DA.CompanyName
    • --Bridge Table that combines both Customer and Date values
      SELECT DISTINCT
      C.CalendarDate
      ,Comp.CompanyID
      ,Comp.CompanyName
      ,MIN(Comp.FirstInvoiceDate) 'FirstInvoiceDate'
      ,CONCAT(C.CalendarDate,Comp.CompanyID) 'ID'
      INTO #Bridge
      FROM #Calendar C, #Companies Comp
      GROUP BY
      C.CalendarDate
      ,Comp.CompanyID
      ,Comp.CompanyName
      ,CONCAT(C.CalendarDate,Comp.CompanyID)
  4. Next, we need to create the unique IDs that combine all the dimensions we are hoping to account for in our “IGNORE NULLS” scenario. In this test case, we need to create one ID that grabs the actual dates a customer invoiced on and another for all the dates in our range that a customer could have possibly invoiced on. Then, we join the two together to grab the last time a customer invoiced and ignore those pesky NULL values. This is the section where having SQL Server 2016 or later will do you a lot of favors (see code below).
    • --Actual Invoiced Dates by Customer
      SELECT DISTINCT
      FSF.InvoiceDateKey
      ,DA.CompanyName
      ,DA.CompanyID
      ,CONCAT(FSF.InvoiceDateKey,DA.CompanyId) 'ID'
      INTO #ShipmentData
      FROM ShipmentFacts FSF
      JOIN #Companies DA ON DA.AccountsKey = FSF.AccountKey
      WHERE FSF.InvoiceDateKey BETWEEN @StartDate AND @EndDate
    • --Joining together and filling in the NULLs with the previous invoiced date by customer
      SELECT DISTINCT
      C.ID
      ,S.ID 'ShipData'
      ,CAST( SUBSTRING( MAX( CAST (C.ID AS BINARY(4)) + CAST(S.ID AS BINARY(20))) OVER (PARTITION BY C.CompanyID ORDER BY C.ID ROWS UNBOUNDED PRECEDING),5,20) AS varchar) 'PreviousInvoiceDateKey'
      --ALTERNATIVE FOR POST SQL Server 2012--
      --,CAST( SUBSTRING( MAX( CAST (C.ID AS BINARY(4)) + CAST(S.ID AS BINARY(20))) IGNORE NULLS OVER (PARTITION BY C.CompanyID ORDER BY C.ID ROWS UNBOUNDED PRECEDING),5,20) AS varchar) 'PreviousInvoiceDateKey'

      INTO #RunningDates
      FROM #Bridge C
      LEFT JOIN #ShipmentData S ON S.ID = C.ID
  5. The rest of the code is based on business logic, so please use at your discretion and edit for your own needs.

Full SQL Code:

DECLARE @StartDate DATE = '2019-01-01'
DECLARE @EndDate DATE = GETDATE()
DECLARE @ActivationRange INT = 90 --notates how many days can be between invoice dates before a customer is deemed "inactive"
;WITH cte AS (
SELECT @StartDate AS myDate
UNION ALL
SELECT DATEADD(day,1,myDate) as myDate
FROM cte
WHERE DATEADD(day,1,myDate) <= @EndDate
)
SELECT myDate 'CalendarDate'
INTO #Calendar
FROM cte
OPTION (MAXRECURSION 0)


SELECT DISTINCT
DA.AccountsKey
,DA.CompanyID
,DA.CompanyName
,MIN(FSF.InvoiceDateKey) 'FirstInvoiceDate'
INTO #Companies
FROM DimAccount DA
JOIN ShipmentFacts FSF ON FSF.AccountKey = DA.AccountsKey
WHERE FSF.InvoiceDateKey >= '2000-01-01'
GROUP BY
DA.AccountsKey
,DA.CompanyID
,DA.CompanyName


SELECT DISTINCT
C.CalendarDate
,Comp.CompanyID
,Comp.CompanyName
,MIN(Comp.FirstInvoiceDate) 'FirstInvoiceDate'
,CONCAT(C.CalendarDate,Comp.CompanyID) 'ID'
INTO #Bridge
FROM #Calendar C, #Companies Comp
GROUP BY
C.CalendarDate
,Comp.CompanyID
,Comp.CompanyName
,CONCAT(C.CalendarDate,Comp.CompanyID)

SELECT DISTINCT
FSF.InvoiceDateKey
,DA.CompanyName
,DA.CompanyID
,CONCAT(FSF.InvoiceDateKey,DA.CompanyId) 'ID'
INTO #ShipmentData
FROM ShipmentFacts FSF
JOIN #Companies DA ON DA.AccountsKey = FSF.AccountKey
WHERE FSF.InvoiceDateKey BETWEEN @StartDate AND @EndDate

SELECT DISTINCT
C.ID
,S.ID 'ShipData'
,CAST( SUBSTRING( MAX( CAST (C.ID AS BINARY(4)) + CAST(S.ID AS BINARY(20))) OVER (PARTITION BY C.CompanyID ORDER BY C.ID ROWS UNBOUNDED PRECEDING),5,20) AS varchar) 'PreviousInvoiceDateKey'
--ALTERNATIVE FOR POST SQL Server 2012--
--,CAST( SUBSTRING( MAX( CAST (C.ID AS BINARY(4)) + CAST(S.ID AS BINARY(20))) IGNORE NULLS OVER (PARTITION BY C.CompanyID ORDER BY C.ID ROWS UNBOUNDED PRECEDING),5,20) AS varchar) 'PreviousInvoiceDateKey'
INTO #RunningDates
FROM #Bridge C
LEFT JOIN #ShipmentData S ON S.ID = C.ID


SELECT DISTINCT
R.ID
,R.ShipData
,R.PreviousInvoiceDateKey
,LEFT(R.PreviousInvoiceDateKey,10) 'PreviousInvoiceDate'
,LEFT(R.ID,10) 'DateKey'
,RIGHT(R.ID,5) 'CompanyId'
,B.FirstInvoiceDate
INTO #ActivationData
FROM #RunningDates R
LEFT JOIN #Bridge B ON B.ID = R.ID

SELECT DISTINCT
A.ID
,A.DateKey
,A.CompanyId
,A.PreviousInvoiceDate
,YEAR(A.DateKey) 'Year'
,YEAR(A.FirstInvoiceDate) 'InitialActivationYear'
,CASE WHEN YEAR(A.DateKey) = YEAR(A.FirstInvoiceDate) THEN 1 ELSE 0 END 'IsActivationYear'
,DATEDIFF(Day,A.PreviousInvoiceDate,A.DateKey) 'DaysSinceInvoice'
,CASE WHEN DATEDIFF(Day,A.PreviousInvoiceDate,A.DateKey) = @ActivationRange THEN 1 ELSE 0 END 'IsInactive'
,CASE WHEN DATEDIFF(Day,A.PreviousInvoiceDate,A.DateKey) = @ActivationRange THEN A.DateKey ELSE NULL END 'InactiveDate'
INTO #ActivationDetails
FROM #ActivationData A

SELECT DISTINCT
D.Year
,D.CompanyId
,SUM(D.IsInactive) 'InactivatedPeriods'
,MAX(D.IsActivationYear) 'IsFirstActivationYear'
,MAX(D.DaysSinceInvoice) 'BiggestGapInInvoicing (Days)'
,MAX(D.InactiveDate) 'LastInactiveDate'
,MAX(D.PreviousInvoiceDate) 'LastInvoiceDate'
,CASE WHEN MAX(D.InactiveDate) > MAX(D.PreviousInvoiceDate) THEN -1 ELSE 0 END 'NotActiveAtEndOfYear'

--to grab the activations per customer per year follow equation below
-- Activations = [InactivatedPeriods] + [NotActiveAtEndOfYear] + [IsFirstActivationYear] --this part will be done in Power BI
FROM #ActivationDetails D
GROUP BY
D.Year
,D.CompanyId


DROP TABLE #Calendar
DROP TABLE #Companies
DROP TABLE #Bridge
DROP TABLE #ShipmentData
DROP TABLE #RunningDates
DROP TABLE #ActivationData
DROP TABLE #ActivationDetails

Additional Resource: