Last Non-NULL Date in SQL Server

The simplest of requests are often the most difficult to execute. For example, a finance team needs to know every time a customer went 90 days without invoicing over the past 2 years. The simplicity of the ask is deceiving. Tracking differences across multiple dimensions (customer and invoice date in this case) and accounting for NULL values in the changing dimension (i.e., the days a customer did not invoice) appears to be hopeless without the support of a CRM code change. But have no fear, complicated SQL is here!

Testing Scenario: the business would like you to create a customer attrition report. To do this, you need to find gaps in invoice dates per customer and determine when and how many customers go “inactive” and are “reactivated” in the past two years. A customer is deemed “inactive” whenever more than 90 days pass since their last invoice. This can occur multiple times in one year, so a customer can be “reactivated” multiple times in one year.

Resources Needed:

  1. SQL Server Access to the needed data elements
    • In this scenario, this consists of invoice date by customer. You can swap this out for any other date range or any other unique ID.
  2. Business logic
    • In this scenario, activations in a year = any time a customer has invoiced for the first time in a 90-day period. You can swap the customer field for any dimension such as sales rep, carrier, business segment, etc. You can also swap out invoice date for any date field such as creation date, pickup date, paid date, delivery date, etc.
  3. Start and End dates
  4. Ability to use CTEs/Temp Tables
    • This really comes into play if you are trying to create a DirectQuery-based report in Power BI or use any other reporting tool that does not allow calling temp tables. If you hit this limitation, then you will need to leverage a database/code solution instead of the method below.

Notes:

  • If you are on Azure SQL Database or SQL Server 2022 and later, you can skip the binary-concatenation workaround in step 4 and use LAST_VALUE with IGNORE NULLS in the windowed expression instead (see the alternative line in the final SQL code below, and the short sketch that follows these notes).
  • The process below lays out each portion of the final query, but feel free to skip ahead to the end for the final SQL statement if you don’t need each section explained.
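
For reference, here is a minimal sketch of what that newer-syntax version of the “fill in the NULLs” step could look like. This is an illustration only, and it assumes an Azure SQL Database or SQL Server 2022+ engine plus the #Bridge and #ShipmentData temp tables that are built later in this post:

SELECT
C.ID
,S.ID 'ShipData'
,LAST_VALUE(S.ID) IGNORE NULLS OVER (PARTITION BY C.CompanyID ORDER BY C.ID ROWS UNBOUNDED PRECEDING) 'PreviousInvoiceDateKey'
INTO #RunningDates
FROM #Bridge C
LEFT JOIN #ShipmentData S ON S.ID = C.ID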

Process:

  1. Set up parameters
    • DECLARE @StartDate DATE = '2019-01-01'
      DECLARE @EndDate DATE = GETDATE()
      DECLARE @ActivationRange INT = 90 --denotes how many days can pass between invoice dates before a customer is deemed "inactive".
  2. Create a date/calendar table. Check with your DBAs first to make sure they haven’t already created something similar that you can use; all you need is a list of sequential calendar dates with no gaps.
    • ;WITH cte AS (
      SELECT @StartDate AS myDate
      UNION ALL
      SELECT DATEADD(day,1,myDate) as myDate
      FROM cte
      WHERE DATEADD(day,1,myDate) <= @EndDate
      )
      SELECT myDate 'CalendarDate'
      INTO #Calendar
      FROM cte
      OPTION (MAXRECURSION 0) --this works around the default limit of 100 recursions
  3. If you need to partition by a dimension other than date, such as customer in this scenario, you will need to create a table to grab that dimension’s values as well. After this, you’ll need to create a bridge table that will have a value for every date in your range and every customer (or other dimension) value as well.
    • --Customer Table
      SELECT DISTINCT
      DA.AccountsKey
      ,DA.CompanyID
      ,DA.CompanyName
      ,MIN(FSF.InvoiceDateKey) 'FirstInvoiceDate'
      INTO #Companies
      FROM DimAccount DA
      JOIN ShipmentFacts FSF ON FSF.AccountKey = DA.AccountsKey
      WHERE FSF.InvoiceDateKey IS NOT NULL
      GROUP BY
      DA.AccountsKey
      ,DA.CompanyID
      ,DA.CompanyName
    • --Bridge Table that combines both Customer and Date values
      SELECT DISTINCT
      C.CalendarDate
      ,Comp.CompanyID
      ,Comp.CompanyName
      ,MIN(Comp.FirstInvoiceDate) 'FirstInvoiceDate'
      ,CONCAT(C.CalendarDate,Comp.CompanyID) 'ID'
      INTO #Bridge
      FROM #Calendar C, #Companies Comp
      GROUP BY
      C.CalendarDate
      ,Comp.CompanyID
      ,Comp.CompanyName
      ,CONCAT(C.CalendarDate,Comp.CompanyID)
  4. Next, we need to create our unique IDs that combine all the dimensions we are hoping to account for in our “IGNORE NULLS” scenario. In this test case, we need to create one ID that grabs the actual dates a customer invoiced on and another for all the dates in our range that a customer could have possibly invoiced on. Then, we join the two together to grab the last time a customer invoiced and ignore those pesky NULL values. This is the section where a newer engine (Azure SQL Database or SQL Server 2022 and later) will do you a lot of favors (see the alternative line in the code below).
    • --Actual Invoiced Dates by Customer
      SELECT DISTINCT
      FSF.InvoiceDateKey
      ,DA.CompanyName
      ,DA.CompanyID
      ,CONCAT(FSF.InvoiceDateKey,DA.CompanyId) 'ID'
      INTO #ShipmentData
      FROM ShipmentFacts FSF
      JOIN #Companies DA ON DA.AccountsKey = FSF.AccountKey
      WHERE FSF.InvoiceDateKey BETWEEN @StartDate AND @EndDate
    • --Joining together and filling in the NULLs with the previous invoice date by customer
      SELECT DISTINCT
      C.ID
      ,S.ID 'ShipData'
      ,CAST( SUBSTRING( MAX( CAST (C.ID AS BINARY(4)) + CAST(S.ID AS BINARY(20))) OVER (PARTITION BY C.CompanyID ORDER BY C.ID ROWS UNBOUNDED PRECEDING),5,20) AS varchar) 'PreviousInvoiceDateKey'
      --ALTERNATIVE FOR AZURE SQL DATABASE / SQL SERVER 2022 AND LATER--
      --,LAST_VALUE(S.ID) IGNORE NULLS OVER (PARTITION BY C.CompanyID ORDER BY C.ID ROWS UNBOUNDED PRECEDING) 'PreviousInvoiceDateKey'

      INTO #RunningDates
      FROM #Bridge C
      LEFT JOIN #ShipmentData S ON S.ID = C.ID
  5. The rest of the code is based on business logic, so please use at your discretion and edit for your own needs.

Full SQL Code:

DECLARE @StartDate DATE = '2019-01-01'
DECLARE @EndDate DATE = GETDATE()
DECLARE @ActivationRange INT = 90 --denotes how many days can pass between invoice dates before a customer is deemed "inactive"
;WITH cte AS (
SELECT @StartDate AS myDate
UNION ALL
SELECT DATEADD(day,1,myDate) as myDate
FROM cte
WHERE DATEADD(day,1,myDate) <= @EndDate
)
SELECT myDate 'CalendarDate'
INTO #Calendar
FROM cte
OPTION (MAXRECURSION 0)


SELECT DISTINCT
DA.AccountsKey
,DA.CompanyID
,DA.CompanyName
,MIN(FSF.InvoiceDateKey) 'FirstInvoiceDate'
INTO #Companies
FROM DimAccount DA
JOIN ShipmentFacts FSF ON FSF.AccountKey = DA.AccountsKey
WHERE FSF.InvoiceDateKey >= '2000-01-01'
GROUP BY
DA.AccountsKey
,DA.CompanyID
,DA.CompanyName


SELECT DISTINCT
C.CalendarDate
,Comp.CompanyID
,Comp.CompanyName
,MIN(Comp.FirstInvoiceDate) 'FirstInvoiceDate'
,CONCAT(C.CalendarDate,Comp.CompanyID) 'ID'
INTO #Bridge
FROM #Calendar C, #Companies Comp
GROUP BY
C.CalendarDate
,Comp.CompanyID
,Comp.CompanyName
,CONCAT(C.CalendarDate,Comp.CompanyID)

SELECT DISTINCT
FSF.InvoiceDateKey
,DA.CompanyName
,DA.CompanyID
,CONCAT(FSF.InvoiceDateKey,DA.CompanyId) 'ID'
INTO #ShipmentData
FROM ShipmentFacts FSF
JOIN #Companies DA ON DA.AccountsKey = FSF.AccountKey
WHERE FSF.InvoiceDateKey BETWEEN @StartDate AND @EndDate

SELECT DISTINCT
C.ID
,S.ID 'ShipData'
,CAST( SUBSTRING( MAX( CAST (C.ID AS BINARY(4)) + CAST(S.ID AS BINARY(20))) OVER (PARTITION BY C.CompanyID ORDER BY C.ID ROWS UNBOUNDED PRECEDING),5,20) AS varchar) 'PreviousInvoiceDateKey'
--ALTERNATIVE FOR AZURE SQL DATABASE / SQL SERVER 2022 AND LATER--
--,LAST_VALUE(S.ID) IGNORE NULLS OVER (PARTITION BY C.CompanyID ORDER BY C.ID ROWS UNBOUNDED PRECEDING) 'PreviousInvoiceDateKey'
INTO #RunningDates
FROM #Bridge C
LEFT JOIN #ShipmentData S ON S.ID = C.ID


SELECT DISTINCT
R.ID
,R.ShipData
,R.PreviousInvoiceDateKey
,LEFT(R.PreviousInvoiceDateKey,10) 'PreviousInvoiceDate'
,LEFT(R.ID,10) 'DateKey'
,RIGHT(R.ID,5) 'CompanyId'
,B.FirstInvoiceDate
INTO #ActivationData
FROM #RunningDates R
LEFT JOIN #Bridge B ON B.ID = R.ID

SELECT DISTINCT
A.ID
,A.DateKey
,A.CompanyId
,A.PreviousInvoiceDate
,YEAR(A.DateKey) 'Year'
,YEAR(A.FirstInvoiceDate) 'InitialActivationYear'
,CASE WHEN YEAR(A.DateKey) = YEAR(A.FirstInvoiceDate) THEN 1 ELSE 0 END 'IsActivationYear'
,DATEDIFF(Day,A.PreviousInvoiceDate,A.DateKey) 'DaysSinceInvoice'
,CASE WHEN DATEDIFF(Day,A.PreviousInvoiceDate,A.DateKey) = @ActivationRange THEN 1 ELSE 0 END 'IsInactive'
,CASE WHEN DATEDIFF(Day,A.PreviousInvoiceDate,A.DateKey) = @ActivationRange THEN A.DateKey ELSE NULL END 'InactiveDate'
INTO #ActivationDetails
FROM #ActivationData A

SELECT DISTINCT
D.Year
,D.CompanyId
,SUM(D.IsInactive) 'InactivatedPeriods'
,MAX(D.IsActivationYear) 'IsFirstActivationYear'
,MAX(D.DaysSinceInvoice) 'BiggestGapInInvoicing (Days)'
,MAX(D.InactiveDate) 'LastInactiveDate'
,MAX(D.PreviousInvoiceDate) 'LastInvoiceDate'
,CASE WHEN MAX(D.InactiveDate) > MAX(D.PreviousInvoiceDate) THEN -1 ELSE 0 END 'NotActiveAtEndOfYear'

--to grab the activations per customer per year, follow the equation below
-- Activations = [InactivatedPeriods] + [NotActiveAtEndOfYear] + [IsFirstActivationYear] --this part will be done in Power BI (a SQL version of this calculation is sketched after the DROP TABLE statements)
FROM #ActivationDetails D
GROUP BY
D.Year
,D.CompanyId


DROP TABLE #Calendar
DROP TABLE #Companies
DROP TABLE #Bridge
DROP TABLE #ShipmentData
DROP TABLE #RunningDates
DROP TABLE #ActivationData
DROP TABLE #ActivationDetails
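
If you would rather compute the activation count in SQL instead of Power BI, a minimal sketch of that last step could look like the following. It simply applies the equation above and is meant as an illustration; run it before the DROP TABLE statements so #ActivationDetails still exists.

--Hypothetical SQL version of the Power BI activation calculation
SELECT
D.Year
,D.CompanyId
,SUM(D.IsInactive)
 + CASE WHEN MAX(D.InactiveDate) > MAX(D.PreviousInvoiceDate) THEN -1 ELSE 0 END
 + MAX(D.IsActivationYear) 'Activations'
FROM #ActivationDetails D
GROUP BY
D.Year
,D.CompanyId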

Additional Resource:

I Wrote a Book – Hands-On SQL Server 2019 Analysis Services

While this is not the first time I have authored, it is the first book I wrote as the sole author. Analysis Services is the product I built my career in business intelligence on, and I was happy to take on the project when Packt approached me.

One of my favorite questions is how much research time I put in for this book. The right answer is almost 20 years. I started working with Analysis Services when it was called OLAP Services, and that was a long time ago. Until Power Pivot for Excel and tabular model technology were added to the mix, I worked in the multidimensional model. I was one of the few, or so it seems, who enjoyed working in the multidimensional database world, including working with MDX (multidimensional expressions). However, I was very aware that tabular models with the VertiPaq engine were the model of the future. Analysis Services has continued to be a significant part of the BI landscape, and this book gives you the opportunity to try it out for yourself.

This book is designed for those who are involved in business intelligence work but have been working mostly in self-service or end-user tools. Now you are ready to take your model to the next level, and that is where Analysis Services comes into play. As part of Packt’s Hands-On series, I focused on getting going with Analysis Services from install to reporting. Microsoft has developer editions of the software which allow you to do a complete walk-through of everything in the book in a step-by-step fashion. You will start the process by getting the tools installed, downloading sample data, and building out a multidimensional model. Once you have that model built out, we build a similar model using tabular model technology. We follow that up by building reports and visualizations in both Excel and Power BI. No journey is complete without working through security and administration basics. If you want to learn by doing, this is the book for you.

If you are interested in getting the book, you can order it from Amazon or Packt. From November 20, 2020 through December 20, 2020, you can get a 25% discount using this code – 25STEVEN – or by using this link directly.

I want to thank the technical editors that worked with me to make sure the content and the steps worked as expected – Alan Faulkner, Dan English, and Manikandan Kurup. Their attention to detail raised the quality of the book significantly and was greatly appreciated.

I have to also thank Tazeen Shaikh who was a great content editor to work with. When she joined the project, my confidence in the quality of the final product increased as well. She helped me sort out some of the formatting nuances and coordinated the needed changes to the book. Her work on the book with me was greatly appreciated. Finally, many thanks to Kirti Pisat who kept me on track in spite of COVID impacts throughout the writing of the book this year.

I hope you enjoy the book!

PASSMN June 2020 – Data Classification with SQL Server and Azure

I presented at the virtual Minnesota SQL Server User Group meeting on June 16, 2020. The topic was data classification with SQL Server 2019 and Azure SQL Database.

Data Classification Basics

Data classification in both SQL Server and Azure allows you to discover and label data based on information type and sensitivity. Information type is a way to describe the content of the data at a high level. This includes types such as Address, Name, Networking, and Credit Card. By tagging your columns with types you will be able to easily see the kinds of data stored in your tables. You can also label the sensitivity. This includes labels such as Confidential and Confidential - GDPR.

Using SQL Server 2019 and SSMS 18.4+

For on-premises implementations, you can use SQL Server Management Studio. I would recommend that you use SSMS 18.4 or greater, as it has the most capability. SQL Server 2019 includes the sys.sensitivity_classifications system catalog view, so you can query to see which fields have been labeled.
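
As a quick illustration, a query along these lines joins sys.sensitivity_classifications to the object and column catalog views to list the labeled columns (the column aliases here are my own):

SELECT
SCHEMA_NAME(O.schema_id) 'SchemaName'
,O.name 'TableName'
,C.name 'ColumnName'
,SC.information_type
,SC.label
FROM sys.sensitivity_classifications SC
JOIN sys.objects O ON O.object_id = SC.major_id
JOIN sys.columns C ON C.object_id = SC.major_id AND C.column_id = SC.minor_id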

To get started, open up SSMS. Right-click the database and choose Tasks > Data Discovery and Classification > Classify Data. This will open the Data Classification window in SQL Server.

Finding the Data Discovery and Classification Options in SSMS

You will get a list of recommendations and the ability to add custom classifications in your SQL Server database.

The Data Classification view in SSMS

Once you have classified some of your data, you are able to view a report that shows the coverage of the classification work you have done.

Data Classification Report in SSMS

Adding Data Classification in Azure SQL Database

Azure SQL Database supports similar functionality for discovering and classifying data. The primary differences are (1) it requires Advanced Data Security which costs $15/month per server and (2) audit logging support is built in.

You can find this in the Azure portal with your SQL Database.

Advanced Data Security in Azure SQL Database

As you can see above, you get a visual here initially. Click the Data Discovery & Classification panel to open a classification window similar to the one we saw in SSMS. This will allow you to discover and classify your data.

The key difference is turning on auditing and logging information about people querying the classified data. In the Security section in your SQL Database view in the Azure portal, choose Auditing. You can now add auditing to your server or database. (Click here for information about setting up Auditing.) I chose to use Log Analytics which is in preview. Log Analytics has a dashboard which shows activity in your database with this data.

Log Analytics Dashboard which Shows Access to Sensitive Data

You can click into the dashboard to dig into details. You can also use the Log Analytics query features to build your own queries to further analyze the data. The details contain who accessed the information, their IP address, and what was accessed. You can build more reports from that information to support more sophisticated auditing.

Final Thoughts

I think that there is still work to be done on SQL Server to better support auditing. Azure is ahead of the game in this area. More importantly, Azure logging is a platform level solution. You should be able to integrate your logging from the applications to the database in Azure.

You do have the ability to update the policy in SQL Server with a JSON file. I recommend you export the file and modify it. In Azure, you can update the information policy in the Security Center. Updating this policy allows you to discover data or information that you want to classify based on rules you set up. This should be part of your data governance plan.

One other follow up from the meeting: the question was raised about Visual Studio support in database projects. The answer is “sort of”. First, you need to make sure your project is targeting SQL Server 2019 or Azure SQL Database. Once that is set, you can use the following code to add the classification manually, or you can apply it to your database and do a schema compare to bring it in.

ADD SENSITIVITY CLASSIFICATION TO
    [SalesLT].[Customer].[FirstName]
    WITH (LABEL = 'Confidential - GDPR', LABEL_ID = 'fe62dcde-72c0-475c-b1af-fb8de4c8fc7e', INFORMATION_TYPE = 'Name', INFORMATION_TYPE_ID = '57845286-7598-22f5-9659-15b24aeb125e', RANK = MEDIUM);

You will need to know the GUIDs for the labels and types in your solution to do this manually. However, once this is done, you can see the information in the Properties window for the field as well.

Data Classification Properties in Visual Studio

The key thing to be aware of is that the properties are read only. You have to use the code to change them or do the changes in the database and use Schema Compare to bring them in.
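
For example, a change made through code would drop the existing classification and add it back with the new values, along the lines of the sketch below (the table, label, and type shown are just illustrative; include the LABEL_ID and INFORMATION_TYPE_ID values as in the earlier snippet if you need them to line up with the portal policy):

DROP SENSITIVITY CLASSIFICATION FROM [SalesLT].[Customer].[FirstName];

ADD SENSITIVITY CLASSIFICATION TO
    [SalesLT].[Customer].[FirstName]
    WITH (LABEL = 'Confidential', INFORMATION_TYPE = 'Name', RANK = MEDIUM);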

Thanks again to those of you who joined us at the meeting. Here is the slide deck from that meeting. I look forward to sharing more with all of you later.

Power BI Data Security – Power BI Report Server

Power BI Report Server was released as a way to host reports on premises. It was one of the highest requested features for Power BI. Power BI Report Server offers a subset of Power BI Service capabilities and, as such, does not have the rich collaboration or security options seen in the service (online). I have a number of other thoughts on this tool, but that is not the purpose of this post. I am often asked in sessions on Power BI Data Security about Power BI Report Server. The problem is that Power BI Report Server is not what Power BI was intended to be, and as such security is very different.

Power BI Report Server Uses Files and Folder Security Model

Unlike Power BI Service which leverages the Office 365 security model with workspaces and apps, Power BI Report Server only supports deploying Power BI Desktop files as Power BI Reports in SQL Server Reporting Services. Each file is uploaded to the Report Server and it is viewed by opening the report.

Power BI Report Server - Portal

You have essentially three layers of security for report files in Power BI Report Server.

  1. The portal itself can be secured. You can and should limit access to the reports by only allowing specific users or groups access to the report portal.
  2. Folders can be used to provide more granular security over a group of assets in the report portal. In the image above, I created a folder called PBI Secure Reports. A specific AD group has access to this folder. If a user does not have permissions to the folder, the folder does not show up in the portal and they cannot access the folder or the assets, including Power BI reports, stored in this folder.
  3. Individual reports can be secured as well. I never recommend this option as it becomes administratively difficult to manage. However, the capability is there if a single asset needs to be secured in this fashion.

These options work for any asset stored in the Report Portal and are not limited to Power BI reports.

Power BI Report Server Report Nuances

If you have read many of my posts around Power BI Data Security, I have discussed gateways, workspaces, and even Office 365 groups. The following items are uniquely related to Power BI reports stored in the report server.

Data Sources

Because it is not possible to use a Reporting Services Shared Data Source with your Power BI reports at this time (not sure if this will change), the data source information will need to be managed per report. This is not dissimilar to the Power BI Service. However, for on-premises data we use the gateway with the service. There is not a common or shared data access feature in Power BI Report Server. Data sources are included in the report and can be managed in the deployed report by clicking the ellipsis on the report’s icon and selecting Manage.

Power BI Report Server - Data Source

Only the report creator, Content Managers, and Publishers have permissions to see and modify this information.

Protecting Data Sources

One concern raised is whether data sources with credentials are downloaded when the Power BI file is downloaded. The first key point is that only users with the Publisher and Content Manager roles can download the files.

When a Power BI report file is downloaded, you are prompted for credentials when refreshing data. Passwords are encrypted, so they must be re-entered when the file is opened or the data is refreshed in the desktop.

Mapping SSRS Roles to Power BI Functions

The portal, folders, and files are secured using SSRS roles. Here is a high-level summary of the roles and how they impact Power BI reports:

  • Browser – This is similar to a read-only function, comparable to delivering the report as a Power BI app in the service. This should be used by anyone who needs access to the report but does not create content.
  • Content Manager – This is for a content creator with admin privileges. This role can manage content and user access.
  • Publisher – This is for content creators who do not need to manage users. Most content creators fall into this category.
  • Report Builder – This does not apply to Power BI reports, which are developed in Power BI Desktop.

Avoid Deploying to the Portal Home Page

I would recommend that you not allow Power BI reports to be deployed to the primary portal, but instead create folders and manage the groups of content creators and managers based on department or group-level needs.

Reports deployed to the home page of the portal should be managed as enterprise assets. Use the folders and related security groups to manage who can deploy at this level.

Wrap Up

I would like to extend a special thanks to Kathy Vick, Principal Consultant at Pragmatic Works. She provided guidance on implementing Power BI Report Server. Thanks, Kathy, for the help on this topic. Check out Kathy’s Twitter (@MsKathyV) and LinkedIn account to learn more.

Check out more about Power BI Data Security in my blog series.

Thoughts about the Microsoft Data Amp Announcements

Microsoft conducted a live event called Microsoft Data Amp to announce a number of key features and releases for SQL Server on premises and data platforms in Azure (such as Azure SQL DB and Azure Data Lake). Some of these include features that I have been waiting to see. Here are some of the announcements that I am excited about.

Microsoft Data Platforms: Intelligent – Trusted – Flexible, On-premises & Cloud

SQL Server 2017

Yes. Microsoft has officially announced that SQL Server vNext is SQL Server 2017. The marquee feature being released is SQL Server running on Linux. But this also shows Microsoft increasing its innovation efforts with SQL Server, with an even shorter time between releases.

CTP 2 of SQL Server 2017 has been released today and includes a number of analytics features, such as support for graph processing and graph queries. It will be the first commercial database with built-in support for AI and deep learning database applications using R and Python scripts. Check out all the database engine improvements.
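
As a small taste of the graph support, the new syntax lets you define node and edge tables and query the relationships with MATCH. The sketch below is only an illustration; the table and column names are made up for the example:

--Define node and edge tables, then traverse the relationship with MATCH
CREATE TABLE Person (PersonId INT PRIMARY KEY, PersonName VARCHAR(100)) AS NODE;
CREATE TABLE FriendOf AS EDGE;

SELECT P2.PersonName
FROM Person P1, FriendOf, Person P2
WHERE MATCH(P1-(FriendOf)->P2)
AND P1.PersonName = 'Steve';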

Azure SQL Database

Microsoft is bringing even more symmetry between the on-premises product and the PaaS product. The goal is to support upgrades or migrations to Azure SQL DB with minimal effort and no changes. Here are some of the features that are coming to Azure SQL DB soon:

  • Support for SQL Agent
  • 3-part names
  • DBMail
  • CDC
  • Service Broker
  • Cross-Database and Cross-Instance querying
  • CLR & R Services
  • SQL Profiler
  • Native backup-restore
  • Log shipping
  • Transactional Replication

These features will definitely bring more parity to the platforms. A number of these features are key for some of my clients to move to Azure SQL DB.

Migration Project for Azure SQL DB

Whether you have SQL Server, Oracle, or MySQL, you should be able to migrate your database to Azure SQL DB in “five simple steps”. It looks like a great tool, and I am interested in exploring it with Oracle in particular. You can create a project in Azure that lets you choose the source database and platform, target an Azure SQL DB, then move the schema and load the database. While I am skeptical about the full capability of this solution, I look forward to exploring it more.

Azure Analysis Services is GA

The last topic I am going to bring up is Azure Analysis Services. This service is now generally available, which brings a great offering to the PaaS space in Azure. Check out the capabilities here.

Final Thoughts

Microsoft announced much more than I highlight here, including tighter AI integration into the data engine, R Server 9.1, and planet-scale DocumentDB. Check out the Microsoft Data Amp site for more videos on what’s coming to Microsoft’s data platforms.