Power Testing ETL with Power BI – Creating the Tests with Power Pivot

13 11 2014

This is the second deep dive into Power Testing ETL with Power BI. At this point, we have created the source table which will be used in our testing. The next step is to bring in the destination table and create the tests that will be “run” against the data. In its simplest form, the tests are logical conditions that check whether source data matches destination data and whether calculations applied to those data sets also match. When they don’t match, you have a data load error, which results in a failed test.

How to Calculate Success and Failure

The basic approach to testing is to turn the results into numbers and calculate whether, and by how much, we succeeded or failed. Typically, every test will result in a 1 or a 0. Whether you assign 1 to success or to failure depends largely on how you plan to display your results. If you plan to use the KPIs built into the Power Pivot model, you will be comparing the number of successful tests against the number of rows expected to be imported. The primary reason for this is that you cannot target zero when using KPIs. In this scenario, successful tests result in 1 and are therefore easily compared to the number of expected rows, which would be 100% successful if they matched.

The other scenario is to measure failures. In this case, we assign 1 to each failed test and count the number of failed tests. This is easily handled in visualizations such as conditional formatting, where 0 can be displayed as green and any number of failures changes the state from green to yellow and then red. This helps identify the most commonly failed tests.

The method you choose is up to you and how you prefer to see the results. We will cover both variations in the visualizations, but for the sake of brevity here, we will measure success against our row count. Success = 1; Failure = 0.
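To make the two approaches concrete, here is a minimal sketch of both styles of measure. The table name 'Destination' and the column [Test Passed] are hypothetical placeholders for whatever your model uses; only the pattern matters.

-- Success style: each row's test column returns 1 on a match, so the measure
-- can be compared directly to the expected row count.
Successful Tests:=SUM('Destination'[Test Passed])

-- Failure style: count the rows where the test returned 0, so the target for
-- every test is simply zero.
Failed Tests:=COUNTROWS(FILTER('Destination','Destination'[Test Passed]=0))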

Creating the Power Pivot Tests

In order to create the tests, you need to open the Power Pivot window and add the destination table to the model. In our case, we have created a table called Books in the HughesMediaLibrary database as our target. Here is the syntax for the target table.

CREATE TABLE dbo.Books(
BookID int IDENTITY(1,1) NOT NULL
CONSTRAINT pk_Books PRIMARY KEY CLUSTERED,
BookName varchar(100) NOT NULL,
Publisher varchar(100) NULL,
Genre varchar(50) NULL,
CopyrightYear smallint NULL,
AuthorFName1 varchar(100) NULL,
AuthorLName1 varchar(100) NULL,
AuthorFName2 varchar(100) NULL,
AuthorLName2 varchar(100) NULL,
AuthorFName3 varchar(100) NULL,
AuthorLName3 varchar(100) NULL,
AuthorFName4 varchar(100) NULL,
AuthorLName4 varchar(100) NULL,
AuthorFName5 varchar(100) NULL,
AuthorLName5 varchar(100) NULL,
PageCount int NULL
)

While I realize this is not a well-normalized table, it serves our purposes well for building out the tests. This table needs to be added to the Power Pivot model before we can do the next steps.

Relating the Source and Destination

The next step is to relate the source and destination. In our case, the only data that will work is the book name. We will use the Source table as the primary table in this relationship. The idea is that all of the data in the source table should exist in the target. Because that is not always the case, the source serves as the “source of truth” for the testing scenario.

 

Building the Tests

The tests are composed of calculated columns, which handle the row-level data analysis, and calculated measures, which summarize the results.

Validating Data Field by Field, Row by Row

This is the primary reason that we worked with Power BI. One of the most common testing scenarios is whether the data came over correctly. In the previous post, we shaped the data with Power Query. Now we will compare it with the results from our ETL process in SSIS. We will use Book Name as the example. Every field you wish to test can follow this pattern. The test consists of a calculated column and a calculated measure.

We create a column in the destination table called Book Name Matches. (Remember, we are tracking success, not failures.) In each row of the data, we need to determine whether the book name in the destination is an exact match for the book name in our source. We used the following DAX for that calculation:

=IF(RELATED('Booklist Source Fixes'[BookName])='Media Library – Books'[BookName],1,0)

It looks at the related table to determine whether the field values match. If they match, the test is assigned a 1 for that row. If they do not match, a 0 is assigned. (The table names are how I named the source and destination. They may not match your solution if you are following along.) Once we have the rows evaluated, we sum the values with a Book Name Matches measure:

Book Name Matches (34):=SUM([Book Name Matches])

We will use the Book Name Matches (34) measure to compare with the book count. If they match, all tests passed. If they do not, then some or all rows have failed.

The number after the measure name, 34, is the test key from TFS. I added it to the measure name to make it easier to identify which test case is being evaluated by this measure. In some cases, you may have multiple measures that are required to complete a test. You can either evaluate them independently or create an additional measure that summarizes them for your use, as shown below.
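For instance, if several field-level measures together make up one TFS test case, a summary measure along these lines can roll them up. The measure names here are hypothetical, and a Book Count measure over the destination table is assumed; only the pattern is the point.

-- Hypothetical roll-up: the test case passes only if every field-level
-- measure matches the expected row count.
Author Names Match:=IF([Author FName1 Matches]=[Book Count] && [Author LName1 Matches]=[Book Count],1,0)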

Other Validations or Tests

Some other basic validations can be created as well. A common one would be the book count. In my scenario, I return the book count and then evaluate it using a KPI. Another way to do this is to add a measure that checks for equality between the two book count measures in the source and destination. If they match, success. If not, failure.
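A minimal sketch of that equality check might look like the following, using the table names from this example; the measure names are my own, and I have left off the TFS test key.

-- Row counts from each side of the load, plus a simple equality test.
Source Book Count:=COUNTROWS('Booklist Source Fixes')
Destination Book Count:=COUNTROWS('Media Library – Books')
Book Count Matches:=IF([Source Book Count]=[Destination Book Count],1,0)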

You can also use measures to validate expected totals the same way we were working with counts. This is particularly helpful in financial data loads, where you would want to verify a sum of balances to make sure the results match. The point is that you can add any other measures that you want to compare in order to meet the unique needs of your situation. It is also possible to compare to entered values. If you know that 100 widgets are to be imported, you can have the measure evaluate against 100 instead of a measure in the source.
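Here is a hedged sketch of both patterns, assuming the source table also carries a PageCount column and that a hypothetical Destination Widget Count measure exists; adjust the names to your own model.

-- Comparing a summed column between source and destination.
Source Page Total:=SUM('Booklist Source Fixes'[PageCount])
Destination Page Total:=SUM('Media Library – Books'[PageCount])
Page Totals Match:=IF([Source Page Total]=[Destination Page Total],1,0)

-- Comparing against an entered value instead of a source measure.
Widget Count Matches:=IF([Destination Widget Count]=100,1,0)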

Recording the Results in TFS

In order to bring the process full circle, we enter the test results into TFS or Visual Studio Online. This gives us the ability to track test results, bugs, and fixes in a development lifecycle tool. It is also the best way to track testing history. One caveat here is that the query results from TFS do not permit you to set test results in Excel. Ideally, we would be able to link the tests with the results, update the results in the query, and push them back. This is NOT supported at the moment. As a result, you will need to open the tests in TFS to update your results. This is not a significant issue, because you should also create bugs for failed tests; it is primarily a nuisance.

An added side effect of using this method to test is that we are able to collaborate with developers to determine what the bug actually is. Because all the data is loaded into Excel, reviewing results is fairly simple and may actually be easier than trying to look at the destination system.

Quick Look at SSIS

Up to this point, we have focused on how a non-developer can set up the source and destination and proceed to test. I wanted to call out the author name work done in Power Query to highlight why Power BI is a great choice. When splitting author names in Power Query, the work was done using right-click operations. Here is an example of the SSIS expression code used to split out the second author name column:

(DT_STR,200,1252)TRIM((FINDSTRING(AuthorNames,",",1) == 0 ? NULL(DT_WSTR,200) : TRIM(SUBSTRING(AuthorNames,FINDSTRING(AuthorNames,",",1) + 1,FINDSTRING(AuthorNames,",",2) == 0 ? LEN(AuthorNames) : 1 + LEN(AuthorNames) - FINDSTRING(AuthorNames,",",2)))))

Compared to Power Query, this is complex and not intuitive. While Power Query is not intended for enterprise ETL use, its simplicity helps test complex scenarios such as our author name split without having to create an equally complex SQL statement or expression.

The next post will take a look at some of the visualization options for the test results.





2013 – A Year In Review

2 01 2014

It is in our nature as humans to look back in order to understand where we have been.

Warning – some of this post is about my family, in case you only want the technical stuff.

Family Fun

This past year has been very interesting for me personally and professionally. In the past year, my youngest, Mikayla, entered junior high, officially taking our family out of elementary school. Mikayla joined me at the SQL Saturday event in Omaha. At the same time, my oldest, Kristyna, is now a senior at Burnsville Senior High School. My boys, Alex, a junior, and Andrew, a freshman, are both taller than me and staying active. Alex joined us at the Minnesota SQL Saturday and did a lot of volunteering. Andrew probably had the best event of all, as he joined me at SQL Saturday in Fargo, where he got to see Bill Gates in person. I am proud of all of them; they are great kids. This was also the year I celebrated 20 years with the woman I love, Sheila. Without her support, I would not have been able to get this far in my career. Yep, it has been a busy year personally. Soon there will be lots of college, marriage, and maybe even grandkids. Wow, I must be getting old.

Magenic and the Server Development Practice

2013 was my first full year as a Practice Lead at Magenic. I started out as the Practice Lead for our Business Intelligence and Data Practice. In August, my role expanded to include SharePoint, BizTalk, and TFS, which allows us to focus on server technologies at Magenic. Along the way, I have had to learn a lot about VMs (still a work in progress). I really enjoy working with the pros we have across the company. We have some very talented BI, SharePoint, and BizTalk consultants, including a few virtual TSPs in SQL Server, Business Intelligence, and BizTalk.

During this past year, I have traveled around the country to consult, to speak, and to meet customers. I have had the privilege of speaking at multiple SQL Saturdays, Modern Apps Live, SQL Live, and Code Mastery events. It has been fun. I almost made it to all of our offices, including the locations we opened this year. I made it to Minneapolis, Chicago, Atlanta, Charlotte, Boston, New York City, and San Francisco. I still need to get out to Los Angeles and Manila.


While it has been hard at times, the travel experience has been good overall. I try to keep my speaking engagements up to date; maybe I will see some of you next year.

This year I also authored outside of the blog. Chuck Whittemore (The Insight Analyst) and I coauthored a white paper, Leading with Excel: The Changing World of Business Intelligence. This was a fun project where we brought together Microsoft Excel and Microsoft BI in a real-world way. We continue to successfully work this strategy with our customers, and it was the impetus for my Excel BI Tips blog post series. I also had the privilege of coauthoring a book that is just being released: SQL Server Analysis Services 2012 Cube Development Cookbook from Packt Publishing. This is the third book I have worked on, and it has been a while since I was last published, so this was a good experience for me. I still don’t know if I would take an entire project on, but maybe someday.

This year wraps up with me becoming a virtual TSP with Microsoft to further support their efforts with SQL Server and Business Intelligence in the marketplace.

One other thing that has been interesting for me is that, with the release of Power Pivot and the SQL Server Analysis Services Tabular Model, I am seeing a huge shift in how I work with and sell BI. I have always worked with cubes, but now I see the in-memory space as a more compelling, leading-edge solution that will continue to change what my career looks like. While I had a lot of fun being a cube and MDX wizard, the ability to deliver results to business users in a timely fashion with great visualizations is actually more fun. The more things change …

Happy New Year!

I hope you and your family had much to look back and celebrate this year. I thank God for the blessings of a great company to work for and an awesome family to be with.





SQL Saturday #197 – Omaha Recap

12 04 2013


This was the second SQL Saturday hosted in Omaha.  I loved seeing how the event has grown from the first event until now.  John Morehouse ( T | B ) and team did another stellar job organizing this event. I know they packed the house.

Having taken part in the first event, I found it spectacular that the speaker list was so diverse.  It is great to see so many SQL Server pros come out and speak at these events.

Part of the fun for me was bringing my 11-year-old daughter along.  Many of you – speakers, attendees, and sponsors – were kind to her, and she had a good time, even though much of it was spent using my Surface to watch Netflix.  As a speaker, this was a way to spend some time on the road with her and to introduce one of my children to what I do when I travel to these events (one of my sons will be joining me in Fargo).

I do have to say that the food, both at the speaker’s dinner and at lunch, was awesome!  If you are looking for an event that will feed you well, be sure to try this event next time around.

I was able to attend a few of the sessions, but I wanted to mention that the SQL Server vs Oracle: The Throwdown! session was really good.  As a cross-platform developer (check out my Oracle for MSBI Tips), it was great having a SQL Server pro, David Klee (@kleegeek), and an Oracle pro, Joe Grant (@dba_jedi), co-present.  Nice work, guys!

Finally, I presented on Building BI Solutions with Excel 2013.  I have uploaded the slides to the event site.  Until next time.





T-SQL Window Functions on LessThanDot and at SQL Saturday 149

26 09 2012

I recently completed a series of blog posts on www.lessthandot.com on T-SQL window functions.  The enhancements in this area in SQL Server 2012 are phenomenal.  They solve a myriad of issues, including calculating running totals with SQL.  Check it out if you want to learn more and get some simple examples of the syntax and structure of the window functions.  Here is the series outline and links to each section.

T-SQL Window Functions:

I do a presentation on T-SQL window functions for SQL Saturdays and am presenting it at the PASS Summit this year.  Maybe I will see you there.

I recently presented this at SQL Saturday #149 in Minnesota.  Here is the presentation and the demo code. Thanks for attending.

 

Finally, if you use Oracle, you will find this series helpful as well.  Most of the syntax is also supported in Oracle.  Look for an Oracle tip with Oracle samples for your use soon.





SQL Saturday #149 and CodeMastery–Minnesota Events

18 09 2012

We are less than two weeks away from SQL Saturday #149 in Minneapolis on September 29, 2012, with two preconference sessions on September 28.  In case you haven’t heard, we are having the main event on a Saturday.  Yes, the precons are on Friday this year.  Check out the details here.  I am really excited about this event, as we have a great group of local, regional, and national speakers.  There are nine rooms being used for this event, so go out to the site and build your schedule.

The following Tuesday, Magenic is hosting CodeMastery with a BI track at the Microsoft Technology Center in Edina, MN.  This event includes sessions on managing the BI stack in SharePoint and on xVelocity.  The other track is Windows 8 development, with sessions on WinRT and game development.

I’m a Speaker at Both Events

Besides plugging these two awesome events on their own, I am also a speaker at both.  Here is what I will be speaking on at each event:

SQL Saturday #149: A Window into Your Data: Using SQL Window Functions

In this session, I will walk through the window functions enabled by the OVER clause in SQL Server.  Come join me as we celebrate SQL Server 2012’s release of analytic functions and its expansion of aggregate functionality to support tasks such as running totals and previous-row values.  Thankfully, this is a demo-heavy session, as it is one of the last sessions of the day.

CodeMastery: Data Mining with the Tools You Already Have

The next week, I will be presenting on the data mining tools Microsoft has made available to us in SSAS and Excel.  The goal of this session is to help developers understand how to incorporate data mining algorithms into their business intelligence solutions.

I look forward to seeing you at both events.  They are priced right, FREE!







