Category Archives: Office 2013

Excel Tip #24: Removing “Buttons” from Excel Charts

As I mentioned in my original post, Exploring Excel 2013 as Microsoft’s BI Client, I will be posting tips regularly about using Excel 2013 and later.  Much of the content will be a result of my daily interactions with business users and other BI devs.  In order to not forget what I learn or discover, I write it down … here.  I hope you too will discover something new you can use.  Enjoy!

Annoyed by the Buttons, Remove Them

This is one of my shortest tips, but many users who are new to Excel pivot charts want to remove the field buttons from their charts. See the image below to see what we are talking about.

image

It is really simple to hide them or turn them off. You can manage the field buttons on the ANALYZE tab under PIVOTCHART TOOLS. If you want to turn all the buttons off, just click the Field Buttons button; it will appear “unselected” and all the buttons will be gone. If you want to remove only some of the buttons, expand the selection on the button and select or deselect the ones you want to see. (I am not a fan of buttons in dashboards because they are ugly.)

image

Here is the same chart without buttons. When you are designing your dashboards, be sure to turn the buttons off so you can see how the chart will look when deployed. Buttons take up a lot of real estate in the chart, and when they are removed your chart could look very different.

image

Chart Filter Buttons

One use case that I want to highlight is using filters on a chart. When you add a filter to your chart, you need to use a field button to interact with the filter. The nice part about hiding the buttons and deploying to Excel Services is that you can apply an underlying, hidden filter. However, if you want to change the filtered option, you need to unhide the field buttons. If you want users to apply filters, you should use slicers, not the filter option. Slicers are a much better user experience.

Excel Tip #23: Adding a Trendline to Your Chart

As I mentioned in my original post, Exploring Excel 2013 as Microsoft’s BI Client, I will be posting tips regularly about using Excel 2013 and later.  Much of the content will be a result of my daily interactions with business users and other BI devs.  In order to not forget what I learn or discover, I write it down … here.  I hope you too will discover something new you can use.  Enjoy!

Value of a Trendline

Adding a trendline to a chart visually projects the current trend over time or another axis. It is very easy to do. I am going to look at two usages – one over time and one over age ranges. Excel supports a variety of trendlines to meet specific data stories. I typically use linear and logarithmic depending on which seems to show the trend best. However, Excel also supports polynomial, power, exponential and moving average trends. If you want to know more about the formulas and the properties applied, check out Microsoft’s support article on trendlines.

Adding a Trendline Over Dates

This is the most common use of a trendline. As you use columns or bars to represent data for each time period, you add a trendline to see how the data is changing. First, create a chart with data represented over time. In my example, I am looking at game usage on cell phones as a percent of polled users over time. Here is my chart without the trendline.

image

In order to add a trendline, click the plus (cross) icon next to the chart. It will open the Chart Elements menu. In this menu, click the arrow by Trendline to see your options. If you just select Trendline, it will add a linear trendline by default.

image

Once you select trendline, you can hover over the various trendline types to see how they will be displayed on the chart. In order for this to work, you must have the Trendline option selected. Here is my chart with the linear trendline applied.

image

Adding a Trendline Over Age Ranges or Non-Date Data

The other option is to use a non-date-based trendline. In order for the trend to make sense, the axis should still be sequential in nature. In my example, I am going to swap the date out for the age range of the responses. I am also using the Exponential trendline in this example.

image

As you can see here, younger users are more apt to play games. The trend works because the data being reviewed is sequential.

Multiple Trendlines

You can also have more than one trendline on a chart. There are two scenarios that I will show here. In the first scenario, you can add two types of trendlines to your chart. To add another trendline, go to the cross icon and select the trendline type you want to add. If the wrong one gets created, unselect the trendline option and reapply. Here is my Game Usage by Age example with the linear trendline added.

image

You can also do this with multiple data points. I am going to add Internet Usage % to the chart and compare it as well. When you select the Trendline option, it will open a new dialog box allowing you to select the metric you want to trend. In our use case, we select Game Usage % for the first trendline; Excel will match the trendline color to the bar color. Now open the Chart Elements menu again, select the arrow, and choose Linear. This will let you choose the value to be trended. We then select Internet Usage % and we get two linear trendlines on the chart.

image

image

Here you can see the year the trends “cross”. It allows for some other interesting visualizations you can add to your Excel dashboard.

Excel Tip #22: Combo Charts – Out of the Box Functionality

As I mentioned in my original post, Exploring Excel 2013 as Microsoft’s BI Client, I will be posting tips regularly about using Excel 2013 and later.  Much of the content will be a result of my daily interactions with business users and other BI devs.  In order to not forget what I learn or discover, I write it down … here.  I hope you too will discover something new you can use.  Enjoy!

Creating a Combo Chart

Excel supports a couple of options for overlaying lines on bars or columns. The most common method is to use the combo chart. In this option, you would put metrics into the chart and then select which is the bar and which is the line. This is particularly helpful when using different types of metrics such as counts and percentages. To set this up, you create the initial chart with the metrics you want and then change the type. In the example below I have text and email usage counts with internet usage as a percentage in a column chart.

image

Select the Change Chart Type option on the PIVOTCHART TOOLS menu.

image

In the dialog that is opened, choose Combo Chart at the bottom. Excel will separate the values into lines and columns. In my case, it picked correctly, but be sure to check as it does not do what you want all the time. Because I am using percentages and counts, I want to have a secondary axis. My result is below. We can now look at percentages and counts in the same chart.

image

This functionality can only be used with column charts and lines or areas. The value is being able to show two types of data on the same chart. Up next, we will look at overlaying charts in Excel which gives you more design options for your dashboard.

Excel BI Tip #20: Wingdings–an Excel Services Supported Indicator Alternative

As I mentioned in my original post, Exploring Excel 2013 as Microsoft’s BI Client, I will be posting tips regularly about using Excel 2013 and later.  Much of the content will be a result of my daily interactions with business users and other BI devs.  In order to not forget what I learn or discover, I write it down … here.  I hope you too will discover something new you can use.  Enjoy!

Wingdings? Really? How did we get here?

As you have seen from previous tips, I have been working with customers to build dashboards using Excel 2013 in SharePoint 2013. I am a big fan of conditional formatting. However, one of my customers wanted to use a specific design which used triangles as images on their dashboard to indicate whether the trend was improving or worsening. What you may not know is that drawing shapes and textboxes are among the objects not supported in Excel Services.

Here is what it looks like in Excel:

image

Here is what it looks like in Excel Services – note the warning bar at the top:

image

This led me to my first option – using indicators in conditional formatting, which have a similar image.

image

As you can see it starts out fairly small, not the large shape we want to display. So, we added the indicator into a merged cell and increased the font size. You can see the image is pixelated.

image

Making matters worse, when you upload it to Excel Services it will not honor the font size.

image

Winging it with Wingdings

For some reason, yet unknown, it occurred to me to use Wingdings. Wingdings are TrueType fonts, which makes them “scalable” because you can specify the font size. In this case we are looking for an upward-facing triangle and a downward-facing triangle. So the first thing we needed to do was find out whether those symbols existed. Here is one of the clearest cheat sheets I found for the Wingdings font set: http://speakingppt.com/2011/10/31/finally-a-printable-character-map-of-the-wingdings-fonts/. Bruce has created a PowerPoint slide which is easy to follow. Whether you use his or look it up on your own, you will find that in Wingdings 3, the letters “p” and “q” are the directional triangles that we need. Now let’s build our visualization with Wingdings (and no, I can’t believe I said that as well).

The key to using Wingdings is that you place the character representing what you are looking for in the cell and apply the Wingdings font, as shown here. You can see that the value in the cell is “p”, but the Wingdings 3 font gives us the triangle.

image

Not only can you adjust the size, you can also change the color.

image

But the end goal was to have this work in Excel Services, so I saved this sheet to my Office 365 SharePoint site. As you can see here, it works as desired.

image

Here is how you can put this to practical use. Let’s say we want to use a green smiley (Wingdings – J) and a red frown (Wingdings – L) based on our value. Greater than or equal to .5 or 50% will be smiling and less than .5 will be frowning. Assuming that the value we are evaluating is in K5, we would use the following formula to set the value:

=IF(K5>=0.5, "J", "L")

This sets the text value that we want to use. Because we are using a character value in the field, we can use conditional formatting to set the appropriate color by using the Highlight Cells Rules – Text That Contains… option. You will create two rules, one for J and one for L. You can use a default setting or create a custom format to change the color.

image image

By using Wingdings and Webdings, you will be able to further enhance your dashboards with a variety of symbols. I hope you have fun with your dashboards and get to tell your users or designers that, yes, you do use Wingdings! Enjoy winging it!

Note: The target environment needs to support the Wingding fonts. We have seen this not work when using iPads for instance. Be sure to consider and test your target environments for this solution.

Excel BI Tip #18: Using CUBE Functions to Break Out of Pivot Tables

As I mentioned in my original post, Exploring Excel 2013 as Microsoft’s BI Client, I will be posting tips regularly about using Excel 2013 and later.  Much of the content will be a result of my daily interactions with business users and other BI devs.  In order to not forget what I learn or discover, I write it down … here.  I hope you too will discover something new you can use.  Enjoy!

The Need to Break Out of Pivot Tables

Pivot tables are a great way for users to interact with data from an analytical source such as SSAS Multidimensional Models, SSAS Tabular Models, and Power Pivot Models. Users can connect to the data model and start working with the data. However, when building dashboards, we often need to display content in different ways – such as a header or clarifying value on a sheet. I have done this by creating a single cell pivot table but I do not think that is the best way to accomplish this. What I really wanted to do is put the value into a cell, but have it still honor slicers or filters. In this post, we will take a simple pivot table with a slicer and decompose it using CUBE functions within Excel and create dynamic cell values which respond to the slicer even after the pivot table has been decomposed.

A Look at Our Starting Point

We are going to start with a simple pivot table with a matching slicer. In the example, we have categories on columns and the gender of poll authors on rows. We are measuring the number of polls created. The slicer will show which polls have had a response [1] versus those with no responses yet [0].

image

We are going to decompose this table and show how it allows you to move the cells around while retaining the connection to the underlying data.

Decomposing the Table

The first step is to select the table and open the ANALYZE tab on the ribbon. (NOTE: I have reduced the size of my window to make the next steps clearer. The ribbon will adjust with the size of the window.)

image

On the Analyze ribbon select OLAP Tools and Convert to Formulas.

image

The result is that your pivot table will “disappear” and all the selected values will be created using formulas as shown here.

image

The next sections will look at how the table was converted and how we can use the results.

A Look at the Formulas

The pivot table is converted to formulas. You can see the formulas for any of the cells by clicking the cell with the formula bar visible. Let’s look at how each area was converted. We are going to start with the first cell on the upper left and work our way through the cells from left to right and then down the rows.

“Count of Poll ID”

This is the name of the measure we added to the pivot table. Here is the formula:

=CUBEMEMBER("ThisWorkbookDataModel","[Measures].[Count of PollID]")

The CUBEMEMBER function returns the name of the member, in this case, “Count of PollID”. The basic syntax includes the name of the connection as it is referred to in the data connections of the workbook – “ThisWorkbookDataModel”. In our example, the name is that of the Power Pivot model in the background. If you connect to a server based model, you would provide a name with the connection or use the default name. The next portion is the member.

One important note: if you are familiar with MDX and multidimensional models, you will understand the syntax quite quickly. If you have never worked with MDX, you may find the syntax complex or confusing. The Measures dimension referred to here applies to any values in the Values section of the PivotTable Fields window.

Finally, you can add a caption to the value if you choose. This will change what is shown in the field. For instance, if I add a caption of “Poll Counts”, it will show “Poll Counts” in the field. This is one way to clean up a database name so it looks more user friendly.
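
For example, here is a rough sketch of the same member with a caption added (the "Poll Counts" caption text is just illustrative):

=CUBEMEMBER("ThisWorkbookDataModel","[Measures].[Count of PollID]","Poll Counts")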

“Column Labels” and “Row Labels”

These are converted to plain text values in the sheet and provide no real value.

“Entertainment”, “Fun”, and the Other Category Headers

These are created the same way as the Count of PollID was created. Here is the formula for “Entertainment”.

=CUBEMEMBER("ThisWorkbookDataModel","[Category].[CategoryName].&[Entertainment]")

As you can see, the connection is the same; however, the structure of the value is different. In this case, it is [Dimension].[Hierarchy].[Member]. The ampersand (&) signifies that the member is referenced by its key. Sometimes a number will show up here. You can see the structure that is used based on what is displayed in the PivotTable Fields window as shown here.

image

 

“Grand Total” (Column and Row)

The Grand Total headers both use captions. The actual syntax uses the “ALL” member of the hierarchy being displayed. Here is the example from the column Grand Total header.

=CUBEMEMBER("ThisWorkbookDataModel","[Category].[CategoryName].[All]","Grand Total")

Those familiar with MDX will note that this is a standard way to roll up the data in a cube. Excel uses the same key word here to roll up all the values from the category to total.

“F” and “M”

The gender labels also use the CUBEMEMBER function as shown here.

=CUBEMEMBER("ThisWorkbookDataModel","[Poll Owner].[Gender].&[F]")

As you can see, it uses the Poll Owner as the dimension and Gender as the hierarchy.

The Values or Numbers in the Table

The values section is the last part of the conversion or decomposition to review. It is also the most interesting. Up to this point we have been using the CUBEMEMBER function. The values use the CUBEVALUE function. For those familiar with MDX, each value cell represents a tuple. A tuple is an address for a value in a cell of an analytic structure. When you click inside the cell, you can see in the formula bar how the value is created.

image

Here is the formula:

=CUBEVALUE("ThisWorkbookDataModel",$B$4,$B8,C$5,Slicer_Poll_Has_Submissions)

Let’s break this down now.

The first parameter is the connection name, which is the same as we saw in the CUBEMEMBER function. Next we have three cell references. $B$4 refers to the value, or [Measures].[Count of PollID], which is the first cell we evaluated. The next reference, $B8, is to the row header, or [Poll Owner].[Gender].&[F]. The third refers to the current column header, which is Entertainment, or [Category].[CategoryName].&[Entertainment], in cell C$5. The final reference is to the slicer. We will discuss that in more detail in a moment.

The actual value being sent to the underlying data model is

=CUBEVALUE("ThisWorkbookDataModel","[Measures].[Count of PollID]","[Poll Owner].[Gender].&[F]","[Category].[CategoryName].&[Entertainment]","[Poll].[Poll Has Submissions].[All]")

What is important about understanding this is that you can change everything around. You can move the headers and refer directly to the member you want, or you can move a header and use the new cell reference. The cell does not even need to be on the same sheet, which is the ultimate level of flexibility.
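
As a quick sketch of that flexibility (the Sheet2 cell reference here is purely illustrative), you could mix a direct member string with a header that has been moved to another sheet and still honor the slicer:

=CUBEVALUE("ThisWorkbookDataModel","[Measures].[Count of PollID]",Sheet2!$B$2,Slicer_Poll_Has_Submissions)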

A quick note on the slicer. This is a filter object that has a background reference to the data. The name used in the formula is “Slicer_Poll_Has_Submissions”. It is constructed from the name of the slicer as found in the Slicer Options dialog, with a “Slicer” prefix and underscores replacing spaces. Our slicer is named Poll Has Submissions and was converted to Slicer_Poll_Has_Submissions in the formula.

image

Converting Filters

We used a slicer to provide built-in filtering for our formulas. If you have a filter instead, you will be given a choice when converting. You can either leave the filter intact or convert the filter. Let’s look at both options.

image

Here is the pivot table we will be using for these examples:

image

The filter is for Poll Has Submissions. We are looking at completed polls for each category.

Leaving the Filter

image

As you can see, the filter remained intact while the rest of the table was converted to formulas. The primary difference is that the filter reference in the CUBEVALUE formula is now the cell in which the filter displays its value.

=CUBEVALUE("ThisWorkbookDataModel",$C$13,$B18,C$15)

image

This is nice if you want to use the filter format rather than a slicer to enable users to filter the value. You can also map that same cell to the other values we were looking at, as it returns the member value just like the slicer. Because the filter still functions as a pivot table, when you select the filter you get the PivotTable ribbon. From there you can move the filter to the position on the worksheet you desire.

Converting the Filter

When you choose to convert the filter, it converts the filter value to the currently selected value. In the case of this example we have All selected and the ALL member is selected.

=CUBEMEMBER("ThisWorkbookDataModel","[Poll].[Poll Has Submissions].[All]")

What Does It Mean to Me?

By pulling apart an existing table that has data you want to display, you are able to move the cells around and be more creative in your dashboard design. For example, we can highlight the number of polls related to sports created by men. Then we can create an entire dashboard with other details around this without dealing with a pivot table.

image

Have fun creating more creative dashboards with these functions.

Excel Apps – Not Quite Ready for Primetime

While this is not a regular Excel tip, it is about Excel. In my Excel BI Tips series, I am always looking for ways to build Excel dashboards or do BI work with Excel that will help everyone. In this case, I am going to discuss a new feature in Excel 2013 and Office 2013 and some of the drawbacks we discovered while trying to bring dashboards into production on SharePoint.

Excel Apps, What Is That?

With the introduction of Office 2013 and SharePoint 2013, Microsoft added the capability to create apps that can be used in the various Office applications to provide enhanced capabilities. I was most interested in the ability to bring in new visualizations in Excel that could be used for creating dashboards on my projects. One of the key advantages of using them was that they worked when deployed to Excel Services in SharePoint without installing anything on SharePoint.

Here are some examples of visualizations I planned to use.

Gauges by DataVis Design

image

People Graph by Microsoft

image

Modern Chart by Microsoft

image

Bing Maps by Microsoft

image

There are a number of other visualization options that are free or paid, as well as other functions. You can find more of them and more information about Office Apps here.

Initial User Experience Is Poor

After getting a couple of these visualizations into a dashboard over the period of a couple of weeks, we were ready to deploy the dashboards for user acceptance. Each user who opens the dashboard has to clear the following install message from each app when they load the dashboard.

image image

While not a “big” deal for savvy users, this is really an unacceptable user experience for less savvy or less patient users. Furthermore, this could easily turn into a support nightmare as each new user is likely to call or email support regardless of the amount of instructions provided. Given that some of our audience was likely going to be executives, we determined that this would not work for us and would actually reflect poorly on our project.

Ongoing User Experience Issue

So, if you decide to move forward with these apps, your dashboard can look pretty good. However, this brings up a more long-term issue. Each of the visualizations created has one or more settings buttons that remain visible, even after deployment. For instance, the gauges have a “gear” and a “question mark.” One of the first requests we got from power users reviewing the dashboard was to hide them. As far as I can tell, this is not possible. Next, the question was “why doesn’t the question mark contain information about the metric being displayed?” Great question, but the question mark is there to provide information about the gauge, not the content. Once again, users don’t need that information. These issues reinforced our decision to remove them from our executive-level dashboards and not recommend their use in other dashboards.

image

image

Concluding Thoughts

I am not sure if the problem lies in the way the apps were created or with what Microsoft has enabled in the API designs. In the end, these visualizations need to have a “deployment view” or something similar that will hide all of this as well as deploy cleanly for end users. These apps do provide some cool visualizations that are not readily available elsewhere, but they need to be cleaner and more elegant for use in general dashboards deployed in SharePoint. Understanding these nuances will hopefully help you make better decisions about dashboard design in Excel with Office Apps.

Excel BI Tip #17: Using the Timeline Filter

 

As I mentioned in my original post, Exploring Excel 2013 as Microsoft’s BI Client, I will be posting tips regularly about using Excel 2013 and later.  Much of the content will be a result of my daily interactions with business users and other BI devs.  In order to not forget what I learn or discover, I write it down … here.  I hope you too will discover something new you can use.  Enjoy!

Before I go much further, I wanted to call out an update to this series. I am planning to continue to cover more about Excel and Power BI components in Excel through these tips as we move forward. The focus will continue to be on Excel 2013 and beyond. So look for “sub series” around Power BI tools such as Power Pivot, Power Query, and Power Map as they are now integral parts of the Excel BI story.

Introducing the Timeline Filter

The Timeline Filter is a great visual filtering tool that can be used with pivot tables, pivot charts and data from Analysis Services and Power Pivot. It is the best way to allow users to have “range” query capabilities in Excel. It breaks down a date into Years, Quarters, Months and Days. The Timeline was first released with Excel 2013.

image

image

You can find the Timeline filter in the same area on the ribbon as the Slicer filter or by right-clicking the fields area used with pivot tables and charts. If you have a valid date in your model, the option will be available in this area. However, if you have no valid dates, it will not be available. This is by far the most frustrating part of working with this filter. I will go through what I have discovered about getting a date that can be used with this filter in the next section.

When you can use it, Timeline filters greatly enhance the look and feel and the ease of use for Excel dashboards and analytics.

Getting a “Date” Value the Timeline Can Use

First of all, this will be the most frustrating part of working with this filter. The Timeline requires a date field, not a date dimension. This means that traditional cubes will have the least success working with this feature unless the cube design is modified. Typically, we create date dimensions that use a surrogate key with an integer data type. Even when we choose to make this a “smart” key (e.g. 20120131 = YYYYMMDD), the value we place in the related fact tables is an integer. However, we often include an actual date as an attribute, so there can be workarounds. If you are working with a cube design that has no dates typed as dates, it is likely you will be unable to use this filter.

The key point is that the Timeline must have a date value in order to be implemented. The Timeline will work with any field that is a date.
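
If you are working in Power Pivot and only have an integer "smart" key available, one possible workaround (assuming a whole number column named DateKey holding YYYYMMDD values – not a column from the example model) is a calculated column that converts it to a true date, which the Timeline can then use:

=DATE(INT([DateKey]/10000), INT(MOD([DateKey],10000)/100), MOD([DateKey],100))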

Adding a Timeline to a Pivot Table

In my example, I will be using a Power Pivot model in Excel. This is not built on a star schema, but the model has a date table that will be used in one of the demos as well. My starting pivot table will be from one table that summarizes polls by state (I am using the MyVote sample data from Modern Apps Live). As you can see in the screenshot below, it has states and the sum of submissions.

image

The next step is to add a Timeline filter. If we right-click the PollSubmissionDate field, we will see the option for adding it as a Timeline.

image

image

Voila! We have a Timeline that works with our pivot table. In the next section, we will break down the parts of the filter and its options.

Timeline Parts and Options

image

The Timeline Caption and Header properties affect the same section. In our current Timeline, we have a caption of “PollSubmissionDate”, which is the name of the field. This is the default when creating the Timeline. You can change the header by changing the caption. If you do not want to show the header, unselect the Header box in the options.

The other three options also allow you to hide or show features in the Timeline. By default, all of the features are showing.

The Scrollbar is located at the bottom of the Timeline. It allows users to scroll through the available dates in the filter. This is helpful when dealing with a large underlying date range.

The Selection Label is the portion that shows what has been selected in text form. In the example above, you can see that Jan-Feb 2014 has been selected and that is what is shown in the label. I find that this reinforces to the user what they have selected. If the label is not visible, then the bar under the dates is the only way to see what has been selected and that is not always clear to users.

The last option that can be turned on or off is the Time Level. This is the drop down list that shows Years, Quarters, Months, and Days. This can be used to change the granularity of the selection bar. Depending on the implementation, you may want to limit the Timeline to a particular view. However, if you are using this dynamically and the data exists to support all of those levels, then you are best served giving users the option to select the granularity of the selection bar.

The truly “cool” feature is the selection bar. Users can “grab” the edges to expand or contract the range of dates they wish to see. As they change the granularity with the time level, they are able to select days, months, quarters, or years. This truly allows for dynamic range filtering which has typically been very difficult to implement in a simple fashion in Excel.

Connecting the Filter to a Second Pivot Table

As with slicers, we can have the Timeline filter apply to multiple objects in the workbook through Report Connections. Let’s add another pivot table and try to apply the date and you will see the issue. In this example, we are adding the PollResponseCount from the PollResponse table with the ResponseDate.

image

By right-clicking the ResponseDate, I am able to confirm that it is a candidate for a Timeline filter. Let’s see if we can create a connection with our Timeline filter.

image

As you can see in our example, by adding PivotTable2 from Sheet1 we have filtered the data in the second pivot table. The data is now limited to January and February of 2014. Of course, we should change our caption now, as the filter applies to multiple data sets and different date fields.

Post Publication Update from Chris Webb (@technitrain). There are additional limitations when using the Timeline filter with SSAS multidimensional. See Teo Lachev’s blog post on the topic.

Power Testing ETL with Power BI – Creating the Tests with Power Pivot

This is the second deep dive into Power Testing ETL with Power BI. At this point, we have created the source table which will be used in our testing. The next step is to bring in the destination table and create the tests that will be “run” against the data. In its simplest form, the tests are created using logical conditions based on whether source data matches destination data and whether calculations applied to those data sets also match. When they don’t match, you have a data load error which results in a failed test.

How to Calculate Success and Failure

The basic idea of the testing is to turn the results into numbers and calculate whether, and by how much, we succeeded or failed. Typically, every test will result in a 1 or 0. Whether you assign 1 to success or failure is largely dependent on how you plan to display your results. If you plan to use KPIs built into the Power Pivot model, you will be comparing the number of successful tests against the number of rows expected to be imported. The primary reason for this is that you cannot target zero when using KPIs. In this scenario, successful tests result in 1 and are therefore easily compared to the number of expected rows, which would be 100% successful if they matched.

The other scenario is to measure failures. In this case, we assign 1 to each failed test and count the number of failed tests. This can easily be handled in visualizations such as conditional formatting, where 0 can be displayed as green and the number of failures changes the state from green to yellow and then red. This helps identify the most commonly failed tests.

The method you choose is up to you and how you prefer to see the results. We will cover using both variations in visualizations, but for the sake of brevity here, we will measure success against our row count. Success = 1; Failure = 0.
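
As a rough sketch of the failure-counting variation (the column and measure names here are illustrative, not from the project), a calculated column, call it Test Failure, flags each mismatched row with a 1 and a measure totals the failures:

=IF([Source Value]=[Destination Value],0,1)

Failed Tests:=SUM([Test Failure])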

Creating the Power Pivot Tests

In order to create the tests, you need to open the Power Pivot window and add the destination table to the model. In our case we have created a table in the HughesMediaLibrary database called books that is our target. Here is the syntax for the target table.

CREATE TABLE dbo.Books(
BookID int IDENTITY(1,1) NOT NULL
CONSTRAINT pk_Books PRIMARY KEY CLUSTERED,
BookName varchar(100) NOT NULL,
Publisher varchar(100) NULL,
Genre varchar(50) NULL,
CopyrightYear smallint NULL,
AuthorFName1 varchar(100) NULL,
AuthorLName1 varchar(100) NULL,
AuthorFName2 varchar(100) NULL,
AuthorLName2 varchar(100) NULL,
AuthorFName3 varchar(100) NULL,
AuthorLName3 varchar(100) NULL,
AuthorFName4 varchar(100) NULL,
AuthorLName4 varchar(100) NULL,
AuthorFName5 varchar(100) NULL,
AuthorLName5 varchar(100) NULL,
PageCount int NULL
)

While I realize this is not a well-normalized table, it serves our purposes well to build out the tests. This table needs to be added to the Power Pivot model before we can do the next steps.

Relating the Source and Destination

The next step is to relate the source and destination. In our case, the only data that will work is the book name. We will use the Source table as the primary table in this relationship. The idea is that all the data in the source table should exist in the target. As this is not always the case, the source is the “source of truth” for the testing scenario.

 

Building the Tests

The tests are comprised of calculated columns that handle data analysis and calculated measures which summarize results.

Validating Data Field by Field, Row by Row

This is the primary reason that we worked with Power BI. One of the most common testing scenarios is whether the data came over correctly. In the previous post, we shaped the data with Power Query. Now we will compare it with the results from our ETL process in SSIS. We will use Book Name as the example. Every field you wish to test can follow this pattern. The test consists of a calculated column and a calculated measure.

We create a column in the destination table called Book Name Matches. (Remember we are tracking success, not failures.) In each row of the data, we need to determine whether the book name in the destination is an exact match for the book name in our source. We used the following DAX for that calculation:

=IF(RELATED('Booklist Source Fixes'[BookName])='Media Library – Books'[BookName],1,0)

It looks at the related table to determine whether the field values match. If they match, the test is assigned a 1 for that row. If they do not match, a 0 is assigned. (The table names are how I named the source and destination. They may not match your solution if you are following along.) Once we have the rows evaluated, we will sum the values with a Book Name Matches measure:

Book Name Matches (34):=SUM([Book Name Matches])

We will use the Book Name Matches (34) measure to compare with the book count. If they match, all tests passed. If they do not, then some or all rows have failed.

The number after the measure, 34, is the test key from TFS. I added this to the measure to make it easier to identify which test case is being evaluated with this measure. In some cases, you may have multiple measures that are required to complete a test. You can either evaluate them independently or create an additional measure that summarizes them for your use.
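
For example, a hypothetical summary measure (the second measure name here is made up for illustration) could simply add the individual test measures together:

Title Tests Passed:=[Book Name Matches (34)]+[Publisher Matches (35)]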

Other Validations or Tests

Some other basic validations can be created as well. A common one would be the book count. In my scenario, I return the book count and then evaluate it using a KPI. Another way to do this is to add another measure that checks for equality between the two book count measures in the source and destination. If they match, success. If not, failure.
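
Here is a sketch of that equality check, assuming a Destination Book Count measure exists alongside the Source Book Count measure created in the previous post:

Book Count Matches:=IF([Source Book Count]=[Destination Book Count],1,0)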

You can also use measures to validate expected totals the same way we were working with counts. This is particularly helpful in financial data loads where you would want to verify a sum of balances to make sure the results match. The point is that you can add any other measures that you want to compare in order to meet the unique needs of your situation. It is also possible that you can compare to entered values. If you know that 100 widgets are to be imported, you can have the measure evaluate against 100 instead of a measure in the source.

Recording the Results in TFS

In order to bring the process full circle, we enter test results into TFS or Visual Studio Online. This gives us the ability to track test results, bugs, and fixes in a development lifecycle tool. It is also the best way to track testing history. One caveat here is that the query results from TFS do not permit you to set test results in Excel. Ideally, we would be able to link in the tests with the results, update the results in the query, and push them back. This is NOT supported at the moment. As a result, you will need to open the tests in TFS to update your results. This is not a significant issue because you should also create bugs for failed tests. It’s primarily a nuisance.

An added side effect of using this method to test is that we are able to collaborate with developers to determine what the bug actually is. Because all the data is loaded into Excel, reviewing results is fairly simple and may actually be easier than trying to look at the destination system.

Quick Look at SSIS

Up to this point, we have focused on how a non-developer can set up the source and destination and proceed to test. I wanted to call out the author name work done in Power Query to highlight why Power BI is a great choice. When splitting author names in Power Query, the work was done using right-click operations. In SSIS, here is an example of the expression code used to split out the second author name column:

(DT_STR,200,1252)TRIM((FINDSTRING(AuthorNames,",",1) == 0 ? NULL(DT_WSTR,200) : TRIM(SUBSTRING(AuthorNames,FINDSTRING(AuthorNames,",",1) + 1,FINDSTRING(AuthorNames,",",2) == 0 ? LEN(AuthorNames) : 1 + LEN(AuthorNames) - FINDSTRING(AuthorNames,",",2)))))

Compared to Power Query, this is complex and not intuitive. While Power Query is not intended for enterprise ETL use, its simplicity helps test complex scenarios such as our author name split without having to create an equally complex SQL statement or expression.

The next post will take a look at some of the visualization options for the test results.

Power Testing ETL with Power BI – Shaping The Data with Power Query and Power Pivot

This blog post digs into the details of shaping the data with Power Query and Power Pivot in order to build out the test cases. In the previous post, you were able to get a sense of the bigger picture and how the pieces work together. This post will focus entirely on creating the source table that will be used.

One of the most difficult parts of testing the data in an ETL process is that the data needs to be transformed to match the results of the ETL process. Typically this is done using a combination of tools including SQL, Excel, and even Access. The solution I am proposing will use Power Query to do the initial massaging of the data and Power Pivot to put any finishing touches in place.

Understanding the Requirements

The first thing that has to be understood is the requirements. Those requirements are driven from the business rules and the Source to Target Map. Because we are focusing on a non-developer to deliver this work, we need to move away from developer-centric tools and into the world of Excel and Power BI.

Building Out the Power Query Query

Power Query is an excellent choice for this work. It allows us to transform or shape the data through a series of steps. What really makes this compelling is that Power Query is a “no code” solution. Once the tester or analyst is familiar with the tool, they understand that most operations can be accomplished using shortcut or right-click menus during the design process. Here is the in-depth look at what it will take to take the multiple authors in the source and separate them into multiple columns using Power Query.

Step 1 – Find the data source

In our case the data source is a CSV file. You can download that file here. This link opens an Excel file with the pipe-delimited values that will be used as the source. I would recommend saving it as a .csv file as it is easier to work with in Power Query.

Here is the raw data so you can see what we will be working with:

BookName|Publisher|Genre|Copyright|PageCount|AuthorNames
Wish List|John Locke Books|Thriller|0011|215|John Locke
Riders of the Pale Horse|Bethany House Publishers|Christian Fiction|1994|348|T. Davis Bunn
HTML Pocket Reference|O’Reilly|Technology|2000|92000|Jennifer Niederst
Renegade|Thomas Nelson|Juvenile Fiction|2007|245|Ted DekKer
Gutenberg to Google|Billion Soul Publishing|Missions|2009|272|James Davis
Sinner|Thomas Nelson|Suspense|2008|386|Ted DekKer
SQL Server Analysis Services 2012 Cube Development Cookbook|Packt Publishing|Technology|2013|324|Baya Dewald, Steve Hughes, Paul Turley
Troubleshooting SQL Server – A Guide for the Accidental DBA|Redgate Books|Technology|2011|358|Jonathan Kehaysias, Ted Krueger

Step 2 – Open Power Query in Excel and Connect to the CSV File

image

Select the Power Query tab and select the From File option on the ribbon. Pick the From CSV option. Select the booklist.csv file and click OK. The result will be a preview of the data, which in our case is all the data. You can see it has created the Source, First Row as Header and Changed Type steps. If it did not do this for you automatically, you may need to set the delimiter and specify that the header is the first row.

image

Step 3 – Shape the Data in Power Query to Match Our ETL Process

In Power Query, we are going to split the author list and the author names. We will also apply some trimming to the data. In all, we will apply ten (10) steps to the query. Power Query works like an ETL tool as it shapes or transforms the data a step at a time.

Splitting the AuthorNames column

In this step, we will create a column for each author name. Our destination supports up to five authors. Our source has up to three. Right-click on the AuthorNames column, select Split Column, then By Delimiter.

image

You can leave the defaults in the dialog and click OK.

image

This will result in three columns being created as AuthorNames.1, AuthorNames.2 and AuthorNames.3. Power Query does the next step which changes the data type to match what it sees in the resulting data.

Splitting the Author’s Names into First and Last Name Column

You will need to repeat this three times, once for each AuthorNames column. What is different is that we need to match a couple of business rules:

1. Author names will be stored as AuthorFName and AuthorLName for up to 5 authors (e.g. AuthorFName1).

2. Authors with middle initials or middle names or variations thereof should store these values with the first name. For example, J.R.R. Tolkien would store “J.R.R.” in the AuthorFName column and his last name, “Tolkien”, will be stored in the AuthorLName column.

Understanding these rules clarifies how we should split these columns. Like before, we will split the AuthorNames.1 column. However, in the delimiter dialog we will use a space as a delimiter and we will also choose the right-most delimiter. This splits at the first space from the right, so essentially the last name is separated from everything else.

image

We will repeat the process for each column. The last step for this process is to rename the columns to something meaningful that we can reference later, such as the target field names like AuthorFName1. This will make the later steps simpler to follow.

Trim Author First Names for Authors after First Author

The final step we need to do is to apply a trim to the AuthorFName2 and AuthorFName3 columns. When the data is split, leading spaces are retained. In my demos, this is “discovered” as a mismatch in the test scenario. This would be an example of an easy miss for someone not used to some of the nuances of ETL. Keep in mind that we will test the tests as well throughout this process. This is a simple fix in Power Query – right-click the affected columns and select Transform, then Trim. Problem solved.

At this point, we have completed our work in Power Query. Up to this point, you may have seen the results of your query in an Excel spreadsheet. You actually want to load the data to a Power Pivot model. If you right-click on the query in the Workbook Queries panel, you can change the Load To target.

image

Select Load to Data Model and then we will finish the source data using Power Pivot.

Step 4 – Fix Additional Issues Using Calculated Columns in Power Pivot

Open the Power Pivot model in Excel. You should see data from your Power Query query as one tab of the data. While we have massaged some of the data, there are still a few data issues that need to be resolved to match the business rules.

3 – Copyright years must be stored as 4 digit values.

4 – Page counts should not exceed 1000.

If you look at the source data you will notice that one of the books has a two-digit year for the Copyright. This should not be imported as it does not meet the rule. In our case, we will set the value to NULL in the ETL process. The same is true for one of the book page counts; it is 92,000, which greatly exceeds the maximum page count allowed by the business rule. It too will be set to NULL. The idea here is that row-value checks are easily handled in Power Pivot with DAX and calculated columns.

To resolve the copyright year issue we are using the following DAX to create a new column called “Copyright Year”:

=IF([Copyright] < 1900, BLANK(), [Copyright])

To resolve the page count issue, we use the following DAX and create a “Pages” column:

=IF([PageCount]>1000, BLANK(),[PageCount])

Now we have fixed the remaining issues that violate business rules in the Power Pivot model.

Step 5 – Add Some Calculated Measures and Columns that Can Be Used for Data Validation

The final step is to add some calculations that will help us do some basic load testing. The first is just the row count. In this case, I created two measures: Source Book Count and Source Distinct Book Count (This handles a business rule that says a title can only be imported once). We can use these measures to verify that the expected data made it from source to destination. Both of these measures were created in the calculation area in Power Pivot using the Autosum functions from the ribbon. The resulting DAX is noted below.

Source Book Count:

Source Book Count:=COUNTA([BookName])

Source Distinct Book Count:

Source Distinct Book Count:=DISTINCTCOUNT([BookName])

The last calculation we need to create is the Author Count calculated column. This needs to be a column as each row could have a different number of authors. Based on what we know with the data, we will count instances of AuthorLName columns that are not NULL to determine the number of authors.

=IF(ISBLANK([AuthorLName1]),0,1)+ IF(ISBLANK([AuthorLName2]),0,1)+ IF(ISBLANK([AuthorLName3]),0,1)

This calculation would need to be modified if the source had rows with more than three authors.
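
For example, if the source were ever split out to the five authors the destination supports, the column could be extended like this (a sketch only – the AuthorLName4 and AuthorLName5 columns would have to exist after the split):

=IF(ISBLANK([AuthorLName1]),0,1)+ IF(ISBLANK([AuthorLName2]),0,1)+ IF(ISBLANK([AuthorLName3]),0,1)+ IF(ISBLANK([AuthorLName4]),0,1)+ IF(ISBLANK([AuthorLName5]),0,1)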

Shaping Is Complete

The source transformation is now complete in the test scenario. A key point is that no code per se was written. While some DAX was required, it was fairly straightforward and likely the most complicated part of setting up the source table for testing.

Next up, creating the tests with Power Pivot and DAX.

Power Testing ETL with Power BI – The Process

This is a short blog series on using Power BI tools to support testing ETL processes. I have presented on this subject at a few SQL Saturdays over the past few years and am finally succumbing to multiple requests to turn it into a blog post. Realizing the amount of content is more than I typically would put into a single post, I will be putting together this short series to cover the material. The first post is this one. It will walk through the entire process at a high level. I will follow this post with a deeper look at Power Query’s role in the process. The third post will cover Power Pivot and building out test cases. Finally, we will wrap the series up with some visualization ideas for Excel and Power View. You can find all the posts as they come online here. Let’s get started.

The Problem Area

Why use Power BI to test ETL? While working as the architect on an ETL project for moving data from a third-party web service to an on-premises financial solution, we needed to put together a testing strategy that could be implemented by non-developers on the project. Our situation was that our project was “too small” to engage our QA team, but the requirement for reusable testing still needed to be fulfilled. Our project team consisted of a BI architect (that would be me), an ETL developer, and a business analyst (Chuck Whittemore).

NOTE: We are testing the data transformations and data load. This is not intended for auditing or performance. There are other tools for reviewing those, including the built-in reporting in SSIS and Pragmatic Works’ BI xPress tool. If you are tracking whether a package fails or succeeds, you should use either of those options, not this process.

The Big Idea

The BA and I were discussing options for testing and we theorized that we could use a new add-in for Excel (Power Query, still in preview at the time) with Power Pivot to build out tests. The key to success on this project was that we needed to be able to test with non-developer tools; no SQL Server Management Studio or SSIS could be involved in the testing. The primary reason for this is that he would be doing the testing. We also did not want to recreate every step in the ETL process the same way. So, time to put theory into practice. We determined that we would create test cases in Visual Studio and then build out tests to match those cases in Excel using the Power BI add-ins. He would do the work in Excel and we, the developer and I, would provide technical support as needed.

The Recommended Tools

Before we dig into the process, I want to lay out the tools used for development and for testing. While this solution can use other tools, it is important to know what we used in practice to create our solution.

ETL Development Tools

The ETL development was done using SQL Server Integration Services (SSIS). At the time, we needed to use Script tasks to consume the web service content. The financial system used a custom load process, so we dumped formatted data into a file for the system to pick up and load.

In the examples I use in the presentations and will lay out here, I will be using a text-file-to-SQL-Server implementation. While complex ETL problems are common and hard to test, this simplified version is easier to follow in examples. You should be able to apply the principles used here to test any solution.

Testing Tools

The testing development for the referenced project consisted of Excel with Power Query and Power Pivot. Power Query was in preview at the time, so we had some of the performance issues and early bugs to work through. None of these issues prevented us from completing the project.

The presentation solution relies on the latest version of Power Query (which changes every month) and Power Pivot in Excel 2013. Most of the examples are easy to follow, and you should be able to solve most transformation tests with the combination of Power Query and Power Pivot. Definitely do not discount the capabilities of Power Query and the fact that new functionality is being added each month.

Team Foundation Server/Visual Studio Online

Both projects use the online version of TFS. If you are currently not using a source control and work tracking solution, I highly recommend you look at the online version of TFS. It will allow you up to 5 users free and give you the ability to use source control, create test plans, create test cases, log bugs and track changes. These are key features necessary to complete a good solution that can be managed and tracked.

The Process

image

I am going to walk through my demo to build out the process steps. This will allow you to see examples. I will call out anything of relevance related to the project here as well.

1. Business Rules

The first part of any project, especially in ETL, is to understand the business rules. If you are working with a data warehouse project, this may be fairly well documented in a dimensional model. In both of our cases here, we are moving data from one system to another. The transformations and business rules are primarily driven by the target system. Here are some examples of business rules in the media library sample project.

  • Author names are stored in separate columns – FirstName and LastName
  • If an author’s name includes a middle name or initial or some variation, this combination should be stored in the first name column. For example, J.R.R. Tolkien would be stored as follows:
    — FirstName: J.R.R.
    — LastName: Tolkien
  • Copyright year should be stored as a 4 digit value
  • Page numbers should not exceed 1000

Every project has some type of business rules. It is hard to build out transformations and create test cases without these rules.

2. Source to Target Map

This is the single most important document for the tester. It tells the tester how the developer is getting from source to destination and what type of data massaging needs to be handled. Typically, people use some variation of the example created by the Kimball Group over the years.

3. Developing SSIS

The developer begins the process of creating the SSIS package. He will be using the Source to Target Map as his guide and will update that document to handle special cases in the data as needed. Ideally he is working in a development environment that will allow for test build outs as well.

4. Creating Test Plans and Test Cases

The tester creates test plans and test cases in TFS. These tests are based on the business rules and the source to target map. Depending on both the complexity of the solution and the time to develop, some test cases could be as simple as “did the table move the correct data, field for field, and does the row count match.” This method can be particularly useful when working with large tables or simple data flows. However, you should have a test case for every transformation that massages the data. This will ensure that the data is being transformed as expected.

image

Keep in mind, this solution will support test cases for each field in a data load if required. The tester and architect should evaluate what is the appropriate amount of coverage to guarantee the highest level of quality in the data transform. As always, there is a diminishing rate of return if you “test everything” at the lowest level. It will be expensive in terms of cost of development when the chance for error is minimal. It will also take substantially longer to test everything. You need to understand and be able to articulate how the testing was accomplished and your level of confidence in the results.

5. Building the Tests

This is the most extensive part of the process besides the SSIS development. I will not go into all the details here, but will walk through the overall process and principles. I will provide detailed examples in the follow up posts as noted above.

Let’s start with the end result. Chuck and I were able to determine that we could use DAX to create comparative formulas on data that could be brought into Power Pivot from both the source and the destination. Essentially, we wanted to use math to determine the results of the tests. So in our example, we use a formula like “if Source.CopyrightYear = Destination.CopyrightYear, then it passes, else it fails.” Depending on how you want to measure, pass could be 0 or 1. Then we add the values up to determine if the data passed or failed the test. We can even calculate failure rates.
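
As a rough illustration of that idea (the table and column names here are placeholders, not the actual project names), a calculated column on the destination table could look like this:

=IF(RELATED('Source'[CopyrightYear])='Destination'[CopyrightYear],1,0)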

In order to get the data into a comparative state, we needed each table in the destination to have a matching table from the source. However, it is very common that sources and destinations are not one-for-one table matches. This is where Power Query comes in. Using Power Query in our example set, we bring in the text file and massage or shape the data to look like the destination. Most importantly, we need to apply all business rules and transformations to the source. Once this is done, we do no massaging on the destination data. This allows us to compare what the ETL process did with what our tests say it should have done.

A key part of being able to compare is the ability to relate the two tables in Power Pivot. You need to be able to match natural keys or derived keys between the two sources. The relationship should be from the destination table to the source table. Without this relationship, you will not be able to build the calculations for the tests. Keep in mind the goal is to get our source to look like expected results. Any data in the destination should match the source in our scenario.

image

Once both tables are created and loaded into Power Pivot, we can complete the tests using DAX. In some cases, we create calculations on both tables to be compared. A classic example is row count. We count the number of rows in the source table and the destination table. Then we create a calculation on the destination to compare values. This meets the requirements of a row count test case (e.g. all data was successfully imported).

Another example of a test is to compare the content in a field from source to destination. This is where we use a lot of conditional logic to verify the contents of a field in a row is the same in both tables. Calculated columns (not measures) are used to create the comparison results. The conditional statement should result in a number. This is important in order to create a measure that sums up the results to determine if errors exist or not. If you choose success to be 1, then you will check your results against the row count to determine if there are errors. If you choose failure to be 1, then a nonzero count means you have errors. There is no right or wrong way to handle this, you would choose based on visualization techniques. Most of the time, using 1 for failures is fine. However, if you want to create KPIs, you will likely need success to be one so you have a good target to work with.

6. Testing the Initial Load

Once you have created the tests, you are ready to test the initial load. You will connect to both sources. Ideally, your source will not change so you can redo the test multiple times, but this will work regardless. Refresh the data, which may require rerunning the Power Query query. Once you have refreshed the data, you should be able to check the calculations in a simple pivot table to determine which tests have succeeded or failed. This is the beauty of this solution. After each subsequent execution of SSIS, you will be able to refresh your data and review your results to determine how successful the ETL is.

image      image     image

A side effect of this work is that the developer can review the test results in Excel and Power Pivot with you to more easily find the discrepancy in the data transform. In some cases, the tests are in error as well. It is important that the developer and tester work together to determine cause as well. A good team will be able to work through issues rather quickly.

7. Recording Bugs and Issues

You will need to go back to Visual Studio to change the pass/fail for each test. If a test fails, you can log a bug for the developer and use that information to determine whether it was fixed prior to a subsequent run. It is likely that multiple sprints will be required to complete the work, so you can work with your team to determine the best ways to communicate what is ready. If you track the work in TFS, queries are available to help you see what work has been completed.

You can determine if the fix worked and then set the test results accordingly. This will help show progress on the project as well.

8. Visualizing the Results

You can visualize your results using KPIs, conditional formatting and even Power View. If you have a project that needs to be easily evaluated you can publish your results to SharePoint and use charts and graphs to show how accurate the process is so far.

image  image

We will dig into visualization options more in a following blog post.

Tracking Test History

No solution is perfect and that is true here as well. One of the most common questions is how do we see the historical results? This solution does not easily provide for that. I am looking at options, but for the moment the idea is that the history will be tracked through TFS. However, you could save the workbook after each iteration. This will give you some history, but you would want to make sure that you don’t refresh the data on a historical workbook or the results will be overwritten.

Some final thoughts.

Power Query is not an ETL tool. Its target destination is always the same – Power Pivot. While its ease of use makes it appear to be a tool to be used for ETL, it is not there yet. However, it is in its ease of use that we have a place to work with it here.

My plan is to have some deeper technical dives into parts of the solution in the future.