Power Testing ETL with Power BI – Creating the Tests with Power Pivot

13 11 2014

This is the second deep dive into Power Testing ETL with Power BI. At this point, we have created the source table which will be used in our testing. The next step is to bring in the destination table and create the tests that will be “run” against the data. In their simplest form, the tests are created using logical conditions based on whether source data matches destination data and whether calculations applied to those data sets also match. When they don’t match, you have a data load error, which results in a failed test.

How to Calculate Success and Failure

The basic idea of the testing is to turn the results into numbers and calculate whether, and by how much, we succeeded or failed. Typically, every test will result in a 1 or 0. Whether you assign 1 to success or failure is largely dependent on how you plan to display your results. If you plan to use KPIs built into the Power Pivot model, you will be comparing the number of successful tests against the number of rows expected to be imported. The primary reason for this is that you cannot target zero when using KPIs. In this scenario, successful tests result in 1 and are therefore easily compared to the number of expected rows, which would be 100% successful if they matched.

The other scenario is to measure failures. In this case, we assign 1 to each failed test and count the number of failed tests. This can easily be handled in visualizations such as conditional formatting, where 0 can be displayed as green and the number of failures changes the state from green to yellow and then red. This helps identify the most commonly failed tests.

The method you choose is up to you and how you prefer to see the results. We will cover using both variations in visualizations, but for the sake of brevity here, we will measure success against our row count. Success = 1; Failure = 0.
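To make the two scoring conventions concrete, here is a minimal sketch in Python. Python is not part of the solution described in this series, and the row outcomes are made up for illustration; the point is only to show how the same results add up under each convention.

```python
# Hypothetical per-row outcomes for one test: True means source and destination matched.
row_matched = [True, True, False, True]

# Convention 1: success = 1, failure = 0.
# The total is compared against the number of rows expected to be imported.
success_flags = [1 if matched else 0 for matched in row_matched]
expected_rows = len(row_matched)
all_rows_passed = sum(success_flags) == expected_rows

# Convention 2: failure = 1, success = 0.
# Any nonzero total means at least one row failed.
failure_flags = [0 if matched else 1 for matched in row_matched]
failed_rows = sum(failure_flags)
```

With the first convention the target is the row count, which suits KPIs. With the second the target is zero, which suits conditional formatting.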

Creating the Power Pivot Tests

In order to create the tests, you need to open the Power Pivot window and add the destination table to the model. In our case we have created a table in the HughesMediaLibrary database called books that is our target. Here is the syntax for the target table.

CREATE TABLE dbo.Books(
BookID int IDENTITY(1,1) NOT NULL
CONSTRAINT pk_Books PRIMARY KEY CLUSTERED,
BookName varchar(100) NOT NULL,
Publisher varchar(100) NULL,
Genre varchar(50) NULL,
CopyrightYear smallint NULL,
AuthorFName1 varchar(100) NULL,
AuthorLName1 varchar(100) NULL,
AuthorFName2 varchar(100) NULL,
AuthorLName2 varchar(100) NULL,
AuthorFName3 varchar(100) NULL,
AuthorLName3 varchar(100) NULL,
AuthorFName4 varchar(100) NULL,
AuthorLName4 varchar(100) NULL,
AuthorFName5 varchar(100) NULL,
AuthorLName5 varchar(100) NULL,
PageCount int NULL
)

While I realize this is not a well-normalized table, it serves our purposes well to build out the tests. This table needs to be added to the Power Pivot model before we can do the next steps.

Relating the Source and Destination

The next step is to relate the source and destination. In our case, the only data that will work is the book name. We will use the Source table as the primary table in this relationship. The idea is that all the data in the source table should exist in the target. As this is not always the case, the source is the “source of truth” for the testing scenario.

 

Building the Tests

The tests consist of calculated columns that handle data analysis and calculated measures that summarize the results.

Validating Data Field by Field, Row by Row

This is the primary reason that we worked with Power BI. One of the most common testing scenarios is whether the data came over correctly. In the previous post, we shaped the data with Power Query. Now we will compare it with the results from our ETL process in SSIS. We will use Book Name as the example. Every field you wish to test can follow this pattern. The test consists of a calculated column and a calculated measure.

We create a column in the destination table called Book Name Matches. (Remember we are tracking success, not failures.) In each row of the data we need to determine that the book name in the destination is an exact match for the book name in our source. We used the following DAX for that calculation:

=IF(RELATED('Booklist Source Fixes'[BookName])='Media Library – Books'[BookName],1,0)

It looks at the related table to determine whether the field values match. If they match, the test is assigned a 1 for that row. If they do not match, a 0 is assigned. (The table names are how I named the source and destination. They may not match your solution if you are following along.) Once we have the rows evaluated, we will sum the values with a Book Name Matches measure:

Book Name Matches (34):=SUM([Book Name Matches])

We will use the Book Name Matches (34) measure to compare with the book count. If they match, all tests passed. If they do not, then some or all rows have failed.
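The column-plus-measure pattern can be sketched outside of Power Pivot as well. The Python below is only an illustration and is not part of the workbook. It compares the Publisher field rather than the book name, the destination rows are hypothetical, and one of them is deliberately wrong so the test fails.

```python
# Source rows keyed by book name (values taken from the sample file).
source = {
    "Wish List": {"Publisher": "John Locke Books"},
    "Renegade": {"Publisher": "Thomas Nelson"},
    "Sinner": {"Publisher": "Thomas Nelson"},
}

# Hypothetical destination rows; the second publisher is deliberately misspelled.
destination = [
    {"BookName": "Wish List", "Publisher": "John Locke Books"},
    {"BookName": "Renegade", "Publisher": "Thomas Nelsen"},
    {"BookName": "Sinner", "Publisher": "Thomas Nelson"},
]


def field_matches(dest_row, field):
    """Return 1 when the destination value equals the related source value, else 0."""
    related = source.get(dest_row["BookName"])
    if related is None:
        # No related source row, so the row cannot match.
        return 0
    return 1 if related[field] == dest_row[field] else 0


# The "calculated column": one flag per destination row.
match_flags = [field_matches(row, "Publisher") for row in destination]

# The "measure": sum the flags and compare against the source book count.
matches = sum(match_flags)
all_passed = matches == len(source)
```

Here two of three rows match, so the sum falls short of the book count and the test fails.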

The number after the measure, 34, is the test key from TFS. I added this into the measure to make it easier to identify which test case is being evaluated with this measure. In some cases, you may have multiple measures that are required to complete a test. You can either evaluate them independently or create an additional measure that summarizes them for your use.

Other Validations or Tests

Some other basic validations can be created as well. A common one would be the book count. In my scenario, I return the book count and then evaluate it using a KPI. Another way to do this is to add another measure that checks for equality between the two book count measures in the source and destination. If they match, success. If not, failure.

You can also use measures to validate expected totals the same way we were working with counts. This is particularly helpful in financial data loads where you would want to verify a sum of balances to make sure the results match. The point is that you can add any other measures that you want to compare in order to meet the unique needs of your situation. It is also possible to compare to entered values. If you know that 100 widgets are to be imported, you can have the measure evaluate against 100 instead of a measure in the source.
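As a rough sketch of these aggregate checks, the Python below compares a total between source and destination and then checks a count against an entered value. It is an illustration only, not something from the workbook. The page counts come from the sample file, while the destination values and the entered count of 3 are assumptions.

```python
# Page counts for three sample books; the destination values are hypothetical.
source_pages = [215, 348, 245]
destination_pages = [215, 348, 245]

# Total check: 1 when the sums agree, 0 when they do not.
totals_match = 1 if sum(source_pages) == sum(destination_pages) else 0

# Entered-value check: compare the destination count against a known number
# instead of a measure calculated from the source.
expected_book_count = 3
count_matches_expected = 1 if len(destination_pages) == expected_book_count else 0
```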

Recording the Results in TFS

In order to bring the process full circle, we enter test results into TFS or Visual Studio Online. This allows us the ability to track test results, bugs, and fixes in a development lifecycle tool. It is also the best way to track testing history. One caveat here is that the query results from TFS do not permit you to set test results in Excel. Ideally, we should be able to link in the tests with the results. We could then update the results in the query and push it back. This is NOT supported at the moment. As a result, you will need to open the tests in TFS to update your results. This is not a significant issue because you should also create bugs for failed tests. It’s primarily a nuisance.

An added side effect of using this method to test is that we are able to collaborate with developers to determine what the bug actually is. Because all the data is loaded into Excel, reviewing results is fairly simple and may actually be easier than trying to look at the destination system.

Quick Look at SSIS

Up to this point, we have focused on how a non-developer can set up the source and destination and proceed to test. I wanted to call out the author name work done in Power Query to highlight why Power BI is a great choice. When splitting author names in Power Query, the work was done using right-click operations. By contrast, here is an example of the SSIS expression used to split out the second author name column:

(DT_STR,200,1252)TRIM((FINDSTRING(AuthorNames,",",1) == 0 ? NULL(DT_WSTR,200) : TRIM(SUBSTRING(AuthorNames,FINDSTRING(AuthorNames,",",1) + 1,FINDSTRING(AuthorNames,",",2) == 0 ? LEN(AuthorNames) : 1 + LEN(AuthorNames) - FINDSTRING(AuthorNames,",",2)))))

Compared to Power Query, this is complex and not intuitive. While Power Query is not intended for enterprise ETL use, its simplicity helps test complex scenarios such as our author name split without having to create an equally complex SQL statement or expression.

The next post will take a look at some of the visualization options for the test results.





Power Testing ETL with Power BI – Shaping The Data with Power Query and Power Pivot

12 11 2014

This blog post digs into the details of shaping the data with Power Query and Power Pivot in order to build out the test cases. In the previous post, you were able to get a sense of the bigger picture and how the pieces work together. This post will focus entirely on creating the source table that will be used.

One of the most difficult parts of testing the data in an ETL process is that the data needs to be transformed to match the results of the ETL process. Typically this is done using a combination of tools including SQL, Excel, and even Access. The solution I am proposing will use Power Query to do the initial massaging of the data and Power Pivot to put any finishing touches in place.

Understanding the Requirements

The first thing that has to be understood is the set of requirements. Those requirements are driven by the business rules and the Source to Target Map. Because we are relying on a non-developer to deliver this work, we need to move away from developer-centric tools and into the world of Excel and Power BI.

Building Out the Power Query Query

Power Query is an excellent choice for this work. It allows us to transform or shape the data through a series of steps. What really makes this compelling is that Power Query is a “no code” solution. Once the tester or analyst is familiar with the tool, they understand that most operations can be accomplished using shortcut or right-click menus during the design process. Here is an in-depth look at what it takes to separate the multiple authors in the source into multiple columns using Power Query.

Step 1 – Find the data source

In our case the data source is a CSV file. You can download that file here. This link opens an Excel file with the pipe-delimited values that will be used as the source. I would recommend saving it as a .csv file as it is easier to work with in Power Query.

Here is the raw data so you can see what we will be working with:

BookName|Publisher|Genre|Copyright|PageCount|AuthorNames
Wish List|John Locke Books|Thriller|0011|215|John Locke
Riders of the Pale Horse|Bethany House Publishers|Christian Fiction|1994|348|T. Davis Bunn
HTML Pocket Reference|O’Reilly|Technology|2000|92000|Jennifer Niederst
Renegade|Thomas Nelson|Juvenile Fiction|2007|245|Ted DekKer
Gutenberg to Google|Billion Soul Publishing|Missions|2009|272|James Davis
Sinner|Thomas Nelson|Suspense|2008|386|Ted DekKer
SQL Server Analysis Services 2012 Cube Development Cookbook|Packt Publishing|Technology|2013|324|Baya Dewald, Steve Hughes, Paul Turley
Troubleshooting SQL Server – A Guide for the Accidental DBA|Redgate Books|Technology|2011|358|Jonathan Kehaysias, Ted Krueger

Step 2 – Open Power Query in Excel and Connect to the CSV File

Select the Power Query tab and select the From File option on the ribbon. Pick the From CSV option. Select the booklist.csv file and click OK. The result will be a preview of the data, which in our case is all the data. You can see it has created the Source, First Row as Header and Changed Type steps. If it did not do this for you automatically, you may need to set the delimiter and specify that the header is the first row.
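For readers who want to see what the delimiter and header settings amount to, here is a small Python sketch that reads the same pipe-delimited layout. Python is not part of this solution, and the sketch parses two rows copied from the sample data rather than opening booklist.csv, so it stays self-contained.

```python
import csv
import io

# Two rows copied from the sample data, held in memory instead of a file.
raw = """BookName|Publisher|Genre|Copyright|PageCount|AuthorNames
Wish List|John Locke Books|Thriller|0011|215|John Locke
Sinner|Thomas Nelson|Suspense|2008|386|Ted DekKer
"""

# delimiter="|" plays the role of the delimiter setting;
# DictReader treats the first row as the header.
reader = csv.DictReader(io.StringIO(raw), delimiter="|")
rows = list(reader)

column_names = reader.fieldnames
```

Every value arrives as text here, so the copyright year is still the string "0011". In Power Query, the Changed Type step is what converts such columns to numbers.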

Step 3 – Shape the Data in Power Query to Match Our ETL Process

In Power Query, we are going to split the author list and the author names. We also will apply some trimming to the data. In all we will apply ten (10) steps to the query. Power Query works like an ETL tool as it shapes or transforms the data a step at a time.

Splitting the AuthorNames column

In this step, we will create a column for each author name. Our destination supports up to five authors. Our source has up to three. Right-click on the AuthorNames column, select Split Column, then By Delimiter.

You can leave the defaults in the dialog and click OK.

This will result in three columns being created as AuthorNames.1, AuthorNames.2 and AuthorNames.3. Power Query then automatically adds a step that changes the data types to match what it sees in the resulting data.
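The effect of this split can be sketched in a few lines of Python. This is an illustration only; the function name and the choice to pad short rows with None are assumptions, not something Power Query exposes.

```python
def split_authors(author_names, max_columns=3):
    """Split a comma-separated author list into a fixed number of columns."""
    parts = author_names.split(",")[:max_columns]
    # Pad with None so every row ends up with the same number of columns.
    return parts + [None] * (max_columns - len(parts))


three_authors = split_authors("Baya Dewald, Steve Hughes, Paul Turley")
one_author = split_authors("John Locke")
```

Note that the second and third values keep their leading space (" Steve Hughes"), because the split happens on the comma alone.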

Splitting the Author’s Names into First and Last Name Columns

You will need to repeat this three times, once for each AuthorNames column. What is different is that we need to match a couple of business rules:

1. Author names will be stored as AuthorFName and AuthorLName for up to 5 authors (e.g. AuthorFName1).

2. Authors with middle initials or middle names or variations thereof should store these values with the first name. For example, J.R.R. Tolkien would store “J.R.R.” in the AuthorFName column and his last name, “Tolkien”, will be stored in the AuthorLName column.

Understanding these rules clarifies how we should split these columns. Like before, we will select to split the AuthorNames.1 column. However, in the delimiter dialog we will use a space as the delimiter and we will also choose the right-most delimiter. This splits at the first space from the right, so the last name is separated from everything else.

We will repeat the process for each column. The last step for this process is to rename columns to something meaningful for us to reference later such as the target field names like AuthorFName1. This will make the steps later simpler to follow.
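Splitting at the right-most space is easy to picture with a short Python sketch. It is only meant to show how the business rule plays out and is not part of the workbook. The function name is hypothetical, and keeping a single-word name in the first name column is an assumption.

```python
def split_name(full_name):
    """Split a name at the right-most space into (first name, last name)."""
    if full_name is None:
        return (None, None)
    first, separator, last = full_name.rpartition(" ")
    if separator == "":
        # No space found. Assumption: keep the whole value as the first name.
        return (full_name, None)
    return (first, last)


tolkien = split_name("J.R.R. Tolkien")
bunn = split_name("T. Davis Bunn")
second_author = split_name(" Steve Hughes")
```

Middle names and initials stay with the first name, as rule 2 requires. The last example also shows that a leading space from the earlier comma split ends up in the first name column.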

Trim Author First Names for Authors after First Author

The final step we need to do is to apply a trim to the AuthorFName2 and AuthorFName3 columns. When the data was split, leading spaces were retained. In my demos, this is “discovered” as a mismatch in the test scenario. This would be an example of an easy miss for someone not used to some of the nuances of ETL. Keep in mind that we will test the tests as well throughout this process. This is a simple fix in Power Query: right-click the affected columns and select Transform, then Trim. Problem solved.
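Here is a tiny Python sketch of why the untrimmed value shows up as a failed test. It is illustrative only, and the destination value is an assumption about what a trimming ETL process would load.

```python
# First name as left behind by the comma split, with its leading space.
source_first_name = " Steve"

# Hypothetical destination value, assuming the ETL process trimmed it.
destination_first_name = "Steve"

# Before the fix the comparison fails; after trimming it passes.
match_before_trim = 1 if source_first_name == destination_first_name else 0
match_after_trim = 1 if source_first_name.strip() == destination_first_name else 0
```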

At this point, we have completed our work in Power Query. Up to this point, you may have seen the results of your query in an Excel spreadsheet. You actually want to load the data to a Power Pivot model. If you right-click on the query in the Workbook Queries panel, you can change the Load To target.

Select Load to Data Model and then we will finish the source data using Power Pivot.

Step 4 – Fix Additional Issues Using Calculated Columns in Power Pivot

Open the Power Pivot model in Excel. You should see data from your Power Query query as one tab of the data. While we have massaged some of the data there are still a few data issues that need to be resolved to match business rules.

3. Copyright years must be stored as 4 digit values.

4. Page counts should not exceed 1000.

If you look at the source data you will notice that one of the books has a two-digit year for the Copyright. This should not be imported as it does not meet the rule. In our case, we will set the value to NULL in the ETL process. The same is true for one of the book page counts. It is 92,000, which greatly exceeds the maximum page count allowed by the business rule, so it too will be set to NULL. The idea here is that row value checks are easily handled in Power Pivot with DAX and calculated columns.

To resolve the copyright year issue we are using the following DAX to create a new column called “Copyright Year”:

=IF([Copyright] < 1900, BLANK(), [Copyright])

To resolve the page count issue, we use the following DAX and create a “Pages” column:

=IF([PageCount]>1000, BLANK(),[PageCount])

Now we have fixed the remaining issues that violate business rules in the Power Pivot model.
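For comparison, the same two row-level rules look like this in Python. This is just a sketch of the logic and not part of the Power Pivot model. The function names are hypothetical, and None stands in for BLANK().

```python
def copyright_year(copyright):
    """Rule 3: keep only 4-digit years, approximated here as 1900 or later."""
    if copyright is None or copyright < 1900:
        return None
    return copyright


def pages(page_count):
    """Rule 4: page counts above 1000 are not imported."""
    if page_count is None or page_count > 1000:
        return None
    return page_count


# Values from the sample file: 0011 is read as the number 11.
wish_list_year = copyright_year(11)
riders_year = copyright_year(1994)
html_reference_pages = pages(92000)
wish_list_pages = pages(215)
```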

Step 5 – Add Some Calculated Measures and Columns that Can Be Used for Data Validation

The final step is to add some calculations that will help us do some basic load testing. The first is just the row count. In this case, I created two measures: Source Book Count and Source Distinct Book Count (this handles a business rule that says a title can only be imported once). We can use these measures to verify that the expected data made it from source to destination. Both of these measures were created in the calculation area in Power Pivot using the AutoSum functions from the ribbon. The resulting DAX is noted below.

Source Book Count:

Source Book Count:=COUNTA([BookName])

Source Distinct Book Count:

Source Distinct Book Count:=DISTINCTCOUNT([BookName])
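When no book names are blank, the two measures boil down to a count and a distinct count. The Python sketch below shows the difference; it is illustrative only, and the duplicate title is added on purpose and is not in the sample file.

```python
# Book names from the sample file, with a hypothetical duplicate of "Sinner".
book_names = ["Wish List", "Renegade", "Sinner", "Sinner"]

# Counterpart of Source Book Count: every non-blank name is counted.
source_book_count = len(book_names)

# Counterpart of Source Distinct Book Count: each title is counted once.
source_distinct_book_count = len(set(book_names))

# If the counts differ, the source holds a title more than once.
has_duplicate_titles = source_book_count != source_distinct_book_count
```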

The last calculation we need to create is the Author Count calculated column. This needs to be a column as each row could have a different number of authors. Based on what we know with the data, we will count instances of AuthorLName columns that are not NULL to determine the number of authors.

=IF(ISBLANK([AuthorLName1]),0,1)+ IF(ISBLANK([AuthorLName2]),0,1)+ IF(ISBLANK([AuthorLName3]),0,1)

This calculation would need to be modified if the source had a row with more than three authors.
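A Python sketch of the same count makes the logic easy to check by hand. It is not part of the model; the function name is hypothetical and None stands in for a blank column. Unlike the DAX above, it accepts any number of last name columns.

```python
def author_count(last_names):
    """Count the author last name columns that are not blank."""
    return sum(1 for name in last_names if name is not None)


cookbook_authors = author_count(["Dewald", "Hughes", "Turley"])
wish_list_authors = author_count(["Locke", None, None])
```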

Shaping Is Complete

The source transformation is now complete in the test scenario. A key point is that no code per se was written. While some DAX was required, it was fairly straightforward and likely the most complicated part of setting up the source table for testing.

Next up, creating the tests with Power Pivot and DAX.





Power Testing ETL with Power BI – The Process

11 11 2014

This is a short blog series on using Power BI tools to support testing ETL processes. I have presented on this subject at a few SQL Saturdays over the past few years and am finally succumbing to multiple requests to turn it into a blog post. Realizing the amount of content is more than I typically would put into a single post, I will be putting together this short series to cover the material. The first post is this one. It will walk through the entire process at a high level. I will follow this post with a deeper look at Power Query’s role in the process. The third post will cover Power Pivot and building out test cases. Finally, we will wrap the series up with some visualization ideas for Excel and Power View. You can find all the posts as they come online here. Let’s get started.

The Problem Area

Why use Power BI to test ETL? While working as the architect on an ETL project for moving data from a third-party web service to an on-premises financial solution, we needed to put together a testing strategy that could be implemented by non-developers on the project. Our situation was that our project was “too small” to engage our QA team but the requirement for reusable testing needed to be fulfilled. Our project team consisted of a BI architect (that would be me), an ETL developer, and a business analyst (Chuck Whittemore).

NOTE: We are testing the data transformations and data load. This is not intended for auditing or performance. There are other tools for reviewing those, including the built-in reporting in SSIS and Pragmatic Works’ BI xPress tool. If you are tracking whether a package fails or succeeds, you should use either of these options, not this process.

The Big Idea

The BA and I were discussing options for testing and we theorized that we could use a new add-in for Excel (Power Query, still in preview at the time) with Power Pivot to build out tests. The key to success on this project was that we needed to be able to test with non-developer tools. No SQL Server Management Studio or SSIS could be involved in the testing. The primary reason for this is that he would be doing the testing. We also did not want to recreate every step in the ETL process the same way. So, time to put theory into practice. We determined that we would create test cases in Visual Studio and then build out tests to match those cases in Excel using the Power BI add-ins. He would do the work in Excel and we, the developer and I, would provide technical support as needed.

The Recommended Tools

Before we dig into the process, I want to lay out the tools used for development and for testing. While this solution can use other tools, it is important to know what we used in practice to create our solution.

ETL Development Tools

The ETL development was done using SQL Server Integration Services (SSIS). At the time, we needed to use Script tasks to consume the web service content. The financial system used a custom load process, so we dumped formatted data into a file for the system to pick up and load.

In the examples I use in the presentations and will lay out here, I use a text file to SQL Server implementation. While complex ETL problems are common and hard to test, this simplified version is easier to follow in examples. You should be able to apply the principles used here to test any solution.

Testing Tools

The testing development for the referenced project consisted of Excel with Power Query and Power Pivot. Power Query was in preview at the time, so we had some performance issues and early bugs to work through. None of these issues prevented us from completing the project.

The presentation solution relies on the latest version of Power Query (which changes every month) and Power Pivot in Excel 2013. Most of the examples are simple so they are easy to follow, but you should be able to solve most transformation tests with the combination of Power Query and Power Pivot. Definitely do not discount the capabilities of Power Query and the fact that new functionality is being added each month.

Team Foundation Server/Visual Studio Online

Both projects use the online version of TFS. If you are currently not using a source control and work tracking solution, I highly recommend you look at the online version of TFS. It will allow you up to 5 users free and give you the ability to use source control, create test plans, create test cases, log bugs and track changes. These are key features necessary to complete a good solution that can be managed and tracked.

The Process

I am going to walk through my demo to build out the process steps. This will allow you to see examples. I will call out anything of relevance related to the project here as well.

1. Business Rules

The first part of any project, especially in ETL, is to understand the business rules. If you are working with a data warehouse project, this may be fairly well documented in a dimensional model. In both of our cases here, we are moving data from one system to another. The transformations and business rules are primarily driven by the target system. Here are some examples of business rules in the media library sample project.

  • Author names are stored in separate columns – FirstName and LastName
  • If an author’s name includes a middle name or initial or some variation, this combination should be stored in the first name column. For example, J.R.R. Tolkien would be stored as follows:
    — FirstName: J.R.R.
    — LastName: Tolkien
  • Copyright year should be stored as a 4 digit value
  • Page numbers should not exceed 1000

Every project has some type of business rules. It is hard to build out transformations and create test cases without these rules.

2. Source to Target Map

This is the single most important document for the tester. It tells the tester how the developer is getting from source to destination and what type of data massaging needs to be handled. Typically, people use some variation of the example created by the Kimball Group over the years.

3. Developing SSIS

The developer begins the process of creating the SSIS package. He will be using the Source to Target Map as his guide and will update that document to handle special cases in the data as needed. Ideally he is working in a development environment that will allow for test build outs as well.

4. Creating Test Plans and Test Cases

The tester creates test plans and test cases in TFS. These tests are based on business rules and the source to target map. Depending on both the complexity of the solution and the time to develop, some test cases could simply verify that the table moved the correct data, field for field, and that the row counts match. This method can be particularly useful when working with large tables or simple data flows. However, you should have a test case for every transformation that massages the data. This will ensure that the data is being transformed as expected.

Keep in mind, this solution will support test cases for each field in a data load if required. The tester and architect should evaluate what is the appropriate amount of coverage to guarantee the highest level of quality in the data transform. As always, there is a diminishing rate of return if you “test everything” at the lowest level. It will be expensive in terms of cost of development when the chance for error is minimal. It will also take substantially longer to test everything. You need to understand and be able to articulate how the testing was accomplished and your level of confidence in the results.

5. Building the Tests

This is the most extensive part of the process besides the SSIS development. I will not go into all the details here, but will walk through the overall process and principles. I will provide detailed examples in the follow up posts as noted above.

Let’s start with the end result. Chuck and I were able to determine that we could use DAX to create comparative formulas on data that could be brought into Power Pivot from both the source and the destination. Essentially, we wanted to use math to determine the results of the tests. So in our example, we use a formula like “if Source.CopyrightYear = Destination.CopyrightYear, then it passes, else it fails.” Depending on how you want to measure, pass could be 0 or 1. Then we add the values up to determine if data passed or failed the test. We can even tell you failure rates.
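The “use math” idea can be sketched in a few lines of Python. Python is not part of this solution and the value pairs are hypothetical; in particular, the last pair is a made-up mismatch so there is a failure to count. Here pass is scored as 1.

```python
# (source, destination) copyright years for four rows.
# None stands for a blank value that both sides agree on.
copyright_pairs = [
    (None, None),
    (1994, 1994),
    (2000, 2000),
    (2007, 2008),  # hypothetical load error
]

# Pass = 1, fail = 0 for each row.
results = [1 if src == dest else 0 for src, dest in copyright_pairs]

# Add the values up, then derive the failure rate from the totals.
passed = sum(results)
failed = len(results) - passed
failure_rate = failed / len(results)
```

One failed row out of four gives a failure rate of 25 percent.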

In order to get the data in a comparative state, we needed each table in the destination paired with a matching table from the source. However, it is very common that sources and destinations are not one-for-one table matches. This is where Power Query comes in. Using Power Query in our example set, we bring in the text file and massage or shape the data to look like the destination. Most importantly, we need to apply all business rules and transformations to the source. Once this is done, we do no massaging on the destination data. This allows us to compare what the ETL process did with what our tests say it should have done.

A key part of being able to compare is the ability to relate the two tables in Power Pivot. You need to be able to match natural keys or derived keys between the two sources. The relationship should be from the destination table to the source table. Without this relationship, you will not be able to build the calculations for the tests. Keep in mind the goal is to get our source to look like expected results. Any data in the destination should match the source in our scenario.

Once both tables are created and loaded into Power Pivot, we can complete the tests using DAX. In some cases, we create calculations on both tables to be compared. A classic example is row count. We count the number of rows in the source table and the destination table. Then we create a calculation on the destination to compare values. This meets the requirements of a row count test case (e.g. all data was successfully imported).

Another example of a test is to compare the content in a field from source to destination. This is where we use a lot of conditional logic to verify the contents of a field in a row is the same in both tables. Calculated columns (not measures) are used to create the comparison results. The conditional statement should result in a number. This is important in order to create a measure that sums up the results to determine if errors exist or not. If you choose success to be 1, then you will check your results against the row count to determine if there are errors. If you choose failure to be 1, then a nonzero count means you have errors. There is no right or wrong way to handle this, you would choose based on visualization techniques. Most of the time, using 1 for failures is fine. However, if you want to create KPIs, you will likely need success to be one so you have a good target to work with.

6. Testing the Initial Load

Once you have created the tests, you are ready to test the initial load. You will connect to both sources. Ideally, your source will not change so you can redo the test multiple times, but this will work regardless. Refresh the data, which may require rerunning the Power Query query. Once you have refreshed the data you should be able to check the calculations in a simple pivot table to determine which tests have succeeded or failed. This is the beauty of this solution. After each subsequent execution of SSIS, you will be able to refresh your data and review your results to determine how successful the ETL is.

A side effect of this work is that the developer can review the test results in Excel and Power Pivot with you to more easily find the discrepancy in the data transform. In some cases, the tests are in error as well. It is important that the developer and tester work together to determine cause as well. A good team will be able to work through issues rather quickly.

7. Recording Bugs and Issues

You will need to go back to Visual Studio to change the pass/fail for each test. If a test fails you can log a bug for the developer and use that information to determine if it was fixed prior to a subsequent run. It is likely that multiple sprints will be required to complete the work, so you can work with your team to determine the best ways to communicate what is ready. If you track the work in TFS, you will find queries are available to help you see what work has been completed.

You can determine if the fix worked and then set the test results accordingly. This will help show progress on the project as well.

8. Visualizing the Results

You can visualize your results using KPIs, conditional formatting and even Power View. If you have a project that needs to be easily evaluated you can publish your results to SharePoint and use charts and graphs to show how accurate the process is so far.

We will dig into visualization options more in a following blog post.

Tracking Test History

No solution is perfect and that is true here as well. One of the most common questions is how do we see the historical results? This solution does not easily provide for that. I am looking at options, but for the moment the idea is that the history will be tracked through TFS. However, you could save the workbook after each iteration. This will give you some history, but you would want to make sure that you don’t refresh data on a historical workbook or the results would be overwritten.

Some final thoughts.

Power Query is not an ETL tool. Its target destination is always the same – Power Pivot. While its ease of use makes it appear to be a tool to be used for ETL, it is not there yet. However, it is in its ease of use that we have a place to work with it here.

My plan is to have some deeper technical dives into parts of the solution in the future.





Exploring Excel 2013 for BI Tip #16: Exposing “Values” from a Tabular Model

19 06 2014

As I mentioned in my original post, Exploring Excel 2013 as Microsoft’s BI Client, I will be posting tips regularly about using Excel 2013.  Much of the content will be a result of my daily interactions with business users and other BI devs.  In order to not forget what I learn or discover, I write it down … here.  I hope you too will discover something new you can use.  Enjoy!

From Power Pivot to SSAS Tabular

As companies move through the cycle of building Excel based solutions for business intelligence and analytics, they eventually end up with a SQL Server Analysis Services Tabular Model. The tabular model comes into play when you need more data in your model or want to support more granular security.

Up to this point, users have been happily using Power Pivot models in Excel to build their analysis solutions. However, once the model is deployed to tabular some functionality or interaction with the model changes in significant ways.

To summarize this point, power users or data modelers will create Power Pivot models in Excel. These models may or may not be deployed to SharePoint, but at some point they need to be taken to the next level. You can migrate a Power Pivot model to tabular with ease by using the import option in SQL Server Data Tools.

image

Interacting with Power Pivot

I started by creating a simple Power Pivot model using Adventure Works DW data based on the Internet Sales fact table. I am using seven tables in my model as shown here.

image

I am not going to add any calculated measures to the model because Power Pivot allows me to use the data as it sits. Next we create a pivot table based on this model. I dropped the Fiscal Year onto rows and added OrderQuantity and ExtendedAmount to the values region. When OrderQuantity and ExtendedAmount are added to the pivot table, Excel defaults to a sum calculation. Basically, Excel creates the calculation for you based on what it knows about the data.

The point here is that I have data that can be used as values without doing any additional work with the model. I saved the workbook, closed Excel and moved on to the next step.

Interacting with Tabular

First we need to convert the Power Pivot model to a tabular model, which is done by importing the workbook we just saved into SQL Server Data Tools. Once we have the project open, we need to deploy the model to an SSAS tabular instance so we can connect to it with Excel.

image

Now that it has been deployed to SSAS we can reopen our workbook and add a connection to the tabular model. In the field list we notice three differences now that the model is tabular.

1. The SUM symbol (sigma) is used to highlight values or measures that can be calculated.

2. The values we created in the Power Pivot model show up here.

3. In the Values section, “_No measures defined” is shown.

image

When working with multidimensional models, the Values section is represented the same way. That makes sense, as the connection that Excel is using is based on MDX, not DAX. This significantly changes the user experience.

Let’s add a new measure to our Power Pivot model and try to do the same in the tabular model. We can still drop the DiscountAmount into the values section in our pivot table based on Power Pivot. However, when we try to do the same on tabular we get an error saying that we cannot add it to that area of the report.

image

In order for us to use DiscountAmount as a measure we will need to create an OLAP measure (See Excel Tip #8 for details) to use it in this Excel workbook or we will need to add it as a calculated measure in tabular and redeploy for it to be available.

What’s Happening

Because Excel treats a tabular model the same as a multidimensional model in SSAS you will need to add calculated measures for all measures you want to use as values in pivot tables in Excel. Multidimensional models are highly structured using the dimension and measure group techniques. While tabular “feels” like Power Pivot, to be used by Excel it needs to appear structured like multidimensional cubes.

Making this more interesting is that Excel uses MDX to communicate with tabular models, not DAX. As a result, we are able to use the OLAP tools in the PivotTable Tools ribbon.

image

This option is not available when working with Power Pivot models in Excel.

Impact to Users

Overall the impact to users, in particular power users and report builders, is that they have less “freedom” to design when using a tabular model. If they want to add more calculations, they need to be familiar with MDX. Furthermore, if they want the calculations to be generally available they need to work with IT to deploy updated models.

Hopefully we will see DAX supported interaction with SSAS in the future, but for the moment you need to understand how tabular and Power Pivot differ when using pivot tables in Excel.





Exploring Excel 2013 for BI Tip #15: Locking Slicer Position

18 06 2014

As I mentioned in my original post, Exploring Excel 2013 as Microsoft’s BI Client, I will be posting tips regularly about using Excel 2013.  Much of the content will be a result of my daily interactions with business users and other BI devs.  In order to not forget what I learn or discover, I write it down … here.  I hope you too will discover something new you can use.  Enjoy!

The Issue

A picture, or in this case two pictures, is worth a thousand words. I created a pivot table from Power Pivot and then added two slicers above the pivot table. The pivot table contains a date hierarchy which can be expanded and collapsed. When the hierarchy is expanded or collapsed, the slicers move around, which is not optimal when you are creating a visualization in Excel such as a dashboard. Here are the screenshots which highlight the issue.

How I set it up:

image

What happens when the date gets expanded:

image

How to Fix the Slicer Position

Right-click the slicer you want to keep from moving; in my case that is the second one. I first looked in the settings, but saw nothing. I stumbled onto the Size and Properties option, which opened the Format Slicer slide-out menu. If you expand the Properties section and select the “Don’t move or size with cells” option, the slicer will no longer move.

image

This is just one more way to use slicers to improve the user experience in your Excel dashboards and reports.





Setting Up Tabular Models on Windows Azure

12 03 2014

In my last post, I discussed how to set up Oracle in Windows Azure. During a customer call, there were questions about using SQL Server Analysis Services (SSAS) tabular models with Azure. This post will walk through setting up an Azure VM and deploying a tabular model to that VM.

If you do not have a Windows Azure account you can use a trial account with your Microsoft or Live account. Check out http://www.windowsazure.com for details on how to “try it free.”

Setting Up the VM in Azure

From the Management Portal on your Azure account, select Virtual Machines from the left then Add at the bottom. On the next screen, choose to create your VM from the gallery. You should see the Choose an Image option as seen below. As you can see, I have the SQL Server 2012 SP1 Enterprise image highlighted. You will need to use the Enterprise license as Tabular does not run on Standard. In this case, the Data Warehousing image is NOT the Business Intelligence Edition of SQL Server.

image

You can also choose to create a “blank” VM and load up SQL Server on your own. I chose to use the image to speed up the process – which it did substantially.

After selecting the image, the next few steps guide you through setting up the VM. For the most part, the defaults will work fine. Be aware that once this is turned on, you will be charged for it running. It is up to you to make sure you understand the costs, even if you are using the free trial.

During the setup steps, you will create the VM and its related cloud service. Remember that the account you create here is your admin account for the VM, and you will need those credentials to Remote Desktop into the VM. The last setup page is Endpoints. Leave the defaults; we will add an endpoint for our tabular model later.

At this point, it will take a few minutes to set up your new VM. Once it is set up, open a Remote Desktop session into it. If you look at services or at the SQL Configuration console you will notice that everything except a tabular instance has been set up for you. As a result, I would not recommend using this gallery image for a production deployment. You should look at creating your own template if you want a more locked down and refined setup.

Setting Up the Tabular Instance in Azure

As noted before, the tabular instance is not set up. The installation media is on the server, so you can run that to install your tabular instance. I won’t walk through the install process, but this was great to find because that meant I did not have to copy media to my VM.

Making the Tabular Instance Available

This section covers the tedious tasks required to make your tabular instance available for querying outside of the VM. There are three basic steps to getting your instance “online”: setting the port number in SSAS, updating the firewall, and adding endpoints. I will walk you through the steps I used to get this done followed by some references that helped me get here.

Setting the Port Number in SSAS

By default, named SSAS instances, both multidimensional and tabular, use dynamic ports. In order to allow connections through the firewall and endpoints, the port number needs to be fixed. I used guidance from TechNet and did the following steps to set the port.

    1. Opened the Task Manager to get the PID for MSOLAP$<<instance name>>.
    2. Ran netstat -ao -p TCP to get a list of ports used by current processes. Once I had identified my port number, I also noted the server IP address which is required in the next step.
    3. I chose to confirm that I had the correct port by connecting to the instance using the IP address and port number.
    4. Next, we have to go old school and modify the msmdsrv.ini file. The typical install path for this file is C:\Program Files\Microsoft SQL Server\<<instance name>>\OLAP\Config\msmdsrv.ini.
    5. Open the file in notepad and find the <Port>0</Port> tag.
    6. Change the port number to the port number that was identified above. (Technically we could have used any open port number. I chose to do this because I was sure the port number was available.)
    7. Save the changes and restart the service.
    8. Once again confirm you can connect to the server with SSMS using the IP address and port number.

Now you have set up the SSAS instance to use a fixed port number.
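
Steps 5 and 6 are a one-line text change, so they can be scripted if you build these VMs often. Below is a rough sketch in Python, assuming msmdsrv.ini contains a single <Port> element as described above. The function name and the port value 2384 are placeholders of mine. Back up the real file first and restart the service afterwards as in step 7.

```python
import re


def set_fixed_port(ini_text: str, port: int) -> str:
    """Return ini_text with the first <Port>...</Port> value replaced by port."""
    if not 1 <= port <= 65535:
        raise ValueError("port must be between 1 and 65535")
    updated, count = re.subn(
        r"<Port>\s*\d+\s*</Port>",
        f"<Port>{port}</Port>",
        ini_text,
        count=1,
    )
    if count == 0:
        raise ValueError("no <Port> element found")
    return updated


# Trimmed-down stand-in for the real file, for illustration only.
sample = "<ConfigurationSettings>\n  <Port>0</Port>\n</ConfigurationSettings>"
print(set_fixed_port(sample, 2384))
```

Working on the text rather than parsing and rewriting the XML leaves the rest of the file exactly as the installer wrote it.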

Updating the Firewall

Now that we have a port number, we can create a firewall rule. We access the firewall rules from the Server Manager. In the Windows Firewall console, we will be creating a new Inbound Rule.

image

  1. The rule type is Port
  2. We will apply the rule to TCP and specify the port we defined above.
  3. On the action tab, we selected Allow the Connection. (If you are planning to use this in a production environment, you will need to verify your connection requirements.)
  4. Next, we clear any connection profiles (Domain, Private, Public) that the rule should not apply to.
  5. Finally, I named the rule with a descriptive name.

Now we have the firewall rule in place to allow external connections for the tabular instance.
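
The same inbound rule can also be created from the command line with netsh advfirewall. The Python sketch below only builds the command so you can review it before running it in an elevated prompt. The rule name and port are placeholders I picked, and the sketch does not restrict profiles the way step 4 does.

```python
from typing import List


def build_inbound_rule_command(rule_name: str, port: int) -> List[str]:
    """Build a netsh command that allows inbound TCP traffic on one port."""
    if not 1 <= port <= 65535:
        raise ValueError("port must be between 1 and 65535")
    return [
        "netsh", "advfirewall", "firewall", "add", "rule",
        f"name={rule_name}",   # descriptive rule name (step 5)
        "dir=in",              # inbound rule
        "action=allow",        # allow the connection (step 3)
        "protocol=TCP",        # TCP only (step 2)
        f"localport={port}",   # the fixed SSAS port set earlier
    ]


# Hypothetical rule name and port, for illustration only.
command = build_inbound_rule_command("SSAS-Tabular-Inbound", 2384)
print(" ".join(command))
```

I kept the rule name free of spaces so the printed command can be pasted as is.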

Adding Endpoints

The final step to making the instance available is to add the endpoints in Azure. In the Windows Azure portal, we need to go to the VMs page again, select our VM, and open the ENDPOINTS tab. From here we create a new endpoint.

  1. We are creating a standalone endpoint.
  2. Next, we select the TCP protocol and add the port number to both the private and public port textboxes.
  3. Finally, we apply the changes.

We are now ready to test connectivity.
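
Before opening SSMS, a plain TCP check from your desktop can tell you whether the port, firewall rule, and endpoint line up. This is a small sketch in Python. The host name and port in the usage note are placeholders, and a successful connection only shows the port is reachable, not that SSAS will authenticate you.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and opens the socket in one call.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, can_connect("myvm.cloudapp.net", 2384) should return True once all three pieces are in place. If it returns False, recheck the port in msmdsrv.ini, the firewall rule, and the endpoint, in that order.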

References

Setting up VM Endpoints

Configuring Windows Firewall

Configuring Windows Firewall with SSAS

Connecting to the Tabular Instance in Azure

So to verify this works, I wanted to connect to the model with SSMS on my desktop. However, it uses Windows authentication and I am not on the same domain. (My IT staff is likely still laughing about me joining my VM to our domain.)

Thankfully, Greg Galloway (blog) reminded me of how to set up runas to use different credentials to connect to SSAS. Here is the syntax I used to connect to the tabular instance on Azure using a command window:

runas /netonly /user:<<VM name>>\<<username>> "C:\Program Files (x86)\Microsoft SQL Server\110\Tools\Binn\ManagementStudio\Ssms.exe"

This allowed me to open SSMS using those credentials. Next, I used the VM + port to connect. You will see that the Windows credentials in the dialog are not what you entered on the command line. This is expected and the credentials you entered in the command line will be passed through to the VM.

Deploying a Tabular Model to Azure

I attempted three different ways to deploy my model to the Azure VM. Two worked, one did not.

Deploying from Visual Studio on My Desktop

My first attempt was to deploy from Visual Studio on my desktop. I set the deployment properties to point to the Azure instance using the same credentials I had used with SSMS. I also set up a command line execution to use runas like with SSMS.

It appeared I could connect, but I continually got a permissions error which is shown below. After much frustration, I gave up on it and moved to the next options.

image

Using XMLA to Deploy

This is the most straightforward way to deploy an SSAS DB on a different server. I used SSMS to generate the Create Database XMLA statement. Because I had not deployed it locally, I needed to modify the XMLA statement to remove the user name and guid from the database name and database ID. (AdvWorksTab1_SteveH_<<Some GUID>>)
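
The rename is a find-and-replace on the generated script. Here is a rough Python sketch, assuming the workspace database name follows the project_user_GUID pattern shown above. The GUID in the example is made up, the function names are mine, and the two-element script is a stand-in for the much longer XMLA that SSMS generates.

```python
import re

# Matches a trailing "_<user>_<GUID>" such as "_SteveH_0f8fad5b-...".
_WORKSPACE_SUFFIX = re.compile(
    r"_[^_]+_[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$"
)


def strip_workspace_suffix(name: str) -> str:
    """Drop the user name and GUID that the workspace adds to the name."""
    return _WORKSPACE_SUFFIX.sub("", name)


def clean_xmla(xmla_text: str, workspace_name: str) -> str:
    """Replace every use of the workspace name with the cleaned name."""
    return xmla_text.replace(workspace_name, strip_workspace_suffix(workspace_name))


# Made-up GUID, for illustration only.
workspace = "AdvWorksTab1_SteveH_0f8fad5b-d9cb-469f-a165-70867728950e"
script = f"<ID>{workspace}</ID><Name>{workspace}</Name>"
print(clean_xmla(script, workspace))
```

Review the result before executing it, since the database ID and name must both end up as the name you want on the target server.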

In a bit of irony, I can use the SSMS connection from my desktop using the runas to deploy the database to the VM.

The reality is that this is an easy and acceptable way to deploy SSAS databases to production environments, in Azure or in your datacenter.

Deploying from Visual Studio on the VM

The final method I used was deploying Visual Studio onto the VM and deploying from there. I used VisualStudio.com (TFS online) to transfer the source code from my desktop to the VM. I had to install the TFS client on the VM, but SSDT with BI tools is already there.

  1. Installed the VS 2010 TFS Client: http://www.microsoft.com/en-us/download/details.aspx?id=329
  2. Then installed Visual Studio SP1  http://social.msdn.microsoft.com/Forums/vstudio/en-US/4e4851dc-eb29-4081-9484-d38a6efa07ee/unable-to-connect-to-tfs-online?forum=TFService
  3. Finally installed VS2010 Team Foundation Server Compatibility GDR (KB2662296) http://www.microsoft.com/en-us/download/details.aspx?id=29082

Now it will connect to TFS Online. I got the latest from my project and pointed the project to my tabular instance.

Be sure to check the impersonation settings.

Next, I deployed the project to the local tabular instance on the VM and it worked great. This might make sense for development, but I would not use this method in a production environment.

Some Closing Thoughts

I was amazed at how simple it was to create the VM and set up tabular in Azure. Knowing what I know now, I would be able to set up a usable instance fairly quickly and deploy a database using XMLA without much effort. That was very nice.

Doesn’t work with Office 365

I started this project to determine the connectivity capability with Office 365. Well, that does not work in my current configuration. I was able to create a workbook on my desktop using my Azure tabular model and Excel. It works just as you would expect. However, when I deployed the workbook to Office 365, data refresh always failed.

image

My next thought was to use a data gateway, but those only work with SQL Server Data Engine and Oracle, not SSAS. This is still a significant issue with making Power BI able to fully take advantage of existing BI assets at a company.

Using Azure Active Directory

My next step would be to use Azure Active Directory services to try to get Windows Authentication to work throughout. But that is for a later project and post.





Exploring Excel 2013 for BI Tip #14: Sparklines and Pivot Tables

7 01 2014

As I mentioned in my original post, Exploring Excel 2013 as Microsoft’s BI Client, I will be posting tips regularly about using Excel 2013.  Much of the content will be a result of my daily interactions with business users and other BI devs.  In order to not forget what I learn or discover, I write it down … here.  I hope you too will discover something new you can use.  Enjoy!

Sparklines and Dashboards

There are a lot of visualization possibilities with Excel. When creating dashboards, sparklines are a good visualization of what happened over a data series. My goal was to add sparklines to a pivot table so it could be added to a dashboard. After many failed attempts, I was able to get the following to work.

On the INSERT tab, you will find the Sparklines options. In my pivot table I am going to add Line and Column Sparkline visualizations using the MyVote submission counts.

image

Here are the steps that I used to add this visualization to my pivot table.

First, I created a pivot table with Submission Count as the measure, the Poll Categories as the rows, and the quarters of the year as the columns. Here is what the original data looks like.

image

In this case, I kept the Grand Totals for both columns and rows turned on. I am going to use these areas as the targets for the sparklines. I am going to use lines for trends over time on the Grand Total column. Then I am going to use the column visualization to show the category distribution on the Grand Total row.

Adding the Line Sparkline

To add the line sparkline, select all of the data cells (no grand totals). Next, select the Line Sparkline option. This will open the Create Sparklines dialog. In the dialog, you can see the Data Range is already populated with the highlighted cells. The Location Range is empty as shown below.

image

Next, select the cells in the Grand Total column, and that cell range will be added to the Location Range field. This will put the sparklines in those cells, and they will match the data trend for each row. For clarity, the final step is to change the column name to “Trend” and change the font color to white so the totals are not seen. Here is the result.

image

Adding the Column Sparkline

Next up, we will add the Column Sparkline. Highlight the same cells as before. Once the cells have been highlighted, select the Column Sparkline option. Select the Grand Total row for the location. This will show the distribution within the quarter for the categories. Changing the font to white does not hide the value in this case. I actually reduced the font size to 1 to make it nearly invisible. (There is no transparent font available.) Here is the result.

image

I also added a sparkline to the lower right corner by selecting the Grand Total column cells as the data and that corner cell as the location, to get a consistent look at distribution. One other note: the Grand Total row is called “Trend” as well because the row and column have to have the same name. But, overall, this was the look I was working toward.

Limitations and Nuances with Sparklines

Now for the stuff that doesn’t work as you would like. Sparklines are technically not part of the pivot table. As a result, the table needs to be static in shape. This means rows and columns need to stay the same in count and position.

I am going to add a category slicer to my example. When I select the Entertainment category, all of the sparklines are “stranded” in space. Quarter 2 disappears because it has no data and as a result the trendlines are no longer in the table. This is also true for the columns as four categories are eliminated by the filter. Worse yet, if you look at the filter, you will notice we have no poll submissions in the News category. When that is added the sparklines will end up in the last data row as opposed to the Grand Total sections.

image

Sparklines are a nice tool to have, but you need to understand what is the best way to use them in the context of what you are doing.

Reference and Credit

I ran across this during my search for how sparklines work in pivot tables: http://answers.microsoft.com/en-us/office/forum/office_2010-excel/how-do-you-insert-a-sparkline-into-a-pivot-table/e072570d-b367-41f1-b2d6-2dbe939db311.  As I note with the limitations to my solution, the forum post above calls out some alternatives which allow for more dynamic approaches, but they also involve coding. Furthermore, the comment from Andrew Lavinsky (MVP) confirmed that this was possible and that it is supported in SharePoint Excel Services.







