Tag Archives: Excel

Powering Up HDInsight with Power BI

On Tuesday, September 15, I presented on this topic for Pragmatic Works. You can find that session here. This session is based on five blog posts that I created in August 2015.

HDInsight Series Featured Pic

Setting Up and HDInsight Cluster (No Scripts Required)

Exploring the Microsoft Azure HDInsight Query Console (No Scripting Required)

Uploading Files to an HDInsight Cluster (No Scripting Required)

Using Power BI with HDInsight Part 1: Power Query and Files

Using Power BI with HDInsight Part 2: Power BI Desktop and Hive

My goals for this series

1. Document using Power BI with HDInsight

2. Prove that you can set up a HDInsight Cluster with no scripts

Other References from the Session

Azure: http://azure.microsoft.com/en-us/

Cloud Berry: http://www.cloudberrylab.com/free-microsoft-azure-explorer.aspx

Wrap Up from the Session

A few questions were asked during the session and I wanted to handle some of them here.

Why did you not use Azure Resource Manager to deploy storage?

I did this as simple as possible and did not need to use the Resource Manager for my demos. However, if you need to rebuild the cluster quickly, the Azure Resource Manager would be a good option. Find out more here: https://azure.microsoft.com/en-us/documentation/articles/hdinsight-provision-clusters/. This site will also walk through scripts and other options for setting up HDInsight clusters.

Why didn’t the table structure show up in the Power Query demo?

The Power Query demo worked with the data from a file approach. This approach is more “raw”. The files did not have column headers, so no headers were created in the table. However, when working with the Power BI Desktop demo, I used Hive. The table was defined in Hive and were easily seen. This is another case for using Hive or something similar to define the schema for ease of use.

What are the differences between Hadoop, Hortonworks, and HDInsight?

Starting from the top, Hadoop is the Adobe open source specification. All of the products listed above are based on Hadoop. Hadoop

Hortonworks and Cloudera are examples of Hadoop distributions. These companies have worked with the various versions of open source technologies around Hadoop and created a supported distribution as a result.


Finally, HDInsight is Microsoft’s cloud-based Hadoop implementation. They continue to add functionality including Spark, R, Giraph, and Solr. You can expect Microsoft to continue to grow the capabilities of HDInsight as part of their cloud-based analytics solutions.

Thanks for attending my session.

Using Power BI with HDInsight Part 1: Power Query and Files

With the rise of HDInsight and other Hadoop based tools, it is valuable to understand how Power BI can help you take advantage of those big data investments. If you need to set up a cluster to work with, check out my previous posts on Setting Up an HDInsight Cluster and Loading Data Into Your New HDInsight Cluster. These posts show how to do this with no scripting required. If you prefer to script, there are a number of resources with sample scripts on doing the same work.

In this article, I will focus on using Power Query to get data from the Hadoop file structure in HDInsight. I will be using Excel 2013 with the Power Query Add-In. I will also be using the restaurant data I loaded as noted in the three previous posts. If you need to create a cluster and load data I encourage you to check the following blog posts:

These posts walk through the process of creating a cluster and loading up data.

Connecting to HDInsight

First, open a new Excel workbook and click the Power Query tab. Once there, you can find the Azure HDInsight source in the From Other Sources dropdown. Select that option to open the following dialog:


You will need your storage account in order to continue. Then you will need the storage account key. Once you have added the key you will see that the Navigator opened in Excel on the right.


It should show the name of your cluster and the default container name. Double click the container name and it will open the Power Query window. It will show all the files available in the container. Even though we have it organized in folders, the view shows all the files. If you have a large amount of files and you don’t want to scroll to find them, you can click the down arrow on the Folder Path column and use the text filter to find the folder you are looking for.


Now I have the files I want to use in Power Query. If you click the binary link it will open a copy of the file. However, this is not how we want to work with the data as we have multiple files. (If you did this, remove steps up to the Filtered Rows step in the Applied Steps section.) I now have the files I uploaded showing.


In order to work with them all together we need to Combine Binaries.


This now merges all the files into a single dataset. You can now do any data shaping you would like using standard Power Query methods such as updating the column names and the data types. You can even add columns such as a Total Amount column. Here is what my final query looks like including the steps I did. Be sure to give your query a meaningful name. (Note that Power Query realized my Transaction Date column was a date dataype and changed it for me.)


Now click Close & Load and your data will be loaded into Excel. You can change the query to load a model in Power Pivot as well to do additional work with the data.

The next post will walk through using Power BI Desktop to load data using a Hive query.

Excel BI Tip #27: CUBESET and CUBESETCOUNT Functions

As I mentioned in my original post, Exploring Excel 2013 as Microsoft’s BI Client, I will be posting tips regularly about using Excel 2013 and later.  Much of the content will be a result of my daily interactions with business users and other BI devs.  In order to not forget what I learn or discover, I write it down … here.  I hope you too will discover something new you can use.  Enjoy!

CUBE Function Overview

In Tip #18, I talked about the CUBEMEMBER and CUBEVALUE functions which can be used to refer to cube data directly. In this tip, we will look at CUBESET and CUBESETCOUNT. These functions return sets that can be applied to the CUBEVALUE function like the CUBEMEMBER function. They allow you to create custom sets which can be used for calculations such as trailing 12 months, top 10 performers, or the number of customers with sales in 2009. Keep in mind that the advantage of using the CUBE functions with SSAS data is that it allows you to go beyond the limiting capabilities of pivot tables using that data and MDX.

CUBESET Function

The CUBESET function is designed to return a valid MDX Set from the cube or tabular model. The construction of the formula is CUBESET(“connection”,”set expression”,”caption”,”sort order”, “sort by”). Only the connection and set expression are required. However, I highly recommend using the caption. The caption is what will show in the cell with the formula. If you don’t include the caption, the cell will appear empty so you will need mark it in some way. Before I started using the caption, I color coded the cell so I would make sure not to overwrite that cell. In a nutshell, use a caption to make your solution more usable.

Simple Set Example

In this example, we are going to pick a list of college level Education attribute members from the Adventure Works sample cube. Education is an attribute hierarchy that is part of the Customer dimension. Here is the MDX for that set:

{[Customer].[Education].&[Bachelors], [Customer].[Education].&[Graduate Degree]}

Here is how I used that in Excel with the CUBESET function (AdvWorks is my connection name).

=CUBESET(“AdvWorks”,”{[Customer].[Education].&[Bachelors], [Customer].[Education].&[Graduate Degree]}”,”Completed College”)

While this is a simple example, any MDX set will work in the expression. If you need to create a set dynamically such as a Trailing 12 Months or Top 10 set it will work just as well. Work out the set in SQL Server Management Studio and then put it into a cell and refer to that cell when creating your cube value.


The CUBESETCOUNT function can be used to count the members in a set returned by a a CUBESET function. In MDX we use .count or COUNT() to determine the number of members in a set. This does not work with the CUBESET function and thus we have CUBESETCOUNT. This is a fairly simple function to use, but it is very powerful if you are trying to do dynamic set counts based on reference data on your spreadsheet.

Counting a Set of Customers with Sales for the Selected Period

In this example, we have created a set which returns customers with orders in 2011. We will count this set using the CUBESETCOUNT function to determine the count. In Excel, we could use the NOW() function to get the current year or some other dynamic value to build the set. However, Adventure Works is a static set so I am using a single year in the example.

nonempty( {([Customer].[Customer].[Customer],[Measures].[Internet Order Quantity])},[Date].[Calendar].[Calendar Year].&[2010])

This can be put into a CUBESET function as follows:

=CUBESET(“AdvWorks”,”nonempty( {([Customer].[Customer].[Customer],[Measures].[Internet Order Quantity])},[Date].[Calendar].[Calendar Year].&[2010])”)

Next, we put the CUBESET function into the CUBESETCOUNT function.

=CUBESETCOUNT( CUBESET(“AdvWorks”,”nonempty( {([Customer].[Customer].[Customer],[Measures].[Internet Order Quantity])},[Date].[Calendar].[Calendar Year].&[2010])”))

This can now be used as a value for other calculations. For example, you could show the average sale amount for customers who purchased something this year or even the average purchase amount this year for those who ordered last year to see if your repeat customers are buying more.

Wrap Up

Using these functions with other CUBE functions increases the flexibility of your dashboard solution and allows you to make many more creative visualizations. Remember to use captions with CUBESET functions you plan to use in your formulas and you need a CUBESET result to count with CUBESETCOUNT.

Excel BI Tip #26: Using a Data Spreadsheet or Tab

As I mentioned in my original post, Exploring Excel 2013 as Microsoft’s BI Client, I will be posting tips regularly about using Excel 2013 and later.  Much of the content will be a result of my daily interactions with business users and other BI devs.  In order to not forget what I learn or discover, I write it down … here.  I hope you too will discover something new you can use.  Enjoy!

Data Sheet or Tab in Excel

With a lot of the dashboard designs in Excel I work on, we often use CUBE formulas and other calculations and functions to get the data ready for the presentation area. One of the key things we do is create a sheet in the workbook, or tab, that will allow you to hold this data. This allows us to refer to cells on the data tab in our visualizations without trying to support visualization techniques along with calculations.

The most common scenario is when I want to present numbers in the visualization that are not in a pivot chart or pivot table. By keeping this in the data tab I have maximum flexibility in the visualization.

Let’s look at the following example using Adventure Works data (from http://msftdbprodsamples.codeplex.com/). We will create the following “data box” visualization using a data tab.


First, get the data into data sheet using a pivot table. Once we have the data we want to present there, we flatten the pivot table (see Excel BI Tip #18 for details). Now we can refer to the fields we need using the data tab. In the following images you can see the data box referring to data on the data tab which uses the CUBE functions to get the data.

image  image

As you can see, this allows us to contain a lot of data that is used for processing without cluttering up the visualization.

Hiding the Data Sheet from Users

Using a data sheet also means we need to hide this sheet from our users. You can hide the sheet in Excel directly. This is most useful when the workbook will be shared as a workbook. However, if you deploy the workbook to SharePoint or Office 365, you can use the Internet Settings to only make ranges or sheets visible depending on your implementation. I prefer this process as it allows dashboard designers to easily access the data without needing to be concerned with hiding the data sheet once they are done. (Refer to Excel BI Tip #21 for more about using ranges.)

When used in SharePoint or Office 365, their is no impact to the visualizations which use the data sheet. While not visible or available to the user, the data sheet stills supports the visualization as expected. In scenarios I have delivered, this technique has allowed for extensive data manipulation and formatting to present data in meaningful ways.

Excel Tip #25: Overlaying Charts

As I mentioned in my original post, Exploring Excel 2013 as Microsoft’s BI Client, I will be posting tips regularly about using Excel 2013 and later.  Much of the content will be a result of my daily interactions with business users and other BI devs.  In order to not forget what I learn or discover, I write it down … here.  I hope you too will discover something new you can use.  Enjoy!

Making a cooler Excel chart by overlaying charts

This is one of the coolest techniques when creating compelling chart visualizations in Excel. The concept is to overlay one chart on another such as line chart over a bar chart which is our example. This is really helpful when showing data that has very different granularity or scale. In the example we will do below, we are going to show the number of tickets each hour of a day with the % spending on alcohol for that time period. Let’s get started.

Step 1 – Create the bar chart

We are creating a bar chart that shows the number of tickets created during each hour of the day at my fictional restaurant – Sensational Servings. In my bar chart, you can see that I added a title, removed the legend, and hid the field buttons. Here is the resulting chart.


Step 2 – Create the line chart

Next, we create a line chart that shows what % of the total ticket amount for that hour of the day was on alcoholic beverages. I will not add a title to this one and I will hide the buttons and the legend as well. I will also convert this to a line chart so we can see the trend through the day as a line.


Step 3 – Set up the line chart to be overlaid on the bar chart

In this step, we will walk through the settings that make line chart ready to overlay. In my scenario, you may have already noticed that the list of hours are different on the Y-axis. In order to overlay the charts, they must represent space that can be overlaid. In this example, we need to make sure that the Y-axis on both charts match. We expect that the X-axis will be different as we have a count and a percentage we want to compare. In most cases you need to have one of the axis represent the same values so you can compare the other. However, you may have a situation where this does not matter. As we move through the rest of the example, you can adjust to your data visualization needs accordingly.

In order to set the hours the same, we will show empty values for the Hour of the Day. (NOTE: you may need to do this with both charts to prevent unexpected results.) You can make this change in the PivotChart Options dialog on the Display tab.


Here is the result:


Here are the other changes we will make:

  • Make the background transparent
  • Remove the Y-axis label
  • Move the X-axis label to the right side
  • Remove the lines in the chart

Let’s do the labels first. To remove and move the respective labels, you can use the cross that is displayed on the upper left corner of the chart. image  The first change is unselecting the Primary Horizontal axis from the Axes menu. From here you click the Axis option and then go to More Options. That will open a side panel with more options from which you can make the necessary changes. On this panel, you need to open the Format Axis page. You will see an option for Labels near the bottom. To move the X-axis to the right side you need to select Low or High. This will depend on the data in your axis. The change is applied immediately so you should see the change. In the example we need to change it to High.


While leaving the format panel open, select the one of the gridlines. This will open the Format Major Gridlines menu. Select the No Line option and these will be removed. Once you have completed that, select the entire chart and set the fill to No Fill in the Format Chart Area and select No Line for the border in the same section.

As with many Microsoft products there are other ways such as ribbon commands to change fill and line options. You should explore other options to see which method is the most intuitive for you. However, the format panel has the most complete set of options which are grouped for the section of the chart you are modifying which is why I used that for most of the changes here.

Once all these formatting changes have been made you should see the Excel gridlines behind your line chart and it should look as follows:


Now the line chart is ready to be overlaid on our bar chart.

Step 4 – Overlaying the charts and finishing touches

The final step is both the simplest and most annoying. You will need to drag the line chart on to the bar chart. Making no changes here is what my initial attempt looks like.


You will notice that the axis labels from the line graph are outside the bar chart border, the line is not sized to be meaningful and the points on the line chart are not matching the Y-axis labels on the underlying chart.

Here are the steps I took to get this chart the way I wanted.

  1. Made the bar chart wider. I did this to get the space I wanted to include the labels in the chart.
  2. Made the plot area on the bar chart smaller. This opened up white space on the right side for the line chart labels
  3. Next, I made the line chart shorter. Keep in mind that without a chart title the chart expands to fill. By reducing the height of the line chart it will fit in with the title and the labels will fit into the plot height of the bar chart.
  4. This step is the most painful. I lined up the points with the bars and made sure that 0% was lined up with the bottom of the bar chart. This takes patience to get it looking right.
  5. Finally, I changed the color of the line to read so the contrast can be easily seen.

Here is the result.


Now we can see alcoholic beverages account for much more of the overall ticket amount as the day goes one. By adding slicers for thinks like day of the week we do further analysis such as seeing if Fridays are the biggest percentage and so on.

Wrap Up

This illustration shows one of the most common scenarios to overlay charts. Using a percentage to see a trend related to the underlying values shown in the bar chart helps understand correlation. If you use slicers on your dashboard, make sure that you connect them to both charts (if you want them both filtered by the slicer of course).