Tag Archives: MSBI

Consumption Based Architecture for Modern Data Analytics

Throughout many years of working with BI solutions and data warehouse solutions, we have strived to put all the data in one location so it could be easily consumed by reporting and analysis tools from SQL Server Reporting Services to Microsoft Excel. We have followed the dimensional modeling processes promoted by Ralph Kimball and others. These techniques were developed to turn relational data platforms into viable and well-performing reporting platforms. They worked.

Throughout the years, I have built many star schemas, enterprise data warehouses, and reporting databases based on these techniques. However, they were not without their flaws. In the early days, it would take years to create the enterprise data warehouse. When it was done, the business had already moved on. So, we started creating data marts which were departmentally focused. This allowed us to shorten the development lifecycle to more quickly meet the needs of the business. But the speed of business continued to outpace IT’s ability to deliver effective BI solutions when they were needed.

During this same time, the world's most ubiquitous BI tool, Microsoft Excel, ran more and more businesses, both large and small. Why was Excel so popular? It allowed the user, not IT, to do the analysis and produce results when the business needed them. Self-service BI is not new; we just refuse to accept Excel as a “real” BI tool in the industry. Inevitably, no matter how good your BI or reporting tool is, users want to know, “How do I export this to Excel?” I found it humorous that Microsoft suffered the same problem with Power BI. The ability to extract the data is hugely important to users. Why? So they can do the analysis on their own. They feel empowered with Excel.

In today’s world, we are also witnessing a shift to a more mobile, tech-savvy group of users. As my teenage and young adult children begin to enter the workforce, I still see a disconnect with enterprise BI solutions in most cases. They want the data at their fingertips and easily consumable to answer the question they have now, not in three weeks. That brings us to the architecture I have been promoting for a few years. It’s not new, but I needed a way to talk about modern data and BI solutions that focuses on one of the most significant needs in the business: consumable data.

The Consumption Based Architecture is based on the following key concepts:

  1. Keep the data close to the source
  2. Data interfaces should be easy to use
  3. Modern, in-memory tools make this possible

Keep the data close to the source

In the normal enterprise data warehouse solution, we process data to clean it up, reshape it, and generally make it “better”. However, the reality is that the users don’t see it as better if it does not match the source. The only “source of truth” that matters in practice is the one closest to the point of data entry. If that data is wrong, it needs to be corrected there. Why? Because users will always verify that the data warehouse or any BI solution is correct by checking the system of record.

Besides the source of truth issue, any time data is transformed or shaped differently than the source, documentation and maintenance are required. In most cases, documentation is lacking in data warehouse solutions. It’s hard and changes often. Furthermore, the amount of maintenance required to support a traditional data warehouse solution is a problem. When the business needs to change a field or the source changes, how do you measure how long it will take to get that in the data warehouse? Days? Weeks? Months? Years? By the time we have the change in place, tested and ready for use, the business has moved on.

In Consumption Based Architecture, the goal is to move the data as needed and transform it as little as possible. Operational Data Stores, which are replicas or copies of the data from the source systems, are the best mechanism for moving the data to a reporting area. Only transform if absolutely necessary. In most cases, the transformations involve data type cleanup or, where a field’s use has changed over time, adding a column to clarify the data. Each time you move or transform data, it should be to make the data more consumable.

The primary exception to this rule is dimensional data. Dimensional data has a special place in the architecture. While it is not necessary to transform this data, the overall solution is improved by combining similar dimensions and using natural keys for relationships. For example, if you have a customer record in CRM and shipping data, you can create a “golden record” or conformed dimension which contains both natural keys. This will allow you to cross reference data easily in both solutions. Master data solutions help with this but are not required to be successful in this architecture.
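
To make the “golden record” idea concrete, here is a minimal T-SQL sketch of a conformed customer dimension exposed as a view. The schema, table, and column names (crm.Customer, shipping.Account, and their keys) are hypothetical stand-ins for whatever your CRM and shipping systems actually contain.

CREATE VIEW dbo.DimCustomer
AS
SELECT
    crm.CustomerNumber  AS CrmCustomerNumber,    -- natural key from the CRM system
    shp.AccountCode     AS ShippingAccountCode,  -- natural key from the shipping system
    crm.CustomerName,
    crm.Segment,
    shp.DefaultWarehouse
FROM crm.Customer AS crm
LEFT JOIN shipping.Account AS shp
    ON shp.CrmCustomerNumber = crm.CustomerNumber;  -- match on the shared natural key

Because both natural keys ride along in the dimension, reports can cross-reference either source system without needing a surrogate-key lookup.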

Data Interfaces Should Be Easy

Having done a lot of work in the application development field where interfaces are used to simplify programming, I think we should have a similar concept in working with data. We have dabbled in the area for some time. I know that I have used views with schemas (or users in Oracle) that were specifically designed to support SQL Server Analysis Services. We used views to support a consistent data set to the cubes for processing and allow changes in the back end data as needed. The views operated as an interface between the data warehouse and the cube.

In a similar fashion, these interfaces need to be identified and used throughout the Consumption Based Architecture. If you are referencing a relational data structure, views continue to make sense. They allow consumers to interact with the data in a known fashion. They are also logical constructs which can be deprecated with a timeline for users to move off of them to the newer versions.
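
As a simple illustration, here is a sketch of a versioned interface view over an operational data store; the interface schema, the ods.SalesOrder table, and the _v1 naming convention are illustrative choices, not requirements of the architecture.

CREATE VIEW interface.SalesOrders_v1
AS
SELECT
    o.OrderID,
    o.OrderDate,
    o.CustomerNumber,
    CAST(o.OrderTotal AS DECIMAL(18, 2)) AS OrderTotal  -- data type cleanup only
FROM ods.SalesOrder AS o;

Consumers reference interface.SalesOrders_v1 rather than the underlying tables, so the back end can change without breaking reports; when the shape must change, SalesOrders_v2 can be published alongside it with a timeline for users to move off of v1.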

Analytic models such as those found in Power BI, Qlik, and SQL Server Analysis Services create a similar tool for consumers. Usually those models can be used in the tool itself or even in other tools to produce reports and dashboards. Depending on how the model is created, it often presents a table-based view of the data. For instance, Power BI can turn a folder of files into a table structure for easy consumption in Power BI reports. Power Pivot models created in Excel can be shared in SharePoint and in Power BI. These are just a few examples of using analytic models as interfaces.

The third option is data virtualization. This tends to be fairly expensive. The two offerings I am aware of are Cisco Data Virtualization and RedHat OpenShift virtualization. The concept of data virtualization is perfect for Consumption Based Architecture; however, I have not seen it used much due to cost implications. Microsoft may be changing the data virtualization landscape. During PASS Summit 2016, they announced expanding the use of Polybase to reference other data sources such as Oracle and Teradata. Currently, Polybase supports Hadoop data, but this change could allow it to become a virtualization tool that is cost effective (included with SQL Server) and simple to use. I will be reviewing these tools in a later post as I get more information on them.
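
For reference, this is roughly what the Hadoop scenario Polybase already supports looks like in SQL Server 2016 T-SQL. The Hadoop address, file layout, and table definition below are invented for illustration, and since the Oracle and Teradata support was only announced at PASS Summit 2016, no syntax is shown for those sources.

CREATE EXTERNAL DATA SOURCE HadoopCluster
WITH (TYPE = HADOOP, LOCATION = 'hdfs://10.10.10.10:8020');

CREATE EXTERNAL FILE FORMAT PipeDelimitedText
WITH (FORMAT_TYPE = DELIMITEDTEXT, FORMAT_OPTIONS (FIELD_TERMINATOR = '|'));

-- The external table lets T-SQL queries treat the Hadoop files as if they were a local table.
CREATE EXTERNAL TABLE dbo.WebClicks_ext
(
    ClickDate  DATE,
    PageUrl    NVARCHAR(400),
    ClickCount INT
)
WITH (LOCATION = '/data/webclicks/', DATA_SOURCE = HadoopCluster, FILE_FORMAT = PipeDelimitedText);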

Modern, In-Memory Tools

The most significant technological improvement in the past few years to make this architecture really possible is in-memory data tools. While this revolution has occurred in more technologies than just Microsoft's, Microsoft tools are what I am most familiar with. In the Microsoft arena, this started with Power Pivot in Excel. The ability to mash up various data sources using in-memory models is awesome.

When Power Pivot came on the scene, I was doing a lot of work with SQL Server Analysis Services cubes. While these cubes provided a great analytic layer, they were very fragile in my experience. Power Pivot, however, allowed us to create better-performing models more quickly. At that point, I knew a shift was coming. Two of the biggest pain points with cube design were tackled – speed of development and ease of use. We could solve problems quicker and easier with Power Pivot.


Since the release of Power Pivot, Microsoft has also improved its overall in-memory solution set by adding SQL Server Analysis Services Tabular Models and in-memory OLTP and columnstore functionality in SQL Server. These improvements continue to make it easier than ever to build out consumable models in memory. You can now use columnstore in SQL Server without an analytics model, simplifying your architecture without performance penalties. With the release of SQL Server 2016, Microsoft has created a data engine capable of in-memory OLTP to improve transactional loads while also supporting columnstore indexes for reporting loads in the same database. While your mileage may vary with implementation, it is easy to see that Consumption Based Architectures are best positioned to take advantage of these advances, making data more accessible and consumable for our business users.
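
As a rough sketch of what that looks like in SQL Server 2016, the table below is memory-optimized for the transactional load and also carries a clustered columnstore index for reporting queries. The table name and columns are invented for the example, and the database needs a memory-optimized filegroup before this will run.

CREATE TABLE dbo.TicketSales
(
    TicketID     BIGINT         NOT NULL PRIMARY KEY NONCLUSTERED,
    SaleDateTime DATETIME2      NOT NULL,
    StoreID      INT            NOT NULL,
    Amount       DECIMAL(18, 2) NOT NULL,
    INDEX ccs_TicketSales CLUSTERED COLUMNSTORE        -- columnstore copy for analytic scans
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);  -- in-memory OLTP for the transactional workload

On a traditional disk-based table, the equivalent move is simply adding an updatable nonclustered columnstore index to the existing rowstore table.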

What’s Next

Over the next few weeks, I will be digging in on some of the topics that support Consumption Based Architecture. The goal is to help you begin to take advantage of this architecture in your business and build out a flexible, easily consumed data and analytics platform.


Upcoming Topics:

  • Interface Layers
  • Data Dictionaries
  • Dimensional or Master Data
  • Moving and Transforming Data
  • Modeling in MSBI
  • Reporting with Consumption Based Architecture

Boston BI User Group Meeting–Dashboard Design with Microsoft: Power BI vs Datazen (10/13/15)

Boston BI User Group

Thanks for joining Anthony Martin (@SQLMartini) and me at the Boston BI User Group Meeting in October. During the session, we demoed and built dashboards in Power BI Desktop and Datazen Publisher.

Power BI


www.powerbi.com

Couple of thoughts from our demo:

  • Power BI is a way to get data, model data, and visualize it
  • Power BI Desktop allows you to work with data on your PC
  • Power BI is experiencing a lot of change – for example over 40 changes were applied in September 2015
  • Power BI has an open API that allows you to customize the experience

Datazen


www.datazen.com

Couple of thoughts from our demo:

  • Design-first scenario – make it look good, then shape the data to match
  • Datazen Publisher allows us to create dashboards for many different profiles
  • Datazen handles custom shapes

Additional Training from Pragmatic Works

Questions from the Session

Can you use links in Datazen to support drillthrough?

Yes. You can find more information here: Drill-throughs to Other Dashboards or Custom URLs.

Power BI API Development

You have the ability to customize Power BI. Check out the contest winners to get some ideas of what is possible.


Details about the solutions can be found on the Power BI blog: http://community.powerbi.com/t5/Best-Visual-Contest/con-p/best_visual_contest/tab/entries.

You can find more about custom visuals here: https://powerbi.microsoft.com/en-us/custom-visuals.

Thanks again for joining us.

Excel Tip #29: Forcing Slicers to Filter Each Other when Using CUBE Functions

As I mentioned in my original post, Exploring Excel 2013 as Microsoft’s BI Client, I will be posting tips regularly about using Excel 2013 and later.  Much of the content will be a result of my daily interactions with business users and other BI devs.  In order to not forget what I learn or discover, I write it down … here.  I hope you too will discover something new you can use.  Enjoy!

Scenario

You have gone to all the trouble of building out a good set of slicers which allow you to “drill” down to details based on selections. In my example, I have created a revenue distribution table using cube formulas such as:

=CUBEVALUE("ThisWorkbookDataModel",$B6, Slicer_Date, Slicer_RestaurantName, Slicer_Seat_Number, Slicer_TableNumber)

image

Each cell with data references all the slicers. When working with pivot tables or pivot charts, the slicers will hide values that have no matching reference. However, since we are using cube formulas, the slicers have no ability to cross-filter each other. For example, when I select a date and a table, I expect to see my seat list reduce in size, but it does not. All of my slicers are set up to hide options when no data is available. There are two examples below. In the first, you can see that the seats are not filtered. However, this may be expected. In the second example, we filter a seat, which should cause the tables to hide values, and it does not work as expected either.

image

image

As you can see in the second example, we are able to select a seat that is either not related to the selected table or has no data on that date. Neither scenario is user friendly, and neither directs our users to where the data matches.

Solving the Problem with a “Hidden” Pivot Table

To solve this issue, we are going to use a hidden pivot table. In most cases we would add it to a separate worksheet and then hide that sheet from the users. For the sake of this example, I am going to leave the pivot table in plain sight.

Step 1: Add a Pivot Table with the Same Connection as the Slicers

In order for this to work, you need to add a pivot table using the same connection you used with the slicers. The value you use in the pivot table should only be empty, or have no matches, when that is the expected result. You want to make sure that you do not unintentionally filter out slicer options when data exists. In my example, I will use the Total Ticket Amount as the value, which covers my scenario. In most cases, I recommend looking for a count-type value that will always have data if there is a potential match of any kind.

image

Step 2: Connect the Slicers to the Pivot Table

Using the Apply Filters button on the Pivot Table ribbon, you need to select all the slicers you want to interact with each other.

image

Once these changes are applied, you will see how my data changed.

image

Now, let’s test this for real. We will keep the date and table selections, and we can see that the other slicers are now filtered to match the data that is available.

image

As you can see, the solution is fairly simple, but not intuitive. You will be able to create more creative dashboards with this technique. Keep in mind this issue is primarily a problem when using cube formulas in your Excel dashboard.

Until next time…

Excel Tip #25: Overlaying Charts

As I mentioned in my original post, Exploring Excel 2013 as Microsoft’s BI Client, I will be posting tips regularly about using Excel 2013 and later.  Much of the content will be a result of my daily interactions with business users and other BI devs.  In order to not forget what I learn or discover, I write it down … here.  I hope you too will discover something new you can use.  Enjoy!

Making a cooler Excel chart by overlaying charts

This is one of the coolest techniques for creating compelling chart visualizations in Excel. The concept is to overlay one chart on another, such as a line chart over a bar chart, which is our example. This is really helpful when showing data that has very different granularity or scale. In the example below, we are going to show the number of tickets for each hour of the day along with the % of spending on alcohol for that time period. Let’s get started.

Step 1 – Create the bar chart

We are creating a bar chart that shows the number of tickets created during each hour of the day at my fictional restaurant – Sensational Servings. In my bar chart, you can see that I added a title, removed the legend, and hid the field buttons. Here is the resulting chart.

image

Step 2 – Create the line chart

Next, we create a line chart that shows what % of the total ticket amount for that hour of the day was spent on alcoholic beverages. I will not add a title to this one, and I will hide the buttons and the legend as well. I will also convert this to a line chart so we can see the trend through the day as a line.

image

Step 3 – Set up the line chart to be overlaid on the bar chart

In this step, we will walk through the settings that make the line chart ready to overlay. In my scenario, you may have already noticed that the list of hours is different on the Y-axis. In order to overlay the charts, they must represent space that can be overlaid. In this example, we need to make sure that the Y-axis on both charts matches. We expect the X-axis to be different because we have a count and a percentage we want to compare. In most cases you need one of the axes to represent the same values so you can compare the other. However, you may have a situation where this does not matter. As we move through the rest of the example, you can adjust to your data visualization needs accordingly.

In order to set the hours the same, we will show empty values for the Hour of the Day. (NOTE: you may need to do this with both charts to prevent unexpected results.) You can make this change in the PivotChart Options dialog on the Display tab.

image

Here is the result:

image

Here are the other changes we will make:

  • Make the background transparent
  • Remove the Y-axis label
  • Move the X-axis label to the right side
  • Remove the lines in the chart

Let’s do the labels first. To remove and move the respective labels, you can use the cross that is displayed on the upper left corner of the chart. The first change is unselecting the Primary Horizontal axis from the Axes menu. From there, click the Axis option and then go to More Options. That will open a side panel with more options from which you can make the necessary changes. On this panel, you need to open the Format Axis page. You will see an option for Labels near the bottom. To move the X-axis to the right side, you need to select Low or High; which one depends on the data in your axis. The change is applied immediately, so you should see it right away. In this example we need to change it to High.

image

While leaving the format panel open, select one of the gridlines. This will open the Format Major Gridlines menu. Select the No Line option and the gridlines will be removed. Once you have completed that, select the entire chart, set the fill to No Fill in the Format Chart Area section, and select No Line for the border in the same section.

As with many Microsoft products, there are other ways, such as ribbon commands, to change fill and line options. You should explore the other options to see which method is the most intuitive for you. However, the format panel has the most complete set of options, grouped by the section of the chart you are modifying, which is why I used it for most of the changes here.

Once all these formatting changes have been made, you should see the Excel gridlines behind your line chart, and it should look as follows:

image

Now the line chart is ready to be overlaid on our bar chart.

Step 4 – Overlaying the charts and finishing touches

The final step is both the simplest and the most annoying. You will need to drag the line chart onto the bar chart. With no other changes, here is what my initial attempt looks like.

image

You will notice that the axis labels from the line chart are outside the bar chart border, the line is not sized to be meaningful, and the points on the line chart do not line up with the Y-axis labels on the underlying chart.

Here are the steps I took to get this chart the way I wanted.

  1. Made the bar chart wider. I did this to get the space I needed to include the labels in the chart.
  2. Made the plot area on the bar chart smaller. This opened up white space on the right side for the line chart labels.
  3. Made the line chart shorter. Keep in mind that without a chart title, the chart expands to fill the space. By reducing the height of the line chart, it will fit under the title, and the labels will fit into the plot height of the bar chart.
  4. This step is the most painful. I lined up the points with the bars and made sure that 0% was lined up with the bottom of the bar chart. This takes patience to get it looking right.
  5. Finally, I changed the color of the line to red so the contrast can be easily seen.

Here is the result.

image

Now we can see that alcoholic beverages account for much more of the overall ticket amount as the day goes on. By adding slicers for things like day of the week, we can do further analysis, such as seeing whether Fridays have the biggest percentage, and so on.

Wrap Up

This illustration shows one of the most common scenarios for overlaying charts. Using a percentage to see a trend related to the underlying values shown in the bar chart helps us understand correlation. If you use slicers on your dashboard, make sure that you connect them to both charts (if you want them both filtered by the slicer, of course).