Building My HDInsight Server Cluster

After all the hype about Big Data, Hadoop, and now HDInsight, I decided to build out my own big data cluster on HDInsight. My overall goal is to have a cluster I can use with Excel and Data Explorer. Besides, I needed more data in my mashups. I am not going to get into the details or definitions of Big Data; there are entire books on the subject. Instead, I will discuss any issues or tidbits I run into along the way.

Setting Up the Environment

I am actually doing this on a VM on my Windows 8 laptop. I created a Windows Server 2012 VM with 1 GB of RAM and 50 GB of storage. (Need some help creating a VM in Windows 8? Check out my post on the subject.)

Installing the HDInsight Server

First, this product is still in preview at the time of this writing, so your mileage may vary, and details will likely change over the next few months. You will find the installer at  This uses the Microsoft Web Platform Installer. When prompted, I ran the installer. It took about one hour to complete on my VM. Once it finished, it opened the dashboard view in IE.


At this point we have installed a cluster called “local (hdfs)”.

Exploring My Local Cluster

Things did not go well at first. Whenever I clicked the big gray box to view my dashboard, I received the following error: “Your cluster ‘local (hdfs)’ is not responding. Please click here to navigate to cluster.” I clicked “here” and ended up on an IIS start page. Not very helpful. Let the troubleshooting begin.

Based on a response to this forum issue, I opened the Services window and found that none of my Apache Hadoop services were running after a restart, AND they were all set to Manual. I took two steps to resolve this. First, I changed all of the services to start automatically. This makes sense for my situation because the VM would only be running when I wanted to use HDInsight. Second, I used the command-line option noted in the same forum post to restart all of the services.

From a command prompt, execute the following command to restart all Hadoop services:


And, VOILA!, my cluster is now running.


Maybe we can get a better error message next time.
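If you would rather script both fixes than click through the Services window, here is a minimal Python sketch. It is not the command from the forum post; it assumes the Hadoop services can be managed with the standard Windows `sc config` and `net stop`/`net start` commands from an elevated prompt, and the service names below are hypothetical placeholders, so check the Services window on your own VM for the real ones. By default it only prints the commands; pass `--apply` to actually run them.

```python
import subprocess
import sys

# Hypothetical service names: replace them with the Apache Hadoop
# services listed in the Services window on your VM.
HADOOP_SERVICES = ["namenode", "datanode", "jobtracker", "tasktracker"]


def build_commands(services):
    """Build commands that set each service to start automatically,
    stop them all, and then start them again."""
    commands = []
    for name in services:
        # sc.exe expects "start=" and the value as separate tokens.
        commands.append(["sc", "config", name, "start=", "auto"])
    # Stop in reverse order (assuming the list puts core services first).
    for name in reversed(services):
        commands.append(["net", "stop", name])
    for name in services:
        commands.append(["net", "start", name])
    return commands


def run(commands, apply=False):
    """Print each command, and execute it only when apply is True."""
    for cmd in commands:
        print(" ".join(cmd))
        if apply:
            # "net stop" returns a non-zero exit code when a service is
            # already stopped, so failures are reported, not raised.
            result = subprocess.run(cmd, check=False)
            if result.returncode != 0:
                print(f"  exit code {result.returncode}", file=sys.stderr)


if __name__ == "__main__":
    run(build_commands(HADOOP_SERVICES), apply="--apply" in sys.argv)
```

Printing first gives you a chance to compare the generated commands against the services you actually see before changing anything.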

At this point, I walked through the Getting Started option on the home screen and proceeded to do “Hello World”. I used these samples as intended: to get data into my cluster and start working with the various tools. Stay tuned for more posts on my Big Data adventures.

Why Not HDInsight Service on Azure?

The primary reason I did not use the HDInsight Service on Azure was that I did not want to risk the related charges.  Once I have a good understanding of how HDInsight Server works, I will be more comfortable working with HDInsight Service.

Other Resources

Here are some of the resources I used throughout the build.

HDInsight Service Quick Start and Tutorials

Getting Started With Microsoft HDInsight


SQL Saturday #197–Omaha Recap


This was the second SQL Saturday hosted in Omaha. I loved seeing how much the event has grown since the first one. John Morehouse and team did another stellar job organizing this event. I know they packed the house.

Having taken part in the first event, I found it spectacular that the speaker list was so diverse.  It is great to see so many SQL Server pros come out and speak at these events.

Part of the fun for me was bringing my 11-year-old daughter along. Many of you (speakers, attendees, and sponsors) were kind to her, and she had a good time, even though much of it was spent using my Surface to watch Netflix. :) As a speaker, this was a way to spend some time on the road with her and to introduce one of my children to what I do when I travel to these events (one of my sons will be joining me in Fargo).

I do have to say that the food, both at the speakers’ dinner and at lunch, was awesome! If you are looking for an event that will feed you well, be sure to try this event next time around.

I was able to attend a few of the sessions, and I want to mention that SQL Server vs Oracle: The Throwdown! was really good. As a cross-platform developer (check out my Oracle for MSBI Tips), I found it great to have a SQL Server pro, David Klee (@kleegeek), and an Oracle pro, Joe Grant (@dba_jedi), co-present. Nice work, guys!

Finally, I presented on Building BI Solutions with Excel 2013.  I have uploaded the slides to the event site.  Until next time.

Exploring Excel 2013 for BI Tip #6: GeoFlow–The Latest Excel Visualization

As I mentioned in my original post, Exploring Excel 2013 as Microsoft’s BI Client, I will be posting tips regularly about using Excel 2013.  Much of the content will be a result of my daily interactions with business users and other BI devs.  In order to not forget what I learn or discover, I write it down … here.  I hope you too will discover something new you can use.  Enjoy!

Introducing GeoFlow

Today, Microsoft announced a new add-in for Excel 2013 called GeoFlow. While it is still in preview, I have to admit it is very cool. More than that, it is super simple to use. By now, you are familiar with my sample data set, which was created for Modern Apps Live Las Vegas. The app is called MyVote, and the data set is used to analyze polls and their related submissions.

Getting GeoFlow

You can download GeoFlow and view the requirements at Microsoft’s download center:

Building Your First Tour and Scene

On the Insert tab, click the Map button to open a blank Tour canvas like the following:


Once you have created a tour, it looks like the following:


In this view, use the New Tour button to start a new tour. Existing tours are shown in the dialog box.

In my case, the data comes from an underlying PowerPivot data source.  This makes the next design options very easy.  Before you leave this page, you should play with the map.  It is easy to zoom and move around.

About Tours and Scenes

The Tour is a container for Scenes.  This is similar in construction to PowerPoint.  As you add scenes, you can play them with transitions.  Keep in mind that GeoFlow visualizations are meant to be “in motion”.  Scenes focus on data for a period of time.  Tours put these scenes together to tell a story.  Time to build a story about MyVote.
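To make the Tour and Scene relationship concrete, here is a small Python model. It is purely illustrative: the class and field names are my own assumptions, not GeoFlow’s API or file format. A tour is an ordered list of scenes, each scene holds one or more layers and a duration, and playing the tour walks the scenes in order, much like a PowerPoint slide show.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Layer:
    """One data layer on the map (hypothetical model)."""
    geography: str
    chart_type: str = "column"


@dataclass
class Scene:
    """A scene shows its layers for a fixed duration, in seconds."""
    name: str
    duration_s: float = 10.0
    layers: List[Layer] = field(default_factory=list)


@dataclass
class Tour:
    """A tour is an ordered container of scenes."""
    name: str
    scenes: List[Scene] = field(default_factory=list)

    def play_order(self) -> List[str]:
        # Scenes play one after another, in the order they were added.
        return [scene.name for scene in self.scenes]

    def running_time(self) -> float:
        # Ignores transition time between scenes to keep the sketch simple.
        return sum(scene.duration_s for scene in self.scenes)


tour = Tour("MyVote")
tour.scenes.append(Scene("By category", 10.0, [Layer("Postal Code")]))
tour.scenes.append(Scene("By age range", 15.0, [Layer("Postal Code")]))
print(tour.play_order())    # ['By category', 'By age range']
print(tour.running_time())  # 25.0
```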

Your First Layer

The first step is to pick the geography. I had zip codes stored in a Postal Code column. When I selected it, I was given the option to choose the type of geography data; a number of geography data types are supported. After you make this selection, the data starts to show up on the map wherever data is present.

Once you have selected your geography, pick your Chart Type, Height, Category, and Time. As you add each element, look at the map to see how your data is being visualized. Even during design, you can see the 3D visualization at work.

The following screens illustrate what happens in each of these phases:

1. Pick CategoryName for Category.

2. Pick PollSubmissionID as Height. Set the aggregation to Count.

3. Pick PollSubmissionDate as Time.
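Conceptually, counting PollSubmissionID for Height, splitting by CategoryName, and using PollSubmissionDate for Time means counting submissions for each location, category, and date. The Python sketch below reproduces that calculation on a few made-up rows so you can see what the column heights represent. It approximates the idea and does not show how GeoFlow computes it internally; "PostalCode" stands in for the Postal Code column.

```python
from collections import Counter
from datetime import date

# Made-up sample rows shaped like the MyVote submissions.
submissions = [
    {"PollSubmissionID": 1, "PostalCode": "68102", "CategoryName": "Sports",
     "PollSubmissionDate": date(2013, 3, 1)},
    {"PollSubmissionID": 2, "PostalCode": "68102", "CategoryName": "Sports",
     "PollSubmissionDate": date(2013, 3, 1)},
    {"PollSubmissionID": 3, "PostalCode": "68102", "CategoryName": "Music",
     "PollSubmissionDate": date(2013, 3, 2)},
    {"PollSubmissionID": 4, "PostalCode": "55401", "CategoryName": "Sports",
     "PollSubmissionDate": date(2013, 3, 2)},
]


def heights_by_date(rows):
    """Count PollSubmissionID per (date, postal code, category).

    Each key is one column on the map at one point in time; the count
    is that column's height.
    """
    return Counter(
        (r["PollSubmissionDate"], r["PostalCode"], r["CategoryName"])
        for r in rows
        if r["PollSubmissionID"] is not None  # skip blank IDs
    )


heights = heights_by_date(submissions)
# Stepping through the dates in order approximates playing the scene.
for day in sorted({key[0] for key in heights}):
    frame = {key[1:]: count for key, count in heights.items() if key[0] == day}
    print(day, frame)
```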




Once you have done this, play the visualization to see the data move over time.

You have created your first scene.

Creating a Tour

You already have a tour, but it only has one scene and does not look like a tour.  Any good tour has more than one stop.  Let’s add a scene by clicking the Add Scene button.

Clicking this creates a copy of your selected scene, which is the one we created.  This matters as you add more scenes and may want to use specific existing scenes as a starting point.

To see something change, we are going to change the Category from CategoryName to age range. This gives us a different view of the data.
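Why the copy matters: if Add Scene shared the original scene’s settings instead of copying them, changing the category in the new scene would also change the first one. The Python sketch below models that copy with `copy.deepcopy`. The `Scene` and `Layer` classes, the `add_scene` helper, and the `AgeRange` field name are hypothetical stand-ins, not GeoFlow internals.

```python
import copy
from dataclasses import dataclass, field
from typing import List


@dataclass
class Layer:
    geography: str
    height: str
    category: str
    time: str


@dataclass
class Scene:
    layers: List[Layer] = field(default_factory=list)


def add_scene(scenes: List[Scene], selected: int) -> Scene:
    """Append a deep copy of the selected scene, like the Add Scene button."""
    new_scene = copy.deepcopy(scenes[selected])
    scenes.append(new_scene)
    return new_scene


first = Scene([Layer("PostalCode", "PollSubmissionID", "CategoryName",
                     "PollSubmissionDate")])
scenes = [first]
second = add_scene(scenes, selected=0)
second.layers[0].category = "AgeRange"  # hypothetical column name

print(first.layers[0].category)   # CategoryName
print(second.layers[0].category)  # AgeRange
```

A shallow `copy.copy` would not be enough here: both scenes would share the same `Layer` objects, so the category change would leak back into the first scene.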

A quick note on the task panel: after creating your new scene, you are on the Settings tab. The task panel has three tabs: the first is the Layers view, the second is Layer Details, and the third is Settings.

Now you can play the tour to see how it looks by clicking Play Tour in the ribbon.  Click the “back” arrow on the lower left of the screen to return to the design surface.

Some Scene Modifications I Did

Here are some ideas to add more “flair” to your visualizations.

  • Try different chart types.  Heat maps and bubbles are supported as well.
  • Layer visualizations. You can add a layer to your scene, which allows you to combine different chart types and geographies to enhance your visualization.
  • Change scene timing.  By clicking the gear on the view, you can change the pace of the scene.
  • Zoom in to sections of the map.  You can focus on a specific area and add a new scene to your tour to focus on a different area.  Then you can see more details for the area in question and move to the next one as a part of the tour.
  • Change your theme.  Try out the different themes.  This will affect map types and chart coloring.
  • Label your layers.  When you label the layer it changes the legend header.
  • Resize the legend.  You can resize your legend to better fit the data.
  • Change the effect.  On the settings tab for the scene you can change the Effect.  I liked flyovers, but many options are available.  You can also change the speed and duration here.  These changes affect the tour play.

Here is my Tour:


More Information

Here are some links for more information.  Enjoy this tool. I know I did.

GeoFlow Download Details Page: Requirements, Web Download

GeoFlow Facebook Page:

GeoFlow Community Forum: Post Questions to be answered by team and community

Announcement at the PASS Business Analytics Conference