TechFuse Minnesota Follow Up

 TechFuse Fall Edition

My Session: Using Azure SQL Database for Enterprise Needs

On 10/6/2015, I presented on Azure SQL Database at TechFuse Minnesota. Some of the highlights from the session included a comparison of the various SQL Server offerings and how Azure SQL Database fits into the overall picture. During the session we also discussed the importance of V12 and elastic databases (still in preview) for the enterprise environment. I also raised and discussed common cloud concerns, including data safety, data compliance, data privacy, and business continuity. Microsoft continues to make changes in this space, so expect more changes to come. However, even today you can be successful using Azure SQL Database for your enterprise needs.

The presentation can be found in PDF format here.

Microsoft has a wealth of documentation which can be found here:

Questions from the Session

How are upgrades handled in Azure SQL Database?

Most upgrades to Azure SQL Database are seamless to us. However, Microsoft has made changes over the years which required some intervention. One such change was the move from the Web and Business service tiers to Basic, Standard, and Premium. In that case Microsoft provided a lot of guidance around the process, and it was very easy to do in most cases. The most recent instance was the move to V12. Once again Microsoft provided the upgrade path in documentation. Both upgrades could be handled in the portal. However, V12 was a significant change, so groups needed to do more testing prior to the upgrade.

Microsoft’s Azure upgrades for Azure SQL Database over the past couple of years:

Can we use Point in Time Restore to recover a table or to recover to a new database?

Point in Time Restore allows us to recover databases to specific points in time. Each service tier has a different retention period: Basic, 7 days; Standard, 14 days; Premium, 35 days. Azure SQL Database always recovers the database to a new database. This means you can either “swap” the database once it is recovered or move the items you need to recover from the new database. One word of caution: you need to have the DTU capacity on the server to run both databases in order to support the restore. Be sure to plan the recovery process and clean up when you have completed your recovery.
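
If you want to check the tier and service objective of the databases on your server before planning a restore, a query like the one below is one option. This is a minimal sketch; it assumes a V12 server where the sys.database_service_objectives view is available, and it is run against the master database.

-- Assumes a V12 server; run against the master database of the logical server.
SELECT d.name, dso.edition, dso.service_objective
FROM sys.databases d
JOIN sys.database_service_objectives dso
    ON d.database_id = dso.database_id;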

Thanks again to everyone who could attend this year.

Until the next session …

Using Power BI with HDInsight Part 2: Power BI Desktop and Hive

With the rise of HDInsight and other Hadoop based tools, it is valuable to understand how Power BI can help you take advantage of those big data investments. If you need to set up a cluster to work with, check out my previous posts on Setting Up an HDInsight Cluster and Loading Data Into Your New HDInsight Cluster. These posts show how to do this with no scripting required. If you prefer to script, there are a number of resources with sample scripts on doing the same work.

In this article, I will focus on using Power BI Desktop to get data from the Hadoop file structure in HDInsight using a Hive query. I will also be using the restaurant data I loaded as noted in the previous posts. If you need to create a cluster and load data I encourage you to check the following blog posts:

These posts walk through the process of creating a cluster and loading up data.

Connecting to HDInsight Using the Hive ODBC Driver

Before you can connect using a Hive query, you need to download the Hive ODBC driver from Microsoft. You can find the driver here. Once you have the driver installed, the connection can be created.

Open Power BI Desktop and click Get Data on the splash screen. This will open the Get Data dialog. Scroll down until you see the ODBC option. (Do not select a Hadoop or HDInsight option. See my previous post on connecting using HDInsight.)


Click Connect to start the process.


Here is where the “fun” begins. You get no help creating a Hive connection string. It took some searching and trial and error to figure out what was needed to make this happen. Here are the properties you need:

  • Driver:  Driver={Microsoft Hive ODBC Driver}
  • Host: Host=yourcluster.azurehdinsight.net (your HDInsight server name)
  • Port: Port=443
  • Schema: Schema=default (default Hive database schema)
  • RowsFetchedPerBlock: RowsFetchedPerBlock=10000 (This is the default)
  • HiveServerType: HiveServerType=2 (This is the default)
  • AuthMech: AuthMech=6
    • This is the Authentication Mechanism which is Windows Azure HDInsight Service.
  • DefaultStringColumnLength: DefaultStringColumnLength=200 (Default is 32767, this should always be set lower)

Each property is separated by a semicolon. My completed connection string looked like this (Note: I added spaces to fit better in the post.):

Driver={Microsoft Hive ODBC Driver}; Host=yourcluster.azurehdinsight.net; Port=443; Schema=default; RowsFetchedPerBlock=10000; HiveServerType=2; AuthMech=6; DefaultStringColumnLength=200;

Enter the connection string into the dialog and you will then be prompted for credentials. Use the Database option, set the Username and Password, and then click Connect. In my case, I see three tables in the result set, including the sample table. We have connected to our HDInsight cluster using Hive.


Retrieving Data from HDInsight Using HiveQL

So, getting a list of tables is not really helpful. As you can see, this is the Power Query portion of the Power BI Desktop. Let’s add a HiveQL statement to return only our sales data.

In Applied Steps, click the gear next to Source. This will reopen the From ODBC dialog. Expand the SQL Statement portion and add a SELECT * FROM yourtable statement to get our desired result set, as sketched below. Click OK and check the results again. You should see the tablename.fieldname format for column headers. At this point, you can proceed with more data shaping and prepare the data for other analytics. Click Close and Load when you are done, and the data will be loaded into the Power Pivot designer in Power BI Desktop.
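
For illustration, here is what that statement might look like against the restaurant data. The table name restaurantsales and its columns match the external Hive table created in my earlier post on creating a Hive table; substitute your own names if they differ.

-- Return all of the sales data (substitute your own table name).
SELECT * FROM restaurantsales;

-- Or limit the result set to specific columns.
SELECT ticketnumber, ticketdate, appamount, entreeamount, nonalcoholamount, alcoholamount
FROM restaurantsales;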

I hope you enjoyed this series through HDInsight and Power BI. It was a great learning experience for me.

Using Power BI with HDInsight Part 1: Power Query and Files

With the rise of HDInsight and other Hadoop based tools, it is valuable to understand how Power BI can help you take advantage of those big data investments. If you need to set up a cluster to work with, check out my previous posts on Setting Up an HDInsight Cluster and Loading Data Into Your New HDInsight Cluster. These posts show how to do this with no scripting required. If you prefer to script, there are a number of resources with sample scripts on doing the same work.

In this article, I will focus on using Power Query to get data from the Hadoop file structure in HDInsight. I will be using Excel 2013 with the Power Query Add-In. I will also be using the restaurant data I loaded as noted in the three previous posts. If you need to create a cluster and load data I encourage you to check the following blog posts:

These posts walk through the process of creating a cluster and loading up data.

Connecting to HDInsight

First, open a new Excel workbook and click the Power Query tab. Once there, you can find the Azure HDInsight source in the From Other Sources dropdown. Select that option to open the following dialog:


You will need your storage account name in order to continue, and then the storage account key. Once you have added the key, you will see that the Navigator opens in Excel on the right.


It should show the name of your cluster and the default container name. Double-click the container name and it will open the Power Query window, which shows all the files available in the container. Even though the data is organized in folders, the view shows all the files. If you have a large number of files and you don’t want to scroll to find them, you can click the down arrow on the Folder Path column and use the text filter to find the folder you are looking for.


Now I have the files I want to use showing in Power Query. If you click a Binary link, it will open a copy of that file. However, this is not how we want to work with the data since we have multiple files. (If you did this, remove the steps back to the Filtered Rows step in the Applied Steps section.)


In order to work with them all together we need to Combine Binaries.


This now merges all the files into a single dataset. You can now do any data shaping you would like using standard Power Query methods, such as updating the column names and the data types. You can even add columns, such as a Total Amount column. Here is what my final query looks like, including the steps I applied. Be sure to give your query a meaningful name. (Note that Power Query recognized my Transaction Date column as a date datatype and changed it for me.)


Now click Close & Load and your data will be loaded into Excel. You can also change the query to load the data into a Power Pivot model to do additional work with it.

The next post will walk through using Power BI Desktop to load data using a Hive query.

Uploading Files to an HDInsight Cluster (No Scripting Required)

As I noted in my first post, I am not a fan of scripting. In that post we set up a cluster without using scripts. Now we are going to look at how to upload files without scripts. While this will work for our demo and learning purposes, I would encourage you to use scripting to handle production-level loads or even if you want to upload a lot of files. Just because I am not a fan does not mean scripting may not be the better overall tool. However, when I am trying to learn the functionality or work with the system using other tools (in this case Power BI), I find that methods such as these help me become productive sooner.

Prepping to Load Data Into Your New HDInsight Cluster

A key difference between standard Hadoop and HDInsight is file management. With HDInsight, you can load files into Azure Storage and they can be consumed by the HDInsight cluster. Keeping with the No Scripting Required mantra, we will be using a graphical interface to load files into Azure Storage. There are a number of options out there; you only need one of them installed. For our example, we will be using the freeware version of CloudBerry Explorer for Azure Blob Storage. Once you have your tool of choice installed, you are ready to get some files.

At this point, you need some files to load. I am using some data I created for another demo. My data is in 7 files of daily receipts for my restaurant for a week in March. Once you have the data, we can load that into the cluster.

Loading Data Into Your New HDInsight Cluster

As noted above, the next steps use CloudBerry Explorer to load our data. In this case, I just copied the folder with my files over to Azure Storage once I had connected the tool to Azure.


Once that is done, we will look at working with the data in Hadoop and with Hive.

Creating an External Hive Table and Querying It

You can create two types of tables using Hive – internal and external. An internal table loads the data into a Hive database. An external table applies a schema to the data without moving it. I will be creating an external table. I like this concept because it applies schema to the files that have been uploaded and allows other tools to interact with that data using HiveQL. When you drop an external table, the data remains because the table represents structure only.
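
To make the distinction concrete, here is a minimal sketch of the two forms of the CREATE statement. The table names, columns, and location below are placeholders, not the table used later in this post.

-- Internal (managed) table: Hive owns the data, and dropping the table removes it.
CREATE TABLE SalesInternal (id int, amount int);

-- External table: Hive applies schema to existing files, and dropping the table leaves the files in place.
CREATE EXTERNAL TABLE SalesExternal (id int, amount int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE LOCATION 'wasb:///sales-data';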

In order to help everyone through this (in particular me), the next sections walk through the steps I took to create my table and select data from it. (This is not a detailed look at Hive, but rather a focus on the process of making HDInsight data available using HiveQL.)

Understanding the Files

The first step was to document the structure of the data in the files. Here is the data that I had in each of the files in column order:

  • Ticket Number – int
  • Ticket Date – date
  • Hour of the Day – int
  • Seat Number – int
  • App Amount – int
  • Entrée Amount – int
  • Non Alcoholic Amount – int
  • Alcoholic Amount – int

My structure was fairly simplistic. Each file represented a day.

Creating the Table

Now that I had the structure, I needed to work out the table DDL, working from the Hive DDL reference. Here is the syntax that I came up with:

DROP TABLE RestaurantSales;
CREATE EXTERNAL TABLE RestaurantSales (ticketnumber int, ticketdate string, hourofday int, seat int, appamount int, entreeamount int, nonalcoholamount int, alcoholamount int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE LOCATION 'wasb:///restaurant-data-files';

The first statement drops the existing table if it exists. Unlike SQL Server, no error is thrown if the table does not already exist, so there is no need to check for existence.

The second statement creates the table. One quick note on the data types: in my first attempt the ticket date was declared as a date, but it was not recognized and my query returned NULLs in that column, so I changed it to string. As noted earlier, this is an external table, which means it is applying schema to existing files. The ROW FORMAT clause specifies that the fields are terminated by a comma (‘,’). The next part is significant because it is specific to HDInsight syntax. The location is prefixed by wasb, which tells HDInsight we are using Azure Blob Storage for the files. The three forward slashes mean we are using the default container for files. When I uploaded the data, I added a folder called restaurant-data-files which holds the files for HDInsight. The table will now apply the schema to all the files in that folder. If new files get uploaded, they become part of the query results as well.
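
If you do not want to rely on the default container, the location can also name the container and storage account explicitly. The container and account names below are placeholders; only the form of the URI is the point.

-- Default container shorthand, as used in the table above.
LOCATION 'wasb:///restaurant-data-files'

-- Fully qualified form that names the container and storage account explicitly.
LOCATION 'wasb://mycontainer@mystorageaccount.blob.core.windows.net/restaurant-data-files'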

You can run this in the Hive Editor in the HDInsight Console discussed in the previous blog post. You can track your jobs there and see when it completes.

Querying the Table

The final step is checking to see if it worked. I opened a new Hive Editor window and executed the following statement:

select * from RestaurantSales;

Voila! Once the job completed, I was able to click the session link and review the results.
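
If you want to go beyond a simple select, the same table supports aggregations. For example, a query like the following, which uses the columns defined above and mirrors the Total Amount idea from the Power Query post, summarizes sales by day.

-- Total sales per day across all of the uploaded files.
SELECT ticketdate,
       SUM(appamount + entreeamount + nonalcoholamount + alcoholamount) AS totalamount
FROM RestaurantSales
GROUP BY ticketdate;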

Hopefully you were equally successful creating your first Hive table and query with your data. The next two posts will talk about using Power BI to interact with HDInsight data.

Exploring the Microsoft Azure HDInsight Query Console (No Scripting Required)

As I noted in my previous post, I am not a fan of scripting. In that post we set up a cluster without using scripts. Before we get to the steps to load our own data, I wanted to take a moment and explore the HDInsight Query Console, or Dashboard (according to the link on the Azure Portal). This console is part of an HDInsight cluster; if you need to create a cluster, check out my previous post. In keeping with my desire not to script, this console provides some insight into my cluster as well as some tools to explore its contents.

I am going to highlight the areas of interest to me, so this is by no means exhaustive. I encourage you to explore more and comment on anything compelling that I may have missed. Also, with the rate of change in Azure, it is likely some of this will change sooner rather than later. Hopefully it will only get better.

Getting Started Gallery

I would guess that this is the section most likely to change, since it has “Coming Soon” buttons. Each of these scenarios uses a step-by-step approach to help you set up a job common to HDInsight or Hadoop. When you click a scenario button, you will get a list of steps to walk through to complete the solution. Here is the start page from the Azure Website Log Analysis scenario.


In this scenario, you load log files and eventually use Hive to organize and consume the data in Excel. This is a great way to learn about using HDInsight to solve specific problems. Currently the scenarios are grouped into solutions that use Azure data and solutions that use sample data. When this article was created, the following scenarios were available:

  • Microsoft Azure Website Log Analysis Solution (uses Azure Data)
  • Microsoft Azure Storage Analytics Solution (uses Azure Data)
  • Sensor Data Analysis (uses sample data)
  • Twitter Trend Analysis (uses sample data)
  • Website Log Analysis (uses sample data)
  • Mahout Movie Recommendation (uses sample data)

Hive Editor

The next section is an online Hive editor which lets you create Hive queries. It comes with a quick query for the hivesampletable, which is included in your cluster when it is created. Go ahead and submit this query. You will see the editor create the job and track its status.


Click View Details to see the job information and the results of the query. You can download the results and logs from the details page. You can also create additional queries. For instance, if you want to see the structure of the table, you can use the following Hive query:

SHOW CREATE TABLE hivesampletable;
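
If you want to run something beyond a metadata query, a simple aggregation works well as a test. The sketch below assumes the standard hivesampletable schema, which includes a devicemake column; adjust the column name if your sample table differs.

-- Count the sample rows by device make (assumes the standard hivesampletable schema).
SELECT devicemake, COUNT(*) AS devicecount
FROM hivesampletable
GROUP BY devicemake;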

You can find more information on the Hive query language in the Apache Hive language manual.

Job History

The Job History page shows all of the jobs you have run. You can also get to the Job Details from previously run jobs here like you did from the Hive Editor window.

File Browser

The File Browser page lets you explore the files and structure in your HDInsight cluster. You click the name to drill down into the contents of your cluster. When you first open the page, your cluster name is shown in the window. Here is the basic structure from a new cluster:

  • Cluster
    • Containers (e.g. $logs)
      • Directories (e.g. example, app-logs)
        • Files (these may be multiple directories deep)

Once you get to the file location, you can download the file.

Hadoop UI and Yarn UI Pages

The Hadoop UI page shows information about the Hadoop cluster you have created. It includes an overview and some specifics on datanodes and snapshots.

The Yarn UI gives some more information about jobs and cluster metrics related to your MapReduce jobs. If you are interested in learning more about YARN, check out the Apache Hadoop YARN site.

This console is a great way to become more familiar with Hadoop, HDInsight, and Hive. While it does not give you all of the capabilities, it is a good no-scripting starting point. The next post will discuss how to load a simple set of data into Hadoop for analysis beyond the samples.
