On Tuesday, September 15, I presented on this topic for Pragmatic Works. You can find that session here. This session is based on five blog posts that I created in August 2015.
My goals for this series
1. Document using Power BI with HDInsight
2. Prove that you can set up a HDInsight Cluster with no scripts
Other References from the Session
Wrap Up from the Session
A few questions were asked during the session and I wanted to handle some of them here.
Why did you not use Azure Resource Manager to deploy storage?
I did this as simple as possible and did not need to use the Resource Manager for my demos. However, if you need to rebuild the cluster quickly, the Azure Resource Manager would be a good option. Find out more here: https://azure.microsoft.com/en-us/documentation/articles/hdinsight-provision-clusters/. This site will also walk through scripts and other options for setting up HDInsight clusters.
Why didn’t the table structure show up in the Power Query demo?
The Power Query demo worked with the data from a file approach. This approach is more “raw”. The files did not have column headers, so no headers were created in the table. However, when working with the Power BI Desktop demo, I used Hive. The table was defined in Hive and were easily seen. This is another case for using Hive or something similar to define the schema for ease of use.
What are the differences between Hadoop, Hortonworks, and HDInsight?
Hortonworks and Cloudera are examples of Hadoop distributions. These companies have worked with the various versions of open source technologies around Hadoop and created a supported distribution as a result.
Finally, HDInsight is Microsoft’s cloud-based Hadoop implementation. They continue to add functionality including Spark, R, Giraph, and Solr. You can expect Microsoft to continue to grow the capabilities of HDInsight as part of their cloud-based analytics solutions.
Thanks for attending my session.