Do You Need a Relational Data Warehouse?

On September 11, 2018September 3, 2018 By SteveIn Azure, Azure Data Warehouse, Consumption Based Archtitecture, Hadoop, HDInsight, Hive, SQL Saturday

Are you looking to do a major update to your data warehouse or looking to modernize? Many technologies have come about that are changing the landscape of what data warehouses are made of. In this Azure Every Day session, I’d like to talk about 3 new technologies in Azure and HDInsight that break the rules.

1. HDInsight (with Spark in particular)

You don’t have to use SSIS to get ETL into your big data storage. HDInsight with Spark can give us ETL, as well as bring to light a lot of machine learning and other technologies, so we no longer have a dependency on SSIS. Earlier this year, Microsoft released Azure Databricks which improves of what is in HDInsight with expanded Spark capabilities.

2. Azure Data Lake

This is a great place to store your data, and no, it won’t cost you a lot of money and it’s not hard to work with. Azure Data Lake gives you the ability to store all your data, regardless of where it comes from or how it looks, in that space – including real time data.

Have you ever considered putting real time data into your data warehouse? Streaming data into your data warehouse would break everything we consider conventional. So, don’t put it in a data warehouse, move it into a Hadoop structure in Azure Data Lake, and avoid the issue in a structured data warehouse of: I got it in there, how do I get it out?

Check out Azure Data Week coming in October 2018

3. Interactive Hive and Spark SQL (also part of HDInsight)

The everyday report writers and users are not going to learn how to do map reduce or all those other technologies that make us the cool big data/data science people. These users just need to build a report.

These technologies bring us to a place where we can write SQL against those data structures and not care where the data came from, how it got there, or the type of data format it’s in. We apply the schema after the fact and it means the same data that we put in our data lake, can be used in multiple scenarios.

How Azure Storage Kick Starts Your Big Data Projects

On August 29, 2018August 24, 2018 By SteveIn Azure, Azure Every Day, Hadoop, HDInsight, Power BI

So, your boss says, ‘Let’s do big data!’ And you think: ‘I don’t even know what that means or what I have to do. Do I need big data? Do I need a bunch of servers?’ These are the questions we hear all the time.

A simple intro into the big data world is to take advantage of Azure blob storage. This is a great starting point since when you put data in Azure blob storage, it’s formatted very similar to how you put it into any other Hadoop storage scenario.

Once you get the data there and it’s in your file-based storage, here comes the big question: What am I going to do with this data? As you’re in the introduction phase, start simple. Power BI will connect to your blob storage, and it will connect the same as you would connect to an HDInsight or Hortonworks cluster.

Using Power BI, with limited learning curve and expense, you’ll be able to take advantage of the data you stored there in your beginning big data scenario. It also gives you the chance to start adding on to that, such as looking at an HDInsight or Hortonworks cluster to use and reference the storage, without moving your data around.

Check out Azure Data Week coming October 2018 – www.AzureDataWeek.com.

If your business wants to do big data, this is a great start on the path. If you’re doing other data warehouse work in Azure, you can use Azure blob storage as your staging area. It’s a simple way to begin without worrying about what you need from a server standpoint and infrastructure is eliminated from the equation.

VS Live 2016 – Las Vegas Follow Up

On April 11, 2016May 25, 2016 By SteveIn Hadoop, HDInsight, Hive, JSON, Microsoft Azure, Power BI, Visual Studio Live

LVSPK18

I spoke at Visual Studio Live in Vegas on two topics. While the presentations have been uploaded to the site and were available for attendees, the code was not distributed yet as an oversight on my part. In this post, I will do a quick summary of the sessions and make sample code available. I will be writing more on these topics throughout the year and will tag VS Live in the notices.

JSON - VSLive

JSON & SQL Server Finally Together

JSON is now part of SQL Server 2016. SQL Server now includes functions to generate and shred JSON. Here are the basics:

OPENJSON: Used to convert JSON data into a tabular format
FOR JSON: Used to create JSON from tabular data
ISJSON: Determines if the data in question is JSON
JSON_VALUE: Returns scalar values from JSON data
JSON_QUERY: Returns JSON formatted arrays or objects
JSON_MODIFY: Used to modify JSON data and properties

With all of this support, JSON is not a native data type in SQL Server like XML.

You can download supporting files and code here.

Hive - VS Live

Using Hive and Hive ODBC with HDInsight and Power BI

During this session I went through the process of setting up HDInsight and loading data into the cluster. Once created, Hive tables were created and queries created that were used with Power BI to analyze the results.

You can find the details here.

Minnesota BI User Group – Powering Up HDInsight with Power BI (December 2015)

On February 4, 2016February 4, 2016 By SteveIn Excel, Hadoop, HDInsight, Hive, Power BI, Power Query, Speaking, Syndicated, Windows Azure

On Wednesday, December 16, I presented on this topic at the Minnesota BI User Group. This session is based on five blog posts that I created in August 2015.

You can find the presentation here: Powering Up HDInsight with Power BI (pdf).

The details can be found in the blog posts noted below:

Setting Up and HDInsight Cluster (No Scripts Required)

Exploring the Microsoft Azure HDInsight Query Console (No Scripting Required)

Uploading Files to an HDInsight Cluster (No Scripting Required)

Using Power BI with HDInsight Part 1: Power Query and Files

Using Power BI with HDInsight Part 2: Power BI Desktop and Hive

My goals for this series

1. Document using Power BI with HDInsight

2. Prove that you can set up a HDInsight Cluster with no scripts

Other References from the Session

Azure: http://azure.microsoft.com/en-us /

Cloud Berry: http://www.cloudberrylab.com/free-microsoft-azure-explorer.aspx

Thanks for attending my session.

Powering Up HDInsight with Power BI–

On September 15, 2015July 15, 2019 By SteveIn Excel, Hadoop, HDInsight, Hive, Power BI, Power Query, Speaking, SQL Azure, Syndicated, Windows Azure

On Tuesday, September 15, I presented on this topic for Pragmatic Works. You can find that session here. This session is based on five blog posts that I created in August 2015.

Powering Up HDInsight with Power BI (pdf)https://dataonwheels.wordpress.com/wp-content/uploads/2016/02/powering-up-hdinsight-with-power-bi.pdffoundin the

Setting Up and HDInsight Cluster (No Scripts Required)

Exploring the Microsoft Azure HDInsight Query Console (No Scripting Required)

Uploading Files to an HDInsight Cluster (No Scripting Required)

Using Power BI with HDInsight Part 1: Power Query and Files

Using Power BI with HDInsight Part 2: Power BI Desktop and Hive

My goals for this series

1. Document using Power BI with HDInsight

2. Prove that you can set up a HDInsight Cluster with no scripts

Other References from the Session

Azure: http://azure.microsoft.com/en-us /

Cloud Berry: http://www.cloudberrylab.com/free-microsoft-azure-explorer.aspx

Wrap Up from the Session

A few questions were asked during the session and I wanted to handle some of them here.

Why did you not use Azure Resource Manager to deploy storage?

I did this as simple as possible and did not need to use the Resource Manager for my demos. However, if you need to rebuild the cluster quickly, the Azure Resource Manager would be a good option. Find out more here: https://azure.microsoft.com/en-us/documentation/articles/hdinsight-provision-clusters/. This site will also walk through scripts and other options for setting up HDInsight clusters.

Why didn’t the table structure show up in the Power Query demo?

The Power Query demo worked with the data from a file approach. This approach is more “raw”. The files did not have column headers, so no headers were created in the table. However, when working with the Power BI Desktop demo, I used Hive. The table was defined in Hive and were easily seen. This is another case for using Hive or something similar to define the schema for ease of use.

What are the differences between Hadoop, Hortonworks, and HDInsight?

Starting from the top, Hadoop is the Adobe open source specification. All of the products listed above are based on Hadoop.

Hortonworks and Cloudera are examples of Hadoop distributions. These companies have worked with the various versions of open source technologies around Hadoop and created a supported distribution as a result.

Finally, HDInsight is Microsoft’s cloud-based Hadoop implementation. They continue to add functionality including Spark, R, Giraph, and Solr. You can expect Microsoft to continue to grow the capabilities of HDInsight as part of their cloud-based analytics solutions.

Thanks for attending my session.

Data on Wheels – Kristyna Ferris & Steve Hughes

Category: HDInsight

Do You Need a Relational Data Warehouse?

1. HDInsight (with Spark in particular)

2. Azure Data Lake

3. Interactive Hive and Spark SQL (also part of HDInsight)

How Azure Storage Kick Starts Your Big Data Projects

VS Live 2016 – Las Vegas Follow Up

JSON & SQL Server Finally Together

Using Hive and Hive ODBC with HDInsight and Power BI

Minnesota BI User Group – Powering Up HDInsight with Power BI (December 2015)

My goals for this series

Other References from the Session

Powering Up HDInsight with Power BI–

My goals for this series

Other References from the Session

Wrap Up from the Session

Why did you not use Azure Resource Manager to deploy storage?

Why didn’t the table structure show up in the Power Query demo?

What are the differences between Hadoop, Hortonworks, and HDInsight?

1. HDInsight (with Spark in particular)

2. Azure Data Lake

3. Interactive Hive and Spark SQL (also part of HDInsight)

Share this:

Share this:

JSON & SQL Server Finally Together

Using Hive and Hive ODBC with HDInsight and Power BI

Share this:

My goals for this series

Other References from the Session

Share this:

My goals for this series

Other References from the Session

Wrap Up from the Session

Why did you not use Azure Resource Manager to deploy storage?

Why didn’t the table structure show up in the Power Query demo?

What are the differences between Hadoop, Hortonworks, and HDInsight?

Share this: