SQL Saturday #796 – Minnesota, 2018


First, many thanks to the SQL Saturday and MNPass team for putting on another great event and letting me participate.

I co-presented with Joshuha Owen (@JoshSQL) in a precon on Modern Enterprise Data Warehousing on Azure. Thanks to those who attended and participated in the conversations around changing the way we implement data warehouse capabilities in Azure. Josh and I will be talking more about this in the future.

For those of you who attended my Saturday presentation on Consumption Based Architecture, I wanted to get you the slide deck and reference materials here. Thanks again for attending.

The slide deck from the session is here.


I also have a blog post on this topic: Consumption Based Architecture for Modern Data Analytics. Feel free to join the conversation there.

ERPs and the Consumption Based Architecture Conversation

One of the key topics that came up during the session was how to handle ERPs with minimal change. The key issue surrounding ERP solutions is the data structure in those systems. Whether you work with SAP, Jack Henry, or Dynamics, the data model is highly complex and definitely not user friendly. In Consumption Based Architecture, we try to minimize data transformation and reshaping, but ERP solutions are by nature cryptic and complex. By definition, they are not consumable. So in the consumable space, we typically recommend using vendor-supplied solutions such as JHKnow, SAP BW, and so on. These solutions provide a vendor-managed interpretation of the data in the ERP for reporting and other solutions.

Security in this Architecture

A question was raised during the session about how to secure this architecture. There is no simple answer: each solution may implement security differently. For instance, an Oracle database may use database user names and have no AD integration. This means you need to decide how to secure your consumable space. If you pick Azure Active Directory, you would move data into AAD-compliant structures in Azure such as Azure SQL Database, Azure SQL Data Warehouse, and Azure Databricks. That might mean using a tool like Oracle GoldenGate with change data capture (CDC) to keep a SQL database updated, which you can then apply security to. This allows you to centralize security for your consumable data. Whatever you do, plan for security and build what you need to support it.

Thanks again everyone for joining us at SQL Saturday.



Do You Need a Relational Data Warehouse?

Are you looking to do a major update to your data warehouse or looking to modernize? Many new technologies are changing what data warehouses are made of. In this Azure Every Day session, I’d like to talk about three new technologies in Azure and HDInsight that break the rules.

1. HDInsight (with Spark in particular)

You don’t have to use SSIS to get ETL into your big data storage. HDInsight with Spark can give us ETL, as well as bring machine learning and other technologies into play, so we no longer have a dependency on SSIS. Earlier this year, Microsoft released Azure Databricks, which improves on what is in HDInsight with expanded Spark capabilities.
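To make the ETL point concrete, here is a minimal sketch of the extract, transform, and load steps that Spark would run at cluster scale. This is plain Python, not Spark code (Spark itself needs a cluster), and the column names and sample rows are hypothetical; in Spark the same steps map to `spark.read`, DataFrame transformations, and a write to storage.

```python
import csv
import io
import json

# Hypothetical raw extract: order rows as they might land from a source system.
raw_csv = """order_id,amount,region
1001,25.50,MN
1002,not_a_number,WI
1003,40.00,MN
"""

def extract(text):
    """Extract: parse the raw CSV into dicts (types still loose)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: cast the typed columns, dropping rows that fail the cast."""
    clean = []
    for row in rows:
        try:
            clean.append({"order_id": int(row["order_id"]),
                          "amount": float(row["amount"]),
                          "region": row["region"]})
        except ValueError:
            continue  # bad row; at scale this might go to a quarantine path
    return clean

def load(rows):
    """Load: emit JSON lines, a shape a downstream store could ingest."""
    return "\n".join(json.dumps(r) for r in rows)

cleansed = transform(extract(raw_csv))
print(load(cleansed))
```

The row with `not_a_number` is filtered out during the transform step, which is exactly the kind of cleansing logic you would otherwise have built in an SSIS data flow.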

2. Azure Data Lake

This is a great place to store your data, and no, it won’t cost you a lot of money and it’s not hard to work with. Azure Data Lake gives you the ability to store all your data in that space, regardless of where it comes from or how it looks – including real-time data.

Have you ever considered putting real-time data into your data warehouse? Streaming data into your data warehouse would break everything we consider conventional. So don’t put it in a data warehouse: move it into a Hadoop structure in Azure Data Lake, and avoid the classic structured data warehouse problem of "I got it in there, now how do I get it out?"
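The landing pattern above can be sketched in a few lines. This uses the local filesystem as a stand-in for Azure Data Lake, and the `year=/month=/day=` path convention, source name, and event fields are all hypothetical; the point is that each event is stored raw and untouched, partitioned by arrival date, with no warehouse schema in the way.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path

def land_event(root, source, event):
    """Append a raw event, as-is, to a date-partitioned landing path
    (the year=/month=/day= layout that many lake tools expect)."""
    now = datetime.now(timezone.utc)
    part = (Path(root) / source / f"year={now.year}"
            / f"month={now.month:02d}" / f"day={now.day:02d}")
    part.mkdir(parents=True, exist_ok=True)
    path = part / "events.jsonl"
    with path.open("a") as f:
        f.write(json.dumps(event) + "\n")
    return path

# Land one hypothetical streaming event into a throwaway lake root.
root = tempfile.mkdtemp()
landed = land_event(root, "sensor-feed", {"device": "a1", "temp_f": 68.2})
print(landed)
```

Because nothing is reshaped on the way in, the question of "how do I get it out?" is deferred to read time, which is exactly where the Hive and Spark SQL tools in the next section come in.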

Check out Azure Data Week coming in October 2018

3. Interactive Hive and Spark SQL (also part of HDInsight)

The everyday report writers and users are not going to learn MapReduce or all those other technologies that make us the cool big data/data science people. These users just need to build a report.

These technologies bring us to a place where we can write SQL against those data structures without caring where the data came from, how it got there, or what format it’s in. We apply the schema after the fact, which means the same data we put in our data lake can be used in multiple scenarios.
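Here is a small illustration of that schema-on-read idea. Interactive Hive and Spark SQL apply the schema at query time over raw files; this stdlib Python stand-in does the same thing on a hand-rolled scale, with hypothetical field names: the raw records vary in shape, and a schema of types and defaults is projected onto them only when they are read.

```python
import json

# Raw JSON lines as they might sit in the lake: stored as-is, shapes vary.
raw_lines = [
    '{"user": "ann", "clicks": 3, "source": "web"}',
    '{"user": "bob", "clicks": "7"}',
    '{"user": "cat", "source": "mobile"}',
]

# The schema lives with the reader, not the files: field -> (type, default).
schema = {"user": (str, ""), "clicks": (int, 0), "source": (str, "unknown")}

def read_with_schema(lines, schema):
    """Project each raw record onto the schema at read time."""
    out = []
    for line in lines:
        rec = json.loads(line)
        out.append({field: cast(rec.get(field, default))
                    for field, (cast, default) in schema.items()})
    return out

rows = read_with_schema(raw_lines, schema)
print(rows)
```

A different consumer could read the very same files with a different schema, which is what lets one copy of the data in the lake serve multiple scenarios.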