Tag Archives: Consumption Based Architecture

SQL Saturday #796 – Minnesota, 2018

sqlsat796_header

First, many thanks to the SQL Saturday and MNPass team for putting on another great event and letting me participate.

I co-presented with Joshuha Owen (@JoshSQL) in a precon on Modern Enterprise Data Warehousing on Azure. Thanks to those who attended and participated in the conversations around changing the way we implement data warehouse capabilities in Azure. Josh and I will be talking more about this in the future.

Now, those of you who attended my Saturday presentation on Consumption Based Architecture, I wanted to get you the slide deck and reference materials here. Thanks again for attending.

The slide deck from the session is here.

cba-featured-pic

I also have blog post around this topic: Consumption Based Architecture for Modern Data Analytics. Feel free to join the conversation there around this.

ERPs and the Consumption Based Architecture Conversation

One of the key topics that came up during the session was related to handling ERPs with minimal change. The key issue surrounding ERP solutions is with the data structure in those systems. Whether you work with SAP, JackHenry, or Dynamics, you have a situation where the data model is very complex and definitely not user friendly. In Consumption Based Architecture, we try to minimize data transformation and reshaping, but ERP solutions are by nature cryptic and complex. By definition, they are not consumable. So in the consumable space, we typically recommend using the vendor supplied solutions such as JHKnow, SAP BW and so on. These solutions provide a vendor managed interpretation of the data in the ERP for reporting and other solutions.

Security in this Architecture

The question was raised during the session around how to secure this. This does not have a simple answer. Each solution may have implemented security differently. For instance, an Oracle database may use user names and not have AD integration. This means that you need to determine how to secure your consumable space. For instance, if you pick Azure Active Directory, you would move data to AAD compliant structures in Azure such as Azure SQL Database, Azure SQL Datawarehouse, and Azure Databricks. This means you might need to use a tool like Goldengate with CDC to update a SQL DB which you can apply security to. This will allow you to centralize security for your consumable data. You will need to plan for security in whatever you do and create what you need to support it.

Thanks again everyone for joining us at SQL Saturday.

 

 

Advertisements

Consumption Based Architecture for Modern Data Analytics

Throughout many years of working with BI solutions and data warehouse solutions, we have strived to put all the data in one location so it could be easily consumed by reporting and analysis tools from SQL Server Reporting Services to Microsoft Excel. We have followed the dimensional modeling processes promoted by Ralph Kimball and others. These techniques were developed to turn relational data platforms into viable and well-performing reporting platforms. They worked.

Throughout the years, I have built many star schemas, enterprise data warehouses, and reporting databases based on these techniques. However, they were not without their flaws. In the early days, it would take years to create the enterprise data warehouse. When it was done, the business had already moved on. So, we started creating data marts which were departmentally focused. This allowed us to shorten the development lifecycle to more quickly meet the needs of the business. But the speed of business continued to outpace IT’s ability to deliver effective BI solutions when they were needed.

During this same time, the worlds most ubiquitous BI tool, Microsoft Excel, ran more and more businesses, both large and small. Why was Excel so popular? It allowed the user, notExcel-2013-Icon_thumb.png IT, to do the analysis and produce results when the business needed them. Self-service BI is not new, we just refuse to accept Excel as a “real” BI tool in the industry. Inevitably, no matter how good your BI or reporting tool is, users want to know, “How do I export this to Excel?” I found it humorous that Microsoft suffered the same problem with Power BI. The ability to extract the data is hugely important to users. Why? So they could do this on their own. They feel empowered with Excel.

In today’s world, we are also witnessing a shift to a more mobile, tech savvy group of users. As my teenage and young adult children begin to enter the workforce, I still see a disconnect with enterprise BI solutions in most cases. They want the data at their fingertips and easily consumable to solve the question they have now, not in three weeks. That brings us to the architecture I have been promoting for a few years. It’s not new, but I needed a way to talk about modern data and BI solutions that focused on one the most significant needs in the business – consumable data.

The Consumption Based Architecture is based on the following key concepts:

  1. Keep the data close to the source
  2. Data interfaces should be easy to use
  3. Modern, in-memory tools make this possible

Keep the data close to the source

In the normal enterprise data warehouse solution, we process data to clean it up, reshape it, and generally make it “better”. However, the reality is that the users don’t see it as sourceoftruthbetter if it does not match the source. The only “source of truth” that matters in reality is the one that is closes to the data entry as possible. If that data is wrong, it needs to be corrected there. Why? Because users will always verify that the data warehouse or any BI solution is correct by checking the system of record.

Besides the source of truth issue, any time data is transformed or shaped differently than the source, documentation and maintenance are required. In most cases, documentation is lacking in data warehouse solutions. It’s hard and changes often. Furthermore, the amount of maintenance required to support a traditional data warehouse solution is a problem. When the business needs to change a field or the source changes, how do you measure how long it will take to get that in the data warehouse? Days? Weeks? Months? Years? By the time we have the change in place, tested and ready for use, the business has moved on.

In Consumption Based Architecture, the goal is to move the data as needed and transform it as little as possible. Operational Data Stores which are replicas or copies of the data from the source systems are the best mechanisms to move the data to a reporting area. Only transform if absolutely necessary. In most cases, our transforms will involve data type cleanup or other in cases where fields have changed use over time maybe adding a column to clarify the data. Each time you move data or transform data, it should be to make the data more consumable.

The primary exception to this rule is dimensional data. Dimensional data has a special place in the architecture. While it is not necessary to transform this data, the overall solution is improved by combining similar dimensions and using natural keys for relationships. For example, if you have a customer record in CRM and shipping data, you can create a “golden record” or conformed dimension which contains both natural keys. This will allow you to cross reference data easily in both solutions. Master data solutions help with this but are not required to be successful in this architecture.

Data Interfaces Should Be Easy

Having done a lot of work in the application development field where interfaces are used to simplify programming, I think we should have a similar concept in working with data. We have dabbled in the area for some time. I know that I have used views with schemas (or users in Oracle) that were specifically designed to support SQL Server Analysis Services. We used views to support a consistent data set to the cubes for processing and allow changes in the back end data as needed. The views operated as an interface between the data warehouse and the cube.

In a similar fashion, these interfaces need to be identified and used throughout the Consumption Based Architecture. If you are referencing a relational data structure, views continue to make sense. They allow consumers to interact with the data in a known fashion. They are also logical constructs which can be deprecated with a timeline for users to move off of them to the newer versions.

Analytic models such as those found in Power BI, Qlik, and SQL Server Analysis Services create a similar tool for consumers. Usually those models can be used in the tool itself or even in other tools to produce reports and dashboards. Depending on how the model is viewsmodvirtcreated, they are often a table based view of data. For instance, Power BI can turn a folder of files into a table structure for easy consumption in Power BI reports. Power Pivot models created in Excel can be shared in SharePoint and in Power BI. These are just a few examples of using analytic models as interfaces.

The third option is data virtualization. This tends to be fairly expensive. The two that I am aware of are Cisco Data Virtualization and RedHat OpenShift virtualization. The concept of data virtualization is perfect for Consumption Based Architecture; however, I have not seen this used much due to cost implications. Microsoft may be changing the landscape of data visualization. During PASS Summit 2016, they announced expanding the use of Polybase to reference other data sources such as Oracle and Teradata. Currently, Polybase supports Hadoop data, but this change could allow it to become a virtualization tool that is cost effective (included with SQL Server) and simple to use.  I will be reviewing these tools in a later post as I get more information on them.

Modern, In-Memory Tools

The most significant technological improvement in the past few years to make this architecture really possible is in-memory data tools. While this revolution has occurred in more technologies than Microsoft, Microsoft tools are what I am most familiar with. In the Microsoft arena, this started with Power Pivot in Excel. The ability to mashup various data sources using in-memory models is awesome.

When Power Pivot came on the scene, I was doing a lot of work with SQL Server Analysis Services cubes. While these cubes provided a great analytic layer, they were very fragile in my experience. However, Power Pivot allowed us create better performing models more quickly. At that point, I knew a shift was coming. Two of the biggest pain points with cube design were tackled – speed of development and ease of use. We could solve problems quicker and easier with Power Pivot.

speed-and-ease

Since the release of Power Pivot, Microsoft has also improved their overall in-memory solution set by adding SQL Server Analysis Services Tabular Models and OLTP in-memory and columnstore functionality in SQL Server. These improvements continue to make it easier than ever to build out consumable models in memory. You can now use columnstore in SQL Server without an analytics model, simplifying your architecture without performance penalties. With the release of SQL Server 2016, Microsoft has created a data engine capable of in-memory OLTP to improve transactional loads while also supporting columnstore indexes for reporting loads in the same database. While your mileage may vary on implementation, it is easy to see that Consumption Based Architectures are best able to take advantage of these advances and making data more easily accessible and consumable to our business users.

What’s Next

Over the next few weeks, I will be digging in on some of the topics that support Consumption Based Architecture. The goal is to help you begin to take advantage of this architecture in your business and build out a flexible, easily consumed data and analytics platform.

cba-featured-pic

Upcoming Topics:

  • Interface Layers
  • Data Dictionaries
  • Dimensional or Master Data
  • Moving and Transforming Data
  • Modeling in MSBI
  • Reporting with Consumption Based Architecture