Here is a summary of the final day of sessions I attended at PASS.
Vertipaq vs OLAP – Change Your Data Modeling Approach by Marco Russo
This session discussed the role of each data model type in your Business Intelligence platform.
The Many-to-Many Revolution 2.0 is being released and will cover both BISM Multidimensional and BISM Tabular. It is available now from http://www.SQLBI.com.
The relational model for the multidimensional cube is a star schema. It provides the best performance and functionality.
|OLAP (Multidimensional)|Vertipaq (Tabular)|
|---|---|
|Dimensional modeling|Relational modeling|
|Complex Relationships|Basic Relationships (1:N)|
OLAP and MDX are much more complex than Vertipaq and DAX, so the barrier to entry and the learning curve are much lower with Vertipaq and DAX.
The dimensional model is designed to be faster and to store aggregations in the relational world. For solutions with Vertipaq, is the datamart still required? Can Vertipaq replace the dimensionally modeled datamart?
Marco went through a number of scenarios that show the difference in performance between DAX, MDX, and SQL for aggregating and analyzing data. At this point, it is too early to say that OLAP is old and will die. We have two tools with different purposes and areas of strength. For instance, Vertipaq can handle distinct count operations much more easily and quickly than OLAP. Marco kept noting that Vertipaq and DAX are “fast enough”, which seems to mean that expensive operations are expensive in both models, Vertipaq and OLAP.
Key takeaway: Adapt the technology to the data model, not the data model to the technology.
PDW Architecture and Internals – Brian Walker
The only place you can put your own software on PDW is the Landing Zone.
Flow of data in PDW:
Source – Landing Zone Files – Data Loader – Compute Nodes
You can get up to 654 TB of backup space on the Backup Node. Yes, that is 654 Terabytes!
The bread and butter of PDW is the Compute Nodes and Storage Nodes. Each compute node runs a SQL Server 2008 EE instance. SQL is the primary interface to the compute nodes. TempDB lives on the Compute Node as well because it serves instance-specific functions.
In the storage node, SQL Server file groups are created by the appliance to support a compute node. This includes filegroups for distributed data, replicated data, and logs.
The Admin console is available through the web browser with tabs for Alerts, Sessions, Queries, Loads, Backup/Restores, Appliance State, Locks and Performance Monitor.
There are now add-ins for managing PDW with System Center.
PDW supports a hub and spoke architecture. You can connect SQL Server and Analysis Services to PDW so you can offload data to different consumers. You can treat the data warehouse on PDW as the single source of truth that can support analytics and data marts and related functionality in a spoke.
Integration with PDW:
- SQL Server – Remote table copy from MPP to SMP
- PDW ADO.NET drivers for SSAS, SSIS, .NET, SSRS
Two data layout approaches support Shared Nothing querying, which is the ability of the appliance to break an originating query into discrete query subsets:
- Replicated – full copy on each discrete PDW node (good for dimensions)
- Distributed – uniformly distributed across all nodes (good for fact tables). Uses hash keys to distribute the data across the storage nodes.
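To make the two layouts concrete, here is a minimal Python sketch of how rows might be placed. It is an illustration only, not how PDW is implemented: the node count, the table and column names, and the use of MD5 as the hash are all assumptions made for the example.

```python
import hashlib

NODES = 4  # hypothetical number of compute nodes


def node_for(key, nodes=NODES):
    """Map a distribution-key value to a node using a stable hash (MD5 is an assumption)."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % nodes


def distribute(rows, key_column, nodes=NODES):
    """Distributed layout: each row lives on exactly one node, chosen by hashing the key."""
    placement = {n: [] for n in range(nodes)}
    for row in rows:
        placement[node_for(row[key_column], nodes)].append(row)
    return placement


def replicate(rows, nodes=NODES):
    """Replicated layout: every node holds a full copy of the table."""
    return {n: list(rows) for n in range(nodes)}


# Hypothetical sample data: a small dimension and a larger fact table.
dim_product = [{"product_id": i, "name": f"Product {i}"} for i in range(3)]
fact_sales = [
    {"sale_id": i, "product_id": i % 3, "amount": 10 * i} for i in range(1000)
]

replicated = replicate(dim_product)
distributed = distribute(fact_sales, "sale_id")

# Each node joins its own slice of the fact table to its local copy of the
# dimension, so no node has to ask another node for data.
local_totals = {}
for n in range(NODES):
    names = {p["product_id"]: p["name"] for p in replicated[n]}
    for sale in distributed[n]:
        name = names[sale["product_id"]]
        local_totals[name] = local_totals.get(name, 0) + sale["amount"]
```

The final loop is the point of the exercise: because the dimension is present on every node, each node can answer its part of the query using only local data, and the partial results are then combined.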
Data Warehousing: From SMP to MPP using SQL Server Parallel Data Warehouse – Dandy Weyn (Sr. Tech Project Mgr) and Brian Mitchell (Sr Premier Field Engineer)
Simplified Data Warehouse Offerings:
- Enterprise Edition (software only)
- Fast Track Data Warehouse (reference architecture) – includes software, configuration guidelines, and hardware
- BDW Appliance (integrated architecture)
- Parallel Data Warehouse (DW appliance)
SQL Server Appliances:
- HP Enterprise Data Warehouse Appliance (PDW)
- Dell Parallel Data Warehouse Appliance (PDW)
- HP Business Data Warehouse Appliance
- HP Business Application Appliance (inc SharePoint)
DWLoader or SQL Server Integration Services can be used to load data into the PDW. DWLoader will usually be the faster method because it has been optimized for PDW.
This is a Shared Nothing architecture, unlike Oracle RAC, which is a shared everything architecture.
Choose your distribution key carefully. A good key ensures that the table is evenly distributed across the nodes.
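A rough way to see why the key matters is to hash two candidate columns and compare how many rows each node ends up with. The Python sketch below does that under stated assumptions: the eight-node count, the `orders` sample data, and the SHA-1 hash are all hypothetical and stand in for whatever the appliance really does.

```python
import hashlib
from collections import Counter


def node_for(key, nodes):
    """Assign a key value to a node with a stable hash (SHA-1 is an assumption)."""
    digest = hashlib.sha1(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % nodes


def rows_per_node(rows, key_column, nodes=8):
    """Count how many rows land on each node for a candidate distribution key."""
    counts = Counter({n: 0 for n in range(nodes)})
    for row in rows:
        counts[node_for(row[key_column], nodes)] += 1
    return counts


def skew(counts):
    """Ratio of the busiest node's row count to the average; 1.0 is perfectly even."""
    average = sum(counts.values()) / len(counts)
    return max(counts.values()) / average


# Hypothetical table: order_id is unique, region has only two values.
orders = [
    {"order_id": i, "region": "EU" if i % 10 else "US"} for i in range(10000)
]

skew_by_order_id = skew(rows_per_node(orders, "order_id"))
skew_by_region = skew(rows_per_node(orders, "region"))
```

With a unique, high-cardinality column such as `order_id`, the skew stays close to 1. With `region`, which has only two values, at most two of the eight nodes receive any rows, and the busiest node holds more than seven times the average, so most of the appliance sits idle during a scan.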
Largest SQL Azure Projects in the World & How Do They Scale – Kevin Cox, Michael Thomassy (SQLCAT)
- Largest Sharded Database is 20 TB
- Largest number of databases for 1 app is 500
- Most users on 1 app is 3 million
- 2k block Read, 8 node end-point – 34,000 ops/sec
SQL Server is the core DB Engine for SQL Azure.
Flavorous – Ticket Sales Vendor
This is a truly elastic solution. They only keep databases up for the duration of the ticket sale. (NOTE: Check out Chaos Monkey.) All of the applications needed to be built with SQL Azure capabilities in mind. They were able to handle 3 million tickets in 6 minutes, including 150,000 tickets in 10 seconds.
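To put those ticket numbers in perspective, the short Python calculation below converts them to per-second rates. It uses only the figures quoted in the session; the function and variable names are my own.

```python
def rate_per_second(count, seconds):
    """Average number of items handled per second over the given window."""
    return count / seconds


# 3 million tickets in 6 minutes
average_rate = rate_per_second(3_000_000, 6 * 60)

# 150,000 tickets in 10 seconds
peak_rate = rate_per_second(150_000, 10)

# How much higher the 10-second burst is than the 6-minute average
burst_factor = peak_rate / average_rate
```

That works out to roughly 8,333 tickets per second on average and 15,000 per second during the burst, so the burst runs at 1.8 times the average rate.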
Bing Game tracking and scoring system. 398 SQL Azure databases supporting 30 million unique users per month, 200,000 concurrent users with 99.975% uptime.
Why SQL Azure?
- Faster than Azure Table Storage – for entities with large number of attributes
- Very low learning curve – prototyped and written in less than 4 days with 1 developer
- Testability – mostly compatible with SQL Server and easy to prepopulate with millions of records.
Sepia Labs – Glassboard.com
Glassboard is a private social network for mobile phones. Sepia Labs is now in the process of closing their on-premises data center and moving everything to the cloud. (Side note from them: “Windows 7 is by far the easiest mobile platform to build on.”) They picked Azure Table Storage rather than SQL Azure, except for some analytical data.
Why Table Storage?
- Less expensive than SQL Azure
- Speed approximately the same for small entities (very few columns; i.e. < 10)
- High size limit – 100 TB total between Table, Blob store, Queue per storage account.
MYOB – Accounting Software ISV
They service small businesses with accounting solutions. They chose the cloud because of the types of offerings they have. This allowed them to provide massive scalability and high availability. They scale by a sharding technique built by the developers. They are considering Federations in SQL Azure for the future.
Federations in SQL Azure will allow you to do less development to support sharding within SQL Azure. It is designed to help customers take care of the following:
- Unlimited Scalability – distribute workload equally across all nodes by scaling out rather than scaling up. Scale applications horizontally to many nodes and allow applications to dynamically deliver data.
- Rich Elasticity – expand and contract without downtime.
- Simplified Tenancy Model – whether you have multiple tenants per database, one tenant per database, or multiple databases per tenant.
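For a sense of the kind of plumbing that application-level sharding involves, and that Federations is meant to reduce, here is a minimal Python sketch of a range-based shard map. It is not MYOB's code and not the Federations API: the `ShardMap` class, the database names, and the choice of an integer tenant ID as the sharding key are all assumptions for illustration.

```python
import bisect


class ShardMap:
    """Route tenant IDs to databases by key range (all names are hypothetical)."""

    def __init__(self, first_database):
        # boundaries[i] is the lowest tenant ID served by databases[i].
        self.boundaries = [0]
        self.databases = [first_database]

    def database_for(self, tenant_id):
        """Return the database that holds the given tenant's data."""
        if tenant_id < 0:
            raise ValueError("tenant_id must be non-negative")
        index = bisect.bisect_right(self.boundaries, tenant_id) - 1
        return self.databases[index]

    def split(self, at_tenant_id, new_database):
        """Move the upper part of one range (from at_tenant_id up) to a new database."""
        if at_tenant_id <= 0 or at_tenant_id in self.boundaries:
            raise ValueError("split point must be a new, positive boundary")
        index = bisect.bisect_right(self.boundaries, at_tenant_id)
        self.boundaries.insert(index, at_tenant_id)
        self.databases.insert(index, new_database)


shards = ShardMap("accounts_0")
shards.split(1000, "accounts_1")  # tenants 1000 and above move out
shards.split(500, "accounts_2")   # tenants 500 to 999 move out

routes = {t: shards.database_for(t) for t in (42, 700, 1500)}
```

A real implementation also has to copy the data when a range is split and keep every application server's copy of the map in sync, which is where most of the development effort goes.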
Thus concludes my week at PASS Summit 2011.