As some of you know, I am really excited about the data appliances Microsoft and HP have released this year. I really believe that they make it even easier to get MSBI projects up and running while minimizing the complexity of building out servers. Read more about my thoughts on this in an article I wrote for Magenic: Microsoft Data Appliances Lower the Entry Bar for MSBI Adoption.
Here is a summary of the sessions I attended on my fourth day at the Summit. (Technically, this is the second day of the Summit itself.) I went to four sessions, all of which were in the BI tracks.
Top 10 Things You Didn’t Know about SSIS by Matt Masson, Microsoft (@mattmasson)
Be sure to check out the Tech Ed Video with CTP 3 enhancements.
Two New Concepts in SQL Server 2012 Integration Services: Projects and Parameters. SSIS projects now compile down to a single .ispac file (pronounced “ice pack”). This allows you to deploy the packages in a project together. Parameters have been added to better support configurations, and they are exposed by the package execution UIs.
The SSIS Catalog. The catalog is stored in a SQL Server user database called “SSISDB”. This provides a central place for configuration, including the new parameters and connection managers. Security is also much improved, as encryption is now managed at the server level. Logging and report data are also recorded in the catalog.
Top 10 List:
1. Change Data Capture and 2. ODBC Support through Attunity partnership.
CDC for Oracle databases provides integration similar to SQL Server CDC. The CDC components in SSIS work with both SQL Server and Oracle. Three new components have been added to SSIS: the CDC Control Task, the CDC Source (reads changes), and the CDC Splitter (routes inserts, updates, and deletes).
ODBC components have also been added to support many of the non-Microsoft data sources including bulk load capabilities with the ODBC destination object.
3. Connection Manager
We now have connection managers which are shared at the project level. A connection manager is marked as offline when its data source is unavailable, or you can take it offline manually. The capability to share an in-memory cache across packages at the project level has also been added.
4. Flat File Source Improvements
Varying number of columns and embedded qualifiers are now supported.
5. Package Format Changes
The XML package format has been updated to use attributes, and layout information can now be ignored, giving better support for merging. Lineage IDs have been replaced with refids to support cleaner merges. Merging packages is still not recommended, but it is better supported.
6. Visual Studio Designer Changes
Parameters are in a Visual Studio design tab which allow you to set up defaults for various environments.
7. Script Components
Script components now use compiled .NET IL for better performance. Script component debugging arrives in 2012, with the ability to debug script components and see the values of the data in the pipeline. All script components support debugging.
8. Troubleshooting and Logging
Because all packages use the SSIS catalog, all error and performance logging, including task execution timing and row counts, is recorded in SSISDB. Much of this information is available in built-in reports.
9. Data Taps
Allows data in a pipeline to be dumped to a flat file. This allows for troubleshooting packages with errors in production. You can create data taps in SSMS using T-SQL.
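As a sketch, adding a data tap with T-SQL looks like the following. The folder, project, package, and data flow path names here are hypothetical; `catalog.add_data_tap` must be called after the execution is created but before it is started.

```sql
-- Sketch with hypothetical names; the catalog procedures are real in SQL Server 2012.
DECLARE @execution_id BIGINT;

-- Create (but do not yet start) an execution for the package.
EXEC SSISDB.catalog.create_execution
    @folder_name  = N'MyFolder',
    @project_name = N'MyProject',
    @package_name = N'LoadCustomers.dtsx',
    @execution_id = @execution_id OUTPUT;

-- Tap a data flow path and dump the rows that pass through it to a file.
EXEC SSISDB.catalog.add_data_tap
    @execution_id            = @execution_id,
    @task_package_path       = N'\Package\Data Flow Task',
    @dataflow_path_id_string = N'Paths[Source Output]',
    @data_filename           = N'CustomerTap.csv';

EXEC SSISDB.catalog.start_execution @execution_id;
```

The tapped rows land in the DataDumps folder under the SSIS installation directory on the server, so no package changes are needed to capture them.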
10. Server Management using PowerShell
The whole SSIS catalog can be managed or accessed through PowerShell or TSQL. This will allow you to automate or script out deployment and frequently used tasks.
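For example, a deployment script might set a project parameter value with T-SQL before running any packages. The folder, project, and parameter names below are hypothetical.

```sql
-- Sketch: set a project-level parameter value in the catalog (hypothetical names).
EXEC SSISDB.catalog.set_object_parameter_value
    @object_type     = 20,            -- 20 = project scope, 30 = package scope
    @folder_name     = N'MyFolder',
    @project_name    = N'MyProject',
    @parameter_name  = N'ServerName',
    @parameter_value = N'PRODSQL01';
```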
You can do a lot of these tasks right in SSMS as well using menu and shortcut functionality. You can query the SSISDB for performance counters while the package is executing. You can query SSISDB for package execution progress at the task and component levels.
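A sketch of such queries against the catalog views (the execution ID at the end is hypothetical):

```sql
-- Which packages are running right now? (status = 2 means running)
SELECT execution_id, package_name, status, start_time
FROM   SSISDB.catalog.executions
WHERE  status = 2;

-- Task-level timing and results for a given execution.
SELECT e.executable_name, s.execution_duration, s.execution_result
FROM   SSISDB.catalog.executable_statistics AS s
JOIN   SSISDB.catalog.executables AS e
       ON e.executable_id = s.executable_id
      AND e.execution_id  = s.execution_id
WHERE  s.execution_id = 12345;  -- hypothetical execution ID
```

Row counts per data flow path are similarly exposed through the catalog’s data statistics view.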
There is built-in reporting. You start from the Integration Services Dashboard, which has a number of stats on the front page. It includes several built-in reports with performance data and other relevant topics. Jamie Thomson has an SSIS Reporting Pack (http://sqlblog.com/blogs/jamie_thomson/archive/2011/09/04/ssis-reporting-pack-v0-2-now-available.aspx). Also check out Jamie’s post about the SSIS logging tables (http://sqlblog.com/blogs/jamie_thomson/archive/2011/07/16/ssis-logging-in-denali.aspx).
Errors related to connections are bubbled up, and drill-down reports to the error messages are available in the tool. Logs have a configurable retention window. Versions of projects are also maintained, and the number of versions retained is configurable.
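Both of those settings are catalog properties, so a sketch of adjusting them in T-SQL (the values shown are arbitrary):

```sql
-- Keep 60 days of operation logs in the catalog.
EXEC SSISDB.catalog.configure_catalog
    @property_name  = N'RETENTION_WINDOW',
    @property_value = 60;

-- Keep the last 10 versions of each deployed project.
EXEC SSISDB.catalog.configure_catalog
    @property_name  = N'MAX_PROJECT_VERSIONS',
    @property_value = 10;
```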
11. Project Barcelona Metadata Discovery
This project is coming soon, and demos are available online. It is not a metadata repository, and it is zero-touch: targets are crawled and added to an index server. Check out the project blog at http://blogs.msdn.com/b/project_barcelona_team_blog/.
Tier-1 BI in the World of Big Data – Denny Lee and Thomas Keiser, SQLCAT
There are two models to work with: Dimensional (UDM) and Tabular (formerly called BISM). UDM is the current large-scale engine. Tabular puts all of the data in memory, so it is not well suited to very large cubes.
New themes for workload scale:
- Cheap storage at scale
- Massive query scale (both size and concurrent)
- Scaling ETL another order of magnitude
- Scaled and integrated reporting/BI
The NoSQL ecosystem (e.g. the Facebook, Yahoo, Twitter, Google, and Amazon data offerings) is being used frequently. Not only are people using these systems, they are building solutions on top of them. This is why Microsoft has announced Hadoop integration.
MapReduce systems (e.g. Hadoop) and RDBMS systems are actually complementary, not competing, in most cases. Hadoop is about dynamic schemas; getting the data stored somewhere is the important piece. Hadoop / NoSQL systems move compute to the data and follow BASE semantics (Basically Available, Soft state, Eventually consistent); each row may have a different schema.
Traditional RDBMS systems move data to compute. They answer the question quickly and are ACID compliant.
The balance between scale and performance is becoming more important: do you want a guaranteed response that may arrive slowly, or a fast response without guarantees?
The Hive connector is the first step in integration with the BI platform. Hive is a data warehouse framework for Hadoop. This is the starting point, not the end game, for adding structure to Big Data for consumption by other tools such as SSAS. The Hive ODBC driver will allow a cube to be loaded directly from Hadoop.
Yahoo – “Fast” Oracle Load – 24 TB cube. This is the largest known cube in the world. Yahoo moves data from Hadoop into Oracle and processes the cube from there.
Check out the new SSAS 2008 R2 Performance Guide from SQLCAT at http://sqlcat.com/sqlcat/b/whitepapers/archive/2011/10/10/analysis-services-2008-r2-performance-guide.aspx.
What’s New in StreamInsight? Isaac Kunen, Microsoft
Stream Data Processing – unbounded streams of data with queries lasting forever. This engine handles the data coming through. Temporal Semantics can be added to the StreamInsight data to make it more understandable.
Event processing varies greatly based on business function – latency versus aggregated data rate. StreamInsight is designed to handle high volumes of low latency data. Common use cases – Fraud Detection, Shipping, Sales, Clickstream Analysis, Security, Building Management. Common scenario is pattern analysis.
- StreamInsight 1.2 is available today on premise.
- StreamInsight on Azure currently private CTP, should open up later this year.
- Three event types: interval (begin and end), point (“one tick”), and edge (only the beginning is known, with the end defined later) events.
- CTI = Current Time Increment. A CTI asserts that all events up to that point in time have been processed. The period from one CTI to the next is the period reported. CTIs can be issued ahead of the current time and do not need to mark equal time periods. After a CTI is issued, the data before it is released from memory.
- You write declarative queries against StreamInsight in LINQ. It is the actual query language for SI, not just a wrapper.
- Select and Project Operators return events from a stream.
- Join Operator returns events that overlap streams.
- Snapshot Window is any time the event stream changes, start a new window. This window is completely data driven.
Resiliency through checkpointing has been added to StreamInsight. This allows you to restart from specific points. Performance Counters and Admin Logs have also been added to StreamInsight 1.2. Check out his blog for more information: http://blogs.msdn.com/b/isaac/archive/2011/07/25/introducing-checkpointing-in-streaminsight-1-2.aspx.
Building the Perfect BI Semantic Model for Power View – Carl Rabeler (SQLCAT), Kasper de Jonge – Microsoft SSAS
You can import a PowerPivot workbook into SQL Server Data Tools as a Tabular SSAS project. This will create a Tabular Model in SSAS.
Changes made to a Tabular model in SQL Server Data Tools are applied to the deployed workspace database in SSAS. You should use a small set of data when developing the model because the data must be loaded into memory. You can add data mining models to the tabular model by creating a connection to an SSAS Multidimensional data source. When you add a data source to the tabular model in the designer, the data is added to the tabular model (the workspace database). Columns can now be added to the model in any position.
There is an “effective user name” property in the connection string to handle the double-hop issue without Kerberos. Selecting “Set execution context to this account” sets this property in the Reporting Services data source (RSDS) connection in SharePoint. This functionality is already available in SQL Server 2008 R2. In SQL Server 2012, it can be seen in the Profiler trace in the Discover events.
To add images to the Tabular model, you can use an embedded image, stored image (e.g. in the database), or link. If you use a link, be sure to set the ImageURL property to “true” for this column so Power View knows that it is an image.
The tabular model has a wealth of metadata options that will be carried forward to Power View. For instance, if you set the Default Label and Default Image in the tabular model table, Power View will display these in many visualizations including Card View.
Be aware that in Power View, the refresh button in the tool refreshes data, whereas an IE refresh reloads the metadata.
Time intelligence functions in DAX allow you to create calculations such as Year To Date. However, if your tabular model uses surrogate keys to map to the Date table, you need to mark that table as a Date table and specify the date column to use. This is needed to make sure the time intelligence functions work correctly.
The RANKX(Table, Expression, [Value], [Order], [Ties]) function was illustrated to show how to rank items within a table, e.g. Ranking:=RANKX(ALL(Products), SUM(Margin)). Watch out for situations where you are using a complex key.
DirectQuery functionality will allow you to directly query SQL Server relational data. There is a list of limitations to using DirectQuery. Model with it carefully.
While working with Microsoft BI projects, I am often required to deliver the visualizations through SharePoint. On my current project, I have a SharePoint 2010 development environment on Windows 7 that runs SharePoint and all the SQL Server services. Last week I was preparing to demo our progress when the site became unresponsive. The demo site showed the wonderfully descriptive message: “An unexpected error has occurred.” So, I attempted to open the management console, but that page could not be found (404 error). I turned on expanded error messaging by modifying the web.config file and checked the log files. The errors pointed to an inability to connect to a database. Upon further investigation, I discovered that three of our databases, including the admin and content databases, were marked suspect.
At this point, I tried to restore my full SharePoint backup, which works in all situations except recovering the admin database. Based on the online resources, this is to be expected. After conferring with coworkers, I determined that the admin database was unrecoverable and the best bet was a reinstall of SharePoint. After working through some options, I was able to get the site back up and running without recreating the entire solution. The rest of this post walks through the recovery scenario and calls out the nuances of having BI components as part of the content.
First, the SharePoint farm must be backed up using a farm backup. I did this through the Central Administration console.
The next step involves uninstalling SharePoint. I did this by running Setup and choosing the Remove Option. (The Repair option did not work.) Once I had SharePoint uninstalled, I proceeded to reinstall it. After SharePoint was installed, I ran the Configuration wizard. This step recreated the Admin database. Once this was complete, I verified that the starter site and the Central Administration site were operational.
Recovering the Farm
You can recover the farm from either the Central Administration site or by using PowerShell. In this case, I used Central Administration as it is easier to select and understand the options.
I first browsed to the Backup and Restore menu page. From this page, I selected the “Restore from a backup” option under the Farm Backup and Restore header.
NOTE: In order to run the backup or restore farm functions, the Administration service must be running. If it is not running, start it in the Windows Services console.
There are three pages or steps to recovering the backup.
Step 1. Select the backup to restore.
Step 2. Select the components to restore.
For my purposes, I selected the Farm option which restores all of the options to the farm from the backup.
Step 3. Select Restore Options.
The first two selections are the most important. I selected the “Restore content and configuration” and “Same configuration” options. This uses the same application and site names for the full restore of your site. The “New configuration” option would have required me to rename all of the applications and services I needed to restore; there were a lot of them, so I chose not to.
When I restored at both the client and on my personal development environments I needed to remove the following services before running the restore:
- Application Registry Service
- Web Analytics Service Application
- User Profile Service Application
- Search Service Application
When I did not remove these, my restore failed due to duplicate names. You may run into a similar issue, so removing them before the restore is easiest. These applications and services should be removed from the Central Administration site. Once these services are removed, run the restore.
Bringing BI Back Online
At this point, my site would have been operational except that I had a Reporting Services webpart on the homepage of my site, which produced an error.
The issue is that the full restore did not restore the Reporting Services integration. The integration needed to be redone manually.
Once Reporting Services was reintegrated, I was able to open the site. However, neither the Reporting Services nor the PerformancePoint webparts were working.
These issues are primarily security related. To fix the Reporting Services issue, I opened the data source and reset the credentials. Once reestablished, the report webpart came back online.
To resolve the PerformancePoint issue, I first went to Central Administration to refresh the Secure Store Service key. I was able to use my passphrase to refresh the key. (NOTE: if necessary, you can create a new key.) Next, I went into the PerformancePoint Service Application settings and reset the unattended service account.
Once these steps were complete, the site was fully restored. While the full backup is extremely useful for site failures, it will not recover all of the BI components and as such feels incomplete. I hope this saves someone some time later.