T-SQL Window Functions on LessThanDot and at SQL Saturday 149

26 09 2012

LessThanDot Sit LogoI recently completed a series of blog posts on www.lessthandot.com on T-SQL Window functions.  The enhancements to SQL Server 2012 in this area are phenomenal.  They solve a myriad of issues including calculating running totals with SQL.  Check it out if you want to learn more and get some simple examples related to the functions and structure related to the window functions.  Here is the series outline and links to each section.

T-SQL Window Functions:

I do a presentation related to T-SQL functions for SQL Saturdays and am presenting it at the PASS Summit this year.  Maybe I will see you there.

I recently presented this at SQL Saturday #149 in Minnesota.  Here is the presentation and the demo code. Thanks for attending.

 

Finally, if you use Oracle, you will find this series helpful as well.  Most of the syntax is supported in Oracle as well.  Look for an Oracle tip with the Oracle samples for your use soon.





Oracle Tips for MSBI Devs #4: Executing an Oracle Stored Proc with IN Parameters in SSIS Execute SQL Task

1 05 2012

The first tip I published discussed how to execute an Oracle procedure with no parameters.  In this tip, I will discuss a technique that works with IN parameters in an Oracle stored procedure.

After many unsuccessful attempts at executing a stored procedure with parameters, the following pattern was developed by a one of my peers working in a blended environment, Brian Hanley (T | B).  With his permission, I have documented the solution here for your use.

The Solution:

The solution involves using variables and creating the SQL script to be executed in an Script task.

image

Here is the syntax for the procedure used in the examples:

CREATE OR REPLACE PROCEDURE
SCOTT.spDelete1 (DEPTNUMBER int) IS
BEGIN
DELETE FROM DEPT WHERE DEPTNO=DEPTNUMBER;
END spDelete1;

Create variables

Create variables to hold the name of the procedure, any parameters, and the finished script.  In my example, I only have one parameter in the procedure, so I only use three variables.  If it fits your needs, you can also separate the user/schema name into a separate variable.

image

The variable used for the statement has been set up to use string formatting with C#.

Prepare the Statement Variable

Use the Script task to build the statement variable (SPStatement).  Start by adding the variables to the script task.  Be sure to add the statement variable to the ReadWriteVariables collection.

image

The following image contains the script syntax to use to set up the variable.  As noted above, the C# String.Format function is used to update the statement variable.

public void Main()
{
    // TODO: Add your code here
    Dts.Variables["SPStatement"].Value = String.Format(Dts.Variables["SPStatement"].Value.ToString()
        ,Dts.Variables["StoredProcName"].Value.ToString()
        ,Dts.Variables["SPVar1"].Value.ToString()                
        );

    String msg = "SPStatement: " + Dts.Variables["SPStatement"].Value.ToString();
    Boolean refire = true;

    Dts.Events.FireInformation(0, "SPStatement", msg, String.Empty, 0, ref refire);

    Dts.TaskResult = (int)ScriptResults.Success;
}

Setting up the Execute SQL Task

In the Execute SQL task, you will set the SQLSourceType property to Variable and set the SourceVariable to the name of this statement variable.  In the case of our example, this is the SPStatement variable.

image

Versions:

This tip has been confirmed to work in both SQL Server 2008 R2 Integration Services and SQL Server 2012 Integration Services.  The Oracle version tested with is Oracle 11g.





Oracle Tips for MSBI Devs #3: Choosing Drivers

24 04 2012

When working with Oracle, drivers are truly a pain to get working correctly.  I will discuss my preferred choice and why for the following tools – SSIS, SSAS, and SSRS.

SSIS Drivers

Without much question, you should use the Attunity tools for working with Oracle data in the Data Flow task.  In SSIS 2008, the SSIS Connector is free and can be found here:  http://www.microsoft.com/download/en/search.aspx?q=oracle%20connector.  It includes the connection manager, source component and destination component.  Without a doubt this is the only way to work with Oracle data components in the Data Flow task.  (NOTE: I cannot find the SSIS 2012 equivalent at the moment.  However, Matt Massan’s blog post after PASS Summit 2011 notes more work is being done with Attunity.)  UPDATE: I wrote this prior to a blog post from Matt Massan on support for SSIS 2012 with v2.0 of the Microsoft Connector.  Check out Matt’s update on this: http://blogs.msdn.com/b/mattm/archive/2012/04/04/microsoft-connectors-v2-0-for-oracle-and-teradata-now-available.aspx.

However, this connector does you no good when working with the Execute SQL task.  In SSIS 2008, I use the OLE DB provider from Oracle to create the connection used with Execute SQL Task.  In my work with procedures in my first tip, I used the OLE DB provider with 2012 as well and it worked fine.

SSAS Drivers

When working with SSAS 2008 and, until I know differently, SSAS 2012, I would recommend using the Oracle OLE DB driver from Oracle.  This driver is not the fastest I have seen (third party drivers are marginally faster and the .NET driver is faster as well), but it has provided consistent results for the right price.  Third party drivers will improve the throughput, but not substantially.  The Oracle provided .NET driver is faster as well, but has an unchangeable active query timeout of one hour.  If you have any processing times that exceed this, it will unceremoniously drop the connection.  For these reasons, I have stuck with the OLE DB provider from Oracle which is not necessarily the fastest, but it has been the least painful to work with.

SSRS Drivers

It is with SSRS I have seen mixed results.  Primarily because of the better performance in the Oracle .NET driver.  If you can guarantee that your reports will return their data in under an hour, this seems to be the best option.  However, if you want to manage to a single driver set across all tools, you may find that the management of the OLE DB driver as the only driver makes sense in your organization.

Test, Test, Test

I have given you my experience using the drivers above.  However, you may find value in purchasing a third party driver or you may find a different experience when you implement in your environment.  Be sure to test and understand the implications in maintenance and system cost when choosing different drivers across your solutions.





Great User Group Meeting Tuesday, Now on to Madison and SQL Saturday #118

19 04 2012

PASSMNLogoMinnesota SQL Server User Group Meeting Review (4/17/12)

First of all, I have to thank the awesome speakers who presented at our user group meeting on Tuesday night, April 17.  Mark Souza ( T | B ) for a great question and answer session.  He was open about SQL Server and where it is heading.  Some items that interested me is that SQL Azure was running the SQL Server 2012 platform in production by December 2011.  Microsoft is committed to releasing more changed through the SQL Azure platform at a faster clip.  This will allow them to continually improve the quality of releases for the boxed version as well.  (Speaking of which, the boxed version is not going away any time soon.)

Mark also spoke about leaving data at its source and using tools to bring the data together for reporting and analysis.  He touted the Hadoop integration efforts as key part of this strategy.  If you have followed my blog for a bit, you know that I do a lot of work with Oracle and MSBI.  As a result, I had to ask when a good version of Oracle drivers would be available.  While he did not have a good answer on this at the time, he did mention that OData is a data access solution for me to look at. Look for a future blog post as I explore this protocol to ease some of my Oracle.

After Mark finished answering the variety of questions asked by the crowd we handed the microphone to Itzik Ben-Gan ( T | B ).  As usual, he took an entire hour to help us understand that we did not know all of the possible ways to use the APPLY operator.  However, I also learned about the DBCC OPTIMIZER_WHATIF function which allows you to change the system settings the Optimizer uses to create a plan.  For instance, DBCC OPTIMIZER_WHATIF(1,16) will tell the optimizer to use 16 processors in its plan.

SQL Saturday #118 – Madison, WI (4/21/12)sqlsat118_web

Finally, I will be out at SQL Saturday #118 in Madison this weekend.  I have two presentations there – A Window Into Your Data: Using SQL Window Functions and Performance Monitoring and Tuning in SSAS.  Maybe I will see you there.





Oracle Tips for MSBI Devs #2: Keyword compare between Oracle and SQL Server

17 04 2012

While attending a recent Windows Azure Quick Start, the presenter, Mike Benkovich (@mbenko) happened to show a table on his site with this keyword comparison between SQL Server and Oracle so I wanted to pass it a long.

The Solution:

A list of comparable functions for SQL Server or Oracle developers.  Check out Mike’s site:  http://www.benkotips.com/pages/TSQLvsOracleKeywords.aspx for this helpful list.  While not exhaustive, it a one-stop shop for many of the most commonly used functions in SQL Server and their equivalent in Oracle.

The Situation:

You are an experienced SQL Server developer who now needs to pull data from Oracle or work with Oracle SQL to load Analysis Services or in Integration Services.  Some things are the same, but you want to find the equivalent of PATINDEX in Oracle.





Oracle Tips for MSBI Devs #1: Executing an Oracle Stored Proc with No Parameters in SSIS Execute SQL Task

4 03 2012

Originally posted: 3-4-2012

Welcome to a new blog series of tips related to Oracle with a focus on helping those of us who are delivering Microsoft Business Intelligence solutions using Oracle in some way.

The Solution:

In an Execute SQL Task the proper syntax for executing an Oracle stored procedure with no parameters is:

{CALL <schema name>.<stored procedure name>}

image

This works with the OLEDB driver from Oracle. (Do not use the Microsoft Oracle OLEDB driver as it is being deprecated.)

image

The Situation:

The following list highlights the most common errors when trying to get an Oracle procedure to execute.

  • EXEC or EXECUTE – as SQL Server pros we often use EXEC to make the procedure call. When working with Oracle, you need to use CALL
  • Forget the schema name – when using Oracle, you should specify the schema name when calling the procedure. In some cases, you may pick up the schema from the user, but for a consistent pattern and result, you should explicitly specify the schema name.
  • No curly braces – this is the one I did not know and a coworker tracked down some examples of this usage. Enclosing the call in curly braces appears to let the driver know that it is a procedure call and handle it appropriately.
  • Adding a semi-colon – most work with Oracle statements results in a semi-colon to conclude the statement. In this syntax, you will get an error when you conclude your statement with a semi-colon.

Versions:

This tip has been confirmed to work in both SQL Server 2008 R2 Integration Services and SQL Server 2012 Integration Services. The Oracle version tested with is Oracle 11g.

Note on Parameters:

This syntax has not worked consistently with parameters. I will post a followup tip for handling parameters at a later time.





SQL PASS Summit–Day 4 (BI Sessions)

16 10 2011

Here is the summary of the session I attended while at my 4th day at the Summit. (Technically, this is the second day of the Summit.)  I went to four sessions all of which were in the BI tracts.

Top 10 Things You Didn’t Know about SSIS by Matt Masson, Microsoft (@mattmasson)

Be sure to check out the Tech Ed Video with CTP 3 enhancements.

Two New Concepts in SQL Server 2012 Integration Services:  Projects and Parameters.  SSIS projects now compile down to a single ispac file (“ice packs”).  This will allow you to deploy packages in a project.  Parameters have been added to better support configurations and they are exposed by the package execution UIs.

The SSIS Catalog.  A SQL Server User database called “SSISDB” is where the catalog is stored.  This provides a central place for configuration including new parameters and connection managers.  Security is also much improved as encryption is now managed at the server level.  Logs and reporting will also be recorded in the catalog.

Top 10 List:

1. Change Data Capture and 2. ODBC Support through Attunity partnership.

CDC for Oracle databases will result in better integration similar to SQL Server CDC.  CDC components in SSIS will work with  both SQL Server and Oracle.  Three new tasks have been added to SSIS – CDC Control Task , CDC Source (reads changes) , CDC Splitter (handles insert/update/delete).

ODBC components have also been added to support many of the non-Microsoft data sources including bulk load capabilities with the ODBC destination object.

3. Connection Manager

We now have connection managers which are shared at the project level.  Connection managers get marked as offline if the data source is not available or you want to bring it offline.  The capability to share in-memory cache across packages at the project level has been added.

4. Flat File Source Improvements

Varying number of columns and embedded qualifiers are now supported.

5. Package Format Changes

Updated the XML format using attributes.  Layout information can now be ignored.  Better support for merging.  Lineage IDs have been replaced with refid to support better merge. Still not recommeded, but better supported.

6. Visual Studio Designer Changes

Parameters are in a Visual Studio design tab which allow you to set up defaults for various environments.

7. Script Components

Uses compiled IL in .NET for better performance.  Script component debugging is in 2012 with the ability debug script components and see the values of the data in the pipe.  All script components support debugging.

8. Troubleshooting and Logging

Because all packages use the SSIS catalog, all error and performance logging including task execution timing and row counts are recorded in the SSISDB.  Much of this information will be available in  built in reports.

9. Data Taps

Allows data in a pipeline to be dumped to a flat file.  This allows for troubleshooting packages with errors in production.  You can create data taps in SSMS using T-SQL.

10.  Server Management using PowerShell

The whole SSIS catalog can be managed or accessed through PowerShell or TSQL.  This will allow you to automate or script out deployment and frequently used tasks.

You can do a lot of these tasks right in SSMS as well using menu and shortcut functionality.  You can query the SSISDB for performance counters while the package is executing.  You can query SSISDB for package execution progress at the task and component levels.

There is built in reporting.  You start from the Integration Services Dashboard with a bunch of stats on the front page.  It includes a number of built in reports with performance data and other relevant topics.  Jamie Thompson has a CodePlex report pack (http://sqlblog.com/blogs/jamie_thomson/archive/2011/09/04/ssis-reporting-pack-v0-2-now-available.aspx).  Check out post from Jamie on the SSIS blog about tables as well(http://sqlblog.com/blogs/jamie_thomson/archive/2011/07/16/ssis-logging-in-denali.aspx).

They bubble up errors related to connections.  There are drill down reports available to the error messages in the tool. Logs have a configurable retention value. Versions of projects are also maintained and the number of versions is configurable.

11.  Project Barcelona Metadata Discovery

This is a project that is coming soon.  Demos available online.  This is not a metadata repository.  This is zero touch.  Targets get crawled and added to an Index Server.  Check out the project blog at http://blogs.msdn.com/b/project_barcelona_team_blog/.

Tier-1 BI in the World of Big Data – Denny Lee and Thomas Keiser, SQLCAT

Two models to work with Dimensional (UDM) and Tabular (formerly  BISIM).  UDM is the current large scale engine. Tabular puts all of the data in the memory, so not very suitable large cubes.

New themes for workload scale:

  • Cheap storage at scale
  • Massive query scale (both size and concurrent)
  • Scaling ETL another order of magnitude
  • Scaled and integrated reporting/BI

NoSQL ecosystem (e.g. Facebook, Yahoo, Twitter, Google, Amazon data offerings) are being used frequently.  Not only are people using these systems, they are building solutions on top of them.  This is why Microsoft has announced the HADOOP integration.

MapReduce systems (e.g. HADOOOP)  and RDBMS systems are actually complimentary, not competing in most cases.  HADOOP is about dynamic schemas.  Getting the data somewhere is the important piece.  Hadoop / NoSQL: Move compute to the Data. BASE basically available, Soft State Eventually Consistent. Each row may have differnent schema.

Tradintional RDBMS move data to compute.  They answer the question quickly and are ACID compliant.

The balance between scale and performance is getting more important.  What do you want? Guaranteed Response, but get it slow.

Hive Connector is the first Step in integraion with the BI Platform.  Hive is a data warehouse framework for Hadoop.  This is the starting point, not the end game for adding structure to Big Data for consumption by other tools such as SSAS.  HiveODBC Driver will allow the cube to be loaded directly from Hadoop.

Yahoo – “Fast” Oracle Load – 24 TB cube. This is the largest known cube in the world.  Yahoo moves data from Hadoop into Oracle and processes the cube from there.

Check out the new SSAS 2008 R2 Performance Guide from SQLCAT at http://sqlcat.com/sqlcat/b/whitepapers/archive/2011/10/10/analysis-services-2008-r2-performance-guide.aspx.

What’s New in StreamInsight?  Isaac Kunen, Microsoft

Stream Data Processing – unbounded streams of data with queries lasting forever.  This engine handles the data coming through.  Temporal Semantics can be added to the StreamInsight data to make it more understandable.

Event processing varies greatly based on business function – latency versus aggregated data rate.  StreamInsight is designed to handle high volumes of low latency data. Common use cases – Fraud Detection, Shipping, Sales, Clickstream Analysis, Security, Building Management.  Common scenario is pattern analysis.

StreamInsight Releases:

  • StreamInsight 1.2 is available today on premise.
  • StreamInsight on Azure currently private CTP, should open up later this year.

Temporal Semantics

  • 3 event types: intervals (begin end), points (“one tick”), edge (only the beginning with an end to be defined later) events.
  • CTI = Current Time Increment.  Up to this point in time, all events are processed.  From one CTI to the next CTI is the period reported.  They can have to increment into the future, but do not need to be equal time periods.  After the CTI is issued, the data is gone from memory.
  • You write declarative queries against StreamInsight in LINQ.  It is the actual query language for SI, not just a wrapper.
  • Select and Project Operators return events from a stream.
  • Join Operator returns events that overlap streams.
  • Snapshot Window is any time the event stream changes, start a new window.  This window is completely data driven.

Resiliency through checkpointing has been added to StreamInsight.  This allows you to restart from specific points.  Performance Counters and Admin Logs have also been added to StreamInsight 1.2.  Check out his blog for more information: http://blogs.msdn.com/b/isaac/archive/2011/07/25/introducing-checkpointing-in-streaminsight-1-2.aspx.

Building the Perfect BI Semantic Model for Power View – Carl Rabeler (SQLCAT), Kasper de Jonge – Microsoft SSAS

You can import a PowerPivot workbook into SQL Server Data Tools as a Tabular SSAS project.  This will create a Tabular Model in SSAS.

Changes in SQL Server Data Tools with Tabular models change the deployed Tabular SSAS model.  You should use a small set of data when developing the model because the data needs to be loaded in memory. You can add data mining models to the tabular model by creating a connection to a SSAS Muultidimensional data source.  When you add a data source to the tabular model in design, it will add the data to the tabular model (the workspace database).  Columns can now be added into the model in any position.

There is an “effective user name” property in the user string to handle the doublehop issues without Kerberos.  Set execution context to this account sets this property in the Reporting Services Data Servivces (RSDS) connection in SharePoint.  This functionality is currently available in SQL Server 2008 R2. In SQL Server 2012, this can be seen in the profiler trace in the discoverer events.

To add images to the Tabular model, you can use an embedded image, stored image (e.g. in the database), or link.  If you use a link, be sure to set the ImageURL property to “true” for this column so Power View knows that it is an image.

The tabular model has a wealth of metadata options that will be carried forward to Power View.  For instance, if you set the Default Label and Default Image in the tabular model table, Power View will display these in many visualizations including Card View.

Be aware that in Power View, the refresh on the tool refreshes data wheras an IE refresh will reload the metadata.

Time intelligence functions in DAX will allow you to create functions as Year To Date.  However, if your tabular model uses surrogate keys to map to the Date table you need to specify the table as a Date table and specify the date column to use.  This is needed to make sure the Time Intelligence functions work correctly.

The RANKX(Table, Expression, [Value], [Order], [Time]) function was illustrated to show how to rank items within a table. Ranking:=RANKX(all(Products), Sum(Margin)).  Watch out for situations where you are using a complex key.

DirectQuery functionality will allow you to directly query SQL Server relational data.  There is a list of limitations to using DirectQuery.  Model with it carefully.








Follow

Get every new post delivered to your Inbox.

Join 538 other followers

%d bloggers like this: