T-SQL Window Functions – Part 4: Analytic Functions

This is a reprint with some revisions of a series I originally published on LessThanDot. You can find the links to the original blogs on my Series page.

TSQL-WIndow-Functions_thumb1_thumb_tIn the final installment of my series on SQL window functions, we will explore using analytic functions. Analytic functions were introduced in SQL Server 2012 with the expansion of the OVER clause capabilities. Analytic functions fall in to two primary categories: values at a position and percentiles. Four of the functions, LAG, LEAD, FIRST_VALUE and LAST_VALUE find a row in the partition and returns the desired value from that row. CUME_DIST and PERCENT_RANK break the partition into percentiles and return a rank value for each row. PERCENTILE_CONT and PERCENTILE_DISC a value at the requested percentile in the function for each row. All of the functions and examples in this blog will only work with SQL Server 2012.
Once again, the following CTE will be used as the query in all examples throughout the post:

with CTEOrders as
(select cast(1 as int) as OrderID, cast(‘3/1/2012’ as date) as OrderDate, cast(10.00 as money) as OrderAmt, ‘Joe’ as CustomerName
union select 2, ‘3/1/2012’, 11.00, ‘Sam’
union select 3, ‘3/2/2012’, 10.00, ‘Beth’
union select 4, ‘3/2/2012’, 15.00, ‘Joe’
union select 5, ‘3/2/2012’, 17.00, ‘Sam’
union select 6, ‘3/3/2012’, 12.00, ‘Joe’
union select 7, ‘3/4/2012’, 10.00, ‘Beth’
union select 8, ‘3/4/2012’, 18.00, ‘Sam’
union select 9, ‘3/4/2012’, 12.00, ‘Joe’
union select 10, ‘3/4/2012’, 11.00, ‘Beth’
union select 11, ‘3/5/2012’, 14.00, ‘Sam’
union select 12, ‘3/6/2012’, 17.00, ‘Beth’
union select 13, ‘3/6/2012’, 19.00, ‘Joe’
union select 14, ‘3/7/2012’, 13.00, ‘Beth’
union select 15, ‘3/7/2012’, 16.00, ‘Sam’
)
select OrderID
,OrderDate
,OrderAmt
,CustomerName
from CTEOrders;

Position Value Functions: LAG, LEAD, FIRST_VALUE, LAST_VALUE

Who has not needed to use values from other rows in the current row for a report or other query? A prime example is needing to know what the last order value was to calculate growth or just show the difference in the results. This has never been easy in SQL Server until now. All of these functions require the use of the OVER clause and the ORDER BY clause. They all use the current row within the partition to find the row at the desired position.

The LAG and LEAD functions allow you to specify the offset or how many rows to look forward or backward and they support a default value in cases where the value returned would be null. These functions do not support the use of ROWS or RANGE in the OVER clause. The FIRST_VALUE and LAST_VALUE allow you to further define the partition using ROWS or RANGE if desired.

The following example illustrates all of the functions with various variations on the parameters and settings.

select OrderID
,OrderDate
,OrderAmt
,CustomerName
,LAG(OrderAmt) OVER (PARTITION BY CustomerName ORDER BY OrderID) as PrevOrdAmt
,LEAD(OrderAmt, 2) OVER (PARTITION BY CustomerName ORDER BY OrderID) as NextTwoOrdAmt
,LEAD(OrderDate, 2, ‘9999-12-31’) OVER (PARTITION BY CustomerName ORDER BY OrderID) as NextTwoOrdDtNoNull
,FIRST_VALUE(OrderDate) OVER (ORDER BY OrderID) as FirstOrdDt
,LAST_VALUE(CustomerName) OVER (PARTITION BY OrderDate ORDER BY OrderID) as LastCustToOrdByDay

from CTEOrders

Results (with shortened column names):
ID OrderDate Amt Cust PrevOrdAmt NextTwoAmt NextTwoDt FirstOrd LastCust
1 3/1/2012 10 Joe NULL 12 3/3/2012 3/1/2012 Joe
2 3/1/2012 11 Sam NULL 18 3/4/2012 3/1/2012 Sam
3 3/2/2012 10 Beth NULL 11 3/4/2012 3/1/2012 Beth
4 3/2/2012 15 Joe 10 12 3/4/2012 3/1/2012 Joe
5 3/2/2012 17 Sam 11 14 3/5/2012 3/1/2012 Sam
6 3/3/2012 12 Joe 15 19 3/6/2012 3/1/2012 Joe
7 3/4/2012 10 Beth 10 17 3/6/2012 3/1/2012 Beth
8 3/4/2012 18 Sam 17 16 3/7/2012 3/1/2012 Sam
9 3/4/2012 12 Joe 12 NULL 12/31/9999 3/1/2012 Joe
10 3/4/2012 11 Beth 10 13 3/7/2012 3/1/2012 Beth
11 3/5/2012 14 Sam 18 NULL 12/31/9999 3/1/2012 Sam
12 3/6/2012 17 Beth 11 NULL 12/31/9999 3/1/2012 Beth
13 3/6/2012 19 Joe 12 NULL 12/31/9999 3/1/2012 Joe
14 3/7/2012 13 Beth 17 NULL 12/31/9999 3/1/2012 Beth
15 3/7/2012 16 Sam 14 NULL 12/31/9999 3/1/2012 Sam

If you really like subselects, you can also mix in some subselects and have a very creative SQL statement. The following statement uses LAG and a subselect to find the first value in a partition. I am showing this to illustrate some more of the capabilities of the function in case you have a solution that requires this level of complexity.

select OrderID
,OrderDate
,OrderAmt
,CustomerName
,LAG(OrderAmt, (
select count(*)-1
from CTEOrders c
where c.CustomerName = CTEOrders.CustomerName
and c.OrderID <= CTEOrders.OrderID), 0)
OVER (PARTITION BY CustomerName ORDER BY OrderDate, OrderID) as FirstOrderAmt
FROM CTEOrders

Percentile Functions: CUME_DIST, PERCENT_RANK, PERCENTILE_CONT, PERCENTILE_DISC

As I wrap up my discussion on window functions, the percentile based functions were the functions I knew the least about. While I have already used the position value functions above, I have not yet needed to use the percentiles. So, that meant I had to work with them for a while so I could share their usage and have some samples for you to use.

The key differences in the four function have to do with ranks and values. CUME_DIST and PERCENT_RANK return a ranking value while PERCENTILE_CONT and PERCENTILE_DISC return data values.

CUME_DIST returns a value that is greater than zero and lest than or equal to one (>0 and <=1) and represents the percentage group that the value falls into based on the order specified. PERCENT_RANK returns a value between zero and one as well (>= 0 and <=1). However, in PERCENT_RANK the first group is always represented as 0 whereas in CUME_DIST it represents the percentage of the group. Both functions return the last percent group as 1. In both cases, as the ranking percentages move from lowest to highest, each group’s percent value includes all of the earlier values in the calculation as well.

The following statement shows both of the functions using the default partition to determine the rankings of order amounts within our dataset.

select OrderID
,OrderDate
,OrderAmt
,CustomerName
,CUME_DIST() OVER (ORDER BY OrderAmt) CumDist
,PERCENT_RANK() OVER (ORDER BY OrderAmt) PctRank
FROM CTEOrders

Results:
OrderID OrderDate OrderAmt CustomerName CumDist PctRank
1 3/1/2012 10 Joe 0.2 0
3 3/2/2012 10 Beth 0.2 0
7 3/4/2012 10 Beth 0.2 0
2 3/1/2012 11 Sam 0.33333333 0.214285714
10 3/4/2012 11 Beth 0.33333333 0.214285714
6 3/3/2012 12 Joe 0.46666667 0.357142857
9 3/4/2012 12 Joe 0.46666667 0.357142857
14 3/7/2012 13 Beth 0.53333333 0.5
11 3/5/2012 14 Sam 0.6 0.571428571
4 3/2/2012 15 Joe 0.66666667 0.642857143
15 3/7/2012 16 Sam 0.73333333 0.714285714
5 3/2/2012 17 Sam 0.86666667 0.785714286
12 3/6/2012 17 Beth 0.86666667 0.785714286
8 3/4/2012 18 Sam 0.93333333 0.928571429
13 3/6/2012 19 Joe 1 1

The last two functions, PERCENTILE_CONT and PERCENTILE_DISC, return the value at the percentile requested. PERCENTILE_CONT will return the true percentile value whether it exists in the data or not. For instance, if the percentile group has the values 10 and 20, it will return 15. If PERCENTILE_DISC, is applied to the same group it will return 10. It will return the smallest value in the percentile group, which in this case is 10. Both functions ignore NULL values and do not use the ORDER BY, ROWS, or RANGE clauses with the PARTITION BY clause. Instead, WITHIN GROUP is introduced which must contain a numeric data type and ORDER BY clause. Only one column can be specified here. Both functions need a percentile value which can be between 0.0 and 1.0.

The following script illustrates a couple of variations. The first two functions return the median of the default partition. Then next two return the median value for each day. Finally, the last two functions return the low and high values within the partition. The values segmented by the date partition highlight the key difference between the two functions.

select OrderID as ID
,OrderDate as ODt
,OrderAmt as OAmt
,CustomerName as CName
,PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY OrderAmt) OVER() PerCont05
,PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY OrderAmt) OVER() PerDisc05
,PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY OrderAmt) OVER(PARTITION BY OrderDate) PerContDt
,PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY OrderAmt) OVER(PARTITION BY OrderDate) PerDiscDt
,PERCENTILE_CONT(0) WITHIN GROUP (ORDER BY OrderAmt) OVER() PerCont0
FROM CTEOrders

Results
ID ODt OAmt CName PerCont05 PerDisc05 PerContDt PerDiscDt PerCont0
1 3/1/2012 10 Joe 13 13.00 10.5 10.00 10
2 3/1/2012 11 Sam 13 13.00 10.5 10.00 10
3 3/2/2012 10 Beth 13 13.00 15.0 15.00 10
4 3/2/2012 15 Joe 13 13.00 15.0 15.00 10
5 3/2/2012 17 Sam 13 13.00 15.0 15.00 10
6 3/3/2012 12 Joe 13 13.00 12.0 12.00 10
7 3/4/2012 10 Beth 13 13.00 11.5 11.00 10
10 3/4/2012 11 Beth 13 13.00 11.5 11.00 10
9 3/4/2012 12 Joe 13 13.00 11.5 11.00 10
8 3/4/2012 18 Sam 13 13.00 11.5 11.00 10
11 3/5/2012 14 Sam 13 13.00 14.0 14.00 10
12 3/6/2012 17 Beth 13 13.00 18.0 17.00 10
13 3/6/2012 19 Joe 13 13.00 18.0 17.00 10
14 3/7/2012 13 Beth 13 13.00 14.5 13.00 10
15 3/7/2012 16 Sam 13 13.00 14.5 13.00 10

As I wrap up this post, I have to give a shout out to my daughter, Kristy, who is an honors math student. She helped me get my head around this last group of functions. Her honors math work and some statistical work she had done in science helped provide additional insight into the math behind the functions. (Kristy – you rock!)

Series Wrap Up

I hope this series helps everyone understand the power and flexibility in the window functions made available in SQL Server 2012. If you happen to use Oracle, I know that many of these functions or there equivalent are also available in 11g and they also appear to be in 10g. I have to admit my first real production usage was with Oracle 11g but has since used them with SQL Server 2012. The expanded functionality in SQL Server 2012 is just one more reason to upgrade to the latest version.

Oracle Tips for MSBI Devs #6: Supporting SSAS Tabular Development

As SQL Server Analysis Services Tabular Models become more popular, models will use Oracle databases as sources. One of the key issues whenever you work with Oracle is understanding how to properly configure the necessary components to enable development.

Getting Started

If you have worked with Oracle before, you are very aware of a few things you need to be successful. First, you need to install the Oracle client. Here is where the details get messy. When you are working with MSBI tools, you will be using SQL Server Data Tools in Visual Studio which is still only 32 bit. Of the BI tools in SSDT, only SSIS has run modes to support 32 bit and 64 bit configurations. As a result, you need to install the 32 bit Oracle client in order to develop your tabular model.

Once that has been installed you will need to update the TNSNAMES.ORA file with the servers you will be targeting during development. Ideally, your Oracle DBAs have a file for you to use so you don’t need to create one. One nice thing is that the Oracle 12c client updates the PATH environment variable with the location of the bin folder. (Yes, Oracle still uses environment variables.) I would also recommend adding or using the TNS_ADMIN variable to specify the location of the TNSNAMES.ORA file. (See http://www.orafaq.com/wiki/TNS_ADMIN for details.)

NOTE: It took me many hours to work through a variety of configuration issues related to working with the Oracle client install. A couple of reinstalls, reboots, TNSNames.ORA tweaks, and lots of fruitless searching were all required to get this working. Be warned, working with Oracle clients are neither fun nor simple.

The Issue

Now that you have the 32 bit client installed you can connect to the Oracle database through the tabular model designer. As shown below, you can connect to Oracle through the Table Import Wizard.

image

You will be able to successfully test the connection as noted here.

image

And you will be able to execute a query and get results. You can also use the option to select tables and views.

image

However, once you decide to import the data you will encounter the following error:

image

The issue is that while you can do most of your work within Visual Studio using the 32 bit client, the import process targets the SQL Server tabular instance you specified when you created the project. While the 32 bit version of SQL Server is still available, most of us would not install that, even in our development environments. If you do not encounter this error, you are either using the 32 bit client of SQL Server or you have the 64 bit Oracle client installed (more on that next). As long as Visual Studio is only 32 bit compliant and you choose to use the 64 version of SQL Server you will see this issue.

The Resolution

The resolution is fairly simple. You need to download and install the 64 bit Oracle client. I would recommend that you get it installed, then reboot your development PC. While this may not be required, it seems to have helped me with a number of connectivity issues. You will need to be prepared for some “interesting” issues as you will have more than one Oracle home installed and you have the potential of many types of ORA-XXXXX errors. Once you are up and running you should be able to develop tabular models built on Oracle databases.

Some Parting Thoughts

First, I want to be clear that I think that Oracle is a solid database platform. However, I have never been at a client site or on a project where the connectivity or client installs were totally correct or functional without some work between the Oracle team and the BI development team. I think that the .NET driver is supposed to better and I may try that out for a later post (when I have the hours to spare).

I did the testing for this completely on Azure (and my Surface). I set up an Oracle VM and a SQL Server VM on Azure. The Microsoft team put together a great reference on setting up your Oracle VM. Check it out. I also did a previous post on setting up Oracle in an Azure VM. Both VM types can be pricey, but in a testing environment all was not too bad. I encourage you to use Azure to for these types of scenarios. But be sure to turn it off when you are done.

Oracle Tips for MSBI Devs #5: Working with Oracle on Windows Azure

As you have likely noticed in my series, Oracle Tips for MSBI Devs, I have done a lot of work with Oracle through the years while delivering BI solutions. One of the pain points of working with Oracle in development is setting up an Oracle development server. Even though I have installed Oracle a number of times, it is never seems to be an easy process.

So, I decided to try out the Oracle Virtual Machine template in Windows Azure. I will walk through the setup process here. I need to use Oracle as a data source for some SSIS development.

Setting Up the VM

From the Windows Azure portal, select the Virtual Machines tab then “Create a Virtual Machine”. This will open up the variety of options available to create the VM. Select the FROM GALLERY option which will open another dialog.

image

On the next screen, you pick the edition Oracle you want to use for the VM. (NOTE: at the moment, Oracle images are in preview. Microsoft recently that Oracle VMs will be be available on March 12. You can find more information here.)

image

I will be using the Oracle 11g R2 Standard Edition on Windows Server 2008 R2. The next step is to name and pick the size of the VM. The minimum size for this is Small and is what I used. I then completed the setup including setting up the endpoints and creating a new user.

I had originally tried to use Oracle 12c, but significant changes have been made to support multitenancy which make set up considerably more tedious with very few good examples available on the web. Most of the advice given by Oracle pros was to “Read the _____ Manual!” While “sensible”, I just needed a simple dev environment. This is one of the significant advantages of working with SQL Server, community help is abundant and usually pleasant. For instance, Microsoft recently published a document for setting up the Oracle 12c VM. I used it to work through some of the setup instructions below.

Once the initialization was complete I used the connect image button to open an RDP connection to the VM from the Azure dashboard. One thing to keep in mind, be sure to keep track of the user name and password you created. This your admin account and you will need it to log in to the VM. Now you have a running VM. At this point, I went and found the Oracle tools that I typically use and pinned them to the task bar.

Creating and Loading an Oracle Schema

Because I always for get some of these steps, and I really don’t want to read the manual, I listed the steps I used to create a simple set of data for use. This is not good enough for production, but it is a starting point.

Step 1: Create a Listener. This is required before you can create a database. To do this open the Oracle Net Configuration Assistant. From here you can create your first listener. I left the default settings for the setup.

Step 2: Create the database. (This is the equivalent of an instance in SQL Server.)  I used the Database Configuration Assistant for Oracle Database to create my first database on the server. This can be found in the Oracle home directory on the start menu.I chose the General Purpose template for my database. Most of the steps make some sense. I did choose to add the sample schemas as this is the easiest way to verify I can connect and work with the data. After all of this, the database will be created based on your choices.

Step 3: Using SQL*Plus, I connected to the SYSTEM schema. The user-name in this case is “SYSTEM”. Now we

Step 4: Create a new user and schema. (This is similar to the SQL Server database, not a SQL Server Schema.) This will give a location to create tables for loading data in the next steps. In accordance with typical Oracle support you can read about how to do this here: http://docs.oracle.com/cd/B19306_01/server.102/b14200/statements_8003.htm#i2065278. Or I can give you a good starting script and save you time.

CREATE USER someusername
IDENTIFIED BY <<add password here>>
DEFAULT TABLESPACE example
QUOTA 10M ON example
TEMPORARY TABLESPACE temp
QUOTA 5M ON system;

Your welcome! This will create a database using existing tablespaces. This is NOT a production level script and it is barely good enough for development. But in my case, I am using Oracle as a source and don’t plan to do much development or work on it so it meets my needs. If you need more insight, I guess you will need to read the documentation.

Step 5: Create a table and add rows. I continued to use the SYSTEM login and created a couple of simple tables within my new schema. I then used simple INSERT INTO statements to add data.

Now you have some basic data to work with to test connectivity with SSIS or SSAS.

Making the Oracle Database Accessible

In order to access your Oracle database on the VM you need to enable the port. You do this by going to the Azure portal and selecting the VMs tab. Once there, go to the Endpoints tab. You may recall that when you created the VM, you were asked about the Remote Desktop and PowerShell ports. Here are the steps to create the Endpoint to support Oracle.

  1. Click Add to open the Add Endpoint dialog.
  2. On the first page, leave the default which will add a stand-alone endpoint.
  3. On the second page you need to add a name (I used “Oracle”), select the TCP protocol, and put port 1521 in both the private and public port textboxes.

Once completed you should see the new endpoint in the list of available endpoints as shown below.

image

Connecting SSIS to that Oracle Database

Now that we have data in the db, it is time to create the connection to SSIS and load data and run queries.

The first thing I needed to do was load the client tools. I used the newer Oracle Data Access Components (ODTwithODAC12012). Once that was loaded, I added the following entry to my TNSNames.ora file (look for that in a directly like the following: c:\app\<username>\product\12.1.0\client_1\Network\Admin):

ORACLEDW =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = <servername>.cloudapp.net) (PORT  1521)
(CONNECT_DATA =
(SERVER=DEDICATED)
(SERVICE_NAME = ORACLEDW)
)
)

The key parts to get right are the HOST, PORT, and SERVICE_NAME as highlighted above.

Once TNS was in place, I was able to create an ODP.NET connection to the database and proceed to load the data.

I know that some of this has been simplistic but it is great that I don’t have to install Oracle myself. This functionality makes Azure even more appealing as a hosting solution.

SQL Saturday#197–Omaha Recap

96791a10-4559-4bac-bb98-c25ebc5e52c6

This was the second SQL Saturday hosted in Omaha.  I loved to see how the event grew from the first event until now.  John Morehouse ( T | B ) and team did another stellar job organizing this event. I know they packed the house

Having taken part in the first event, I found it spectacular that the speaker list was so diverse.  It is great to see so many SQL Server pros come out and speak at these events.

Part of the fun for me was bringing my 11-year old daughter along.  Many of you, speakers, attendees, and sponsors were kind to her and she had a good time, even though much of it was spent using my Surface to watch Netflix.  Smile  As a speaker, this was a way to spend some time on the road with her and to introduce one of my children to what I do when I travel to these events (one of my sons will be joining me in Fargo).

I do have to say that the food, both at the speaker’s dinner and for lunch were awesome!  If you are looking for an event that will feed you well, be sure to try this event next time around.

I was able to attend a few of the sessions, but I wanted to mention that the SQL Server vs Oracle: The Throwdown! was really good.  As a cross-over platform developer (check out my Oracle for MSBI Tips), it was great having a SQL Server Pro, David Klee (@kleegeek), and an Oracle Pro, Joe Grant (@dba_jedi), co-present.  Nice work guys!

Finally, I presented on Building BI Solutions with Excel 2013.  I have uploaded the slides to the event site.  Until next time.

T-SQL Window Functions on LessThanDot and at SQL Saturday 149

LessThanDot Sit LogoI recently completed a series of blog posts on www.lessthandot.com on T-SQL Window functions.  The enhancements to SQL Server 2012 in this area are phenomenal.  They solve a myriad of issues including calculating running totals with SQL.  Check it out if you want to learn more and get some simple examples related to the functions and structure related to the window functions.  Here is the series outline and links to each section.

T-SQL Window Functions:

I do a presentation related to T-SQL functions for SQL Saturdays and am presenting it at the PASS Summit this year.  Maybe I will see you there.

I recently presented this at SQL Saturday #149 in Minnesota.  Here is the presentation and the demo code. Thanks for attending.

 

Finally, if you use Oracle, you will find this series helpful as well.  Most of the syntax is supported in Oracle as well.  Look for an Oracle tip with the Oracle samples for your use soon.