T-SQL Window Functions – Part 3: Aggregate Functions

7 07 2014

This is a reprint with some revisions of a series I originally published on LessThanDot. You can find the links to the original blogs on my Series page.

TSQL WIndow Functions_thumb[1]_thumbThis is part 3 in my series on SQL window functions. In this post, we will explore using aggregation functions with T-SQL windows. SQL Server supports most of the aggregation functions such as SUM and AVG in this context with the exceptions of GROUPING and GROUPING_ID. However, prior to SQL Server 2012 only the PARTITION BY clause was supported which greatly limited the usability of aggregate window functions. When support for the ORDER BY clause was introduced in SQL Server 2012, more complex business problems such as running totals could be solved without the extensive use of cursors or nested select statement. In my experience, I used to try various ways to get around this limitation including pushing the data to .NET as it could solve this problem more efficiently. However, this was not always possible when working with reporting. Now that we are able to use SQL to solve the problem, more complex and low-performing solutions can be replaced with these window functions.

Once again, the following CTE will be used as the query in all examples throughout the post:

with CTEOrders as
(select cast(1 as int) as OrderID, cast(‘3/1/2012′ as date) as OrderDate, cast(10.00 as money) as OrderAmt, ‘Joe’ as CustomerName
union select 2, ‘3/1/2012′, 11.00, ‘Sam’
union select 3, ‘3/2/2012′, 10.00, ‘Beth’
union select 4, ‘3/2/2012′, 15.00, ‘Joe’
union select 5, ‘3/2/2012′, 17.00, ‘Sam’
union select 6, ‘3/3/2012′, 12.00, ‘Joe’
union select 7, ‘3/4/2012′, 10.00, ‘Beth’
union select 8, ‘3/4/2012′, 18.00, ‘Sam’
union select 9, ‘3/4/2012′, 12.00, ‘Joe’
union select 10, ‘3/4/2012′, 11.00, ‘Beth’
union select 11, ‘3/5/2012′, 14.00, ‘Sam’
union select 12, ‘3/6/2012′, 17.00, ‘Beth’
union select 13, ‘3/6/2012′, 19.00, ‘Joe’
union select 14, ‘3/7/2012′, 13.00, ‘Beth’
union select 15, ‘3/7/2012′, 16.00, ‘Sam’
)
select OrderID
,OrderDate
,OrderAmt
,CustomerName
from CTEOrders;

Using PARTITION BY with Aggregate Functions

SQL Server 2005 and the newer versions supports the usage of the PARTITION BY clause by itself. This allowed for some simple aggregate windows. The following example shows SUM and AVG for different partitions of data. The third function actually creates and average using a SUM and COUNT function.

select CustomerName
,OrderDate
,OrderAmt
,SUM(OrderAmt) OVER (PARTITION BY CustomerName) CustomerTotal
,AVG(OrderAmt) OVER (PARTITION BY OrderDate) AvgDailyAmt
,CAST(COUNT(OrderID) OVER (PARTITION BY OrderDate) as decimal(8,3)) / CAST(COUNT(OrderID) OVER() as decimal(8,3)) PctOfTotalPerDay
from CTEOrders
order by OrderDate;

NOTE: The COUNT aggregate returns an integer value. In order to return the decimal, the values need to be explicitly converted to decimal types. Otherwise, the result was rounding to zero for all results in this sample.

Results

CustomerName OrderDate OrderAmt CustomerTotal AvgDailyAmt PctOfTotalPerDay
Joe 3/1/2012 10 68 10.5 0.133333333
Sam 3/1/2012 11 76 10.5 0.133333333
Sam 3/2/2012 17 76 14 0.2
Joe 3/2/2012 15 68 14 0.2
Beth 3/2/2012 10 61 14 0.2
Joe 3/3/2012 12 68 12 0.066666667
Joe 3/4/2012 12 68 12.75 0.266666667
Beth 3/4/2012 10 61 12.75 0.266666667
Beth 3/4/2012 11 61 12.75 0.266666667
Sam 3/4/2012 18 76 12.75 0.266666667
Sam 3/5/2012 14 76 14 0.066666667
Beth 3/6/2012 17 61 18 0.133333333
Joe 3/6/2012 19 68 18 0.133333333
Beth 3/7/2012 13 61 14.5 0.133333333
Sam 3/7/2012 16 76 14.5 0.133333333

Using Subselects

Subselect statements in SQL Server are supported, but harder to optimize in SQL Server versus Oracle. Until window functions were introduced all of the situations above could be solved by subselects, but performance would degrade as the results needed to work with larger sets of data. With the improved functionality in SQL Server 2012, you should not need to use subselects to return row-based aggregations. Besides the performance implications, maintenance will also be much simpler as the SQL becomes more transparent. For reference, here is the subselect syntax to return the same results as above:

select cte.CustomerName
, cte.OrderDate
, cte.OrderAmt
, (select SUM(OrderAmt) from CTEOrders where CustomerName = cte.CustomerName) CustomerTotal
, (select cast(COUNT(OrderID) as decimal(8,3)) from CTEOrders where OrderDate = cte.OrderDate) / (select cast(COUNT(OrderID) as decimal(8,3)) from CETOrders) AvgDailyAmt
from CETOrders cte
order by cte.OrderDate;

While it is possible to solve the same function using the subselects, the code is already getting messier and with data sets larger than what we have here, you would definitely see performance degradation.

Some Thoughts on GROUP BY

While I am digressing, I wanted to also highlight some details concerning GROUP BY. The one the biggest difficulties working with the GROUP BY clause and aggregates, every column must either be a part of the GROUP BY or have an aggregation associated with it. The window functions help solve this problem as well.
In the following examples, the first query returns the sum of the amount by day. This is pretty standard logic when working with aggregated queries in SQL.

select OrderDate
,sum(OrderAmt) as DailyOrderAmt
from CTEOrders
group by OrderDate;

However, if you wanted to see more details, but not include them in the aggregation, the following will not work.

select OrderDate
,OrderID
,OrderAmt
,sum(OrderAmt) as DailyOrderAmt
from CTEOrders
group by OrderDate
,OrderID
,OrderAmt;

This SQL statement will return each row individually with the sum at the detail level. You can solve this using the subselect above which is not recommended or you can use a window function.

select OrderDate
,OrderID
,OrderAmt
,sum(OrderAmt) OVER (PARTITION BY OrderDate) as DailyOrderAmt
from CTEOrders

As you can see here and in previous examples the OVER clause allows you to manage the grouping based on the context specified in relationship to the current row.

One other twist on the GROUP BY clause. First, I need to give credit to Itzik Ben-Gan for calling this to my attention at one of our Minnesota SQL Server User Group meetings. In his usual fashion he was showing some T-SQL coolness and he showed an interesting error when using the OVER clause with the GROUP BY clause.

The following will return an error because the first expression is an aggregate, but the second expression which is using the OVER clause is not. Also note that in this example the OVER clause is being evaluated for the entire set of data.

select sum(OrderAmt)
, sum(OrderAmt) over() as TotalOrderAmt
from CTEOrders
group by CustomerName

The expression above returns the following error:
Column ‘CTEOrders.OrderAmt’ is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause

The goal of the statement above was to show the customer’s total order amount with the overall order amount. The following statement resolves this issue because it is aggregating the aggregates. The window is now summing the aggregated amount which are grouped on the customer name.

select sum(OrderAmt)
, sum(sum(OrderAmt)) over() as TotalOrderAmt
from CTEOrders
group by CustomerName

Thanks again to Itzik for bringing this problem and resolution to my attention.

Aggregates with ORDER BY

With the expansion of the OVER clause to include ORDER BY support with aggregates, window functions increased their value substantially. One of the key business problems this allowed us to solve was a running aggregate.

The first example shows how to get a running total by CustomreName based on OrderDate and OrderID.

select OrderID
,OrderDate
,OrderAmt
,CustomerName
,SUM(OrderAmt) OVER (PARTITION BY CustomerName ORDER BY OrderDate, OrderID) as RunningByCustomer
from CTEOrders
ORDER BY CustomerName, OrderDate;

Results

OrderID OrderDate OrderAmt CustomerName RunningByCustomer
3 3/2/2012 10 Beth 10
7 3/4/2012 10 Beth 20
10 3/4/2012 11 Beth 31
12 3/6/2012 17 Beth 48
14 3/7/2012 13 Beth 61
1 3/1/2012 10 Joe 10
4 3/2/2012 15 Joe 25
6 3/3/2012 12 Joe 37
9 3/4/2012 12 Joe 49
13 3/6/2012 19 Joe 68
2 3/1/2012 11 Sam 11
5 3/2/2012 17 Sam 28
8 3/4/2012 18 Sam 46
11 3/5/2012 14 Sam 60
15 3/7/2012 16 Sam 76

This next example is more creative. It begins to show how powerful the window functions are. In this statement, we are going to return the annual running total aggregated by day. The differentiator here is that we use a DATEPART function in the OVER clause to achieve the desired results.

select OrderID
,OrderDate
,OrderAmt
,CustomerName
,SUM(OrderAmt) OVER (PARTITION BY datepart(yyyy, OrderDate) ORDER BY OrderDate) as AnnualRunning
from CTEOrders
ORDER BY OrderDate;

Results

OrderID OrderDate OrderAmt CustomerName AnnualRunning
1 3/1/2012 10 Joe 21
2 3/1/2012 11 Sam 21
3 3/2/2012 10 Beth 63
4 3/2/2012 15 Joe 63
5 3/2/2012 17 Sam 63
6 3/3/2012 12 Joe 75
7 3/4/2012 10 Beth 126
8 3/4/2012 18 Sam 126
9 3/4/2012 12 Joe 126
10 3/4/2012 11 Beth 126
11 3/5/2012 14 Sam 140
12 3/6/2012 17 Beth 176
13 3/6/2012 19 Joe 176
14 3/7/2012 13 Beth 205
15 3/7/2012 16 Sam 205

The ORDER BY clause creates an expanding group within the partition. In the examples above, the partition was the customer. Within each partition, ordered groups based on OrderDate and OrderID are “created”. At each row, the OrderDate and OrderID groups are aggregated up to the current row’s group thus producing the running total. If more than one row has the same order grouping, all of the rows in the group are aggregated into the total as shown in the second example above with the days and years.

Aggregates with ROWS

The ROWS clause is used to further define the partition by specifying which physical rows to include based on their proximity to the current row. As noted in the first post in the series, ROWS requires the ORDER BY clause as this determines the orientation of the partition.

The following example uses the FOLLOWING keywords to find the next two purchases that the customer made.

select OrderID
,OrderDate
,OrderAmt
,CustomerName
,SUM(OrderAmt) OVER (PARTITION BY CustomerName ORDER BY OrderDate, OrderID ROWS BETWEEN 1 FOLLOWING and 2 FOLLOWING) as NextTwoAmts
from CTEOrders
order by CustomerName, OrderDate, OrderID;

Results

OrderID OrderDate OrderAmt CustomerName NextTwoAmts
3 3/2/2012 10 Beth 21
7 3/4/2012 10 Beth 28
10 3/4/2012 11 Beth 30
12 3/6/2012 17 Beth 13
14 3/7/2012 13 Beth NULL
1 3/1/2012 10 Joe 27
4 3/2/2012 15 Joe 24
6 3/3/2012 12 Joe 31
9 3/4/2012 12 Joe 19
13 3/6/2012 19 Joe NULL
2 3/1/2012 11 Sam 35
5 3/2/2012 17 Sam 32
8 3/4/2012 18 Sam 30
11 3/5/2012 14 Sam 16
15 3/7/2012 16 Sam NULL

As we noted in the first blog, the last two rows in the partition only contain partial values. For example, order 12 contains the sum of only one order, 14, and order 14 has now rows following it in the partition and returns NULL as a result. When working with the ROWS clause this must be taken into account.

Aggregates with RANGE

Lastly, adding the RANGE to the OVER clause allows you to create aggregates which go to the beginning or end of the partition. RANGE is commonly used with UNBOUNDED FOLLOWING which goes to the end of the partition and UNBOUNDED PRECEDING which goes to the beginning of the partition. One of the most common use would be to specify the rows from the beginning of the partition to the current row which allows for aggregations such as year to date.

In the example below, we are calculating the average order size over time to the current row. This could be a very effective in a trending report.

select OrderID
,OrderDate
,OrderAmt
,CustomerName
,AVG(OrderAmt) OVER (ORDER BY OrderID RANGE BETWEEN UNBOUNDED PRECEDING and CURRENT ROW) as AvgOrderAmt
from CTEOrders
order by OrderDate;

Results

OrderID OrderDate OrderAmt CustomerName AvgOrderAmt
1 3/1/2012 10 Joe 10
2 3/1/2012 11 Sam 10.5
3 3/2/2012 10 Beth 10.333333
4 3/2/2012 15 Joe 11.5
5 3/2/2012 17 Sam 12.6
6 3/3/2012 12 Joe 12.5
7 3/4/2012 10 Beth 12.142857
8 3/4/2012 18 Sam 12.875
9 3/4/2012 12 Joe 12.777777
10 3/4/2012 11 Beth 12.6
11 3/5/2012 14 Sam 12.727272
12 3/6/2012 17 Beth 13.083333
13 3/6/2012 19 Joe 13.538461
14 3/7/2012 13 Beth 13.5
15 3/7/2012 16 Sam 13.666666

As you can see, the latest versions of OVER clause supports powerful yet simple aggregations which can help in a multitude of reporting and business solutions. Up next, the last blog in the series – Analytic Functions which are all new in SQL Server 2012.





Happy Independence Day America and 10 Years Packt Publishing!

4 07 2014

capitol-fireworks01

 

 

If you are in the USA, I hope you take some time today to enjoy your family and friends and see some fireworks.

 

Packt Publishing Celebrates 10 Years – $10 eBooks and Videos

This month marks 10 years since Packt Publishing embarked on its mission to deliver effective learning and information services to IT professionals. In that time it’s published over 2000 titles and helped projects become household names, awarding over $400,000 through its Open Source Project Royalty Scheme.

To celebrate this huge milestone, from June 26th Packt is offering all of its eBooks and Videos at just $10 each for 10 days – this promotion covers every title and customers can stock up on as many copies as they like until July 5th.

10 days 10 years - Home Banner

Why not grab the book I coauthored?

9809EN%20Cookbook





T-SQL Window Functions – Part 2: Ranking Functions

26 06 2014

This is a reprint with some revisions of a series I originally published on LessThanDot. You can find the links to the original blogs on my Series page.

TSQL WIndow Functions_thumb[1]This is part 2 in my series on SQL window functions. In this post, we will explore using ranking functions. SQL Server support four different ranking functions which are supported in SQL Server versions 2005 and forward. All of these functions require the use of the OVER clause. The following functions are classified as ranking functions: ROW_NUMBER, RANK, DENSE_RANK, NTILE.

Once again, the following CTE will be used as the query in all examples throughout the post:

with CTEOrders as
(select cast(1 as int) as OrderID, cast(‘3/1/2012′ as date) as OrderDate, cast(10.00 as money) as OrderAmt, ‘Joe’ as CustomerName
union select 2, ‘3/1/2012′, 11.00, ‘Sam’
union select 3, ‘3/2/2012′, 10.00, ‘Beth’
union select 4, ‘3/2/2012′, 15.00, ‘Joe’
union select 5, ‘3/2/2012′, 17.00, ‘Sam’
union select 6, ‘3/3/2012′, 12.00, ‘Joe’
union select 7, ‘3/4/2012′, 10.00, ‘Beth’
union select 8, ‘3/4/2012′, 18.00, ‘Sam’
union select 9, ‘3/4/2012′, 12.00, ‘Joe’
union select 10, ‘3/4/2012′, 11.00, ‘Beth’
union select 11, ‘3/5/2012′, 14.00, ‘Sam’
union select 12, ‘3/6/2012′, 17.00, ‘Beth’
union select 13, ‘3/6/2012′, 19.00, ‘Joe’
union select 14, ‘3/7/2012′, 13.00, ‘Beth’
union select 15, ‘3/7/2012′, 16.00, ‘Sam’
)
select OrderID
,OrderDate
,OrderAmt
,CustomerName
from CTEOrders;

ROW_NUMBER

The ROW_NUMBER function will return a row number for each row within the partition based on the partition and order. This function requires the use of the ORDER BY clause. However, it is often used without the PARTITION BY clause as it will number the entire result set. If PARTITION BY is used, then the row numbering starts over within the partition. The following code shows how both of these work.

select CustomerName
, OrderDate
, OrderAmt
,ROW_NUMBER() OVER(ORDER BY CustomerName) RowNumByCust
,ROW_NUMBER() OVER(PARTITION BY OrderDate ORDER BY CustomerName) RowNumPart
from CTEOrders
order by CustomerName;

RANK and DENSE_RANK

While these are different functions with even different rules, it is easier to understand the difference when put side by side. RANK and DENSE_RANK will order the rows based on the specified partition and apply a rank or number to them. Both RANK and DENSE_RANK will assign the same rank to “ties”. For example if rows 3 and 4 have the same value in the partition, they will have the same rank. The difference is how it handles the next rank number in the series. RANK does a “true” ordering and will apply the ranking based on the number of rows and skip numbers that are ties. So, if you have a tie between the third and fourth row and the first two rows and the final row are unique the ranking is as follows: 1, 2, 3, 3, 5. As you can see, 4 is missing. DENSE_RANK keeps the tie as well, however it does not skip any numbers in the sequence. Here is the same example set based on using DENSE_RANK: 1, 2, 3, 3, 4. As with the ROW_NUMBER function, the ORDER BY is required for these functions.

select CustomerName
, OrderDate
, OrderAmt
,RANK() OVER (ORDER BY CustomerName) RankByCust
,DENSE_RANK() OVER (ORDER BY CustomerName) DenseByCust
,RANK() OVER (PARTITION BY CustomerName ORDER BY OrderDate) RankByCustDt
,DENSE_RANK() OVER (PARTITION BY CustomerName ORDER BY OrderDate) DenseByCustDt

from CTEOrders
order by CustomerName;

NTILE

The last of the ranking functions is NTILE. NTILE groups the data into ordered and ranked groups based on the ORDER BY clause. The number of groups used in the ranking are specified in the function itself. So if you specify four groups to produce a quartile ranking, four ranked values, one through four, will be assigned to each group based on the order. In many cases, the total numbers of rows is not divisible by the number of groups chosen. For instance, if the number of groups is 4 but the total number of rows is 39, then the first three groups will return 10 rows and the final group will only return 9 rows. The function will always frontload the results so the earliest groups will have the “extra” rows. If a PARTITION BY clause is used, the NTILE ranking will applied within each partition. As with the other ranking functions, the ORDER BY clause is required to use this function.

select CustomerName
,OrderDate
,OrderAmt
,NTILE(4) OVER (ORDER BY OrderDate) NtileByDt
,NTILE(2) OVER (PARTITION BY OrderDate ORDER BY  OrderAMt) NtileByDtAmt
from CTEOrders
order by CustomerName;

Up next, using aggregate functions with window functions.





T-SQL Window Functions – Part 1: The OVER() Clause

26 06 2014

This is a reprint with some revisions of a series I originally published on LessThanDot. You can find the links to the original blogs on my Series page.

TSQL WIndow FunctionsSQL Window functions were introduced in SQL Server 2005. At the time, only a small set of functionality was available. Window functions fill a need in the aggregation story for SQL Server. Window functions allow the developer to use row level aggregations without the penalty of using cursors to accomplish a similar task. Window functions allow you to segment a set of rows and then apply a function to that set of rows. In many cases, you may choose an aggregation function. However, other functions are also available including ranking and analytic functions. In this four-part series, I will start by breaking apart the OVER clause which is the key to understanding window functions in SQL Server. The following posts will expand on each group of window functions which use the OVER clause – ranking, aggregate, and analytic.

Up until 2008 R2, SQL Server only supported a subset of window functions which focused on ranking and some aggregation functions. With SQL Server 2012, Microsoft greatly expanded its support for window functions in T-SQL thus making fairly complex operations such as running totals much simpler to accomplish.

The following CTE will be used as the query in all examples throughout the post:

with CTEOrders as
(select cast(1 as int) as OrderID, cast(‘3/1/2012′ as date) as OrderDate, cast(10.00 as money) as OrderAmt, ‘Joe’ as CustomerName
union select 2, ‘3/1/2012′, 11.00, ‘Sam’
union select 3, ‘3/2/2012′, 10.00, ‘Beth’
union select 4, ‘3/2/2012′, 15.00, ‘Joe’
union select 5, ‘3/2/2012′, 17.00, ‘Sam’
union select 6, ‘3/3/2012′, 12.00, ‘Joe’
union select 7, ‘3/4/2012′, 10.00, ‘Beth’
union select 8, ‘3/4/2012′, 18.00, ‘Sam’
union select 9, ‘3/4/2012′, 12.00, ‘Joe’
union select 10, ‘3/4/2012′, 11.00, ‘Beth’
union select 11, ‘3/5/2012′, 14.00, ‘Sam’
union select 12, ‘3/6/2012′, 17.00, ‘Beth’
union select 13, ‘3/6/2012′, 19.00, ‘Joe’
union select 14, ‘3/7/2012′, 13.00, ‘Beth’
union select 15, ‘3/7/2012′, 16.00, ‘Sam’
)
select OrderID
,OrderDate
,OrderAmt
,CustomerName
from CTEOrders;

Defining the Window with the OVER() Clause

The purpose of the OVER clause is to define the window over which the function will be applied. The default functionality is to define the window for the entire table. As shown in the example below, the result is the same as if you had used an aggregation function without a GROUP BY clause. Some of what makes OVER unique includes the fact different windows can be specified in a single SELECT statement as will be shown in later sections.


select CustomerName
, SUM(OrderAmt) Over () as NoParm
from CTEOrders;

image

The OVER clause takes three additional arguments which change the scope of the clause — PARTITION BY, ORDER BY, and ROWS or RANGE.

PARTITION BY

The PARTITION BY clause is used to reduce the scope of the window to which the aggregation applies. This clause partitions the default or entire result set into partitions based on the criteria specified.  The function will be applied to each partition independently. At this point, we begin to see the real power of window functions.

While similar to the GROUP BY clause, the PARTITION BY clause allows you to specify different partitions within the same select statement. The following example builds on the illustration above with no partition specified and a partition on the CustomerName field.

select CustomerName
, SUM(OrderAmt) OVER() as NoParm
, SUM(OrderAmt) OVER (PARTITION BY CustomerName) as PartByName
from CTEOrders;

image

ORDER BY

The ORDER BY clause is used to order how the partitions apply the window function. For instance, when an ORDER BY is present in our current illustration, it will sum all of Beth’s order amounts, but will then add Joe’s to the total as he is the next in order. In the following illustration, no partition is specified so the partition is still the whole table. You can now see how partitions and ordering affect the window function.

select CustomerName
,SUM(OrderAmt) OVER () as NoParm
,SUM(OrderAmt) OVER (PARTITION BY CustomerName) as PartByName
,SUM(OrderAmt) OVER (ORDER BY CustomerName) as OrderByName
from CTEOrders;

image

As you can see, if no partition is specified the order will apply to the entire result set or the default partition. This is particularly important to understand when using ranking functions with the OVER clause.

Before SQL Server 2012, the ORDER BY clause could only be used with ranking functions. With the release of SQL Server 2012, the ORDER BY clause can now be used with ranking, aggregate, and analytic functions.

The following image visually illustrates the how the PARTITION BY and ORDER BY clauses aggregate and rank data.

image

ROWS or RANGE

The final clauses that can be used to define a partition are the ROWS and RANGE functions. This functionality was introduced in SQL Server 2012. These functions will specify a set of rows based on the position of the current row and as a result both functions require the use of the ORDER BY clause. Because each partition is anchored to the current row, the rows or range specifications are based on their proximity to the current row. A number of key words can be used with this clause to define the window frame used by the window functions.

• CURRENT ROW: That’s what it means. It identifies the current row as part of the ROWS or RANGE. This key word is supported by ROWS and RANGE.
• UNBOUNDED PRECEDING: Go to the beginning of the partition. This is supported by both ROWS and RANGE.
• UNBOUNDED FOLLOWING: Go to the end of the partition. This is supported by both ROWS and RANGE.
n PRECEDING: This is used to specify the number of rows before the current row. This is only supported by the ROWS clause.
n FOLLOWING: This is used to specify the number of rows after the current row. This is only supported by the ROWS clause.

select CustomerName
,SUM(OrderAmt) OVER () as NoParm
,SUM(OrderAmt) OVER (PARTITION BY CustomerName) as PartByName
,SUM(OrderAmt) OVER (ORDER BY CustomerName) as OrderByName
,SUM(OrderAmt) OVER (PARTITION BY CustomerName ORDER BY OrderDate, OrderID RANGE BETWEEN UNBOUNDED PRECEDING and CURRENT ROW) as RangeByDate
,SUM(OrderAmt) OVER (PARTITION BY CustomerName ORDER BY OrderDate, OrderID ROWS BETWEEN 1 FOLLOWING and 2 FOLLOWING) as NextTwoAmts
from CTEOrders;

image

A couple of other notes regarding ROWS and RANGE. In cases where the partition returns no rows, the value returned is NULL by default as seen in rows 5 and 10 above. Also, if the partition only returns fewer rows than specified it will only apply to rows within the range. In the example above, row 4 NextTwoAmts column will only return the sum of row 5 as it is the only row included in the partition.

Finally, ROWS and RANGE react differently when only the CURRENT ROW is specified. When CURRENT ROW is used by itself with the RANGE clause it will apply the function based on the partitioning and ordering. However, when used with the ROWS clause, it will only return the current row regardless of the partition specification.

select CustomerName
,OrderDate
,OrderAmt
,SUM(OrderAmt) OVER (PARTITION BY CustomerName ORDER BY OrderDate RANGE CURRENT ROW) as RangeRow
,SUM(OrderAmt) OVER (PARTITION BY CustomerName ORDER BY OrderDate ROWS CURRENT ROW) as RowRow
from CTEOrders;

image

The next three related posts I will discuss ranking functions, aggregate functions and analytic functions.





Exploring Excel 2013 for BI Tip #16: Exposing “Values” from a Tabular Model

19 06 2014

As I mentioned in my original post, Exploring Excel 2013 as Microsoft’s BI Client, I will be posting tips regularly about using Excel 2013.  Much of the content will be a result of my daily interactions with business users and other BI devs.  In order to not forget what I learn or discover, I write it down … here.  I hope you too will discover something new you can use.  Enjoy!

From Power Pivot to SSAS Tabular

As companies move through the cycle of building Excel based solutions for business intelligence and analytics, they eventually end up with a SQL Server Analysis Services Tabular Model. The tabular model comes into play when you need more data in your model or want to support more granular security.

Up to this point, users have been happily using Power Pivot models in Excel to build their analysis solutions. However, once the model is deployed to tabular some functionality or interaction with the model changes in significant ways.

To summarize this point, power users or data modelers will create Power Pivot models in Excel. These models may or may not be deployed SharePoint, but they need to take them to the next level. You can migrate a Power Pivot model to tabular with ease by using the import option in SQL Server Data Tools.

image

Interacting with Power Pivot

I started by creating a simple Power Pivot model using Adventure Works DW data based on the Internet Sales fact table. I am using seven tables in my model as shown here.

image

I am not going to add any calculated measures to the model because Power Pivot allows me to use the data as it sets. Next we create a pivot table based on this model. I dropped the Fiscal Year onto rows and added OrderQuantity and ExtendedAmount to the values region. When OrderQuantity and ExtendedAmount are added to the pivot table, Excel defaults to a sum calculation when working with the data. Basically Excel creates the calculation for you based on what it knows about the data.

The point here is that I have data that can be used as values without doing any additional work with the model. I saved the workbook, closed Excel and moved on to the next step.

Interacting with Tabular

First we need to convert the Power Pivot model to a tabular model. Which is done by importing the model we just saved in SQL Server Data Tools. Once we have the project open, we need to deploy the model to a SSAS tabular instance so we can connect to it with Excel.

image

Now that it has been deployed to SSAS we can reopen our workbook and add a connection to the tabular model. In the field list we notice three differences now that the model is tabular.

1. The SUM symbol (sigma) is used to highlight values or measures that can be calculated.

2. The values we created in the Power Pivot model show up here.

3. In the Values section, “_No measures defined” is shown.

image

When working with multidimensional models, the Values section are represented the same. That makes sense as the connection that Excel is using is based on MDX not DAX. This significantly changes the user experience.

Let’s add a new measure to our Power Pivot model and try to do the same in the tabular model. We can still drop the DiscountAmount into the values section in our pivot table based on Power Pivot. However, when we try to do the same on tabular we get an error saying that we cannot add it to that area of the report.

image

In order for us to use DiscountAmount as a measure we will need to create an OLAP measure (See Excel Tip #8 for details) to use it in this Excel workbook or we will need to add it as a calculated measure in tabular and redeploy for it to be available.

What’s Happening

Because Excel treats a tabular model the same as a multidimensional model in SSAS you will need to add calculated measures for all measures you want to use as values in pivot tables in Excel. Multidimensional models are highly structured using the dimension and measure group techniques. While tabular “feels” like Power Pivot, to be used by Excel it needs to appear structured like multidimensional cubes.

Making this more interesting is that Excel uses MDX to communicate with tabular models, not DAX. As a result, we are able to use the OLAP tools in the PivotTable Tools ribbon.

image

This option is not available when working with Power Pivot models in Excel.

Impact to Users

Overall the impact to users, in particular power users and report builders, is that they have less “freedom” to design when using a tabular model. If they want to add more calculations, they need to be familiar with MDX. Furthermore, if they want the calculations to be generally available they need to work with IT to deploy updated models.

Hopefully we will see DAX supported interaction with SSAS in the future, but for the moment you need to understand how tabular and Power Pivot differ when using pivot tables in Excel.





Exploring Excel 2013 for BI Tip #15: Locking Slicer Position

18 06 2014

As I mentioned in my original post, Exploring Excel 2013 as Microsoft’s BI Client, I will be posting tips regularly about using Excel 2013.  Much of the content will be a result of my daily interactions with business users and other BI devs.  In order to not forget what I learn or discover, I write it down … here.  I hope you too will discover something new you can use.  Enjoy!

The Issue

A picture, or in this case two pictures, are worth a thousand words. I created a pivot table from Power Pivot and then added two slicers above the pivot table. The pivot table contains a date hierarchy which can be expanded and collapsed. During this process the slicer moves around which is not optimal when you are creating a visualization in Excel such as a dashboard. Here are the screenshots which highlight the issue.

How I set it up:

image

What happens when the date gets expanded:

image

How to Fix the Slicer Position

Right click the slicer you want to keep from moving, in my case that is the second one. I first looked in the settings, but saw nothing. I stumbled onto the Size and Properties option which opened the Format Slicer slide out menu. If you expand the Properties section select “Don’t move or size with cells” option, the slicer will no longer move.

image

This is just one more way to use slicers to improve the user experience in your Excel dashboards and reports.





A Month, Almost Two Months Offline–Was the Silence Deafening?

17 06 2014

The past couple of months has been very busy at home and at work so I have fallen behind a bit on my blogging. The big event was the graduation of my oldest child, Kristyna, from Burnsville High School. It was our first grad party and a lot had to happen to make that go.

Before that party, we had our annual camping trip with friends. During that same period I spoke at SQL Saturday in Orange County where my wife joined me. She helped out at SQL Saturday and we were able to visit my daughter’s college choice – Biola University. I followed up that by organizing an internal technical conference at Magenic, Tech Summit. I was only able to attend a part of that weekend’s events as I wrapped up the weekend at Boy Scout camp.

All in all, my family took precedence followed by increased work responsibilities. Up next is my Sabbatical which will be six weeks off at the end of the summer. I will try to get a few of the updates in my queue loaded up here then more silence.

My advice to everyone, enjoy your family. They will matter more than work and contributions here.

Enjoy some pics from my months off…

Huntington Beach with my wife, Sheila (celebrated 21 years on June 5):

Huntington Beach

Our deck, ready for rain and for the grad party (yes, we are Scouts):

Deck for Grad Party

Here is my daughter Kristy with the family at her graduation:

Kristy and Family at Graduation

Andrew’s Dutch Oven cake at our Memorial Day Campout:

Drew and Cake

Finally, check out my Fun Info page to see my youngest daughter’s new video contribution.

Thanks for indulging me celebrating some extra family time here and at home. Next up, another Excel BI Tip.








Follow

Get every new post delivered to your Inbox.

Join 812 other followers

%d bloggers like this: