Creating a SharePoint Server Farm on Azure from the Gallery

16 10 2014

As many of you know creating a SharePoint farm for testing can be a daunting task. I volunteered to help troubleshoot an issue that was working with SharePoint Excel Services and it couldn’t be done in Office365. So, my first attempt was to grab the SharePoint Server 2013 Trial from Azure’s VM Gallery.

image

However, once I created the VM, it turns out that SharePoint is not installed, which is what I really wanted. To complicate matters further, the download stopped because IE was blocking file downloads. You can change that setting in Internet Explorer options on the Security tab. Select the Internet Zone and click on the Custom Level button. Scroll down to the Downloads section and enable File download. Restart IE and you can get the file downloaded. Of course, we have to ask, why isn’t it already enabled on the VM since that would be the obvious goal.

As I was troubleshooting that issue, I happened to check out the Azure gallery on the Azure site and found a SharePoint Server Farm gallery image that I could use.

image

image

I clicked on the Farm icon to see what it was. It does the multi-server farm install in Azure.

You start the process by clicking the green Create Virtual Machine button in the middle of the screen. And then you are off to the configuration parts. The next few screen shots will show you the basic configuration points used during the install. Click the button… and your journey will begin.

image

This will open up the preview portal from Azure with a blade for configuring your farm.

image

Add a group name and work your way through the configuration steps on the blade. It will create 3 VMs by default unless you select the Enable high availability checkbox under the password textboxes.

image

Each configuration step will open another blade in the portal allowing you to configure the various servers to be added to the farm.

image

Once you have configured the settings you are ready to create your farm. Click the Create button and the “magic” starts to happen.

image

You will see the following tile added to your Startboard.

image

It took a little more than an hour to set up the three servers required – domain controller, SQL Server and SharePoint server.

image

If you click on the new tile, you will get an overview of what was created including resources and estimated spend. The next step is to log into the instance and check out what is set up. If you click the Deployment history button and then the Microsoft.SharePoint.Farm tile, you can see the SharePoint Central Admin URL and the SharePoint Site URL. Each of these blades provide additional information about your environment.

image

Log in to Central Admin or the SharePoint site. And you now have a functioning SharePoint Farm in Azure. If you are using this as a testing platform be sure to manage your VMs (e.g. shut them down) to reduce costs.





Lync 2013 Video Issues on Windows 8

15 10 2014

Part of the reason I have a blog is to document issues and resolutions I do not want to forget. Yesterday morning I was on two calls using Lync and the video was blank or white. I had great audio and messaging worked fine. So the only part that was not functioning was the video. I have been using Lync for years usually the problem was related to connectivity.

I started by leaving and rejoining both calls multiple times. That was primarily just annoying with no change in the video issue. Time to search. So, giving credit to whom credit is due, I found the following blog post by Shay Atik – Lync 2013 Desktop Sharing Shows White Screen. Turns out you need to remove a registry entry related to IE and ActiveX. I am copying the steps that Shay gives here and letting you know it works.

From Shay’s blog:

1. Open the Registry Editor (Start + R -> regedit -> OK).

2. Backup the registry (just in case): File -> Export.

3. Go to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Internet Explorer\ActiveX Compatibility and delete the {00000000-0000-0000-0000-000000000000} expandable folder.

4. Mission completed. Run Lync desktop sharing, and you’re good to go.

Hopefully this helps someone else, and thanks Shay for posting this.





The Only Constant Is Change

5 10 2014

In March of this year, I celebrated 10 years at Magenic. I blogged about it and called out values and reasons for staying (Ten Years and Counting …). Now, I am writing to talk about my departure from Magenic. I can start by saying the decision to leave was neither quick nor easy. As I noted in my previous post, I really liked working at Magenic and still have nothing bad to say about it.

So, let’s get on with it. As of Friday, October 3, 2014, I left Magenic and am starting a new opportunity at Pragmatic Works. It is only fair that I talk about my decision and some of the reasons behind it. In my March blog post, I focused on three areas that kept me at Magenic. For me to make the move I did, I had to see that these three areas must bePragmaticHeaderLogo covered by Pragmatic Works as well. So, being a research type of person, I asked friends who were currently employed at Pragmatic Works and, in particular, one consultant who had also worked at Magenic. Would Pragmatic Works measure up in Family, Opportunity, and Appreciation? Based on what I learned about it, yes. I think both companies provide much of what I look for in these three key areas. So, why the change? After taking some time off, I realized that I wanted to pursue more of Microsoft’s cutting edge BI, data, and cloud technologies. Magenic has always been a cutting edge company, but Microsoft was moving in directions that did not particularly align with what Magenic does in these areas. (For the record, Magenic’s pursuit of Microsoft’s cutting edge application development technologies is excellent.) This is where Pragmatic Works comes in. They do a lot of work with the latest advancements in Azure, Power BI, and SQL Server and which gives me more opportunities to work on those tools to deliver great customer solutions.

The interesting part for me about the entire process is that I truly think I would be happy at either company. With over 10 years of service and over 15 years of history with Magenic, I am sad to go. I have made many friends over the years and I truly enjoyed the opportunities I had to influence people and careers as a Practice Lead. I wish Magenic and the team that I left, only the best. Without Magenic, I would not be where I open-dooram today.

That being said, I joined Pragmatic Works because I believe that I will be able to say the same thing about them in 10-15 years.

Thanks to everyone at Magenic for all you have done and the friendships that have been made.

Pragmatic Works team, let’s get started. I am ready to open the next door of my career.





T-SQL Window Functions – Part 4: Analytic Functions

16 07 2014

This is a reprint with some revisions of a series I originally published on LessThanDot. You can find the links to the original blogs on my Series page.

TSQL-WIndow-Functions_thumb1_thumb_tIn the final installment of my series on SQL window functions, we will explore using analytic functions. Analytic functions were introduced in SQL Server 2012 with the expansion of the OVER clause capabilities. Analytic functions fall in to two primary categories: values at a position and percentiles. Four of the functions, LAG, LEAD, FIRST_VALUE and LAST_VALUE find a row in the partition and returns the desired value from that row. CUME_DIST and PERCENT_RANK break the partition into percentiles and return a rank value for each row. PERCENTILE_CONT and PERCENTILE_DISC a value at the requested percentile in the function for each row. All of the functions and examples in this blog will only work with SQL Server 2012.
Once again, the following CTE will be used as the query in all examples throughout the post:

with CTEOrders as
(select cast(1 as int) as OrderID, cast(‘3/1/2012′ as date) as OrderDate, cast(10.00 as money) as OrderAmt, ‘Joe’ as CustomerName
union select 2, ‘3/1/2012′, 11.00, ‘Sam’
union select 3, ‘3/2/2012′, 10.00, ‘Beth’
union select 4, ‘3/2/2012′, 15.00, ‘Joe’
union select 5, ‘3/2/2012′, 17.00, ‘Sam’
union select 6, ‘3/3/2012′, 12.00, ‘Joe’
union select 7, ‘3/4/2012′, 10.00, ‘Beth’
union select 8, ‘3/4/2012′, 18.00, ‘Sam’
union select 9, ‘3/4/2012′, 12.00, ‘Joe’
union select 10, ‘3/4/2012′, 11.00, ‘Beth’
union select 11, ‘3/5/2012′, 14.00, ‘Sam’
union select 12, ‘3/6/2012′, 17.00, ‘Beth’
union select 13, ‘3/6/2012′, 19.00, ‘Joe’
union select 14, ‘3/7/2012′, 13.00, ‘Beth’
union select 15, ‘3/7/2012′, 16.00, ‘Sam’
)
select OrderID
,OrderDate
,OrderAmt
,CustomerName
from CTEOrders;

Position Value Functions: LAG, LEAD, FIRST_VALUE, LAST_VALUE

Who has not needed to use values from other rows in the current row for a report or other query? A prime example is needing to know what the last order value was to calculate growth or just show the difference in the results. This has never been easy in SQL Server until now. All of these functions require the use of the OVER clause and the ORDER BY clause. They all use the current row within the partition to find the row at the desired position.

The LAG and LEAD functions allow you to specify the offset or how many rows to look forward or backward and they support a default value in cases where the value returned would be null. These functions do not support the use of ROWS or RANGE in the OVER clause. The FIRST_VALUE and LAST_VALUE allow you to further define the partition using ROWS or RANGE if desired.

The following example illustrates all of the functions with various variations on the parameters and settings.

select OrderID
,OrderDate
,OrderAmt
,CustomerName
,LAG(OrderAmt) OVER (PARTITION BY CustomerName ORDER BY OrderID) as PrevOrdAmt
,LEAD(OrderAmt, 2) OVER (PARTITION BY CustomerName ORDER BY OrderID) as NextTwoOrdAmt
,LEAD(OrderDate, 2, ‘9999-12-31’) OVER (PARTITION BY CustomerName ORDER BY OrderID) as NextTwoOrdDtNoNull
,FIRST_VALUE(OrderDate) OVER (ORDER BY OrderID) as FirstOrdDt
,LAST_VALUE(CustomerName) OVER (PARTITION BY OrderDate ORDER BY OrderID) as LastCustToOrdByDay

from CTEOrders

Results (with shortened column names):
ID OrderDate Amt Cust PrevOrdAmt NextTwoAmt NextTwoDt FirstOrd LastCust
1 3/1/2012 10 Joe NULL 12 3/3/2012 3/1/2012 Joe
2 3/1/2012 11 Sam NULL 18 3/4/2012 3/1/2012 Sam
3 3/2/2012 10 Beth NULL 11 3/4/2012 3/1/2012 Beth
4 3/2/2012 15 Joe 10 12 3/4/2012 3/1/2012 Joe
5 3/2/2012 17 Sam 11 14 3/5/2012 3/1/2012 Sam
6 3/3/2012 12 Joe 15 19 3/6/2012 3/1/2012 Joe
7 3/4/2012 10 Beth 10 17 3/6/2012 3/1/2012 Beth
8 3/4/2012 18 Sam 17 16 3/7/2012 3/1/2012 Sam
9 3/4/2012 12 Joe 12 NULL 12/31/9999 3/1/2012 Joe
10 3/4/2012 11 Beth 10 13 3/7/2012 3/1/2012 Beth
11 3/5/2012 14 Sam 18 NULL 12/31/9999 3/1/2012 Sam
12 3/6/2012 17 Beth 11 NULL 12/31/9999 3/1/2012 Beth
13 3/6/2012 19 Joe 12 NULL 12/31/9999 3/1/2012 Joe
14 3/7/2012 13 Beth 17 NULL 12/31/9999 3/1/2012 Beth
15 3/7/2012 16 Sam 14 NULL 12/31/9999 3/1/2012 Sam

If you really like subselects, you can also mix in some subselects and have a very creative SQL statement. The following statement uses LAG and a subselect to find the first value in a partition. I am showing this to illustrate some more of the capabilities of the function in case you have a solution that requires this level of complexity.

select OrderID
,OrderDate
,OrderAmt
,CustomerName
,LAG(OrderAmt, (
select count(*)-1
from CTEOrders c
where c.CustomerName = CTEOrders.CustomerName
and c.OrderID <= CTEOrders.OrderID), 0)
OVER (PARTITION BY CustomerName ORDER BY OrderDate, OrderID) as FirstOrderAmt
FROM CTEOrders

Percentile Functions: CUME_DIST, PERCENT_RANK, PERCENTILE_CONT, PERCENTILE_DISC

As I wrap up my discussion on window functions, the percentile based functions were the functions I knew the least about. While I have already used the position value functions above, I have not yet needed to use the percentiles. So, that meant I had to work with them for a while so I could share their usage and have some samples for you to use.

The key differences in the four function have to do with ranks and values. CUME_DIST and PERCENT_RANK return a ranking value while PERCENTILE_CONT and PERCENTILE_DISC return data values.

CUME_DIST returns a value that is greater than zero and lest than or equal to one (>0 and <=1) and represents the percentage group that the value falls into based on the order specified. PERCENT_RANK returns a value between zero and one as well (>= 0 and <=1). However, in PERCENT_RANK the first group is always represented as 0 whereas in CUME_DIST it represents the percentage of the group. Both functions return the last percent group as 1. In both cases, as the ranking percentages move from lowest to highest, each group’s percent value includes all of the earlier values in the calculation as well.

The following statement shows both of the functions using the default partition to determine the rankings of order amounts within our dataset.

select OrderID
,OrderDate
,OrderAmt
,CustomerName
,CUME_DIST() OVER (ORDER BY OrderAmt) CumDist
,PERCENT_RANK() OVER (ORDER BY OrderAmt) PctRank
FROM CTEOrders

Results:
OrderID OrderDate OrderAmt CustomerName CumDist PctRank
1 3/1/2012 10 Joe 0.2 0
3 3/2/2012 10 Beth 0.2 0
7 3/4/2012 10 Beth 0.2 0
2 3/1/2012 11 Sam 0.33333333 0.214285714
10 3/4/2012 11 Beth 0.33333333 0.214285714
6 3/3/2012 12 Joe 0.46666667 0.357142857
9 3/4/2012 12 Joe 0.46666667 0.357142857
14 3/7/2012 13 Beth 0.53333333 0.5
11 3/5/2012 14 Sam 0.6 0.571428571
4 3/2/2012 15 Joe 0.66666667 0.642857143
15 3/7/2012 16 Sam 0.73333333 0.714285714
5 3/2/2012 17 Sam 0.86666667 0.785714286
12 3/6/2012 17 Beth 0.86666667 0.785714286
8 3/4/2012 18 Sam 0.93333333 0.928571429
13 3/6/2012 19 Joe 1 1

The last two functions, PERCENTILE_CONT and PERCENTILE_DISC, return the value at the percentile requested. PERCENTILE_CONT will return the true percentile value whether it exists in the data or not. For instance, if the percentile group has the values 10 and 20, it will return 15. If PERCENTILE_DISC, is applied to the same group it will return 10. It will return the smallest value in the percentile group, which in this case is 10. Both functions ignore NULL values and do not use the ORDER BY, ROWS, or RANGE clauses with the PARTITION BY clause. Instead, WITHIN GROUP is introduced which must contain a numeric data type and ORDER BY clause. Only one column can be specified here. Both functions need a percentile value which can be between 0.0 and 1.0.

The following script illustrates a couple of variations. The first two functions return the median of the default partition. Then next two return the median value for each day. Finally, the last two functions return the low and high values within the partition. The values segmented by the date partition highlight the key difference between the two functions.

select OrderID as ID
,OrderDate as ODt
,OrderAmt as OAmt
,CustomerName as CName
,PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY OrderAmt) OVER() PerCont05
,PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY OrderAmt) OVER() PerDisc05
,PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY OrderAmt) OVER(PARTITION BY OrderDate) PerContDt
,PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY OrderAmt) OVER(PARTITION BY OrderDate) PerDiscDt
,PERCENTILE_CONT(0) WITHIN GROUP (ORDER BY OrderAmt) OVER() PerCont0
FROM CTEOrders

Results
ID ODt OAmt CName PerCont05 PerDisc05 PerContDt PerDiscDt PerCont0
1 3/1/2012 10 Joe 13 13.00 10.5 10.00 10
2 3/1/2012 11 Sam 13 13.00 10.5 10.00 10
3 3/2/2012 10 Beth 13 13.00 15.0 15.00 10
4 3/2/2012 15 Joe 13 13.00 15.0 15.00 10
5 3/2/2012 17 Sam 13 13.00 15.0 15.00 10
6 3/3/2012 12 Joe 13 13.00 12.0 12.00 10
7 3/4/2012 10 Beth 13 13.00 11.5 11.00 10
10 3/4/2012 11 Beth 13 13.00 11.5 11.00 10
9 3/4/2012 12 Joe 13 13.00 11.5 11.00 10
8 3/4/2012 18 Sam 13 13.00 11.5 11.00 10
11 3/5/2012 14 Sam 13 13.00 14.0 14.00 10
12 3/6/2012 17 Beth 13 13.00 18.0 17.00 10
13 3/6/2012 19 Joe 13 13.00 18.0 17.00 10
14 3/7/2012 13 Beth 13 13.00 14.5 13.00 10
15 3/7/2012 16 Sam 13 13.00 14.5 13.00 10

As I wrap up this post, I have to give a shout out to my daughter, Kristy, who is an honors math student. She helped me get my head around this last group of functions. Her honors math work and some statistical work she had done in science helped provide additional insight into the math behind the functions. (Kristy – you rock!)

Series Wrap Up

I hope this series helps everyone understand the power and flexibility in the window functions made available in SQL Server 2012. If you happen to use Oracle, I know that many of these functions or there equivalent are also available in 11g and they also appear to be in 10g. I have to admit my first real production usage was with Oracle 11g but has since used them with SQL Server 2012. The expanded functionality in SQL Server 2012 is just one more reason to upgrade to the latest version.





T-SQL Window Functions – Part 3: Aggregate Functions

7 07 2014

This is a reprint with some revisions of a series I originally published on LessThanDot. You can find the links to the original blogs on my Series page.

TSQL WIndow Functions_thumb[1]_thumbThis is part 3 in my series on SQL window functions. In this post, we will explore using aggregation functions with T-SQL windows. SQL Server supports most of the aggregation functions such as SUM and AVG in this context with the exceptions of GROUPING and GROUPING_ID. However, prior to SQL Server 2012 only the PARTITION BY clause was supported which greatly limited the usability of aggregate window functions. When support for the ORDER BY clause was introduced in SQL Server 2012, more complex business problems such as running totals could be solved without the extensive use of cursors or nested select statement. In my experience, I used to try various ways to get around this limitation including pushing the data to .NET as it could solve this problem more efficiently. However, this was not always possible when working with reporting. Now that we are able to use SQL to solve the problem, more complex and low-performing solutions can be replaced with these window functions.

Once again, the following CTE will be used as the query in all examples throughout the post:

with CTEOrders as
(select cast(1 as int) as OrderID, cast(‘3/1/2012′ as date) as OrderDate, cast(10.00 as money) as OrderAmt, ‘Joe’ as CustomerName
union select 2, ‘3/1/2012′, 11.00, ‘Sam’
union select 3, ‘3/2/2012′, 10.00, ‘Beth’
union select 4, ‘3/2/2012′, 15.00, ‘Joe’
union select 5, ‘3/2/2012′, 17.00, ‘Sam’
union select 6, ‘3/3/2012′, 12.00, ‘Joe’
union select 7, ‘3/4/2012′, 10.00, ‘Beth’
union select 8, ‘3/4/2012′, 18.00, ‘Sam’
union select 9, ‘3/4/2012′, 12.00, ‘Joe’
union select 10, ‘3/4/2012′, 11.00, ‘Beth’
union select 11, ‘3/5/2012′, 14.00, ‘Sam’
union select 12, ‘3/6/2012′, 17.00, ‘Beth’
union select 13, ‘3/6/2012′, 19.00, ‘Joe’
union select 14, ‘3/7/2012′, 13.00, ‘Beth’
union select 15, ‘3/7/2012′, 16.00, ‘Sam’
)
select OrderID
,OrderDate
,OrderAmt
,CustomerName
from CTEOrders;

Using PARTITION BY with Aggregate Functions

SQL Server 2005 and the newer versions supports the usage of the PARTITION BY clause by itself. This allowed for some simple aggregate windows. The following example shows SUM and AVG for different partitions of data. The third function actually creates and average using a SUM and COUNT function.

select CustomerName
,OrderDate
,OrderAmt
,SUM(OrderAmt) OVER (PARTITION BY CustomerName) CustomerTotal
,AVG(OrderAmt) OVER (PARTITION BY OrderDate) AvgDailyAmt
,CAST(COUNT(OrderID) OVER (PARTITION BY OrderDate) as decimal(8,3)) / CAST(COUNT(OrderID) OVER() as decimal(8,3)) PctOfTotalPerDay
from CTEOrders
order by OrderDate;

NOTE: The COUNT aggregate returns an integer value. In order to return the decimal, the values need to be explicitly converted to decimal types. Otherwise, the result was rounding to zero for all results in this sample.

Results

CustomerName OrderDate OrderAmt CustomerTotal AvgDailyAmt PctOfTotalPerDay
Joe 3/1/2012 10 68 10.5 0.133333333
Sam 3/1/2012 11 76 10.5 0.133333333
Sam 3/2/2012 17 76 14 0.2
Joe 3/2/2012 15 68 14 0.2
Beth 3/2/2012 10 61 14 0.2
Joe 3/3/2012 12 68 12 0.066666667
Joe 3/4/2012 12 68 12.75 0.266666667
Beth 3/4/2012 10 61 12.75 0.266666667
Beth 3/4/2012 11 61 12.75 0.266666667
Sam 3/4/2012 18 76 12.75 0.266666667
Sam 3/5/2012 14 76 14 0.066666667
Beth 3/6/2012 17 61 18 0.133333333
Joe 3/6/2012 19 68 18 0.133333333
Beth 3/7/2012 13 61 14.5 0.133333333
Sam 3/7/2012 16 76 14.5 0.133333333

Using Subselects

Subselect statements in SQL Server are supported, but harder to optimize in SQL Server versus Oracle. Until window functions were introduced all of the situations above could be solved by subselects, but performance would degrade as the results needed to work with larger sets of data. With the improved functionality in SQL Server 2012, you should not need to use subselects to return row-based aggregations. Besides the performance implications, maintenance will also be much simpler as the SQL becomes more transparent. For reference, here is the subselect syntax to return the same results as above:

select cte.CustomerName
, cte.OrderDate
, cte.OrderAmt
, (select SUM(OrderAmt) from CTEOrders where CustomerName = cte.CustomerName) CustomerTotal
, (select cast(COUNT(OrderID) as decimal(8,3)) from CTEOrders where OrderDate = cte.OrderDate) / (select cast(COUNT(OrderID) as decimal(8,3)) from CETOrders) AvgDailyAmt
from CETOrders cte
order by cte.OrderDate;

While it is possible to solve the same function using the subselects, the code is already getting messier and with data sets larger than what we have here, you would definitely see performance degradation.

Some Thoughts on GROUP BY

While I am digressing, I wanted to also highlight some details concerning GROUP BY. The one the biggest difficulties working with the GROUP BY clause and aggregates, every column must either be a part of the GROUP BY or have an aggregation associated with it. The window functions help solve this problem as well.
In the following examples, the first query returns the sum of the amount by day. This is pretty standard logic when working with aggregated queries in SQL.

select OrderDate
,sum(OrderAmt) as DailyOrderAmt
from CTEOrders
group by OrderDate;

However, if you wanted to see more details, but not include them in the aggregation, the following will not work.

select OrderDate
,OrderID
,OrderAmt
,sum(OrderAmt) as DailyOrderAmt
from CTEOrders
group by OrderDate
,OrderID
,OrderAmt;

This SQL statement will return each row individually with the sum at the detail level. You can solve this using the subselect above which is not recommended or you can use a window function.

select OrderDate
,OrderID
,OrderAmt
,sum(OrderAmt) OVER (PARTITION BY OrderDate) as DailyOrderAmt
from CTEOrders

As you can see here and in previous examples the OVER clause allows you to manage the grouping based on the context specified in relationship to the current row.

One other twist on the GROUP BY clause. First, I need to give credit to Itzik Ben-Gan for calling this to my attention at one of our Minnesota SQL Server User Group meetings. In his usual fashion he was showing some T-SQL coolness and he showed an interesting error when using the OVER clause with the GROUP BY clause.

The following will return an error because the first expression is an aggregate, but the second expression which is using the OVER clause is not. Also note that in this example the OVER clause is being evaluated for the entire set of data.

select sum(OrderAmt)
, sum(OrderAmt) over() as TotalOrderAmt
from CTEOrders
group by CustomerName

The expression above returns the following error:
Column ‘CTEOrders.OrderAmt’ is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause

The goal of the statement above was to show the customer’s total order amount with the overall order amount. The following statement resolves this issue because it is aggregating the aggregates. The window is now summing the aggregated amount which are grouped on the customer name.

select sum(OrderAmt)
, sum(sum(OrderAmt)) over() as TotalOrderAmt
from CTEOrders
group by CustomerName

Thanks again to Itzik for bringing this problem and resolution to my attention.

Aggregates with ORDER BY

With the expansion of the OVER clause to include ORDER BY support with aggregates, window functions increased their value substantially. One of the key business problems this allowed us to solve was a running aggregate.

The first example shows how to get a running total by CustomreName based on OrderDate and OrderID.

select OrderID
,OrderDate
,OrderAmt
,CustomerName
,SUM(OrderAmt) OVER (PARTITION BY CustomerName ORDER BY OrderDate, OrderID) as RunningByCustomer
from CTEOrders
ORDER BY CustomerName, OrderDate;

Results

OrderID OrderDate OrderAmt CustomerName RunningByCustomer
3 3/2/2012 10 Beth 10
7 3/4/2012 10 Beth 20
10 3/4/2012 11 Beth 31
12 3/6/2012 17 Beth 48
14 3/7/2012 13 Beth 61
1 3/1/2012 10 Joe 10
4 3/2/2012 15 Joe 25
6 3/3/2012 12 Joe 37
9 3/4/2012 12 Joe 49
13 3/6/2012 19 Joe 68
2 3/1/2012 11 Sam 11
5 3/2/2012 17 Sam 28
8 3/4/2012 18 Sam 46
11 3/5/2012 14 Sam 60
15 3/7/2012 16 Sam 76

This next example is more creative. It begins to show how powerful the window functions are. In this statement, we are going to return the annual running total aggregated by day. The differentiator here is that we use a DATEPART function in the OVER clause to achieve the desired results.

select OrderID
,OrderDate
,OrderAmt
,CustomerName
,SUM(OrderAmt) OVER (PARTITION BY datepart(yyyy, OrderDate) ORDER BY OrderDate) as AnnualRunning
from CTEOrders
ORDER BY OrderDate;

Results

OrderID OrderDate OrderAmt CustomerName AnnualRunning
1 3/1/2012 10 Joe 21
2 3/1/2012 11 Sam 21
3 3/2/2012 10 Beth 63
4 3/2/2012 15 Joe 63
5 3/2/2012 17 Sam 63
6 3/3/2012 12 Joe 75
7 3/4/2012 10 Beth 126
8 3/4/2012 18 Sam 126
9 3/4/2012 12 Joe 126
10 3/4/2012 11 Beth 126
11 3/5/2012 14 Sam 140
12 3/6/2012 17 Beth 176
13 3/6/2012 19 Joe 176
14 3/7/2012 13 Beth 205
15 3/7/2012 16 Sam 205

The ORDER BY clause creates an expanding group within the partition. In the examples above, the partition was the customer. Within each partition, ordered groups based on OrderDate and OrderID are “created”. At each row, the OrderDate and OrderID groups are aggregated up to the current row’s group thus producing the running total. If more than one row has the same order grouping, all of the rows in the group are aggregated into the total as shown in the second example above with the days and years.

Aggregates with ROWS

The ROWS clause is used to further define the partition by specifying which physical rows to include based on their proximity to the current row. As noted in the first post in the series, ROWS requires the ORDER BY clause as this determines the orientation of the partition.

The following example uses the FOLLOWING keywords to find the next two purchases that the customer made.

select OrderID
,OrderDate
,OrderAmt
,CustomerName
,SUM(OrderAmt) OVER (PARTITION BY CustomerName ORDER BY OrderDate, OrderID ROWS BETWEEN 1 FOLLOWING and 2 FOLLOWING) as NextTwoAmts
from CTEOrders
order by CustomerName, OrderDate, OrderID;

Results

OrderID OrderDate OrderAmt CustomerName NextTwoAmts
3 3/2/2012 10 Beth 21
7 3/4/2012 10 Beth 28
10 3/4/2012 11 Beth 30
12 3/6/2012 17 Beth 13
14 3/7/2012 13 Beth NULL
1 3/1/2012 10 Joe 27
4 3/2/2012 15 Joe 24
6 3/3/2012 12 Joe 31
9 3/4/2012 12 Joe 19
13 3/6/2012 19 Joe NULL
2 3/1/2012 11 Sam 35
5 3/2/2012 17 Sam 32
8 3/4/2012 18 Sam 30
11 3/5/2012 14 Sam 16
15 3/7/2012 16 Sam NULL

As we noted in the first blog, the last two rows in the partition only contain partial values. For example, order 12 contains the sum of only one order, 14, and order 14 has now rows following it in the partition and returns NULL as a result. When working with the ROWS clause this must be taken into account.

Aggregates with RANGE

Lastly, adding the RANGE to the OVER clause allows you to create aggregates which go to the beginning or end of the partition. RANGE is commonly used with UNBOUNDED FOLLOWING which goes to the end of the partition and UNBOUNDED PRECEDING which goes to the beginning of the partition. One of the most common use would be to specify the rows from the beginning of the partition to the current row which allows for aggregations such as year to date.

In the example below, we are calculating the average order size over time to the current row. This could be a very effective in a trending report.

select OrderID
,OrderDate
,OrderAmt
,CustomerName
,AVG(OrderAmt) OVER (ORDER BY OrderID RANGE BETWEEN UNBOUNDED PRECEDING and CURRENT ROW) as AvgOrderAmt
from CTEOrders
order by OrderDate;

Results

OrderID OrderDate OrderAmt CustomerName AvgOrderAmt
1 3/1/2012 10 Joe 10
2 3/1/2012 11 Sam 10.5
3 3/2/2012 10 Beth 10.333333
4 3/2/2012 15 Joe 11.5
5 3/2/2012 17 Sam 12.6
6 3/3/2012 12 Joe 12.5
7 3/4/2012 10 Beth 12.142857
8 3/4/2012 18 Sam 12.875
9 3/4/2012 12 Joe 12.777777
10 3/4/2012 11 Beth 12.6
11 3/5/2012 14 Sam 12.727272
12 3/6/2012 17 Beth 13.083333
13 3/6/2012 19 Joe 13.538461
14 3/7/2012 13 Beth 13.5
15 3/7/2012 16 Sam 13.666666

As you can see, the latest versions of OVER clause supports powerful yet simple aggregations which can help in a multitude of reporting and business solutions. Up next, the last blog in the series – Analytic Functions which are all new in SQL Server 2012.





Happy Independence Day America and 10 Years Packt Publishing!

4 07 2014

capitol-fireworks01

 

 

If you are in the USA, I hope you take some time today to enjoy your family and friends and see some fireworks.

 

Packt Publishing Celebrates 10 Years – $10 eBooks and Videos

This month marks 10 years since Packt Publishing embarked on its mission to deliver effective learning and information services to IT professionals. In that time it’s published over 2000 titles and helped projects become household names, awarding over $400,000 through its Open Source Project Royalty Scheme.

To celebrate this huge milestone, from June 26th Packt is offering all of its eBooks and Videos at just $10 each for 10 days – this promotion covers every title and customers can stock up on as many copies as they like until July 5th.

10 days 10 years - Home Banner

Why not grab the book I coauthored?

9809EN%20Cookbook





T-SQL Window Functions – Part 2: Ranking Functions

26 06 2014

This is a reprint with some revisions of a series I originally published on LessThanDot. You can find the links to the original blogs on my Series page.

TSQL WIndow Functions_thumb[1]This is part 2 in my series on SQL window functions. In this post, we will explore using ranking functions. SQL Server support four different ranking functions which are supported in SQL Server versions 2005 and forward. All of these functions require the use of the OVER clause. The following functions are classified as ranking functions: ROW_NUMBER, RANK, DENSE_RANK, NTILE.

Once again, the following CTE will be used as the query in all examples throughout the post:

with CTEOrders as
(select cast(1 as int) as OrderID, cast(‘3/1/2012′ as date) as OrderDate, cast(10.00 as money) as OrderAmt, ‘Joe’ as CustomerName
union select 2, ‘3/1/2012′, 11.00, ‘Sam’
union select 3, ‘3/2/2012′, 10.00, ‘Beth’
union select 4, ‘3/2/2012′, 15.00, ‘Joe’
union select 5, ‘3/2/2012′, 17.00, ‘Sam’
union select 6, ‘3/3/2012′, 12.00, ‘Joe’
union select 7, ‘3/4/2012′, 10.00, ‘Beth’
union select 8, ‘3/4/2012′, 18.00, ‘Sam’
union select 9, ‘3/4/2012′, 12.00, ‘Joe’
union select 10, ‘3/4/2012′, 11.00, ‘Beth’
union select 11, ‘3/5/2012′, 14.00, ‘Sam’
union select 12, ‘3/6/2012′, 17.00, ‘Beth’
union select 13, ‘3/6/2012′, 19.00, ‘Joe’
union select 14, ‘3/7/2012′, 13.00, ‘Beth’
union select 15, ‘3/7/2012′, 16.00, ‘Sam’
)
select OrderID
,OrderDate
,OrderAmt
,CustomerName
from CTEOrders;

ROW_NUMBER

The ROW_NUMBER function will return a row number for each row within the partition based on the partition and order. This function requires the use of the ORDER BY clause. However, it is often used without the PARTITION BY clause as it will number the entire result set. If PARTITION BY is used, then the row numbering starts over within the partition. The following code shows how both of these work.

select CustomerName
, OrderDate
, OrderAmt
,ROW_NUMBER() OVER(ORDER BY CustomerName) RowNumByCust
,ROW_NUMBER() OVER(PARTITION BY OrderDate ORDER BY CustomerName) RowNumPart
from CTEOrders
order by CustomerName;

RANK and DENSE_RANK

While these are different functions with even different rules, it is easier to understand the difference when put side by side. RANK and DENSE_RANK will order the rows based on the specified partition and apply a rank or number to them. Both RANK and DENSE_RANK will assign the same rank to “ties”. For example if rows 3 and 4 have the same value in the partition, they will have the same rank. The difference is how it handles the next rank number in the series. RANK does a “true” ordering and will apply the ranking based on the number of rows and skip numbers that are ties. So, if you have a tie between the third and fourth row and the first two rows and the final row are unique the ranking is as follows: 1, 2, 3, 3, 5. As you can see, 4 is missing. DENSE_RANK keeps the tie as well, however it does not skip any numbers in the sequence. Here is the same example set based on using DENSE_RANK: 1, 2, 3, 3, 4. As with the ROW_NUMBER function, the ORDER BY is required for these functions.

select CustomerName
, OrderDate
, OrderAmt
,RANK() OVER (ORDER BY CustomerName) RankByCust
,DENSE_RANK() OVER (ORDER BY CustomerName) DenseByCust
,RANK() OVER (PARTITION BY CustomerName ORDER BY OrderDate) RankByCustDt
,DENSE_RANK() OVER (PARTITION BY CustomerName ORDER BY OrderDate) DenseByCustDt

from CTEOrders
order by CustomerName;

NTILE

The last of the ranking functions is NTILE. NTILE groups the data into ordered and ranked groups based on the ORDER BY clause. The number of groups used in the ranking are specified in the function itself. So if you specify four groups to produce a quartile ranking, four ranked values, one through four, will be assigned to each group based on the order. In many cases, the total numbers of rows is not divisible by the number of groups chosen. For instance, if the number of groups is 4 but the total number of rows is 39, then the first three groups will return 10 rows and the final group will only return 9 rows. The function will always frontload the results so the earliest groups will have the “extra” rows. If a PARTITION BY clause is used, the NTILE ranking will applied within each partition. As with the other ranking functions, the ORDER BY clause is required to use this function.

select CustomerName
,OrderDate
,OrderAmt
,NTILE(4) OVER (ORDER BY OrderDate) NtileByDt
,NTILE(2) OVER (PARTITION BY OrderDate ORDER BY  OrderAMt) NtileByDtAmt
from CTEOrders
order by CustomerName;

Up next, using aggregate functions with window functions.








Follow

Get every new post delivered to your Inbox.

Join 786 other followers

%d bloggers like this: