Effectively Integrating FHIR Data from Azure Health Services

On December 13, 2022 By Steve In 3Cloud, ADLS, Azure, Azure Data Studio, FHIR, JSON, Jupyter Notebook, Microsoft Azure, SQL Saturday, Synapse, T-SQL

This post is a follow-up to my session at SQL Saturday 2022 in Oregon & SW Washington, where I presented an introduction to FHIR and the JSON data produced by the Azure Health Services APIs.

With the recently updated healthcare mandates in the United States, Microsoft has continued to expand its support for the FHIR standard for integrating healthcare data. While the standard is well documented and Microsoft's capabilities are expansive, it falls on data professionals to interpret that data, build meaningful reports, and produce insights as the data is collected and integrated across environments. This requires a good working knowledge of JSON in SQL to manipulate complex data models. In the session, we did a short review of the FHIR standard and the overall implementation of FHIR in Azure. From there we reviewed the resulting data in the data lake and in Synapse, then moved on to the heart of the matter: complex SQL using JSON functions in Synapse. Whether or not you are active in healthcare today, this is a useful walk-through of how to use the JSON SQL functions within the Azure SQL platforms.

What is FHIR and why should you care?

FHIR stands for Fast Healthcare Interoperability Resources. This is the latest specification for interoperability in healthcare produced by HL7. To be clear, the word "fast" has nothing to do with performance; it refers to the ability to implement and integrate data quickly. With the latest regulations around the world in healthcare, this is the established standard for integrating healthcare data and will continue to be at the forefront of this work. If you do any work in healthcare, you will need to understand FHIR because you will likely run across data formatted to the standard from many different sources.

FHIR is very well documented. In many ways when the standard is properly followed the JSON documents or other supported formats are effectively self-documenting. It is commonly understood that the core FHIR specification handles about 80% of the use cases in healthcare. It is designed to be flexible so that it can support specialized needs within regions or healthcare areas. For example, in the US there is a need to support race and ethnicity. The U.S. Core Implementation Guide provides guidance on the specification enhancements to support this need for U.S. healthcare organizations. You will find similar support for other countries as well as specific implementations for healthcare vendors such as Epic.

Neither the notebook, the presentation, nor this post is meant to be exhaustive coverage of FHIR. Before we move on to some of the other implementation pieces, it is important to understand one key aspect of FHIR: the basic building block called a resource. A resource is the core exchangeable content within the specification. All resources share the following characteristics:

A common way to define and represent the resource including data types and patterns
A common set of metadata which can be discovered easily
A human readable part
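
To make those three characteristics concrete, here is a minimal sketch that holds a hand-written Patient resource in a T-SQL variable and reads a few of its parts. The patient is invented for illustration and is far smaller than a real resource; `resourceType`, `id`, `meta`, and `text` are element names from the FHIR specification, while the variable name and sample values are assumptions.

```sql
-- Hypothetical, heavily trimmed Patient resource; all sample values are made up.
DECLARE @patient NVARCHAR(MAX) = N'{
  "resourceType": "Patient",
  "id": "example-001",
  "meta": { "versionId": "1", "lastUpdated": "2022-05-31T18:07:03Z" },
  "text": { "status": "generated",
            "div": "<div xmlns=\"http://www.w3.org/1999/xhtml\">Sample Patient</div>" },
  "gender": "female",
  "birthDate": "1990-01-01"
}';

SELECT ISJSON(@patient)                         AS IsValidJson    -- 1 when the text parses as JSON
     , JSON_VALUE(@patient, '$.resourceType')   AS ResourceType   -- common representation
     , JSON_VALUE(@patient, '$.meta.versionId') AS VersionId      -- common metadata
     , JSON_VALUE(@patient, '$.text.div')       AS HumanReadable; -- human-readable part
```

The same `$.meta.*` paths apply to every resource type, which is part of what makes the documents feel self-documenting.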

For more detailed information on the supported resources and other details around FHIR implementation, you should visit the following website:

Azure Health Services and the FHIR API

I will not be digging into a lot of the healthcare services information or the FHIR support within Azure in this post. The important thing to understand is that Microsoft has made a concerted effort to support this specification. That includes technology and architectures for extracting data from various healthcare systems and then using the FHIR APIs to standardize the extracted data into the FHIR spec, typically as JSON files in the data lake. Because of the standardized format, Microsoft is able to supply a set of common schemas that can be used in Synapse serverless to create external tables and views, which accelerates the implementation and usage of data produced from the APIs. It is from this starting point that we are able to start working with the data in reporting and analytics solutions.

At this point I want to put a plug in for the company I work for. If you're interested in learning how Azure Health Services and the FHIR specification can be implemented at your company, we have FHIR Quick Start and FHIR Data Blueprint solutions. These solutions have been used by many other customers to achieve high levels of integration in their healthcare data estate. If you're interested in learning more, please reach out to us at: https://3cloudsolutions.com/get-started/

Working with the data from the FHIR API using JSON in SQL

As noted in the previous section, Azure Health Services comes with predefined serverless tables and views to be used with the extracted data. However, due to the complexity of FHIR, a number of columns within those tables and views still contain JSON snippets. For example, there is one field for name which has several objects and arrays to support the specification. You cannot simply select the name from the table and use it as you move forward. There are many fields like this throughout the data. For the rest of this post and in the notebook, we will work through a number of scenarios to build a view of the Patient resource that can be used for simple reporting. This view uses a few JSON functions from SQL Server and covers scenarios ranging from simple to complex.

The functions we will be using:

ISJSON
JSON_VALUE
OPENJSON

In addition to these functions, we will also be using the CROSS APPLY operator in SQL to join our data with relational data.
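
As a small, self-contained sketch of how these pieces fit together, the query below runs against two made-up rows rather than the real fhir.Patient table. The row values, the aliases `j` and `g`, and the column names `LastName`, `GivenPosition`, and `GivenName` are all assumptions for illustration. It needs SQL Server 2016 or later (database compatibility level 130 for OPENJSON).

```sql
-- Two hypothetical rows standing in for a table whose [name] column holds a JSON array.
SELECT p.id
     , JSON_VALUE(j.doc, '$[0].family') AS LastName       -- scalar from the first name object
     , g.[key]                          AS GivenPosition  -- zero-based position in the given array
     , g.[value]                        AS GivenName      -- one output row per given name
FROM (VALUES
        ('p1', N'[{"use":"official","family":"Smith","given":["Anna","Marie"]}]')
      , ('p2', N'not json')                               -- deliberately invalid
     ) AS p(id, [name])
-- Guard with ISJSON inside a CASE so invalid text becomes NULL before any JSON function sees it.
CROSS APPLY (VALUES (CASE WHEN ISJSON(p.[name]) = 1 THEN p.[name] END)) AS j(doc)
-- OPENJSON returns no rows for NULL input, so CROSS APPLY drops the invalid row.
CROSS APPLY OPENJSON(j.doc, '$[0].given') AS g;
```

JSON_VALUE returns a single scalar, OPENJSON turns an array into rows, and CROSS APPLY is what lets OPENJSON run once per row of the outer table. Swapping CROSS APPLY for OUTER APPLY on the last line would keep patients that have no given names.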

The examples in the notebook are built on the tables resulting from working with the Azure FHIR API. I am currently unable to provide a sample of the data to use with the notebook. However, the SQL will work if you have your own FHIR implementation and a Patient resource to work with. Rather than rewrite the entire contents of the notebook in the blog post, here is a link to the notebook.

If you plan to implement this in the same way, you will need Azure Data Lake, Azure Synapse serverless, and Azure Data Studio. The notebook can be opened in Azure Data Studio. If you are unfamiliar with working with notebooks inside of Azure Data Studio, you are not alone. Check out this post which discusses how to implement your first notebook in Azure Data Studio.

Building our view and SQL with JSON functions

If you decide not to open the notebook but are curious what the view looks like, here is the finished product that we created in the notebook.

SELECT TOP (20) p.resourceType + '/' +  p.id as PatientResourceID
    , p.resourceType as ResourceType
    , p.id as ResourceID 
    , cast(p.[meta.versionId] as int) as VersionID 
    , cast(p.[meta.lastUpdated] as DATETIME2(7)) as LastUpdated 
    , JSON_VALUE(p.[name], '$[0].family') as LastName
    , JSON_VALUE(p.[name], '$[0].given[0]') as FirstName
    , cast(p.active as bit) as IsActive
    , p.gender as Gender 
    , CAST(p.birthDate as date) as BirthDate
    , CASE WHEN p.[maritalStatus.coding] is null THEN NULL
           WHEN  JSON_VALUE(p.[maritalStatus.coding], '$[0].system') = 'http://terminology.hl7.org/CodeSystem/v3-MaritalStatus' 
                    THEN JSON_VALUE(p.[maritalStatus.coding], '$[0].code')
           ELSE NULL
           END as MaritalStatus 
    , CASE WHEN JSON_VALUE(p.[address], '$[0].use') = 'home' THEN JSON_VALUE(p.[address], '$[0].state')
            WHEN JSON_VALUE(p.[address], '$[1].use') = 'home' THEN JSON_VALUE(p.[address], '$[1].state')
            WHEN JSON_VALUE(p.[address], '$[2].use') = 'home' THEN JSON_VALUE(p.[address], '$[2].state')
            WHEN JSON_VALUE(p.[address], '$[3].use') = 'home' THEN JSON_VALUE(p.[address], '$[3].state')
            ELSE NULL
            END as HomeStateOrProvince
    , e.Ethnicity
    , r.Race
FROM fhir.Patient p
INNER JOIN (SELECT id, max(cast([meta.versionId] as int)) as currentVersion FROM fhir.Patient GROUP BY id) cp -- numeric max, so version 10 outranks version 9
    ON cast(p.[meta.versionId] as int) = cp.currentVersion
    AND p.id = cp.id
LEFT JOIN 
    (SELECT p.id
        , CASE WHEN JSON_VALUE(ext.value,'$.extension[0].url') = 'ombCategory'
            THEN
            CASE WHEN JSON_VALUE(ext.value, '$.extension[1].valueString') IS NOT NULL  THEN JSON_VALUE(ext.value, '$.extension[1].valueString')
                    WHEN JSON_VALUE(ext.value, '$.extension[0].valueString') IS NOT    NULL THEN JSON_VALUE(ext.value, '$.extension[0].valueString')
                    ELSE JSON_VALUE(ext.value, '$.extension[0].valueCoding.display')
                    END
            ELSE JSON_VALUE(ext.value, '$.valueCodeableConcept.coding[0].display')
            END AS Ethnicity 
        FROM 
        (
            SELECT fp.id, fp.extension FROM fhir.Patient fp
            INNER JOIN (SELECT id, max(cast([meta.versionId] as int)) as currentVersion FROM fhir.Patient GROUP BY id) cp -- numeric max, so version 10 outranks version 9
                ON cast(fp.[meta.versionId] as int) = cp.currentVersion
                AND fp.id = cp.id
            WHERE ISJSON(fp.extension) =1
        ) p 
        CROSS APPLY 
            OPENJSON(p.extension,'$'
            ) as ext
        WHERE JSON_VALUE(ext.value,'$.url') = 'http://hl7.org/fhir/us/core/StructureDefinition/us-core-ethnicity'
    ) e on e.id = p.id 
LEFT JOIN 
    (SELECT p.id
        , CASE WHEN JSON_VALUE(ext.value,'$.extension[0].url') = 'ombCategory'
            THEN
            CASE WHEN JSON_VALUE(ext.value, '$.extension[3].valueString') IS NOT NULL THEN JSON_VALUE(ext.value, '$.extension[3].valueString')
                    WHEN JSON_VALUE(ext.value, '$.extension[2].valueString') IS NOT NULL THEN JSON_VALUE(ext.value, '$.extension[2].valueString')
                    WHEN JSON_VALUE(ext.value, '$.extension[1].valueString') IS NOT NULL THEN JSON_VALUE(ext.value, '$.extension[1].valueString')
                    WHEN JSON_VALUE(ext.value, '$.extension[0].valueString') IS NOT NULL THEN JSON_VALUE(ext.value, '$.extension[0].valueString')
                    ELSE JSON_VALUE(ext.value, '$.extension[0].valueCoding.display')
                    END
            ELSE JSON_VALUE(ext.value, '$.valueCodeableConcept.coding[0].display')
            END AS Race 
        FROM 
        (
            SELECT fp.id, fp.extension FROM fhir.Patient fp
            INNER JOIN (SELECT id, max(cast([meta.versionId] as int)) as currentVersion FROM fhir.Patient GROUP BY id) cp -- numeric max, so version 10 outranks version 9
                ON cast(fp.[meta.versionId] as int) = cp.currentVersion
                AND fp.id = cp.id
            WHERE ISJSON(fp.extension) =1
        ) p 
        CROSS APPLY 
            OPENJSON(p.extension,'$'
            ) as ext
        WHERE JSON_VALUE(ext.value,'$.url') = 'http://hl7.org/fhir/us/core/StructureDefinition/us-core-race'
    ) as r on r.id = p.id

Here is a sample of the results from that view:

Columns: PatientResourceID | ResourceType | ResourceID | VersionID | LastUpdated | LastName | FirstName | IsActive | Gender | BirthDate | MaritalStatus | HomeStateOrProvince | Ethnicity | Race

Patient/d8af7bfa-5008-4a0f-85d1-0af3448a31dd | Patient | d8af7bfa-5008-4a0f-85d1-0af3448a31dd | 2022-05-31 18:07:03.2150000 | DUCK | DONALD | male | 1965-07-14 | NULL
Patient/78cf7725-a0e1-44a4-94d4-055482781afb | Patient | 78cf7725-a0e1-44a4-94d4-055482781afb | 2022-05-31 18:07:30.7490000 | Gretzky | Wayne | NULL | 1990-05-31 | NULL
Patient/9e909e52-61a1-be50-1878-a12ef8c36346 | Patient | 9e909e52-61a1-be50-1878-a12ef8c36346 | 2022-05-31 18:39:58.1780000 | EVERYMAN | ADAM | NULL | male | 1988-08-18 | NULL | Non Hispanic or Latino | White+Asian
Patient/585f3cc0-c727-4989-9214-a7a7b60a2ade | Patient | 585f3cc0-c727-4989-9214-a7a7b60a2ade | 2022-05-31 13:14:57.0640000 | DUCK | DONALD | male | 1965-07-15 | NULL
Patient/29a819c4-f553-8189-2354-9441b86d37ef | Patient | 29a819c4-f553-8189-2354-9441b86d37ef | 2022-05-18 15:18:40.1560000 | FORD | ELAINE | NULL | female | 1992-03-10 | NULL
Patient/d5fe6802-a680-e762-8f43-9659340b00ac | Patient | d5fe6802-a680-e762-8f43-9659340b00ac | 2022-05-18 14:39:52.2550000 | EVERYMAN | ADAM | NULL | male | 1961-06-15 | NULL
Patient/4d661053-a8d0-148c-7023-54508fd04a52 | Patient | 4d661053-a8d0-148c-7023-54508fd04a52 | 2022-05-21 13:48:24.9720000 | EVERYMAN | sam | NULL | male | 1966-05-07 | NULL | Not Hispanic or Latino | White
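
One alternative worth sketching is the home-address lookup. The view checks address positions 0 through 3 one at a time; the query below instead lets OPENJSON expand the address array and filters on `use`, so the position no longer matters. This is not how the notebook does it. The sample rows and the aliases `a` and `home` are assumptions, and TOP (1) without an ORDER BY picks an arbitrary match if a patient has more than one home address.

```sql
-- Hypothetical rows standing in for fhir.Patient; only id and [address] are modeled.
SELECT p.id
     , home.[state] AS HomeStateOrProvince
FROM (VALUES
        ('p1', N'[{"use":"work","state":"WA"},{"use":"home","state":"OR"}]')
      , ('p2', N'[{"use":"billing","state":"MN"}]')       -- no home address
     ) AS p(id, [address])
OUTER APPLY (                                             -- OUTER keeps patients with no home address
    SELECT TOP (1) a.[state]
    FROM OPENJSON(p.[address])
         WITH ( [use]   VARCHAR(20) '$.use'               -- explicit schema: one column per JSON property
              , [state] VARCHAR(50) '$.state' ) AS a
    WHERE a.[use] = 'home'
) AS home;
```

With these two rows, p1 resolves to OR and p2 to NULL. As in the view, a real query should only feed OPENJSON text that passes ISJSON.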

Wrapping it up

As you can see, understanding the specification well enough to build a complex SQL statement using JSON functions is required to work with FHIR data effectively. Due to the complex nature of the nested JSON, you may not be able to reconcile this in tools such as Power BI. Building this out in SQL ensures that you provide your report writers and analysts with a solid result set which can be used with confidence.

Resources summary:

Link to notebook
Link to FHIR spec
Link to 3Cloud FHIR
Link to Azure Health Services

T-SQL Tuesday #87 – Fixing Old Problems with Shiny New Toys: STRING_SPLIT

On February 14, 2017 By Steve In Microsoft SQL Server, SQL PASS, SQL Server 2016, T-SQL

Thanks to Matt Gordon (@atsqlspeed) for hosting this T-SQL Tuesday.

Splitting Strings in SQL

A problem that has plagued SQL developers through the years is splitting strings. Many techniques have been used as more capabilities were added to SQL Server including XML datatypes, recursive CTEs and even CLR. I have used XML datatype methods to solve the problem most often. So, without further ado…

T-SQL Function: STRING_SPLIT

I have previously highlighted this function in a webinar with Pragmatic Works as a Hidden Gem in SQL Server 2016. It was not announced with great fanfare, but once discovered, solves a very common problem.

Syntax

STRING_SPLIT(string, delimiter)

The STRING_SPLIT function will return a single column result set. The column name is “value”. The datatype will be NVARCHAR for strings that are NCHAR or NVARCHAR. VARCHAR is used for strings that are CHAR or VARCHAR types.

Example

DECLARE @csvString AS VARCHAR(100)
SET @csvString = 'Monday, Tuesday, Wednesday, Thursday, Friday'
SELECT value AS WorkDayOfTheWeek 
FROM STRING_SPLIT (@csvString, ',');

The initial example returns the following results:

WorkDayOfTheWeek
Monday
 Tuesday
 Wednesday
 Thursday
 Friday

As you can see in the example, every value after the first keeps the leading space that followed each comma in the original string. The following example trims leading and trailing spaces.

DECLARE @csvString AS VARCHAR(100)
SET @csvString = 'Monday, Tuesday, Wednesday, Thursday, Friday'
SELECT LTRIM(RTRIM(value)) AS WorkDayOfTheWeek 
FROM STRING_SPLIT (@csvString, ',');

The cleaned example returns the following results:

WorkDayOfTheWeek
Monday
Tuesday
Wednesday
Thursday
Friday
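
The examples above split a single variable. A common next step is splitting a delimited column for every row of a table, which is where CROSS APPLY comes in. The sketch below uses two made-up rows; `StaffName`, `WorkDays`, and the aliases `s` and `d` are assumptions for illustration.

```sql
-- Hypothetical rows: each person has a comma-separated list of work days.
SELECT s.StaffName
     , LTRIM(RTRIM(d.value)) AS WorkDay     -- trim the space that follows each comma
FROM (VALUES
        ('Alex',  'Monday, Wednesday')
      , ('Blair', 'Tuesday, Thursday, Friday')
     ) AS s(StaffName, WorkDays)
CROSS APPLY STRING_SPLIT(s.WorkDays, ',') AS d;  -- one output row per list item
```

This returns five rows, one per person and day. STRING_SPLIT requires database compatibility level 130 or higher, and it does not promise to return the pieces in their original order.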

Thanks again Matt for this opportunity to share an underrated, but really useful shiny new tool in SQL Server 2016.

SQL Saturday #437–Boston BI Edition 2015–You Can Still Analyze Data with T-SQL

On October 17, 2015 By Steve In Business Intelligence, Microsoft SQL Server, Speaking, SQL PASS, SQL Saturday, T-SQL

Thanks for attending my session on analyzing data with TSQL. I hope you learned something you can take back and use in your projects or at your work. You will find a link to the session and the code I used below. If you have any questions about the session, post them in the comments and I will try to get you the answers.

The presentation can be found here: Analyzing with TSQL

The code was put into a Word document that you can get here: Code to support the analysis with TSQL Sessions

This session is also backed by an existing blog series I have written.

T-SQL Window Functions – Part 1- The OVER() Clause

T-SQL Window Functions – Part 2- Ranking Functions

T-SQL Window Functions – Part 3: Aggregate Functions

T-SQL Window Functions – Part 4- Analytic Functions

Microsoft Resources:

OVER Clause: http://msdn.microsoft.com/en-us/library/ms189461(v=SQL.110).aspx
Analytic Functions: http://msdn.microsoft.com/en-us/library/hh213234(v=sql.110).aspx
CUBE, ROLLUP, GROUPING SETS: https://technet.microsoft.com/en-us/library/bb522495(v=sql.105).aspx

SQL Saturday #453–Minnesota 2015–A Window Into Your Data

On October 10, 2015 By Steve In Microsoft SQL Server, SQL Saturday, T-SQL

Thanks for attending my session on window functions in TSQL. I hope you learned something you can take back and use in your projects or at your work. You will find a link to the session and the code I used below. If you have any questions about the session, post them in the comments and I will try to get you the answers.

The presentation can be found here: A Window into Your Data

The code was put into a Word document that you can get here: TSQL Window Function Code

This session is also backed by an existing blog series I have written.

T-SQL Window Functions – Part 1- The OVER() Clause

T-SQL Window Functions – Part 2- Ranking Functions

T-SQL Window Functions – Part 3: Aggregate Functions

T-SQL Window Functions – Part 4- Analytic Functions

Microsoft Resources:

OVER Clause: http://msdn.microsoft.com/en-us/library/ms189461(v=SQL.110).aspx
Analytic Functions: http://msdn.microsoft.com/en-us/library/hh213234(v=sql.110).aspx
CUBE, ROLLUP, GROUPING SETS: https://technet.microsoft.com/en-us/library/bb522495(v=sql.105).aspx

T-SQL Window Functions – Part 4: Analytic Functions

On July 16, 2014 By Steve In Microsoft SQL Server, Oracle, SQL Server 2012, T-SQL

This is a reprint with some revisions of a series I originally published on LessThanDot. You can find the links to the original blogs on my Series page.

In the final installment of my series on SQL window functions, we will explore using analytic functions. Analytic functions were introduced in SQL Server 2012 with the expansion of the OVER clause capabilities. They fall into two primary categories: values at a position and percentiles. Four of the functions, LAG, LEAD, FIRST_VALUE and LAST_VALUE, find a row in the partition and return the desired value from that row. CUME_DIST and PERCENT_RANK break the partition into percentiles and return a rank value for each row. PERCENTILE_CONT and PERCENTILE_DISC return a value at the requested percentile for each row. All of the functions and examples in this post require SQL Server 2012 or later.
Once again, the following CTE will be used as the query in all examples throughout the post:

with CTEOrders as
(select cast(1 as int) as OrderID, cast('3/1/2012' as date) as OrderDate, cast(10.00 as money) as OrderAmt, 'Joe' as CustomerName
union select 2, '3/1/2012', 11.00, 'Sam'
union select 3, '3/2/2012', 10.00, 'Beth'
union select 4, '3/2/2012', 15.00, 'Joe'
union select 5, '3/2/2012', 17.00, 'Sam'
union select 6, '3/3/2012', 12.00, 'Joe'
union select 7, '3/4/2012', 10.00, 'Beth'
union select 8, '3/4/2012', 18.00, 'Sam'
union select 9, '3/4/2012', 12.00, 'Joe'
union select 10, '3/4/2012', 11.00, 'Beth'
union select 11, '3/5/2012', 14.00, 'Sam'
union select 12, '3/6/2012', 17.00, 'Beth'
union select 13, '3/6/2012', 19.00, 'Joe'
union select 14, '3/7/2012', 13.00, 'Beth'
union select 15, '3/7/2012', 16.00, 'Sam'
)
select OrderID
,OrderDate
,OrderAmt
,CustomerName
from CTEOrders;

Position Value Functions: LAG, LEAD, FIRST_VALUE, LAST_VALUE

Who has not needed to use values from other rows in the current row for a report or other query? A prime example is needing to know what the last order value was to calculate growth or just show the difference in the results. This has never been easy in SQL Server until now. All of these functions require the use of the OVER clause and the ORDER BY clause. They all use the current row within the partition to find the row at the desired position.

The LAG and LEAD functions allow you to specify the offset or how many rows to look forward or backward and they support a default value in cases where the value returned would be null. These functions do not support the use of ROWS or RANGE in the OVER clause. The FIRST_VALUE and LAST_VALUE allow you to further define the partition using ROWS or RANGE if desired.

The following example illustrates all of the functions with several variations on the parameters and settings.

select OrderID
,OrderDate
,OrderAmt
,CustomerName
,LAG(OrderAmt) OVER (PARTITION BY CustomerName ORDER BY OrderID) as PrevOrdAmt
,LEAD(OrderAmt, 2) OVER (PARTITION BY CustomerName ORDER BY OrderID) as NextTwoOrdAmt
,LEAD(OrderDate, 2, '9999-12-31') OVER (PARTITION BY CustomerName ORDER BY OrderID) as NextTwoOrdDtNoNull
,FIRST_VALUE(OrderDate) OVER (ORDER BY OrderID) as FirstOrdDt
,LAST_VALUE(CustomerName) OVER (PARTITION BY OrderDate ORDER BY OrderID) as LastCustToOrdByDay
from CTEOrders

Results (with shortened column names):

ID	OrderDate	Amt	Cust	PrevOrdAmt	NextTwoAmt	NextTwoDt	FirstOrd	LastCust
1	3/1/2012	10	Joe	NULL	12	3/3/2012	3/1/2012	Joe
2	3/1/2012	11	Sam	NULL	18	3/4/2012	3/1/2012	Sam
3	3/2/2012	10	Beth	NULL	11	3/4/2012	3/1/2012	Beth
4	3/2/2012	15	Joe	10	12	3/4/2012	3/1/2012	Joe
5	3/2/2012	17	Sam	11	14	3/5/2012	3/1/2012	Sam
6	3/3/2012	12	Joe	15	19	3/6/2012	3/1/2012	Joe
7	3/4/2012	10	Beth	10	17	3/6/2012	3/1/2012	Beth
8	3/4/2012	18	Sam	17	16	3/7/2012	3/1/2012	Sam
9	3/4/2012	12	Joe	12	NULL	12/31/9999	3/1/2012	Joe
10	3/4/2012	11	Beth	10	13	3/7/2012	3/1/2012	Beth
11	3/5/2012	14	Sam	18	NULL	12/31/9999	3/1/2012	Sam
12	3/6/2012	17	Beth	11	NULL	12/31/9999	3/1/2012	Beth
13	3/6/2012	19	Joe	12	NULL	12/31/9999	3/1/2012	Joe
14	3/7/2012	13	Beth	17	NULL	12/31/9999	3/1/2012	Beth
15	3/7/2012	16	Sam	14	NULL	12/31/9999	3/1/2012	Sam
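
Notice that the LastCust column always matches the Cust column. That follows from the default window frame: when ORDER BY is present and no ROWS or RANGE is given, the frame ends at the current row, so the "last" value is the current row's own value. The sketch below contrasts the default with an explicit frame that spans the whole partition. It uses the first five orders from the CTE above, rebuilt with a VALUES list so it runs on its own; the `Orders` name and the two output column names are assumptions.

```sql
WITH Orders AS
(
    SELECT * FROM (VALUES
          (1, CAST('2012-03-01' AS date), 'Joe')
        , (2, CAST('2012-03-01' AS date), 'Sam')
        , (3, CAST('2012-03-02' AS date), 'Beth')
        , (4, CAST('2012-03-02' AS date), 'Joe')
        , (5, CAST('2012-03-02' AS date), 'Sam')
    ) AS v(OrderID, OrderDate, CustomerName)
)
SELECT OrderID
     , OrderDate
     , CustomerName
     -- Default frame (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW):
     -- the last row in the frame is the current row.
     , LAST_VALUE(CustomerName) OVER (PARTITION BY OrderDate ORDER BY OrderID) AS LastCustDefaultFrame
     -- Explicit frame covering the entire partition: the day's final customer.
     , LAST_VALUE(CustomerName) OVER (PARTITION BY OrderDate ORDER BY OrderID
                                      ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS LastCustOfDay
FROM Orders
ORDER BY OrderID;
```

With this data, LastCustOfDay is Sam on every row because Sam placed the last order on both days, while LastCustDefaultFrame simply repeats CustomerName.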

If you really like subselects, you can also mix in some subselects and have a very creative SQL statement. The following statement uses LAG and a subselect to find the first value in a partition. I am showing this to illustrate some more of the capabilities of the function in case you have a solution that requires this level of complexity.

select OrderID
,OrderDate
,OrderAmt
,CustomerName
,LAG(OrderAmt, (
select count(*)-1
from CTEOrders c
where c.CustomerName = CTEOrders.CustomerName
and c.OrderID <= CTEOrders.OrderID), 0)
OVER (PARTITION BY CustomerName ORDER BY OrderDate, OrderID) as FirstOrderAmt
FROM CTEOrders
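
For comparison, when the goal is simply the first value in the partition, FIRST_VALUE gets there without the correlated subselect. The sketch below uses a handful of Joe's and Sam's orders from the CTE, rebuilt as a VALUES list so it runs on its own. It orders by OrderID alone because the sample omits dates, and the `Orders` name is an assumption.

```sql
WITH Orders AS
(
    SELECT * FROM (VALUES
          (1, CAST(10.00 AS money), 'Joe')
        , (2, CAST(11.00 AS money), 'Sam')
        , (4, CAST(15.00 AS money), 'Joe')
        , (5, CAST(17.00 AS money), 'Sam')
        , (6, CAST(12.00 AS money), 'Joe')
    ) AS v(OrderID, OrderAmt, CustomerName)
)
SELECT OrderID
     , OrderAmt
     , CustomerName
     -- First order amount per customer; the default frame already starts at the first row.
     , FIRST_VALUE(OrderAmt) OVER (PARTITION BY CustomerName ORDER BY OrderID) AS FirstOrderAmt
FROM Orders
ORDER BY OrderID;
```

Every Joe row returns 10.00 and every Sam row returns 11.00. The LAG version remains useful as a pattern when the offset genuinely has to be computed per row.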

Percentile Functions: CUME_DIST, PERCENT_RANK, PERCENTILE_CONT, PERCENTILE_DISC

As I wrap up my discussion on window functions, the percentile based functions were the functions I knew the least about. While I have already used the position value functions above, I have not yet needed to use the percentiles. So, that meant I had to work with them for a while so I could share their usage and have some samples for you to use.

The key differences among the four functions have to do with ranks and values. CUME_DIST and PERCENT_RANK return a ranking value while PERCENTILE_CONT and PERCENTILE_DISC return data values.

CUME_DIST returns a value that is greater than zero and less than or equal to one (>0 and <=1) and represents the percentage group that the value falls into based on the order specified. PERCENT_RANK returns a value between zero and one as well (>=0 and <=1). However, in PERCENT_RANK the first group is always represented as 0, whereas in CUME_DIST it represents the percentage of the group. Both functions return the last percent group as 1. In both cases, as the ranking percentages move from lowest to highest, each group's percent value includes all of the earlier values in the calculation as well.

The following statement shows both of the functions using the default partition to determine the rankings of order amounts within our dataset.

select OrderID
,OrderDate
,OrderAmt
,CustomerName
,CUME_DIST() OVER (ORDER BY OrderAmt) CumDist
,PERCENT_RANK() OVER (ORDER BY OrderAmt) PctRank
FROM CTEOrders

Results:

OrderID	OrderDate	OrderAmt	CustomerName	CumDist	PctRank
1	3/1/2012	10	Joe	0.2	0
3	3/2/2012	10	Beth	0.2	0
7	3/4/2012	10	Beth	0.2	0
2	3/1/2012	11	Sam	0.33333333	0.214285714
10	3/4/2012	11	Beth	0.33333333	0.214285714
6	3/3/2012	12	Joe	0.46666667	0.357142857
9	3/4/2012	12	Joe	0.46666667	0.357142857
14	3/7/2012	13	Beth	0.53333333	0.5
11	3/5/2012	14	Sam	0.6	0.571428571
4	3/2/2012	15	Joe	0.66666667	0.642857143
15	3/7/2012	16	Sam	0.73333333	0.714285714
5	3/2/2012	17	Sam	0.86666667	0.785714286
12	3/6/2012	17	Beth	0.86666667	0.785714286
8	3/4/2012	18	Sam	0.93333333	0.928571429
13	3/6/2012	19	Joe	1	1
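
The numbers in that table can be reproduced by hand, which helps show what the two functions measure. CUME_DIST is the count of rows with a value less than or equal to the current one divided by the total row count, and PERCENT_RANK is (RANK - 1) divided by (row count - 1). The sketch below computes both ways side by side using the fifteen order amounts from the CTE; the `Amounts` name and the ByHand column names are assumptions.

```sql
WITH Amounts AS
(
    SELECT * FROM (VALUES (10),(11),(10),(15),(17),(12),(10),(18)
                        , (12),(11),(14),(17),(19),(13),(16)) AS v(OrderAmt)
)
SELECT OrderAmt
     , CUME_DIST() OVER (ORDER BY OrderAmt) AS CumDist
     -- Rows at or below the current amount, divided by all rows.
     , COUNT(*) OVER (ORDER BY OrderAmt RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) * 1.0
         / COUNT(*) OVER () AS CumDistByHand
     , PERCENT_RANK() OVER (ORDER BY OrderAmt) AS PctRank
     -- (rank - 1) / (rows - 1); NULLIF avoids dividing by zero when there is only one row.
     , (RANK() OVER (ORDER BY OrderAmt) - 1) * 1.0
         / NULLIF(COUNT(*) OVER () - 1, 0) AS PctRankByHand
FROM Amounts
ORDER BY OrderAmt;
```

For the three orders of 10, that works out to 3/15 = 0.2 for CUME_DIST and (1 - 1)/14 = 0 for PERCENT_RANK, matching the results above.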

The last two functions, PERCENTILE_CONT and PERCENTILE_DISC, return the value at the percentile requested. PERCENTILE_CONT interpolates, so it returns the true percentile value whether or not that value exists in the data. For instance, if the percentile group has the values 10 and 20, the median (0.5) is 15. PERCENTILE_DISC applied to the same group returns 10, because it only returns values that actually exist in the data: the smallest value whose cumulative distribution reaches the requested percentile. Both functions ignore NULL values and do not accept ORDER BY, ROWS, or RANGE inside the OVER clause; only PARTITION BY is allowed there. Instead, the ordering goes in a WITHIN GROUP (ORDER BY ...) clause, which takes a single column; for PERCENTILE_CONT that column must be numeric. Both functions need a percentile value between 0.0 and 1.0.

The following script illustrates a few variations. The first two functions return the median of the default partition. The next two return the median value for each day. Finally, the last function returns the low value within the partition. The values segmented by the date partition highlight the key difference between the two functions.

select OrderID as ID
,OrderDate as ODt
,OrderAmt as OAmt
,CustomerName as CName
,PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY OrderAmt) OVER() PerCont05
,PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY OrderAmt) OVER() PerDisc05
,PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY OrderAmt) OVER(PARTITION BY OrderDate) PerContDt
,PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY OrderAmt) OVER(PARTITION BY OrderDate) PerDiscDt
,PERCENTILE_CONT(0) WITHIN GROUP (ORDER BY OrderAmt) OVER() PerCont0
FROM CTEOrders

Results

ID	ODt	OAmt	CName	PerCont05	PerDisc05	PerContDt	PerDiscDt	PerCont0
1	3/1/2012	10	Joe	13	13.00	10.5	10.00	10
2	3/1/2012	11	Sam	13	13.00	10.5	10.00	10
3	3/2/2012	10	Beth	13	13.00	15.0	15.00	10
4	3/2/2012	15	Joe	13	13.00	15.0	15.00	10
5	3/2/2012	17	Sam	13	13.00	15.0	15.00	10
6	3/3/2012	12	Joe	13	13.00	12.0	12.00	10
7	3/4/2012	10	Beth	13	13.00	11.5	11.00	10
10	3/4/2012	11	Beth	13	13.00	11.5	11.00	10
9	3/4/2012	12	Joe	13	13.00	11.5	11.00	10
8	3/4/2012	18	Sam	13	13.00	11.5	11.00	10
11	3/5/2012	14	Sam	13	13.00	14.0	14.00	10
12	3/6/2012	17	Beth	13	13.00	18.0	17.00	10
13	3/6/2012	19	Joe	13	13.00	18.0	17.00	10
14	3/7/2012	13	Beth	13	13.00	14.5	13.00	10
15	3/7/2012	16	Sam	13	13.00	14.5	13.00	10
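
The 10-and-20 case described earlier can be checked directly with a two-row sample. This is a minimal sketch; the alias `v` and the output column names are assumptions. DISTINCT is there only because the windowed functions repeat the same answer on every row.

```sql
SELECT DISTINCT
       PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY v.Amt) OVER () AS MedianCont  -- interpolated: 15
     , PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY v.Amt) OVER () AS MedianDisc  -- existing value: 10
     , PERCENTILE_CONT(0)   WITHIN GROUP (ORDER BY v.Amt) OVER () AS LowValue    -- 10
     , PERCENTILE_CONT(1)   WITHIN GROUP (ORDER BY v.Amt) OVER () AS HighValue   -- 20
FROM (VALUES (10), (20)) AS v(Amt);
```

Percentiles 0 and 1 give the low and high ends of the partition, which is a handy sanity check when you are first working with these functions.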

As I wrap up this post, I have to give a shout out to my daughter, Kristy, who is an honors math student. She helped me get my head around this last group of functions. Her honors math work and some statistical work she had done in science helped provide additional insight into the math behind the functions. (Kristy – you rock!)

Series Wrap Up

I hope this series helps everyone understand the power and flexibility in the window functions made available in SQL Server 2012. If you happen to use Oracle, I know that many of these functions or their equivalents are also available in 11g and they also appear to be in 10g. I have to admit my first real production usage was with Oracle 11g, but I have since used them with SQL Server 2012. The expanded functionality in SQL Server 2012 is just one more reason to upgrade to the latest version.

Data on Wheels – Kristyna Ferris & Steve Hughes

Tag: TSQL
