Effectively Integrating FHIR Data from Azure Health Services

This blog is intended to be a follow up from the SQL Saturday 2022 in Oregon & SW Washington. In that session I presented an introduction to FHIR and JSON data produced from the Azure Health Services API’s.

With the recent updated mandates in the healthcare environment in the United States, Microsoft has continued to expand its capability to support the FHIR standard for integrating healthcare data. While the standard is well documented and Microsoft’s capabilities are expansive, it falls on data professionals to interpret that data and build meaningful reports and produce meaningful insights from the data as it is collected and integrated across environments. This requires a good working knowledge of JSON in SQL to manipulate complex data models. In the session, we did a short review of the FHIR standard and the overall implementation of FHIR in Azure. From there we reviewed the resulting data in the data lake and in Synapse. That was followed up with an overview into the heart of complex SQL using JSON functions in Synapse. Whether or not you are active in healthcare today, this will be an enlightening session on how to use JSON SQL functions within the Azure SQL platforms.

What is FHIR and why should you care?

FHIR stands for Fast Healthcare Interoperability Resources. this is the latest specification for interoperability in healthcare produced by HL7. To be clear the word fast has nothing to do with performance, but more about the ability to implement and integrate data quickly. With the latest regulations around the world in health care, this standard is the established standard for integrating healthcare data and we’ll continue to be on the forefront of this work. If you do any work in health care, you will need to understand FHIR because you will likely run across data formatted to the standard from many different sources.

FHIR is very well documented. In many ways when the standard is properly followed the JSON documents or other supported formats are effectively self-documenting. It is commonly understood that the core FHIR specification handles about 80% of the use cases in healthcare. It is designed to be flexible so that it can support specialized needs within regions or healthcare areas. For example, in the US there is a need to support race and ethnicity. The U.S. Core Implementation Guide provides guidance on the specification enhancements to support this need for U.S. healthcare organizations. You will find similar support for other countries as well as specific implementations for healthcare vendors such as Epic.

Neither the notebook, the presentation, or this blog is expected to be and exhaustive coverage of FHIR. before we move on to some of the other implementation pieces, it is important to understand one key aspect of FHIR is the basic building block called a resource. A resource is the core exchangeable content within the specification. All resources share the following characteristics:

  • A common way to define and represent the resource including data types and patterns
  • A common set of metadata which can be discovered easily
  • A human readable part

For more detailed information on the supported resources and other details around FHIR implementation, you should visit the following website:

Azure Health Services and the FHIR API

I will not be digging into a lot of the health care services information nor the FHIR support within Azure in this post. The important things to understand is that Microsoft has made a concerted effort to support this specification which includes technology and architectures for the extraction of data from various healthcare systems which will then use the FHIR APIs to standardize that extracted data into the FHIR spec typically in JSON files in the data lake. Because of the standardized format, Microsoft is able to supply a set of common schemas that can be used in serverless synapse to create external tables and views to accelerate the implementation and usage of data produced from the APIs. It is from this starting point that we are able to start working with the data in reporting and analytics solutions.

At this point I want to put a plug in for the company I work for. If you're interested in learning how Azure health services and the FHIR specification can be implemented at your company, we have FHIR Quick Start and FHIR Data Blueprint solutions. These solutions have been used by many other customers to achieve high levels of integration in their health care data estate. If you're interested in learning more, please reach out to us at: https://3cloudsolutions.com/get-started/

Working with the data from the FHIR API using JSON in SQL

As noted in the previous section, Azure Health Services comes with setup serverless tables and views to be used with the extracted data. However due to the complexity of FHIR, there are a number of columns within those tables and views which still contain JSON snippets. For example, there is one field for name which has several objects and arrays to support the specification. You cannot simply select the name from the table and use that as you move forward. There are many different fields like this throughout the data. For the rest of this blog and in the notebook, we will work through a number of scenarios to build a view of the patient resource that can be used for simple reporting. This view will contain a few JSON functions from SQL Server and solve simple to complex scenarios in the illustration.

The functions we will be using:

  • ISJSON
  • JSON_VALUE
  • OPENJSON

In addition to these functions, we will also be using the CROSS APPLY operator in SQL to join our data with relational data.

The examples in the notebook are built on the tables resulting from working with the Azure FHIR API. I am unable to provide a sample of the data to use with the set of information in the notebook currently. However, the SQL will work if you have your own FHIR implementation and a Patient resource to work with. rather than rewrite the entire contents of the notebook in the blog post, here is a link to the notebook.

If you plan to implement this in the same way, you will need Azure Data Lake, Azure Synapse serverless, and Azure Data Studio. the notebook can be opened in Azure Data Studio. If you are unfamiliar with working with notebooks inside of Azure Data Studio, you are not alone. Check out this post which discusses how to implement your first notebook in Azure Data Studio.

Building our view and SQL with JSON functions

If you decide not to open the notebook but are curious what the view looks like here is a finished product that we created in the notebook.

SELECT TOP (20) p.resourceType + '/' +  p.id as PatientResourceID
    , p.resourceType as ResourceType
    , p.id as ResourceID 
    , cast(p.[meta.versionId] as int) as VersionID 
    , cast(p.[meta.lastUpdated] as DATETIME2(7)) as LastUpdated 
    , JSON_VALUE(p.[name], '$[0].family') as LastName
    , JSON_VALUE(p.[name], '$[0].given[0]') as FirstName
    , cast(p.active as bit) as IsActive
    , p.gender as Gender 
    , CAST(p.birthDate as date) as BirthDate
    , CASE WHEN p.[maritalStatus.coding] is null THEN NULL
           WHEN  JSON_VALUE(p.[maritalStatus.coding], '$[0].system') = 'http://terminology.hl7.org/CodeSystem/v3-MaritalStatus' 
                    THEN JSON_VALUE(p.[maritalStatus.coding], '$[0].code')
           ELSE NULL
           END as MaritalStatus 
    , CASE WHEN JSON_VALUE(p.[address], '$[0].use') = 'home' THEN JSON_VALUE(p.[address], '$[0].state')
            WHEN JSON_VALUE(p.[address], '$[1].use') = 'home' THEN JSON_VALUE(p.[address], '$[1].state')
            WHEN JSON_VALUE(p.[address], '$[2].use') = 'home' THEN JSON_VALUE(p.[address], '$[2].state')
            WHEN JSON_VALUE(p.[address], '$[3].use') = 'home' THEN JSON_VALUE(p.[address], '$[3].state')
            ELSE NULL
            END as HomeStateOrProvince
    , e.Ethnicity
    , r.Race
FROM fhir.Patient p
INNER JOIN (SELECT id, max([meta.versionId]) as currentVersion FROM fhir.Patient GROUP BY id) cp
    ON p.[meta.versionId] = cp.currentVersion
    AND p.id = cp.id
LEFT JOIN 
    (SELECT p.id
        , CASE WHEN JSON_VALUE(ext.value,'$.extension[0].url') = 'ombCategory'
            THEN
            CASE WHEN JSON_VALUE(ext.value, '$.extension[1].valueString') IS NOT NULL  THEN JSON_VALUE(ext.value, '$.extension[1].valueString')
                    WHEN JSON_VALUE(ext.value, '$.extension[0].valueString') IS NOT    NULL THEN JSON_VALUE(ext.value, '$.extension[0].valueString')
                    ELSE JSON_VALUE(ext.value, '$.extension[0].valueCoding.display')
                    END
            ELSE JSON_VALUE(ext.value, '$.valueCodeableConcept.coding[0].display')
            END AS Ethnicity 
        FROM 
        (
            SELECT fp.id, fp.extension FROM fhir.Patient fp
            INNER JOIN (SELECT id, max([meta.versionId]) as currentVersion FROM fhir.Patient GROUP BY id) cp
                ON fp.[meta.versionId] = cp.currentVersion
                AND fp.id = cp.id
            WHERE ISJSON(fp.extension) =1
        ) p 
        CROSS APPLY 
            OPENJSON(p.extension,'$'
            ) as ext
        WHERE JSON_VALUE(ext.value,'$.url') = 'http://hl7.org/fhir/us/core/StructureDefinition/us-core-ethnicity'
    ) e on e.id = p.id 
LEFT JOIN 
    (SELECT p.id
        , CASE WHEN JSON_VALUE(ext.value,'$.extension[0].url') = 'ombCategory'
            THEN
            CASE WHEN JSON_VALUE(ext.value, '$.extension[3].valueString') IS NOT NULL THEN JSON_VALUE(ext.value, '$.extension[3].valueString')
                    WHEN JSON_VALUE(ext.value, '$.extension[2].valueString') IS NOT NULL THEN JSON_VALUE(ext.value, '$.extension[2].valueString')
                    WHEN JSON_VALUE(ext.value, '$.extension[1].valueString') IS NOT NULL THEN JSON_VALUE(ext.value, '$.extension[1].valueString')
                    WHEN JSON_VALUE(ext.value, '$.extension[0].valueString') IS NOT NULL THEN JSON_VALUE(ext.value, '$.extension[0].valueString')
                    ELSE JSON_VALUE(ext.value, '$.extension[0].valueCoding.display')
                    END
            ELSE JSON_VALUE(ext.value, '$.valueCodeableConcept.coding[0].display')
            END AS Race 
        FROM 
        (
            SELECT fp.id, fp.extension FROM fhir.Patient fp
            INNER JOIN (SELECT id, max([meta.versionId]) as currentVersion FROM fhir.Patient GROUP BY id) cp
                ON fp.[meta.versionId] = cp.currentVersion
                AND fp.id = cp.id
            WHERE ISJSON(fp.extension) =1
        ) p 
        CROSS APPLY 
            OPENJSON(p.extension,'$'
            ) as ext
        WHERE JSON_VALUE(ext.value,'$.url') = 'http://hl7.org/fhir/us/core/StructureDefinition/us-core-race'
    ) as r on r.id = p.id 

Here is a sample of the results from that view:

PatientResourceIDResourceTypeResourceIDVersionIDLastUpdatedLastNameFirstNameIsActiveGenderBirthDateMaritalStatusHomeStateOrProvinceEthnicityRace
Patient/d8af7bfa-5008-4a0f-85d1-0af3448a31ddPatientd8af7bfa-5008-4a0f-85d1-0af3448a31dd22022-05-31 18:07:03.2150000DUCKDONALD1male1965-07-14NULLONNULLNULL
Patient/78cf7725-a0e1-44a4-94d4-055482781afbPatient78cf7725-a0e1-44a4-94d4-055482781afb12022-05-31 18:07:30.7490000GretzkyWayneNULLNULL1990-05-31NULLNULLNULLNULL
Patient/9e909e52-61a1-be50-1878-a12ef8c36346Patient9e909e52-61a1-be50-1878-a12ef8c3634642022-05-31 18:39:58.1780000EVERYMANADAMNULLmale1988-08-18MNULLNon Hispanic or LatinoWhite+Asian
Patient/585f3cc0-c727-4989-9214-a7a7b60a2adePatient585f3cc0-c727-4989-9214-a7a7b60a2ade12022-05-31 13:14:57.0640000DUCKDONALD1male1965-07-15NULLONNULLNULL
Patient/29a819c4-f553-8189-2354-9441b86d37efPatient29a819c4-f553-8189-2354-9441b86d37ef12022-05-18 15:18:40.1560000FORDELAINENULLfemale1992-03-10NULLNULLNULLNULL
Patient/d5fe6802-a680-e762-8f43-9659340b00acPatientd5fe6802-a680-e762-8f43-9659340b00ac32022-05-18 14:39:52.2550000EVERYMANADAMNULLmale1961-06-15SNULLNULLC
Patient/4d661053-a8d0-148c-7023-54508fd04a52Patient4d661053-a8d0-148c-7023-54508fd04a5212022-05-21 13:48:24.9720000EVERYMANsamNULLmale1966-05-07MNULLNot Hispanic or LatinoWhite

Wrapping it up

As you can see, understanding the specification well enough to build a complex SQL statement using JSON functions is required to work within FHIR effectively. Due to the complex nature of the nested JSON, you may not be able to reconcile this in tools such as power BI. Being able to build this out in SQL guarantees that you have provided you will report writers and analysts with a solid result set which can be used with confidence.

Resources summary:

JSON in SQL Server 2016 – The Good, The Bad, and The Ugly

JSONonSQL

Microsoft introduced support for JSON data in SQL Server 2016 and in Azure SQL Database. I was excited to see this functionality coming. As an early user of XML when it was introduced, my expectations were high. Microsoft has done some really good work with JSON support, but I find that is not really as comprehensive as I had hoped.

The Good: Functions to Work with JSON Data

SQL Server now has some built in functions that support working with JSON. These are on par with XML data type methods. Here is the rundown on what they are and what they do.

OPENJSON

OPENJSON allows us to convert JSON to a tabular format. This function supports two output formats: default and explicit. Default returns key value pairs, where as explicit lets us define the context. Here is a sample of the syntax that can be used.

select Restaurant.RestaurantName, Restaurant.City, Restaurant.SeatsJSON, Seats.*
from dbo.Restaurant
cross apply

openjson(Restaurant.SeatsJSON, '$.Seats')
with (SeatNumber int '$."Seat Number"'
 , SeatCode varchar (50) '$.SeatCode'
 , TableNumber int '$.TableNumber' 
) as Seats

FOR JSON

FOR JSON works in reverse. This allows us to convert tabular data into JSON. AUTO automatically formats the results as arrays and nested arrays when joins are used. You are able to use aliases to as object names. PATH allows you to specify the JSON path you want to use for the results.

ISJSON

ISJSON can be used against a string value to determine if it is properly formatted. This can be handy when working with JSON functions or even in a CHECK constraint to make sure the column has properly formatted data.

JSON_VALUE

JSON_VALUE returns a scalar value from a JSON string using JSON path. The key here is that the value returned is scalar so working with arrays can sometimes be an issue if you cannot identify the position in the array.

JSON_VALUE (SeatsJSON, '$.Seats[0].SeatCode')

JSON_QUERY

JSON_QUERY returns a JSON formatted array or object using JSON path.

JSON_QUERY (SeatsJSON, '$.Seats')

JSON_MODIFY

JSON_MODIFY allows us to change the value based on the JSON path specified.

JSON_MODIFY(@RestJSON, '$.Restaurant.ZIP', '55337')

I know that I highlighted the basics around these functions. I plan to follow up on these later. These functions represent the good parts of the JSON implementation in SQL Server.

The Bad: Not a Datatype

My key disappointment with the implementation is the fact it is not a native data type like XML. At first I did not think this would be an issue, but after working with the new functionality, it feels incomplete. We can add a constraint with the ISJSON function to make sure our data is of the right type, but XML is just a data type. The biggest miss around the data type is likely concerning indexes as I talk about next.

The Ugly: Indexes

This is the worst part of the JSON solution. Because it is not a data type, no native indexing is supported. The current recommendation is to create a computed column using the JSON_VALUE function. However, this does not work with arrays, making the indexes of limited value. In a simple set of data, such as seats in a restaurant shown below, you cannot index the seats, only the restaurants.

{
    "Restaurants": [
        {
            "Restaurant": {
                "Restaurant ID": 1,
                "RestaurantName": "Sensational Servings MSP",
                "Seats": [
                    {
                        "SeatCode": "SSMSP-1-1",
                        "TableType": "Bar",
                        "Seat Number": 1
                    },
                    {
                        "SeatCode": "SSMSP-1-2",
                        "TableType": "Bar",
                        "Seat Number": 2
                    }                ]
            }
        },
        {
            "Restaurant": {
                "Restaurant ID": 2,
                "RestaurantName": "Sensational Servings LAS",
                "Seats": [
                    {
                        "SeatCode": "SSLAS-1-101",
                        "TableType": "Bar",
                        "Seat Number": 101
                    },
                    {
                        "SeatCode": "SSLAS-1-102",
                        "TableType": "Bar",
                        "Seat Number": 102
                    }                ]
            }
        }
    ]
}

So if I am looking for seats with the TableType Bar, we would not be able to do that with an index without storing the JSON file differently because each restaurant contains an array of seats. With the attribute as part of the array, we are unable to return every instance of the seats in the index. This will result in a table scan in most cases.

We can add indexes to simple JSON snippets, but complex JSON will result in table scans due to the nature of the functions. You should test your solutions to determine if the index is sufficient to support your query pattern.

The Moral of the Story

The JSON functionality is similar to XML data type methods. The lack of real index support will likely cause issues with the functionality at scale. Use the functions to help make JSON more usable in your environment, but be aware of its limitations as well.

VS Live 2016 – Las Vegas Follow Up

LVSPK18

I spoke at Visual Studio Live in Vegas on two topics. While the presentations have been uploaded to the site and were available for attendees, the code was not distributed yet as an oversight on my part. In this post, I will do a quick summary of the sessions and make sample code available. I will be writing more on these topics throughout the year and will tag VS Live in the notices.

JSON - VSLive

JSON & SQL Server Finally Together

JSON is now part of SQL Server 2016. SQL Server now includes functions to generate and shred JSON. Here are the basics:

  • OPENJSON: Used to convert JSON data into a tabular format
  • FOR JSON: Used to create JSON from tabular data
  • ISJSON: Determines if the data in question is JSON
  • JSON_VALUE: Returns scalar values from JSON data
  • JSON_QUERY: Returns JSON formatted arrays or objects
  • JSON_MODIFY: Used to modify JSON data and properties

With all of this support, JSON is not a native data type in SQL Server like XML.

You can download supporting files and code here.

Hive - VS Live

Using Hive and Hive ODBC with HDInsight and Power BI

During this session I went through the process of setting up HDInsight and loading data into the cluster. Once created, Hive tables were created and queries created that were used with Power BI to analyze the results.

You can find the details here.