Power BI and Data Security – Compliance and Encryption

Power BI Security Logo

As Power BI becomes more prevalent in data analytics and visualization within the enterprise, data security becomes a significant concern. Power BI at its best is deployed to the Power BI service hosted on Microsoft’s Azure platform. Every enterprise should understand the level of security available with their data. Companies who have made the leap to cloud technologies such as AWS, Microsoft Azure, Salesforce, and Microsoft Office 365 should have an understanding of the data compliance and security capabilities of those solutions. However, companies who want to take advantage of Power BI but have just started their cloud journey or are cloud adverse need to know the nuances of Power BI and security.

I have been involved with data and cloud security questions a lot of the past few years. With Power BI’s rise in significance, I have had to answer more specific questions about the service. In order to provide proper guidance and not have a reference for myself, I am putting together a short series of posts on various data security items in Power BI. The topics included enterprise gateway, privacy levels, data classification, and compliance. The focus of these articles are related to using the Power BI service as this is the cloud implementation of Power BI. The desktop has setting which impact deployment of assets, but is not the focus of this series.

The Power BI service is updated frequently. These articles were created based on the Power BI implementation in early April 2017. You may find improvements and changes that impact your experience that are based on newer releases. Feel free to add comments to highlight changes.

Power BI Compliance

Let’s start with the highest level of data security and that is compliance. I previously published a post about Power BI’s inclusion in the Microsoft Trust Center. Power BI became compliant nearly a year ago in April 2016. This was a huge step forward for being able to use Power BI in the enterprise.

PowerBI Compliance 2016

You can find the latest Power BI compliance here. This same site has additional security information I will refer to throughout the posts including high level information about data security and privacy.

Power BI and Data Encryption

One of the key areas of concern is related to data when it is added or passed through the service. In this section, we will review the how Power BI handles data at rest and data in transit. The content below is summarized from the Power BI Security Whitepaper (published September 2016).

Power BI Data at Rest

Data at rest is always encrypted in Azure. Depending on the type of data, Power BI uses encrypted storage in Azure Blob Storage and Azure SQL Database. Refer to the security whitepaper for details on how the encryption keys are handled.

The table below gives a summary of how data at rest is handled based on the data source or how the data is delivered to the visuals.

Data Source Metadata Data
Live Connection (Analysis Services) Nothing stored except database name encrypted in Azure SQL DB Nothing Stored
Direct Query (SQL Server, Oracle, etc.) Encrypted in Azure Blob Storage Nothing Stored
Pushed or streamed data Encrypted in Azure Blob Storage Depending on version, encrypted in either Azure Blob Storage or Azure SQL Database
Data loaded into model (data may be refreshable or nonrefreshable) Encrypted in Azure Blob Storage Encrypted in Azure Blob Storage

Power BI Data in Transit

Simply put, data is always encrypted in transit. The following is a direct quote from the security white paper:

All data requested and transmitted by Power BI is encrypted in transit using HTTPS to connect from the data source to the Power BI service. A secure connection is established with the data provider, and only once that connection is established will data traverse the network.

Power BI Data “in use”

As data moves to the dashboards and reports to be visualized, some data elements are cached to improve performance. Data is often cached for even Direct Query connections to improve dashboard performance. Cached data is encrypted and stored in an Azure SQL Database. Pinned visuals in the Power BI dashboards such as Excel and SSRS visualizations are also encrypted and cached in an Azure SQL Database.

References

Power BI Added to Microsoft Trust Center

Microsoft Trust Center – Power BI

Webinar: A walkthrough of Power BI Security and administration considerations

Power BI Security White Paper

T-SQL Tuesday #87 – Fixing Old Problems with Shiny New Toys: STRING_SPLIT

tsql2sday-300x300Thanks to Matt Gordon (@atsqlspeed) for hosting this T-SQL Tuesday.

Splitting Strings in SQL

A problem that has plagued SQL developers through the years is splitting strings. Many techniques have been used as more capabilities were added to SQL Server including XML datatypes, recursive CTEs and even CLR. I have used XML datatype methods to solve the problem most often. So, without further ado…

T-SQL Function: STRING_SPLIT

I have previously highlighted this function in a webinar with Pragmatic Works as a Hidden Gem in SQL Server 2016. It was not announced with great fanfare, but once discovered, solves a very common problem.

Syntax

STRING_SPLIT(string, delimiter)

The STRING_SPLIT function will return a single column result set. The column name is “value”. The datatype will be NVARCHAR for strings that are NCHAR or NVARCHAR. VARCHAR is used for strings that are CHAR or VARCHAR types.

Example

DECLARE @csvString AS VARCHAR(100)
SET @csvString = 'Monday, Tuesday, Wednesday, Thursday, Friday'
SELECT value AS WorkDayOfTheWeek 
FROM STRING_SPLIT (@csvString, ',');

The initial example returns the follow results:#tsql2sday

value
Monday
 Tuesday
 Wednesday
 Thursday
 Friday

As you can see in the example, the results returned a leading space which was in the original string. The following example trims leading and trailing spaces.

DECLARE @csvString AS VARCHAR(100)
SET @csvString = 'Monday, Tuesday, Wednesday, Thursday, Friday'
SELECT LTRIM(RTRIM(value)) AS WorkDayOfTheWeek 
FROM STRING_SPLIT (@csvString, ',');

The cleaned example returns the follow results:

value
Monday
Tuesday
Wednesday
Thursday
Friday

Thanks again Matt for this opportunity to share an underrated, but really useful shiny new tool in SQL Server 2016.

North Texas SQL Server User Group – September 2016 Follow

ntssugv6_1

Thanks for attending my presentation on window functions in TSQL during the September 15, 2016. I hope you learned something you can take back and use in your projects or at your work. You will find an link to the session and code I used below. If you have any questions about the session post them in comments and I will try to get you the answers.

Slides, Code, and Follow Up Posts

The presentation can be found here: A Window into Your Data

The code was put into a Word document that you can get here: TSQL Window Function Code

This session is also backed by an existing blog series I have written.

T-SQL Window Functions – Part 1- The OVER() Clause

T-SQL Window Functions – Part 2- Ranking Functions

T-SQL Window Functions – Part 3: Aggregate Functions

T-SQL Window Functions – Part 4- Analytic Functions

Microsoft Resources:

sqlsat563_webCome On Out to SQL Saturday 563 on September 24, 2016 for more great content

 

Power BI Is Finally in the Azure Trust Center

With the most recent announcement of Power BI’s inclusion in the Azure Trust Center, it is a good time to review where we are today with Power BI security and compliance as it relates to various customer needs. I do a lot of work with financial, energy, and medical customers. These groups represent a large amount of compliance and regulation needs. I wanted to understand where we are today and this announcement is significant.

What’s in the Announcement?

One the primary roadblocks to accepting the Power BI service has been the lack of compliance and concerns around security. Microsoft has been making a number of enterprise level improvement to the Power BI service and desktop. Power BI now has the following compliance certifications:

PowerBI Compliance 2016

This announcement shows Microsoft’s continued commitment to security and compliance in its cloud based products. While Power BI is not yet to the level of Office 365, some key compliance areas are now covered.

I think the most significant compliance certification is HIPAA/HITECH which removes barriers related for the medical industry. As hospitals, insurance companies, and providers scramble to meet reporting demands from their customers and the government, Power BI gives them a flexible reporting and visualization platform to meet those needs. It will empower self-service in the organizations and departmental or enterprise collaboration with data. The HIPAA/HITECH certification will allow them to use the platform with more confidence and security.

Beyond medical, more institutions will be able to rely on Power BI in a manner that is compliant and safe. As Microsoft continues this journey with Power BI and its other Azure based offerings, customers will be able to react more quickly to the changing business and regulatory environments with confidence in the security and management of their data.

The Reality – You Are as Secure as You Choose to Be

Even with this significant move by Microsoft, you are still responsible for implementing a secure, compliant solution. Microsoft is merely providing tools that are secure and will comply with regulations if implemented correctly. The key to a secure environment will always be you. The data you use and analyze with Power BI is ultimately your responsibility.

I encourage you to review the following resources in addition to the ones above as you determine your security and compliance within the Power BI product:

 

JSON in SQL Server 2016 – The Good, The Bad, and The Ugly

JSONonSQL

Microsoft introduced support for JSON data in SQL Server 2016 and in Azure SQL Database. I was excited to see this functionality coming. As an early user of XML when it was introduced, my expectations were high. Microsoft has done some really good work with JSON support, but I find that is not really as comprehensive as I had hoped.

The Good: Functions to Work with JSON Data

SQL Server now has some built in functions that support working with JSON. These are on par with XML data type methods. Here is the rundown on what they are and what they do.

OPENJSON

OPENJSON allows us to convert JSON to a tabular format. This function supports two output formats: default and explicit. Default returns key value pairs, where as explicit lets us define the context. Here is a sample of the syntax that can be used.

select Restaurant.RestaurantName, Restaurant.City, Restaurant.SeatsJSON, Seats.*
from dbo.Restaurant
cross apply

openjson(Restaurant.SeatsJSON, '$.Seats')
with (SeatNumber int '$."Seat Number"'
 , SeatCode varchar (50) '$.SeatCode'
 , TableNumber int '$.TableNumber' 
) as Seats

FOR JSON

FOR JSON works in reverse. This allows us to convert tabular data into JSON. AUTO automatically formats the results as arrays and nested arrays when joins are used. You are able to use aliases to as object names. PATH allows you to specify the JSON path you want to use for the results.

ISJSON

ISJSON can be used against a string value to determine if it is properly formatted. This can be handy when working with JSON functions or even in a CHECK constraint to make sure the column has properly formatted data.

JSON_VALUE

JSON_VALUE returns a scalar value from a JSON string using JSON path. The key here is that the value returned is scalar so working with arrays can sometimes be an issue if you cannot identify the position in the array.

JSON_VALUE (SeatsJSON, '$.Seats[0].SeatCode')

JSON_QUERY

JSON_QUERY returns a JSON formatted array or object using JSON path.

JSON_QUERY (SeatsJSON, '$.Seats')

JSON_MODIFY

JSON_MODIFY allows us to change the value based on the JSON path specified.

JSON_MODIFY(@RestJSON, '$.Restaurant.ZIP', '55337')

I know that I highlighted the basics around these functions. I plan to follow up on these later. These functions represent the good parts of the JSON implementation in SQL Server.

The Bad: Not a Datatype

My key disappointment with the implementation is the fact it is not a native data type like XML. At first I did not think this would be an issue, but after working with the new functionality, it feels incomplete. We can add a constraint with the ISJSON function to make sure our data is of the right type, but XML is just a data type. The biggest miss around the data type is likely concerning indexes as I talk about next.

The Ugly: Indexes

This is the worst part of the JSON solution. Because it is not a data type, no native indexing is supported. The current recommendation is to create a computed column using the JSON_VALUE function. However, this does not work with arrays, making the indexes of limited value. In a simple set of data, such as seats in a restaurant shown below, you cannot index the seats, only the restaurants.

{
    "Restaurants": [
        {
            "Restaurant": {
                "Restaurant ID": 1,
                "RestaurantName": "Sensational Servings MSP",
                "Seats": [
                    {
                        "SeatCode": "SSMSP-1-1",
                        "TableType": "Bar",
                        "Seat Number": 1
                    },
                    {
                        "SeatCode": "SSMSP-1-2",
                        "TableType": "Bar",
                        "Seat Number": 2
                    }                ]
            }
        },
        {
            "Restaurant": {
                "Restaurant ID": 2,
                "RestaurantName": "Sensational Servings LAS",
                "Seats": [
                    {
                        "SeatCode": "SSLAS-1-101",
                        "TableType": "Bar",
                        "Seat Number": 101
                    },
                    {
                        "SeatCode": "SSLAS-1-102",
                        "TableType": "Bar",
                        "Seat Number": 102
                    }                ]
            }
        }
    ]
}

So if I am looking for seats with the TableType Bar, we would not be able to do that with an index without storing the JSON file differently because each restaurant contains an array of seats. With the attribute as part of the array, we are unable to return every instance of the seats in the index. This will result in a table scan in most cases.

We can add indexes to simple JSON snippets, but complex JSON will result in table scans due to the nature of the functions. You should test your solutions to determine if the index is sufficient to support your query pattern.

The Moral of the Story

The JSON functionality is similar to XML data type methods. The lack of real index support will likely cause issues with the functionality at scale. Use the functions to help make JSON more usable in your environment, but be aware of its limitations as well.

Thoughts on data, business analytics, and the SQL Server community

%d bloggers like this: