PASSMN June 2020 – Data Classification with SQL Server and Azure

I presented at the virtual Minnesota SQL Server User Group meeting on June 16, 2020. The topic was data classification with SQL Server 2019 and Azure SQL Database.

Data Classification Basics

Data classification in both SQL Server and Azure allows you to discover and label data based on information type and sensitivity. Information type describes the content of the data at a high level. This includes types such as Address, Name, Networking, and Credit Card. By tagging your columns with types, you can easily see the types of data stored in your tables. You can also label the sensitivity. This includes labels such as Confidential and Confidential - GDPR.

Using SQL Server 2019 and SSMS 18.4+

For on-premises implementations, you can use SQL Server Management Studio. I recommend SSMS 18.4 or later, which has the most capability. SQL Server 2019 includes the sys.sensitivity_classifications system catalog view, so you can query it to see which fields have been labeled.
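As a rough sketch of querying that catalog view from a script, the Python below wraps a T-SQL query in a small helper. The helper name `fetch_classifications` and the column aliases are illustrative choices, not part of SQL Server. The function accepts any DB-API 2.0 cursor, so the driver is up to you.

```python
# Sketch: list classified columns from sys.sensitivity_classifications.
# major_id/minor_id are resolved to table and column names through
# sys.objects and sys.columns. The aliases are illustrative.
CLASSIFICATION_QUERY = """
SELECT
    SCHEMA_NAME(o.schema_id) AS schema_name,
    o.name                   AS table_name,
    c.name                   AS column_name,
    sc.information_type,
    sc.label,
    sc.rank_desc
FROM sys.sensitivity_classifications AS sc
JOIN sys.objects AS o
    ON o.object_id = sc.major_id
JOIN sys.columns AS c
    ON c.object_id = sc.major_id
   AND c.column_id = sc.minor_id
ORDER BY schema_name, table_name, column_name;
"""


def fetch_classifications(cursor):
    """Run the query on a DB-API 2.0 cursor and return one dict per column."""
    cursor.execute(CLASSIFICATION_QUERY)
    names = [col[0] for col in cursor.description]
    return [dict(zip(names, row)) for row in cursor.fetchall()]
```

With a driver such as pyodbc (an assumption; any DB-API driver for SQL Server should behave the same way), you would pass `connection.cursor()` to the helper and loop over the returned rows.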

To get started, open up SSMS. Right click the database and choose Tasks > Data Discovery and Classification > Classify Data. This will allow you to view the Data Classification window in SQL Server.

Finding the Data Discovery and Classification Options in SSMS

You will get a list of recommendations and the ability to add custom classifications in your SQL Server database.

The Data Classification view in SSMS

Once you have classified some of your data, you are able to view a report that shows the coverage of the classification work you have done.

Data Classification Report in SSMS

Adding Data Classification in Azure SQL Database

Azure SQL Database supports similar functionality for discovering and classifying data. The primary differences are (1) it requires Advanced Data Security which costs $15/month per server and (2) audit logging support is built in.

You can find this in the Azure portal with your SQL Database.

Advanced Data Security in Azure SQL Database

As you can see above, you initially get a visual summary. Click the Data Discovery & Classification panel to open a classification window similar to the one in SSMS. This will allow you to discover and classify your data.

The key difference is turning on auditing and logging information about people querying the classified data. In the Security section in your SQL Database view in the Azure portal, choose Auditing. You can now add auditing to your server or database. (Click here for information about setting up Auditing.) I chose to use Log Analytics which is in preview. Log Analytics has a dashboard which shows activity in your database with this data.

Log Analytics Dashboard which Shows Access to Sensitive Data

You can click into the dashboard to dig into details. You can also use the Log Analytics query features to build your own queries to further analyze the data. The details contain who accessed the information, their IP address, and what was accessed. You can build more reports from that information to support more sophisticated auditing.

Final Thoughts

I think that there is still work to be done on SQL Server to better support auditing. Azure is ahead of the game in this area. More importantly, Azure logging is a platform level solution. You should be able to integrate your logging from the applications to the database in Azure.

You do have the ability to update the policy in SQL Server with a JSON file. I recommend you export the file and modify it. In Azure, you can update the information policy in the Security Center. Updating this policy allows you to discover data or information that you want to classify based on rules you set up. This should be part of your data governance plan.

One other follow-up from the meeting. The question was raised about Visual Studio support in database projects. The answer is “sort of”. First, you need to make sure your project is targeting SQL Server 2019 or Azure SQL Database. Once that is set, you can use the following code to add the classification manually, or you can apply it to your database and do a Schema Compare to bring it in.

ADD SENSITIVITY CLASSIFICATION TO
    [SalesLT].[Customer].[FirstName]
    WITH (LABEL = 'Confidential - GDPR', LABEL_ID = 'fe62dcde-72c0-475c-b1af-fb8de4c8fc7e', INFORMATION_TYPE = 'Name', INFORMATION_TYPE_ID = '57845286-7598-22f5-9659-15b24aeb125e', RANK = MEDIUM);

You will need to know the GUIDs for the labels and types in your solution to do this manually. However, once this is done, you can see the information in the Properties window for the field as well.

Data Classification Properties in Visual Studio

The key thing to be aware of is that the properties are read only. You have to use the code to change them or do the changes in the database and use Schema Compare to bring them in.
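Since the statements have to be written by hand, a small generator can help when many columns share the same label and information type. The sketch below is one possible approach, not something the database project tooling provides. The function names are hypothetical, and the GUIDs are simply the ones from the statement shown earlier.

```python
# Sketch: generate ADD SENSITIVITY CLASSIFICATION statements for a
# database project. All function names here are hypothetical helpers.
VALID_RANKS = {"NONE", "LOW", "MEDIUM", "HIGH", "CRITICAL"}


def quote_name(name):
    """Bracket-quote an identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def quote_literal(value):
    """Single-quote a string literal, doubling embedded quotes."""
    return "'" + value.replace("'", "''") + "'"


def add_classification_sql(schema, table, column, label, label_id,
                           info_type, info_type_id, rank="MEDIUM"):
    """Return one ADD SENSITIVITY CLASSIFICATION statement as text."""
    if rank not in VALID_RANKS:
        raise ValueError(f"unsupported rank: {rank}")
    target = ".".join(quote_name(part) for part in (schema, table, column))
    options = ", ".join([
        f"LABEL = {quote_literal(label)}",
        f"LABEL_ID = {quote_literal(label_id)}",
        f"INFORMATION_TYPE = {quote_literal(info_type)}",
        f"INFORMATION_TYPE_ID = {quote_literal(info_type_id)}",
        f"RANK = {rank}",
    ])
    return f"ADD SENSITIVITY CLASSIFICATION TO\n    {target}\n    WITH ({options});"


# GUIDs copied from the example statement in this post.
LABEL_ID = "fe62dcde-72c0-475c-b1af-fb8de4c8fc7e"
NAME_TYPE_ID = "57845286-7598-22f5-9659-15b24aeb125e"

for column_name in ("FirstName", "LastName"):
    print(add_classification_sql("SalesLT", "Customer", column_name,
                                 "Confidential - GDPR", LABEL_ID,
                                 "Name", NAME_TYPE_ID))
```

The output can be pasted into a script in the project. Your own label and information type GUIDs may differ, so check them against your policy first.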

Thanks again to those of you who joined us at the meeting. Here is the slide deck from that meeting. I look forward to sharing more with all of you later.

How to Get Developers Using Azure

You know the talk out there: everyone is moving to the cloud, and people are looking at Azure to get there. But many have concerns about going to the cloud or about the unknowns around Azure. That’s why we created this blog series. Today, I want to tell you why I think it’s critical to help your development teams leverage Azure, which will help you as an organization.

First, there are 3 things Azure brings to you as a business and for your teams:

1.  You can learn new things quickly.

You can go into Azure, turn on a few features and work with them to try them out. By contrast, to learn to work and interact with Hadoop infrastructure on your own, for example, you’d have to take the time to set up a bunch of clusters. If your team is looking to take advantage of something new like containers or NoSQL, they can try these out in Azure without investing in infrastructure.

2.  It’s good prep for when you want to go to the cloud.

Bottom line – it’s different in the cloud. Whether you plan to move to the cloud soon or sometime in the future, you’re going to need to understand how subscriptions and components work and gain knowledge in dealing with interactions and how to work in the cloud.  Your team having a better understanding is key to your ability to successfully and effectively make the move.

3.  Breeding creativity within your IT organization and development teams.

Creativity can lead to new business opportunities and improvements to your business. Giving your team the opportunity to jump in and work with some tools can give them the ability to come up with some new ideas.

Check out Azure Data Week coming in October 2018.

Managing the $

The risk, as always, is cost. No one wants to spend money. But here are 3 things you can do to help manage your risk and cost:

  • Dev Test environments are available inside of Azure. This allows you to automatically set things to shut down. This great feature can help you manage costs.
  •  If your company uses MSDN or developer licenses, you can get a free amount of spend in Azure (from $25 – $200 per month).
  • Free trials. It’s easy to spin up multiple free trials, so you can test out something and reduce the risk of spend. This is a great opportunity to do a POC for things that have high cost risk, like Azure SQL Data Warehouse or HDInsight clusters. Just be sure to use them effectively as they do expire after a certain point.

Check out the current free offering and services from Microsoft here: https://azure.microsoft.com/en-us/free/.

Advantages of Azure Resource Manager (ARM) Deployment Model

Are you just starting out with Azure and wonder: What is Azure Resource Manager (ARM) Deployment Model? Or what’s the difference between ARM and Classic? Whether you’re just starting out with Azure or have been using it for some time, but you’re still using Classic, I’m here to give you 6 reasons why you should be using ARM.

1.  ARM lets you deploy, manage and monitor all the Azure resources for an application, or a solution, as a group. This can include almost every resource within Azure.

2. As you do ARM template deployment, you will be able to deploy as a unit, but not just once. You’ll have the ability to deploy multiple times through the lifecycle of your application, as well as manage that deployment process.

3.  Access control. Access to the resources can be managed as a unit. As you begin to think about separation of duties, compliance or specific rules around that application, you can apply access controls and those rules apply to the entire solution.

4.  Resources can be tagged. This is a great feature when you have a lot of components that you support using Azure (as we do). You’ll need a tag to note that these things are logically grouped together. We tag resources for the reason we are using them or the purpose of billing or just to identify them within the list of resources we are using.

5.  Templates. Why do templates matter? This is one of the most robust features of all the resources. JSON templates can be created to configure your entire pattern. So, if you’re doing a standard roll out, this allows you to create a template, parametrize it and you can deploy your resources as a group.

6.  You can define the dependencies to make sure it’s deployed in the correct order. Keeping this all straight can be one of the hardest things to do. Definitely look at ARM to help with this.
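To make the template, tagging, and dependency points concrete, here is a minimal sketch that builds an ARM template as a Python dictionary and prints it as JSON. The resource types, API versions, SKU, parameter names, and tag values are assumptions chosen for illustration, so check them against the current ARM reference before deploying anything.

```python
import json


def build_template(tags):
    """Build a small, parameterized ARM template with one dependency."""
    plan_id = "[resourceId('Microsoft.Web/serverfarms', parameters('planName'))]"
    return {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        # Parameters let the same template be deployed many times.
        "parameters": {
            "planName": {"type": "string"},
            "siteName": {"type": "string"},
        },
        "resources": [
            {
                "type": "Microsoft.Web/serverfarms",
                "apiVersion": "2022-03-01",  # assumed API version
                "name": "[parameters('planName')]",
                "location": "[resourceGroup().location]",
                "tags": tags,
                "sku": {"name": "F1"},
            },
            {
                "type": "Microsoft.Web/sites",
                "apiVersion": "2022-03-01",  # assumed API version
                "name": "[parameters('siteName')]",
                "location": "[resourceGroup().location]",
                "tags": tags,
                # dependsOn makes ARM create the plan before the site.
                "dependsOn": [plan_id],
                "properties": {"serverFarmId": plan_id},
            },
        ],
    }


template = build_template({"purpose": "demo", "billing": "team-a"})
print(json.dumps(template, indent=2))
```

Saving the printed JSON to a file gives you something you can deploy to a resource group and redeploy as the application changes.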

Here’s a bonus 7th reason to use ARM. More than likely, Classic will be going away. No predictions on when or how this will look, but we do know almost every new thing in Azure will be going to ARM first. So, if you’re new to Azure and want to take advantage of the newest technology, use ARM.

Check out Azure Data Week coming in October 2018 – http://www.AzureDataWeek.com.

Cosmos DB for the Data Professional

Cosmos DB is one of the fastest growing Azure services in 2018. As its popularity grows, data professionals are faced with a changing reality in the world of data. Data is no longer contained in relational databases as a general rule. We saw the start of this with Hadoop data storage, but no one ever referred to Hadoop as a database. Sure, Hive and other Hadoop-based technologies made the data look like a database, but we (data professionals) were able to keep our distance. What’s changed?

The Cloud, Data, and Databases

As the cloud reaches more and more businesses, traditional data stores are being reconsidered. We now have data stored in Azure – Azure Data Lake, Azure Storage, Azure Database Services (SQL, PostgreSQL, MySQL), Azure Data Warehouse, and now Cosmos DB. Cosmos DB is the globalized version of Azure Document DB (more about that later). If we are to grow our skillsets and careers as cloud data professionals, we need to know more about other ways the data is stored and used. I want to summarize some things that we need to be aware of about Cosmos DB. If your business uses it or plans to and you are a data pro, you will need to know this.

Introducing Cosmos DB

Azure Cosmos DB is Microsoft’s globally distributed, multi-model database.

Cosmos DB Overview 201804

Source: https://docs.microsoft.com/en-us/azure/cosmos-db/introduction 

I will break down key components of Cosmos DB with a data professional in mind. There are a lot of aspects of Cosmos DB that make it very cool, but you will want to understand this when you get the call to fix the database.

Multi-model Database Service

Currently Cosmos DB supports four database models. This is like having four different database servers in one. I liken it to having SQL Server Database Engine and SQL Server Analysis Services using the same underlying engine and it only “looks different.” Cosmos DB refers to these as APIs. The API is chosen when the database is created. This optimizes the portal and database for use with that API. Other APIs can be used to query the data, but it is not optimal. Here are the four models supported and the APIs that support them.

Cosmos DB models

  • Key Value Pair: This is exactly as it sounds. The API is implemented with the Azure Table Storage APIs.
  • Wide Column or Column Family: This stores data similar to relational, but there is no row consistency (each row can look different). Cosmos DB uses the Cassandra API to support this model. (For more information on Cassandra click here.)
  • Documents: This model is based on JSON document storage. Cosmos DB currently supports two APIs for this model: SQL which is the Document DB API and Mongo DB. These are the most common models used in Cosmos DB today. Document DB is the “parent” to Cosmos DB which was rebranded.
  • Graph: Graph databases are used to map relationships in data and were made popular with Facebook for instance. Microsoft uses the open source Gremlin API to support the Graph Database Model.

None of these databases are traditional row/column stores. They are all variations of NoSQL databases.

Turnkey Global Distribution

This is a key attribute for Cosmos DB. Cosmos DB can be easily distributed around the world. Click the data center you want to replicate to and Cosmos DB takes care of the rest. Cosmos DB uses a single write node and multiple read nodes. However, because Cosmos DB was built with global distribution in mind, you can easily and safely move the write node as well. This allows you to “chase the sun” and keep write operations happening “locally”.

Data Consistency

Data consistency is a primary concern of any data professional. The following tables compare Cosmos DB Consistency Levels with SQL Server Isolation Levels. These are not a one for one match, but demonstrate the different concerns between the systems.

 

Cosmos DB (Consistency Level, Guarantees) compared with SQL Server (Isolation Level, Dirty Read, Non-repeatable Read, Phantom)

| Consistency Level | Guarantees | Isolation Level | Dirty Read | Non-repeatable Read | Phantom |
| Strong | Reads are guaranteed to return the most recent version of an item. | Serializable | No | No | No |
| Bounded Staleness | Consistent Prefix or read order. Reads lag behind writes by prefixes (K versions) or time (t) interval. | Snapshot | No | No | No |
| Session | Consistent Prefix. Monotonic reads, monotonic writes, read-your-writes, write-follows-reads. | Repeatable Read | No | No | Yes |
| Consistent Prefix | Updates returned are some prefix of all the updates, with no gaps. Reads are not read out of order. | Read Committed | No | Yes | Yes |
| Eventual | Out of order reads. | Read Uncommitted | Yes | Yes | Yes |

As you can see, there are some similarities. These options are important to understand. In Cosmos DB, the more consistent you need the data to be, the higher the latency in the distributed data. As a result, most Cosmos DB solutions start with Session consistency, as this gives a good, consistent user experience while reducing latency in the read replicas.

Throughput

I am not going to dig into this much, but you need to understand that Request Units (RUs) are used to guarantee throughput in Cosmos DB. As a baseline, Microsoft recommends assuming that reading a 1 KB JSON document costs 1 RU. Capacity is reserved per second. You pay for what you reserve, not what you use. If you exceed your reserved capacity in a given second, your requests will be throttled. RUs are provisioned by region and can vary by region as a result, but they are not shared between regions. This will require you to understand usage patterns in each region where you have a replica.
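A back-of-the-envelope sketch of that arithmetic follows. It assumes read cost scales linearly with document size from the 1 RU per 1 KB baseline, which is a simplification since real costs depend on the operation and indexing. Both function names are hypothetical.

```python
import math


def estimate_read_rus(item_size_kb, reads_per_second):
    """Rough RU/s needed for reads, assuming about 1 RU per 1 KB read."""
    ru_per_read = max(1.0, item_size_kb)  # assumed linear scaling
    return math.ceil(ru_per_read * reads_per_second)


def throttled_requests(provisioned_rus, request_costs):
    """Count requests in one second that do not fit the reserved RUs."""
    used = 0.0
    throttled = 0
    for cost in request_costs:
        if used + cost > provisioned_rus:
            throttled += 1  # capacity for this second is exhausted
        else:
            used += cost
    return throttled


needed = estimate_read_rus(item_size_kb=1, reads_per_second=500)
print("RU/s to reserve:", needed)
print("Throttled at 400 RU/s:", throttled_requests(400, [1.0] * 500))
```

Running the numbers per region this way is a quick check on whether a reservation matches the usage pattern you expect there.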

Scaling and Partitions

Within Cosmos DB, partitions are used to distribute your data for optimal read and write operations. It is recommended to create a granular key with highly distinct values. The partitions are managed for you. Cosmos DB will split or merge partitions to keep the data properly distributed. Keep in mind your key needs to support distributed writes and distributed reads.
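To see why a granular key matters, the sketch below hashes keys into a fixed number of buckets and compares a high-cardinality key with a low-cardinality one. The MD5-based hash and the bucket count are stand-ins for illustration only. Cosmos DB's own hashing and partition management are internal to the service.

```python
import hashlib
from collections import Counter


def partition_for(key, partition_count):
    """Map a key to a bucket with a stable hash (illustrative only)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def distribution(keys, partition_count):
    """Count how many items land in each bucket."""
    return Counter(partition_for(key, partition_count) for key in keys)


# Highly distinct key: one value per user.
user_keys = [f"user-{i}" for i in range(10_000)]
# Low-cardinality key: only two distinct values.
country_keys = ["US"] * 9_000 + ["CA"] * 1_000

print("by user id:", sorted(distribution(user_keys, 4).values()))
print("by country:", sorted(distribution(country_keys, 4).values()))
```

The user id key spreads items across all four buckets, while the country key can never use more than two, and one of them ends up holding most of the data and most of the writes.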

Indexing

By default, everything is indexed. It is possible to use index policies to influence the index operations. Index policies are modified for storage, write performance, and read or query performance. You need to understand your data very well to make these adjustments. You can include or exclude documents or paths, configure the index type, and configure the index update mode.  You do not have the same level of flexibility in indexes found in traditional relational database solutions.
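As an illustration of what an index policy looks like, the sketch below builds one as a Python dictionary and prints the JSON. The property names follow the Cosmos DB indexing policy format as I understand it, and the excluded path `/rawPayload/*` is a made-up example, so treat this as a starting point rather than a recommended policy.

```python
import json


def build_indexing_policy(excluded_paths):
    """Index everything except the given paths (illustrative policy)."""
    return {
        "indexingMode": "consistent",  # index is updated with each write
        "automatic": True,
        "includedPaths": [{"path": "/*"}],
        "excludedPaths": [{"path": path} for path in excluded_paths],
    }


# "/rawPayload/*" is a hypothetical large property that is never queried.
policy = build_indexing_policy(["/rawPayload/*"])
print(json.dumps(policy, indent=2))
```

Excluding a path you never filter or sort on reduces index storage and write cost, at the price of slower queries if you later need that path.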

Security

Cosmos DB is an Azure data storage solution which means that the data at rest is encrypted by default and data is encrypted in transit. If you need RBAC, Azure Active Directory (AAD) is supported in Cosmos DB.

SLAs

I think that the SLAs Microsoft provides with Cosmos DB are a key differentiator for them. Here is the short summary of guarantees Microsoft provides:

  • Latency: 99.99% of P99 Latency Attainment (based on hours over the guarantee)
    • Reads under 10 ms
    • Writes under 15 ms
  • Availability
    • All up – 99.99% by month
    • Read – 99.999% by month
  • Throughput – 99.99% based on reserved RUs (number of failures to meet reserved amount)
  • Consistency – 99.99% based on setting

These are financially backed SLAs from Microsoft. Imagine you providing these SLAs for your databases. This is very impressive.

Wrap Up

For more information, check out Microsoft’s online documentation on Cosmos DB.

I presented this material at the April 2018 PASS MN User Group Meeting. The presentation can be found here.

Power BI Data Security – Sharing in Email

 

Microsoft has expanded sharing by allowing users to share Power BI content via email. In a previous post, I discussed how sharing content within your organization should be handled carefully. However, the new process opens up the opportunity to share outside your organization by sending an email. In particular, you can now share with users who have a personal email address such as @outlook.com and @gmail.com. Let’s dig into the implications of this capability.

Sharing Using Email

First, you need to be aware that this functionality is as simple as the original methods of sharing. You click the Share button on your report or dashboard to open the Share dialog.

The Share report dialog in this case accepts email addresses, which is not a significant change. However, as shown below, you can add personal emails and emails outside your organization. You will be warned, but users do not always pay attention to this or understand the implications.

Share report - outside

You will also notice that consumers still need to have a Power BI Pro account assigned to them, or you need to be using Power BI Premium for this to work.

Following the Email Process

When you share, you usually will need to send an email to the recipient. Here is the email content.

Report Share Email

Time to click the report link. This opens a series of dialogs which determine how much access you have. It is important to note that this is all made possible with Azure B2B. More about that in a moment. Let’s trace the story through. The link opens the following page.

Report Share Email - Welcome Link

As you can see, the next step is to log in. I am using an outlook.com account so it prompts me to authenticate. Once I have authenticated, I get the following notice.

Report Share Email - Opened Report

My account does not have Power BI Pro, but now I can try it for free for 60 days and get access to the data while I am on the trial. I clicked both options, because I can. The Upgrade account option would require me to pay for Pro. However, Try Pro for free works and I was able to access the report fully. I have successfully shared my corporate content with a personal user.

Preventing Sharing Outside Your Organization

While in some cases you need to share outside your organization, we will assume here that you need to disable this functionality. There are a few places you can make this happen.

Power BI Admin Portal

First, in Power BI go to the Admin portal and disable sharing outside your organization. If you have followed my previous advice, this will already be disabled.

 

PBI Admin Portal - Disable Sharing

As you can see, this will disable content for users who have been shared with previously. If you need to share, you can specify groups that have that permission.

Office 365 Admin Center

Next, this can be turned off in the Office 365 Admin Center in the Security and privacy area.

PBI O365 Admin Center - Disable Sharing

This prevents the ability to add guest users to the organization. This will disable this capability across Office 365. There is no option to allow some users this access. Once this is disabled, sharing outside the organization which requires a guest user will not be possible.

Azure Active Directory

Finally, you can shut this down from Azure Active Directory. Guest users are ultimately managed through Azure Active Directory and this is the best place to turn this off corporately if you do not need this functionality.

PBI AAD - Disable Sharing

In AAD you have four options.

  1. Guest user permissions are limited. This limits guest user capabilities with regard to the directory. Yes is the default and recommended.
  2. Admins and users in the guest inviter role can invite. This would be a typical option we can understand. However, it is important to note that Admin users in Power BI workspaces will have the ability to create guest users and share reports externally with this permission on.
  3. Members can invite. Just like it sounds. Any member of a group can invite guest users in.
  4. Guests can invite. This allows guests to invite other guests. Seems dangerous to me.

As you can see from my tenant, the options are all on which is the default. Be sure to understand what capability you want to use and set it appropriately within your tenant.

Tracking Sharing

In the Office 365 logging, you can see who and what has been shared. This log covers internal and external shares and should be monitored for auditing and compliance purposes.

Azure B2B

Azure B2B and the sharing capabilities in Power BI go hand in hand. This allows organizations to share content in a controlled fashion to consumers outside their organization. While this is required for certain scenarios, be mindful of who has the capability to share, and track sharing to make sure the data is being handled as you require.

Final Thoughts and References

You need to remember that sharing is at the heart of Power BI and you need to manage how and who can share. If you need to do more extensive sharing, by all means, use these features. For those who need to lock it down tighter, you can follow the steps above to prevent sharing until you have a process and pattern. Power BI continues to improve and grow, and as that happens we can expect more security options to support the new functionality. Enjoy Power BI. It is a great tool and will only continue to get better.

References

Using Azure AD B2B with Power BI

Auditing Power BI

Share your Power BI content with anyone by email