PASSMN June 2020 – Data Classification with SQL Server and Azure

I presented at the virtual Minnesota SQL Server User Group meeting on June 16, 2020. The topic was data classification with SQL Server 2019 and Azure SQL Database.

Data Classification Basics

Data classification in both SQL Server and Azure allows you to discover and label data based on information type and sensitivity. Information type describes the content of the data at a high level; it includes types such as Address, Name, Networking, and Credit Card. By tagging your columns with types, you can easily see what kinds of data are stored in your tables. You can also label the sensitivity, using labels such as Confidential and Confidential - GDPR.

Using SQL Server 2019 and SSMS 18.4+

For on-premises implementations, you can use SQL Server Management Studio. I recommend SSMS 18.4 or greater, as it has the most capability. SQL Server 2019 includes the sys.sensitivity_classifications system catalog view, so you can query to see which fields have been labeled.
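
For example, a query along these lines (the joins to sys.objects, sys.columns, and sys.schemas are only there to resolve the table and column names) lists the classified columns in the current database:

    -- List classified columns with their labels and information types
    SELECT s.name AS schema_name, o.name AS table_name, c.name AS column_name,
           sc.label, sc.information_type
    FROM sys.sensitivity_classifications sc
        JOIN sys.objects o ON o.object_id = sc.major_id
        JOIN sys.columns c ON c.object_id = sc.major_id AND c.column_id = sc.minor_id
        JOIN sys.schemas s ON s.schema_id = o.schema_id;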

To get started, open SSMS. Right-click the database and choose Tasks > Data Discovery and Classification > Classify Data.

Finding the Data Discovery and Classification Options in SSMS

This opens the Data Classification window in SQL Server, where you will get a list of recommendations and the ability to add custom classifications in your SQL Server database.

The Data Classification view in SSMS

Once you have classified some of your data, you are able to view a report that shows the coverage of the classification work you have done.

Data Classification Report in SSMS

Adding Data Classification in Azure SQL Database

Azure SQL Database supports similar functionality for discovering and classifying data. The primary differences are that (1) it requires Advanced Data Security, which costs $15 per server per month, and (2) audit logging support is built in.

You can find this in the Azure portal with your SQL Database.

Advanced Data Security in Azure SQL Database

As you can see above, the portal initially gives you a visual summary. Click the Data Discovery & Classification panel to open a classification window similar to the one in SSMS. This will allow you to discover and classify your data.

The key difference is the ability to turn on auditing and log information about who is querying the classified data. In the Security section of your SQL Database view in the Azure portal, choose Auditing. You can now add auditing to your server or database. I chose to use Log Analytics, which is in preview. Log Analytics provides a dashboard that shows activity against this data in your database.

Log Analytics Dashboard which Shows Access to Sensitive Data

You can click into the dashboard to dig into details. You can also use the Log Analytics query features to build your own queries to further analyze the data. The details contain who accessed the information, their IP address, and what was accessed. You can build more reports from that information to support more sophisticated auditing.

Final Thoughts

I think that there is still work to be done on SQL Server to better support auditing. Azure is ahead of the game in this area. More importantly, Azure logging is a platform level solution. You should be able to integrate your logging from the applications to the database in Azure.

You do have the ability to update the policy in SQL Server with a JSON file. I recommend you export the file and modify it. In Azure, you can update the information policy in the Security Center. Updating this policy allows you to discover data or information that you want to classify based on rules you set up. This should be part of your data governance plan.

One other follow-up from the meeting: a question was raised about data classification support in Visual Studio database projects. The answer is “sort of”. First, make sure your project is targeting SQL Server 2019 or Azure SQL Database. Once that is set, you can use the following code to add the classification manually, or you can apply it to your database and do a schema compare to bring it in.

ADD SENSITIVITY CLASSIFICATION TO
    [SalesLT].[Customer].[FirstName]
WITH (
    LABEL = 'Confidential - GDPR',
    LABEL_ID = 'fe62dcde-72c0-475c-b1af-fb8de4c8fc7e',
    INFORMATION_TYPE = 'Name',
    INFORMATION_TYPE_ID = '57845286-7598-22f5-9659-15b24aeb125e',
    RANK = MEDIUM
);

You will need to know the GUIDs for the labels and types in your solution to do this manually. However, once this is done, you can see the information in the Properties window for the field as well.

Data Classification Properties in Visual Studio

The key thing to be aware of is that these properties are read-only. You have to use code to change them, or make the changes in the database and use Schema Compare to bring them in.
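
As noted above, you need the GUIDs for the labels and information types to write the statement by hand. If you have already classified a column through SSMS or the Azure portal, one convenient way to find them is to read them back from the catalog view mentioned earlier:

    -- Retrieve the label and information type GUIDs from existing classifications
    SELECT label, label_id, information_type, information_type_id, rank_desc
    FROM sys.sensitivity_classifications;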

Thanks again to those of you who joined us at the meeting. Here is the slide deck from that meeting. I look forward to sharing more with all of you later.

Setting Up Tabular Models on Windows Azure

In my last post, I discussed how to set up Oracle in Windows Azure. During a customer call, there were questions about using SQL Server Analysis Services (SSAS) tabular models with Azure. This post walks through setting up an Azure VM and deploying a tabular model to that VM.

If you do not have a Windows Azure account, you can use a trial account with your Microsoft or Live account. Check out http://www.windowsazure.com for details on how to “try it free.”

Setting Up the VM in Azure

From the Management Portal on your Azure account, select Virtual Machines on the left, then Add at the bottom. On the next screen, choose to create your VM from the gallery. You should see the Choose an Image option as seen below. As you can see, I have the SQL Server 2012 SP1 Enterprise image highlighted. You will need to use the Enterprise edition, as Tabular does not run on Standard. In this case, the Data Warehousing image is NOT the Business Intelligence Edition of SQL Server.

The Choose an Image gallery with SQL Server 2012 SP1 Enterprise highlighted

You can also choose to create a “blank” VM and load up SQL Server on your own. I chose to use the image to speed up the process – which it did substantially.

After selecting the image, the next few steps guide you through setting up the VM. For the most part, the defaults will work fine. Be aware that once this is turned on, you will be charged for it running. It is up to you to make sure you understand the costs, even if you are using the free trial.

During the setup steps, you will create the VM and its related cloud service. Remember that the account you create is the admin account for the VM, and you will need those credentials to Remote Desktop into the VM. The last setup page is Endpoints. Leave the defaults; we will add an endpoint for our tabular model later.

At this point, it will take a few minutes to set up your new VM. Once it is set up, open a Remote Desktop session into it. If you look at the services or at the SQL Server Configuration Manager, you will notice that everything except a tabular instance has been set up for you. As a result, I would not recommend using this gallery image for a production deployment. You should look at creating your own template if you want a more locked-down and refined setup.

Setting Up the Tabular Instance in Azure

As noted before, the tabular instance is not set up. The installation media is on the server, so you can run that to install your tabular instance. I won’t walk through the install process, but this was a great find because it meant I did not have to copy media to my VM.

Making the Tabular Instance Available

This section covers the tedious tasks required to make your tabular instance available for querying outside of the VM. There are three basic steps to getting your instance “online”: setting the port number in SSAS, updating the firewall, and adding endpoints. I will walk you through the steps I used to get this done followed by some references that helped me get here.

Setting the Port Number in SSAS

By default, SSAS instances, both multidimensional and tabular, use dynamic ports. In order to allow connections through the firewall and endpoints, the port number needs to be fixed. I used guidance from TechNet and followed the steps below to set the port.

    1. Open Task Manager to get the PID for the MSOLAP$<<instance name>> service.
    2. Run netstat -ao -p TCP to get the list of ports used by current processes, and find the port owned by that PID. Once I had identified my port number, I also noted the server IP address, which is required in the next step.
    3. I chose to confirm that I had the correct port by connecting to the instance using the IP address and port number.
    4. Next, we have to go old school and modify the msmdsrv.ini file. The typical install path for this file is C:\Program Files\Microsoft SQL Server\<<instance name>>\OLAP\Config\msmdsrv.ini.
    5. Open the file in Notepad and find the <Port>0</Port> tag.
    6. Change the port number to the port number identified above (see the example just after this list). (Technically, we could have used any open port number. I chose this one because I was sure it was available.)
    7. Save the changes and restart the service.
    8. Once again, confirm you can connect to the server with SSMS using the IP address and port number.
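
As a concrete illustration (the port number below is just a made-up example; use the port you identified), the change to msmdsrv.ini looks like this:

    <!-- before: dynamic port assignment -->
    <Port>0</Port>

    <!-- after: fixed port (example value only) -->
    <Port>52382</Port>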

Now you have set up the SSAS instance to use a fixed port number.

Updating the Firewall

Now that we have a port number, we can create a firewall rule. We access the firewall rules from Server Manager. In the Windows Firewall console, we will create a new Inbound Rule.

The New Inbound Rule Wizard in the Windows Firewall console

  1. The rule type is Port.
  2. We apply the rule to TCP and specify the port we defined above.
  3. On the Action page, select Allow the connection. (If you are planning to use this in a production environment, you will need to verify your connection requirements.)
  4. On the Profile page, clear any profiles you do not want the rule to apply to.
  5. Finally, give the rule a descriptive name.

Now we have the firewall rule in place to allow external connections for the tabular instance.
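
If you would rather script the rule than click through the wizard, a netsh command along these lines should create an equivalent inbound rule (the rule name and port number here are just examples):

    netsh advfirewall firewall add rule name="SSAS Tabular" dir=in action=allow protocol=TCP localport=52382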

Adding Endpoints

The final step to making the instance available is to add the endpoints in Azure. In the Windows Azure portal, we need to go to the VMs page again, select our VM, and open the ENDPOINTS tab. From here we create a new endpoint.

  1. We are creating a standalone endpoint.
  2. Next, we select the TCP protocol and add the port number to both the private and public port textboxes.
  3. Finally, we apply the changes.

We are now ready to test connectivity.

References

Setting up VM Endpoints

Configuring Windows Firewall

Configuring Windows Firewall with SSAS

Connecting to the Tabular Instance in Azure

So to verify this works, I wanted to connect to the model with SSMS on my desktop. However, SSAS uses Windows authentication, and I am not on the same domain. (My IT staff is likely still laughing about me joining my VM to our domain.)

Thankfully, Greg Galloway (blog) reminded me of how to set up runas to use different credentials to connect to SSAS. Here is the syntax I used to connect to the tabular instance on Azure using a command window:

runas /netonly /user:<<VM name>>\<<username>> "C:\Program Files (x86)\Microsoft SQL Server\110\Tools\Binn\ManagementStudio\Ssms.exe"

This allowed me to open SSMS using those credentials. Next, I used the VM address plus the port to connect. You will see that the Windows credentials in the dialog are not what you entered on the command line. This is expected; the credentials you entered on the command line will still be passed through to the VM.

Deploying a Tabular Model to Azure

I attempted three different ways to deploy my model to the Azure VM. Two worked, one did not.

Deploying from Visual Studio on My Desktop

My first attempt was to deploy from Visual Studio on my desktop. I set the deployment properties to point to the Azure instance using the same credentials I had used with SSMS. I also set up a command-line execution with runas, as I had done with SSMS.

It appeared I could connect, but I continually got a permissions error which is shown below. After much frustration, I gave up on it and moved to the next options.

The permissions error returned when deploying from Visual Studio on my desktop

Using XMLA to Deploy

This is the most straightforward way to deploy an SSAS database to a different server. I used SSMS to generate the Create Database XMLA statement. Because I had not deployed it locally, I needed to modify the XMLA statement to remove the user name and GUID from the database name and database ID (AdvWorksTab1_SteveH_<<Some GUID>>).
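
For reference, the script SSMS generates has roughly this shape (heavily trimmed here; the real script contains the full object definition). The Name and ID elements are the ones to edit when you remove the user name and GUID:

    <Create xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
      <ObjectDefinition>
        <Database>
          <ID>AdvWorksTab1</ID>
          <Name>AdvWorksTab1</Name>
          <!-- remaining object definition as scripted by SSMS -->
        </Database>
      </ObjectDefinition>
    </Create>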

In a bit of irony, I could use the SSMS connection from my desktop (through runas) to deploy the database to the VM.

The reality is that this is an easy and acceptable way to deploy SSAS databases to production environments, whether in Azure or in your own datacenter.

Deploying from Visual Studio on the VM

The final method was using Visual Studio on the VM and deploying from there. I used VisualStudio.com (TFS online) to transfer the source code from my desktop to the VM. I had to install the TFS client on the VM, but SSDT with BI tools was already there.

  1. Installed the VS 2010 TFS Client: http://www.microsoft.com/en-us/download/details.aspx?id=329
  2. Then installed Visual Studio SP1  http://social.msdn.microsoft.com/Forums/vstudio/en-US/4e4851dc-eb29-4081-9484-d38a6efa07ee/unable-to-connect-to-tfs-online?forum=TFService
  3. Finally installed VS2010 Team Foundation Server Compatibility GDR (KB2662296) http://www.microsoft.com/en-us/download/details.aspx?id=29082

Now it will connect to TFS Online. I got the latest from my project and pointed the project to my tabular instance.

Be sure to check the impersonation settings.

Next, I deployed the project to the local tabular instance on the VM and it worked great. This might make sense for development, but I would not use this method in a production environment.

Some Closing Thoughts

I was amazed at how simple it was to create the VM and set up tabular in Azure. Knowing what I know now, I would be able to set up a usable instance fairly quickly and deploy a database using XMLA without much effort. That was very nice.

Doesn’t work with Office 365

I started this project to determine the connectivity capability with Office 365. Well, that does not work in my current configuration. I was able to create a workbook on my desktop using my Azure tabular model and Excel. It works just as you would expect. However, when I deployed the workbook to Office 365, data refresh always failed.

The data refresh error in Office 365

My next thought was to use a data gateway, but those only work with the SQL Server database engine and Oracle, not SSAS. This is still a significant issue for making Power BI able to fully take advantage of existing BI assets at a company.

Using Azure Active Directory

My next step would be to use Azure Active Directory services to try to get Windows Authentication to work throughout. But that is for a later project and post.

X on XMLA: ii. Basic Structure of XMLA


The second post in the XMLA series focuses on the basic structure of XMLA.  At its heart, XMLA is XML.  As a result it is usually not a comfortable work environment for most database developers or administrators.  It is no different for Analysis Services developers or administrators.  While a discussion on XML formatting is out of scope for this conversation, I will break down the XML structures used in XMLA so you can see how the basic construction of the language works.

XML Namespace

In XML, a namespace is required to help the tools understand how to interpret the content.  The namespace used is urn:schemas-microsoft-com:xml-analysis. Usage of the namespace is handled at the method level.

XMLA Methods

There are two methods supported in the XMLA standard – Discover and Execute.

Discover

The Discover method will return XML-formatted results.  The Discover statement is similar in function to a SELECT statement in SQL.  Discover can be used to query the data within the multidimensional database as well as server information.

Within the Discover method, the following properties are supported which are implemented as child elements:  RequestType, Properties, and Restrictions.  The RequestType property determines the schema rowset to be returned.  The Properties property is a collection of options that can be implemented which are determined by the RequestType.  Finally, the Restrictions property can be used to restrict the results returned by the Discover Statement.

Here is an example of getting the list of dimensions from the Adventure Works sample cube (available on CodePlex.com).

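A request along these lines returns that list; the catalog name here is an assumption based on the standard sample database, so adjust it to your deployment:

    <Discover xmlns="urn:schemas-microsoft-com:xml-analysis">
      <RequestType>MDSCHEMA_DIMENSIONS</RequestType>
      <Restrictions>
        <RestrictionList>
          <CATALOG_NAME>Adventure Works DW</CATALOG_NAME>
        </RestrictionList>
      </Restrictions>
      <Properties>
        <PropertyList>
          <Catalog>Adventure Works DW</Catalog>
        </PropertyList>
      </Properties>
    </Discover>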

Execute

The Execute method executes statements against the multidimensional database.  The Execute statement can be used to process, deploy and execute MDX statements.  It too returns its results in an XML format, including error messages.

The Execute method supports two properties which are implemented as child elements: Command and Properties.  The Command property contains the actual command that is being implemented.  Properties, like for Discover, enables Command specific properties which can be used to direct the command.

This method is used two ways in Management Studio.  In the first approach, the Execute syntax is written out explicitly.

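Written out explicitly, an Execute request takes this shape; the MDX statement and catalog name below are placeholders:

    <Execute xmlns="urn:schemas-microsoft-com:xml-analysis">
      <Command>
        <Statement>
          SELECT [Measures].DefaultMember ON COLUMNS
          FROM [Adventure Works]
        </Statement>
      </Command>
      <Properties>
        <PropertyList>
          <Catalog>Adventure Works DW</Catalog>
        </PropertyList>
      </Properties>
    </Execute>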

In the second approach, Management Studio wraps the statement with the Execute and Command elements for you, making it simpler to work with Execute methods.

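For instance, in an XMLA query window you can submit just the command itself, such as the Process command below (the database ID is a placeholder), and Management Studio supplies the Execute and Command wrapper when the request is sent:

    <Process xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
      <Object>
        <DatabaseID>Adventure Works DW</DatabaseID>
      </Object>
      <Type>ProcessFull</Type>
    </Process>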

SOAP

What I do not discuss in this post is the SOAP implementation.  As most of us use Management Studio or similar tools, which do not expose the SOAP envelope, I did not cover it.  However, if you need more details on SOAP or the XMLA standard, refer to http://xmla.org.