Memphis SQL Saturday 2022 & a Notebook

Back in person again! It is awesome to be able to get back into the SQL community and see fellow data professionals. A huge shout-out to the Memphis data community leaders, in particular Zach Golden and Rob Demotsis, who put on a great event for their first one out of the pandemic. I was also able to get together with fellow 3Clouders Dawn Clement and Kristyna Hughes.

Steve, Dawn, Kristyna

A new but different opportunity

For me this was a very special event. Not only is it the first event I’ve been able to do in person since COVID started, but it is also the first event that I have presented at since being diagnosed with ALS. There are times I think I talk about this too much, but it is front and center of who I am now. I want to encourage others who have similar disabilities to remain active as they work in their new reality.

So how did this change for me? Well, having presented on SQL many times through the years, I typically use a method of highlighting code in SQL Server Management Studio and executing it. That, however, would not work in this case. I moved all my code over to a notebook in Azure Data Studio. This allowed me to execute the code a step at a time with a simple button push. To read more about the experience of creating a notebook, check out my previous blog post here.

The other key thing that changed for me was having my wife, Sheila, join me on the platform to push the buttons that I needed for the presentation and the demo. This was definitely a new experience for her and me. She did a great job following my cues, and sometimes the lack thereof. She was able to get us through the demos by leveraging the clever new notebook I used. This is the new normal for us and I look forward to presenting for as long as I am able.

Sheila and I co-presenting

Azure SQL Elasticity

This was the topic that I spoke on. We covered elastic queries, elastic jobs, and elastic transactions. As promised to the attendees and those of you who are reading this or are following up on my post about notebooks, I have published the notebook on the Data on Wheels GitHub which you can find here.

After you have downloaded the folders from GitHub, open Azure Data Studio and browse to the Notebooks section. Click the Open Jupyter Book button as shown below.

This will open a File Explorer dialog. Choose the Azure SQL Database Elasticity folder and then click Select Jupyter Book.

This will open the Jupyter book which contains the markdown files with information and the notebooks you need to set up and run the demos. Enjoy!

Thanks to those of you who were able to attend. I hope you enjoyed the event as much as I did!

Moving Synapse Databases Between Subscriptions – Practical Guidance

One of the tasks we often do in migration projects is move large volumes of data. Depending on how you are configured, you may need to do the migration project in a development or UAT environment as opposed to a production environment. This is particularly true if you have policies in place on your production subscription that don’t allow the individuals doing the migration and validation tasks to work in that subscription.

Just Copy It… Nope

You can copy an Azure SQL Database using the Azure Portal, PowerShell, Azure CLI, and T-SQL. However, this functionality is limited to Azure SQL Database and does not work for Azure Synapse databases (a.k.a. SQL Pools). As of early 2021, the copy functionality also supports copying databases between subscriptions, but it requires security work to make sure the permissions in the database servers and the networking allow that to happen.

You can find out more about copying Azure SQL Database in this Microsoft Doc.
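For reference, the T-SQL form of the copy looks roughly like the sketch below. The server and database names (MigrationSource, SalesDb, SalesDb_Copy) are hypothetical placeholders, and this applies to Azure SQL Database only, not to Synapse SQL pools.

```sql
-- Run in the master database of the TARGET logical server.
-- MigrationSource (server) and SalesDb (database) are hypothetical names.
CREATE DATABASE SalesDb_Copy AS COPY OF MigrationSource.SalesDb;

-- The copy runs asynchronously. While it is in flight, progress can be
-- checked from master on the target server.
SELECT database_id, start_date, percent_complete, error_desc
FROM sys.dm_database_copies;
```

The login running the statement needs appropriate permissions on both servers, which is part of the security work mentioned above.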

Just Restore It… Nope

You can restore to the current server or another server on the same subscription. However, you are unable to restore across subscription boundaries at this time. If you need to move to another server in the current subscription, the process is straightforward: you can use the restore process in Synapse to restore to the current server using a different name. You can also restore to a different server in either the current or a different resource group in the same subscription. The restore technique is used in our move process, so details on how to restore a Synapse database will be in the next section.

Let’s Move a Synapse Database

The process to move a Synapse database to another subscription requires some planning and pre-work. The first thing you need to do is create a new SQL Server in the same subscription as your current Synapse environment. Because you can’t simply create servers, I would recommend that you add an Azure SQL Database to the server as a placeholder. An S0 should be sufficient to keep this server in place for what we are doing. DO NOT ADD anything to this server that will not be migrated. This is a temporary holding place for migrating databases. (This also works for other SQL Databases; other options may work as well, but they are not the focus of this post.)

Now that you have the migration server created, the next step is to create a restore point. While this is not required because you can use the automatic restore points, creating a user-defined restore point is recommended. A user-defined restore point allows you to choose the state of the database you want to migrate, rather than relying on the automatic points and trying to make sure you pick the right time (in UTC of course).

Once you have set the restore point, in the database you want to migrate, select Restore to open the panel to restore your database.

On the restore page, you have a number of options to complete.

  • Restore point type: Choose User-defined restore points
  • SQL pool name: This is not a big deal. The name is the database name used during the migration process and is not the final name used in the target server. Make sure it is something you know.
  • Restore point: Select the restore point you created for this purpose.
  • Server: Choose the migration server you created as the target.
  • Performance level: This one is more interesting. I typically choose a smaller performance level for this restore. Keep in mind that Azure needs to allocate resources to support the restore. Because this is not a final deployment, smaller may go faster. However, NO SLAs exist for this process. That means your mileage will vary. We have seen restores happen in 30 minutes one day and over 5 hours the next. It will be very dependent on the data center and how busy it is. This time variation must be accounted for in your planning.

The next step is to move the server using the Move operation on the server page. You have the option to move to another resource group or another subscription. In our case, we will choose another subscription. IMPORTANT: You will need Contributor permissions in the target subscription in order to move the server to that subscription.

After you have moved the server to the target subscription, you need to set a restore point for that database on the migration server. Then you can restore that database to the target server. It is very important that you use the naming convention and performance levels that you need for this restore as it is the final step in the process. Once again the restore process has no Microsoft SLA and as a result may take longer than planned. You need to have contingencies in place if you are working in a deployment window or have time restrictions.

Finally, you need to clean up the migration server. I would recommend either scaling down or pausing the Synapse database to give you a backup for a while if needed. Once the database is validated on the target server, you can remove the Synapse database (removes storage costs). I would recommend keeping this server as your migration server to use in the future. You can use this process to create copies of databases for development and UAT or similar needs from production instances.

Other Thoughts and Considerations

Here are my final thoughts on this process. First, the fact that no SLA on the restore process is provided by Microsoft has created issues for us in some cases. We have had to extend deployment windows during production deployments on occasion. My recommendation is to plan for the worst case and finish early if all comes together on time.

This process works! You can use it with other SQL assets and you can use it in multiple directions. Keep the migration server around so you can support other processes. If you clean most of it out, the cost of maintaining it is the S0 SQL Database.

One final thought: this is Azure. Thus, this guidance could change tomorrow. We had been using this process for about 12 months when this was written. I hope this helps some of you move these databases to support your business and development needs.

Azure SQL Database Elasticity – Presentation Notes

This blog covers the content and points to the code used to create the demos in my Azure SQL Database Elasticity presentations. As of today, I have presented this at the Minnesota SQL Server User Group (PASSMN) in September 2020 and as a webinar for 3Cloud in October 2020.

Elastic Queries

Elastic queries allow developers to interact with data from multiple databases supported on the Azure SQL Database platform, including Synapse. Elastic queries are often referred to as PolyBase, which is currently implemented in SQL Server 2019 and Azure Synapse. The key difference is that elastic queries only allow you to interact with other Azure SQL Databases, not Hadoop or other database implementations (e.g. Teradata or Oracle). Part of the confusion comes from the fact that the implementations look very similar. Both toolsets use external tables in SQL Server to interact with the connected data sources. However, PolyBase requires additional components to run whereas elastic queries are ready to go without additional setup.

Be aware elastic queries are still in preview. Also, elastic queries are included in the cost of Azure SQL Database in standard and premium tiers.

Elastic Query Strategies

Elastic queries support three key concepts that will influence how you implement the feature.

  1. Vertical partitioning. This concept uses complete tables in separate databases. It could be a shared date table or dimensions in a data warehouse solution. Vertical partitioning is a method to scale out data solutions. This is one method to use Azure SQL database for larger data solutions.
  2. Horizontal partitioning or sharding. Whereas vertical partitioning keeps tables together, horizontal partitioning shards or spreads the data from a single table across multiple Azure SQL Databases. This is the most complex type of partitioning as it requires a shard map. This is typically implemented with .NET or Java applications.
  3. Data virtualization. This concept is a mix of the partitioning solutions to achieve the goal of virtualizing the data. The idea with data virtualization is that we can use a single Azure SQL Database to interact with data from multiple databases. This concept is limited today because it only works with Azure SQL Databases, but it is an area to watch for improvements as the product matures.

Elastic Query Demo

The demo used in the presentations is configured as shown here:

Three S1 Azure SQL Databases on the same Azure SQL Server. I used ADF (Azure Data Factory) to move Fact.Purchase to WideWorldDW_2 and the three related dimensions (dimDate, dimStockItem, dimSupplier) to WideWorldDW_3. I then used WideWorldDW_3 to implement the external tables to work with the data. The WideWorldImportersDW-Standard was used as the original restore of the sample database. It is the source of the data but is not used in the demos.

One note on the demo. I did not include the ADF jobs. Use the Copy activity to move the tables to the target databases. You can find more information here.

The demo code to set up the environment can be found here.
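For readers who want a feel for the shape of that setup before opening the demo code, here is a minimal sketch of the vertical partitioning pattern, run in WideWorldDW_3 to expose Fact.Purchase from WideWorldDW_2. It is not the published demo script. The credential name, data source name, local table name, and the angle-bracket placeholders are assumptions for illustration, and the column list is my assumption based on the WideWorldImportersDW sample, so confirm it against the remote table before use.

```sql
-- Run in WideWorldDW_3 (the database that hosts the external tables).
-- A master key is needed before a database scoped credential can be created.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';

-- Hypothetical credential: a SQL login that can read WideWorldDW_2.
CREATE DATABASE SCOPED CREDENTIAL ElasticQueryCred
    WITH IDENTITY = '<sql login>', SECRET = '<login password>';

-- TYPE = RDBMS is the vertical partitioning (remote database) source type.
CREATE EXTERNAL DATA SOURCE WideWorldDW_2_Source
    WITH (
        TYPE = RDBMS,
        LOCATION = '<server name>.database.windows.net',
        DATABASE_NAME = 'WideWorldDW_2',
        CREDENTIAL = ElasticQueryCred
    );

-- Column list is an assumption; it must line up with the remote Fact.Purchase.
CREATE EXTERNAL TABLE dbo.RemotePurchase (
    [Purchase Key]          bigint       NOT NULL,
    [Date Key]              date         NOT NULL,
    [Supplier Key]          int          NOT NULL,
    [Stock Item Key]        int          NOT NULL,
    [WWI Purchase Order ID] int          NULL,
    [Ordered Outers]        int          NOT NULL,
    [Ordered Quantity]      int          NOT NULL,
    [Received Outers]       int          NOT NULL,
    [Package]               nvarchar(50) NOT NULL,
    [Is Order Finalized]    bit          NOT NULL,
    [Lineage Key]           int          NOT NULL
)
WITH (
    DATA_SOURCE = WideWorldDW_2_Source,
    SCHEMA_NAME = 'Fact',      -- remote schema
    OBJECT_NAME = 'Purchase'   -- remote table
);

-- The external table can now be queried like a local table.
SELECT TOP (10) [Purchase Key], [Date Key], [Ordered Quantity]
FROM dbo.RemotePurchase;
```

Using SCHEMA_NAME and OBJECT_NAME lets the local name differ from the remote one, which avoids needing a Fact schema in the database that hosts the external table.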

Elastic Jobs

Elastic jobs is the alternative to SQL Server Agent Jobs in Azure SQL Database. While Agent is included in Azure SQL Managed Instance, the rest of the platform needed an option to create jobs. Elastic jobs solves that issue. Currently this is also in preview and is also included with Azure SQL Database. The only additional cost is that a dedicated job database is required to support elastic jobs.

The best comparison is still with SQL Server Agent. Elastic jobs are structured with jobs which have job steps. The only limitation at the moment is that job steps must be T-SQL. Jobs can be created in the Azure portal, with PowerShell, with REST, or with T-SQL.
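As a rough illustration of the T-SQL route, the sketch below creates a target group, a job, and a single T-SQL step using the jobs stored procedures in the job database. The group, job, and credential names and the server placeholder are hypothetical. It assumes the elastic job agent is already deployed and that a database scoped credential named JobRunCred exists in the job database with a matching login on the target.

```sql
-- Run in the elastic job database.
-- All names below (DemoGroup, NightlyStats, JobRunCred) are hypothetical.

-- 1. Define where the job runs.
EXEC jobs.sp_add_target_group
    @target_group_name = N'DemoGroup';

EXEC jobs.sp_add_target_group_member
    @target_group_name = N'DemoGroup',
    @target_type       = N'SqlDatabase',
    @server_name       = N'<server name>.database.windows.net',
    @database_name     = N'WideWorldDW_3';

-- 2. Create the job, then add a T-SQL step to it.
EXEC jobs.sp_add_job
    @job_name    = N'NightlyStats',
    @description = N'Update statistics on the demo database';

EXEC jobs.sp_add_jobstep
    @job_name          = N'NightlyStats',
    @command           = N'EXEC sp_updatestats;',
    @credential_name   = N'JobRunCred',
    @target_group_name = N'DemoGroup';

-- 3. Run it on demand and check the outcome.
EXEC jobs.sp_start_job @job_name = N'NightlyStats';

SELECT job_name, step_name, lifecycle, start_time, end_time
FROM jobs.job_executions
WHERE job_name = N'NightlyStats'
ORDER BY start_time DESC;
```

The structure mirrors SQL Server Agent: a job holds steps, and the target group plays the role of deciding which databases each step runs against.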

Elastic Transactions

One of the key pieces that was originally missing from the Azure SQL Database rollout was cross-database transactions, which were supported in SQL Server with MSDTC. Elastic transactions add this functionality to Azure SQL Database and are built into the platform. This functionality is application driven and currently supported in the latest .NET libraries. There is no enforced limit on the number of databases, but Microsoft currently recommends keeping distributed transactions to 100 or fewer databases due to potential performance issues.

There are a few limitations to be aware of:

  • Only supports Azure SQL Databases
  • Only supports .NET transactions
  • Does not support T-SQL Distributed transactions
  • Does not support WCF transactions

Wrap Up

Microsoft continues to improve the functionality in Azure SQL Database. These elastic features are part of that process. While I typically do not have many uses for distributed transactions, we have actively implemented elastic queries and elastic jobs for customers and look to use them more in the future.

Azure SQL Elasticity References

Hopefully you too will be able to use the elastic functionality as you continue to embrace the Azure data platform.

PASSMN June 2020 – Data Classification with SQL Server and Azure

I presented at the virtual Minnesota SQL Server User Group meeting on June 16, 2020. The topic was data classification with SQL Server 2019 and Azure SQL Database.

Data Classification Basics

Data classification in both SQL Server and Azure allows you to discover and label data based on information type and sensitivity. Information type is a way to describe the content of the data at a high level. This includes types such as Address, Name, Networking, and Credit Card. By tagging your columns with types you will be able to easily see the types of data stored in your tables. You can also label the sensitivity. This includes labels such as Confidential and Confidential - GDPR.

Using SQL Server 2019 and SSMS 18.4+

For on-premises implementations, you can use SQL Server Management Studio. I would recommend that you use SSMS 18.4 or greater. This has the most capability. SQL Server 2019 includes the sys.sensitivity_classifications system catalog view, so you can query to see which fields have been labeled.
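A query along these lines lists the classified columns. It is a minimal sketch that joins the catalog view to the object and column metadata; the column aliases are my own.

```sql
-- List every classified column with its label and information type.
-- major_id is the object (table) id; minor_id is the column id.
SELECT
    SCHEMA_NAME(o.schema_id) AS SchemaName,
    o.name                   AS TableName,
    c.name                   AS ColumnName,
    sc.label                 AS SensitivityLabel,
    sc.information_type      AS InformationType
FROM sys.sensitivity_classifications AS sc
LEFT JOIN sys.all_objects AS o
    ON sc.major_id = o.object_id
LEFT JOIN sys.all_columns AS c
    ON sc.major_id = c.object_id
   AND sc.minor_id = c.column_id
ORDER BY SchemaName, TableName, ColumnName;
```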

To get started, open up SSMS. Right click the database and choose Tasks > Data Discovery and Classification > Classify Data. This will allow you to view the Data Classification window in SQL Server. You will get a list of recommendations and the ability to add custom classifications in your SQL Server database.

Finding the Data Discovery and Classification Options in SSMS

The Data Classification view in SSMS

Once you have classified some of your data, you are able to view a report that shows the coverage of the classification work you have done.

Data Classification Report in SSMS

Adding Data Classification in Azure SQL Database

Azure SQL Database supports similar functionality for discovering and classifying data. The primary differences are (1) it requires Advanced Data Security which costs $15/month per server and (2) audit logging support is built in.

You can find this in the Azure portal with your SQL Database.

Advanced Data Security in Azure SQL Database

As you can see above, you get a visual here initially. Click the Data Discovery & Classification panel to open a classification window similar to the one we see in SSMS. This will allow you to discover and classify your data.

The key difference is turning on auditing and logging information about people querying the classified data. In the Security section in your SQL Database view in the Azure portal, choose Auditing. You can now add auditing to your server or database. (Click here for information about setting up Auditing.) I chose to use Log Analytics which is in preview. Log Analytics has a dashboard which shows activity in your database with this data.

Log Analytics Dashboard which Shows Access to Sensitive Data

You can click into the dashboard to dig into details. You can also use the Log Analytics query features to build your own queries to further analyze the data. The details contain who accessed the information, their IP address, and what was accessed. You can build more reports from that information to support more sophisticated auditing.

Final Thoughts

I think that there is still work to be done on SQL Server to better support auditing. Azure is ahead of the game in this area. More importantly, Azure logging is a platform level solution. You should be able to integrate your logging from the applications to the database in Azure.

You do have the ability to update the policy in SQL Server with a JSON file. I recommend you export the file and modify it. In Azure, you can update the information policy in the Security Center. Updating this policy allows you to discover data or information that you want to classify based on rules you set up. This should be part of your data governance plan.

One other follow-up from the meeting. The question was raised about Visual Studio support in database projects. The answer is “sort of”. First, you need to make sure your project is targeting SQL Server 2019 or Azure SQL Database. Once that is set, you can use the following code to add the classification manually or you can apply it to your database and do a Schema Compare to bring it in.

ADD SENSITIVITY CLASSIFICATION TO
    [SalesLT].[Customer].[FirstName]
    WITH (LABEL = 'Confidential - GDPR', LABEL_ID = 'fe62dcde-72c0-475c-b1af-fb8de4c8fc7e', INFORMATION_TYPE = 'Name', INFORMATION_TYPE_ID = '57845286-7598-22f5-9659-15b24aeb125e', RANK = MEDIUM);

You will need to know the GUIDs for the labels and types in your solution to do this manually. However, once this is done, you can see the information in the Properties window for the field as well.

Data Classification Properties in Visual Studio

The key thing to be aware of is that the properties are read only. You have to use the code to change them or do the changes in the database and use Schema Compare to bring them in.
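As a sketch of what changing them in code can look like: re-running ADD SENSITIVITY CLASSIFICATION on a column that is already classified overwrites the existing classification, and a classification can be removed with DROP. The example reuses the column from the snippet above and assumes the same SalesLT sample schema.

```sql
-- Remove the classification from the column used in the earlier example.
DROP SENSITIVITY CLASSIFICATION FROM
    [SalesLT].[Customer].[FirstName];

-- Confirm what is still classified on that table.
SELECT c.name AS ColumnName, sc.label, sc.information_type
FROM sys.sensitivity_classifications AS sc
JOIN sys.all_columns AS c
    ON sc.major_id = c.object_id
   AND sc.minor_id = c.column_id
WHERE sc.major_id = OBJECT_ID(N'SalesLT.Customer');
```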

Thanks again to those of you who joined us at the meeting. Here is the slide deck from that meeting. I look forward to sharing more with all of you later.

Azure Data Relational Services

Today I’d like to talk about the Azure Relational Data Services Platform. This is an important foundational component for many things that are being built on Azure Platform as a Service related to databases.

One of the key PaaS offerings when Microsoft started with Azure was Azure SQL Database. Moving forward, changes were made to this and Azure SQL DW was released. Recently, Microsoft released a preview of the Azure SQL Database Managed Instance option. This is significant because it is a vCore plus storage option intended to have parity with the on-premises version of SQL Server, and it is a key step toward separating compute and storage for Azure SQL Databases as well.

This is important since it allows Microsoft to standardize their relational database support pattern for other databases as well. This has existed for Azure DW for some time and was also improved in Gen 2. Check out more about this in some previous posts in this series.

Azure’s Relational Database platform supports Azure DW’s MPP platform, Azure SQL Database or SQL Server as PaaS, Azure Database for MySQL and PostgreSQL. So, open source databases are supported on the same relational data services platform. Azure Database for MariaDB is coming by the end of 2018.

You may be thinking, why is all this important and what does a common platform include?

  • First, Azure storage services as a foundation for all databases and all the data on the Azure platform. All data stored here, as well as Azure Databases, whether open source or SQL, are encrypted at rest.
  • It manages high availability of a solution by keeping three copies of data available for the platform at all times. So high availability is built in along with encryption at rest: secure and available.
  • Azure compute is the VMs supporting the compute needs of the databases. This is where you pick the cores that you want to provide scale up function. However, you’re not managing VMs, you’re managing capacity. Microsoft has taken on the task of understanding what you need from a capacity standpoint, like how do you want to scale up or down or how many v-cores do you want to set aside.
  • A key component of many things in Azure is that we can scale compute separate from storage. The database services platform sits on top of Azure storage and compute, so its strength is that the core of the solution lives in those 2 platforms. It allows support of MPP, open source and SQL databases with PaaS.
  • Database services is where the next tier happens (or all the cool stuff). On top of the foundation, Microsoft adds a set of common components that are used across all these databases.
  • It’s a trusted platform with things like backup and restore, security, audit and isolation all managed in this service. This allows you to trust the platform and build databases with confidence in the security.
  • It’s flexible, enabling scalability and resource management within the platform. This includes features like scaling up or down on demand and adding storage as you need, giving flexibility to the platform. This is hard to do if you build this for yourself or use an IaaS solution.
  • It’s intelligent. We see big benefits in the fact that it provides monitoring, automated tuning and advisors to the platform. These are built in to make your databases better, so you can rely on good performance and know what is happening in your database when you need to.
  • Think of the third tier (after storage/compute and database services) as each unique database platform and the features each brings to your application. Whether you’re using an open source product that’s using MySQL or a SQL Server, their feature sets come forward in PaaS.

Another advantage to mention is that, by supporting standard SQL and managed instances as well as MySQL and PostgreSQL Community Edition, the platform makes moving to the cloud so much easier. This opens opportunities for you to migrate in a clean fashion using all the capabilities of a system you’re familiar with.