I’ve seen notebooks used in Azure Data Studio on multiple occasions. I really like the concept of notebooks, having done some work within Azure Databricks notebooks, but not extensively. As I go into the process that I went through, it’s important to understand that I am not a data scientist and have not done extensive development or spent a lot of time in Python or Jupyter notebooks. Furthermore, my interest in the notebooks was elevated when I realized I wanted to continue presenting while working through my current ALS diagnosis. I have limited use of my hands and arms so highlighting and executing code, especially in front of a crowd, was going to be problematic. (If you want to learn more about my condition and tools I’m using to maintain my ability to work, please check out this series of articles on our blog.)
Let’s start with the core problem that I’m trying to solve today. I will be presenting a session on elastic queries in Azure SQL database. Most of the code is ready to go since I have done this presentation a few times. As I was working through testing my demo, I found executing code by highlighting and pushing “run” in either Data Studio or in SQL Server Management Studio was difficult because I struggled to control highlighting the code. I was also looking for better ways to automate the process, but more about that later. I watched a couple of demos on using notebooks and found some of the notebooks that have been created by Microsoft. I realized I could put together my entire demo package to share with the attendees and build the demo so that I could execute it a step at a time without highlighting. Now that you have the background of what I was trying to accomplish, let’s look at the process I went through getting this done.
How in the world do you work with notebooks in Azure Data Studio?
One of the interesting things about working with notebooks is that if you want to work with them, you have likely already used them and prefer them. This means the instructions for how to create, organize, and use notebooks within Azure Data Studio are a bit lacking. For example, it was not entirely clear to me that one part of the process is creating a folder to store your notebooks with your markdown files and other content. So, let’s go through the process of creating your first notebook step by step with explanations about what’s happening.
The organization of notebooks and files in Azure Data Studio
Part of my struggle in understanding what was happening was that each time I tried to create a notebook, it asked me for locations and files. I thought it should know where they should go. So, as a newbie with notebooks and their organization in Azure Data Studio, I created a notebook and a Jupyter book so I could see how the files are organized. Then I could go back and create the Jupyter book correctly from the beginning. While I may not get all of the terminology correct, this is what I discovered as I worked through the process.
Once I started working with the notebook process in Azure Data Studio, I realized there were multiple components involved:
- Jupyter book
- Section
- Notebook
- Markdown file
I am sure there are simpler ways to do what we want, but I am coming at this entirely from Azure Data Studio as a data developer, not a data scientist. When I first tried to create a Jupyter book, I didn’t understand what its purpose was. When you create a Jupyter book, it looks like you’re creating a folder. That folder will also contain several helper files to organize your notebooks, markdown files, and sections. Before we leave the structure and organization section here, I want to clarify that the book is the parent folder, and the section is a subfolder within the book. Markdown files and notebooks are the files you organize within them, each with a particular purpose. The markdown file is effectively a document that allows you to create a nicely formatted informational component for your notebook. The notebook files are actual Jupyter notebook files, which are split into cells for code and text.
Here is the high-level organization of the Jupyter book we are going to create:
- Jupyter book: Azure SQL database elasticity
  - Markdown file: README
  - Section: Setting up the demo
    - Markdown file: Set up instructions
    - Notebook: Prepping the demo
  - Section: Elastic query demo
    - Markdown file: Elastic query demo instructions
    - Notebook: Elastic query demo
  - Section: Elastic job demo
    - Markdown file: Elastic job demo instructions
    - Notebook: Elastic job demo
For the purposes of this blog post, we will walk through the process of creating the original Jupyter book and the elastic query demo section. That section has a good mix of code and text to illustrate the power and capabilities of notebooks.
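To make the folder idea concrete, here is a minimal Python sketch that lays out the same outline on disk in a temporary folder. It is only an illustration of "book is a folder, section is a subfolder". All of the folder and file names are my own hypothetical choices, and the helper files that Azure Data Studio creates for the book are left out.

```python
import tempfile
from pathlib import Path

# Hypothetical names: Azure Data Studio chooses the real folder and file
# names when you create the book, its sections, and its notebooks.
# A dict value means "subfolder" (section); None means "file".
LAYOUT = {
    "README.md": None,
    "setting-up-the-demo": {
        "set-up-instructions.md": None,
        "prepping-the-demo.ipynb": None,
    },
    "elastic-query-demo": {
        "elastic-query-demo-instructions.md": None,
        "elastic-query-demo.ipynb": None,
    },
    "elastic-job-demo": {
        "elastic-job-demo-instructions.md": None,
        "elastic-job-demo.ipynb": None,
    },
}


def create_layout(root: Path, layout: dict) -> None:
    """Create a folder for every dict value and an empty file for every None."""
    root.mkdir(parents=True, exist_ok=True)
    for name, children in layout.items():
        path = root / name
        if children is None:
            path.touch()
        else:
            create_layout(path, children)


def list_files(root: Path) -> list:
    """Return every file under root as a sorted list of relative paths."""
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        book = Path(tmp) / "azure-sql-database-elasticity"
        create_layout(book, LAYOUT)
        for relative_path in list_files(book):
            print(relative_path)
```

The empty .ipynb files here are placeholders only. A real notebook file has content, which is why you should let Azure Data Studio create the notebooks for you.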
Creating your first notebook in Azure Data Studio
Let’s begin creating our first notebook in Azure Data Studio. Before we dive into this process too deeply, I want to be clear that we are going to create a Jupyter book to add our notebooks to. This is not required as you can create a new notebook from the file menu or with the shortcut as noted on the screen in Azure Data Studio. What confused me about this initially is that you cannot create a simple notebook from the notebooks section in Azure Data Studio. When you create your notebook, you can save it as a file in the location of your choosing, but it will not show up in the notebook section. Once you create a notebook, if you are not using a Jupyter book to host it in, you can reopen it just by choosing Open File from the menu. While this may make sense to others, it was not entirely intuitive to me in the beginning. I had to do some mucking around to figure out that process.
So, we will start our process by creating a Jupyter book to host all our notebooks and markdown files. This Jupyter book will also be readily displayed in the notebook section on Azure Data Studio. Using the … to get to the More Actions menu, choose Create Jupyter Book.
In the dialog, give your new Jupyter book a name and specify the location you want to store it in. I have not used the optional content folder for this exercise and recommend that you do not either.
If you go to the folder location you created your Jupyter book in, you will see that it also created three files in the folder named the same as your Jupyter book:
In the notebook section of Azure Data Studio, you should see your Jupyter book with a README markdown file in it. For now, we will leave the README file as an introduction to what is in your notebook. (Be aware that you can remove the file by deleting it, but you will need to update the TOC file to reflect the changes you made. If you do not update the TOC file, you may see missing file error messages in Azure Data Studio.)
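If you do delete or move files, a small check like the following Python sketch can tell you which TOC entries no longer point at a file. I am assuming here that each entry in the TOC file sits on its own line in the form "file: relative/path" with the extension left off. That is an assumption on my part, so open your own TOC file and adjust the pattern if it looks different. The file names in the example are hypothetical.

```python
import re
import tempfile
from pathlib import Path

# Assumption: TOC entries look like "- file: section/name" (no extension).
# Change this pattern if the TOC file in your book uses a different layout.
ENTRY = re.compile(r"^\s*-?\s*file:\s*(.+?)\s*$")


def toc_entries(toc_text: str) -> list:
    """Return the path of every 'file:' entry found in the TOC text."""
    entries = []
    for line in toc_text.splitlines():
        match = ENTRY.match(line)
        if match:
            entries.append(match.group(1).strip("'\""))
    return entries


def missing_entries(book_dir: Path, toc_text: str) -> list:
    """Return TOC entries with no matching markdown or notebook file."""
    missing = []
    for entry in toc_entries(toc_text):
        candidates = [
            book_dir / f"{entry}.md",
            book_dir / f"{entry}.ipynb",
            book_dir / entry,  # entry already includes an extension
        ]
        if not any(candidate.is_file() for candidate in candidates):
            missing.append(entry)
    return missing


if __name__ == "__main__":
    # Hypothetical book: one notebook exists, one markdown file was deleted.
    with tempfile.TemporaryDirectory() as tmp:
        book = Path(tmp)
        (book / "elastic-query").mkdir()
        (book / "elastic-query" / "demo.ipynb").touch()
        toc = "- file: elastic-query/demo\n- file: elastic-query/instructions\n"
        print(missing_entries(book, toc))
```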
I will not take time in this post to review what is possible in a markdown file. The key here is you can update the README file that was created with headers and formatting to provide instructions on how to use the various contents of your Jupyter book. If you double-click the README file, it will open in a new tab in Azure Data Studio. The editor shows line numbers and allows you to update and add content.
The following code gives you an example of some markdown syntax:
# Welcome to the Jupyter book on Azure SQL Database elasticity
This book contains 3 sections
* The first section contains instructions on how to set up the demo
* The second section contains the demo for elastic queries
* The third section contains a demo for elastic jobs
This will result in the following look and feel in your README file:
Adding a section
The next thing we will do is add a section where we will host the executable demo code. Right-click on your Jupyter book and choose Add Section. We will add the title as Elastic query.
Adding the notebook
Up to this point, we have been building the framework to support our first notebook. While not all of these steps are required, this is the most complete approach. Right-click on your section and choose New Notebook. This will create a Jupyter notebook in the subfolder of your section.
Once you create the notebook, it will open a tab in Azure Data Studio with the notebook. You will notice that it has something called Kernel. The kernel allows you to set the default language used for the notebook. For the work that we are doing we will be using the SQL kernel. This will allow us to execute SQL code against a database. In the Attach to dropdown, you will see databases that you can use to execute code. The Cell dropdown allows you to add cells which can contain code or text.
Now let us get down to the business of creating a notebook with executable code. Before we add executable code, let us add a text cell as an introduction to the code. You can do this by clicking the Cell dropdown and choosing Text. Once you add the text cell, you will notice there is a formatting bar which, ironically, is missing in the markdown file editor. This means it is easier to create formatted text in a cell in a notebook than in the markdown file itself. Keep this in mind as you create your notebooks and add content to your Jupyter book. These cells are easier to work with at times than the full file. This is particularly true if you are not familiar with markdown formatting.
At this point, let us add a quick introduction to what we are about to do in the following code cells.
Next, we will add a code cell. From the dropdown menu for cell, choose Code Cell. This will add a code cell to your notebook which uses the language selected in your kernel. There is also a play button which allows you to execute the code.
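If you are curious what the text cell and code cell look like inside the file, a notebook is a JSON document with a list of cells. The Python sketch below builds a minimal one by hand using the standard Jupyter notebook format (version 4). The kernelspec values for the SQL kernel are my assumption, so compare them with a notebook saved from Azure Data Studio. The SQL in the code cell is only a placeholder, not my demo code.

```python
import json


def build_notebook(cells):
    """Build a minimal nbformat 4 notebook dict from (cell_type, text) pairs."""
    notebook_cells = []
    for cell_type, text in cells:
        cell = {
            "cell_type": cell_type,  # "markdown" for text cells, "code" for code cells
            "metadata": {},
            "source": text.splitlines(keepends=True),
        }
        if cell_type == "code":
            # Code cells also carry their results and a run counter.
            cell["outputs"] = []
            cell["execution_count"] = None
        notebook_cells.append(cell)
    return {
        "metadata": {
            # Assumption: these are the values the SQL kernel writes.
            "kernelspec": {"name": "SQL", "display_name": "SQL", "language": "sql"},
        },
        "nbformat": 4,
        "nbformat_minor": 2,
        "cells": notebook_cells,
    }


notebook = build_notebook([
    ("markdown", "## Clean up\nThis cell introduces the code that follows."),
    ("code", "-- Placeholder statement; replace with your own demo code\nSELECT 1 AS ready;"),
])

print(json.dumps(notebook, indent=2))
```

Notice that nothing in this structure records a server or database. That matches what you see in the tool, where the connection is chosen in the Attach to dropdown rather than stored with the cells.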
I am going to add the code that is required to clean up the tables for the demo. The resulting code cell will look like the following:
As a last step to understanding how notebooks and code work in the environment, we can execute the code by pushing the play button in the code cell. This will return the result of that execution as shown below:
Congratulations, you have created your first notebook with executable code against a SQL Server database! You can continue to add more text cells and code cells as needed. One of the reasons I like this pattern is that it allows me to execute the code without having to highlight it while doing demos. Each cell can be run independently. You will also notice there is a Run All button if you want to run every code cell in the notebook in one pass. This could be valuable if you have collected a set of maintenance operations or related scripts in a notebook.
Another key thing to remember is that notebooks are shareable. Because the connection is outside of the notebook, anyone you share the notebook with will have to connect to an environment that allows them to execute the same code. You can add your notebooks to GitHub or similar source control to manage change and share common resources easily without just distributing SQL files.
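One thing to keep in mind before sharing is that a saved notebook can also contain the results of the cells you ran. Here is a minimal Python sketch, based on the standard Jupyter notebook format, that removes those results from a copy of the notebook before you commit it. The function names and the file paths you would pass in are hypothetical. This is one possible approach, not something Azure Data Studio requires.

```python
import copy
import json
from pathlib import Path


def strip_outputs(notebook: dict) -> dict:
    """Return a copy of a notebook dict with all code cell results removed."""
    cleaned = copy.deepcopy(notebook)
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


def strip_file(source: Path, target: Path) -> None:
    """Read a notebook file, strip its results, and write the copy to target."""
    notebook = json.loads(source.read_text(encoding="utf-8"))
    target.write_text(json.dumps(strip_outputs(notebook), indent=1), encoding="utf-8")


if __name__ == "__main__":
    # Small in-memory example instead of a real file.
    sample = {
        "nbformat": 4,
        "nbformat_minor": 2,
        "metadata": {},
        "cells": [
            {
                "cell_type": "code",
                "metadata": {},
                "source": ["SELECT 1 AS ready;"],
                "outputs": [{"output_type": "stream", "name": "stdout", "text": ["(1 row affected)"]}],
                "execution_count": 1,
            }
        ],
    }
    print(json.dumps(strip_outputs(sample), indent=1))
```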
Before we wrap up
I feel I would be remiss if I did not also demonstrate what happens when you get data results in a notebook. In my case I have a database I can connect to which has WideWorldImporters loaded into it. I am going to select the top 1000 rows from the DimSupplier table. Once I run the code cell, I get the rows affected, the execution time, and a table with results as shown here:
As you can see in the results window, you have several export options and a chart option that you can use to further visualize or work with the data that you have retrieved. I would encourage you to explore these options, since how well they work depends on the type of data you are working with. For example, supplier data does not chart very well, whereas if I had used fact data there may have been some interesting charting options. A notebook could be a straightforward way to demonstrate some simple reporting for a technically savvy audience.
Wrapping it up
There are many more functions that I did not cover around notebooks, and I assume that Microsoft will continue to make improvements to the overall capabilities here. I look forward to using notebooks more as a terrific way to share code and run demos. I hope you find this valuable as well.
For those of you who are not sure about using notebooks, this is an effective way to build your skills without having to learn a new language if you are familiar with SQL. My first exposure was using Python in a Databricks environment. That was a lot to learn while also trying to understand how notebooks functioned. As the data environment continues to expand and require new skill sets, understanding how to use and leverage notebooks on a regular basis is a good skill to have. Microsoft has done us a great favor by using standard Jupyter notebooks, which are used in data science, Databricks, and other areas of data practice.
If you are following my work enablement series, you know one of the things that I am passionate about is simplifying how I work, in order to stay working while continuing to lose functionality in my arms. Notebooks help with this by allowing me to execute code without highlighting it when doing demos. Because highlighting code and executing it in a tool like SQL Server Management Studio requires multiple touches on the keyboard and mouse, I struggle to do it efficiently. The ability to organize my demo around code cells and then have a self-documenting notebook to pass along to attendees is a huge win for me. I hope this helps others who struggle in the same way. And I hope this was helpful to those who have not used or seen notebooks in their current work environment but may in the future.
I will be creating and sharing a completed notebook for the demos related to my presentation on elastic capabilities with Azure SQL. Look for that presentation follow-up from the Memphis SQL Saturday in October 2022. I will publish a follow-up blog post with a link to the completed notebook used with that demo.