Building, Deploying, and Sharing a Remote Jupyter Book in Azure Data Studio

When working with Azure Data Studio and its support for Jupyter books, you will find there is an option for remote Jupyter books. As shown in the image below, you can open a remote Jupyter book and work through the dialog for a couple of Microsoft books that are readily available.

top portion of the dialog to add a remote Jupyter book
Dialog for adding a remote Jupyter book

So let’s start at the beginning and lay out what the end game is. What if you want to create and share a Jupyter book full of helpful hints for the world to see? How would you go about doing that? The first question you may be asking is: what is a Jupyter book? As it turns out, in Azure Data Studio a Jupyter book is a collection of folders and files managed by YAML files.

If you’ve been a SQL Server developer for a very long time, this may be meaningless to you. I know it was for me.

Check out this blog post to see how I created my first notebook and Jupyter book in Azure Data Studio. Once you have created your own content, even if it’s just a sample to try, come back to this post and we will walk through how to share your notebook as a remote Jupyter book.

Get it into GitHub

To share your Jupyter book you will need to get your Jupyter book into GitHub. Use File >> Explorer from the menu in Azure Data Studio to open your repo directory that is stored locally. You can either save your notebooks to this location or copy and paste them using File Explorer. I’ll be writing more details around using GitHub with Azure Data Studio in a future blog post.

Now that you have the content in your repo, you can commit the code to GitHub. At this point, if your repo is not public, you will need to make it public to complete the process. My recommendation: if the repo you use for a lot of your internal work is private, create a new public repo and copy the content there.

Create a zipped file

Because what you’re working with is a folder structure and a set of files, you will not build this in a traditional code-build fashion. That raises the question: how do you create a release? You create a release of your Jupyter book by zipping up the folder and naming your zip file in a very specific way. Before we get into the naming process, remember that Azure Data Studio supports Mac and Linux as well as Windows. Because of this, if you want your remote Jupyter book to be available to Mac and Linux users, you will need to create a tar.gz file as well.

Your file name needs to contain all the relevant properties required in a remote Jupyter book release. This includes the following attributes separated by hyphens:

  • Jupyter book name
  • Release number
  • Language

In the example I am using, the file name looks like this:

AzureSQLDatabaseElasticity-0.0-EN.zip

Now that you have the file ready, we can create the release.

Creating a GitHub release

Time to create that release. In GitHub, click Releases to open the releases page, then click the Draft a new release button to open the page that will let you create your next release. Add the zip files by dropping or selecting them where it says to attach binaries to the release. Add additional documents or instructions as needed. Click Publish release to make the release available.

New Release Page at GitHub

Now that you have the release created, you should be able to open that remote Jupyter book.

Opening your remote Jupyter book

As you saw in the beginning, only two repos are included by default in Azure Data Studio. Even though you have created a remote Jupyter book, it does not show up in that list. You will also notice that the dialog does not use a proper URL, even though that appears to be the value being requested. To connect to your Jupyter book, you will need to use the following format in the URL text box: repos/<your GitHub account>/<your repository name>. For example, you can find my example remote Jupyter book on Azure SQL Database elasticity in this repo: repos/DataOnWheels/notebooks.

It appears that, in the background, Azure Data Studio handles the front end of the URL. Once you have entered this, you should be able to click Search and fill in the related information that you built into the file name you added to the release. Once you have filled out all the information, you should be able to click Add, and this will import the Jupyter book into your local Azure Data Studio. This effectively loads a disconnected copy of the original. Congratulations, you have now uploaded your own Jupyter book and made it available to others using the remote option.

Why would you ever use this?

Be aware that I would not recommend using a public repository for privileged information that you work with at your company. However, if you need to share a notebook as a result of building a presentation, or you simply want to share some great information easily, this might be a good fit for your use case. The Jupyter book that I have shared using this method is also available in my standard GitHub repository for code sharing. However, attaching to a remote Jupyter book to get all the example code easily added to Azure Data Studio is a win. It is an easier way to distribute the work; it is just not a well-known pattern.

If you successfully create an interesting or boring Jupyter book that you want to share, I encourage you to include it in the comments below so we can all have a look.

Out of Your SSMS World, Jupyter Notebooks in Azure Data Studio – Louisville Data Technology Group, Feb 2023

It was a lot of fun to speak at the Louisville Data Technology Group in February. Sheila and I presented on Jupyter notebooks in Azure Data Studio. The session was a lot of fun, with plenty of interesting interaction from individuals who were looking at both developer and administration tooling within Azure Data Studio, most of them working with Jupyter notebooks for the first time.

Steve presenting in Louisville

The start of the session was a general introduction to Jupyter notebooks and Jupyter books. You can find the short slide deck here. I think the key thought from the introduction was that Jupyter notebooks have been around for a long time and have often been used in data science as well as data engineering with Python. For example, my first exposure to notebooks was working with Databricks and more of a data engineering workload. One interesting note is that Jupyter books appear to be a nonstandard, or at least hard to understand, component. Jupyter books are in fact a folder structure used to organize content, including various markdown files, subfolders, and notebooks. Jupyter books allow you to store and organize your content and even share it in an organized way.

My first real exposure to Jupyter notebooks in a functional way was to create a platform on which my wife could help with presentations in a simpler manner than just using SQL Server Management Studio. As a result, I began to dig into how Jupyter notebooks could help us during presentations. We have since used Jupyter notebooks at two different SQL Saturdays and presented on how to use notebooks in this session at this user group. You can read about my first experience with notebooks in this post.

As part of our presentation at the Louisville Data Technology Group, my wife and I worked on a step-by-step walkthrough of the demo. I’ve made some updates to the instructions to hopefully help any of you recreate the demo that we did during the presentation. You can find that step-by-step here. Besides the demo instructions, the completed sample notebook is also stored in that GitHub location.

Questions from the group

Can we mix Python and SQL in a SQL kernel notebook?

This is not possible at this time. Currently the notebook attaches to a single kernel, and while there appears to be an option to change what type of code is in a cell, the only option available when you click SQL in the lower right corner is SQL.

When working with a SQL notebook, does it create one or more sessions as each cell is used?

When working with Azure Data Studio, each notebook or file will create a new session when connected. In our case, each notebook will have its own session, and the queries will run within that session for that single notebook. If you open separate notebooks, you will get a separate session for each notebook to operate in.
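If you want to verify this for yourself, one simple check (my own suggestion, not something we demonstrated at the meeting) is to run a cell like the following in two different notebooks attached to the same server and compare the session IDs that come back:

-- Each connected notebook should report its own session ID.
SELECT @@SPID AS CurrentSessionId;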

In a SQL tab on Azure Data Studio, can you use the same charting functions with your result sets?

We were able to demonstrate this during the user group meeting. The charting and export functions that are available with the results of a notebook code cell execution are also available for results from a traditional SQL execution. The image below shows where you can find the charting and export options for a result set in a traditional SQL tab.

Charting and export options for a result set in a traditional SQL tab

What is the best way to share notebooks with your team?

During our demo, we illustrated how to connect to remote Jupyter books. That, however, is an approach best suited for content you want to share with the general public. If you are working with a team and are managing a set of code in notebooks, the preferred approach would be to use GitHub. This would allow each of you to clone the repo, commit your changes to the notebooks, and retrieve updates made by other team members.

Converting existing SQL files to notebooks

If you open a .sql file in Azure Data Studio, you have the option to convert that SQL file to a notebook. Typically, this will take comments and try to put them into text cells, and it will separate your code as best it can into code cells that make sense. Be aware that it is not always consistent, and you will likely want to run through the resulting notebook to verify that it is what you want. If you want to be proactive, you can use markdown formatting in your comments, and it will be carried over as proper markdown when the file is converted to a notebook.

/*
# This is an example of a header 
Here is an example of **bolded text**
*/

When converted, the comment above becomes a text cell with a header and bolded text.

It is also possible to convert a notebook to SQL, which reverses the process, producing commented text with markdown tagging alongside your code.
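As a rough illustration (this is my own sketch of the shape of that output, not the converter’s exact formatting), a notebook with one text cell and one code cell exports to a .sql file that looks something like this, with the markdown preserved inside a comment block:

/*
# Clean up the demo tables
Run this cell **before** rerunning the demo.
*/
-- The statement below stands in for whatever was in the code cell; the table name is made up for this example.
DROP TABLE IF EXISTS dbo.DemoTable;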

Effectively Integrating FHIR Data from Azure Health Services

This blog is intended to be a follow-up to SQL Saturday 2022 in Oregon & SW Washington. In that session, I presented an introduction to FHIR and the JSON data produced by the Azure Health Services APIs.

With the recently updated mandates in the healthcare environment in the United States, Microsoft has continued to expand its capability to support the FHIR standard for integrating healthcare data. While the standard is well documented and Microsoft’s capabilities are expansive, it falls on data professionals to interpret that data and build meaningful reports and insights as the data is collected and integrated across environments. This requires a good working knowledge of JSON in SQL to manipulate complex data models. In the session, we did a short review of the FHIR standard and the overall implementation of FHIR in Azure. From there, we reviewed the resulting data in the data lake and in Synapse. That was followed by an overview of the heart of the matter: complex SQL using JSON functions in Synapse. Whether or not you are active in healthcare today, this is an enlightening look at how to use JSON SQL functions within the Azure SQL platforms.

What is FHIR and why should you care?

FHIR stands for Fast Healthcare Interoperability Resources. This is the latest specification for interoperability in healthcare produced by HL7. To be clear, the word fast has nothing to do with performance, but more with the ability to implement and integrate data quickly. With the latest regulations around the world in healthcare, this is the established standard for integrating healthcare data and will continue to be at the forefront of this work. If you do any work in healthcare, you will need to understand FHIR, because you will likely run across data formatted to the standard from many different sources.

FHIR is very well documented. In many ways when the standard is properly followed the JSON documents or other supported formats are effectively self-documenting. It is commonly understood that the core FHIR specification handles about 80% of the use cases in healthcare. It is designed to be flexible so that it can support specialized needs within regions or healthcare areas. For example, in the US there is a need to support race and ethnicity. The U.S. Core Implementation Guide provides guidance on the specification enhancements to support this need for U.S. healthcare organizations. You will find similar support for other countries as well as specific implementations for healthcare vendors such as Epic.

Neither the notebook, the presentation, nor this blog is expected to be exhaustive coverage of FHIR. Before we move on to some of the other implementation pieces, it is important to understand one key aspect of FHIR: the basic building block called a resource. A resource is the core exchangeable content within the specification. All resources share the following characteristics:

  • A common way to define and represent the resource including data types and patterns
  • A common set of metadata which can be discovered easily
  • A human readable part

For more detailed information on the supported resources and other details around FHIR implementation, you should visit the following website:

Azure Health Services and the FHIR API

I will not be digging into a lot of the healthcare services information or the FHIR support within Azure in this post. The important thing to understand is that Microsoft has made a concerted effort to support this specification, which includes technology and architectures for extracting data from various healthcare systems and then using the FHIR APIs to standardize that extracted data into the FHIR spec, typically as JSON files in the data lake. Because of the standardized format, Microsoft is able to supply a set of common schemas that can be used in serverless Synapse to create external tables and views, accelerating the implementation and usage of data produced from the APIs. It is from this starting point that we are able to start working with the data in reporting and analytics solutions.

At this point I want to put in a plug for the company I work for. If you're interested in learning how Azure Health Services and the FHIR specification can be implemented at your company, we have FHIR Quick Start and FHIR Data Blueprint solutions. These solutions have been used by many other customers to achieve high levels of integration in their healthcare data estate. If you're interested in learning more, please reach out to us at: https://3cloudsolutions.com/get-started/

Working with the data from the FHIR API using JSON in SQL

As noted in the previous section, Azure Health Services comes with serverless tables and views already set up to be used with the extracted data. However, due to the complexity of FHIR, a number of columns within those tables and views still contain JSON snippets. For example, there is one field for name which contains several objects and arrays to support the specification. You cannot simply select the name from the table and use it as you move forward. There are many fields like this throughout the data. For the rest of this blog and in the notebook, we will work through a number of scenarios to build a view of the Patient resource that can be used for simple reporting. The view uses a few JSON functions from SQL Server and addresses scenarios ranging from simple to complex.

The functions we will be using:

  • ISJSON
  • JSON_VALUE
  • OPENJSON

In addition to these functions, we will also be using the CROSS APPLY operator in SQL to join our data with relational data.
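To make these pieces concrete before we look at the full view, here is a small standalone sketch of how the three functions and CROSS APPLY fit together. The JSON literal is a made-up snippet shaped like the FHIR name element, not data pulled from an actual fhir.Patient table:

-- ISJSON guards against malformed JSON, JSON_VALUE pulls scalar values out by path,
-- and OPENJSON with CROSS APPLY shreds the given-name array into one row per element.
SELECT p.id
    , JSON_VALUE(p.[name], '$[0].family') as LastName
    , JSON_VALUE(p.[name], '$[0].given[0]') as FirstName
    , g.[value] as GivenName
FROM (VALUES (1, N'[{"use":"official","family":"Everyman","given":["Adam","A."]}]')) as p (id, [name])
CROSS APPLY OPENJSON(p.[name], '$[0].given') as g
WHERE ISJSON(p.[name]) = 1;

This is the same pattern the full view below uses on a larger scale to unpack the extension array for race and ethnicity.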

The examples in the notebook are built on the tables resulting from working with the Azure FHIR API. I am currently unable to provide a sample of the data to use with the notebook. However, the SQL will work if you have your own FHIR implementation and a Patient resource to work with. Rather than rewrite the entire contents of the notebook in the blog post, here is a link to the notebook.

If you plan to implement this in the same way, you will need Azure Data Lake, Azure Synapse serverless, and Azure Data Studio. The notebook can be opened in Azure Data Studio. If you are unfamiliar with working with notebooks inside of Azure Data Studio, you are not alone. Check out this post which discusses how to implement your first notebook in Azure Data Studio.

Building our view and SQL with JSON functions

If you decide not to open the notebook but are curious what the view looks like, here is the finished product that we created in the notebook.

SELECT TOP (20) p.resourceType + '/' +  p.id as PatientResourceID
    , p.resourceType as ResourceType
    , p.id as ResourceID 
    , cast(p.[meta.versionId] as int) as VersionID 
    , cast(p.[meta.lastUpdated] as DATETIME2(7)) as LastUpdated 
    , JSON_VALUE(p.[name], '$[0].family') as LastName
    , JSON_VALUE(p.[name], '$[0].given[0]') as FirstName
    , cast(p.active as bit) as IsActive
    , p.gender as Gender 
    , CAST(p.birthDate as date) as BirthDate
    , CASE WHEN p.[maritalStatus.coding] is null THEN NULL
           WHEN  JSON_VALUE(p.[maritalStatus.coding], '$[0].system') = 'http://terminology.hl7.org/CodeSystem/v3-MaritalStatus' 
                    THEN JSON_VALUE(p.[maritalStatus.coding], '$[0].code')
           ELSE NULL
           END as MaritalStatus 
    , CASE WHEN JSON_VALUE(p.[address], '$[0].use') = 'home' THEN JSON_VALUE(p.[address], '$[0].state')
            WHEN JSON_VALUE(p.[address], '$[1].use') = 'home' THEN JSON_VALUE(p.[address], '$[1].state')
            WHEN JSON_VALUE(p.[address], '$[2].use') = 'home' THEN JSON_VALUE(p.[address], '$[2].state')
            WHEN JSON_VALUE(p.[address], '$[3].use') = 'home' THEN JSON_VALUE(p.[address], '$[3].state')
            ELSE NULL
            END as HomeStateOrProvince
    , e.Ethnicity
    , r.Race
FROM fhir.Patient p
INNER JOIN (SELECT id, max([meta.versionId]) as currentVersion FROM fhir.Patient GROUP BY id) cp
    ON p.[meta.versionId] = cp.currentVersion
    AND p.id = cp.id
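-- The two LEFT JOINs below unpack the extension array on the current version of each Patient
-- resource and keep only the US Core ethnicity and race extensions, respectively.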
LEFT JOIN 
    (SELECT p.id
        , CASE WHEN JSON_VALUE(ext.value,'$.extension[0].url') = 'ombCategory'
            THEN
            CASE WHEN JSON_VALUE(ext.value, '$.extension[1].valueString') IS NOT NULL  THEN JSON_VALUE(ext.value, '$.extension[1].valueString')
                    WHEN JSON_VALUE(ext.value, '$.extension[0].valueString') IS NOT    NULL THEN JSON_VALUE(ext.value, '$.extension[0].valueString')
                    ELSE JSON_VALUE(ext.value, '$.extension[0].valueCoding.display')
                    END
            ELSE JSON_VALUE(ext.value, '$.valueCodeableConcept.coding[0].display')
            END AS Ethnicity 
        FROM 
        (
            SELECT fp.id, fp.extension FROM fhir.Patient fp
            INNER JOIN (SELECT id, max([meta.versionId]) as currentVersion FROM fhir.Patient GROUP BY id) cp
                ON fp.[meta.versionId] = cp.currentVersion
                AND fp.id = cp.id
            WHERE ISJSON(fp.extension) =1
        ) p 
        CROSS APPLY 
            OPENJSON(p.extension,'$'
            ) as ext
        WHERE JSON_VALUE(ext.value,'$.url') = 'http://hl7.org/fhir/us/core/StructureDefinition/us-core-ethnicity'
    ) e on e.id = p.id 
LEFT JOIN 
    (SELECT p.id
        , CASE WHEN JSON_VALUE(ext.value,'$.extension[0].url') = 'ombCategory'
            THEN
            CASE WHEN JSON_VALUE(ext.value, '$.extension[3].valueString') IS NOT NULL THEN JSON_VALUE(ext.value, '$.extension[3].valueString')
                    WHEN JSON_VALUE(ext.value, '$.extension[2].valueString') IS NOT NULL THEN JSON_VALUE(ext.value, '$.extension[2].valueString')
                    WHEN JSON_VALUE(ext.value, '$.extension[1].valueString') IS NOT NULL THEN JSON_VALUE(ext.value, '$.extension[1].valueString')
                    WHEN JSON_VALUE(ext.value, '$.extension[0].valueString') IS NOT NULL THEN JSON_VALUE(ext.value, '$.extension[0].valueString')
                    ELSE JSON_VALUE(ext.value, '$.extension[0].valueCoding.display')
                    END
            ELSE JSON_VALUE(ext.value, '$.valueCodeableConcept.coding[0].display')
            END AS Race 
        FROM 
        (
            SELECT fp.id, fp.extension FROM fhir.Patient fp
            INNER JOIN (SELECT id, max([meta.versionId]) as currentVersion FROM fhir.Patient GROUP BY id) cp
                ON fp.[meta.versionId] = cp.currentVersion
                AND fp.id = cp.id
            WHERE ISJSON(fp.extension) =1
        ) p 
        CROSS APPLY 
            OPENJSON(p.extension,'$'
            ) as ext
        WHERE JSON_VALUE(ext.value,'$.url') = 'http://hl7.org/fhir/us/core/StructureDefinition/us-core-race'
    ) as r on r.id = p.id 

Here is a sample of the results from that view:

PatientResourceID | ResourceType | ResourceID | VersionID | LastUpdated | LastName | FirstName | IsActive | Gender | BirthDate | MaritalStatus | HomeStateOrProvince | Ethnicity | Race
Patient/d8af7bfa-5008-4a0f-85d1-0af3448a31dd | Patient | d8af7bfa-5008-4a0f-85d1-0af3448a31dd | 2 | 2022-05-31 18:07:03.2150000 | DUCK | DONALD | 1 | male | 1965-07-14 | NULL | ON | NULL | NULL
Patient/78cf7725-a0e1-44a4-94d4-055482781afb | Patient | 78cf7725-a0e1-44a4-94d4-055482781afb | 1 | 2022-05-31 18:07:30.7490000 | Gretzky | Wayne | NULL | NULL | 1990-05-31 | NULL | NULL | NULL | NULL
Patient/9e909e52-61a1-be50-1878-a12ef8c36346 | Patient | 9e909e52-61a1-be50-1878-a12ef8c36346 | 4 | 2022-05-31 18:39:58.1780000 | EVERYMAN | ADAM | NULL | male | 1988-08-18 | M | NULL | Non Hispanic or Latino | White+Asian
Patient/585f3cc0-c727-4989-9214-a7a7b60a2ade | Patient | 585f3cc0-c727-4989-9214-a7a7b60a2ade | 1 | 2022-05-31 13:14:57.0640000 | DUCK | DONALD | 1 | male | 1965-07-15 | NULL | ON | NULL | NULL
Patient/29a819c4-f553-8189-2354-9441b86d37ef | Patient | 29a819c4-f553-8189-2354-9441b86d37ef | 1 | 2022-05-18 15:18:40.1560000 | FORD | ELAINE | NULL | female | 1992-03-10 | NULL | NULL | NULL | NULL
Patient/d5fe6802-a680-e762-8f43-9659340b00ac | Patient | d5fe6802-a680-e762-8f43-9659340b00ac | 3 | 2022-05-18 14:39:52.2550000 | EVERYMAN | ADAM | NULL | male | 1961-06-15 | S | NULL | NULL | C
Patient/4d661053-a8d0-148c-7023-54508fd04a52 | Patient | 4d661053-a8d0-148c-7023-54508fd04a52 | 1 | 2022-05-21 13:48:24.9720000 | EVERYMAN | sam | NULL | male | 1966-05-07 | M | NULL | Not Hispanic or Latino | White

Wrapping it up

As you can see, understanding the specification well enough to build a complex SQL statement using JSON functions is required to work with FHIR effectively. Due to the complex nature of the nested JSON, you may not be able to reconcile this in tools such as Power BI. Being able to build this out in SQL ensures that you have provided your report writers and analysts with a solid result set which can be used with confidence.

Resources summary:

Memphis SQL Saturday 2022 & a Notebook

Back in person again! It is awesome to be able to get back into the SQL community and see fellow data professionals. A huge shout out to the Memphis data community leaders, in particular Zach Golden and Rob Demotsis, who put on a great event for their first one out of the pandemic. I was also able to get together with fellow 3Clouders Dawn Clement and Kristyna Hughes.

Steve, Dawn, Kristyna

A new but different opportunity

For me this was a very special event. Not only is it the first event I’ve been able to do in person since COVID started, but it is also the first event that I have presented at since being diagnosed with ALS. There are times I think I talk about this too much, but it is front and center of who I am now. I want to encourage others who have similar disabilities to remain active as they work in their new reality.

So how did this change for me? Having presented on SQL many times through the years, I typically highlight code in SQL Server Management Studio and execute it. That, however, would not work in this case. I moved all my code over to a notebook in Azure Data Studio. This allowed me to execute the code a step at a time with a simple button push. To read more about the experience of creating a notebook, check out my previous blog post here.

The other key thing that changed for me was having my wife, Sheila, join me on the platform to push the buttons that I needed for the presentation and the demo. This was definitely a new experience for her and me. She did a great job following my cues, and sometimes a lack thereof. She was able to get us through the demos, leveraging the clever new notebook I used. This is the new normal for us, and I look forward to presenting for as long as I am able.

Sheila and I co-presenting

Azure SQL Elasticity

This was the topic that I spoke on. We covered elastic queries, elastic jobs, and elastic transactions. As promised to the attendees and those of you who are reading this or are following up on my post about notebooks, I have published the notebook on the Data on Wheels GitHub which you can find here.

After you have downloaded the folders from GitHub, open Azure Data Studio and browse to the Notebooks section. Click the Open Jupyter Book button as shown below.

This will open a File Explorer dialog. Choose the Azure SQL database elasticity folder and then click Select Jupyter Book.

This will open the Jupyter book which contains the markdown files with information and the notebooks you need to set up and run the demos. Enjoy!

Thanks to those of you who were able to attend. I hope you enjoyed the event as much as I did!

My experience working with notebooks in Azure Data Studio

I’ve seen notebooks used in Azure Data Studio on multiple occasions. I really like the concept of notebooks, having done some work within Azure Databricks notebooks, but not extensively. As I go into the process that I went through, it’s important to understand that I am not a data scientist and have not done extensive development or spent a lot of time in Python or Jupyter notebooks. Furthermore, my interest in the notebooks was elevated when I realized I wanted to continue presenting while working through my current ALS diagnosis. I have limited use of my hands and arms so highlighting and executing code, especially in front of a crowd, was going to be problematic. (If you want to learn more about my condition and tools I’m using to maintain my ability to work, please check out this series of articles on our blog.)

Let’s start with the core problem that I’m trying to solve today. I will be presenting a session on elastic queries in Azure SQL database. Most of the code is ready to go since I have done this presentation a few times. As I was working through testing my demo, I found executing code by highlighting and pushing “run” in either Data Studio or in SQL Server Management Studio was difficult because I struggled to control highlighting the code. I was also looking for better ways to automate the process, but more about that later. I watched a couple of demos on using notebooks and found some of the notebooks that have been created by Microsoft. I realized I could put together my entire demo package to share with the attendees and build the demo so that I could execute it a step at a time without highlighting. Now that you have the background of what I was trying to accomplish, let’s look at the process I went through getting this done.

How in the world do you work with notebooks in Azure Data Studio?

One of the interesting things about working with notebooks is that it is assumed that if you want to work with them, you likely already have and already prefer to use them. This means that the instructions for how to create, organize, and use notebooks within Azure Data Studio are a bit lacking. For example, it was not entirely clear to me that one part of the process is creating a folder to store your notebooks along with your markdown files and other content. So, let’s go through the process of creating your first notebook step by step, with explanations about what’s happening.

The organization of notebooks and files in Azure Data Studio

Part of my struggle in understanding what was happening is that each time I tried to create a notebook, it asked me for locations and files. I thought it should know where they should go. So, as a newbie to notebooks and organization within Azure Data Studio, I created a notebook and a Jupyter book so I could see how the files are organized. Then I could go back and create the Jupyter book correctly from the beginning. While I may not get all of the terminology correct in this process, this is what I discovered as I worked through it.

Once I started working with the notebook process in Azure Data Studio, I realized there were multiple components involved:

  • Jupyter book
  • Markdown file
  • Notebook
  • Section

While I am sure there are simpler ways to accomplish what we would like to do, I’m coming at this entirely from Azure Data Studio as a data developer, not a data scientist. When I created my first Jupyter book, I didn’t understand what its purpose was in the beginning. When you create a Jupyter book, it looks like you’re creating a folder. That folder will also contain several helper files to organize your notebooks, markdown files, and sections. Before we leave the structure and organization discussion, I want to clarify that the book is the parent folder, and a section is a subfolder within the book. Markdown files and notebooks are files organized for particular purposes. The markdown file is effectively a document that allows you to create a nicely formatted informational component for your notebook. The notebook files are actual Jupyter notebook files, which are split into cells for code and text.

Here is the high level organization of the Jupyter book we are going to create:

  • Jupyter book: Azure SQL database elasticity
    • Markdown file: README
    • Section: Setting up the demo
      • Markdown file: Set up instructions
      • Notebook: Prepping the demo
    • Section: Elastic query demo
      • Markdown file: Elastic query demo instructions
      • Notebook: Elastic query demo
    • Section: Elastic job demo
      • Markdown file: Elastic job demo instructions
      • Notebook: Elastic job demo

For the purposes of this blog post, we will walk through the process of creating the original Jupyter book and the elastic query demo section. That section has a good mix of code and text to illustrate the power and capabilities of notebooks.

Creating your first notebook in Azure Data Studio

Let’s begin creating our first notebook in Azure Data Studio. Before we dive into this process too deeply, I want to be clear that we are going to create a Jupyter book to add our notebooks to. This is not required as you can create a new notebook from the file menu or with the shortcut as noted on the screen in Azure Data Studio. What confused me about this initially is that you cannot create a simple notebook from the notebooks section in Azure Data Studio. When you create your notebook, you can save it as a file in the location of your choosing, but it will not show up in the notebook section. Once you create a notebook, if you are not using a Jupyter book to host it in, you can reopen it just by choosing Open File from the menu. While this may make sense to others, it was not entirely intuitive to me in the beginning. I had to do some mucking around to figure out that process.

So, we will start our process by creating a Jupyter book to host all our notebooks and markdown files. This Jupyter book will also be readily displayed in the Notebooks section of Azure Data Studio. Use the ellipsis (…) in the Notebooks section to get to the More Actions menu and choose Create Jupyter Book.

Create new Jupyter book

In the dialog, give your new Jupyter book a name and specify the location you want to store it in. I have not used the optional content folder for this exercise and recommend that you do not either.

New Jupyter book dialog

If you go to the location where you created your Jupyter book, you will see that it also created three files in a folder named the same as your Jupyter book:

  • _config.yml
  • _toc.yml
  • README.md

In the Notebooks section of Azure Data Studio, you should see your Jupyter book with a README markdown file in it. For now, we will leave the README file as an introduction to what is in your Jupyter book. (Be aware that you can remove the file by deleting it, but you will need to update the _toc.yml file to reflect the change. If you do not update the TOC file, you may see missing-file error messages in Azure Data Studio.)

New Jupyter book with README

I will not take time in this post to review everything that is possible in a markdown file. The key here is that you can update the README file that was created with headers and formatting to provide instructions on how to use the various contents of your Jupyter book. If you double click within the README file, it will open the README.md file in a new tab in Azure Data Studio. This view has line numbers and will allow you to update and add content.

The following code gives you an example of some markdown syntax:

# Welcome to the Jupyter book on Azure SQL Database elasticity
This book contains 3 sections
* The first section contains instructions on how to set up the demo
* The second section contains the demo for elastic queries
* The third section contains a demo for elastic jobs

This will result in the following look and feel in your README file:

Formatted README markdown file

Adding a section

The next thing we will do is add a section where we will host the executable demo code. Right click on your Jupyter book and choose Add Section. We will add the title as Elastic query.

Adding the notebook

Up to this point, we have been building the framework to support our first notebook. While all these steps are not required, this is the most complete approach. Right click on your section and choose New Notebook. This will create a Jupyter notebook in the subfolder of your section.

New section with a notebook

Once you create the notebook, it will open a tab in Azure Data Studio with the notebook. You will notice that it has something called Kernel. The kernel allows you to set the default language used for the notebook. For the work that we are doing we will be using the SQL kernel. This will allow us to execute SQL code against a database. In the Attach to dropdown, you will see databases that you can use to execute code. The Cell dropdown allows you to add cells which can contain code or text.

Azure Data Studio supports other kernels that can be used for executing code against various workloads. These include Python, Spark, PySpark, and PowerShell.

Now let us get down to the business of creating a notebook with executable code. Before we add executable code, let us add a text cell as an introduction to the code. You can do this by clicking the Cell dropdown and choosing Text Cell. Once you add the text cell, you will notice there is a formatting bar, which ironically is missing in the markdown file editor. This means it is easier to create formatted text in a notebook cell than in the markdown file itself. Keep this in mind as you create your notebooks and add content to your Jupyter book. These cells are easier to work with at times than the full file, particularly if you are not knowledgeable about markdown formatting.

At this point, let us add a quick introduction to what we are about to do in the following code cells.

Formatted text cell

Next, we will add a code cell. From the dropdown menu for cell, choose Code Cell. This will add a code cell to your notebook which uses the language selected in your kernel. There is also a play button which allows you to execute the code.

Empty code cell

I am going to add the code that is required to clean up the tables for the demo. The resulting code cell will look like the following:

Code cell with DDL code
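The cell contains the DDL that resets the demo tables. As a rough idea of what such a cleanup cell might contain (these table names are placeholders, not the exact objects from my demo):

-- Hypothetical cleanup cell: remove the demo tables if they already exist
-- so the notebook can be rerun from a clean slate.
DROP TABLE IF EXISTS dbo.Customer;
DROP TABLE IF EXISTS dbo.OrderDetail;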

As a last step to understanding how notebooks and code work in the environment, we can execute the code by pushing the play button in the code cell. This will return the result of that execution as shown below:

Code cell with results

Congratulations, you have created your first notebook with executable code against a SQL Server database! You can continue to add more text cells and code cells as needed. One of the reasons I like this pattern is that it allows me to execute the code without having to highlight it while doing demos. Each cell can be run independently. You will also notice there is a Run All button if you choose to run all the scripts in your notebook at the same time. This could be valuable if you have a set of maintenance operations or related items that you have collected in a notebook to run together.

Another key thing to remember is that notebooks are shareable. Because the connection is outside of the notebook, once you share the notebook, the recipients will have to connect to an environment that allows them to execute the same code. You can add your notebooks to GitHub or similar source control to manage changes and share common resources easily without just distributing SQL files.

Before we wrap up

I feel I would be remiss if I did not also demonstrate what happens when you get data results in a notebook. In my case I have a database I can connect to which has WideWorldImporters loaded into it. I am going to select the top 1000 rows from the DimSupplier table. Once I run the code cell, I get the rows affected, the execution time, and a table with results as shown here:

Code cell with data results
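For reference, the cell itself is nothing fancy; it is essentially a top-1000 select along the lines of the sketch below (reconstructed from the description above; adjust the schema and table name to match your copy of WideWorldImporters):

-- Return a limited set of supplier rows to demonstrate the results grid, export, and chart options.
SELECT TOP (1000) *
FROM DimSupplier;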

As you can see in the results window, you have several export options and a chart option that you can use to further visualize or work with the data you have retrieved. I would encourage you to explore these options; whether they work well for you depends on the type of data you are working with. For example, supplier data does not chart very well, whereas if I had used fact data there may have been some interesting charting options. A notebook could be a straightforward way to demonstrate some simple reporting for a technically savvy audience.

Wrapping it up

There are many more functions around notebooks that I did not cover, and I assume that Microsoft will continue to make improvements to the overall capabilities here. I look forward to using notebooks more as a terrific way to share code and run demos. I hope you find them as valuable as well.

For those of you who are not sure about using notebooks, this is an effective way to build your skills without trying to learn a new language if you are already familiar with SQL. My first exposure was using Python in a Databricks environment. That was a lot to learn while also trying to understand how notebooks functioned. As the data environment continues to expand and require new skill sets, understanding how to use and leverage notebooks on a regular basis is a good skill to have. Microsoft has done us a great favor by using standard Jupyter notebooks, which are used in data science, Databricks, and other areas of data practice.

If you are following my work enablement series, you know one of the things that I am passionate about is simplifying how I work, in order to stay working while continuing to lose functionality in my arms. Notebooks help with this by allowing me to execute code without highlighting it when doing demos. Because highlighting code and executing it in a tool like SQL Server Management Studio requires multiple touches on the keyboard and mouse, I struggle to do it efficiently. The ability to organize my demo around code cells and then have a self-documenting notebook to pass along to attendees is a huge win for me. I hope this helps others who struggle in the same way. And I hope this was helpful to those who have not used or seen notebooks in their current work environment but may in the future.

I will be creating and sharing a completed notebook for the demos related to my presentation on elastic capabilities with Azure SQL. Look for that presentation follow up from the Memphis SQL Saturday in October 2022. I will publish a follow up blog post with a link to the completed notebook used with that demo.