SQL Bits 2023 Recap

I had an amazing time adventuring to Wales with other members of the data community last week! It was my first international conference, and I had heard so many incredible things about it, but still it blew away all my expectations. I’m so honored to have met so many incredible people throughout the week! Thank you to new friends and old for making it a conference to remember.

Below are the links to GitHub for my session as well as some notes from the sessions I attended. Disclaimer: these notes are not even close to the real thing. Be sure to check out these sessions if you see them being held at your local user group or SQL Saturday in the future! I tend to take very bullet-point-oriented notes to best organize my thoughts; hopefully they spark your interest and guide you to professionals who know about specific technologies within the data realm.

Check out the agenda and feel free to reach out to folks with topics you’re interested in! They may have their slide decks easily accessible for folks who couldn’t make it to the conference.

Power BI Meets Programmability with Yours Truly

Thank you so much to everyone who made it out to my session!

It was incredible to have 48 people in person and another 21 virtual! It was a humbling experience to see that room fill up and know they were all there to learn something new and exciting to help with their day-to-day work. Can’t wait to look through the feedback and learn where to tweak my presentation to make an even bigger impact in the future!

  • GitHub with slide deck, sample C# code, and sample PBIX file (with the Excel file that powers it): GitHub Link
  • Category of blog posts related to TOM and C#: https://dataonwheels.wordpress.com/category/c-tom-and-power-bi/
  • Fun tip someone showed me after the session:
    If you use /* and */ to comment out chunks of code (like I do frequently during the session), you can close the block with --*/ instead of */. That way you only need to comment out the opening /* to run the block, because the trailing --*/ then reads as a plain line comment. Pretty neat!
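
To make the tip concrete, here is a minimal T-SQL sketch of the toggle (dbo.Sales is a hypothetical table):

/* Toggle is OFF: everything down to the closing marker is a comment, so nothing runs.
SELECT COUNT(*) AS SaleCount
FROM dbo.Sales;
--*/

--/* Toggle is ON: the opening marker is now just a line comment, so the query runs.
SELECT COUNT(*) AS SaleCount
FROM dbo.Sales;
--*/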

Supercharge Power BI with Azure Synapse Analytics with Mathias Halkjaer

Limitations of PBI

  • Source load minimization doesn’t exist in most cases (lots of queries against the same source)
  • Out of sight transformation layer
  • Not best tool for data warehouse tasks
  • No logs or output from quality checks
  • Doesn’t handle humongous data very well

Enchantments needed for PBI from Synapse

  • Time-travel (historical data & change tracking)
  • Enlarge (massive scale)
  • Counter spell (reverse ETL)
  • Wish (supports multiple programming languages)
  • Divination (AI)
  • Regenerate (CI/CD and source control)

Azure Synapse

  • A toolbox full of tools: serverless sql, data warehouse, spark engine, etc.

Data Lake

  • Landing zone for data
  • Cheap
  • Redundant
  • Quick and dirty analytics
  • Performance
  • Most efficient way to load data into data warehouse
  • Easily consumed

Report Design & Psychology (Quick Tips) with Jon Lunn

Dual process theory: intuitive vs attentive

You can mentally process up to 15 items intuitively. Past that it gets pushed to attentive.

Things can move from system 2 (attentive) to system 1 (intuitive) with repetition (think driving or typing).

Layout – left to right, top to bottom. Move most important KPI to top left. Remove distractions (excess colors, 3d visuals, pie charts, etc). Titles in upper left make it easier to know what’s contained within the visual.

Tip – when visualizing counts as dots, use dice patterns to move counting into a system 1 process.

Ratios (how many, relative) vs percentages (how likely, chance risk probability). Both are good options to contextualize a number more. We usually understand ratios much better because how many is more intuitive than how likely.

Intuitive systems are more prone to errors or bias. If you want to hide something, go for a percentage.

Use the power of 10s (e.g. 8 out of 10 preferred it). Round and use a ratio to move it into an intuitive process.

Power BI Datamarts What, How, Why with Marthe Moengen

You can manage RLS roles within a datamart and join tables via a GUI similar to the Power BI Desktop modeling view. Datamarts are in preview only. You can also use a SQL endpoint to connect to your datamart.

Why? You can access data through a SQL endpoint and do simple aggregations using a SQL script instead of M (a quick sketch follows below). There are potential challenges since this is in preview.
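
As a hedged illustration of that point, this is the kind of simple aggregation you could run against a datamart’s SQL endpoint instead of building the logic in M (the table and column names are made up):

-- Hypothetical example: aggregate sales by region over the datamart's SQL endpoint
SELECT
    Region,
    SUM(SalesAmount) AS TotalSales
FROM dbo.Sales
GROUP BY Region
ORDER BY TotalSales DESC;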

It keeps getting easier to move Oracle and Open Source workloads to Azure with David Levy

  • Cloud migration phases:
    • Discover
    • Assess
    • Migrate
    • Optimize
    • Secure & Manage
  • Azure Migrate: central hub of tools for datacenter migration
    • Primary tool for discover and assess
    • Azure Database Migration Service – Gen 2 gives end users much more control over how many resources each migration item gets
  • Use Oracle Assessments in Azure Data Studio to get recommendations for SKU size
  • Oracle databases in Oracle Cloud can use the OCI Interconnect to pull data back and forth with Azure IaaS
  • A lot of people move to PostgreSQL to pay less and get open source resources
  • Migration strategy
    • Rehost
    • Refactor
    • Rearchitect
    • Replace
  • There is no better way to run software than SaaS. Why invest the time and resources when the vendor can maintain this for you and allow you to focus on your business?
  • The Oracle Assessments in Azure Data Studio can see all the database details, SKU recommendations, and Feature Compatibility. Each section has a migration effort attached, very nice
  • Azure is a great place for PostgreSQL:
    • High availability
    • Maintained
    • The PostgreSQL migration extension can run an assessment in ADS to understand migration readiness
    • Integrated Assessment
    • Easy set up and config in ADS
  • To migrate PostgreSQL:
    • pg_dump scripts out the PostgreSQL schema
    • psql allows creating databases and tables
    • Use the PostgreSQL to Azure Database for PostgreSQL Online Migration Wizard
    • You can start cutover with pending changes
  • Azure Database for MySQL
    • Flexible server is now available, allowing mission-critical apps to use zone redundancy and fine-grained maintenance scheduling
    • Azure Database Migration Service (ADMS)
      • Migrates schema and data
      • Preview = replicate changes
      • You can do this as an online migration
  • Optimizing your environment
    • Costs during & after migration
      • During: Azure TCO Calculator, Azure Migrate, Azure Hybrid Benefit & Reserved Instances, and join Azure Migration & Modernization Program
        • Do reservations early, you’ll lose money otherwise
        • AMMP = Azure Migration & Modernization Program, provides coaching on best practices
    • Make sure you capture good data from high-load periods so your migrated instance is sized for the worst-case scenario
    • Execute iteratively

5 Things You Can Do to build better-looking reports – James McGillivray

  • Design is subjective
  • No clear end point
  • No one solution
  • Design is often seen as less important
  • Blog: jimbabwe.co.za
  • Tip Techniques
    • Grid Layout
      • Pro: left brain, less layout tweaking happens, looks professional
      • Con: can look boxy
      • Gutters = white space between visuals
      • Margins = white space along edge of report
    • Maximize Real Estate
      • Pro: bigger visuals = more value, fewer visuals = fewer distractions
      • Con: often hard to convince stakeholders, not all questions are answered by default
    • Beautiful Colors
      • Pro: use color to convey meaning, highlight data, and show continuous data (qualitative, sequential, or divergent palettes)
      • Con: many pitfalls, accessibility concerns, can be polarising
    • Consider the Audience
      • Pro: most important elements first, remove unneeded elements, hierarchy view (summary on top, grouped middle, detail on bottom)

Identifying and Preventing Unauthorized Power BI Gateways with Angela Henry

  • The problem: available to all, gateways are not only for PBI, results in chaos
  • How do we find gateways?
    • Manage connections & gateways. Lets us identify gateways. You need to be an Azure AD global admin, PBI service admin, or gateway admin
  • Reason to restrict: governance

Building, Deploying, Sharing a Remote Jupyter Book in Azure Data Studio

When working with Azure Data Studio and its support of Jupyter books, you will find there is an option for remote Jupyter books. As shown in the image below, you can open that Jupyter book and follow the dialog for a couple of Microsoft books that are readily available.

top portion of the dialog to add a remote Jupyter book
Dialog for adding a remote Jupyter book

So let’s start at the beginning and let you know what the end game is. What if you want to create and share a Jupyter book full of helpful hints for the world to see? How would you go about doing that? The first question you may be asking is: what is a Jupyter book? As it turns out, in Azure Data Studio a Jupyter book is a collection of folders and files managed by YAML files.

If you’ve been a SQL Server developer for a very long time, this may be meaningless to you. I know it was for me.

Check out this blog post to see how I created my first notebook and Jupyter book in Azure Data Studio. Once you have created your own content, even if it’s just a sample to try, come back to this post and we will walk through how to share your notebook as a remote Jupyter book.

Get it into GitHub

To share your Jupyter book you will need to get your Jupyter book into GitHub. Use File >> Explorer from the menu in Azure Data Studio to open your repo directory that is stored locally. You can either save your notebooks to this location or copy and paste them using File Explorer. I’ll be writing more details around using GitHub with Azure Data Studio in a future blog post.

Now that you have the content in your repo, you can commit the code to GitHub. At this point, if your repo is not public, you will need to make it public to complete the process. My recommendation is that if you have a repo you use for a lot of your internal work, you should create a new repo that is public and copy the content there.

Create a zipped file

Because what you’re working with is a folder structure and a set of files, you will not build this in a traditional code-build fashion. That begs the question: how do you create a release? You create a release of your Jupyter book by zipping up the folder and naming your zip file in a very specific way. Before we get into the naming process, remember that Azure Data Studio supports Mac and Linux as well as Windows. Because of this, if you want your remote Jupyter book to be available to Mac and Linux users, you will need to create a tar.gz file as well.

Your file name needs to contain all the relevant properties required in a remote Jupyter book release. This includes the following attributes separated by hyphens:

  • Jupyter book name
  • Release number
  • Language

In the example I am using, the file name looks like this:

AzureSQLDatabaseElasticity-0.0-EN.zip

Now that you have the file ready, we can create the release.

Creating a GitHub release

Time to create that release. In GitHub click the Release button to open the releases page to create a new release. Click on the Draft a new release button to open the page that will let you create your next release. You will need to add the zip files by dropping or selecting the files where it says to attach binaries to the release. Add additional documents or instructions as needed. Click Publish release to build and make a release available.

New Release Page at GitHub

Now that you have the release created, you should be able to open that remote Jupyter book.

Opening your remote Jupyter book

As you saw at the beginning, only two repos are included by default in Azure Data Studio. Even though you have created a remote Jupyter book, it does not show up on the list. You will also notice that it does not use a proper URL, even though that appears to be the value that is requested. To connect to your Jupyter book, you will need to use the following format in the URL text box: repos/{your GitHub account}/{your repository name}. For example, you can find my example remote notebook on Azure SQL Database Elasticity in this repo: repos/DataOnWheels/notebooks.

It appears that, in the background, Azure Data Studio handles the front end of the URL. Once you have entered this, you should be able to click Search and fill in the related information that you built into the file name you added to the release. Once you have filled out all the information, you should be able to click Add, which will import the notebook into your local Azure Data Studio. This effectively loads a disconnected copy of the original notebook. Congratulations, you have now uploaded your own Jupyter book and made it available for others using the remote option.

Why would you ever use this?

Be aware that we would not recommend using a public repository for privileged information that you work with at your company. However, if you find it necessary to share a notebook as a result of building a presentation, or you want to share some great information easily, this might be a good fit for your use case. The Jupyter book that I have shared using this methodology is also available in my standard GitHub repository for code sharing. However, attaching to a remote Jupyter book to get all the example code easily added to Azure Data Studio is a win. It is an easier way to distribute the work; it is just not well known as a pattern.

If you successfully create an interesting or boring Jupyter book that you want to share, I encourage you to include it in the comments below so we can all have a look.

The Microsoft Ability Summit in Review

abilities summit header

As many of you are aware, I have been dealing with a progressive version of ALS, which is affecting my hands and arms and thus my ability to type. Throughout this process I have been writing about the technology that has allowed me to keep working. I really started to dig in and embrace the work that Microsoft is doing in the accessibility space. As I was looking around at what they were doing, I stumbled across last year’s Ability Summit. It was interesting because in that Summit they announced the release of some of the Surface accessibility tools as well as Voice Access, both of which I will be using a lot. I have been using Voice Access since it was released in preview in the middle of last year. I plan to write about my experience with Voice Access after the next update of the product, which dropped in preview in late February; I hope to have the update soon. My hope is that a number of the issues I’ve been dealing with in the product will be resolved by then.

Picture of Microsoft Adaptive Accessories
Microsoft Adaptive Accessories

Moving on to the Summit. When I came across the Summit on YouTube, I was really impressed with the overall approach as well as the content and conversations that were part of the event. This year’s Summit took place on March 8th. I took the time to watch much of the conference with my wife to see what new and great things are coming from Microsoft to further enable those of us with disabilities to work in a Microsoft and Windows environment. I was not disappointed! The conference had a number of sessions where they talked about how various individuals are impacting their workplaces, as well as reflecting on the life of one of the foremost disability advocates in the country, Judy Heumann.

Along the way, there were a few things that they announced or talked about that excited me. The first, of course, was that Voice Access was fully released in February. While I haven’t received the update yet, I’m looking forward to getting it and reviewing it in a different post. They also showed how Microsoft 365, including the Office tools and Teams, has continued to embrace accessibility for people with all types of disabilities. One of the neat things they demonstrated is that there is now an accessibility checker in Word, for example, that will let you see whether the formatting you have chosen in your document is accessible. I look forward to seeing the continued improvements here, as it brings awareness to those of us who don’t struggle with some of those same issues and shows us how to build documents that are considered fully accessible.

GitHub Next logo
GitHub Next

One of the tools I’m most interested in is Hey GitHub! Beyond even GitHub Copilot, you now have the ability to use voice commands to build code. While this is still in the early stages it is an awesome concept. I of course will be looking at how I could potentially use this to build out some SQL code so I can continue to demo.

There was also a lot of conversation around the impact of AI on our ability to be more productive. This includes the various AI capabilities such as support writing code in Visual Studio and creating summaries of meetings with ChatGPT. I hope to put some of these new findings to good use.

I have been using the public version of ChatGPT to build LinkedIn summaries of blog posts so I don’t have to type them up myself. It has been a cool experience watching the AI build the summary.

If you or your company is looking for insights into how Microsoft as well as other companies are tackling the tough questions around accessibility, I encourage you to check out the various sessions from the Ability Summit. This is a great opportunity to embrace a group of individuals who have a lot to contribute; all they need is the opportunity! I want to say a huge thank you to the Microsoft team who put this conference together and continue to make accessibility a key part of the tools they provide to us.

SQL Saturday Atlanta 2023 Recap

This past weekend, I had the pleasure of attending and speaking at SQL Saturday Atlanta! If you’re in the area, I highly recommend connecting with the local user group and getting to know fellow data nerds near you. Thank you to everyone who was able to make it out; it was great to see such a large in-person SQL Saturday event post-pandemic! Big shout out to the volunteers, organizers, and sponsors who made it all happen. Thank you all for your time, hard work, and commitment that will lead to rebuilding the data community. This was the first conference where I felt confident enough in my presentation to attend a session in every timeslot! Below are some takeaways from the sessions I attended, but there were a lot of other incredible sessions; I recommend checking out the schedule for any that interest you and reaching out to the speaker.

Attended Session Takeaways

  • Practical Use Cases for Composite Models (Kevin Arnold)
    • Never thought of using composite models for pulling in an enterprise model with a thin slice specific to the use case of the report. Genius method for maintaining a focused enterprise model while meeting the needs of your end users.
    • Perspectives can be used with personalized visuals instead of hiding columns, so end users are not overwhelmed by column and measure options.
    • Field parameters can also be used/created by end users for a cultivated experience that meets their business needs without impacting the larger audience of your enterprise model. If you haven’t heard of them (I hadn’t), highly recommend checking out this link.
  • Planning Steps for a Power BI Report (Belinda Allen)
    • Always ask stakeholders what their experience is with Power BI, it will help put all their questions and assumptions in context.
    • Ask your stakeholder for the scope of success. If they can’t define what success is for the project, you have the wrong person or the wrong client.
    • Show nothing in a needs gathering session. Listen and take notes. Similar to watching a movie before reading a book, it will severely limit the imagination necessary for an impactful report.
    • Ask who is maintaining the data currently and who will continue to do so.
    • Check out PowerBI.tips Podcast.
    • Ask if they want data access or data analytics. This will let you know if a visual report is a waste of resources for them and/or if a paginated report or something similar better fits their needs.
    • Check out Chris Wagner’s blog, he has a great slide deck for a wireframing session with success owner after the needs gathering session.
    • Host office hours or something similar to foster on-going user growth
    • After the project, always ask if you achieved the defined success benchmarks. Try to give them a concrete ROI (i.e. x hours saved = $x saved based on average salary).
    • Linktr.ee/msbelindaallen
  • Introduction to Azure Synapse Studio Development Tools (Russel Loski)
    • A Synapse workspace allows you to create T-SQL and Python notebooks on top of items in Azure Data Lake Storage, such as CSV and Parquet files.
    • Notebooks allow markdown to sit side-by-side with code
    • Ctrl + Space will bring up snippets to use within a notebook
    • No indexing since it’s serverless; prepare for some wait time.
    • You can promote column headers using the HEADER_ROW = TRUE option (see the sketch after this list)
  • DataOps 101 – A Better Way to Develop and Deliver Data Analytics (John Kerski)
    • Check out The DataOps Manifesto and read the 18 DataOps principles
    • Principles are repeatable and adaptable to new technologies
    • Make everything reproducible
    • Versioning and automated testing are keys to building sustainable solutions
    • Check out the DataOps Cookbook and pbi-tools
  • Power BI Performance in 6 demos (Patrick LeBlanc & Adam Saxton from Guy in a Cube)
    • To reduce the “other” line item in performance analyzer, limit the number of visual objects on a page.
    • Optimize DAX when the line item is over 120 milliseconds.
    • SLA for page loads is 5 seconds.
    • Using drop downs in your slicer will delay DAX from running for that visual object. That deferred execution aids in speeding up the initial load.
    • Tooltips run DAX behind the scenes on the initial load for visuals (you can see this by copying the DAX query into DAX Studio). To delay this execution until it’s needed, use a tooltip page.
    • If the storage engine time in DAX Studio is over 20 milliseconds, there’s opportunity to optimize.
    • Variables limit the number of times a sub-measure will be run and speed up DAX queries behind visuals.
    • Keep in mind while performance tuning, Power BI desktop has 3 caches – visual, report, and analysis services engine. You can clear all caches within the desktop tool except visual. To clear that cache, you need to close and reopen the PBIX file.
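
Tying back to the Synapse serverless SQL takeaways above, here is a minimal sketch of querying a CSV file in Azure Data Lake Storage and promoting the first row to column headers (the storage account, container, and file path are placeholders):

-- Query a CSV file in the data lake with Synapse serverless SQL.
-- HEADER_ROW = TRUE promotes the first row to column headers and requires parser version 2.0.
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://yourstorageaccount.dfs.core.windows.net/yourcontainer/sales/sales.csv',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0',
    HEADER_ROW = TRUE
) AS src;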

My Session

I cannot express enough how grateful I am for everyone who was able to make it to my session! To have so many established professionals in the field approach me afterwards telling me how well it went was a dream come true. If you’re interested in reviewing the slides and code, please check out my GitHub folder for all you will need to recreate the demo we went through. Miss it? No worries! I’ll be presenting this topic at SQLBits and the Power BI Cruise, so come join me! I’m also open to presenting at various user groups, feel free to reach out to me at kristyna@dataonwheels.com.

Again, thank you so much to everyone who made this weekend possible and please connect with me on LinkedIn and Twitter! I’d love to stay connected!

Out of Your SSMS World, Jupyter Notebooks in Azure Data Studio – Louisville Data Technology Group, Feb 2023

It was a lot of fun to speak at the Louisville Data Technology Group in February. Sheila and I presented on Jupyter notebooks in Azure Data Studio. The session was very fun, with a lot of interesting interaction from individuals who were looking at both the developer and administration tooling within Azure Data Studio, as well as learning how to use Jupyter notebooks, many of them for the first time.

Steve presenting in Louisville

The start of the session was a general introduction to Jupyter notebooks and Jupyter books. You can find the short slide deck here. I think the key thought from the introduction was that Jupyter notebooks have been around for a long time and have often been used in data science as well as data engineering with Python. For example, my first exposure to notebooks was working with Databricks in more of a data engineering workload. One interesting note is that Jupyter books appear to be a nonstandard, or at least (as far as I can tell) hard-to-understand, component. Jupyter books are in fact a folder structure used to organize the contents, including various markdown files, subfolders, and notebooks. Jupyter books allow you to store and organize your content and even share it in an organized way.

My first real exposure to Jupyter notebooks in a functional way was to create a platform on which my wife could help with presentations in a simpler manner than just using SQL Server Management Studio. As a result, I began to dig into how Jupyter notebooks could help us during presentations. We have since used Jupyter notebooks at two different SQL Saturdays and presented on how to use notebooks in this session at this user group. You can read about my first experience with notebooks in this post.

As part of our presentation at the Louisville Data Technology Group, my wife and I worked on a step-by-step walkthrough of the demo. I’ve made some updates to the instructions to hopefully help any of you recreate the demo that we did during the presentation. You can find that step-by-step guide here. Besides the demo instructions, the completed sample notebook is also stored in that GitHub location.

Questions from the group

Can we mix Python and SQL in an SQL kernel notebook?

This is not possible at this time. Currently the notebook attaches to a single kernel, and while there appears to be an option to change what type of code is in a cell, the only option available when you click SQL in the lower right corner is SQL.

When working with an SQL notebook does it create one or more sessions with each cell that is used?

Because we’re working with Azure Data Studio, each notebook or file will create a new session when connected. In our case, each notebook will have its own session, and the queries will run within that session for the single notebook. If you open separate notebooks, you will get a separate session for each notebook to operate in.

In a SQL tab on Azure Data Studio, can you use the same charting functions with your result sets?

We were able to demonstrate this during the user group meeting. The charting and export functions that are available with the results of a notebook code cell execution are also available for results from a traditional SQL execution. The image below shows where you can find the charting and export options for a result set in traditional SQL.

Charting and export options for a traditional SQL result set

What is the best way to share notebooks with your team?

During our demo, we illustrated how to connect to remote Jupyter books. That, however, is a great approach for content you want to share with the general public. If you are working with a team and are managing a set of code in notebooks, the preferred approach would be to use GitHub. This would allow each of you to clone the repo, commit your changes back to the notebook, and retrieve updates made by other team members.

Converting existing SQL files to notebooks

If you open a .sql file in Azure Data Studio, you have the option to convert the SQL file to a notebook. Typically, this will take comments and try to put them into text cells, and separate your code the best it can into code cells that make sense. Be aware that it’s not always consistent, and you will likely want to run through the resulting notebook to verify that it is what you would desire. If you want to be proactive, you can use markdown formatting in your comments, which will then be converted to proper markdown when the file is converted to a notebook.

/*
# This is an example of a header 
Here is an example of **bolded text**
*/

When converted, the comment above becomes a markdown text cell with a header and bolded text.

It is also possible to convert a notebook to a SQL file, which reverses the process: markdown cells become commented-out text with the markdown tagging preserved, and code cells become SQL.