Derby City Data Days

It was awesome to see the Kentucky data community come out for the first Derby City Data Days in Louisville, KY! Bringing together communities from Ohio, Tennessee, and Kentucky, the Derby City Data Days event was an excellent follow-up to Data Tune in March and deepened relationships and networks made at the Nashville event. In case you missed it, below are my notes from the sessions I attended as well as the resources for my session. Be sure to check out these speakers if you see them speaking at a conference near you!

Building Self-Service Data Models in Power BI by John Ecken

What and why: we need to get out of the way of business insights. If you try to build one model that fits all, it fits none. Keep your data models simple and streamlined.

Security is paramount to a self-service data model. RLS is a great option so folks only see their own data. You can provide read and build access to the underlying data model, which enables users to create their own reports off the data they have access to. If you give a user contributor access, RLS will no longer apply to that user. Keep in mind, business users need a Pro license OR the content needs to be in a Premium workspace.

One really great option is for people to use the Analyze in Excel feature to interact with the most popular BI tool – Excel. This allows them to build pivot tables that refresh whenever needed. They can also connect directly to Power BI datasets from their organization! You can set up the display field option as well to get information about the record you connect to. Pretty slick! With this, security from RLS still applies, which is awesome.

Data modeling basics – clean up your model by hiding or removing unnecessary columns (e.g., sort-by columns). Relationships matter. Configure your data types intentionally. Appropriate naming is vital to business user success. Keep in mind where to do your transformations – SQL vs DAX (think Roche's Maxim). Be sure to default your aggregations logically as well (a year column shouldn't be summed).

Power BI Measures – creation, quick create, measure context, time-based functions. Whenever possible, make explicit measures (using DAX) and hide the column each was created from so people use the measure you intended. Make sure you add descriptions, synonyms (for Copilot and Q&A), featured tables, and folders of measures. The functionality of featured tables makes it wise to use folders of measures within your fact tables.

John likes to use LOOKUPVALUE to pull dimension attributes back into the fact table so he ends up with as few tables as possible. There are drawbacks to this, such as slower performance and model bloat, but the goal is to simplify the model for end users who don't have data modeling experience or understanding. I'm not sure I agree with this method since it doesn't scale at all and defeats the purpose of a data model. Make sure you hide columns you don't want end users to interact with.

To turn on a featured table, go to the model view, then the Properties pane, and toggle the "Is featured table" setting. It requires a description, the row label that will populate, and the key column (which cannot be hidden) that the business user will use in Excel as the reference to pull records.

The PIVOT() T-SQL Operations by Jeff Foushee

GitHub: https://github.com/jbfoushee/MyPresentations/tree/main/TSQL_Pivot_Operators

Be sure to look at his GitHub for the awesome source code!

Come to Louisville on May 9th to see a presentation on JSON and T-SQL.

The goal of this is to avoid FULL OUTER JOINs, which scale terribly and are a maintenance nightmare. We avoid them by using PIVOT. Pivot means fewer rows, more columns: PIVOT promotes data values to column headers.

You get to decide how the tuple that's created on the pivot is aggregated (count, min, max, sum, avg, etc.). Exactly one aggregate can be applied, one column can be aggregated, and one column's values can be promoted into the column header.

PIVOT ( SUM(Col1) FOR [ID] IN ([ID_value_1], [ID_value_2], etc.) )
SUM = the aggregate, ID = the column whose values will become new columns, and the IN list = the values from ID that get promoted into the column header.
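
Here's a minimal runnable sketch of that pattern (table and column names are hypothetical):

    -- Hypothetical dbo.Sales: one row per (Region, Quarter, Amount)
    SELECT pvt.Region, pvt.[Q1], pvt.[Q2]
    FROM (
        SELECT Region, Quarter, Amount
        FROM dbo.Sales
    ) AS src
    PIVOT (
        SUM(Amount)                    -- the aggregate
        FOR Quarter IN ([Q1], [Q2])    -- the values promoted into column headers
    ) AS pvt;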

Time for a 3-column pivot. For this, we're really doing a two-column pivot and ignoring one of the fields. You can even pivot on computed fields, but make sure you include their values in the IN clause. Be careful about adding unnecessary data.

How do you manage the VTCs (the column values that end up as column headers)? Option 1 – don't. Option 2 – explicitly request the ones of interest and provision for future extras. Option 3 – use dynamic SQL! You can use cursors, XML, etc. Check out his slide deck on GitHub for code samples!
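
For option 3, here's a minimal dynamic SQL sketch using STRING_AGG (SQL Server 2017+ assumed, along with the same hypothetical dbo.Sales table):

    DECLARE @cols nvarchar(max), @sql nvarchar(max);

    -- Build the IN list from whatever values exist in the data today
    SELECT @cols = STRING_AGG(QUOTENAME(Quarter), ', ')
    FROM (SELECT DISTINCT Quarter FROM dbo.Sales) AS q;

    SET @sql = N'SELECT Region, ' + @cols + N'
    FROM (SELECT Region, Quarter, Amount FROM dbo.Sales) AS src
    PIVOT (SUM(Amount) FOR Quarter IN (' + @cols + N')) AS pvt;';

    EXEC sp_executesql @sql;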

An n-column PIVOT essentially reduces to a 2-column pivot (at the end of the day, only two columns ever get pivoted); you just need to know which column you want split into new columns.

The ugly side of PIVOT = lookups. The more fields you need to add in from additional tables, the worse performance will be. Your best option there is to do a GROUP BY, then pivot. Another limitation: you can't use an expression in your pivot aggregation (SUM(col) is legal, SUM(col) * 10 is not). Get your raw data clean, then pivot.
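
A sketch of that workaround (same hypothetical table): pre-aggregate in a derived table, pivot, and apply any math in the outer SELECT:

    SELECT pvt.Region, pvt.[Q1] * 10 AS Q1_x10   -- SUM(Amount) * 10 is not legal inside PIVOT
    FROM (
        SELECT Region, Quarter, SUM(Amount) AS Amount
        FROM dbo.Sales
        GROUP BY Region, Quarter
    ) AS src
    PIVOT (SUM(Amount) FOR Quarter IN ([Q1], [Q2])) AS pvt;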

Time for UNPIVOT!

Unpivot = convert horizontal data to vertical. Fewer columns, more rows. UNPIVOT demotes column headers back into data.

Be very, very careful with the data type you're about to create. Remember the lowest common denominator: all the values must fit in one common data type without overflow, truncation, or collation conflicts.

UNPIVOT ( newColPropertyValue FOR newColPropertyName IN ([originalCol1], [originalCol2], etc.) )

You need to make sure all your original columns have the same data type. NULLs get automatically dropped. If they are needed, you can convert them first using ISNULL to a string or int value, depending on your need.
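
A minimal sketch, assuming a hypothetical dbo.QuarterlySales table with one column per quarter:

    SELECT Region, QuarterName, Amount
    FROM (
        -- Convert NULLs first if you need to keep those rows
        SELECT Region, ISNULL([Q1], 0) AS Q1, ISNULL([Q2], 0) AS Q2
        FROM dbo.QuarterlySales
    ) AS src
    UNPIVOT (
        Amount FOR QuarterName IN ([Q1], [Q2])   -- column headers demoted back into data
    ) AS unpvt;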

There’s also an option for XML-based unpivot.

CROSS APPLY acquires a subset of data for each row found by the outer query. It behaves like an inner join: if no subset is found, the outer row disappears from the result set. OUTER APPLY is similar, but behaves more like a left join. Unlike UNPIVOT, a CROSS APPLY over VALUES keeps your NULL values. You can also use STRING_SPLIT with CROSS APPLY.
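
Two quick sketches with hypothetical tables: CROSS APPLY (VALUES ...) as a NULL-preserving unpivot, and CROSS APPLY with STRING_SPLIT (SQL Server 2016+):

    -- Unpivot alternative that keeps NULL amounts
    SELECT s.Region, v.QuarterName, v.Amount
    FROM dbo.QuarterlySales AS s
    CROSS APPLY (VALUES ('Q1', s.Q1), ('Q2', s.Q2)) AS v(QuarterName, Amount);

    -- Split a CSV column into one row per value
    SELECT o.OrderID, t.value AS Tag
    FROM dbo.Orders AS o
    CROSS APPLY STRING_SPLIT(o.Tags, ',') AS t;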

Multi-unpivot normalizes hard-core denormalized data. You just add more UNPIVOT lines after your initial FROM statement! Make sure you have a WHERE clause to drop any records that don't align at a column level, something like WHERE LEFT(element1, 8) = LEFT(element2, 8).
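
Here's a sketch of that pattern with hypothetical phone columns, where the WHERE clause keeps only the aligned pairs:

    SELECT PersonID, LEFT(NumType, 6) AS Slot, PhoneNumber, PhoneExt
    FROM dbo.Contacts
    UNPIVOT (PhoneNumber FOR NumType IN ([Phone1_Num], [Phone2_Num])) AS u1
    UNPIVOT (PhoneExt FOR ExtType IN ([Phone1_Ext], [Phone2_Ext])) AS u2
    WHERE LEFT(NumType, 6) = LEFT(ExtType, 6);   -- 'Phone1' matches 'Phone1', etc.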

My Session – Time for Power BI To Git CI/CD

Thanks to everyone who attended my session! Had some great questions and conversations! Here's the link to my GitHub with the slide deck from today: https://github.com/Anytsirk12/DataOnWheels/tree/main/Power%20BI%20CICD

Medallion Architecture for Fabric by Steve Hughes

This session was awesome! We watched a series of his five-minute "Fabric 5" videos and had an amazing discussion between them about options for building out a Fabric infrastructure using medallion techniques. Check out Steve's YouTube channel for his Fabric 5 playlist and to learn more about his experience working with ALS.

SQL Bits 2023 Recap

I had an amazing time adventuring to Wales with other members of the data community last week! It was my first international conference, and I had heard so many incredible things about it, but it still blew away all my expectations. I'm so honored to have met so many incredible people throughout the week! Thank you to new friends and old for making it a conference to remember.

Below are the links to GitHub for my session as well as some notes from the sessions I attended. Disclaimer: these notes are not even close to the real thing. Be sure to check out these sessions if you see them being held at your local user group or a SQL Saturday in the future! I tend to take very bullet-point-oriented notes to best organize my thoughts; hopefully these spark your interest and guide you to professionals who know about specific technologies within the data realm.

Check out the agenda and feel free to reach out to folks with topics you’re interested in! They may have their slide deck easily accessible to folks who may not make it to the conference.

Power BI Meets Programmability with Yours Truly

Thank you so much to everyone who made it out to my session!

It was incredible to have 48 people in person and another 21 virtual! It was a humbling experience to see that room fill up and know they were all there to learn something new and exciting to help with their day-to-day work. Can’t wait to look through the feedback and learn where to tweak my presentation to make an even bigger impact in the future!

  • GitHub with slide deck, sample C# code, and sample PBIX file (with the Excel file that powers it): Github Link
  • Category of blog posts related to TOM and C#: https://dataonwheels.wordpress.com/category/c-tom-and-power-bi/
  • Fun tip someone showed me after the session:
    If you use /* and */ to comment out chunks of code (like I do frequently during the session), you can use /* and --*/ so that you only need to comment out the /* to run that block. Pretty neat!
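
    Here's how that toggle looks in T-SQL (in C# you'd use // in place of --):

        /* Disabled: everything between the comment markers is ignored
        SELECT 'debug query';
        --*/

        --/* Enabled: the opener is commented out, and --*/ is now just a line comment
        SELECT 'debug query';
        --*/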

Supercharge Power BI with Azure Synapse Analytics with Mathias Halkjaer

Limitations of PBI

  • Source load minimization doesn’t exist in most cases (lots of queries against the same source)
  • Out of sight transformation layer
  • Not best tool for data warehouse tasks
  • No logs or output from quality checks
  • Doesn’t handle humongous data very well

Enchantments needed for PBI from Synapse

  • Time-travel (historical data & change tracking)
  • Enlarge (massive scale)
  • Counter spell (reverse ETL)
  • Wish (supports multiple programming languages)
  • Divination (AI)
  • Regenerate (CI/CD and source control)

Azure Synapse

  • A toolbox full of tools: serverless SQL, data warehouse, Spark engine, etc.

Data Lake

  • Landing zone for data
  • Cheap
  • Redundant
  • Quick and dirty analytics
  • Performance
  • Most efficient way to load data into data warehouse
  • Easily consumed

Report Design & Psychology (Quick Tips) with Jon Lunn

Dual process theory: intuitive vs attentive

You can mentally process up to 15 items intuitively. Past that it gets pushed to attentive.

Things can move from system 2 (attentive) to system 1 (intuitive) with repetition (e.g., driving or typing).

Layout – left to right, top to bottom. Move most important KPI to top left. Remove distractions (excess colors, 3d visuals, pie charts, etc). Titles in upper left make it easier to know what’s contained within the visual.

Tip – when visualizing dots, use dice patterns to move counting them into a system 1 process.

Ratios (how many, relative) vs percentages (how likely, chance risk probability). Both are good options to contextualize a number more. We usually understand ratios much better because how many is more intuitive than how likely.

Intuitive systems are more prone to errors or bias. If you want to hide something, go for a percentage.

Use the power of 10s (e.g., "8 out of 10 preferred it"). Round and use a ratio to move it into the intuitive process.

Power BI Datamarts What, How, Why with Marthe Moengen

You can manage RLS roles within a datamart and join tables via a GUI similar to the Power BI Desktop modeling view. Datamarts are in preview only. You can use a SQL endpoint to connect to your datamart.

Why? Access data through a SQL endpoint. Do simple aggregation using SQL script instead of M. There are potential challenges since this is in preview.

It keeps getting easier to move Oracle and Open Source workloads to Azure with David Levy

  • Cloud migration phases:
    • Discover
    • Assess
    • Migrate
    • Optimize
    • Secure & Manage
  • Azure Migrate: central hub of tools for datacenter migration
    • Primary tool for discover and assess
    • Azure Database Migration Service – Gen 2 gives end users much more control over how many resources each migration gets
  • Use Oracle Assessments in Azure Data Studio to get SKU size recommendations
  • Oracle databases in Oracle Cloud can use OCI Interconnect to pull data back and forth with Azure IaaS
  • A lot of people move to PostgreSQL to pay less and get open-source resources
  • Migration strategy
    • Rehost
    • Refactor
    • Rearchitect
    • Replace
  • No better way to run software than by SaaS. Why invest the time and resources when the vendor can maintain this for you and allow you to focus on your business?
  • The Oracle Assessments in Azure Data Studio can see all the database details, SKU recommendations, and Feature Compatibility. Each section has a migration effort attached, very nice
  • Azure is a great place for PostgreSQL:
    • High availability
    • Maintained
    • The PostgreSQL migration extension can run an assessment in ADS and gauge migration readiness
    • Integrated Assessment
    • Easy set up and config in ADS
  • To migrate PostgreSQL:
    • pg_dump scripts out the PostgreSQL schema
    • psql allows creating databases and tables
    • Use the PostgreSQL to Azure Database for PostgreSQL Online Migration Wizard
    • You can start cutover with pending changes
  • Azure Database for MySQL
    • Flexible Server is now available; it allows mission-critical apps to use zone redundancy and fine-grained maintenance scheduling
    • Azure Database Migration Service (ADMS)
      • Migrates schema and data
      • Preview = replicate changes
      • You can do this as an online migration
  • Optimizing your environment
    • Costs during & after migration
      • During: Azure TCO Calculator, Azure Migrate, Azure Hybrid Benefit & Reserved Instances, and join Azure Migration & Modernization Program
        • Do reservations early, you’ll lose money otherwise
        • AMMP = Azure Migration & Modernization Program, provides coaching on best practices
    • Make sure you gather good data from high-load times so your migrated instance is ready for the worst-case scenario
    • Execute iteratively

5 Things You Can Do to build better looking reports – James McGillivray

  • Design is subjective
  • No clear end point
  • No one solution
  • Design is often seen as less important
  • Blog: jimbabwe.co.za
  • Tip Techniques
    • Grid Layout
      • Pro: left brain, less layout tweaking happens, looks professional
      • Con: can look boxy
      • Gutters = white space between visuals
      • Margins = white space along edge of report
    • Maximize Real Estate
      • Pro: bigger visuals = more value, fewer visuals = less distractions
      • Con: often hard to convince stakeholders, all questions not answered by default
    • Beautiful Colors
      • Pro: use color to convey meaning, highlight data, show continuous data; palette types: qualitative, sequential, divergent
      • Con: many pitfalls, accessibility concerns, can be polarising
    • Consider the Audience
      • Pro: most important elements first, remove unneeded elements, hierarchy view (summary on top, grouped middle, detail on bottom)

Identifying and Preventing Unauthorized Power BI Gateways with Angela Henry

  • The problem: gateways are available to everyone and aren't only for PBI, which results in chaos
  • How do we find gateways?
    • Manage connections & gateways lets us identify gateways. You need to be an Azure AD Global admin, Power BI service admin, or gateway admin
  • Reason to restrict: governance