Connecting to Azure Blobs in Power BI

The step-by-step process below walks through connecting Power BI to data stored in Azure Blob Storage using a SAS token. There are many ways to pull your data from Blob Storage, but this is the most efficient, scalable, and secure approach I have found (while working within some security restrictions from watchful DBAs).

Resources Needed:

  • Base URL for container
  • SAS Token (must have read AND list permissions)
    • Check out the link in resources for a tutorial on generating SAS Tokens.
  • File Path (should end with .csv)
  • Power BI Desktop

Notes:

  • You can skip ahead to the sample M script if you have all your elements. Simply swap out the BaseURL, SASToken, and FilePath and you’re good to go. Otherwise, feel free to walk through the steps below to gain a deeper understanding of the process.
  • Make sure your Base URL ends with a “/”, your SAS Token starts with “?”, and your file path ends with “.csv”.
  • Keep the double quotes around each parameter value; this forces Power BI to recognize it as text.
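If you want to double-check the three values before pasting them into Power BI, here is a small Python sketch of the checks from the notes above. It runs outside Power BI and is not part of the M solution; the function name check_parameters and the example account, container, and token values are hypothetical, and the "no full URL" check is my own reading of the rule that the file path should not repeat the Base URL.

```python
def check_parameters(base_url: str, sas_token: str, file_path: str) -> list[str]:
    """Return the problems found in the three connection values (empty list = looks OK)."""
    problems = []
    if not base_url.endswith("/"):
        problems.append("Base URL should end with '/'")
    if not sas_token.startswith("?"):
        problems.append("SAS token should start with '?'")
    if not file_path.lower().endswith(".csv"):
        problems.append("File path should end with '.csv'")
    # The file path is relative to the container, so it should not repeat the Base URL
    if file_path.startswith(("http://", "https://")):
        problems.append("File path should not include the Base URL")
    return problems


# Hypothetical values for illustration only
base = "https://myaccount.blob.core.windows.net/mycontainer/"
token = "?sv=2020-08-04&sp=rl&sig=REDACTED"
print(check_parameters(base, token, "sales/2021/data.csv"))  # []
```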

Process:

  1. In Power BI Desktop, go to Get Data and select the Web option.
  2. Switch to the Advanced view and put the base URL in the first box.
  3. Put the SAS token in the second box.
  4. Click Add part to get a third box, and put “&restype=container&comp=list” in it (this lists all the blobs in that container).
  5. Expand the Blobs and Blob columns, then filter the Name column on your file path.
  6. Create a custom column to create the entire URL for the file (M code samples are below).
    • FileURL = BaseURL & [Name] & SASToken
  7. Create another custom column to access the web contents of your FileURL column.
    • BinaryURLContents = Web.Contents([FileURL])
  8. Remove all columns except the BinaryURLContents.
  9. Click on the “Binary” value and watch Power BI expand out your CSV file.
  10. Manipulate data from there as needed.
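Steps 2 through 6 are really just string concatenation. This Python sketch (not Power Query code) shows how the listing URL and the per-file URL are assembled from the same three pieces; the account, container, and token values are made up, and sp=rl in the made-up token stands for the read and list permissions the SAS token needs.

```python
from urllib.parse import parse_qs, urlsplit


def list_url(base_url: str, sas_token: str) -> str:
    # Steps 2-4: Base URL, then SAS token, then the List Blobs query parameters
    return base_url + sas_token + "&restype=container&comp=list"


def file_url(base_url: str, blob_name: str, sas_token: str) -> str:
    # Step 6: FileURL = BaseURL & [Name] & SASToken
    return base_url + blob_name + sas_token


# Hypothetical values for illustration only
base = "https://myaccount.blob.core.windows.net/mycontainer/"
token = "?sv=2020-08-04&sp=rl&sig=REDACTED"

listing = list_url(base, token)
print(parse_qs(urlsplit(listing).query))
# {'sv': ['2020-08-04'], 'sp': ['rl'], 'sig': ['REDACTED'], 'restype': ['container'], 'comp': ['list']}
print(file_url(base, "sales/2021/data.csv", token))
# https://myaccount.blob.core.windows.net/mycontainer/sales/2021/data.csv?sv=2020-08-04&sp=rl&sig=REDACTED
```

Note that the SAS token sits before the listing parameters in the first URL but at the very end of the second, which is why the token must start with “?” and the Base URL must end with “/”.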

Final M Code:

let
    // Swap in your own values; keep the double quotes so Power BI treats them as text
    BaseURL = "BASE_URL_HERE",       // container URL, ending with "/"
    SASToken = "SAS_TOKEN_HERE",     // SAS token with read AND list permissions, starting with "?"
    FilePath = "FILE_PATH_HERE",     // path inside the container; do not include any part of the Base URL
    // List every blob in the container
    Source = Xml.Tables(Web.Contents(BaseURL & SASToken & "&restype=container&comp=list")),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Attribute:ServiceEndpoint", type text}, {"Attribute:ContainerName", type text}}),
    #"Removed Other Columns" = Table.SelectColumns(#"Changed Type",{"Blobs"}),
    #"Expanded Blobs" = Table.ExpandTableColumn(#"Removed Other Columns", "Blobs", {"Blob"}, {"Blob"}),
    #"Expanded Blob" = Table.ExpandTableColumn(#"Expanded Blobs", "Blob", {"Name", "Properties", "OrMetadata"}, {"Name", "Properties", "OrMetadata"}),
    // Keep only the blob whose name matches FilePath exactly
    #"Filtered Rows" = Table.SelectRows(#"Expanded Blob", each [Name] = FilePath),
    // Build the full SAS URL for the file, then download its contents
    #"Added Custom" = Table.AddColumn(#"Filtered Rows", "FileURL", each BaseURL & [Name] & SASToken),
    #"Added Custom1" = Table.AddColumn(#"Added Custom", "BinaryURLContents", each Web.Contents([FileURL])),
    #"Removed Other Columns1" = Table.SelectColumns(#"Added Custom1",{"BinaryURLContents"}),
    // Take the binary from the first (and only) matching row
    BinaryURLContents = #"Removed Other Columns1"{0}[BinaryURLContents],
    // Columns=24 matches the original example file; change or remove it to fit your CSV
    #"Imported CSV" = Csv.Document(BinaryURLContents,[Delimiter=",", Columns=24, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Promoted Headers" = Table.PromoteHeaders(#"Imported CSV", [PromoteAllScalars=true])
in
    #"Promoted Headers"
// Use this separate query to validate your file path: if it returns no rows, "FILE PATH" does not exactly match a blob name in the container.
let
    Source = Xml.Tables(Web.Contents("BASE URL" & "SAS TOKEN" & "&restype=container&comp=list")),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"NextMarker", type text}, {"Attribute:ServiceEndpoint", type text}, {"Attribute:ContainerName", type text}}),
    #"Removed Other Columns" = Table.SelectColumns(#"Changed Type",{"Blobs"}),
    #"Expanded Blobs" = Table.ExpandTableColumn(#"Removed Other Columns", "Blobs", {"Blob"}, {"Blob"}),
    #"Expanded Blob" = Table.ExpandTableColumn(#"Expanded Blobs", "Blob", {"Name", "Properties", "OrMetadata"}, {"Name", "Properties", "OrMetadata"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded Blob", each [Name] = "FILE PATH")
in
    #"Filtered Rows"
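The validation query works because the listing call returns XML with one Blob element per file. The Python sketch below parses an abridged, made-up response of that shape and checks a path against the blob names, which is what the Filtered Rows step does. Real responses carry more elements per blob (Properties and so on); only Name is used here, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Abridged, made-up List Blobs response (real responses include more elements per blob)
SAMPLE_RESPONSE = """
<EnumerationResults ServiceEndpoint="https://myaccount.blob.core.windows.net/" ContainerName="mycontainer">
  <Blobs>
    <Blob><Name>sales/2021/data.csv</Name><Properties /></Blob>
    <Blob><Name>sales/2022/data.csv</Name><Properties /></Blob>
  </Blobs>
  <NextMarker />
</EnumerationResults>
"""


def blob_names(xml_text: str) -> list[str]:
    """Return every <Name> under Blobs/Blob, like expanding the Blobs and Blob columns."""
    root = ET.fromstring(xml_text.strip())
    return [name.text for name in root.findall("Blobs/Blob/Name")]


def path_exists(xml_text: str, file_path: str) -> bool:
    """Mirror the Filtered Rows step: exact match of the file path against blob names."""
    return file_path in blob_names(xml_text)


def next_marker(xml_text: str) -> str:
    """A non-empty NextMarker means the listing was paged and more blobs remain."""
    marker = ET.fromstring(xml_text.strip()).find("NextMarker")
    return (marker.text or "") if marker is not None else ""


print(blob_names(SAMPLE_RESPONSE))   # ['sales/2021/data.csv', 'sales/2022/data.csv']
print(path_exists(SAMPLE_RESPONSE, "sales/2021/data.csv"))              # True
print(path_exists(SAMPLE_RESPONSE, "mycontainer/sales/2021/data.csv"))  # False
```

The last line shows the most common mistake: repeating the container name (part of the Base URL) in the file path. Also keep in mind that listings are paged, returning up to 5,000 blobs per call by default, so in a very large container a non-empty NextMarker means your file may simply be on a later page.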

Additional Resources:

Power BI: Making Date & Time Keys

Saving the Day from Delay Part 2

Creating DateKey and TimeKey columns can be done with built-in functions in the Power Query Editor. Quick callout: if you need the time along with the date, I highly recommend splitting each datetime column in two, one date-only column and one time-only column. From there, you can use the same process to convert your time to a decimal number and use a Time Table for your time functions (GitHub link below). Below are some screenshots to walk you through the process.

Let’s say you have a datetime column like my Date column below. To start, I recommend going to the Add Column tab in the Query Editor and selecting Date Only, then Time Only, to create two new columns. This way the new steps sit right next to each other in Applied Steps, which makes troubleshooting down the road a lot easier. Don’t forget, you can right-click steps and rename them to help yourself walk through and/or adjust them in the future.

Time to make our keys! There are a couple of ways to do this, but the easiest is to click the calendar icon in the column header (or the clock icon for time) and select Whole Number (select Decimal Number for the time-only column). If you’ve worked in Excel, this will look familiar. These serial numbers (whole numbers for dates, decimals for datetimes) are the same across the two platforms and are what DAX uses in the background to process datetime calculations.
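Under the hood, the Whole Number conversion counts days from 30 December 1899, and the Decimal Number conversion expresses a time as a fraction of a 24-hour day. This Python sketch (not Power Query) reproduces that arithmetic so you can sanity-check a key by hand; the function names are hypothetical, and it assumes dates on or after 1 March 1900, where Excel’s serial numbers line up with Power Query and DAX.

```python
from datetime import date, datetime, time

# Day 0 of the serial date system used by Power Query and DAX (and by Excel from March 1900 on)
SERIAL_EPOCH = date(1899, 12, 30)


def split_datetime(dt: datetime) -> tuple[date, time]:
    """Date Only + Time Only: split one datetime into two values."""
    return dt.date(), dt.time()


def date_key(d: date) -> int:
    """Whole-number DateKey: days since 30 December 1899."""
    return (d - SERIAL_EPOCH).days


def time_key(t: time) -> float:
    """Decimal TimeKey: the fraction of a 24-hour day that has elapsed."""
    seconds = t.hour * 3600 + t.minute * 60 + t.second + t.microsecond / 1_000_000
    return seconds / 86_400


d, t = split_datetime(datetime(2021, 1, 1, 18, 0))
print(date_key(d), time_key(t))  # 44197 0.75
```

Adding the two keys back together (44197.75) gives the same serial value the original datetime would have had.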

Final Date and DateKey Columns
Final Time and TimeKey Columns

And that’s it! Next post we will look at how to join the date and time tables to your keys in the data model.

External Resources:
https://data-mozart.com/inside-vertipaq-compress-for-success/
https://github.com/AnytsirkGTZ/TimeTable_MCode/blob/main/MQuery%20Time

Power BI: Data Model Optimization

Saving the Day from Delay Part 1

Optimizing your data model can be a daunting task. If you read the intro to this series, you know one of the most efficient and sustainable fixes for a bogged-down data model is to remove native date hierarchies and use a date table. This post digs into how and why this speeds up performance, both during refreshes and in the online Power BI service, and sets the stage for making date and time keys.

The key to optimization is compression. An efficiently compressed data model is a lean, mean, query-running machine. There are two types of compression: horizontal and vertical. Horizontal compression works row by row, while vertical compression works column by column. Power BI uses the VertiPaq engine, a vertical (columnar) compression model, to compress data inside the data model. While vertical compression is more CPU-intensive, it is also more efficient, because it picks the best compression option based on the data in each column (numeric values, especially whole numbers, compress best). Data Mozart takes an in-depth look at this process, which I highly recommend reviewing if you have more questions (link at the bottom of this post).
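To make “column by column” concrete, here is a toy Python sketch of two ideas a columnar engine can apply to a single column: dictionary encoding (replace each value with a small integer ID) and run-length encoding (collapse repeats into value and count pairs). VertiPaq’s actual encodings (value, hash/dictionary, and run-length) and the way it chooses among them are more sophisticated; this only illustrates why repeated values in one column shrink so well. The Region column is made-up data.

```python
def dictionary_encode(column: list[str]) -> tuple[dict[str, int], list[int]]:
    """Replace each value with a small integer ID, keeping a lookup dictionary."""
    dictionary: dict[str, int] = {}
    ids = []
    for value in column:
        if value not in dictionary:
            dictionary[value] = len(dictionary)
        ids.append(dictionary[value])
    return dictionary, ids


def run_length_encode(ids: list[int]) -> list[tuple[int, int]]:
    """Collapse consecutive repeats into (value, count) pairs."""
    runs: list[list[int]] = []
    for i in ids:
        if runs and runs[-1][0] == i:
            runs[-1][1] += 1
        else:
            runs.append([i, 1])
    return [(value, count) for value, count in runs]


# A hypothetical Region column, stored on its own as a columnar engine would
region = ["East"] * 4 + ["West"] * 3 + ["East"] * 2
dictionary, ids = dictionary_encode(region)
print(dictionary)              # {'East': 0, 'West': 1}
print(run_length_encode(ids))  # [(0, 4), (1, 3), (0, 2)]
```

Nine text values become a two-entry dictionary plus three small pairs. The fewer distinct values a column has, the smaller both structures get.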

Vertical compression is significantly slower on datetime columns than on value columns. DateKeys are your best friend in compressing your data model because they capture vital date information but store it in a value format (the optimal format; think whole number). Converting all your primary date fields to DateKeys lets every calculation that uses those fields run much faster, because VertiPaq can process the requests more efficiently.
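One way to see why keys help: a column’s dictionary is only as large as its number of distinct values. This Python sketch counts distinct values for a hypothetical week of per-minute timestamps, first as a single datetime column and then split into date-only and time-only columns (the split recommended in Part 2). The data and the function name are made up for illustration.

```python
from datetime import datetime, timedelta


def distinct_counts(timestamps: list[datetime]) -> tuple[int, int, int]:
    """Distinct values in a datetime column vs. its date-only and time-only halves."""
    datetimes = len(set(timestamps))
    dates = len({ts.date() for ts in timestamps})
    times = len({ts.time() for ts in timestamps})
    return datetimes, dates, times


# Hypothetical data: one reading per minute for a week
start = datetime(2021, 1, 1)
readings = [start + timedelta(minutes=m) for m in range(7 * 24 * 60)]
print(distinct_counts(readings))  # (10080, 7, 1440)
```

The combined column has 10,080 distinct values, while the split columns have 7 and 1,440. The time column also stays at 1,440 values no matter how many more weeks of data you load.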

Next post we’ll cover making time and date keys in the Power Query Editor.

External Resources:
https://data-mozart.com/inside-vertipaq-compress-for-success/
https://github.com/AnytsirkGTZ/TimeTable_MCode/blob/main/MQuery%20Time

Saving the Day from Delay: Recap & Series Kick-Off

Ironically, I’m a few months behind on a recap of my presentation on building and using a sustainable, dynamic date query in M code: Saving the Day from Delay. On a positive note, the delay means I can do a series on the importance of date tables, how to use them effectively in DAX, and what I mean by sustainable and dynamic practices. At the end of this post is a link to a GitHub repository containing my most up-to-date M query referenced in this series and the PowerPoint used in the aforementioned presentations.

Presentation Recap

Using a date query, whether from a SQL table or a Power Query (M) table, can help optimize and hone your data model. A main culprit in sluggish Power BI data models is the native date hierarchy. Native date hierarchies are the default setting in Power BI and give you hierarchies to use in visuals that allow easy drilldowns through year, quarter, month, and day. You can turn off this default under File > Options and settings > Options > Current File > Data Load by clearing Auto date/time (see below). You can also do this in the global settings, but I recommend doing it file by file, because the benefits reach diminishing returns on datasets containing under 500,000 rows.

Where to change your default settings

If you’re unsure what I mean by native date hierarchy, the image below contains an example of a date hierarchy built by Power BI. To rebuild a date hierarchy from a standard date table, drag and drop columns on top of each other to create a custom date hierarchy. By right-clicking the hierarchy title, you can rename and reorder your hierarchy accordingly. One of the biggest reasons to turn off the native date hierarchy is that your model then needs only one date table, instead of one hidden date table for every date field in your model (I’ve seen some models with up to 15 date fields!).

Native Date Hierarchy

In subsequent posts, I’ll cover the benefits of turning off this feature, reconnecting your data model, how to customize the date query, custom columns in M, and best practices for a lean, mean data model machine.

https://github.com/AnytsirkGTZ/DateTable_MCode/blob/main/MQuery%20Calendar

Minnesota BI User Group – Powering Up HDInsight with Power BI (December 2015)

On Wednesday, December 16, I presented on this topic at the Minnesota BI User Group. The session was based on five blog posts that I wrote in August 2015.

You can find the presentation here: Powering Up HDInsight with Power BI (pdf).

The details can be found in the blog posts noted below:


Setting Up an HDInsight Cluster (No Scripts Required)

Exploring the Microsoft Azure HDInsight Query Console (No Scripting Required)

Uploading Files to an HDInsight Cluster (No Scripting Required)

Using Power BI with HDInsight Part 1: Power Query and Files

Using Power BI with HDInsight Part 2: Power BI Desktop and Hive

My goals for this series

1. Document using Power BI with HDInsight

2. Prove that you can set up an HDInsight cluster with no scripts

Other References from the Session

Azure: http://azure.microsoft.com/en-us/

CloudBerry: http://www.cloudberrylab.com/free-microsoft-azure-explorer.aspx


Thanks for attending my session.