Category Archives: Microsoft Azure

Cosmos DB for the Data Professional

Cosmos DB LogoCosmos DB is one of the fastest growing Azure services in 2018. As its popularity grows, data professionals are faced with a changing reality in the world of data. Data is no longer contained in relational databases as general rule. We saw the start of this with Hadoop data storage, but no one ever referred to Hadoop as a database. Sure Hive and other Hadoop based technologies made the data look like a database, but we (data professionals) were able to keep our distance. What’s changed?

The Cloud, Data, and Databases

As cloud reaches more and more businesses, traditional data stores are being reconsidered. We now have data stored in Azure – Azure Data Lake, Azure Storage, Azure Database Services (SQL, PostgreSQL, MySQL), Azure Data Warehouse, and now Cosmos DB. Cosmos DB is the globalized version of Azure Document DB (more about that later). If we are to grow our skillset and careers to a cloud data professional, we need to know more about other ways the data is stored and used. I want to summarize some things that we need to be aware of about Cosmos DB. If your business uses it or plans to and you are a data pro, you will need to know this.

Introducing Cosmos DB

Azure Cosmos DB is Microsoft’s globally distributed, multi-model database.

Cosmos DB Overview 201804

Source: https://docs.microsoft.com/en-us/azure/cosmos-db/introduction 

I will break down key components of Cosmos DB with a data professional in mind. There are a lot of aspects of Cosmos DB that make it very cool, but you will want to understand this when you get the call to fix the database.

Multi-model Database Service

Currently Cosmos DB supports four database models. This is like having for different database servers in one. I liken it to having SQL Server Database Engine and SQL Server Analysis Services using the same underlying engine and it only “looks different.” Cosmos DB refers to these as APIs. The API is chosen when the database is created. This optimizes the portal and database for use with that API. Other APIs can be used to query the data, but it is not optimal. Here are the four models supported and the APIs that support them.

Cosmos DB models

  • Key Value Pair: This is exactly as it sounds. The API is implemented with the Azure Table Storage APIs.
  • Wide Column or Column Family: This stores data similar to relational, but there is no row consistency (each row can look different). Cosmos DB uses the Cassandra API to support this model. (For more information on Cassandra click here.)
  • Documents: This model is based on JSON document storage. Cosmos DB currently supports two APIs for this model: SQL which is the Document DB API and Mongo DB. These are the most common models used in Cosmos DB today. Document DB is the “parent” to Cosmos DB which was rebranded.
  • Graph: Graph databases are used to map relationships in data and were made popular with Facebook for instance. Microsoft uses the open source Gremlin API to support the Graph Database Model.

None of these databases are traditional row/column stores. They are all variations of NoSQL databases.

Turnkey Global Distribution

This is a key attribute for Cosmos DB. Cosmos DB can be easily distributed around the world. Click the data center you want to replicate to and Cosmos DB takes care of the rest. Cosmos DB uses a single write node and multiple read nodes. However, because Cosmos DB was built with global distribution in mind, you can easily and safely move the write node as well. This allows you to “chase the sun” and keep write operations happening “locally”.

Data Consistency

Data consistency is a primary concern of any data professional. The following tables compare Cosmos DB Consistency Levels with SQL Server Isolation Levels. These are not a one for one match, but demonstrate the different concerns between the systems.

 

Cosmos DB

SQL Server

Consistency Level Guarantees Isolation Level Dirty Read Non- repeatable Read Phantom
Strong Reads are guaranteed to return the most recent version of an item. Serializable No No No
Bounded Staleness Consistent Prefix or read order. Reads lag behind writes by prefixes (K versions) or time (t) interval. Snapshot No No No
Session Consistent Prefix. Monotonic reads, monotonic writes, read-your-writes, write-follows-reads. Repeatable Read No No Yes
Consistent Prefix Updates returned are some prefix of all the updates, with no gaps. Reads are not read out of order. Read Committed No Yes Yes
Eventual Out of order reads. Read Uncommitted Yes Yes Yes

As you can see, there are some similarities. These options are important to understand. In the Cosmos DB, the more consistent you need the data, the higher the latency in the distributed data. As a result, most Cosmos DB solutions usually start with Session Consistency as this gives a good, consistent user experience while reducing latency in the read replicas.

Throughput

I am not going to dig into this much. But you need to understand that Request Units (RU) are used to guarantee throughput in Cosmos DB. As a baseline, Microsoft recommends thinking that a 1 KB JSON file will require 1 RU. The capacity is reserved for each second. You will pay for what you reserve, not what you use. If you exceed capacity in a second your request will be throttled. RUs are provisioned by region and can vary by region as a result. But they are not shared between regions. This will require you to understand usage patterns in each region you have a replica.

Scaling and Partitions

Within Cosmos DB, partitions are used to distribute your data for optimal read and write operations. It is recommended to create a granular key with highly distinct values. The partitions are managed for you. Cosmos DB will split or merge partitions to keep the data properly distributed. Keep in mind your key needs to support distributed writes and distributed reads.

Indexing

By default, everything is indexed. It is possible to use index policies to influence the index operations. Index policies are modified for storage, write performance, and read or query performance. You need to understand your data very well to make these adjustments. You can include or exclude documents or paths, configure the index type, and configure the index update mode.  You do not have the same level of flexibility in indexes found in traditional relational database solutions.

Security

Cosmos DB is an Azure data storage solution which means that the data at rest is encrypted by default and data is encrypted in transit. If you need RBAC, Azure Active Directory (AAD) is supported in Cosmos DB.

SLAs

I think that the SLAs Microsoft provides with Cosmos DB are a key differentiator for them. Here is the short summary of guarantees Microsoft provides:

  • Latency: 99.99% of P99 Latency Attainment (based on hours over the guarantee)
    • Reads under 10 ms
    • Writes under 15 ms
  • Availability
    • All up – 99.99% by month
    • Read – 99.999% by month
  • Throughput – 99.99% based on reserved RUs (number of failures to meet reserved amount)
  • Consistency – 99.99% based on setting

These are financially backed SLAs from Microsoft. Imagine you providing these SLAs for your databases. This is very impressive.

Wrap Up

For more information, check out Microsoft’s online documentation on Cosmos DB.

I presented this material at the April 2018 PASS MN User Group Meeting. The presentation can be found here.

Advertisements

Power BI Data Security – Sharing in Email

 

Power BI Security LogoMicrosoft has expanded sharing by allowing users to share Power BI content via email. In a previous post, I discussed how sharing content within your organization should be handled carefully. However, the new process opens up the opportunity to share outside your organization by sending an email. In particular, you can now share with users who have a personal email address such as @outlook.com and @gmail.com. Let’s dig into the implications of this capability.

Sharing Using Email

First, you need to be aware that this functionality is as simple as the original methods of sharing. You click the Share button on your report or dashboard to open the Share dialog.

The Share report dialog in this case accepts email addresses which is not a significant change. However, as shown below, you can add personal emails and emails outside your organization. You be warned, but users do not always pay attention to this or understand the implications.

Share report - outside

You will also notice that consumers need to still have a Power BI Pro account assigned to them or you need to be using Power BI Premium for this to work.

Following the Email Process

When you share, you usually will need to send an email to the recipient. Here is the email content.

Report Share EmailTime to click the report link. This opens a series of dialogs which determine how much you have access. It is important to note that this is all made possible with Azure B2B. More about that in a moment. Let’s trace the story through. The link opens the following page.

Report Share Email - Welcome Link

As you can see, the next step is to log in. I am using an outlook.com account so it prompts me to authenticate. Once I have authenticated, I get the following notice.

Report Share Email - Opened Report

My account does not have Power BI Pro, but now I can try it for free for 60 days and get access to the data while I am on the trial. I clicked both options, because I can. The Upgrade account option would require me to pay for Pro. However, Try Pro for free works and I was able to access the report fully. I have successfully shared my corporate content with a personal user.

Preventing Sharing Outside Your Organization

While in some cases, you need to share outside your organization, we will assume here you need to disable this functionality. There are a few places you can make this happen.

Power BI Admin Portal

First, in Power BI go to the Admin portal and disable sharing outside your organization. If you have followed my previous advice, this will already be disabled.

 

PBI Admin Portal - Disable Sharing

As you can see, this will disable content for users who have been shared with previously. If you need to share, you can specify groups that have that permission.

Office 365 Admin Center

Next, this can be turned off in the Office 365 Admin Center in the Security and privacy area.

PBI O365 Admin Center - Disable Sharing

This prevents the ability to add guest users to the organization. This will disable this capability across Office 365. There is no option to allow some users this access. Once this is disabled, sharing outside the organization which requires a guest user will not be possible.

Azure Active Directory

Finally, you can shut this down from Azure Active Directory. Guest users are ultimately managed through Azure Active Directory and this is the best place to turn this off corporately if you do not need this functionality.

PBI AAD - Disable Sharing

In AAD you have four options.

  1. Guest users permissions are limited. This limits guest user capabilities with regard to the directory. Yes is the default and recommended.
  2. Admins and users in the guest inviter role can invite. This would be a typical option we can understand. However, it is important to note that Admin users in Power BI workspaces will have the ability to create guest users and share reports externally with this permission on.
  3. Members can invite. Just like it sounds. Any member of a group can invite guest users in.
  4. Guests can invite. This allows guests to invite other guests. Seems dangerous to me.

As you can see from my tenant, the options are all on which is the default. Be sure to understand what capability you want to use and set it appropriately within your tenant.

Tracking Sharing

In the Office 365 logging, you can see who and what has been shared. This log covers internal and external shares and should be monitored for auditing and compliance purposes.

Azure B2B

Azure B2B and the sharing capabilities in Power BI go hand in hand. This allows organizations to share content in a controlled fashion to consumers outside their organization. While this is required for certain scenarios, be mindful of who has the capability to share, and track sharing to make sure the data is being handled as you require.

Final Thoughts and References

You need to remember that sharing is at the heart of Power BI and you need to manage how and who can share. If you need to do more extensive sharing, by all means, use these features. For those, who need to lock it down tighter, you can follow the steps above to prevent sharing until you have a process and pattern. Power BI continues to improve and grow and as that happens we can expect more security options to support the new functionality. Enjoy Power BI, it is a great tool and will only continue to get better.

References

Using Azure AD B2B with Power BI

Auditing Power BI

Share your Power BI content with anyone by email

 

 

Power BI and Data Security – App Workspaces and Power BI Apps

Power BI Security LogoShortly after I published the Power BI Security Sharing Data post in April, Power BI Premium, Power BI Apps, and Power BI App Workspaces were released. These changes impacted that post in many ways. As part of the follow up, I also did an updated webcast with Pragmatic Works. This is a follow up post with some of the changes called out.

We are in the process of restructuring our organization around Apps and App Workspaces. Here are some of the highlights and changes related to sharing data using these new features.

Information Architecture and the Importance of Planning

My company, Pragmatic Works, uses a number of collaborative features in Office 365 including Planner, Teams, SharePoint, and Power BI. With this level of usage, a number of Office 365 groups get created. As we begin the process of updating our reporting structure, we will be using the App Workspace model to manage content creation and the Power App model to deploy content to users.

Before we began, we had to understand who the content creators are and who the consumers would be. App Workspaces are currently managed as Office 365 groups. We have a lot groups that match to our consumers, however, they really don’t work for content creators. Why? As we began the research, consumers exist in the current groups and are excellent targets. App Workspaces already exist for these users and groups due to our use with Teams and SharePoint. But due to the current limitations within Power BI and Office 365 with regards to group management, we need to create new App Workspaces, which also create new Office 365 groups to manage content creators. Typically these groups will be small and easy to manage. By limiting the users in these workspaces, we are also able to keep the additional clutter that is required.

In our process, we treated the end result as the guidance for the required workspaces. Each workspace creates and app that we want to target a specific set of consumers. By starting there, we created the list of workspaces we need to create. Because the apps and workspaces have a 1:1 relationship with each other, the apps (collection of Power BI content with the same permissions) are the determining factor for whether a workspace will be required. Our goal was to have the appropriate level of security while still minimizing management of the additional workspaces.

App Workspaces

We created the App Workspaces based on our Information Architecture Plan. The workspaces were created with two admins and set with members who would be content creators. Part of our exercise was to understand the impact of changing roles in Office 365 and related products such as Teams. What we learned is that Admin and Owner roles are shared throughout and managed by the Office 365 group. If you make a user an Owner in Teams they become an Admin in the matching Power BI Workspace. THIS IS IMPORTANT! While creating additional workspaces for report creation adds complexity by creating Office 365 groups, we have different security and content management rules for Power BI groups.

The Admins have the ability to add users to the group. Members do not. Also a Team, for instance, may have 100s of members who are essentially consumers. We are using the same role, Member, to assign to content creators. Consumers will use the Power BI Apps to view and consume the data made available. Because of this distinction, we created new Power BI App Workspaces.

When creating Members in Power BI Workspaces, you have the option to make those members View Only. However, doing so means all content creators will need to be Workspace Admins. This may work well for your organization, but remember Admins have elevated permissions as they are also Owners in Office 365 groups.

Preventing App Workspace Creation

Currently the only way to prevent App Workspace creation in a Power BI subscription is to disable the ability to create Office 365 groups or limit that capability to a small group of people. (NOTE: This affects all Office 365 applications which use Groups to segment the app such as Teams.) This is done using PowerShell. You can find details here including what applications are affected by this change.

Power BI Apps

In order to use Power BI Apps, all users need to have a Power BI Pro license or the apps need to be deployed to Power BI Premium. Whether you choose to use Pro or Premium should be evaluated for your organization. With current retail pricing, around 500 consumers is the “break even point” when only considering licensing. I will be discussing non-license related reasons to choose Premium in a later post.

When publishing or updating an app as noted in the images below. You have the ability to assign permissions to the app. Unlike Workspaces, you are able to assign distribution lists, individuals, and security groups to an App. This allows you to manage consumers using Active Directory (AAS). PBI Workspace and App

PBI App Permissions

Once Apps are deployed, a link can be sent to the users or they can find it in their available apps from Microsoft Appsource and their organizations deployed apps. They should only see apps they have been given permissions to. Once they get the app and open it, they have read only access but full Power BI interactive capabilities.

Power BI Apps will honor Role Level Security (RLS). However, unlike content packs, users cannot modify or change any content in the app including dashboards. When using apps, you are essentially creating the entire experience for the user and it cannot be “personalized” with their own dashboard or other updates. For enterprise reporting scenarios, this makes sense. For self-service or configurable solutions, you should still consider Content Packs.

To finish our story, we will be deploying Apps based on reporting groups within our Active Directory structure. This will allow us to control access to reporting through standard processes. At the moment, a group of us will manage the Workspace groups to prevent unwanted exposure to data and to manage report “creep”. In some ways, this is contrary to the original purpose of Power BI as a self service product. We are not limiting our teams capability to do their own report creation, app deployment, or self service analytics. We are making a point that the Enterprise Reporting will be managed which most organizations need on some level. If you have a great report you want to include, the Power BI Desktop allows us portability. The only not portable portion is the Dashboard itself. Hopefully we will be able to transfer that between workspaces in the future.

Thoughts about the Microsoft Data Amp Announcements

Microsoft conducted a live event called Microsoft Data Amp to announce a number of key features and releases for SQL Server on premises and data platforms in Azure (such as Azure SQL DB and Azure Data Lake). Some of these include features that I have been waiting to see. Here are some of announcements that I am excited about.

Microsoft Data Platforms 
Intelligent – Trusted – Flexible
On-premises & Cloud

SQL Server 2017

Yes. Microsoft has officially announced that SQL Server vNext is SQL Server 2017. The marquee feature being released in SQL Server running on Linux. But this also shows Microsoft is increasing its innovation efforts with SQL Server with an even shorter time between releases.

CTP 2 of SQL Server 2017 has been released today and includes an number of analytics features such as support for graph processing and graph queries. It will be the first commercial database with built in support for AI and deep learning database applications using R and Python scripts. Check out all the database engine improvements.

Azure SQL Database

Microsoft is bringing even more symmetry between the on-premises product and the PaaS product. The goal is to support upgrades or migrations to Azure SQL DB with minimal effort and no changes. Here are some of the features that are coming to Azure SQL DB soon:

  • Support for SQL Agent
  • 3-part names
  • DBMail
  • CDC
  • Service Broker
  • Cross-Database and Cross-Instance querying
  • CLR & R Services
  • SQL Profiler
  • Native backup-restore
  • Log shipping
  • Transactional Replication

These features will definitely bring more parity to the platforms. A number of these features are key for some of my clients to move to Azure SQL DB.

Migration Project for Azure SQL DB

Whether you have SQL Server, Oracle, or MySQL, you should be able to migrate your database to Azure SQL DB in “five simple steps”. While a great tool, I am interested in exploring this more with Oracle in particular. You can create a project in Azure that let’s you choose the source database and platform and target a Azure SQL DB then move the schema and load the database. While I am skeptical on the full capability of this solution, I look forward to exploring it more.

Azure Analysis Services is GA

The last topic I am going to bring up is Azure Analysis Services. This service is now GA which brings a great service to the PaaS space in Azure. Check out the capabilities here.

Final Thoughts

Microsoft announced much more than I highlight here including tighter AI integration into the data engine, R Server 9.1, and planet scale Document DB. Check out the Microsoft Data Amp site for more videos on what’s coming to Microsoft’s data platforms.

 

Power BI and Data Security – Row Level Security (RLS)

Power BI Security LogoAs Power BI becomes more prevalent in data analytics and visualization within the enterprise, data security becomes a significant concern. Power BI at its best is deployed to the Power BI service hosted on Microsoft’s Azure platform. Every enterprise should understand the level of security available with their data. Companies who have made the leap to cloud technologies such as AWS, Microsoft Azure, Salesforce, and Microsoft Office 365 should have an understanding of the data compliance and security capabilities of those solutions. However, companies who want to take advantage of Power BI but have just started their cloud journey or are cloud adverse need to know the nuances of Power BI and security.

I have been involved with data and cloud security questions a lot of the past few years. With Power BI’s rise in significance, I have had to answer more specific questions about the service. In order to provide proper guidance and not have a reference for myself, I am putting together a short series of posts on various data security items in Power BI. The topics included enterprise gateway, privacy levels, data classification, and compliance. The focus of these articles are related to using the Power BI service as this is the cloud implementation of Power BI. The desktop has setting which impact deployment of assets, but is not the focus of this series.

The Power BI service is updated frequently. These articles were created based on the Power BI implementation in early April 2017. You may find improvements and changes that impact your experience that are based on newer releases. Feel free to add comments to highlight changes.

Row Level Security in Power BI

Row level security is the ability to filter content based on a users role. There are two primary ways to implement row level security in Power BI – through Power BI or using SSAS. Power BI has the ability in the desktop to create roles based on DAX filters which affect what users see in the various assets in Power BI.

In order for this to work, you will need to deploy to a Workspace where users only have read permissions. If the members of the group associated to the Workspace have edit permissions, row level security in Power BI will be ignored.

Power BI Manage Roles.png

Both DirectQuery and data loaded into the model support RLS is the manner described above.

LiveConnection

SQL Server Analysis Services implements RLS on its own. SSAS requires the enterprise gateway to implement LiveConnection and RLS. RLS is supported by using EffectiveUserName on the connection from Power BI to the on-premises SSAS instance. (Refer to documentation on setting up live connections to SSAS.) This method works for both multidimensional and tabular models.

References

BI SSAS Connector Deep Dive (older content but good information)

Power BI SSAS Tabular Data

Power BI Admin for RLS

Power BI Row Level Security

Row-Level Security for Cloud models and DirectQuery

Tabular Model Row Level Security White Paper