Cosmos DB for the Data Professional

Cosmos DB LogoCosmos DB is one of the fastest growing Azure services in 2018. As its popularity grows, data professionals are faced with a changing reality in the world of data. Data is no longer contained in relational databases as general rule. We saw the start of this with Hadoop data storage, but no one ever referred to Hadoop as a database. Sure Hive and other Hadoop based technologies made the data look like a database, but we (data professionals) were able to keep our distance. What’s changed?

The Cloud, Data, and Databases

As cloud reaches more and more businesses, traditional data stores are being reconsidered. We now have data stored in Azure – Azure Data Lake, Azure Storage, Azure Database Services (SQL, PostgreSQL, MySQL), Azure Data Warehouse, and now Cosmos DB. Cosmos DB is the globalized version of Azure Document DB (more about that later). If we are to grow our skillset and careers to a cloud data professional, we need to know more about other ways the data is stored and used. I want to summarize some things that we need to be aware of about Cosmos DB. If your business uses it or plans to and you are a data pro, you will need to know this.

Introducing Cosmos DB

Azure Cosmos DB is Microsoft’s globally distributed, multi-model database.

Cosmos DB Overview 201804

Source: https://docs.microsoft.com/en-us/azure/cosmos-db/introduction 

I will break down key components of Cosmos DB with a data professional in mind. There are a lot of aspects of Cosmos DB that make it very cool, but you will want to understand this when you get the call to fix the database.

Multi-model Database Service

Currently Cosmos DB supports four database models. This is like having for different database servers in one. I liken it to having SQL Server Database Engine and SQL Server Analysis Services using the same underlying engine and it only “looks different.” Cosmos DB refers to these as APIs. The API is chosen when the database is created. This optimizes the portal and database for use with that API. Other APIs can be used to query the data, but it is not optimal. Here are the four models supported and the APIs that support them.

Cosmos DB models

  • Key Value Pair: This is exactly as it sounds. The API is implemented with the Azure Table Storage APIs.
  • Wide Column or Column Family: This stores data similar to relational, but there is no row consistency (each row can look different). Cosmos DB uses the Cassandra API to support this model. (For more information on Cassandra click here.)
  • Documents: This model is based on JSON document storage. Cosmos DB currently supports two APIs for this model: SQL which is the Document DB API and Mongo DB. These are the most common models used in Cosmos DB today. Document DB is the “parent” to Cosmos DB which was rebranded.
  • Graph: Graph databases are used to map relationships in data and were made popular with Facebook for instance. Microsoft uses the open source Gremlin API to support the Graph Database Model.

None of these databases are traditional row/column stores. They are all variations of NoSQL databases.

Turnkey Global Distribution

This is a key attribute for Cosmos DB. Cosmos DB can be easily distributed around the world. Click the data center you want to replicate to and Cosmos DB takes care of the rest. Cosmos DB uses a single write node and multiple read nodes. However, because Cosmos DB was built with global distribution in mind, you can easily and safely move the write node as well. This allows you to “chase the sun” and keep write operations happening “locally”.

Data Consistency

Data consistency is a primary concern of any data professional. The following tables compare Cosmos DB Consistency Levels with SQL Server Isolation Levels. These are not a one for one match, but demonstrate the different concerns between the systems.

 

Cosmos DB

SQL Server

Consistency Level Guarantees Isolation Level Dirty Read Non- repeatable Read Phantom
Strong Reads are guaranteed to return the most recent version of an item. Serializable No No No
Bounded Staleness Consistent Prefix or read order. Reads lag behind writes by prefixes (K versions) or time (t) interval. Snapshot No No No
Session Consistent Prefix. Monotonic reads, monotonic writes, read-your-writes, write-follows-reads. Repeatable Read No No Yes
Consistent Prefix Updates returned are some prefix of all the updates, with no gaps. Reads are not read out of order. Read Committed No Yes Yes
Eventual Out of order reads. Read Uncommitted Yes Yes Yes

As you can see, there are some similarities. These options are important to understand. In the Cosmos DB, the more consistent you need the data, the higher the latency in the distributed data. As a result, most Cosmos DB solutions usually start with Session Consistency as this gives a good, consistent user experience while reducing latency in the read replicas.

Throughput

I am not going to dig into this much. But you need to understand that Request Units (RU) are used to guarantee throughput in Cosmos DB. As a baseline, Microsoft recommends thinking that a 1 KB JSON file will require 1 RU. The capacity is reserved for each second. You will pay for what you reserve, not what you use. If you exceed capacity in a second your request will be throttled. RUs are provisioned by region and can vary by region as a result. But they are not shared between regions. This will require you to understand usage patterns in each region you have a replica.

Scaling and Partitions

Within Cosmos DB, partitions are used to distribute your data for optimal read and write operations. It is recommended to create a granular key with highly distinct values. The partitions are managed for you. Cosmos DB will split or merge partitions to keep the data properly distributed. Keep in mind your key needs to support distributed writes and distributed reads.

Indexing

By default, everything is indexed. It is possible to use index policies to influence the index operations. Index policies are modified for storage, write performance, and read or query performance. You need to understand your data very well to make these adjustments. You can include or exclude documents or paths, configure the index type, and configure the index update mode.  You do not have the same level of flexibility in indexes found in traditional relational database solutions.

Security

Cosmos DB is an Azure data storage solution which means that the data at rest is encrypted by default and data is encrypted in transit. If you need RBAC, Azure Active Directory (AAD) is supported in Cosmos DB.

SLAs

I think that the SLAs Microsoft provides with Cosmos DB are a key differentiator for them. Here is the short summary of guarantees Microsoft provides:

  • Latency: 99.99% of P99 Latency Attainment (based on hours over the guarantee)
    • Reads under 10 ms
    • Writes under 15 ms
  • Availability
    • All up – 99.99% by month
    • Read – 99.999% by month
  • Throughput – 99.99% based on reserved RUs (number of failures to meet reserved amount)
  • Consistency – 99.99% based on setting

These are financially backed SLAs from Microsoft. Imagine you providing these SLAs for your databases. This is very impressive.

Wrap Up

For more information, check out Microsoft’s online documentation on Cosmos DB.

I presented this material at the April 2018 PASS MN User Group Meeting. The presentation can be found here.

Power BI Data Security – Sharing in Email

 

Power BI Security LogoMicrosoft has expanded sharing by allowing users to share Power BI content via email. In a previous post, I discussed how sharing content within your organization should be handled carefully. However, the new process opens up the opportunity to share outside your organization by sending an email. In particular, you can now share with users who have a personal email address such as @outlook.com and @gmail.com. Let’s dig into the implications of this capability.

Sharing Using Email

First, you need to be aware that this functionality is as simple as the original methods of sharing. You click the Share button on your report or dashboard to open the Share dialog.

The Share report dialog in this case accepts email addresses which is not a significant change. However, as shown below, you can add personal emails and emails outside your organization. You be warned, but users do not always pay attention to this or understand the implications.

Share report - outside

You will also notice that consumers need to still have a Power BI Pro account assigned to them or you need to be using Power BI Premium for this to work.

Following the Email Process

When you share, you usually will need to send an email to the recipient. Here is the email content.

Report Share EmailTime to click the report link. This opens a series of dialogs which determine how much you have access. It is important to note that this is all made possible with Azure B2B. More about that in a moment. Let’s trace the story through. The link opens the following page.

Report Share Email - Welcome Link

As you can see, the next step is to log in. I am using an outlook.com account so it prompts me to authenticate. Once I have authenticated, I get the following notice.

Report Share Email - Opened Report

My account does not have Power BI Pro, but now I can try it for free for 60 days and get access to the data while I am on the trial. I clicked both options, because I can. The Upgrade account option would require me to pay for Pro. However, Try Pro for free works and I was able to access the report fully. I have successfully shared my corporate content with a personal user.

Preventing Sharing Outside Your Organization

While in some cases, you need to share outside your organization, we will assume here you need to disable this functionality. There are a few places you can make this happen.

Power BI Admin Portal

First, in Power BI go to the Admin portal and disable sharing outside your organization. If you have followed my previous advice, this will already be disabled.

 

PBI Admin Portal - Disable Sharing

As you can see, this will disable content for users who have been shared with previously. If you need to share, you can specify groups that have that permission.

Office 365 Admin Center

Next, this can be turned off in the Office 365 Admin Center in the Security and privacy area.

PBI O365 Admin Center - Disable Sharing

This prevents the ability to add guest users to the organization. This will disable this capability across Office 365. There is no option to allow some users this access. Once this is disabled, sharing outside the organization which requires a guest user will not be possible.

Azure Active Directory

Finally, you can shut this down from Azure Active Directory. Guest users are ultimately managed through Azure Active Directory and this is the best place to turn this off corporately if you do not need this functionality.

PBI AAD - Disable Sharing

In AAD you have four options.

  1. Guest users permissions are limited. This limits guest user capabilities with regard to the directory. Yes is the default and recommended.
  2. Admins and users in the guest inviter role can invite. This would be a typical option we can understand. However, it is important to note that Admin users in Power BI workspaces will have the ability to create guest users and share reports externally with this permission on.
  3. Members can invite. Just like it sounds. Any member of a group can invite guest users in.
  4. Guests can invite. This allows guests to invite other guests. Seems dangerous to me.

As you can see from my tenant, the options are all on which is the default. Be sure to understand what capability you want to use and set it appropriately within your tenant.

Tracking Sharing

In the Office 365 logging, you can see who and what has been shared. This log covers internal and external shares and should be monitored for auditing and compliance purposes.

Azure B2B

Azure B2B and the sharing capabilities in Power BI go hand in hand. This allows organizations to share content in a controlled fashion to consumers outside their organization. While this is required for certain scenarios, be mindful of who has the capability to share, and track sharing to make sure the data is being handled as you require.

Final Thoughts and References

You need to remember that sharing is at the heart of Power BI and you need to manage how and who can share. If you need to do more extensive sharing, by all means, use these features. For those, who need to lock it down tighter, you can follow the steps above to prevent sharing until you have a process and pattern. Power BI continues to improve and grow and as that happens we can expect more security options to support the new functionality. Enjoy Power BI, it is a great tool and will only continue to get better.

References

Using Azure AD B2B with Power BI

Auditing Power BI

Share your Power BI content with anyone by email

 

 

Power BI Data Security – Sharing

Power BI Security LogoMicrosoft recently added more sharing capabilities that may change my view on sharing within the enterprise. As with all things Power BI, change is inevitable.

Up to this point, I recommended that customers did not use sharing as an enterprise solution due to the inability to manage it and the potential to share data within the organization that violates compliance or internal rules.

Sharing Within Your Organization

When you share a dashboard or a report within your organization, you share the data with it. Here is the issue from my perspective. If you allow users to share content, they are responsible to share responsibly. That is correct. The content creators are now responsible to manage security as well. So, let’s walk through the basics of using sharing effectively and securely within your organization.

Why Share?

The primary reason to use share is to distribute content outside the context of a Power BI App. Power BI Apps should be your first mechanism for sharing content within your organization. It requires more thought and planning which is typically a good idea with your companies data. However, there are times when sharing makes sense. With the ability to share reports, you can limit sharing to specific areas. Also, you may want to create a “one-off” report for use in decision making but not something to be deployed in the long term.

Sharing is very different from deploying Apps. App deployment is not that difficult to do, but prevents sharing and is much easier to manage access.

The Process of Sharing

Sharing capabilities are readily available on any content that you create.

At this point, there is no way to prevent sharing within your organization. Content can be shared from My Workspace as well.

The first step to sharing is to click the Share button on the report or dashboard you want to share.

PBI Share Button

This will launch a dialog for sharing the report or dashboard as shown here:

PBI Share Dialog

I have highlighted a couple of key parts to the dialog. The first is that you can share with individuals, distribution lists, and security groups. This is similar to the permissions you can apply to an App during deployment. As a content creator, I can distribute in this fashion. Typically power users who create content will use individual names or distribution lists as they are the most common methods of working with teams.

The next part to understand is the Allow recipients to share your report option. I have a couple of issues with this option. First, it is on by default. This means if someone shares with a peer in their department that individual can then share outside their department. The original content creator no longer has control of who this is shared to when this option is turned on which is my second issue. While the content creator will be able to see everyone they share with in the Access panel of the dialog when they review it later, they have potentially released data “into the wild” without controls if they do not set this up properly.

Click Share. You have successfully shared your report. Next, let’s have a look at the Access panel after the share is done. This panel is used view and manage sharing within the workspace.

PBI Share - Access Dialog

When in this dialog you can see who has what level of access to the report or dashboard you are currently in. You will see all reshares here as well. This will allow the content creator to remove access if needed.

The Manage permissions link opens up a dialog that lets you view and manage permissions for the entire workspace.

PBI Share - Manage Access

As you can see, sharing is managed by content creators. It will be important for them to understand the process.

Monitoring Sharing

Your Power BI environment should have auditing turned on. This will allow you to run reports to understand who has shared reports and dashboards across the tenant. This will be required to manage auditing and compliance within your organization.

Sharing and Security Thoughts

As I worked through this capability, there are a couple of closing thoughts on security to keep in mind.

  1. You cannot prevent sharing. You must monitor it, so be sure you have auditing turned on in your subscription.
  2. This has a place when sharing on a smaller scale. I would not recommend it as the standard process, but it allows you to share content in smaller chunks.
  3. You must have a process and policy for sharing. This has to be understood by content creators.
  4. If you implement row-level security in Power BI or SSAS, it is honored in sharing. This will prevent unauthorized access to sensitive data. Use this when you have particularly sensitive data in use.

One other thought. If this is a significant concern, you should evaluate Power BI Premium as it will allow to manage which users have the capability to create and share content. Free users are effectively read only within the organization. This will be cost-prohibitive for smaller organizations unless security is the primary concern.

Properly planned for you will be able to share effectively with Apps as a deployment model.

Power BI and Data Security – Sharing Data

Power BI Security LogoAs Power BI becomes more prevalent in data analytics and visualization within the enterprise, data security becomes a significant concern. Power BI at its best is deployed to the Power BI service hosted on Microsoft’s Azure platform. Every enterprise should understand the level of security available with their data. Companies who have made the leap to cloud technologies such as AWS, Microsoft Azure, Salesforce, and Microsoft Office 365 should have an understanding of the data compliance and security capabilities of those solutions. However, companies who want to take advantage of Power BI but have just started their cloud journey or are cloud adverse need to know the nuances of Power BI and security.

I have been involved with data and cloud security questions a lot of the past few years. With Power BI’s rise in significance, I have had to answer more specific questions about the service. In order to provide proper guidance and not have a reference for myself, I am putting together a short series of posts on various data security items in Power BI. The topics included enterprise gateway, privacy levels, data classification, and compliance. The focus of these articles are related to using the Power BI service as this is the cloud implementation of Power BI. The desktop has setting which impact deployment of assets, but is not the focus of this series.

The Power BI service is updated frequently. These articles were created based on the Power BI implementation in early April 2017. You may find improvements and changes that impact your experience that are based on newer releases. Feel free to add comments to highlight changes.

Power BI Collaboration Basics

The focus of this post is on Power BI collaboration through sharing data using a variety of options in Power BI. While Power BI Desktop is a great tool for building datasets and reports, the real goal of a good BI solution is to share the information and analysis with the correct people in the organization who will be able to make decisions based on it. The Power BI service (https://app.powerbi.com) is the best way to do this.

First, the service requires a work or school based login, it does not work with a Microsoft, Google, Yahoo, or similar accounts. This is the beginning of the walls to protect your data. In most cases you will only be able to share data within your organization. However, there are methods to share dashboards publically. We will discuss those here and show how to turn off or regulate those features within Power BI and Office 365.

Power BI is built with Azure Active Directory (AAD) and customers who have or are in the process of implementing Office 365 are in the best position to establish proper security protocols to manage access to the Power BI service.

Power BI Dashboard Sharing

Power BI sharing can only be done on the service with dashboards. It does not work with reports or datasets and cannot be shared from Power BI Desktop. Initially, I viewed this a not a great option, but the reality is it is the best way to share content in a read only mode. A shared dashboard allows the users to interact with the data and view the underlying reports as part of the solution. This could be a good option when you want to share an executive dashboard with a security group or distribution list within your organization.

Share Dashboard

Even though the dialog shows email addresses to enter, security groups and distribution lists can also be added here keeping the AAD security model intact. Shared dashboards are marked with a distinct icon:

Dashboard Share Icon

Dashboards can be shared with free Power BI users within the organization. However, they will not be able to view any dashboards which use Power BI Pro features including workgroups, direct query, live connection, and other Pro based features. It is recommended that all users within an organization have a Pro account at this time.

Managing Share Capabilities within Power BI’s Admin Portal

As one can imagine, when the share dashboard capability was released there were reasonable concerns regarding sharing content outside the organization. When using an email address outside of the domain, users get warned they are sharing content outside of the organization.

Share Dashboard - outside

This is definitely a significant security risk. We recommend that this feature be disabled. Be aware that it is enabled by default (this may change for newer subscriptions, but most existing subscriptions have this feature on). You can deactivate this option in the Admin Portal – Tenant Settings – Export and sharing Settings as shown below.

Admin Portal - Sharing Settings.png

If you have some groups who should have permission to share outside the organization you can specify which groups have those permissions. This may be the case where you have a business to business arrangement where sharing a specific dashboard will improve your ability to communicate with the targeted organization.

If you have no compelling reason to share content outside your organization, this feature should be disabled!

Power BI Workspaces

Another way to compartmentalize or secure data is using Workspaces within Power BI. Every user, including free users, have access to My Workspace which is the default location for deploying Power BI and other BI assets. However, you also have the option to create additional workspaces as deployment targets. These Group Workspaces usually have functional and security separation associated with them.

Power BI Create Group.pngHere are the key characteristics of a group:

  • Group membership is individual users. Power BI Groups do not currently support security groups or distribution lists for membership.
  • Private vs. Public
    • Private groups limit access to members of groups.
    • Public groups work like a folder within Power BI and can be used to separate content but are not security restricted.
  • You can set the group to “read only” by setting the “Members can only view Power BI content” option.
    • This option disables editing on deployed reports and dashboards by members
    • Admin users within the group can edit reports and dashboards
    • The dataset area is not visible to members, only admins, which prevents creating new reports

With the current limitation around group membership (as of April 2017), I recommend using groups primarily as folders. As this situation improves, they will have more value as security groups as well. However, with the inability to manage these groups with AAD security groups, management will likely be prohibitive. It is likely that users will create groups to provide limited visibility with sharing, but this will create Office 365 groups to manage into the future.

Organizational Content Packs

Another method of sharing content is with organization content packs. Content packs allow the targeted users or groups to pick up the pack and use it in their workspace as needed. They can create copies of the content to use in their own dashboards and to create custom reports on the data. The data access and refresh are determined by the content pack creator. This is a way to not manage workspaces but still make content available to other users. Content packs can be made available to the entire organization, security groups, or distribution lists. Once a user gets the content pack, changes made by the owner can be updated to them as they occur.

You can limit who has permissions to publish content for the entire organization in the Admin Portal – Tenant Settings under the Content Pack Settings header. Users can continue to publish content to specific groups, but will no longer have the “My entire organization” option for publishing.

Publish to Web

Only one option counts here – disable this feature if you don’t want have a reason to display data on the internet!

My recommendation is that if you have a public facing version of your dashboards that do not any security at all, create a new subscription to manage this experience. You can disable this feature in the Admin Portal – Tenant Settings as shown below. All existing Power BI tenants have this enabled by default. You should disable this feature for your primary, internal Power BI implementations.

Publish to Web Power BI

Exporting Data and Printing Dashboards and Reports

Depending on the needs of your organization, you may need to restrict settings which allow data to be exported or printed. You have the ability to disable or enable exporting data from tiles or visualizations, exporting reports as PowerPoint presentations, and printing dashboards and reports for the entire organization or specific security groups. Both of these features have been highly requested and caution should be taken when disabling them. You can adjust these setting in the Admin portal under Tenant settings.

The Missing “Read Only” User

As you can see from the options to share or create workspaces there are methods which allow you to distribute content in read only fashion. However, in order to properly apply security and other features within Power BI, all of your enterprise users should be Power BI Pro users. Power BI Pro users still have a number of permissions that can cause issues within organizations including the ability to publish and share content from their workspace. Until Microsoft establishes a “read only” user setting or subscriber, organizations will need to manage content with the options noted above and determine the risk. In most cases, the risk is no more an issue than allowing users to use Microsoft Excel or Tableau. However, know your plan and be mindful of the updates from the Power BI team which will expand our ability to manage users.

References

Create a Group in Power BI

Sharing Your Dashboard

 

Power BI Is Finally in the Azure Trust Center

With the most recent announcement of Power BI’s inclusion in the Azure Trust Center, it is a good time to review where we are today with Power BI security and compliance as it relates to various customer needs. I do a lot of work with financial, energy, and medical customers. These groups represent a large amount of compliance and regulation needs. I wanted to understand where we are today and this announcement is significant.

What’s in the Announcement?

One the primary roadblocks to accepting the Power BI service has been the lack of compliance and concerns around security. Microsoft has been making a number of enterprise level improvement to the Power BI service and desktop. Power BI now has the following compliance certifications:

PowerBI Compliance 2016

This announcement shows Microsoft’s continued commitment to security and compliance in its cloud based products. While Power BI is not yet to the level of Office 365, some key compliance areas are now covered.

I think the most significant compliance certification is HIPAA/HITECH which removes barriers related for the medical industry. As hospitals, insurance companies, and providers scramble to meet reporting demands from their customers and the government, Power BI gives them a flexible reporting and visualization platform to meet those needs. It will empower self-service in the organizations and departmental or enterprise collaboration with data. The HIPAA/HITECH certification will allow them to use the platform with more confidence and security.

Beyond medical, more institutions will be able to rely on Power BI in a manner that is compliant and safe. As Microsoft continues this journey with Power BI and its other Azure based offerings, customers will be able to react more quickly to the changing business and regulatory environments with confidence in the security and management of their data.

The Reality – You Are as Secure as You Choose to Be

Even with this significant move by Microsoft, you are still responsible for implementing a secure, compliant solution. Microsoft is merely providing tools that are secure and will comply with regulations if implemented correctly. The key to a secure environment will always be you. The data you use and analyze with Power BI is ultimately your responsibility.

I encourage you to review the following resources in addition to the ones above as you determine your security and compliance within the Power BI product: