As Power BI becomes more prevalent in data analytics and visualization within the enterprise, data security becomes a significant concern. Power BI at its best is deployed to the Power BI service hosted on Microsoft’s Azure platform. Every enterprise should understand the level of security available for its data. Companies that have made the leap to cloud technologies such as AWS, Microsoft Azure, Salesforce, and Microsoft Office 365 should have an understanding of the data compliance and security capabilities of those solutions. However, companies that want to take advantage of Power BI but have just started their cloud journey or are cloud averse need to know the nuances of Power BI and security.
I have been involved with data and cloud security questions a lot over the past few years. With Power BI’s rise in significance, I have had to answer more specific questions about the service. To provide proper guidance and to have a reference for myself, I am putting together a short series of posts on various data security items in Power BI. The topics include the enterprise gateway, privacy levels, data classification, and compliance. The focus of these articles is the Power BI service, as this is the cloud implementation of Power BI. Power BI Desktop has settings that impact the deployment of assets, but it is not the focus of this series.
The Power BI service is updated frequently. These articles were created based on the Power BI implementation in early April 2017. You may find improvements and changes that impact your experience that are based on newer releases. Feel free to add comments to highlight changes.
The On-premises Data Gateway (a.k.a. Enterprise Gateway)
First, I will not be discussing the personal gateway in this post. If you have chosen to use the personal gateway, you have limited functionality and should consider using the on-premises data gateway for corporate use.
The on-premises data gateway (referred to as gateway throughout this post) “acts as a bridge, providing quick and secure data transfer between on-premises data and the Power BI, Microsoft Flow, Logic Apps, and PowerApps services.” (ref) Much of what is discussed here will apply to all of the services referenced above, but our primary concern is related to Power BI. Please refer to references at the end of this post for details about data sources supported within the gateway.
The gateway enables Power BI to use on-premises data for data refresh and direct access with DirectQuery and Live Connections (SSAS multidimensional and tabular models). The gateway manages connectivity and data transfer between on-premises data and Power BI, with data compression and transport encryption capabilities as part of the solution. Our focus here is the most common questions about the gateway’s use with Power BI. We will discuss security on the gateway itself and then how the data is secured when using the gateway.
Security on the Gateway
When the gateway is installed, the default service account NT Service\PBIEgwService is created as a Windows service logon credential. This credential has “log on as a service” permissions. The first item to note: this credential is NOT used to access data sources. This service account has localized permissions to the server or PC it is installed on. It has no permissions to on-premises data sources or cloud services that use it.
In some situations, the default service account can create issues with proxy servers. If you run into this, you can change the account to a domain account. Refer to the proxy configuration documentation to make that change. The recommendation is to use a managed service account in Active Directory, which avoids password resets that would disable the gateway and likely cause user satisfaction issues.
Data Sources in the Gateway
While the gateway does not have access to services or data sources, it does have the capability to decrypt the connection information used by Power BI to connect to on-premises services. When you add a data source to the gateway you created, the credentials are encrypted using the key from the gateway.
Each gateway can manage multiple data sources. (NOTE: Best practices about location and performance of the gateway are not in scope of this post.) In my example, the gateway is providing access to a folder which contains receipt files. This will allow my Power BI solution to refresh data from the source. I can add a SQL Server connection as well if it is in the same network or context. The key here is that the gateway is an entry point for your on-premises data and is not limited to a single data source.
Credentials stored with the gateway cannot be decrypted in the cloud. The credentials are only decrypted by the gateway. When considering maintenance and configuration it is important to know that this is one of the key purposes of the gateway. Without a gateway, Power BI cannot access data in your on-premises solution. (Gateways are also required for Azure IaaS solutions. However, Azure SQL Database and Azure SQL DW do not require gateways as they are PaaS solutions and managed differently within Azure.)
All data and information between the gateway and Power BI is encrypted. A primary concern is which ports must be opened and which protocol carries this communication.
The first important item to cover is that there are no inbound ports used by the gateway. The gateway creates an outbound connection to the Azure Service Bus using a specific set of ports including TCP 443 which is used for Power BI (complete list of ports used). It is possible to force the gateway to use HTTPS in lieu of direct TCP for all of its communication. If you require this as an organization, be aware that there may be performance issues. This setting can be changed in the gateway properties and will require a restart of the service.
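Because the gateway only opens outbound connections, one quick way to confirm a firewall rule is to attempt an outbound TCP connection from the gateway machine. The following is a minimal Python sketch and is not part of the gateway tooling. The function name check_outbound and the host name in the usage comment are hypothetical, and the real list of hosts and ports should come from the port documentation referenced above.

```python
import socket
from typing import Callable, Dict, Iterable


def check_outbound(
    host: str,
    ports: Iterable[int],
    timeout: float = 3.0,
    connect: Callable = socket.create_connection,
) -> Dict[int, bool]:
    """Try an outbound TCP connection to each port and report which succeed.

    `connect` defaults to socket.create_connection; it is a parameter only
    so the check can be exercised without touching the network.
    """
    results: Dict[int, bool] = {}
    for port in ports:
        try:
            conn = connect((host, port), timeout)
            conn.close()
            results[port] = True
        except OSError:
            # Covers refused connections, timeouts, and DNS failures.
            results[port] = False
    return results


# Example usage (hypothetical host name; replace with a real endpoint):
# print(check_outbound("your-namespace.servicebus.windows.net", [443]))
```

A False result for a port only shows that this machine could not open a connection. It does not say whether a proxy, a firewall, or name resolution was the cause.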
Data and the Gateway
The second primary question regarding the gateway is how data is handled. When a request from Power BI is submitted for data, the Azure Service Bus holds the request with the encrypted credentials. The on-premises data gateway polls the Azure Service Bus for requests. Once the request is received by the on-premises gateway, the connection information is decrypted and the query request is sent to the appropriate resource. The data is then encrypted and compressed at the gateway and returned to Power BI.
No data is stored in the gateway and the data is encrypted for transit.
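The request flow described above can be modeled in a few lines of Python. This is a toy simulation to illustrate the order of operations, not how the gateway is implemented. Every name in it is hypothetical, an in-memory queue stands in for the Azure Service Bus, and toy_cipher is an XOR placeholder that is NOT real encryption. Compressing before encrypting is also my assumption about the ordering.

```python
import json
import queue
import zlib
from itertools import cycle


def toy_cipher(data: bytes, key: bytes) -> bytes:
    """XOR placeholder standing in for real encryption. NOT secure."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


GATEWAY_KEY = b"gateway-only-key"  # hypothetical; known only to the gateway
TRANSPORT_KEY = b"transport-key"   # hypothetical stand-in for transport encryption

service_bus = queue.Queue()        # stands in for the Azure Service Bus


def cloud_submit(query: str, encrypted_credentials: bytes) -> None:
    """Cloud side: queue a request. The credentials stay encrypted here."""
    service_bus.put({"query": query, "credentials": encrypted_credentials})


def run_query(query: str, credentials: dict) -> list:
    """Hypothetical on-premises source that returns canned rows."""
    return [{"receipt": 1, "total": 19.99}, {"receipt": 2, "total": 5.0}]


def gateway_poll_once() -> bytes:
    """Gateway side: pull one request, decrypt credentials, query, then
    compress and encrypt the result. Nothing is kept after returning."""
    request = service_bus.get_nowait()
    credentials = json.loads(toy_cipher(request["credentials"], GATEWAY_KEY))
    rows = run_query(request["query"], credentials)
    payload = zlib.compress(json.dumps(rows).encode("utf-8"))
    return toy_cipher(payload, TRANSPORT_KEY)


def cloud_receive(blob: bytes) -> list:
    """Cloud side: reverse the transport steps to recover the rows."""
    return json.loads(zlib.decompress(toy_cipher(blob, TRANSPORT_KEY)))
```

The point of the model is that the cloud side never holds GATEWAY_KEY, so it can pass the credentials along but cannot read them, and the gateway pulls work from the queue rather than accepting inbound calls.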
Users and the Gateway
One last consideration is who can use a gateway. In the Power BI service, when you manage the gateway (see diagram above about Data Sources), you can manage access to each data source by user. This functionality also supports security groups. When implemented, only users who have access to the data source can use it for the Power BI datasets they are deploying. This prevents users from publishing content that would require direct access or data refresh against sources they should not use.
When they are able to use the gateway, they will have access to refresh scheduling and other options via the dataset properties (I use the Schedule Refresh option to open the dialog).
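The per-data-source access rule described in this section can be expressed as a small model. This is only an illustrative sketch of the rule (a user may use a data source if listed directly or through a security group). The DataSource class, the can_use function, and the sample names are all hypothetical and do not reflect the Power BI service’s actual implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class DataSource:
    """A gateway data source with its allowed users and security groups."""
    name: str
    allowed_users: Set[str] = field(default_factory=set)
    allowed_groups: Set[str] = field(default_factory=set)


def can_use(source: DataSource, user: str, memberships: Dict[str, Set[str]]) -> bool:
    """Return True if the user is listed directly or belongs to an allowed group.

    `memberships` maps a group name to the set of users in that group.
    """
    if user in source.allowed_users:
        return True
    return any(
        user in memberships.get(group, set())
        for group in source.allowed_groups
    )


# Sample data (hypothetical users and groups).
receipts = DataSource(
    name="Receipts folder",
    allowed_users={"alice@contoso.com"},
    allowed_groups={"Finance Analysts"},
)
groups = {"Finance Analysts": {"bob@contoso.com"}}
```

With this sample data, alice@contoso.com is allowed directly, bob@contoso.com is allowed through the group, and anyone else is refused.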
There are a lot of considerations for enterprises that plan to implement gateways in their organizations. The key is to remember that the gateway is a bridge that allows on-premises data to be accessed by cloud services. However, the cloud services do not initiate a direct request to the on-premises data. Microsoft has done a great job allowing for a hybrid approach that enables organizations to take advantage of cloud resources while minimizing the impact on their on-premises assets.