Category Archives: SQL Azure

Thoughts about the Microsoft Data Amp Announcements

Microsoft conducted a live event called Microsoft Data Amp to announce a number of key features and releases for SQL Server on premises and data platforms in Azure (such as Azure SQL DB and Azure Data Lake). Some of these include features that I have been waiting to see. Here are some of announcements that I am excited about.

Microsoft Data Platforms 
Intelligent – Trusted – Flexible
On-premises & Cloud

SQL Server 2017

Yes. Microsoft has officially announced that SQL Server vNext is SQL Server 2017. The marquee feature being released in SQL Server running on Linux. But this also shows Microsoft is increasing its innovation efforts with SQL Server with an even shorter time between releases.

CTP 2 of SQL Server 2017 has been released today and includes an number of analytics features such as support for graph processing and graph queries. It will be the first commercial database with built in support for AI and deep learning database applications using R and Python scripts. Check out all the database engine improvements.

Azure SQL Database

Microsoft is bringing even more symmetry between the on-premises product and the PaaS product. The goal is to support upgrades or migrations to Azure SQL DB with minimal effort and no changes. Here are some of the features that are coming to Azure SQL DB soon:

  • Support for SQL Agent
  • 3-part names
  • DBMail
  • CDC
  • Service Broker
  • Cross-Database and Cross-Instance querying
  • CLR & R Services
  • SQL Profiler
  • Native backup-restore
  • Log shipping
  • Transactional Replication

These features will definitely bring more parity to the platforms. A number of these features are key for some of my clients to move to Azure SQL DB.

Migration Project for Azure SQL DB

Whether you have SQL Server, Oracle, or MySQL, you should be able to migrate your database to Azure SQL DB in “five simple steps”. While a great tool, I am interested in exploring this more with Oracle in particular. You can create a project in Azure that let’s you choose the source database and platform and target a Azure SQL DB then move the schema and load the database. While I am skeptical on the full capability of this solution, I look forward to exploring it more.

Azure Analysis Services is GA

The last topic I am going to bring up is Azure Analysis Services. This service is now GA which brings a great service to the PaaS space in Azure. Check out the capabilities here.

Final Thoughts

Microsoft announced much more than I highlight here including tighter AI integration into the data engine, R Server 9.1, and planet scale Document DB. Check out the Microsoft Data Amp site for more videos on what’s coming to Microsoft’s data platforms.

 

Advertisements

TechFuse Minnesota Follow Up

 TechFuse Fall Edition

My Session: Using Azure SQL Database for Enterprise Needs

On 10/6/2015, I presented on Azure SQL Database at TechFuse Minnesota. Some of the highlights from the session included a comparison of the various SQL Server offerings and how Azure SQL Database fits into the overall picture. During the session we also discussed the importance of V12 and elastic databases (still in preview) for the enterprise environment. I was also raised and discussed the cloudy concerns including data safety, data compliance, data privacy and business continuity. With the ongoing changes Microsoft makes in this space, expect more changes to come. However, even today you can be successful using Azure SQL Database for your enterprise needs.

The presentation can be found in PDF format here.

Microsoft has a wealth of documentation which can be found here: https://azure.microsoft.com/en-us/documentation/services/sql-database/.

Questions from the Session

How are upgrades handled in Azure SQL Database?

Most upgrades to Azure SQL Database are seamless to us. However, they have made changes over the years which have required some intervention. One such change was the move from Web and Business Service Tiers to Basic, Standard, and Premium. In this case Microsoft provided a lot of guidance around the process and it was very easy to do in most cases. The most recent instance was the move to V12. Once again Microsoft provided the upgrade path in documentation. Both upgrades could be handled in the portal. However, V12 was a significant change so groups needed to do more testing prior to an upgrade.

Microsoft’s Azure upgrades for Azure SQL Database over the past couple of years: https://azure.microsoft.com/en-us/updates/?service=sql-database.

Can we use Point in Time Restore to recover a table or to recover to a new database?

Point in Time Restore allows us to recover databases to specific points in time. Each service tier has different amounts they retain: Basic-7 days, Standard-14 days; Premium-35 days. Azure SQL Database always recovers the database to a new database. This means you can either “swap” the database once it is recovered or move the items you need to recover from the new database. One word of caution, you need to have the DTU capability to run both databases on the server to support the restore capability. Be sure to plan the recovery process and clean up when you have completed your recovery.

Thanks again to everyone who could attend this year.

Until the next session …

Powering Up HDInsight with Power BI–

On Tuesday, September 15, I presented on this topic for Pragmatic Works. You can find that session here. This session is based on five blog posts that I created in August 2015.

Powering Up HDInsight with Power BI (pdf)https://dataonwheels.files.wordpress.com/2016/02/powering-up-hdinsight-with-power-bi.pdffoundin the

HDInsight Series Featured Pic

Setting Up and HDInsight Cluster (No Scripts Required)

Exploring the Microsoft Azure HDInsight Query Console (No Scripting Required)

Uploading Files to an HDInsight Cluster (No Scripting Required)

Using Power BI with HDInsight Part 1: Power Query and Files

Using Power BI with HDInsight Part 2: Power BI Desktop and Hive

My goals for this series

1. Document using Power BI with HDInsight

2. Prove that you can set up a HDInsight Cluster with no scripts

Other References from the Session

Azure: http://azure.microsoft.com/en-us/

Cloud Berry: http://www.cloudberrylab.com/free-microsoft-azure-explorer.aspx

Wrap Up from the Session

A few questions were asked during the session and I wanted to handle some of them here.

Why did you not use Azure Resource Manager to deploy storage?

I did this as simple as possible and did not need to use the Resource Manager for my demos. However, if you need to rebuild the cluster quickly, the Azure Resource Manager would be a good option. Find out more here: https://azure.microsoft.com/en-us/documentation/articles/hdinsight-provision-clusters/. This site will also walk through scripts and other options for setting up HDInsight clusters.

Why didn’t the table structure show up in the Power Query demo?

The Power Query demo worked with the data from a file approach. This approach is more “raw”. The files did not have column headers, so no headers were created in the table. However, when working with the Power BI Desktop demo, I used Hive. The table was defined in Hive and were easily seen. This is another case for using Hive or something similar to define the schema for ease of use.

What are the differences between Hadoop, Hortonworks, and HDInsight?

Starting from the top, Hadoop is the Adobe open source specification. All of the products listed above are based on Hadoop. Hadoop

Hortonworks and Cloudera are examples of Hadoop distributions. These companies have worked with the various versions of open source technologies around Hadoop and created a supported distribution as a result.

image 

Finally, HDInsight is Microsoft’s cloud-based Hadoop implementation. They continue to add functionality including Spark, R, Giraph, and Solr. You can expect Microsoft to continue to grow the capabilities of HDInsight as part of their cloud-based analytics solutions.

Thanks for attending my session.

Setting Up an HDInsight Cluster (No Scripts Required)

Let me start by saying, I am not a fan of scripting. It definitely has its place and a lot of my peers really like it. It is the easiest way to get functionality out from software vendors such as Microsoft. PowerShell is an incredibly powerful tool which can do just about anything. However, therein lies the problem for me. Scripting solves a lot of problems, however, I just wanted to set up and use a basic HDInsight cluster to create some Power BI demos (posts coming soon). So I started the journey to find the scripts and try to understand the syntax and so on. Then I went to the Azure Portal, here is what I did to set up my cluster and load data with no scripting required. My goal was to go to get a working demo platform up. Would I necessarily recommend this path for production work, not sure yet. But now I can work with HDInsight with considerably less work required to set up the environment.

HDInsight Cluster No Script Setup Requirements

You need an Azure account. You can go to http://azure.microsoft.com to sign up for a free account if you like. If you have an MSDN subscription you should have some time available as well.

HDInsight Cluster No Script Setup

Once you have your account created, you should go to http://portal.azure.com. We will be doing our setup from here. During the process we will be creating a storage account (if this is your first run in azure, you may choose to set up a Resource Group as well) and the HDInsight cluster. Be aware that the cluster has compute costs and the storage has storage costs. At the end we will remove the cluster to save your compute time.

Create the Storage Account

This step can be done during the HDInsight cluster creation, but this limits your ability to share data across clusters. If you are just trying it for fun, you can do this during the cluster set up.

Click the + symbol on the portal, then Data + Storage, then Storage Account. This will open a blade with the set up instructions for a storage account.

image

When you create your account you will have some options to fill in:

  • Name: this name will need to be a unique name, e.g., joescoolhdinsight
  • Pricing tier: The pricing tier is really important if you are using a limited plan or if you plan to keep the data for a long time. If you are planning to use this as a demo, I would select Locally Redundant as that is the lower cost plan.
  • Resource Group: The resource group lets organize your Azure assets. This is for your benefit, so if you want to keep all of the HDInsight components together, you could create a group for that or stick with the default.
  • Subscription: This lets you choose the subscription you want to use.
  • Location: Be sure to select a location close to you that supports HDInsight. Check http://azure.microsoft.com/en-us/regions/ to see what Azure services are supported in each region.
  • Diagnostics: This is optional. If you are looking into the diagnostics or need to prep for production, you will find this useful. In most cases, we would not turn this on for demos.

Click Create and it will create your storage account. This may take a few minutes. The notifications section on the portal will alert you when this has been completed. Once that is complete, we will continue with setting up the cluster.

Create a SQL Database for a Metastore

This is an optional section. If you would like to use Hive or Oozie and want to create a metastore so you can reuse your work when you recreate the cluster you need to create a SQL Database for this.

image

Here are the settings needed to create the database:

  • Name: Something easy for you to track on, e.g. HDInsightMetastore.
  • Server: You can use an existing server if you have it, otherwise you can create a new server. I recommend you create the server in the same location you plan to create your HDInsight cluster.
  • Pricing Tier: The default is S0. If you plan to use this for demos and don’t need the additional features, you can choose Basic.
  • Optional Configuration: no changes.
  • Resource Group: Use the Resource Group you have in place for this example.
  • Subscription: Select your subscription.

Click Create to create your database. You will work with this during the setup of your cluster.

Setting Up the HDInsight Cluster

Click the + symbol on the portal, then Data + Analytics, then HDInsight. This will open a blade with the set up instructions for a storage account.

image

Like with the storage account setup, this will open a blade with options for creating the cluster. Let’s walk through the settings on this.

  • Cluster Name: Like the storage account, this name needs to be unique.
  • Cluster Yype: Select Hadoop for this walk through.
  • Cluster Operating System: Select Windows Server for this walk through.
  • Subscription: Choose the same subscription as your storage account.
  • Resource Group: Choose the same Resource Group as your storage account.
  • Cluster Credentials: Here you select a login name and password for your cluster. You can also choose to enable Remote Desktop, but we are not using that feature for this setup. (Note: be sure to click Select at the bottom when you are done. If you don’t, you will be prompted by IE about unsaved settings.)
  • Data Source: Here is where you select your storage account. If you chose not to create a storage account, you can create a new account here as well.
  • Node Pricing Tiers: This section determines the capability and the associated computing costs of your cluster. By default, 4 worker nodes and 2 head nodes will be created with recommended servers (D12 at the time of this writing). Expand the pricing tier, to change the server type or node count. Unless you are sure you need to change, keep the default settings (you can recreate the cluster later). You will see the current hourly pricing based on your selections. This cost is incurred while the service is running. The only way to stop charges is to delete the cluster, so be sure to do this when you are done if you do not wish to pay for it to keep running.
  • Optional Configuration: You do not need to change any setting here if you choose not to. However, if you plan to delete your cluster and you want to retain the metadata, it is recommended that you set up an External Metastore using the database you created previously.
    • Select the database you want to use for the metadata in each case and update the credentials. You can use the same database for both metastores.

Next, you create the cluster. This will take a few minutes. You can track progress in the notifications section on the portal page.

Exploring Your New HDInsight Cluster

Once the cluster has been created, you will see the information page with the settings and other usage information. At the top of that area, you will see some icons. These will help you explore your cluster some more.

image

The gear will open up a settings page and you can review your settings in detail and change some if needed.

The icon with a square and an arrow will open up a dashboard with more options. We will dig into the dashboard more in the next post.

The last three icons are shortcuts to specific actions – remote desktop, scale cluster, and delete.

Once you are done, you should delete your cluster. You can always go through these steps again to recreate your cluster. In my next article we will go through what you can see and do with your cluster using the dashboard.

Intro to Data Factory–Training on the T’s Follow Up Post

PragmaticWorks-LogoThis is a follow up blog post based on the Intro to Data Factory session I gave on the Training on the T’s with Pragmatic Works. Find more free training from the past and upcoming here. I did my session on January 13, 2015.

 Intro To Data Factory

In this session, I gave a simple introduction to new Azure Data Factory using a CopyActivity pipeline between Azure Blob Storage and Azure SQL Database. Below is a diagram illustrating the factory that is created in the demo.

image

I have published my presentation materials here. This includes the sample JSON files, the Movies.csv, and PowerShell scripts.

Q & A

Here are a few questions that were answered during the session.

1. Does Availability refer to when data that has been transferred will be available? Or when the data source is actually available for query?

Availability refers to when the datasets will make a slice available. This is the when the dataset can be consumed as an input or be targeted as an output. This means you can consume data hourly but choose to push it to its final destination on a different cadence to prevent issues on the receiving end.

2. What pre-requisites are must haves?…e.g.(Azure account, HDInsight, Blob Storage Accounts, etc.)

    • An Azure Account is the only real must have. You could use two on premise SQL Server instances.
    • HDInsight if you want to use the HDInsight activitities
    • An Azure Storage account to use blob or table storage

3. How do you decide to use a Factory or Warehouse?

The factory is more of a data movement tool. A warehouse could be a source or target of a factory pipeline.

4. Is this similar to SSIS in SQL Server?

Yes and no. SSIS is definitely more mature and has more tooling available such as data sources and transformations. SSIS also have a good workflow constructor. The focus of the Data Factory initially was to load HDInsight tables from a variety of sources with more flexibility. The other note here is that Data Factory is being built from the ground up to support the scale of the cloud or Azure.

5. Can this be used for Big Data?

Absolutely. I would say that it is one of the primary reasons for the tool. In reference to the previous question, it will likely be the tool of choice for big data operations because it will be able to scale with Azure.

Links to Additional Resources on Data Factory or tools that were used in the presentation:

Azure Data Factory on Azure’s Website

Azure Data Factory Documentation

Azure Data Factory Pricing

Azure Storage Explorer

Azure PowerShell Documentation

Thanks for joining me for this presentation. We look forward to seeing you at the next Free Training on the T’s.