TechFuse Minnesota Follow Up

 TechFuse Fall Edition

My Session: Using Azure SQL Database for Enterprise Needs

On 10/6/2015, I presented on Azure SQL Database at TechFuse Minnesota. Some of the highlights from the session included a comparison of the various SQL Server offerings and how Azure SQL Database fits into the overall picture. During the session we also discussed the importance of V12 and elastic databases (still in preview) for the enterprise environment. I was also raised and discussed the cloudy concerns including data safety, data compliance, data privacy and business continuity. With the ongoing changes Microsoft makes in this space, expect more changes to come. However, even today you can be successful using Azure SQL Database for your enterprise needs.

The presentation can be found in PDF format here.

Microsoft has a wealth of documentation which can be found here: https://azure.microsoft.com/en-us/documentation/services/sql-database/.

Questions from the Session

How are upgrades handled in Azure SQL Database?

Most upgrades to Azure SQL Database are seamless to us. However, they have made changes over the years which have required some intervention. One such change was the move from Web and Business Service Tiers to Basic, Standard, and Premium. In this case Microsoft provided a lot of guidance around the process and it was very easy to do in most cases. The most recent instance was the move to V12. Once again Microsoft provided the upgrade path in documentation. Both upgrades could be handled in the portal. However, V12 was a significant change so groups needed to do more testing prior to an upgrade.

Microsoft’s Azure upgrades for Azure SQL Database over the past couple of years: https://azure.microsoft.com/en-us/updates/?service=sql-database.

Can we use Point in Time Restore to recover a table or to recover to a new database?

Point in Time Restore allows us to recover databases to specific points in time. Each service tier has different amounts they retain: Basic-7 days, Standard-14 days; Premium-35 days. Azure SQL Database always recovers the database to a new database. This means you can either “swap” the database once it is recovered or move the items you need to recover from the new database. One word of caution, you need to have the DTU capability to run both databases on the server to support the restore capability. Be sure to plan the recovery process and clean up when you have completed your recovery.

Thanks again to everyone who could attend this year.

Until the next session …

Setting Up an HDInsight Cluster (No Scripts Required)

Let me start by saying, I am not a fan of scripting. It definitely has its place and a lot of my peers really like it. It is the easiest way to get functionality out from software vendors such as Microsoft. PowerShell is an incredibly powerful tool which can do just about anything. However, therein lies the problem for me. Scripting solves a lot of problems, however, I just wanted to set up and use a basic HDInsight cluster to create some Power BI demos (posts coming soon). So I started the journey to find the scripts and try to understand the syntax and so on. Then I went to the Azure Portal, here is what I did to set up my cluster and load data with no scripting required. My goal was to go to get a working demo platform up. Would I necessarily recommend this path for production work, not sure yet. But now I can work with HDInsight with considerably less work required to set up the environment.

HDInsight Cluster No Script Setup Requirements

You need an Azure account. You can go to http://azure.microsoft.com to sign up for a free account if you like. If you have an MSDN subscription you should have some time available as well.

HDInsight Cluster No Script Setup

Once you have your account created, you should go to http://portal.azure.com. We will be doing our setup from here. During the process we will be creating a storage account (if this is your first run in azure, you may choose to set up a Resource Group as well) and the HDInsight cluster. Be aware that the cluster has compute costs and the storage has storage costs. At the end we will remove the cluster to save your compute time.

Create the Storage Account

This step can be done during the HDInsight cluster creation, but this limits your ability to share data across clusters. If you are just trying it for fun, you can do this during the cluster set up.

Click the + symbol on the portal, then Data + Storage, then Storage Account. This will open a blade with the set up instructions for a storage account.

image

When you create your account you will have some options to fill in:

  • Name: this name will need to be a unique name, e.g., joescoolhdinsight
  • Pricing tier: The pricing tier is really important if you are using a limited plan or if you plan to keep the data for a long time. If you are planning to use this as a demo, I would select Locally Redundant as that is the lower cost plan.
  • Resource Group: The resource group lets organize your Azure assets. This is for your benefit, so if you want to keep all of the HDInsight components together, you could create a group for that or stick with the default.
  • Subscription: This lets you choose the subscription you want to use.
  • Location: Be sure to select a location close to you that supports HDInsight. Check http://azure.microsoft.com/en-us/regions/ to see what Azure services are supported in each region.
  • Diagnostics: This is optional. If you are looking into the diagnostics or need to prep for production, you will find this useful. In most cases, we would not turn this on for demos.

Click Create and it will create your storage account. This may take a few minutes. The notifications section on the portal will alert you when this has been completed. Once that is complete, we will continue with setting up the cluster.

Create a SQL Database for a Metastore

This is an optional section. If you would like to use Hive or Oozie and want to create a metastore so you can reuse your work when you recreate the cluster you need to create a SQL Database for this.

image

Here are the settings needed to create the database:

  • Name: Something easy for you to track on, e.g. HDInsightMetastore.
  • Server: You can use an existing server if you have it, otherwise you can create a new server. I recommend you create the server in the same location you plan to create your HDInsight cluster.
  • Pricing Tier: The default is S0. If you plan to use this for demos and don’t need the additional features, you can choose Basic.
  • Optional Configuration: no changes.
  • Resource Group: Use the Resource Group you have in place for this example.
  • Subscription: Select your subscription.

Click Create to create your database. You will work with this during the setup of your cluster.

Setting Up the HDInsight Cluster

Click the + symbol on the portal, then Data + Analytics, then HDInsight. This will open a blade with the set up instructions for a storage account.

image

Like with the storage account setup, this will open a blade with options for creating the cluster. Let’s walk through the settings on this.

  • Cluster Name: Like the storage account, this name needs to be unique.
  • Cluster Yype: Select Hadoop for this walk through.
  • Cluster Operating System: Select Windows Server for this walk through.
  • Subscription: Choose the same subscription as your storage account.
  • Resource Group: Choose the same Resource Group as your storage account.
  • Cluster Credentials: Here you select a login name and password for your cluster. You can also choose to enable Remote Desktop, but we are not using that feature for this setup. (Note: be sure to click Select at the bottom when you are done. If you don’t, you will be prompted by IE about unsaved settings.)
  • Data Source: Here is where you select your storage account. If you chose not to create a storage account, you can create a new account here as well.
  • Node Pricing Tiers: This section determines the capability and the associated computing costs of your cluster. By default, 4 worker nodes and 2 head nodes will be created with recommended servers (D12 at the time of this writing). Expand the pricing tier, to change the server type or node count. Unless you are sure you need to change, keep the default settings (you can recreate the cluster later). You will see the current hourly pricing based on your selections. This cost is incurred while the service is running. The only way to stop charges is to delete the cluster, so be sure to do this when you are done if you do not wish to pay for it to keep running.
  • Optional Configuration: You do not need to change any setting here if you choose not to. However, if you plan to delete your cluster and you want to retain the metadata, it is recommended that you set up an External Metastore using the database you created previously.
    • Select the database you want to use for the metadata in each case and update the credentials. You can use the same database for both metastores.

Next, you create the cluster. This will take a few minutes. You can track progress in the notifications section on the portal page.

Exploring Your New HDInsight Cluster

Once the cluster has been created, you will see the information page with the settings and other usage information. At the top of that area, you will see some icons. These will help you explore your cluster some more.

image

The gear will open up a settings page and you can review your settings in detail and change some if needed.

The icon with a square and an arrow will open up a dashboard with more options. We will dig into the dashboard more in the next post.

The last three icons are shortcuts to specific actions – remote desktop, scale cluster, and delete.

Once you are done, you should delete your cluster. You can always go through these steps again to recreate your cluster. In my next article we will go through what you can see and do with your cluster using the dashboard.

Intro to Data Factory–Training on the T’s Follow Up Post

PragmaticWorks-LogoThis is a follow up blog post based on the Intro to Data Factory session I gave on the Training on the T’s with Pragmatic Works. Find more free training from the past and upcoming here. I did my session on January 13, 2015.

 Intro To Data Factory

In this session, I gave a simple introduction to new Azure Data Factory using a CopyActivity pipeline between Azure Blob Storage and Azure SQL Database. Below is a diagram illustrating the factory that is created in the demo.

image

I have published my presentation materials here. This includes the sample JSON files, the Movies.csv, and PowerShell scripts.

Q & A

Here are a few questions that were answered during the session.

1. Does Availability refer to when data that has been transferred will be available? Or when the data source is actually available for query?

Availability refers to when the datasets will make a slice available. This is the when the dataset can be consumed as an input or be targeted as an output. This means you can consume data hourly but choose to push it to its final destination on a different cadence to prevent issues on the receiving end.

2. What pre-requisites are must haves?…e.g.(Azure account, HDInsight, Blob Storage Accounts, etc.)

    • An Azure Account is the only real must have. You could use two on premise SQL Server instances.
    • HDInsight if you want to use the HDInsight activitities
    • An Azure Storage account to use blob or table storage

3. How do you decide to use a Factory or Warehouse?

The factory is more of a data movement tool. A warehouse could be a source or target of a factory pipeline.

4. Is this similar to SSIS in SQL Server?

Yes and no. SSIS is definitely more mature and has more tooling available such as data sources and transformations. SSIS also have a good workflow constructor. The focus of the Data Factory initially was to load HDInsight tables from a variety of sources with more flexibility. The other note here is that Data Factory is being built from the ground up to support the scale of the cloud or Azure.

5. Can this be used for Big Data?

Absolutely. I would say that it is one of the primary reasons for the tool. In reference to the previous question, it will likely be the tool of choice for big data operations because it will be able to scale with Azure.

Links to Additional Resources on Data Factory or tools that were used in the presentation:

Azure Data Factory on Azure’s Website

Azure Data Factory Documentation

Azure Data Factory Pricing

Azure Storage Explorer

Azure PowerShell Documentation

Thanks for joining me for this presentation. We look forward to seeing you at the next Free Training on the T’s.

SQL Azure’s place in the data world (Part 1)

I recently presented on SQL Azure at Code Mastery in Minneapolis.  The presentation focused on using SQL Azure in application development.  We took the time to create a new server and a new SQL Azure database.  After that part of the demo, we hooked up various management tools (SQL Server 2008 R2 Management Studio and SQL Azure Database Manager) and applications (e.g.  SQL Server 2008 Integration Services, Microsoft Access, Microsoft Excel).

At the end of that presentation, I started discussing how developers and architects could use SQL Azure.  That raised a number of questions on where SQL Azure fits into the data or information architecture in the real world.  Some of the questions that matter to implementers were:

  • Can you back up databases?
  • What is Microsoft’s SLA for SQL Azure?
  • How easily can you keep the data up to date between on premise and cloud databases?
  • How much data can be supported by SQL Azure?
  • Is it really cost effective?
  • What are some use cases that make sense based on the answers to the questions above?

I will tackle these questions in this and some following posts. Hopefully you will get a perspective of where SQL Azure is today and how it may or may not fit in your environment.

Can you backup databases?
What is Microsoft’s SLA for SQL Azure?

The question behind these questions is what are my disaster recovery and high availability options from service and data standpoints.  One of the coolest features of SQL Azure is the fact that it has built in disaster recovery and high availability at the service level.  Microsoft creates three copies of your database within their data centers.  In this way, Microsoft guarantees 99.9% availability of SQL Azure databases.  (See http://www.microsoft.com/windowsazure/sla/ for more details.)

However, this only covers the service availability.  What happens if your data becomes suspect or corrupt?  Well for starters, SQL Azure has no support for the standard backup and restore operations we are used to.  Microsoft is currently pointing us to creating a copy of the database within the SQL Azure platform.  (Details: http://msdn.microsoft.com/en-us/library/ff951631.aspx.)  This will be a new SQL Azure database which is a copy of the existing database.  You can use SQL Azure Data Sync (CTP1) to keep these databases in sync.  This still does not have a way to bring that database backup on premise for storage and historical retention.  Your best option at this point is to use SQL Server Integration Services until SQL Azure Data Sync supports on-premise to cloud synchronization which is expected in CTP2.

How easily can you keep the data up to date between on premise and cloud databases?

So the last questions lead into how to keep the data up to date between on premise and cloud databases.  The best option is most likely yet to come in the SQL Azure Data Sync CTP2.  When that is released, you will be able to exchange data between on premise and cloud databases with built in tools.  Until then, you do have some options.  First, you can look at using SQL Server Integration Services to keep the data moving between the two databases.  It is very straightforward to create an ADO.NET data connection and point it to a SQL Azure database.  This can be a source or destination for the data and it works very well.  Beyond that you can use BCP or T-SQL scripts to move data back and forth, but this involves more development work to complete.

As you can see, until Data Sync support is ready for primetime, keeping data synchronized between on premise and Azure databases will require some development on your end.  You will need to review the current options and stay alert for new options for syncing your data at the moment.

Next post

In the next post, we will discuss database size, cost, and effective use cases for SQL Azure.  Stay tuned.