Let me start by saying I am not a fan of scripting. It definitely has its place, and a lot of my peers really like it. It is the easiest way for software vendors such as Microsoft to get functionality out. PowerShell is an incredibly powerful tool that can do just about anything. However, therein lies the problem for me. Scripting solves a lot of problems, but I just wanted to set up and use a basic HDInsight cluster to create some Power BI demos (posts coming soon). So I started the journey to find the scripts and try to understand the syntax. Then I went to the Azure Portal. Here is what I did to set up my cluster and load data with no scripting required. My goal was to get a working demo platform up. Would I recommend this path for production work? I am not sure yet. But now I can work with HDInsight with considerably less effort required to set up the environment.
HDInsight Cluster No Script Setup Requirements
You need an Azure account. You can go to http://azure.microsoft.com to sign up for a free account if you like. If you have an MSDN subscription you should have some time available as well.
HDInsight Cluster No Script Setup
Once you have created your account, go to http://portal.azure.com. We will be doing our setup from here. During the process we will create a storage account (if this is your first run in Azure, you may choose to set up a Resource Group as well) and the HDInsight cluster. Be aware that the cluster has compute costs and the storage has storage costs. At the end we will remove the cluster so you stop paying for compute time.
Create the Storage Account
This step can be done during the HDInsight cluster creation, but doing it there limits your ability to share data across clusters. If you are just trying it for fun, you can do this during the cluster setup.
Click the + symbol on the portal, then Data + Storage, then Storage Account. This will open a blade with the setup options for a storage account.
When you create your account you will have some options to fill in:
- Name: The name must be unique, e.g., joescoolhdinsight.
- Pricing tier: The pricing tier is really important if you are using a limited plan or if you plan to keep the data for a long time. If you are planning to use this as a demo, I would select Locally Redundant, as that is the lower-cost plan.
- Resource Group: The resource group lets you organize your Azure assets. This is for your benefit, so if you want to keep all of the HDInsight components together, you could create a group for that or stick with the default.
- Subscription: This lets you choose the subscription you want to use.
- Location: Be sure to select a location close to you that supports HDInsight. Check http://azure.microsoft.com/en-us/regions/ to see what Azure services are supported in each region.
- Diagnostics: This is optional. If you are looking into the diagnostics or need to prep for production, you will find this useful. In most cases, we would not turn this on for demos.
Click Create and it will create your storage account. This may take a few minutes. The notifications section on the portal will alert you when this has been completed. Once that is complete, we will continue with setting up the cluster.
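None of this requires a script, but if you want to sanity-check a name before typing it into the blade, here is a minimal Python sketch. It relies on Azure's format rule that storage account names are 3 to 24 characters long and contain only lowercase letters and numbers. The function name is my own, and the check covers format only; whether the name is actually unique can only be confirmed by the portal.

```python
import re

# Azure storage account names: 3-24 characters, lowercase letters and digits only.
_STORAGE_NAME_PATTERN = re.compile(r"[a-z0-9]{3,24}")


def is_valid_storage_account_name(name: str) -> bool:
    """Return True if the name meets the format rules (uniqueness is not checked)."""
    return _STORAGE_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    # The first name is the example from the list above; the others are
    # deliberately malformed (uppercase and hyphens, too short).
    for candidate in ["joescoolhdinsight", "Joes-Cool-HDInsight", "ab"]:
        print(candidate, is_valid_storage_account_name(candidate))
```

A name that passes this check can still be rejected by the portal if someone else already has it.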
Create a SQL Database for a Metastore
This is an optional section. If you would like to use Hive or Oozie and want a metastore so you can reuse your work when you recreate the cluster, you need to create a SQL Database for it.
Here are the settings needed to create the database:
- Name: Something easy for you to keep track of, e.g., HDInsightMetastore.
- Server: You can use an existing server if you have it, otherwise you can create a new server. I recommend you create the server in the same location you plan to create your HDInsight cluster.
- Pricing Tier: The default is S0. If you plan to use this for demos and don’t need the additional features, you can choose Basic.
- Optional Configuration: no changes.
- Resource Group: Use the Resource Group you have in place for this example.
- Subscription: Select your subscription.
Click Create to create your database. You will work with this during the setup of your cluster.
Setting Up the HDInsight Cluster
Click the + symbol on the portal, then Data + Analytics, then HDInsight.
As with the storage account setup, this will open a blade with options for creating the cluster. Let’s walk through the settings.
- Cluster Name: Like the storage account, this name needs to be unique.
- Cluster Type: Select Hadoop for this walkthrough.
- Cluster Operating System: Select Windows Server for this walkthrough.
- Subscription: Choose the same subscription as your storage account.
- Resource Group: Choose the same Resource Group as your storage account.
- Cluster Credentials: Here you select a login name and password for your cluster. You can also choose to enable Remote Desktop, but we are not using that feature for this setup. (Note: be sure to click Select at the bottom when you are done. If you don’t, you will be prompted by IE about unsaved settings.)
- Data Source: Here is where you select your storage account. If you chose not to create a storage account, you can create a new account here as well.
- Node Pricing Tiers: This section determines the capability and the associated computing costs of your cluster. By default, 4 worker nodes and 2 head nodes will be created with the recommended servers (D12 at the time of this writing). Expand the pricing tier to change the server type or node count. Unless you are sure you need to change them, keep the default settings (you can recreate the cluster later). You will see the current hourly pricing based on your selections. This cost is incurred while the service is running. The only way to stop the charges is to delete the cluster, so be sure to do this when you are done if you do not wish to pay for it to keep running.
- Optional Configuration: You do not need to change any setting here if you choose not to. However, if you plan to delete your cluster and you want to retain the metadata, it is recommended that you set up an External Metastore using the database you created previously.
- Select the database you want to use for the metadata in each case (Hive and Oozie) and update the credentials. You can use the same database for both metastores.
Next, you create the cluster. This will take a few minutes. You can track progress in the notifications section on the portal page.
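Since the cluster bills for every hour it exists, it can help to do the arithmetic before you leave it running overnight. The following is only a rough Python sketch: the function name is mine, the hourly rate is a made-up placeholder rather than a real Azure price, and it assumes every node is billed at the same rate. Use the hourly figure shown on the pricing tier blade for real numbers.

```python
def estimate_cluster_cost(head_nodes, worker_nodes, hourly_rate_per_node, hours):
    """Estimate compute cost, assuming all nodes share one hourly rate."""
    if min(head_nodes, worker_nodes) < 0 or hourly_rate_per_node < 0 or hours < 0:
        raise ValueError("inputs must be non-negative")
    return (head_nodes + worker_nodes) * hourly_rate_per_node * hours


if __name__ == "__main__":
    # Placeholder rate for illustration only; NOT an actual Azure price.
    HYPOTHETICAL_RATE = 0.50

    # Default layout from the walkthrough: 2 head nodes and 4 worker nodes.
    one_workday = estimate_cluster_cost(2, 4, HYPOTHETICAL_RATE, hours=8)
    one_week = estimate_cluster_cost(2, 4, HYPOTHETICAL_RATE, hours=24 * 7)
    print("8-hour demo session:", one_workday)
    print("Left running for a week:", one_week)
```

The gap between those two numbers is the reason to delete the cluster when you are done.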
Exploring Your New HDInsight Cluster
Once the cluster has been created, you will see the information page with the settings and other usage information. At the top of that area, you will see some icons. These will help you explore your cluster some more.
The gear will open up a settings page and you can review your settings in detail and change some if needed.
The icon with a square and an arrow will open up a dashboard with more options. We will dig into the dashboard more in the next post.
The last three icons are shortcuts to specific actions – remote desktop, scale cluster, and delete.
Once you are done, you should delete your cluster. You can always go through these steps again to recreate your cluster. In my next article we will go through what you can see and do with your cluster using the dashboard.