You can create an Azure Data Explorer (ADX) environment using any of the three methods listed below:
- Using Azure Portal (also with ARM templates)
- Using PowerShell commands
- Using Command Line Interface (CLI) commands
We will first describe ADX cluster creation using the Azure Portal, and then cover the other two methods.
Create ADX Cluster using Azure Portal
Step 1: Click on “Create a Resource” and search for Azure Data Explorer.
Step 2: Click on the “Create” button and, on the page that appears, provide the basic details. You need to select the subscription and resource group, provide a name for your cluster, select the region, and then select the compute specifications for your cluster from the available options.
There are two compute tiers to choose from:
- Storage Optimized: This tier is represented by the L and DS series compute resources. It is useful when you need to cache a large amount of data relative to the CPU needed for processing queries and ingestion.
- Compute Optimized: This tier is the opposite of the storage optimized tier. It is useful when the CPU needed for processing complex queries and ingestion outweighs the amount of data that needs to be cached. The storage cost in this tier is higher, but the computational cost is lower.
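For readers who prefer the ARM template route mentioned at the start, the basic details from Step 2 map roughly onto a `Microsoft.Kusto/clusters` resource. The sketch below builds that fragment as a Python dictionary. The cluster name, region, SKU name, instance count, and API version are illustrative assumptions, so check them against the current ARM reference before deploying.

```python
import json


def cluster_resource(name, location, sku_name, capacity=2):
    """Build a minimal ARM resource fragment for an ADX cluster."""
    return {
        "type": "Microsoft.Kusto/clusters",
        "apiVersion": "2022-02-01",  # assumed; use a version your subscription supports
        "name": name,
        "location": location,
        # The SKU name selects the compute specification chosen in Step 2
        "sku": {"name": sku_name, "tier": "Standard", "capacity": capacity},
    }


# Hypothetical values, not taken from the walkthrough
resource = cluster_resource("adxdemocluster", "westeurope", "Standard_D13_v2")
print(json.dumps(resource, indent=2))
```

The fragment would sit inside the `resources` array of a full template; the remaining steps add properties to it.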
Step 3: After providing the basic information, click on the Next button to configure scaling for the cluster. You can either opt for manual scaling or select optimized autoscale. Optimized autoscale is the recommended option, because the resources are scaled automatically depending on the workload and pre-defined rules.
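In an ARM template, the optimized autoscale choice is expressed through the cluster's `optimizedAutoscale` property. The following is a minimal sketch of that fragment; the instance bounds are made-up values, and the range check is a local sanity check of this example rather than the service's own validation.

```python
def optimized_autoscale(minimum, maximum, enabled=True):
    """Return the cluster properties fragment for optimized autoscale."""
    if not 1 <= minimum <= maximum:
        raise ValueError("expected 1 <= minimum <= maximum")
    return {
        "optimizedAutoscale": {
            "version": 1,
            "isEnabled": enabled,
            "minimum": minimum,  # lowest instance count the cluster may scale in to
            "maximum": maximum,  # highest instance count the cluster may scale out to
        }
    }


# Hypothetical bounds: scale between 2 and 5 instances
autoscale = optimized_autoscale(2, 5)
print(autoscale)
```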
Step 4: Click on the Next button to set up the configurations. Here you can enable two capabilities of your Azure Data Explorer cluster: streaming ingestion and purging. By default, both are set to “off”.
If you wish to use these ADX capabilities, you can enable them from this tab, but you will have to configure them properly after the cluster has been deployed.
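As a rough illustration, the two switches correspond to the `enableStreamingIngest` and `enablePurge` cluster properties in an ARM template. The post-deployment configuration for streaming ingestion typically includes enabling a streaming ingestion policy on the target table or database. The sketch below produces both pieces; the table name is an assumption.

```python
def capability_properties(streaming_ingestion=False, purge=False):
    """Cluster properties for the two optional capabilities (both off by default)."""
    return {
        "enableStreamingIngest": streaming_ingestion,
        "enablePurge": purge,
    }


def streaming_policy_command(table):
    """Management command that enables the streaming ingestion policy on one table."""
    return f".alter table {table} policy streamingingestion enable"


print(capability_properties(streaming_ingestion=True))
# "StormEvents" is an assumed table name
print(streaming_policy_command("StormEvents"))
```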
Step 5: The next step is to set up the security settings for your cluster. You can choose to have a system-assigned managed identity. The cluster is then registered with Azure Active Directory, and you can control its access to other Azure resources.
Disk encryption can be configured once the cluster deployment is complete.
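In template form, a system-assigned managed identity is requested with an `identity` block on the cluster resource. The helper below is a small sketch that adds the block to an existing resource dictionary without modifying the original; the resource passed in is a placeholder.

```python
def with_system_identity(resource):
    """Return a copy of an ARM resource with a system-assigned managed identity."""
    updated = dict(resource)
    updated["identity"] = {"type": "SystemAssigned"}
    return updated


# Placeholder resource; a real one also carries location, sku, and apiVersion
base = {"type": "Microsoft.Kusto/clusters", "name": "adxdemocluster"}
secured = with_system_identity(base)
print(secured)
```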
Step 6: The next step is to configure the networking requirements. If you want your cluster to be attached to your virtual network, you can configure the details on the Networking tab.
Step 7: Once the settings are complete, you can provide the details for tagging (optional, but a good practice) and finally click on Create. Once the deployment starts, you will see the below screen.
When the deployment finishes, a message confirms that it is complete. From here, you can go to the newly created ADX cluster page.
Step 8: Now that the cluster is ready, it is time to create a database for the cluster. From the ADX cluster overview page, you can click on the “Create Database” button.
Here you need to provide the database name, the retention period (in days), and the cache period (in days). The retention period sets the number of days that data is retained in the cluster, whereas the cache period determines how many days of data are kept in the hot cache for high-performance queries.
You can later change these settings by going to the cluster database settings.
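The same database settings can be sketched outside the portal. In an ARM template, a database is a `Microsoft.Kusto/clusters/databases` resource whose retention and cache periods are ISO 8601 durations, and the policies can be changed later with management commands. The names, region, API version, and day counts below are assumptions chosen for illustration.

```python
def database_resource(cluster, database, location, retention_days, cache_days):
    """Build an ARM resource fragment for a read-write ADX database."""
    return {
        "type": "Microsoft.Kusto/clusters/databases",
        "apiVersion": "2022-02-01",  # assumed; match the cluster resource
        "name": f"{cluster}/{database}",
        "location": location,
        "kind": "ReadWrite",
        "properties": {
            "softDeletePeriod": f"P{retention_days}D",  # retention period
            "hotCachePeriod": f"P{cache_days}D",        # cache period
        },
    }


def policy_commands(database, retention_days, cache_days):
    """Management commands that adjust retention and hot cache after creation."""
    return [
        f".alter-merge database {database} policy retention softdelete = {retention_days}d",
        f".alter database {database} policy caching hot = {cache_days}d",
    ]


# Hypothetical names and periods
db = database_resource("adxdemocluster", "stormdb", "westeurope", 365, 31)
print(db["properties"])
for command in policy_commands("stormdb", 180, 14):
    print(command)
```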
Step 9: After the database has been created, it is time to configure data ingestion for the Azure Data Explorer cluster. There are two ways to ingest data. The first is One-Click Ingestion, and the second is to create a data connection, which can be connected to an Event Hub, Blob Storage, or an IoT Hub.
In this demo, we will configure data ingestion using One-Click Ingestion, which is a new feature of the Azure Data Explorer cluster.
One-click ingestion helps you get started quickly: it ingests the data, creates the database tables, and maps the structures for you. You can ingest data from different kinds of sources and in different formats, such as JSON, CSV, TSV, SCSV, SOHSV, TSVE, and PSV.
You can create a new table and let one-click ingestion map the structure from the source and create the table columns. You can also use existing tables.
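To make the idea of mapping the structure from the source more concrete, here is a simplified sketch that reads a CSV header plus a few sample rows and proposes a `.create table` command. It is not the algorithm one-click ingestion actually uses; the type-inference rules, the sample data, and the assumption that header names are valid column identifiers are all illustrative.

```python
import csv
import io
from datetime import datetime
from itertools import islice


def infer_kusto_type(values):
    """Pick the first Kusto type that every non-empty sample value parses as."""
    samples = [v for v in values if v != ""]
    if not samples:
        return "string"

    def all_parse(parser):
        for value in samples:
            try:
                parser(value)
            except ValueError:
                return False
        return True

    if all_parse(int):
        return "long"
    if all_parse(float):
        return "real"
    if all_parse(datetime.fromisoformat):
        return "datetime"
    return "string"


def create_table_command(table, csv_text, sample_rows=100):
    """Propose a .create table command from a CSV header and sample rows."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(islice(reader, sample_rows))
    columns = []
    for index, name in enumerate(header):
        values = [row[index] for row in rows if index < len(row)]
        columns.append(f"{name}:{infer_kusto_type(values)}")
    return f".create table {table} ({', '.join(columns)})"


# Made-up sample rows, not the real StormEvents file
sample = (
    "State,EventId,DamageProperty,StartTime\n"
    "TEXAS,101,2500.5,2007-09-29 08:11:00\n"
    "OHIO,102,0,2007-10-01 12:00:00\n"
)
print(create_table_command("StormEvents", sample))
```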
Here we have chosen to ingest data from a CSV file. The CSV file used here is StormEvents.csv, which can be downloaded from the National Centers for Environmental Information. They provide sample weather data for data analysis purposes.
After the data is ingested, you will see the Storm Events table created inside the database that you created in Step 8. You can then run queries against the database. A simple query has been performed on the ingested data as shown below.
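If you would rather run such a query from code than from the portal, the following sketch uses the `azure-kusto-data` Python package. The table name `StormEvents`, the cluster URI, and the database name are assumptions, and the function expects that you have installed the package and signed in with the Azure CLI.

```python
# Two simple KQL queries against the ingested table (table name assumed)
SAMPLE_QUERY = "StormEvents | take 10"
ROW_COUNT_QUERY = "StormEvents | count"


def run_query(cluster_uri, database, query):
    """Run a KQL query and return the rows of the primary result as dictionaries."""
    # Imported lazily so this module loads even when the SDK is not installed
    from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

    kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(cluster_uri)
    client = KustoClient(kcsb)
    response = client.execute(database, query)
    return [row.to_dict() for row in response.primary_results[0]]


# Example call (hypothetical cluster URI and database name):
# rows = run_query("https://adxdemocluster.westeurope.kusto.windows.net", "stormdb", SAMPLE_QUERY)
```

`take 10` returns an arbitrary ten rows, which is usually enough to confirm that the ingestion and column mapping worked.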