The Kusto Query Language

Introduction to the Kusto Query Language

We already created the environment in the previous section, and now, we will extend our knowledge by first creating the tables using the Kusto explorer, and then import the data in the table from an external source. This is technically called data ingestion.

After creating tables and ingesting data to them we can move forward and use Kusto Query Language (aka KQL) to explore the data. We can use such queries to discover patterns, identify anomalies and outliers, create statistical modeling and more.

At the end you should get your data validated by SMEs or stakeholders. This is where you would wish to share the data. This can be done by exporting the data in the CSV format directly from ADX.

Finally, after the data has been validated, the visualized data needs to be presented. There are multiple different ways to share the visualized data. You can share the insights using Excel, or Power BI, or directly from the ADX.

TheKQL

KQL (Kusto Query Language) was developed with certain key principals in mind, like – easy to read and understand syntax, provide high-performance through scaling, and the one that can transition smoothly from simple to complex query.

Interestingly KQL is a read-only query language, which processes the data and returns results. It is very similar to SQL with a sequence of statements, where the statements are modeled as a flow of tabular data output from the previous statement to the next statement. These statements are concatenated with a pipe (|) character.

In SQL, the queries start with the column names and we only get to know about the table name when we reach the “From” statement, whereas, in KQL, the query starts with the table name followed by the pipe character after which the conditions are defined. We will see how this works shortly.

KQL Usage

KQL Usage

KQL is the query language and the Kusto Engine is the engine that receives the queries in KQL to execute them, and specifically the large datasets from Azure, like –

  1. Azure Application Insights
  2. Azure Log Analytics
  3. Windows Defender Advanced Threat Protection
  4. Azure Security Center

Apart from these, the data can be ingested from external sources as well. It can be done using the custom code in any preferred language like Python, .Net SDK, R, etc. using the Azure Data Explorer API. Data can also be ingested using Event Hub’s and Event Grid’s, and from the CSV file as well. In order to know more about the data ingestion in Azure Data Explorer, visit Overview of Data Ingestion in Azure Data Explorer.

Another interesting fact is that KQL knows to run the SQL commands as well. Also, the user can also get the KQL equivalent of the SQL command (in most cases), as KQL supports a subset of the SQL language. For this you can use Kusto to translate the SQL query to an equivalent KQL by prefixing it with ‘Explain’.

Explain
Select Count_Big(*) as BigCount from StormEvents

For detailed information, visit SQL to Kusto Cheat Sheet

KQL Commands Overview

Below are a few basic KQL commands to get you familiarized as to how KQL queries look like. You definitely need to know and learn about entity types, Data types, tabular operators, scalar operators, functions, scalar functions, time-series analysis, and other important KQL control commands.

In the following examples, we are using the weather information provided by “National Centers For Environmental Information“ . They provide sample weather data from previous years in the CSV format for data analysis purposes. We are using the StormEvents.csv file to work with KQL.

  • To create a table named StormEvents with simple columns and datatypes.
.create table StormEvents 
(StartTime: datetime, EndTime: datetime, EpisodeId: int, EventId: int, State: string, EventType: string, InjuriesDirect: int, InjuriesIndirect: int, DeathsDirect: int, 
DeathsIndirect: int, DamageProperty: int, DamageCrops: int, Source: string, BeginLocation: string, EndLocation: string, BeginLat: real, BeginLon: real, EndLat: real, 
EndLon: real, EpisodeNarrative: string, EventNarrative: string, StormSummary: dynamic)
  • To ingest the data into the StormEvents table from the CSV file. The “with (ignoreFirstRecord = true)” is used to ignore the first row of the CSV file in case it contains the column header.
.ingest into table StormEvents
h'https://kustosamplefiles.blob.core.windows.net/samplefiles/StormEvents.csv?st=2018-08-31T22%3A02%3A25Z&se=2020-09-01T22%3A02%3A00Z&sp=r&sv=2018-03-28&sr=b&sig=LQIbomcKI8Ooz425hWtjeq6d61uEaq21UVX7YrM61N4%3D'
with (ignoreFirstRecord=true)
  • To display the entire table just write the table name
StormEvents
  • To list the weather events from the StormEvents table data between January 1st, 2007 and December 31st, 2007, use the below query. For this, the ‘between‘ operator is used in the where clause.
StormEvents
| where StartTime between (datetime('2007-01-01')..datetime('2007-12-31'))
  • In case you wish to list the event type where the injuries have been greater than 10. Note that we are using the ‘project’ operator, which helps in selecting the columns to include, renamed, or dropped in the results returned as a part of the query. We can also use it to include computed columns.
StormEvents
| where StartTime between (datetime('2007-01-01')..datetime('2007-12-31'))
| where InjuriesDirect > 10
| project EventType, InjuriesDirect, EpisodeNarrative
  • To count the weather events by type for the month of July in 2007 in a particular state (For ex – Washington)
StormEvents
| where  State == "WASHINGTON" and StartTime between (datetime('2007-07-01')..datetime('2007-07-31'))
| summarize StormCount = count() by EventType
  • To list the count of weather events for all the states, where the count is greater than 2000
StormEvents
| summarize event_count = count() by State
| where event_count > 2000
| project State, event_count
| sort by event_count desc

Demo Platforms

There are certain demo platforms that are provided by Microsoft, which can be used free of cost for practice purposes. They are for –

  1. Log Analytics 
  2. Application Insights 
  3. Windows Defender Advanced Threat Protection

These platforms also have saved queries that can be used to get an insight into how queries are formed and complex queries can be built. You can save your queries as well.

There are two dedicated courses by Robert Cain on the Kusto Query Language on Pluralsight, which gives you deeper insight into KQL and that course is highly recommended for you as a data engineer as it details out the different kinds of commands and capabilities of KQL. They are –

  1. Azure Data Explorer – Basic KQL
  2. Azure Data Explorer – Advanced KQL

Part – 1: Data Science Overview

Part – 2: Understanding Azure Data Explorer

Part – 3: Azure Data Explorer Features

Part – 4: Azure Data Explorer Service Capabilities

Part – 5: Creating the ADX Environment

Part – 7: Data Obfuscation in Kusto Query Language

Part – 8: Data Ingestion Preparation: Schema Mapping

Part – 9: Overview of data ingestion in Azure Data Explorer

Part – 10: Managing Azure Data Explorer Cluster

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

Blog at WordPress.com.

Up ↑

%d bloggers like this: