Introduction to the Kusto Query Language
We already created the environment in the previous section, and now, we will extend our knowledge by first creating the tables using the Kusto explorer, and then import the data in the table from an external source. This is technically called data ingestion.
After creating tables and ingesting data to them we can move forward and use Kusto Query Language (aka KQL) to explore the data. We can use such queries to discover patterns, identify anomalies and outliers, create statistical modeling and more.
At the end you should get your data validated by SMEs or stakeholders. This is where you would wish to share the data. This can be done by exporting the data in the CSV format directly from ADX.
Finally, after the data has been validated, the visualized data needs to be presented. There are multiple different ways to share the visualized data. You can share the insights using Excel, or Power BI, or directly from the ADX.
KQL (Kusto Query Language) was developed with certain key principals in mind, like – easy to read and understand syntax, provide high-performance through scaling, and the one that can transition smoothly from simple to complex query.
Interestingly KQL is a read-only query language, which processes the data and returns results. It is very similar to SQL with a sequence of statements, where the statements are modeled as a flow of tabular data output from the previous statement to the next statement. These statements are concatenated with a pipe (|) character.
In SQL, the queries start with the column names and we only get to know about the table name when we reach the “From” statement, whereas, in KQL, the query starts with the table name followed by the pipe character after which the conditions are defined. We will see how this works shortly.
KQL is the query language and the Kusto Engine is the engine that receives the queries in KQL to execute them, and specifically the large datasets from Azure, like –
- Azure Application Insights
- Azure Log Analytics
- Windows Defender Advanced Threat Protection
- Azure Security Center
Apart from these, the data can be ingested from external sources as well. It can be done using the custom code in any preferred language like Python, .Net SDK, R, etc. using the Azure Data Explorer API. Data can also be ingested using Event Hub’s and Event Grid’s, and from the CSV file as well. In order to know more about the data ingestion in Azure Data Explorer, visit Overview of Data Ingestion in Azure Data Explorer.
Another interesting fact is that KQL knows to run the SQL commands as well. Also, the user can also get the KQL equivalent of the SQL command (in most cases), as KQL supports a subset of the SQL language. For this you can use Kusto to translate the SQL query to an equivalent KQL by prefixing it with ‘Explain’.
Explain Select Count_Big(*) as BigCount from StormEvents
For detailed information, visit SQL to Kusto Cheat Sheet
KQL Commands Overview
Below are a few basic KQL commands to get you familiarized as to how KQL queries look like. You definitely need to know and learn about entity types, Data types, tabular operators, scalar operators, functions, scalar functions, time-series analysis, and other important KQL control commands.
In the following examples, we are using the weather information provided by “National Centers For Environmental Information“ . They provide sample weather data from previous years in the CSV format for data analysis purposes. We are using the StormEvents.csv file to work with KQL.
- To create a table named StormEvents with simple columns and datatypes.
.create table StormEvents (StartTime: datetime, EndTime: datetime, EpisodeId: int, EventId: int, State: string, EventType: string, InjuriesDirect: int, InjuriesIndirect: int, DeathsDirect: int, DeathsIndirect: int, DamageProperty: int, DamageCrops: int, Source: string, BeginLocation: string, EndLocation: string, BeginLat: real, BeginLon: real, EndLat: real, EndLon: real, EpisodeNarrative: string, EventNarrative: string, StormSummary: dynamic)
- To ingest the data into the StormEvents table from the CSV file. The “with (ignoreFirstRecord = true)” is used to ignore the first row of the CSV file in case it contains the column header.
.ingest into table StormEvents h'https://kustosamplefiles.blob.core.windows.net/samplefiles/StormEvents.csv?st=2018-08-31T22%3A02%3A25Z&se=2020-09-01T22%3A02%3A00Z&sp=r&sv=2018-03-28&sr=b&sig=LQIbomcKI8Ooz425hWtjeq6d61uEaq21UVX7YrM61N4%3D' with (ignoreFirstRecord=true)
- To display the entire table just write the table name
- To list the weather events from the StormEvents table data between January 1st, 2007 and December 31st, 2007, use the below query. For this, the ‘between‘ operator is used in the where clause.
StormEvents | where StartTime between (datetime('2007-01-01')..datetime('2007-12-31'))
- In case you wish to list the event type where the injuries have been greater than 10. Note that we are using the ‘project’ operator, which helps in selecting the columns to include, renamed, or dropped in the results returned as a part of the query. We can also use it to include computed columns.
StormEvents | where StartTime between (datetime('2007-01-01')..datetime('2007-12-31')) | where InjuriesDirect > 10 | project EventType, InjuriesDirect, EpisodeNarrative
- To count the weather events by type for the month of July in 2007 in a particular state (For ex – Washington)
StormEvents | where State == "WASHINGTON" and StartTime between (datetime('2007-07-01')..datetime('2007-07-31')) | summarize StormCount = count() by EventType
- To list the count of weather events for all the states, where the count is greater than 2000
StormEvents | summarize event_count = count() by State | where event_count > 2000 | project State, event_count | sort by event_count desc
There are certain demo platforms that are provided by Microsoft, which can be used free of cost for practice purposes. They are for –
These platforms also have saved queries that can be used to get an insight into how queries are formed and complex queries can be built. You can save your queries as well.
There are two dedicated courses by Robert Cain on the Kusto Query Language on Pluralsight, which gives you deeper insight into KQL and that course is highly recommended for you as a data engineer as it details out the different kinds of commands and capabilities of KQL. They are –
Part – 1: Data Science Overview
Part – 2: Understanding Azure Data Explorer
Part – 3: Azure Data Explorer Features
Part – 4: Azure Data Explorer Service Capabilities
Part – 5: Creating the ADX Environment
Part – 7: Data Obfuscation in Kusto Query Language
Part – 8: Data Ingestion Preparation: Schema Mapping
Part – 9: Overview of data ingestion in Azure Data Explorer
Part – 10: Managing Azure Data Explorer Cluster