Data Obfuscation in Kusto Query Language


One thing to know about an Azure Data Explorer cluster is that the system tracks all queries and stores them for telemetry and analysis purposes. As a result, this data is available for the cluster owner to view.

Your query may contain sensitive data, such as passwords, contact details, or Social Security numbers (SSNs), that you do not wish to share with the cluster owner. This is where data obfuscation, a form of data masking, comes into the picture.

You can obfuscate sensitive values in your queries so that, in the recorded query text, their characters are replaced with asterisks (*). This can be confusing at first, and it is easily mistaken for data encryption, but obfuscation and encryption are different things.

Data encryption protects the data stored in the tables, starting when it is ingested. Obfuscation, by contrast, applies to the text of a query or an ingestion command: it comes into play when that text contains critical or sensitive information that you do not want retained in a form the cluster owner could read later. The obfuscated information is masked with asterisks.
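As an illustration of the ingestion-command case, here is a minimal sketch of an .ingest command whose source URI carries a shared access signature. The table name MyTable, the storage URL, and the <sas-token> placeholder are all hypothetical; the point is only the h prefix on the URI, which keeps the signature out of the recorded command text.

```kusto
// Hypothetical table (MyTable) and storage URL; substitute your own values.
// The h prefix obfuscates the entire source URI, including the SAS token,
// in the command text that the cluster records.
.ingest into table MyTable (
    h'https://mystorageaccount.blob.core.windows.net/mycontainer/data.csv?<sas-token>'
    )
    with (format='csv', ignoreFirstRecord=true)
```

Obfuscating the whole URI is the simplest choice; the second example below shows how to mask only part of it.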

If the data itself needs to be encrypted or masked at the ingestion level, it should be obfuscated at the source, because the strength of Azure Data Explorer lies in simply ingesting what is presented to it. It may be possible to add some transformation steps, such as obfuscation, during ingestion via the SDK.

To obfuscate a string literal, put either an “h” or an “H” in front of it. Let me explain with a few examples:
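Before the examples, here is a quick sketch of the prefix applied to the different string-literal forms KQL accepts. The secret values are hypothetical placeholders; each literal should return its plain value in the result while being masked in the recorded query text.

```kusto
// Hypothetical secret values, used only to show the syntax.
print
    SingleQuoted   = h'my-secret',            // lowercase h, single quotes
    UpperPrefix    = H'my-secret',            // uppercase H behaves the same way
    DoubleQuoted   = h"my-secret",            // double-quoted string literal
    VerbatimString = h@'C:\secrets\key.txt'   // verbatim (@) string: backslashes are not escapes
```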

Example 1

print ObfuscatedLiteral = h'https://kustosamplefiles.blob.core.windows.net/samplefiles/StormEvents.csv?st=2018-08-31T22%3A02%3A25Z&se=2020-09-01T22%3A02%3A00Z&sp=r&sv=2018-03-28&sr=b&sig=LQIbomcKI8Ooz425hWtjeq6d61uEaq21UVX7YrM61N4%3D'
ObfuscatedLiteral
https://kustosamplefiles.blob.core.windows.net/samplefiles/StormEvents.csv?st=2018-08-31T22%3A02%3A25Z&se=2020-09-01T22%3A02%3A00Z&sp=r&sv=2018-03-28&sr=b&sig=LQIbomcKI8Ooz425hWtjeq6d61uEaq21UVX7YrM61N4%3D

In the output, we see the string in clear text, without the “h” prefix, because obfuscation does not change the value the query returns. However, if the cluster owner now runs the following command, the obfuscated value is returned:

.show queries | where StartedOn > ago(1m)

This command lists all the queries that started in the past minute. Its output looks like this:

[Screenshot: .show queries output for Example 1]

Notice that the entire string in the print statement has been obfuscated, because the “h” was placed right at the beginning of the URL.
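If you only want to inspect the recorded query text, the same command can be narrowed to a few columns. This is a sketch that assumes the StartedOn, User, and Text columns returned by .show queries; the one-minute window is arbitrary.

```kusto
// Recent queries, trimmed to the columns relevant to obfuscation.
// Obfuscated literals appear masked with asterisks in the Text column.
.show queries
| where StartedOn > ago(1m)
| project StartedOn, User, Text
```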

Example 2

print ObfuscatedLiteral = 'https://kustosamplefiles.blob.core.windows.net/samplefiles/StormEvents.csv?' h'st=2018-08-31T22%3A02%3A25Z&se=2020-09-01T22%3A02%3A00Z&sp=r&sv=2018-03-28&sr=b&sig=LQIbomcKI8Ooz425hWtjeq6d61uEaq21UVX7YrM61N4%3D'
ObfuscatedLiteral
https://kustosamplefiles.blob.core.windows.net/samplefiles/StormEvents.csv?st=2018-08-31T22%3A02%3A25Z&se=2020-09-01T22%3A02%3A00Z&sp=r&sv=2018-03-28&sr=b&sig=LQIbomcKI8Ooz425hWtjeq6d61uEaq21UVX7YrM61N4%3D

In this example, we print the same string, but we have moved the “h” from the beginning of the URL to just before the query string parameters, after “StormEvents.csv?”. KQL concatenates string literals that are separated only by whitespace, so the result is still a single URL, but only the second literal is obfuscated. Here we want to obfuscate the query string parameters, which include the shared access signature.

The query returns the same output for us, but if the cluster owner runs the .show queries command again, they will see the following output:

[Screenshot: .show queries output for Example 2]

In this output, the Text column shows the URL, but the query string parameters have been obfuscated.
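Obfuscated literals are not limited to print; they can be used wherever a string literal is expected. A minimal sketch, assuming a hypothetical Customers table with CustomerId and Email columns, that keeps a personal email address out of the recorded query text:

```kusto
// Hypothetical table and columns (Customers, CustomerId, Email).
// The filter value is masked in the logged query text, but the comparison
// itself is evaluated against the real value.
Customers
| where Email == h'jane.doe@contoso.com'
| project CustomerId, Email
```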

I hope this helps you design a data obfuscation strategy for your Azure Data Explorer databases, whether you are a database administrator or a table administrator.

