Microsoft Azure-Using External Data Into KQL

In many businesses, Azure is becoming the infrastructure backbone. It has become imperative to be able to query Azure using KQL, to gain insights into the Azure services your organization utilizes. In this post, let’s understand how to explore logs in Azure data storage using an external data file into KQL.

Highlights:

  • What Is Kusto Query Language (KQL)?
  • Sample Use Cases For Demo
  • Prerequisites
  • Simple Method For Basic Use Case
  • Alternative Method For Enhanced Use Case
  • Key Takeaways

 

What Is Kusto Query Language?

KQL, which stands for Kusto Query Language, is a powerful tool to explore your data and discover patterns, identify anomalies and outliers, create statistical models, and more. The language is used to query the Azure Monitor Logs, Azure Application Insights, Azure Resource Explorer, and others.

Sample Use Cases For Demo

1. Basic use case (solved using a simple method)

  • Imagine you have a set of servers and applications hosted in Azure.
  • You have configured logs & metrics collection using Azure monitoring services.
  • You must query the logs to find out applications that are hitting high processor utilization.

2. Enhanced use case (solved using an alternative method)

  • You must query the logs to find out only selected applications/serves that are hitting high processor utilization.
  • For every server, the threshold is different.
  • You want to control which serves to be queried.
  • You want to dynamically update thresholds for different computers.
  • You don’t want to update the KQL query.
Note:

  • For this demo purpose, we are using a log analytics workspace provided by Microsoft in their documentation for KQL/Kusto language. Please access the demo logs here for free.
  • To display the storage role, we have created a storage account in our subscription and blob container to host some files.

Prerequisites

  • Azure Subscription
  • Azure Storage Account (Blob Container) and/or AWS S3 Bucket
  • Azure Log Analytics Workspace (Azure Monitoring Service)

Simple Method For Basic Use Case

Let’s first explore the sample data available for the demo.

  1. Open your favorite browser and go to thislink.
  2. If you are already logged into Azure, it will open directly, else it will ask you to sign in.
  3. After signing in, a new query window is displayed.Microsoft Azure Monitoring Logs-DemoLogsBlade
  4. On the left side of the panel, you can explore tables and queries available in the demo workspace.Microsoft Azure-Logs Demo
  5. Copy & Paste the following code into new query 1.
InsightsMetrics
| where TimeGenerated > ago(30m)
| where Origin == “vm.azm.ms”
| where Namespace == “Processor”
| where Name == “UtilizationPercentage”
| summarize avg(Val) by bin(TimeGenerated, 5m), Computer //split up by computer
| join kind=leftouter (Computers) on Computer
| where isnotempty(Computer1)
| sort by avg_Val desc nulls first

Microsoft Azure-Sample Query

6. Click Run and you will see the following output.Microsoft Azure-Sample Query Results

Query only selected computers from KQL

1. Update the query to add a static list of computers from which you want to query logs.

let Computers = datatable (Computer: string)
[
“AppFE0000002”,
“AppFE0000003”,
“AppFE00005JE”,
“AppFE00005JF”,
“AppFE00005JI”,
“AppFE00005JJ”,
“AppFE00005JK”,
“AppFE00005JL”
];
InsightsMetrics
| where TimeGenerated > ago(30m)
| where Origin == “vm.azm.ms”
| where Namespace == “Processor”
| where Name == “UtilizationPercentage”
| summarize avg(Val) by bin(TimeGenerated, 5m), Computer //split up by computer
| join kind=leftouter (Computers) on Computer
| where isnotempty(Computer1)
| sort by avg_Val desc nulls first

 

2. Run the query and you will get the following results.Microsoft Azure Query Logs-Results

3. Please make a note of the output, which is filtered only for target computers.

This is basic KQL.

Alternative Method For Enhanced Use Case

Store target computer details outside the query

(We have used a .csv file for the demo)

  • Create a simple .csv file and store it on Azure Storage Blob Container.KQL Demo KQL Demo Azure Storage Blob Container

Create SAS Token for delegated access to the container

1. We will supply delegated access to the storage container by creating a SAS token. Using this token, we can access the .csv file from the KQL query.Creating SAS Token

Read more on granting limited access to Azure Storage resources using shared access signatures (SAS).

2. Update query to pull details from .csv file and return computers that are marked “Yes” to monitor.

let Computers = externaldata(Computer: string, value: int, Monitor: string)[@’https://kmpsblogdemostorage.blob.core.windows.net/kqldemo/SelectedComputers.csv’ h@’?sp=r&st=2022-08-23T13:50:14Z&se=2022-08-24T13:50:14Z&spr=https&sv=2021-06-08&sr=c&sig=fJwl%2BTddL6lXV9WONj6bmvp61PDPbf94Ou%2Fp9pAtnYE%3D’] with (ignoreFirstRecord=true);
InsightsMetrics
| where TimeGenerated > ago(30m)
| where Origin == “vm.azm.ms”
| where Namespace == “Processor”
| where Name == “UtilizationPercentage”
| summarize avg(Val) by bin(TimeGenerated, 5m), Computer //split up by computer
| join kind=leftouter (Computers) on Computer
| where isnotempty(Computer1) and Monitor contains “Yes”
| sort by avg_Val desc nulls first

3. Run the query and you will get the following results.

Azure Blob Storage-Query

4. Now, toggle the Monitor condition to No for AppFE0000002 computer.

changed to

5. Update query to pull details from .csv file and return computers that are marked “NO” to monitor.

let Computers = externaldata(Computer: string, value: int, Monitor: string)[@’https://kmpsblogdemostorage.blob.core.windows.net/kqldemo/SelectedComputers.csv’ h@’?sp=r&st=2022-08-23T13:50:14Z&se=2022-08-24T13:50:14Z&spr=https&sv=2021-06-08&sr=c&sig=fJwl%2BTddL6lXV9WONj6bmvp61PDPbf94Ou%2Fp9pAtnYE%3D’] with (ignoreFirstRecord=true);
InsightsMetrics
| where TimeGenerated > ago(30m)
| where Origin == “vm.azm.ms”
| where Namespace == “Processor”
| where Name == “UtilizationPercentage”
| summarize avg(Val) by bin(TimeGenerated, 5m), Computer //split up by computer
| join kind=leftouter (Computers) on Computer
| where isnotempty(Computer1) and Monitor contains “No”
| sort by avg_Val desc nulls first

6. Run the query and you will get the following results.

Note: Replace the location code in the above query with the URL of your Azure blob container.

Access input file stored outside Azure Storage

  • Update the query with the input file on the AWS S3 container. Run the query and you will get the same result.
let Computers = externaldata(Computer: string, value: int, Monitor: string)[@’https://psi-testing.s3.ap-south-1.amazonaws.com/SelectedComputers.csv’] with (ignoreFirstRecord=true);
InsightsMetrics
| where TimeGenerated > ago(30m)
| where Origin == “vm.azm.ms”
| where Namespace == “Processor”
| where Name == “UtilizationPercentage”
| summarize avg(Val) by bin(TimeGenerated, 5m), Computer //split up by computer
| join kind=leftouter (Computers) on Computer
| where isnotempty(Computer1) and Monitor contains “Yes”
| sort by avg_Val desc nulls first

Note: Replace the location code in the above query with the URL of your AWS S3 container.

Read more to know how to access the AWS S3 bucket.

Key Takeaways

The methods explained above offer the following benefits:

  • Provides flexibility to change target resources without updating the actual query.
  • Provides convenience to update input variables without complicating the query.
  • An overall query is compressed by decoupling target resources and threshold values that are defined outside the KQL query.
  • Most importantly, you can host your input file in any publicly accessible location, and still achieve the same functionality.