Azure Anomaly Detector Service and Sitecore Performance Counters

Microsoft Azure recently announced a new cognitive service called Anomaly Detector. There are straightforward quick-start guides with some sample data.
You don't need much machine learning knowledge to understand what the service does.
The service is still in preview, so it should not be relied on in production.

The service receives time-series data as input, preferably with a stable interval between data records.
From the Azure Anomaly Detector documentation:
The Anomaly Detector API enables you to monitor and detect abnormalities in your time series data with machine learning. The Anomaly Detector API adapts by automatically identifying and applying the best-fitting models to your data, regardless of industry, scenario, or data volume. Using your time series data, the API determines boundaries for anomaly detection, expected values, and which data points are anomalies
Each data row should contain a timestamp and a numeric value.
The time-series data should be in JSON format.
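
To make the input format concrete, here is a minimal Python sketch that builds such a JSON body. The field names (`granularity`, `series`, `timestamp`, `value`) follow the request shape of the preview batch detection API as far as I understand it; the helper name `build_request_body` and the sample values are purely illustrative and not taken from real Sitecore data.

```python
import json
from datetime import datetime, timedelta, timezone


def build_request_body(start, values):
    """Build a request body from values recorded one minute apart.

    `start` is the timestamp of the first value. The numbers passed in
    below are made-up sample data, not real counter readings.
    """
    series = [
        {
            # ISO 8601 UTC timestamp, one minute after the previous point.
            "timestamp": (start + timedelta(minutes=i)).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "value": float(value),
        }
        for i, value in enumerate(values)
    ]
    return {"granularity": "minutely", "series": series}


start = datetime(2019, 5, 1, 12, 0, tzinfo=timezone.utc)
body = build_request_body(start, [512e6, 515e6, 509e6])
print(json.dumps(body, indent=2))
```

A real request would contain many more data points than the three shown here.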

The output of the service is a JSON object with information about anomalous data rows. For example, by sending a batch of data rows to the Anomaly Detector service, you get a report on which rows do not fit the rest of the data set. The service chooses the algorithm and the model used to analyze the data automatically.
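
As a rough illustration of reading that report, the sketch below pairs the input rows with the per-row flags in the response. It assumes the response carries an `isAnomaly` array with one boolean per input row, which is how I understand the preview batch response; the helper name `anomaly_timestamps` and the sample data are hypothetical.

```python
def anomaly_timestamps(series, response):
    """Return the timestamps of the rows that the service flagged.

    `series` is the list of {"timestamp", "value"} rows that was sent,
    `response` is the decoded JSON reply. Assumes `isAnomaly` holds one
    boolean per input row, in the same order.
    """
    flags = response["isAnomaly"]
    if len(flags) != len(series):
        raise ValueError("response does not match the series that was sent")
    return [row["timestamp"] for row, flag in zip(series, flags) if flag]


# Made-up example: the second of three rows is flagged.
sample_series = [
    {"timestamp": "2019-05-01T12:00:00Z", "value": 512e6},
    {"timestamp": "2019-05-01T12:01:00Z", "value": 2048e6},
    {"timestamp": "2019-05-01T12:02:00Z", "value": 510e6},
]
sample_response = {"isAnomaly": [False, True, False]}
print(anomaly_timestamps(sample_series, sample_response))
```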

After reading about the Azure Anomaly Detector service, I realized that Sitecore performance counter data fits its input requirements perfectly: performance counters are numeric values recorded at an (almost) stable time interval.
Moreover, when Azure Application Insights is used to collect logs and metrics from Sitecore instances hosted on Azure PaaS, it is really easy to prepare that data for the Anomaly Detector service.

So I decided to build a small proof of concept:

Step 1: 

I deployed a vanilla Sitecore 9.1 instance from the Azure Marketplace with Application Insights enabled (if you have no experience with that, it is just a few clicks in the Azure Portal).
I then let Sitecore run for a while to collect performance counters. You don't need to do anything special, just keep it running (but keep an eye on the cost of the Azure resources).

Step 2:

I decided to work with the "Private bytes" performance counter and to look for anomalies in that data. Below is the query that I used to select the data:
Azure Application Insights - Azure Data Explorer

Step 3:

I created an Azure Anomaly Detector service (free tier) in the Azure Portal. This can easily be done through the Azure Marketplace.

Step 4:

I created a quick-and-dirty Azure Function that performs the following actions:
  1. Fetch the relevant data from Azure Application Insights (for example, using the Azure Application Insights REST API)
  2. Send that data to Azure Anomaly Detector
  3. Return either the complete result JSON or only the "anomaly" result (for example, the timestamp of the anomalous data row)
I implemented all of these actions in one monolithic Azure Function that connects to both services (Application Insights and Anomaly Detector) using the appropriate API keys stored in the Azure Function App Application Settings.
The sample code can be found here.
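
The three actions above can be sketched in Python as follows. This is not the linked sample code, only an illustration under several assumptions: the Application Insights query endpoint and its `x-api-key` header, the Anomaly Detector `v1.0/timeseries/entire/detect` path and its `Ocp-Apim-Subscription-Key` header, and an `isAnomaly` array in the response, all as I understand those APIs. The setting names (`APPINSIGHTS_APP_ID` and so on) and the function names are hypothetical, and the query is assumed to return exactly two columns: timestamp and value.

```python
import json
import urllib.parse
import urllib.request


def http_json(url, headers, payload=None):
    """Send a GET (or a POST when a payload is given) and decode the JSON reply."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    request = urllib.request.Request(url, data=data, headers=headers)
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)


def fetch_series(app_id, api_key, query, send=http_json):
    """Action 1: run a query against Application Insights and return rows."""
    url = "https://api.applicationinsights.io/v1/apps/%s/query?%s" % (
        app_id,
        urllib.parse.urlencode({"query": query}),
    )
    result = send(url, {"x-api-key": api_key})
    # Assumption: each row of the first table is [timestamp, value].
    rows = result["tables"][0]["rows"]
    return [{"timestamp": ts, "value": float(value)} for ts, value in rows]


def detect(endpoint, key, series, send=http_json):
    """Action 2: send the series to Anomaly Detector (batch detection)."""
    url = endpoint.rstrip("/") + "/anomalydetector/v1.0/timeseries/entire/detect"
    headers = {"Ocp-Apim-Subscription-Key": key, "Content-Type": "application/json"}
    return send(url, headers, {"granularity": "minutely", "series": series})


def run(settings, query, only_anomalies=True, send=http_json):
    """Action 3: return the full result or only the anomalous timestamps."""
    series = fetch_series(
        settings["APPINSIGHTS_APP_ID"], settings["APPINSIGHTS_API_KEY"], query, send
    )
    result = detect(
        settings["ANOMALY_DETECTOR_ENDPOINT"], settings["ANOMALY_DETECTOR_KEY"], series, send
    )
    if not only_anomalies:
        return result
    return [row["timestamp"] for row, flag in zip(series, result["isAnomaly"]) if flag]
```

Passing `os.environ` as `settings` would mirror reading the keys from the Function App Application Settings. The `send` parameter exists only so that the HTTP calls can be replaced with a stub when trying the logic locally.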

Step 5:

Let's test how it works. I ran the Azure Function in my local environment, so don't be surprised to see "localhost" in the screenshots below.
Running the code returns the following result from Anomaly Detector, which shows that none of the 100 data rows sent to the service is considered an anomaly:

Step 6:

Let's try to provoke our CD instance into writing something unusual to the "Private bytes" performance counter, for example by adding a "memory leaking" rendering to the vanilla solution, assigning it to an item, and requesting that item.
I created an MVC layout with memory-leaking code and placed it in the presentation details of a new item under the Home node.
Requesting that item just once was enough to cause a huge jump in RAM consumption:


During my tests I got an error message from the Anomaly Detector service several times, stating that the time interval between my data records deviated too much. I noticed that Sitecore does not always write performance counters to Application Insights at an exact 1-minute interval (for example, when the web application is restarting, or for some other reason).
Therefore, I decided to group the data records by minute and average the counter values, in order to get data with a perfect 1-minute interval:
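
The same grouping idea can be sketched in Python as shown below. I did the grouping in the query itself, so this is only an illustration of the technique, not the code I used; the function name `average_per_minute` and the sample rows are hypothetical, and timestamps are assumed to be ISO 8601 strings in UTC.

```python
from collections import defaultdict
from datetime import datetime


def average_per_minute(rows):
    """Average counter values per minute and return one row per minute.

    `rows` is an iterable of (timestamp, value) pairs. Only the first 19
    characters of the timestamp (up to the seconds) are parsed, so
    fractional seconds and the trailing "Z" are ignored.
    """
    buckets = defaultdict(list)
    for timestamp, value in rows:
        parsed = datetime.strptime(timestamp[:19], "%Y-%m-%dT%H:%M:%S")
        # Truncate to the start of the minute to get the bucket key.
        buckets[parsed.replace(second=0)].append(value)
    return [
        {
            "timestamp": minute.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "value": sum(values) / len(values),
        }
        for minute, values in sorted(buckets.items())
    ]


# Made-up sample: two readings in the first minute, one in the second.
sample_rows = [
    ("2019-05-01T12:00:07.123Z", 10.0),
    ("2019-05-01T12:00:50Z", 20.0),
    ("2019-05-01T12:01:03Z", 30.0),
]
print(average_per_minute(sample_rows))
```

Note that this only aligns existing readings to whole minutes. A minute with no readings at all still leaves a gap in the series.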

After requesting the aforementioned "memory leak" page, I got a spike in used memory, and that was written to the "Private bytes" performance counter. The memory leak made the web app unresponsive, so I had to restart it. That is why you see the drop in RAM consumption in the image below:

After running my sample code with that data, I got the following result from the Azure Anomaly Detector service. It clearly flags both the spike and the drop after the spike (the web app restart) as anomalies:


It is hard to see the correlation between the JSON data from Application Insights and the Anomaly Detector response in the picture above, so I have linked both JSON files below:



What can be done next?

Azure Anomaly Detector can be used, for example, to analyze the behavior of performance counters or other metrics within a Sitecore application.
You could use Azure Functions and Azure Logic Apps, for example, to build a warning/alerting app that serves as a Sitecore-specific addition to the great alerting functionality that Azure offers out of the box.
