Automating change detection using Resource Graph and Change History

A few months ago Microsoft launched Change History for Azure Resource Graph. By default it is accessible through the Azure Activity Log and Azure Policy, which lets you inspect the changes made to a specific resource. But is that the way you should be using the change history, and does it scale well? For most scenarios you will want to look into some kind of automation.

Tracking changes is important and should be part of your governance processes. Configuration drift is one of those things "that just happens" and at some point you'll have to fix it, a task nobody looks forward to. Luckily we're talking Azure here, so this feature came with REST endpoints that allow for automation. (https://docs.microsoft.com/en-us/azure/governance/resource-graph/how-to/get-resource-changes)

Important: Change History for Azure Resource Graph is still in public preview. At this point it looks like not all resource changes are detected. Automation code provided in this post is proof of concept at best. But this might also be an excuse for my quick and dirty code to get the job done :)

Scenarios

There are multiple scenarios where Change History adds value to your business processes.

  • Reactive input for incident management processes - When an incident is logged, inspect the resource and see whether a detected change is responsible for the new behavior.
  • Proactive input for incident management processes - When your monitoring solution detects an issue, check for changes and append them to the incident that is being logged, providing your support personnel with as much information as possible early in the process.
  • Change tracking and reporting back to your CMDB
  • Change tracking at scale through automation, across multiple tenants / customers with delegated resource management (https://www.wesleyhaakman.org/azure-lighthouse-management-at-scale-use-cases/)

What about Azure Policy? Sounds like an overlap in functionality.

Some would say that Azure Policy provides a similar feature by detecting noncompliance against a predefined set of rules (Resource X must be configured as Y). That is true, but it requires you to configure the Policy for a specific situation. As you might not always know when and what changes are occurring, it won't work for changes that you didn't expect to happen. If anything, Azure Policy and Change History complement each other. In fact, Change History can be accessed as a result of an Azure Policy noncompliance notification.

It is important to know that the Change History REST endpoints can provide you with a lot (and I mean a lot) of data. Think before you start gathering changes at scale. If you try hard enough, Resource Graph will throttle your requests (see https://docs.microsoft.com/en-us/azure/governance/resource-graph/concepts/guidance-for-throttled-requests).
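To make the throttling point concrete, here is a minimal sketch of how a script could decide how long to wait before its next request. It assumes the response carries the "x-ms-user-quota-remaining" and "x-ms-user-quota-resets-after" headers described in the throttling guidance linked above; the function name "Get-ThrottleDelaySeconds" is hypothetical and not part of any Azure module.

```powershell
# Hypothetical helper: returns the number of seconds to wait before the next call.
# Assumes the Resource Graph quota headers are present on the response:
#   x-ms-user-quota-remaining    (int)      remaining requests in the current window
#   x-ms-user-quota-resets-after (hh:mm:ss) time until the quota resets
function Get-ThrottleDelaySeconds {
    param(
        [System.Collections.IDictionary] $Headers
    )

    # Header values can be a single string or a collection, so take the first entry.
    $remaining   = @($Headers['x-ms-user-quota-remaining'])[0]
    $resetsAfter = @($Headers['x-ms-user-quota-resets-after'])[0]

    # Without quota information there is nothing to base a delay on.
    if ($null -eq $remaining -or $null -eq $resetsAfter) { return 0 }

    # Quota left: no need to wait.
    if ([int]$remaining -gt 0) { return 0 }

    # Quota exhausted: wait until the window resets.
    return [math]::Ceiling(([timespan]::Parse($resetsAfter)).TotalSeconds)
}

# Usage idea (not executed here): call the endpoint with Invoke-WebRequest so the
# response headers are available, then
#   Start-Sleep -Seconds (Get-ThrottleDelaySeconds -Headers $response.Headers)
```

Waiting for the reset window instead of retrying immediately keeps a bulk run from burning through its quota on failed calls.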

 

Automating against the REST Endpoints

To better understand how to implement the different scenarios and the change history feature in general I decided to build some automation around it. First things first: we're talking about two REST endpoints here, which are both required to retrieve the changes:

  • resourceChanges which takes a "resourceId" and an "interval" (time slot which you want to query) and returns one or multiple Change ID(s)
  • resourceChangeDetails which takes the "resourceId" and the "changeId" as retrieved from the "resourceChanges" endpoint

The "resourceChangeDetails" endpoint returns the "beforeSnapshot" and "afterSnapshot", which contain (you wouldn't have guessed this) the properties before and after the change. You can use these to compare the objects and detect the changes programmatically. Sounds like something we can use.

A change is detected as a result of an action (let's take a write action for example). These actions are logged within the Azure Activity Log and optionally in Log Analytics. That means that you can detect (or guess) that a change has occurred and put some alerting on that. A workflow is coming to life here...

Alternatively you can just throw random Resource IDs and intervals against the endpoints but... throttling.  

What I wanted to achieve was an endpoint that I can query with a "resourceId" and that returns the recent changes. Additionally, I want to check for changes when write actions have occurred.

To achieve this I set up the following resources:

  • Log Analytics to query the changes I am looking for
  • Azure Monitor Alerts to act upon the Log Analytics Query
  • Logic App to orchestrate the process
  • Azure Function (PowerShell) to query the REST endpoints
  • Outlook Connector to show that this works (I mean, you don't really want all the changes in your inbox right?)

As far as getting the change data goes, most of this could probably be done by just building the Logic App. However, for debugging and portability purposes, PowerShell was the easier option for me.

 

 

Log Analytics and Alerts
For the alerting I used a pretty straightforward query to search for a write action on Microsoft.Web with a status that matches "Succeeded". Activities can have multiple statuses (started, failed) and those would not result in a response from the Change History endpoints, so I really just want to know about the succeeded write actions.

AzureActivity
| where parse_json(Authorization).action contains "write"
| where Category == "Administrative"
| where OperationNameValue contains "Microsoft.Web" 
| where ActivityStatusValue == "Succeeded"

I then created an Alert using the custom log search (https://docs.microsoft.com/en-us/azure/azure-monitor/platform/alerts-log). By configuring the Action Group to trigger a Logic App we can push the results of our alert into the workflow.

Note that you need the Logic App in place to complete this step.

 

 

Azure Function
The Azure Function is configured with Managed Service Identity which is granted permissions on the subscription.

If you're looking to build this at scale across tenants, look into using service principals and grant them access through Delegated Resource Management / Azure Lighthouse templates or Managed Services marketplace offerings.
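The snippets further down use an "$authHeader" variable. A minimal sketch of how it could be built from the Function's managed identity is shown here. It assumes the "IDENTITY_ENDPOINT" and "IDENTITY_HEADER" environment variables that App Service and Azure Functions expose when a managed identity is enabled; the function names "Get-ManagedIdentityToken" and "New-AuthHeader" are hypothetical, and the full code on GitHub may well do this differently.

```powershell
# Hypothetical helper: requests an access token for Azure Resource Manager from the
# local managed identity endpoint of the Function App.
function Get-ManagedIdentityToken {
    param(
        [string] $Resource = 'https://management.azure.com/'
    )

    if (-not $env:IDENTITY_ENDPOINT -or -not $env:IDENTITY_HEADER) {
        throw 'No managed identity endpoint found. Is the identity enabled on the Function App?'
    }

    $uri = '{0}?resource={1}&api-version=2019-08-01' -f $env:IDENTITY_ENDPOINT, [uri]::EscapeDataString($Resource)
    $response = Invoke-RestMethod -Uri $uri -Method Get -Headers @{ 'X-IDENTITY-HEADER' = $env:IDENTITY_HEADER }
    return $response.access_token
}

# Hypothetical helper: turns a token into the header hashtable used by Invoke-RestMethod.
function New-AuthHeader {
    param(
        [Parameter(Mandatory)] [string] $AccessToken
    )

    return @{
        'Authorization' = "Bearer $AccessToken"
        'Content-Type'  = 'application/json'
    }
}

# Usage inside the Function (not executed here):
#   $authHeader = New-AuthHeader -AccessToken (Get-ManagedIdentityToken)
```

Keeping token retrieval separate from header construction makes it easy to swap in a service principal token for the cross-tenant scenario.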

I went with PowerShell for the Azure Function to query the "resourceChanges" endpoint followed by a foreach loop that queries the "resourceChangeDetails" endpoint for all the "changeIDs" that were returned in the previous call.

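I went with a fixed interval of six hours. If you want to query a larger window you need to adjust this. The timestamps are built from UTC time, because the trailing "Z" marks them as UTC.

# Build the window from UTC time: the trailing "Z" marks both timestamps as UTC
$utcNow = (Get-Date).ToUniversalTime()
$endTime = $utcNow.ToString("yyyy-MM-dd'T'HH:mm:ss'.000Z'")
$startTime = $utcNow.AddHours(-6).ToString("yyyy-MM-dd'T'HH:mm:ss'.000Z'")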


# Body for resource change API
$bodyHashTableResourceChanges = @{
    resourceId = $resourceID 
    interval = @{
        start =  $startTime 
        end = $endTime 
    }
}

$bodyJsonResourceChanges = $bodyHashTableResourceChanges |ConvertTo-Json

# URI Resource changes
$restUriResourceChanges = "https://management.azure.com/providers/Microsoft.ResourceGraph/resourceChanges?api-version=2018-09-01-preview"

# Invoke
$responseResourceChanges = Invoke-RestMethod -Uri $restUriResourceChanges -Method Post -Body $bodyJsonResourceChanges -Headers $authHeader

The above code returns the "changeIds" containing information on both the before and after snapshots.


Wesley @ Code\Azure> $responseResourceChanges.changes|ConvertTo-Json
{
  "changeId": "{\"beforeId\":\"1989b7a0-bc73-4ae5-82e5-f1fa4218ca7f\",\"beforeTime\":\"2019-08-07T06:17:23.531Z\",\"afterId\":\"400634ab-a1f1-462f-b21d-c09b3acf50ee\",\"afterTime\":\"2019-08-07T21:13:10.341Z\"}",
  "beforeSnapshot": {
    "snapshotId": "1989b7a0-bc73-4ae5-82e5-f1fa4218ca7f",
    "timestamp": "2019-08-07T06:17:23.531Z"
  },
  "afterSnapshot": {
    "snapshotId": "400634ab-a1f1-462f-b21d-c09b3acf50ee",
    "timestamp": "2019-08-07T21:13:10.341Z"
  }
}
Wesley @ Code\Azure> 

As we now have the id(s) of the change(s) that were detected, we can query the "resourceChangeDetails" endpoint with the "resourceId" and the "changeId":

foreach ($change in $responseResourceChanges.changes) {
   
if ($change.changeId) {
    # Body Change Details
    $bodyHashTableResourceChangeDetails = @{
        resourceId = $resourceID 
        changeId = $change.changeId
    }
    
    $bodyJsonResourceChangeDetails = $bodyHashTableResourceChangeDetails |ConvertTo-Json
    # Change details API
    $restUriResourceChangeDetails = "https://management.azure.com/providers/Microsoft.ResourceGraph/resourceChangeDetails?api-version=2018-09-01-preview"
    # invoke
    $responseResourceChangeDetails = Invoke-RestMethod -Uri $restUriResourceChangeDetails -Method Post -Body $bodyJsonResourceChangeDetails -Headers $authHeader 
}  

This returns all the changes and gives us something to work with (removed some JSON for readability).

Wesley @ Code\Azure> $responseResourceChangeDetails

changeId
--------
{"beforeId":"1989b7a0-bc73-4ae5-82e5-f1fa4218ca7f","beforeTime":"2019-08-07T06:17:23.531Z","afterId":"400634ab-a1f1-462f-b21d-c09b3acf50ee","afterTi… 

Wesley @ Code\Azure> $responseResourceChangeDetails.beforeSnapshot.content |ConvertTo-Json
{
  "id": "/subscriptions/6c5e1304-4679-48bf-b9d4-a93b0024fcfa/resourcegroups/rg-webapptest01/providers/microsoft.web/sites/webappdefault",
  "name": "webappdefault",
  "type": "microsoft.web/sites",
  "location": "westeurope",
  "tags": null,
  "kind": "app",
  "sku": null,

---removed for readability---

 
    },
    "sku": "Standard",
    "slotSwapStatus": null,
    "sslCertificates": null,
    "state": "Running",
    "storageRecoveryDefaultState": "Running",
    "suspendedTill": null,
    "tags": null,
    "targetSwapSlot": null,
    "trafficManagerHostNames": null,
    "usageState": "Normal",
  }
}

Now that we have both states/snapshots (before and after the change) we can compare them. Unlike the views that the Azure Portal provides, Resource Graph will not do the comparison for you. I achieved this by outputting the contents to two different files and using "Compare-Object" to compare the contents. The actual data is stored in the "content" property of the snapshot, so that's what we will use.

I went with the following code (I know.. but it works). If you fancy creating the right objects with the right properties (make sure the properties on both sides of the comparison match) you can achieve this without writing to a file. Or if you want to compare the properties manually you can also take that path.

$responseResourceChangeDetails.beforeSnapshot.content |ConvertTo-Json > before.txt
$changesBefore = get-content before.txt
$changesBefore = $changesBefore | Where-Object {$_ -notmatch 'lastModifiedTimeUtc'}

$responseResourceChangeDetails.afterSnapshot.content |ConvertTo-Json > after.txt
$changesAfter = get-content after.txt
$changesAfter = $changesAfter | Where-Object {$_ -notmatch 'lastModifiedTimeUtc'}

# Compare changes

if ($change.changeId){

    $comparison = compare-object -ReferenceObject $changesAfter -DifferenceObject $changesBefore
    $detectedChanges += $comparison.inputObject
    }
}

$detectedChanges = $detectedChanges |convertTo-Json

The detected changes are then returned using the "Push-OutputBinding" command as provided through the Azure Function. The result contains the changed value along with the NoteProperties (PSPath, PSParentPath and so on) that "Get-Content" attaches to each line it reads and that survive the "Compare-Object" call.

"body": [
        {
            "value": "    \"httpsOnly\": true,",
            "PSPath": "D:\\home\\site\\wwwroot\\before.txt",
            "PSParentPath": "D:\\home\\site\\wwwroot",
            "PSChildName": "before.txt",
            ...............................
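As an alternative to the file-based comparison, the two snapshots can also be compared in memory. The following is a minimal sketch, not the code from the repository: it flattens each "content" object into a map of property paths to values and reports every path whose value differs. The function names "ConvertTo-FlatMap" and "Compare-Snapshot" are hypothetical, and ignoring "lastModifiedTimeUtc" mirrors the filter used above.

```powershell
# Hypothetical helper: flattens nested objects/arrays (as produced by ConvertFrom-Json or
# Invoke-RestMethod) into a hashtable of 'path.to.property' -> scalar value.
function ConvertTo-FlatMap {
    param(
        $InputObject,
        [string] $Prefix = ''
    )

    $map = @{}
    if ($InputObject -is [System.Management.Automation.PSCustomObject]) {
        foreach ($property in $InputObject.PSObject.Properties) {
            $path = if ($Prefix) { "$Prefix.$($property.Name)" } else { $property.Name }
            $child = ConvertTo-FlatMap -InputObject $property.Value -Prefix $path
            foreach ($key in $child.Keys) { $map[$key] = $child[$key] }
        }
    }
    elseif ($InputObject -is [array]) {
        for ($i = 0; $i -lt $InputObject.Count; $i++) {
            $child = ConvertTo-FlatMap -InputObject $InputObject[$i] -Prefix "${Prefix}[$i]"
            foreach ($key in $child.Keys) { $map[$key] = $child[$key] }
        }
    }
    else {
        # Scalar or $null: store the value under the path built so far.
        $map[$Prefix] = $InputObject
    }
    return $map
}

# Hypothetical helper: emits one object per property path that was added, removed or changed.
function Compare-Snapshot {
    param(
        $Before,
        $After,
        [string[]] $IgnoreProperty = @('lastModifiedTimeUtc')
    )

    $beforeMap = ConvertTo-FlatMap -InputObject $Before
    $afterMap  = ConvertTo-FlatMap -InputObject $After
    $paths = @($beforeMap.Keys) + @($afterMap.Keys) | Sort-Object -Unique

    foreach ($path in $paths) {
        $leaf = ($path -split '\.')[-1]
        if ($IgnoreProperty -contains $leaf) { continue }

        $presenceDiffers = $beforeMap.ContainsKey($path) -ne $afterMap.ContainsKey($path)
        if ($presenceDiffers -or ($beforeMap[$path] -cne $afterMap[$path])) {
            [pscustomobject]@{
                Property = $path
                Before   = $beforeMap[$path]
                After    = $afterMap[$path]
            }
        }
    }
}

# Usage inside the loop (not executed here):
#   Compare-Snapshot -Before $responseResourceChangeDetails.beforeSnapshot.content `
#                    -After  $responseResourceChangeDetails.afterSnapshot.content
```

Because the output carries the property path together with the old and new value, it drops the file-related NoteProperties and is easier to forward to a notification or ticket.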

The Function can now be used by the Logic App to request changes by sending a request containing the "resourceId". Note that you can also opt to pass the interval along with the "resourceId".

Full code at https://github.com/whaakman/functions-resource-graph-changes

The Logic App
Now.. I'm not the Chief Logic App officer and I'm probably over-complicating things but as far as getting the changes, it works :)

There were some things I needed to work out. As multiple writes can occur, the Alert can contain multiple affected resources. For example: if you resize an App Service Plan, properties of both the App Service Plan and all Web Apps hosted on that plan will change, resulting in an alert that contains multiple resourceIds. I decided to go with a notification for each distinct "resourceId" to keep it readable. However, the same concept used to store multiple changes in an array can also be applied to the results in general: store all the results in one or multiple array(s) and send them through a single notification (email in this example).

In general I configured the following:

  1. Initialize an array to store the changes
  2. Iterate through the affected items (resources)
  3. For each affected item, query the Azure Function using the "resourceId"
  4. For each change returned, store the results in the array previously initialized
  5. Get the data from the array and parse it in the next step, which sends the notification.
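The Logic App itself is built in the designer, but the control flow of the steps above can be sketched in PowerShell. This is an illustration under assumptions, not an export of the Logic App: "Get-ChangesForAlert" is a hypothetical name, and the "$GetChanges" script block stands in for the HTTP call to the Azure Function.

```powershell
# Hypothetical sketch of the Logic App flow: collect changes per distinct resource.
function Get-ChangesForAlert {
    param(
        [string[]]    $ResourceId,   # resource IDs taken from the alert
        [scriptblock] $GetChanges    # stand-in for the call to the Azure Function
    )

    # Step 1: initialize a collection to store the changes.
    $results = [System.Collections.Generic.List[object]]::new()

    # Step 2: iterate through the affected resources, once per distinct ID.
    # Resource IDs are compared case-insensitively here (an assumption of this sketch).
    $distinctIds = $ResourceId |
        Where-Object { $_ } |
        ForEach-Object { $_.ToLowerInvariant() } |
        Select-Object -Unique

    foreach ($id in $distinctIds) {
        # Step 3: query the Function using the resourceId.
        $changes = @(& $GetChanges $id)

        # Step 4: store each returned change in the collection.
        foreach ($change in $changes) {
            $results.Add([pscustomobject]@{ ResourceId = $id; Change = $change })
        }
    }

    # Step 5: hand the collected data to the notification step.
    return $results.ToArray()
}
```

Passing the Function call in as a script block keeps the sketch runnable without any Azure resources and makes the flow easy to try out with fake data.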

This results in an e-mail with one or multiple changes.

Could use some formatting but like I said, E-Mail probably isn't the type of notification you need. However, you can just as well send the same content to your Incident Management system.

Wrap-Up

The Change History functionality for Azure Resource Graph is still in preview and still needs some work, as it doesn't detect all the changes yet (but hey, that's what previews are for). It does provide great functionality and lets you track changes across environments, something that up until now you would need custom solutions for. It has great potential both for tracking changes and as a feature that supports your incident management process. Especially if you're in the business of performing governance at scale across environments, this will help you and save you from having to query each tenant/customer individually.

As we're talking about REST Endpoints here, you can pretty much incorporate the Resource Graph and Change History into any solution.