As defenders, detection engineers, and SOC analysts, it’s easy for us to label an alert or a security event as “garbage” when it lacks the context we implicitly expect. It’s true; there is no shortage of security event data that truly is garbage. But if a program manager, an executive leader, or a customer asked you why it’s not good, to what extent are you willing to offer an explanation beyond, “it just lacks context”?
As those who are considered subject matter experts in threats, techniques, and event data, we need to hold ourselves to a higher standard and be able to articulate objectively what constitutes a quality data source and, if it doesn’t meet a certain quality bar, to be able to clearly identify what is required to improve it. This post aims to establish a methodology for assessing data quality so that the data we spend copious amounts of money ingesting and querying gives us our expected return on investment, namely:
- It makes clear what happened based on the information present.
- It contains sufficient information to remediate the action.
- It can be correlated to other relevant data sources.
Let’s break down these three attributes that constitute a quality data source.
1. The data makes clear what happened based on the information present
At a high level, this assessment criterion asks the following question: Can you explain succinctly and confidently what actually happened to a technical audience that is not necessarily made up of subject matter experts?
To answer that question, the event should tell a brief story that ideally covers the following criteria, which the Red Canary Threat Research team calls “minimum viable storytelling.”
1. Who
This refers to the actor/identity that performed the action. This is not to be confused with an actor/identity that was affected by the action.
Examples:
- a user identity – i.e., an identity associated with a person
- a workload identity – e.g., service principal
- a local account
2. What
Using grammatical terminology, this refers to the following:
- Verb: the action that was performed
- Direct object: the resource that was affected by the action
- Indirect object: what was done to the affected resource
We’ll demonstrate what this looks like in the example that follows.
3. When
The date and time that the action occurred. This should be clearly distinguished from when the event data was populated.
4. Where
The environment in which the action occurred.
Examples:
- Entra tenant ID
- device ID
- computer hostname
5. Whence
The source from which the action originated.
Examples:
- IP address
- IP geolocation
6. How
The means by which the action occurred.
Examples:
- user agent string
- application ID/name
What about “why?”
You may have noticed that “why” is conspicuously absent. That’s because an event on its own can’t tell us the intent behind the action. Determining why an event happened is the defender’s job when deciding whether the event is benign or suspicious. Without quality data, however, a defender will not be able to make that determination confidently.
Piecing together the story
An event that includes all of the above criteria is well-suited to tell a succinct and clear story of what happened in no more than two sentences. This should also be the litmus test for any defender: Can you clearly explain to someone what occurred in an event? If so, you’ve demonstrated that not only is the data sufficient to tell a story but that you are comfortable enough translating the event into a format that is accessible to non-subject matter experts.
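The "all six criteria are present" check described above can be sketched as a small data structure. This is a minimal illustration, not part of the original methodology: the class name `MinimumViableStory` and its methods are hypothetical, and a criterion is treated as missing when it is absent or blank.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class MinimumViableStory:
    """Hypothetical container for the six criteria; None means not supplied."""
    who: Optional[str] = None
    what: Optional[str] = None
    when: Optional[str] = None
    where: Optional[str] = None
    whence: Optional[str] = None
    how: Optional[str] = None

    def missing(self) -> List[str]:
        """Return the names of criteria that are absent or blank."""
        return [
            f.name
            for f in fields(self)
            if not (getattr(self, f.name) or "").strip()
        ]

    def is_complete(self) -> bool:
        """True only when every criterion is populated."""
        return not self.missing()


# Illustrative values only: an event that answers who, what, and when.
partial = MinimumViableStory(
    who="an actor identity",
    what="added a member to a role",
    when="2025-10-16T16:49:52Z",
)
print(partial.missing())  # ['where', 'whence', 'how']
```

A non-empty `missing()` result is a concrete answer to "what is required to improve this data source?"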
Let’s look at an example event and see if we can extract the criteria necessary to articulate a clear narrative, i.e., the “minimum viable story.” In this example, let us consider an Entra identity being added to a role. Here’s the raw AuditLogs event that we will assess:
{
  "TenantId": "8b71734b-35c8-4e06-b72a-c23f700bf0dd",
  "SourceSystem": "Azure AD",
  "TimeGenerated": "2025-10-16T16:49:52.4205655Z",
  "ResourceId": "/tenants/08cdc03a-f392-4da8-ba8e-872297df4c7f/providers/Microsoft.aadiam",
  "OperationName": "Add member to role",
  "OperationVersion": "1.0",
  "Category": "RoleManagement",
  "ResultType": "",
  "ResultSignature": "None",
  "ResultDescription": "",
  "DurationMs": "0",
  "CorrelationId": "90f4cd25-d149-4239-bd38-809c8344da8b",
  "Resource": "Microsoft.aadiam",
  "ResourceGroup": "Microsoft.aadiam",
  "ResourceProvider": "",
  "Identity": "Microsoft Graph Command Line Tools",
  "Level": "4",
  "Location": "",
  "AdditionalDetails": {
    "key": "User-Agent",
    "value": "Mozilla/5.0 (Macintosh; Darwin 24.6.0 Darwin Kernel Version 24.6.0: Mon Aug 11 21:16:34 PDT 2025; root:xnu-11417.140.69.701.11~1/RELEASE_ARM64_T6020; en-US) PowerShell/7.5.3"
  },
  "Id": "Directory_90f4cd25-d149-4239-bd38-809c8344da8b_S4P51_17590570",
  "InitiatedBy": {
    "user": {
      "id": "14b4f3a8-609d-4523-9997-e7557eea3d39",
      "displayName": "Microsoft Graph Command Line Tools",
      "userPrincipalName": "Matt@ContosoCorp.onmicrosoft.com",
      "ipAddress": "51.2.72.192",
      "roles": []
    }
  },
  "LoggedByService": "Core Directory",
  "Result": "success",
  "ResultReason": "",
  "TargetResources": [
    {
      "id": "5705b3c2-0eea-4a49-9a36-469001e400c3",
      "displayName": null,
      "type": "User",
      "userPrincipalName": "TestUser@ContosoCorp.onmicrosoft.com",
      "modifiedProperties": [
        {
          "displayName": "Role.ObjectID",
          "oldValue": null,
          "newValue": "\"c2166a73-164c-4f18-903c-c5f3234d3930\""
        },
        {
          "displayName": "Role.DisplayName",
          "oldValue": null,
          "newValue": "\"Privileged Role Administrator\""
        },
        {
          "displayName": "Role.TemplateId",
          "oldValue": null,
          "newValue": "\"e8611ab8-c189-46e8-94e1-60213ab1f814\""
        },
        {
          "displayName": "Role.WellKnownObjectName",
          "oldValue": null,
          "newValue": "\"PrivilegedRoleAdmins\""
        }
      ],
      "administrativeUnits": []
    },
    {
      "id": "c2166a73-164c-4f18-903c-c5f3234d3930",
      "displayName": null,
      "type": "Role",
      "modifiedProperties": [],
      "administrativeUnits": []
    }
  ],
  "AADTenantId": "08cdc03a-f392-4da8-ba8e-872297df4c7f",
  "ActivityDisplayName": "Add member to role",
  "ActivityDateTime": "2025-10-16T16:49:52.4205655Z",
  "AADOperationType": "Assign",
  "Type": "AuditLogs"
}
Now, using the raw event, let’s gauge the extent to which the who, what, when, where, whence, and how can be populated.
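| Question | Data points |
|---|---|
| Who | The action was performed by the user Matt@ContosoCorp.onmicrosoft.com |
| What | Verb: Add member to role. Direct object: TestUser@ContosoCorp.onmicrosoft.com. Indirect object: the Privileged Role Administrator role |
| When | The action was performed at 2025-10-16T16:49:52.4205655Z |
| Where | The action was performed in Entra ID tenant ID 08cdc03a-f392-4da8-ba8e-872297df4c7f |
| Whence | The action was performed from IP address 51.2.72.192 |
| How | Application: Microsoft Graph Command Line Tools. User agent: Mozilla/5.0 (Macintosh; Darwin 24.6.0 Darwin Kernel Version 24.6.0: Mon Aug 11 21:16:34 PDT 2025; root:xnu-11417.140.69.701.11~1/RELEASE_ARM64_T6020; en-US) PowerShell/7.5.3 |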
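The same extraction can be sketched in code. The following is an illustrative Python sketch, not tooling from the original post: `EVENT` is a trimmed copy of the event above (user agent shortened), the helper names are hypothetical, and `AdditionalDetails` is accepted either as a single object (as shown in the example) or as a list of key/value objects.

```python
# Trimmed copy of the AuditLogs event above; only the fields used are kept.
EVENT = {
    "OperationName": "Add member to role",
    "ActivityDateTime": "2025-10-16T16:49:52.4205655Z",
    "AADTenantId": "08cdc03a-f392-4da8-ba8e-872297df4c7f",
    "AdditionalDetails": {"key": "User-Agent", "value": "PowerShell/7.5.3"},
    "InitiatedBy": {"user": {
        "displayName": "Microsoft Graph Command Line Tools",
        "userPrincipalName": "Matt@ContosoCorp.onmicrosoft.com",
        "ipAddress": "51.2.72.192",
    }},
    "TargetResources": [{
        "type": "User",
        "userPrincipalName": "TestUser@ContosoCorp.onmicrosoft.com",
        "modifiedProperties": [{
            "displayName": "Role.DisplayName",
            "newValue": "\"Privileged Role Administrator\"",
        }],
    }],
}


def modified_property(target, name):
    """Return a modified property's new value without its literal quotes."""
    for prop in target.get("modifiedProperties", []):
        if prop.get("displayName") == name:
            value = prop.get("newValue")
            return value.strip('"') if isinstance(value, str) else value
    return None


def extract_story(event):
    """Map event fields onto who, what, when, where, whence, and how."""
    actor = event.get("InitiatedBy", {}).get("user", {})
    targets = event.get("TargetResources", [])
    target = next((t for t in targets if t.get("type") == "User"), {})
    details = event.get("AdditionalDetails") or []
    if isinstance(details, dict):
        details = [details]
    user_agent = next(
        (d.get("value") for d in details if d.get("key") == "User-Agent"), None
    )
    return {
        "who": actor.get("userPrincipalName"),
        "what": {
            "verb": event.get("OperationName"),
            "direct_object": target.get("userPrincipalName"),
            "indirect_object": modified_property(target, "Role.DisplayName"),
        },
        "when": event.get("ActivityDateTime"),
        "where": event.get("AADTenantId"),
        "whence": actor.get("ipAddress"),
        "how": {
            "application": actor.get("displayName"),
            "user_agent": user_agent,
        },
    }


story = extract_story(EVENT)
print(story["who"], "->", story["what"]["indirect_object"])
```

Any key that comes back as `None` marks a criterion the event could not answer.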
Event narrative
Fortunately, all of the above criteria were populated with this event, allowing us to craft a coherent and succinct human-readable narrative using the following template:
At WHEN, within WHERE, WHO did WHAT (direct object) to WHAT (indirect object) from WHENCE using HOW.
Applying this template, we get the following narrative:
At 2025-10-16T16:49:52Z, within Entra ID tenant ID 08cdc03a-f392-4da8-ba8e-872297df4c7f, the user Matt@ContosoCorp.onmicrosoft.com added TestUser@ContosoCorp.onmicrosoft.com to the Privileged Role Administrator role from IP address 51.2.72.192 using Microsoft Graph Command Line Tools with the following user agent: Mozilla/5.0 (Macintosh; Darwin 24.6.0 Darwin Kernel Version 24.6.0: Mon Aug 11 21:16:34 PDT 2025; root:xnu-11417.140.69.701.11~1/RELEASE_ARM64_T6020; en-US) PowerShell/7.5.3.
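Filling in the narrative template can also be automated. Here is a minimal Python sketch under stated assumptions: the `TEMPLATE` string paraphrases the template above, the key names and `render_narrative` are hypothetical, and the function refuses to produce a story when any criterion is missing (the user agent is omitted for brevity).

```python
TEMPLATE = (
    "At {when}, within {where}, {who} {verb} {direct_object} "
    "to {indirect_object} from {whence} using {how}."
)

REQUIRED = (
    "when", "where", "who", "verb",
    "direct_object", "indirect_object", "whence", "how",
)


def render_narrative(story: dict) -> str:
    """Fill the template, failing loudly if any criterion is missing."""
    missing = [key for key in REQUIRED if not story.get(key)]
    if missing:
        raise ValueError("cannot tell the story; missing: " + ", ".join(missing))
    return TEMPLATE.format(**story)


# Values taken from the example event.
narrative = render_narrative({
    "when": "2025-10-16T16:49:52Z",
    "where": "Entra ID tenant ID 08cdc03a-f392-4da8-ba8e-872297df4c7f",
    "who": "the user Matt@ContosoCorp.onmicrosoft.com",
    "verb": "added",
    "direct_object": "TestUser@ContosoCorp.onmicrosoft.com",
    "indirect_object": "the Privileged Role Administrator role",
    "whence": "IP address 51.2.72.192",
    "how": "Microsoft Graph Command Line Tools",
})
print(narrative)
```

Raising an error instead of emitting a partial sentence keeps an incomplete event from reading like a complete story.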
2. The data contains sufficient information to remediate the action
Quality data will supply an incident responder with enough information to remediate the action, either manually or with help from automation. Using the above example of the member being added to a role, if a responder were to remove the role assignment, they could use the Remove-MgBetaRoleManagementDirectoryRoleAssignment Microsoft Graph PowerShell cmdlet. The command requires a UnifiedRoleAssignmentId argument, which the AuditLogs event above doesn’t supply. The event does, however, supply enough information to retrieve the relevant UnifiedRoleAssignmentId value, namely, the following:
- TargetResources[0].id: 5705b3c2-0eea-4a49-9a36-469001e400c3
- TargetResources[0].modifiedProperties['Role.TemplateId'].newValue: e8611ab8-c189-46e8-94e1-60213ab1f814
So we can retrieve the UnifiedRoleAssignmentId value with the following command:
$TargetRoleAssignment = Get-MgBetaRoleManagementDirectoryRoleAssignment -Filter "principalId eq '5705b3c2-0eea-4a49-9a36-469001e400c3' and roleDefinitionId eq 'e8611ab8-c189-46e8-94e1-60213ab1f814'"
The suspect role assignment can now be remediated (i.e., removed) with the Remove-MgBetaRoleManagementDirectoryRoleAssignment command:
Remove-MgBetaRoleManagementDirectoryRoleAssignment -UnifiedRoleAssignmentId $TargetRoleAssignment.Id
So we were able to successfully remediate the action using the data present in the event.
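For automated remediation, the filter string passed to the Get cmdlet has to be derived from the event rather than typed by hand. The Python sketch below shows one way that derivation might look; the function names are hypothetical and `EVENT` is a trimmed copy of the event above. It strips the literal double quotes that wrap newValue and confirms both values parse as GUIDs before placing them in the filter.

```python
import uuid

# Trimmed copy of the relevant part of the AuditLogs event above.
EVENT = {
    "TargetResources": [
        {
            "id": "5705b3c2-0eea-4a49-9a36-469001e400c3",
            "type": "User",
            "modifiedProperties": [
                {
                    "displayName": "Role.TemplateId",
                    "newValue": "\"e8611ab8-c189-46e8-94e1-60213ab1f814\"",
                },
            ],
        }
    ]
}


def as_guid(value: str) -> str:
    """Strip the literal quotes around the value and confirm it is a GUID."""
    return str(uuid.UUID(value.strip('"')))


def role_assignment_filter(event: dict) -> str:
    """Build the filter from the first target carrying a Role.TemplateId."""
    for target in event.get("TargetResources", []):
        for prop in target.get("modifiedProperties", []):
            if prop.get("displayName") == "Role.TemplateId":
                principal_id = as_guid(target["id"])
                role_definition_id = as_guid(prop["newValue"])
                return (
                    f"principalId eq '{principal_id}' "
                    f"and roleDefinitionId eq '{role_definition_id}'"
                )
    raise ValueError("event does not contain a Role.TemplateId property")


print(role_assignment_filter(EVENT))
```

Validating the values as GUIDs is a design choice for this sketch: it keeps malformed or unexpected event content out of the filter expression.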
3. The data can be correlated to other relevant data sources
A single event in isolation only tells a minor portion of an overall threat story. The ability to understand a relevant event in the context of an overall threat storyline requires the correlation of other, relevant events performed by the actor.
In the example above, it would be valuable to know all actions performed by Matt@ContosoCorp.onmicrosoft.com that used the same access token as the one used to perform the role assignment. Correlation to the corresponding sign-in event that issued the token can be performed by referencing the unique token identifier or session ID values. Unfortunately, AuditLogs events have neither value populated, which makes direct correlation to a sign-in event and related activity impossible. Related events can still be inferred, however, from the activity datetime, user identity, IP address, and user agent.
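Indirect correlation of this kind can be sketched as a match on user identity, IP address, and user agent within a time window around the activity. The Python below is an illustration under assumptions: the record keys (`time`, `user`, `ip`, `user_agent`) are a hypothetical normalized shape rather than real log field names, and the 30-minute default window is arbitrary.

```python
import re
from datetime import datetime, timedelta, timezone

_TS = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z")


def parse_ts(value: str) -> datetime:
    """Parse a UTC timestamp such as 2025-10-16T16:49:52.4205655Z.

    Fractional seconds beyond microsecond precision are truncated.
    """
    match = _TS.fullmatch(value)
    if not match:
        raise ValueError(f"unsupported timestamp: {value!r}")
    base = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
    micros = int((match.group(2) or "")[:6].ljust(6, "0"))
    return base.replace(tzinfo=timezone.utc) + timedelta(microseconds=micros)


def infer_related(anchor, candidates, window=timedelta(minutes=30)):
    """Return candidates sharing user, IP, and user agent within the window."""
    anchor_time = parse_ts(anchor["time"])
    keys = ("user", "ip", "user_agent")
    return [
        candidate
        for candidate in candidates
        if all(candidate.get(k) == anchor.get(k) for k in keys)
        and abs(parse_ts(candidate["time"]) - anchor_time) <= window
    ]


# Anchor built from the example event (user agent shortened).
anchor = {
    "time": "2025-10-16T16:49:52.4205655Z",
    "user": "Matt@ContosoCorp.onmicrosoft.com",
    "ip": "51.2.72.192",
    "user_agent": "PowerShell/7.5.3",
}
```

Because this is inference rather than a join on a token or session identifier, matches should be treated as leads to review, not as confirmed related activity.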
Is this quality data?
Based on the three criteria established for data quality, the highlighted AuditLogs entry example could be assessed as follows:
| Criteria | Assessment | Justification |
|---|---|---|
| Makes clear what happened based on the information present | High quality | All aspects of the who, what, when, where, whence, and how were populated using available event data. |
| Contains sufficient information to remediate the action | High quality | The event supplied all relevant fields necessary to automate remediation. |
| Can be correlated to other relevant data sources | Medium/low quality | Direct correlation to related activity is not possible, although indirect correlation can be inferred based on other relevant fields. If Microsoft supplied both unique token identifier and session ID values like they do with other log sources, direct correlation would be possible. |
Overall quality assessment: High
While the event doesn’t support direct correlation, it is populated with enough information that would allow threat hunters, detection engineers, and incident responders to ask any of the following questions that would lead to a benign/suspicious determination:
- Does Matt@ContosoCorp.onmicrosoft.com perform role assignments often? Privileged Role Administrator is a high-privilege role. Is Matt@ContosoCorp.onmicrosoft.com expected to perform high-privilege role assignments?
- Is it common for the Microsoft Graph Command Line Tools application to perform role assignments versus the Azure Portal or other sanctioned applications? Is there a reason why privileged identity management (PIM) wasn’t used to perform the role assignment?
- Is the IP address common for the tenant and/or Matt@ContosoCorp.onmicrosoft.com?
- Is the user agent string common for the tenant and/or Matt@ContosoCorp.onmicrosoft.com?
- Does TestUser@ContosoCorp.onmicrosoft.com have an actual justification for the Privileged Role Administrator role?
- What actions, if any, did TestUser@ContosoCorp.onmicrosoft.com perform after the role assignment?
Applying the narrative
As defenders and threat subject matter experts, we should be able to take ownership of establishing data requirements for detection and response. We should be able to clearly articulate the return on investment for the data sources we either claim we need for detection or the ones we claim lack value and can be dropped.
We hope that this assessment methodology can serve as a foundation for extracting the most value from the data sources needed for detection and response and can also be used to hold vendors accountable for supplying the data necessary to make informed, confident decisions.