Adversary emulation and testing
Customers are testing more and emulating the same techniques that adversaries abuse, but differences in tooling and tradecraft can limit effectiveness.
Threat emulation activity increased significantly in 2022, and customers mostly tested the same techniques we observed adversaries abusing in the wild. Despite this, our security operations team finds that differentiating test detections from real-world threats can be done reliably, as security teams are constrained in subtle but important ways.
Isolating and removing test detections
This is the first year that we’ve filtered detections associated with authorized testing from our Threat Detection Report data set. In all, known tests accounted for an impressive 40 percent of threats that our team detected in 2022, a year-over-year increase of roughly 20 percent. This is great news! Security teams are working hard to validate their processes and controls through a mix of ad hoc testing, purple teaming, and red team engagements.
Understanding how security teams perform testing—and the threats they choose to emulate—is important, and we can make assessments about the quality, purpose, and authenticity of test activity by comparing authorized testing to real-world threats. In this section, we’ll examine where testers are hitting or missing the mark and offer guidance and resources that security teams can use to improve their testing capabilities.
How Red Canary identifies testing activity
Once we’ve detected, investigated, and alerted a customer to a threat, our platform provides them features for offering feedback, including the ability to signal whether a threat has been remediated—or will not be remediated. If a threat is not going to be remediated, it’s important that we know why:
- The activity and risk will be accepted
- The activity is authorized
- We incorrectly identified the activity as malicious, a false positive
- The activity is attributable to some form of adversary emulation or testing
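The feedback categories above suggest a simple way to separate known tests from everything else. The following is a minimal sketch of that idea, not a description of how the platform is actually implemented; the `Feedback` enum, the `Detection` record, and the `split_known_tests` helper are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Feedback(Enum):
    """Reasons a customer might give for not remediating a threat."""
    RISK_ACCEPTED = "risk accepted"
    AUTHORIZED = "authorized activity"
    FALSE_POSITIVE = "false positive"
    TESTING = "adversary emulation or testing"


@dataclass(frozen=True)
class Detection:
    detection_id: str             # hypothetical identifier
    feedback: Optional[Feedback]  # None: no "will not remediate" reason given


def split_known_tests(detections):
    """Return (known_tests, everything_else), preserving input order."""
    known_tests, others = [], []
    for detection in detections:
        if detection.feedback is Feedback.TESTING:
            known_tests.append(detection)
        else:
            others.append(detection)
    return known_tests, others


# Made-up sample records, for illustration only.
sample = [
    Detection("d-1", Feedback.TESTING),
    Detection("d-2", None),
    Detection("d-3", Feedback.FALSE_POSITIVE),
    Detection("d-4", Feedback.TESTING),
]
known_tests, others = split_known_tests(sample)
print([d.detection_id for d in known_tests])
```

Only detections explicitly marked as testing are separated out, so unlabeled tests would remain in the second list.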
What we saw in 2022
One way to illustrate the state of testing in 2022 is to compare the top threats and top ATT&CK techniques in test detections with those in real-world threats. Such a comparison gives us a high-level look at whether testing activity aligns with adversary behaviors. There’s no right or wrong way to test because teams have different objectives. For example, one security team might be interested in emulating common threats to ensure they can detect them properly, while another team might be testing very specific processes or controls.
We’ll start with techniques:
The following chart offers a high-level comparison of the ATT&CK techniques we observed in testing (blue) and real-world threats (red). It suggests that security teams are mostly testing the same techniques that adversaries are leveraging.
As we zoom in on the top 10 techniques for real-world threats and tests, we see that there are just two outliers for testing—and both are Discovery techniques: T1057: Process Discovery and T1087.002: Domain Account. Conversely, the two prevalent techniques that seem to go untested are T1055: Process Injection and T1036.003: Rename System Utilities.
That security teams would disproportionately test Discovery techniques makes sense. Discovery techniques are relatively easy to test and observe, and they are mostly benign. They offer a safe and easy way to answer a basic question: “Can I observe suspicious behavior?”
By contrast, Process Injection is a relatively esoteric technique that’s harder to understand and likely has a higher barrier to entry for both testing and observation. Detecting Process Injection at a meaningful scale requires a thorough understanding of expected process behaviors, or a baseline against which anomalies can be identified. The most effective detective controls we’ve found involve legitimate processes with unexpected file or network activity, or unusual command-line attributes, including having no command-line arguments at all. It seems reasonable that security teams would run fewer tests for complicated techniques that are difficult to detect.
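To make the baseline idea concrete, here is a minimal sketch of two such checks: a process that normally runs with arguments but has none, and a process that is not expected to touch the network but does. The two watchlists, the `ProcessEvent` record, and both helper functions are assumptions made for illustration; a real baseline would be built from an environment's own telemetry.

```python
from dataclasses import dataclass

# Illustrative assumptions, not a vetted baseline.
EXPECTS_ARGUMENTS = frozenset({"rundll32.exe", "regsvr32.exe"})
NOT_EXPECTED_ON_NETWORK = frozenset({"notepad.exe"})


@dataclass(frozen=True)
class ProcessEvent:
    image_name: str           # file name of the running process
    command_line: str         # full command line as recorded
    network_connections: int  # connections attributed to the process


def has_arguments(command_line: str) -> bool:
    """Return True if anything follows the program name.

    Simplification: an unquoted path containing spaces is misread
    as a program name followed by arguments.
    """
    text = command_line.strip()
    if text.startswith('"'):
        end = text.find('"', 1)
        rest = text[end + 1:] if end != -1 else ""
    else:
        parts = text.split(None, 1)
        rest = parts[1] if len(parts) > 1 else ""
    return bool(rest.strip())


def injection_indicators(event: ProcessEvent) -> list:
    """List the baseline deviations observed for one process event."""
    name = event.image_name.lower()
    findings = []
    if name in EXPECTS_ARGUMENTS and not has_arguments(event.command_line):
        findings.append("no command-line arguments")
    if name in NOT_EXPECTED_ON_NETWORK and event.network_connections > 0:
        findings.append("unexpected network activity")
    return findings


bare = ProcessEvent("rundll32.exe", '"C:\\Windows\\System32\\rundll32.exe"', 0)
print(injection_indicators(bare))
```

These indicators are leads for investigation rather than proof of injection, since legitimate software occasionally deviates from any baseline.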
The omission of testing for renaming of system utilities may also result from difficulties detecting the technique. While T1036.003 is conceptually simpler than T1055, detecting it requires not only that security teams collect metadata that reveals the true identity of a process, but also that they can compare presented filenames with internal ones in real time. This is simple to understand but not so simple to operationalize, and that may account (in part) for the relative lack of testing.
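The comparison itself is small once the metadata is available. The sketch below assumes telemetry already supplies both the path a process was launched from and the internal name embedded in the binary's metadata; the function name and its parameters are hypothetical, and real metadata has quirks (missing values, alternate extensions) that this ignores.

```python
from pathlib import PureWindowsPath


def is_renamed(process_path: str, internal_name: str) -> bool:
    """Return True when the presented file name differs from the internal one.

    Comparison is case-insensitive. If no internal name was collected,
    nothing can be concluded, so the function returns False.
    """
    presented = PureWindowsPath(process_path).name.lower()
    internal = internal_name.strip().lower()
    return bool(internal) and presented != internal


# A binary presented as notepad.exe whose metadata says PowerShell.EXE.
print(is_renamed(r"C:\Users\Public\notepad.exe", "PowerShell.EXE"))
```

The hard part the paragraph describes is not this comparison but collecting the internal name for every process and evaluating it in real time.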
Further, the prevalence of Atomic Red Team™ (which you’ll see in the following table) might also answer some questions about why certain techniques rank higher than others. For example, Atomic Red Team’s test coverage for Process Injection and Masquerading is relatively scant, whereas it has ample coverage for Account and Process Discovery. In addition, our analysis of test difficulty reveals that, in aggregate, the tests for Process Injection are indeed more complicated to execute than the tests for Process and Domain Account Discovery.
Ultimately though, these are relatively minor discrepancies. As we zoom out from the top 10 to the top 20 to the top 100 techniques for tests vs. real-world threats, we see that emulation activity is conceptually on target with actual adversary activity.
How about threats?
Our definition of a threat is broad and includes many testing tools (including Impacket, Mimikatz, Cobalt Strike, and BloodHound) that are frequently abused by adversaries. Those four tools feature prominently among our most prevalent threats, whether or not we include testing, but even so, there’s much more disparity between the two threat lists than between the two technique lists. The disparity continues as we expand from the top 10 to the top 20, where we see more testing tools and red teams in the test list and mostly malware in the actual threat list. The reason these lists look different is obvious: testing tools are more readily available and safer than actual malware.
That security teams are using different tools than adversaries but managing to emulate the same techniques suggests that testers and the people who develop test frameworks and tools are paying attention to threat trends and attempting to emulate them. However, we know anecdotally that individual tests—despite mapping to similar ATT&CK techniques—look and feel very different from actual adversary actions. So, why is that?
Emulations vs. the real thing
Emulations and actual adversary behaviors differ for many reasons. A test might fail to accurately emulate a real-world threat accidentally, for reasons ranging from incomplete understanding of tradecraft to inaccessibility of tooling. Alternatively, a test might not be intended to look like a real threat, as security teams opt for safety, speed, or specificity. Any testing is better than no testing at all, even if the test doesn’t look like a real-world adversary from end to end.
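We’ll cover these in more detail in the coming paragraphs, but some common discrepancies between tests and threats include an emphasis on novel tradecraft, redundant use of techniques, disjointed activity, different infrastructure, and different goals.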
One reason that tests stand out from actual malicious activity is that security teams commonly prioritize what’s new and novel over what’s tried and true. Real adversaries tend not to work harder than they must, and won’t deploy a rare or valuable capability when a commodity capability will suffice. While this report underscores the degree to which long-standing techniques continue to be successful, testing often focuses on emerging use of techniques and tools or reflects recent research. There is value in exercising newer or less common tradecraft, but the rarity of these tools and behaviors makes them great detection opportunities.
Testers also have a tendency to set an objective and then leverage multiple techniques to achieve their goal. While this redundancy may be intentional for a security team that is attempting to validate detection coverage, it’s also conspicuous in contrast to real adversaries who strive to accomplish their goals without being detected.
Further, test activity is sometimes disjointed. A real adversary tends to follow a semi-predictable pattern that broadly involves gaining access, moving around, stealing something, and leaving. Testers, on the other hand, may appear seemingly out of nowhere, executing later-stage activity absent the activity that precedes it in a real intrusion.
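One way to picture this disjointedness is to check whether later-stage activity appears on a host without the access stage that would normally precede it. The sketch below is a deliberate simplification and not a method described in this report: the mapping of "gaining access, moving around, stealing something" onto ATT&CK tactic names is an assumption, and the function name is hypothetical.

```python
# Assumed mapping of the intrusion pattern onto ATT&CK tactic names.
ACCESS_TACTICS = frozenset({"initial-access"})
LATER_TACTICS = frozenset({"lateral-movement", "collection", "exfiltration"})


def looks_disjointed(observed_tactics) -> bool:
    """Return True if later-stage tactics appear with no access stage."""
    observed = {tactic.strip().lower() for tactic in observed_tactics}
    has_later = bool(observed & LATER_TACTICS)
    has_access = bool(observed & ACCESS_TACTICS)
    return has_later and not has_access


print(looks_disjointed(["lateral-movement", "exfiltration"]))
```

A missing access stage can also mean the earlier activity simply was not observed, so a result like this is a hint about authenticity, not a verdict.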
Testers and real adversaries also tend to leverage different infrastructure, which may be partially attributable to their use of different tools, and also to the necessary legality of their operations. Many red teams will be consistent in their use of infrastructure hosting providers, ranging from cloud-based computers to DNS to domain registrars. Thus, we very rarely see red team activity emanating from geographically suspicious IP spaces, bulletproof hosting providers, or unscrupulous registrars.
More broadly, testers and adversaries have different goals. For example, a red team might want to gain administrative rights for a domain and write up a report, whereas an adversary is trying to load secondary payloads and install cryptominers, encrypt files, or steal information.
Our analysis of known testing in 2022 suggests that security teams are using a variety of tools to test a lot of the same techniques being abused by adversaries. And even if anecdotal accounts suggest that testing often looks very different from actual threats, our philosophy is that any testing is better than none, and we’d love to see the percentage of customers who perform ongoing testing continue to increase. Here are some suggestions for teams that are just getting started, as well as some for teams that are testing regularly today and looking to evolve.
First, familiarize yourself with freely available tools and information. Atomic Red Team is a freely available library of tests that are representative of real-world adversary techniques mapped to MITRE ATT&CK. Of course, there are hundreds of adversary techniques and thousands of tests, so knowing where to start is important. Use the top techniques and threats in this report, or in one of many high quality and freely available industry reports, to prioritize your testing.
Armed with the intelligence, the information, and the tools to perform your testing, start to put it all together by understanding prevalent threats, the tools and techniques they leverage, and then the tests you perform to evaluate your defenses.
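Putting it together can be as simple as walking a prioritized list of techniques and noting which ones you have tests for. The following sketch assumes a locally maintained inventory of tests keyed by ATT&CK technique ID; the function name and the test names are hypothetical, and the sample inventory is made up rather than a statement about what any real test library contains.

```python
def build_test_plan(prioritized_techniques, available_tests):
    """Return (plan, gaps) for a prioritized list of technique IDs.

    plan maps each technique to the tests available for it;
    gaps lists the techniques with no test, in priority order.
    """
    plan = {}
    gaps = []
    for technique in prioritized_techniques:
        tests = available_tests.get(technique, [])
        if tests:
            plan[technique] = list(tests)
        else:
            gaps.append(technique)
    return plan, gaps


# Technique IDs discussed in this section, in an arbitrary priority order.
prioritized = ["T1055", "T1036.003", "T1057", "T1087.002"]

# Made-up inventory for illustration only.
available = {
    "T1057": ["hypothetical-process-listing-test"],
    "T1087.002": ["hypothetical-domain-account-enumeration-test"],
}

plan, gaps = build_test_plan(prioritized, available)
print("run:", sorted(plan))
print("no test yet:", gaps)
```

The gaps list is the useful output here, since it shows where prevalent techniques are going untested.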
As your testing evolves, collaborate with other parts of your organization, and make purple teaming a regular part of your operational rhythm—improving the quality of your testing by soliciting real-time feedback from intelligence analysts, detection engineers, and incident responders.
Once you’ve started to get value from ongoing, low-cost atomic testing, expand your program to perform other kinds of testing as well—like penetration testing, vulnerability assessments, and more.