Better know a data source: Files

Where cybersecurity is concerned, files are any analyst’s bread and butter. We use them to log changes and dubious activity, to execute functions, and to store data on investigations. They’re also our biggest nightmare; file modifications from malicious activity and malware itself fall under this umbrella. In this blog, we’re going to examine files as a data source and their five constituent data components.

What data sources can provide insights on file changes?

According to MITRE ATT&CK, a file is a “computer resource object, managed by the I/O system, for storing data (such as images, text, videos, computer programs, or any wide variety of other media).” Files are an important data source for detection, incident response, and a wide variety of other security disciplines because files can contain payloads, exfiltrate data, and much more. As such, collecting them and understanding how to analyze their contents is essential for most security organizations.

MITRE ATT&CK’s file data source is subdivided into five data components, each of which can be used to detect a large number of ATT&CK techniques. Each component’s prevalence rank in ATT&CK is noted in its section below.

File access (12th most prevalent in MITRE ATT&CK)

File access (also referred to as file open) events happen any time a file is opened on a computer. This is one of the most common things to happen on a computer, but it’s actually one of our least-used event types: it produces such an insane flood of data that most platforms don’t even record it. If you do have access to file open events, the best approach is to narrow your focus to sensitive files such as the browser’s credential store.
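As a rough illustration of narrowing file open events to sensitive files, here is a minimal Python sketch. The event shape (a dict with `process` and `path` keys) and the watchlist are assumptions made for this example, not a real EDR schema; the file names are ones commonly associated with browser credential stores and should be tuned to your environment.

```python
from pathlib import PureWindowsPath

# Hypothetical watchlist of credential-store file names (illustrative only).
SENSITIVE_FILE_NAMES = {"login data", "logins.json", "key4.db"}


def is_sensitive_open(event: dict) -> bool:
    """Return True if a file-open event touches a watched file name."""
    name = PureWindowsPath(event["path"]).name.lower()
    return name in SENSITIVE_FILE_NAMES


# Made-up events in the assumed shape.
events = [
    {"process": "unknown.exe",
     "path": r"C:\Users\a\AppData\Local\Google\Chrome\User Data\Default\Login Data"},
    {"process": "notepad.exe", "path": r"C:\Users\a\Documents\notes.txt"},
]
hits = [e for e in events if is_sensitive_open(e)]
```

In practice you would probably also exclude the browser’s own process, since it opens its credential store constantly.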

File creation (6th most prevalent)

File creation will be one of the most common event types you’ll use for detection, because you can expect it from almost any threat that wants to establish persistence or carry out actions on objectives. An example would be an adversary writing a .lnk file to the Startup folder.

File deletion (23rd most prevalent)

Some of the more evasive adversaries will make use of file deletion as a way to clear their tracks, deleting logs and forensic data that may help with an investigation.

File modification (4th most prevalent)

You may see file modifications if an adversary is using an existing file for persistence such as .bashrc or the PowerShell profile. Changes in registry keys, adjustments to items in startup, and modifications to shortcut files are also some indicators of file modification.

File metadata (13th most prevalent)

An adversary may modify file metadata if they want to cover their tracks. By modifying system timestamps for their files in a method known as timestomping, changing location data, or otherwise adjusting file attributes, adversaries can confuse or mislead defenders.

Looking at our own detectors, we heavily rely on file create events:

Count of file modification detectors

601 filemod_create
271 filemod_modify
10 filemod_delete
3 file_open

In other words, if you’re trying to gain visibility into high numbers of ATT&CK techniques and threats, file data sources are among the best options available—and they’re available in most popular endpoint detection and response (EDR) products.

Given the prevalence of files as a data source across the ATT&CK matrix, the frequency with which adversaries leverage files, and the importance of files as a forensic artifact and source for detection, we’re excited to share the file edition of our long-running Better Know a Data Source series.

What can we learn from files?

Adversaries leverage files in more ways than we can possibly list here, but in general they’re abused to deliver payloads or other information to a host, extract or otherwise remove information from a host, or make valuable information on a host impossible to access (e.g., encrypting files in a ransomware scheme). In addition to malicious activity, filemods can often be used as crucial clues in an investigation, giving context to benign user and system activity that helps us understand how the computer is being used in general.

As defenders, there’s a lot we can learn from files. We can rely on them directly for alerts or they can help “fill in the gaps” of a threat/incident. File creation events are often a critical piece of what we sometimes call the “Cotton Eye Joe” theory of security analysis (originally coined by Red Canary legend Tony Lambert):

Where did it come from? When looking into suspicious activity, we’ll often look for installers or suspicious files written into a user’s downloads or Outlook temp folder, as these are common locations for adversaries to place payloads.
Where did it go? Files are often a crucial part of the story when figuring out persistence using the startup folder and lateral movement across the network to another host.

Yellow Cockatoo

Take the below example, which shows some file modification activity in the form of a .lnk file creation in the Startup folder:

In this case, the user was looking for “Family Feud Using PowerPoint” (as can be seen in the suspicious-looking .exe file at the bottom of the screenshot) but instead they got malware. This was probably not the game show prize they were hoping for, but our prize is that it makes a great example of some behavior to look for when trying to identify Yellow Cockatoo.

In this case, the creation of the .lnk file in the Startup folder is a great piece of detection logic. It also has an Atomic Red Team test that matches up perfectly with its behavior, as seen here.
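The logic of “.lnk file created in the Startup folder” can be sketched in a few lines of Python. This is a simplified illustration rather than our production detector: the event dict (with `action` and `path` keys) is a hypothetical shape, and the check assumes the Startup folder path ends in `Start Menu\Programs\Startup`, which holds for the standard per-user and all-users locations.

```python
from pathlib import PureWindowsPath

# Assumed trailing components of the Startup folder path, lowercased.
STARTUP_TAIL = ("start menu", "programs", "startup")


def is_startup_lnk_creation(event: dict) -> bool:
    """Return True for a file-create event that drops a .lnk into a Startup folder."""
    if event.get("action") != "create":
        return False
    path = PureWindowsPath(event["path"])
    if path.suffix.lower() != ".lnk":
        return False
    # Compare the last three directory components, ignoring case.
    parent_parts = tuple(part.lower() for part in path.parent.parts)
    return parent_parts[-3:] == STARTUP_TAIL
```

Legitimate installers also write shortcuts here, so a match is a lead to investigate rather than a verdict.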

Lateral movement

In this example, we see our adversary write a file to a remote computer and then execute it using the wmic process call create command with the /node: option. If we hadn’t seen a file written just before, we might ask whether the file was already there.

Filemods can also help us spot when a computer writes to another computer on the network. When you see \\ characters at the beginning of an executable path (rather than the usual C:\ or D:\ drive letters), it may be an indicator that an adversary is starting to pivot throughout the network. If that turns out to be the case, you have a serious threat and need to respond quickly and decisively.
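Checking for a leading \\ is simple enough to sketch in Python. The helper names below are hypothetical, and the example treats the `\\?\` and `\\.\` prefixes as local device paths rather than remote hosts, which is an assumption you may want to revisit for your own telemetry.

```python
from typing import Optional


def is_unc_path(path: str) -> bool:
    """Return True when a path starts with two backslashes (UNC style)."""
    return path.startswith("\\\\")


def remote_host(path: str) -> Optional[str]:
    """Return the host portion of a UNC path, or None if there isn't one."""
    if not is_unc_path(path):
        return None
    host = path[2:].split("\\", 1)[0]
    # "\\?\" and "\\.\" are local device prefixes, not remote hosts.
    if host in ("", "?", "."):
        return None
    return host
```

Pulling out the host name gives you the next machine to look at when scoping a possible pivot.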

BloodHound

File metadata—like filenames, for example—can play a crucial role in analysis and investigations as well, but it’s important to have a good baseline understanding of oft-abused executables or suspicious file-naming conventions. In the following example, we see a file named bloodhound.zip. If we didn’t know that BloodHound is a network enumeration and reconnaissance tool, then this filename would mean nothing to us.

Sigma rules like this one indicate that filenames are one of the best ways to identify common threat behavior. In this rule, a series of .json extensions and the most typical indicator of BloodHound—a zipped archive of the same name—are used to filter for suspect target filenames:

In the case above, even though our adversary used process injection on the logagent.exe process, they still wrote and interacted with files related to BloodHound, giving away that the actual process in memory was BloodHound.
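Filename matching of this kind might look like the following Python sketch. The suffix list is an assumption modeled on commonly seen collector output names; it is illustrative and not copied from any specific Sigma rule.

```python
from pathlib import PureWindowsPath

# Assumed filename suffixes, lowercased (illustrative, not a rule excerpt).
SUSPECT_SUFFIXES = (
    "bloodhound.zip",
    "_computers.json",
    "_users.json",
    "_groups.json",
    "_domains.json",
)


def looks_like_bloodhound_output(target_filename: str) -> bool:
    """Return True if the file name ends with one of the suspect suffixes."""
    name = PureWindowsPath(target_filename).name.lower()
    return name.endswith(SUSPECT_SUFFIXES)
```

Because the check keys on the file written rather than the process writing it, it still fires when the tool runs injected inside another process.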

Gootloader

In a recent example of Gootloader, we see a .dat file being created, which may seem odd at first since it’s a generic file extension that cannot be natively executed. But immediately after, the adversary renames that same file to .js and executes it using a scheduled task. This activity appears to be done to help adversaries evade a normal file “creation” event detection.
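One way to catch this pattern is to correlate the creation event with the later rename. The sketch below assumes a time-ordered list of event dicts with hypothetical `action`, `path`, and `new_path` keys; real EDR telemetry will be shaped differently.

```python
from pathlib import PureWindowsPath


def _ext(path: str) -> str:
    """Lowercased file extension of a Windows path."""
    return PureWindowsPath(path).suffix.lower()


def find_dat_to_js_renames(events: list) -> list:
    """Return (old_path, new_path) pairs for .dat files created and later renamed to .js."""
    created_dat = set()
    hits = []
    for event in events:  # assumed to be in time order
        if event["action"] == "create" and _ext(event["path"]) == ".dat":
            created_dat.add(event["path"].lower())
        elif event["action"] == "rename":
            was_created_as_dat = event["path"].lower() in created_dat
            if was_created_as_dat and _ext(event["new_path"]) == ".js":
                hits.append((event["path"], event["new_path"]))
    return hits
```

A rename of a pre-existing .dat file is ignored here, which keeps the focus on files that were both created and renamed within the observed window.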

How files can add context

File events can provide important context as well. For example, we might have an alert for regsvr32.exe executing a DLL file and performing malicious actions.

It’s important to understand the activity that follows the DLL execution, but we should also ask ‘how did this bad .dll file get here?’. Did a bot create this, perhaps? Bot creation is the way we most commonly see malicious DLLs like this get executed, but that’s not the case here. When we searched for the process that wrote this DLL, we determined that the browser process iexplore.exe wrote it (as recently as 2023. Seriously, y’all?).

In the screenshot here we can see that the DLL was created by Internet Explorer, meaning this DLL was downloaded directly by an adversary using a browser. This information implies that there is a human operator behind the keyboard in this instance. A human operator interactively doing bad stuff on an endpoint adds another level of severity to the situation.
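The “who wrote this file?” question can be sketched as a lookup over file creation events. The event shape (`action`, `path`, `process` keys) and the list of browser process names are assumptions for illustration only.

```python
from typing import Optional

# Hypothetical set of browser process names to treat as interactive downloads.
BROWSER_PROCESSES = {"iexplore.exe", "chrome.exe", "msedge.exe", "firefox.exe"}


def find_writer(events: list, target_path: str) -> Optional[str]:
    """Return the process that most recently created target_path, or None."""
    target = target_path.lower()
    writer = None
    for event in events:  # assumed to be in time order
        if event["action"] == "create" and event["path"].lower() == target:
            writer = event["process"].lower()
    return writer


def written_by_browser(events: list, target_path: str) -> bool:
    """Return True if the file's most recent writer is a known browser process."""
    return find_writer(events, target_path) in BROWSER_PROCESSES
```

A browser writer is only a hint that a person was at the keyboard, so it is best treated as context that raises severity rather than as proof.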

ntds.dit

Another good alert example is to look for creation of ntds.dit files, which store Active Directory domain information. ntds.dit is often an adversary’s end game since it gives them the keys to the kingdom (so to speak), allowing them to collect all the usernames, hashed passwords, and other vital account details for an entire domain.

A copy of the ntds.dit file isn’t just useful for credential theft. It can also provide valuable information for future pivoting, such as domain users and their access permissions and a “target list” of devices within the domain. Extracting this data is typically done through volume shadow copies or through the built-in Windows ntdsutil.exe utility.

There is a Sigma rule that looks for the creation of the ntds.dit file.
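A simplified version of that idea in Python: flag any newly created ntds.dit that lands outside the directory where the database normally lives. This is not the Sigma rule itself. The event shape is hypothetical, and the expected directory is an assumption based on the Windows default, so adjust it if your domain controllers store the database elsewhere.

```python
from pathlib import PureWindowsPath

# Assumed default database directory on a domain controller.
EXPECTED_DIR = PureWindowsPath(r"C:\Windows\NTDS")


def is_suspicious_ntds_copy(event: dict) -> bool:
    """Return True for a create event that writes ntds.dit outside EXPECTED_DIR."""
    if event.get("action") != "create":
        return False
    path = PureWindowsPath(event["path"])
    if path.name.lower() != "ntds.dit":
        return False
    # PureWindowsPath comparisons ignore case, matching Windows behavior.
    return path.parent != EXPECTED_DIR
```

Legitimate backup tooling can also create copies of the database, so expect to allowlist known backup processes or destinations.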

Conclusion

Files are an amazing tool for sourcing data on events. Even when files are wiped or modified on a system, those changes can be the very thing of greatest use to security operations. By creating an effective net of detections based on these file modification activity types, analysts can gain a greater perspective of malicious activity in their environments.
