When Unix first emerged in 1971, it was an incredible advancement in operating system technology. Its creators, Ken Thompson and Dennis Ritchie of Bell Labs, paved the way for what would eventually become Linux, the operating system that Linus Torvalds would later create as a free alternative to Unix. Many of the conventions established in Unix in the early 1970s remain integral parts of Linux today.
In this article, we’re going to focus on the design philosophy that “everything is a file”—the consequences of which, it turns out, significantly impact how we secure modern Linux systems.
What is a file, anyway?
First up, what does “everything is a file” mean? Is my printer a file? Is my USB device a file? Is my file actually a file? The answer to all of those questions is: yes! By the end of this article you’ll understand how all the things have been made into files.
Reading traditional files
To start off, let’s talk about how you interact with a standard file on Linux systems. If you were developing a simple C program that writes some data to a file, there’d be a few basic steps you’d need to take. Let’s look at this sample program, which does exactly that: writes data to a file.
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int fd = open("myfile.txt", O_RDWR | O_CREAT, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    write(fd, "data", strlen("data"));
    close(fd);
    return 0;
}
First, we need to get (or open) a file descriptor for the file. In the code above, we’re indicating that we want to create the file and then receive a file descriptor for it. In userspace, a file descriptor is simply a unique number that identifies an open file, but when that number reaches kernel space, it’s used as an index into an array that holds a C struct representing the file. The item contained in that array is what the kernel uses to uniquely identify a file, and it is what the kernel operates on when changes to the file need to be made or data needs to be read or written.
Once we’ve obtained the file descriptor, we can perform read, write, and other operations on that file. That’s all fine and dandy, but what does it mean when I say that your printer or USB device is a file? In reality, this is just defining a common API. If everything can be represented as “a file,” then we can use the same set of function calls to interact with many different devices. The kernel perceives a file as something that has a defined struct file_operations structure associated with it. The reason it needs that particular structure is that it contains the standard functions for how to operate on a “file.” In short, it passes the duck test. Let’s look at the structure.
struct file_operations {
    struct module *owner;
    loff_t (*llseek) (struct file *, loff_t, int);
    ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
    ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
    ssize_t (*read_iter) (struct kiocb *, struct iov_iter *);
    ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
    int (*iopoll)(struct kiocb *kiocb, struct io_comp_batch *,
            unsigned int flags);
    int (*iterate) (struct file *, struct dir_context *);
    int (*iterate_shared) (struct file *, struct dir_context *);
    __poll_t (*poll) (struct file *, struct poll_table_struct *);
    long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
    long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
    int (*mmap) (struct file *, struct vm_area_struct *);
    unsigned long mmap_supported_flags;
    int (*open) (struct inode *, struct file *);
    int (*flush) (struct file *, fl_owner_t id);
    int (*release) (struct inode *, struct file *);
    /* For the sake of brevity I’ll leave out the rest. You get the idea. */
} __randomize_layout;
This structure is full of function pointers! That’s because the purpose of this structure is to define the behavior of a particular file operation on your “file.” In the case of a “normal” file (i.e., an image or a text file), the read function asks the filesystem for the data contained in that file. In the case of a pipe, the read function asks the kernel to look up where the pipe buffer is stored in memory and tries to read any data that might be in the pipe. For a socket, the read function goes into the networking subsystem to retrieve data that has come across the socket, based on the protocol.
In each case, there’s a distinct code path in the kernel that’s taken in order to handle the read call, or whatever operation you’re performing. In reality, if you were to write your own kernel module to interact with your own device, you could define read to write data somewhere and write to turn on the toaster. The kernel module creator gets to decide what these functions do in the context of their device. To see this in action, let’s follow some code until it gets to the point where it splits off into the object-specific behavior.
A pleasant stroll through the kernel
One of the great things about working with Linux is that it’s open source. For anyone who wants to understand how operating systems work, the Linux source code is a great resource (albeit a complicated one) for learning operating system internals. Our journey today will begin with the file located at fs/read_write.c. When your userspace code calls the read function, this is the place in the kernel where the code path begins (more or less).
SYSCALL_DEFINE3(read, unsigned int, fd, char __user *, buf, size_t, count)
{
    return ksys_read(fd, buf, count);
}
From there, we call ksys_read(). This function is responsible for retrieving the struct fd that corresponds to the file descriptor passed in by the user. The struct fd holds a pointer to the struct file, which in turn points to the struct file_operations structure.
ssize_t ksys_read(unsigned int fd, char __user *buf, size_t count)
{
    struct fd f = fdget_pos(fd);
    ssize_t ret = -EBADF;

    if (f.file) {
        loff_t pos, *ppos = file_ppos(f.file);
        if (ppos) {
            pos = *ppos;
            ppos = &pos;
        }
        ret = vfs_read(f.file, buf, count, ppos);
        if (ret >= 0 && ppos)
            f.file->f_pos = pos;
        fdput_pos(f);
    }
    return ret;
}
ksys_read then calls vfs_read, and this is where it begins to get interesting. Note the vfs at the beginning of the function name. This indicates that we are entering the virtual file system portion of the kernel. vfs_read is responsible for looking up the appropriate file_operations structure so that it knows which read implementation to call.
ssize_t vfs_read(struct file *file, char __user *buf, size_t count, loff_t *pos)
{
    ssize_t ret;

    ... // Skip stuff for brevity

    if (file->f_op->read)
        ret = file->f_op->read(file, buf, count, pos);
    else if (file->f_op->read_iter)
        ret = new_sync_read(file, buf, count, pos);
    else
        ret = -EINVAL;

    ... // Skip stuff for brevity

    return ret;
}
The line file->f_op->read is the call that branches off into many different paths in the kernel, because its target depends on who defined the function. There’s also a check to see if the read_iter function exists. If read is not defined but read_iter is, then the kernel calls new_sync_read, which will eventually call file->f_op->read_iter(). Let’s look at a few examples scattered around the kernel of where these functions get defined.
Pipes (/fs/pipe.c)

const struct file_operations pipefifo_fops = {
    .open = fifo_open,
    .llseek = no_llseek,
    .read_iter = pipe_read,
    .write_iter = pipe_write,
    .poll = pipe_poll,
    .unlocked_ioctl = pipe_ioctl,
    .release = pipe_release,
    .fasync = pipe_fasync,
    .splice_write = iter_file_splice_write,
};
Network socket (/net/socket.c)

static const struct file_operations socket_file_ops = {
    .owner = THIS_MODULE,
    .llseek = no_llseek,
    .read_iter = sock_read_iter,
    .write_iter = sock_write_iter,
    .poll = sock_poll,
    .unlocked_ioctl = sock_ioctl,
#ifdef CONFIG_COMPAT
    .compat_ioctl = compat_sock_ioctl,
#endif
    .mmap = sock_mmap,
    .release = sock_close,
    .fasync = sock_fasync,
    .sendpage = sock_sendpage,
    .splice_write = generic_splice_sendpage,
    .splice_read = sock_splice_read,
    .show_fdinfo = sock_show_fdinfo,
};
Shared memory (/mm/shmem.c)

static const struct file_operations shmem_file_operations = {
    .mmap = shmem_mmap,
    .get_unmapped_area = shmem_get_unmapped_area,
#ifdef CONFIG_TMPFS
    .llseek = shmem_file_llseek,
    .read_iter = shmem_file_read_iter,
    .write_iter = generic_file_write_iter,
    .fsync = noop_fsync,
    .splice_read = generic_file_splice_read,
    .splice_write = iter_file_splice_write,
    .fallocate = shmem_fallocate,
#endif
};
USB (/drivers/usb/core/devio.c)

const struct file_operations usbdev_file_operations = {
    .owner = THIS_MODULE,
    .llseek = no_seek_end_llseek,
    .read = usbdev_read,
    .poll = usbdev_poll,
    .unlocked_ioctl = usbdev_ioctl,
    .compat_ioctl = compat_ptr_ioctl,
    .mmap = usbdev_mmap,
    .open = usbdev_open,
    .release = usbdev_release,
};
TTY (/drivers/tty/tty_io.c)

static const struct file_operations tty_fops = {
    .llseek = no_llseek,
    .read_iter = tty_read,
    .write_iter = tty_write,
    .splice_read = generic_file_splice_read,
    .splice_write = iter_file_splice_write,
    .poll = tty_poll,
    .unlocked_ioctl = tty_ioctl,
    .compat_ioctl = tty_compat_ioctl,
    .open = tty_open,
    .release = tty_release,
    .fasync = tty_fasync,
    .show_fdinfo = tty_show_fdinfo,
};
Notice how each device defines its own read/read_iter, write/write_iter, and other functions. It’s those functions that ultimately define what it means to read from or write to such a device.
If you run grep -R "static const struct file_operations" * in the kernel source directory, you’ll find hundreds of these file_operations structures across the kernel’s subsystems. As new devices are added to the kernel, each one defines its own struct file_operations to specify the behavior of the device. Note, however, that not all functions need to be defined. A device can decide that it doesn’t support certain operations, usually because the operation doesn’t make sense in the context of that device. For example, a driver interacting with a mouse has no need to implement operations such as flush or mmap.
What does this have to do with security?
Well, this is insightful and all, but I’m sure you’re wondering, “What does all this have to do with stopping the bad guys?” Good question. The answer is that there are a lot of ways to do the same thing on a Linux system, whether you’re a legitimate user or an adversary, and we need to make sure we aren’t blind to an attack vector when we’re monitoring a system.
Sending data on a TCP socket
Initiating network connections is something that adversaries almost always do, whether performing command and control (C2), exfiltrating data, or installing a backdoor. As defenders, we know this, and so we closely monitor network connections on a given machine. One approach to detecting the creation of a TCP socket is to monitor calls to things like socket, bind, connect, accept, send, and recv. However, from what we learned today, we know that instead of calling send and recv, an adversary could simply call read and write (or their vectored counterparts, readv and writev), because a socket is just a file. So now we have to monitor those functions as well, and they are very noisy. A better approach is to monitor the calls you’re interested in inside the networking subsystem itself, after the request has passed through the virtual file system. This reduces the amount of noise and improves monitoring performance.
This is just one of many tactics adversaries may employ to evade modern threat monitoring tools. Fortunately, Red Canary has developed Linux EDR with all of this in mind, and we take careful precautions to not be fooled by these kinds of tactics.
Every file everywhere, all at once
The design philosophy of “everything is a file” has proven to work quite well. It has survived for decades and has stayed the same at its core. New file operations have been created over the years to support new device types and new ways of using them, but the basic design of this portion of the VFS remains intact.