Few security products on the market focus on Linux, the most prevalent operating system (OS) used in the cloud. Of the Endpoint Protection Platforms (EPPs) that offer a solution for securing cloud workloads, most treat Linux as a tertiary investment after Windows and macOS, so features, capabilities, and support for newer technologies such as Docker and Kubernetes continue to lag. Retrofitting a technology built primarily for end-user devices often produces suboptimal outcomes for customers. What’s more, most of the available Cloud Workload Protection Platforms (CWPPs) focus on the left side of the equation, that is, identifying risks and issues before they reach production, rather than monitoring workloads during runtime to identify threats. Ultimately, we’re left with few products that can deliver meaningful threat detection outcomes for cloud workloads.
In this post, I’ll explore what I see as the top seven reasons why so few businesses have been successful in protecting their cloud workloads, and offer up my perspective on the dynamics at play.
1. Linux attacks are poorly documented and not well understood
A well-accepted truth of cybersecurity is that you can’t defend against what you don’t know. This presents a problem for cloud workload security, as there are many unknowns around the Linux operating system. While much security research has documented attacks on Windows, macOS, and end-user devices, there are few documented cases of what happens once adversaries move from Windows/macOS to their intended target: the production cloud environment.
Because of this gap in public understanding, it’s difficult to know how adversaries are compromising Linux workloads, beyond bitcoin-miner examples, which say little about business impact or what’s possible. As a result, it’s hard for many to understand what is occurring in the wild and what the state of the art is for defense, detection, and response.
2. Cloud environments require a lot of cooks in the kitchen
Cloud workloads typically serve both internal and external customers and stakeholders; they’re the lifeblood of how businesses operate. The infrastructure involved requires multiple cooks to reason about resiliency, availability, performance, security, and more.
A typical IT workflow involves a single downstream customer: the end user. Cloud environments, by comparison, have multiple internal and external customers, including engineers, DevOps/infrastructure teams, site reliability engineers, and others. Getting the buy-in to touch, modify, or introspect all machines requires a lot of social capital and know-how from your security team. In other words, you need established relationships and the technical know-how to coordinate across teams so that everyone feels good about safety, maintenance, support, upgrades, etc.
Further, if the security department does not have autonomy in this decision and is largely an influencer, the question of budget responsibility often comes into play. Usually, the security team doesn’t have the budget to float the cost; yet considering the large sums engineering already pays to cloud providers like AWS, the additional cloud workload security spend amounts to a small fraction of total expenditures.
It’s an unfortunate truth that many security teams don’t have visibility into, or access to, cloud environments, as their charter is seen more as securing the company’s corporate assets, including end-user devices. While this dynamic is evolving rapidly, with more security staff writing code and gaining familiarity with cloud platforms such as AWS, for now it’s not uncommon to see many cloud workloads left unprotected.
3. Licensing complicates things
Some Linux kernel APIs are freely available with minimal-to-zero legal restrictions, while others fall under the GNU General Public License (GPL), with its copyleft requirements. At a high level, copyleft means that any use of GPL APIs likely results in your product being GPL as well, effectively rendering it no longer proprietary and giving competitors and the broader public the right to request the source code.
For intellectual property reasons, many vendors want their code to remain proprietary but also want to operate at the kernel (ring 0) level. To avoid the copyleft repercussions of the GPL, they hand-roll complex logic instead of using well-maintained APIs. This hand-rolled logic is a bear to maintain, and as a result these companies support only a limited number of Linux distributions, versions, and kernel versions. We see this across many EPP vendors, as well as a few CWPP vendors. The result is often customers deploying to only a portion of their environment, or cobbling together multiple solutions to achieve 100% coverage.
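To make the constraint concrete, here’s a minimal sketch of an out-of-tree kernel module (illustrative only; building it assumes the kernel headers and a standard Kbuild setup). The license string a module declares determines whether the kernel will let it use GPL-only exports:

```c
// hello.c - minimal out-of-tree kernel module (sketch)
#include <linux/module.h>
#include <linux/init.h>

static int __init hello_init(void)
{
	pr_info("hello: loaded\n");
	return 0;
}

static void __exit hello_exit(void)
{
	pr_info("hello: unloaded\n");
}

module_init(hello_init);
module_exit(hello_exit);

/* Declaring anything other than a GPL-compatible license taints the
 * kernel on load and, critically, makes every symbol exported with
 * EXPORT_SYMBOL_GPL() unavailable to this module. That is why
 * proprietary vendors end up hand-rolling logic that the GPL-only
 * kernel APIs would otherwise provide. */
MODULE_LICENSE("Proprietary");
```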
4. Linux is a diverse ecosystem
Windows and macOS are maintained by Microsoft and Apple, respectively. Each has a limited set of editions and versions that are supported at any given time. For example, macOS is currently at v10.15 (Catalina), with Big Sur coming out soon.
Linux, on the other hand, is different. The Linux kernel is open source, and Linux distributions are built on top of it, each maintained by its respective contributors, which can be private companies, foundations, and others. This results in a great diversity of Linux distributions, each with its own quirks, versioning, and requirements. Examples include CentOS, RHEL, Debian, Amazon Linux, SUSE, Ubuntu, and so on. Building support for all of these into a product, from both a technology and a customer-support perspective, can take time, depending on the architectural decisions made by the vendor.
The story gets more complicated…
Each Linux distribution ships specific versions of the Linux kernel, which means that certain versions of a distribution support particular APIs or features while other versions don’t. That doesn’t even touch on “cherry-picked” features backported from newer kernels. Ultimately, the code needs to account for both the Linux distribution AND the Linux kernel version. Additionally, each distribution version has its own opinion about how software runs as a service, ranging from sysvinit to upstart to systemd.
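As a rough illustration, here’s a hypothetical sketch of the kind of runtime probing this forces on an agent: checking the kernel release via uname(2) and the distribution via /etc/os-release (a systemd-era convention that older distributions may lack, requiring further fallbacks):

```c
// detect.c - sketch: why agent code ends up branching on BOTH the
// distribution and the kernel version before enabling a feature.
#include <stdio.h>
#include <string.h>
#include <sys/utsname.h>

int main(void)
{
    struct utsname u;
    if (uname(&u) == 0)
        printf("kernel release: %s\n", u.release); /* e.g. "5.4.0-42-generic" */

    /* /etc/os-release identifies the distribution on most modern
     * distros; older ones need other fallbacks (/etc/redhat-release,
     * lsb_release, and so on). */
    FILE *f = fopen("/etc/os-release", "r");
    if (f) {
        char line[256];
        while (fgets(line, sizeof line, f)) {
            if (strncmp(line, "ID=", 3) == 0 ||
                strncmp(line, "VERSION_ID=", 11) == 0)
                fputs(line, stdout); /* e.g. ID=ubuntu, VERSION_ID="20.04" */
        }
        fclose(f);
    }
    /* A real agent would gate features on both values, and still probe
     * for cherry-picked backports that version strings can't reveal. */
    return 0;
}
```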
To sum this all up: the Linux ecosystem is constantly evolving, with new Linux distributions, distribution versions, kernel versions, and new ways of managing software. As a result, it takes a concerted effort for vendors to build a cloud workload protection solution that benefits the broadest set of distributions and versions, so they can provide complete coverage for organizations both large and small. Doing so requires deliberate architecture and design decisions, extensive testing, ongoing support, and more. Further complicating this is the fact that many organizations are slow to upgrade their cloud infrastructure; many are still running older versions of Linux like CentOS 6 or Ubuntu 14.04, which commonly get left behind in support matrices.
5. Security is a moving target
With the evolution of cloud, virtual machines, and containers, the way workloads run keeps getting more complicated. Just as there are many different versions and distributions of Linux, there are many different container runtimes (e.g., Docker, LXC, Podman), many ways of orchestrating them (Kubernetes, AWS ECS, AWS EKS, etc.), and layers in between, such as service meshes (e.g., Istio, which is built on the Envoy proxy).
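To give a flavor of what supporting this diversity means, here’s a hypothetical sketch of the heuristics an agent might use just to figure out what it’s running inside. These are cgroup-v1-era checks (the /.dockerenv marker file, “docker” or “kubepods” entries in /proc/1/cgroup); cgroup v2 and newer runtimes need different probes, which is exactly the treadmill vendors are on:

```c
// where_am_i.c - sketch: heuristic runtime detection inside a workload.
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    /* Docker drops a marker file into its containers. */
    if (access("/.dockerenv", F_OK) == 0)
        puts("likely a Docker container");

    /* Under cgroup v1, the cgroup paths of PID 1 often reveal the
     * orchestrator; under cgroup v2 these paths look different. */
    FILE *f = fopen("/proc/1/cgroup", "r");
    if (f) {
        char line[512];
        while (fgets(line, sizeof line, f)) {
            if (strstr(line, "kubepods"))
                puts("likely a Kubernetes pod");
            else if (strstr(line, "docker"))
                puts("likely a Docker container");
        }
        fclose(f);
    }
    return 0;
}
```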
All of this means more effort, more code, and more support from the vendor’s perspective. And as these technologies evolve, or new ones are introduced, it’s hard for vendor products and services to keep up, especially if their core business is focused on serving the Windows and macOS market.
6. The stakes are high
Because cloud workloads serve both internal and external stakeholder/customer needs, the amount of compute used and the associated spend tend to be high. It’s not uncommon for organizations to have people or teams dedicated to squeezing every last drop of CPU/memory out of a machine prior to provisioning additional infrastructure at an added cost.
Remember that “hand-rolled, complex logic” I mentioned in reason no. 3? A simple code error there could result in a full system crash, which you don’t want for a whole slew of reasons; the one that matters most to the business side is cost. If any piece of software introduces latency or CPU/memory spikes, it can degrade the service’s performance and violate terms outlined in the Service-Level Agreement (SLA). That, of course, would make your customers angry and could be a blow to brand reputation, revenue, and beyond.
To help reduce these risks, any software used to monitor or protect cloud workloads needs to be performant (meaning low CPU and memory utilization) and must not impede the operations, systems, and services already in play. The challenge is that cloud workloads vary dramatically: some are computationally expensive (high CPU utilization), while others are memory intensive; some are bursty, while others are consistent; some handle 20,000 network connections per second (e.g., a proxy), while others perform expensive data computations (e.g., a data warehouse or machine learning).
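As a sketch of what “performant with safeguards” can look like, here’s a hypothetical self-watchdog: the agent measures its own CPU time via getrusage(2) and its resident memory via /proc/self/status, then sheds work once over budget (the threshold below is purely illustrative):

```c
// watchdog.c - sketch: an agent keeping itself inside a resource budget.
#include <stdio.h>
#include <sys/resource.h>

/* Resident set size of the current process, in kB, from /proc/self/status. */
static long rss_kb(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];
    long kb = -1;
    if (!f)
        return -1;
    while (fgets(line, sizeof line, f))
        if (sscanf(line, "VmRSS: %ld kB", &kb) == 1)
            break;
    fclose(f);
    return kb;
}

int main(void)
{
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);

    /* User + system CPU time consumed by this process so far. */
    double cpu_s = (double)(ru.ru_utime.tv_sec + ru.ru_stime.tv_sec)
                 + (ru.ru_utime.tv_usec + ru.ru_stime.tv_usec) / 1e6;

    long rss = rss_kb();
    printf("cpu used: %.3fs, rss: %ld kB\n", cpu_s, rss);

    /* A real agent would sample this on a timer and shed work (lower
     * sampling rates, disable expensive features) once over budget.
     * The 100 MB ceiling here is purely illustrative. */
    if (rss > 100 * 1024)
        puts("over memory budget: shedding load");
    return 0;
}
```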
These dynamics create an environment where few products can succeed; products need to take the time and effort to collect, measure, and transparently report performance metrics, in addition to building in safeguards against unexpected behaviors. This is why it’s hard for EPP vendors to retrofit their technology, brand it as CWP, and successfully meet customer expectations.
7. Deploying software is still cumbersome
Whether it’s end-user devices, datacenter servers, or cloud workloads, deploying or managing software on any of these is a cumbersome process. Multiple people, teams, and processes need to be involved to help ensure things go right. With Windows and macOS environments, customers commonly use something like JAMF or SCCM to deploy, configure, and update software.
With cloud, there are a lot of opinions, tools, and ways of doing this:
- “Golden image”—you “bake” the software into the image that’s deployed
- Configuration management—software is managed with tools like Chef, Puppet, Ansible, Salt, etc.
- Container orchestration—you deploy as a containerized application (via Kubernetes or managed services like AWS ECS or AWS EKS)
With companies commonly having more than one cloud account, each dedicated to a use case or business unit, it’s not uncommon to see multiple ways of managing and deploying software within a single company. One of those ways tends to be manual deployment, because the company hasn’t allocated the people and time required to support a deployment methodology for every cloud account.
Cutting through the fog
These are just seven of many reasons why it’s challenging to secure cloud workloads. While I may have painted a slightly grim picture of the state of cloud workload security today, the rapid pace of innovation in this space means that the landscape is constantly changing. We have the opportunity to learn from these challenges to create better solutions.