Be prepared: The key to cloud and enterprise incident response

Be prepared: it’s not just the Scout motto or a haunting musical number from The Lion King. The short refrain holds water as the cornerstone of an effective incident response program. From cloud incident response to enterprise-level readiness, Red Canary Principal Readiness Engineer Gerry Johansen shared his expertise with Catalin Cimpanu during a Risky Business News sponsored interview that hits the pressing topics in incident response today. Listen here, or read the interview transcript, which has been edited for clarity:

Playing with fire

Catalin Cimpanu

Gerry, we’re here to talk about incident response in this episode, and I’m going to be completely candid with you. If you would have asked me two or three years ago where incident response starts, I would have replied immediately: “the moment the company gets compromised,” which brings us to your job title of readiness engineer. Now, the more you think of it, the more it makes sense that of course you’re going to have a preparation phase for your incident response plans, but let’s hear it from an actual readiness engineer. What does this phase typically include?

Gerry Johansen

One of the misconceptions is that preparedness is strictly, “If I have an incident response plan, I have some associated playbooks. They sit on a shelf or they sit on a SharePoint site. We’re good. We take a look at it once a year and we’re prepared, we have a full understanding of how we can actually respond to an incident.”

Preparedness is much more than that: it is training and drilling on what you have. Think about something ubiquitous such as a fire department. Generally, every community has some sort of capability to deal with fire. It’s a high-impact event. The fire department doesn’t just keep the trucks clean and sit in the garage waiting for the alarm to go off.

A lot of the time, when firefighters are not actually engaged in firefighting, they are drilling. That ranges from individual skills, such as how they handle their equipment and how they work with specific tools, all the way up to coordination. You’ve got several different teams all doing an aspect of firefighting. And they all have to work together. They all have to communicate. They all have to make sure they’re coordinating actions, make sure they’re not working against each other. If we take that modality, that way of looking at things, and apply it to incident response, we have the same challenges. When you’re talking about a full-on enterprise-wide ransomware situation, you have teams that are doing containment and you have teams that are potentially restoring critical systems. You also have an analysis team that’s trying to pull evidence. All of these have to work in concert with each other. If you’ve never practiced that, if you’ve never even thought about how that all works or looked at your communication strategies, what ends up happening is things get a lot more muddied. They get a lot more complicated, and you have teams that are either working against each other or not working in concert.

So overall, preparedness, or readiness, is ensuring that those processes you’ve crafted, those plans, playbooks, and workflows that you’ve spent a lot of time on are actually usable and that you can execute them during an incident.

Catalin Cimpanu

From what you’re telling me, incident response and preparedness takes way more effort than the actual incident response execution. It’s definitely more than attending a training session and putting your playbook on the shelf somewhere. Are companies that come to you shocked by the effort they have to put into preparing? Are companies coming to you just for an easy way, like some sort of cheat sheet for the bad times, and then go, “Oh, wow! This is an ongoing effort that’s going to take place every week, every month, every year from now on.”

Gerry Johansen

I would say a lot of organizations are not necessarily shocked. I think there has been a change in a good number of business verticals, and even across organizational size, in recognizing that this is something they have to do. When they come to us, oftentimes it’s because of the expertise that we have in-house on various teams here. They recognize that this is something that they need to pay attention to. I wouldn’t say shocked, but when we start running through some of these exercises, when we start poking and prodding their processes, I think they do come to understand where they have failures, where they haven’t considered something. For example, you may be talking about offline backups and you realize the offline backups are stored in a safe where only one individual has the combination to it. Maybe it’s the network engineer or somebody at that level. You can access that individual Monday through Friday, 8 to 5, but if it’s Sunday morning at 2 a.m., do you even know who to go to? It’s those kinds of things that they really haven’t fleshed out in their processes that really do catch them off guard.

In terms of timing, there is an initial hesitation sometimes with the amount of time necessary. But when you think about how readiness can be structured and how that drilling and training can be structured, we’re talking maybe 20 minutes per week for individuals and maybe just 20-to-30 minutes once a month for teams. When you break it down like that, I think it’s a lot more palatable than saying we’re going to need five, six, seven or a full-day activity with everybody from your network team, your desktop team, your security team, communications team, legal team, and executive committee. When we pull all those, that’s when they tend to go, “Oh, we can’t support that.”

A lot of what we’ve been talking about is how to structure this, where we’re maximizing the training value with an eye on understanding that not every security operations team can take two or three days to run through drills once a month. Let’s break these out into tasks that we can look at 20-to-30 minutes per month, or 20-to-30 minutes per week. Over time, you’re building in that skill set, you’re building in that familiarization and the ability to execute without too much discussion or too much time to go from detection to response.

Making time for drills

Catalin Cimpanu

So is the time that needs to be allocated to incident response preparedness the most common complaint that you get? Do you hear anything else?

Gerry Johansen

You look at the math and go, “Hey, we can break this out where it is maximizing the amount of time.” I wouldn’t say it’s a complaint, but one of the other observations we get is that it’s very technical. There’s a lot of moving pieces when we’re talking—just even from the response standpoint—all of this coordination, all of this tooling, all of these processes.

So a lot of the time, when we start poking and prodding to say, “You need some work on crisis communications, you need some work on evidence collection or threat hunting at scale,” organizations tend to respond very quickly, “Oh, we have plans, we have playbooks, we have our plan up on the shelf, so that should be sufficient.” This is where they often will have that realization of how complex this can get, especially when we’re talking about something as serious in impact as an enterprise ransomware incident or data theft.

Catalin Cimpanu

Do you and the other Red Canary readiness engineers ever participate in the actual incident response, or do you come later or look at the report and see what new techniques, what new things you can incorporate in your materials going forward?

Gerry Johansen

Readiness aims to pull from the intel side of Red Canary. We have a very healthy, competent, well-regarded intelligence side to craft up these scenarios. In terms of working with customers that are actually experiencing a breach, that’s outside of the Readiness team. We have a number of different personnel within Red Canary who handle that. Often we’re pulling our inspiration for drills or full-on exercises directly from our intelligence profiles. Our intel folks work regularly with them. For example, last week we were talking about how we’re seeing a malicious LNK file associated with RedLine, and that makes it very realistic for the user and those teams going through there. So we take a lot of our intel about what we’re seeing from real-world scenarios and boil it down into specific tactics and techniques. We also leverage stories in the news so that we can say “This is something we need to think about.”

The way we structure these exercises and the way we structure readiness in particular is being very responsive to immediate needs. For example, if you look at a high-profile vulnerability or a high-profile threat actor, the way we work with our customers is taking something like that and turning it around in 48 hours to have something that they can actually exercise as part of their preparation for, potentially, a new threat actor that comes out trying a novel TTP set. This way we can be much more responsive.

This goes back to the time discussion. You can exercise that in 72 hours and work with some realistic scenario versus waiting six months for that tabletop exercise. In that time, the TTPs may have changed, or the threat actor may have moved on, or you may have actually experienced some real-world activity in that intervening time.

Heads in the cloud

Catalin Cimpanu

Having your brand updated in real time is not something I would have thought to have been possible, because you often think of IR playbooks as something that comes in every month and then the IT team takes another month to read them, and so on, and so on. I was wondering if the recent widespread adoption of cloud infrastructure is making IR easier or harder for you?

Gerry Johansen

I would say it’s more complex. When we’re looking at cloud infrastructure, it is something that organizations really do have to understand. What we have in terms of cloud is a lot more telemetry and data being pulled in. For example, you look at all of the log sourcing that Azure has, everything from the infrastructure, to how you access the infrastructure, to endpoint or applications that you’re hosting in Azure. There’s a lot of data there. What incident response has had to do in the last couple of years is understand what telemetry is available and actually how to analyze that telemetry. So that does make it a little bit more complex. It adds another layer or two of data that we would have to go through.

One of the things that the cloud has done (I wouldn’t say simplified) is allow organizations to conduct a lot of activities at scale. There are a lot of tools that run cloud natively or can be deployed in the cloud, for things such as remote evidence collection or telemetry aggregation through something like Elastic in the cloud or Splunk in the cloud. So it’s a double-edged sword; it does add a degree of complexity to what we need to do. We have to understand log sources. We have to understand what those log sources are telling us and extrapolate threat actor behavior through there. But it also gives us the ability to deploy tools and deploy solutions in near real time to an incident. It gives us that expandability, and we don’t have to deploy tools in the enterprise to get them to work. We can deploy tools through any number of cloud platforms and utilize that infrastructure as part of incident response. So going back to the very beginning of our discussion, as organizations adopt more of these infrastructures, whether it’s AWS or Azure, it’s worth thinking about this as a solution or a potential advantage if there is an issue.

Catalin Cimpanu

You previously mentioned Red Canary’s intel capabilities, which grant you access and a view of the threat actor landscape and trends. That’s quite unique. Is there something you see there that’s giving you headaches from a preparedness point of view? Is there a particular type of incident or threat actor that may be challenging to prepare for?

Gerry Johansen

At a macro level, the way that threat actors are adaptable is the headache. For example, when Microsoft limited the use of macros, that was a primary threat vector. It was macro-enabled Excel or Word documents and PowerShell, that combination of tooling. Very quickly, adversaries adopted new methods—whether it’s the mark-of-the-web bypass, using ISO files, LNK files, or HTML files—to get that initial foothold.

It becomes a game of whack-a-mole, for lack of a better term. Every time we hit that mole another one pops up. That is what every organization, not just Red Canary, is facing in terms of dealing with threat actors. Our intel team sees a commonality across a lot of different threats out there. We have our top 10 techniques that are leveraged, our top 10 tools. Qbot is in the news, that was a big one last year. We see a lot of very similar types of TTPs across a lot of different threat actors. What it tells us is there is a delta, or a fat middle of TTPs that we see a lot of threat actors leverage. When you have that deep insight and coordinate with other external sources, we’re seeing that commonality.

Catalin Cimpanu

When you have a new client, do you ever recommend tools or procedures? Or are there situations where you put emphasis on implementing procedures and then recommend various tools? Is that a common package when doing incident response?

Gerry Johansen

One of the advantages we have here is being rather tool agnostic in that we aren’t necessarily married to a specific tool. With a customer that’s coming to us that has an MDR solution, generally that’s a really good starting point for us to have. What we do is focused mainly on looking at processes, and where you will see us recommend tooling is if there is a significant hole in that process. Often we don’t need to make a recommendation on a specific tool, what we need to do is make recommendations on optimizing what they already have. A lot of times organizations come to us with a very, very good technology stack, so we may have a discussion during an exercise about command and control (C2) traffic. “Are you able to have visibility into north-south network traffic, traffic coming in, or ingressing or egressing your firewall?” And often organizations will say something to the effect of, “Well, you know, we have an excellent firewall, but we’re really not making use of deep packet inspection or doing some analytics around network connections.” So that’s looking at the process and saying, “Okay, well you have this tool, let’s optimize it, let’s put some energy and resources into making that tool work for you.”

Same thing with an EDR solution. They may say: “Well, we’ve never really worked with collecting evidence or collecting artifacts via the EDR tool.” Well, there’s your process. There’s a process that you need to develop and you need to build some muscle memory into that process to actually execute on that. Very rarely do I have to recommend a new solution. Often it’s about optimizing what they have, and tying a process around it so that they can execute it during the stress of an incident. There are rare circumstances where we will say there are options; there are open source solutions, in particular for organizations that are dollar conscious and do not have the budget, that will fill those gaps. But for the most part, if you have good processes and you can optimize your technology, you’re going to be much better prepared.

It’s not just about technical skills

Catalin Cimpanu

Gerry, I know you previously worked in law enforcement before joining Red Canary. Do you think there’s a particular type of personality or type of skills that incident response and especially preparedness engineers need to possess?

Gerry Johansen

If you’re going to take on the role of an incident leader or readiness specialist or engineer, a lot of it is understanding not just the technical aspects, but incident management. In the United States, the Federal Emergency Management Agency has specific training. It’s ICS 100 on Incident Command, so something like that. It’s the ability to, for lack of a better term, keep calm when the entire house is burning down around you. That is the major personality or skill set that you need to bring to this. You are going to be getting requests, pressure from all different organizations, all different personalities, and the ability to move through that is very helpful.

My law enforcement career in particular prepared me because it is a crisis-driven environment in a lot of ways, and I’ve had to deal with everything from a simple chemical spill on the highway to highly charged situations. The ability to stay calm is important. If you’re interested in learning about this, google incident management techniques, anything from the ICS 100 to a lot of other trainings out there. That will help you understand the interplay, how you’re supposed to communicate, how you’re able to divide out and assign tasks. It’s crisis project management. The major stumbling block that individuals who want to work in a readiness incident command role have is they’ll go through this training and then not drill it. That matters because this is something that you can’t really do under stress if you haven’t trained, drilled, and practiced those skills over the course of a year or two years before an actual incident.

Catalin Cimpanu

So it’s more communication skills than technical skills.

Gerry Johansen

Communication is the big one.

Catalin Cimpanu

I think that’s a great way to end it. Gerry, thank you very much for your time today.

Gerry Johansen

All right. Thank you very much.