Jimmy Astle, our Senior Director of Validation and Data Science, recently joined Chris Steffen on EM360’s The Next Phase of Cybersecurity podcast to discuss how Red Canary has leveraged generative AI (GenAI) internally, how to train language models on the specifics of your organization’s security posture, and how this technology will evolve in the future.
You can listen to the full recording below or read the transcript, which has been edited for clarity.
Chris Steffen:
Hello and welcome to this EM360 podcast. My name is Chris Steffen and I’ll be the host for today’s podcast. I’m the Vice President of Information Security Research for Enterprise Management Associates, an analyst firm that looks at how companies manage their IT infrastructure from the data center out to the user environment into the cloud. And I obviously focus primarily on information security. Really looking forward to today’s podcast.
Today we’re talking with Jimmy Astle, Senior Director of Validation and Data Science at Red Canary. We’re going to talk about generative AI and specifically the practical applications of generative AI. Jimmy, thanks for joining me today. If you would give us a little bit about yourself before we get started.
Jimmy Astle:
Yeah, thanks for having me, Chris. Excited to be here. Excited to geek out with you. I’ve been at Red Canary for almost three years now, and I now lead our data science and validation initiatives. Prior to that, I led our threat research team as well as validation and data science, and we’ve since forked those off. I spent the early part of my career, about 10 years, working for the Department of Defense on offensive and defensive capabilities, doing experiments and validation work. So it led me right into where we are today, and I’m excited to geek out more on this.
Chris Steffen:
Absolutely. I’m looking forward to it too. We had a pre-conversation before we got started with this podcast and quite bluntly, we should have just recorded that conversation. It was fantastic. Definitely gonna nerd out here for the next 15-20 minutes and hopefully give you guys some pretty interesting feedback when it’s all said and done.
Talk to me a little bit about GenAI and what Red Canary is doing in general, but also what your thoughts are about what the industry is doing and how GenAI is going to play a part in the information security realm.
Jimmy Astle:
Yeah, that’s the million-dollar question, right?
Chris Steffen:
For everybody, right? And just so we’re all clear, this is going to be center square at any of the conferences you go to this year. This is what we’re talking about, and it’s what everybody else is talking about. And we’re not talking about it just to hear ourselves talk. We’re talking about it because this is one of those paradigm-shifting technologies that’s really going to make an impact on how you approach pretty much everything in your environment.
Jimmy Astle:
Yeah, I agree. You know, I think the simplest way to explain this, and why we’re seeing this reaction to it, is that for the first time in the history of AI or ML, this technology has gone from a very scientific, researcher-focused discipline, where you needed a lot of expertise and PhDs to get it working and use it to your advantage, to a practitioner- or engineering-led technology where you can just use simple natural language and a chat interface and do things that were unheard of years ago. Now you can just put in data, ask it how to fix your car, all these sorts of things, and it will just do it for you. You can see where this application, when you layer it into small incremental steps across your daily life, whether it’s at work or in your personal task list, is going to be a huge improvement.
The big thing is the 5 percent rule: When you apply generative AI to something and you don’t get more than a 5 percent improvement out of it, don’t be upset. Those 5 percent improvements in efficiency add up over time, especially when you apply them to different things.
Chris Steffen:
They really do. I hear all kinds of interesting use cases, and we’ll talk about that in a minute, but think about just the simple tasks that GenAI has the ability to do. There’s a use case that I’ve been hearing more and more about, and I love it. As practitioners, we’re responsible for giving information to our executive team and distilling a common CVE into something that actually sounds like, you know, pointy-haired-boss speak. You can dump a CVE into a GenAI model and say: Make this sound like something my not-nerdy boss can understand. And it does. Then you can say, this is why it’s going to impact our company, and it’ll do that too. That’s a really simple way of using GenAI.
But think about it from this perspective as a practitioner: how much do you hate doing that? You basically want to pick up the CVE and get to work on it. This is what I need to do to solve the problem, and here are exactly the steps laid out for me by whatever detection solution I have. That’s what I want to do. I don’t want to sit there and hand-hold my board of directors through what a CVE means and why we need to do these kinds of things. If I have an AI bot that does that for me, and does it in a way that gets even 50 percent of the way there, much less 75, maybe even 80-90 percent of the way there, think of how much brain damage that saves me.
So that’s just one, right? Talk to me a little bit about some of the other applications that you’re seeing for a GenAI perspective.
Jimmy Astle:
For folks that aren’t super familiar with Red Canary, we’re a managed detection and response provider. We have an entire platform where you send us lots of data and we will detect threats in it and we integrate with your typical security tools. And so as you can see, when you have lots of data, plus security problems, artificial intelligence and machine learning is a great way to target some of those. About a year ago, we started experimenting with these foundational models from OpenAI.
And like anything, when you start to dip your toes in these, you need to get a feel for the model. I know that sounds kind of weird, but you need to start prompting it and saying like: Hey, do you know about Red Canary? Do you know about cybersecurity? Do you know about this type of threat? Do you know what a CVE is and see what the model says? And if the model comes back with accurate answers, you can start to make assumptions in your prompting when you’re using those models if it already knows about these things. So I can ask it more direct questions.
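For readers who want to see what that kind of probing looks like, here’s a minimal sketch assuming the openai Python SDK; the model name and the probe questions are placeholders, not Red Canary’s actual setup.

```python
# A minimal sketch of probing a foundational model to see what it already
# knows before you assume that knowledge in later prompts. The model name
# is a placeholder; swap in whatever deployment you actually use.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

probe_questions = [
    "Do you know what Red Canary is?",
    "Do you know what a CVE is?",
    "When does your training data end?",
]

for question in probe_questions:
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": question}],
    )
    print(f"Q: {question}\nA: {response.choices[0].message.content}\n")
```

If the answers come back accurate and current, you can write shorter, more direct prompts later; if not, you know you need to supply that context yourself.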
Chris Steffen:
Let me stop you right there. One of the things that people don’t understand about GenAI in general is that it’s really only as good as the training you give it. It’s as good as the data you give it. So let’s just use the example you were using. I have been around Red Canary since the very get-go, and I can tell you that you guys have shifted over time as to what Red Canary does and what your strengths are, and so on and so forth. As well you should, right? I mean, you evolve as a company. I could train your GenAI model on an archived version of your website from 8-10 years ago, and it would paint a very different picture. But just to be very clear, when you take a ChatGPT model, that’s exactly what it was trained on. Understanding that, and understanding that it may not have the most up-to-date information, is critically important.
If not, all you’re going to go off of is something that is two, three, five, even 10 years old. And that’s not really how you want to start. So training that model, getting that baseline, as you mentioned, is critically important.
Jimmy Astle:
You can actually ask the model, Hey, when was your training date? And it will come back and tell you when it was. That’s another good way to discover and get comfortable with these. To your point, Chris, about how we’re thinking about using it, analysts are overloaded with information every day in their work. They have too much to do and not enough time to do it. And so one of the things that generative AI is really good at is summarizing information. Your initial example was spot on.
But imagine making that experience custom-tailored to the persona behind the keyboard. In your case, if I’m an analyst, I want to send a report to my board, right? But what if you are a security analyst and you’re in charge of investigating an event? What if you can augment the knowledge of that model with your own data at runtime or at inference time? That’s when you go and execute the model. That’s what we’re working on here where we have these incredible amounts of data stuffed away in our data lakes.
And so given an event, we can go pull all of the enrichment information that we know about across our vast amounts of data, clean it all up, format it all up and enrich it, and then shove it into one of these models and help it explain what’s going on. Offer insights on what it’s doing. Baseline user behaviors. It can do all of that. Again, you need to keep your use cases narrow. But when you add these up or chain them together, you get some incredible experiences that really do help make your jobs easier.
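As an illustration of that enrich-then-explain flow, here’s a rough sketch. The fetch_enrichment() helper, its fields, and the model name are all hypothetical stand-ins for whatever actually lives in your data lake.

```python
# Sketch of enriching a security event at inference time and asking a
# model to summarize it. fetch_enrichment() and the event fields are
# hypothetical stand-ins for real data lake queries.
from openai import OpenAI

client = OpenAI()

def fetch_enrichment(event: dict) -> dict:
    """Placeholder: pull related telemetry (process tree, user history,
    asset info) for the event from your data lake."""
    return {
        "process_ancestry": "explorer.exe -> powershell.exe -> rundll32.exe",
        "user_baseline": "user normally runs Office apps, no PowerShell history",
        "asset": "finance workstation, Windows 11",
    }

def summarize_event(event: dict) -> str:
    context = fetch_enrichment(event)
    prompt = (
        "You are assisting a security analyst. Using only the context below, "
        "explain what is happening and whether it deviates from the baseline.\n\n"
        f"Event: {event}\n\nContext: {context}"
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(summarize_event({"alert": "rundll32 making external network connections"}))
```

The narrow scope is the point: one event, one enrichment pass, one summary, which is easier to validate than a single giant prompt that tries to do everything.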
Chris Steffen:
Yeah, you’re talking to an original firewall monkey, right? And so, you know, sitting there and parsing allows and denies for a hundred billion zillion packets. It’s the nature of what it is. I promise you that an AI bot can do that a billion times faster and a billion times more efficiently than this bearded wonder, right? Because you get brain dead after, you know, the first 500 packets. It doesn’t change the fact that you’ve got to go through firewall logs and you’ve got to do those things.
But taking that process and handing it off to some kind of AI, again with strict guardrails, that’s what it’s made for. That is the absolute epitome, the best-case scenario: doing the log parsing and getting those kinds of things done so you can free up that very trained CISSP guy with all the letters behind his name to go do something more valuable.
Jimmy Astle:
Oh yeah.
Chris Steffen:
Incident response takes a lot of human interaction when it’s all said and done. That’s the kind of thing where you do not want that person parsing firewall logs.
Jimmy Astle:
Yeah, that’s right. It really will take the busy work out of what we do every day. And as you know, in the security analyst world, a lot of the decision making, there’s a ton of upfront costs that happens there, right? You invest in SIEMs and store products to help automate or correlate and bring that data together so you can make decisions. And so this technology specifically is going to have a significant influence on our internal processes and the different flows they go through.
And maybe at different stages, we get different summaries or say: Hey, let’s cut it off here and just do something. As an example, identity detection and response is the big thing nowadays. I think traditionally we’ve always focused on the endpoint because that’s where the strongest signal was, and that’s where tools were really mature. But now, you know, I always like to say this internally: computers don’t infect themselves with malware, people do. What if I can start profiling those people?
I can start to give you more insights into the risk of that person. And that’s really where we’re heading as an industry, because of the SaaS world and all the end-to-end connectivities that we have. And so what we’re seeing is you can feed these models tons of login data for Jimmy, for example.
And I want to pause there for a second. We’re not using ChatGPT; that’s an interface wrapped on top of foundational models. We’re using our own foundational models inside of Azure that are private to us in a walled garden. There are limits to privacy with the public tools, and you should absolutely be aware of that. But when you’re using tools through Microsoft, AWS, or Google, their cloud-based services, that’s where you get that added protection of privacy and all of the different layers they bring for securing data inside of their data centers. So we’re using Azure and AWS as the foundational piece to do this analysis.
You can give these foundational models Jimmy’s logins and some of the characteristics of his logins and say: Give me a baseline of what Jimmy typically does in a week. And these models will come back and tell you your baseline. You don’t need to go write custom code or custom tools. This is one of the most important pieces, because as a security analyst you get an alert that says Jimmy just logged in from California and it’s unusual. I can tell you right now, for your listeners, that one of the most monotonous tasks in a SOC right now is running down impossible travel, improbable travel, and unusual login characteristics, because you have to. Humans are unpredictable, right? You just do different things.
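Here’s a hedged sketch of what that baselining could look like against a privately hosted deployment; the Azure endpoint, deployment name, and login records below are all hypothetical, and real data would come from your identity provider’s logs.

```python
# Sketch of asking a privately hosted model to baseline a user's logins.
# The Azure resource, deployment name, and login records are hypothetical.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://example-soc.openai.azure.com",  # placeholder endpoint
    api_key="...",                                          # load from a secret store
    api_version="2024-02-01",
)

logins = [
    {"user": "jimmy", "time": "2024-04-22T13:05Z", "city": "Boston", "device": "laptop-01"},
    {"user": "jimmy", "time": "2024-04-23T13:12Z", "city": "Boston", "device": "laptop-01"},
    {"user": "jimmy", "time": "2024-04-26T03:44Z", "city": "San Jose", "device": "unknown"},
]

prompt = (
    "Here is a week of login events for one user. Describe the user's typical "
    "pattern, then flag any events that deviate from it and explain why.\n\n"
    f"{logins}"
)

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder deployment name
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```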
Chris Steffen:
Yeah, it’s the corner case, right? And so you want the abnormal thing that is likely permissible, but still abnormal. You want it to pop out. Your CIO goes on a trip to Japan for whatever the reason is, and that’s great. All of a sudden you see that he’s logging in. And despite the fact that he may have every exception to the rule—because we know that executives have exceptions to every rule—it pops up saying something is weird. “Weird” being a technical term, and we need to address it. If it’s okay, then we’re good, but we need to at least find out if we know that he’s there. He’s at a conference. Everything’s good. He should be accessing all that. And then you move on, but that is what the AI is for, right? It goes out there and out of a bucket of a million events, it says these two things are weird. From weird, you can run a playbook that says when something is weird, do this, or it could be that you limit that access or pass it off to Jimmy to deal with. Okay, great. That’s fine, because Jimmy didn’t have to look at the other 999,998 events to find what was weird.
Jimmy Astle:
That’s right. Jimmy didn’t have to be an expert in understanding how to query Splunk, right? Or an Elastic search. The AI can just go do that. If you explain how to query data to it, you give it some example queries and you say: Here’s the information. Can you generate a query? It will do that for you.
Chris Steffen:
You brought up something that’s really important that people are really skittish about: intellectual property and sensitive data being dumped out to the AI. And I know that you guys aren’t doing that. I just wanted to re-emphasize that when you’re writing your term paper for your high school class, if you happen to be listening to this podcast, congratulations to you. If you’re one of those guys, fine, use GenAI, use ChatGPT, use Gemini, whatever you’re going to use. That’s a great use case for that.
If you’re playing with the big boys and you’re trying to take and protect your information and protect your clients and whatever else, by bringing this in-house you have control of that intellectual property, this is becoming the standard. And you’re going to see it more and more. When you start thinking about the key foundations for using AI in your security operations…give me a couple of those. I mean, we talked a little bit about protecting the data. Talk to me about a few more of them.
Jimmy Astle:
I think reporting is huge. You can have those tailored reports. For an analyst, I think about alert review. Alerts come in, you enrich a few of them and you kind of say: Hey, here’s what’s going on.
Also, intelligence research. A lot of times you get requests for intelligence or Hey, this latest news story for insert-your-vendor-firewall has a CVE or an RCE. What do you do about it?
You can build AI in such a way today with open source tools where, instead of going to ChatGPT, you build your own custom interface. You can control exactly how the generative AI executes. When it runs, you can tell it: if someone asks you to search the internet, go do that search, summarize it, reflect on your response, and take your time.
Those are some of the big pieces, but what I’m digging into here is that when you build your own AI workflows, you eliminate a lot of risk of hallucination because you’re controlling those prompts. You’re controlling exactly how the AI goes from point A to point B. The kinds of things you see in news stories, where the AI did something unexpected or somebody put sensitive data into it, can still happen. But you really need to start thinking about building these custom workflows just like any other application you would write in the enterprise. You wouldn’t just take something off the shelf or some open source thing, spin it up, and run it. You’re going to customize it, you’re going to build your enterprise workflows into it, and that adds a layer to the software stack, if that makes sense.
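To make the “you control exactly how it executes” point concrete, here’s a minimal sketch of a workflow where the model is allowed exactly one tool, a web search, and the calling code decides how the result is used. The web_search() function is a placeholder and the model name is an assumption, not Red Canary’s implementation.

```python
# Minimal sketch of a controlled tool-use workflow: the model can only
# request the one tool we expose, and our code runs it and feeds the
# result back. web_search() is a placeholder for a search API you trust.
import json
from openai import OpenAI

client = OpenAI()

def web_search(query: str) -> str:
    """Placeholder for a real search integration."""
    return f"(search results for: {query})"

tools = [{
    "type": "function",
    "function": {
        "name": "web_search",
        "description": "Search the internet for recent information.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

messages = [
    {"role": "system", "content": "Only call web_search when the user explicitly asks "
                                  "you to search. Summarize the results and take your time."},
    {"role": "user", "content": "Search the internet for the latest RCE in <vendor> firewalls."},
]

response = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
msg = response.choices[0].message

if msg.tool_calls:  # the model asked to use the one tool we allow
    call = msg.tool_calls[0]
    args = json.loads(call.function.arguments)
    messages.append(msg)
    messages.append({"role": "tool", "tool_call_id": call.id,
                     "content": web_search(args["query"])})
    final = client.chat.completions.create(model="gpt-4o", messages=messages)
    print(final.choices[0].message.content)
else:
    print(msg.content)
```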
Chris Steffen:
It does. Yeah, it really does. We can talk about this all day. So I have one last question before we wrap up on this podcast. We’ve talked about the past. We talked about what we’re doing today. Let’s talk a little bit about the future and how you think that GenAI is going to be used to improve security in general. I know that I have ideas. I’m sure that Red Canary has some ideas. Talk to me just in general, what do you think AI is going to be used for?
Jimmy Astle:
I think that we’re going to have custom-built foundational models that encompass your company’s knowledge; you can take all the characteristics of your company security data, whatever, and put it in a custom-built model. And I think that you’re going to have AI talking to AI. A lot of the investigation is interfacing with some other company’s model or their brain and it telling you what it sees what’s going on.
But I think that’s probably five-to-eights years out. I think in 2024, a big thing that we’re going to see is we’re going to see a big uprising of open source models being used instead of closed source models. An example of an open source model would be Meta’s Llama 3. It was released about a week ago and it’s already performing as good, if not better than GPT 3.5, which is huge. And this is free for anyone to use.
I think you’ll see impacts to the Linux kernel and the operating system world. I think you’re going to see Meta’s foundational model become a core foundation of what a lot of companies build their AI tools off of. On top of that, you’ll see agent workflows. This is where you start to build out very specific use cases and you tell the AI you have these tools to use.
If one of those conditions is met, go use that tool and come back with the result. And then after that, I think you’re going to start to see fine-tuned models built specifically for security tasks in the SOC. And you’ll see startups coming out providing these.
The other thing that we didn’t go too much into is APIs. Every security tool in the world has an API. It turns out that LLMs can be fine-tuned to talk to APIs through natural language; the OpenAI models are.
You can interface with the API of a product, bring all that information in, and now you’re getting this intimate experience. You can see how these building blocks are stacking up. And as they stack up, we’re going to get to a point where the autonomous SOC becomes a real conversation. I know a lot of people say it’ll never happen. I wouldn’t say never; it’s just not going to happen overnight.
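To illustrate the API point, here’s a sketch of wrapping a hypothetical security product endpoint so a model could call it through the same function-calling pattern shown earlier; the URL, parameters, and product are made up, and the requests library is assumed to be available.

```python
# Sketch of exposing a (hypothetical) security product's REST API to a
# model as a tool. The endpoint, fields, and auth are placeholders; the
# model only ever sees the constrained schema, never raw credentials.
import requests  # assumed available

def get_alerts(severity: str, hours: int) -> list:
    """Placeholder wrapper around a product's alerts endpoint."""
    resp = requests.get(
        "https://api.example-security-tool.com/v1/alerts",   # hypothetical URL
        params={"severity": severity, "since_hours": hours},
        headers={"Authorization": "Bearer ..."},              # from a secret store
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

# Tool schema handed to the model, following the function-calling pattern above.
get_alerts_tool = {
    "type": "function",
    "function": {
        "name": "get_alerts",
        "description": "Fetch recent alerts from the detection platform.",
        "parameters": {
            "type": "object",
            "properties": {
                "severity": {"type": "string", "enum": ["low", "medium", "high"]},
                "hours": {"type": "integer"},
            },
            "required": ["severity", "hours"],
        },
    },
}
```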
Chris Steffen:
Oh, I’ll never say never. No way. I mentioned this before, the idea that you can basically hand off all your log management and stuff to some kind of AI solution or aggregating solution. When I was doing that—I’m going to date myself—15-18 years ago, but the idea that I didn’t have to do any of that anymore, that was something that I dreamed about, but never thought that I was going to be able to do it. And now it’s so commonplace that it’s not even worth talking about.
Jimmy Astle:
What you’ll see is a lot of the technologies—the classical technologies where you throw your logs or your dashboards or your email solutions and things—those are all going to get generative AI and feed into them. So you won’t have to go out of your way anymore, it’ll just kind of happen. A lot of the Microsoft products, the Office suite especially has it all built-in now. And it’s kind of incredible when you start to kind of amend your workflows into using those. That’s what we’ll continue to see throughout the year, all of these incremental places where generative AI gets inserted when you start to use them. So as humans, we’re going to have to change the way we work and operate, but we’ll start to see those big advantages. Into the future, lots of automation and a lot of AI talking to AI with humans in the middle. I think a lot of new roles in the security operations center will emerge because they don’t exist right now, because we just don’t know how these tools will be applied or what the outputs of them are going to be.
Chris Steffen:
Well said. Thanks for nerding out with me, Jimmy, great insights today.