In July 2024, a significant cybersecurity incident unfolded when a faulty update from CrowdStrike triggered widespread outages across Microsoft Windows systems, affecting millions of devices globally. As organizations scrambled to restore functionality, the outage highlighted critical lessons about the complexity of modern technology and cybersecurity infrastructure, notably the interconnected nature of our supply chain and our resilience to unexpected failures. In response to this incident and to foster greater collaboration among key stakeholders, Microsoft has announced a Windows Endpoint Security Ecosystem Summit, scheduled for September 2024.
This article serves as a timely resource for leaders and security professionals as they prepare for the summit and consider the lessons learned from the CrowdStrike outage. By re-examining the concept of “defense in depth” through the lens of security operations centers (SOC), this piece offers strategic insights on how to ensure continuous protection, even when primary defenses fail. These considerations are crucial not only for planning at the upcoming summit but also for fortifying cybersecurity frameworks in the future.
The necessity of a comprehensive incident response plan
The CrowdStrike incident highlights the importance of having a robust incident response plan that not only covers your organization but also considers dependencies on third-party vendors. The outage, which impacted approximately 8.5 million devices worldwide, underscores that even trusted partners can inadvertently become sources of significant disruption.
Leaders should ensure their incident response plans are comprehensive, covering not just internal systems but also the effects of third-party failures. A key component of any incident response plan is an effective communication strategy that covers both internal communications and communications to key external stakeholders, which may include customers, partners, the board of directors, and regulators.
Regularly updating and testing these plans is crucial, as is ensuring that roles and responsibilities are clearly defined.
Additionally, these plans should include detailed post-incident reviews to continuously improve response strategies.
A new way to think about defense in depth in the SOC
The concept of “defense in depth” traditionally emphasizes layered security controls that ensure if one layer fails, others remain to protect against threats. However, the recent CrowdStrike outage offers a unique perspective, especially for SOCs. When a primary control—like CrowdStrike’s endpoint protection—fails, SOCs must rely on additional controls and strategies to maintain visibility and ensure robust security operations. This situation requires SOCs to go beyond standard layered defenses and think dynamically about how they can continue to detect, prevent, and respond to threats when a key component of their security stack is taken out of play.
In this context, SOCs should adopt a multi-faceted approach that includes not just alternative detection mechanisms but also the ability to rapidly adapt their incident response strategies. For instance, if an endpoint detection and response (EDR) system fails, the SOC will need to pay additional attention to identity, network, and other controls to compensate for the lack of visibility.
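One way to make this compensation concrete is to keep a simple map of which controls cover which kinds of detection, then ask what is left when one control drops out. The sketch below is illustrative only: the control names, the detection categories, and the `coverage_after_failure` helper are assumptions made for this example, not part of any vendor product or published framework.

```python
from typing import Dict, List, Set

# Hypothetical coverage map: control name -> detection categories it covers.
# Both the controls and the categories are placeholders; a real SOC would
# build this from its own tooling inventory.
COVERAGE: Dict[str, Set[str]] = {
    "edr": {"malware_execution", "lateral_movement", "credential_theft"},
    "identity": {"credential_theft", "anomalous_login"},
    "network": {"lateral_movement", "command_and_control"},
    "email": {"phishing"},
}


def coverage_after_failure(
    coverage: Dict[str, Set[str]], failed: str
) -> Dict[str, List[str]]:
    """For each category the failed control covered, list the remaining
    controls that still cover it. An empty list marks a visibility gap."""
    remaining = {name: cats for name, cats in coverage.items() if name != failed}
    report: Dict[str, List[str]] = {}
    for category in sorted(coverage.get(failed, set())):
        report[category] = sorted(
            name for name, cats in remaining.items() if category in cats
        )
    return report


if __name__ == "__main__":
    for category, backups in coverage_after_failure(COVERAGE, "edr").items():
        status = ", ".join(backups) if backups else "NO COVERAGE"
        print(f"{category}: {status}")
```

With this sample map, losing the EDR leaves credential theft visible through identity controls and lateral movement visible through network controls, while malware execution has no backup. That last category is where a fallback control or extra analyst attention would be needed.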
System redundancy and resilience are critical components of a SOC’s defense-in-depth strategy. Relying solely on a single vendor or solution can create a single point of failure; a diversified approach, including failover mechanisms and alternative systems, can help mitigate the risks associated with such outages. For example, it may be prudent to maintain the ability to rapidly deploy alternative security tools, such as a fallback antimalware suite or a set of built-in endpoint controls that can be temporarily enabled to plug any gaps left by the failed control.
The critical role of expertise
The CrowdStrike outage also underlines the importance of integrating human expertise into the defense-in-depth strategy. Automated tools are essential, but so is the ability of security analysts to interpret data from multiple sources, particularly when a control fails or an expected data source goes missing. Training SOC teams to understand and use a critical cross-section of tools and data sources ensures they can continue to protect the organization effectively. This approach not only maintains security posture but also enhances the SOC’s overall agility, so that it can respond to evolving threats and threat vectors.
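Analysts can only compensate for a missing data source if they know it has gone quiet. As a minimal sketch, assuming the SOC records the time it last received an event from each telemetry source, a periodic check like the one below could flag sources that have been silent for too long. The source names, the 15-minute threshold, and the `find_silent_sources` helper are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


def find_silent_sources(
    last_seen: Dict[str, datetime],
    max_silence: timedelta,
    now: Optional[datetime] = None,
) -> List[str]:
    """Return the names of sources whose most recent event is older than
    max_silence, sorted alphabetically."""
    current = now if now is not None else datetime.now(timezone.utc)
    return sorted(
        source
        for source, timestamp in last_seen.items()
        if current - timestamp > max_silence
    )


if __name__ == "__main__":
    now = datetime(2024, 7, 19, 12, 0, tzinfo=timezone.utc)
    # Hypothetical last-event times per telemetry source.
    last_seen = {
        "edr_agent_events": now - timedelta(hours=6),
        "identity_provider_logs": now - timedelta(minutes=2),
        "network_flow_logs": now - timedelta(minutes=5),
    }
    silent = find_silent_sources(last_seen, timedelta(minutes=15), now=now)
    print("Silent sources:", silent)
```

A check like this does not replace analyst judgment. It only turns a missing feed into an explicit signal, so the team knows when to lean on its other tools.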
By rethinking defense in depth from a SOC perspective, organizations can better prepare for and mitigate the impacts of similar incidents, ensuring that they remain secure even in the face of significant challenges.
Learn more
By focusing on these lessons, organizations can better prepare for and manage similar crises in the future, ensuring they are more resilient and responsive to unexpected challenges. As you prepare, I encourage you to check out our Incident Response and Readiness Guide, which distills our learnings from countless IR engagements and tabletop exercises into helpful frameworks and recommendations.