On Friday 19th July 2024, a routine update to a widely used endpoint security platform triggered an unexpected global outage, impacting numerous organisations that rely on the platform's advanced threat intelligence and endpoint protection capabilities. This incident not only underscores the complexities inherent in software updates but also highlights the critical need for robust contingency planning.
The outage, which lasted several hours, rendered endpoint protection services inactive and left many businesses facing the Blue Screen of Death (BSOD). Engineering teams worked quickly to identify and resolve the issue, eventually restoring full functionality via a fix that had to be deployed to affected devices.
Key Learnings from the outage:
- Risks of software updates:
While updates are essential for enhancing security and functionality, they can also introduce new faults or vulnerabilities. This incident illustrates the potential risks associated with even minor changes to critical security software.
- Importance of robust testing:
Thorough testing in a controlled environment is crucial before rolling out updates on a large scale. Simulating real-world conditions can help identify potential issues that may not be evident in initial testing phases.
- Effective communication:
The response included timely updates to affected customers, keeping them informed of progress and expected resolution timelines. Transparent communication is vital during such incidents to maintain customer trust and provide guidance on interim protective measures.
- Contingency planning:
The outage highlights the necessity for global and local firms to have contingency plans in place. Backup security measures and protocols for maintaining essential operations during a disruption can mitigate the impact of such incidents.
- Collaborative response:
Coordination with affected customers and the broader cybersecurity community facilitated a more efficient resolution. Collaboration and information sharing are key components of an effective response strategy.
Best Practices for organisations
In light of the outage, organisations should consider the following best practices to enhance their resilience against similar disruptions:
- Implement staging environments:
Although it would not have helped in this instance, it is good practice to test updates in a staging environment that closely mirrors the live environment before deploying them to production systems. This can help identify potential issues before they affect critical operations.
- Develop redundancy protocols:
Ensure that you have redundant systems and alternative security measures in place to maintain protection during outages or disruptions. This could include secondary antivirus solutions or manual monitoring processes.
- Establish clear communication channels:
Maintain clear communication channels with your security vendors to receive timely updates and support during incidents. Ensure that all relevant stakeholders are informed and can act quickly based on the latest information.
- Regularly review and update incident response plans:
Continuously refine your incident response plans to account for new types of disruptions. Conduct regular drills to ensure your team is prepared to handle unexpected outages effectively.
- Monitor vendor updates:
Stay informed about upcoming updates from your security vendors and evaluate their potential impact on your systems. If possible, schedule updates during periods of low activity to minimise disruption.
- Supply chain and vendor risk:
All organisations should assess and plan for cyber incidents that could affect their supply chains, and ensure that their partners, vendors and suppliers do the same.
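For updates whose rollout an organisation controls itself, the staging and scheduling practices above can be combined into a phased rollout: the update goes to a staging group first, then a small pilot group, and reaches production only if each earlier group stays healthy. The Python below is a minimal sketch of that idea, not a description of how any vendor deploys updates. The names `Ring`, `phased_rollout`, `apply_update` and `is_healthy`, and the host names in the example, are all hypothetical placeholders for whatever deployment and monitoring tooling is actually in use.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Ring:
    """A group of machines that receives an update together (hypothetical)."""
    name: str
    hosts: List[str]


def phased_rollout(
    rings: List[Ring],
    apply_update: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    max_failure_rate: float = 0.0,
) -> List[str]:
    """Apply an update ring by ring, halting when a ring looks unhealthy.

    Returns the names of the rings that were updated and passed the
    health check. Rings after a failing ring are never touched.
    """
    completed: List[str] = []
    for ring in rings:
        # Update every host in the current ring.
        for host in ring.hosts:
            apply_update(host)

        # Count hosts that fail the health check after the update.
        failures = sum(1 for host in ring.hosts if not is_healthy(host))
        failure_rate = failures / len(ring.hosts) if ring.hosts else 0.0

        if failure_rate > max_failure_rate:
            # Stop here so later rings (including production) stay untouched.
            break
        completed.append(ring.name)
    return completed


if __name__ == "__main__":
    # Hypothetical host names, ordered from least to most critical.
    rings = [
        Ring("staging", ["stg-01", "stg-02"]),
        Ring("pilot", ["pc-014", "pc-027"]),
        Ring("production", ["pc-101", "pc-102", "pc-103"]),
    ]
    updated_hosts: List[str] = []
    broken_hosts = {"pc-027"}  # simulate one pilot machine failing

    result = phased_rollout(
        rings,
        apply_update=updated_hosts.append,
        is_healthy=lambda host: host not in broken_hosts,
    )
    print(result)  # prints ['staging']
```

In this example the pilot group fails its health check, so the rollout stops and no production machine receives the update. The default `max_failure_rate` of zero is deliberately strict; a real threshold would depend on the organisation's tolerance for disruption.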
The global outage serves as a crucial reminder of the complexities involved in maintaining and updating cybersecurity solutions. While such incidents can be disruptive, they also provide valuable lessons that can enhance future resilience. By adopting robust testing practices, developing comprehensive contingency plans and fostering transparent communication, businesses can better navigate the challenges of maintaining their security posture in an ever-evolving landscape.
Stay proactive, stay prepared and stay secure!
To speak to one of our cyber security experts, reach out to Academia to arrange a discussion. To find out more about our cyber security solution portfolio, please click here.
This article was written by James Ferguson, Cyber Security Business Development Manager, Academia