Ask A Tech | Massive outage hits IT systems locally and worldwide, sparking global disruptions

By Nathan Vincent

Jul 22, 2024

The dreaded blue screen of death. Photo by Svitlana Hruts

For me, as for every IT pro in Australia, it had been a pretty quiet Friday afternoon. Then, around 3pm on July 19, my work computer suddenly displayed the dreaded blue screen of death. I restarted it, only to hit the same error again. Soon after, the phones began ringing: first a trickle, then a flood of calls. Computers and servers right across the MMG Group were experiencing the same problem. My initial thought was, “Holy crap, we’ve been attacked.”

After a few calls to external security teams, I discovered they were facing the same issue. The common cause turned out to be CrowdStrike Falcon XDR (a fancy term for antivirus). I quickly found a computer that wasn't affected and turned to my old friend ChatGPT for help. With its guidance, I was able to disable CrowdStrike and get my machine back up and running.

It sounds like a simple task, but it was anything but. Security software is deliberately hard to disable so that malware can't switch it off either. To disable CrowdStrike, I had to boot Windows into Safe Mode and edit the registry to stop CrowdStrike from loading at startup. That finally got my computer working again. At that point the issue was so new that there was little information online and no word yet from the CrowdStrike team.

Over the next few hours, we restored core systems across the MMG Group and got the business back online. However, most staff devices were still affected, so workers got to go home early.

About an hour later, once we had time to catch our breath, news stories began to surface revealing the magnitude of the problem. I had initially assumed it was an isolated incident, but social media was reporting that hospitals, airports and small businesses were all affected: no-one was immune. One IT professional posted on Twitter that more than 4000 devices at the hospital where he worked had been impacted. Major news outlets, including the ABC and Sky News, were hit hard and went offline. All the major printing plants in Australia were also affected.

So, what happened? Around 3pm on Friday, July 19, CrowdStrike pushed out an update to its platform. Every Windows computer running CrowdStrike that was online at the time downloaded it. The update included a small 4KB file that triggered the blue screen of death. Affected machines became stuck in an endless reboot loop: Windows would briefly start, then crash to the BSOD again. CrowdStrike soon withdrew the update, but the damage was already done. Because the affected computers couldn't stay up long enough to receive the command to remove the file, the problem was extremely widespread.

CrowdStrike then published a workaround for manually removing the faulty file, but it required hands-on access to each machine. As a result, IT departments faced a massive manual clean-up, one device at a time.

Some media reports added to the confusion by claiming the problem had been caused not by CrowdStrike but by a Microsoft update. Earlier in the day, an update to Microsoft's Azure cloud platform had indeed caused outages, but they were far less widespread. At MMG, we were not affected by the Microsoft outage.

Fortunately, at this stage it appears to have been a faulty software update to the CrowdStrike platform, not a malicious attack. However, we have seen cases before where trusted software was hacked and used to distribute malware to every device running it. For instance, in a breach that began in September 2019, attackers compromised SolarWinds' Orion software, and malicious updates were later distributed to customers' servers and devices. The breach was particularly severe because the tainted updates came from a trusted vendor. As a result, other security software let them through, and the malware slipped past normal checks to infect systems undetected.

Patch Tuesday, Microsoft's monthly update cycle (which lands on Wednesday here in Australia), has had its own problems. Just two months ago, a patch caused a reboot loop on Windows servers. Fortunately, Microsoft offers some fail-safes that let users roll back to a previous version, which helped contain the damage.

This incident highlights how interconnected our world is and how a single fault can spread from computer to computer and country to country within minutes. It may take months to fully understand the global impact of this event. For now, IT departments such as ours are on the front lines, working tirelessly to get systems back online. Some IT professionals have reported seeing a few systems recover, but many remain down, and restoring full functionality could take weeks or even longer.

Unfortunately, much like data breaches, IT outages are becoming a part of everyday life. It’s unclear how we can fully protect against or prepare for them, but we must remain vigilant and be ready for the next one.

As always, I hope you learned something new. If you have any questions or suggestions, please reach out to me at askatech@mmg.com.au.