Millions of machines around the world crashed a few days ago, showing the dreaded “Blue Screen of Death” (BSOD), affecting banks, airports, hospitals, and many other businesses, all using the Windows OS and CrowdStrike’s Event Decetion and Response (EDR) software. What was going on? How can such a calamity happen in one single swoop?
First, foul play was suspected – a cyber security attack. But it turned out to be a bad update of CrowStrike’s “Falcon” software agent that caused all this mess. What is a BSOD anyway?
Gain Insider Knowledge
Code running on Windows can run in two primary modes – user mode and kernel mode. User mode is restricted in its access, which cannot harm the OS. This is the mode applications run with – such as Explorer, Word, Notepad, and any other application. Kernel mode, however, has (almost) unlimited power. But, as the American hero movies like to say, “With great power comes great responsibility” – and this is where things can go wrong.
Kernel code is trusted, by definition, because it can do anything. The Windows kernel is the main part of kernel space, but 3rd party components may be installed in the kernel, called device drivers. Classic device drivers are required to manage hardware devices – connecting them to the rest of the OS. The fact that you can move your cursor with the mouse, see something on the screen, hear game sounds, etc., means there are device drivers dealing with the hardware correctly, some of which are not written by Microsoft.
If a driver misbehaves, such as causing an exception to occur, the system crashes with the infamous BSOD. This is not some kind of punishment, but a protection mechanism. If a driver misbehaves (or any other kernel component), it’s best to stop now, rather than letting the code continue execution which might cause more damage, like data corruption, and even prevent Windows from starting successfully.
3rd party drivers are the only entities that Microsoft has no full control over. Drivers must be properly signed, but that only guarantees that they have not been tampered with, but it does not guarantee quality.
Most Windows systems have some Anti-virus or EDR software protecting them. By default, you get Windows Defender, but there are more powerful EDRs out there, CrowdStrike’s Falcon being one of the leaders in the world.
The “incident” involved a bad update that caused a BSOD when Windows restarted. Restarting again did not help, as a BSOD was showing immediately. The only recourse is to boot in Safe Mode, where a minimal set of drivers is loaded, disable the problematic driver, and reboot again. Unfortunately, this has to be done manually on millions of machines.
The Windows kernel treats all kernel components in the same way, regardless of whether that component is from Microsoft or not. Any driver, no matter how insignificant, that causes an exception, will crash the system with a BSOD. Perhaps it would be wise to somehow designate drivers as “important” or “not that important” so they may be treated differently in case of failure. But that is not how Windows works.
This entire incident certainly raises questions and concerns – a single point of failure has shown itself in full force. Perhaps a different approach to handling kernel components should be considered. But that’s a topic for another post.