The global tech outage showed how we’re just one m
If a new Republican vice presidential nominee
and the sitting president contracting Covid before dropping his reelection bid
didn’t leave you feeling sufficiently anxious about the fragility of the global order,
let’s not forget that a cybersecurity company
you’ve probably never heard of made a major mistake that showed how the internet could,
without warning, just kind of stop.
While you might not have known the name CrowdStrike before,
it’s unlikely you’ll forget it soon. With a single bug in a routine software update,
the company triggered what was likely the biggest computer outage in history,
creating the kind of tech meltdown that its products are designed to prevent.
While CrowdStrike said the flawed update had been rolled back,
the problems it caused can’t be fixed with the old "turn it off and turn it back on" solution
most of us are accustomed to.
As my colleague Brian Fung reported,
the bug that put Windows computers into Blue Screen of Death mode is fixable.
But in many cases, it requires painstaking work by a human being.
Now might be a good moment to buy your IT staff some good coffee
and a bagel spread, because each and every affected device
(for some organizations, we’re talking thousands)
will likely have to be assessed by an admin and rebooted into safe mode
so that the offending file can be deleted by hand.
"You can’t automate that," said Kevin Beaumont,
a security researcher and former Microsoft threat analyst, in a post on X.
"So this is going to be incredibly painful for CrowdStrike customers."
And even if your business had nothing to do with CrowdStrike,
the outage still might have ruined your day.
Think of a cafe that uses third-party online reservation services,
contracts out its delivery orders and accepts credit
and debit cards through its point of sale,
which is connected to payment processor back-end systems.
You didn’t have to be a CrowdStrike customer to get screwed by the company’s mistake,
and that’s what made Friday’s outage so frustrating.
We’ve had scary outages before, and we will certainly have them again.
But the scale of the CrowdStrike outage is once again underscoring
just how interconnected the world has become through a network
almost none of us understands and which is largely self-regulating.
"There are organizations that we’re heavily dependent upon
that we don’t even realize how dependent we are until they stop functioning,"
said Stuart Madnick,
a professor of information technology at the MIT Sloan School of Management.
Microsoft estimated the CrowdStrike outage affected some 8.5 million Windows devices.
Airlines canceled 5,000 flights around the world Friday,
while delays persisted through the weekend and into Monday.
Hospitals and government services were throttled,
and in some areas 911 communications stopped working.
It’d be easy to put all the blame on CrowdStrike for its sloppy system update,
or the airlines for not building robust backup protocols,
or even Microsoft for dominating the personal computing market.
But IT experts told me there are broader systemic problems at play here.
The centralized nature of cybersecurity companies means
that we now have "a few big failure points," said Anil Khurana,
executive director of the Baratta Center for Global Business
at Georgetown’s McDonough Business School.
"That by itself is not bad,
because proliferation actually makes diagnostics even more difficult."
But companies need "a better model of operational redundancy and back-ups,"
Khurana said.
"Our tech platforms have a mix of legacy systems coupled with modern systems,
which means that the weakest link determines the overall system performance.
I call it a 'house of cards' model."
Right now, there are safeguards in place,
but regulators around the world have been snoozing on cybersecurity risk management.
IT systems are truly critical infrastructure, Khurana said,
which suggests they "ought to go through the same kind of rigor,
testing and oversight that we see for the likes of Boeing or JPMorgan."
I asked Madnick whether the world should expect more mass outages.
"This was pretty bad as it is," he said. "Could it get worse?
The answer is yes, it could."
As onerous and time-consuming as the manual reboot of millions of devices is,
Friday’s outage was ultimately a one-off mistake by a company
that moved quickly to fix it.
A bad actor looking to do serious damage could use software to
"make computers or other equipment blow up, catch fire,
burn — in which case, you don’t just reboot it, it’s destroyed."
OK, so there’s one nightmare scenario to make us all yearn to go live in a cave.
But before you start stockpiling canned goods,
Madnick has another way to look at our modern predicament.
"There are a lot of benefits that these technologies give us that really pay off,
99% of the time,"
he said. The most important thing is to prepare for that 1% of times when things go wrong.