Hacker News


CrowdStrike Update: Windows Bluescreen and Boot Loops

From https://old.reddit.com/r/crowdstrike/comments/1e6vmkf/bsod_error_in_latest_crowdstrike_update/
BLKNSLVR | 2024-07-19 | 4489

Comments:

sammy2255

2024-07-19
Yep, happened to us too. It's global. And it just started happening.

scriptsmith

2024-07-19
It's crowdstrike: https://www.reddit.com/r/crowdstrike/comments/1e6vmkf/bsod_e...

> 7/18/24 10:20PT - Hello everyone - We have widespread reports of BSODs on windows hosts, occurring on multiple sensor versions. Investigating cause. TA will be published shortly. Pinned thread.

> SCOPE: EU-1, US-1, US-2 and US-GOV-1

> Edit 10:36PT - TA posted: https://supportportal.crowdstrike.com/s/article/Tech-Alert-W...

> Edit 11:27 PM PT:

> Workaround Steps:

> Boot Windows into Safe Mode or the Windows Recovery Environment

> Navigate to the C:\Windows\System32\drivers\CrowdStrike directory

> Locate the file matching “C-00000291*.sys”, and delete it.

> Boot the host normally.
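For anyone scripting the quoted workaround across many disks, here is a minimal Python sketch of the "locate and delete" step. It is not CrowdStrike tooling: the function names and the `dry_run` flag are made up for illustration, and the directory is passed in as an argument so it can be tried on a scratch folder first. Only the file pattern comes from the quoted steps.

```python
from pathlib import Path

# Pattern quoted in the workaround steps above.
CHANNEL_FILE_PATTERN = "C-00000291*.sys"


def find_faulty_channel_files(driver_dir):
    """Return the files directly inside driver_dir that match the pattern."""
    return sorted(Path(driver_dir).glob(CHANNEL_FILE_PATTERN))


def remove_faulty_channel_files(driver_dir, dry_run=True):
    """Delete the matching files and return their names.

    With dry_run=True (the default) nothing is deleted; the names are
    only reported so the match can be checked before committing.
    """
    matches = find_faulty_channel_files(driver_dir)
    if not dry_run:
        for path in matches:
            path.unlink()
    return [path.name for path in matches]
```

On an affected host the argument would be the CrowdStrike directory from the steps above (or wherever that disk is mounted on a rescue machine). Running with the default `dry_run=True` first shows what would be removed.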

jamesbfb

2024-07-19
Our company is in panic mode. 15 machines blue screened for no apparent reason and stuck in boot loop. I’m a gloating Linux user :)

Also in Australia

VargaLand

2024-07-19
According to Reddit it's hitting Croatia, Philippines, US, Germany, Mexico, India, Japan. SAP servers dropping like flies, that's Defence, Banks, Payroll all affected. Major Retail Chains like Big W down.

ajdlinux

2024-07-19

thrdbndndn

2024-07-19
I assume you have to install "CrowdStrike" yourself (i.e. not bundled with Windows by default)? I had no idea what it was before.

alams

2024-07-19
Anyone found any fixes, while Crowdstrike comes up with a fix?

michelevr

2024-07-19
Husband is a deputy in California. His department and many others here are down as well (including PDs, jails, ambulance companies, etc.)

aenis

2024-07-19
We have ~50 thousand laptops in reboot loop and ~1.5k servers as well. No resolution yet.

kswap

2024-07-19
Faced the same issue a few minutes back; after a few reboot loops my system is up.

7am00dee

2024-07-19
A system restore helps. But obviously not when you’ve got an environment of ~500 or more clients

zcretu

2024-07-19
Quick fix that worked for us, In safe mode:

1. Go to drive C: 2. Open the System32 folder 3. Open Drivers 4. Rename the CrowdStrike folder to something else, doesn't matter what.

_kyran

2024-07-19
Was just using the energy vic website and thought I'd been rate limited when their API stopped working. Seems like it could be this.

BLKNSLVR

2024-07-19
In terms of analysing risk factors to minimise something like this happening again, what are the factors at play here?

A Crowdstrike update being able to blue-screen Windows Desktops and Servers.

Whilst Crowdstrike are going to cop a potentially existential-threatening amount of blame, an application shouldn't be able to do this kind of damage to an operating system. This makes me think that, maybe, Crowdstrike were unlucky enough to have accidentally discovered a bug that affects multiple versions of Windows (ie. it's a Windows bug, maybe more-so than it is a Crowdstrike bug).

There also seems to have been a ball-dropped in regards to auto-updating all the things. Yes, gotta keep your infrastructure up to date to prevent security incidents, but is this done in test environments before it's put into production?

Un-audited dependence on an increasingly long chain of third-parties.

All the answers are difficult, time consuming, and therefore expensive, and are only useful in times like now. And if everyone else is down, then there's safety in the crowd. Just point at "them too", and stay the path. This isn't a profitable differentiation. But it should be! (raised fists towards the sky).

tru3_power

2024-07-19
Crazy- wasn’t Azure having an outage earlier today? Is this related?

cpf_au

2024-07-19
This is affecting our company. A colleague visited her local supermarket (Woolworths) and all the self-service checkouts were affected.

vinura

2024-07-19
Anyone have good news? Or is it still in the BSOD loop?

ali2724

2024-07-19
Harvey Norman’s system is down

l0g4n_me

2024-07-19
Renaming c:\system32\drivers\csagent.sys on a server booted into safe mode fixes the issue but disables the agent.

jmcgough

2024-07-19
My entire emergency department got knocked offline by this. Really scary when you have ambulances coming in and are trying to stabilize a heart attack.

Update: 911 is down in Oregon too, no more ambulances at least.

prmoustache

2024-07-19
Aren't they doing canary release? Seems weird this would not have been detected on a smaller scale before with a good release process.

sakopov

2024-07-19
If the workstations are stuck in a boot loop, how will they be able to push a hotfix out?

ernth_16

2024-07-19
Uninstall the Current Version:

Open the Command Prompt as an administrator. Run the following command to uninstall the current version:

sc delete csagent

rubi1945

2024-07-19

dalmo3

2024-07-19
Looks like this also took down half of New Zealand's economy.

https://www.nzherald.co.nz/nz/bank-problems-reports-bnz-asb-...

jafru

2024-07-19
"F8 + last known good configuration" worked for us

gamma032

2024-07-19
The biggest mistake here is running a global update on a Friday. Disrespect to every sysadmin worldwide.

poshlandpro

2024-07-19
Official CrowdStrike workaround: 1. Boot Windows into Safe Mode or the Windows Recovery Environment 2. Navigate to the C:\Windows\System32\drivers\CrowdStrike directory 3. Locate the file matching “C-00000291*.sys”, and delete it. 4. Boot the host normally.

Cub3

2024-07-19
Workaround + update (within an authenticated portal)

https://www.reddit.com/r/crowdstrike/comments/1e6vmkf/bsod_e...

thallaran

2024-07-19
all Delta flights grounded

thallaran

2024-07-19
All flights grounded. World wide airline systems outage.

hasindh09

2024-07-19
HP laptops are booting and then looping.

DELL laptops are observed, after blue dump. Server is up and running fine.

Temporary workaround leads to a compliance issue.

From India

BLKNSLVR

2024-07-19
Flippant commentary:

Thank fuck Netflix runs on Linux. I just hope the full chain from my TV to Netflix is immune...

jkells

2024-07-19
I don't understand why this outage and the Azure outage earlier don't make it to the front page.

I'm getting more up to date technical details from the regular media.

This outage looks to be huge.

larrymcp

2024-07-19

protocolture

2024-07-19
Don't see how anyone is getting out of this without applying the workaround or reimaging their whole fleet.

kcd83

2024-07-19
Fix: Boot Windows into Safe Mode or the Windows Recovery Environment Navigate to the C:\Windows\System32\drivers\CrowdStrike directory Locate the file matching “C-00000291*.sys”, and delete it. Boot the host normally

Did it work for you?

LeoPanthera

2024-07-19
BBC live coverage: https://www.bbc.com/news/live/cnk4jdwp49et

Looks like this is a big deal.

deutschlerner

2024-07-19
https://www.bbc.co.uk/news/live/cnk4jdwp49et, seems to be quite a wide impact from this, e.g. Sky News in the UK is off air!

Melatonic

2024-07-19
US Based and got a NANOG alert email just in time. At least half our windows servers down.

I went into our crowdstrike policies and disabled auto update of the sensor. Hopefully this means it doesn't hit everything. Double check your policies!!!

Edit:

Crowdstrike has an article out on the manual fix:

https://supportportal.crowdstrike.com/s/article/Tech-Alert-W...

markus92

2024-07-19
All major US airlines have put in a total ground stop. No flights can take off anymore.

tloriato

2024-07-19
GLOBAL OUTAGES

- Major banks, media and airlines affected by major IT outage

- Significant disruption to some Microsoft services

- 911 services disrupted in several US states

- Services at London Stock Exchange disrupted

- Sky News is off air

- Flights in Berlin grounded

- Reports the issue relates to problem at global cybersecurity firm Crowdstrike

aenis

2024-07-19
We are a major CS client, with 50k windows-based endpoints or so. All down.

There exists a workaround but CS does not make it clear whether this means running without protection or not. (The workaround does get the windows boxes unstuck from the boot loop, but they do appear offline in the CS host management console - which of course may have many reasons).

martypitt

2024-07-19
Discussed more thoroughly here: https://news.ycombinator.com/item?id=41002195 (Not sure why that's not on the frontpage)

itsgrimetime

2024-07-19
I just landed at SeaTac an hour ago and the rideshare/app pickup was absolutely nutso. Like thousands of people standing around waiting for taxis and Ubers. The one person I asked what was going on said that the computer systems at all the regional hotels are down (not sure how that makes more people need cabs). Wonder if it’s from this

ThePhysicist

2024-07-19
If it's true that a bad patch was the reason for this I assume someone, or multiple people, will have a really bad day today. Makes me wonder what kind of testing they have in place for patches like this, normally I wouldn't expect something to go out immediately to all clients but rather a gradual rollout. But who knows, Microsoft keeps their master keys on a USB stick while selling cloud HSM so maybe Crowdstrike just yolos their critical software updates as well while selling security software to the world.

joeldo

2024-07-19
The impact of this will be profound!

Obviously bugs are inevitable, but why this wasn't progressively rolled out is beyond me.

raverbashing

2024-07-19
Are people counting this on the Windows TCO?

mrdeveloper16

2024-07-19
Go to the advanced repair options, then open cmd. Go to windows/system32/drivers/crowdstrike. Then list all the files and delete the file whose name has 291 at the end, using cmd "del filenameendingwith291"

RedShift1

2024-07-19
This is why you don't make changes on a Friday. Lots of weekends absolutely ruined now.

YoboDev

2024-07-19
All US flights are grounded too. The people I was traveling with can't check into hotels

lpcvoid

2024-07-19
Maybe the world can finally reconsider their use of software products that cater to security theater. And the politics in companies which lead to things like this being introduced ("nobody gets fired for buying IBM").

Edit: took out a bit of snark.

crazytony

2024-07-19
Have spent all my afternoon and all evening on a bridge trying to support flailing systems. Was supposed to be on a plane in 5 hours to start my vacation. Guaranteed it's not gonna happen.

With hearing 911 and other safety critical systems going down, I hope that the worst that comes out of this is a couple delayed flights and a couple missed bank payments.

tamimio

2024-07-19
OTA update went wrong? How can an update go live without proper testing for the millions of live connected endpoints?

sneak

2024-07-19
Maybe one day we will stop giving RCE to so many vendors via auto update.

Havoc

2024-07-19
This is why I subscribe to /r/sysadmin despite not being one ... like a canary in the coalmine for stuff like this

kitd

2024-07-19
A reminder why switching off auto-update is a thing.

bb123

2024-07-19
There appears to be a workaround but my question is how are they going to get all of these endpoints out of a BSOD loop?

chad1n

2024-07-19
The antivirus did its job, now you can't get viruses. Jokes aside, I've checked their website and it was full of AI buzzwords so I guess that happens when you focus on nonsense instead of what your customers actually need (I know that all antiviruses have a machine learning component, but usually you don't advertise it as some sort of AI to get better stocks).

chall84321

2024-07-19
i work liquor distribution in the united states and our entire company is out across 44 states, allegedly due to this “crowd strike outage”

nmcveity

2024-07-19
This gem from the ABC news coverage has my mind 100% boggled:

"711 has been affected by the outage … went in to buy a sandwich and a coffee and they couldn’t even open the till. People who had filled up their cars were getting stuck in the shop because they couldn’t pay."

Can't even take CASH payment without the computer, what a world!

bamboozled

2024-07-19
The world just became a slightly better place.

rwmj

2024-07-19
Can someone explain what Crowdstrike actually is? Reading Wikipedia it seems to be some sort of anti-virus software?

chgs

2024-07-19
Is this just a massive mistake or is it deliberate and cover for something else

tkubacki

2024-07-19
Industry should move to Linux on desktop - we should not rely on single vendor

techie128

2024-07-19
This is good and bad. This showcases the importance of CrowdStrike. This is a short term blip but in the long run they will learn from this and prevent this type of an issue in the future. On the flip side, they have a huge target on their back for the U.S. government to try and control them. They are also a huge target for malicious actors since they can clearly see that CS is part of critical US and western infra. Taking them down can cripple essential services.

On a related note, this also demonstrates the danger of centralized cloud services. I wish there were more players in this space and the governments would try their very best to prevent consolidation in this space. Alternatively, I really wish CS did not have this centralized architecture that allows for such failure modes. The software industry should learn from great, age-old engineering design principles. For example, large ships have watertight doors that keep compartments from flooding in case of a breach. It appears that CS didn't think the current scenario was possible and therefore didn't invest in anything meaningful to prevent this nightmare scenario.

ranjanprj

2024-07-19
High time to stop using Microsoft Windows/Azure which is full of security tech debt, that you need all these tools which themselves brick the computer

oldmanyells68

2024-07-19
Sounds like a good time to buy Red Hat stock

120bits

2024-07-19
Funny how I got rejected today from crowdstrike because I couldn’t code a hard leetcode problem under 40mins. I guess leetcode isn’t true software engineering after all.

qalmakka

2024-07-19
When will people learn?

1. Stop putting mission critical systems on Windows, it's not the reliable OS it once was since MS has cut off most of its QA

2. AV solutions are unnecessary if you properly harden your system, AV was needed pre-Vista because Windows was literally running everything as Administrator. AV was never a necessity on UNIX, whatever MS bundles in is usually enough

3. Do not install third party software that runs in kernel mode. This is just a recipe for disaster, no matter how much auditing is done beforehand by the OEM. Linux has taught multiple times that drivers should be developed and included with the OS. Shipping random binaries that rely on a stable ABI may work for printers, not for mission critical software.

jmcgough

2024-07-19
Took down our entire emergency department as we were treating a heart attack. 911 down for our state too. Nowhere for people to be diverted to because the other nearby hospitals are down. Hard to imagine how many millions if not billions of dollars in damage this one bad update caused.

tamimio

2024-07-19
I just skimmed through the news. A lot of airports, hospitals, and even governments are down! It's ironic how people are putting their eggs in one basket, trying to avoid downtime caused by malware by relying on a company that put their system down. A lot of lessons will be learned after this for sure.

pageandrew

2024-07-19
I don’t know Windows systems. I’ve read it’s causing Blue Screen of Death.

I take that to mean that systems can’t even boot. Right?

Can this be fixed over the air?

surfingdino

2024-07-19
Back in the 1990s when Microsoft wanted to enter the embedded systems market there was a saying "You don't want Windows controlling your car's brakes". We now let them control a huge part of our lives. Should we let them add AI to the already unpalatable cocktail?

novaRom

2024-07-19
What are the chances that Microsoft or Crowdstrike will be held liable for financial losses caused by this outage?

iamkneel

2024-07-19
Can't wait for the Kevin Fang video about this.

techbrovanguard

2024-07-19
i've seen photos of the bsod from an affected machine, the error code is `PAGE_FAULT_IN_NONPAGED_AREA`. here's some helpful takeaways from this incident:

1) mistakes in kernel-level drivers can and will crash the entire os

2) do not write kernel-level drivers

3) do not write kernel-level drivers

4) do not write kernel-level drivers

5) if you really need a kernel-level driver, do not write it in a memory unsafe language

radiator

2024-07-19
So Crowdstrike protects your computers from cyber attacks. But who is going to protect you from Crowdstrike?

choeger

2024-07-19
It's eye-opening how bad our crucial IT infra is nowadays. Running in-kernel third-party tools (AV) on critical infrastructure on Windows? Central banks? Control towers? Seriously? We should fire everyone involved and start IT from scratch. This level of negligence cannot be fixed.

ryandv

2024-07-19
Absolutely shameful display of how the cure can be worse than the disease. It's nonsense snake oil and security theater such as this that throws the cyber"security" industry into disrepute. One may as well have just installed McAfee Anti Virus.

cromka

2024-07-19
This is a sample of what Y2K would look like if not for the countermeasures.

Eji1700

2024-07-19
Welp this fucked my night. A toast to the rest of you who are waaaaaay more screwed than me

gloosx

2024-07-19
Thank god all the critical infrastructure in my country is still on MS DOS!

woodylondon

2024-07-19
CrowdStrike offers a temporary solution for crashed systems. CrowdStrike has given users a potential way to fix their systems.

Boot Windows into Safe Mode or the Windows Recovery Environment (you can do that by holding down the F8 key before the Windows logo flashes on screen). Navigate to the C:\Windows\System32\drivers\Crowdstrike directory. Locate the file matching “C-00000291.sys”, right click and rename it to “C-00000291.renamed”. Boot the host normally.

nickdothutton

2024-07-19
The only surprising thing is that this doesn't happen every month. Nobody understands their runtime environment. Most IT org's long ago "surrendered" control and understanding of it, and now even the "management" of it (I use the term loosely) is outsourced.

jkells

2024-07-19
Maybe they do perform canary deployments and Australia was the canary?

Certainly feels like it's disproportionately affecting us down under.

rochak

2024-07-19
I haven’t seen a simultaneous outage as big as this in my entire life. I’m just hoping this gets enterprises to move off of Windows.

lachlanj

2024-07-19
I'm confused, is this an issue with Windows or with Crowdstrike software installed on Windows?

dboreham

2024-07-19
For a while I've joked with family and colleagues that software is so shitty on a widespread basis these days that it won't be long before something breaks so badly that the planet stops working. Looks like it happened.

drooopy

2024-07-19
I've picked the perfect day to return from vacation. Being greeted by thousands of users being mad at you and people asking for your head on a plate makes me reconsider my career choice. Here's to 12 hours of task force meetings...

adzm

2024-07-19
Workaround fixed it for me, thankfully I had access to the bitlocker recovery keys. This will be a bad day for IT people worldwide.

cromka

2024-07-19
BBC reports: “ The cause is not known - but Microsoft says it's taking mitigation action”.

Most of the media I found say it’s because “cloud infrastructure”. I am yet to see any major source actually factually report this is caused by a bad patch in Crowdstrike software installed on top of Windows.

Goes to show how little competency there is in journalism nowadays. And it raises the question of how often they misinterpret and misreport things in other fields.

gedw99

2024-07-19
Ironically, the SolarWinds court case happened yesterday. SEC won. SolarWinds was fraudulent to say their software was “secure”. They should rename a side channel attack a “Tom and Jerry”, because it's getting like a game of Cat and Mouse

gedw99

2024-07-19
Rock me Amadeus.

At least the central flight booking system is up I guess. Google bought it years ago and it's a mainframe.

Hence why google flights is so tapped in :)

reegnz

2024-07-19
Cybersecurity company secures computers worldwide by not allowing them to be turned on. - not the onion

mro_name

2024-07-19
What do card houses do for a living?

vlugovsky

2024-07-19
Crowdstrike is a perfect name for a company that could cause a worldwide outage.

jiehong

2024-07-19
IMO, having a mix of servers would help in mitigating issues like that.

Like run stuff on Linux, windows and freebsd servers, so that you have OS redundancy should an issue affect one in particular (kernel or app).

Just like you want more than a single server handling your traffic, you’d want 2 different base for those servers to avoid impacting them both with an update.

tanelpoder

2024-07-19
I'm curious why this post is still not the 1st (but 2nd after an ebook reader announcement), despite all the upvotes.

silamay

2024-07-19
I’m trying to refresh to get the latest update …

bgnn

2024-07-19
This is a manifestation of almost everything wrong about software development and marketing practices.

I work in hardware development and such a failure is almost impossible to imagine. It has to work, always. It puzzles me why this isn't the case for software. My SWE colleagues often get mad at us HW guys because we want to see their test coverage for the firmware/drivers etc. The focus is on having something that compiles, pushing the code to production as fast as possible, and then regressing in production. Most HW problems are a result of this. I've found it's often better to go over the firmware myself and read it line by line to understand what the code does. It saves so much time from endless debugging sessions later. It pisses off the firmware guys, but hey, you have to break some eggs to make an omelette.

silamay

2024-07-19
I’m trying to refresh to get latest update… let’s keep posting

mehh

2024-07-19
Don’t be fooled, it’s Skynet, head to the bunkers!

openrisk

2024-07-19
The details (the particular companies / systems etc) of this global incident don't really matter.

When the entire society and economy are being digitized AND that digitisation is controlled and passes through a handful of choke points, it's an invitation to major disaster.

It is risk management 101, never put all your digital eggs in one (or even a few) baskets.

The love affair with oligopoly, cornered markets and power concentration (which creates abnormal returns for a select few) is priming the rest of us for major disasters.

As a rule of thumb there should be at least ten alternatives in any diversified set of critical infrastructure service providers, all of them instantly replaceable / forced to provide interoperability...

Some truths will hit you in the face again and again until you acknowledge the nature of reality.

ssss11

2024-07-19
I believe that today they struck the entire crowd… (or should that be cloud)

chucke1992

2024-07-19
I guess Microsoft can now offer something similar to CrowdStrike's solution for Azure users.

teeheelol

2024-07-19
Throwaway account...

CrowdStrike in this context is an NT kernel loadable module (a .sys file) which does syscall level interception and logs them to a separate process on the machine. It can also STOP syscalls from working if they are trying to connect out to other nodes or access files they shouldn't (using some drunk ass heuristics).

What happened here was they pushed a new kernel driver out to every client without authorization to fix an issue with slowness and latency that was in the previous Falcon sensor product. They have a staging system which is supposed to give clients control over this but they pissed over everyone's staging and rules and just pushed this to production.

This has taken us out and we have 30 people currently doing recovery and DR. Most of our nodes are boot looping with blue screens which in the cloud is not something you can just hit F8 and remove the driver. We have to literally take each node down, attach the disk to a working node, delete the .sys file and bring it up. Either that or bring up a new node entirely from a snapshot.

This is fine but EC2 is rammed with people doing this now so it's taking forever. Storage latency is through the roof.

I fought for months to keep this shit out of production because of this reason. I am now busy but vindicated.

Edit: to all the people moaning about windows, we've had no problems with Windows. This is not a windows issue. This is a third party security vendor shitting in the kernel.

jpl56

2024-07-19
Obligatory XKCD : dependency [0]

[0] https://xkcd.com/2347/

esskay

2024-07-19
Sitting in our work slack feeling pretty smug that I forced the migration to only Linux servers and Linux or macOS work computers now.

pulkitsh1234

2024-07-19

jpl56

2024-07-19
Was watching TV this morning in France (TF1, 8:00 CET), the weather forecast map system was out. The journalist just gave us the information as if he was on the radio, saying he was sorry the system was failing.

shubhamjain

2024-07-19
Naive question, if it’s a blue screen of death with a boot loop, how are they going to restore things? Don’t tell me the answer is going to every system manually.

mro_name

2024-07-19
heavy clouds this morning.

Maybe time to reconsider how solid a ground clouds are.

berkaydumaner

2024-07-19
Hi guys, what is the KB code of this update?

uitgewis

2024-07-19

r2vcap

2024-07-19
This is why I don't use Windows and refuse any SWE jobs that require Windows machines. Additionally, I believe kernel-level game anti-cheat software should be banned.

clydethefrog

2024-07-19
10 hours ago someone posted a critical post about CrowdStrike on the "wallstreetbets" subreddit.

https://old.reddit.com/r/wallstreetbets/comments/1e6ms9z/cro...

elorant

2024-07-19
How can an antivirus update affect Azure's servers?

dboreham

2024-07-19
TBF although I worried about this possibility the first time the IT dude wandered into my office in 1989 holding a floppy he said he wanted to put into all the PCs we had (we had no PCs), it has actually taken a very long time for the shit to hit the fan.

nullify88

2024-07-19
The sheer coverage of this outage across multiple businesses and industries, the impact must be greater than some of the malicious cyber attacks from ransomware, worms etc.

fudged71

2024-07-19
This title doesn’t nearly describe the breadth and severity of the problem…

solidninja

2024-07-19
Ah the "move fast and break things" philosophy gets a demonstration.

endstart

2024-07-19
The workaround suggests removing a file with .sys extension. What does the file do normally? If removed, what happens to the state of security on that system?

codeulike

2024-07-19
Microsoft are going to be pissed that this is widely being discussed as a Microsoft outage. Do AV vendors like Crowdstrike need a license or something from Microsoft to push these kernel driver based things? Or is it just like anyone can make one?

scopeh

2024-07-19
How many people still believe the "cloud" was worth it? Maybe we should go back to the days of buying software and running it ourselves with our own infrastructure.

I know, I'm dreaming.

ogurechny

2024-07-19
They all should have used some expensive corporate-and-government-level product that promises protection against exactly that kind of large scale attack on infrastructure.

fx1994

2024-07-19
That piece of "AV software" slowed down my brand new corporate i7 Lenovo to shit so I switched to M2 Pro. Best decision ever.

WatchDog

2024-07-19
This was apparently caused by a faulty "channel file"[0], which is presumably some kind of configuration database that the software uses to identify malware.

So there wasn't any new kernel driver deployed, the existing kernel driver just doesn't fail gracefully.
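To illustrate what "fail gracefully" could mean here, a rough Python sketch: validate a content file completely before using it, and keep the last known good rules if validation fails. Everything about the format below (the `CHNL` magic bytes, the record layout, the function names) is invented for the example. The real channel file format is not described in this thread, and this says nothing about how the actual driver is structured.

```python
import struct

# Hypothetical header, made up for this sketch only.
MAGIC = b"CHNL"


def parse_channel_file(data):
    """Parse an invented format: MAGIC, uint32 record count, uint32 records.

    Raises ValueError on anything malformed instead of reading past the
    end of the buffer.
    """
    if len(data) < 8 or data[:4] != MAGIC:
        raise ValueError("bad header")
    (count,) = struct.unpack_from("<I", data, 4)
    if len(data) != 8 + 4 * count:
        raise ValueError("length does not match record count")
    return list(struct.unpack_from(f"<{count}I", data, 8))


def load_rules(new_data, last_known_good):
    """Return (rules, accepted). Fall back to the previous rules on error."""
    try:
        return parse_channel_file(new_data), True
    except (ValueError, struct.error):
        return last_known_good, False
```

The design point is only that a rejected update leaves the system running on the old rules, rather than crashing the component that reads it.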

[0]: https://x.com/brody_n77/status/1814185935476863321

roschdal

2024-07-19
USA can no longer be trusted to supply import technology for the world.

pjmlp

2024-07-19
Yet another good example why liability in software should already be a common thing.

patates

2024-07-19
This company has post-apocalyptic style photos to make you panic-buy their solution.

https://ibb.co/Bc6n527

"62 minutes could bring your business down"

I guess they could bring all the businesses down much quicker.

edit: link https://www.crowdstrike.com/en-us/#teaser-79minutes-adversar...

Jyaif

2024-07-19
My employer pays Crowdstrike to double my build times. Quite astounding really.

guenthert

2024-07-19
afaiu google (and I presume other operators of large number of computers) deploy updates to their software first to a small set of nodes and only if after a given time the update has been deemed successful, continue to update an increasingly larger set til complete.

Isn't this done as well with automatic updates of end user software or embedded systems and if not, why not?
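The staged approach described above can be sketched in a few lines of Python. This is a toy model, not how Google or CrowdStrike actually deploy: `apply_update` and `is_healthy` are hypothetical callbacks supplied by the caller, the stage percentages are arbitrary, and a real system would also wait some soak time between waves.

```python
def staged_rollout(nodes, apply_update, is_healthy,
                   stage_percents=(1, 10, 50, 100), max_failure_rate=0.0):
    """Update nodes in growing waves and stop when a wave looks unhealthy.

    Returns (updated_nodes, completed). completed is False if the rollout
    was halted before reaching every node.
    """
    updated = []
    total = len(nodes)
    for percent in stage_percents:
        # Integer ceiling of total * percent / 100, at least one node.
        target = max(1, (total * percent + 99) // 100)
        wave = nodes[len(updated):target]
        if not wave:
            continue
        for node in wave:
            apply_update(node)
        updated.extend(wave)
        failures = sum(1 for node in wave if not is_healthy(node))
        if failures / len(wave) > max_failure_rate:
            return updated, False  # halt: later waves are never touched
    return updated, True
```

With a bad update, only the first wave is affected and the remaining nodes stay on the old version, which is the whole point of the canary.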

ivxvm

2024-07-19
Things like hospitals, airlines, 911, should have multiple systems with different software stacks and independent backends running in-parallel, so that when one infra goes down they can switch to another.

lucasRW

2024-07-19
Good day for OSINTers, APTs, redteamers, to find out who uses Crowdstrike on their endpoints.

gnuser

2024-07-19
dumb techbro c-suites: what, why would you have an issue with a proprietary closed source app that frequently self updates and sends tons of data to a third party while essentially being a backdoor? We said we wanted security and this has Security(tm) all over the literature! Look we even have dashboards for the gui-ninjas like the security team!

AndyMcConachie

2024-07-19
Finally the crowd has struck!

ba-dum ching!

commercialnix

2024-07-19
All my customers' endpoints are Linux based. Because our users' Windows apps run in VDI with disposable instances based off snapshots and highly restrictive networking on the Linux endpoints, none of our users are affected.

Running Windows on bare-metal was always obviously very stupid. The consequences of such stupidity are just being felt now.

dmarble

2024-07-19
Plot twist: The * in C-00000291*.sys is "-block-ultron"

Premature deployment of Crowdstrike AGI disaster response plan.

KingOfCoders

2024-07-19
All down had no backup plan.

pharos92

2024-07-19
1. This is why kernel modules are a bad idea

2. This is why centralism is a bad idea

3. This is why sacrificing stability for security is a bad idea

4. Security still needs to factor in security of supply - not just data safety

2-3-7-43-1807

2024-07-19
i've never heard of crowdstrike ever but it (co-)runs half of the essential IT infrastructure, worldwide?

(also, great choice of name i must say)

badrabbit

2024-07-19
Their stock price will suffer but they can waive license fees for a year or so for every endpoint affected (~$50).

They better pin this on a rogue employee, but even then, force pushing updates shouldn't be in their capability at all! They must guarantee removal of that capability.

Lawsuits should be interesting. They offer(ed?) $1 mil breach insurance to their customers, so if they were to pay only that much per customer this might be compensation north of $10B. But to be honest, wouldn't surprise me if they can pay up without going bankrupt.

The sad situation is, as twitter people were pointing out, IT teams will use this to push back against more agents for a long time to come. But in reality, these agents are very important.

Crowdstrike Falcon alone is probably the single biggest security improvement any company can make and there is hardly any competition. This could have been any security vendor, the impact is so widespread because of how widely used they are, but there is a reason why they are so widely used to begin with.

Oh and just FYI, the mitigation won't leave you unprotected: when you boot normally, the userspace exes will replace it with a fixed version.

_kb

2024-07-19
Assuming this event itself isn't malicious, what an excellent POC for something that is. I sure hope every org out there with this level of market reach has good security in place. It's certainly going to be getting some probing after this.

nyx_land

2024-07-19
worded badly whatever

dark-star

2024-07-19
That's what you get for letting a company install a root kit on your servers and desktops ;-)

I mean, don't they do canary updates on CrowdStrike too? Every Windows admin has done this for the last 5+ years, test Windows updates on a small number of systems to see if they are stable. Why not do the same for 3rd party software?

kingkongjaffa

2024-07-19
It's kind of surprising so much infra was using Windows servers or Windows cloud VMs for these things. I assumed these systems would all be Linux VMs in Azure/AWS/GCP at this point.

on https://azure.status.microsoft/en-gb/status the message is currently:

> We have been made aware of an issue impacting Virtual Machines running Windows Client and Windows Server, running the CrowdStrike Falcon agent, which may encounter a bug check (BSOD) and get stuck in a restarting state.

patates

2024-07-19
This company has post-apocalyptic style photos to make you panic-buy their solution.

https://ibb.co/Bc6n527

"62 minutes could bring your business down"

I guess they could bring all the businesses down much quicker.

https://www.crowdstrike.com/en-us/#teaser-79minutes-adversar...

(Repeating my comment because other story is duped)

Thorentis

2024-07-19
Perversely, this may make many companies no longer invest in this type of cyber security software. Which may lead to a whole host of other problems...

mnau

2024-07-19
Yeah, these events will be fun once new product liability directive (that includes sw) comes into force.

nynyny7

2024-07-19
Crowdstrike marketing slogan on their website: "A radical new approach proven to stop breaches". I'll give them that: Putting all Windows computers within a company into an endless BSOD loop is a very radical approach to stop breaches. :)

tm-guimaraes

2024-07-19
How does such a huge company do “full deploys” like this? At this number of endpoints, only a few % should have been updated (and faced the problems) before a full rollout.

This is not a small startup with some SaaS, these guys are in most computers of too many huge companies. Not rolling out the updates to everyone at the same time seems just too obvious

dorkwood

2024-07-19
Is this something that could be solved by building AI code review directly into git clients? I can't help thinking Claude 3 would have caught this.

tehlike

2024-07-19
Rolling out updates in an A/B test slowly is the only way to reduce the occurrence of such issues _significantly_. There's no other way, literally, nothing.

alibarber

2024-07-19
I have been told 'not to worry' because it isn't a cyber attack. Yet the outcomes we are seeing feel a lot like the doomsday predictions of what a cyberattack would do. It is almost as if we are experiencing the cybersecurity/warfare equivalent of 'friendly fire'.

elorant

2024-07-19
For years now antivirus solutions have had a ridiculous amount of control over the OS. I accidentally installed an adware antivirus the other day that was bundled with third-party software, and I had to boot into Linux to completely remove the damn thing from Windows. The uninstall option left a process running that couldn’t be forcefully killed.

Microsoft needs to take control and forbid anyone and anything from running software with that kind of behavior.

simonjgreen

2024-07-19
CrowdStrike should have learned the lesson from the more seasoned players in the industry to slow roll their updates and observe.

echoangle

2024-07-19
Why would they roll out this update globally to all users immediately? Isn’t it normal to do gradual rollouts? Or did this update contain some critical security fix they wanted everyone to have as fast as possible?

madisp

2024-07-19
if I'm reading this correctly the short interest for the stock doubled over June? :)

https://www.nasdaq.com/market-activity/stocks/crwd/short-int...

nicholasbraker

2024-07-19
This article seems more relevant than ever and was posted a few days ago: https://ea.rna.nl/2024/07/12/no-it-really-no-i-t/

pxc

2024-07-19
Vendors of tools like this drive the cybersecurity industry discourse, so 'defense in depth' often practically sorta means 'add more software that does more things'.

But maybe this kind of thing can actually impart the lesson that loading your OS up with always-on, internet-connected agents that include kernel components in order to instrument every little thing any program does on the system is, uh, kinda risky.

But maybe not. I wonder if we'll just see companies flock to alternative vendors of the exact same type of product.

zmmmmm

2024-07-19
So CrowdStrike is deployed as third party software into the critical path of mission critical systems and then left to update itself. It's easy to blame CrowdStrike but that seems too easy on both the orgs that do this but also the upstream forces that compel them to do it.

My org which does mission critical healthcare just deployed ZScaler on every computer which is now in the critical path of every computer starting up and then in the critical path of every network connection the computer makes. The risk of ZScaler being a central point of failure is not considered. But - the risk of failing the compliance checkbox it satisfies is paramount.

All over the place I'm seeing checkbox compliance being prioritised above actual real risks from how the compliance is implemented. Orgs are doing this because they are more scared of failing an audit than they are of the consequences failure of the underlying systems the audits are supposed to be protecting. So we need to hold regulatory bodies accountable as well - when they frame regulation such that organisations are cornered into this they get to be part of the culpability here too.

YoboDev

2024-07-19
My team and I have begun to refer to this issue as CrowdStroke

naizarak

2024-07-19
this is really microsoft's fault for handing out kernel access to random 3rd parties, none of which are doing anything special that microsoft couldn't implement themselves (AV, anti-cheat, security)

dschuetz

2024-07-19
I wonder who exactly messed up the update, microsoft or crowdstrike. Usually, there is pre-rollout update testing AND some companies use N-1 version staging for critical/production systems. For me it feels much more complex a failure than just "it's crowdstrike's fault". Everybody involved must have done something wrong.

runningmike

2024-07-19
Guess we will never read the real facts. Truth is RMS was right. Again. Closed source security software is too often malware by design. We need open solutions we can truly trust.

AndrewDucker

2024-07-19
This is, of course, why they should be doing phased rollouts. 1% of their customers, then 10%, then all the rest.
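A minimal sketch of the 1% / 10% / everyone idea, assuming each host (or customer) has a stable ID and that the stage percentages are the ones named above. The function names and the hash-bucket scheme are illustrative assumptions, not how CrowdStrike or any particular vendor actually does it.

```python
import hashlib

# Hypothetical cumulative rollout stages, in percent of hosts: 1%, 10%, everyone.
STAGES = [1, 10, 100]


def bucket(host_id: str) -> int:
    """Map a host ID to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(host_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def eligible(host_id: str, stage: int) -> bool:
    """True if this host should receive the update at the given stage index."""
    return bucket(host_id) < STAGES[stage]


def hosts_for_stage(hosts, stage):
    """Return the hosts that are eligible at the given stage, in input order."""
    return [h for h in hosts if eligible(h, stage)]


if __name__ == "__main__":
    fleet = [f"host-{i}" for i in range(1000)]
    for stage, percent in enumerate(STAGES):
        print(f"stage {stage} ({percent}%): {len(hosts_for_stage(fleet, stage))} hosts")
```

Because the bucket depends only on the ID, every host in the 1% cohort is also in the 10% cohort, so a later stage never skips machines that already took the update. The part that matters in practice is the pause between stages to watch crash telemetry, which this sketch leaves out.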

Dentrax

2024-07-19
I'm just curious, don't they have something like "gradual rollout" to update their app? They just bulk-update simultaneously across entire agents? No way. Something is a bit off for me. But there are good lessons to learn for sure.

felix_kirkine

2024-07-19
Just gonna leave this here: https://news.ycombinator.com/item?id=32548671

grahar64

2024-07-19
I wonder what the rollout procedure is for CrowdStrike. I put $100 down that this was a minor update they decided was so minimal it didn't need extensive testing.

So many places use the "emergency break glass rollout procedure" on every deploy because it doesn't require all the hassle

vinay_ys

2024-07-19
If you are IT team for a large impactful organization, you have to control updates to your organization's fleet. You cannot let vendors push updates directly. You have to stage those updates and test them and then do a gradual rollout to your whole organization.

Plus, for your critical communication systems, you must have a disaster recovery plan that actually helps you recover quickly in minutes, not hours or days. And you have to exercise this plan regularly.

If you are crowd strike, shame on you for not testing your product better. You failed to meet a very low bar. You just shipped a 100% reproducible widely impactful bug. Your customers must leave you for a more diligent vendor.

And I really hope the leadership teams in every software engineering organization learn a valuable lesson from this – listen to that lone senior engineer in your leadership team who pushes for better craft and operational rigor in your engineering culture; take it seriously - it has real business impact.

latexr

2024-07-19
There’s already a Wikipedia page on the outage.

https://en.wikipedia.org/wiki/July_2024_global_cyber_outages

agilob

2024-07-19
Do we have any estimates how many machines are affected?

bidikburger

2024-07-19
How can a billion-dollar company push an update before testing?

crypt1d

2024-07-19
This never would have happened if all these orgs used a blockchain.

/sarcasm

/but is it really?

hellajack3d

2024-07-19
I guess this article might need some updating soon:

https://www.crowdstrike.com/resources/reports/total-economic...

w4rh4wk5

2024-07-19
I love the name! Really tells you what's going on ^^

ghoshbishakh

2024-07-19
No worries for us. https://pinggy.io/ is working like a charm :)

nvarsj

2024-07-19
Was involved in a "security mandated" mandatory rollout of Crowdstrike at my prior company.

This software was utter shit, and broke stuff all over the place. And installs itself as basically malware into critical paths everywhere. We objected to ever using it as a SPOF, but were overruled.

So yeah, not remotely surprised this happened.

Any kind of middleware/dynamic agent is highly suspect in my experience and to be avoided.

mrinfinitiesx

2024-07-19
Half of the world's computers are down. The biggest tech failure of our time. Airports. Banks. NYSE. 298 of the fortune 500 companies. RIP.

mads_quist

2024-07-19
I'm actually very fond of "fail fast" and "no blame" culture, but someone needs to get fired for this!

mrkramer

2024-07-19
Windows breaking computers since 1985.

RadixDLT

2024-07-19
Does Russia have something to do with this?

bsodfriday

2024-07-19

birracerveza

2024-07-19
Feels like what people imagined the millennium bug would have been like, just short of PCs catching on fire.

simonjgreen

2024-07-19
CrowdStrike have finally posted publicly on it: https://www.crowdstrike.com/blog/statement-on-windows-sensor...

DebtDeflation

2024-07-19
This is pretty wild. I woke up to a news alert on my phone stating a "global IT outage" took down banks, airlines (who were calling for a global ground stop for all flights), hospitals, emergency services, etc. Expected it to be some sort of Tier 1 Network issue. Nope, a failed update for some third party Windows security app.

xorcist

2024-07-19
We often read about how organizations are so bad because they don't spend enough on security. That slope is particularly slippy.

Crowdstrike is very expensive.

piva00

2024-07-19
It's bizarre reading all the headlines about companies offline, flights canceled, banks not working because of a piece of antivirus software in 2024.

Mostly because I lived through Y2K and every fear about Y2K just materialised but because of Crowdstrike instead.

I can't imagine the amount of wasted work this will create, not only the loss of operations across many industries but recovery will be absolute hell with Bitlocker. How many corporate users have access to their encryption keys? And when stored centrally, how many of the servers have Crowdstrike running and just got stuck in a boot loop now?

I don't envy the next days/weeks for Windows IT admins of the world...

sidmkp96

2024-07-19
The thing that amazes me is how they've rolled out such a buggy change at such a scale. I would assume that for such critical systems, there would be a gradual rollout policy, so that not everything goes down at once.

roca

2024-07-19
What I'm curious about: other than checkbox compliance, how does Crowdstrike convince companies to buy their product? Do they present evidence that their product is effective at protecting customers? Because certainly Crowdstrike customers still get hacked.

DebtDeflation

2024-07-19
At one point overnight airlines were calling for an "international ground stop for all flights globally". Planes in the air were unable to get clearance to land or divert. I don't believe such a thing has ever happened before except in the immediate aftermath of 9/11.

FullMetalBitch

2024-07-19
So what are going to be the consequences of this? In my country some healthcare institutions and emergency systems are working.

artk42

2024-07-19
I guess the much-blamed European Commission will again have to do its job and bring in anti-oligopoly/monopoly regulations, which everyone will hate but which will still slightly work.

Architecting technical systems is WAY easier than architecting socio-economic systems. I hope one day all those tech-savvy web3 wannabe revolutionaries will start to do the real job of designing socially working systems, not only technically barely working, cryptographically strong hamster-tapping scams.

Renaud

2024-07-19
When you see the size of the impact across the world, the number of people who will die because hospital, emergency and logistics systems are down…

You don’t need conventional war any more. State actors can just focus on targeting widely deployed “security systems” that will bring down whole economies and bring as much death and financial damage as a missile, while denying any involvement…

crooked-v

2024-07-19
I was going to buy some put options against CRWD with spare pocket money, but it turns out that the service I have my investment money in is broken right now. I wonder if that's because of Crowdstrike.

alkhimey

2024-07-19
An update to the internal database. It still has not sunk in with developers that data carries the same risk as code. A400 crashed because of an XML file update. I have witnessed my share of critical bugs caused by "innocent" updates to "data" which were treated less seriously because of this. Management and devs alike should change their conception about this.

selimnairb

2024-07-19
How long before companies start consciously de-risking by replacing general-purpose systems like Windows with newer systems with smaller attack surfaces? Why does an airline need to use Windows at all for operations? From what I’ve seen, their backend systems are still running on mainframes. The terminals are accessed on PCs running Windows, but those could trivially be replaced with iPadOS devices that are more locked down than Windows and generally more secure by design.

DuckHunt

2024-07-19
So apparently "The issue has been identified, isolated and a fix has been deployed" https://x.com/George_Kurtz/status/1814235001745027317

Yet the chaos seems to continue. Could it be that this fix can't be rolled out automatically to affected machines because they crash during boot - before the Crowdstrike Updater runs?

jeffchien

2024-07-19

999900000999

2024-07-19
I'd bet my career CS isn't spending enough on QA. It's always the first thing to be cut, no one cares about QA when everything is going well, but when things go wrong...

alphabetting

2024-07-19
Google spending a boatload for Wiz looks smarter now

bkj512

2024-07-19
Wow

bkj512

2024-07-19
Lol we were using Symantec software so thankfully no effect.

bandrami

2024-07-19
What a fun time to be less than 48 hours out from a transcontinental flight

monkeydust

2024-07-19
So where can I buy an ETF of companies specializing in software Quality Assurance?

cja

2024-07-19
Sorry to be dense, but what is CrowdStrike and do I have it on my computer?

vtemian

2024-07-19
What's the actual magnitude of this outage? Is there a way to estimate how many machines were down?

zteppenwolf

2024-07-19
I guess people who continue to use Windows in 2024 arguably deserve this, particularly those utilizing it in a production environment.

personalityson

2024-07-19
This is what AI's first strike will look like

butler14

2024-07-19
I’m guessing it’s completely incidental that the CEO of crowdstrike was critical of China earlier this year, and that China is somehow unaffected by this ‘global’ issue!

scrollaway

2024-07-19
Those focusing on QA and staged rollouts are misguided. Yes of course a serious company should do it but CrowdStrike is a compliance checkbox ticker.

They exist solely to tick the box. That’s it. Nobody who pushes for them gives a shit about security or anything that isn’t “our clients / regulators are asking for this box to be ticked”.

The box is the problem. Especially when it’s affecting safety critical and national security systems. The box should not be tickable by such awful, high risk software. The fact that it is reflects poorly on the cybersecurity industry (no news to those on this forum of course, but news to the rest of the world).

I hope the company gets buried into the ground because of it. It’s time regulators take a long hard look at the dangers of these pretend turnkey solutions to compliance and we seriously evaluate whether they follow through on the intent of the specs. (Spoiler: they don’t)

eitland

2024-07-19
Some Canonical guy, I think, mentioned this as their sales strategy a few years ago after a particularly nasty Windows outage:

We don't ask customers to switch all systems from Windows to Ubuntu, but to consider moving maybe a third to Ubuntu so they won't sit completely helpless next time Windows fails spectacularly.

While I see more and more Ubuntu systems, and recently have even spotted Landscape in the wild, I don't think they were as successful as they hoped with that strategy.

That said, maybe there is a silver lining on today's clouds both WRT Ubuntu and Linux in general, and also WRT IT departments stopping to reconsider some security best practices.

lopkeny12ko

2024-07-19
Isn't a Windows BSOD the equivalent of a kernel panic? I don't understand how this is CrowdStrike's fault. Vanilla userspace operations shouldn't cause a kernel panic--that's a bug in the OS, not a bug in some user software. If anything, we should be blaming Windows here?

kidbomb

2024-07-19
Lessons learned from this:

- CS: Have a staging (production-like) environment for proper validation. It looks like CS has one of these but they have just skipped it.

- IT Admins: Have controlled roll-outs, instead of doing everything in a single swoop.

- CS: Fuzz test your configuration.

Anything I have missed?
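For the fuzz-testing point, here is a rough sketch of what that could look like. Everything in it is an assumption for illustration: the `CFG1` file layout, `parse_config`, `build_config` and `fuzz` are made up and have nothing to do with CrowdStrike's real channel file format. The idea is only that a config parser should reject mangled input cleanly instead of crashing on it.

```python
import random
import struct

MAGIC = b"CFG1"  # hypothetical format marker, not a real vendor format


def parse_config(blob: bytes) -> list:
    """Parse MAGIC, a uint16 record count, then (uint16 length, payload) records.

    Malformed input is rejected with ValueError instead of reading past the end.
    """
    if len(blob) < 6 or blob[:4] != MAGIC:
        raise ValueError("bad header")
    (count,) = struct.unpack_from("<H", blob, 4)
    offset = 6
    records = []
    for _ in range(count):
        if offset + 2 > len(blob):
            raise ValueError("truncated record header")
        (length,) = struct.unpack_from("<H", blob, offset)
        offset += 2
        if offset + length > len(blob):
            raise ValueError("truncated record payload")
        records.append(blob[offset:offset + length])
        offset += length
    if offset != len(blob):
        raise ValueError("trailing bytes")
    return records


def build_config(records) -> bytes:
    """Serialize records into the hypothetical format that parse_config reads."""
    out = MAGIC + struct.pack("<H", len(records))
    for record in records:
        out += struct.pack("<H", len(record)) + record
    return out


def fuzz(seed_blob: bytes, iterations: int = 2000, rng_seed: int = 0) -> int:
    """Feed mutated copies of seed_blob to the parser; return how many were rejected.

    Any exception other than ValueError propagates and fails the run.
    """
    rng = random.Random(rng_seed)
    rejected = 0
    for _ in range(iterations):
        data = bytearray(seed_blob)
        choice = rng.randrange(3)
        if choice == 0 and data:
            data[rng.randrange(len(data))] = rng.randrange(256)  # overwrite one byte
        elif choice == 1:
            del data[rng.randrange(len(data) + 1):]  # truncate at a random point
        else:
            data = bytearray(len(data))  # same length, all zero bytes
        try:
            parse_config(bytes(data))
        except ValueError:
            rejected += 1
    return rejected


if __name__ == "__main__":
    seed = build_config([b"rule-a", b"rule-b"])
    print("rejected inputs:", fuzz(seed))
```

A real setup would more likely use a coverage-guided fuzzer against the actual parser, but even a dumb mutation loop like this in CI catches the "parser falls over on a garbage file" class of bug before it ships.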

totaldude87

2024-07-19
I hope the narrative of "install CrowdStrike and pass the audit, or else" changes after this.

But being in the industry for so long, I don't expect any changes whatsoever; it's either CS or some other tool.

matt_s

2024-07-19
Does crowdstrike work similarly on MacOS? I have to imagine the "walled garden" doesn't allow for 3rd parties to insert themselves into the OS kernel but I could be wrong.

frankohn

2024-07-19
The Windows ecosystem typically deployed in corporate PCs or workstations is often insecure, slow, and poorly implemented, resulting in ongoing issues visible to everyone. Examples include problems with malware, ransomware, and Windows botnets.

In corporate environments, IT staff struggle to contain these issues using antivirus software, firewalls, and proxies. These security measures often slow down PCs significantly, even on recent multi-core systems that should be responsive.

Microsoft is responsible for providing an operating system that is inherently insecure and vulnerable. They have prioritized user lock-in, dark patterns, and ease of use over security.

Apple has done a much better job with macOS in terms of security and performance.

The corporate world is now divided into two categories:

1. Software-savvy companies that run on Linux or BSD variants, occasionally providing macOS to their employees. These include companies like Google, Amazon, Netflix, and many others.

2. Companies that are not software-focused, as it's not their primary business. These organizations are left with Microsoft's offerings, paying for licenses and dealing with slow and insecure software.

The main advantage of Microsoft's products is the Office suite: Excel, Word and Powerpoint but even Word is actually mediocre.

EDIT: improve expression and fix errors:

gchamonlive

2024-07-19
I know I have the benefit of hindsight in this regard, but how aren't there redundant checks and tests that would prevent a mishap of this magnitude?

I mean, there should be extensive automated testing using many different platforms and hardware combinations as a prerequisite for any rollout.

I guess this is what we get when everything is opaque, not only the product and the code, but also the processes involved in maintaining and evolving the solution. They would think twice about not investing heavily in testing their deployment pipelines if everyone could inspect their processes.

It might also be the case that they indeed have a thorough production and testing process deployed to support the maintenance of crowdstrike solutions, but we are only left to wonder and to trust whatever their PR will eventually throw at us, since they are a closed company.

fnord77

2024-07-19
I take it patching remote machines is going to be difficult or impossible?

I haven't used windows in years, but from what I read you need to be in safe mode to delete a crowdstrike file in a system directory, but you need some 48 char key to get into safe mode now if it is locked down?

red_admiral

2024-07-19
I can't wait for rachelbythebay's comments on this.

gz5

2024-07-19
Seems CS themselves may have been hacked? For example, seems unlikely that both:

1. CS normally pushes global updates to entire user base simultaneously?

2. This made it through their testing. Not only 'just' QA but likely CS employees internally run a version or two ahead of their customer base?

Just speculation - folks who know either answer can validate or debunk.

Kye

2024-07-19
Crowdstrike seems like the kind of thing that's sold to CEOs at conferences, forced on IT against objections, and the subject of a lot of discussion at Defcon.

EvanAnderson

2024-07-19
I wonder what Crowdstrike's opsec is like re: malicious actors gaining control of their automated update servers. This incident certainly highlights the power of that type of attack, even if this one just ends up being typical human incompetence-based.

luismedel

2024-07-19
Probably a stupid question, but how can the Windows kernel recover so well after a graphics driver crash and at the same time be unable to do the same for other kinds of drivers?

red_admiral

2024-07-19
From reddit:

> I'm in Australia. All our banks are down and all supermarkets as well so even if you have cash you can't buy anything.

I hope the national security/defense people are looking at this closely. Because you can bet the bad guys are. What's the saying, civilisation is only ever three days away from collapse or something?

I am pretty convinced this is a fuckup not an attack, but if Iran or someone managed something like this, there would be hell to pay.

jimberlage

2024-07-19
For $150/hour, I will spend today consulting for businesses who need someone in the St Louis area to go reboot a remote worker's machine.

lencastre

2024-07-19
Lots of issues in Spain and Germany.

anchochilis

2024-07-19
We routinely implement phased / canary deployments in server-side systems to prevent faults from rolling out globally. How is it possible that CrowdStrike and/or Windows does not have a similar system built in for large, institutional customers? This is outrageous.

Tylast

2024-07-19
Oh, the foresight of the 1st episode of Connections. https://www.youtube.com/watch?v=XetplHcM7aQ

pelasaco

2024-07-19
If I were North Korea, I would say that was me. It would, however, be a crazy story if Russia and China had done anything about it.

whoknowsidont

2024-07-19
Somewhere out there, there is an engineer with the biggest "I told you so" shit eating grin scrolling through every social media site and basking in the glory.

gquere

2024-07-19
There's supposedly a fix being deployed (https://x.com/George_Kurtz/status/1814235001745027317). Since it's a channel update I'm assuming that it would be downloaded automatically? Has anyone received it yet? Does the garbage driver disappear or is it replaced?

Edit: got in touch with an admin:

C-00000291-00000000-00000029.sys SHA256 1A30..4B60 is the bad file (timestamp 0409 UTC)

C-00000291-00000000-00000030.sys SHA256 E693..6FAE is the fix (timestamp >= 0527 UTC)

Do not rely on the hashes too much as these might vary from org to org I've read.
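Given the caveat about hashes, one way to triage files is by name and timestamp instead. This is a minimal sketch that assumes the times quoted above fall on 2024-07-19 UTC and that the file's modification time is a usable proxy for them. The `classify` helper and its labels are made up for illustration and are not an official tool.

```python
from datetime import datetime, timezone
from fnmatch import fnmatchcase

# Name pattern from the published workaround steps.
PATTERN = "C-00000291*.sys"

# Assumption: the "0527 UTC" quoted above refers to 2024-07-19.
FIXED_FROM = datetime(2024, 7, 19, 5, 27, tzinfo=timezone.utc)


def classify(filename: str, modified: datetime) -> str:
    """Label a channel file as 'unrelated', 'fixed' or 'suspect'.

    `modified` must be timezone-aware; comparing a naive datetime with
    FIXED_FROM raises TypeError.
    """
    if not fnmatchcase(filename, PATTERN):
        return "unrelated"
    if modified >= FIXED_FROM:
        return "fixed"
    # Older than the reported fix time: treat as suspect and check by hand.
    return "suspect"


if __name__ == "__main__":
    bad_time = datetime(2024, 7, 19, 4, 9, tzinfo=timezone.utc)
    print(classify("C-00000291-00000000-00000029.sys", bad_time))
    print(classify("C-00000291-00000000-00000030.sys", FIXED_FROM))
```

On a real machine the timestamp would come from something like `os.stat(path).st_mtime` converted with `datetime.fromtimestamp(..., tz=timezone.utc)`. Anything labelled suspect still needs a human look, since files older than 0409 UTC would land in the same bucket.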

cryptica

2024-07-19
I've been warning about the coming software apocalypse for years. This isn't a one-off, this is the beginning of a pattern. Tech recruitment is broken, software is more complex than ever, more and more people are turning to hacking, people are growing increasingly dissatisfied with the status quo...

cryptica

2024-07-19
This is what happens when you entrust software security to ex-hackers. Hackers love complexity because that's the kind of environment they thrive in; yet when they start working for the other side as security consultants, they still love complexity. Complexity ought to be the security consultant's worst enemy.

Ex-hackers often talk about security as if it's something you need to add to your systems... Security is achieved through good software development practices and it's about minimalism. You can't take intrinsically crappy, over-engineered, complex software and make it more secure by adding layers upon layer of complex security software on top.

lobochrome

2024-07-19
This is how I would start a war… surreal.

I hope it’s just a bug.

Melatonic

2024-07-19
Good luck everyone. I just spent all night fixing my shit and we caught it early

lifeisstillgood

2024-07-19
What do we do next week?

So assuming everyone uses sneaker-net to restart what’s looking like millions of Windows boxes, then come the recriminations, but then … what?

I think we need to look at minimum viable PC - certain things are protected more than others. Phones are a surprisingly good example - there is a core set of APIs and no fucker is ever allowed to do anything except through those. No matter how painful. At some point MSFT is going to enforce this the way Apple does. The EU court cases be damned.

For most tasks for most things it’s hard to suggest that an OS and a webbrowser are not the maximum needed.

We have been saying it for years - what I think we need is a manifesto for much smaller usable surface areas

harimau777

2024-07-19
It seems like this would indirectly tell us what systems use CrowdStrike. Could that in and of itself be information that could help an attacker? I know the security team at work is adamant about not leaking details of our system.

smithington

2024-07-19
There's already somebody trying to cash in on this problem:

https://fix-crowdstrike-apocalypse.com

steveBK123

2024-07-19
Wild that a piece of software so integral to basic function has such bad release discipline. A/B, Blue/Green, Canary, Rolling, etc..

I've worked on 4-person software teams that at least followed a basic user-group rolling release system.

JackC

2024-07-19
Crowdstrike did this to our production linux fleet back on April 19th, and I've been dying to rant about it.

The short version was: we're a civic tech lab, so we have a bunch of different production websites made at different times on different infrastructure. We run Crowdstrike provided by our enterprise. Crowdstrike pushed an update on a Friday evening that was incompatible with up-to-date Debian stable. So we patched Debian as usual, everything was fine for a week, and then all of our servers across multiple websites and cloud hosts simultaneously hard crashed and refused to boot.

When we connected one of the disks to a new machine and checked the logs, Crowdstrike looked like a culprit, so we manually deleted it and the machine booted. We tried reinstalling it and the machine immediately crashed again. OK, let's file a support ticket and get an engineer on the line.

Crowdstrike took a day to respond, and then asked for a bunch more proof (beyond the above) that it was their fault. They acknowledged the bug a day later, and weeks later had a root cause analysis saying that they didn't cover our scenario (Debian stable running version n-1, I think, which is a supported configuration) in their test matrix. In our own post mortem there was no real ability to prevent the same thing from happening again -- "we push software to your machines any time we want, whether or not it's urgent, without testing it" seems to be core to the model, particularly if you're a small IT part of a large enterprise. What they're selling to the enterprise is exactly that they'll do that.

tedajax

2024-07-19
The first time I experienced crowdstrike in a corporate environment it seemed obvious that something like this would eventually happen.

ezoe

2024-07-19
This EDR software is implemented as a kernel driver.

A third-party, closed-source Windows kernel driver that can't be audited. It gathers massive amounts of activity data and sends it back to the central server (which can be sold), as well as executing arbitrary payloads from the central server.

It becomes a single point of failure for your whole system.

If an attacker gains control of the sysadmin PC, it's over.

If an attacker gains administrator privilege on a system with EDR installed, they run with the same privilege as the EDR, so they can hide their activities from it. There aren't many EDR products in the world, so it can be done.

I'd like to call it "full trust security model".

ngneer

2024-07-19
Security technology harming security? Shocker. We need less monoculture. Trouble is monoculture pays. Write the software once, deploy it everywhere - free money.

I manage a simple Tier-4 cloud application on Azure, involving both Windows and Linux machines. Crowdstrike, OMI, McAfee and endpoint protection in general have been the biggest thorn in my side.

snailb

2024-07-19
On a positive note, I'm in Morocco and getting money from an ATM wasn't working for the whole day, I believe because of this outage. I was at the till in a supermarket and people started asking if they could chip in to pay for some food I bought because I didn't have the cash.

Humanity 1 - Technology 0

Edit: The outage of all ATMs in Morocco was yesterday, not today, so I'm not sure how the two are related.

hughw

2024-07-19
So, why did our little company's (little used) two Windows machines not BSOD overnight? They were just sitting idle. They run CS Falcon sensor. Did the update force a restart? Didn't seem to happen here.

egberts1

2024-07-19
I am quite sure that they have had three precious timezone hours to detect a total failure of telemetry after their fateful midnight upgrade.

Like the most useful Canary Island in the Coal Mine.

KingOfCoders

2024-07-19
24 years after Y2K, we have Y2K.

Kye

2024-07-19
Maybe a silly question, but: why hasn't this affected Linux? I assume it uses a proprietary kernel module just like it does on Windows. I guess this will come out in a post-mortem if they publish one, but it's been on my mind.

edit: aha https://news.ycombinator.com/item?id=41005936

They did do this to Linux, but in the past. Maybe whatever they did to deal with it saved Linux this time around

ngneer

2024-07-19
"There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult. It demands the same skill, devotion, insight, and even inspiration as the discovery of the simple physical laws which underlie the complex phenomena of nature."

"The most important property of a program is whether it accomplishes the intention of its user."

C.A.R. Hoare

sans_souse

2024-07-19
Why would you name your company "CrowdStrike" anyway? What does Crowd Strike even mean?

josephd79

2024-07-19
Year of Linux

markus_zhang

2024-07-19
In pre-market, CRWD is 14% down. I think investors are a bit scared that THIS time there are going to be some consequences.

ddgflorida

2024-07-19
Do you suppose they test before pushing updates out?

thomasjudge

2024-07-19
Why are "security" patches not tested before they are deployed?

egberts1

2024-07-19
Yet Lennart Poettering and Redhat (spelled that way as I am one of the original pre-IPO investors of RedHat via Alex Brown/Deutsche Bank) want to put networking of Linux into UEFI this quarter, inside the most sacrosanct PID 1.

They still won’t learn anything from Crowdstrike’s mistakes!

Maybe it is time for me to ditch that stock.

integricho

2024-07-19
Ironic that the software intended to prevent exactly these kinds of outages ends up causing them.

integricho

2024-07-19
This should at the very least put them out of business by causing each and every client to abandon them as their security solution.

sytelus

2024-07-19
Genuine question: How the heck did crapware like CrowdStrike get into all critical systems from 911 to hospitals to airlines? My understanding was that all these critical systems are just super lazy to upgrade or install anything at all. I would love to know all the sales tactics CS used to get into millions of systems for money!

convivialdingo

2024-07-19
Here’s my take as a security software dev for 15 years.

We put too much code in kernel simply because it’s considered more elite than other software. It’s just dumb.

Also - if a driver is causing a crash MSFT should boot from the last known-good driver set so the install can be backed out later. Reboot loops are still the standard failure mode in driver development…
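A rough sketch of what that "last known-good driver set" fallback could look like, purely as an illustration. This is not how Windows actually does it; `BootState`, `MAX_FAILED_BOOTS` and the `try_boot` callback are all made-up names, and the threshold of 2 is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

MAX_FAILED_BOOTS = 2  # hypothetical threshold before falling back


@dataclass
class BootState:
    active: List[str]           # driver set currently configured
    last_known_good: List[str]  # set recorded after the last clean boot
    failed_boots: int = 0       # consecutive failures of the active set


def choose_driver_set(state: BootState) -> List[str]:
    """Fall back once the active set has failed too many times in a row."""
    if state.failed_boots >= MAX_FAILED_BOOTS:
        return state.last_known_good
    return state.active


def boot(state: BootState, try_boot: Callable[[List[str]], bool]) -> Optional[List[str]]:
    """Attempt one boot; return the driver set used, or None if it crashed."""
    drivers = choose_driver_set(state)
    if not try_boot(drivers):
        state.failed_boots += 1
        return None
    if drivers is state.active:
        # Clean boot on the active set: promote it to known-good.
        state.failed_boots = 0
        state.last_known_good = list(drivers)
    # On a fallback boot the counter is left alone, so the machine stays on
    # the known-good set until someone backs out the bad install.
    return drivers
```

With a bad driver in the active set, the first two boots fail and the third comes up on the known-good set instead of looping forever.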

amai

2024-07-19
It seems monocultures are not only bad for resilience in agriculture, but also in IT.

jacobgorm

2024-07-19
The great clownstrike.

whoisstan

2024-07-19
Can someone with experience explain how integration tests did not detect that?

amai

2024-07-19
Did Crowdstrike forget the rule, that one does not simply deploy on Friday?

https://www.reddit.com/r/ProgrammerHumor/comments/f79iag/don...

dev1ycan

2024-07-19
Crazy isn't it, I had no issues because my group policy updates have been off since last year, guess the "everyone must forcefully update" for "security reasons" ended up backfiring, who could've thought

daemonologist

2024-07-19
My company has some bios bitlocker extension installed which prompts for a password on boot, so automatic updates (one of which tried to install last night) just get stuck there in jet engine mode. Normally this is extremely annoying but today I count myself lucky - aside from a couple of people with Chromebook thin clients I am the only person showing as online in Teams right now.

kaladin-jasnah

2024-07-19
Anecdote: my first job was IT at a small org. We had somehow gotten a 15 minute remote meeting with Kevin Mitnick, and asked him several questions about security best practices and software recommendations. I don't remember a lot about that meeting, but I do remember his strong recommendation of Crowdstrike. Interesting to see it brought up again in this context.

farceSpherule

2024-07-19
I absolutely abhor these end point solutions that "auto update for your convenience and safety."

I can control and manage my own systems. I do not need nanny state auto updating for me.

Crowdstrike should be held liable for financial losses associated with this nonsense.

axelthegerman

2024-07-19
Looks like crowdstrike are just delivering what their name promised, striking crowds around the world

aktuel

2024-07-19
Germany is not affected since it's Krautstrike only.

GrumpyNl

2024-07-19
How do they test this before they roll it out? Looks like a bug that's easy to spot. I would presume they test it on several configurations and when it passes the test (a reboot), they roll it out. Has this been tested?

xyst

2024-07-19
CRWD dropped $50/share at market open. Wild.

Is this specific to only Windows machines “protected” with CS or is this impacting Linux/macOS as well?

remram

2024-07-19
I can't wait to see the CloudFlare traffic report after this. All those computers going down must have affected traffic worldwide. Even from Linux systems as their owners couldn't run jobs from their bricked Windows laptops.

charlie0

2024-07-19
Next up: CrowdStrike Sued for Alleged Negligence.

I hate lawyers, but this is the reason why companies outsource. Why take the blame (and spend the money) when you can blame the vendor?

Geezus_42

2024-07-19
"Incidents of this nature do occur in a connected world that is reliant on technology." - Mike Maddison, CEO, NCC Group

Until I see an explanation of how this got past testing, I will assume negligence. I wasn't directly affected, but it seems every single Windows machine running their software in my org was affected. With a hit rate that high I struggle to believe any testing was done.

zzhelezc

2024-07-19
From the BBC's cyber correspondent Joe Tidy [1]:

> A "content update" is how it was described. So, it wasn’t a major refresh of the cyber security software. It could have been something as innocuous as the changing of a font or logo on the software design.

He can't be serious, right? Right?

[1] https://www.bbc.co.uk/news/live/cnk4jdwp49et?post=asset%3Abd...

Beijinger

2024-07-19
Speaking of security. I got an email yesterday that I need a different system now to log into my social security account. This one:

https://www.id.me/government

It is for social security, taxes, unemployment benefits, whatever. And running under a foreign TLD, .ME for Montenegro. I am not a security specialist. But I think this is asking for trouble.

By the way, do you remember when fuck.yu became fuck.me ?

rajeshivivek

2024-07-19
DO NOT REDEEM SAARRRRRRRRSSSSS! BLODDY BASTARDS INVALID FORMATING SARRRRRRSSSSS!!

stevetron

2024-07-19
Working late Thursday night in Florida, USA. I have someone in Australia wanting me to write a quick script in LSL for an object in Second Life. We were interrupted: Second Life kept running, but Discord went down, telling me to 'try another server' which doesn't make sense when you are 1-on-1 with someone. All my typing in Discord turned red. Additionally, I couldn't log into the email portal for outlook.com: I got a screen of tiny-fonted text all clinging to the left edge of the display, unreadable, unusable. Second Life, though, stayed online and kept working for me, but then I'm on Windows 7. My friend who had requested the collaboration froze in Second Life on his Windows 10 system, and I don't know what his Discord was doing. I ended the session since I couldn't get a go/no-go out of him for the latest script version.

dboreham

2024-07-19
Looks like it affected the Crowdstrike stock, but not Microsoft.

upofadown

2024-07-19
Perhaps a dumb question for someone who actually knows how Microsoft stuff works...

Why would an anti-malware program be allowed to install a driver automatically ... or ever for that matter?

Added: OK, from another post I now know Crowdstrike has some sort of kernel mode that allows this sort of catastrophe on Linux. So I guess there is a bigger question here...

rajeshivivek

2024-07-19
SARRRRRRRSSSSS!

purpleblue

2024-07-19
Do all the machines need to be manually fixed? It doesn't seem like an automatic update will work here...

snappr021

2024-07-19
“To err is human, but to really fuck things up requires a computer.” ~ Len Beattie

resters

2024-07-19
Any company that inserts itself so heavily into US politics cannot be counted on as a solid engineering organization.

localfirst

2024-07-19
This is the first time I'm hearing about crowdstrike, what is it and why is this such a big deal?

casey2

2024-07-19
Isn't Crowdstrike the same company that heavily lobbied to make all their features a requirement for government computers? https://www.opensecrets.org/federal-lobbying/clients/summary... They have plenty of money for Congress, but it seems little for any kind of reasonable software development practices. This isn't the first time Crowdstrike has pushed system-breaking changes.

steelframe

2024-07-19
Wow, this hits close to home. Doing a page fault where you can't in the kernel is exactly what I did with my very first patch I submitted after I joined the Microsoft BitLocker team in 2009. I added a check on the driver initialization path and didn't annotate the code as non-paged because frankly I didn't know at the time that the Windows kernel was paged. All my kernel development experience up to that point was with Linux, which isn't paged.

BitLocker is a storage driver, so that code turned into a circular dependency. The attempt to page in the code resulted in a call to that not-yet-paged-in code.

The reason I didn't catch it with local testing was because I never tried rebooting with BitLocker enabled on my dev box when I was working on that code. For everyone on the team that did have BitLocker enabled they got the BSOD when they rebooted. Even then the "blast radius" was only the BitLocker team with about 8 devs, since local changes were qualified at the team level before they were merged up the chain.

The controls in place not only protected Windows more generally, but they even protected the majority of the Windows development group. It blows my mind that a kernel driver with the level of proliferation in industry could make it out the door apparently without even the most basic level of qualification.

CKMo

2024-07-19
This is a good example of why you don't want ring0 level access for clients. Or just, you don't want client-based solutions. The provider just becomes another threat vector.

swozey

2024-07-19
I know there's a better word to be used here, but what initially looked like a massive cyberattack turning out to be a massive defender foot-broom is chefs kiss.

I saw it was Windows and went to bed. What a great feeling.

I'm sorry to those of you dealing with this. I've had to wipe 1200 computers over a weekend in a past life when a virus got in.

Did I receive any appreciation? Nope. I was literally sleeping under cubicle desks bringing up isolated rows one by one. I switched everything in that call center to Linux after that. Ironically it turns out it was a senior engineer's SSH key that got leaked somehow and was used to get in and dig around servers in our datacenter outside of my network. My filesystem logging (in Windows, coincidentally) alerted me.

IT is fun.

cloin

2024-07-19
I'm confused as to how this issue is so widespread in the first place. I'm unfamiliar with how Crowdstrike works; do organizations really have no control over when these updates occur? Why can't these airlines just apply the updates in dev first? Is it the organization's fault or does Crowdstrike just deliver updates like this and there's no control? If that's just how they do it, how do they get away with this?

stainablesteel

2024-07-19
It's strange how people who work in professions that are considered crucial infrastructure are held to such a high standard, but there's always some tech problem that cripples them the hardest.

rboyd

2024-07-19
Seems like a modern operating system would have an automatic rollback mechanism for cases like this.

tgtaptarget

2024-07-19
In my org, none of the essential systems went down (those used by labor). However all of management's individual PCs went down which got me wondering... Is this the beginning (or continuation) of whittling down what is "essential" human labor versus what could be done remotely (or eliminated completely)?

Or perhaps Microsoft is just garbage and soon will be as irrelevant as commercial real estate office parks and mega-call centers

kubov

2024-07-19
Were cloud providers (AWS and azure) so heavily impacted because they use CS internally or because so many users use CS?

raphar

2024-07-19
I want to see the internal postmortem of why this happened to CrowdStrike (if they are still in business)

gonzo41

2024-07-19
Do people not have test environments?

tbatchelli

2024-07-19
This event is predicted in Sidney Dekker’s book “Drift into Failure”, which basically postulates that in order to prevent local failure we set up failure prevention systems that increase the complexity beyond our ability to handle, and introduce systemic failures that are global. It’s a sobering book to read if you ever thought we could make systems fault tolerant.

cjbgkagh

2024-07-19
Due to the scale I think it’s reasonable to state that in all likelihood many people have died because of this. Sure it might be hard to attribute single cases but statistically I would expect to see a general increase in probability.

I used to work at MS and didn’t like their 2:1 test to dev ratio or their 0:1 ratio either and wish they spent more work on verification and improved processes instead of relying on testing - especially their current test in production approach. They got sloppy and this was just a matter of time. And god I hate their forced updates, it’s a huge hole in the threat model, basically letting in children who like to play with matches.

My important stuff is basically air-gapped. There is a gateway but it’ll only accept incoming secure sockets with a pinned certificate and only a predefined in-house protocol on that socket. No other traffic allowed. The thing is designed to gracefully degrade with the idea that it’ll keep working unattended for decades, the software should basically work forever so long as equivalent replacement hardware could be found.

nimbius

2024-07-19
I work for a diesel truck repair facility and just locked up the doors after a 40 minute day :( .

- lifts wont operate.

- cant disarm the building alarms. (have been blaring nonstop...)

- cranes are all locked in standby/return/err.

- laser aligners are all offline.

- lathe hardware runs but controllers are all down.

- cant email suppliers.

- phones are all down.

- HVAC is also down for some reason (its getting hot in here.)

the police drove by and told us to close up for the day since we dont have 911 either.

alarms for the building are all offline/error so we chained things as best we could (might drive by a few times today.)

we dont know how many orders we have, we dont even know whos on schedule or if we will get paid.

GirishSharma643

2024-07-19
Who is responsible for this billion dollar mistake?

vlan0

2024-07-19
Anything that has root/kernel access is a risk. It always has been. When will we learn. Probably never. Because money runs this world. So sad. Time to open a bakery and move on from this world.

steine65

2024-07-19
Here's a visual representation of flight cancellations and delays at major US airports https://www.flightaware.com/miserymap/

UniverseHacker

2024-07-19
Why are so many mission critical hardware connected systems connected to the internet at all or getting automatic updates?

This is just basic IT common sense. You only do updates during a planned outage, after doing an easily reversible backup, or you have two redundant systems in rotation and update and test the spare first. Critical systems connected to things like medical equipment should have no internet connectivity, and need no security updates.

I follow all of this in my own home so a bad update doesn’t ruin my work day… how do big companies with professional IT not know this stuff?

irusensei

2024-07-19
Can we end the whole “loading a kernel rootkit” thing? AFAIK Apple already shuns kernel extensions. What’s preventing Microsoft from doing the same? As a bonus shit like anti cheat will go away too.

franczesko

2024-07-19
I just wanted to mention that Microsoft has 3 tiers of Windows beta releases before changes are pushed to production. I can't comprehend how this wasn't noticed before.

jimt1234

2024-07-19
I watched a presentation by someone representing "I Am The Cavalry" at B-Sides, Las Vegas, a few years ago. Very interesting stuff, gave me a whole new perspective on "cyber security".

https://iamthecavalry.org

PaulHoule

2024-07-19
People at my workplace were affected but I dodged the bullet because I left my computer turned on overnight because I always want to be able to RDP in the next morning in case I decide to stay home.

rootforce

2024-07-19
AWS has posted some instructions for those affected by the issue using EC2.

[AWS Health Dashboard](https://health.aws.amazon.com/health/status)

"First, in some cases, a reboot of the instance may allow for the CrowdStrike Falcon agent to be updated to a previously healthy version, resolving the issue.

Second, the following steps can be followed to delete the CrowdStrike Falcon agent file on the affected instance:

1. Create a snapshot of the EBS root volume of the affected instance

2. Create a new EBS volume from the snapshot in the same Availability Zone

3. Launch a new instance in that Availability Zone using a different version of Windows

4. Attach the EBS volume from step (2) to the new instance as a data volume

5. Navigate to the \windows\system32\drivers\CrowdStrike\ folder on the attached volume and delete "C-00000291*.sys"

6. Detach the EBS volume from the new instance

7. Create a snapshot of the detached EBS volume

8. Create an AMI from the snapshot by selecting the same volume type as the affected instance

9. Call replace root volume on the original EC2 Instance specifying the AMI just created"
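For step 5, once the volume is attached to a rescue instance, the delete itself could be scripted roughly like this. It's only a sketch: `remove_channel_files` and the `volume_root` argument (for example `D:\` or a mount point) are hypothetical names, the directory casing is assumed to match what's on disk, and it defaults to a dry run so you can see what would be removed first.

```python
from pathlib import Path
from typing import List

# File pattern taken from the published workaround.
PATTERN = "C-00000291*.sys"


def remove_channel_files(volume_root: str, dry_run: bool = True) -> List[Path]:
    """List (and optionally delete) matching files on an attached volume.

    volume_root is wherever the affected volume is mounted on the rescue
    instance; that location is an assumption, adjust it for your setup.
    """
    driver_dir = Path(volume_root) / "Windows" / "System32" / "drivers" / "CrowdStrike"
    if not driver_dir.is_dir():
        return []
    matches = sorted(driver_dir.glob(PATTERN))
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches
```

Run it once with the default `dry_run=True` to check the matches, then again with `dry_run=False` before detaching the volume.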

janalsncm

2024-07-19
This outage may be more expensive and cause more damage than any cyberattack in history.

rs999gti

2024-07-19
So can crowdstrike be classified as malware now?

Currently waiting in line for 2 hours + waiting for Delta to tell me when my connecting leg can be booked. My current flight is delayed 5 hours.

Zaskoda

2024-07-19
I want to add something to the discussion but it's difficult for me to accurately summarize and cite things. In a nutshell, there appears to be a lot of tomfoolery with CrowdStrike and the stuff that happened with the DNC during the 2016 election. Here's some of what I'm talking about:

There's a strong link between the DNC, Hillary, and CrowdStrike. Here's one piece that links a cofounder of CrowdStrike with Hillary pretty far back: https://www.technologyreview.com/innovator/dmitri-alperovitc...

This 2017 piece talks about doubt behind CrowdStrike's analysis of the DNC hack being the result of Russian actors. One of the groups disputing CrowdStrike's analysis was Ukraine's military. https://www.voanews.com/a/crowdstrike-comey-russia-hack-dnc-...

This detailed analysis of CrowdStrike's explanation of the DNC hack goes so far as to say "this sounded made up" https://threatconnect.com/resource/webinar-guccifer-2-0-the-...

The Threat Connect analysis is also discussed here: https://thehill.com/business-a-lobbying/295670-prewritten-gu...

"For one, the vulnerability he claims to have used to hack the NGP VAN ... was not introduced into the code until an update more than three months after Guccifer claims to have entered the DNC system."

Noted at the end of this story they mention that CrowdStrike installed its software on all of the DNC's systems: https://www.ft.com/content/5eeff6fc-3253-11e6-bda0-04585c31b...

Finally, there's this famous but largely forgotten story of the time Bernie's campaign was accused of accessing Hillary's data: https://www.npr.org/2015/12/18/460273748/bernie-sanders-camp...

"This was a very egregious breach and our data was stolen," Mook said. "We need to be sure that the Sanders campaign no longer has access to our data."

"This bug was a brief, isolated issue, and we are not aware of any previous reports of such data being inappropriately available," the company said in a blog post on its website.

(edited for spelling)

Kye

2024-07-19
There's a workaround: reboot 10-15 times. I've seen two people say it independently, so maybe it's for real.

tonymet

2024-07-19
No rolling updates? How could a 100% repro BSOD pass QC? I'm more concerned about the deployment process than the crash itself. Everyone experiences a bad build from time to time. How did this possibly go live?

energy123

2024-07-19
On the plus side this will help us develop an immune system against cyber attacks in any future war. Businesses will start thinking of contingencies.

Key89

2024-07-19
Is there an ELI5 on how this can happen? Like, I get it's a boot loop, but what did CrowdStrike do that caused it? How can non-malicious code trigger a boot loop?

piuantiderp

2024-07-19
How come anyone still uses CrowdStrike?

1024core

2024-07-19
Read on Mastodon: https://infosec.exchange/@littlealex/112813425122476301

The CEO of Crowdstrike, George Kurtz, was the CTO of McAfee back in 2010 when it sent out a bad update and caused similar issues worldwide.

If at first you don't succeed, .... ;-) j/k

belter

2024-07-19
It's not the first time they've pulled something similar... 1 month ago: "CrowdStrike bug maxes out 100% of CPU, requires Windows reboots" - https://www.thestack.technology/crowdstrike-bug-maxes-out-10...

75 Billion dollars valuation, CNBC Analysts praising the company this morning on how well the company is run!...When in reality they can't master the most basic of the phased deployment methodologies known for 20 years...

Hundreds of handsomely paid CTO's, at companies with billions of dollars in valuations, critical healthcare, airlines, who can't master the most basic of the concepts of "Everything fails all the time"...

This whole industry is depressing....

nu11ptr

2024-07-19
This whole thing likely would have been averted had microkernel architectures caught on during the early days (with all drivers in user mode). Performance would have likely been a non-issue, not only due to the state of the art L4 designs that came later, but mostly because had it been adopted everything in the industry would have evolved with it (async I/O more prevalent, batched syscalls, etc.).

I will admit we've done pretty well with kernel drivers (and better than I would have ever expected tbh), but given our new security focused environment it seems like now is the time to start pivoting again. The trade offs are worth it IMO.

chinathrow

2024-07-19
Yet their stock tanked only a couple of dollars. They (and their customers) should face some rather unpleasant lawsuits. If you let others own your systems, you should not be allowed to provide critical infrastructure.

Ringz

2024-07-19
By chance, I watched a few episodes of 911 and kept thinking that it was all completely unrealistic nonsense. Then there's an episode where the entire emergency call system for LA goes down, and even though there were different reasons in the episode (a transformer fire), I couldn't have imagined that it was actually possible to completely disable the emergency call system (and what else) of a city.

qwerty456127

2024-07-19
WTF is CrowdStrike and why is it affecting so many people and companies? I've never heard of it before. And apparently it isn't anything relevant to all Windows users as it didn't affect any computer of any person I personally know.

nothercastle

2024-07-19
I love how their company name foreshadows this exact event. It’s malware pretending to be a security suite.

satisfice

2024-07-19
I want to say the problem is that the industry has systematically devalued software testing in favor of continuous delivery and the strategy of hoping that any problems are easy to roll back.

But it's deeper than that: the industry realizes that, once you get to a certain size, no one can hurt you much. Crowdstrike will not pay a lasting penalty for what has just happen, which means executives will shrug and treat this as a random bolt of lightning.

etc-hosts

2024-07-19
Mission critical systems should be running something like ChromeOS.

Too bad ChromeOS seems to be on the way out at Google.

01100011

2024-07-19
This might be a good time for folks to go back and watch the first episode of James Burke's Connections: The Trigger Effect

https://www.youtube.com/watch?v=NcOb3Dilzjc

Interconnected systems can fail spectacularly in unforeseen ways. Strange that something so obvious is so often dismissed or overlooked.

coderinsan

2024-07-19
CrowdStrike today has shown why it's absolutely crucial to test code before deployment, say no to YOLO deployments with LLM powered software testing https://github.com/codeintegrity-ai/mutahunter

LrnByTeach

2024-07-19
What many people are not talking about is why we are here:

one simple reason: all eggs in one Microsoft PC basket

why in one Microsoft PC basket?

- most corporate desktop apps are developed for Windows ONLY

Why most corporate desktop apps are developed for Windows ONLY?

- it is cheaper to develop and distribute since, 90% of corporations use Windows PCs ( Chicken and Egg problem)

- alternate Mac Laptops are 3x more expensive, so corporations can't afford

- there are no robust industrial grade Linux laptops from PC vendors (lack of support, fear of Microsoft may penalize for promoting Linux laptops etc.)

1/ Most large corporations (Airlines, Hospitals etc..) can AFFORD & DEMAND their Software vendors to provide their ' business desktop applications' both in Windows and Linux versions and install mix of both Operating systems.

2/ majority of corporate desktop applications can be Web applications (Browser based) removing the single vendor Microsoft Windows PC/Laptops

rewgs

2024-07-19
The most concerning thing about this is the realization of just how many incredibly critical systems run on Windows.

apantel

2024-07-19
This just in ‘CrowdStrike Strikes Crowd’

insane_dreamer

2024-07-19
How is it that these major companies aren't rolling out vendor updates to a small number of computers first to make sure that nothing broke, and then rolling out to the entire fleet? That's deployment 101.

rustcleaner

2024-07-19
Thank Chronos I switched to Qubes OS almost two years ago!

siliconc0w

2024-07-19
The postmortem should be interesting; I can't imagine how even just basic integration testing didn't catch this. Much less basic best practice like canarying.

low_tech_punk

2024-07-19
Crowdstruck

m3kw9

2024-07-19
It’s that easy. A hacker that controls the update terminal at crowdstrike controls the world?

cjbgkagh

2024-07-19
Edit; it appears my comment has been moved to a top level comment, i.e. peer with the parent without any way of telling what happened - so now there is the whole other pointless branch polluting the relevance of the tree.

Previously;

It appears that someone was able to take my previous comment in this thread completely off hacker news, it's not even listed as flagged. It was at 40pts before disappearing, perhaps there is some reputation management going on here. If it was against the site rules it would be helpful to know which ones.

Edit; the link is https://news.ycombinator.com/item?id=41007985 it was a high up comment that no longer appears even though flagged comments do appear. I checked if it has been moved but the parent comment is still the same. This feels like hellbanned in that there isn't an easy way for me to see if I've been shadowbanned. But I really don't know. I was commenting in good faith.

darkhorn

2024-07-19
Why don't they use Windows' own anti-virus?

pfortuny

2024-07-19
This is the Irish potato famine (essentially due to the farming of a single species of potato) equivalent in IT infrastructure: a single vendor.

mindcrash

2024-07-19
This has all the hallmarks of a SSCA (Software Supply Chain Attack).

Either that or Crowdstrike is testing critical software meddling in ring zero so poorly, causing crashes and bootloops out in the wild on 100% of the deployments, that they need to get sued out of existence.

I hope for their sake it's the former.

felipesabino

2024-07-19
Does anyone know how to proceed if I do not have administrator level access to the computer?

I do not have access to c:\windows\system32\drivers\crowdstrike folder to delete the corrupted .sys file

I was able to boot into recovery mode with network; after waiting 30 min, I rebooted and the BSOD persisted.

Are there other alternatives on how to recover?

einpoklum

2024-07-19
If I weren't an atheist I would say this is god's punishment for installing malware on your employees' machines, on one hand, and for being a spineless patsy for management by letting them install that crap on your work machine, on the other.

jonplackett

2024-07-19
How do so many super critical things rely on… Windows? I wouldn’t trust Windows to run a laptop reliably but here it is running pretty much everything. I guess that’s why they need crowdstrike.

jonplackett

2024-07-19
All the crazy people banging the drum for war with Russia and/or China.

Imagine what our IT systems would look like with someone _intentionally_ messing with them.

awahab92

2024-07-19
closed source will always fuck you in the ass

zx10rse

2024-07-19
There is hardly a better time if you write software to watch - "The Mess We're In" by Joe Armstrong - https://www.youtube.com/watch?v=lKXe3HUG2l4

I am not sure in which one of his talks he briefly mentioned that one of his concerns is that we are basically building a digital Alexandria library, and if it burns, well ...

Even more devastating events like this will happen in the future.

We stand on the shoulders of giants and yet we learned nothing.

outside1234

2024-07-19
Texas, where software goes to die. Or maybe that is where killer software is developed?

dboreham

2024-07-19
Now things are serious: I can't place a mobile order at Starbucks.

kelembu

2024-07-19
Is there a way to estimate number of affected devices? 10 million? 100 million?

photonbeam

2024-07-19
It should be obvious to everyone now that kernel extensions for ‘security’ are not worth it

wufufufu

2024-07-19
Anyone have a technical writeup of the actual bug? I'm trying to explain how this could happen to people who think this is related to AI or cyber attacks.

What happened to the QA testing, staggered rollouts, feature flags, etc.? It's really this easy to cause a boot loop?

To me, BSOD indicates kernel level errors, which I assume Crowdstrike would be able to cause because it has root access due to being a security application. And because it's boot-looping, there's not a way to automatically push out updates?

_hcuq

2024-07-19
Ok... Would a Linux-based infrastructure be more resilient?

Does Linux require Crowdstrike style AV software?

rtkwe

2024-07-19
I'd hate to be on Microsoft's teams today. They're catching a lot of stray blame for this in the public eye where it's entirely not their fault.

sytelus

2024-07-19
CrowdStrike has managed to invade Starbucks IT. All of the online order taking systems are down.

exabrial

2024-07-19
Why are people still using Windows?

sagebird

2024-07-19
It is interesting that operating systems exist for server applications at all.

What is the problem they are solving?

What is the difference between what an operating system contains and can do and what you need it to do?

Why would I want to rent a server to run a program that performs a task, and also have the same system performing extra tasks - like intrusion detection, intrusion detection software updates, etc.

I just don't understand why a compiled program that has enough disk and memory would ever be asked to restart for a random fucking reason having nothing to do with the task at hand. It seems like the architecture of server software is not created intelligently.

banku_brougham

2024-07-19
The IT security chief at my co (paraphrasing):

>talked to pres of Crowdstrike. His forthrightness was refreshing. He said “We got it wrong.”

>They are working with Microsoft to understand why this happened.

Pretty much the message minus even more boilerplate talk.

wojo1206

2024-07-19
Why doesn't Crowdstrike follow standard deployment strategies such as canary or rolling? A gradual update would uncover this bug before reaching critical mass. Doing an all-at-once update is unacceptable for critical systems.
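
For anyone unfamiliar with the idea, here is a minimal sketch of what a staged (canary-style) rollout loop could look like. It is only an illustration of the general technique, not how CrowdStrike or any vendor actually ships updates; `staged_rollout`, `deploy`, `is_healthy` and the stage fractions are all hypothetical names and values chosen for the example.

```python
from typing import Callable, List, Sequence, Tuple


def staged_rollout(
    hosts: Sequence[str],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    stage_fractions: Sequence[float] = (0.01, 0.10, 0.50, 1.00),
) -> Tuple[List[str], bool]:
    """Deploy to growing waves of hosts; stop at the first unhealthy wave.

    Returns (hosts that received the update, whether the rollout completed).
    The stage fractions are cumulative and purely illustrative.
    """
    updated: List[str] = []
    total = len(hosts)
    for fraction in stage_fractions:
        done = len(updated)
        if done >= total:
            break
        # Cumulative target for this stage; always advance by at least one host.
        target = min(total, max(done + 1, int(total * fraction)))
        wave = list(hosts[done:target])
        for host in wave:
            deploy(host)
        updated.extend(wave)
        # Halt before the next, larger wave if any host in this one is unhealthy.
        if not all(is_healthy(host) for host in wave):
            return updated, False
    return updated, len(updated) == total
```

With 100 hosts and a health check that fails, only the first wave (a single host) ever receives the update, which is the whole point of rolling out gradually.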

downrightmike

2024-07-19
This is what happens when you treat IT as a cost center.

nineteen999

2024-07-19
Speaking as somebody who manages a large piece of a 911 style system for first responders and has done so for 10 years (and is not affected by this outage) - this is why we do not allow third parties to push live updates to our systems.

It's unfortunate, the ambulances are still running in our area of responsibility, but it's highly likely that the hospitals they are delivering patients to are in absolute chaos.

ijidak

2024-07-19
How do you patch software causing a BSOD?

It seems like a chicken and egg problem.

I ran a team that developed a remote agent, and this was my nightmare scenario.

grumpyprole

2024-07-19
Hopefully now people might wake up to the idea that these tech monopolies are not leading to safe, secure and reliable systems. They will wonder how a third party component could cause such breakage. I expect many will be calling for regulation.

badgersnake

2024-07-19
I don’t really understand why AV updates aren’t tested before being pushed out to critical systems and I don’t understand why every system would run the same AV.

But also I don’t understand why this corporate garbageware is still a thing in 2024 when it adds so little value.

xyst

2024-07-19
why the fuck is our critical infrastructure running on WINDOWS. Fuck the sad state of IT. CIOs and CTOs across the board need to be fired and held accountable for their shitty decisions in these industries.

yes CRWD is a shitty company but seems they are a "necessity" by some stupid audit/regulatory board that oversees these industries. But at the end of the day, these CIOs/CTOs are completely fucking clueless as to the exact functions this software does on a regular basis. A few minions might raise an issue but they stupidly ignore them because "rEgUlAtOrY aUdIt rEqUiReS iT!1!"

wolfspaw

2024-07-19
(FORCE) Pusheedd to Prod on FRIDAYY -- Burneeeddd by its Sins

2OEH8eoCRo0

2024-07-19
They should be sued into bankruptcy

fargle

2024-07-19
such stupidity. our $$$ corporate geniuses mandate multiple so-called security software which is:

- unaccountable black boxes

- of questionable, and un-auditable, quality

- requires kernel modules, drivers, LocalSystem, root access, etc.

- updates at random times with no testing

- download these updates from where? and immediately trust and run that code at high privilege. using unaccountable-black-box crypto to secure it.

- all have known patterns of bad performance, bugs, and generally poor quality

all in the name of security. let's buy multiple "solutions" and widely deploy them to protect us from one boogeyman, or at least the shiny advertisements say. while punching all sorts of serious other holes in security. why even look for a Windows ZeroDay when we can look for a McAfee or Crowdstrike zero day?

mensetmanusman

2024-07-19
Stopped by a gas station in rural Wisconsin leaving from MSP. Thank God we were on a full tank when we left, nothing was operational except the bathrooms (which is why we stopped).

I left thinking about how anti-anti-fragile our systems have become. Maybe we should force cash operations…

accra4rx

2024-07-19
Never heard of a mainframe going down

mikewarot

2024-07-19
Why are we still running ANY operating systems based on Ambient Authority, as part of our infrastructure?

DoD shouldn't have given up on MULTICS. That premature optimization is going to sink the US and the Free World.

Personally, I'm still waiting for Genode to be my daily driver.

GabeIsko

2024-07-19
So - what is the lesson learned? The only clear message for me is that critical programs that also demand kernel level access maybe shouldn't update themselves.

jpgvm

2024-07-19
If you run RATs like these on your machines then I'm sorry, this is just a case of fucking around and finding out.

Just don't do it. Windows Defender is a thing, it does just fine. For everything else there is least-privilege and group policy.

watersb

2024-07-19
I wonder how much (as yet) unknown malware will be wiped out or enabled before this is over.

ckemere

2024-07-19
It seems that an unexplored weirdness here is the prevalence of virtual Windows in the medical world. It seems that this approach has become commonplace for HIPAA reasons (though it's unclear that it makes the world better versus using secure applications to handle HIPAA data). In the case of this Crowdstrike outage, one would think that virtual machines would simplify getting things up and running again, but instead there seems to be just the opposite going on, where lack of hardware access is limiting restoring them.

Any insight from those affected?

Group_B

2024-07-19
fun times...

smcleod

2024-07-19
I mean... installing what is essentially a 3rd party enterprise rootkit that not only has root access to all files and network activity but also a self-update mechanism ... who could have seen this coming?

gpderetta

2024-07-19
The machine stops.

tigerlily

2024-07-19
Meanwhile the linux desktop just keeps on truckin'.

Bluestein

2024-07-19
I think we have reached an inflection point. I mean we have to make an inflection point out of this.-

This outage represents more than just a temporary disruption in service; it's a black swan célèbre of the perilous state of our current technological landscape. This incident must be seen as an inflection point, a moment where we collectively decide to no longer tolerate the erosion of craftsmanship, excellence, and accountability that I feel we've been seeing all over the place. All over critical places.-

Who are we to make this demand? Most likely technologists, managers, specialists, and concerned citizens with the expertise and insight to recognize the dangers inherent in our increasingly careless approach to ... many things, but, particularly technology. Who is to uphold the standards that ensure the safety, reliability, and integrity of the systems that underpin modern life? Government?

Historically, the call for accountability and excellence is not new. From Socrates to the industrial revolutions, humanity has periodically grappled with the balance between progress and prudence. People have seen - and complained about - life going to hell, downhill, fast, in a hand basket without brakes since at least Socrates.-

Yet, today’s technological failures have unprecedented potential for harm. The CrowdStrike outage killed, halted businesses, and posed serious risks to safety: consequences that were almost unthinkable in previous eras. This isn't merely a technical failure; it’s a societal one, revealing a disregard for foundational principles of quality and responsibility. Craftsmanship. Care and pride in one's work.-

Part of the problem lies in the systemic undervaluation of excellence. In pursuit of speed and profit uber alles, many companies have forsaken rigorous testing, comprehensive risk assessments, and robust security measures. The very basics of engineering discipline (redundancy, fault tolerance, and continuous improvement) are being sacrificed. This negligence is not just unprofessional; it’s dangerous. As this outage has shown, the repercussions are not confined to the digital realm but spill over into the physical world, affecting real lives. As it always has. But never before have the actions of so few "perennial interns" affected so many.-

This is a clarion call for all of us with the knowledge and passion to stand up and insist on change. Holding companies accountable, beginning with those directly responsible for the most recent failures.-

Yet, it must go beyond punitive measures. We need a cultural shift that re-emphasizes the value of craftsmanship in technology. Educational institutions, professional organizations, and regulatory bodies must collaborate to instill and enforce higher standards. Otherwise, lacking that, we must enforce them ourselves. Even if we only reach ourselves in that commitment.-

Perhaps we need more interdisciplinary dialogue. Technological excellence does not exist in a vacuum. It requires input from ethical philosophers, sociologists, legal experts. Anybody willing and able to think these things through.-

The ramifications of neglecting these responsibilities are clear and severe. The fallout from technological failures can be catastrophic, extending well beyond financial losses to endanger lives and societal stability. We must therefore approach our work with the gravity it deserves, understanding that excellence is not an optional extra but an essential quality sine qua non in certain fields.-

We really need to make this be an actual turning point, and not just another Wikipedia page.-

ok123456

2024-07-19
Make a live CD Linux image that mounts the NTFS drives, locates the Windows directories from the bootloader, and deletes the file.

Also, you can mount BitLocker partitions from Linux iirc. If it encounters a BitLocker partition, have it read a text file of possible keys off the USB drive.
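
A rough sketch of the "locate and delete the file" step, assuming the Windows volume is already mounted somewhere on the live system (mounting NTFS and unlocking BitLocker are out of scope here). The directory and the `C-00000291*.sys` pattern come from CrowdStrike's published workaround; `remove_matching_files`, `find_child_ci` and the `dry_run` flag are hypothetical names made up for this example. Path components are matched case-insensitively, since a Linux mount may not hide case differences the way Windows does.

```python
import fnmatch
from pathlib import Path
from typing import List, Optional

# Directory and file pattern as given in the published workaround.
REL_PARTS = ("Windows", "System32", "drivers", "CrowdStrike")
PATTERN = "C-00000291*.sys"


def find_child_ci(parent: Path, name: str) -> Optional[Path]:
    """Return the child of `parent` whose name matches `name`, ignoring case."""
    for child in parent.iterdir():
        if child.name.lower() == name.lower():
            return child
    return None


def remove_matching_files(mount_root: Path, dry_run: bool = True) -> List[Path]:
    """Find (and optionally delete) the matching files under a mounted volume.

    Returns the list of matching paths. With dry_run=True nothing is deleted.
    """
    current = mount_root
    for part in REL_PARTS:
        nxt = find_child_ci(current, part) if current.is_dir() else None
        if nxt is None:
            return []  # Not a Windows volume, or the sensor is not installed.
        current = nxt
    matches = sorted(
        p for p in current.iterdir()
        if p.is_file() and fnmatch.fnmatch(p.name.lower(), PATTERN.lower())
    )
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches
```

Running it once with the default `dry_run=True` and checking the returned list before deleting anything seems prudent on a production disk.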

clarity20

2024-07-19
Has any thread had 3000 comments before?

Cyphase

2024-07-19
This story is about to break into the top 15 of upvoted stories on HN, but it already seems safely within the top 10 by number of comments.

type0

2024-07-19
video summary from fireship: https://www.youtube.com/watch?v=4yDm6xNeYas

grigy

2024-07-19
I could not imagine so many critical systems run on Windows.

hughw

2024-07-19
So, if CrowdStrike licenses didn't say "We're responsible for nothing" and if all affected users sued them, they'd be worth negative 90 trillion dollars or so right now. iow out of business.

I can understand the frustration their customers feel. But how could a software company ever bear liability for all the possible damage they can cause with their software? If they built CrowdStrike to space mission standards nobody could afford it.

JSDevOps

2024-07-19
Remember, there's someone out there right now, without irony, suggesting that AI can fix this. There's someone else scratching their head, wondering why AI hasn't fixed this yet. And there's someone doing a three-week bootcamp in AI, convinced that AI will fix this. I’m not sure which is worse

sys32768

2024-07-19
SMB here. Just spent a nine hour day fixing this. We had two machines that after a couple of reboots just came back up fine.

We were trialing CrowdStrike and about to purchase next week. If their rep doesn't offer us at least half off, we are going with Sentinel One which was half the price of CS already.

The incompetence that allowed this is baffling to me. I assumed with their billions of dollars they'd have tiers of virtual systems to test updates with.

I remember this happening once with Sophos where it gobbled up Windows system files. If you had it set to Delete instead of Quarantine, you were toast.

utkarsh858

2024-07-19
Adding a comment to make this the most commented piece on hackernews and hence highlight the bad impact a bug can make on lives founded on IT.

tammer

2024-07-19
When I saw 'Global IT Outage' trending I assumed it was another major cloud service failure. Obviously this has far wider impact because of the need for intervention on individual endpoints.

The irony is dawning on me that for much of the recent computing era we've developed defenses against massive endpoint outages (worms, etc.) and one of them is now inadvertently reproducing the exact problem we had mostly eradicated.

unixhero

2024-07-19
4 hour delay at the airport in Los Cabos. At least they have tacos!

SoftMachine

2024-07-19
Am I supposed to use some AI bot to summarize all this shit? Ain't no one got time to read 3000+ comments. Any good links?

hansvm

2024-07-19
Random strangers running unknown, untrusted code on your computers is the worst. It's a good thing we patched that security flaw by letting the _right_ random strangers run unknown, untrusted code on our computers.

As something of a friendly reminder, it was Microsoft this time, but it's a matter of "when" not "if" till every other OS with that flavor of security theatre is similarly afflicted (and it happens much more frequently when you consider the normal consequences of a company owning the device you paid for -- kicked out of email forever, ads intruding into basic system functions, paid-in-full device eventually requires a subscription, ...). Be cautious with automatic updates.

type0

2024-07-19

TexanFeller

2024-07-19
A heuristic that has served me well for years is that anyone who uses the word “cybersecurity” is likely incompetent and should be treated with suspicion.

My first encounter with CrowdStrike was overwhelmingly negative. I was wondering why for the last couple weeks my laptop slowed to a crawl for 1-4 hours on most days. In the process list I eventually found CrowdStrike using massive amounts of disk i/o, enough to double my compile times even with a nice SSD. Then they started installing it on servers in prod, I guess because our cloud bill wasn’t high enough.

nbtm

2024-07-19
I heard all Windows PCs at the University of New South Wales were also boot-looping.

elchief

2024-07-19
maybe this was just an enormous distraction while Spetssvyaz did a bunch of fun stuff

zoom6628

2024-07-19
By way of a data point for everyone else, I live in Hong Kong and haven't seen any of this level of disruption yet. I also was in Shenzhen, China yesterday, probably the world's highest density of Win95 machines, and everything was fine. At home we have only one old laptop on Win10 that only gets opened when the 8yo gets Windows homework - otherwise it's macOS and Linux on all laptops, desktops and SBCs.

If I see some news I will update this comment.

godelmachine

2024-07-19
I wonder what happens to the engineer who deployed this patch.

jiggawatts

2024-07-19
While initially everyone blamed Microsoft and then quickly pointed the finger at CrowdStrike, I'd like to call out Microsoft, especially their Azure division, for making the recovery process unnecessarily difficult.

1) A key recovery step requires a snapshot to be taken of the disk. The Portal GUI is basically locking up, so scripting is the only way to do this for thousands of VMs. This command is undocumented and has random combinations of strings as inputs that should be enums. Tab-complete doesn't work! See: https://learn.microsoft.com/en-us/powershell/module/az.compu...

E.g.: What are the accepted values for the -CreateOption parameter? Who knows! Good luck using this in a hurry. No stress, just apply it to a production database server at 1 am.

2) There has been a long-standing bug where VMs can't have their OS disk swapped out unless the replacement disk matches its properties exactly. For comparison, VMware vSphere has no such restrictions.

3) It's basically impossible to get to the recovery consoles of VMs, especially VMs stuck in reboot loops. The serial console output is buggy, often filled with gibberish, and doesn't scroll back far enough to be useful. Boot diagnostics is an optional feature for "reasons". Etc..

4) It's absurdly difficult to get a flat list of all "down" VMs across many subscriptions or resource groups. Again, compare with VMware vSphere where this is trivial. Instead of a simple portal dashboard / view, you have to write this monstrous Resource Graph query:

    Resources
    | where type =~ 'microsoft.compute/virtualmachines'
    | project subscriptionId, resourceGroup, Id = tolower(id), PowerState = tostring( properties.extended.instanceView.powerState.code)
    | join kind=leftouter (
      HealthResources
      | where type =~ 'microsoft.resourcehealth/availabilitystatuses'
      | where tostring(properties.targetResourceType) =~ 'microsoft.compute/virtualmachines'
      | project targetResourceId = tolower(tostring(properties.targetResourceId)), AvailabilityState = tostring(properties.availabilityState))
      on $left.Id == $right.targetResourceId
    | project-away targetResourceId
    | where PowerState != 'PowerState/deallocated'
    | where AvailabilityState != 'Available'

vpshastry

2024-07-19
Don’t they have canary deployments? Such huge updates happen all at once?

cookiengineer

2024-07-19
I'm a little late to the party, but I've uploaded my source code to GitHub in case anyone needs a more convenient tool to deploy/execute on running machines and/or needs something fast on USB flash drives to run around the office:

https://github.com/cookiengineer/fix-crowdstrike-bsod

The Releases section contains prebuilt binaries, but of course, I always recommend checking the source and then building it yourself.

1oooqooq

2024-07-19
Why aren't we upvoting a list of alternatives to CrowdStrike here?

dang

2024-07-19
All: there are over 3000 comments in this thread. If you want to read them all, click More at the bottom of each page, or like this:

https://news.ycombinator.com/item?id=41002195&p=2

https://news.ycombinator.com/item?id=41002195&p=3

https://news.ycombinator.com/item?id=41002195&p=4 (...etc.)

uptownfunk

2024-07-19
The correct solution is to have IT force push updates only when they deem fit (after they have tested internally on some ghost machines).

rkagerer

2024-07-19
This is why I don't like fully automatic updates. I prefer having control over the "deploy" button for the ability to time it when I can tolerate downtime. In mission-critical production systems all updates should go through test staging pipelines that my team controls, not a vendor.

Broken updates have caused far more havoc than being a few hours or even days late on a so-called critical patch.

TowerTall

2024-07-19
Microsoft to give Vista kernel access to security firms (2006)

https://arstechnica.com/information-technology/2006/10/7998/

256_

2024-07-19
I used to laugh at Dijkstra's idea that all code should be mathematically proven correct. I thought of it as a laughable idea from yet another out-of-touch mathematician.

I suppose true genius is seldom understood within someone's lifetime.

beardyw

2024-07-19
Microsoft to give Vista kernel access to security firms (2006) | Hacker News

https://news.ycombinator.com/item?id=41014426

cranberryturkey

2024-07-19
Someone accidentally shut down the planet with a code push -- rofl

kwhitefoot

2024-07-19
Can someone explain to me why such systems need anti-virus in the first place?

Windows has pretty good facilities for locking down the system so that ordinary users, even those with local admin rights, cannot run or install unauthorised code so if nothing can get in why would the system need checking for viruses?

So why do most companies not lock down their machines?

tomthumb

2024-07-19
Someone on X has shared the kernel stack trace of the crash

The faulting driver in the stack trace was csagent.sys.

Now, Crowdstrike has got two mini filter drivers registered with Microsoft (for signing and allocation of altitude).

1) csagent.sys - Altitude (321410). This altitude falls within the range for Anti-Virus filters.

2) im.sys - Altitude (80680). This altitude falls within the range for access control drivers.

So, it is clear that the driver causing the crash is their AV driver, csagent.sys.

The workaround that CrowdStrike has given is to delete C-00000291*.sys files from the directory: C:\Windows\System32\Drivers\CrowdStrike\

These files being suggested to be deleted are not driver files (.sys files) but probably some kind of virus definition database files.

The reason they name these files with the .sys extension is possibly to leverage the Windows System File Checker tool's ability to restore deleted system files.

This seems to be a workaround and the actual fix might be done in their driver, csagent.sys and the fix will be rolled out later.

Anyone having access to a Falcon endpoint might see a change in the timestamp of the driver csagent.sys when the actual fix rolls out.
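
If you want to watch for that, a small sketch along these lines would do. The helper names `file_mtime_utc` and `changed_since` are made up for illustration, and the location of csagent.sys on a given endpoint is an assumption you would need to confirm yourself; the code just reads a file's modification time and compares it with a baseline you recorded earlier.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Union


def file_mtime_utc(path: Union[str, Path]) -> datetime:
    """Return the file's last-modified time as a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(Path(path).stat().st_mtime, tz=timezone.utc)


def changed_since(path: Union[str, Path], baseline: datetime) -> bool:
    """True if the file was modified after `baseline` (must be timezone-aware)."""
    return file_mtime_utc(path) > baseline
```

Record `file_mtime_utc(...)` for the driver today, then call `changed_since(...)` with that value after the next sensor update to see whether the driver itself was replaced.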

rldjbpin

2024-07-19
it is humbling (and lowkey reassuring?) to know that not all large players use the absolute cutting edge approaches in their workflow.

it seems and i hope that after all is said and done there is no major life-threatening consequence of this debacle. at the same time, heart goes out to the dev who pushed the troubling code. very easy to point at them or the team's processes, but we need to introspect at our own setup and also recognize that not all of us work in crucial systems like this.

attentive

2024-07-19
the funny thing is it's often labeled "Microsoft IT outage" - theguardian.com as an example

vlod

2024-07-19
I haven't heard anyone ask this, but would this have happened on Linux? Obviously not many people run antivirus s/w, but could something similar have caused this?

Are there any protections to prevent repeating reboots?

peanut-walrus

2024-07-19
I feel we are at a point in the evolution of our digital society where relying on general purpose OS-s is just not an option when moving forward.

omnee

2024-07-19
This event raises the question: What is the liability of Crowdstrike, given that its erroneous update caused the meltdown and the impact certainly had negative personal or business outcomes globally?

See for example 6000 flights cancelled or the many statements posted here regarding it negatively impacting healthcare and other businesses.

hahamaster

2024-07-19
Someone at CrowdStrike got fired for this. I'm curious to know who this person is.

ngneer

2024-07-19
The whole thing needs to be redesigned, so that antivirus and EDR solutions do not require such high privilege. We need a high-performance way for a possibly privileged service to export all the data that is needed for a decision, and then let the AV/EDR do its thing. If the AV/EDR is broken by an update, fine. At least the system won't go down.

ngneer

2024-07-19
Have we failed as an industry?

ngneer

2024-07-19
Allow me to give a different, information-theoretic, perspective. How much damage can flipping a single bit cause? How much damage can altering two bits cause?

The fanout is a robustness measure on systems. If we can control the fanout we increase reliability. If all it takes is a handful of bits in a 3rd party update to kill IT infrastructure, we are doing it wrong.

tonymet

2024-07-19
Question : was this update delivered by Crowdstrike’s update agent or Windows Update ?

tartavull

2024-07-19
The front page of https://www.accel.com/

"Fail Fast. Evolve Faster"

mbrumlow

2024-07-19
> We have collaborated with Intel to remediate affected hosts remotely using Intel vPro and with Active Management Technology.

This worries me. Does this mean Intel can remotely access my machine?!?!

ai4ever

2024-07-19
what would be funny is if crowdstrike demanded ransom from their customers.

security is a great business - you play on people's fears, your product does not have to deliver the goods.

like the lock maker, you sell a lock, the thief breaks it, but it is not your problem, and you sell a bigger badder lock the next year which promptly gets broken.

as a business, you dont have any consequences for how your product works or doesnt work, what a great business to be in !!

meta-level

2024-07-19
Wow, CrowdStrike made it to #6 of all time HN threads by now.. https://hn.algolia.com/?q=

meetpateltech

2024-07-19
CrowdStrike’s faulty update crashed 8.5 million Windows devices, says Microsoft

https://www.theverge.com/2024/7/20/24202527/crowdstrike-micr...

Yawrehto

2024-07-19
Oh wow, this is #5 for all time already, beating out Steve Jobs.

retrocryptid

2024-07-19
My Commodore 64 never gave me a blue screen of death and my Atari ST never lapsed into a hardcore boot loop.

just sayin'

peter_d_sherman

2024-07-19
Greenspun's tenth rule:

"Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp."

(https://en.wikipedia.org/wiki/Greenspun%27s_tenth_rule)

Arthur C. Clarke's third law:

"Any sufficiently advanced technology is indistinguishable from magic."

(https://en.wikipedia.org/wiki/Clarke%27s_three_laws#:~:text=....)

Apparently we now have the following, as well:

"Any sufficiently bad software update is indistinguishable from a cyberattack…"

(https://x.com/leighhoneywell/status/1814278230704111792)

minhoryang

2024-07-19
Can we find an uptime(availability) graph for the CrowdStrike agent? Don't you think this graph should be included in the postmortem?

nurettin

2024-07-19
If I were a cloud vendor, I would provide a "CrowdStrike recovery" button which queues the recovery image and restores the system for the entire project. Why didn't Hetzner, Linode, DO, GCP, AWS do something like this? Why leave people to their own devices? Isn't this a basic application of centralization? It feels to me like this should be easier than managing your data center.