
Time to rebuild Microsoft Windows?




Global IT outage sparks calls for ‘less invasive access’ to essential functions

Mathew J. Schwartz (euroinfosec) •
6 September 2024

Why didn’t Microsoft stop CrowdStrike from causing a global computer outage on July 19, 2024? (Image: Shutterstock)

Even before it was over, while millions of computers were still stuck in an endless “Blue Screen of Death” reboot loop in mid-July, the question arose: How could this happen?


The immediate trigger was endpoint detection and response vendor CrowdStrike, whose software has direct access to the Windows operating system kernel. The company rolled out a faulty update, causing outages that affected airports, banks and hospitals around the world.

Government agencies, security experts and vendors have identified several areas that need review, including the resilience of the Windows operating system, deployment strategies for third-party software updates and the deep operating system access that many current security tools require.

The July 19 incident paralyzed 8.5 million Windows computers. The direct losses from the outages are estimated at over $5.4 billion.

CrowdStrike released a root cause analysis of the outage and said it is already making numerous changes to prevent a recurrence, including increasing internal testing procedures and rolling out software updates in batches (see: CrowdStrike introduces security measures to mitigate the impact of outages).
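
Rolling out updates in batches can be illustrated with a minimal sketch. This is not CrowdStrike's actual process: the function name `staged_rollout`, the `deploy` callback and the failure threshold are all hypothetical, chosen only to show how a bad update can be stopped after reaching a small first group of machines.

```python
from typing import Callable, Dict, Iterable, List


def staged_rollout(
    hosts: List[str],
    batch_sizes: Iterable[int],
    deploy: Callable[[str], bool],
    max_failure_rate: float = 0.0,
) -> Dict[str, object]:
    """Deploy to successive batches and halt once a batch fails too often.

    `deploy` is a hypothetical callback that returns True when a host
    comes back healthy after the update. Batch sizes are assumed positive.
    """
    updated: List[str] = []
    failed: List[str] = []
    remaining = list(hosts)
    halted = False

    for size in batch_sizes:
        batch, remaining = remaining[:size], remaining[size:]
        if not batch:
            continue  # nothing left to update in this batch

        batch_failed = [h for h in batch if not deploy(h)]
        failed.extend(batch_failed)
        updated.extend(h for h in batch if h not in batch_failed)

        # Stop before touching the next (usually larger) batch.
        if len(batch_failed) / len(batch) > max_failure_rate:
            halted = True
            break

    return {
        "halted": halted,
        "updated": updated,
        "failed": failed,
        "untouched": remaining,
    }
```

With ten hosts, batch sizes of 2, 3 and 5, and an update that breaks one host in the first batch, the rollout halts after two machines and leaves the other eight untouched.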

While attention is focused on CrowdStrike, experts say Microsoft is also to blame for the incident. Windows failed to prevent a faulty software update from triggering an endless reboot cycle.

Redmond has indicated plans to address the issue. In a July 25 blog post, John Cable, director of program management at Microsoft, said, “Windows must prioritize change and innovation in the area of end-to-end resiliency.”

On Tuesday, Microsoft plans to host a private, closed-door summit with government and industry representatives at its headquarters in Redmond, Washington, to discuss secure deployment strategies and developing more resilient approaches.


The summit is expected to “lead to next steps on short- and long-term actions and initiatives with our shared goal of improved security and resilience,” said Aidan Marcuss, corporate vice president of Microsoft Windows and Devices, in a blog post.


Resilience is easy to define but difficult to achieve. As cybersecurity and risk management expert Dan Geer said in a keynote speech at the Security of Things Forum in 2014, “The real source of risk is dependency, particularly dependency on the expectation of a stable system state.”


Geer said anything new added to a previously stable system – such as security software updates – brings with it compromises and increases the tension between resilience and fragility.


Competition agreement


For Windows, these issues are not just technical questions. They touch on long-standing competition and consumer protection concerns about Microsoft, including its browser and software. To address these issues, the company entered into an agreement with the European Commission in 2009 in which it committed to giving third-party products with which it competes equal access to Windows’ internal components, including in terms of security.


Many types of security software, including extended detection and response (XDR) tools, use kernel-level access to obtain otherwise unavailable information about the system. “This information is incredibly valuable for tracking attacker activity,” says a recent report from IT consulting firm Forrester Research.


Third-party access to the Windows kernel remains a prerequisite for many enterprise-focused security tools to function properly, and removing this feature would “ultimately result in much higher costs” due to the corresponding reduction in defensive capabilities, says JJ Guy, CEO of Sevco Security, a maker of IT asset management software.


CrowdStrike has stressed that it still requires kernel-level access and needs to load its tools as early as possible in the Windows boot cycle. “Products such as firmware analysis or device control would not be possible without this design,” the company said in its technical analysis after the incident. “Microsoft directly supports and encourages such features in security products through the Early Launch Anti Malware Architecture (ELAM), which was specifically built into Windows 8.1 to enable such monitoring and enforcement measures.”


But kernel-level access is not the same across all endpoint detection and response tools, says British cybersecurity expert Kevin Beaumont. “Some of these EDR vendors, including CrowdStrike, release updates in a way that allows them to execute detection code in an insecure way from the kernel, which can trigger blue screens,” he said in a blog post.


Concerns about equal access


In the name of resilience, Microsoft could propose blocking direct third-party access to the Windows kernel entirely and requiring any tools that need it to work through intermediary software developed by Microsoft, such as Windows Defender. But experts say regulators would likely reject this approach on equal access grounds.


“Unless Microsoft is willing to remove Defender from the kernel space, regulators have a legitimate position,” Forrester said.


As a number of outages over the years have shown, most recently the one caused by CrowdStrike, kernel-level access poses risks. This is in part because some types of XDR software load early in the boot process, before most of the operating system, to prevent attackers from disabling them. If something goes wrong, it can prevent the operating system from loading far enough to gain the internet access it would need to be repaired automatically or remotely. This is why it was so difficult for some organizations to quickly resolve the outage triggered by CrowdStrike. Many IT teams had to physically access the affected systems, sometimes in remote locations.


Following the outages caused by CrowdStrike, Forrester now recommends that IT teams in critical systems “limit the number of endpoint management agents that have access to the kernel and instead opt for agentless forms of management – that is, API-based Windows management – or software without kernel components.”


Can the architecture of Windows be changed?


Given these concerns, isn’t it time to redesign Windows so that all security software, including Microsoft’s, no longer requires access to the kernel level?


Germany’s cybersecurity authority plans to hold a conference later this year aimed at obtaining commitments from security providers to move in this direction. The Federal Office for Information Security (BSI) is calling on Microsoft, CrowdStrike and all makers of comparable security software to ensure, at a minimum, “that the respective operating system can always be started at least in safe mode, even in the event of serious disruptions.”
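
The idea of always being able to reach safe mode can be sketched as a toy model. It does not describe how Windows actually boots: the names `BootState`, `choose_boot_mode` and `boot_until_up`, the threshold of three failures and the assumption that safe mode skips the faulty third-party driver are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BootState:
    """Counter assumed to persist across reboots (hypothetical)."""
    consecutive_failures: int = 0


def choose_boot_mode(state: BootState, max_failures: int = 3) -> str:
    """Fall back to safe mode after too many failed normal boots in a row."""
    return "safe" if state.consecutive_failures >= max_failures else "normal"


def attempt_boot(mode: str, driver_is_faulty: bool) -> bool:
    """Toy model: a faulty third-party driver crashes normal boots only,
    because safe mode is assumed not to load it."""
    return not (mode == "normal" and driver_is_faulty)


def boot_until_up(driver_is_faulty: bool, max_attempts: int = 10) -> List[str]:
    """Return the sequence of boot modes tried until the system comes up."""
    state = BootState()
    tried: List[str] = []
    for _ in range(max_attempts):
        mode = choose_boot_mode(state)
        tried.append(mode)
        if attempt_boot(mode, driver_is_faulty):
            state.consecutive_failures = 0  # healthy boot resets the counter
            return tried
        state.consecutive_failures += 1
    return tried
```

In this model, a machine with a faulty driver fails three normal boots and then comes up in safe mode on the fourth attempt, rather than looping indefinitely.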


In the longer term, the BSI is calling for changes to Windows that “offer the same functionality and level of protection as before, but require less invasive permissions for the operating system” and thus “minimize the impact of software errors.”


“It is not acceptable to run these tools in kernel mode with all the access options that exist today,” Thomas Caspers, director general for technology strategy at the BSI, told the Wall Street Journal.



Reworking Windows to eliminate kernel-level access would be a massive undertaking, but there is precedent for doing so. Even “mainframes and minicomputers” in the early 1980s could “automatically and effectively” protect kernel memory by “catching and handling the bug,” says cybersecurity consultant and former CISO Ken Stephens.


More recently, Linux has offered this capability through the extended Berkeley Packet Filter (eBPF), which allows code to run inside the kernel in a sandbox, keeping it isolated in case something goes wrong. When Apple switched to its own chips, the company used this as an opportunity to wall off the macOS kernel.
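
The general principle behind this kind of sandboxing, checking a program before it is allowed to run, can be shown with a toy example. It is not eBPF and does not reflect the real eBPF verifier or instruction set: the four-instruction language and the functions `verify` and `run` are invented for illustration. The checker rejects any program that would read outside its memory or jump backward, so an accepted program cannot crash or loop forever.

```python
from typing import List, Tuple

# Hypothetical instructions: ("load", i), ("add", k), ("jmp", n), ("ret",)
Instr = Tuple


def verify(program: List[Instr], mem_size: int) -> None:
    """Reject programs that could read out of bounds or fail to terminate."""
    if not program or program[-1] != ("ret",):
        raise ValueError("program must end with ret")
    for pc, ins in enumerate(program):
        op = ins[0]
        if op == "load":
            if not 0 <= ins[1] < mem_size:
                raise ValueError(f"out-of-bounds load at {pc}")
        elif op == "jmp":
            # Only forward jumps that stay inside the program are allowed.
            if ins[1] < 0 or pc + 1 + ins[1] >= len(program):
                raise ValueError(f"illegal jump at {pc}")
        elif op not in ("add", "ret"):
            raise ValueError(f"unknown instruction at {pc}")


def run(program: List[Instr], memory: List[int]) -> int:
    """Verify first, then execute; verified programs always reach ret."""
    verify(program, len(memory))
    acc, pc = 0, 0
    while True:
        ins = program[pc]
        op = ins[0]
        if op == "ret":
            return acc
        if op == "load":
            acc = memory[ins[1]]
        elif op == "add":
            acc += ins[1]
        elif op == "jmp":
            pc += ins[1]  # skip the next ins[1] instructions
        pc += 1
```

A program such as `[("load", 9), ("ret",)]` run against a two-element memory is rejected with a `ValueError` before any instruction executes, which is the point of verifying up front rather than handling a crash afterward.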


“It’s a little painful, but it’s a necessary development,” Neil MacDonald, a cybersecurity analyst at Gartner, told the Wall Street Journal.


Several security vendors have signaled that they are willing to forgo kernel-level access.


Tomer Weingarten, CEO of SentinelOne, which competes with CrowdStrike and Microsoft, told Information Security Media Group: “We would fully support an exit from the kernel as soon as possible” if Microsoft could come up with a compelling alternative, including “the right interfaces to achieve the same level of visibility.”


He said this is already possible on both Linux and macOS systems.
