Lessons from the Microsoft outage

Kirellos Abdelmalak, Tuesday 10 Sep 2024

The Microsoft outage that almost paralysed the global economy has drawn renewed attention to the pros and cons of the world’s growing reliance on technology, writes Kirellos Abdelmalak


The world was hit by a major technological shock on 19 July, when many people around the globe were surprised by the disruption of flights, hospitals, and credit card services, among other vital infrastructure.

The cause was attributed to a faulty update issued by the cybersecurity company CrowdStrike, whose security software runs on Microsoft Windows. The update caused computers running the Windows operating system to crash. The BBC website said that the crash was probably the largest-ever cyber-event, eclipsing all previous hacks and outages.

Another outage hit Microsoft services on 30 July, less than two weeks after the first. This time Microsoft said that the outage, which affected Microsoft Azure, was triggered by a distributed denial-of-service (DDoS) cyber-attack. Many users complained of being unable to access Microsoft services including Azure and Microsoft 365 products such as Office and Outlook.

An update on the website of the Microsoft Azure cloud computing platform said that “while the initial trigger event was a Distributed Denial-of-Service (DDoS) attack... initial investigations suggest that an error in the implementation of our defences amplified the impact of the attack rather than mitigating it.”

Such outages raise concerns about the ability of global technology companies to withstand technological challenges. They also raise questions about whether Microsoft and its applications can solve such problems in future and continue to meet essential needs in the light of the huge global reliance on their services.

The Microsoft Windows outage was a wake-up call to the IT world that the digital infrastructure we all rely on is fragile and that back-up plans are needed to meet future challenges. As a result, our reliance on technology in general has become the focus of debate.

“As technology continues to play an increasingly central role in all aspects of our lives, it is crucial that organisations take proactive steps to safeguard their IT systems, minimise vulnerabilities, and ensure continuity of operations in the face of unforeseen challenges,” said Monty Excel, an expert in data analytics, on Medium.com.

Aside from IT issues, similar concerns have been raised about the world’s growing reliance on artificial intelligence (AI), which could also have negative impacts on humanity in the future if it replaces humans in many domains of life.

“By 2030, do you think it is most likely that advancing AI and related technology systems will enhance human capacities and empower them,” asked a recent study by the US-based non-profit Pew Research Centre entitled “Artificial Intelligence and the Future of Humans”.

The study concluded that whereas 63 per cent of respondents “said they are hopeful that most individuals will be mostly better off in 2030, 37 per cent said people will not be better off.”

“Experts say the rise of artificial intelligence will make most people better off over the next decade, but many have concerns about how advances in AI will affect what it means to be human, to be productive, and to exercise free will,” the study said.

Optimists identified in the study said that “smart” systems in communities, vehicles, buildings and utilities, farms and business processes will save time, money, and lives and will offer opportunities for individuals to enjoy a more-customised future.

“Many focused their optimistic remarks on healthcare and the many possible applications of AI in diagnosing and treating patients or helping senior citizens live fuller and healthier lives,” the study elaborated.

Critics, however, expressed concerns about the long-term impact of these new tools on the essential elements of being human.

Erik Brynjolfsson, director of the MIT Initiative on the Digital Economy in the US and author of “Machine, Platform, Crowd: Harnessing Our Digital Future”, was one of those critics.

“AI and related technologies have already achieved superhuman performance in many areas, and there is little doubt that their capabilities will improve, probably very significantly, by 2030,” Brynjolfsson said.

“We can virtually eliminate global poverty, massively reduce disease, and provide better education to almost everyone on the planet.”

But Brynjolfsson still feared that “AI and ML [machine learning] can also be used to increasingly concentrate wealth and power, leaving many people behind, and to create even more horrifying weapons.”

“Neither outcome is inevitable, so the right question is not ‘what will happen?’ but ‘what will we choose to do?’ We need to work aggressively to make sure technology matches our values,” Brynjolfsson said.

“This can and must be done at all levels, from government, to business, to academia, and to individual choices.”

 

A LOOK AT THE CRISIS: The outage that almost caused the world to halt perhaps poses similar questions of how we choose to use technology in the future to avoid such catastrophes.

“Technology has spread to all aspects of life with the digital transformation, and it is our duty now to look into how to manage this technology so that its negatives are avoided,” Mohamed Kholeif, an innovation and digital transformation consultant, told Al-Ahram Weekly.

“Any technological product has its positive and negative sides, and our role is to reduce the negatives as much as possible. We cannot dispense with technology or reduce our reliance on it. We must use it more wisely and more rationally, and we must have good risk and crisis management.”

According to Kholeif, problems like the one that hit CrowdStrike “will not completely end, but we can work to reduce and limit them, as any system is exposed to malfunctions and unavailability at any time, and its presence cannot be guaranteed 100 per cent of the time.”

Because of the 19 July outage, CrowdStrike CEO George Kurtz has been called to testify before the US Congress about the company’s role in sparking the widespread IT outage that grounded flights, shut down banks and hospital systems, and impacted services around the world. US lawmakers who control the House Homeland Security Committee have said they want answers soon.

In a letter to Kurtz, two US Congressmen, Mark E Green of Tennessee and Andrew Garbarino of New York, said that “while we appreciate CrowdStrike’s response and coordination with stakeholders, we cannot ignore the magnitude of this incident, which some have claimed is the largest IT outage in history.”

In a post on its official blog written by David Weston, a vice president at the firm, Microsoft estimated that about 8.5 million devices were affected by the technological glitch.

Kurtz explained that the outage was due to a defect in CrowdStrike’s Falcon Content Update for Windows. He said that Mac and Linux were not impacted and that the defect was not due to a cyber-attack.

In a more in-depth explanation of what happened, CrowdStrike released a 12-page report on the incident on 6 August that said that “on 19 July, a Rapid Response Content update was delivered to certain Windows hosts, evolving the new capability first released in February 2024. The sensor expected 20 input fields, while the update provided 21 input fields. In this instance, the mismatch resulted in an out-of-bounds memory read, causing a system crash.”

“Our analysis, together with a third-party review, confirmed that this bug is not exploitable by a threat actor.”
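The mismatch described in the report can be illustrated with a minimal Python sketch. It is not CrowdStrike's code: the names EXPECTED_FIELDS, validate_update, and apply_update are hypothetical, and Python raises an error on a bad index instead of reading stray memory as lower-level code can. The sketch only shows the general idea of checking the shape of an update before using it, so that a mismatch leads to a rejected update and not a crash.

```python
EXPECTED_FIELDS = 20  # hypothetical: number of input fields this sensor supports


class ContentMismatchError(ValueError):
    """Raised when an update does not match the shape the sensor expects."""


def validate_update(fields, expected=EXPECTED_FIELDS):
    """Return the fields unchanged if the count matches, otherwise raise."""
    if len(fields) != expected:
        raise ContentMismatchError(
            f"update has {len(fields)} fields, sensor expects {expected}"
        )
    return fields


def apply_update(fields):
    """Apply an update only after validation; report a rejection instead of crashing."""
    try:
        validate_update(fields)
    except ContentMismatchError as err:
        return f"rejected: {err}"
    return "applied"
```

In this sketch an update with 20 fields is applied, while one with 21 fields is turned away with an explanatory message and the system keeps running.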

The report revealed the actions that have been taken and will be taken by CrowdStrike to address the defect and its consequences.

In the company’s words, they include “update Content Configuration System test procedures, add additional deployment layers and acceptance checks for the Content Configuration System, provide customers with additional control over the deployment of Rapid Response Content updates, prevent the creation of problematic Channel 291 files, implement additional checks in the Content Validator, enhance bounds checking in the Content Interpreter for Rapid Response Content in Channel File 291, engage two independent third-party software security vendors to conduct further review of the Falcon sensor code and end-to-end quality control and release processes.”

 

ECONOMIC SHOCKS: According to the website Euronews, after the two outages that occurred in July, Microsoft shares slipped by more than three per cent because of disappointing growth in its Azure cloud services, the main business supporting Microsoft against its competitors in the field of AI.

Investors are also concerned about profit margins given Microsoft’s increased spending on data-centre construction.

However, Microsoft reported earnings per share of $2.95 on revenue of $64.7 billion, surpassing analysts’ estimates of $2.94 and $64.5 billion, the site said. The company also reported a 15 per cent year-on-year increase in sales revenue, slowing from 17 per cent growth in the March quarter. Net income was $22 billion, up 10 per cent from the same quarter last year.

However, the economic impact of the outages may not stop there. The US airline Delta Air Lines, for example, has entered into a dispute with Microsoft over the 19 July outage, which caused its services to collapse and forced it to cancel thousands of flights, costing it hundreds of millions of dollars.

It has now taken steps to obtain compensation for at least some of its losses from both Microsoft and CrowdStrike.

Delta Air Lines has selected prominent US attorney David Boies to seek compensation. According to a note published by Savanthi Syth, airline analyst for US financial firm Raymond James, its costs from the outage are estimated at between $325 million and $475 million.

A report published on the CNN website, “Delta hires Powerful Lawyer David Boies’ firm to seek Compensation from CrowdStrike and Microsoft for its Outage,” said that “unable to find the pilots and flight attendants it needed, Delta was forced to cancel 6,300 flights across its mainline operations and its Endeavor Air feeder airline, which flies under the Delta Connection name, according to flight tracking service FlightAware.”

“That represented about 30 per cent of its schedule. Another 9,300 of its flights were delayed, representing more than two-thirds of the flights it was able to complete in those five days. Delayed flights at the airline stayed elevated through the rest of the week.”

Delta CEO Ed Bastian criticised CrowdStrike in an interview with the US network CNBC, saying that its corrupted update had cost the company about $500 million and noting that it is seeking compensation from Microsoft and CrowdStrike.

Microsoft claimed that Delta had not updated its IT infrastructure, though a Delta spokesperson replied that “Delta has a long track record of investing in safe, reliable, and elevated service for our customers and employees. Since 2016, Delta has invested billions of dollars in IT capital expenditures, in addition to the billions spent annually in IT operating costs.”

According to the US newspaper USA Today, some Delta Air Lines passengers have filed a class action lawsuit against the airline, in which they have complained that they were denied full refunds after their flights were delayed or cancelled following the outage that occurred in July.

The passengers’ complaint says that they received partial refunds and signed waivers against pursuing further legal claims. The passengers also requested compensation for the cost and inconvenience of rebooking with other airlines, for hotels and food, and for being separated from their luggage.

 

WHAT’S NEXT? Regarding the possibility of such outages being repeated in the future, Kholeif said that “Microsoft has learned its lesson.”

He suggested that better digital governance should be implemented regarding updates in general, so that software updates are carried out with greater care and are tested on several levels to prevent such failures. Back-up systems could also be introduced as alternatives to provide services.
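One common way of rolling out updates with greater care is a staged deployment, in which an update reaches a small group of machines first and spreads further only if they stay healthy. The Python sketch below illustrates the idea under stated assumptions: the function staged_rollout, the stage fractions, and the is_healthy check are all hypothetical and do not describe any vendor's actual release process.

```python
def staged_rollout(hosts, stages, is_healthy):
    """Deploy to growing fractions of hosts, halting at the first failed check.

    hosts: list of host names.
    stages: increasing fractions of the fleet, such as [0.01, 0.1, 1.0].
    is_healthy: callable returning True if a host is fine after the update.
    Returns (updated_hosts, completed).
    """
    updated = []
    for fraction in stages:
        # Always update at least one host per stage.
        target = max(1, int(len(hosts) * fraction))
        batch = hosts[len(updated):target]
        updated.extend(batch)
        # Stop before the next, larger stage if any host in this batch is unhealthy.
        if not all(is_healthy(h) for h in batch):
            return updated, False
    return updated, True
```

With 100 hosts and stages of one per cent, 10 per cent, and 100 per cent, a fault that shows up in the second stage would stop the rollout after 10 machines, leaving the other 90 untouched.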

Kholeif said that he does not believe that the Microsoft outage that occurred on 19 July will permanently damage the IT industry, and he added that new companies will emerge that specialise in providing alternatives in the event of this type of software problem.

He said that “no technological or non-technological system is available at all times” and added that Microsoft will likely “reconsider its policies of relying on other external companies, which may lead to its acquisition of companies in cybersecurity, digital updates, and project management on a large scale.”

Regarding why Egypt was not affected by the Microsoft outage, Kholeif explained that services in Egypt, especially government services, are not dependent on Microsoft and CrowdStrike. They rely on Egyptian government applications and, as in most Arab countries, on data centres located within the country and under its full control to store data and applications.

He suggested that various measures be taken to avoid technological problems in future, including the wise management of technology, not importing all technology from abroad, not relying on technology service providers from abroad, and making sure that critical infrastructure applications are located in Egyptian cloud computing centres under the control of the Egyptian government.

Those who work in such centres need not be government employees, however, since they can also be employed by private-sector companies or start-ups, he said, with the important thing being that data are protected on a national level.

He also stressed the importance of spreading digital awareness more widely, as well as training citizens in update processes and technological services. This will require further investment in order to ensure that problems do not occur in this field in future.

Technology is indispensable and irreversible, he said, and it is essential for companies and the country to be able to compete internationally and for products and services to be made and provided efficiently.  

The writer is a journalist and researcher in political science.

* A version of this article appears in print in the 12 September, 2024 edition of Al-Ahram Weekly
