The 10th anniversary of the Google search incident that incorrectly classified the entire World Wide Web as malware is another opportunity to reflect upon computer system defects, human error, process flaws, organizational mistakes, and the best principles and practices for solution in the IT industry. In this blog and my upcoming book, Bugs: A Short History of Computer System Failure, I will chronicle some important system failures in the past and discuss ideas for improving the future of system quality. As information technology becomes increasingly woven into Life, the quality of hardware and software impacts our commerce, health, infrastructure, military, politics, science, security, and transportation. The Big Idea is that we have no choice but to get better at delivering technology solutions because our lives depend on it.
On 31 January 2009, a Google engineer manually updated its search engine’s blacklist of sites classified as malware to include the URL of ‘/’; this change meant that every organic Google search result for the entire World Wide Web (WWW or Web) was incorrectly classified as malware. Fortunately, Google’s on-call Site Reliability Engineering (SRE) team quickly identified the problem and fixed it within an hour. Besides affecting organic search results, the system error also impacted Google’s email service, GMail, in which users reported genuine messages routed to spam folders; interestingly, advertised or promoted search results were not affected by the error. This essay explores some of the business and technology factors that contributed to the system defect, the incident’s timely resolution, and the wider implications for the Web, search, and malware classification.