The Pentium Chip Error: A Miscalculation That Led to Reloadable Microcode

Published in Coder stories

May 14, 2020


Sooraj Shah

Freelance tech journalist

1994 was a huge year for the technology industry as we know it today, mostly because it was the year that Internet in a Box, one of the first commercially available internet-connection software packages, went on sale to the public. But there were other developments, too: It was the year Sony released the PlayStation, the year Yahoo! was founded, and the year that an error in the then-new Intel Pentium processor made mainstream headlines.

The last event may not be looked back on with as much nostalgia as the others, but it changed the course of the industry, not just from a technological perspective, but from a PR point of view, too.

So what happened?

In 1994, many consumers were buying PCs built around Intel Pentium microprocessors, which made those machines faster. Intel had already become a big name in the technology world with its microprocessors, and it wanted to capitalize on this with a marketing campaign that is still remembered today: Intel Inside. The idea was to make people think about the components inside the PC that make it work so effectively, rather than the exterior or the software installed on it.

However, the late Dr. Thomas Nicely, a mathematics professor at Lynchburg College (now the University of Lynchburg) in Virginia, found that the then-new Intel Pentium processor had a flaw in FDIV (the x86 assembly language mnemonic for floating-point division): For certain sets of numbers, the chip’s division results were inaccurate. Nicely had been working on a project in computational number theory when he realized that the Pentium’s floating-point unit (FPU) was returning incorrect values for these division operations.

For example, 1/824633702441.0 was being calculated incorrectly (all digits beyond the eighth significant digit were wrong). In an email sent to Intel, Nicely explained that the error could be verified using compiled code, a spreadsheet such as Quattro Pro or Excel, or even the scientific mode of the Windows calculator “by computing (824633702441.0)(1/824633702441.0), which should equal 1 exactly (within some extremely small rounding error; in general, coprocessor results should contain 19 significant decimal digits). However, the Pentiums tested return 0.999999996274709702 for this calculation.”
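Nicely’s check is easy to repeat on a modern machine. The sketch below is an illustration in Python, not Nicely’s own code or the C program mentioned later. It assumes Python floats are IEEE 754 doubles, which carry about 16 significant decimal digits rather than the 19 of the x87 extended format Nicely refers to. The helper names `reciprocal_product` and `correct_digits` are hypothetical. The sketch computes x*(1/x) for his example value and estimates how many digits of the reported Pentium result were correct.

```python
import math

# Value Nicely reported from the flawed Pentiums for (824633702441.0)(1/824633702441.0).
# Its shortfall from 1 works out to 2**-28, about 3.7e-9.
PENTIUM_RESULT = 0.999999996274709702


def reciprocal_product(x: float) -> float:
    """Compute x * (1/x); on a correct FPU this is 1 up to rounding error."""
    return x * (1.0 / x)


def correct_digits(result: float) -> float:
    """Roughly how many significant decimal digits of `result` agree with 1.0."""
    error = abs(result - 1.0)
    return math.inf if error == 0.0 else -math.log10(error)


x = 824633702441.0
# Assumption: IEEE 754 doubles, so a correct result lies within ~1e-16 of 1.
print("x * (1/x) on this machine:", reciprocal_product(x))
print("correct digits in the Pentium result:", correct_digits(PENTIUM_RESULT))  # about 8.4
```

The second figure, about 8.4, is consistent with Nicely’s observation that only the first eight significant digits were right.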

He observed the same error in the calculation of x*(1/x) for most values of x in the interval 824633702418 <= x <= 824633702449, and throughout any interval obtained by multiplying or dividing that interval by an integer power of 2. “The bug can also be observed by calculating 1/(1/x) for the above values of x. The Pentium FPU will fail to return the original x (in fact, it will often return a value exactly 3072 = 6*512 larger),” he wrote.
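Nicely’s interval test can be sketched the same way. This Python sweep is illustrative only and again assumes IEEE 754 doubles. The relative tolerance of 1e-15 and the range of powers of two (2**-8 to 2**8) are arbitrary choices, and the function names `passes` and `sweep` are hypothetical. The sweep checks both x*(1/x) and 1/(1/x) across the interval and its power-of-2 rescalings. It also expresses the 3072 offset as a fraction of x.

```python
# Interval from Nicely's report: 824633702418 <= x <= 824633702449.
LOW, HIGH = 824633702418, 824633702449

# Assumed tolerance: a few units in the last place of an IEEE 754 double.
REL_TOL = 1e-15

# Offset Nicely often saw in 1/(1/x) on flawed Pentiums.
PENTIUM_OFFSET = 6 * 512  # = 3072


def passes(x: float) -> bool:
    """True if x*(1/x) is 1 and 1/(1/x) is x, both within REL_TOL."""
    product_ok = abs(x * (1.0 / x) - 1.0) <= REL_TOL
    roundtrip_ok = abs(1.0 / (1.0 / x) - x) <= REL_TOL * abs(x)
    return product_ok and roundtrip_ok


def sweep(powers=range(-8, 9)) -> list:
    """Return every rescaled x that fails; an empty list means no FDIV-style errors."""
    failures = []
    for k in powers:
        scale = 2.0 ** k  # multiplying by a power of 2 is exact in binary floating point
        for n in range(LOW, HIGH + 1):
            x = n * scale
            if not passes(x):
                failures.append(x)
    return failures


print("failing values:", sweep())
# A 3072 offset on x near 8.2e11 is a relative error of about 3.7e-9,
# the same size as the x*(1/x) error in Nicely's first example.
print("relative size of the offset:", PENTIUM_OFFSET / LOW)
```

On hardware without the flaw, the sweep should return an empty list. Rescaling by powers of 2 only shifts the binary exponent, so it reproduces the "multiplying or dividing by an integer power of 2" part of Nicely’s description without introducing new rounding error.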

Nicely had tested various Pentium systems and tried to locate the issue by trial and error. Eventually he established that the fault lay in the chip’s FPU.

One of the first people Nicely sent this information to was Andrew Schulman, an author of books about Windows, DOS, and Pentium processors. According to Schulman, his main contribution was taking Nicely’s bug report seriously, which apparently many of Nicely’s other contacts had not.

“At the time, the technology community, including journalists, largely took the word of technology companies at face value,” Schulman tells Behind the Code. “There was much less sense then that bugs were important and newsworthy, and that the vendor of a product might know less than outsiders about some aspects of its products.”

Schulman sent a copy of the email to Richard Smith, co-founder of the software company Phar Lap, asking him to test it on a Pentium system, as Schulman didn’t own one. Smith’s team confirmed the bug with the Windows calculator and a short C test program. Smith then posted the email on the Canopus forum of CompuServe to get feedback from others who had the chip: Within a day he had received 10 confirmations of the bug on a variety of systems.

“Dr. Nicely’s tests to show the presence of the bug, in a way others could reliably confirm it, are a nice illustration of how complex products can be studied,” says Schulman.

Smith’s post about the bug was likely the first public disclosure of Nicely’s discovery of the Pentium FDIV bug (Intel had also been notified by this point), and a slew of media coverage followed. It turned what might have been a low-impact issue into the first hardware fault to make headlines worldwide.

“Historically, what would happen if there was a problem in the chip design is the vendor would have worked with their end customers and it would never have been exposed to a wider audience such as the mass consumer audience that use PCs and now tablet devices,” says Alan Priestley, VP Analyst at the global research and advisory firm Gartner. “The FDIV bug was the first time that a fault became apparent.”

Back in 1994, x86-based processors were not widely used inside data centers, as most were still running on mainframes. “At that point, x86s had not made a penetration to the data center,” explains Priestley. “Most of the affected systems were big personal computers such as IBM PCs and Compaqs, so they weren’t commonly deployed or used by a large number of people. But businesses were starting to use PCs, and that was the main impact on the community.”

The media coverage led to something approaching pandemonium: People thought that planes could fall out of the sky or that the national grid was going to shut down. After all, they had been sold a vision of Intel’s premium chips running flawlessly in the PCs they were buying. It was a disaster for Intel.

The reality

The bug, which was caused by a circuit-design error (a handful of entries were missing from a lookup table used by the FPU’s division logic), actually affected only a limited set of applications that relied heavily on the FPU. “It didn’t make any difference to email, word processing, or core applications,” says Priestley. Planes did not fall out of the sky and the national grid did not shut down.

The error would have had a negligible impact, even if Intel hadn’t acted. The reality is that, even back in 1994, chips were incredibly complicated to design and produce. In fact, Priestley suggests that any company that believes it can design a chip with no logic errors is either wide of the mark or creating incredibly simplistic chips. “Chips are increasingly complex in design, and errors in chip design are not new,” he says.

In the event, Intel didn’t actually fix the flawed chips; the remedy was a rip-and-replace effort to eradicate the problem. “The chips didn’t have reloadable microcode, and so they had to pull the processor out and put a new processor in place, and this was relatively simple. It didn’t take a brain surgeon to replace a chip,” says Priestley.

One big change that came about as a result of the bug was that later generations of processors were designed with microcode that could be reloaded.

“The way a microprocessor works today is that, when it starts up, the first thing it does is read some microcode from the BIOS ROM that contains any patches or updates to the chip and it installs them and then starts running. Back in those days, there was no ability to update devices,” says Priestley.

This reloadable microcode also helped mitigate the damage that could have been caused by the Spectre and Meltdown vulnerabilities that came to light decades later. In addition, FDIV probably prepared Intel to react better to those vulnerabilities from a PR perspective. Perhaps most importantly, though, the episode has encouraged outsiders to come forward when they find security holes in products or services.

“We now take this for granted, largely as a result of security holes regularly found by outsiders in major products, even products for which open source is not available, where the outsiders perform reverse engineering,” says Schulman. “Testing like Dr. Nicely’s can turn up important information about a product that is not known—or if known, not disclosed—by the creators of the products.”

This article is part of Behind the Code, the media for developers, by developers. Discover more articles and videos by visiting Behind the Code!

Illustration by Blok
