How IBM’s Supercomputer Summit Is Fast-tracking COVID-19 Research

Published in Coder stories

April 8, 2020


Jun Wu

Content writer with a background in programming and statistics.

As the coronavirus (COVID-19) pandemic continues to spread around the world, the global response is becoming more urgent. At Oak Ridge National Laboratory (ORNL) in Tennessee, researchers from the lab and the University of Tennessee (UT), Knoxville, are using IBM’s supercomputer Summit in the race to find a treatment that can help control the pandemic. When there’s no time to waste and research has to pay off right away, the power of supercomputers is crucial.

Power matters

Any treatment for COVID-19 must be approved by regulatory agencies around the world—the Food and Drug Administration (FDA) in the US, for example. So to fast-track the search for treatment, researchers have been looking for small molecules in FDA-approved drugs and natural products that may help to treat COVID-19. This involves simulating thousands of molecules and how they may interfere with the virus. With traditional computing, running through complex simulations on a large number of molecules simply isn’t possible.

However, IBM’s Summit is a supercomputer that was developed specifically to handle this type of work. It’s more powerful than 1 million high-end laptops and can perform 200 quadrillion floating-point operations per second (200 petaflops). “If every person on Earth completed one calculation per second, it would take 305 days to do what Summit can do in one second,” explains David Turek, Vice President of Technical Computing for IBM Cognitive Systems. Since Summit debuted as the world’s most powerful supercomputer in 2018, it has cemented its title by driving groundbreaking research, including work on the origins of the universe and the opioid crisis, and on how humans would be able to land on Mars.
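Turek's comparison can be checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope calculation: the 200 petaflops figure comes from the article, while the world population of 7.6 billion is an assumption (an approximate 2018 value) that the article does not state.

```python
# Back-of-the-envelope check of the "305 days" comparison.
SUMMIT_FLOPS = 200e15        # 200 petaflops, as quoted in the article
WORLD_POPULATION = 7.6e9     # assumption: approximate 2018 world population
SECONDS_PER_DAY = 24 * 60 * 60


def days_for_humanity(operations: float, people: float = WORLD_POPULATION) -> float:
    """Days needed if every person performs one calculation per second."""
    seconds = operations / people
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    # One second of Summit's peak work, shared out across everyone on Earth.
    print(f"about {days_for_humanity(SUMMIT_FLOPS):.0f} days")
```

With these inputs the result lands within a day of the quoted figure; a different population estimate shifts it by a few days either way.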

IBM’s POWER9 CPU is the brain at the center of Summit. It was built from the ground up for the most data-intensive workloads, such as high-performance computing (HPC) and machine learning. It uses advanced interconnects that specialize in moving data throughout the system and eliminating bottlenecks that can slow down overall performance. It can also move data up to 9.5 times faster than competing architectures.

There’s nothing like real-world applications to test a supercomputer’s performance. For the past 20 years, the supercomputing industry has relied on the benchmark known as LINPACK to measure computer performance. This measures how fast a computer solves a dense n by n system of linear equations, which is a common task in engineering. But as Turek says, “Even the creators of LINPACK acknowledge that it is not representative of the real-world workloads anymore. Imagine if the only barometer of a car’s performance were how fast it can go on a ¼-mile straight line. This isn’t really representative of how the majority of car owners use their vehicles. Similarly, we should be looking at new ways to measure performance that provides a meaningful representation of real work, including traditional HPC as well as emerging workloads like deep learning and machine learning.”
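To make the LINPACK idea concrete, here is a minimal sketch of the kind of measurement it performs: solve a dense n by n system, time it, and convert the time into a flops rate. This is not the LINPACK or HPL code itself. It is a plain-Python Gaussian elimination written for illustration, the function names are hypothetical, and the operation count of roughly 2n³/3 + 2n² is the conventional estimate for this kind of solve.

```python
import random
import time


def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    a = [row[:] for row in a]  # work on copies so the caller's data is untouched
    b = b[:]
    for k in range(n):
        # Partial pivoting: move the largest entry of column k onto the diagonal.
        pivot = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[pivot][k] == 0.0:
            raise ValueError("matrix is singular")
        a[k], a[pivot] = a[pivot], a[k]
        b[k], b[pivot] = b[pivot], b[k]
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= factor * a[k][j]
            b[i] -= factor * b[k]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def linpack_style_rate(n, seed=0):
    """Return (estimated flops, max residual) for one random n-by-n solve."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    start = time.perf_counter()
    x = solve_dense(a, b)
    elapsed = time.perf_counter() - start
    operations = (2.0 / 3.0) * n**3 + 2.0 * n**2  # conventional estimate
    residual = max(
        abs(sum(a[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n)
    )
    return operations / elapsed, residual


if __name__ == "__main__":
    rate, residual = linpack_style_rate(200)
    print(f"{rate / 1e6:.1f} megaflops, max residual {residual:.2e}")
```

A single number like this says little about memory traffic, I/O, or mixed workloads, which is the limitation Turek is pointing at.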

In the development of any technology, be it machine-learning algorithms or supercomputers, it’s important to set real-world performance as the benchmark and innovate from real-world use cases.

The COVID-19 use case

When the pandemic took hold, Summit was made available to researchers at ORNL. “They were able to achieve a result in a matter of days, whereas it would have taken months using a high-end laptop or a home PC,” says Turek. “Taking it further, if they had not used any computing resources and instead worked out all the equations by hand, it would have taken years, if not a lifetime.”

The researchers have been screening well-characterized small molecules, such as drugs approved by the FDA or other regulatory agencies, and natural products, to identify the ones that may bind to and interfere with the functions of the SARS-CoV-2 spike (S) protein, which is the one responsible for latching the virus onto cells.

The main difficulty comes from the fact that molecules are not flat: They are 3D structures. So the researchers have had to perform simulations of binding and docking from different geometries. And not only have they had to simulate as many geometries as possible for the test, they have also had to run a special type of molecular simulation called temperature replica exchange molecular dynamics (T-REMD), which requires a lot of computational power. It is essentially a way of coupling tens to hundreds of molecular simulations to increase the sampling efficiency of the simulations. Only then can the researchers see all the possible docking geometries. As Micholas D. Smith, a postdoctoral fellow at the UT/ORNL Center for Molecular Biophysics and member of the research team, says, “Getting a lot of geometries to run the docking/binding calculations is important. If you only have one pocket geometry to test, it is likely that you will miss molecules that may bind well in nature but just not for the one geometry used.”
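The coupling between simulations in T-REMD comes from periodic swap attempts between replicas running at neighbouring temperatures. The sketch below shows only the standard Metropolis acceptance rule for such a swap. It is not the team's setup or GROMACS code: the function names are hypothetical, the energies are stand-ins for full molecular configurations, and energies are assumed to be in kJ/mol.

```python
import math
import random

K_B = 0.0083144626  # Boltzmann constant in kJ/(mol*K)


def exchange_probability(energy_i, temp_i, energy_j, temp_j):
    """Metropolis probability of swapping two replicas at temp_i and temp_j."""
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    # Swaps that move the lower-energy state to the colder replica always pass.
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_swaps(energies, temperatures, rng, offset=0):
    """Try swaps between neighbouring temperature pairs; return the new order.

    Each energy stands in for the whole configuration a replica carries.
    """
    energies = energies[:]
    for i in range(offset, len(energies) - 1, 2):
        p = exchange_probability(
            energies[i], temperatures[i], energies[i + 1], temperatures[i + 1]
        )
        if rng.random() < p:
            energies[i], energies[i + 1] = energies[i + 1], energies[i]
    return energies


if __name__ == "__main__":
    rng = random.Random(42)
    temperatures = [300.0, 310.0, 320.0, 330.0]       # hypothetical ladder, in K
    energies = [-1000.0, -1010.0, -990.0, -995.0]      # hypothetical, in kJ/mol
    print(attempt_swaps(energies, temperatures, rng))
```

Hot replicas cross energy barriers easily, and accepted swaps carry those configurations down to the temperature of interest, which is where the gain in sampling efficiency comes from. In practice, alternating `offset` between 0 and 1 on successive attempts lets every neighbouring pair take part.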

Without the power of a supercomputer such as Summit, these simulations would have taken a long time. “Summit allowed us to perform special ‘enhanced sampling’ simulations rapidly for the S protein: ACE2 interface to generate input for our docking studies, which would have taken weeks or months,” continues Smith.

Interestingly, the project didn’t require Smith to learn to program Summit’s graphics processing units (GPUs) himself; he used the GROMACS molecular dynamics simulation package. “The wonderful developers for GROMACS have done all of the ‘heavy lifting’ for the GPU programming for me,” he explains. “So for this project, I didn’t need to write any GPU code myself, just some post-processing scripts with AWK.”

As the coronavirus pandemic continues, it is becoming especially important for the researchers at ORNL to optimize their workflow and design unique techniques to efficiently simulate events that may happen. As a part of the ongoing work, the team is studying multiple viral protein targets that the virus may use for replication and function to figure out how to disrupt any and all of these proteins at once. “Without Summit, we would need to focus on one protein at a time, and this would slow our progress,” says Smith. “With Summit, our workflow is greatly accelerated.”

Using the supercomputer, the team has been able to identify 77 promising compounds that could inform a treatment of COVID-19 from a pool of 8,000.
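The article does not describe how the shortlist was selected, so the following is only an illustrative sketch of one common way to combine docking results across several pocket geometries: keep each molecule's best score over all geometries, then rank. The compound names and scores are made up, and the convention that a more negative score means stronger predicted binding is an assumption borrowed from typical docking programs.

```python
def rank_ligands(scores_by_ligand, top_n):
    """Rank ligands by their best (lowest) score across all pocket geometries."""
    best = {
        name: min(scores)
        for name, scores in scores_by_ligand.items()
        if scores  # skip ligands with no docking results
    }
    return sorted(best.items(), key=lambda item: item[1])[:top_n]


if __name__ == "__main__":
    # Hypothetical scores: one entry per pocket geometry for each compound.
    scores = {
        "compound_a": [-5.1, -7.9, -6.0],
        "compound_b": [-6.5, -6.4, -6.6],
        "compound_c": [-4.0, -4.2, -3.9],
    }
    for name, score in rank_ligands(scores, top_n=2):
        print(name, score)
    # For scale: 77 of 8,000 candidates is just under 1% of the pool.
    print(f"{77 / 8000:.2%}")
```

Here `compound_a` ranks first only because of its second geometry, which is the situation Smith describes: with a single pocket geometry, a molecule like that could be missed.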

Supercomputing: The future of AI and scientific research

In the near future, the development of supercomputers will coincide with the development of AI algorithms. The next generation of machine learning and deep learning algorithms can only become more accurate by learning from large data sets. According to the global market intelligence firm IDC, our digital universe doubles in size every two years: It predicts our worldwide data will have reached 175 zettabytes by 2025, which points to the convergence of traditional AI and HPC.

As algorithms become more data hungry, as seen with some deep learning models, the need for computing architecture to adapt grows. In fall 2019, IBM donated Satori, an $11.6 million computer cluster modeled after the architecture of Summit, to MIT. Josh McDermott, an Associate Professor at MIT’s Department of Brain and Cognitive Sciences, had been using Summit earlier in the year to develop a better hearing aid. His research into the human sensory system to model what we can see and touch calls for computer systems that are built on the power of the supercomputer. Preparing code to run on Summit is no small feat (McDermott’s team spent hours on the task), but Satori will help researchers with that so that more ambitious projects can be run on Summit.

However, most researchers do not have access to supercomputers and must make use of the technology currently available on their PCs and smartphones and in the Cloud. At Georgia Tech last year, Elizabeth Cherry, Associate Professor of Computational Science and Engineering, worked with Abouzar Kaboudian, a research scientist, and Flavio Fenton, Professor of Physics, to harness the power of GPUs to perform cardiac dynamics modeling over the web. This makes it possible for anyone (hospitals, students, and so on) to build real-time cardiac dynamic models on their smartphone or over the web.

Using WebGL, they developed a package that repurposed the graphics card NVIDIA TITAN RTX to leverage the 4,608 cores it already has. “With this new library, we have been able to take large-scale simulations of heart dynamics that previously could be run only on supercomputers and move them to local PCs, where we can run them and visualize results in real time,” Cherry tells Behind the Code. “Our main focus has been studying cardiac arrhythmias, but our approach can be applied to many similar types of problems.” Since this project, the team has used it on a number of others. “We have applied this library to solve other large-scale problems, such as crystal growth and fluid flow,” says Kaboudian. “And since it is ideal for reaction-diffusion problems, such as the spread of disease, this methodology would be ideal for simulations and studies of COVID-19 using only local PCs.”
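A reaction-diffusion problem combines local growth or decay (the reaction) with spreading to neighbouring points (the diffusion). The sketch below is a minimal one-dimensional example using the Fisher-KPP equation with an explicit finite-difference step. It is not the Georgia Tech library and it runs in plain Python on a CPU rather than through WebGL; the equation, parameter values, and function names are all chosen here for illustration.

```python
def step(u, diffusion, rate, dx, dt):
    """Advance u by one explicit Euler step of u_t = D u_xx + r u (1 - u)."""
    n = len(u)
    new = u[:]
    for i in range(n):
        # No-flux boundaries: mirror the edge value outside the domain.
        left = u[i - 1] if i > 0 else u[i]
        right = u[i + 1] if i < n - 1 else u[i]
        laplacian = (left - 2.0 * u[i] + right) / dx**2
        reaction = rate * u[i] * (1.0 - u[i])
        new[i] = u[i] + dt * (diffusion * laplacian + reaction)
    return new


def simulate(n_points=100, steps=200, diffusion=1.0, rate=1.0, dx=1.0, dt=0.1):
    """Spread from a single seeded point at the left edge of the domain."""
    # The explicit scheme is stable only while diffusion * dt / dx**2 <= 0.5.
    u = [0.0] * n_points
    u[0] = 1.0
    for _ in range(steps):
        u = step(u, diffusion, rate, dx, dt)
    return u


if __name__ == "__main__":
    profile = simulate()
    print(" ".join(f"{value:.2f}" for value in profile[::10]))
```

Each grid point's update reads only its own value and its two neighbours from the previous step, so all points can be updated independently. That independence is what makes this class of problem a good fit for the thousands of cores on a graphics card.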

But even though GPU power will continue to grow on PCs and on computers distributed in the Cloud, the power of a supercomputer like IBM’s Summit is still the key for enterprises and large-scale research projects that need dedicated simulations built on large, complex models and vast stores of historical and real-time data. Its performance is on a different scale: Its high-performance architecture makes it significantly faster than competing architectures (remember those 200 petaflops it can perform).

In climate research, modeling the world over the past 40 years is a complex and vast effort. These models need to account for, among many other things, the forces that are causing Antarctica’s ice to melt and how vegetation affects temperature and moisture. Impacts are often not steady accelerations; rather, there are tipping points in the models that can trigger fast global changes, such as a loss of sea ice. Being able to capture all the forces and simulate events at a global and granular scale calls for unusually complex models, which have grown from requiring billions of calculations to quintillions of calculations. Only supercomputers such as IBM’s Summit can perform quintillions of calculations in a few seconds.

As supercomputing power continues to progress, the next generation is exascale computing: computer systems capable of at least one exaFLOPS, or a quintillion calculations per second, which by some estimates is the processing power of the human brain. Achieving such systems is the target of programs such as the Human Brain Project.

With the development of HPC becoming central to the development of AI, many countries are currently involved in scaling and optimizing supercomputing applications. The Partnership for Advanced Computing in Europe aims to create a persistent pan-European supercomputer infrastructure. In the US and China, research institutions and government-funded organizations, such as the Defense Advanced Research Projects Agency in the US and the China National Offshore Oil Corporation, are already using the most advanced supercomputers for their projects.

In a world that is set to become more and more data centric, and with ever-increasing innovation in the application of large-scale computing, the need for supercomputers is becoming greater.

This article is part of Behind the Code, the media for developers, by developers. Discover more articles and videos by visiting Behind the Code!

Illustration by Catherine Pearson
