Reached: Milestone in Power Grid Optimization on World’s First Exascale Supercomputer

“It is a massive jump in terms of computational power,” Petra said. “We showed that the operations and planning of the power grid can be done under an exhaustive list of failures and weather-related scenarios. This computational problem may become even more relevant in the future, in the context of extreme climate events. We could use the software stack that ran on Frontier to minimize disruptions caused by hurricanes or wildfires, or to engineer the grid to be more resilient in the longer run under such scenarios, just to give an example.”

HiOp, also used by Lab engineers for design optimization, parallelizes optimization by using a combination of specialized linear algebra kernels and optimization decomposition algorithms. The latest version of HiOp contains several performance improvements and new linear algebra compression techniques that helped improve the speed of the open source software by a factor of 100 on the exascale machine’s GPUs over the course of the project.

The team “was able to utilize these GPUs increasingly better, surpassing CPUs in many instances, and that was quite an achievement given the sparse, graph-like nature of our computations,” Petra said.

The Frontier runs were validated by colleagues at PNNL using industry-standard tools, showing that the computed pre-contingency power setpoints drastically reduce post-contingency outages with minimal increase in operating cost.
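
To make the trade-off between pre-contingency setpoints and post-contingency outages concrete, here is a minimal toy sketch in Python. It is not the HiOp or ExaSGD method: the three generators, their costs, capacities and ramp limits, the 100 MW load, and every function name are hypothetical, and the search is plain brute force over a coarse grid. It compares the cheapest dispatch that ignores outages with the cheapest dispatch that still serves the full load after any single generator trips.

```python
from itertools import product

LOAD = 100.0  # MW of demand (hypothetical)

# Hypothetical fleet: name -> (cost in $/MWh, capacity in MW,
# extra MW the unit can ramp up after a contingency)
GENS = {
    "cheap": (20.0, 100.0, 20.0),
    "mid": (22.0, 60.0, 30.0),
    "peak": (25.0, 60.0, 30.0),
}


def cost(setpoints):
    """Hourly operating cost of a pre-contingency dispatch."""
    return sum(GENS[g][0] * p for g, p in setpoints.items())


def shed_after_outage(setpoints, lost):
    """Unserved load (MW) if generator `lost` trips and the rest ramp up."""
    available = 0.0
    for g, p in setpoints.items():
        if g == lost:
            continue
        _, cap, ramp = GENS[g]
        available += min(cap, p + ramp)
    return max(0.0, LOAD - available)


def worst_shed(setpoints):
    """Largest unserved load over all single-generator outages."""
    return max(shed_after_outage(setpoints, g) for g in GENS)


def candidates(step=10):
    """All dispatches on a coarse grid that exactly meet the load."""
    names = list(GENS)
    grids = [range(0, int(GENS[g][1]) + 1, step) for g in names]
    for combo in product(*grids):
        if sum(combo) == LOAD:
            yield dict(zip(names, map(float, combo)))


def dispatch(secure):
    """Cheapest dispatch; if `secure`, require zero shed in every outage."""
    feasible = [s for s in candidates() if not secure or worst_shed(s) == 0.0]
    return min(feasible, key=cost)


if __name__ == "__main__":
    for label, secure in (("economic", False), ("secure", True)):
        s = dispatch(secure)
        print(label, s, "cost:", cost(s), "worst shed (MW):", worst_shed(s))
```

In this toy, each outage is evaluated independently of the others, which hints at why the number of scenarios is a natural source of parallelism. A real grid adds network constraints and vastly more contingencies, which is where the solvers described in this article come in.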

“The pressing question is, ‘What is the cost benefit of our HPC optimization software for grid operators?’” Petra said. 

The LLNL team solved grid-operator problems in a previous project, the ARPA-E Grid Optimization Challenge 1 Competition, where it obtained the best setpoints among all participants. Petra added that “unfortunately, we were not able to do a cost-benefit analysis for the grid operator problems due to confidentiality restrictions. I would say that a 5% improvement in operations cost justifies a high-end parallel computer, while anything less than 1% improvement will likely require downsizing the scale of computations. But we really do not know at this moment.”

Since the optimization software stack is open-source and lightweight, grid system operators could downsize the technology and incorporate it into their current practices on commodity HPC systems in a cost-effective manner.

Petra said that being among the first teams to run on Frontier, the world’s first exascale system, would not have been possible without close collaboration with teams in ECP and support from the Oak Ridge Leadership Computing Facility. The ExaSGD project also included researchers from Argonne National Laboratory.

“We’ve been working toward exascale for the last four years. We were lucky to face a class of problems with very rich parallelization opportunities, but we also correctly anticipated that we need to keep the communication pattern as simple as possible to avoid porting, scalability, and deployment bottlenecks later on exascale machines.” 

Jingyi “Frank” Wang, a computer scientist in the Uncertainty Quantification and Optimization Group in LLNL’s Center for Applied Scientific Computing, and Petra developed a new optimization algorithm that uses sophisticated mathematics to simplify the communication footprint on Frontier while maintaining good convergence properties, helping the team achieve the exascale feat. The LLNL team also included research engineer Ignacio Aravena Solis and computational mathematician Nai-Yuan Chiang.

Petra, who also leads a Laboratory Directed Research and Development project in contact design optimization, said he hopes the team can engage more closely with power grid industry stakeholders and draw in more HiOp users for optimization at LLNL, with the goal of parallelizing those computations and bringing them into the realm of HPC.

Jeremy Thomas is a public information officer at the Lawrence Livermore National Laboratory (LLNL). The article was originally posted on the LLNL website.