EPRI Publishes First Electric Sector Benchmarking Results of Public LLMs
Thursday, January 22, 2026
PALO ALTO, Calif., Dec. 9, 2025 /PRNewswire/ -- Today, EPRI released first-of-its-kind, domain-specific benchmarking results for the electric power sector. This initial application included multiple-choice and open-ended questions rooted in real-world utility topics, providing a more realistic view of how large language models (LLMs) perform. Results indicate expert oversight remains imperative, especially for open-ended questions, where accuracy can fall below 50% in some cases.
Many existing benchmarks assess broad academic knowledge, such as math, science, and coding, and may not capture the operational and contextual complexity of real-world utility environments. Benchmarking with electric power-specific questions, such as generation and transmission and distribution asset-related inquiries, helps assess how well LLMs understand and respond to technical, regulatory, and operational questions that utilities face.
"As utilities integrate AI into power system planning and operations, this benchmarking establishes a critical foundation for evaluating domain-specific tools and models. Accuracy is paramount, as errors can lead to significant operational and reliability consequences," said EPRI Vice President of AI Transformation and Chief AI Officer Remi Raphael. "Independent benchmarking by EPRI ensures the utility industry can trust and act on unbiased, credible insight."
Key takeaways from EPRI's initial benchmarking report include:
-- Open-ended questions exposed a reliability gap. When the same questions
were asked in open-ended form instead of as multiple-choice questions
(MCQs), accuracy dropped by 27 percentage points on average. On
expert-level questions, top models scored only 46-71%.
-- MCQs provide a strong but incomplete baseline. On EPRI's MCQs, leading
frontier models scored 83-86%, broadly consistent with their performance
on external math and science benchmarks, but these scores benefit from
the structure of MCQs.
-- Open-weight models are closing the gap. These are LLMs whose trained
parameters -- known as weights -- are publicly available. While
typically one generation behind proprietary frontier systems, they are
rapidly improving. Their ability to be self-hosted can give utilities
valuable deployment flexibility.
-- Web search modestly improves accuracy. Allowing models to search the web
boosted scores slightly (2-4%), while also introducing the risk of
retrieving irrelevant or misleading information.
EPRI used a dataset of more than 2,100 questions and answers, generated by 94 power sector experts, drawing from publicly available sources, including the institute's reports covering 35 power sector topics. The benchmarking tested capabilities in three phases, each repeated across multiple LLMs, including GPT-5, Grok 4, and Gemini 2.5 Pro. Phase one measured model knowledge through multiple-choice questions, phase two repeated the tests with web search enabled, and phase three assessed open-ended responses using both knowledge and search. Each phase included three runs per model, with confidence intervals reported to capture variability.
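For readers who want to see how three runs per model can be turned into a reported range, the following is a minimal Python sketch. It assumes a 95% Student's t interval on the mean of the per-run accuracies; the release does not state which interval method EPRI used, so that choice, the function names, and the sample values are all illustrative assumptions and not EPRI's method or data.

```python
import math
import statistics

# Assumption: a two-sided 95% Student's t interval. With three runs there are
# two degrees of freedom, for which the critical value is about 4.303.
T_CRIT_95_DF2 = 4.303


def accuracy(graded):
    """Fraction of questions marked correct in one run (list of booleans)."""
    return sum(graded) / len(graded)


def mean_with_ci(run_accuracies):
    """Return (mean, half_width) of a 95% t interval across exactly three runs."""
    if len(run_accuracies) != 3:
        raise ValueError("this sketch hardcodes the t value for three runs")
    mean = statistics.mean(run_accuracies)
    # Standard error of the mean, using the sample standard deviation.
    sem = statistics.stdev(run_accuracies) / math.sqrt(len(run_accuracies))
    return mean, T_CRIT_95_DF2 * sem


# Hypothetical per-run accuracies, invented purely for illustration.
runs = [0.70, 0.72, 0.74]
mean, half_width = mean_with_ci(runs)
print(f"mean accuracy {mean:.3f}, 95% CI half-width {half_width:.3f}")
```

With only three runs the t critical value is large, so even small run-to-run differences produce a wide interval; this is one reason a reported range can matter as much as the headline score.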
The effort stems from EPRI's Open Power AI Consortium, launched earlier this year to drive the development and deployment of AI approaches tailored for the power sector, including future domain-augmented tools.
Future phases of EPRI's benchmarking effort will build on this foundation by evaluating domain-augmented tools and models and expanding beyond generic tests into real utility applications.
The full report is titled "Benchmarking Large Language Models for the Electric Power Sector," and an interactive site is available at "WattWorks: The Power Sector's AI Benchmarking Hub."
Contact:
Rachel Gantz
Senior Manager of Corporate Media Relations
202-293-7517
rgantz@epri.com
About EPRI
Founded in 1972, EPRI is the world's preeminent independent, non-profit energy research and development organization, with offices around the world. EPRI's trusted experts collaborate with more than 450 companies in 45 countries, driving innovation to ensure the public has clean, safe, reliable, and affordable access to electricity across the globe. Together...shaping the future of energy.®
View original content to download multimedia: https://www.prnewswire.com/news-releases/epri-publishes-first-electric-sector-benchmarking-results-of-public-llms-302635785.html
SOURCE EPRI