EPRI Publishes First Electric Sector Benchmarking Results of Public LLMs

Thursday, January 22, 2026

PALO ALTO, Calif., Dec. 9, 2025 /PRNewswire/ -- Today, EPRI released first-of-its-kind, domain-specific benchmarking results for the electric power sector. This initial application included multiple-choice and open-ended questions rooted in real-world utility topics, providing a more realistic view of how large language models (LLMs) perform. Results indicate expert oversight remains imperative, especially with open-ended questions, where accuracy can fall below 50% in some cases.

Many existing benchmarks assess broad academic knowledge, such as math, science, and coding, and may not capture the operational and contextual complexity of real-world utility environments. Benchmarking with electric power-specific questions, such as generation and transmission and distribution asset-related inquiries, helps assess how well LLMs understand and respond to technical, regulatory, and operational questions that utilities face.

"As utilities integrate AI into power system planning and operations, this benchmarking establishes a critical foundation for evaluating domain-specific tools and models. Accuracy is paramount, as errors can lead to significant operational and reliability consequences," said EPRI Vice President of AI Transformation and Chief AI Officer Remi Raphael. "Independent benchmarking by EPRI ensures the utility industry can trust and act on unbiased, credible insight."

Key takeaways from EPRI's initial benchmarking report included:

    --  Open-ended questions exposed a reliability gap. When the same questions
        were asked in open-ended form instead of as multiple-choice questions
        (MCQs), accuracy dropped by an average of 27 percentage points. On
        expert-level questions, top models scored only 46-71%.
    --  MCQs provide a strong but incomplete baseline. On EPRI's MCQs, leading
        frontier models scored 83-86%, broadly consistent with their performance
        on external math and science benchmarks, but these scores benefit from
        the structure of MCQs.
    --  Open-weight models are closing the gap. These are LLMs whose trained
        parameters -- known as weights -- are publicly available. While
        typically one generation behind proprietary frontier systems, they are
        rapidly improving. Their ability to be self-hosted can give utilities
        valuable deployment flexibility.
    --  Web search modestly improves accuracy. Allowing models to search the web
        boosted scores slightly (2-4%), while also introducing the risk of
        retrieving irrelevant or misleading information.
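
The first takeaway reports the drop in percentage points, which is not the same as a relative percent change. A minimal Python sketch of the difference follows; the function names and the example accuracies (0.85 and 0.58) are illustrative assumptions, not figures taken from the EPRI report.

```python
def percentage_point_drop(mcq_accuracy: float, open_accuracy: float) -> float:
    """Absolute drop between two accuracies, expressed in percentage points."""
    return (mcq_accuracy - open_accuracy) * 100.0


def relative_drop_percent(mcq_accuracy: float, open_accuracy: float) -> float:
    """Drop expressed as a percent of the starting (MCQ) accuracy."""
    if mcq_accuracy <= 0:
        raise ValueError("mcq_accuracy must be positive")
    return (mcq_accuracy - open_accuracy) / mcq_accuracy * 100.0


if __name__ == "__main__":
    # Hypothetical model: 85% on MCQs, 58% on the same questions asked open-ended.
    mcq, open_ended = 0.85, 0.58
    print(f"Drop: {percentage_point_drop(mcq, open_ended):.1f} percentage points")
    print(f"Relative drop: {relative_drop_percent(mcq, open_ended):.1f}% of MCQ accuracy")
```

Under these assumed inputs, a 27-percentage-point drop corresponds to a relative decline of roughly 32%, which is why the two measures should not be used interchangeably.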

EPRI used a dataset of more than 2,100 questions and answers, generated by 94 power sector experts, drawing from publicly available sources, including the institute's reports covering 35 power sector topics. The benchmarking used three phases to test capabilities, with reproducibility on multiple LLMs, including GPT-5, Grok 4, and Gemini 2.5 Pro. Phase one measured model knowledge through multiple-choice questions, phase two repeated the tests with web search, and phase three assessed open-ended responses using both knowledge and search. Each phase included three runs per model, with confidence intervals reported to capture variability.
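
As an illustration of how three runs per model can yield a confidence interval, here is a minimal Python sketch. It assumes a two-sided 95% Student t interval on the mean of per-run accuracies; the release does not state which interval method EPRI used, and the function name and sample accuracies are hypothetical.

```python
import math
import statistics

# Two-sided 95% Student t critical value for 2 degrees of freedom (3 runs).
T_CRIT_95_DF2 = 4.303


def mean_with_ci(run_accuracies):
    """Return (mean, lower, upper) for exactly three per-run accuracies.

    Assumption: a t-based interval on the run mean; other methods
    (for example, bootstrapping over questions) are equally plausible.
    """
    if len(run_accuracies) != 3:
        raise ValueError("this sketch hardcodes the t value for exactly 3 runs")
    mean = statistics.mean(run_accuracies)
    # Standard error of the mean uses the sample standard deviation (n - 1).
    sem = statistics.stdev(run_accuracies) / math.sqrt(len(run_accuracies))
    half_width = T_CRIT_95_DF2 * sem
    return mean, mean - half_width, mean + half_width


if __name__ == "__main__":
    # Hypothetical accuracies from three runs of one model on one phase.
    mean, low, high = mean_with_ci([0.84, 0.85, 0.86])
    print(f"mean={mean:.3f}, 95% CI=({low:.3f}, {high:.3f})")
```

With only three runs the t critical value is large, so even small run-to-run differences produce a wide interval; that is a property of this assumed method, not a claim about the report's numbers.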

The effort stems from EPRI's Open Power AI Consortium, launched earlier this year to drive the development and deployment of AI approaches tailored for the power sector, including future domain-augmented tools.

Future phases of EPRI's benchmarking effort will build on this foundation by evaluating domain-augmented tools and models and expanding beyond generic tests into real utility applications.

The full report is titled Benchmarking Large Language Models for the Electric Power Sector, and an interactive site is available at WattWorks: The Power Sector's AI Benchmarking Hub.

Contact:
Rachel Gantz
Senior Manager of Corporate Media Relations
202-293-7517
rgantz@epri.com

About EPRI
Founded in 1972, EPRI is the world's preeminent independent, non-profit energy research and development organization, with offices around the world. EPRI's trusted experts collaborate with more than 450 companies in 45 countries, driving innovation to ensure the public has clean, safe, reliable, and affordable access to electricity across the globe. Together...shaping the future of energy.®

View original content to download multimedia: https://www.prnewswire.com/news-releases/epri-publishes-first-electric-sector-benchmarking-results-of-public-llms-302635785.html

SOURCE EPRI
