Runloop.ai and Fermatix.ai Partner to Introduce Custom Benchmarks for AI Agents

Thursday, October 9, 2025

SAN FRANCISCO, Oct. 1, 2025 /PRNewswire/ -- Runloop.ai, the leading enterprise infrastructure platform for AI agents, today announced the launch of its Custom Benchmarks product. The new offering enables organizations to create highly specialized, private benchmarks that accurately measure and refine AI agents on their unique, proprietary codebases and business logic. To highlight the product's broad applications and strategic value, Runloop.ai is collaborating with Fermatix.ai, a specialist in full-cycle data generation, on a landmark pilot program.

The rapid proliferation of AI agents has created a critical need for evaluation and functional training that is both rigorous and relevant. While public benchmarks are crucial for general model evaluation, they often fail to capture the specific requirements of AI agents or the validation needs of enterprises. Runloop.ai's Custom Benchmarks product addresses this gap by providing a secure, scalable platform on which companies can build benchmarks that test agents against their own internal business logic, tech stacks, and performance metrics.

Key features of Runloop.ai's Custom Benchmarks product include:

    --  Private benchmarking: Securely test AI agents on proprietary code
        without exposing intellectual property.
    --  Accurate performance evaluation: Measure agent effectiveness in
        real-world, business-specific conditions.
    --  Scalable infrastructure: A reliable and isolated environment for running
        thousands of tests simultaneously.
    --  Strategic model refinement: Obtain data for targeted improvement and
        retraining of AI agents for specific tasks.

"As AI agents move from prototypes to production, the benchmarks we use to evaluate them must evolve from generic tests to strategic assets," said Jonathan Wall, CEO of Runloop.ai. "Our new Custom Benchmarks product empowers enterprises to define what 'good' looks like for their unique business, enabling them to fine-tune and trust their AI agents in real-world scenarios. The pilot with Fermatix.ai is the perfect example of this in action, demonstrating the value of this approach in the most demanding environments."

Fermatix.ai is known for expert-level training data that is tailored to industry-critical tasks and highly specialized domains and annotated by practicing industry experts, which makes its expertise well suited to this pilot. By leveraging Runloop.ai's infrastructure, Fermatix.ai is expanding its capabilities to offer custom, in-house verification for its clients. The collaboration allows Fermatix.ai to move beyond its current offerings and provide a new level of assurance by creating benchmarks tailored to specific enterprise needs. The pilot program will demonstrate how Fermatix.ai's expertise in data engineering and expert-level annotation can be applied to create high-fidelity, multilingual benchmarks on Runloop.ai's platform.

"At Fermatix.ai, we've built our reputation on creating expert-level training data with practicing industry professionals as annotators," said Sergey Anchutin, CEO and Founder of Fermatix.ai. "This partnership with Runloop.ai represents a strategic evolution--moving beyond one-time data labeling to creating reusable benchmarks that deliver ongoing value to our clients. By leveraging our domain expertise and Runloop's infrastructure, we're not just providing data anymore; we're building the testing standards that will define how enterprises evaluate their AI agents across industry-critical tasks."

The Custom Benchmarks product is now available to all Runloop.ai Pro clients, with early results from the Fermatix.ai pilot program expected to be shared in the coming months.

About Runloop.ai

Runloop provides infrastructure and tooling for building, testing, refining, and deploying AI agents at scale. Founded by engineers with deep experience in building large-scale systems, Runloop offers secure, isolated environments, rich developer tooling, and a suite of benchmarking capabilities that help companies deploy and manage AI agents with confidence.

Media contact:
Michelle Faulkner
Big Swing
617-510-6998
michelle@big-swing.com

https://www.linkedin.com/company/runloopai https://x.com/runloopdev https://github.com/runloopai

View original content to download multimedia: https://www.prnewswire.com/news-releases/runloopai-and-fermatixai-partner-to-introduce-custom-benchmarks-for-ai-agents-302572197.html

SOURCE Runloop.ai


