Incredible Pricing

Tailored Pricing Designed Specifically for Your Requirements.

Free

Synthetic data (excluding ZIP files; no download)

All AI models

1 custom metric

Library of 210 metrics

Dashboard

A/B Testing

Experiments

1 user

10 experiment runs

Community support via Discord

Startup

Synthetic data (limited)

All AI models

3 custom metrics

Library of 210 metrics

Dashboard

A/B Testing

Experiments

3 users

500 LLM judgements per month

Email support

Let's Talk

Enterprise

Synthetic data generation (unlimited)

All AI models

Unlimited custom metrics

Library of 210 metrics

Dashboard

A/B Testing

Experiments

Unlimited users

5,000 LLM judgements per month

Dedicated account manager and Slack channel

SSO / SAML

Cloud or on-prem

Let's Talk

Frequently Asked Questions

Have another question? Please contact our team!


Can RagMetrics compare LLMs like Claude and GPT-4?

Yes. RagMetrics was built for benchmarking large language models. You can run identical tasks across multiple LLMs, compare their outputs side by side, and score them for reasoning quality, hallucination risk, citation reliability, and output robustness.

Does RagMetrics support an API for LLM evaluation?

Yes. RagMetrics provides a powerful API for programmatically scoring and comparing LLM outputs. Use it to integrate hallucination detection, prompt testing, and model benchmarking directly into your GenAI pipeline.

Can I run RagMetrics in a private cloud or on-premises?

Yes. RagMetrics can be deployed in multiple ways: as a fully managed SaaS solution, inside your private cloud environment (such as AWS, Azure, or GCP), or on-premises for organizations that require maximum control and compliance.

How do I run a GenAI evaluation experiment with RagMetrics?

Running an experiment takes five steps. First, connect your LLM or retrieval-augmented generation (RAG) pipeline, whether it uses Claude, GPT-4, Gemini, or your own model. Next, define the task you're solving and upload a labeled dataset or a set of test prompts. Then select your scoring criteria, such as hallucination rate or retrieval accuracy. Finally, run the experiment through the dashboard or the API.

What information do I need to evaluate a model with RagMetrics?

To run an evaluation, you’ll need access to your LLM’s API key, the endpoint URL or model pipeline, a dataset or labeled test inputs, a clear task description, and a definition of success for that task. You can also include your own scoring criteria or subject matter expertise.

Can I evaluate my own proprietary or open-source model with RagMetrics?

Yes. RagMetrics is model-agnostic and supports any public, private, or open-source LLM. You can paste in your custom endpoint, evaluate outputs from models like Mistral, Llama 3, or DeepSeek, and compare the results with popular models like GPT-4, Claude, and Gemini using the same scoring framework.