
Evaluatio is more than just a fast metrics library: it is a project dedicated to statistically rigorous evaluation.

The goal of the project is simple: make correct evaluation the default, not the exception. Too often, model evaluation is inconsistent, statistically unsound, or difficult to reproduce. Evaluatio aims to reduce these issues by providing well-defined metrics, principled statistical tools, and clear guidance on how to use them.

Statistically rigorous evaluation should be available to academics, researchers, and hobbyists alike.

Make correct evaluation easy and incorrect evaluation harder

Evaluation and statistical testing should be accessible, but not at the cost of correctness.

Evaluatio is designed to:

Strong typing and explicit interfaces

Clear interfaces lead to more reliable results.

Evaluatio emphasises:

This reduces ambiguity, improves readability and developer experience, and helps catch errors early.

Purpose-driven design

Features are not added unless they serve a clear purpose.

A new metric or functionality should:

Evaluatio does not aim to reimplement existing tools without a clear benefit. In many cases, pointing users to existing tools is preferable.

Reproducibility and shared standards

Reliable evaluation requires shared understanding.

Evaluatio encourages:

By aligning on common tools and principles, we can make evaluation results more comparable, interpretable, and trustworthy.