Welcome to the documentation for Evaluatio, a Python library for statistically rigorous NLP evaluation. It provides metric implementations, bootstrap confidence intervals, and tools for comparing models.
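To illustrate the kind of analysis the library supports, here is a minimal sketch of a percentile bootstrap confidence interval for accuracy, written in plain NumPy. The function name and signature are hypothetical, shown for illustration only; they are not Evaluatio's API.

```python
import numpy as np

def bootstrap_ci(gold, pred, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for accuracy over paired predictions.

    Hypothetical helper for illustration -- not Evaluatio's API.
    """
    gold = np.asarray(gold)
    pred = np.asarray(pred)
    rng = np.random.default_rng(seed)
    # Per-example correctness (1.0 if correct, 0.0 otherwise).
    correct = (gold == pred).astype(float)
    n = len(correct)
    # Resample example indices with replacement, n_boot times.
    idx = rng.integers(0, n, size=(n_boot, n))
    scores = correct[idx].mean(axis=1)
    lo, hi = np.quantile(scores, [alpha / 2, 1 - alpha / 2])
    return correct.mean(), (float(lo), float(hi))

point, (lo, hi) = bootstrap_ci([1, 0, 1, 1, 0, 1], [1, 0, 1, 0, 0, 1])
```

Resampling at the example level like this preserves the pairing between gold labels and predictions, which is what makes the interval valid for a per-example metric such as accuracy.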

Documentation philosophy

Evaluatio's documentation is meant to serve not only as the API reference for the project, but also as a go-to resource for doing evaluation, model comparison, and statistical testing properly. Each metric page therefore covers not just the API, but when to use the metric, when not to, common pitfalls, and the statistical background. We also document metrics not included in the library where they are relevant to rigorous evaluation.
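As an example of the statistical testing discussed above, here is a minimal paired-bootstrap significance test sketched in plain NumPy. The function and its signature are hypothetical illustrations, not part of Evaluatio's API.

```python
import numpy as np

def paired_bootstrap_pvalue(gold, pred_a, pred_b, n_boot=10_000, seed=0):
    """One-sided paired bootstrap test for accuracy.

    Returns the fraction of resamples in which system A fails to
    outperform system B. Hypothetical helper -- not Evaluatio's API.
    """
    gold, pred_a, pred_b = map(np.asarray, (gold, pred_a, pred_b))
    rng = np.random.default_rng(seed)
    # Per-example accuracy difference between the two systems.
    delta = (pred_a == gold).astype(float) - (pred_b == gold).astype(float)
    n = len(delta)
    # Resample the paired differences with replacement.
    idx = rng.integers(0, n, size=(n_boot, n))
    boot_deltas = delta[idx].mean(axis=1)
    return float(np.mean(boot_deltas <= 0.0))

gold = [1] * 10
pred_a = [1] * 10                      # A: always correct
pred_b = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # B: correct half the time
p = paired_bootstrap_pvalue(gold, pred_a, pred_b)
```

Because the same resampled indices are applied to both systems, the test accounts for the correlation between their per-example scores, which an unpaired test would ignore.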

Getting started

Metrics

Statistical inference

Project and library design

For the philosophy of the project, please see the project and design philosophy document.

Contribution Guide

We follow the principle that there is always space for more people in the community and that everyone has something to contribute. If you would like to get involved, please get in touch and read our contribution guide :)