Welcome to the documentation for the Evaluatio library! Evaluatio is a Python library for statistically rigorous NLP evaluation. It provides metric implementations, bootstrap confidence intervals, and model comparison tools.
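To illustrate the kind of statistical rigor the library is built around, here is a minimal sketch of a percentile-bootstrap confidence interval for a per-example metric. Note that this uses only the Python standard library and does not show Evaluatio's own API; the function name `bootstrap_ci` and the example data are illustrative assumptions.

```python
import random

def bootstrap_ci(scores, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of per-example scores.

    Illustrative sketch only; not Evaluatio's API. Resamples the
    per-example scores with replacement, recomputes the mean each
    time, and reads the CI off the empirical percentiles.
    """
    rng = random.Random(seed)
    n = len(scores)
    means = []
    for _ in range(n_resamples):
        # Draw a resample of the same size, with replacement.
        sample = [scores[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return sum(scores) / n, (lo, hi)

# Per-example correctness (1 = correct, 0 = wrong) for a toy system.
scores = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
point, (lo, hi) = bootstrap_ci(scores)
```

Reporting the interval `(lo, hi)` alongside the point estimate makes clear how much of an observed metric difference could be resampling noise, which is the motivation for the statistical inference tools described below.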
Documentation philosophy¶
The documentation for Evaluatio should serve not only as the hub for the project's API, but also as a go-to resource for doing evaluation, model comparison, and statistical testing properly. Each metric page therefore covers not just the API, but when to use the metric, when not to, common pitfalls, and the relevant statistical background. We also document metrics not included in the library where they are relevant to rigorous evaluation.
Getting started¶
Metrics¶
Statistical inference¶
Project and library design¶
For the philosophy of the project, please see the project and design philosophy document.
Contribution Guide¶
We follow the principle that there is always room for more people in the community and that everyone has something to contribute. If you would like to contribute, please get in touch and read our contribution guide :)