
Edit distance measures the minimum number of substitutions, insertions, and deletions required to transform one sequence into another. It is also known as Levenshtein distance, after Vladimir Levenshtein, who described it in 1966 in the context of correcting binary codes (Levenshtein, 1966).

While edit distance is most commonly applied to characters or words in NLP, the algorithm itself only requires that elements be comparable for equality; it makes no assumptions about what those elements are. In Python terms, any object implementing `__eq__` is sufficient. This makes it applicable to any tokenisation scheme, including custom or language-specific ones.
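To make this concrete, here is a minimal pure-Python sketch of the classic Wagner–Fischer dynamic programme (the function name is illustrative, not part of the library's API). Note that the elements are only ever compared with `==`:

```python
def edit_distance(hyp, ref):
    """Minimum number of substitutions, insertions, and deletions
    needed to turn hyp into ref (Wagner-Fischer DP, two rows)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1  # only == is required of the elements
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

# Works on any equality-comparable tokens:
edit_distance("kitten", "sitting")           # characters -> 3
edit_distance(["the", "cat"], ["a", "cat"])  # word tokens -> 1
```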

The normalised variant, universal error rate (UER), divides the edit distance by the length of the reference sequence:

$$\text{UER}(H, R) = \frac{\text{edit\_distance}(H, R)}{|R|}$$

WER and CER are both special cases of UER, differing only in how sequences are tokenised before the metric is computed.
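To illustrate, the sketch below computes both metrics through the same UER formula, differing only in how the input is tokenised (the `edit_distance` helper is a pure-Python stand-in for the library's implementation):

```python
def edit_distance(hyp, ref):
    # Wagner-Fischer DP; stands in for the library's implementation.
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (h != r)))
        prev = curr
    return prev[-1]

def uer(hyp_tokens, ref_tokens):
    # UER(H, R) = edit_distance(H, R) / |R|
    return edit_distance(hyp_tokens, ref_tokens) / len(ref_tokens)

hyp, ref = "the cat sat", "the cat sits"
wer = uer(hyp.split(), ref.split())  # word tokens: 1 error / 3 words
cer = uer(list(hyp), list(ref))      # character tokens: 2 errors / 12 chars
```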

When to use UED/UER directly

In most cases you should use WER or CER directly. Use `universal_edit_distance_per_pair` or `universal_error_rate` when:

In these cases, pre-tokenise your sequences into lists and pass them directly to `universal_error_rate`. For corpus-level evaluation and confidence intervals over custom tokens, the same bootstrap tools used for WER and CER apply.

How to choose which function to use:

Evaluatio implementation

API reference

The Rust implementation uses a generic function bounded by `PartialEq`, making it truly type-agnostic at the core level. The PyO3 bindings expose this to Python by implementing `PartialEq` for `PyAny` with a two-case dispatch: a native comparison where possible, with a fallback to a Python-level equality call.

The second case carries a small performance overhead due to the Python call, but this only applies when comparing heterogeneous types, which should not occur in practice for well-formed inputs.

Corpus-level UER

As with WER and CER, corpus-level UER is computed as total edit distance divided by total reference length — not as a mean of utterance-level scores. The same distinction between micro and macro averaging applies here. See WER for a full discussion.
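The micro/macro distinction can be sketched as follows (pure Python, with a local `edit_distance` standing in for the library):

```python
def edit_distance(hyp, ref):
    # Wagner-Fischer DP; stands in for the library's implementation.
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (h != r)))
        prev = curr
    return prev[-1]

pairs = [
    (["a"], ["a"]),                      # 0 errors, 1 reference token
    (["x", "y", "z"], ["a", "b", "c"]),  # 3 errors, 3 reference tokens
]

# Micro average (corpus-level UER): pool errors and lengths first.
micro = (sum(edit_distance(h, r) for h, r in pairs)
         / sum(len(r) for _, r in pairs))           # 3 / 4 = 0.75

# Macro average: mean of per-utterance rates; short utterances
# carry disproportionate weight.
macro = (sum(edit_distance(h, r) / len(r) for h, r in pairs)
         / len(pairs))                              # (0.0 + 1.0) / 2 = 0.5
```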

References
  1. Levenshtein, V. I. (1966). Binary Codes Capable of Correcting Deletions, Insertions, and Reversals. Soviet Physics Doklady, 10, 707–710.