evaluatio.metrics.cer
Character-level error metrics
This module provides utilities to compute character error rate (CER) and character-level edit distance between reference and hypothesis text sequences.
The functions accept any iterable of strings and internally convert them to a format compatible with the underlying native bindings.
Note
If a reference string is empty, the corresponding CER is defined as inf.
These functions are thin wrappers around optimized native implementations.
Functions
character_error_rate_per_pair
character_error_rate_per_pair(references: Iterable[str], hypotheses: Iterable[str]) -> List[float]
Compute character error rate (CER) for each reference-hypothesis pair.
Parameters
references: Iterable[str]
Iterable of reference strings.
hypotheses: Iterable[str]
Iterable of hypothesis strings. Must be the same length as references.
Returns
List[float]
Character error rate for each pair of reference and hypothesis.
Raises
ValueError
If references and hypotheses have different lengths.
See Also
metrics.uer.universal_error_rate_per_pair: Type-agnostic version.
Note
If a reference string is empty, the resulting CER is inf.
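The documented behaviour can be illustrated with a small pure-Python sketch. It assumes that the per-pair CER is the unit-cost Levenshtein distance divided by the number of characters (Python code points) in the reference. The helper names `_levenshtein` and `cer_per_pair_sketch` are hypothetical and are not part of this module; the native implementation may differ in details such as how characters are segmented.

```python
import math
from typing import Iterable, List


def _levenshtein(ref: str, hyp: str) -> int:
    """Unit-cost edit distance over code points (assumed definition)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,              # delete r
                curr[j - 1] + 1,          # insert h
                prev[j - 1] + (r != h),   # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer_per_pair_sketch(references: Iterable[str], hypotheses: Iterable[str]) -> List[float]:
    """Hypothetical reference version: one CER value per pair."""
    refs, hyps = list(references), list(hypotheses)
    if len(refs) != len(hyps):
        raise ValueError("references and hypotheses must have the same length")
    # An empty reference yields inf, matching the note above.
    return [_levenshtein(r, h) / len(r) if r else math.inf for r, h in zip(refs, hyps)]


print(cer_per_pair_sketch(["hello", ""], ["hallo", "x"]))  # [0.2, inf]
```

Materializing both iterables with `list(...)` lets the length check run before any distance is computed, so generators are accepted as well.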
character_edit_distance_per_pair
character_edit_distance_per_pair(references: Iterable[str], hypotheses: Iterable[str]) -> List[int]
Compute character-level edit distance for each reference-hypothesis pair.
Parameters
references: Iterable[str]
Iterable of reference strings.
hypotheses: Iterable[str]
Iterable of hypothesis strings. Must be the same length as references.
Returns
List[int]
Character-level edit distance for each pair.
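As a rough illustration of what a character-level edit distance measures, the following sketch computes a unit-cost Levenshtein distance (insertions, deletions, and substitutions each cost 1) over Python code points. Both the cost model and the names `levenshtein` and `edit_distance_per_pair_sketch` are assumptions made for this example, not the module's native implementation. The length check is likewise assumed, mirroring the `ValueError` documented for `character_error_rate_per_pair`.

```python
from typing import Iterable, List


def levenshtein(ref: str, hyp: str) -> int:
    """Unit-cost edit distance between two strings (assumed definition)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,              # delete r
                curr[j - 1] + 1,          # insert h
                prev[j - 1] + (r != h),   # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def edit_distance_per_pair_sketch(references: Iterable[str], hypotheses: Iterable[str]) -> List[int]:
    """Hypothetical reference version: one distance per pair."""
    refs, hyps = list(references), list(hypotheses)
    if len(refs) != len(hyps):
        raise ValueError("references and hypotheses must have the same length")
    return [levenshtein(r, h) for r, h in zip(refs, hyps)]


print(edit_distance_per_pair_sketch(["kitten", "abc"], ["sitting", "abc"]))  # [3, 0]
```

Only two rows of the dynamic-programming table are kept at a time, so memory use grows with the hypothesis length rather than with the product of both lengths.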
character_error_rate
character_error_rate(references: Iterable[str], hypotheses: Iterable[str]) -> float
Compute the corpus-level character error rate (CER) over all pairs.
Parameters
references: Iterable[str]
Iterable of reference strings.
hypotheses: Iterable[str]
Iterable of hypothesis strings. Must be the same length as references.
Returns
float
Corpus-level character error rate across all pairs.
Note
Equivalent to common CER implementations (e.g., jiwer-based metrics).
If all reference strings are empty, the resulting CER is inf.
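A corpus-level CER is usually a pooled ratio, not the mean of the per-pair values. The sketch below assumes the common definition (total edits divided by total reference characters), which is consistent with the note that the result is inf only when every reference is empty. The names `_levenshtein` and `corpus_cer_sketch` are hypothetical and not part of this module.

```python
import math
from typing import Iterable


def _levenshtein(ref: str, hyp: str) -> int:
    """Unit-cost edit distance over code points (assumed definition)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = curr
    return prev[-1]


def corpus_cer_sketch(references: Iterable[str], hypotheses: Iterable[str]) -> float:
    """Hypothetical pooled CER: sum of edits over sum of reference lengths."""
    refs, hyps = list(references), list(hypotheses)
    if len(refs) != len(hyps):
        raise ValueError("references and hypotheses must have the same length")
    total_edits = sum(_levenshtein(r, h) for r, h in zip(refs, hyps))
    total_chars = sum(len(r) for r in refs)
    # With no reference characters at all, the ratio is undefined; return inf.
    return total_edits / total_chars if total_chars else math.inf


refs = ["ab", "abcdefgh"]
hyps = ["ax", "abcdefgh"]
print(corpus_cer_sketch(refs, hyps))  # 0.1
```

In this example the per-pair values would be 0.5 and 0.0 (mean 0.25), while the pooled value is 1 edit over 10 reference characters. Longer references therefore carry more weight in the corpus-level figure.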
character_error_rate_ci
character_error_rate_ci(references: Iterable[str], hypotheses: Iterable[str], interations: int, alpha: float) -> ConfidenceInterval
Estimate a confidence interval for the character error rate via bootstrapping.
Parameters
references: Iterable[str]
Iterable of reference strings.
hypotheses: Iterable[str]
Iterable of hypothesis strings. Must be the same length as references.
interations: int
Number of bootstrap iterations.
alpha: float
Significance level for the confidence interval (for example, 0.05 corresponds to a 95% interval).
Returns
ConfidenceInterval
Estimated confidence interval for the corpus-level character error rate.
Note
The bootstrapped metric corresponds to character_error_rate.
If any reference string is empty, the resulting CER can be inf.
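To make the bootstrapping idea concrete, here is a minimal sketch of a percentile bootstrap over reference-hypothesis pairs. It is an illustration under several assumptions. Pairs are resampled with replacement, the pooled CER is recomputed on each resample, and the interval bounds are read off the sorted statistics by a simple nearest-rank rule. The function name `bootstrap_cer_ci_sketch`, the `seed` argument, and the plain `(lower, upper)` tuple are all hypothetical. The fields of the real `ConfidenceInterval` type and the exact resampling scheme used by the native code are not described here.

```python
import math
import random
from typing import Sequence, Tuple


def _levenshtein(ref: str, hyp: str) -> int:
    """Unit-cost edit distance over code points (assumed definition)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = curr
    return prev[-1]


def bootstrap_cer_ci_sketch(
    references: Sequence[str],
    hypotheses: Sequence[str],
    iterations: int,
    alpha: float,
    seed: int = 0,  # hypothetical argument, added here for reproducibility
) -> Tuple[float, float]:
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    if not references or iterations < 1:
        raise ValueError("need at least one pair and one iteration")
    # Per-pair edits and lengths are computed once and reused in every resample.
    edits = [_levenshtein(r, h) for r, h in zip(references, hypotheses)]
    lengths = [len(r) for r in references]
    rng = random.Random(seed)
    n = len(references)
    stats = []
    for _ in range(iterations):
        sample = [rng.randrange(n) for _ in range(n)]  # resample pairs with replacement
        chars = sum(lengths[i] for i in sample)
        errs = sum(edits[i] for i in sample)
        stats.append(errs / chars if chars else math.inf)
    stats.sort()
    lower = stats[int((alpha / 2) * (iterations - 1))]
    upper = stats[int((1 - alpha / 2) * (iterations - 1))]
    return lower, upper


refs = ["hello world", "good morning", "character error rate"]
hyps = ["hallo world", "good mornin", "character error rate"]
low, high = bootstrap_cer_ci_sketch(refs, hyps, iterations=1000, alpha=0.05)
print(low, high)
```

Because a resample can happen to contain only empty references, an individual bootstrap statistic can be inf even when the full corpus has a finite CER, which is one way the caveat in the note above can arise.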