Benchmark
The benchmark covers 12 languages chosen to span a wide range of phonemic contrasts. They are split into six dev languages for tuning and six test languages for final evaluation. Systems are given 10 hours of unannotated speech and must produce discrete units that can be mapped to the language's phoneme inventory. The mapping is either many-to-one (with 256 units) or one-to-one (with roughly as many units as phonemes; the exact unit budget is listed under Tracks).
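To make the many-to-one case concrete, here is a minimal sketch of one common way to map discovered units to phonemes: each unit is assigned the phoneme it co-occurs with most often at the frame level. This is an illustrative assumption, not the benchmark's official mapping procedure; the function name `many_to_one_map` and the toy frame labels are hypothetical.

```python
from collections import Counter, defaultdict


def many_to_one_map(units, phones):
    """Map each unit to the phoneme it overlaps with most often.

    `units` and `phones` are frame-aligned label sequences of equal length.
    Majority voting is an assumed mapping rule, used here for illustration.
    """
    if len(units) != len(phones):
        raise ValueError("units and phones must be frame-aligned")
    counts = defaultdict(Counter)
    for unit, phone in zip(units, phones):
        counts[unit][phone] += 1
    # Several units may end up mapped to the same phoneme (many-to-one).
    return {unit: c.most_common(1)[0][0] for unit, c in counts.items()}


# Toy example: three units covering two phonemes plus one noisy frame each.
units = [3, 3, 7, 7, 7, 12, 12, 3]
phones = ["a", "a", "t", "t", "d", "a", "a", "t"]
mapping = many_to_one_map(units, phones)
print(mapping)
```

In this toy example units 3 and 12 both map to the same phoneme, which is exactly what a many-to-one track permits and a one-to-one track does not.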
- Languages:
    - dev languages: German, Swahili, Tamil, Thai, Turkish, Ukrainian
    - test languages: Basque, English, French, Japanese, Mandarin Chinese, Wolof
- Evaluation metrics:
    - Units quality: PNMI
    - Recognition: Phone Error Rate
    - Segmentation: \(F_1\), \(R\)-value
    - Discriminability (optional): ABX discrete and continuous
- Tracks:
    - Many-to-one (256 units)
    - One-to-one (number of phonemes + 1 units)
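As an illustration of the segmentation metrics, the sketch below computes boundary precision, recall, \(F_1\), and the \(R\)-value from reference and predicted boundary times. It uses the standard \(R\)-value definition based on hit rate and over-segmentation. The 20 ms tolerance, the greedy matching rule, and the name `boundary_scores` are assumptions for illustration and may differ from the benchmark's official scorer.

```python
import math


def boundary_scores(ref, hyp, tolerance=0.02):
    """Score predicted boundaries (seconds) against reference boundaries.

    A reference boundary counts as a hit if an unused predicted boundary lies
    within `tolerance` seconds of it (assumed value, greedy nearest matching).
    """
    if not ref:
        raise ValueError("at least one reference boundary is required")
    ref, hyp = sorted(ref), sorted(hyp)
    used = [False] * len(hyp)
    hits = 0
    for r in ref:
        best = None
        for j, h in enumerate(hyp):
            if used[j] or abs(h - r) > tolerance:
                continue
            if best is None or abs(h - r) < abs(hyp[best] - r):
                best = j
        if best is not None:
            used[best] = True
            hits += 1

    precision = hits / len(hyp) if hyp else 0.0
    recall = hits / len(ref)  # also called the hit rate
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0

    # Over-segmentation: relative excess of predicted boundaries.
    over_seg = len(hyp) / len(ref) - 1
    r1 = math.sqrt((1 - recall) ** 2 + over_seg ** 2)
    r2 = (-over_seg + recall - 1) / math.sqrt(2)
    r_value = 1 - (abs(r1) + abs(r2)) / 2
    return {"precision": precision, "recall": recall, "f1": f1, "r_value": r_value}


# Toy example: every reference boundary is found, plus one spurious boundary.
ref = [0.10, 0.25, 0.40, 0.60]
hyp = [0.11, 0.26, 0.41, 0.50, 0.61]
scores = boundary_scores(ref, hyp)
print({name: round(value, 3) for name, value in scores.items()})
```

The toy example shows why both metrics are reported: recall is perfect, yet the extra boundary lowers \(F_1\) through precision and lowers the \(R\)-value through the over-segmentation term.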