Detailed results

Below are the complete results for the baseline models, covering every layer, metric, language, and finetuning duration.

Across layers

The chart below shows the scores of the four baseline models, averaged across the dev or test languages, for every layer and finetuning duration:

The next chart shows the same scores for a single language:

Best layer, by finetuning duration

This chart displays only the scores of the best layer, averaged across the dev or test languages:
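To make "averaged across languages" and "best layer" concrete, here is a minimal sketch for one model and one finetuning duration. It assumes a hypothetical layout where scores are stored in a dict keyed by `(layer, language)`, and it assumes the best layer is simply the one with the highest averaged score. The page does not say how the best layer is selected (for instance, whether it is picked on dev languages and then reported on test languages), so that selection rule is an assumption. The function names `average_by_layer` and `best_layer` are illustrative only.

```python
from statistics import mean


def average_by_layer(scores, languages):
    """Average each layer's score over the given languages.

    scores: hypothetical dict mapping (layer, language) -> score,
    for a single model and finetuning duration.
    languages: the subset to average over (e.g. dev or test languages).
    """
    layers = sorted({layer for layer, _ in scores})
    return {
        layer: mean(scores[(layer, lang)] for lang in languages)
        for layer in layers
    }


def best_layer(scores, languages):
    """Return (layer, averaged score) for the layer with the highest average.

    Assumption: "best" means the highest mean score over `languages`.
    """
    averaged = average_by_layer(scores, languages)
    layer = max(averaged, key=averaged.get)
    return layer, averaged[layer]


# Toy, made-up scores for three layers and two languages.
toy_scores = {
    (0, "lang_a"): 0.5, (0, "lang_b"): 0.75,
    (1, "lang_a"): 1.0, (1, "lang_b"): 0.5,
    (2, "lang_a"): 0.25, (2, "lang_b"): 0.25,
}
print(best_layer(toy_scores, ["lang_a", "lang_b"]))  # (1, 0.75)
```

Passing a single language in `languages` gives the per-language view instead of the averaged one.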

The next chart shows the same best-layer scores for a single language: