Detailed results
We provide below the complete results for the baseline models: every layer, metric, language, and finetuning duration.
Across layers
The chart below shows the scores averaged across dev or test languages, for the four baseline models, every layer, and finetuning duration:
The next chart shows the same per-layer scores for a single language:
Best layer, by finetuning duration
The chart below displays only the scores for the best layer at each finetuning duration, averaged across dev or test languages:
The next chart shows the best-layer scores for a single language:
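
The best-layer view described in this section can be approximated with a short sketch. Everything here is an assumption made for illustration: the function name `best_layer_by_duration`, the `(layer, language, duration)` key layout, the order of operations (average across languages first, then pick the best layer), and the `higher_is_better` flag, which should be set to `False` for error-rate metrics. It is not the code used to produce the charts.

```python
from collections import defaultdict
from statistics import mean


def best_layer_by_duration(scores, higher_is_better=True):
    """Pick the best layer per finetuning duration for one model.

    scores: dict mapping (layer, language, duration) -> score (hypothetical layout).
    Returns {duration: (best_layer, score_averaged_across_languages)}.
    """
    # Collect the per-language scores for every (duration, layer) pair.
    grouped = defaultdict(list)
    for (layer, language, duration), score in scores.items():
        grouped[(duration, layer)].append(score)

    # Average across languages for each layer, grouped by duration.
    per_duration = defaultdict(dict)
    for (duration, layer), values in grouped.items():
        per_duration[duration][layer] = mean(values)

    # Keep only the best layer for each duration.
    pick = max if higher_is_better else min
    return {
        duration: pick(layer_means.items(), key=lambda item: item[1])
        for duration, layer_means in per_duration.items()
    }
```

Averaging before selecting gives one best layer per duration shared by all languages. Selecting the best layer separately for each language and then averaging would be a different, generally more optimistic, summary.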