Our publications.

We regularly publish papers on aspects of GEM that describe findings or resources we find worth sharing. Please have a look below:


GEMv1 Overview (GEM Workshop 2021)
This is our first overview paper, introducing GEM and the initial set of 13 tasks and associated baselines.
Authors: All GEMv1 participants (see team list)
This is our second overview paper, expanding GEM to 40 tasks and 51 languages and introducing automatic evaluation on the HuggingFace Hub.
Authors: All GEMv2 participants (see team list)
In this survey paper, we discuss many of the principles underlying GEM and propose a set of best practices to follow for model evaluation. See also the shortened version presented at the MLEval workshop at ICLR 2022.
Authors: Sebastian Gehrmann, Elizabeth Clark, Thibault Sellam
Data Cards (GEM Workshop 2021)
In "Reusable Templates and Guides For Documenting Datasets and Models for Natural Language Processing and Generation: A Case Study of the HuggingFace and GEM Data and Model Cards", we describe the approach for data documentation in GEMv1 and the similar approach used by HuggingFace datasets.
Authors: Angelina McMillan-Major, Salomey Osei, Juan Diego Rodriguez, Pawan Sasanka Ammanamanchi, Sebastian Gehrmann, Yacine Jernite
Evaluation Suites (NeurIPS 2021)
In the paper "Automatic Construction of Evaluation Suites for Natural Language Generation Datasets", we discuss how to build data collections that test the robustness of models and show that they are much more expressive than typical test splits.
Authors: Simon Mille, Kaustubh Dhole, Saad Mahamood, Laura Perez-Beltrachini, Varun Gangal, Mihir Kale, Emiel van Miltenburg, Sebastian Gehrmann
This was a collaborative and participatory workshop that collected more than 117 different ways to transform text and more than 23 ways to filter out subpopulations of datasets.
Participants and Authors: Listed in paper (see team list)
Steering Committee: Kaustubh Dhole, Varun Gangal, Sebastian Gehrmann, Aadesh Gupta, Zhenhao Li, Saad Mahamood, Simon Mille, Jascha Sohl-Dickstein, Ashish Srivastava, Samson Tan, Tongshuang Wu and Abinaya Mahendiran