The workshop will be held as part of ACL-IJCNLP 2021, August 1-6, 2021. It will take place on August 5 or 6.
It is endorsed by the ACL Special Interest Group on Natural Language Generation (SIGGEN).
Natural language generation is one of the most active research fields in NLP, with generation, summarization, and dialog among the most submitted-to tracks. As such, the number of available datasets, metrics, models, and evaluation strategies is increasing rapidly. This is leading to a situation in which new models are often evaluated on different Anglo-centric tasks with incompatible evaluation setups. With GEM, we aim to solve this problem by standardizing and improving the corpora on which NLG models are evaluated, and by supporting the development of better evaluation approaches. In our shared tasks, models will be applied to a wide set of NLG tasks. These tasks cover challenges that measure specific generation aspects, such as content selection and planning, surface realization, paraphrasing, simplification, and others. To avoid hill-climbing on automated metrics, a second part of the shared task focuses on an in-depth analysis of submitted model outputs across both human and automatic evaluation, with the aim of uncovering shortcomings and opportunities for progress.
The shared task is described in depth on this page. It includes two parts:
- In the first part, participants are encouraged to apply their model to as many of the included tasks as possible and submit their formatted outputs. We provide GEM-specific test sets that will be used to evaluate specific generation aspects.
- In the second part, all submitted and baseline outputs will be released for an evaluation shared task. Participants can submit analyses and evaluations of the model outputs.
During the GEM workshop, shared task participants will come together to discuss their findings, which will inform future iterations of GEM.
Call for Papers
All papers are allowed unlimited space for references and appendices. For papers associated with the shared task, we additionally highly encourage publishing the code used to generate the results. We ask for papers in the following categories:
Participants of the modeling shared task are invited to submit a system description of 4-8 pages.
System Evaluation Descriptions
Participants of the evaluation shared task are invited to submit a paper of 4-8 pages describing their analysis approach and findings.
We welcome papers discussing any of the following topics:
- Automatic evaluation of NLG systems
- Creating challenge sets for NLG corpora
- Critiques of benchmarking efforts (including ours)
- Crowdsourcing strategies to improve the inclusiveness of NLG research
- Measuring progress in NLG / What should a GEM 2.0 look like
- Modeling and data-augmentation strategies for training effective and/or efficient NLG systems that can be applied to a wide range of tasks
- Standardizing human evaluation and making it more robust
We additionally invite every group that contributed to the creation and organization of GEM to submit a description of their considerations and contributions.
Please note that we are not looking for submissions that focus on specific modeling challenges or introduce new model architectures, which would fit better into conferences like ACL or INLG.
These submissions can take either of the following forms:
- Archival Papers: Papers describing original and unpublished work can be submitted in either a short (4-page) or a long (8-page) format.
- Non-Archival Abstracts: To discuss work already presented or under review at a peer-reviewed venue, we allow the submission of 2-page abstracts.
All submissions should conform to ACL 2021 style guidelines. Archival long and short paper submissions must be anonymized. Abstracts and shared task submission descriptions should include author information. Please submit your papers at the SoftConf link.
February 2 First Call for Shared Task Submissions and Papers, Release of the Training Data
May 3 Workshop Paper Due Date (excl. shared tasks) UPDATED
May 28 Notification of Acceptance (excl. shared tasks)
June 7 Camera-ready papers due (excl. shared tasks)
Shared Task Dates
February 2 Release of the training data
March 29 Release of the test sets
May 14 Modeling submissions due
March 29 Release of the baseline outputs
May 17 Release of the submission outputs
System Descriptions and Analyses
June 11 System Descriptions and Analyses due
June 25 Notification of Acceptance (shared task)
July 9 Camera-ready papers and task descriptions due
August 5-6 Workshop Dates
The workshop is organized by
- Antoine Bosselut (Stanford University)
- Esin Durmus (Cornell University)
- Varun Prashant Gangal (Carnegie Mellon University)
- Sebastian Gehrmann (Google Research)
- Yacine Jernite (Hugging Face)
- Laura Perez-Beltrachini (University of Edinburgh)
- Samira Shaikh (UNC Charlotte)
- Wei Xu (Georgia Tech)
The shared task and the GEM environment are organized by a larger team, which is listed on this page.