GEM 💎 Resources

Using our resources.

As part of GEM, we are continuously producing resources for the research community. This page provides download links and brief explanations of each.

Outputs and Scores

A growing collection of millions of model outputs and automatic scores, covering 20+ models across all GEM tasks. Use it to study model evaluation, to characterize model shortcomings, and to obtain baseline outputs for model comparison.

HuggingFace Loader

All our datasets can be loaded via this data loader implemented in HuggingFace datasets.
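As a rough illustration of what loading a GEM task through HuggingFace datasets might look like, here is a minimal sketch. The helper names `gem_hub_id` and `load_gem_task` are hypothetical, and the `GEM/<task>` naming scheme, the `common_gen` task name, and the `validation` split are assumptions; check the loader's own documentation for the identifiers it actually exposes.

```python
def gem_hub_id(task: str) -> str:
    """Build a dataset identifier for a GEM task.

    Assumption: GEM datasets are published under a "GEM/<task>" name.
    """
    return f"GEM/{task}"


def load_gem_task(task: str, split: str = "validation"):
    """Load one split of a GEM task (hypothetical convenience wrapper)."""
    # Imported lazily so gem_hub_id() works without the library installed.
    from datasets import load_dataset

    return load_dataset(gem_hub_id(task), split=split)
```

Calling `load_gem_task("common_gen")` would then download the data on first use and return a dataset object whose rows can be indexed like a list. Depending on the installed library version, datasets that rely on a loading script may need extra arguments.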

TFDS Loader

All our datasets can be loaded via this data loader implemented in TFDS.
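The TFDS route can be sketched the same way. The helper names `gem_tfds_name` and `load_gem_task_tfds` are hypothetical, and the `gem/<task>` config naming, the `common_gen` task, and the `validation` split are assumptions to verify against the TFDS catalog.

```python
def gem_tfds_name(task: str) -> str:
    """Build a TFDS dataset name for a GEM task.

    Assumption: GEM tasks are exposed as configs of one "gem" dataset.
    """
    return f"gem/{task}"


def load_gem_task_tfds(task: str, split: str = "validation"):
    """Load one split of a GEM task via TFDS (hypothetical wrapper)."""
    # Imported lazily so gem_tfds_name() works without TFDS installed.
    import tensorflow_datasets as tfds

    return tfds.load(gem_tfds_name(task), split=split)
```

With TFDS installed, `load_gem_task_tfds("common_gen")` would return a `tf.data.Dataset` that can be iterated example by example.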

Metrics Repository

Our package for model evaluation. It computes our full suite of metrics and provides convenience features such as caching and parallelism. To use it, add your dataset to the repository and follow the instructions in the README.

NL-Augmenter

If you want to run robustness tests on your model and data, NL-Augmenter can help! More information can be found on the dedicated site.