NFDI4DS

Shared Task SOTA

2023-02-01

Title

SOTA: Tracking the State-of-the-Art in Empirical AI Scholarly Publications

Abstract

Empirical AI research centers on tasks, each defined by a dataset, for which machine learning models are developed and evaluated with a standard set of metrics. Pushing the state of the art in empirical AI research means improving the models developed for these tasks in terms of speed, accuracy, or storage. Researchers in this domain therefore frequently ask the central question: “What’s the state-of-the-art result for task XYZ right now?”

Rather than digging the answer out of a ranked list of documents returned by a traditional search engine, researchers look for it on community-curated leaderboards such as https://paperswithcode.com/ or https://orkg.org/benchmarks. These leaderboards are websites designed to showcase the performance of all machine learning models introduced for a given task dataset. Researchers who want the best reported model performance on a task dataset can read it off these websites' performance trendline overviews, which show how model performance on that dataset has changed over time.

In this Shared Task, we hope to go beyond community curation of leaderboards and work toward the most efficient machine learning model capable of automatically detecting leaderboards in AI articles. The efficiency of the models submitted to the shared task will be assessed on speed, number of model parameters, and leaderboard detection F1 score.

Task

As a complete submission for the Shared Task, systems will have to perform the following tasks:

  • Identify whether an incoming AI article reports leaderboards or not; and
  • For AI articles reporting leaderboards, extract all the pertinent (Task, Dataset, Metric, Score) quadruples.
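The two subtasks above can be pictured as producing one record per article. The following is a minimal sketch, not the official submission format: the class names, field names, and the choice to keep the score as text are all assumptions made for illustration, and the example values are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LeaderboardTuple:
    """One (Task, Dataset, Metric, Score) quadruple (hypothetical schema)."""
    task: str
    dataset: str
    metric: str
    score: str  # kept as text here, since papers report scores in varied formats


@dataclass
class Prediction:
    """System output for a single article (hypothetical schema)."""
    article_id: str
    has_leaderboard: bool
    tuples: List[LeaderboardTuple]


def make_prediction(article_id: str, tuples: List[LeaderboardTuple]) -> Prediction:
    """Build a prediction, deriving the binary label from the extracted tuples."""
    unique = list(dict.fromkeys(tuples))  # drop exact duplicates, keep first-seen order
    return Prediction(article_id, bool(unique), unique)


# Illustrative values only; they do not come from any real article.
example = LeaderboardTuple("Question Answering", "ExampleQA", "F1", "90.0")
prediction = make_prediction("article-001", [example, example])
```

In this sketch, an article with no extracted quadruples is treated as reporting no leaderboard. A real system might predict the binary label separately from the extraction step.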

Dataset

Metrics

  • ROUGE, Recall, Precision, F1
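As a rough illustration of how Precision, Recall, and F1 could be computed for the extraction subtask, the sketch below compares gold and predicted quadruples by exact match. Exact matching and the function name are assumptions. The official scorer may normalize strings or allow partial matches (for example via ROUGE), and the example quadruples are made up.

```python
from typing import Iterable, Tuple

Quad = Tuple[str, str, str, str]  # (task, dataset, metric, score)


def precision_recall_f1(gold: Iterable[Quad], predicted: Iterable[Quad]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over sets of quadruples."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Illustrative values only.
gold = [("QA", "ExampleQA", "F1", "90.0"), ("QA", "ExampleQA", "EM", "85.0")]
pred = [("QA", "ExampleQA", "F1", "90.0"), ("QA", "OtherQA", "F1", "70.0")]
scores = precision_recall_f1(gold, pred)  # one of two predictions is correct
```

Here one of the two predicted quadruples matches one of the two gold quadruples, so precision, recall, and F1 all come out to 0.5.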

Official Website

Contact Person

  • Jennifer D’Souza (TIB)