Title
SOTA: Tracking the State-of-the-Art in Empirical AI Scholarly Publications
Abstract
The central activity in empirical AI research revolves around automated tasks, each defined via a task dataset, for which machine learning models are developed and whose performance is evaluated against a standard set of evaluation metrics. Pushing the state-of-the-art boundaries in empirical AI research means optimizing the models developed for these tasks in terms of speed, accuracy, or storage. As such, researchers in this domain often ask the central question: “What is the state-of-the-art result for task XYZ right now?”
Rather than seeking out the answer buried in a ranked list of documents returned by a traditional search engine, researchers look for it on community-curated leaderboards such as https://paperswithcode.com/ or https://orkg.org/benchmarks. These leaderboards are websites specifically designed to showcase the performance of all introduced machine learning models on a given task dataset. Researchers seeking the best model performance on a task dataset can easily obtain this information from the sites' performance trendline overviews, which chart the various model performances on a task dataset over time.
In this Shared Task, we hope to go beyond the community curation of leaderboards and instead realize the vision of obtaining the most efficient machine learning model capable of automatically detecting them. Submitted models will be evaluated for efficiency in terms of speed, number of model parameters, and leaderboard detection F1 score.
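As a rough illustration of how such efficiency criteria might be measured, the sketch below counts trainable parameters and times a forward pass for a PyTorch model. The stand-in model and batch shape are hypothetical; the official evaluation harness is defined by the organizers.

```python
import time

import torch
import torch.nn as nn

# Hypothetical stand-in classifier; any submitted model would be measured the same way.
model = nn.Sequential(nn.Linear(768, 256), nn.ReLU(), nn.Linear(256, 2))
model.eval()

# Model size: total count of trainable parameters.
num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)

# Speed: wall-clock time for one forward pass over a dummy batch.
batch = torch.randn(32, 768)
start = time.perf_counter()
with torch.no_grad():
    model(batch)
elapsed = time.perf_counter() - start

print(f"trainable parameters: {num_params:,}")
print(f"forward-pass time for a batch of 32: {elapsed:.4f}s")
```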
Task
As a complete submission to the Shared Task, systems will have to perform the following two subtasks:
- Identify whether an incoming AI article reports leaderboards or not; and
- For AI articles reporting leaderboards, extract all the pertinent (Task, Dataset, Metric, Score) quadruples (see the illustrative sketch below).
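To make the expected system behavior concrete, here is a minimal sketch of what one system's output for a single article might look like. The field names and identifier are hypothetical; the official submission format is specified on the shared task website.

```python
# Hypothetical output of a system for one article; field names are
# illustrative, not the official submission schema.
prediction = {
    "article_id": "example-0001",   # hypothetical identifier
    "has_leaderboard": True,        # subtask 1: does the article report leaderboards?
    "quadruples": [                 # subtask 2: (Task, Dataset, Metric, Score)
        {
            "task": "Question Answering",
            "dataset": "SQuAD 1.1",
            "metric": "F1",
            "score": "93.2",
        },
    ],
}
```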
Dataset
Metrics
- ROUGE, Recall, Precision, F1
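As a rough sketch of how the set-based metrics could be computed, the snippet below scores predicted quadruples against gold ones under exact matching. This is one plausible scoring scheme, not the official scorer, which may differ, for instance by using ROUGE to give partial credit to near-matching strings.

```python
def quadruple_prf(gold: set, predicted: set) -> tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over (Task, Dataset, Metric, Score) tuples."""
    tp = len(gold & predicted)  # true positives: quadruples found in both sets
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = {("Question Answering", "SQuAD 1.1", "F1", "93.2")}
pred = {("Question Answering", "SQuAD 1.1", "F1", "93.2"),
        ("Question Answering", "SQuAD 2.0", "EM", "87.0")}

print(quadruple_prf(gold, pred))  # (0.5, 1.0, 0.666...)
```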
Official Website
Contact Person
- Jennifer D’Souza (TIB)