SimpleText Task

2024-02-12

Improving Access to Scientific Texts for Everyone

The general public tends to avoid reliable sources such as scientific literature because of their complex language and the reader's own lack of background knowledge. Instead, people rely on shallow and derived sources on the web and in social media - often published for commercial or political motives rather than for informational value. Can text simplification help to remove some of these access barriers?

The SimpleText task is part of the CLEF initiative, which promotes the systematic evaluation of information access systems, primarily through experimentation on shared tasks. SimpleText addresses the challenges that text simplification approaches face when used to promote access to scientific information, and it provides appropriate data and benchmarks for evaluating them.

The task uses a corpus of scientific literature abstracts and popular science requests. The overall use case is to create, in response to a popular science query, a simplified summary of multiple scientific documents that gives the user an accessible overview of the topic.

Task on SOTA - Tracking the State-of-the-Art in Empirical AI Scholarly Publications

NFDI4DS and partners host a task co-located with the Conference and Labs of the Evaluation Forum (CLEF)

In Artificial Intelligence (AI), a common research objective is the development of new models that can report state-of-the-art (SOTA) performance. The reporting usually comprises four integral elements: Task, Dataset, Metric, and Score (TDMS).

The SOTA? shared task is defined on a dataset of Artificial Intelligence scholarly articles. There are two kinds of articles: those that report (Task, Dataset, Metric, Score) tuples and those that do not. For the articles reporting TDMS tuples, all reported TDMS annotations are provided in a separate file accompanying the scraped full text of the articles. The extraction task is defined as follows:

Develop a machine learning model that can determine whether a scholarly article provided as input reports TDMS tuples or not and, for articles that do, extract all the relevant tuples.
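To make the shape of the expected output concrete, here is a minimal Python sketch of how a TDMS tuple and a simple comparison against gold annotations might be represented. Everything in it is an assumption for illustration: the `TDMS` class, the `normalize` and `exact_match_f1` helpers, the convention that an empty list stands for "reports no TDMS", and the placeholder values are not taken from the task materials, and the set-based F1 shown here is not claimed to be the official evaluation measure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TDMS:
    """One (Task, Dataset, Metric, Score) tuple; hypothetical container, not the task's file format."""
    task: str
    dataset: str
    metric: str
    score: str  # kept as text, since scores are reported in varied formats


def normalize(t: TDMS) -> TDMS:
    """Trim and lowercase every field so trivial formatting differences are ignored."""
    return TDMS(*(value.strip().lower() for value in (t.task, t.dataset, t.metric, t.score)))


def exact_match_f1(predicted: list[TDMS], gold: list[TDMS]) -> float:
    """Set-based F1 over normalized tuples (an assumed measure, for illustration only).

    An empty list is used here to mean "this article reports no TDMS tuples".
    """
    pred_set = {normalize(t) for t in predicted}
    gold_set = {normalize(t) for t in gold}
    if not pred_set and not gold_set:
        return 1.0  # both agree that the article reports nothing
    overlap = len(pred_set & gold_set)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_set)
    recall = overlap / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Placeholder values only; they do not come from any real article.
gold = [TDMS("task-a", "dataset-x", "metric-1", "90.0"),
        TDMS("task-a", "dataset-x", "metric-2", "85.0")]
predicted = [TDMS(" Task-A ", "Dataset-X", "Metric-1", "90.0"),  # matches after normalization
             TDMS("task-a", "dataset-x", "metric-2", "80.0")]    # wrong score, so no match

print(exact_match_f1(predicted, gold))
```

Treating the output as a set of tuples keeps the two parts of the task in one structure: the empty set covers the "does not report" decision, and a non-empty set covers extraction. A real submission would need to follow the output format specified by the organizers.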

Given the recent upsurge in generative AI in the form of Large Language Models (LLMs), creative LLM-based solutions to the task are particularly invited. The task places no restrictions on the use of open-source versus closed-source LLMs. Nonetheless, the development of open-source solutions is encouraged.

Find detailed information at: https://sites.google.com/view/simpletext-sota/home