Goal
Recent years have seen a paradigm shift, with computational methods increasingly relying on data-driven and often deep learning-based approaches. This has led to the establishment and ubiquity of Data Science as a discipline driven by advances in the field of Computer Science. Transparency, reproducibility and fairness have become crucial challenges for Data Science and Artificial Intelligence because contemporary Data Science methods are complex and often rely on a combination of code, models and training data. NFDI4DS will promote fair and open research data infrastructures supporting all involved resources such as code, models, data, or publications through an integrated approach.
The overarching objective of NFDI4DS is the development, establishment, and sustainment of a national research data infrastructure (NFDI) for the Data Science and Artificial Intelligence community in Germany. This will also deliver benefits for a wider community requiring data analytics solutions, within the NFDI and beyond. The key idea is to increase the transparency, reproducibility and fairness of Data Science and Artificial Intelligence projects by making all digital artifacts available, interlinking them, and offering innovative tools and services. Reusing these digital objects in turn enables new and innovative research.
NFDI4DS intends to represent the academic community of Data Science and Artificial Intelligence, an interdisciplinary field rooted in Computer Science. We aim to reuse existing solutions and to collaborate closely with the other NFDI consortia and beyond. In the initial phase, NFDI4DS will focus on four Data Science-intensive application areas: language technology, biomedical sciences, information sciences and social sciences. The expertise available in NFDI4DS ensures that metadata standards are interoperable across domains and that new ways of dealing with digital objects arise.
Task Areas
Task Area 1: Community and Training continuously analyzes requirements from the variety of stakeholder communities, including Computer Science research, Data Science and AI practitioners as well as industry. We develop and implement a comprehensive approach for DS and AI training, skill development, and capacity building, while leveraging existing platforms and resources. We organize community building and stakeholder engagement comprehensively, using a variety of physical, online, and social networking channels, with a special focus on fostering diversity, for example with regard to gender balance, and on attracting young talent. We will furthermore raise awareness of legal and ethical aspects.
Task Area 2: Research Knowledge Graphs aims at improving FAIRness of Data Science artifacts including research datasets, benchmarks, machine learning models and research software (code and executables). Recent years have seen a paradigm shift in Data and Computer Science towards data-driven and deep learning-based methods, which often rely on a combination of code, models, and underlying datasets. However, a lack of transparency about data, code, or models causes significant reproducibility and reusability issues, which have surfaced across various domains. Therefore, we will follow an integrated approach towards representing and linking Data Science artifacts in a joint Research Knowledge Graph (RKG), enabling a transparent understanding of code/data provenance, model configurations, and training/testing data.
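To make the idea of linking artifacts more concrete, the following is a minimal sketch of how datasets, models and software could be connected in a small in-memory graph. It is an illustration only and not the NFDI4DS design: the class names, the identifiers and the relation names (trainedOn, evaluatedOn, producedBy, derivedFrom) are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    """One Data Science artifact; the fields are illustrative assumptions."""
    identifier: str
    kind: str   # e.g. "dataset", "model", "software", "benchmark"
    title: str


class ResearchKnowledgeGraph:
    """Toy in-memory graph of artifacts connected by labelled edges."""

    def __init__(self) -> None:
        self.artifacts: dict[str, Artifact] = {}
        # subject identifier -> set of (predicate, object identifier)
        self.edges: dict[str, set[tuple[str, str]]] = defaultdict(set)

    def add(self, artifact: Artifact) -> None:
        self.artifacts[artifact.identifier] = artifact

    def link(self, subject: str, predicate: str, obj: str) -> None:
        # Only allow links between artifacts that have been registered.
        for identifier in (subject, obj):
            if identifier not in self.artifacts:
                raise KeyError(f"unknown artifact: {identifier}")
        self.edges[subject].add((predicate, obj))

    def provenance(self, identifier: str) -> set[str]:
        """Return every artifact reachable from `identifier` via outgoing edges."""
        seen: set[str] = set()
        stack = [identifier]
        while stack:
            current = stack.pop()
            for _predicate, target in self.edges.get(current, ()):
                if target not in seen:
                    seen.add(target)
                    stack.append(target)
        return seen


# Hypothetical example data: one model, its datasets and its training code.
rkg = ResearchKnowledgeGraph()
rkg.add(Artifact("model:sentiment-v1", "model", "Sentiment classifier"))
rkg.add(Artifact("dataset:reviews-train", "dataset", "Training split"))
rkg.add(Artifact("dataset:reviews-test", "dataset", "Test split"))
rkg.add(Artifact("dataset:reviews-raw", "dataset", "Raw review dump"))
rkg.add(Artifact("software:train-script", "software", "Training code"))

rkg.link("model:sentiment-v1", "trainedOn", "dataset:reviews-train")
rkg.link("model:sentiment-v1", "evaluatedOn", "dataset:reviews-test")
rkg.link("model:sentiment-v1", "producedBy", "software:train-script")
rkg.link("dataset:reviews-test", "derivedFrom", "dataset:reviews-raw")

# List everything the model depends on, directly or indirectly.
print(sorted(rkg.provenance("model:sentiment-v1")))
```

Following the outgoing edges of the model recovers the training data, the test data, the raw data behind the test split, and the code that produced the model, which is the kind of provenance question an RKG is meant to answer. A production graph would more plausibly use persistent identifiers and an established metadata vocabulary rather than ad hoc strings.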
Task Area 3: Infrastructure and Services establishes the infrastructure to collect and share all input required to deliver quality-assured data analytics solutions; we call these inputs Digital Objects (DOs). The registries and repositories keep track of releases of quality-assessed data as well as the Data Science solutions required for assessment and the resulting benchmark information. The digital objects will be represented with rich metadata and assembled as Research Knowledge Graphs for public use. Additional components ensure access to search and recommendation services (e.g., through a portal) and to additional services for public use such as data creation, annotation, and curation as well as an online authoring, publication, and execution platform. The task area further provides unified access to public HPC infrastructures.
Task Area 4: Transfer and Application aims to create a strong connection between the Data Science and AI sub-communities and NFDI4DS. We will focus on the sub-communities of (1) natural language processing and language technology as well as Semantic Web, (2) biomedical research and clinical decision-making, (3) information sciences and (4) social sciences. Additionally, we will activate other data-driven sub-communities through an open call for projects. The open call will allow for up to 30 speedboat projects whose feedback will influence our development roadmap and whose tangible results will be made available through our emerging infrastructure.
Task Area 5: Interoperability and Cooperation: DS and AI involve a plethora of artifacts, e.g., datasets, models, ontologies, task definitions, code repositories, execution platforms, repositories, training materials, and so on. These artifacts are currently hidden in a number of platforms that manage the respective content. By making all digital artifacts available and interlinking them, NFDI4DS will foster interoperability and collaboration between Data Science and AI platforms. We will collaborate with other NFDI consortia, as well as relevant national, European, and international partners, and with industry.
Task Area 6: Management aims to ensure transparent and efficient management of the consortium while upholding the highest legal and ethical standards. Efficient management and proper governance will be ensured within NFDI4DS. We will take care to comply with legal and contractual obligations and to allocate funds properly. Meetings and decision processes will be organized and documented, and information will be distributed accordingly. NFDI4DS will publish relevant information and coordinate marketing activities.
