4DS Lecture Series

2024-07-03

The overarching objective of NFDI4DS is the development, establishment, and sustainment of a national research data infrastructure (NFDI) for the Data Science and Artificial Intelligence community in Germany. This will also deliver benefits for a wider community requiring data analytics solutions, within the NFDI and beyond. The key idea is to work towards increasing the transparency, reproducibility and fairness of Data Science and Artificial Intelligence projects, by making all digital artifacts available, interlinking them, and offering innovative tools and services.

The NFDI4DS Lecture Series fosters collaboration, exchange of ideas, and discussions among various national and international stakeholders towards increasing transparency, reproducibility, and fairness of Data Science and Artificial Intelligence projects.

Find recordings of previous lectures in the TIB AV portal and on our NFDI4DS YouTube channel.

Overview of all Lectures

Lecture 8 by Mariana Vitti Rodrigues: From Epistemic Opacity to Trustworthy Medical AI: Is transparency the pathway?
Lecture 7 by Olga Galanets: Building Data Spaces: Experiences from current and ongoing projects
Lecture 6 by Yiannis Papadopoulos: Safety of AI Systems with Executable Causal Models and Statistical Data Science
Lecture 5 by Suchith Anand: The Ethics of AI and Data in Higher Education
Lecture 4 by Marco Jahn: Software ate the world - and Open Source is eating software
Lecture 3 by Beatriz Serrano-Solano: Introduction to AI4Life
Lecture 2 by Michael Barton: The Open Modeling Foundation
Lecture 1 by Silvio Peroni: OpenCitations

Lecture 8: From Epistemic Opacity to Trustworthy Medical AI: Is transparency the pathway?

This presentation investigates the epistemological grounds for the ethical challenges brought about by the growing use of automated decision support systems in diagnostic reasoning. Diagnostic reasoning, often described as a kind of abductive inference, can be understood as the process of generating and selecting plausible hypotheses to explain a patient’s set of signs and symptoms. Advances in machine and deep learning models for data analysis have promised improvements in the accuracy, performance, and efficiency of automated decision support systems, enhancing the quality and speed of diagnostic reasoning. While the beneficial prospects of Artificial Intelligence in healthcare are undeniable, its growing use in clinical settings challenges the type of trust attributed to results obtained by black box models, i.e., opaque computational models whose inner workings are inaccessible to anyone due to their inherent complexity. Whereas regulations and ethical guidelines are pressing for the development of more transparent and interpretable algorithms to justify the rationale underlying automated decision-making processes, the lack of conceptual clarity around the notion of opacity - and its correlated terms such as explainability, interpretability, and transparency - hinders the development of strategies to promote trustworthy AI. In this context, we will investigate, analyze, and compare forms of epistemic opacity present in machine learning models and human abductive inference in order to challenge the plausibility of requiring fully transparent automated systems for building trustworthy Medical AI.

Speaker: Mariana Vitti Rodrigues

Mariana Vitti Rodrigues is a researcher in Philosophy of Information, Science and Technology. She investigates epistemological and ethical consequences of the increasing automation of scientific practice enabled by the development of machine learning algorithms, emphasizing the analysis of strategies to overcome algorithmic opacity in bioinformatics and biomedical informatics.

Lecture 7: Building Data Spaces: Experiences from current and ongoing projects

Join us for an insightful webinar exploring the intersection of dataPACT’s pioneering AI-driven framework and the VELES Excellence Hub, a major initiative for establishing a Regional Smart Health Data Space in Southeast Europe.

The dataPACT project is revolutionizing healthcare data management by integrating compliance, ethics, and environmental sustainability into AI-powered health data pipelines. Through its application in CAREPATH, dataPACT ensures secure, ethical, and sustainable handling of patient data. This framework is highly relevant to VELES, which focuses on advancing health data sharing, clinical practices, and patient privacy across Bulgaria, Greece, Romania, and Cyprus.

A key partner in both projects, the International Data Spaces Association (IDSA), plays a crucial role in developing federated data spaces that ensure sovereign, secure, and compliance-aware data sharing. Their expertise helps align both initiatives with European regulatory standards, fostering trust and innovation in digital health services.

Speaker: Olga Galanets

Olga Galanets works as a Senior Project Manager for the International Data Spaces Association (IDSA).

Slides: https://doi.org/10.5281/zenodo.14906913

Lecture 6: Safety of AI Systems with Executable Causal Models and Statistical Data Science

AI systems that learn from data present a unique challenge for safety, as there is no specific design artifact, model, or code to analyse and verify. The safety assurance challenges become even more complex in cooperative intelligent systems, like collaborative robots and autonomous vehicles. These systems are often loosely interconnected, allowing them to form and dissolve configurations dynamically. Evaluating the consequences of failures in largely unpredictable configurations is a daunting task. Intentional or unintentional interactions between systems, along with newly learned behaviours and varying environmental conditions, can lead to unpredictable or emergent behaviours. Achieving complete safety assurance of such systems of systems at the design stage through traditional model-based methods is unfeasible. In this talk, I will explore these challenges and introduce executable causal models and statistical techniques that may help address these emerging issues.

Speaker: Yiannis Papadopoulos

Professor Papadopoulos is a foremost international expert on safety of computer systems including safety of AI and intelligent systems. He is leading a research group on Dependable Intelligent Systems and has pioneered a method and set of tools for model-based safety and reliability assessment and evolutionary optimisation of complex engineering systems known as Hierarchically Performed Hazard Origin and Propagation Studies.

Professor Papadopoulos is currently developing new model-based and data-driven technologies for dynamic safety assurance of autonomous and cooperative systems, including swarms of robots and autonomous cars. This work uses cutting-edge statistical methods to improve the safety of AI, including the safety of Machine Learning, Deep Learning and Large Language Models.

Video: http://doi.org/10.5446/69635

Slides: https://doi.org/10.5281/zenodo.14223981

Lecture 5: The Ethics of AI and Data in Higher Education

The presentation will introduce the Ethical Data Initiative, which provides a neutral space to bring together diverse actors and stakeholders, shaping the future of data governance. In doing so, we aim to increase equality and inclusivity in the data space, building data confidence and empowering the digital citizens of tomorrow. The presentation will also share information about the Campaign for Data Ethics in Education. The Campaign advocates for the integration of data ethics in all higher education courses focused on data science and research. It aims to educate the next generation of data and research professionals about their legal and ethical obligations when it comes to using, reusing, and sharing data.

Speaker: Suchith Anand

Dr Suchith Anand is an internationally recognised expert in sustainable development and geospatial science, providing guidance and advice to governments and international organisations on data science, data ethics, open education, open data and open science policies. He has authored a wide range of publications, from journal papers, scientific reports and book chapters to international strategy documents. He is passionate about education enabling an inclusive society which supports a full commitment to equality, diversity and the public good. He has positioned his work to serve as a bridge between academia and the worlds of policy and practice. He is a UN SDG Volunteer and Advocate. His recent research has focused on ‘Leadership for a More Ethical, Equitable, and Just World’.

Video: http://doi.org/10.5446/69634

Slides: https://zenodo.org/doi/10.5281/zenodo.10721178

Lecture 4: Software ate the world - and Open Source is eating software!

“Software is eating the world” [1] - The famous quote and article by Marc Andreessen, co-founder of Netscape, is now 12 years old and it is fair to say he was right: software is the driver of the modern economy and pervasive throughout all industries. Taking it one step further, we argue that while software has eaten the world, open source is eating software. Open source makes up 80% - 90% of applications, and it is clear that the modern IT industry would not be where it is today without it. To name just one example, the internet as we know it is based on open source technology. This lecture will give an introduction to open source and investigate how it can be applied successfully in industry and research alike. We will cover aspects such as licenses, governance, best practices, success stories and the role of open source foundations. Last but not least, we will look at how open source can increase the impact of research projects. [1] https://a16z.com/why-software-is-eating-the-world/

Speaker: Marco Jahn

Marco Jahn is Senior Research Project Manager at the Eclipse Foundation. He obtained his diploma in computer science from Ulm University in 2006 and his PhD from RWTH Aachen in 2016. He worked as a software developer at denkwerk GmbH before moving to Fraunhofer FIT in 2009. There he worked as a researcher and (technical) project manager in various European research projects in the areas of IoT and Smart Cities and led the IoT Platforms team. He joined the Eclipse Foundation in 2019 to help turn innovations into successful open source projects.

Video: https://doi.org/10.5446/65467

Slides: https://doi.org/10.5281/zenodo.10259442

Lecture 3: Introduction to AI4Life

Machine learning (ML) has enabled and accelerated frontier research in the life sciences, but democratised access to such methods is, unfortunately, not a given. Access to necessary hardware and software, knowledge and training, is limited, while methods are typically insufficiently documented and hard to find. Furthermore, even though modern AI-based methods typically generalize well to unseen data, no standard exists to enable sharing and fine-tuning of pre-trained models between different analysis tools. Existing user-facing platforms operate entirely independently from each other, often failing to comply with FAIR data and Open Science standards. The field of AI and ML is developing at a staggering pace, making it impossible for non-specialists to stay up to date. To enable the life science communities to benefit from AI/ML-powered image analysis methods, AI4Life will build bridges, providing urgently needed services on the common European research infrastructures. We will build an open, accessible, community-driven repository of FAIR pre-trained AI models and develop services to deliver these models to life scientists, including those without substantial computational expertise. Our direct support and ample training activities will prepare life scientists for the responsible use of AI methods, while contributor services and open standards will drive community contributions of new models and interoperability between analysis tools. Open calls and public challenges will provide state-of-the-art solutions to yet unsolved image analysis problems in the life sciences. Our consortium brings together AI/ML researchers, developers of popular open-source image analysis tools, providers of European scale storage and compute services and European life sciences Research Infrastructures – all united behind the common goal to enable life scientists to fully benefit from the untapped but potentially tremendous power of AI-based analysis methods.

Speaker: Beatriz Serrano-Solano

Beatriz Serrano-Solano holds the position of Scientific Project Manager in the Euro-BioImaging ERIC. Her responsibilities encompass the management of the work package “WP7 – Communication, Outreach and Training” and active contributions to “WP6 – Support for Open Calls, Challenges and New Services” in AI4Life (this Horizon Europe-funded project started in September 2022). With a background in Computer Science, Beatriz earned her PhD in Computational Biology, further refining her expertise during her postdoc in image analysis. Following this experience, she assumed the role of community manager for the European Galaxy project. In this role, she was deeply involved in organizing outreach, training, and community engagement activities for the global Galaxy Community.

Video: https://doi.org/10.5446

Slides: https://zenodo.org/doi/10.5281/zenodo.10563414

Lecture 2: The Open Modeling Foundation: a Global Community for Standards-Based Modeling of Human and Natural Systems

Computation is ubiquitous across all areas of science, policy, and daily life in a diverse array of applications. Modeling is one such application that has become critical to a wide range of research and policy issues, spanning multiple scientific disciplines. These computational tools allow researchers to study and forecast complex, dynamic interactions of multiple social and natural processes in ways not possible with more traditional means. While scientists share the results of model-based research with policymakers and others in respected, peer-reviewed journals and conferences, following widely understood and accepted scientific norms, equivalent practices for documenting, evaluating, and sharing the code of the models that produced such research findings have lagged behind. This is especially critical when this technology is urgently needed to help humanity confront the challenge of successfully and sustainably managing a planetary socioecological system, in which a highly complex, telecoupled, global society is tightly coupled with diverse biophysical systems. Over the past eight years, a grass-roots initiative of the international modeling community led to the formation of the Open Modeling Foundation (OMF). The OMF is a global alliance of modeling organizations that coordinates and administers a common, community-developed body of standards and best practices among diverse communities of modeling scientists. As an international open science community, the OMF works to enable the next generation of modeling of human and natural systems.

Speaker: Michael Barton

Michael Barton is a Professor in the School of Complex Adaptive Systems and in the School of Human Evolution & Social Change, and Director of the Center for Social Dynamics & Complexity at Arizona State University (USA). He is Executive Director of the Open Modeling Foundation, a global consortium of organizations to promote standards and best practices for computational modeling across the social and natural sciences. He also directs the Network for Computational Modeling in Social and Ecological Sciences (CoMSES.Net), an international scientific network to enable accessibility, open science, and best practices for computation in the socio-ecological sciences. Barton received his BA from the University of Kansas in Anthropology/Archaeology, and MA and PhD from the University of Arizona in Anthropology/Archaeology and Geosciences.

His research centers around long-term human ecology and landscape dynamics, integrating computational modeling, geospatial technologies, and data science with geoarchaeological field studies. Barton has directed transdisciplinary research on hunter-gatherers and small-holder farmers in the Mediterranean and North America for over three decades, and directs research on human-environmental interactions in the modern world. He is a member of the open-source GRASS GIS Development Team and Project Steering Committee, dedicated to making advanced geospatial technologies openly accessible to the world.

Video: http://doi.org/10.5446/62444

Lecture 1: Introduction to NFDI4DS and OpenCitations

Zeyd Boukhers is a data scientist and AI specialist at the Institute for Web Science and Technologies. He is Co-Leader of the group FAIR Data & Distributed Analytics at Fraunhofer-Institut für Angewandte Informationstechnik FIT.

Silvio Peroni holds a Ph.D. in Computer Science and is an Associate Professor at the Department of Classical Philology and Italian Studies, University of Bologna. He is an expert in document markup and semantic descriptions of bibliographic entities using Semantic Web technologies, and Co-Director of OpenCitations.

Slides: Peroni, S. (2023, April). OpenCitations, an open infrastructure organization for bibliographical data. https://doi.org/10.5281/zenodo.7920424
