Special Session: Human Frontier Science Program (HFSP) Symposium


Data science will determine the success of breakthrough research of the future

In today’s research environment, studying a disease in human patients using human DNA protocols, for example, involves more than traditional molecular biology bench work. The myriad data points require powerful approaches to data analysis and the availability of data resources that allow results to be compared, for example, with those from a well-established experimental model of a particular human disease, such as the mouse. But the work does not stop there: the first analysis may require digging even deeper into pathways that control specific molecular or genetic mechanisms, or analyzing metabolic pathways that are relevant to the disease under study.

High-throughput technologies for sequencing and genomics are just two of many approaches that harbor innovation potential and hold promise for the development of the global research ecosystem. Frontier science of the future is unthinkable without continued education in, and development and application of, computational biology approaches. Data science using machine learning, AI technologies and advanced software applications provides the foundation for success in scientific research. Equally important are sustained support for data resources and continued open access for the global scientific community. The symposium will provide insights into the many aspects of this broad topic, offering a historical perspective while also highlighting current approaches in key areas of the life sciences.

Schedule subject to change
All times listed are in CEST
Wednesday, July 26th
10:30-10:50
Data science and the new HFSPO Strategy 2024-2032
Room: Salle Rhone 3a
Format: Live from venue

  • Guntram Bauer
10:50-11:50
Invited Presentation: Establishing a self-sustaining database for a sustainable society
Room: Salle Rhone 3a
Format: Live from venue

  • Minoru Kanehisa



Since the Human Genome Project (HGP) in the 1990s, continuous developments in high-throughput experimental technologies and the generation of large-scale datasets have transformed biology into a data science. The KEGG database resource (https://www.kegg.jp) was initiated in 1995 under the Japanese HGP, and we share our perspectives on its past, present and future developments. KEGG is not a simple data repository. KEGG is a computerized model of biological information systems in the cell, the organism and the ecosystem, which enables uncovering hidden features in genome sequences and other biological data. In comparison to AI/ML models that are computationally generated from big data, the KEGG model is manually created from selected publications based on human intelligence. This aspect has not changed over the past 28 years, but there have been two major changes. One is the change from a publicly funded database to a self-sustaining database, in which the licensing revenue received from commercial users is fully reinvested to further develop KEGG and to continue to make it freely available to academic users. The other change involves more focus on social values. The health information category of KEGG was introduced in 2010, around the time when public funding started to decline. It integrates drug labels and other regulatory data with scientific knowledge, and KEGG is now a popular web resource for drug information in Japanese society. The essence of the KEGG model is its molecular network representation, which we hope will be expanded to enable the analysis of molecular reactions and interactions at the biosphere level.

11:50-12:10
Invited Presentation: Open Access data resources - how to meet global challenges and community needs
Room: Salle Rhone 3a
Format: Live from venue

  • Johanna McEntyre



The mission of the EMBL-EBI data services is to deliver global data resources that meet the needs of cutting-edge life sciences and medical research, and to do so in the most cost-effective, integrated, and sustainable way possible. Molecular and related data are provided to the world from just over 40 data resources, covering genomes to chemicals to curated cellular pathways and the research literature. These resources are in everyday use, from the design and interpretation of experiments, through supporting sophisticated new analysis schemes that push forward the boundaries of understanding, to routine use in clinical genomics workflows worldwide, and finally in enabling transformative computational approaches such as DeepMind’s AlphaFold. In this presentation I will describe how this portfolio of resources is managed, highlighting some key recent developments, challenges and future opportunities.

12:10-12:30
Invited Presentation: Who owns your data? Who should benefit from it? The effect of UN policy decisions on biological data management
Room: Salle Rhone 3a
Format: Live from venue

  • Amber Scholz, Leibniz Institute DSMZ, Germany



The UN Convention on Biological Diversity establishes sovereign rights of nations over the biological diversity within their borders. From pathogens to palm trees, countries can regulate the terms of access and benefit-sharing for their genetic resources under the Nagoya Protocol. But what happens when scientists share the data on these same organisms in open access databases and infrastructures? Who should benefit from these data, and how can benefits flow back to the country of origin? Recent international policy decisions under the Kunming-Montreal Global Biodiversity Framework and ongoing discussions under the Food and Agriculture Organization, World Health Organization and UN Convention on the Law of the Sea will fundamentally change how biological data ownership is handled. The decisions will likely impact biological database management principles, shift data stewardship practices, and revise the ethical standards of the bioinformatics community into the next decade and beyond. Come learn more about policy decisions that will impact your understanding of data ownership, ethics, and science policy.

13:50-14:20
Invited Presentation: Interoperability, data structure and data sharing in the Argentina Genomics Network
Room: Salle Rhone 3a
Format: Live from venue

  • Josefina Campos
14:20-14:50
AI-driven drug repurposing and binding pose metadynamics identify novel targets for monkeypox virus
Room: Salle Rhone 3a
Format: Live from venue

  • Halima Bensmail, Qatar Computing Research Institute, Hamad Bin Khalifa University, Qatar



An outbreak of monkeypox virus (MPXV) was confirmed in May 2022 and designated a global health emergency by the WHO in July 2022. MPX virions are large, enveloped and brick-shaped, and contain a linear, double-stranded DNA genome as well as enzymes. MPXV particles bind to the host cell membrane via a variety of viral-host protein interactions. As a result, the wrapped structure is a potential therapeutic target. DeepRepurpose, an artificial intelligence-based compound-viral protein interaction framework, was used in a transfer learning setting to prioritize a set of FDA-approved and investigational drugs that can potentially inhibit MPXV viral proteins. To filter and narrow down the lead compounds from curated collections of pharmaceutical compounds, we used a rigorous computational framework that included homology modeling, molecular docking, dynamic simulations, binding free energy calculations, and binding pose metadynamics. Using this comprehensive pipeline, we identified Elvitegravir as a potential inhibitor of MPXV.

14:50-15:10
Invited Presentation: The eLwazi open data science platform for biomedical research in Africa
Room: Salle Rhone 3a
Format: Live from venue

  • Nicola Mulder, University of Cape Town, South Africa



New and improved technologies are resulting in biomedical data being generated at unprecedented scales. To convert the data into knowledge for application to health priorities, it is necessary to integrate different data types and embrace data science techniques for their analysis and interpretation. This increased complexity in biomedical research brings with it many challenges, particularly for low-resourced settings, such as those in low- and middle-income countries. Challenges include moving, storing, securing and processing large datasets, and doing so in an ethical, responsible way such that those with authorized access to data also have access to the required data analysis tools and computing resources. The Harnessing Data Science for Health Discovery and Innovation in Africa (DS-I Africa) Initiative aims to “leverage data science technologies to transform biomedical and behavioral research and develop solutions that would lead to improved health for individuals and populations”. The consortium’s eLwazi Open Data Science Platform aims to address some of the aforementioned challenges to enable African scientists to access, share, analyze and interpret large, multidisciplinary datasets for novel health discoveries. This talk will describe the challenges and how the platform will address them in the context of African research settings.

15:10-15:30
Investigating the effect of gene-country interactions on health and anthropometric traits in South Asian populations
Room: Salle Rhone 3a
Format: Live from venue

  • Martin Kelemen, University of Cambridge, United Kingdom
  • Adam Butterworth, University of Cambridge, United Kingdom



Health disparities arise from variations in disease outcomes among different populations, influenced by both genetic and environmental factors. However, the impact of gene-environment interactions (GxE) at the country level remains understudied.
We relied on two large population cohorts of South Asian individuals in the UK and Bangladesh. Variance-component analyses and GxE-GWAS were performed on the following cardio-metabolic and anthropometric traits: height, BMI, years of education, systolic blood pressure (SBP), diastolic blood pressure (DBP), high cholesterol, myocardial infarction, stroke, type 2 diabetes (T2D) and angina.
We found significant Gene x Country variance-components for height (0.067), BMI (0.123), years of education (0.251), SBP (0.095) and DBP (0.080). Compared to single-cohort narrow-sense heritability estimates, we observed that for BMI, SBP and DBP, approximately half, and for years of education, close to the entire effect may be due to a country-level GxE term. However, among all traits, we found only a single significant locus (rs236554) for the T2D GxE GWAS.
Our results suggest that GxE effects may be common and likely to be due to a highly polygenic architecture comprising many SNP-environment interactions of small effect. Such GxE may exacerbate health inequalities by altering the efficacy of interventions across countries.

16:00-16:50
Invited Presentation: Biomedical Data Science: We Are Not Alone
Room: Salle Rhone 3a
Format: Live from venue

  • Philip Bourne



Data science initiatives of some ilk are being developed in most academic research institutions worldwide. While their scope and emphasis may vary, I argue that they challenge the conventional way of thinking about bioinformatics, computational biology, and systems biology. Becoming biomedical data science is more than a name change; it is an opportunity, as I will explain. This talk addresses and updates the question I posed in 2021, Is Bioinformatics Dead? https://doi.org/10.1371/journal.pbio.3001165

16:50-17:10
Invited Presentation: Towards a sustainable biodata infrastructure
Room: Salle Rhone 3a
Format: Live from venue

  • Guy Cochrane, Global Biodata Coalition, France



Progress in life and biomedical sciences depends absolutely on biodata resources: databases comprising biological data and the services around those databases. Supporting scientists in data operations spanning management, analysis and publication of newly generated data and access to pre-existing reference data, these biodata resources together comprise a critical infrastructure for the domain. Unlike other scientific infrastructures, biodata resources are globally distributed and lack any kind of central coordination. While this configuration supports innovation, it lends itself poorly to the long-term sustainability of individual biodata resources and the infrastructure as a whole. The Global Biodata Coalition (GBC) brings together life science research funding organisations that recognise these challenges and acknowledge the threat that the lack of sustainability poses. They agree to work together to find ways to improve sustainability.

In the presentation I will outline biodata resources and the infrastructure that they make up and discuss sustainability challenges. Covering some of the work that GBC has carried out to understand and classify biodata resources and the entire biodata resource infrastructure, I will outline the Global Core Biodata Resource programme and Inventory project. I will introduce the upcoming stakeholder consultation processes around approaches to sustainability and open data. Finally I will lay out the path GBC is taking to engage researchers, informaticians, funding organisations and other stakeholders in moving towards greater sustainability for these critical foundations for our field.

17:10-17:30
Invited Presentation: TBC
Room: Salle Rhone 3a
Format: Live from venue

  • Christophe Godin, INRIA, France
  • Teva Vernoux, CNRS, France
17:30-18:00
Invited Presentation: Data, computational biology and drug target discovery
Room: Salle Rhone 3a
Format: Live from venue

  • Philippe Sanseau



In this talk I will discuss how relevant data, especially genetic and genomic data, alongside computational analysis, are essential to uncover new biological insights as well as to identify and validate new drug targets.