Special Session: Bioinformatics in France


The French bioinformatics community is rich and diverse, bringing together many actors from academia and industry. "Bioinformatics in France" is an ISMB/ECCB 2023 special session organized by the SFBI (French Society of Bioinformatics) that will give a broad view of bioinformatics in France.
This afternoon session is open to all ISMB/ECCB attendees. First, the actions and goals of the main French groups in bioinformatics (SFBI, GdR-BIM, IFB-Elixir) will be presented. Then, we will invite you to listen to researchers from our community who will share their latest and most innovative research work. The whole session will have that very special “French flavor”.
To allow time for exchange, we will continue our discussions during a poster session followed by a French-style culinary evening. Beyond the community's scientific dynamism, we hope to show you how friendly it is.

Schedule subject to change
All times listed are in CEST
Tuesday, July 25th
13:50-14:00
Opening Speech
Room: Pasteur Lounge
Format: Live from venue

Moderator(s): Anna-Sophie Fiston-Lavier

  • Guy Perrière
14:00-14:30
Invited Presentation: French Society of Bioinformatics (SFBI)
Room: Pasteur Lounge
Format: Live from venue

Moderator(s): Guy Perrière

  • Anna-Sophie Fiston-Lavier, SFBI, France


Presentation Overview:

SFBI is an association (under the French law of 1901) independent of supervisory bodies, institutes and companies. It represents the entire French-speaking bioinformatics community, whatever the scientific field and whatever the laboratory (private or public). Since 2021, SFBI has been a founding, active and associate member of the Collège des Sociétés Savantes Académiques de France (CoSSAF). Its members include researchers, students, lecturers, engineers, laboratories and platforms, and it covers all disciplinary fields (interfaces with Molecular Biology, Computer Science, Mathematics and Statistics, and Physics).
SFBI's aim is to promote interdisciplinary research in France at the interface between Molecular Biology, Computer Science, Mathematics and Statistics, and Physics; to bring together the French-speaking community for scientific meetings and exchanges; to promote and encourage the training, participation and integration of young bioinformaticians in particular; and to contribute to the visibility of bioinformatics.

14:30-15:00
Invited Presentation: Institut Français de Bioinformatique, the French node of ELIXIR-FR
Room: Pasteur Lounge
Format: Live from venue

Moderator(s): Guy Perrière

  • Anne-Francoise Adam-Blondon, INRAE, FRANCE
  • Jacques Van Helden, Aix-Marseille University, FRANCE
  • Gildas Le Corguillé, CNRS, FRANCE
  • Hélène Chiapello, INRAE, FRANCE
  • David Salgado, INSERM, FRANCE
  • Julien Seiler, CNRS, FRANCE
  • Christine Gaspin, INRAE, FRANCE
  • Christophe Blanchet, CNRS, FRANCE
  • Yves Vandenbrouck, CEA, FRANCE
  • Guy Perrière, CNRS, FRANCE


Presentation Overview:

The French Institute of Bioinformatics (IFB) is a federation of bioinformatics facilities that constitutes the French node of the European infrastructure ELIXIR. IFB contributes to the activities of the ELIXIR platforms and communities, whose objectives are to provide transnational services to European life science researchers and to strengthen the services provided by its nodes at the national level and make them more impactful. Recently, a strong focus has been placed on developing resources at the national and European levels to support the scaling up of FAIR data management planning, reproducible analysis environments and training. These resources are now leveraged in recently launched large national projects.

15:00-15:10
Statistical inference of repeated sequence contacts in Hi-C maps (HiC-BERG)
Room: Pasteur Lounge
Format: Live from venue

Moderator(s): Sandra Derozier

  • Sébastien Gradit


Presentation Overview:

Increasingly detailed investigations of the spatial organization of genomes reveal that chromosome folding influences or regulates dynamic processes such as transcription, DNA repair and segregation. The Hi-C approach is commonly used to characterize genome architecture by quantifying the frequency of physical contacts between pairs of loci through high-throughput sequencing. Repeated sequences pose challenges during the alignment step of the analysis, because sequencing reads can be assigned to multiple plausible positions. These unknown parts of the genome architecture, which may contain biological information, remain hidden throughout downstream functional analysis. To overcome these limitations, we have developed HiC-BERG, a method combining statistical inference with input from DNA polymer behavior characteristics and features of the Hi-C protocol to assign repeated reads in a genome with robust confidence and "fill in" empty vectors in contact maps. HiC-BERG is intended to be applicable to different types of organisms. We will present the program and key validation tests, before applying it to unveil hidden parts of the genomes of E. coli, S. cerevisiae and P. falciparum. HiC-BERG shows that repeated sequences may be involved in singular genomic architectures. Our method can provide an alternative visualization of genomic contacts under a wide variety of biological conditions, allowing a more complete view of genome plasticity.

15:10-15:20
The impact of similarity metrics on cell type clustering in highly multiplexed in situ imaging cytometry data
Room: Pasteur Lounge
Format: Live from venue

Moderator(s): Sandra Derozier

  • Elijah Willie


Presentation Overview:

Highly multiplexed in situ imaging cytometry assays have enabled researchers to scrutinize cellular systems at an unprecedented level. Because these assays can simultaneously profile the spatial distribution and molecular features of many cells, unsupervised machine learning, and in particular clustering algorithms, has become indispensable for identifying cell types and subsets based on these molecular features. However, the most widely used clustering approaches applied to these novel technologies were developed for cell suspension technologies and may not be optimal for in situ imaging assays. In this work, we systematically evaluated the performance of various similarity metrics used to quantify the similarity between cells when clustering. Our results demonstrate that cell clustering performance varies significantly depending on the similarity metric used. Lastly, we propose FuseSOM, an ensemble clustering algorithm employing hierarchical multi-view learning of similarity metrics and self-organizing maps (SOM). In a stratified subsampling analysis framework, FuseSOM exhibits superior clustering performance compared to the current state-of-the-art clustering approaches for in situ imaging cytometry data analysis.

15:20-15:30
Towards a machine learning approach for automated detection of well-to-well contamination in metagenomic data
Room: Pasteur Lounge
Format: Live from venue

Moderator(s): Sandra Derozier

  • Lindsay Goulet, Université Paris-Saclay, INRAE, MGP MetaGenoPolis, Jouy-en-Josas, 78350, France
  • Florian Plaza Oñate, Université Paris-Saclay, INRAE, MGP MetaGenoPolis, Jouy-en-Josas, 78350, France
  • Edi Prifti, Unité de Modélisation Mathématique et Informatique des Systèmes Complexes, UMMISCO, Sorbonne Université, IRD, Bondy, France
  • Eugeni Belda, Unité de Modélisation Mathématique et Informatique des Systèmes Complexes, UMMISCO, Sorbonne Université, IRD, Bondy, France
  • Emmanuelle Le Chatelier, Université Paris-Saclay, INRAE, MGP MetaGenoPolis, Jouy-en-Josas, 78350, France
  • Guillaume Gautreau, Université Paris-Saclay, INRAE, MGP MetaGenoPolis, Jouy-en-Josas, 78350, France


Presentation Overview:

Samples subjected to metagenomic sequencing can be accidentally contaminated during wet lab steps (DNA extraction, library preparation) by DNA from an external source (e.g., lab reagents) or from other samples processed on the same plate (well-to-well contamination). Such contamination can lead to biased results and eventually to false conclusions if not detected. Although it is a critical issue, well-to-well contamination remains understudied. A few tools have been developed, but they suffer from several limitations, such as a lack of sensitivity.

By inspecting species abundance profiles of published cohort samples, we identified specific patterns associated with well-to-well contamination. Here, we propose an original method based on the recognition of such patterns that accurately detects contamination events even at low rates (down to 1%). Our approach does not require negative controls, works with related samples that may naturally share strains (e.g., mother/child), discriminates contamination sources from contaminated samples and estimates contamination rates.

However, this method is time-consuming and requires human expertise to manually inspect suspect cases. We are developing a fully automated tool, based on deep learning, trained with semi-simulated sequencing data to classify contaminated samples. As preliminary results are promising, we believe this method will significantly impact the field, making metagenomic experiments more robust.

16:00-16:10
Integrating metagenetic datasets through microbial association networks to compare microbial communities from lacto-fermented vegetables
Room: Pasteur Lounge
Format: Live from venue

Moderator(s): Sandra Derozier

  • Romane Junker


Presentation Overview:

The development of low-cost sequencing technologies has generated a massive amount of microbiome datasets in public repositories during the last 20 years. However, reusing them raises many difficulties, which severely limits their comparison and integration, even for a given ecosystem.
In this study, we present an integrative bioinformatics approach focusing on public metagenetic 16S datasets targeting lacto-fermented vegetables. This ecosystem needs to be better characterized regarding how microbial communities interact and evolve dynamically.
We have developed a workflow to explore, compare, and integrate public 16S datasets to conduct meta-analyses in the microbiota field. The workflow includes searching and selecting public time-series datasets and constructing Amplicon Sequence Variant (ASV) association networks based on co-abundance metrics. Microbial communities are detected by comparing and clustering ASV networks. We applied the workflow to ten public datasets and demonstrated its value in precisely monitoring fermentation by identifying the succession of bacterial communities and putative core consortia shared by different plant fermentation types.
Our integrative analysis demonstrates that the reuse and integration of microbiome datasets can provide new insights into a little-known biotope and add value to the independent analysis of individual studies.
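
To illustrate what building an association network from co-abundance can look like, here is a minimal Python sketch. It is not the authors' workflow: the toy abundance table, the function names, the use of Pearson correlation as the co-abundance metric and the 0.9 threshold are all assumptions made for the example.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy data: ASV -> relative abundance at successive time points.
abundances = {
    "ASV_1": [0.10, 0.20, 0.30, 0.40],
    "ASV_2": [0.20, 0.40, 0.60, 0.80],
    "ASV_3": [0.40, 0.30, 0.20, 0.10],
}

def pearson(x, y):
    """Pearson correlation between two equally long abundance profiles."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def coabundance_edges(abundances, threshold=0.9):
    """Link two ASVs when their profiles correlate at or above the threshold."""
    edges = set()
    for u, v in combinations(sorted(abundances), 2):
        if pearson(abundances[u], abundances[v]) >= threshold:
            edges.add((u, v))
    return edges

edges = coabundance_edges(abundances)
print(edges)
```

In this toy table, ASV_1 and ASV_2 rise together and are linked, while ASV_3 declines and stays unconnected. Real amplicon data are compositional and sparse, so a metric designed for such data would likely be preferable to plain Pearson correlation.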

16:10-16:20
GORi: automated biological characterization of gene signatures under the scope of multiple controlled vocabularies
Room: Pasteur Lounge
Format: Live from venue

Moderator(s): Sandra Derozier

  • Yanis Asloudj, Laboratoire Bordelais de Recherche en Informatique (LaBRI), France
  • Patricia Thébault, Laboratoire Bordelais de Recherche en Informatique (LaBRI), France
  • Fleur Mougin, Bordeaux Population Health (BPH), France


Presentation Overview:

The recent maturing of high-throughput sequencing technologies has revolutionized biology. Notably, it has enabled laboratories to routinely measure the global transcriptomic profiles of millions of biological entities, such as tissues or individual cells.

The analysis of these data often results in the identification of a gene signature, and its subsequent biological characterization is usually based on a single source of curated knowledge, notably the Gene Ontology.

To study a gene signature under the scope of multiple controlled vocabularies instead, we have developed a computational tool called GORi (Gene-based Ontologies Relationships Inferences).

Given the gene annotations of the Gene Ontology, the Medical Subject Headings thesaurus, and the gene signature of a biological entity, GORi identifies the biological processes undertaken by the entity, and infers their associations to various diseases.

These results are obtained by measuring the co-occurrence of gene annotations across the two controlled vocabularies, at every semantic resolution.
By coupling metrics borrowed from data-mining and biostatistical approaches, GORi identifies, characterizes and visualizes associations of interest.

Ongoing developments of GORi aim to build a data warehouse of seven controlled vocabularies, and to develop a local web application to make the tool easier to use for both biologists and bioinformaticians.
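
As an illustration of what measuring annotation co-occurrence across two vocabularies can look like, here is a minimal Python sketch. It is not GORi's code: the toy annotations, the function name and the choice of lift (a data-mining association score) are assumptions made for the example, and the sketch ignores the ontology hierarchy, so it works at a single semantic resolution only.

```python
from itertools import product

# Hypothetical toy annotations: term -> set of annotated genes.
go_terms = {
    "GO:immune response": {"g1", "g2", "g3"},
    "GO:cell cycle": {"g4", "g5"},
}
mesh_terms = {
    "MeSH:Autoimmune Diseases": {"g1", "g2", "g6"},
    "MeSH:Neoplasms": {"g4", "g5", "g6"},
}

def cooccurrence_lift(terms_a, terms_b, signature):
    """Score each cross-vocabulary term pair by lift within a gene signature.

    lift = P(a and b) / (P(a) * P(b)), with probabilities estimated as
    fractions of the signature genes carrying each annotation.
    """
    n = len(signature)
    scores = {}
    for (a, genes_a), (b, genes_b) in product(terms_a.items(), terms_b.items()):
        in_a = genes_a & signature
        in_b = genes_b & signature
        if not in_a or not in_b:
            continue  # a term absent from the signature cannot be scored
        both = in_a & in_b
        scores[(a, b)] = (len(both) / n) / ((len(in_a) / n) * (len(in_b) / n))
    return scores

signature = {"g1", "g2", "g4", "g5"}
scores = cooccurrence_lift(go_terms, mesh_terms, signature)
for pair, lift in sorted(scores.items()):
    print(pair, lift)
```

A lift above 1 means the two terms annotate the same signature genes more often than expected if they were independent. A lift of 0 means they never co-occur in the signature.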

16:20-16:30
Tools for analysing spatial data in the context of immuno-oncology
Room: Pasteur Lounge
Format: Live from venue

Moderator(s): Sandra Derozier

  • Vera Pancaldi


Presentation Overview:

Single-cell spatially resolved proteomic or transcriptomic methods offer the opportunity to discover cell type interactions of biological or clinical importance. To extract relevant information from these data, we have produced a Python library (tysserand, Coullomb and Pancaldi 2021, https://github.com/VeraPancaldiLab/tysserand) that extracts tissue networks from biological images. We also developed mosna (https://doi.org/10.1101/2023.03.16.532947), a Python package that exploits these networks to analyze spatially resolved experiments and discover patterns of cellular spatial organization. It includes the detection of preferential interactions between specific cell types. We have tested the method on different types of spatial proteomics and transcriptomics data and have used public data from cancer patient samples annotated with clinical response to immunotherapy. mosna can identify a number of features describing cellular composition and spatial distribution that can provide biological hypotheses regarding factors that affect response to therapies.
Finally, mosna uses the Neighbors Aggregation Statistics method to assign a feature vector to each cell, describing the composition of its neighborhood with different statistics. By clustering the cells based on these neighborhood features, we can identify niches (specific subsets of interacting cells) that can also be used to predict response to immunotherapy.
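
The idea of aggregating statistics over each cell's neighborhood can be sketched in a few lines of Python. This is not the mosna API: the toy network, the marker values, the function name and the choice of mean and standard deviation as the aggregated statistics are assumptions made for the example.

```python
from statistics import mean, pstdev

# Hypothetical toy tissue network: cell index -> indices of neighboring cells.
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
# Hypothetical per-cell marker intensities (two markers per cell).
markers = {0: [1.0, 0.0], 1: [3.0, 0.0], 2: [5.0, 2.0], 3: [1.0, 4.0]}

def neighborhood_features(neighbors, markers, include_self=True):
    """Return, for each cell, the mean and population standard deviation
    of every marker over its neighborhood, concatenated into one vector."""
    features = {}
    for cell, adjacent in neighbors.items():
        members = ([cell] if include_self else []) + list(adjacent)
        # Transpose so that each tuple holds one marker across the neighborhood.
        columns = list(zip(*(markers[m] for m in members)))
        features[cell] = [mean(c) for c in columns] + [pstdev(c) for c in columns]
    return features

features = neighborhood_features(neighbors, markers)
for cell, vector in features.items():
    print(cell, vector)
```

Each resulting vector has one mean and one standard deviation per marker. Vectors of this kind could then be passed to any clustering algorithm to group cells with similar surroundings.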

16:30-17:15
Invited Presentation: Design and application of deep neural networks for population genetics
Room: Pasteur Lounge
Format: Live from venue

Moderator(s): Guy Perrière

  • Flora Jay, LISN, CNRS, University Paris-Saclay, France


Presentation Overview:

As population genetic datasets keep increasing in size, it is common to observe millions of genomic markers sequenced for hundreds of individuals, opening the possibility of answering intricate biological questions. However, extracting relevant information from these genomic datasets is not trivial because of their size and the complexity of the underlying mechanisms, and is sometimes impossible because of the privacy rules that govern several human genome databases. Deep learning approaches have recently been introduced at different levels of population genetics for parameter inference, data visualization, or generation. I will give a short overview of current applications of discriminative and generative neural networks in this field and present some advances we made in designing architectures for evolutionary inference and data generation.

17:15-18:00
Invited Presentation: RNA bioinformatics: Still combinatorial in 2023?
Room: Pasteur Lounge
Format: Live from venue

Moderator(s): Anna-Sophie Fiston-Lavier

  • Yann Ponty


Presentation Overview:

Predictive Bioinformatics is currently under the dominion of machine learning, leveraging massive datasets to learn weight-valued neural networks to achieve superior average predictions. Such impressive feats sometimes come at the cost of inferior explainability, communicability and, in some cases, generalizability, silently impeding our collective capacity to investigate beyond well-studied (data-rich) subareas of biology. Accordingly, bioinformatics is slowly adapting its standards and evaluation metrics to detect, assess the extent of, and mitigate these shortcomings.

RNA structural bioinformatics currently stands out as an exception. While many ML-fueled efforts for RNA 2D structure prediction were recently published in high-impact venues, independent validation efforts revealed reproducibility and generalizability issues. Correcting those issues led to very limited performance gains (if any) when compared to textbook physics-inspired dynamic programming. Moreover, even for the rather

18:00-19:00
Poster session
Room: Pasteur Lounge
Format: Live from venue

19:00-22:00
Social Event
Room: Pasteur Lounge
Format: Live from venue