Posters - Schedules

Session A Poster Presentations:
Monday, July 24, between 18:00 CEST and 19:00 CEST
Session B Poster Presentations:
Tuesday, July 25, between 18:00 CEST and 19:00 CEST
Session A Poster Set-up and Dismantle
Session A Posters set up:
Monday, July 24, between 08:00 CEST and 08:45 CEST
Session A Posters dismantle:
Monday, July 24, at 19:00 CEST
Session B Poster Set-up and Dismantle
Session B Posters set up:
Tuesday, July 25, between 08:00 CEST and 08:45 CEST
Session B Posters dismantle:
Tuesday, July 25, at 19:00 CEST
Session C Poster Presentations:
Wednesday, July 26, between 18:00 CEST and 19:00 CEST
Session C Poster Set-up and Dismantle
Session C Posters set up:
Wednesday, July 26, between 08:00 CEST and 08:45 CEST
Session C Posters dismantle:
Wednesday, July 26, at 19:00 CEST
Virtual
B-161: Statistical inference of repeated sequence contacts in Hi-C maps (Hi-C BERG)
Track: Special Session: Bioinformatics in France
  • Sébastien Gradit, Institut Pasteur, France
  • Axel Cournac, Institut Pasteur, France


Presentation Overview:

Increasingly detailed investigations of the spatial organization of genomes reveal that chromosome folding influences or regulates dynamic processes such as transcription, DNA repair and segregation. The Hi-C approach is commonly used to characterize genome architecture by quantifying, through high-throughput sequencing, the frequency of physical contacts between pairs of loci. Repeated sequences cause challenges during the alignment step of the analysis, because sequencing reads can be assigned to multiple plausible positions. These unknown parts of the genome architecture, which may contain biological information, remain hidden throughout downstream functional analysis. To overcome these limitations, we have developed HiC-BERG, a method combining statistical inference with input from DNA polymer behavior characteristics and features of the Hi-C protocol to assign repeated reads in a genome with robust confidence and "fill in" empty vectors in contact maps. HiC-BERG is intended to be applicable to different types of organisms. We will present the program and key validation tests, before applying it to unveil hidden parts of the genomes of E. coli, S. cerevisiae and P. falciparum. HiC-BERG shows that repeated sequences may be involved in singular genomic architectures. Our method can provide an alternative visualization of genomic contacts under a wide variety of biological conditions, allowing a more complete view of genome plasticity.
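To illustrate the general idea of using polymer behavior to place repeated reads, the minimal sketch below weights each candidate placement of a multi-mapping read pair by an assumed contact-probability law P(s) ∝ s^(-α), with α = 1 as a placeholder exponent, and normalizes the weights into placement probabilities. The candidate representation, the constant weight for inter-chromosomal pairs and the clipping distance are hypothetical choices; HiC-BERG's actual model also draws on Hi-C protocol features, which are not modeled here.

```python
from typing import List, Tuple

# Hypothetical representation of one candidate placement of a read pair:
# (chromosome of mate 1, position of mate 1, chromosome of mate 2, position of mate 2)
Candidate = Tuple[str, int, str, int]


def contact_weight(c: Candidate, alpha: float = 1.0,
                   trans_weight: float = 1e-6, min_dist: int = 1000) -> float:
    """Unnormalized weight of a placement under an assumed decay P(s) ~ s**(-alpha)."""
    chrom1, pos1, chrom2, pos2 = c
    if chrom1 != chrom2:
        # Inter-chromosomal placements get a small constant weight (assumption).
        return trans_weight
    # Clip very short genomic distances so the weight stays finite.
    s = max(abs(pos2 - pos1), min_dist)
    return s ** (-alpha)


def placement_probabilities(cands: List[Candidate], alpha: float = 1.0) -> List[float]:
    """Normalize the weights of all candidate placements into probabilities."""
    weights = [contact_weight(c, alpha) for c in cands]
    total = sum(weights)
    return [w / total for w in weights]


# Usage: one read pair whose second mate maps equally well at two positions.
probs = placement_probabilities([
    ("chrI", 100_000, "chrI", 110_000),  # mates 10 kb apart
    ("chrI", 100_000, "chrI", 200_000),  # mates 100 kb apart
])
```

Placements that bring the two mates closer along the chromosome receive more probability: with α = 1, mates 10 kb apart versus 100 kb apart split roughly 0.91 to 0.09. A read could then be assigned to the most probable placement or sampled from these probabilities.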

B-162: Towards a machine learning approach for automated detection of well-to-well contamination in metagenomic data
Track: Special Session: Bioinformatics in France
  • Lindsay Goulet, Université Paris-Saclay, INRAE, MGP MetaGenoPolis, Jouy-en-Josas, 78350, France
  • Florian Plaza Oñate, Université Paris-Saclay, INRAE, MGP MetaGenoPolis, Jouy-en-Josas, 78350, France
  • Edi Prifti, Unité de Modélisation Mathématique et Informatique des Systèmes Complexes, UMMISCO, Sorbonne Université, IRD, Bondy, France
  • Eugeni Belda, Unité de Modélisation Mathématique et Informatique des Systèmes Complexes, UMMISCO, Sorbonne Université, IRD, Bondy, France
  • Emmanuelle Le Chatelier, Université Paris-Saclay, INRAE, MGP MetaGenoPolis, Jouy-en-Josas, 78350, France
  • Guillaume Gautreau, Université Paris-Saclay, INRAE, MGP MetaGenoPolis, Jouy-en-Josas, 78350, France


Presentation Overview:

Samples subjected to metagenomic sequencing can be accidentally contaminated during wet lab steps (DNA extraction, library preparation) by DNA from an external source (e.g., lab reagents) or from other samples processed on the same plate (well-to-well contamination). Such contamination can lead to biased results and eventually to false conclusions if not detected. Although it is a critical issue, well-to-well contamination remains understudied. A few tools have been developed, but they suffer from several limitations, such as a lack of sensitivity.

By inspecting species abundance profiles of published cohort samples, we identified specific patterns associated with well-to-well contamination. Here, we propose an original method based on the recognition of such patterns that accurately detects contamination events even at low rates (as low as 1%). Our approach does not require negative controls, works with related samples that may naturally share strains (e.g., mother/child), discriminates contamination sources from contaminated samples and estimates contamination rates.

However, this method is time-consuming and requires human expertise to manually inspect suspect cases. We are developing a fully automated tool, based on deep learning, trained with semi-simulated sequencing data to classify contaminated samples. As preliminary results are promising, we believe this method will significantly impact the field, making metagenomic experiments more robust.

B-163: GORi: automated biological characterization of gene signatures under the scope of multiple controlled vocabularies
Track: Special Session: Bioinformatics in France
  • Yanis Asloudj, Laboratoire Bordelais de Recherche en Informatique (LaBRI), France
  • Patricia Thébault, Laboratoire Bordelais de Recherche en Informatique (LaBRI), France
  • Fleur Mougin, Bordeaux Population Health (BPH), France


Presentation Overview:

The recent maturing of high-throughput sequencing technologies has revolutionized biology. Notably, it has enabled laboratories to routinely measure the global transcriptomic profiles of millions of biological entities, such as tissues or individual cells.

The analysis of these data often results in the identification of a gene signature, whose subsequent biological characterization is usually based on a single source of curated knowledge, notably the Gene Ontology.

To study a gene signature under the scope of multiple controlled vocabularies instead, we have developed a computational tool called GORi (Gene-based Ontologies Relationships Inferences).

Given the gene annotations of the Gene Ontology and the Medical Subject Headings thesaurus, together with the gene signature of a biological entity, GORi identifies the biological processes undertaken by the entity and infers their associations with various diseases.

These results are obtained by measuring the co-occurrence of gene annotations across the two controlled vocabularies, at every semantic resolution.
By coupling metrics borrowed from data-mining and biostatistical approaches, GORi identifies, characterizes and visualizes associations of interest.
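As an illustration of coupling a data-mining metric with a biostatistical test, the minimal sketch below counts, within a gene signature, the genes annotated with each pair of Gene Ontology and MeSH terms, then reports the lift and a one-sided hypergeometric p-value for each pair. The choice of these two metrics, the data layout (gene mapped to a set of term identifiers) and all gene and term names in the example are assumptions rather than GORi's implementation; propagating annotations up each vocabulary's hierarchy (the "every semantic resolution" aspect) is left out.

```python
from itertools import product
from math import comb
from typing import Dict, List, Set, Tuple


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K marked items, n draws)."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def cooccurrence(signature: Set[str],
                 go: Dict[str, Set[str]],
                 mesh: Dict[str, Set[str]]) -> List[Tuple[str, str, int, float, float]]:
    """Return (GO term, MeSH term, shared genes, lift, p-value), sorted by p-value."""
    # Only genes annotated in both vocabularies can contribute to a co-occurrence.
    genes = sorted(g for g in signature if g in go and g in mesh)
    N = len(genes)
    go_terms = sorted({t for g in genes for t in go[g]})
    mesh_terms = sorted({t for g in genes for t in mesh[g]})
    results = []
    for t_go, t_mesh in product(go_terms, mesh_terms):
        a = {g for g in genes if t_go in go[g]}
        b = {g for g in genes if t_mesh in mesh[g]}
        k = len(a & b)
        if k == 0:
            continue  # the two terms never annotate the same gene
        lift = k * N / (len(a) * len(b))  # observed vs expected overlap (data mining)
        p = hypergeom_sf(k, N, len(a), len(b))  # over-representation test (statistics)
        results.append((t_go, t_mesh, k, lift, p))
    return sorted(results, key=lambda r: r[4])
```

A lift above 1 means the two terms annotate the same genes more often than expected from their individual frequencies in the signature, and the p-value indicates how surprising that overlap is.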

Ongoing developments of GORi aim to build a data warehouse of seven controlled vocabularies and to develop a local web application that makes the tool easier to use for both biologists and bioinformaticians.

B-164: Uncovering Disrupted Gene Regulatory Network and Transcriptomic Defects during Megakaryopoiesis in ETV6-Related Thrombocytopenia using Single-Cell RNA Sequencing
Track: Special Session: Bioinformatics in France
  • Timothée Bigot, Aix Marseille Univ, INSERM, INRAe, C2VN, Marseille, France
  • Elisa Gabinaud, Aix Marseille Univ, INSERM, INRAe, C2VN, Marseille, France
  • Laurent Hannouche, Aix Marseille Univ, INSERM, INRAe, C2VN, Marseille, France
  • Véronique Sbarra, Aix Marseille Univ, INSERM, INRAe, C2VN, Marseille, France
  • Elisa Andersen, Aix Marseille Univ, INSERM, INRAe, C2VN, Marseille, France
  • Delphine Bastelica, Aix Marseille Univ, INSERM, INRAe, C2VN, Marseille, France
  • Céline Falaise, APHM, CHU Timone, French Reference Center on Inherited Platelet Disorders, Marseille, France
  • Denis Bernot, APHM, CHU Timone, Marseille, France
  • Manal Ibrahim-Kosta, Aix Marseille Univ, INSERM, INRAe, C2VN, Marseille, France
  • Pierre-Emmanuel Morange, APHM, CHU Timone, France
  • Marie Loosveld, Aix Marseille Univ, CNRS, INSERM, CIML, Marseille, France
  • Paul Saultier, Aix Marseille Univ, INSERM, INRAe, C2VN, Marseille, France
  • Dominique Payet-Bornet, Aix Marseille Univ, CNRS, INSERM, CIML, Marseille, France
  • Marie-Christine Alessi, Aix Marseille Univ, INSERM, INRAe, C2VN, Marseille, France
  • Delphine Potier, Aix Marseille Univ, CNRS, INSERM, CIML, Marseille, France
  • Marjorie Poggi, Aix Marseille Univ, INSERM, INRAe, C2VN, Marseille, France


Presentation Overview:

Background: Germline mutations in the ETV6 gene are responsible for an inherited thrombocytopenia (IT). Although previous studies showed that ETV6 plays an important role in megakaryocyte (MK) maturation, the mechanisms by which ETV6 dysfunction promotes thrombocytopenia remain unclear.
Goal: Employ single-cell RNA sequencing to uncover disrupted gene regulatory networks and transcriptomic defects in ETV6-related thrombocytopenia.
Methods & results: Single-cell RNA sequencing was performed on CD34+-derived MKs from two ETV6 variant carriers. To analyze distinct populations in control and patient cells, we proposed signature genes for each cell type. The transcriptomic profiles of ETV6-deficient cells differ from those of the controls, and similar dysregulation is observed in both patients. We distinguished distinct cell populations involved in megakaryopoiesis, for which we propose enriched transcriptomic signatures. The ETV6-deficient condition is characterized by a higher proportion of hematopoietic stem/progenitor cells (HSPCs) and a reduced proportion of MKs. Pathway analysis identified a ribosome biogenesis defect in MKs, which was confirmed in vitro. The gene regulatory network (GRN), inferred using pySCENIC, highlighted an aberrant balance of GATA2/GATA1 activity in patients.
Conclusion: ETV6 variants affect megakaryopoiesis, disrupt the GRN and lead to abnormal ribosome biogenesis. We propose a systematic workflow to unravel the mechanisms of other ITs.

B-165: Computational Design of RNA-Based Modulators Targeting Maf-A protein for Multiple Myeloma
Track: Special Session: Bioinformatics in France
  • Güneş Yıldırım Akdeniz, Sabanci University, Turkey
  • Hüveyda Başağa, Sabanci University, Turkey
  • Ahmet Can Timuçin, Acıbadem University, Turkey


Presentation Overview:

Multiple myeloma, the second most common hematological cancer worldwide, is a malignancy of plasma cells. Maf proteins, particularly MafA, are critical transcription factors in myelomagenesis. Accordingly, Maf proteins together with their target DNA sequences may be used to design and optimize promising new therapeutic chemical modulators against multiple myeloma progression. With this in mind, we report the computational design of RNA-based chemical modulator candidates potentially capable of targeting Maf-A.
First, the publicly available Maf-A structure (PDB ID: 4EOT), which contains a Maf-A dimer bound to its target double-stranded DNA, was retrieved from the Protein Data Bank. Second, based on this structure, FoldX-based unfolding energy calculations were performed to select DNA mutations. Next, mutant DNA sequences were generated based on the resulting ΔΔG changes. The RNA library derived from these DNA sequences was then subjected to 3D structure prediction for docking studies. During global docking, Hdock was used to delineate the unbiased binding preferences of RNA library members. Finally, the Hdock-docked Maf-A/RNA complexes were re-docked with the local docking software HADDOCK in preparation for future MD simulations and MM-PBSA studies.
Overall, we plan to present our computational design efforts towards the generation of novel RNA-based chemical modulators potentially targeting the Maf-A protein.
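The mutation-selection step can be pictured as a simple filter: compute ΔΔG = ΔG(mutant) − ΔG(wild type) for each candidate DNA mutation, keep those at or below a threshold, and transcribe the retained DNA sequences into RNA. The minimal sketch below assumes FoldX-style energies in kcal/mol, where more negative values mean a more favorable complex; the threshold, mutation labels and energies in the example are hypothetical, and the code does not call FoldX itself.

```python
from typing import Dict, List


def select_mutations(dg_wt: float, dg_mut: Dict[str, float],
                     threshold: float = -0.5) -> List[str]:
    """Return mutation labels with ddG = dG_mut - dG_wt <= threshold, most favorable first.

    The sign convention (negative ddG = more favorable) and the default
    threshold are assumptions for illustration.
    """
    ddg = {m: dg - dg_wt for m, dg in dg_mut.items()}
    kept = [m for m, v in ddg.items() if v <= threshold]
    return sorted(kept, key=ddg.get)


def dna_to_rna(seq: str) -> str:
    """Transcribe a DNA strand into the corresponding RNA by replacing T with U."""
    return seq.upper().replace("T", "U")


# Usage with hypothetical energies for three point mutations of the target DNA.
selected = select_mutations(-10.0, {"A5T": -11.5, "G7C": -9.0, "C3G": -10.8})
```

With these toy values, A5T (ΔΔG = −1.5) and C3G (ΔΔG ≈ −0.8) are kept and G7C (ΔΔG = +1.0) is discarded.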

B-166: The impact of similarity metrics on cell type clustering in highly multiplexed in situ imaging cytometry data
Track: Special Session: Bioinformatics in France
  • Elijah Willie, University of Sydney, Australia
  • Ellis Patrick, University of Sydney, Australia
  • Pengyi Yang, University of Sydney, Australia


Presentation Overview:

Highly multiplexed in situ imaging cytometry assays have enabled researchers to scrutinize cellular systems at an unprecedented level. Because these assays can simultaneously profile the spatial distribution and molecular features of many cells, unsupervised machine learning, and in particular clustering algorithms, have become indispensable for identifying cell types and subsets based on these molecular features. However, the most widely used clustering approaches applied to these novel technologies were developed for cell suspension technologies and may not be optimal for in situ imaging assays. In this work, we systematically evaluated the performance of various similarity metrics used to quantify the similarity between cells when clustering. Our results demonstrate that cell clustering performance varies significantly depending on the similarity metric used. Lastly, we propose FuseSOM, an ensemble clustering algorithm employing hierarchical multi-view learning of similarity metrics and self-organizing maps (SOM). Using a stratified subsampling analysis framework, FuseSOM exhibits superior clustering performance compared to current state-of-the-art clustering approaches for in situ imaging cytometry data analysis.
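To see why the choice of similarity metric matters, the minimal sketch below computes Euclidean, cosine and Pearson-correlation distances between marker-intensity vectors of single cells and shows that a cell's nearest neighbor can change with the metric. The three metrics and the toy intensities are illustrative assumptions; this is not FuseSOM.

```python
from typing import Dict, Sequence

import numpy as np


def euclidean(x: np.ndarray, y: np.ndarray) -> float:
    return float(np.linalg.norm(x - y))


def cosine_dist(x: np.ndarray, y: np.ndarray) -> float:
    # 1 - cosine similarity: insensitive to overall intensity scaling.
    return float(1.0 - x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))


def pearson_dist(x: np.ndarray, y: np.ndarray) -> float:
    # 1 - Pearson correlation: insensitive to both scaling and offset.
    return float(1.0 - np.corrcoef(x, y)[0, 1])


METRICS = {"euclidean": euclidean, "cosine": cosine_dist, "pearson": pearson_dist}


def nearest_neighbour(query: Sequence[float], others: Dict[str, Sequence[float]],
                      metric: str) -> str:
    """Name of the cell in `others` closest to `query` under the chosen metric."""
    q = np.asarray(query, dtype=float)
    dists = {name: METRICS[metric](q, np.asarray(vec, dtype=float))
             for name, vec in others.items()}
    return min(dists, key=dists.get)


# Usage: marker intensities for three hypothetical cells.
cell_a = [1.0, 2.0, 3.0]
others = {"b": [10.0, 20.0, 30.0], "c": [3.0, 2.0, 1.0]}
```

Here cell b has the same marker profile as cell a at ten times the intensity, so the scale-invariant cosine and Pearson distances pair a with b, whereas Euclidean distance pairs a with c. A clustering algorithm built on each metric would therefore group these cells differently.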

B-167: MIASSM: A user-friendly application to explore a standardized database aggregating manually curated metadata of public metagenomic projects
Track: Special Session: Bioinformatics in France
  • Pauline Barbet, INRAE, MGP, Université Paris-Saclay, Jouy-en-Josas, 78350, France
  • Emmanuelle Le Chatelier, INRAE, MGP, Université Paris-Saclay, Jouy-en-Josas, 78350, France
  • Nicolas Maziers, INRAE, MGP, Université Paris-Saclay, Jouy-en-Josas, 78350, France
  • Nicolas Pons, INRAE, MGP, Université Paris-Saclay, Jouy-en-Josas, 78350, France
  • Mathieu Almeida, INRAE, MGP, Université Paris-Saclay, Jouy-en-Josas, 78350, France
  • Victoria Meslier, INRAE, MGP, Université Paris-Saclay, Jouy-en-Josas, 78350, France
  • Florian Plaza-Oñate, INRAE, MGP, Université Paris-Saclay, Jouy-en-Josas, 78350, France


Presentation Overview:

While the amount of metagenomic sequencing data is growing exponentially, the associated metadata are often sparse, scattered across multiple repositories and lacking in homogeneity (e.g., changing field names and units). Biostatisticians and data analysts need a resource that gathers metadata from available public cohorts using standardized and unified terminology.
We introduce MIASSM, a web-based application to browse a standardized database aggregating manually curated metadata of public metagenomic projects. Database management relies on the document-oriented MongoDB system, which is particularly suitable for metagenomic studies that generally share a limited number of metadata fields. The R Shiny interface allows easy and fast data querying by multiple criteria (sex, age, etc.) at the sample or study level. We retrieved metadata from various sources (publications, INSDC, GitHub, etc.) and performed manual curation and homogenization according to the standardized database ontologies. Compared to similar tools (HumanMetagenomeDB, curatedMetagenomicData), we paid particular attention to flagging mislabelled or contaminated samples that could significantly bias downstream analyses.
To date, we have focused on studies related to the human intestinal microbiota but we plan to integrate other body sites and hosts soon. We also plan to make available not only metadata, but also taxonomic and functional profiles using a fully automated pipeline.

B-168: Inhibition of Mfd as an innovative strategy in the battle against antimicrobial resistance
Track: Special Session: Bioinformatics in France
  • Samantha Samson, Université Paris-Saclay, INRAE, MaIAGE, 78350, Jouy-en-Josas, France
  • Delphine Cormontagne, Université Paris-Saclay, INRAE, AgroParisTech, Micalis Institute, 78350, Jouy-en-Josas, France
  • Seav-Ly Tran, Université Paris-Saclay, INRAE, AgroParisTech, Micalis Institute, 78350, Jouy-en-Josas, France
  • Lucie Lebreuilly, Université Paris-Saclay, INRAE, AgroParisTech, Micalis Institute, 78350, Jouy-en-Josas, France
  • Jean-Christophe Cintrat, Université Paris-Saclay, CEA, Laboratoire de Chimie Bioorganique, 91191, Gif-sur-Yvette, France
  • Didier Rognan, Université de Strasbourg, CNRS, Laboratoire d'Innovation Thérapeutique, 67400, Illkirch, France
  • Nalini Rama Rao, Université Paris-Saclay, INRAE, AgroParisTech, Micalis Institute, 78350, Jouy-en-Josas, France
  • Gwenaëlle André, Université Paris-Saclay, INRAE, MaIAGE, 78350, Jouy-en-Josas, France


Presentation Overview:

Drug-resistant bacterial infections cause at least 700,000 deaths every year; if nothing is done, this number has been estimated to rise to 10 million by 2050, which would make drug resistance the leading cause of death worldwide. The declining discovery of new antibiotics has left few effective treatment options for these infections. The development of new antibacterials with innovative targets and mechanisms of action that have a low potential for inducing resistance is therefore crucial. Accordingly, this study focuses on the Mutation Frequency Decline protein (Mfd) as a promising bacterial target and on its inhibition, investigated in silico, in vitro and in vivo.
Mfd is a non-essential, multifunctional protein that is ubiquitous in bacteria and found only in bacteria. It has been shown to be a virulence factor that is overexpressed in bacterial cells to overcome DNA damage caused by macrophage-generated nitric oxide (NO). It thus plays a crucial role in countering the innate immune response during bacterial infection.
The purpose of this investigation is to identify molecules capable of inhibiting Mfd ATPase activity. We will focus mainly on the computational characterization of the target and of the hits tested in vitro and in vivo, and on how this guides the optimization of hits into leads.

B-169: Integrating metagenetic datasets through microbial association networks to compare microbial communities from lacto-fermented vegetables
Track: Special Session: Bioinformatics in France
  • Romane Junker, Université Paris-Saclay, INRAE, MaIAGE, 78350, Jouy-en-Josas, France
  • Victoria Chuat, INRAE, Agrocampus Ouest, STLO, 35042, Rennes, France
  • Florence Valence, INRAE, Agrocampus Ouest, STLO, 35042, Rennes, France
  • Michel-Yves Mistou, Université Paris-Saclay, INRAE, MaIAGE, 78350, Jouy-en-Josas, France
  • Stéphane Chaillou, Université Paris-Saclay, INRAE, MICALIS, 78350, Jouy-en-Josas, France
  • Helene Chiapello, Université Paris-Saclay, INRAE, MaIAGE, 78350, Jouy-en-Josas, France


Presentation Overview:

The development of low-cost sequencing technologies has generated a massive number of microbiome datasets in public repositories over the last 20 years. However, their reuse raises many difficulties, which severely limits their comparison and integration, even for a given ecosystem.
In this study, we present an integrative bioinformatics approach focusing on public metagenetic 16S datasets targeting lacto-fermented vegetables. This ecosystem needs to be better characterized regarding how microbial communities interact and evolve dynamically.
We have developed a workflow to explore, compare and integrate public 16S datasets for meta-analyses in the microbiota field. The workflow includes searching for and selecting public time-series datasets and constructing Amplicon Sequence Variant (ASV) association networks based on co-abundance metrics. Microbial communities are detected by comparing and clustering ASV networks. We applied the workflow to ten public datasets and demonstrated its value for precisely monitoring fermentation, identifying the succession of bacterial communities and putative core consortia shared by different plant fermentation types.
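An ASV co-abundance network can be sketched as follows: rank each ASV's abundance across the samples of a time series, compute Spearman correlations as Pearson correlations of those ranks, and keep the pairs whose absolute correlation reaches a threshold. Spearman correlation and the 0.6 cutoff are assumptions, since the abstract does not name the co-abundance metric used, and the compositional effects that dedicated network tools try to correct for are ignored here.

```python
from typing import List, Tuple

import numpy as np
from scipy.stats import rankdata


def coabundance_edges(abund: np.ndarray, asv_ids: List[str],
                      threshold: float = 0.6) -> List[Tuple[str, str, float]]:
    """Build network edges from a samples x ASVs abundance matrix.

    Each ASV (column) is ranked across samples (ties get averaged ranks);
    the Pearson correlation of ranks equals the Spearman correlation.
    """
    ranks = rankdata(abund, axis=0)
    rho = np.corrcoef(ranks, rowvar=False)  # ASV x ASV correlation matrix
    edges = []
    n = len(asv_ids)
    for i in range(n):
        for j in range(i + 1, n):
            # NaN (e.g., a constant ASV) fails this comparison and yields no edge.
            if abs(rho[i, j]) >= threshold:
                edges.append((asv_ids[i], asv_ids[j], float(rho[i, j])))
    return edges


# Usage: five time points (rows) for three hypothetical ASVs (columns).
abund = np.array([[1, 2, 5],
                  [2, 4, 1],
                  [3, 6, 4],
                  [4, 8, 2],
                  [5, 11, 3]], dtype=float)
edges = coabundance_edges(abund, ["ASV1", "ASV2", "ASV3"])
```

ASV1 and ASV2 rise together over time and are linked with ρ = 1, whereas ASV3 correlates only weakly (ρ = −0.3) with both and stays unconnected. Networks built this way for several datasets could then be compared and clustered to detect communities.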
Our integrative analysis demonstrates that the reuse and integration of microbiome datasets can provide new insights into a little-known biotope and add value to the independent analysis of individual studies.

B-170: Tools for analysing spatial data in the context of immuno-oncology
Track: Special Session: Bioinformatics in France
  • Vera Pancaldi, INSERM UMR 1037 Cancer Research Center of Toulouse, France
  • Abdelmounim Essabbar, INSERM UMR 1037 Cancer Research Center of Toulouse, France
  • Marcelo Hurtado, INSERM UMR 1037 Cancer Research Center of Toulouse, France
  • Alexis Coullomb, INSERM UMR 1037 Cancer Research Center of Toulouse, France
  • Leila Khajavi, INSERM UMR 1037 Cancer Research Center of Toulouse, France


Presentation Overview:

Single-cell spatially resolved proteomic or transcriptomic methods offer the opportunity to discover cell type interactions of biological or clinical importance. To extract relevant information from these data, we have produced a Python library (tysserand, Coullomb and Pancaldi 2021, https://github.com/VeraPancaldiLab/tysserand) that extracts tissue networks from biological images. We also developed mosna (https://doi.org/10.1101/2023.03.16.532947), a Python package that exploits these networks to analyze spatially resolved experiments and discover patterns of cellular spatial organization, including preferential interactions between specific cell types. We have tested the method on different types of spatial proteomics and transcriptomics data and have used public data from cancer patient samples annotated with clinical response to immunotherapy. mosna can identify a number of features describing cellular composition and spatial distribution that can generate biological hypotheses about factors affecting response to therapy.
Finally, mosna uses the Neighbors Aggregation Statistics method to assign each cell a feature vector describing the composition of its neighborhood with different statistics. By clustering cells on these neighborhood features, we can identify niches, i.e., specific subsets of interacting cells, which can also be used to predict response to immunotherapy.
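The idea behind the Neighbors Aggregation Statistics step can be sketched with NumPy: given a per-cell feature matrix (for example, one-hot cell types or marker intensities) and the edges of a tissue network, each cell receives the mean and standard deviation of the features over its neighborhood. Including the cell itself in its neighborhood, restricting it to first-order neighbors and using only these two statistics are assumptions of this sketch; mosna's own implementation and API may differ.

```python
from typing import List, Tuple

import numpy as np


def neighbors_aggregation_stats(features: np.ndarray,
                                edges: List[Tuple[int, int]]) -> np.ndarray:
    """Return an n_cells x (2 * n_features) matrix of neighborhood mean and std.

    `features` is n_cells x n_features; `edges` lists undirected contacts
    between cell indices, e.g. taken from a tissue network.
    """
    n = features.shape[0]
    neighborhoods = [{i} for i in range(n)]  # each cell counts itself (assumption)
    for i, j in edges:
        neighborhoods[i].add(j)
        neighborhoods[j].add(i)
    rows = []
    for i in range(n):
        sub = features[sorted(neighborhoods[i])]
        rows.append(np.concatenate([sub.mean(axis=0), sub.std(axis=0)]))
    return np.vstack(rows)


# Usage: three cells with one-hot types (tumor, immune); cells 0 and 1 touch.
cell_types = np.array([[1.0, 0.0],
                       [0.0, 1.0],
                       [0.0, 1.0]])
nas = neighbors_aggregation_stats(cell_types, [(0, 1)])
```

Cells 0 and 1 share a mixed tumor/immune neighborhood, while the isolated cell 2 has a purely immune one. The resulting matrix can be passed to any clustering method to look for niches.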

B-171: Genes encoding teleost orthologs of human signal transduction proteins remain in duplicate or in triplicate more frequently than the whole genome
Track: Special Session: Bioinformatics in France
  • Floriane Picolo, INRAe, France
  • Philippe Monget, INRAe, France
  • Benoît Piégu, INRAe, France


Presentation Overview:

Cell signaling involves numerous proteins, many belonging to related families, resulting in extensive interactions. Whole genome duplication (WGD) events facilitated gene innovation. Teleost genomes experienced additional WGDs, including one after their divergence from Holostei and independent events in salmonids and carps. The retention of duplicated genes often follows a dosage effect, influenced by the dosage of interacting products. We examined 63 teleost species to determine whether orthologs of human genes in 47 signaling pathways (HGSP) remained duplicated, triplicated or singleton more frequently than the whole genome. Teleost genes in the 3WGD group remained duplicated more frequently, while those in the 4WGD group were more frequently triplicated. Similar trends were observed for ligand/membrane receptor genes. Analyzing pairs of interacting gene products revealed prevalent 1:1 and 1:2 proportions in the 3WGD group and 2:2 and 2:4 proportions in the 4WGD group. Proportions varied across pathways, with n:n and n:m ratios ranging from 20% to 65% and from 34% to 70%, respectively. Notably, the JAK-STAT, FoxO and Glucagon pathways showed no gene loss. These findings demonstrate that teleost HGSP orthologs are often retained in duplicate or triplicate, with some gene losses and varying proportions among pathways.
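The pair-proportion analysis can be pictured as counting, for each interacting pair of human gene products, how many teleost orthologs each partner retains. The minimal sketch below tallies these a:b ratios (smaller count first) and the share of balanced n:n pairs; the gene names, copy numbers and the choice to skip pairs in which a partner has lost all orthologs are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def pair_proportions(copies: Dict[str, int],
                     pairs: List[Tuple[str, str]]) -> Counter:
    """Count copy-number ratios such as '1:1' or '1:2' over interacting pairs.

    `copies` maps a human gene to its number of orthologs in one teleost
    species; pairs where either partner has no ortholog are skipped.
    """
    counts: Counter = Counter()
    for a, b in pairs:
        ca, cb = sorted((copies.get(a, 0), copies.get(b, 0)))
        if ca == 0:
            continue  # gene loss: the pair cannot be classified
        counts[f"{ca}:{cb}"] += 1
    return counts


def balanced_fraction(counts: Counter) -> float:
    """Fraction of classified pairs with equal copy numbers on both sides (n:n)."""
    total = sum(counts.values())
    balanced = sum(v for k, v in counts.items() if k.split(":")[0] == k.split(":")[1])
    return balanced / total if total else 0.0


# Usage with hypothetical ligand/receptor pairs.
copies = {"LIG1": 1, "REC1": 1, "LIG2": 1, "REC2": 2,
          "LIG3": 2, "REC3": 2, "LIG4": 0, "REC4": 1}
pairs = [("LIG1", "REC1"), ("LIG2", "REC2"), ("LIG3", "REC3"), ("LIG4", "REC4")]
counts = pair_proportions(copies, pairs)
```

In this toy example the lost ortholog of LIG4 removes one pair, leaving one 1:1, one 1:2 and one 2:2 pair, so two of the three classified pairs are balanced.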

B-172: Repeat expression analysis made easy
Track: Special Session: Bioinformatics in France
  • Ali Hamraoui, BiBs, Université Paris Cité, CNRS, Epigenetics and Cell Fate, F-75013 Paris, France
  • Magali Hennion, BiBs, Université Paris Cité, CNRS, Epigenetics and Cell Fate, F-75013 Paris, France


Presentation Overview:

Repeated elements constitute a large fraction of many genomes; in humans, about half of the sequence consists of repetitive elements. A significant part of these sequences is transcribed, and transcriptional misregulation is associated with several diseases, including rare genetic diseases and cancers. Most RNA-seq analyses ignore repeated elements, but several tools have been developed specifically to study their transcription. We implemented two approaches to study repeat expression, one based on featureCounts and one on TEtranscripts, into a complete analysis workflow written with Snakemake and based on RASflow. The workflow is also routinely used for standard gene expression analysis using either genome mapping (HISAT2 or STAR) or fast transcript-level quantification (Salmon). It runs in a dedicated Singularity/Apptainer image for reproducibility and was optimized to process data efficiently on Slurm-based HPC clusters, such as those of the IFB network. Because we aim to make these complex analyses doable by biologists with little or no bioinformatics background, the user configures the workflow through a simple YAML file. The results are presented in an HTML report with interactive plots for easy visualization.

B-173: Methylator, a complete workflow for DNA methylation analysis
Track: Special Session: Bioinformatics in France
  • Elouan Bethuel, BiBs, Université Paris Cité, CNRS, Epigenetics and Cell Fate, F-75013 Paris, France
  • Magali Hennion, BiBs, Université Paris Cité, CNRS, Epigenetics and Cell Fate, F-75013 Paris, France
  • Olivier Kirsh, BiBs, Université Paris Cité, CNRS, Epigenetics and Cell Fate, F-75013 Paris, France


Presentation Overview:

Epigenetic marks regulate gene transcription as well as genome replication and repair. In mammals, DNA methylation on CpGs has been shown to be essential for normal development and is associated with a number of key processes, including genomic imprinting, X-chromosome inactivation, repression of transposable elements, aging and carcinogenesis. DNA methylation is set up, maintained, decoded and eliminated by various actors whose mutations accompany numerous human diseases, such as cancers or neurodevelopmental disorders. Several methods allow genome-wide analysis of this mark, but data analysis remains complex and computationally intensive. We implemented Methylator, a Snakemake workflow to analyze DNA methylation data from WGBS, RRBS and Nanopore sequencing, and eventually from Illumina BeadChips. Methylator runs in a dedicated Singularity/Apptainer image for reproducibility and was optimized to process data efficiently on HPC clusters, such as those of the IFB and iPOP-UP networks. Because we aim to make these complex analyses doable by biologists with little or no bioinformatics background, the user configures the workflow through a simple YAML file. The results are presented in an HTML report.

B-174: Prediction of antibiotic resistance in Pseudomonas aeruginosa from genomic data using machine learning approaches
Track: Special Session: Bioinformatics in France
  • Elsa Denakpo, Institut de Chimie des Substances Naturelles (ICSN), CNRS, Gif-sur-Yvette, France
  • Bogdan Iorga, Institut de Chimie des Substances Naturelles (ICSN), CNRS, Gif-sur-Yvette, France


Presentation Overview:

Pseudomonas aeruginosa is a challenging pathogen with complex antibiotic resistance patterns. Traditionally, antibiotic susceptibility is evaluated by measuring the Minimum Inhibitory Concentration (MIC) using serial dilutions, which is tedious and time-consuming. An interesting alternative is the in silico prediction of antibiotic susceptibility from genomic data, potentially saving time and resources.

Within our ANR project Seq2diag, we generated a representative dataset containing sequencing data and MIC values for 1131 Pseudomonas aeruginosa strains and 24 clinically relevant antibiotics.

In this study, we use this dataset to develop machine learning models that predict antibiotic susceptibility from genomic data.

The sequences were encoded into three formats: k-mers, n-grams and “knowledge-based” (identification of mutations of antibiotic resistance-related proteins). The resulting data form a feature matrix of presence/absence of each k-mer/n-gram/mutation in the sequences.
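The k-mer encoding can be sketched as follows: each genome is reduced to the set of k-mers it contains, and the union of k-mers across genomes defines the columns of a binary presence/absence matrix. The function names and toy sequences are hypothetical, and the sketch keeps k-mers as read on one strand, whereas practical pipelines often merge each k-mer with its reverse complement and use larger values of k.

```python
from typing import Dict, List, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct substrings of length k in `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def presence_absence_matrix(genomes: Dict[str, str],
                            k: int) -> Tuple[List[str], List[str], List[List[int]]]:
    """Return (strain names, k-mer vocabulary, 0/1 matrix with one row per strain)."""
    per_strain = {name: kmers(seq.upper(), k) for name, seq in genomes.items()}
    vocab = sorted(set().union(*per_strain.values()))
    names = sorted(per_strain)
    matrix = [[1 if km in per_strain[n] else 0 for km in vocab] for n in names]
    return names, vocab, matrix


# Usage with two toy sequences.
names, vocab, matrix = presence_absence_matrix({"s1": "ACGT", "s2": "ACGA"}, k=3)
```

Here the vocabulary is ACG, CGA and CGT, and the rows for s1 and s2 are [1, 0, 1] and [1, 1, 0]. Such a matrix, paired with MIC-derived labels, is the kind of input that classifiers such as Random Forest consume.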

Several models, including Random Forest, Extreme Gradient Boosting and Support Vector Machines, were tested for classification and regression tasks. The performance of each model was evaluated using cross-validation. Performance varied, and possible explanations for these results are discussed on selected examples.

In conclusion, our work highlights the potential and the limitations of machine learning methods for predicting antibiotic susceptibility in Pseudomonas aeruginosa.

B-175: TEFLoN2, an automated and accurate computational tool for detecting and analyzing insertions of transposable elements in population data
Track: Special Session: Bioinformatics in France
  • Corentin Marco, University of Montpellier - Institut of Sciences and Evolution (ISEM), France
  • Clothilde Chenal, University of Montpellier - Institut of Sciences and Evolution (ISEM) - MIVEGEC - IRD, France
  • Michael Fontaine, CNRS - MIVEGEC: Maladies Infectieuses et Vecteurs: Ecologie, Génétique, Evolution et Contrôle (IRD 224-CNRS 5290-UM), France
  • Anna-Sophie Fiston-Lavier, University of Montpellier - Institut of Sciences and Evolution (ISEM) - French Academic Institute (IUF), France


Presentation Overview:

Transposable elements (TEs) are mobile, repetitive and mutagenic DNA elements known to be important components of eukaryotic genomes and major actors in genome evolution. To estimate their impact on genome evolution, we need to detect them at scales ranging from individuals to populations. Detection of TE insertions using paired-end reads has shown a high false-positive rate, up to 40%. Although long-read sequencing technologies overcome this detection issue, this type of sequencing is not suitable for population genomics studies, which require large amounts of resequencing data from multiple samples at a reasonable cost.
One of the most promising tools highlighted in previous benchmark studies is TEFLoN, which uses pooled short-read data. While TEFLoN is easy to install and use, several technical limitations have been identified; notably, each script must be launched independently and without parallelization, which makes the tool time- and memory-consuming. We therefore developed an automated and optimized version of this tool, called TEFLoN2, by upgrading the code and developing a Snakemake pipeline. We also developed a new module that accurately estimates TE frequency in pooled or large individual sequencing datasets. We then assessed its accuracy on simulated and public resequencing data.
https://github.com/asfistonlavie/TEFLoN2
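In pooled sequencing, an insertion's population frequency at a locus is commonly estimated as the fraction of informative reads supporting the presence of the TE among all reads supporting either its presence or its absence. The minimal sketch below computes that ratio and, as an added assumption not described in the abstract, a Wilson score interval to express the uncertainty at low read depth; the function name and read counts are hypothetical, and this is not TEFLoN2's estimator.

```python
import math
from typing import Optional, Tuple


def te_frequency(presence: int, absence: int,
                 z: float = 1.96) -> Optional[Tuple[float, float, float]]:
    """Return (frequency, lower, upper) for one TE insertion locus.

    `presence` / `absence` are counts of reads supporting each state;
    z = 1.96 gives an approximate 95% Wilson score interval.
    """
    n = presence + absence
    if n == 0:
        return None  # no informative reads: frequency undefined
    p = presence / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)


# Usage: 30 reads support the insertion and 70 support its absence.
estimate = te_frequency(30, 70)
```

With 100 informative reads the point estimate is 0.30 and the interval spans roughly 0.22 to 0.40; with fewer reads the same ratio would come with a noticeably wider interval.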

B-175: TEFLoN2, an automized and accurate computational tool for detecting and analyzing insertions of transposable elements in population data
Track: Special Session: Bioinformatics in France
  • Corentin Marco, University of Montpellier - Institut of Sciences and Evolution (ISEM), France
  • Clothilde Chenal, University of Montpellier - Institut of Sciences and Evolution (ISEM) - MIVEGEC - IRD, France
  • Michael Fontaine, CNRS - MIVEGEC: Maladies Infectieuses et Vecteurs: Ecologie, Génétique, Evolution et Contrôle (IRD 224-CNRS 5290-UM), France
  • Anna-Sophie Fiston-Lavier, University of Montpellier - Institut of Sciences and Evolution (ISEM) - French Academic Institute (IUF), France


Presentation Overview: Show

Transposable elements (TEs) are mobile, repetitive and mutagenic elements of DNA known to be important components of eukaryotic genomes and major actors in genome evolution. In order to estimate their impact on genome evolution, we need to detect them from individual to populations. Detection of TE insertions using paired-end reads revealed a high false-positive rate, up to 40%. Even if long–read sequencing technologies overcome this detection issue, this type of sequencing is not suitable for population genomics studies that require a large amount of resequencing data from multiple samples and for a reasonable cost.
One of the most promising tools highlighted in previous benchmark studies is TEFLoN, which uses pooled short-read data. Although TEFLoN is easy to install and use, several technical limitations have been identified. For example, each script must be launched independently and without parallelization, which makes the tool time- and memory-consuming. We therefore developed TEFLoN2, an automated and optimized version of this tool, by upgrading the code and building a Snakemake pipeline. We also developed a new module that accurately estimates TE frequency in pooled data or in large datasets of individually sequenced samples. We then assessed its accuracy on simulated and public resequencing data.
https://github.com/asfistonlavie/TEFLoN2
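As a rough illustration of what estimating TE frequency from pooled reads involves, the sketch below uses a simple read-ratio estimator: at each candidate insertion site, the fraction of informative reads that support the insertion rather than the empty (reference) site. The `SiteCounts` structure, the `min_depth` threshold and the estimator itself are assumptions made for illustration; TEFLoN2's new frequency module may rely on a different model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SiteCounts:
    """Hypothetical read counts at one candidate TE insertion site in a pool."""
    presence_reads: int  # reads supporting the insertion (e.g. split or discordant reads)
    absence_reads: int   # reads spanning the empty site, supporting the reference allele


def te_frequency(site: SiteCounts, min_depth: int = 5) -> Optional[float]:
    """Estimate the insertion frequency as presence / (presence + absence).

    Returns None when fewer than `min_depth` informative reads cover the site,
    because a ratio computed from very few reads is unreliable.
    """
    depth = site.presence_reads + site.absence_reads
    if depth < min_depth:
        return None
    return site.presence_reads / depth


def te_frequencies(sites: dict, min_depth: int = 5) -> dict:
    """Apply te_frequency to every (site ID -> SiteCounts) entry."""
    return {site_id: te_frequency(counts, min_depth) for site_id, counts in sites.items()}
```

For example, with made-up site IDs, `te_frequencies({"chr2L:1200": SiteCounts(12, 28), "chr3R:880": SiteCounts(1, 2)})` returns `{"chr2L:1200": 0.3, "chr3R:880": None}`: the second site is skipped because only three informative reads cover it.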

B-176: Genomics Data Analysis Facilities in a Biomedical Research Institute
Track: Special Session: Bioinformatics in France
  • Justine Guégan, Paris Brain Institute, France
  • Beáta György, Paris Brain Institute, France
  • Thomas Gareau, Paris Brain Institute, France
  • Emeline Cherchame, Paris Brain Institute, France
  • Corentin Raoux, Paris Brain Institute, France
  • Stephen Whitmarsh, Paris Brain Institute, France
  • Violetta Zujovic, Paris Brain Institute, France


Presentation Overview:

The Data Analysis Core facility (DAC, https://dac.institutducerveau-icm.org) provides the Paris Brain Institute (ICM) with structural support and expertise in processing, integrating, and analyzing data from fundamental and clinical neuroscience research. The DAC is organized into four divisions: data governance & open science, statistics & methodology, image analysis & AI, and omics analysis. The facility is open to internal and external research teams as well as to industry.

The omics analysis division assists researchers from study design through data processing, analysis, and interpretation, mainly for Next-Generation Sequencing (NGS) data. This researcher-centered approach allows ICM to exploit the latest technologies to address new scientific questions and make new discoveries. Because reproducibility and scalability are essential for increasingly complex omics analyses, the division develops specialized automated pipelines (using Snakemake and Conda) and runs them on local Illumina DRAGEN servers or on local HPC CPU nodes, depending on the NGS technology.

To allow researchers to explore results quickly and in depth, the omics division develops dedicated graphical interfaces within an R/Shiny framework: QuBy, for transcriptomics data from bulk and single-cell RNA-seq experiments, and DejaVu, for variants identified in ICM studies.

B-177: Study of sporadic Alzheimer’s disease using Single-cell transcriptomics of Human Forebrain Assembloids
Track: Special Session: Bioinformatics in France
  • Emeline Cherchame, Paris Brain Institute, France
  • Benjamin Galet, Paris Brain Institute, France
  • Clément Daube, Paris Brain Institute, France
  • Amal Karsi, Paris Brain Institute, France
  • Stéphanie Bigou, Paris Brain Institute, France
  • Sara Majello, Paris Brain Institute, France
  • Damien Ulveling, Scipio bioscience, France
  • Jun Komatsu, Scipio bioscience, France
  • Justine Guégan, Paris Brain Institute, France
  • Philippe Ravassard, Paris Brain Institute, France


Presentation Overview:

This study focuses on Alzheimer's disease (AD), a neurodegenerative disease affecting cognitive functions. Research at the Paris Brain Institute aims to better understand the mechanisms of the disease and to find new therapeutic solutions. Here, we study forebrain assembloids carrying an APOE4/4 genotype, the strongest genetic risk factor identified for sporadic AD, compared with isogenic APOE3/3 controls. Three cell populations were labeled and purified: (1) excitatory neurons (CAMK2a, GFP+) and (2) inhibitory neurons (DLX1, mScarlet+) by FACS, and (3) astrocytes (Hepacam+) by immunopanning. The purified cells were used to prepare 3’ single-cell transcriptome libraries with RevGel-Seq (Komatsu et al., 2023), and the libraries were then sequenced (3’ sequencing) with Illumina technology. Count matrices were computed with the preprocessing pipeline of the Cytonaut™ software. Further analyses were performed by the Data Analysis Core (DAC) with the Seurat v4 package (Hao et al., 2021). Cells were annotated with clustifyr (Fu et al., 2019) against public reference data, regional patterning was assessed with the VoxHunt tool (Fleck et al., 2021), and the expression of specific marker genes was used to refine the annotations. Combining these approaches yielded a fine-grained annotation of the cells. Finally, differential analyses between the APOE4/4 and APOE3/3 conditions were performed for each of the three cell types (CAMK2a, DLX1, astrocytes).
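The abstract does not say which statistical test was used for the APOE4/4 versus APOE3/3 comparisons. As a hedged illustration, the sketch below runs a Wilcoxon rank-sum test (the default test of Seurat's `FindMarkers`) on each gene within each cell type. It is plain Python rather than R, uses a normal approximation without tie correction or multiple-testing correction, and the nested-dictionary input layout is a hypothetical stand-in for a Seurat object; it is not the DAC's actual pipeline.

```python
import math
from typing import Dict, List, Sequence


def rank_with_ties(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_rank_sum(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sided p-value of the Wilcoxon rank-sum test (normal approximation)."""
    n1, n2 = len(x), len(y)
    ranks = rank_with_ties(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # Mann-Whitney U for group x
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    if sd_u == 0:  # one group is empty: no evidence either way
        return 1.0
    z = (u1 - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def differential_by_cell_type(
    expr: Dict[str, Dict[str, Dict[str, List[float]]]],
) -> Dict[str, Dict[str, float]]:
    """expr[cell_type][condition][gene] -> per-cell expression values (hypothetical layout).

    Returns APOE4/4 vs APOE3/3 p-values for each cell type and each gene
    measured in both conditions.
    """
    results: Dict[str, Dict[str, float]] = {}
    for cell_type, by_condition in expr.items():
        e4, e3 = by_condition["APOE4/4"], by_condition["APOE3/3"]
        results[cell_type] = {
            gene: wilcoxon_rank_sum(e4[gene], e3[gene]) for gene in e4 if gene in e3
        }
    return results
```

In practice the per-gene p-values would still need adjustment for the number of genes tested, which Seurat reports as an adjusted p-value.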

B-178: Multiview machine learning integration of multiplex spectral confocal microscopy and spatial transcriptomics in Peyer’s patches
Track: Special Session: Bioinformatics in France
  • Alexandre Bonomo, LIS (Laboratoire d'Informatique et Systèmes), CIML (Centre d'immunologie Marseille Luminy), France


Presentation Overview:

Spatial transcriptomics is a powerful technique for locating mRNA expression within a tissue of interest. Some sequencing-based protocols capture the whole transcriptome but lack single-cell resolution, whereas others are limited to about a thousand genes but reach subcellular resolution.

To better characterize samples, it is therefore of great interest to first use a spot-based method to identify genes of interest and then apply a cellular or subcellular method. Imaging techniques such as multiplex spectral confocal microscopy, however, can also be used to increase the resolution of spot-based methods.

Here, we propose a novel method that uses immunohistochemistry, performed right before tissue permeabilization, to assign identities to the cells composing each spot and to capture their cell-cell interactions. This information can then be integrated into a multi-view machine learning model to support downstream analysis tasks such as inferring gene expression in regions not covered by spots, tissue segmentation, and marker gene identification.

We illustrate the approach with an application to Peyer’s patch samples.
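The abstract does not describe the multi-view model itself. As a much simpler stand-in, the sketch below shows the basic idea of using one view to fill in another: for a tissue region that has imaging data but no spot, it predicts gene expression as the mean expression of the k spots whose imaging-derived features (for example, cell-type proportions from immunohistochemistry) are most similar. This k-nearest-neighbour regression, the feature representation and the function names are assumptions for illustration, not the author's method.

```python
import math
from typing import Dict, List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two imaging-derived feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def impute_expression(
    spot_features: List[Sequence[float]],
    spot_expression: List[Dict[str, float]],
    query_features: Sequence[float],
    k: int = 3,
) -> Dict[str, float]:
    """Predict gene expression for a region without spot coverage (hypothetical helper).

    spot_features[i]   : imaging view of spot i (e.g. cell-type proportions)
    spot_expression[i] : transcriptomics view of spot i (gene -> expression)
    query_features     : imaging view of the uncovered region
    The prediction is the unweighted mean expression of the k spots whose
    imaging features are closest to the query; genes missing in a spot count as 0.
    """
    if not spot_features or k < 1:
        raise ValueError("need at least one spot and k >= 1")
    nearest = sorted(
        range(len(spot_features)),
        key=lambda i: euclidean(spot_features[i], query_features),
    )[:k]
    genes = set().union(*(spot_expression[i].keys() for i in nearest))
    return {
        gene: sum(spot_expression[i].get(gene, 0.0) for i in nearest) / len(nearest)
        for gene in sorted(genes)
    }
```

With made-up values, two B-cell-rich spots with features `[1.0, 0.0]` and `[0.9, 0.1]` and `Cd19` expression 10 and 8 would give a predicted `Cd19` of 9.0 for an uncovered region with features `[0.95, 0.05]` when `k=2`. A real multi-view model would learn a shared representation of both views rather than rely on a fixed distance.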