Attention presenters: please review the Speaker Information Page.
Schedule subject to change
All times listed are in COT
Wednesday, November 13th
10:00-10:15
Session: Education
CABANAnet in numbers: seven years of building bioinformatics and knowledge exchange capacity in Latin America
Confirmed Presenter: Rebeca Campos-Sánchez, Universidad de Costa Rica, Costa Rica

Room: Room B502
Format: In Person


Authors List:

  • Rebeca Campos-Sánchez, Universidad de Costa Rica, Costa Rica
  • Alejandro Reyes-Muñoz, Universidad de los Andes, Colombia
  • Adrián Turjanski, Universidad de Buenos Aires, Argentina
  • Carlos Modenutti, Universidad de Buenos Aires, Argentina
  • Jan Kreuze, Centro Internacional de la Papa, Peru
  • Andrés Gatica-Arias, Universidad de Costa Rica, Costa Rica
  • Valeria Faggioli, Instituto Nacional de Tecnología Agropecuaria, Argentina

Presentation Overview:

The CABANA project (Capacity Building for Bioinformatics in Latin America), funded by the UK's Global Challenges Research Fund from 2017 to 2022, aimed to strengthen bioinformatics capacity and broaden its applications across Latin America. Its successor, CABANAnet (funded by the Chan Zuckerberg Initiative), has been building on this foundation since 2022. CABANAnet's initiatives focus on three areas of transformation: tackling communicable diseases, enhancing sustainable food production, and safeguarding biodiversity. Our capacity-building program encompasses collaborative research projects, hands-on workshops, a Train-the-Trainer program, internships, online learning materials, knowledge exchange meetings, web resource development, and publication of research findings. We are developing five multinational research projects in which 11 secondees contribute to the data analysis. Among other activities, we have produced three eLearning tutorials and delivered four workshops. We have trained more than 900 people and work with 14 collaborating institutions in Latin America. We are actively seeking partnerships with complementary bioinformatics organizations such as SoIBio, ISCB, Global Data Alliance, and others to create synergies. This multifaceted approach aims to catalyze positive change in regional policy, economic landscapes, and scientific progress across the region.

10:15-10:30
Session: Education
Closing the “knowledge gap” in computational biology: Spanish Wikipedia as a case study
Confirmed Presenter: Nelly Selem, Langebio Cinvestav, Mexico

Room: Room B502
Format: In Person


Authors List:

  • Nelly Selem, Langebio Cinvestav, Mexico
  • Tülay Karakulak, Department of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, Switzerland
  • Audra Anjum, Office of Instructional Design, Ohio University, Athens, United States
  • Antón Pashkov, ENES, Mexico
  • Rafael Pérez Estrada, ENES, Mexico
  • Karina Enriquez Guillén, ENES, Mexico
  • Dan De Blasio, Ray and Stephanie Lane Computational Biology Department, Carnegie Mellon University, Pittsburgh, United States
  • Sofia Ferrerira-Gonzalez, Centre for Regenerative Medicine, Institute for Regeneration and Repair, The University of Edinburgh, Edinburgh, United Kingdom
  • Alejandra Medina-Rivera, Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Juriquilla, Mexico
  • Daniel Rodrigo-Torres, Centre for Regenerative Medicine, Institute for Regeneration and Repair, The University of Edinburgh, Edinburgh, United Kingdom
  • Alastair Kilpatrick, Centre for Regenerative Medicine, Institute for Regeneration and Repair, The University of Edinburgh, Edinburgh, United Kingdom
  • Lonnie Welch, School of Electrical Engineering and Computer Science, Ohio University, United States
  • Farzana Rahman, School of Computer Science and Mathematics, Faculty of Engineering, Computing and the Environment, United Kingdom

Presentation Overview:

Motivation: Wikipedia is a vital open educational resource in computational biology. The quality of computational biology coverage on English Wikipedia has improved steadily in recent years. However, there is a growing “knowledge gap” between computational biology resources on English Wikipedia and those on non-English Wikipedias. Closing this gap by providing educational resources in languages other than English would reduce the language barriers that disadvantage non-native English speakers in computational biology across multiple dimensions.
Results: Here, we provide a comprehensive assessment of computational biology coverage on Spanish Wikipedia, the second most visited Wikipedia worldwide. Using Spanish Wikipedia as a case study, we generated quantitative and qualitative data before and after a specific educational event, namely a Spanish-focused student editing competition. Our data show how such events and activities can narrow the knowledge gap between English and non-English educational resources by improving existing articles and creating new ones. Finally, based on our analysis, we suggest ways to prioritize future initiatives to improve open educational resources in other languages.

10:30-10:45
Session: Education
Proceedings Presentation: ShinyTHOR app: Shiny-built Tumor High-throughput Omics-based Roadmap
Confirmed Presenter: Anthony Vladimir Campos-Segura, International Center of Research CIPE, A.C.Camargo Cancer Center, Sao Paulo 01509-010, Brazil

Room: Room B502
Format: In Person


Authors List:

  • Eduardo Navarrete-Bencomo, Centro de Ciencias Genómicas (UNAM), Unidad Profesional Interdisciplinaria de Biotecnología (UPIBI IPN), Mexico
  • Alexis Germán Murillo-Carrasco, Immunology and Cancer Research Group (IMMUCA), Peru
  • Anthony Vladimir Campos-Segura, International Center of Research CIPE, A.C.Camargo Cancer Center, Sao Paulo 01509-010, Brazil
  • Orlando Sevillano, Instituto de Medicina Tropical, Faculdade de Medicina, Universidade de São, São Paulo, 05403-000, Brazil
  • Ana Mayanga, Cancer and Stem Cells Research Group, Universidad Científica del Sur, Lima, Peru
  • José Luis Buleje-Sono, Centro de Genética y Biología Molecular, Universidad de San Martín de Porres, Lima, Peru
  • César Ortíz, Hospital das Clínicas HCFMUSP, Faculdade de Medicina, Universidade de São Paulo, São Paulo 01246-000, SP, Brazil

Presentation Overview:

Cancer cell lines cultured in a controlled in vitro environment provide scientists with functional and comparable experimental models. These models facilitate a holistic understanding of the molecular and cellular mechanisms associated with tumor formation and progression. It is therefore critical to have an intuitive atlas of cancer cell line omics to guide further bench hypotheses. We developed ShinyThor, a web app that presents multi-omic (transcriptomic, metabolomic, proteomic, and miRNomic) and drug-related data intuitively and dynamically to help researchers find relevant analytes. The app uses data from the Cancer Cell Line Encyclopedia (CCLE) for multi-omic profiles of cancer cell lines, miRTarBase for validated miRNA-gene associations, circInteractome for circular RNA (circRNA) regions potentially targeting genes, and Genomics of Drug Sensitivity in Cancer for drug sensitivity data.
The ShinyThor app consists of five modules. The first module lets researchers find the top (and bottom) ten cell lines for a given marker in each omic layer, or by half-maximal inhibitory concentration (IC50) for different drugs. The second module runs single-sample Gene Set Enrichment Analysis (ssGSEA). The third module, multiple analyte expression, allows users to evaluate the expression levels of different targets simultaneously. The fourth module evaluates miRNA-target interactions (MTI) and miRNA expression profiles across cancer types. Finally, the modulation module allows users to collect information on circRNAs and miRNAs and their potential action as gene silencers. To illustrate the use of the app, we describe a hypothesis derived from a previous study: we identify cell lines and markers that could be used to validate a previously published study on gastric cancer prognosis. The ShinyThor web app is freely available to non-commercial users at https://alexismurillo.shinyapps.io/ShinyThor and the source code can be accessed at https://github.com/Murillo22/ShinyThor.
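The ranking behind the first module can be sketched in a few lines. This is a minimal Python illustration, not code from the ShinyThor repository: the function name `top_bottom_cell_lines` and the cell-line labels are hypothetical, and the input is assumed to be one analyte's values (expression, or IC50 for a drug) already extracted from a CCLE- or GDSC-style table.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def top_bottom_cell_lines(values: Dict[str, float], n: int = 10) -> Tuple[Ranked, Ranked]:
    """Return the n highest- and n lowest-valued cell lines for one analyte.

    `values` maps a (hypothetical) cell-line name to an expression level or an
    IC50. Ties are broken alphabetically so the output is reproducible.
    """
    # Highest values first; the name is a secondary key for deterministic ties.
    top = sorted(values.items(), key=lambda item: (-item[1], item[0]))[:n]
    # Lowest values first, reported from lowest to highest.
    bottom = sorted(values.items(), key=lambda item: (item[1], item[0]))[:n]
    return top, bottom


# Hypothetical expression values for one marker.
expr = {"LINE_A": 5.2, "LINE_B": 0.3, "LINE_C": 8.9, "LINE_D": 2.1}
top, bottom = top_bottom_cell_lines(expr, n=2)
print([name for name, _ in top])     # ['LINE_C', 'LINE_A']
print([name for name, _ in bottom])  # ['LINE_B', 'LINE_D']
```

For drug data, the bottom list is usually the interesting one: a lower IC50 means less drug is needed to reach 50% inhibition, so those cell lines are the most sensitive.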

10:45-11:00
Session: Education
scExplorer: A Comprehensive Web Server for Single-Cell RNA Sequencing Data Analysis
Confirmed Presenter: Sergio Hernández-Galaz, Universidad San Sebastian, Fundacion Ciencia & Vida, Chile

Room: Room B502
Format: In Person


Authors List:

  • Sergio Hernández-Galaz, Universidad San Sebastian, Fundacion Ciencia & Vida, Chile
  • Andres Hernández, Fundacion Ciencia & Vida, Chile
  • Alberto Martín, Universidad San Sebastian, Fundacion Ciencia & Vida, Chile
  • Alvaro Lladser, Universidad San Sebastian, Fundacion Ciencia & Vida, Chile
  • Felipe Villanelo, Universidad San Sebastian, Fundacion Ciencia & Vida, Chile

Presentation Overview:

Single-cell RNA sequencing (scRNA-seq) has become a transformative technique in gene expression analysis, enabling the profiling of thousands of individual cells in a single experiment. This advance has significant implications for biomedical research, revealing novel cellular markers and providing insights into developmental biology and other physiological processes. Despite its power, the complexity of scRNA-seq analysis remains a significant barrier for researchers who lack programming skills. Here we describe ScExplorer, an interactive web application designed to simplify scRNA-seq data analysis. ScExplorer covers essential tasks such as preprocessing, dimensionality reduction, differential expression analysis, and integrative data visualization, streamlining the analytical workflow and making it accessible to non-experts. It builds on established tools, Scanpy for Python and Seurat v4 for R, and supports data formats commonly used in scRNA-seq analysis, including .h5ad, .h5, .mtx, and 10x Genomics output. Key features include quality-metric assessment, PCA for linear dimensionality reduction, UMAP for non-linear reduction, Leiden clustering, and differential expression analysis to identify significant genes per cell type. Analysis results can be exported in both Python and R formats for further analysis. ScExplorer also integrates Scrublet for doublet detection, improving the accuracy of downstream analyses. To address batch effects, it offers several integration methods, namely ComBat, Scanorama, BBKNN, and Harmony, to harmonize datasets from different sources. The infrastructure relies on FastAPI for the backend and JavaScript (Express) for the frontend, providing a responsive user experience, while a SLURM scheduler manages the computational load.
The platform can also be installed locally with Docker and a minimum of 16 GB of RAM, making it accessible to a wide range of users. Overall, ScExplorer democratizes scRNA-seq data analysis by combining a user-friendly interface with comprehensive analytical capabilities. This empowers researchers, especially those with limited computational expertise, to conduct sophisticated analyses and derive meaningful biological insights from their data. ScExplorer represents a step toward making scRNA-seq analysis more accessible and efficient, fostering a deeper understanding of cellular processes and advancing biomedical research.
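The preprocessing step mentioned above usually starts with cell filtering, library-size normalization, and a log transform; in Scanpy these are `sc.pp.filter_cells`, `sc.pp.normalize_total`, and `sc.pp.log1p`. The pure-Python sketch below reproduces only the arithmetic of those steps on a tiny dense matrix so it runs without dependencies. It is an illustration under that assumption, not ScExplorer's implementation, and the `target_sum` of 10,000 is a common tutorial choice rather than a value from the abstract.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows are cells, columns are genes


def filter_cells(counts: Matrix, min_genes: int) -> Matrix:
    """Keep cells in which at least `min_genes` genes have non-zero counts."""
    return [cell for cell in counts if sum(1 for v in cell if v > 0) >= min_genes]


def normalize_log1p(counts: Matrix, target_sum: float = 1e4) -> Matrix:
    """Scale each cell to `target_sum` total counts, then apply log(1 + x).

    Cells with zero total counts stay at zero instead of dividing by zero.
    """
    result = []
    for cell in counts:
        total = sum(cell)
        scale = target_sum / total if total > 0 else 0.0
        result.append([math.log1p(v * scale) for v in cell])
    return result


raw = [[10, 0, 30], [0, 0, 1], [5, 5, 0]]
kept = filter_cells(raw, min_genes=2)         # the second cell has only one detected gene
logged = normalize_log1p(kept, target_sum=100)
print(len(kept))                              # 2
```

Normalizing before the log keeps cells with very different sequencing depths comparable, which is what the downstream PCA, UMAP, and Leiden steps assume.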

11:00-11:15
Session: Education
Integration of genomic and bioinformatic analysis in medical education: A Project-Based Pedagogical approach for medical students
Confirmed Presenter: Ivon Andrea Bolaños-Martinez, Facultad de Salud, Pontificia Universidad Javeriana Cali, Colombia

Room: Room B502
Format: In Person


Authors List:

  • Ivon Andrea Bolaños-Martinez, Facultad de Salud, Pontificia Universidad Javeriana Cali, Colombia
  • Fabian Tobar Tosse, Facultad de Salud, Pontificia Universidad Javeriana Cali, Colombia
  • Jose Guillermo Ortega Avila, Facultad de Salud, Pontificia Universidad Javeriana Cali, Colombia
  • Juliana Lores Espinosa, Facultad de Salud, Pontificia Universidad Javeriana Cali, Colombia
  • Elizabeth Londoño-Velasco, Facultad de Salud, Pontificia Universidad Javeriana Cali, Colombia

Presentation Overview:

Teaching genomics and bioinformatics in the classroom through projects that analyze disease-associated genes promotes a practical, collaborative approach that combines genome analysis with scientific writing and publication. Applied to medical students, this strategy lets them obtain data directly from genomic analyses and use bioinformatic tools to investigate diseases, generating relevant scientific results; some of the resulting papers have been published in indexed journals. To develop their papers, students used key databases such as Orphanet, MalaCards, OMIM, NCBI, the HUGO Gene Nomenclature Committee (HGNC), the UCSC Genome Browser, and STRING, as well as custom bioinformatic tools written in Python. This methodology fosters the use of emerging technologies for clinical research and personalized medicine, familiarizing future physicians with up-to-date resources and methods in computational biology and bioinformatics. Pedagogically, the approach is based on project-based learning (PBL), which enables medical students to develop critical skills by addressing complex challenges that combine biomedical theory with clinical practice. Furthermore, the process of analysis and subsequent scientific publication reinforces key competencies such as critical thinking and academic writing, which are fundamental to medical education. This interdisciplinary approach strengthens collaborative learning, the use of technologies for genomic data analysis, and the ability to interpret biological information in a clinical context. With these competencies, medical students are better prepared to face the challenges of genomic and personalized medicine and to contribute to a constantly evolving field.

13:00-14:00
Bioinformatics Tools for Clinical Analysis (Illumina workshop)
Room: Hall Floor -1
Format: In Person


Authors List:

  • Juan Rincón

14:00-14:15
Session: Bioinformatics of microbes and microbiomes / Agrobiological omics
Core perturbomes of Escherichia coli and Staphylococcus aureus using a machine learning approach
Confirmed Presenter: Jose Arturo Molina Mora, Universidad de Costa Rica, Costa Rica

Room: Room B502
Format: In Person


Authors List:

  • Jose Fabio Campos Godínez, Universidad de Costa Rica, Costa Rica
  • Mauricio Villegas-Campos, Universidad de Costa Rica, Costa Rica
  • Jose Arturo Molina Mora, Universidad de Costa Rica, Costa Rica

Presentation Overview:

The core perturbome is defined as a central response to multiple disturbances that functions as a complex molecular network to overcome the disruption of homeostasis under stress and thereby ensure tolerance and survival. Given the biological and clinical relevance of Escherichia coli and Staphylococcus aureus, we characterized their molecular responses to multiple perturbations. Specifically, this study aimed to identify the core perturbome of these two prokaryotic models using a machine learning approach and to describe its functions. To this end, feature selection and classification algorithms were implemented to assess the ability of a subset of genes to correctly classify samples into control and perturbation classes. After verifying effective dimensionality reduction (median accuracies of 82.6% and 85.1% for E. coli and S. aureus, respectively), we built a model of molecular interactions and performed functional enrichment analyses to characterize the selected genes. The core perturbome comprised 55 genes (9 of them hubs) for E. coli and 46 genes (8 hubs) for S. aureus. Well-defined interactomes were predicted for each model; these interactomes are jointly associated with enriched pathways, including energy and macromolecule metabolism, DNA/RNA and protein synthesis and degradation, transcriptional regulation, virulence factors, and other signaling processes. Taken together, these results could help identify potential targets and biomarkers of the stress response in future studies.
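The abstract does not name its feature-selection or classification algorithms, so the sketch below swaps in two deliberately simple stand-ins to show the shape of the evaluation: a standardized mean-difference filter that picks a gene subset, and a nearest-centroid classifier scored by leave-one-out accuracy. Function names and the toy data are hypothetical. Genes are re-selected inside each fold so the held-out sample never influences which genes are chosen, which would otherwise inflate the accuracy.

```python
import statistics
from typing import List, Sequence

Matrix = List[List[float]]  # rows are samples, columns are genes


def select_genes(X: Matrix, y: Sequence[int], k: int) -> List[int]:
    """Rank genes by |mean(perturbed) - mean(control)| / overall SD; keep the top k.

    Labels: 0 = control, 1 = perturbation. A stand-in filter, not the study's method.
    """
    scores = []
    for j in range(len(X[0])):
        ctrl = [row[j] for row, lab in zip(X, y) if lab == 0]
        pert = [row[j] for row, lab in zip(X, y) if lab == 1]
        spread = statistics.pstdev(ctrl + pert) or 1.0  # guard against constant genes
        scores.append((abs(statistics.mean(pert) - statistics.mean(ctrl)) / spread, j))
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores[:k]]


def nearest_centroid(X: Matrix, y: Sequence[int], x: List[float], genes: List[int]) -> int:
    """Assign x to the class whose mean profile over `genes` is closest (squared Euclidean)."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(y)):
        rows = [row for row, lab in zip(X, y) if lab == label]
        dist = sum((x[g] - statistics.mean(r[g] for r in rows)) ** 2 for g in genes)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(X: Matrix, y: Sequence[int], k: int) -> float:
    """Leave-one-out accuracy with gene selection repeated in every fold."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = list(y[:i]) + list(y[i + 1:])
        genes = select_genes(train_X, train_y, k)
        correct += nearest_centroid(train_X, train_y, X[i], genes) == y[i]
    return correct / len(X)


# Toy data: gene 0 separates control (0) from perturbation (1); genes 1 and 2 are noise.
X = [[0.1, 5.0, 2.0], [0.2, 4.0, 2.1], [0.0, 6.0, 1.9],
     [3.0, 5.5, 2.0], [3.2, 4.5, 2.2], [2.9, 5.1, 1.8]]
y = [0, 0, 0, 1, 1, 1]
print(select_genes(X, y, k=1))  # [0]
print(loo_accuracy(X, y, k=1))  # 1.0
```

Reporting a median over repeated resampling runs, as the abstract does, would wrap `loo_accuracy` (or a k-fold variant) in an outer loop over shuffled splits.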

14:15-14:30
Session: Bioinformatics of microbes and microbiomes / Agrobiological omics
Solanum lycopersicum organ-specific reference gene regulatory networks (rGRNs) enable the discovery of novel regulators of key biological processes
Confirmed Presenter: Jose David Fernandez, Centro de Genómica y Bioinformática, Universidad Mayor, Chile / ANID-Millennium Institute for Integrative Biology, Chile

Room: Room B502
Format: In Person


Authors List:

  • Jose David Fernandez, Centro de Genómica y Bioinformática, Universidad Mayor, Chile / ANID-Millennium Institute for Integrative Biology, Chile
  • David Navarro-Payá, Institute for Integrative Systems Biology (I2SYSBIO, UV-CSIC), Spain
  • Javier Canales, Instituto de Bioquímica y Microbiología, Universidad Austral de Chile, Chile / ANID-IBIO, Chile
  • Jose M. Alvarez, Centro de Biotecnología Vegetal, Universidad Andrés Bello, Chile / ANID-IBIO, Chile
  • José Tomás Matus, Institute for Integrative Systems Biology (I2SYSBIO, UV-CSIC), Spain
  • Elena A. Vidal, Centro de Genómica y Bioinformática, Universidad Mayor, Chile / ANID-Millennium Institute for Integrative Biology, Chile

Presentation Overview:

Tomato (Solanum lycopersicum) is a widely grown crop, relevant to the human diet, and is considered a model organism for fruit development and responses to plant pathogens. Despite its significance, little is known about how transcription factors (TFs) orchestrate gene expression in this species, knowledge that is key to understanding the regulation of biological and molecular processes and to proposing strategies to improve tomato productivity and adaptation to the environment. We created five organ-specific reference gene regulatory networks (rGRNs) for tomato. First, we updated the structural and functional annotations of tomato genes to produce ITAG4.1c. Next, we developed the rGRNs by combining a large volume of transcriptome data (more than 10,000 RNA-seq libraries), which we transformed into TF-target interactions using a machine learning algorithm. We then integrated complementary functional datasets, such as gene co-expression networks, chromatin accessibility, TF binding profiles, and bioinformatic predictions of TF genomic binding sites, yielding organ-specific tomato rGRNs. We obtained rGRNs representing TF-target regulatory interactions for tomato roots, leaves, flowers, fruits, and seeds. Our findings revealed that, while gene expression is largely ubiquitous across organs, regulatory interactions are highly organ-specific. We found a significant correlation between the number of target genes and TF allegiance to specific sets of targets. As a proof of concept, we used our regulatory predictions to gain new insights into the fruit ripening regulatory cascade, a process well studied at the physiological and molecular levels. The fruit rGRN recapitulated known, experimentally validated targets of the central ripening controllers TAGL1 and RIN, validating our approach.
We used the regulatory information in the rGRN to infer new TFs controlling ripening-related genes, identifying a key TF from the ARF family as a potential central controller of ripening, in particular of genes involved in carotenoid metabolism, cell wall development, and ethylene and ABA signaling. Our organ-specific rGRNs provide a valuable framework for identifying novel central regulators of tomato development and responses to the environment.
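The abstract does not specify which machine learning algorithm turns expression data into TF-target interactions, so the sketch below substitutes plain co-expression (absolute Pearson correlation above a threshold), a much simpler method, only to show the data flow: one expression table per organ, a set of TF-target edges per organ, and an overlap score between organs. A low Jaccard overlap between two organs' edge sets is one way to quantify the organ specificity described above. Gene names, the threshold, and all values are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (TF, target)


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation; returns 0.0 when either profile is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def infer_edges(expr: Dict[str, List[float]], tfs: Set[str], threshold: float) -> Set[Edge]:
    """Keep (TF, gene) pairs whose |correlation| across samples is >= threshold."""
    edges = set()
    for tf in tfs:
        for gene, profile in expr.items():
            if gene != tf and abs(pearson(expr[tf], profile)) >= threshold:
                edges.add((tf, gene))
    return edges


def jaccard(a: Set[Edge], b: Set[Edge]) -> float:
    """Shared edges over all edges; 1.0 when both networks are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical expression profiles (4 samples) for one organ.
root = {"TF1": [1, 2, 3, 4], "G1": [2, 4, 6, 8], "G2": [4, 3, 2, 1], "G3": [1, 3, 2, 4]}
root_edges = infer_edges(root, tfs={"TF1"}, threshold=0.9)
print(sorted(root_edges))                      # [('TF1', 'G1'), ('TF1', 'G2')]
print(jaccard(root_edges, {("TF1", "G1")}))    # 0.5
```

Correlation alone cannot separate direct regulation from shared upstream control, which is why the study integrates binding profiles, chromatin accessibility, and motif predictions on top of the expression-derived edges.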

14:30-14:45
Session: Bioinformatics of microbes and microbiomes / Agrobiological omics
Exploring Genetic Resources of Wild Coffea Species for the Future of Coffee Breeding
Confirmed Presenter: Laura Natalia González García, Université de Montpellier - Universidad de los Andes, France

Room: Room B502
Format: In Person


Authors List:

  • Laura Natalia González García, Université de Montpellier - Universidad de los Andes, France
  • Jorge Duitama, Universidad de los Andes, Colombia
  • Douglas Silvia Domingues, Escola Superior de Agricultura "Luiz de Queiroz"/USP, Brazil
  • Romain Guyot, Institut de Recherche pour le Développement, France

Presentation Overview:

Coffee production is vital to the global economy, supporting millions of livelihoods worldwide, especially in Colombia and Brazil. Commercial varieties belong to two species, Coffea arabica (60%) and Coffea canephora (40%), which are preferred for their superior bean quality. The sustainability of coffee production faces severe threats from climate change because the crop is sensitive to temperature fluctuations and changing CO2 regimes. By 2050, global temperatures are expected to rise by 2°C along with CO2 levels, increasing the risk of drought, biodiversity loss, reduced water availability, and decreased crop yields, which highlights the need for adaptive strategies. The genetic diversity of wild Coffea species from Africa can be used to tackle some of these challenges. Among the 130 identified species, many exhibit traits relevant to breeding, such as disease resistance, drought tolerance, and superior cup quality. However, nearly 60% of these species are endangered due to habitat loss and other anthropogenic factors. Here, we analyzed whole-genome sequencing (WGS) data from 68 Coffea species, which showed high genome conservation across the genus, with a mapping rate above 50% against a C. canephora reference genome. This dataset allowed us to build a database of eight million SNPs for the genus and to identify distinct phylogeographic groups among the species. We also compared complete genomes of C. humblotiana, a naturally caffeine-free species; C. canephora; C. eugenioides; and two domesticated and two wild C. arabica genomes. Despite 8 million years of diversification within Coffea, the species show high conservation of gene synteny and share a large core genome of 10,500 gene families. In addition, we identified a correlation between the genome size of each species and its transposable element (TE) content. Genome sizes ranged from 400 Mbp to 1,300 Mbp, with the largest genome containing active long terminal repeat (LTR) retroelements.
These data were also fundamental for understanding the evolution of genes, especially those related to environmental adaptation (e.g., drought resistance) and cup quality (e.g., caffeine content and terpene synthesis). Overall, our results support the chromosomal stability of the genus, which is crucial for introducing high-quality genes or mutations into cultivated crops through breeding programs.
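The 10,500-family core genome reported above is, computationally, a set intersection over per-genome gene-family assignments. A minimal sketch, assuming gene families have already been assigned by an orthology-clustering step that the abstract does not describe; the species keys reuse names from the abstract, while the family IDs are hypothetical placeholders.

```python
from typing import Dict, Set, Tuple


def core_and_pan(families: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Return (core, pan): gene families found in every genome and in at least one."""
    sets = list(families.values())
    if not sets:
        return set(), set()
    return set.intersection(*sets), set.union(*sets)


def accessory(families: Dict[str, Set[str]], core: Set[str]) -> Dict[str, Set[str]]:
    """Per-genome families outside the core (the accessory, or dispensable, part)."""
    return {name: fams - core for name, fams in families.items()}


# Hypothetical family IDs; real inputs would come from an orthology-clustering step.
genomes = {
    "C_canephora": {"F1", "F2", "F3", "F4"},
    "C_eugenioides": {"F1", "F2", "F3", "F5"},
    "C_humblotiana": {"F1", "F2", "F6"},
}
core, pan = core_and_pan(genomes)
print(sorted(core), len(pan))  # ['F1', 'F2'] 6
```

A large core relative to the pan-genome is consistent with the high synteny and chromosomal stability the authors describe, although the core size also depends on how strictly families are clustered.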

14:45-15:00
Session: Bioinformatics of microbes and microbiomes / Agrobiological omics
Structural variation between wild and domesticated gene pools of Lima bean
Confirmed Presenter: Natalia Duarte, Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia

Room: Room B502
Format: In Person


Authors List:

  • Natalia Duarte, Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
  • Tatiana Garcia, Facultad de Ciencias Agrarias, Universidad Nacional de Colombia, Bogotá, Colombia
  • Juan Pablo Londoño, Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
  • Paula Siaucho, Biotecnología y Genética SAS, Biotecgen, Bogotá, Colombia
  • Edwin Bautista, Biotecnología y Genética SAS, Biotecgen, Bogotá, Colombia
  • Santiago Jimenez, Biotecnología y Genética SAS, Biotecgen, Bogotá, Colombia
  • Liza Romero, Biotecnología y Genética SAS, Biotecgen, Bogotá, Colombia
  • Fabio Herrera, Grupo de Diseño de Productos y Procesos, Dept. of Chemical and Food Engineering, Universidad de los Andes, Bogotá, Colombia
  • Jessica Ospina, Facultad de Ciencias Agrarias, Universidad Nacional de Colombia Sede Palmira, Palmira, Colombia
  • Daniela Lozano, Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
  • Laura Gonzalez, Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
  • Alejandro Reyes, Grupo de Investigación en biología computacional y ecología microbiana, Dep. Ciencias Biológicas, U. de Los Andes, Bogotá, Colombia
  • Andres Gonzalez, Grupo de Diseño de Productos y Procesos, Dept. of Chemical and Food Engineering, Universidad de los Andes, Bogotá, Colombia
  • Maria Isabel Chacón, Facultad de Ciencias Agrarias, Universidad Nacional de Colombia, Bogotá, Colombia
  • Jorge Duitama, Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia

Presentation Overview:

The Lima bean (Phaseolus lunatus L.) is one of the Phaseolus species domesticated in the Americas. It has undergone at least two distinct domestication events, giving rise to the Mesoamerican and Andean lineages. Our research groups recently assembled a reference genome for the species, which was used to find associations between genomic elements and domestication-related traits and to investigate the evolution of Phaseolus genomes with comparative genomics tools. To further study the domestication process, in this work we sequenced one domesticated accession from the Andean gene pool (G25900) and three wild accessions from the two Lima bean gene pools (G25228, G25231, and G25915) using long-read technologies (PacBio and Oxford Nanopore, respectively). Between 17 Gbp and 25 Gbp were sequenced per accession, for a depth of 30x to 50x. For G25900, we assembled a chromosome-level genome with a total size of 598 Mbp, of which 527 Mbp were assembled into chromosomes. Of 6,366 genes conserved in Fabales, 98.7% could be identified in this assembly. Various transposable elements were annotated, covering 48% of the genome. A total of 28,542 genes were annotated in the remaining 52% of the genome. For the wild accessions, reads were mapped to the genomes of domesticated accessions from the same gene pool, and structural variants were identified. We obtained 18,882 insertions (INS) and 24,787 deletions (DEL) in G25228, 23,939 INS and 30,799 DEL in G25231, and 17,008 INS and 24,979 DEL in G25915. We identified 1,564 genes affected by deletions in accession G25228 and 2,157 in accession G25231.
The genomic variation data in this study will allow a deeper investigation of the genetic variation related to the phenotypic traits selected during Lima bean domestication.
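Once reads from a wild accession have been aligned and structural variants called, the per-accession INS/DEL tallies and the number of genes affected by deletions reduce to counting and interval overlap. A minimal sketch, assuming SV calls and gene models have already been parsed into simple tuples (in practice they would come from a VCF and a GFF annotation; the SV caller is not named in the abstract). Accession IDs are taken from the abstract; chromosome names, coordinates, and gene IDs are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# (accession, chromosome, start, end, type); 0-based, half-open coordinates.
SV = Tuple[str, str, int, int, str]
# (gene_id, chromosome, start, end) in the same coordinate convention.
Gene = Tuple[str, str, int, int]


def count_sv_types(svs: List[SV]) -> Dict[str, Counter]:
    """Count structural variants by type (e.g., INS, DEL) for each accession."""
    counts: Dict[str, Counter] = {}
    for acc, _chrom, _start, _end, svtype in svs:
        counts.setdefault(acc, Counter())[svtype] += 1
    return counts


def genes_hit_by_deletions(svs: List[SV], genes: List[Gene]) -> Dict[str, Set[str]]:
    """Genes overlapping at least one deletion, per accession (naive all-pairs scan)."""
    hits: Dict[str, Set[str]] = {}
    for acc, chrom, start, end, svtype in svs:
        if svtype != "DEL":
            continue
        for gene_id, g_chrom, g_start, g_end in genes:
            # Two half-open intervals overlap when each starts before the other ends.
            if g_chrom == chrom and start < g_end and g_start < end:
                hits.setdefault(acc, set()).add(gene_id)
    return hits


svs = [
    ("G25228", "Chr01", 100, 500, "DEL"),
    ("G25228", "Chr01", 900, 950, "INS"),
    ("G25231", "Chr02", 0, 50, "DEL"),
]
genes = [("geneA", "Chr01", 450, 800), ("geneB", "Chr01", 600, 700), ("geneC", "Chr02", 60, 90)]
print(dict(count_sv_types(svs)["G25228"]))  # {'DEL': 1, 'INS': 1}
print(genes_hit_by_deletions(svs, genes))   # {'G25228': {'geneA'}}
```

The all-pairs scan is only for illustration; with tens of thousands of deletions and genes per accession, an interval index or a tool such as `bedtools intersect` would be the practical choice.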

15:00-15:15
Session: Bioinformatics of microbes and microbiomes / Agrobiological omics
Near-infrared spectroscopy and machine learning for accurate estimation of nitrogen content in intact seeds of common bean
Confirmed Presenter: Tatiana Garcia Navarrete, Departamento de Agronomía, Facultad de Ciencias Agrarias, Universidad Nacional de Colombia, Bogotá, Colombia

Room: Room B502
Format: In Person


Authors List:

  • Tatiana Garcia Navarrete, Departamento de Agronomía, Facultad de Ciencias Agrarias, Universidad Nacional de Colombia, Bogotá, Colombia
  • Darren Drewry, Department of Food, Agricultural and Biological Engineering, Ohio State University, Ohio, United States
  • Luis Guillermo Santos, Genetic Resources Program, Centro Internacional de Agricultura Tropical (CIAT), Cali, Colombia
  • Mónica Carvajal-Yepes, Genetic Resources Program, Centro Internacional de Agricultura Tropical (CIAT), Cali, Colombia
  • Jorge Duitama, Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
  • Peter Wenzl, Genetic Resources Program, Centro Internacional de Agricultura Tropical (CIAT), Cali, Colombia
  • Maria Isabel Chacón-Sanchez, Departamento de Agronomía, Facultad de Ciencias Agrarias, Universidad Nacional de Colombia, Bogotá, Colombia

Presentation Overview:

A major bottleneck for germplasm banks is the high economic cost and human effort needed to phenotype large collections with conventional methods. Lack of knowledge about the diversity held in genebanks limits their use and conservation. High-throughput, non-destructive phenotyping technologies such as near-infrared spectroscopy (NIRS) offer a way to overcome this problem. However, NIR reflectance spectra taken from whole samples (for example, intact seeds) are inherently complex, and finding correlations between thousands of spectral features and compounds of interest can be difficult. Various machine learning (ML) techniques have been proposed to uncover patterns in the complex data generated by NIRS. Although the CIAT germplasm bank holds 37,936 bean accessions, many of them lack seed nutritional quality data because of the high cost and time required.
This study presents accurate ML predictive models for fast screening of seed nutritional compounds from NIRS data in the CIAT common bean collection. NIRS spectral signatures were collected from intact seeds of 1,754 common bean (Phaseolus vulgaris L.) accessions from CIAT (1,394 domesticated and 360 wild) to build predictive models for nitrogen (N) content. Five spectral signatures were captured for each accession using an ASD FieldSpec4 spectroradiometer (range: 350-2,500 nm). Total N content was measured by the Kjeldahl method for a subset of 300 domesticated and 100 wild accessions. Three accession groups were analyzed separately (domesticated only, domesticated + wild, and wild only) to develop predictive models for N content using different wavelength ranges (350-2,500 nm, 750-2,500 nm, and interval PLS). For each group, combinations of thirteen data pretreatments were evaluated and modeled with partial least squares (PLS). The best combinations for domesticated accessions were raw data + PLS, SG (Savitzky–Golay) + PLS, and SNV (standard normal variate)-SG + PLS, which showed a concordance correlation coefficient (CCC) above 0.8 and a root mean squared error of prediction (RMSEp) of 2.2. For wild samples, the best combination was SG-1D(W11) + PLS, with a CCC of 0.7 and an RMSEp of 2.1. Finally, for the complete dataset, four combinations showed metrics similar to those of the domesticated dataset (CCC = 0.8 and RMSEp = 2.2). These results indicate that NIRS can estimate the N content of common bean seeds non-destructively. This low-cost approach could be used to screen tens of thousands of common bean accessions in the CIAT genebank to identify superior materials for breeding programs.
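The SNV pretreatment and the two evaluation metrics named above can be sketched in plain Python. SNV and Lin's CCC follow their standard definitions; conventions differ on whether SNV divides by the sample or population standard deviation, and this sketch assumes the sample SD. The Savitzky–Golay and PLS steps are omitted; in practice they are typically taken from a library, for example `scipy.signal.savgol_filter` and scikit-learn's `PLSRegression`. All input values below are hypothetical.

```python
import math
from typing import List, Sequence


def snv(spectrum: Sequence[float]) -> List[float]:
    """Standard normal variate: center one spectrum and scale it to unit (sample) SD."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in spectrum) / (n - 1))
    return [(v - mean) / sd for v in spectrum]


def ccc(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Lin's concordance correlation coefficient (population moments)."""
    n = len(observed)
    mo, mp = sum(observed) / n, sum(predicted) / n
    vo = sum((o - mo) ** 2 for o in observed) / n
    vp = sum((p - mp) ** 2 for p in predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted)) / n
    return 2 * cov / (vo + vp + (mo - mp) ** 2)


def rmsep(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error of prediction on held-out samples."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


print(snv([1.0, 2.0, 3.0]))                 # [-1.0, 0.0, 1.0]
print(round(ccc([1, 2, 3], [2, 3, 4]), 3))  # 0.571
print(rmsep([1, 2, 3], [2, 3, 4]))          # 1.0
```

The second call shows why CCC is reported instead of a plain correlation: the predictions are perfectly correlated with the observations (Pearson r = 1) but offset by one unit, so CCC drops to about 0.57 while RMSEp captures the size of the offset.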

15:15-15:30
Session: Bioinformatics of microbes and microbiomes / Agrobiological omics
Unlocking the future of sustainable agriculture: Bioinformatics for designing novel and targeted RNAi-based biopesticides for crop protection
Confirmed Presenter: Valeria Velasquez-Zapata, Greenlight Biosciences, United States

Room: Room B502
Format: In Person


Authors List:

  • Valeria Velasquez-Zapata, Greenlight Biosciences, United States
  • Upendra Devisetty, Greenlight Biosciences, United States
  • Emma De Neef, Greenlight Biosciences, United States
  • Eric Gordon, Greenlight Biosciences, United States
  • Kenneth Narva, Greenlight Biosciences, United States
  • Laurent Mezin, Greenlight Biosciences, United States
  • Petter Mc Cahon, Greenlight Biosciences, United States
  • Kenneth Witwer, Johns Hopkins University, United States
  • Krishnakumar Sridharan, Greenlight Biosciences, United States

Presentation Overview:

Bioinformatics has emerged as a pivotal field in biology, medicine, and agriculture, revolutionizing our understanding of complex biological systems. Its ability to analyze and interpret extensive biological data has facilitated and unified research across omics, machine learning and systems biology studies. As an integral component of biotechnological innovations, bioinformatics has applications from early stages of discovery to product development. An emerging application of bioinformatics, known as “Regulatory Bioinformatics,” supports regulatory studies, especially in the development and approval of new products and modes of action.

Currently, plant protection methods predominantly rely on conventional chemical pesticides, which pose significant risks to human health and the environment. This has created an urgent need for sustainable solutions with improved safety profiles for humans and non-target organisms (NTOs). RNA interference (RNAi), a natural defense mechanism in eukaryotes that silences viral genes in a sequence-specific manner, has recently been used to develop a new class of topical, sprayable double-stranded RNA (dsRNA)-based biopesticides. These biopesticides target essential genes of pests, offering a promising, safer alternative to traditional chemical pesticides.

Bioinformatic prediction of off-target effects on humans and NTOs is a critical step in the regulatory approval of externally applied dsRNA-based biopesticides. Despite the importance of this analysis, no universally applicable bioinformatics guidelines exist for the risk assessment of dsRNA-based biopesticides. To address this gap, we developed a bioinformatics framework for the human health risk assessment of dsRNA-based biopesticides. To predict off-target effects in humans, the framework combines advanced bioinformatics tools, a mismatch tolerance that accounts for sequence divergence between the dsRNA and unintended targets, and human siRNA criteria that quantify the likelihood of gene silencing in the presence of mismatches; its predictions are supported by experimental evidence. The framework has been used successfully to evaluate the potential risks of Calantha™, the first EPA-approved externally applied dsRNA-based biopesticide. By providing a detailed and systematic method for assessing off-target effects, the framework strengthens the regulatory process and supports the safe development and deployment of dsRNA-based crop protection products.
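A minimal sketch of a mismatch-tolerant screening step of this kind, assuming DNA-alphabet input, 21-nt windows (a typical siRNA length) and a hypothetical tolerance of two mismatches. Both dsRNA strands are scanned because either can give rise to a guide strand. The human siRNA criteria the framework uses to weigh mismatches are not reproduced here, and a genome-scale search would use an indexed aligner rather than this brute-force loop.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA-alphabet sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(a, b):
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def off_target_hits(dsrna, transcript, k=21, max_mismatches=2):
    """Return (dsrna_pos, transcript_pos, n_mismatches, strand) for every k-mer
    of either dsRNA strand matching the transcript within the tolerance."""
    hits = []
    for strand, seq in (("sense", dsrna), ("antisense", revcomp(dsrna))):
        for i in range(len(seq) - k + 1):
            query = seq[i:i + k]
            for j in range(len(transcript) - k + 1):
                mm = mismatches(query, transcript[j:j + k])
                if mm <= max_mismatches:
                    hits.append((i, j, mm, strand))
    return hits
```

Each hit records where a candidate siRNA-length window matches and how many mismatches it carries, so hits can then be ranked or filtered by whatever silencing criteria apply.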

15:30-15:45
Session: Bioinformatics of microbes and microbiomes / Agrobiological omics
Contribution of microbial communities in the processes of mercury methylation and demethylation: a metagenomic approach to the issue of mercury contamination in Colombia.
Confirmed Presenter: Lina Fernanda Sánchez Aya, UNIVERSIDAD DE PAMPLONA, Colombia

Room: Room B502
Format: Live Stream


Authors List:

  • Lina Fernanda Sánchez Aya, UNIVERSIDAD DE PAMPLONA, Colombia
  • Daniel Cerqueda García, Instituto de Ecología, A.C. – INECOL., Mexico
  • Luis Parmenio Suescún Bolivar, UNIVERSIDAD DE CARTAGENA, Colombia

Presentation Overview:

Mercury, especially in its toxic form as methylmercury, is a significant environmental contaminant due to its high toxicity and capacity for bioaccumulation in aquatic and terrestrial ecosystems. In 2023, the National Prosecutor's Office identified Colombia as the highest per capita emitter of mercury and the third-largest global emitter. This highlights the urgent need to investigate microbial communities and the genes responsible for mercury transformation, particularly the mer (demethylation) and hgc (methylation) operons.

This study explores the diversity of these operons across Colombian ecosystems and their impact on mercury transformation. To this end, we searched the NCBI database for raw metagenomic reads labeled as Whole Genome Sequencing (WGS), sequenced on the Illumina platform and available in the Sequence Read Archive (SRA). Protein sequences of the mer (merT, merP, merR, merD, merI, merC, merF, merA, and merB) and hgc (hgcA and hgcB) genes were retrieved from KEGG and used as references to screen the metagenomic data.

After quality filtering, a metagenomic assembly was performed to obtain a detailed view of the genomes present. Metagenome-assembled genomes (MAGs) were reconstructed and quantified to assess their relative abundance, followed by functional and taxonomic annotation. MAGs were then grouped into KEGG-based orthologous groups, providing information on conserved functions and evolutionary processes.
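One common way to express MAG relative abundance, assumed here for illustration (the abstract does not state the normalization used), is to divide the number of reads mapped to each MAG by its length and rescale so the values sum to 1. The function and input names are hypothetical.

```python
def relative_abundance(mapped_reads, genome_lengths):
    """Length-normalised relative abundance of each MAG (values sum to 1).

    mapped_reads: {mag_id: number of reads mapped to the MAG}
    genome_lengths: {mag_id: total assembled length in bp}
    """
    # Reads per base pair, so longer genomes are not favoured.
    density = {mag: mapped_reads[mag] / genome_lengths[mag] for mag in mapped_reads}
    total = sum(density.values())
    if total == 0:
        raise ValueError("no reads mapped to any MAG")
    return {mag: d / total for mag, d in density.items()}
```

Length normalization matters because a larger genome attracts more reads at the same cell abundance.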

From 14,745,349 contigs, 421 MAGs were reconstructed, of which 175 had at least 40% completeness and at most 10% contamination; these were classified into 36 phyla, 56 classes, and 124 genera. Samples came from 74 BioProjects across various regions of Colombia, including tropical peat soils and water from urban and oil-affected environments. The microbial communities were dominated by archaea and bacteria specific to each environment.

Of the 175 MAGs, 16 were identified as candidates for mercury biotransformation: 15 carrying the hgc operon and 1 carrying the mer operon. The hgc operon was found mainly in sulfate-reducing bacteria from peat soils and from water samples under sulfate-reducing conditions, whereas the mer operon was found in a MAG of the genus Desulfomonile from soils in Meta. These findings suggest that anoxic conditions and dissolved organic matter strongly influence mercury methylation and demethylation, and they offer new perspectives for addressing mercury contamination in Colombia.
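The quality cut-offs and operon-based candidate calls above can be expressed as a simple filter. This sketch assumes each MAG has already been annotated into a set of gene names; calling a methylation candidate when both hgcA and hgcB are present, and a demethylation candidate when merB (organomercurial lyase) is present, are illustrative rules, not the study's exact criteria. Field names are hypothetical.

```python
def passes_quality(mag, min_completeness=40.0, max_contamination=10.0):
    """Completeness/contamination filter using the thresholds quoted above."""
    return (mag["completeness"] >= min_completeness
            and mag["contamination"] <= max_contamination)


def mercury_candidates(mags):
    """Map MAG id -> list of predicted mercury roles, for MAGs passing QC."""
    candidates = {}
    for mag in mags:
        if not passes_quality(mag):
            continue
        genes = mag["genes"]
        roles = []
        if {"hgcA", "hgcB"} <= genes:  # both genes of the methylation cluster
            roles.append("methylation (hgc)")
        if "merB" in genes:  # organomercurial lyase
            roles.append("demethylation (mer)")
        if roles:
            candidates[mag["id"]] = roles
    return candidates
```

Applying the quality filter first keeps fragmented or contaminated bins from contributing spurious operon calls.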

15:45-16:00
Session: Bioinformatics of microbes and microbiomes / Agrobiological omics
Proceedings Presentation: Computational Screening of Promising Epitopes from Mycobacterium tuberculosis’ ESX genes for a Multi-Epitope Vaccine Design
Confirmed Presenter: Olatunde Adeoti, Landmark University, Nigeria

Room: Room B502
Format: Live Stream


Authors List:

  • Olatunde Adeoti, Landmark University, Nigeria
  • Emenike Irokanulo, Landmark University, Nigeria
  • Ndako James, Landmark University, Nigeria

Presentation Overview:

Tuberculosis remains a significant global health threat, and more effective vaccines against Mycobacterium tuberculosis (Mtb) infection are needed. In this study, we applied a computational screening approach to identify promising epitopes from the ESX gene family of Mtb for the design of a multi-epitope vaccine (MEV). Selection was based on antigenicity, allergenicity, subcellular localization, gene essentiality, and homology with Homo sapiens, together with the transmembrane (TM) helices, isoelectric point, and molecular weight of the vaccine construct. This yielded 19 B-cell epitopes (20-mers), 49 cytotoxic T-cell epitopes (9-mers), and 11 helper T-cell epitopes (15-mers), whose ability to induce interleukin-10 (IL-10) and interferon-gamma (IFN-γ) was also confirmed in silico. The resulting vaccine construct comprised an adjuvant (HEYGAEALERAG) and PADRE (AKVAAWTLKAAAC) as a universal linker at both the N- and C-terminal ends. The selected epitopes were combined into a multi-epitope vaccine candidate with the potential to elicit an immune response against Mtb. In silico cloning showed that the optimized construct can be inserted into a self-replicating vector for expression in an E. coli host. Immune simulation in response to the predicted vaccine antigen showed strong primary, secondary, and tertiary immune responses, as well as high interferon and interleukin levels. The MEV construct showed robust interactions with the TLR-4 (Toll-like receptor 4), MHC I, and MHC II immune receptors. The predicted binding strength, atomic-level radiance, and binding energy estimates suggested moderate stability of the MEV candidate. This computational approach provides a rational and efficient strategy for developing novel tuberculosis vaccines and offers promising avenues for further experimental validation and clinical translation.
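The screening criteria listed above operate at two levels: some apply to source proteins (essentiality, localization, human homology) and others to individual epitopes (antigenicity, allergenicity). A minimal sketch of that two-level filter follows; the field names, the set of accepted localizations and the antigenicity cut-off of 0.4 are hypothetical placeholders, and in practice each score would come from a dedicated prediction tool.

```python
def select_proteins(proteins, allowed_locations=("extracellular", "cell wall", "membrane")):
    """Keep essential, suitably localized proteins with no human homolog."""
    return [
        p["id"] for p in proteins
        if p["essential"]
        and p["location"] in allowed_locations
        and not p["human_homolog"]
    ]


def select_epitopes(epitopes, min_antigenicity=0.4):
    """Keep antigenic, non-allergenic epitopes (scores from external predictors)."""
    return [
        e["peptide"] for e in epitopes
        if e["antigenicity"] >= min_antigenicity and not e["allergen"]
    ]
```

Filtering proteins first reduces the number of epitopes that need to be scored, and excluding human homologs lowers the risk of autoimmune cross-reactivity.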

Thursday, November 14th
10:30-10:45
Session: Biomedical omics
Transcriptomic analysis of Trypanosoma cruzi reveals the role of PUF3 RNA binding protein in the regulation of mitochondrial genes encoded in the nucleus
Confirmed Presenter: Geysson Javier Fernandez, Grupo Biología y Control de Enfermedades Infecciosas, Universidad de Antioquia UdeA, Medellín, Colombia

Room: Room B501
Format: In Person


Authors List:

  • Ana Maria Mejia Jaramillo, Grupo Biología y Control de Enfermedades Infecciosas, Universidad de Antioquia UdeA, Medellín, Colombia
  • Geysson Javier Fernandez, Grupo Biología y Control de Enfermedades Infecciosas, Universidad de Antioquia UdeA, Medellín, Colombia
  • Hader Ospina, Grupo Biología y Control de Enfermedades Infecciosas, Universidad de Antioquia UdeA, Medellín, Colombia
  • Omar Triana Chavez, Grupo Biología y Control de Enfermedades Infecciosas, Universidad de Antioquia UdeA, Medellín, Colombia

Presentation Overview:

The RNA-binding PUF proteins are post-transcriptional regulators found throughout eukaryotes. Because gene expression in Trypanosoma cruzi is controlled mainly at the post-transcriptional level, we characterized the PUF3 protein by knocking out and overexpressing its gene in T. cruzi epimastigotes and studied the transcriptome of these parasites. RNA-seq analyses of both genotypes showed significant changes in the number of regulated transcripts compared with wild-type parasites: 238 differentially expressed genes in the knockout (ΔTcPuf3) and 187 in the overexpressor (pTEXTcPuf3). In the knockout, most genes were negatively regulated (166 of 238). In contrast, in the overexpressor, positively regulated genes predominated (149 of 170). Additionally, predicting the subcellular location of the differentially expressed genes revealed a notable representation of mitochondrial genes: 27 for ΔTcPuf3 and 23 for pTEXTcPuf3. To identify direct PUF3 targets among these genes, we searched their 3'-UTRs for the PUF3 binding motif UGUAYAUW. Approximately 33% of the genes with a PUF3 binding site had a mitochondrial localization in the knockout condition, compared with 17.39% in the overexpression condition. Modulating PUF3 expression thus regulates largely non-overlapping sets of genes, which implies that the effects of PUF3 on gene regulation depend on its intracellular concentration. In conclusion, our findings indicate that TcPUF3 regulates the expression of targeted mitochondria-associated genes, suggesting a potential role in energy production, cellular metabolism, or other mitochondrial functions.
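To search 3'-UTRs for the degenerate motif UGUAYAUW, the IUPAC codes can be expanded into a regular expression (Y = C or U, W = A or U). This sketch uses a lookahead so that overlapping occurrences are reported, and converts T to U so DNA-alphabet UTR sequences also work; it illustrates the idea and is not the authors' code.

```python
import re

# IUPAC nucleotide codes expanded for RNA (T is treated as U).
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U", "T": "U",
    "R": "[AG]", "Y": "[CU]", "S": "[CG]", "W": "[AU]", "K": "[GU]", "M": "[AC]",
    "B": "[CGU]", "D": "[AGU]", "H": "[ACU]", "V": "[ACG]", "N": "[ACGU]",
}


def motif_pattern(motif):
    """Compile a degenerate motif into a regex; the lookahead allows overlaps."""
    body = "".join(IUPAC_RNA[base] for base in motif.upper())
    return re.compile(f"(?=({body}))")


def find_motif(utr, motif="UGUAYAUW"):
    """0-based start positions of the motif in a 3'-UTR (DNA or RNA alphabet)."""
    rna = utr.upper().replace("T", "U")
    return [m.start() for m in motif_pattern(motif).finditer(rna)]
```

For example, `find_motif("AAUGUACAUAGG")` reports a site at position 2 (UGUACAUA). A gene would be flagged as a putative direct target when its 3'-UTR returns at least one position.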

10:45-11:00
Session: Biomedical omics
Deep Learning for Phenotype Prediction from Single-Cell Expression Data
Confirmed Presenter: Jordi Martorell-Marugán, Pfizer-University of Granada-Junta de Andalucía Centre for Genomics and Oncological Research (GENYO), Spain

Room: Room B501
Format: Live Stream


Authors List:

  • Jordi Martorell-Marugán, Pfizer-University of Granada-Junta de Andalucía Centre for Genomics and Oncological Research (GENYO), Spain
  • Raúl López-Domínguez, Pfizer-University of Granada-Junta de Andalucía Centre for Genomics and Oncological Research (GENYO), Spain
  • Juan Antonio Villatoro-García, University of Granada, Spain
  • Daniel Toro-Dominguez, Karolinska Institutet, Spain
  • Marco Chierici, Fondazione Bruno Kessler, Italy
  • Giuseppe Jurman, Fondazione Bruno Kessler, Italy
  • Pedro Carmona-Sáez, University of Granada, Spain

Presentation Overview:

Single-cell RNA sequencing (scRNA-Seq) is a cutting-edge technology that quantifies gene expression in thousands of individual cells from each sample, offering unprecedented insight into cellular heterogeneity. Analyzing scRNA-Seq data is challenging because of its volume and sparsity, and predicting sample phenotypes from single-cell expression remains an unresolved problem. In this study, we implemented a deep learning workflow, called singleDeep, that exploits scRNA-Seq datasets to predict sample phenotypes and achieves accurate predictions for complex phenotypes. Beyond the primary classification objective, singleDeep also yielded biological insights, including the cell types that may be particularly relevant to the studied phenotype and the key genes that drive functional differences between sample groups within each cell population.
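The abstract does not detail how singleDeep turns cell-level outputs into a sample-level call. One simple scheme, assumed here purely for illustration, averages a classifier's per-cell case probabilities within each cell type and then takes a majority vote across cell types; the per-cell-type votes also hint at which populations drive the prediction.

```python
from collections import defaultdict
from statistics import mean


def sample_call(cell_predictions, threshold=0.5):
    """Aggregate per-cell case probabilities for ONE sample (hypothetical scheme).

    cell_predictions: iterable of (cell_type, p_case) pairs.
    Returns the sample label and the per-cell-type votes.
    """
    by_type = defaultdict(list)
    for cell_type, p_case in cell_predictions:
        by_type[cell_type].append(p_case)
    # One vote per cell type: is the mean probability at or above the threshold?
    votes = {ct: mean(ps) >= threshold for ct, ps in by_type.items()}
    n_case = sum(votes.values())
    label = "case" if n_case > len(votes) / 2 else "control"
    return label, votes
```

Voting per cell type rather than per cell keeps abundant populations from dominating the decision.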

We predicted the diagnosis of systemic lupus erythematosus (SLE) versus healthy controls (HCs) in two independent studies, outperforming other methods in both internal and external validation. Classical and nonclassical monocytes were the cell populations with the highest performance, consistent with previous findings identifying them as primary contributors to the interferon signature. Furthermore, we identified genes of high relevance consistently across multiple cell types, particularly interferon signature genes such as IFI44L, IFI44, MX1, and XAF1, whose high expression is strongly correlated with SLE.

We also predicted the severity and infection status of COVID-19 patients. For the multiclass prediction of severity, an accuracy of 61.63% was achieved, while infection status (case/control) was correctly predicted for 87.21% of samples. Analysis of gene contributions across cell types revealed that proinflammatory genes such as ISG15 are highly important in SARS-CoV-2 infection.

Finally, we predicted dementia status in a cohort of Alzheimer’s disease (AD) patients and HCs, achieving an accuracy of 71.91%. Importantly, singleDeep identified several genes previously linked to AD as top contributors to the phenotype. For instance, APOE, a strong genetic risk factor for AD, was identified as a top contributor to the healthy state in specific cell types, especially astrocytes. These findings are consistent with previous transcriptomic studies of AD.

In conclusion, singleDeep is an innovative tool that will enable the exploitation of this new type of transcriptomic data.

11:00-11:15
Session: Biomedical omics
Dissection of the Tumor Microenvironment Using Spatial Transcriptomics and Deep Learning
Confirmed Presenter: Karla Paniagua, UTSA, United States

Room: Room B501
Format: Live Stream


Authors List:

  • Karla Paniagua, UTSA, United States
  • Yufang Jin, University of Texas at San Antonio, United States
  • Mario Flores, UTSA, United States

Presentation Overview:

Understanding the tumor microenvironment (TME) is crucial for elucidating cancer progression, metastasis, and treatment responses. Despite advances in spatial transcriptomics (ST), integrating multimodal data to characterize the TME remains a significant challenge. This study introduces TG-ME, a novel computational framework that integrates a Transformer model with a Graph Variational Autoencoder (GraphVAE) to dissect microenvironmental niches in spatial transcriptomics data.
TG-ME addresses the limitations of existing methods by combining gene expression profiles, spatial data, and morphological images through an advanced normalization method to achieve a comprehensive analysis of the TME. TG-ME’s framework employs a Transformer model to identify gene interactions, a convolutional neural network (CNN) to extract morphological image features, and a graph autoencoder to incorporate spatial information, facilitating clustering and profiling of microenvironmental features.
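Graph-based models of spatial transcriptomics need an adjacency structure over spots before any graph autoencoder can use spatial information. A common choice, assumed here (the abstract does not say how TG-ME builds its graph), is a symmetric k-nearest-neighbour graph on spot coordinates; `k=6` loosely matches the hexagonal layout of Visium spots and is a hypothetical default.

```python
import math


def knn_adjacency(coords, k=6):
    """Symmetric k-nearest-neighbour graph over spot coordinates.

    coords: list of (x, y) tuples. Returns a list of neighbour sets.
    """
    adjacency = [set() for _ in coords]
    for i, p in enumerate(coords):
        # Distances from spot i to every other spot, nearest first.
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(coords) if j != i)
        for _, j in ranked[:k]:
            adjacency[i].add(j)
            adjacency[j].add(i)  # symmetrise so the graph is undirected
    return adjacency
```

The quadratic all-pairs loop is fine for a sketch; large sections would use a spatial index such as a k-d tree instead.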
The integration of multimodal data in TG-ME enhances its ability to characterize the TME with greater accuracy. By leveraging spatial data, the model captures the physical arrangement of cells, while morphological data provides insights into cellular structures. Gene expression profiles add another layer of complexity, allowing TG-ME to highlight specific molecular signatures within the spatial context.
Our approach was validated using three benchmark datasets: the human dorsolateral prefrontal cortex (DLPFC), a 10X Visium breast cancer dataset, and a NanoString CosMx dataset of non-small cell lung cancer (NSCLC). TG-ME demonstrated superior performance in identifying spatial niches, achieving Adjusted Rand Index (ARI) values of 0.54 and Normalized Mutual Information (NMI) scores of 0.66, surpassing the performance of existing methods.
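ARI and NMI compare predicted niche labels with reference annotations and are invariant to how the clusters are numbered. The pure-Python sketch below follows the standard definitions (ARI from pair counts in the contingency table; NMI with arithmetic-mean normalization, the default in scikit-learn's `normalized_mutual_info_score`); in practice one would call `sklearn.metrics.adjusted_rand_score` and `normalized_mutual_info_score` directly.

```python
from collections import Counter
from math import comb, log


def _pairs(counts):
    """Number of unordered pairs within each group, summed."""
    return sum(comb(c, 2) for c in counts.values())


def adjusted_rand_index(labels_true, labels_pred):
    """ARI: pair agreement corrected for chance."""
    n = len(labels_true)
    sum_ij = _pairs(Counter(zip(labels_true, labels_pred)))
    sum_a = _pairs(Counter(labels_true))
    sum_b = _pairs(Counter(labels_pred))
    expected = sum_a * sum_b / comb(n, 2)
    maximum = (sum_a + sum_b) / 2
    if maximum == expected:  # degenerate case, e.g. one cluster on both sides
        return 1.0
    return (sum_ij - expected) / (maximum - expected)


def _entropy(counts, n):
    return -sum(c / n * log(c / n) for c in counts.values())


def normalized_mutual_info(labels_true, labels_pred):
    """NMI with arithmetic-mean normalization of the two entropies."""
    n = len(labels_true)
    ca, cb = Counter(labels_true), Counter(labels_pred)
    cab = Counter(zip(labels_true, labels_pred))
    mi = sum(c / n * log(c * n / (ca[a] * cb[b])) for (a, b), c in cab.items())
    denom = (_entropy(ca, n) + _entropy(cb, n)) / 2
    return 1.0 if denom == 0 else mi / denom
```

Both scores equal 1 for a perfect match up to relabeling; ARI is near 0 for random labelings, and NMI is 0 when the two labelings are independent.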
Key findings from TG-ME include new insights into the spatial organization of TMEs and the identification of specific niche compositions related to cancer progression. For instance, the analysis of NSCLC samples uncovered distinct microenvironmental signatures associated with poor prognosis and resistance to therapy. These discoveries underscore the heterogeneity of the TME and its critical role in cancer biology.
By uncovering microenvironmental signatures linked to disease prognosis, TG-ME may help guide the development of targeted therapies that improve patient outcomes. Moreover, its integration of multimodal data sets a new standard for TME and spatial transcriptomics data analysis, advancing our understanding of tumor biology.
In conclusion, TG-ME represents a major advancement in spatial transcriptomics, providing a robust framework for the detailed dissection of the TME. Its application across diverse cancer types highlights its versatility and potential to drive significant improvements in cancer diagnosis and treatment.

11:15-11:30
Session: Biomedical omics
Transcriptomic analysis uncovering markers of chronic fatigue syndrome post-chikungunya infection
Confirmed Presenter: Raissa Medina Santos, Conservatoire National des Arts et Metiers, France

Room: Room B501
Format: Live Stream


Authors List:

  • Raissa Medina Santos, Conservatoire National des Arts et Metiers, France
  • Sigrid Le Clerc, Conservatoire National des Arts et Metiers, France
  • Léa Bruneau, Centre Hospitalier Universitaire de La Réunion, France
  • Adrien Maillot, Centre Hospitalier Universitaire de La Réunion, France
  • Taoufik Labib, Conservatoire National des Arts et Metiers, France
  • Myriam Rahmouni, Conservatoire National des Arts et Metiers, France
  • Cécile Lefebvre, Vaccine Research Institute, France
  • Nora El Jahrani, Vaccine Research Institute, France
  • Christine Fontaine, Centre Hospitalier Universitaire de La Réunion, France
  • Christine Payet, Centre Hospitalier Universitaire de La Réunion, France
  • Nathalie Ah-You, Centre Hospitalier Universitaire de La Réunion, France
  • Cécile Chabert, Centre Hospitalier Universitaire de La Réunion, France
  • Corinne Mussard, Centre Hospitalier Universitaire de La Réunion, France
  • Sylvaine Porcherat, Centre Hospitalier Universitaire de La Réunion, France
  • Samir Medjane, Centre Hospitalier Universitaire de La Réunion, France
  • Josselin Noirel, Conservatoire National des Arts et Metiers, France
  • Catherine Marimoutou, Centre Hospitalier Universitaire de La Réunion, France
  • Hakim Hocini, Vaccine Research Institute, France
  • Patrick Gerardin, Centre Hospitalier Universitaire de La Réunion, France
  • Jean-François Zagury, Conservatoire National des Arts et Metiers, France

Presentation Overview:

Background: In 2005-2006, a chikungunya epidemic of unprecedented magnitude struck La Réunion, an island in the southwestern Indian Ocean, causing 300,000 infections. Over time, significant public health concerns emerged because of long-lasting manifestations, particularly chronic rheumatic and chronic fatigue-related syndromes.

Methods: The CHIKGene study was initiated to investigate the pathophysiology underlying chronic chikungunya (CC), particularly chronic fatigue syndrome (CFS). Blood samples were collected from 133 individuals with chronic symptoms, 58 of whom presented persistent chronic fatigue-like symptoms, 11 to 14 years after exposure. RNA-Seq was performed on purified PBMCs, and mRNA expression profiles were compared between individuals with CFS and those with CC.
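The abstract does not name the differential expression tool. As a generic sketch, DEGs are often called by adjusting per-gene p-values with the Benjamini-Hochberg procedure and then applying a fold-change cut-off; the thresholds (adjusted p < 0.05, |log2FC| ≥ 1) and the gene names in the usage example are hypothetical.

```python
def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_degs(genes, log2fc, pvals, alpha=0.05, min_abs_lfc=1.0):
    """Return (gene, direction) for genes passing both FDR and fold-change cut-offs."""
    padj = benjamini_hochberg(pvals)
    return [
        (g, "up" if lfc > 0 else "down")
        for g, lfc, q in zip(genes, log2fc, padj)
        if q < alpha and abs(lfc) >= min_abs_lfc
    ]
```

For example, `call_degs(["geneA", "geneB", "geneC"], [-2.0, 0.1, -1.5], [0.001, 0.9, 0.01])` keeps geneA and geneC as down-regulated. With log2FC computed as CFS relative to CC, "down" means lower expression in CFS.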

Results: Gene expression analysis between CFS and CC individuals identified only three differentially expressed genes (DEGs): EGR1, EGR2, and FOSB, all down-regulated in CFS compared to CC. These genes are linked to mood disorders, schizophrenia and stress-related conditions, indicating a possible connection between psychiatric disorders and chronic chikungunya. Metascape pathway analysis revealed significant immune response-related terms predominantly overexpressed in CC. DisGeNET analysis highlighted mood disorder-related pathways, also involving ACE and PTGS2 genes, both associated with CFS.

Conclusions: This study identified key DEGs and pathways in individuals with CFS linked to CC. Down-regulation of genes associated with mood disorders and immune response alterations in CFS suggests a novel genetic correlation with the chronicity of chikungunya. To our knowledge, this genetic link has not been previously described. These findings emphasize the importance of these genetic markers in understanding, diagnosing and potentially developing therapies for long-term chikungunya effects, particularly its impact on mental health and immune function.

11:30-11:45
Session: Biomedical omics
Machine Learning Synergy Prediction of Antimicrobial Peptides Using the MetaFlow Framework
Confirmed Presenter: Alex Sanchez, Laboratório de Bioinformática - LABINFO, Laboratório Nacional de Computação Científica, Brazil

Room: Room B501
Format: Live Stream


Authors List:

  • Alex Sanchez, Laboratório de Bioinformática - LABINFO, Laboratório Nacional de Computação Científica, Brazil
  • Thiago Souza, Laboratório de Bioinformática - LABINFO, Laboratório Nacional de Computação Científica, Brazil
  • Isabella Guedes, Grupo de Modelagem Molecular de Sistemas Biológicos - GMMSB, Laboratório Nacional de Computação Científica, Brazil
  • Laurent Dardenne, Grupo de Modelagem Molecular de Sistemas Biológicos - GMMSB, Laboratório Nacional de Computação Científica, Brazil
  • Marisa Nicolás, Laboratório de Bioinformática - LABINFO, Laboratório Nacional de Computação Científica, Brazil

Presentation Overview:

Combining antimicrobial agents is a promising approach to counter multidrug-resistant pathogens by leveraging synergism, in which the combined agents exhibit enhanced efficacy. This strategy can reduce the likelihood that subpopulations resistant to all drugs in the combination emerge. However, because the number of possible combinations is vast, experimentally validating candidate synergistic combinations is time-consuming and costly.
Artificial intelligence (AI) can efficiently predict synergistic combinations from existing data. Previous studies using Light Gradient Boosting Machine (LightGBM) models, a tree-based supervised algorithm, have achieved high accuracy in predicting antibiotic synergy. This study explores the synergistic potential of antimicrobial peptides (AMPs) combined with other antimicrobial agents using a LightGBM classifier. We sourced experimental synergy data from the Database of Antimicrobial Activity and Structure of Peptides (DBAASP), using entries that report the Fractional Inhibitory Concentration Index (FICI) as the synergy metric. AMP descriptors were based on physicochemical properties listed in DBAASP, while chemical descriptors for the antimicrobial compounds were calculated using CirPy and RDKit. Additionally, we incorporated Minimal Inhibitory Concentration (MIC) values for both AMPs and antimicrobial compounds against specific pathogens. Combinations were labeled +1 (synergistic) or -1 (non-synergistic) using a FICI threshold of < 1, as established in the literature.
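The FICI sums, for each agent, the ratio of its MIC in combination to its MIC alone. The sketch below computes it and applies the < 1 labeling rule stated above; many studies use a stricter synergy cut-off (FICI ≤ 0.5), so the threshold is kept as a parameter. Function names are illustrative.

```python
def fici(mic_a_combo, mic_a_alone, mic_b_combo, mic_b_alone):
    """Fractional Inhibitory Concentration Index for a two-agent combination."""
    return mic_a_combo / mic_a_alone + mic_b_combo / mic_b_alone


def synergy_label(fici_value, threshold=1.0):
    """+1 (synergistic) if FICI is below the threshold, otherwise -1."""
    return 1 if fici_value < threshold else -1
```

For instance, an AMP whose MIC drops from 8 to 2 in combination, paired with a compound whose MIC drops from 16 to 4, gives FICI = 0.25 + 0.25 = 0.5 and is labeled +1.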
Our study prioritizes reproducibility in AI processes and feature engineering. To ensure reproducibility, all workflow phases (data retrieval, feature extraction, model training, and evaluation) were implemented with the MetaFlow framework. Feature engineering included one-hot encoding of categorical features, rescaling with a standard scaler, and dimensionality reduction via principal component analysis (PCA). These steps were combined in a pipeline fitted on the training data only and then applied to the test data, preventing information from the test set from leaking into training.
We used 5-fold cross-validation to assess the overall performance of our models, evaluating accuracy and area under the curve (AUC). The best model reached 80% accuracy in training and 77% in testing, with a standard deviation of 4% across cross-validation folds. These results show that our approach, which incorporates AMPs into state-of-the-art models, performs competitively, supporting the robustness and reliability of the methodology and its potential for further optimization and application in future antimicrobial peptide research.
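The practice of fitting preprocessing on the training data and only applying it to the test data can be combined with 5-fold cross-validation without external libraries. `FoldScaler` is a hypothetical stand-in for scikit-learn's `StandardScaler` acting on a single feature; in the study the same role is played by the fitted pipeline of encoding, scaling and PCA.

```python
import random
from statistics import mean, pstdev


def kfold_indices(n, k=5, seed=0):
    """Yield (train_idx, test_idx) for k shuffled, non-overlapping folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, sorted(folds[i])


class FoldScaler:
    """Hypothetical single-feature standard scaler: fit on train, apply to test."""

    def fit(self, values):
        self.mu = mean(values)
        self.sd = pstdev(values) or 1.0  # guard against constant features
        return self

    def transform(self, values):
        return [(v - self.mu) / self.sd for v in values]
```

Inside each fold, `scaler = FoldScaler().fit([x[i] for i in train])` is then used to transform both the training and test values, so test-set statistics never influence the fitted parameters; the fold accuracies give the mean and standard deviation reported for cross-validation.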

11:45-12:00
Session: Biomedical omics
The co-expression landscape of diffuse large B-cell lymphoma

Room: Room B501
Format: In Person


Authors List:

  • Arturo Kenzuke Nakamura-García, National Institute of Genomic Medicine, Mexico
  • Jesus Espinal-Enriquez, National Institute of Genomic Medicine, Mexico

Presentation Overview:

Diffuse large B-cell lymphoma (DLBCL) is a heterogeneous group of malignancies and the most common and aggressive non-Hodgkin lymphoma. Advances in gene expression profiling have revealed two distinct subtypes, activated B-cell-like (ABC) and germinal center B-cell-like (GCB), along with an unclassified category showing an intermediate profile. This classification is based on the similarity of the lymphoma's expression landscape to distinct regions of the germinal center, which appears to reflect the lymphoma's cell of origin. Here, to evaluate the global transcriptional landscape that characterizes the distinct subtypes, we constructed gene co-expression networks from RNA-seq data of 481 DLBCL samples. This approach showed that the ABC network has an increase in intra-chromosomal interactions relative to the other two groups. Additionally, an integrated topological and differential expression analysis identified a distinctly regulated set of biological processes for each subtype, among which angiogenesis and the cell cycle differed most between subtypes. Lastly, we identified 88 genes with intermediate expression in the unclassified group, 48 of which could be associated with overall survival of DLBCL patients. This research sheds light on the intricate genetic aspects of DLBCL, revealing distinct co-expression interactions and potential survival indicators. These findings provide a foundation for further research into personalized treatment and prognosis for DLBCL patients.
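A minimal sketch of the two network steps described above, assuming Pearson correlation as the co-expression measure and a network made of the top-N gene pairs by absolute correlation (the abstract does not state the measure or cut-off used). The intra-chromosomal fraction is the share of edges whose two genes lie on the same chromosome; gene names and the chromosome map are hypothetical inputs.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation; assumes neither profile is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def top_coexpression_edges(expression, top_n):
    """expression: {gene: [values across samples]}. Keep the top_n pairs by |r|."""
    scored = [
        (abs(pearson(expression[g1], expression[g2])), g1, g2)
        for g1, g2 in combinations(sorted(expression), 2)
    ]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [(g1, g2) for _, g1, g2 in scored[:top_n]]


def intra_chromosomal_fraction(edges, chromosome_of):
    """Share of edges linking two genes on the same chromosome."""
    same = sum(chromosome_of[a] == chromosome_of[b] for a, b in edges)
    return same / len(edges)
```

Building one network per subtype with the same top-N cut-off keeps the networks comparable, so a higher intra-chromosomal fraction in one subtype is not just a side effect of network size.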

14:00-14:15
Session: Biomedical omics
Embedding of biomedical topological information in multi-class ensemble classifiers
Confirmed Presenter: Joram Posma, Imperial College London, United Kingdom

Room: Room B501
Format: In Person


Authors List:

  • Eloisa Rocha-Liedl, Imperial College London, United Kingdom
  • Shabeer Yassin, Imperial College London, United Kingdom
  • Melpomeni Kasapi, Imperial College London, United Kingdom
  • Joram Posma, Imperial College London, United Kingdom

Presentation Overview:

Background: Cancer is the second leading cause of disease-related death worldwide, and machine learning-based identification of novel biomarkers is crucial for improving early detection and treatment of various cancers. A key challenge in applying machine learning to high-dimensional data is deriving important features in an interpretable manner to provide meaningful insights into the underlying biological mechanisms. However, most traditional machine learning methods derive feature importance from the data alone and do not consider the biochemical topology underlying the measured features.

Data and Methods: We have extended our prior work [1], the LAtent VAriable Stochastic Ensemble of Trees (LAVASET), to include latent feature embedding in gradient boosting classifiers (LAVABOOST). While our previous work considered spatial, spectral, and temporal dependencies, we now place this framework in a topological data analysis (TDA) setting by incorporating protein interaction information directly into the decision function. Furthermore, we contribute a new directional feature importance measure that is integrated within decision tree classifiers [2]. We demonstrate these new algorithms on The Cancer Genome Atlas proteomics data and compare their interpretability and classification performance with traditional random forests (RF) and Gradient Boosted Decision Trees (GBDTs).
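The core idea of a latent-variable split, as we understand it from the description above, is that instead of splitting on a single protein, a tree node splits on a latent variable summarizing that protein together with its topological neighbours (here, its interaction partners). The sketch below assumes the latent variable is the first principal component of the neighbourhood, obtained by power iteration on the covariance matrix; it is illustrative only and is not the LAVASET implementation.

```python
import math


def neighbourhood_latent_variable(X, feature, neighbours, iters=100):
    """First-PC scores of a feature and its topological neighbours.

    X: list of samples, each a list of feature values.
    neighbours: {feature_index: set of neighbouring feature indices}.
    Returns (columns used, PC1 loadings, per-sample scores).
    """
    cols = [feature] + sorted(neighbours.get(feature, ()))
    n, p = len(X), len(cols)
    sub = [[row[c] for c in cols] for row in X]
    means = [sum(r[j] for r in sub) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in sub]
    # Sample covariance matrix of the neighbourhood.
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(p)]
           for a in range(p)]
    v = [1.0] * p  # power-iteration start vector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    scores = [sum(r[j] * v[j] for j in range(p)) for r in centred]
    return cols, v, scores
```

A tree node would then search for the best threshold on `scores` rather than on the raw feature values, so each split reflects the interaction neighbourhood of the chosen protein.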

Results: Our findings show that no method outperforms all others for individual cancer type prediction; the macro F1-scores across 28 cancers, obtained from 100 random initialisations, were 92.6±0.2% for RF, 92.1±0.3% for LAVASET, 89.1±0.4% for LAVABOOST, and 85.7±0.3% for GBDT. Our class-based directional feature importance (CLIFI) metric allowed visualisation of the models' decision-making functions, and the resulting distributions indicated heterogeneity in several proteins (MYH11, ER-alpha, BCL2) across cancer types (including brain glioma, breast, kidney, thyroid, and prostate cancer).
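The macro F1-score reported above averages per-class F1 with equal weight per class, so rare cancer types count as much as common ones. A plain-Python sketch, similar in spirit to scikit-learn's `f1_score(..., average="macro")` (here a class with no true or predicted members simply scores 0):

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either vector."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Repeating this over many random initialisations and reporting mean ± standard deviation gives figures of the form quoted above.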

Conclusion: We have developed an integrated, directional feature importance metric for multi-class decision tree-based classification models that facilitates interpretable feature importance assessment. The CLIFI metric can be used in conjunction with incorporating topological information into the decision functions of models to add inductive bias for improved interpretability.

References:
[1] M Kasapi, K Xu, TMD Ebbels, DP O’Regan, JS Ware, JM Posma (2024) Bioinformatics 40(3), doi: 10.1093/bioinformatics/btae101.
[2] E Rocha-Liedl, SM Yassin, M Kasapi, JM Posma (2024) bioRxiv, (4 August 2024) doi: 10.1101/2024.08.01.605982.

14:15-14:30
Session: Biomedical omics
T-cell and B-cell repertoire analysis in Zika-associated acute neuroinflammatory disease
Confirmed Presenter: Melissa Solarte Cadavid, Universidad del Valle, Colombia

Room: Room B501
Format: In Person


Authors List:

  • Nelson Rivera Franco, Universidad del Valle, Colombia
  • Melissa Solarte Cadavid, Universidad del Valle, Colombia
  • Diana Lopez Alvarez, Universidad Nacional de Colombia, Colombia
  • Carlos A Pardo, Johns Hopkins University, Colombia
  • Beatriz Parra Patiño, Universidad del Valle, Colombia

Presentation Overview:

To explore TCR and BCR clonality in Zika-associated Guillain-Barré syndrome (GBS) and encephalitis through immunoinformatic analysis of RNA-seq data, we sequenced four RNA samples from peripheral blood mononuclear cells (PBMCs) on an Illumina platform (150-bp paired-end). Samples were obtained at neurological disease onset from two GBS cases (one Zika-associated, one not) and two encephalitis cases (one Zika-associated, one not).
After quality control, the raw data were analyzed with TRUST4. Of all clones, approximately 5% were TCR and 95% were BCR clones, suggesting that T-cell responses are focused on fewer antigens and are less variable than B-cell responses. In the TCR repertoire network, a low proportion of connected clones was observed, indicating that most of these TCRs correspond to clones expanded against diverse antigens. Only two clusters were formed: one included all four individuals, indicating a possible common T-cell response to a single antigen regardless of Zika status, and the other corresponded to a single individual (a non-Zika GBS case), indicating a possible immunodominant T-cell response to the same antigen during the patient's illness. In contrast, the BCR repertoire exhibited a more diverse pattern, with a high proportion of clusters (clonotypes) in the network. This result suggests clonal expansion against several antigens, 32% of which corresponded to sequences clonally derived through somatic hypermutation (SHM) according to a subsequent analysis. The hyperexpanded B-cell clonotypes corresponded to the two GBS cases and to neither of the encephalitis cases, regardless of Zika status. This initial proof-of-concept study demonstrates the feasibility of assessing TCR and BCR clonality by immunoinformatic analysis of bulk RNA-seq data from peripheral blood mononuclear cells to study the immunology of neuroinflammatory diseases associated with emerging pathogens.

14:30-14:45
Session: Biomedical omics
Understanding the role of T-cell receptor repertoire in T1D status
Confirmed Presenter: Puneet Rawat, Department of Immunology, University of Oslo, Oslo, Norway

Room: Room B501
Format: In Person


Authors List: Show

  • Puneet Rawat, Department of Immunology, University of Oslo, Oslo, Norway
  • Micheal Wildrich, Institute for Machine Learning, Johannes Kepler University Linz, Austria
  • Melanie Shapiro, Department of Pathology, Immunology, and Laboratory Medicine, University of Florida, Gainesville, FL, USA
  • Koshlan Mayer-Blackwell, Fred Hutchinson Cancer Center, Seattle, WA, United States
  • Ghadi al Hajj, Department of Informatics, University of Oslo, Oslo, Norway
  • Keshav Motvani, Department of Pathology, Immunology, and Laboratory Medicine, University of Florida, Gainesville, FL, USA
  • Leeana Peters, Department of Pathology, Immunology, and Laboratory Medicine, University of Florida, Gainesville, FL, USA
  • Amanda Posgoi, Department of Pathology, Immunology, and Laboratory Medicine, University of Florida, Gainesville, FL, USA
  • Milena Pavlović, Department of Informatics, University of Oslo, Oslo, Norway
  • Maria Chernigovskaya, Department of Immunology, University of Oslo, Oslo, Norway
  • Lonneke Scheffer, Department of Informatics, University of Oslo, Oslo, Norway
  • Camryn Pettenger-Willey, Fred Hutchinson Cancer Center, Seattle, WA, United States
  • Sebastiaan Valkiers, Fred Hutchinson Cancer Center, Seattle, WA, United States
  • Andrew Fiore-Gartland, Fred Hutchinson Cancer Center, Seattle, WA, United States
  • Sepp Hochreiter, Institute for Machine Learning, Johannes Kepler University Linz, Austria
  • Philip Bradley, Fred Hutchinson Cancer Center, Seattle, WA, United States
  • Geir Kjetil Sandve, Department of Informatics, University of Oslo, Oslo, Norway
  • Victor Greiff, Department of Immunology, University of Oslo, Oslo, Norway
  • Todd Brusko, Department of Pathology, Immunology, and Laboratory Medicine, University of Florida, Gainesville, FL, USA

Presentation Overview: Show

Type 1 Diabetes (T1D) is an increasingly prevalent disease worldwide, characterized by a strong immunogenetic dependence on Human Leukocyte Antigen (HLA) alleles. Although islet autoantibodies (AAb) are used clinically to diagnose T1D, they primarily serve as markers of autoantigen presentation and are not directly implicated in disease pathogenesis. In contrast, T cells are hypothesized to play a pathogenic role by directly destroying β-cells, which underscores the need to develop T-cell biomarkers for T1D.

We sequenced 2286 TCRβ repertoires: 1112 from individuals with T1D, 720 from first-degree relatives, 68 from second-degree relatives, and 386 from healthy controls. Standard repertoire-level analyses (e.g., diversity profiles and the Morisita-Horn similarity index) and shared public-clone analyses (TCR clustering and statistical classification of public clones) could not distinguish T1D repertoires from healthy ones. Therefore, we used a statistical assessment of HLA-based restriction of TCR repertoires and a deep learning-based model, DeepRC, to classify T1D and healthy repertoires and to identify biomarkers associated with T1D.
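
The Morisita-Horn index mentioned above compares two repertoires by clone abundance and ranges from 0 (no shared clonotypes) to 1 (identical relative abundances). A minimal Python sketch, assuming each repertoire is represented as a mapping from CDR3β amino-acid sequence to count; the sequences below are hypothetical and this is not the authors' analysis code.

```python
from collections import Counter


def morisita_horn(rep_x: Counter, rep_y: Counter) -> float:
    """Morisita-Horn similarity between two clone-count tables.

    C_MH = 2 * sum(x_i * y_i) / ((D_x + D_y) * X * Y),
    where X, Y are total counts and D_x = sum(x_i**2) / X**2 (likewise D_y).
    """
    total_x = sum(rep_x.values())
    total_y = sum(rep_y.values())
    d_x = sum(v * v for v in rep_x.values()) / total_x ** 2
    d_y = sum(v * v for v in rep_y.values()) / total_y ** 2
    # Only clonotypes present in both repertoires contribute to the numerator
    shared = sum(rep_x[k] * rep_y[k] for k in rep_x.keys() & rep_y.keys())
    return 2 * shared / ((d_x + d_y) * total_x * total_y)


# Hypothetical CDR3β repertoires (sequence -> count)
rep_a = Counter({"CASSLGQGAEAFF": 1, "CASSPDRGYTF": 1})
rep_b = Counter({"CASSLGQGAEAFF": 1, "CASRTGELFF": 1})
print(morisita_horn(rep_a, rep_b))  # 0.5
```

Because counts are normalised by the totals X and Y, two repertoires with the same relative clone frequencies score 1 even if they were sequenced to different depths.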

The HLA-based TCR restriction analysis revealed CDR3β phenotypes associated with high-risk HLA alleles and demonstrated that the CDR3 risk score could classify different T1D clinical groups. Additionally, the performance of high-risk-HLA-associated motifs derived from CDR3β phenotypes was comparable to that of HLA-based risk assessment. DeepRC achieved an AUROC of 0.77 on the test dataset. These T1D-associated TCR motifs were further validated in TCRs residing in the pancreas-draining lymph nodes of individuals with T1D. Our data highlight the impact of T1D on the TCR repertoire and demonstrate that T1D-specific TCR repertoire features are mostly localized at the TCR subsequence (motif) level, with T1D-associated public clones largely absent.
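
An AUROC of 0.77 can be read as the probability that a randomly chosen T1D repertoire receives a higher classifier score than a randomly chosen healthy one, with ties counting one half. The minimal Python sketch below computes AUROC directly from that pairwise definition; the scores are hypothetical, and this is not DeepRC's own evaluation code.

```python
def auroc(pos_scores, neg_scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    correct = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos_scores) * len(neg_scores))


# Hypothetical classifier scores
t1d_scores = [0.9, 0.8, 0.4]
healthy_scores = [0.7, 0.3]
print(auroc(t1d_scores, healthy_scores))  # 5 of 6 pairs ordered correctly -> 5/6
```

The pairwise loop costs O(n·m); for thousands of repertoires, a rank-based (Mann-Whitney U) computation gives the same value more efficiently.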

14:45-15:00
Session: Biomedical omics
Coexpression analysis coupled with gene regulatory networks reveals master regulators in mouse hearts stimulated with angiotensin II
Confirmed Presenter: Sebastián Urquiza-Zurich, Advanced Center for Chronic Diseases - ACCDiS, Facultad de Ciencias Químicas y Farmacéuticas, Universidad de Chile, Chile

Room: Room B501
Format: In Person


Authors List: Show

  • Sebastián Urquiza-Zurich, Advanced Center for Chronic Diseases - ACCDiS, Facultad de Ciencias Químicas y Farmacéuticas, Universidad de Chile, Chile
  • Emiliano Vicencio, Advanced Center for Chronic Diseases - ACCDiS, Facultad de Ciencias Químicas y Farmacéuticas, Universidad de Chile, Chile
  • Francisco Sigcho-Garrido, Advanced Center for Chronic Diseases - ACCDiS, Facultad de Ciencias Químicas y Farmacéuticas, Universidad de Chile, Chile
  • Francisco Pino-de la Fuente, Advanced Center for Chronic Diseases - ACCDiS, Facultad de Ciencias Químicas y Farmacéuticas, Universidad de Chile, Chile
  • Danica Jiménez-Gallegos, Advanced Center for Chronic Diseases - ACCDiS, Facultad de Ciencias Químicas y Farmacéuticas, Universidad de Chile, Chile
  • Paulo Amaral, Insper Instituto de Ensino e Pesquisa, Brazil
  • Sergio Lavandero, Advanced Center for Chronic Diseases - ACCDiS, Facultad de Ciencias Químicas y Farmacéuticas, Universidad de Chile, Chile
  • Vinicius Maracaja-Coutinho, Advanced Center for Chronic Diseases - ACCDiS, Facultad de Ciencias Químicas y Farmacéuticas, Universidad de Chile, Chile

Presentation Overview: Show

Cardiovascular diseases (CVDs) are among the leading causes of death worldwide, and their prevalence continues to increase. They can be induced by several factors, such as neurohumoral activation (catecholamine dysregulation) and pressure overload (as in hypertension), among others. Angiotensin II (Ang II) is a vasoactive peptide with a crucial role in hypertension, cardiac remodeling, and the activation of kinase pathways such as JNK and ERK. These pathways are important because they converge on master regulators (MRs) that control the expression of multiple genes or determine cell fate in different contexts, especially in CVDs. Whether other MRs mediate additional mechanisms associated with neurohumoral changes in CVDs remains to be explored with an integrative, systems biology approach. This study aims to identify new MRs that orchestrate transcriptional responses in an Ang II context, using gene co-expression modules and biological networks. We used a bulk RNA-seq dataset from mouse hearts stimulated with Ang II; FastQC, HISAT2, featureCounts, and DESeq2 were applied sequentially to prepare the data and quantify gene expression. The differentially expressed genes regulated by Ang II were then used to identify modular transcriptional behavior with CEMiTool and to generate specific gene regulatory networks (GRNs) with Cytoscape. This allowed us to identify transcription factors (TFs) and MRs that could control the transcriptional changes associated with the Ang II phenotype. A total of 16,818 protein-coding genes were detected, of which only 15 were up-regulated and 675 were down-regulated.
In addition, four co-expression modules were obtained; an over-representation analysis (ORA) showed that module 1 (M1), with 2,097 genes, contained Gene Ontology (GO) terms such as “Collagen containing extracellular matrix”, “Extracellular structure organization” and “Extracellular matrix structural constituent”, which are specific to the model. Through this integrated approach, we identified 10 MRs that could be fundamental in cardiac remodeling and 19 TFs found only in the Ang II condition, such as KLF8 and MYBL2. These results not only add a new layer of regulation to our understanding of the molecular mechanisms underlying cardiac remodeling but also offer potential new therapeutic targets for CVDs associated with hypertension. Data integration with a systems biology approach allows us to unravel complex GRNs and provides a deeper understanding of biological processes at the cardiac level.

15:00-15:15
Session: Biomedical omics
Systems biology approach for complex miRNA-transcription factor networks involved in cancer gene dosage compensation and macrophage phenotypic bistability
Confirmed Presenter: Rodrigo Mora, University of Costa Rica, Costa Rica

Room: Room B501
Format: In Person


Authors List: Show

  • Rodrigo Mora, University of Costa Rica, Costa Rica
  • Carsten Geiss, Institute of Developmental Biology & Neurobiology Johannes Gutenberg University Mainz, Germany
  • Anne Régnier-Vigouroux, Institute of Developmental Biology & Neurobiology Johannes Gutenberg University Mainz, Germany

Presentation Overview: Show

miRNAs are small non-coding RNAs that can regulate gene expression. Although sometimes regarded as simple regulators of gene-expression noise, they can in fact interact with transcription factors (TFs) to assemble complex regulatory circuits, including negative feedback loops, positive feedback loops, coherent feedforward loops, incoherent feedforward loops, miRNA clusters, and target hubs, leading to non-linear, systems-level properties such as bistability, ultrasensitivity, and oscillations. Despite multiple potential applications, the search for robust miRNA-based regulators has been limited by the enormous complexity of their regulatory networks. Thus, identifying key master regulators for specific applications requires computational approaches. We present BioNetUCR, a biocomputational platform for the automatic construction of large-scale regulatory networks of miRNA-TF interactions and of their corresponding ordinary differential equation (ODE) models for systems biology studies using COPASI. Two working examples illustrate the application of BioNetUCR to bottom-down systems biology approaches. First, we modeled the complex network potentially underlying dosage compensation of gene expression in aneuploid cancer and identified a minimal model of MYC dosage compensation mediated redundantly by three miRNAs. These findings were experimentally validated with a novel experimental platform for the future design of therapeutic approaches against aneuploid cancer. Second, using an experimental model of human macrophage polarization, we generated multi-omic data for the construction and simplification of a miRNA-TF network potentially regulating macrophage phenotypic transitions.
Here, we identified a minimal model with switch-like behavior, composed of five key genes, in which bistability could be studied to identify ultrasensitivity and hysteresis in the potential transition trajectories between the steady states associated with different macrophage phenotypes. This has potential applications for regulating macrophage responses in cancer and inflammatory diseases.
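
Bistability of the kind described above can be illustrated with the simplest possible case: a hypothetical two-gene mutual-repression toggle switch, not the authors' five-gene macrophage model or a BioNetUCR/COPASI export. In the dimensionless ODE model du/dt = a/(1 + v^n) - u and dv/dt = a/(1 + u^n) - v, with assumed illustrative parameters a = 5 and n = 2, two different starting points settle into two different stable steady states. The sketch uses plain forward-Euler integration for brevity.

```python
def toggle_rhs(u, v, a=5.0, n=2.0):
    """Hypothetical mutual repression: each gene is produced at a/(1 + repressor**n), decays at rate 1."""
    du = a / (1.0 + v ** n) - u
    dv = a / (1.0 + u ** n) - v
    return du, dv


def simulate(u0, v0, dt=0.01, t_end=50.0):
    """Forward-Euler integration of the toggle; returns the state (u, v) at t_end."""
    u, v = u0, v0
    for _ in range(round(t_end / dt)):
        du, dv = toggle_rhs(u, v)
        u, v = u + dt * du, v + dt * dv
    return u, v


# Same parameters, different starting points -> different steady states
state_1 = simulate(2.0, 0.0)  # starts with u ahead: settles with u high, v low
state_2 = simulate(0.0, 2.0)  # starts with v ahead: settles with v high, u low
print([round(x, 2) for x in state_1])
print([round(x, 2) for x in state_2])
```

A symmetric steady state with u = v also exists for these parameters, but it is unstable, so any initial asymmetry tips the system into one of the two stable states; this is the switch-like behavior in its most reduced form.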

Friday, November 15th
9:00-9:15
Session: Evolutionary and comparative genomics
Evolutionary plasticity and functional repurposing of the essential metabolic enzyme MoeA
Confirmed Presenter: Daniela Megrian, Institut Pasteur, CNRS UMR 3528, Université Paris Cité, Structural Microbiology Unit, F-75015 Paris, France, Uruguay

Room: Room B502
Format: In Person


Authors List: Show

  • Daniela Megrian, Institut Pasteur, CNRS UMR 3528, Université Paris Cité, Structural Microbiology Unit, F-75015 Paris, France, Uruguay
  • Mariano Martinez, Institut Pasteur, CNRS UMR 3528, Université Paris Cité, Structural Microbiology Unit, F-75015 Paris, France
  • Pedro Alzari, Institut Pasteur, CNRS UMR 3528, Université Paris Cité, Structural Microbiology Unit, F-75015 Paris, France
  • Anne Marie Wehenkel, Institut Pasteur, Université Paris Cité, Bacterial Cell Cycle Mechanisms Unit, F-75015 Paris, France

Presentation Overview: Show

MoeA, known as gephyrin in higher eukaryotes, is an enzyme that plays a crucial role in the biosynthesis of the molybdenum cofactor (Moco) used by molybdoenzymes involved in redox reactions. Gephyrin has acquired additional functions and acts as a moonlighting protein involved in the clustering of GABA and glycine receptors (GlyR) at the synapse, a feature thought to be a recent evolutionary trait restricted to eukaryotic Moco biosynthetic enzymes. However, we have recently shown that the clinically relevant phylum Actinobacteria contains an evolutionarily repurposed copy of MoeA (Glp) involved in bacterial cell division. MoeA is present in all three domains of life (Bacteria, Archaea, and Eukaryotes), which motivated our current work investigating how MoeA acquired multifunctionality during the evolution of life. We used phylogenetic inference and protein structure analyses to study the diversity and evolutionary history of MoeA. Glp-expressing Bacteria such as Corynebacteriales have at least two copies of the gene, and structural analysis of their putative active sites suggests that Glp has lost its ancestral role in Moco biosynthesis, implying that MoeA multifunctionality was divided between two specialized paralogs. In Archaea, we identified an ancestral duplication and the fusion of one of the paralogs to a periplasmic-binding domain that might bind tungsten instead of molybdenum. In eukaryotes, we show that the acquisition of gephyrin's moonlighting activity comprised three major events: first, early eukaryotes obtained MoeA from Bacteria; second, MogA was fused to the N-terminus of MoeA in the ancestor of opisthokonts (a clade that includes fungi and animals); and third, the protein acquired the function of anchoring inhibitory neurotransmitter receptors by modifying key exposed residues.
Our results support the functional versatility and adaptive nature of the MoeA scaffold, which has been repurposed independently in both eukaryotes and bacteria to carry out analogous functions in network organization at the inner cell membrane.

Megrian, D., Martinez, M., Alzari, P. M., & Wehenkel, A. M. (2024). Evolutionary plasticity and functional repurposing of the essential metabolic enzyme MoeA. bioRxiv, 2024-07, doi: https://doi.org/10.1101/2024.07.23.604723

9:15-9:30
Session: Evolutionary and comparative genomics
Comparative transcriptomic analysis reveals the use of divergent transcriptional networks during gastrulation in two phylogenetically distant species of the Scleractinian coral genus Acropora
Confirmed Presenter: Alejandro Reyes-Bermudez, Universidad de la Amazonía, Colombia

Room: Room B502
Format: In Person


Authors List: Show

  • Juan Pablo Ossa Gómez, Universidad de Antioquia, Colombia
  • Héctro Alejandro Rodriguez-Cabal, Universidad de Antioquia, Colombia
  • Alejandro Reyes-Bermudez, Universidad de la Amazonía, Colombia

Presentation Overview: Show

Although gastrulation is a conserved morphogenetic process in Metazoa, it can occur via different pathways [1–3]. Cnidarians clearly illustrate the diversity of gastrulation in the animal kingdom, displaying gastrulation mechanisms also used by bilaterians [1, 2]. This is consistent with evidence showing that, to some extent, the cellular mechanisms and signaling pathways underlying gastrulation are conserved and reused in various animal clades [2, 3]. However, other aspects that affect gastrulation, such as developmental speed, embryo size and shape, yolk amount, blastomere number, and blastocoel formation, are highly variable at the genus level, suggesting the existence of species-specific signaling pathways [1, 3]. This raises important questions about the extent to which conserved or divergent genome regulatory programs control gastrulation in phylogenetically close taxa displaying different developmental strategies. To understand the degree of conservation and diversification of these networks, we compared RNA-seq gene expression profiles during early development of two phylogenetically distant species of the genus Acropora: A. digitifera and A. tenuis. We identified 1629 differentially expressed genes (DEGs) common to both species, related to processes such as neuronal differentiation, symmetry axis formation, and germ layer differentiation; some of these genes mapped to the WNT, BMP, and NOTCH signaling pathways. In contrast, our results also revealed inter-specific differences in the relative timing of expression (heterochrony) of conserved networks during gastrulation. Furthermore, we observed that while A. digitifera’s in-paralogs were expressed asynchronously, those of A. tenuis tended to show similar expression patterns. Finally, we identified differences in isoform expression patterns for orthologous genes expressed during gastrulation in both species.
Species-specific isoforms generated independent gene interaction nodes, suggesting transcriptional network diversification. This study provides a temporal understanding of the differences in gene expression dynamics underlying the development of two phylogenetically distant Acropora coral species.

9:30-9:45
Session: Evolutionary and comparative genomics
Adaptive Phylogenomics: a target enrichment sequencing method based on Nanopore's adaptive sampling
Confirmed Presenter: Simón Villanueva Corrales, Institute of Botany of the Czech Academy of Sciences, Czechia

Room: Room B502
Format: In Person


Authors List: Show

  • Simón Villanueva Corrales, Institute of Botany of the Czech Academy of Sciences, Czechia
  • Roswitha Schmickl, Charles University, Czechia
  • Yann Bertrand, Institute of Botany of the Czech Academy of Sciences, Czechia

Presentation Overview: Show

Phylogenetics, phylogeography, and population genetics require the accurate estimation of species trees and species networks (directed acyclic graphs whose leaves represent lineages, populations, and species, as opposed to genes) that model complex evolutionary dynamics within and between populations. Although individual genes are part of the species genome, their histories often conflict with the history of the species because of gene duplication, horizontal gene transfer, and deep coalescence. To account for these phenomena, phylogenomics requires DNA sequence data from hundreds of independently segregating genes.
Short-read target enrichment can generate such large genomic datasets even in non-model organisms, and its low cost allows the inclusion of hundreds of samples. Hybridization-based target enrichment is the standard method for generating large amounts of sequence data for phylogenomics, but hybridization probes are expensive and library preparation is laborious. Adaptive sampling is a feature of Nanopore sequencing that targets specific sequences informatically, enriching them without additional wet-lab processing. Although this feature has been shown to effectively target intraspecific sequences, its potential to target interspecific sequences remains unexplored.
Here, we introduce a novel target enrichment method based on adaptive sampling. We benchmark it against standard target enrichment in the young and evolutionarily unresolved genus Cochlearia (Brassicaceae). We included diploids, autotetraploids (within-species polyploids), and allotetraploids (polyploid hybrids). We compared both methods in terms of target coverage, informative sites recovered, and reconstructed phylogenies.
For our adaptive sampling approach, we designed 18 target regions distributed across all chromosomes of Cochlearia excelsa, a species with an annotated genome, selecting them on the basis of gene density, exonic and intronic composition, and repetitive element content. These regions have a median size of 266 kb and span a total of 6 Mb. We obtained a median target coverage of 99.5% at a median depth of 22x. After reconstruction, the median size of the targeted fragments was 260 kb.
Because adaptive sampling data (few, very long loci encompassing exonic, intronic, and intergenic regions) differ from classical target-enrichment data (many short loci, primarily exonic regions), we designed a bioinformatics pipeline specific to these characteristics. Compared with the classical approach, our novel adaptive sampling method generates a higher density of parsimony-informative sites. We further show how our bioinformatics pipeline reconstructs and phases the subgenomes of the allohexaploid sample, Cochlearia bavarica. Finally, we compare the species trees obtained with both methods.
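
A site in an alignment is parsimony-informative when at least two different character states each occur in at least two sequences; singleton variation cannot discriminate between tree topologies under parsimony. The minimal Python sketch below counts such sites and reports their density per kilobase. It assumes gaps and ambiguous bases ('-', 'N', '?') are ignored, which is one common convention; it is not the authors' pipeline, and the toy alignment is hypothetical.

```python
from collections import Counter


def parsimony_informative_sites(alignment, ignore="-N?"):
    """Count columns where at least two character states each appear in at least two sequences."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    informative = 0
    for column in zip(*alignment):
        # Assumption: gaps and ambiguity codes are not treated as character states
        states = Counter(base.upper() for base in column if base.upper() not in ignore)
        if sum(1 for count in states.values() if count >= 2) >= 2:
            informative += 1
    return informative


def informative_sites_per_kb(alignment):
    """Density of parsimony-informative sites per 1,000 aligned positions."""
    return 1000 * parsimony_informative_sites(alignment) / len(alignment[0])


# Hypothetical four-taxon alignment
toy = ["ACGTACGT",
       "ACGTACGA",
       "ATGTACCA",
       "ATGTACCT"]
print(parsimony_informative_sites(toy))  # 3
print(informative_sites_per_kb(toy))     # 375.0
```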

9:45-10:00
Session: Evolutionary and comparative genomics
Introducing MITNANEX: A Comprehensive Bioinformatics Pipeline for the Assembly and Annotation of Human Mitochondrial Genomes from Third-Generation Sequencing Data
Confirmed Presenter: Juan José Picón Cossio, Universidad EAFIT, Colombia

Room: Room B502
Format: In Person


Authors List: Show

  • Juan José Picón Cossio, Universidad EAFIT, Colombia
  • Javier Correa, Universidad EAFIT, Colombia
  • Andrés Carmona, Laboratorio Genómico One Health, Universidad Nacional de Colombia, Sede Medellín, Colombia
  • Isabel Moreno, Laboratorio Genómico One Health, Universidad Nacional de Colombia, Sede Medellín, Colombia
  • Laura Pérez, Laboratorio Genómico One Health, Universidad Nacional de Colombia, Sede Medellín, Colombia
  • Felipe Báez, Unidad de Secuenciación Genómica de Próxima Generación (Uni-SEQs), Universidad de Antioquia, Medellín, Colombia
  • Omer Campo, Unidad de Secuenciación Genómica de Próxima Generación (Uni-SEQs), Universidad de Antioquia, Medellín, Colombia
  • Gustavo Gámez, Unidad de Secuenciación Genómica de Próxima Generación (Uni-SEQs), Universidad de Antioquia, Medellín, Colombia

Presentation Overview: Show

The mitochondrion, often called the powerhouse of the cell, is the main cellular machinery for ATP production. However, several studies have demonstrated that this organelle also plays important roles in cellular metabolism and homeostasis, such as the synthesis of metabolic precursors, calcium regulation, reactive oxygen species production, immune signaling, and apoptosis. Human mitochondrial DNA (mtDNA) is a circular, double-stranded molecule of approximately 16 kb, which appears to constitute a low-complexity genome owing to its short length and few repetitive regions, and a very efficient extra-chromosomal molecule because of its high proportion of protein-encoding genes. The mtDNA is also considered a polyploid genome, as each cell can bear multiple mitochondria with diverse polymorphic DNA molecules, a condition commonly known as heteroplasmy. Mitochondrial genetic information has also been associated with disease and aging in humans, and heteroplasmy manifests at critical points of protein expression. Moreover, several mtDNA mutations have been associated with the development of pathological conditions such as congestive heart failure. Similarly, evidence that the C150T mutation favors longevity has been widely discussed; therefore, heteroplasmy detection has become the main approach in mitochondrial studies of health and longevity. Heteroplasmy detection is challenging for long-read sequencing technologies such as Oxford Nanopore (ONT) because they produce error-prone reads; consequently, Sanger and next-generation sequencing (NGS) have dominated mitochondrial research in recent years. Nonetheless, ONT recently introduced the R10 pore chemistry, which, coupled with HERRO correction, can achieve read qualities of Q30, and researchers have highlighted improvements in the detection of epigenetic modifications, single nucleotide polymorphisms (SNPs), and structural variants (SVs).
As a consequence, most open-source tools for mtDNA processing, e.g., NOVOplasty, MITObim, MEANGS, Norgal, and MitoZ, have been developed for short-read technologies. MitoHiFi, published in 2022, was the first pipeline proposed for mitogenome assembly and annotation from long reads, but it works with PacBio HiFi reads. Thus, in this study, we introduce MITNANEX, a novel pipeline designed for the extraction of mitochondrial ONT reads and the detection of point mutations and ancestry composition. Here, we used MITNANEX to assemble and analyze the mitogenomes of centenarians (humans older than 100 years of age), aiming to identify variants associated with longevity as well as their genetic lineages. We successfully assembled and annotated a circular mtDNA molecule of 16,568 bp from a 105-year-old woman, with 178x coverage, found mutations correlated with longevity, and mapped haplotypes.
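
Read qualities such as the Q30 mentioned above are Phred scores, Q = -10·log10(p), where p is the per-base error probability; Q30 therefore corresponds to about one error per 1,000 bases (99.9% accuracy). The small Python sketch below shows the conversion and an illustrative count of expected errors in a read spanning the 16,568 bp mitogenome reported above, under the simplifying assumption of a uniform per-base error rate; Q10 and Q20 are included only for comparison and are not figures from the abstract.

```python
import math


def phred_to_error(q: float) -> float:
    """Per-base error probability for a Phred quality score Q."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Phred quality score for a per-base error probability p."""
    return -10 * math.log10(p)


MITOGENOME_BP = 16_568  # assembly length reported in the abstract
for q in (10, 20, 30):
    # Assumes errors are uniform along the read (a simplification)
    expected = MITOGENOME_BP * phred_to_error(q)
    print(f"Q{q}: error rate {phred_to_error(q):.3f}, ~{expected:.1f} expected errors per full-length read")
```

Even at Q30, random errors occur at roughly 0.1% per base, so low-frequency heteroplasmic variants still have to be separated from noise using sequencing depth across many reads.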

10:00-10:15
Session: Evolutionary and comparative genomics
Transposable Element Identification in the Neotropical Species Drosophila amaguana: What Lies Behind This Large Genome?
Confirmed Presenter: Manuel Alejandro Coba-Males, Maestría en Biología Computacional, Laboratorio de Genética Evolutiva, Pontificia Universidad Católica del Ecuador, Ecuador

Room: Room B502
Format: In Person


Authors List: Show

  • Manuel Alejandro Coba-Males, Maestría en Biología Computacional, Laboratorio de Genética Evolutiva, Pontificia Universidad Católica del Ecuador, Ecuador
  • Simon Orozco-Arias, Department of Computer Sciences, Universidad Autónoma de Manizales, Colombia
  • Romain Guyot, Institut de Recherche pour le Développement, IRD, CIRAD, Université de Montpellier, France
  • Doris Vela, Maestría en Biología Computacional, Laboratorio de Genética Evolutiva, Pontificia Universidad Católica del Ecuador, Ecuador

Presentation Overview: Show

Genome size in eukaryotic species is related to the large amounts of noncoding DNA sequences that influence genome organization. Variations in genome size are not always related to organism complexity; instead, they can result from high copy numbers of repetitive DNA, such as transposable elements (TEs) and satellite DNA (satDNA), as well as from introns and other sequences. Insects are considered crucial actors in evolutionary processes involving mobile elements, and insect taxa are the major contributors to the available information about TEs. Previous studies have described the TE content of multiple insect species, finding a positive correlation between genome size and TE content. Drosophila amaguana is a neotropical species that may have the largest genome among Drosophila species; however, its TE landscape has not yet been explored. This study describes, for the first time, the mobilome of D. amaguana to determine whether TE content contributes to its large genome. To analyze the genome of D. amaguana, we used three bioinformatic pipelines with a de novo approach to estimate TE content: the Extensive de novo TE Annotator (EDTA), RepeatModeler, and reasonaTE. These de novo libraries were used to create a manually curated TE library of 737 consensus sequences with the MCHelper tool. Of these, only 21 sequences showed similarity to TE families previously described in other Drosophila species. The results suggest that TEs may make up 21.54% of the D. amaguana genome, mainly Helitrons (6.35%), LTR retrotransposons (5.13%), TIR transposons (3.63%), and LINEs (3.61%). Finally, our findings suggest that the large genome size of D. amaguana is not a consequence of high TE content. Other types of repetitive DNA, such as satDNA, could explain its larger genome size.
These findings may provide valuable data for understanding adaptive and evolutionary processes in the mesophragmatica group and neotropical groups of Drosophila in the Andean forests.

10:15-10:30
Session: Evolutionary and comparative genomics
Combining gene genealogies and pedigrees to inform genetic screening programs about founder mutations
Confirmed Presenter: Alejandro Mejia Garcia, Department of human genetics, McGill University, Montreal, QC, Canada

Room: Room B502
Format: Live Stream


Authors List: Show

  • Alejandro Mejia Garcia, Department of human genetics, McGill University, Montreal, QC, Canada
  • Alex Diaz-Papkovich, Department of human genetics, McGill University, Canada
  • Guillaume Sillon, Departments of Medicine and Human Genetics, McGill University Health Center, Montreal, QC, Canada
  • Daniela D'Agostino, Departments of Medicine and Human Genetics, McGill University Health Center, Montreal, QC, Canada
  • Anne-Laure Chong, Cancer Research Program, The Research Institute of the McGill University Health Centre, Montreal, QC, Canada
  • George Chong, Cancer Research Program, The Research Institute of the McGill University Health Centre, Montreal, QC, Canada
  • Ken Sin Lo, CARTaGENE, CHU Sainte-Justine, Montreal, QC, Canada
  • Laurence Baret, Departments of Medicine and Human Genetics, McGill University Health Center, Montreal, QC, Canada
  • Nancy Hamel, Cancer Research Program, The Research Institute of the McGill University Health Centre, Montreal, QC, Canada
  • William D. Foulkes, Cancer Research Program, The Research Institute of the McGill University Health Centre, Montreal, QC, Canada
  • Guillaume Lettre, CARTaGENE, CHU Sainte-Justine, Montreal, QC, Canada
  • Daniel Taliun, Department of human genetics, McGill University, Montreal, QC, Canada
  • Adam Shapiro, Research Institute of the McGill University Health Centre, Division of Pediatric Respirology, Montreal QC, Canada
  • Simon Gravel, Department of human genetics, McGill University, Montreal, QC, Canada

Presentation Overview: Show

Introduction: Gene genealogies represent the ancestry of a sample and are often encoded as ancestral recombination graphs (ARGs). It has recently become possible to infer these gene genealogies from sequencing or genotyping data and use them in evolutionary and statistical genetics. Unfortunately, inferred gene genealogies can be noisy and biased, which makes their application more challenging. The goal of this project is to study the application of ARG methods to systematically impute and trace the transmission of all disease variants in founder populations, where long shared haplotypes allow accurate timing of relatedness. We apply the methods to the population of Quebec, where multiple founder events led to an uneven distribution of pathogenic variants across regions, and where extensive population pedigrees are available.
Methods: Using CARTaGENE cohort data, we reconstructed the ARG from the genotype data of 30k individuals. We identified carriers of known and novel pathogenic mutations using whole-genome sequencing (WGS) data from a subset of individuals (n=2173). Using the ARG, we estimated the age of each variant and whether it was introduced once or multiple times into the population. We developed an ARG-based imputation strategy to infer carrier rates within the genotype data and used ISGen to impute the variants in unsampled individuals via the genealogical record, allowing regional frequencies to be estimated for many pathogenic variants.
Results: We applied our method to 9 known founder mutations in Saguenay and estimated their ages. We found a positive correlation between gnomAD carrier frequency in non-Finnish Europeans and mutation age (R=0.89). Moreover, our carrier frequency estimates for Saguenay did not differ significantly from published values for 8 of the 9 mutations. In addition, we validated our imputation approach by genotyping the PMS2 c.2117del mutation, obtaining an accuracy of 99.97% and a kappa statistic of 90.90%. We also analyzed the Colombian founder mutation HPSE2 c.1516C>T (p.Arg506Ter) and estimated a mutation age of 360 years and a European origin.
Conclusions: We demonstrated that ARG-based imputation is useful for studying rare mutations and enables subsequent regional frequency estimation. We showed that this method can also be applied to admixed populations such as that of Colombia.
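
For a rare variant, accuracy can be high simply because most individuals are non-carriers; Cohen's kappa corrects the observed agreement for the agreement expected by chance, which is why it can be noticeably lower than accuracy, as in the validation results above. The minimal Python sketch below computes both from a 2x2 table of imputed versus directly genotyped carrier status; the function name and counts are hypothetical, not the study's data.

```python
def accuracy_and_kappa(both_carrier, imputed_only, genotyped_only, both_noncarrier):
    """Observed agreement and Cohen's kappa for imputed vs. genotyped carrier calls.

    The four arguments are the cells of the 2x2 table:
    both_carrier     imputed carrier,     genotyped carrier
    imputed_only     imputed carrier,     genotyped non-carrier
    genotyped_only   imputed non-carrier, genotyped carrier
    both_noncarrier  imputed non-carrier, genotyped non-carrier
    """
    n = both_carrier + imputed_only + genotyped_only + both_noncarrier
    observed = (both_carrier + both_noncarrier) / n
    # Chance agreement from the imputed (row) and genotyped (column) marginals
    imputed_carrier = both_carrier + imputed_only
    genotyped_carrier = both_carrier + genotyped_only
    expected = (imputed_carrier * genotyped_carrier
                + (n - imputed_carrier) * (n - genotyped_carrier)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical counts: 10 genotyped carriers among 100 individuals
acc, kappa = accuracy_and_kappa(9, 1, 1, 89)
print(f"accuracy={acc:.2f}, kappa={kappa:.2f}")  # accuracy=0.98, kappa=0.89
```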