View Posters By Category


Session A: July 21, 2025 at 10:00-11:20 and 16:00-16:40
Session B: July 22, 2025 at 10:00-11:20 and 16:00-16:40
Session A Posters set up:
Monday, July 21 between 08:00 - 08:40
Session A Posters dismantle:
Tuesday, July 22 at 18:00
Session B Posters set up:
Monday, July 21 between 08:00 - 08:40
Session B Posters dismantle:
Tuesday, July 22 at 18:00
Session C: July 23, 2025 at 10:00-11:20 and 16:00-16:40
Session D: July 24, 2025 at 10:00-11:20 and 13:00-14:00
Session C Posters set up:
Wednesday, July 23 between 08:00 - 08:40
Session C Posters dismantle:
Thursday, July 24 at 16:00
Session D Posters set up:
Wednesday, July 23 between 08:00 - 08:40
Session D Posters dismantle:
Thursday, July 24 at 16:00
Virtual
Student Council Symposium

Results

A-227: FAMUS: A Few-Shot Learning Framework for Large-Scale Protein Annotation
Track: Function: Gene and Protein Function Annotation
  • Guy Shur, Tel Aviv University, Israel
  • David Burstein, Tel Aviv University, Israel



Predicting gene function is a pivotal step in metagenomic data analysis. Following gene calling, putative protein sequences are assigned functions based on similarity to sequences from annotated protein databases. Current high-throughput tools for functional annotation typically propagate the function of the single most similar sequence or sequence family from the database to the query sequence. However, many functional ortholog groups in these databases contain few examples, making it challenging to confidently annotate sequences based solely on similarity score to these underrepresented groups. Here, we present FAMUS (Functional Annotation Method Using Siamese neural networks), a contrastive learning framework for functional annotation. FAMUS compares query sequences to profile Hidden Markov Models (pHMMs) and is trained to condense the produced similarity scores into a low-dimensional vector space that minimizes the distance of vectors of proteins from the same family. Query sequences are thus annotated using an embedding based on similarity scores to all profiles. We validated our approach by training a model based on the Kyoto Encyclopedia of Genes and Genomes (KEGG) Orthology, achieving an F1-score of 0.911 on a diverse protein set compared to 0.886 by KEGG's KofamScan. We created additional models using protein families from the InterPro, OrthoDB, and EggNOG databases and made all four models available online via https://famus-6e94e.web.app/. FAMUS is the first comprehensive modular annotation framework based on contrastive learning. Our framework can be easily integrated into genomic and metagenomic analysis pipelines to facilitate accurate, large-scale functional annotation while being extensible to any database in various knowledge domains.
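
The training objective described above, pulling proteins of the same family together in the embedding space, can be illustrated with a generic pairwise contrastive loss. This is only a minimal sketch: the abstract does not state which loss FAMUS uses, and the function names, the margin value, and the toy vectors are assumptions made for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(u, v, same_family, margin=1.0):
    """Generic pairwise contrastive loss (illustrative, not FAMUS's actual loss).

    Same-family pairs are penalised by their squared distance, so training
    pulls them together. Different-family pairs are penalised only when they
    are closer than `margin`, so training pushes them apart.
    """
    d = euclidean(u, v)
    if same_family:
        return d ** 2
    return max(0.0, margin - d) ** 2


# Hypothetical 2-D embeddings of two proteins.
a, b = [0.0, 0.0], [3.0, 4.0]
loss_same = contrastive_loss(a, b, same_family=True)   # far apart, large loss
loss_diff = contrastive_loss(a, b, same_family=False)  # already beyond the margin
```

In a Siamese setup both inputs pass through the same network before a loss of this kind is applied to the resulting pair of vectors.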

A-229: Fold first, ask later: structure-informed function prediction in Pseudomonas phages
Track: Function: Gene and Protein Function Annotation
  • Hannelore Longin, Computational Systems Biology, KU Leuven, Belgium; Laboratory of Gene Technology, KU Leuven, Belgium
  • George Bouras, Adelaide Medical School, the University of Adelaide, Australia
  • Susanna Grigson, Flinders Accelerator for Microbiome Exploration, Flinders University, Australia
  • Robert Edwards, Flinders Accelerator for Microbiome Exploration, Flinders University, Australia
  • Hanne Hendrix, Laboratory of Gene Technology, KU Leuven, Belgium
  • Rob Lavigne, Laboratory of Gene Technology, KU Leuven, Belgium
  • Vera van Noort, Computational Systems Biology, KU Leuven, Belgium



Phages, the viruses of bacteria, are the most abundant biological entities on Earth. In general, phage genomes are densely coded and contain many open reading frames, yet up to 70% encode proteins of unknown function. Despite clinical, biotechnological and fundamental interests in unravelling these proteins’ functions, phage proteins are absent from recent large-scale structure-based efforts (such as the AlphaFold database).

Here, we investigate the efficacy of structure-based protein annotation for Pseudomonas-infecting phages, comparing different post-processing strategies to obtain function annotations from FoldSeek output. Briefly, from 887 Pseudomonas-infecting phages we collected every protein of at least 100 amino acids in length that is annotated as ‘hypothetical/phage protein’ in NCBI. These 38,025 proteins (31% of all proteins) were then clustered into 10,453 groups of homologs. Protein structures were predicted with ColabFold, and structural similarity to the PDB and the AlphaFold database was assessed with FoldSeek. Of all proteins, 59% displayed significant similarity to at least one structure in these databases. We benchmarked various strategies for extracting function from these FoldSeek hits, integrating different information resources, hit selection methods, and structure-based clustering of the hits. The resulting annotations were then compared with the state-of-the-art sequence- and structure-based phage annotation tools Pharokka and Phold.

On average, up to 42% of the phage proteins of unknown function could be annotated using structure-based methods, depending on the post-processing strategies applied. While caution is warranted when transferring protein annotations based on similarity, these methods can significantly speed up research into new antimicrobials and biotechnological applications inspired by nature’s finest bioengineers: phages.

A-231: Multi-Objective Optimization Method for Superior Pareto-Optimal Antibody Sequence
Track: Function: Gene and Protein Function Annotation
  • Yutaro Kido, Research & Development Group, Hitachi, Ltd., Tokyo, Japan
  • Kunihiko Kido, Research & Development Group, Hitachi, Ltd., Tokyo, Japan
  • Wataru Takeuchi, Research & Development Group, Hitachi, Ltd., Tokyo, Japan



Motivation: The market for antibody drugs has been rapidly expanding in recent years. However, antibody drugs are challenging to develop due to the need to search for antibodies with superior properties among the large number of amino acid combinations. In particular, lead optimization requires searching for antibodies with superiority in multiple properties, which is time-consuming and costly.
Results: In this study, we developed a sequence generation method to search for superior Pareto-optimal solutions to support antibody lead optimization. In recent years, in silico antibody screening methods have been developed to reduce the time required for antibody discovery. However, conventional methods rely on multi-objective optimization based on the weighted sum of different properties, which can introduce bias towards certain features depending on the chosen weights. This often leads to failure to identify superior Pareto-optimal solutions. The proposed method addresses this problem by using hypervolume improvement as a multi-objective indicator that quantifies the volume occupied by the non-inferior solution set within the solution space. In addition, it reduces computational complexity by selectively narrowing down the Pareto solutions that serve as reference points for hypervolume improvement. An in silico evaluation of the amino acid sequence of the CDR3 region of trastuzumab showed that our method improved the hypervolume of convergent solutions by 8.8% compared to conventional methods. This result suggests our method successfully generated a superior Pareto-optimal sequence.
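
To make the hypervolume improvement indicator concrete, the following minimal sketch computes it for two objectives that are both maximised. It is a textbook-style illustration only, not the authors' implementation: the function names, the reference point, and the toy scores are assumptions, and the abstract's strategy for narrowing down the Pareto solutions is not reproduced here.

```python
def pareto_front(points):
    """Return the non-dominated points when both objectives are maximised."""
    front = []
    for p in points:
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and q != p for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(set(front))


def hypervolume_2d(points, ref):
    """Area dominated by `points` and bounded from below by `ref`."""
    front = [p for p in pareto_front(points) if p[0] > ref[0] and p[1] > ref[1]]
    # Sorted by the first objective in descending order, the second
    # objective of a non-dominated set is strictly increasing.
    front.sort(key=lambda p: p[0], reverse=True)
    area = 0.0
    prev_y = ref[1]
    for x, y in front:
        area += (x - ref[0]) * (y - prev_y)
        prev_y = y
    return area


def hypervolume_improvement(candidate, points, ref):
    """Gain in hypervolume obtained by adding `candidate` to `points`."""
    return hypervolume_2d(list(points) + [candidate], ref) - hypervolume_2d(points, ref)


# Hypothetical (property_1, property_2) scores of three candidate sequences.
current = [(1, 3), (2, 2), (3, 1)]
reference = (0, 0)
gain = hypervolume_improvement((2, 3), current, reference)
```

A candidate that is dominated by the current set yields an improvement of zero, so the indicator rewards only sequences that extend the Pareto front, without requiring hand-chosen weights for the individual properties.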

A-233: Mapping protein structure space to function: towards better structure-based function prediction
Track: Function: Gene and Protein Function Annotation
  • Paweł Szczerbiak, Sano Centre for Computational Personalised Medicine, Kraków, Poland
  • Lukasz Szydlowski, Sano Centre for Computational Medicine, Kraków, Poland
  • Witold Wydmański, Faculty of Mathematics and Computer Science, Jagiellonian University, Kraków, Poland
  • Adam Nowak, Sano Centre for Computational Medicine, Kraków, Poland
  • P. Douglas Renfrew, Center for Computational Biology, Flatiron Institute, New York, United States
  • Julia Koehler Leman, Open Molecular Software Foundation, Davis, United States
  • Tomasz Kosciolek, Sano Centre for Computational Medicine, Kraków, Poland



The emergence of large-scale protein structure prediction tools such as AlphaFold2 and ESMFold has transformed our ability to explore protein structure space, offering access to hundreds of millions of predicted models - far beyond the scope of traditional structure repositories like the Protein Data Bank. In our work, we comprehensively examine the structural clusters obtained from the AlphaFold Protein Structure Database (AFDB), a high-quality subset of ESMAtlas, and the Microbiome Immunity Project (MIP) database. We create a single cohesive low-dimensional representation of the resulting protein space. Our results show that, while each database occupies distinct regions within the protein structure space, they collectively exhibit significant overlap in their functional profiles. High-level biological functions tend to cluster in particular regions, revealing a shared functional landscape despite the diverse sources of data. These observations allow us to pinpoint underrepresented regions - functional blind spots - highlighting opportunities for targeted improvement in structure-based protein function prediction. Building on our prior work (deepFRI), which demonstrated the benefits of integrating sequence and structural information for functional annotation, we argue that the current availability of extensive structural and functional data makes now the ideal time to pursue a comprehensive, structure-driven mapping of protein function, laying the foundation for more accurate and generalizable predictive models. This work is based on: https://doi.org/10.1101/2024.08.14.607935.

A-235: Functions of Low Complexity Regions in protein families
Track: Function: Gene and Protein Function Annotation
  • Joanna Ziemska-Legięcka, Institute of Biochemistry and Biophysics PAS, Poland
  • Patryk Jarnot, Silesian University of Technology, Poland
  • Sylwia Szymańska, Silesian University of Technology, Poland
  • Dagmara Błaszczyk, Jagiellonian University, Poland
  • Alicja Staśczak, Silesian University of Technology, Poland
  • Hanna Langer-Macioł, Silesian University of Technology, Poland
  • Kinga Lucińska, Silesian University of Technology, Poland
  • Karolina Widzisz, Silesian University of Technology, Poland
  • Aleksandra Janas, Silesian University of Technology, Poland
  • Hanna Słowik, Silesian University of Technology, Poland
  • Wiktoria Śliwińska, Silesian University of Technology, Poland
  • Aleksandra Gruca, Silesian University of Technology, Poland
  • Marcin Grynberg, Institute of Biochemistry and Biophysics PAS, Poland



LCRAnnotationsDB is a comprehensive database designed to consolidate knowledge about Low Complexity Regions (LCRs) in proteins. These regions, characterized by a low diversity of amino acid composition, play significant roles in various biological processes. The database aims to unify redundant annotations by categorizing them based on similarity in function, protein structure, and biological process, and linking them to Gene Ontology terms, where possible. Annotation statistics reveal a diverse range of functions, with notable cases of nucleotide and RNA binding. This database is based on the Swiss-Prot Database, in which over 60% of proteins contain at least one LCR identified using the SEG method. More than 5 million annotations from 12 open databases describe these fragments. Additionally, we have analysed the co-occurrence of LCRs with specific annotations in protein families. Here, we also show that some Gene Ontology terms describing LCR functions are not included in protein family descriptions, even if they apply to a significant portion of the family members. Further research is needed to address these methodological flaws and enhance the accuracy of the database.

A-237: Leveraging Protein Language Models for Functional Prediction
Track: Function: Gene and Protein Function Annotation
  • Vinh-Son Pho, Laboratory of Computational, Quantitative and Synthetic Biology (CQSB) UMR 7238 CNRS - Sorbonne Université, France
  • Matteo Scarsini, Phytogenomics laboratory, IBENS, UMR 8197 CNRS, - École Normale Supérieure - Inserm U1024, France
  • Chris Bowler, Phytogenomics laboratory, IBENS, UMR 8197 CNRS, - École Normale Supérieure - Inserm U1024, France
  • Alessandra Carbone, Laboratory of Computational, Quantitative and Synthetic Biology (CQSB) UMR 7238 CNRS - Sorbonne Université, France



Understanding protein function is crucial for advancing fields like drug discovery, biotechnology, and molecular biology. Traditional methods for protein functional prediction often rely on sequence alignment or hidden Markov models (HMMs), which can be limited when dealing with low sequence similarity. Recent advancements in Protein Language Models (PLMs), trained on vast protein sequence datasets, offer a promising alternative by capturing deep contextual features directly from sequence information.

In this work, we introduce PLM-View, a novel approach that leverages PLMs to cluster proteins of similar function relying solely on sequence. By integrating techniques from transfer learning, bioinformatics, and graph theory, our method hierarchically clusters amino acid sequences in a dendrogram representing the functional classification. Incorporating proteins of known function into the clustering process allows us to infer the functions of previously uncharacterized sequences.

We demonstrate the effectiveness of PLM-View on the Structure Function Linkage Database, achieving significant improvement over traditional HMM-based methods in predicting enzymatic functions. Finally, on TARA Oceans data, our approach reveals correlations between sequence data, gene expression and geographic position, offering new insights into the functional ecology of marine proteins.

This work highlights the potential of PLMs as powerful tools for functional prediction, opening new avenues for understanding protein biology at scale.

A-239: Expanding the TolC Efflux Repertoire in Klebsiella pneumoniae with Pretrained Protein Language Models
Track: Function: Gene and Protein Function Annotation
  • Jingyi Zhu, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, China
  • Jinzheng Ren, Department of Life Sciences, University of Bath, United Kingdom
  • Rui Weng, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, China
  • Xiaoting Hua, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, China
  • Jiawei Wang, Department of Life Sciences, University of Bath, United Kingdom
  • Yan Jiang, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, China



Klebsiella pneumoniae, a clinically significant pathogen, frequently develops multidrug resistance through diverse efflux pump systems. These typically comprise a tripartite complex that includes the conserved outer membrane protein TolC. Despite its central role, the diversity and functional range of TolC homologs in K. pneumoniae remain incompletely understood. Here, we leverage pretrained large-scale protein language models, fine-tuned on curated TolC protein sequences, to identify and functionally characterize TolC homologs. Our approach demonstrates improved sensitivity and specificity over existing tools based solely on sequence or structural similarity. Comprehensive analysis across the K. pneumoniae genomic landscape - including 1,150 diverse clinical isolates - revealed 5-9 TolC homologs per strain, exhibiting distinct clustering in protein feature space. Notably, we identified a plasmid-encoded MFS-type efflux system, EmrGH-KolC3, incorporating a novel TolC-like β-barrel protein. This system is predicted to mediate the efflux of tigecycline and deoxycholate, with strain-dependent activity against chloramphenicol and levofloxacin. This work not only categorizes and expands the known repertoire of TolC homologs in K. pneumoniae, offering new insights into its efflux-mediated resistance landscape, but also establishes a scalable framework for discovering resistance-associated efflux systems in clinically important pathogens.

A-241: Genome-scale metabolic modeling with phenotyping data from a gene knockout library predicts functions of uncharacterized genes
Track: Function: Gene and Protein Function Annotation
  • Yunli Eric Hsieh, Max Planck Institute of Molecular Plant Physiology, Germany
  • Zoran Nikoloski, Max Planck Institute of Molecular Plant Physiology, Germany



Systematic genome-scale phenotyping of gene knockout libraries is widely used to connect genotypes to phenotypes, including in Chlamydomonas reinhardtii. Although this approach can place uncharacterized genes into functional pathways, pinpointing the exact molecular mechanism remains challenging and time-consuming. Here, we present a computational approach, termed GEM-ORACLE, that integrates phenotyping data on condition-specific growth of strains from a gene knockout library into a genome-scale metabolic model to predict metabolic functions of uncharacterized genes. This is achieved by developing and using an innovative constraint-based modeling approach to link metabolic reactions lacking genomic annotation with corresponding gene mutations. We applied GEM-ORACLE to simultaneously integrate data from photoautotrophic, heterotrophic, and mixotrophic growth for each of the 864 mutants from the Chlamydomonas Library Project into a recently developed enzyme-constrained genome-scale metabolic model of C. reinhardtii. Among 426 reactions annotated with at least one protein found in the model and with a mutant in the library, GEM-ORACLE achieved a recall rate of 73%. Furthermore, GEM-ORACLE annotated eight previously unannotated metabolic reactions and improved one reaction with incomplete genomic support. The corresponding proteins, consisting of five transporters and four enzymes, are involved in the metabolic systems of urea degradation, carotenoid biosynthesis, and ascorbate and aldarate metabolism. These function predictions are further supported by deep learning models for transport and catalytic activity with specific substrates. GEM-ORACLE represents a versatile tool to annotate the molecular function of uncharacterized genes in microalgae and other organisms for which genome-scale models and phenotyping data for mutant libraries are readily available.

A-243: Resources for Annotation of PolyHydroxyAlkanoate Synthases
Track: Function: Gene and Protein Function Annotation
  • Karel Sedlar, Department of Biomedical Engineering, Brno University of Technology, Czechia
  • Kristyna Hermankova, Department of Biomedical Engineering, Brno University of Technology, Czechia
  • Mohammad Umair, Department of Biomedical Engineering, Brno University of Technology, Czechia
  • Katerina Sabatova, Department of Biomedical Engineering, Brno University of Technology, Czechia



Current annotation of specific groups of enzymes is still limited by a lack of focused resources, resulting in annotation with very general terms. This also applies to PolyHydroxyAlkanoate (PHA) synthases, enzymes capable of synthesizing microbial polyesters in various prokaryotic microorganisms that are often annotated simply as hydrolases. As a result, we overlook an enormous biosynthetic capacity for novel green technologies, because PHA themselves offer a solution to current challenges in the circular economy. Unlike petrochemical-based synthetic polymers, PHA are suitable for biological recycling, offering sustainable, selective, and environmentally friendly plastic end-of-life options. PHA synthases are further divided into four different classes. While classes I and II are represented by monomers encoded by phaC genes, classes III and IV are heterodimers encoded by phaC together with phaE and phaR genes, respectively. As our recent results showed, while general resources like KEGG or COG contain entries on PHA synthases, they are mainly built around class I and II synthases and do not represent the actual division of PHA synthases. Here, we present our custom-made, manually curated dataset of sequences coding for PHA synthases, which considers all currently available knowledge on their classification. Furthermore, we demonstrate the use of this dataset for annotation of PHA synthases in genomes of non-model bacteria using standard approaches based on sequence similarity and alignment, like BLAST or Hidden Markov Models. Additionally, we show the limitations of primary structure comparison and discuss the possible improvement based on predicted tertiary structure comparison.

A-245: The integration of genomic context and structural similarity towards the characterization of bacterial virulence factors
Track: Function: Gene and Protein Function Annotation
  • Madu Nzerem, New York University Grossman School of Medicine, United States
  • Alejandro Pironti, New York University Grossman School of Medicine, United States
  • David Fenyö, New York University Grossman School of Medicine, United States



The global increase in bacterial antibiotic resistance poses a serious threat to human health. For treating bacterial infections, antivirulence therapy is a promising approach based on targeting bacterial virulence factors (VFs). Current computational approaches for identifying VFs rely on sequence similarity to known VFs and perform poorly. To identify and functionally categorize VFs, we have developed a hybrid approach that integrates language model embeddings of a gene in its genomic context and the corresponding protein’s structural similarity to known VFs. Our model outputs three sets of predictions based on (1) genomic context, (2) protein structure similarity, and (3) the integration of (1) and (2). All predictions consist of classification scores for each VF functional category and a non-VF category. We evaluated the performance of our models with five-fold cross validation, resulting in F1 scores averaged across all functional categories of 74% using only genomic context, and 67% using both genomic and structural similarity, on test sets with 0.90 or 0.30 sequence identity holdout respectively. We apply this prediction framework to Enterobacter cloacae complex (ECC) clinical isolates, which are known for their antibiotic resistance and poorly characterized pathogenesis. Through our VF hybrid prediction model, we characterized the putative virulome of ECC strains and identified a phage endolysin with potential application in treating ECC infections. Beyond predictions in ECC, we show that genomic context and structural similarity are more informative for prediction in distinct and overlapping VF functional categories, and that integrating both modalities can improve predictions of VFs.

A-247: Redesigning the PDBe and PDBe-KB Web Portal for Improved Data Integration and Accessibility
Track: Function: Gene and Protein Function Annotation
  • Jennifer Fleming, EMBL-EBI, United Kingdom
  • Marcelo Querino Lima Afonso, EMBL-EBI, United Kingdom
  • Ivanna Pidruchna, EMBL-EBI, United Kingdom
  • Adam Midlik, EMBL-EBI, United Kingdom
  • Preeti Choudhary, EMBL-EBI, United Kingdom
  • Sreenath Nair, EMBL-EBI, United Kingdom
  • Ibrahim Roshan Kunnakkattu, EMBL-EBI, United Kingdom
  • Sri Devan Appasamy, EMBL-EBI, United Kingdom
  • Sameer Velankar, EMBL-EBI, United Kingdom



The Protein Data Bank in Europe (PDBe) and its associated PDBe-Knowledge Base (PDBe-KB) are key resources for accessing macromolecular structure data and biological annotations. To meet the growing demand for integrated, accessible structural data, the PDBe/PDBe-KB web portal is undergoing a major redesign focused on improving usability and data connectivity across experimental structures, predicted models, functional annotations, and biological context.
As structural biology expands to include increasingly complex datasets, linking and interpreting these data has become more challenging, particularly for users without specialist expertise. The redesigned portal addresses this by providing a user-friendly framework centred on 3D structure visualisation and contextual navigation. A knowledge graph-based approach underpins the integration of diverse data types, enabling users to explore structural relationships and generate new biological insights.
New features include redesigned PDBe entry pages, aggregated ligand pages, and a new dedicated view for aggregated data on macromolecular complexes. Interactive visualisation tools support residue-level annotations, molecular interactions, and other structural features. An expanded set of API endpoints provides consistent, programmatic access to aggregated data, supporting use in external workflows.
These updates are shaped by community feedback and align with FAIR (Findable, Accessible, Interoperable, Reusable) data principles. By improving access to high-quality, integrated structural data, the new portal supports a broad range of basic and translational research applications across the life sciences.

A-249: GOFEAT-AI: AI assistant for functional annotation analysis using RAG and LLM
Track: Function: Gene and Protein Function Annotation
  • Fabricio Araujo, Universidade Federal Rural da Amazônia - UFRA, Brazil
  • Gilberto Nerino de Souza Júnior, Universidade Federal Rural da Amazônia - UFRA, Brazil
  • Igor Guerreiro Hamoy, Universidade Federal Rural da Amazônia - UFRA, Brazil
  • Jhenifer Rayane Ramos Farias, Universidade Federal Rural da Amazônia - UFRA, Brazil
  • Antonio Santiago de Sousa Neto, Universidade Federal Rural da Amazônia - UFRA, Brazil
  • Emilia Gabriela da Conceição Baia, Universidade Federal Rural da Amazônia - UFRA, Brazil
  • André Luiz Monteiro Oliveira, Universidade Federal Rural da Amazônia - UFRA, Brazil
  • Juliana Minuzzi Niederauer, Universidade Federal Rural da Amazônia - UFRA, Brazil
  • Salomão Braga Santos, Universidade Federal do Pará - UFPA, Brazil
  • Victor Benedito Costa Ferreira, Universidade Federal Rural da Amazônia - UFRA, Brazil
  • Rommel Thiago Jucá Ramos, Universidade Federal do Pará - UFPA, Brazil
  • Marcus de Barros Braga, Universidade Federal Rural da Amazônia - UFRA, Brazil



One major step in genomic analysis is to identify gene products based on ORF prediction. Among many tools, GOFEAT stands out as user-friendly software for functional annotation. GOFEAT presents results in a simplified way, with graphs and links to external sources. It generates a range of gene annotation information, which reduces the effort needed to analyze the results compared to other tools.
GOFEAT-AI is new software, based on GOFEAT, that implements RAG (Retrieval-Augmented Generation) using annotation data from GOFEAT projects as input. The data is divided into chunks of 1,024 characters, with 100 characters of overlap. The resulting data is split into tokens that go through an embedding process, generating vectors that are stored in a vector database (FAISS). Finally, Llama 3.2:1b is used as the LLM (Large Language Model) for Q&A interaction to simplify mining the data.
We present GOFEAT-AI, a new functional annotation analysis tool that uses an LLM and RAG to create a new way of interaction between users and their data. GOFEAT-AI presents a chatbot that can answer users’ questions based on their project context, making it easier to get specific information from their results.
LLMs offer a new way for users to interact with data. Thus, GOFEAT-AI was developed to help users analyze and interact with their functional annotation results, making it simpler to get insights about the analysis of a specific project. GOFEAT-AI is available for use at https://gofeat-ai.npca.tec.br/gofeat/ and will be publicly available for download soon.
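
The chunking step described in this abstract (1,024-character chunks with 100 characters of overlap) can be sketched as follows. This is a minimal illustration under stated assumptions, not GOFEAT-AI's code: the function name `chunk_text` and the toy input are hypothetical, and the embedding and FAISS storage steps are deliberately left out.

```python
def chunk_text(text, chunk_size=1024, overlap=100):
    """Split `text` into fixed-size character chunks that overlap.

    Each chunk starts `chunk_size - overlap` characters after the previous
    one, so consecutive chunks share `overlap` characters.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


# Toy stand-in for annotation text exported from a project (hypothetical).
annotation_text = "".join(chr(97 + i % 26) for i in range(2000))
chunks = chunk_text(annotation_text)
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one chunk whenever it is shorter than the overlap, which helps retrieval return usable context.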

A-251: Cell type–specific functions of nucleic acid-binding proteins revealed by deep learning on co-expression networks
Track: Function: Gene and Protein Function Annotation
  • Naoki Osato, Institute of Science Tokyo, Japan
  • Kengo Sato, Institute of Science Tokyo, Japan



Nucleic acid-binding proteins (NABPs) exhibit cell type–specific regulatory functions, yet their target genes and roles remain incompletely defined due to limitations in current experimental methods. Here, we present a deep learning approach that integrates gene co-expression correlations to predict NABP regulatory targets and infer their functions across diverse cellular contexts, without requiring binding motif information. By replacing low-informative input features in a transcriptome-based prediction model with co-expression-derived interactions, we achieved improved accuracy in gene expression prediction. Functional enrichment analysis revealed biologically coherent annotations aligned with known NABP roles. In parallel, we employed a ChatGPT-assisted interpretation framework that provided functional predictions based on semantic patterns among gene sets—an approach that may offer interpretability even when gene sets are modest in size. Predictions supported by both statistical enrichment and ChatGPT-based inference were less likely to reflect spurious associations, reinforcing their biological relevance. Together, this integrative framework combines deep learning, co-expression networks, and generative AI to uncover both established and previously uncharacterized NABP functions in a cell type–specific manner.

A-253: Massive Expansion of Carbohydrate-Active Enzymes from the Microbial Universe
Track: Function: Gene and Protein Function Annotation
  • Licheng Zong, EMBL-EBI; University of Bath; The Chinese University of Hong Kong, United Kingdom
  • Jinzheng Ren, EMBL-EBI; University of Bath; The Australian National University, United Kingdom
  • Timothy Rozday, EMBL-EBI, United Kingdom
  • Jennifer Mattock, EMBL-EBI, United Kingdom
  • Yu Li, The Chinese University of Hong Kong, Hong Kong
  • Robert Finn, EMBL-EBI, United Kingdom
  • Jiawei Wang, EMBL-EBI; University of Bath, United Kingdom



Carbohydrate-active enzymes (CAZymes) are crucial for the transformation of complex carbohydrates in natural and industrial processes. The microbial world holds a vast, largely untapped reservoir of CAZymes, yet identifying and characterizing these enzymes is challenging due to the reliance on sequence similarity for detection, which limits the discovery of novel enzymes with low homology to known counterparts. In this study, we report a massive expansion of the known CAZyme repertoire through the exploration of microbial genomic data with pretrained protein language models. With the ability to uncover high-level relationships among CAZymes, our tool detects hundreds of thousands of previously uncharacterized enzymes across diverse microbial ecosystems. Our findings reveal the immense potential of microbial communities for enzyme discovery and highlight the tool and the newly expanded CAZyme reservoir as valuable resources for understanding microbial ecology, the gut microbiome, and future biotechnological applications.

A-255: Bioinformatic analysis of key metabolic pathways in microalgae
Track: Function: Gene and Protein Function Annotation
  • Antonia D'Argenio, Department of Biology, University of Naples Federico II, Via Cinthia – Edificio 7, 80126 – Napoli, Italy
  • Deborah Giordano, CNR-ISA, Institute of Food Science, via Roma 64, Avellino, Italy
  • Angelo Facchiano, CNR-ISA, Institute of Food Science, via Roma 64, Avellino, Italy
  • Giuliana D'Ippolito, CNR-ICB, Institute of Biomolecular Chemistry, Via Campi Flegrei, 34, 80078 Pozzuoli (NA), Italy
  • Angelo Fontana, University of Naples Federico II – Napoli; CNR-ICB, Institute of Biomolecular Chemistry, Pozzuoli (NA), Italy


Presentation Overview:

This study focused on the bioinformatic analysis of key enzymes involved in glycolipid biosynthesis and sulfur metabolism in the extremophilic microalga Galdieria sulphuraria and in the diatom species Skeletonema marinoi, Cyclotella cryptica, and Thalassiosira pseudonana. These organisms are of great biotechnological interest due to their adaptability to a wide range of environmental conditions and their production of bioactive metabolites.
Gene and protein sequences were retrieved from public databases (NCBI, KEGG, UniProt, InterPro, PDB) and analyzed for similarity and conserved domains. Functional annotation was performed, and KEGG data were integrated to reconstruct preliminary metabolic pathways associated with glycolipid biosynthesis and sulfur metabolism. Protein structure modeling was performed using available crystallographic templates or de novo approaches when experimental data were lacking.
In G. sulphuraria, candidate genes involved in sulfur and glycerolipid metabolism were identified, carrying conserved domains such as the NAD(P)-binding Rossmann fold and Glyco_transf. In T. pseudonana, KEGG-based analysis allowed the preliminary reconstruction of both pathways. Comparative analysis revealed conservation of key catalytic residues across species, suggesting potential functional similarity. Notably, G. sulphuraria displayed high genomic plasticity, suggesting that this plasticity plays a key role in regulating biosynthetic pathways under varying environmental conditions.
Overall, the project provided a first characterization of the metabolic pathways related to glycolipids in extremophilic microalgae and diatoms, highlighting functional analogies among phylogenetically distinct groups and underscoring the importance of more complete genomic data for the study of biosynthetic mechanisms. The results obtained offer a solid basis for future structural and functional analyses of the enzymes involved.
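As an illustration of the kind of cross-species check described above (conservation of key catalytic residues), the following is a minimal sketch, not the authors' workflow. It assumes the sequences have already been aligned to equal length; the function name `column_conservation`, the species labels, and the toy sequences are all hypothetical.

```python
from collections import Counter


def column_conservation(aligned_seqs, column):
    """Return (most common residue, fraction of sequences sharing it)
    for one 0-based column of a pre-computed alignment."""
    residues = [seq[column] for seq in aligned_seqs]
    residue, count = Counter(residues).most_common(1)[0]
    return residue, count / len(residues)


# Hypothetical pre-aligned fragments ('-' marks a gap); not real sequences.
toy_alignment = {
    "species_A": "MKTGHDAS-L",
    "species_B": "MRTGHDASQL",
    "species_C": "MKSGHEASQL",
    "species_D": "MKTGHDGS-L",
}

seqs = list(toy_alignment.values())
fully_conserved = column_conservation(seqs, 4)      # column holding 'H' in every sequence
partly_conserved = column_conservation(seqs, 5)     # 'D' in three of four sequences
print(fully_conserved, partly_conserved)
```

A column where the fraction equals 1.0 is fully conserved in the toy alignment; a real analysis would start from an alignment produced by a dedicated tool and from experimentally supported catalytic positions.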

A-259: Isoform Diversity of CES1 in Cancer: Structural and Prognostic Implications
Track: Function: Gene and Protein Function Annotation
  • Heshmat Borhani, University of Nottingham, Faculty of Medicine and Health Sciences, School of Life Sciences, Nottingham, United Kingdom
  • Samuele Di Cristofano, Sapienza University of Rome, Department of Molecular Medicine, Istituto Pasteur Italia - Fondazione Cenci Bolognetti, Rome, Italy
  • Daria Capece, University of L’Aquila, Department of Biotechnological and Applied Clinical Sciences, L’Aquila, Italy
  • Domenico Raimondo, Sapienza University of Rome, Department of Molecular Medicine, Istituto Pasteur Italia - Fondazione Cenci Bolognetti, Rome, Italy
  • Daniel D'Andrea, University of Bristol, School of Engineering Mathematics and Technology, Bristol, United Kingdom


Presentation Overview:

Cancer involves complex changes in gene expression, including the production of distinct coding and non-coding isoforms from the same gene through alternative splicing. While splicing dysregulation contributes to cancer progression, its tissue-specific effects are poorly characterised, limiting the potential of isoform-targeted therapies to enhance drug specificity and reduce off-target effects.
We previously identified carboxylesterase/lipase 1 (CES1, UniProt-ID: P23141) as a critical NF-κB-regulated lipase that promotes cancer cell survival and adaptation to nutrient-depleted conditions in aggressive colorectal cancer. CES1 has nine annotated isoforms, with P23141-1 designated as the canonical isoform.
In this study, we integrated RNA-seq data from TCGA (cancer) and GTEx (normal) to explore the association between CES1 isoform expression and splicing dynamics across cancers. Our findings revealed that cancer and normal tissues exhibit distinct patterns of variation in coding and non-coding CES1 isoforms, suggesting a potential role of isoforms in cancer-specific regulatory mechanisms.
Furthermore, drawing on 20 solved 3D protein structures, we focused on two isoforms, P23141-1 and P23141-3, which differ by a single residue: P23141-3 lacks the glutamine residue at position 362 (Q362) that is present in the canonical P23141-1. Using molecular dynamics simulations, we demonstrated that this single-residue difference converts a short α-helix into a π-helix near the active site. Interestingly, patients expressing high levels of isoforms containing Q362 tend to have poorer prognoses, suggesting that isoform-specific structural changes may affect tumour aggressiveness and drug sensitivity. These findings highlight a broader regulatory link between transcript diversity and cancer progression, with potential implications for targeted therapies.
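The single-residue difference between two isoforms described above can be located programmatically. The sketch below is illustrative only and is not taken from the study: the function name `single_residue_deletion` and the short toy sequences are hypothetical, and real isoform sequences would need to be obtained from a sequence database.

```python
def single_residue_deletion(canonical, variant):
    """Return (1-based position, residue) of the one residue present in
    `canonical` but missing from `variant`, or None if the two sequences
    do not differ by exactly one deleted residue.

    Note: inside a run of identical residues the deleted position is
    ambiguous; this function reports the last residue of the run.
    """
    if len(canonical) != len(variant) + 1:
        return None
    i = 0
    # Walk the shared prefix until the sequences diverge.
    while i < len(variant) and canonical[i] == variant[i]:
        i += 1
    # After skipping one residue in the canonical sequence,
    # the remaining suffixes must be identical.
    if canonical[i + 1:] != variant[i:]:
        return None
    return i + 1, canonical[i]


# Hypothetical toy sequences, not the real CES1 isoforms.
toy_canonical = "MKTAQLLG"
toy_variant = "MKTALLG"
result = single_residue_deletion(toy_canonical, toy_variant)
print(result)
```

For the toy pair, the function reports a glutamine missing at position 5 of the canonical sequence; sequences of equal length, or ones that differ by more than a single deletion, yield `None`.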