ISMB/ECCB 2025

Schedule for UK


Date	Start Time	End Time	Room	Track	Title	Confirmed Presenter	Format	Authors	Abstract
2025-07-21	11:20:00	12:00:00	01B	Bioinformatics in the UK	Molecular Digitisation and Biodiversity Bioinformatics	Paul Kersey	In person	Paul Kersey	Biological collections (such as herbarium and fungarium specimens) are the precursors of modern biobanks; the defining types of taxonomic concepts; together with their associated metadata, a record of what lifeforms were found where and when; and increasingly, a physical reference from which molecular data can be extracted. Digitisation of specimen images and metadata, and molecular characterisation through DNA sequencing, are making historic collections newly relevant to contemporary scientific questions. Although DNA degrades with age, it is still possible to obtain significant information about the phylogenetic placement and gene content of many specimens. In this talk, Dr. Kersey will present three large-scale sequencing projects that utilise the collections of the Royal Botanic Gardens, Kew: the Plant and Fungal Trees of Life project, the Darwin Tree of Life project, and the Fungarium Sequencing project. He will discuss the data they are generating, the challenges these raise and the opportunities these present, and the changing role of collections in the scientific community as biodiversity science becomes a big data field.
2025-07-21	12:00:00	12:20:00	01B	Bioinformatics in the UK	KnetMiner for Smarter Science: Leveraging Knowledge Graphs & LLMs for Productive Gene Research	Arne De Klerk	In person	Arne De Klerk, Marco Brandizi, Sam Holegar, Alex Warr, Sardor Asatillaev, Keywan Hassani-Pak	In the interpretation of high‑throughput genomic data, the identification of candidate genes underlying differential expression or genome‑wide association study (GWAS) signals remains a major challenge. Here, we describe recent enhancements to the KnetMiner platform, which integrates knowledge mining, large language models (LLMs) and retrieval‑augmented generation (RAG) to accelerate gene discovery. KnetMiner constructs a comprehensive knowledge graph by integrating curated ontologies, structured databases and literature‑derived relationships. Upon input of a gene list or genomic loci, semantic queries extract relevant subgraphs that are transformed into context‑aware prompts for an LLM. Through RAG, the model retrieves supporting evidence from external sources - including publications and functional annotations - to produce gene summaries and prioritisation scores. We will present the platform's modular architecture and real use cases of KnetMiner assisting scientists in mining for candidate genes for complex traits in wheat and other crops.
2025-07-21	12:20:00	12:40:00	01B	Bioinformatics in the UK	Lichen Cell Atlas: Tools for exploring photosymbiotic associations	Ellen Cameron	In person	Ellen Cameron, Gulnara Tagirdzhanova, Nicholas Talbot, Robert Finn, Mark Blaxter, Irene Papatheodorou	Photosymbiotic associations are partnerships where one partner is photosynthetic, termed the photobiont. Such associations span the eukaryotic tree of life from fungi to metazoans. Certain fungi (lichens) have evolved to live as a symbiont dependent on energy from a photobiont. Lichens uniquely produce an anatomically complex structure through the symbiotic association between fungi and algae which resembles neither symbiont independently but instead resembles a multicellular organism. Co-evolution of such symbioses is likely underpinned by molecular interactions which have previously been characterized using bulk sequencing approaches. However, bulk sequencing fails to capture the diversity of individual symbionts, preventing the exploration of functional differentiation and cell types found in symbiotic associations. Single-nucleus RNA sequencing (snRNAseq) provides a high-resolution tool to investigate symbiont cellular heterogeneity and further inform roles of symbionts, cell-cell communication, and complex tissue differentiation. Many questions and challenges remain when working with biodiverse data types, including: how to define ‘cell types’ in biological systems that form through interaction of simple microbial symbionts? In this project, we present the first cell atlas for a lichen, Xanthoria parietina, using snRNAseq data and a computational framework for studying symbiotic partnerships at the single-nucleus level. Functional clusters for fungal and algal symbionts corresponding to key biological functions, including carbohydrate transport and photosynthesis, were generated, suggestive of ‘cell types’. As more cell atlases are generated for mutualistic symbiotic associations (e.g., corals), future cross-species comparisons will enable identification of potential functional conservation and new understanding of the evolution of symbioses across the eukaryotic tree of life.
2025-07-21	12:40:00	13:00:00	01B	Bioinformatics in the UK	How can emerging AI technologies benefit multi-omic analysis for crop and soil sustainability	Laura-Jayne Gardiner	In person	TBC, Laura-Jayne Gardiner	The field of AI is rapidly evolving. At IBM Research (UK), our work in molecular biology has developed from a focus on bioinformatics and computational biology to include classic Machine Learning (ML), Deep Learning (DL), and now emerging AI-based technologies such as Foundational models (FMs) and Agentic AI. FMs are large, pre-trained models that can be adapted to various tasks; they have revolutionised AI and provide opportunities to accelerate discovery in the multi-omics domain. Here we take you on our technological journey via a range of application use-cases relating to our sustainability work in crop and soil multi-omics, where we are harnessing AI-based technology to identify genetic and molecular targets for disease, disease management and other key phenotypes. Our current toolkit for the analysis of crop genomes and soil metagenomes includes this full stack of bioinformatics and AI-based technologies, and we show that, in combination, these approaches can more effectively guide biological discovery for target identification. What sets us apart is our combination of: (1) cutting-edge AI and multi-modal datasets, (2) the interdisciplinary nature of our research team, including biologists, computational biologists, bioinformaticians, mathematicians, computational scientists and AI specialists, and (3) our end-user translational focus that provides a testbed for AI application. In the UK we work with a series of industrial partners focused on healthcare and life sciences within a unique collaborative centre led by IBM Research and STFC, called the Hartree National Centre for Digital Innovation.
2025-07-21	14:00:00	14:20:00	01B	Bioinformatics in the UK	Civic Data-Driven Innovation for Global Health and AI for All	Iain Buchan	In person	Iain Buchan	Professor Buchan will explore how some of the world’s most pressing public health challenges might best be tackled with a global network of learning health systems – across 3 levels – linking patient, provider and population level insights and actions. He will give examples from Liverpool’s Covid-19 responses as to how the data-action gap was narrowed sufficiently to create the flywheel effect needed for ‘learning health systems’. For example, data-driven deployment and rapid evaluation of the world’s first voluntary mass testing with lateral flow devices, which reduced Covid-19 cases by a fifth and hospitalisation by a quarter. The underlying data science and engineering has persisted in Liverpool as core business after the pandemic via an NHS “Data into Action” programme and University-hosted Civic Health Innovation Labs (CHIL). Professor Buchan will argue that society needs civic clusters of health and care services, academic and industry partners to train and test AIs in a systemic way, networked to allow health systems to borrow strength from each other automatically. Further, Professor Buchan will argue the need for a standard, interactive form of digital self – the Health Avatar – that improves self-care, provider services and science synergistically. He will explore ways that avatar/AI-driven bio-sampling might work at scale – showing that prevention, precision, payer-value need common data and interoperable AI to radically improve healthcare.
2025-07-21	14:20:00	14:30:00	01B	Bioinformatics in the UK	Health data organisation and landscape across the UK	Emily Jefferson	In person	Emily Jefferson	Professor Jefferson will explore the key challenges and opportunities in the field of clinical and health data science. Drawing on her transition from bioinformatics to health data science, she will share insights from her career journey. Her presentation will highlight how the Five Safes framework is used to protect data confidentiality and explain the role of Trusted Research Environments (TREs) and Secure Data Environments (SDEs) in enabling secure access to sensitive data. She will also outline essential resources for navigating the health data ecosystem, including Data Discovery tools and Information Governance processes. As bioinformatics and health informatics increasingly converge, there is a growing need for bioinformaticians to engage in this interdisciplinary domain.
2025-07-21	14:30:00	15:00:00	01B	Bioinformatics in the UK	The COALESCE study	Angela Wood	In person	Angela Wood	Professor Wood will present a flagship health data research programme, showcasing how national-scale data assets and analytical infrastructure have enabled transformative COVID-19 research in the UK. Her talk will focus on the integration of linked electronic health records (EHRs) covering more than 67 million individuals in England. These datasets span primary and secondary care, mortality records, medication dispensing, COVID-19 vaccinations, specialist audits, and environmental exposures. The presentation will also delve into the technical and governance foundations that support this work—including reproducible data curation and analysis pipelines, use of open-source tools, and patient and public involvement to promote transparent and trustworthy population-scale research. Professor Wood will highlight the impact of the collaborative research initiative the COALESCE study, which conducted the first UK-wide meta-analysis on COVID-19 undervaccination. This study revealed a clear association between undervaccination and increased risks of hospitalisation and death from COVID-19. This session is designed to inform and inspire researchers, analysts, and policymakers about the future of health data science in the UK—emphasising its transformative potential, ethical foundations, and collaborative spirit.
2025-07-21	15:00:00	15:20:00	01B	Bioinformatics in the UK	TRE-FX platform for federated analytics of sensitive data in Trusted Research Environments	Tim Beck	In person	Tim Beck, Justin Biddle, Jonathan Couldridge, Grazziela Figueredo, Alexander Hambley, Alexandra Lee, Hazel Lockhart-Jones, Douglas Lowe, Chris Orton, Vasiliki Panagi, Andy Rae, Stian Soiland-Reyes, Simon Thompson, Carole Goble, Philip Quinlan	A Trusted Research Environment (TRE) is a highly secure computer system that allows sensitive data from different sources to be combined, de-personalised and then made available for approved researchers to analyse within a secure virtual environment. The TRE-FX platform enables secure cross-TRE federated analytics. Federated analytics is where the data does not move, but instead the computer code the researchers write is sent to the data. There are many different software tools available for federated analytics, but most have not been designed for use within the considerable technical constraints of TREs. TRE-FX separates the different logical stages of a federated project, to support the use of analytics software within these environments. TRE-FX also uses international open standards from the Global Alliance for Genomics and Health (GA4GH) and ELIXIR to provide a solution for the remote execution of reproducible federated analyses. The Task Execution Service from GA4GH is used to receive federated analysis requests into the TRE. RO-Crate from ELIXIR is used in a standards framework to create the FAIR (Findable, Accessible, Interoperable, Reusable) metadata needed for reproducing analyses across TRE networks. The Five Safes RO-Crate encapsulates the metadata required for the exchange and review of analysis requests and results. This supports the disclosure control processes of TREs, where all outputs/results must be checked to ensure no disclosive data leaves a TRE. TRE-FX tools are being adapted and used across diverse UK and international projects, including DARE UK TREvolution, EOSC-ENTRUST, HDR UK Federated Analytics and ELIXIR Fed-A-Crate.
2025-07-21	15:20:00	15:40:00	01B	Bioinformatics in the UK	SurvivEHR: A Primary Care Foundation Prediction Model for Multiple Long-Term Conditions	Charles Gadd	In person	Charles Gadd, Francesca Crowe, Krishnarajah Nirantharakumar, Christopher Yau	We present SurvivEHR, a foundation model for time-to-event prediction using Electronic Health Records (EHR), based on the Generative Pre-trained Transformer (GPT) architecture. The model is trained on 23 million patient records from the UK Clinical Practice Research Datalink (CPRD), encompassing longitudinal primary care data. In total, 7.6 billion recorded events across patient timelines are used, with each represented as a tuple comprising: (i) a categorical event index (a unique combination of ICD-10 codes), (ii) an associated numerical value (e.g. measurement), and (iii) the event time (days to/since birth). SurvivEHR follows a pretrain-finetune paradigm: it first learns generalisable clinical representations from large-scale EHR data, and is then fine-tuned for specific prediction tasks such as forecasting future diagnoses, lab values, or mortality risk. This enables SurvivEHR to perform time-to-event forecasting, providing personalised forecasts for risk of future diagnoses, measurements, tests, and death. We further demonstrate that SurvivEHR supports strong transfer learning, and can be used as a Foundation Model for clinical prediction modelling on a number of case study examples. This work is motivated by the growing burden of Multiple Long-Term Conditions (MLTCs), also referred to as multimorbidity, as the prevalence of individuals living with two or more chronic conditions continues to rise. This shift is largely driven by an ageing population and advances in medical care that have extended life expectancy, resulting in more people living longer with chronic diseases. MLTCs are associated with poorer health outcomes, reduced quality of life, increased healthcare costs, and higher rates of hospitalisation and mortality.
2025-07-21	15:40:00	15:50:00	01B	Bioinformatics in the UK	Pathogen Analysis System (PAS): A Scalable Genomic Data Processing Framework Integrated with the European Nucleotide Archive	David Yuan	In person	Eugene Ivanov, Senthilnathan Vijayaraja, Tony Burdett, David Yuan	The Pathogen Data Network (PDN) consists of two interconnected networks: local private data hubs for public health and global public knowledge bases for genomic research. The Pathogen Analysis System (PAS) in PDN is a computing platform to process pathogen data by integrating with the European Nucleotide Archive (ENA). PAS retrieves input data from ENA, processes it using customizable bioinformatics pipelines, and submits results back to ENA, which may then be highlighted in a Pathogens Portal. PAS supports four use cases: 1. Public data on PDN’s priority list (ENA-sourced). 2. Public data not on the priority list (ENA-sourced). 3. Private data stored in ENA. 4. Data uploaded directly to Galaxy Community Hub, bypassing ENA. The system leverages tools like the ENA File Downloader and Webin CLI for Submission, wrapped as Galaxy tools, to automate workflows. A reference implementation using the Tuberculosis (TB) Variant Analysis Workflow demonstrates how researchers can retrieve data, analyze it, and submit results seamlessly. Developed in collaboration with GA4GH, ENA, EVORA, and BRC Analytics, PAS operates on a layered architecture, enabling scalable, automated analysis. Researchers can plug in their own pipelines or reuse existing ones, benefiting from Galaxy’s computing power without manual data handling. While UC1 is pathogen-specific, UC2-UC4 are generic, making PAS adaptable for any genomic data analysis integrated with ENA. The system also aligns with FAIR principles, with plans to share workflows via WorkflowHub and package them in RO-Crates for improved reproducibility and accessibility. This framework enhances collaborative research, enabling rapid data sharing and analysis.
2025-07-21	15:50:00	16:00:00	01B	Bioinformatics in the UK	AI in Histopathology Explorer for comprehensive analysis of the evolving AI landscape in histopathology	Heba Sailem	In person	Yingrui Ma, Shivprasad Jamdade, Lakshmi Konduri, Heba Sailem	Digital pathology and artificial intelligence (AI) have the potential to transform cancer diagnostics, treatment outcomes, and biomarker discovery. Gaining a deeper understanding of the deep learning methods applied to histopathological data and evaluating their performance on different tasks is crucial for developing the next generation of AI technologies. To this end, we developed AI in Histopathology Explorer (HistoPathExplorer), an interactive dashboard with intelligent tools available at www.histopathexpo.ai. This real-time online resource enables users, including researchers, decision-makers, and various stakeholders, to assess the current landscape of AI applications for specific clinical tasks, analyze their performance, and explore the factors influencing their translation into practice. Moreover, a quality index was defined to evaluate the comprehensiveness of methodological details in published AI methods. HistoPathExplorer highlights opportunities and challenges for AI in histopathology, and offers a valuable resource for creating more effective methods and shaping strategies and guidelines for translating digital pathology applications into clinical practice.
2025-07-21	16:40:00	17:00:00	01B	Bioinformatics in the UK	Generative machine learning to model cellular perturbations	Mo Lotfollahi	In person	Mo Lotfollahi	The field of cellular biology has long sought to understand the intricate mechanisms that govern cellular responses to various perturbations, be they chemical, physical, or biological. Traditional experimental approaches, while invaluable, often face limitations in scalability and throughput, especially when exploring the vast combinatorial space of potential cellular states. Enter generative machine learning, which has shown exceptional promise in modelling complex biological systems. This talk will highlight recent successes, address the challenges and limitations of current models, and discuss the future direction of this exciting interdisciplinary field. Through examples of practical applications, we will illustrate the transformative potential of generative ML in advancing our understanding of cellular perturbations and in shaping the future of biomedical research.
2025-07-21	17:00:00	17:10:00	01B	Bioinformatics in the UK	Building the world’s largest, ethically-sourced database of biological information to pioneer a new class of foundational AI models	Carla Greco	In person	Carla Greco, William Chow	Foundational AI models are revolutionizing many fields, but their effectiveness hinges on the availability of high-quality, diverse data. Without a robust and varied dataset, even the most sophisticated models can fall short of their potential, lacking the necessary breadth to generalize accurately and provide meaningful insights. Whilst public databases provide an immeasurably valuable resource, they carry geographical and taxonomic biases as well as limitations in environmental and genomic context that can hinder the potential of models built on them. Here we present BaseData™, Basecamp Research’s central data asset, that is far larger, more diverse, and more richly contextualized than any public dataset, and fully grounded on ethical access and benefit sharing agreements that ensure clear commercial use for the data. This dataset continues to grow at pace, supported by a self-reinforcing data supply chain that spans over 120 locations in over 25 countries, reaching over half of all known biomes from volcanoes on islands to ice fields in Antarctica. BaseData™ has expanded classes of biological systems and biomolecules by 100 times compared to public databases, and this unprecedented novelty and diversity significantly strengthens AI models, enabling them to tackle highly complex tasks. BaseData™ has already been shown to help build tools to improve predictive and generative tasks such as solving protein structures (BaseFold), functional enzyme annotation (Hifi-NN), and making enzyme design programmable (ZymCTRL). We believe this novel and diverse dataset opens the door to truly revolutionary advances in biological AI in the near future.
2025-07-21	17:10:00	17:20:00	01B	Bioinformatics in the UK	Multimodal generative machine learning for non-clinical safety evaluations in drug discovery and development	Arijit Patra	In person	Arijit Patra	In the evolving landscape of pharmaceutical drug development and a constant reimagination of the Ideas to Patient journey, the integration of multimodal foundation models, generative machine learning, and AI-driven chatbot interfaces has marked a significant leap forward. This talk will present our recent work in building and deploying multimodal foundation models specifically designed for drug toxicity evaluations, as well as the production of intuitive chatbot interfaces to facilitate knowledge discovery and human-in-the-loop assessments throughout the process. Our multimodal foundation models are engineered to process and interpret a variety of data types, including molecular structures, biological assay results, and textual data from scientific literature, along with imaging and computational toxicity data from non-clinical safety studies and toxicologic pathology processes. By harnessing these models, we have enhanced our ability to predict and evaluate potential drug toxicities early in the development pipeline. This not only accelerates the identification of safer drug candidates but also substantially reduces the costs and time associated with traditional toxicity testing methods. In parallel, we have developed and productionized sophisticated chatbot interfaces that serve as powerful tools for knowledge discovery. These chatbots enable teams to interact seamlessly with complex datasets and analytical tools, democratizing access to critical insights and fostering a more collaborative research environment. The chatbots are designed to understand and respond to natural language queries, making advanced data analysis accessible to users regardless of their technical expertise. This talk will showcase applications and case studies where our multimodal foundation models and chatbot interfaces have been successfully implemented. We will discuss the potential impact of these technologies on non-clinical safety evaluations, and improvements in accuracy, efficiency, and decision-making processes. Additionally, we will explore the challenges encountered during development and deployment, as well as the future directions and potential expansions of these innovative tools.
2025-07-21	17:20:00	17:30:00	01B	Bioinformatics in the UK	CUPiD: a machine learning approach for determining tissue-of-origin in cancers of unknown primary from cell-free DNA methylation profiles	Steven M. Hill	In person	Alicia-Marie Conway, Simon P. Pearce, Alexandra Clipson, Steven M. Hill, Holly Cassell, Vsevolod J. Makeev, Caroline Dive, Natalie Cook, Dominic G. Rothwell	Patients with cancer of unknown primary (CUP) present with metastases, but without an identifiable primary tumour and typically have poor outcomes. Determining the primary site enables access to type-specific therapies, potentially leading to therapeutic benefit. We developed CUPiD, a machine learning classifier for non-invasively and accurately determining CUP tissue-of-origin (TOO) from cell-free DNA (cfDNA) methylation profiles obtained from a blood sample. We used a data augmentation strategy that enabled data arising from over 9,000 TCGA tumour tissue samples (rather than cfDNA samples) to be leveraged for classifier training. To generate the training dataset, we performed in silico mixing of reads from tumour tissue DNA with reads from non-cancer control (NCC) cfDNA, to mimic low tumour content of cfDNA, resulting in 276,108 in silico mixture samples, across 29 cancer types. These were used to train an ensemble of 100 XGBoost classifiers for TOO determination from cfDNA, with an AUROC of 0.98 on held-out mixture samples. We further tested CUPiD on 143 cfDNA samples from patients with metastatic cancer of known type, giving an overall multi-class sensitivity of 85% and TOO accuracy of 97%. In an additional cohort of 41 patients with CUP, CUPiD predictions were made in 32/41 (78%) cases, with 88% of the predictions clinically consistent with a subsequent or suspected primary tumour diagnosis. Our data demonstrate that CUPiD can accurately predict TOO from a single liquid biopsy and has potential to support treatment stratification and improve clinical outcomes for a significant proportion of patients with CUP.
2025-07-21	17:30:00	17:35:00	01B	Bioinformatics in the UK	Introduction to AIBIO UK	Charlie Harrison		Charlie Harrison	AIBIO-UK is a UKRI BBSRC-funded network to support and enhance engagement between the Biosciences and AI communities in the UK. The network supports community events, funding of pilot projects and creation of resources for supporting use of AI in the biosciences.
2025-07-21	17:35:00	18:00:00	01B	Bioinformatics in the UK	AI: Careers and trajectories			Mo Lotfollahi, Lucia Marucci, Gabriella Rustici, Harpreet Saini	TBC
