
This special session is being held outside the ISCB virtual platform.

For more information, and to register, visit http://tinyurl.com/geapril2025

Schedule subject to change
All times listed are in SAST
Thursday, April 17th
9:00-9:10
Invited Presentation: Overview of EMBL-EBI
Confirmed Presenter: Johanna McEntyre

Format: Live Stream

Moderator(s): Matthew Thakur


Authors List:

  • Johanna McEntyre

Presentation Overview:

The European Molecular Biology Laboratory’s European Bioinformatics Institute (EMBL-EBI) is among the world’s leading sources of public biomolecular data. Based at the Wellcome Genome Campus in Hinxton, UK, EMBL-EBI is one of six sites of the European Molecular Biology Laboratory, Europe’s only intergovernmental life sciences organization. In 2024, EMBL-EBI received 123 million requests for data on an average day, from 42 million unique users a year, from every country. EMBL-EBI now manages data contributed by researchers around the globe, totalling close to 0.5 exabytes overall. EMBL-EBI is unique in the range of the roughly 40 data resources it hosts, spanning biological scales and modalities. These include small molecules, DNA sequences (genes and genomes), information about proteins and macromolecular structures, biological images, pathways, ontologies and the research literature. The expert-curated data at scale, relevant to life science fields from precision medicine to biodiversity, is the foundation for developing innovative technologies and new AI tools. For example, the tremendous impact of AlphaFoldDB protein structure predictions will enable unprecedented acceleration of discovery, finding new solutions to global challenges.

9:10-9:20
Invited Presentation: Overview of EMBL-EBI Global Engagement programme
Confirmed Presenter: ThankGod Ebenezer

Format: Live Stream

Moderator(s): Matthew Thakur


Authors List:

  • ThankGod Ebenezer

Presentation Overview:

The European Bioinformatics Institute (EMBL-EBI) maintains and operates some of the most widely used biodata resources for the life sciences, with significance for human health, agriculture, and biodiversity. These resources are enhanced by scientists globally through data submission, retrieval, curation, co-development, partnerships, community-based research, and service delivery. Biodata resources are heavily used and contributed to in high-income settings, and opportunities exist to maximise their use in Africa. We are establishing the EMBL-EBI Global Engagement programme to increase these opportunities, involving a series of awareness activities to better understand technical and non-technical barriers and solutions to biodata use and contribution.

9:20-9:30
Panel: Q&A and discussion session
Format: In person

Moderator(s): Matthew Thakur


Authors List:

  • All audience
9:30-9:45
Invited Presentation: Phenotype data collection, harmonisation, management and standards in the DS-I Africa context
Format: In person

Moderator(s): Kim Gurwirtz


Authors List:

  • Katherine Johnston
9:45-10:00
Invited Presentation: The European Nucleotide Archive: open digital sequence information
Confirmed Presenter: Colman O'Cathail

Format: Live Stream

Moderator(s): Kim Gurwirtz


Authors List:

  • Colman O'Cathail

Presentation Overview:

The European Nucleotide Archive (ENA) is an open, supported platform for the management, sharing, integration, archiving and dissemination of sequence data. The ENA comprises both the globally comprehensive data resource that preserves the world’s public-domain output of sequence data and a rich portfolio of tools and services to support the management of sequence data. This talk will present the scope of the ENA, what it offers, and its interfaces, tools and services. It will also discuss the ENA's collaboration with nucleotide databases within the International Nucleotide Sequence Database Collaboration (INSDC) and the ambition of including more databases in this initiative.

10:15-10:30
Invited Presentation: Sequence data types and the role of MGnify on data management and FAIRification
Confirmed Presenter: Alexander Rogers

Format: Live Stream

Moderator(s): Kim Gurwirtz


Authors List:

  • Alexander Rogers

Presentation Overview:

MGnify is EMBL-EBI's resource for the assembly, analysis and archiving of microbiome-derived sequencing datasets. Interoperable with the European Nucleotide Archive and wider INSDC archival network, MGnify offers the community on-demand metagenomic analysis pipelines as well as metagenomic assembly capabilities, through web requests and larger direct collaborations on globally critical projects like microbial biodiversity, health, and food security. MGnify's open source data pipelines derive and analyse data products from open access data repositories, and in turn produce open datasets available to the scientific community. These data products are only possible thanks to high performance compute infrastructure, but the analysis services and results are freely available to users whether or not they have these compute resources themselves. Increasingly, some of these resources are available through advancing technology and cloud availability. This is enabling upstream, parallel, and downstream analyses – all potentially tailored for specific scientific goals or environments – to be performed elsewhere. In this talk, we will describe MGnify and highlight the approaches we have taken, and are taking, to make microbiome data and pipelines FAIR.

10:30-10:50
Panel: All speakers
Format: In person

Moderator(s): Kim Gurwirtz


Authors List:

  • Q&A session
11:05-11:20
Invited Presentation: Comparative Genomics: Case studies in two groups of economically important marine finfish
Confirmed Presenter: Clint Rhode

Format: Live Stream

Moderator(s): ThankGod Ebenezer


Authors List:

  • Clint Rhode

Presentation Overview:

Fish represent the oldest and most species-rich group of vertebrates, with many biological and taxonomic complexities. Many fish species are also economically important in fisheries and aquaculture. Thus, understanding the genetic composition of such species can facilitate optimal fisheries and conservation management and enable genetic improvement through selective breeding under aquaculture. Yellowtail (Seriola spp.) and dusky kob (Argyrosomus japonicus) are globally distributed, cosmopolitan species, with importance in both fisheries and aquaculture. However, there is debate surrounding the taxonomic status within the respective species groups. In both groups, there are known morphological and life history differences between populations from different geographic regions; in the case of Seriola the group has been divided into multiple species, whilst Argyrosomus japonicus remains defined as a single global species. A comparative genomic analysis was applied to assess the genomic statuses of the respective species’ conspecifics. Comparing the South African dusky kob to the Chinese conspecific highlighted that the two genomes were similar in size, had similar repetitive element profiles and shared a core gene set. However, the Chinese genome had significantly more unique genes, and overall genomic similarity was only 92%. In contrast, for the Seriola species, comparing the South African S. lalandi to the Australian conspecific (99.6%) and to two other Pacific species, S. aureovittata (99.1%) and S. dorsalis (98.8%), revealed very low divergence between the genomes. However, despite high nucleotide similarity, significant functional genic diversification, likely driven by local adaptation, was evident. The genomic evidence therefore suggests that there might be more justification to split dusky kob into multiple species, whilst Seriola might be different ecotypes of the same species.

11:20-11:35
Invited Presentation: Overview of Ensembl data and training
Confirmed Presenter: Aleena Mushtaq, EMBL-EBI, United Kingdom

Format: In person

Moderator(s): ThankGod Ebenezer


Authors List:

  • Aleena Mushtaq, EMBL-EBI, United Kingdom

Presentation Overview:

Ensembl provides a genome browser that acts as a single point of access to annotated genomes for vertebrate and non-vertebrate species. These data include human diversity datasets from five African populations from the International Genome Sample Resource, malaria and yellow fever mosquito vectors, as well as plant and livestock species relevant to African agriculture.

The Ensembl Outreach team offers an extensive training programme including in-person and virtual courses across the world. We continue to deliver virtual courses with open registration for participants across the globe while also collaborating with host organisations to provide training for specific communities, tailored to their needs and interests. Over the past year, we have delivered virtual workshops to over 1000 participants across Africa and 10 in-person workshops in Nigeria, South Africa and Zimbabwe.

We have recently developed our virtual train-the-trainer workshop to empower participants to teach their own Ensembl workshop at their home institute. Course materials are distributed through our training site (https://training.ensembl.org/), and virtual workshops are recorded and hosted on YouTube to share with participants and the wider community.

11:35-11:50
Invited Presentation: Comparative genomics on public genomics data with Ensembl
Confirmed Presenter: Jitender Cheema

Format: Live Stream

Moderator(s): ThankGod Ebenezer


Authors List:

  • Jitender Cheema
12:10-12:25
Panel: Presenting and showcasing examples/prototypes for low resource optimisation - UniProt
Format: In person

Moderator(s): ThankGod Ebenezer


Authors List:

  • Vishal Joshi
Invited Presentation: Presenting and showcasing examples/prototypes for low resource optimisation - UniProt
Confirmed Presenter: Daniel Rice, EBI, UK

Format: In person

Moderator(s): ThankGod Ebenezer


Authors List:

  • Daniel Rice, EBI, UK
  • Vishal Joshi, EBI, UK

Presentation Overview:

In the context of EBI's Global Engagement initiative, we explore ways to optimize access to UniProt’s protein knowledge for users in low-resource settings. Our work focuses on removing technical barriers and distilling content to reduce data download requirements, enabling more efficient access to essential protein information. Additionally, we leverage large language models (LLMs) to summarize protein knowledge from rich UniProtKB flat files, making it more accessible to individuals with limited internet connectivity or using lower-end devices such as low-cost smartphones or older computers.