Schedule subject to change
All times listed are in JST
Wednesday, October 23rd
11:45-12:45
Invited Presentation: Accelerating Bioinformatics Workflows with Interactive High-Performance Computing
Room: Small Theatre
Format: In person



Presentation Overview:

Bioinformatics research is advancing rapidly thanks to techniques like next-generation high-throughput sequencing, computational mass spectrometry, and computational biophysics. Automation tools also play a significant role in scaling these processes. This surge in bioinformatics workloads has led to an unprecedented accumulation of biological data, necessitating high-performance and high-throughput computing technologies to process these massive datasets. Hardware accelerators and massively parallel heterogeneous computing systems are key to speeding up the processing of big data in high-performance environments. By enabling greater degrees of parallelism, these technologies significantly boost computational throughput. In this talk, we will explore the latest architectures that are driving the acceleration and growth of bioinformatics workflows.

13:00-13:15
Coordinated regulation by lncRNAs results in tight lncRNA–target couplings
Confirmed Presenter: Hua-Sheng Chiu, Baylor College of Medicine, United States

Room: Small Theatre
Format: In Person

Moderator(s): Tatsuya Akutsu


Authors List:

  • Hua-Sheng Chiu, Baylor College of Medicine, United States
  • Sonal Somvanshi, Baylor College of Medicine, United States
  • Pavel Sumazin, Baylor College of Medicine, United States

Presentation Overview:

The determination of long non-coding RNA (lncRNA) function is a major challenge in RNA biology with applications to basic, translational, and medical research. Our efforts to improve the accuracy of lncRNA-target inference identified lncRNAs that coordinately regulate both the transcriptional and post-transcriptional processing of their targets. Namely, these lncRNAs may regulate the transcription of their target and chaperone the resulting message until its translation, leading to tightly coupled lncRNA and target abundance. Our analysis suggested that hundreds of cancer genes are coordinately and tightly regulated by lncRNAs and that this unexplored regulatory paradigm may propagate the effects of non-coding alterations to effectively dysregulate gene expression programs. As a proof-of-principle we studied the regulation of DICER1—a cancer gene that controls microRNA biogenesis—by the lncRNA ZFAS1, showing that ZFAS1 activates DICER1 transcription and blocks its post-transcriptional repression to phenomimic and regulate DICER1 and its target microRNAs.

13:15-13:30
Identifying regulators of global chromatin accessibility using CRISPR-ATACsee-ATACseq
Confirmed Presenter: Sung-Joon Park, Human Genome Center, The Institute of Medical Science, The University of Tokyo, Japan

Room: Small Theatre
Format: In Person

Moderator(s): Tatsuya Akutsu


Authors List:

  • Sung-Joon Park, Human Genome Center, The Institute of Medical Science, The University of Tokyo, Japan
  • Yusuke Miyanari, WPI Nano Life Science Institute, Kanazawa University, Japan

Presentation Overview:

Accessible chromatin is essential for DNA-dependent genetic and epigenetic modifiers that impact various cellular phenotypes. Next-generation sequencing with Tn5 transposase (ATAC-seq) is widely used to understand dynamic changes in chromatin accessibility by capturing short DNA fragments, which provide information on locally accessible regions. However, the regulation of genome-wide chromatin states remains poorly understood.

We screened for molecules influencing chromatin accessibility by knocking out (KO) 19,114 human genes using CRISPR/Cas9. The effect of each gene KO in eHAP cells was assayed by combining sgRNA sequencing analysis with an optimized ATAC-see method, involving fluorescent imaging of Tn5 transposase labeled with Cy3. This screening identified 102 significant genes from day-7 post-transduction samples, most of which are TFs and components of complexes involved in chromatin organization and DNA-templated processes, e.g., Tip60, HDAC, PRMT, and MCM complexes.

We focused on novel regulators using ATAC-seq analysis with Drosophila spike-in and found diverse and specific functions in regulating local chromatin accessibility, with specific motifs enriched. Notably, KOs of CNOT3, HNRNPU, NAA10, and TFDP1 increased global chromatin accessibility, suggesting their roles as negative regulators through distinct biological pathways in wild-type cells.

We will present and discuss our efforts to understand the mechanisms regulating chromatin accessibility.

13:30-13:45
Uncovering COVID-19 severity markers through computational network biology strategy
Confirmed Presenter: Heewon Park, Sungshin Women’s University, South Korea

Room: Small Theatre
Format: In Person

Moderator(s): Tatsuya Akutsu


Authors List:

  • Heewon Park, Sungshin Women’s University, South Korea
  • Satoru Miyano, Tokyo Medical and Dental University, Japan

Presentation Overview:

Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), rapidly spread worldwide. We aimed to identify COVID-19 severity-specific markers in the Japanese population using gene network analysis. We developed a novel computational network biology strategy to identify differentially regulated gene networks between severe and non-severe COVID-19 samples. Monte Carlo simulations demonstrated the effectiveness of our strategy for differential gene network analysis. We applied this method to publicly available whole blood RNA-seq data from 465 genotyped samples from the Japan COVID-19 Task Force and revealed the COVID-19 severity-specific molecular interplay. Our analysis suggests the gene network between HLA class II, CIITA, and CD74 as a COVID-19 severity-specific molecular marker. Although the association between HLA class II and COVID-19 has been demonstrated, our data analysis revealed that the molecular interplay of HLA class II with its target and/or regulator is a crucial marker for COVID-19 severity. Our findings from computational network biology analysis suggest that suppression and activation of the molecular interplay between HLA class II, CIITA, and CD74 provide crucial clues to uncover the mechanisms of COVID-19 severity.

13:45-14:00
Supervised learning of enhancer-promoter specificity based on genome-wide perturbation studies highlights areas for improvement in learning
Confirmed Presenter: Mira Han, University of Nevada, Las Vegas, United States

Room: Small Theatre
Format: Live Stream

Moderator(s): Tatsuya Akutsu


Authors List:

  • Dylan Barth, University of Nevada, Las Vegas, United States
  • Richard Van, University of Nevada, Las Vegas, United States
  • Jonathan Cardwell, University of Colorado School of Medicine, United States
  • Mira Han, University of Nevada, Las Vegas, United States

Presentation Overview:

Understanding the rules that govern enhancer-driven transcription remains a central unsolved problem in genomics. Now that multiple massively parallel enhancer perturbation assays have been published, there are enough data to learn to predict enhancer-promoter relationships in a data-driven manner. We applied machine learning to one of the largest enhancer perturbation studies integrated with transcription factor and histone modification ChIP-seq. The results uncovered a discrepancy in the prediction of genome-wide data compared to data from targeted experiments. Relative strength of contact was important for prediction, confirming the basic principle of enhancer-promoter (EP) regulation. Novel features such as the density of the enhancers/promoters in the genomic region were found to be important, highlighting our lack of understanding of how other elements in the region contribute to the regulation. Several TF peaks were identified that improved the prediction by identifying the negatives and reducing false positives. In summary, integrating genomic assays with enhancer perturbation studies increased the accuracy of the model and provided novel insights into enhancer-driven transcription.

14:00-14:15
Gene set correlation enrichment analysis for interpreting and annotating gene expression profiles
Confirmed Presenter: Chun-Yu Lin, Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Taiwan

Room: Small Theatre
Format: In Person

Moderator(s): Tatsuya Akutsu


Authors List:

  • Lan-Yun Chang, Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Taiwan
  • Meng-Zhan Lee, Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Taiwan
  • Yujia Wu, Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Taiwan
  • Wen-Kai Lee, Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Taiwan
  • Chia-Liang Ma, Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Taiwan
  • Jun-Mao Chang, Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Taiwan
  • Ciao-Wen Chen, Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Taiwan
  • Tzu-Chun Huang, Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Taiwan
  • Chia-Hwa Lee, School of Medical Laboratory Science and Biotechnology, Taipei Medical University, Taiwan
  • Jih-Chin Lee, Department of Otolaryngology-Head and Neck Surgery, Tri-Service General Hospital, National Defense Medical Center, Taiwan
  • Yu-Yao Tseng, Department of Food Science, Nutrition, and Nutraceutical Biotechnology, Shih Chien University, Taiwan
  • Chun-Yu Lin, Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Taiwan

Presentation Overview:

Pathway analysis, including nontopology-based (non-TB) and topology-based (TB) methods, is widely used to interpret the biological phenomena underlying differences in expression data between two phenotypes. By considering dependencies and interactions between genes, TB methods usually perform better than non-TB methods in identifying pathways that include closely relevant or directly causative genes for a given phenotype. However, most TB methods may be limited by incomplete pathway data used as the reference network or by difficulties in selecting appropriate reference networks for different research topics. Here, we propose a gene set correlation enrichment analysis method, Gscore, based on an expression dataset-derived coexpression network to examine whether a differentially expressed gene (DEG) list (or each of its DEGs) is associated with a known gene set. Gscore is better able to identify target pathways in 89 human disease expression datasets than eight other state-of-the-art methods and offers insight into how disease-wide and pathway-wide associations reflect clinical outcomes. When applied to RNA-seq data from COVID-19-related cells and patient samples, Gscore provided a means for studying how DEGs are implicated in COVID-19-related pathways. In summary, Gscore offers a powerful analytical approach (https://gscore.ibsb.nycu.edu.tw/) for annotating individual DEGs, DEG lists, and genome-wide expression profiles using existing biological knowledge.
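
The general idea of scoring a gene against a gene set through dataset-derived coexpression can be sketched as below. This is a minimal illustration only, not the Gscore statistic: the use of mean absolute Pearson correlation, the function names, and the toy expression values are all assumptions made for the example.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant vector carries no coexpression signal
    return cov / (sx * sy)

def gene_set_association(expr, gene, gene_set):
    """Mean absolute correlation between `gene` and the other set members.

    Hypothetical score used for illustration; `expr` maps gene -> expression
    values across the samples of a single dataset.
    """
    others = [g for g in gene_set if g != gene and g in expr]
    if not others:
        return 0.0
    return sum(abs(pearson(expr[gene], expr[g])) for g in others) / len(others)

# Toy expression matrix (hypothetical values, four samples per gene).
expr = {
    "A": [1, 2, 3, 4],
    "B": [2, 4, 6, 8],
    "C": [4, 3, 2, 1],
    "D": [1, 3, 2, 4],
}
print(gene_set_association(expr, "A", {"B", "C"}))
print(gene_set_association(expr, "A", {"D"}))
```

Taking the absolute value treats strong negative coexpression as an association too; a signed score would be an equally reasonable choice for this kind of sketch.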

14:15-14:30
SIEVE: One-stop differential expression, variability, and skewness analyses using RNA-Seq data
Confirmed Presenter: Hongxiang Li, University of Malaya, Malaysia

Room: Small Theatre
Format: In Person

Moderator(s): Tatsuya Akutsu


Authors List:

  • Hongxiang Li, University of Malaya, Malaysia
  • Tsung Fei Khang, University of Malaya, Malaysia

Presentation Overview:

RNA-Seq data analysis is commonly biased towards detecting genes that show significant differences in mean. As a result, the complexity of gene expression changes between biological conditions, such as those involving changes in variance and skewness, is frequently ignored. SIEVE is a novel statistical methodology that embraces a compositional data analysis framework that transforms discrete RNA-Seq counts to a continuous form with a distribution that is well-fitted by a skew-normal distribution. Simulation results show that, with respect to the false discovery rate and probability of Type II error, SIEVE has performance comparable or superior to that of its competitors for testing differences in mean and variance. Analysis of the Mayo RNA-Seq dataset for Alzheimer’s disease using SIEVE reveals that a gene set with significant expression difference in mean, variance and skewness between the control and the Alzheimer’s disease group strongly predicts a subject’s disease state. Furthermore, functional enrichment analysis shows that incorporating genes that show differential variability and skewness reveals a richer spectrum of biological aspects associated with Alzheimer’s disease. Thus, SIEVE may be a useful tool to gain systems biology understanding of the intricate changes in gene expression in complex diseases.
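
To make the idea of moving counts onto a continuous scale and then looking beyond the mean more concrete, here is a minimal sketch. It assumes a centered log-ratio (CLR) transform, which is a standard compositional data analysis transform but is not confirmed by the abstract as the one SIEVE uses; the pseudocount, function names, and toy counts are also assumptions, and no skew-normal fit is performed.

```python
import math
from statistics import mean, pvariance

def clr(sample_counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's counts across genes.

    The pseudocount (an assumed value) avoids log(0) for unobserved genes.
    """
    logs = [math.log(c + pseudocount) for c in sample_counts]
    center = mean(logs)
    return [v - center for v in logs]

def moments(values):
    """Return (mean, population variance, skewness) of a list of values."""
    m = mean(values)
    var = pvariance(values, m)
    if var == 0:
        return m, 0.0, 0.0
    skew = mean([(v - m) ** 3 for v in values]) / var ** 1.5
    return m, var, skew

# Toy data: rows are samples, columns are genes (hypothetical counts).
counts = [
    [10, 200, 35],
    [12, 180, 40],
    [9, 950, 38],
    [11, 210, 36],
]
transformed = [clr(row) for row in counts]
for gene in range(3):
    per_gene = [row[gene] for row in transformed]
    print(gene, moments(per_gene))
```

Comparing these three moments per gene between two groups is the kind of contrast the abstract describes; the actual tests and error control are beyond this sketch.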

14:50-15:12
Invited Presentation: Unveiling Next-Generation Data Analysis Environments: Expanding Imagination to Meet Emerging Computational Needs
Room: Small Theatre
Format: In person

Moderator(s): Junna Kawasaki


Authors List:

  • Rie Watanabe

15:12-15:20
A short talk from ASCS
Room: Small Theatre
Format: In person

Moderator(s): Junna Kawasaki



15:20-15:35
vOMIX-MEGA: An ultrafast and comprehensive pipeline for viral metagenomic analysis
Confirmed Presenter: Joshua Wing Kei Ho, The University of Hong Kong and Laboratory of Data Discovery for Health, Hong Kong

Room: Small Theatre
Format: In Person

Moderator(s): Junna Kawasaki


Authors List:

  • Erfan Shekarriz, The University of Hong Kong and Laboratory of Data Discovery for Health, Hong Kong
  • Joshua Wing Kei Ho, The University of Hong Kong and Laboratory of Data Discovery for Health, Hong Kong

Presentation Overview:

Viral metagenomics research is booming with the rapid development of viral calling and taxonomic identification tools. Scalability, reproducibility, and speed of analysis remain a challenge for larger datasets when using necessary yet memory-heavy underlying software. Here we have addressed the bottleneck of viral calling by re-engineering the parallelizability of key standard software and ad hoc tasks in viral metagenomics. Our pipeline, called vOMIX-MEGA, is the first end-to-end, modular, and containerized viral metagenomic software that can take a list of SRA accessions as input and carry out all essential viral and non-viral analysis with a structured output. We compare vOMIX-MEGA to three state-of-the-art pipelines: ViroProfiler, VIRify, and Nayfach et al. (2023). Using 64 cores and >10,000 contigs, we establish that vOMIX-MEGA is 10-1000X faster than all existing workflows and uses significantly (5-20X) less memory at higher CPU counts (only ~20 GB maximum RAM). We put vOMIX-MEGA’s capabilities to the test and analyze, end-to-end, a dataset consisting of 3204 human gut microbiome samples in less than two weeks on a 64-core machine with a peak memory usage of 34 GB, a task that would otherwise take 3-12 months with existing pipelines and use approximately 110-700 GB of memory. vOMIX-MEGA is wrapped around a Snakemake back-end, is deployable as a container and on the cloud, and demonstrates a much-needed computational performance enhancement in the viral metagenomics toolset.

15:35-15:50
MGM: Microbial General Model Enhancing Contextualized Microbiome Analysis
Confirmed Presenter: Haohong Zhang, Huazhong University of Science and Technology, China

Room: Small Theatre
Format: In Person

Moderator(s): Junna Kawasaki


Authors List:

  • Haohong Zhang, Huazhong University of Science and Technology, China
  • Zixin Kang, Huazhong University of Science and Technology, China
  • Yuli Zhang, Huazhong University of Science and Technology, China
  • Kang Ning, Huazhong University of Science and Technology, China

Presentation Overview:

Microbial communities significantly impact medicine, biotechnology, and agriculture. Advanced sequencing technologies have generated extensive microbiome data, revealing substantial evolutionary and ecological patterns. Deep learning, coupled with transfer learning, has improved performance in specific microbiome tasks by leveraging large pre-trained models. However, traditional supervised learning methods struggle to capture universal patterns in microbial community data. We propose MGM, a context-aware, attention-based deep learning model, pre-trained on a dataset of 263,302 microbiome samples (Microcorpus-260K) via causal language modeling. For downstream tasks, MGM employs transfer learning by replacing the language modeling head with a task-specific head and fine-tuning the model on limited data. MGM significantly improved microbial source tracking and sample classification. Fine-tuning MGM on a longitudinal infant dataset revealed distinct keystone genera during development, with Bacteroides and Bifidobacterium showing higher attention weights in vaginal deliveries and Haemophilus in cesarean deliveries. In conclusion, leveraging self-attention and autoregressive pre-training, MGM serves as a versatile toolkit for various downstream microbiome tasks and holds potential for driving forward the frontiers of microbiome science.

15:50-16:05
Melias: Microbial genomic language model for identification of plasmid sequences
Confirmed Presenter: Chayaporn Suphavilai, Genome Institute of Singapore, Agency for Science, Technology and Research (A*STAR), Singapore, Singapore

Room: Small Theatre
Format: In Person

Moderator(s): Junna Kawasaki


Authors List:

  • Chayaporn Suphavilai, Genome Institute of Singapore, Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
  • Patipan Boonsimma, Genome Institute of Singapore, Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
  • Evelyn Chee, School of Computing, National University of Singapore, Singapore, Singapore
  • Hatairat Yingtaweesittikul, Faculty of Science, Chiang Mai University, Thailand
  • Kar Mun Lim, Genome Institute of Singapore, Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
  • Niranjan Nagarajan, Genome Institute of Singapore, Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
  • Karrie Ko, Department of Microbiology, Singapore General Hospital, Singapore, Singapore

Presentation Overview:

Antimicrobial resistance (AMR) is a public health threat caused by bacterial pathogens developing resistance to antibiotics. A key driver of the rapid spread of AMR is horizontal gene transfer via plasmids, which are small circular DNA molecules carrying and sharing AMR genes across bacterial cells. Standard genomic-based outbreak detection relies on genetic relatedness between chromosomes and misses the opportunity to detect the spread of AMR via plasmids.

We propose Melias, a machine-learning framework that leverages a large language foundation model trained on genomic data at scale. We introduce a plasmid classification task and utilize contig embeddings for augmented retrieval of similar plasmids, allowing the investigation of plasmid-mediated AMR outbreaks when similar plasmid sequences are detected.

The newly introduced plasmid classification model achieved 100% precision and 99.56% recall in identifying plasmid contigs. We demonstrated consistent predicted probabilities across loci when comparing the same plasmid type, suggesting that the model could capture specific plasmid sequence patterns. Finally, by utilizing the contig embeddings for augmented retrieval of similar plasmids, we identified possible plasmid-mediated outbreaks across different bacterial species.

This study demonstrated the utilization of foundation models for meaningful analysis of microbial genomes, providing valuable insights for outbreak investigations and public health benefits.

16:05-16:20
Elucidating disease-associated mechanisms triggered by pollutants using large-scale ChIP-Seq data
Confirmed Presenter: Zhaonan Zou, Kumamoto University, Japan

Room: Small Theatre
Format: In Person

Moderator(s): Junna Kawasaki


Authors List:

  • Zhaonan Zou, Kumamoto University, Japan
  • Shinya Oki, Kumamoto University, Japan

Presentation Overview:

Despite well-documented effects on human health, the action modes of environmental pollutants are incompletely understood. Transcriptome-based approaches are widely used to predict associations between chemicals and disorders. However, the molecular cues regulating gene expression remain unclear. To elucidate the action modes of pollutants, we proposed a data-mining approach combining epigenome (ATAC-Seq) and large-scale public ChIP-Seq data from ChIP-Atlas to identify transcription factors (TFs) that are enriched in pollutant-induced differentially accessible genomic regions (DARs), thereby integratively regulating gene expression upon pollutant exposure. Using the proposed approach, we predicted that PM2.5 inhibits the binding of hematopoietic differentiation regulators to the genome, thereby perturbing normal blood cell differentiation and leading to immune dysfunction, and that lead induces fatty liver by altering hepatic circadian rhythms, which disrupts the normal regulation of lipid metabolism. Thus, our approach has the potential to reveal pivotal TFs that mediate adverse effects of pollutants, thereby facilitating the development of strategies to mitigate environmental pollution damage.

Thursday, October 24th
10:45-11:00
Landscape of Evolutionary Arms Races between Transposable Elements and KRAB-ZFP Family
Confirmed Presenter: Masato Kosuge, 1.Waseda University, 2.CBBD-OIL, AIST, Japan

Room: Small Theatre
Format: In Person

Moderator(s): Russell Schwartz


Authors List:

  • Masato Kosuge, 1.Waseda University, 2.CBBD-OIL, AIST, Japan
  • Jumpei Ito, The Institute of Medical Science, The University of Tokyo, Japan
  • Michiaki Hamada, 1.Waseda University, 2.CBBD-OIL, AIST, 3. Nippon Medical School, Japan

Presentation Overview:

Transposable elements (TEs) are mobile and parasitic DNA sequences within their host genomes. They have expanded within the host genome and have influenced the host’s evolution. Therefore, elucidating the interactions between TEs and the host is crucial for understanding the host's evolution. To suppress the uncontrolled transposition of TEs, host organisms have expanded the Krüppel-associated box zinc finger protein (KRAB-ZFP) family, which can epigenetically suppress TEs. Interestingly, some TE families have been reported to evolve to evade suppression by KRAB-ZFPs, suggesting a co-evolutionary relationship known as an evolutionary arms race. However, the extent to which this arms race has occurred across different TE families has remained unclear.

In this study, we systematically explored the evolutionary arms race between TE families and human KRAB-ZFPs by leveraging publicly available ChIP-seq data. Accordingly, we reconstructed the evolutionary arms race with KRAB-ZFPs in several endogenous retroviruses (ERVs), including LTR7_HERVH, which is essential for the pluripotency of human embryonic stem cells (hESCs). Furthermore, we found that the regulatory landscape shaped by this arms race contributed to regulating gene expression. In summary, our results provide insight into the impact of the evolutionary arms race on TE families, the KRAB-ZFP family, and host gene regulatory networks.

11:00-11:15
A Protein Language Model for Exploring Viral Fitness Landscapes
Confirmed Presenter: Jumpei Ito, The Institute of Medical Science, The University of Tokyo, Japan

Room: Small Theatre
Format: In Person

Moderator(s): Russell Schwartz


Authors List:

  • Jumpei Ito, The Institute of Medical Science, The University of Tokyo, Japan
  • Adam Strange, The Institute of Medical Science, The University of Tokyo, Japan
  • Wei Liu, The Institute of Medical Science, The University of Tokyo, Japan
  • Gustav Joas, The Institute of Medical Science, The University of Tokyo, Japan
  • Spyros Lytras, The Institute of Medical Science, The University of Tokyo, Japan
  • Kei Sato, The Institute of Medical Science, The University of Tokyo, Japan

Presentation Overview:

Successively emerging SARS-CoV-2 variants lead to repeated epidemic surges through escalated spreading potential (i.e., fitness). Modeling the genotype–fitness relationship, known as the fitness landscape, enables us to pinpoint the mutations boosting viral fitness and flag high-risk variants immediately after their detection. Furthermore, because viruses tend to evolve to increase their fitness, we can predict the direction of viral evolution by inferring the viral fitness landscape. Here, we introduce CoVFit, a protein language model able to predict the fitness of variants based solely on their spike protein sequences. CoVFit was trained with genotype–fitness data derived from viral genome surveillance and functional mutation data related to immune evasion. When limited to only data available before the emergence of XBB, CoVFit successfully predicted the higher fitness of the XBB lineage. Fully trained CoVFit identified 549 fitness elevation events throughout SARS-CoV-2 evolution until late 2023. Furthermore, a CoVFit-based simulation successfully predicted that JN.1 would increase its fitness through substitutions at the F456 or R346 residues. Our study provides both insight into the SARS-CoV-2 fitness landscape and a novel tool with the potential to transform viral genome surveillance.

11:15-11:30
Hidden Markov Model generators of continuous-time indel processes
Room: Small Theatre
Format: In person

Moderator(s): Russell Schwartz


Authors List:

  • Ian Holmes, University of California, Berkeley, United States
  • Annabel Large, University of California, Berkeley, United States

Presentation Overview:

Realistic probabilistic models of the way proteins evolve over time have many applications in bioinformatics, phylogenetics, phylodynamics, and bioengineering. Pursuant to this need, the realism of evolutionary substitution models used in statistical phylogenetics has advanced significantly in the first quarter of this century: from ML-influenced algorithms for fitting substitution rate matrices, through variational approximations to context-dependent models, to Potts models, composite likelihoods, and deep learning methods for covariation. However, indel modeling theory has advanced more slowly. The simplest model allowing multi-character indel events is known as the General Geometric Indel (GGI) model, to which the leading approximation is Thorne, Kishino, and Felsenstein’s 1992 model (TKF92), which introduces latent variables in the sequence and consequently biases parameter estimation. Recently, a more principled (and less biased) GGI approximation was introduced, using a coarse-grained moment-fitting approach. Here, we show that this approach is sufficiently general to yield the finite-time conditional distributions of any model that has a Hidden Markov Model for its infinitesimal generator, offering a complete and self-contained derivation of the resulting differential equations. We evaluate this moment-fitting approach to TKF92 on both simulated data (as an approximation to GGI, comparing observed and empirical gap length distributions) and real data (comparing likelihoods).
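
For readers unfamiliar with gap length distributions, the sketch below fits a plain geometric distribution to a list of gap lengths and prints it next to the empirical frequencies. It only illustrates what such a comparison looks like; it is not the moment-fitting approach of the talk, and the extension probability `r`, the function names, and the toy gap lengths are assumptions.

```python
from collections import Counter

def geometric_pmf(k, r):
    """P(length = k) for k >= 1, where r is the gap extension probability."""
    return (1 - r) * r ** (k - 1)

def fit_extension_probability(lengths):
    """Maximum-likelihood r for a geometric on {1, 2, ...}: r = 1 - 1/mean."""
    return 1 - len(lengths) / sum(lengths)

def empirical_distribution(lengths):
    """Observed relative frequency of each gap length."""
    counts = Counter(lengths)
    n = len(lengths)
    return {k: counts[k] / n for k in sorted(counts)}

# Hypothetical gap lengths read off a set of pairwise alignments.
gap_lengths = [1, 1, 1, 2, 1, 3, 2, 1, 1, 4]
r = fit_extension_probability(gap_lengths)
for k, freq in empirical_distribution(gap_lengths).items():
    print(k, round(freq, 3), round(geometric_pmf(k, r), 3))
```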

11:30-11:45
Evolutionary Scenarios for the Specific Recognition of Non-homologous Endogenous Peptides by G Protein-Coupled Receptor Paralogs
Confirmed Presenter: Akira Shiraishi, Suntory Foundation for Life Sciences, Japan

Room: Small Theatre
Format: In Person

Moderator(s): Russell Schwartz


Authors List:

  • Akira Shiraishi, Suntory Foundation for Life Sciences, Japan
  • Azumi Wada, Suntory Foundation for Life Sciences, Japan
  • Honoo Satake, Suntory Foundation for Life Sciences, Japan

Presentation Overview:

Neuropeptides and peptide hormones participate in various biological events. Most peptides interact with G protein-coupled receptors (GPCRs), and these interactions between homologous peptides and receptors are conserved among species. In contrast, some peptides or GPCRs exhibit homology-independent interaction patterns, and the molecular mechanisms underlying this complexity of peptide-GPCR interactions remain unknown. Previously, we developed the peptide descriptor-incorporated support vector machine (SVM) to predict peptide-GPCR interactions. In this system, the peptide and GPCR sequences are converted into our originally developed descriptors. Each element of the descriptors can be mapped to the corresponding residue position in the peptide and GPCR sequences. Then, the GPCRs and peptides converted into descriptors are used as inputs to learn the peptide-GPCR interactions using a linear SVM. Using the discriminant of this linear SVM, we defined the contribution of peptide and GPCR residue pairs in the interaction predictions as the "interaction determinant likelihood (IDL) score." Validation with the neurotensin-NTR1 cocrystal structure showed strong correlations between high IDL scores and key interacting residues. Furthermore, we identified novel ligand selectivity determinants for MRGPRX1 and MRGPRX2 using the IDL score method, revealing species-specific ligand interaction changes among primates. Consequently, the IDL score method provides new insights into the evolution and specificity of GPCR-ligand interactions.
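
Because a linear SVM's discriminant is a weighted sum, each descriptor element contributes one additive term, which is what makes a per-residue-pair attribution possible. The sketch below shows that decomposition under assumed inputs; the exact IDL score definition is not given in the abstract, so the weights, features, residue positions, and function names here are hypothetical.

```python
from collections import defaultdict

def decision_value(weights, features, bias):
    """Linear SVM discriminant: f(x) = w . x + b."""
    return sum(w * x for w, x in zip(weights, features)) + bias

def pair_contributions(weights, features, feature_to_pair):
    """Sum w_k * x_k over the descriptor elements mapped to each residue pair.

    `feature_to_pair[k]` is the (peptide position, GPCR position) pair that
    descriptor element k is mapped to (a hypothetical mapping).
    """
    scores = defaultdict(float)
    for w, x, pair in zip(weights, features, feature_to_pair):
        scores[pair] += w * x
    return dict(scores)

# Hypothetical trained weights and one peptide-GPCR descriptor vector.
weights = [0.8, -0.2, 0.5, 0.1]
features = [1.0, 0.5, 2.0, 0.0]
feature_to_pair = [(3, 120), (3, 120), (7, 245), (9, 301)]
bias = -0.3

scores = pair_contributions(weights, features, feature_to_pair)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
print(round(decision_value(weights, features, bias), 3))
```

The per-pair terms plus the bias add up to the discriminant, so ranking pairs by their term shows which residue pairs drive a given prediction.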

11:45-12:00
Probability-based sequence comparison finds the oldest-ever nuclear mitochondrial DNA segments in the human genome
Confirmed Presenter: Muyao Huang, The Department of Computational Biology and Medical Sciences, University of Tokyo, Japan

Room: Small Theatre
Format: In Person

Moderator(s): Russell Schwartz


Authors List:

  • Muyao Huang, The Department of Computational Biology and Medical Sciences, University of Tokyo, Japan
  • Martin C. Frith, The Department of Computational Biology and Medical Sciences, University of Tokyo, Japan

Presentation Overview:

The insertion of mitochondrial genome-derived DNA sequences into the nuclear genome is an ongoing process that occurs frequently during evolution and generates nuclear mitochondrial DNA segments (NUMTs); it is a significant driving force for genome evolution. After being incorporated into the nuclear genome, some NUMTs can be conserved for long periods, adapting to perform novel cellular functions. However, current mainstream methods for investigating NUMTs have limited efficiency in detecting ancient and highly degraded NUMTs, leading to an underestimation of their prevalence and impact. These ancient NUMTs likely play a much larger role in genetic functions than previously documented, including the acquisition of functional exons. This study aims to find ancient human NUMTs using improved high-sensitivity sequence comparison methods. Here, we established a sensitive and accurate NUMT-searching pipeline. It predicts 1069 NUMTs in the human reference genome, 395 (37%) of which are not in the UCSC human NUMTs database. Furthermore, we discovered 89 pre-Eutherian NUMTs that are more ancient than those found previously, dating back at least 100 million years. Our study provides a comprehensive exploration of the quantity and evolutionary history of human NUMTs, paving the way for future research on the endosymbiotic impact on the evolution of nuclear genomes.

12:00-12:15
PhyloFusion: Fast and easy fusion of rooted phylogenetic trees into rooted phylogenetic networks
Confirmed Presenter: Banu Cetinkaya, Institute for Bioinformatics and Medical Informatics, University of Tübingen, Germany

Room: Small Theatre
Format: In Person

Moderator(s): Russell Schwartz


Authors List: Show

  • Louxin Zhang, Department of Mathematics and Centre for Data Science and Machine Learning, National University of Singapore, Singapore
  • Banu Cetinkaya, Institute for Bioinformatics and Medical Informatics, University of Tübingen, Germany
  • Daniel Huson, Institute for Bioinformatics and Medical Informatics, University of Tübingen, Germany

Presentation Overview: Show

Phylogenetic trees typically represent the evolutionary relationships between a set of organisms. However, in evolution, reticulate events such as horizontal gene transfer and speciation by hybridization play a significant role. Rooted phylogenetic networks are more effective for accurately illustrating evolutionary histories in cases involving reticulate events. While there are methods to calculate unrooted phylogenetic networks, there is a lack of methods for quickly and easily calculating rooted phylogenetic networks. Here, we present PhyloFusion, a fast and easily applicable method for calculating rooted phylogenetic networks from sets of rooted phylogenetic trees. The algorithm processes a set of input trees and produces a phylogenetic network that contains all input trees while attempting to minimize the number of hybridization events. PhyloFusion is designed to handle input trees with unresolved nodes and missing taxa. It operates efficiently on tens of trees and hundreds of taxa, making it suitable for the interactive exploration of input tree sets. One potential application is exploring rooted phylogenetic networks for sets of functionally related gene trees. In this context, we suggest that PhyloFusion would be a useful tool for explicitly representing reticulate evolutionary events. We provide an implementation of the PhyloFusion algorithm in the SplitsTree app.

12:30-13:30
Invited Presentation: Accelerating bioinformatics research with AWS
Room: Small Theatre
Format: In person

Moderator(s): Kimihiro Tohyama


Authors List: Show

  • Yusuke Toba
  • Charlie Lee

15:05-15:27
Invited Presentation: Optimizing Bioinformatics Workloads on AI and HPC Cluster: DDN Storage Solutions and Best Practices
Room: Small Theatre
Format: In person

Moderator(s): Tsuyoshi Shirai


Authors List: Show

  • Koji Tanaka

15:27-15:35
A short talk from ASCS
Room: Small Theatre
Format: In person

Moderator(s): Tsuyoshi Shirai


Authors List: Show

15:35-15:50
Optimization of RNA Inverse Folding using the differentiable McCaskill Algorithm and Basepair Probability
Confirmed Presenter: Takumi Otagaki, Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, Japan

Room: Small Theatre
Format: In Person

Moderator(s): Tsuyoshi Shirai


Authors List: Show

  • Takumi Otagaki, Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, Japan
  • Kiyoshi Asai, Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, Japan

Presentation Overview: Show

RNA plays a crucial role in various biological functions, and the RNA inverse folding problem is highly significant, particularly with the growing focus on RNA design for nucleic acid therapeutics. Introduced in 1990, the McCaskill algorithm calculates the partition function for RNA secondary structures, considering their thermodynamic properties. Our research introduces a differentiable formulation of the McCaskill algorithm for continuous-valued sequences. This makes the partition function differentiable with respect to the sequence, aiding design through gradient-based methods. The core of our method is optimizing the base-pair probability matrix to match a target matrix. A significant challenge is making the can_pair() function differentiable. We modified this function by employing an exponential function, integrating it seamlessly into the McCaskill algorithm. Additionally, our method retains the DP matrix structure largely unchanged, simplifying implementation. Our approach shows high flexibility and operates under various constraints, offering new possibilities for RNA sequence design. Initial results indicate that starting from a poly-A RNA, our method can predict sequences forming secondary structures akin to the target, including stem-loops. Ongoing efforts aim to handle longer sequences and predict sequences with pseudoknots, broadening our method’s scope and applicability.
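
The abstract does not state the exact form of the differentiable can_pair(). Purely as an illustration, one possible relaxation is sketched below. It assumes each sequence position is a probability vector over A, C, G, U obtained from unconstrained logits by a softmax (an exponential map), and it returns the expected pairability of two positions. The names softmax and soft_can_pair and the allowed pair set (Watson-Crick plus G-U wobble) are assumptions, not the authors' implementation.

```python
import math

BASES = "ACGU"
# Assumed pairing rules: Watson-Crick pairs plus the G-U wobble pair.
ALLOWED = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def softmax(logits):
    """Map unconstrained logits to a probability vector via exponentials."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_can_pair(p_i, p_j):
    """Expected value of can_pair() when positions i and j are independent
    distributions over A, C, G, U. Smooth in the underlying logits."""
    return sum(
        p_i[a] * p_j[b]
        for a, x in enumerate(BASES)
        for b, y in enumerate(BASES)
        if (x, y) in ALLOWED
    )
```

With one-hot inputs this reduces to the usual discrete can_pair() (1 for A-U, 0 for A-A); with two uniform distributions it gives 6/16 = 0.375, since 6 of the 16 ordered base combinations are allowed.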

15:50-16:05
Structural Motif Detection on the Scale of the Protein Universe with Folddisco
Confirmed Presenter: Hyunbin Kim, Seoul National University, South Korea

Room: Small Theatre
Format: In Person

Moderator(s): Tsuyoshi Shirai


Authors List: Show

  • Hyunbin Kim, Seoul National University, South Korea
  • Seongeun Kim, Seoul National University, South Korea
  • Milot Mirdita, Seoul National University, South Korea
  • Martin Steinegger, Seoul National University, South Korea

Presentation Overview: Show

Protein structural motifs are short, evolutionarily conserved patterns of atoms involved in protein functions. These motifs are usually discontinuous in sequence, making them difficult to detect by structural alignment methods like Foldseek. Graph-based methods that use disjoint segments are more sensitive but computationally intensive. Inverted index-based motif search, such as that offered by the RCSB, provides constant search time but would require substantial storage for large predicted protein structure databases, such as the AlphaFoldDB and ESMAtlas.

Here, we present Folddisco, a novel inverted-index based method that overcomes previous methods’ limitations and allows, for the first time, the detection of structural motifs within databases representing the whole protein universe. Folddisco minimizes the size of the inverted index, allowing the full AlphaFoldDB to fit on a single disk for protein-universe-scale motif search on a single machine. To do so, Folddisco introduces several innovations: (1) reducing inverted-index storage space by 70% by omitting location information, (2) improving precision with a novel feature for capturing side-chain orientation, and (3) offering fast search speed with a highly optimized index structure.
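
As a rough illustration of the idea in (1), the sketch below builds an inverted index that maps each feature to the set of structure IDs containing it, without recording where in the structure the feature occurs. The feature values, structure IDs, and the function names build_index and search are hypothetical; Folddisco's actual feature encoding and index layout are not described in this abstract.

```python
from collections import Counter, defaultdict

def build_index(structures):
    """structures: dict of structure ID -> iterable of hashable features.
    Returns feature -> set of structure IDs (no positional information)."""
    index = defaultdict(set)
    for sid, features in structures.items():
        for feat in features:
            index[feat].add(sid)
    return index

def search(index, query_features):
    """Rank structures by how many distinct query features they contain."""
    hits = Counter()
    for feat in set(query_features):
        for sid in index.get(feat, ()):
            hits[sid] += 1
    return hits.most_common()
```

In this sketch a match means only that a feature is present somewhere in a structure, so any residue-level confirmation of a candidate would have to happen in a separate step.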

Folddisco is free and open source Rust software available at https://folddisco.foldseek.com.

16:05-16:20
Discovery of fold novelty and biome-specific proteins from a billion structures
Confirmed Presenter: Jingi Yeo, Seoul National University, South Korea

Room: Small Theatre
Format: In Person

Moderator(s): Tsuyoshi Shirai


Authors List: Show

  • Jingi Yeo, Seoul National University, South Korea
  • Yewon Han, School of Biological Sciences, Seoul National University, South Korea
  • Nicola Bordin, Institute of Structural and Molecular Biology, University College London, United Kingdom
  • Andy Lau, Department of Computer Science, University College London, United Kingdom
  • Hyunbin Kim, Seoul National University, South Korea
  • David Jones, University College London, United Kingdom
  • Christine Orengo, University College London, United Kingdom
  • Martin Steinegger, Seoul National University, South Korea

Presentation Overview: Show

Deciphering protein function is crucial for studying biological processes, yet the function of most environmental proteins from metagenomic experiments remains unknown. Since protein function is determined by structure, solving structures provides important insights. Following the AlphaFold2 breakthrough in structure prediction, there has been a surge in high-quality predicted structures, with over 200 and 600 million structures deposited in the AFDB and the environmental ESMAtlas, respectively.

We exploit this wealth of structures to shed light on metagenomic proteins. Using the structure aligner Foldseek, we clustered the merged AFDB-ESMAtlas into 5 million clusters, with 3.5 million being metagenomic-specific. We used the deep-learning based tools Chainsaw, Merizo and Unidoc for domain chopping and CATH domain annotation to identify novel folds and multidomain proteins (MDPs) in the metagenomic clusters. Finally, we associated the clusters with MGnify environmental annotations. Our analysis yielded three main findings: 1) 132 novel domain folds; 2) 34,853 MDPs with novel fold arrays not found in the predominantly non-metagenomic AFDB; 3) 12,912 protein structures specific to extreme biomes, such as halophilic and extreme temperature conditions. These discoveries provide new insights into protein function and adaptation to extreme conditions.

Barrio-Hernandez, Yeo, et al. Nature, 2023

16:20-16:35
Computational insights into binding affinity of membrane protein-protein complexes and mutational effects
Confirmed Presenter: Fathima Ridha Karuvanthodikayil, Department of Biotechnology, Indian Institute of Technology Madras, Chennai, India

Room: Small Theatre
Format: In Person

Moderator(s): Tsuyoshi Shirai


Authors List: Show

  • Fathima Ridha Karuvanthodikayil, Department of Biotechnology, Indian Institute of Technology Madras, Chennai, India
  • M. Michael Gromiha, Department of Biotechnology, Indian Institute of Technology Madras, Chennai, India

Presentation Overview: Show

Membrane protein (MP) complexes perform vital cellular functions, which are mainly dictated by their binding affinity. Due to their intricate structure, however, the binding affinity of MPs is less explored than that of globular proteins. Mutations in these complexes affect their binding affinity, impair critical functions, and may lead to diseases. Although experimental affinity data in the literature are increasing, they are dispersed, necessitating their compilation into a comprehensive database for further analysis. Hence, we developed the first MP-specific database, MPAD (https://web.iitm.ac.in/bioinfo2/mpad), which contains ~5400 experimental binding affinity data of MP complexes and their mutants along with sequence, structure, functional information, membrane-specific features, experimental conditions, and literature information. Next, we developed MPA-Pred (https://web.iitm.ac.in/bioinfo2/MPA-Pred/), the first ML-based method for predicting the affinity of novel MP complexes, which predicts with a correlation of 0.83 and an MAE of 0.88 kcal/mol in a jackknife test. Further, we developed MPA-MutPred, a novel method specific for predicting the binding affinity change upon mutation of MP complexes using an innovative strategy. This method showed a correlation of 0.75 and an MAE of 0.73 kcal/mol in a jackknife test, indicating its reliability. Thus, these resources help enhance understanding of MPs, aiding drug design and enabling further diverse analyses.

16:50-17:30
Panel: APBioNet General Assembly
Room: Small Theatre
Format: In person


Authors List: Show

Friday, October 25th
10:45-11:00
Identifying rare cell-type specific driver mutations using DNA+Protein single cell sequencing
Confirmed Presenter: Matt Field, James Cook University, Australia

Room: Small Theatre
Format: In Person

Moderator(s): Yoshihiro Yamanishi


Authors List: Show

  • Matt Field, James Cook University, Australia
  • Mandeep Singh, Immunogenomics, Garvan Institute for Medical Research, Sydney, Australia
  • Raymond Louie, University of New South Wales, Australia
  • Jerome Samir, University of New South Wales, Australia
  • Fabio Luciano, University of New South Wales, Australia
  • Chris Goodnow, Garvan Medical Institute, Australia

Presentation Overview: Show

Single cell sequencing is revolutionizing life sciences, with recent work from our group demonstrating how rare driver mutations in a specific immune cell type led to the development of autoimmune disease. However, identifying such variants remains largely impossible in bulk and scRNASeq and only becomes possible with deep DNA+Protein single cell sequencing. Here we describe a novel bioinformatics workflow that can identify rare cell type specific driver mutations in as few as 5 cells (0.1% of the total cells) at a timepoint prior to disease diagnosis.

The workflow annotates cells using a supervised learning approach to more accurately identify duplicates and dead cells, and thus better reflects the gating thresholds than unsupervised methods. Variants are annotated, and filters are applied based on a minimum cell number, variant allele frequencies, and cell-type variant enrichment scores to reduce false positive variants. Calculating the odds ratio and z-score per cell type proved critical in prioritization. Strikingly, across multiple autoimmune diseases and >40 patients to date, we have identified multiple somatic lymphoma driver mutations specific to a type of immune cell. Despite the presence of these mutations, many of the patients had yet to receive a definitive clinical diagnosis at the time of sampling.
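
The abstract does not give the formulas behind the per-cell-type odds ratio and z-score. One common formulation, shown here only as a sketch under stated assumptions, uses a 2x2 table of variant-carrying versus non-carrying cells inside and outside a cell type, and derives the z-score from the log odds ratio and its standard error. The 0.5 continuity correction added to each table cell and the function name enrichment are illustrative choices, not the authors' workflow.

```python
import math

def enrichment(var_in, total_in, var_out, total_out):
    """Odds ratio and z-score for variant enrichment in one cell type.

    var_in / total_in: variant-carrying and total cells in the cell type.
    var_out / total_out: the same counts for all other cells.
    Adds 0.5 to each cell of the 2x2 table to avoid division by zero.
    """
    a = var_in + 0.5
    b = (total_in - var_in) + 0.5
    c = var_out + 0.5
    d = (total_out - var_out) + 0.5
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = math.log(odds_ratio) / se
    return odds_ratio, z
```

For example, enrichment(5, 50, 0, 5000) describes a variant seen in 5 of 50 cells of one type and in none of 5000 other cells; it yields an odds ratio well above 1 and a positive z-score, whereas equal carrier fractions inside and outside the cell type give an odds ratio of 1 and a z-score of 0.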

11:00-11:15
Machine learning-based stratification of asymptomatic HTLV-1 Carriers: Exploring candidate Biomarkers for identifying High-risk Carriers of HTLV-1-Associated Myelopathy/ Tropical Spastic Paraparesis (HAM/TSP)
Confirmed Presenter: Md Ishtiak Rashid, Hokkaido University, Japan, Bangladesh

Room: Small Theatre
Format: In Person

Moderator(s): Yoshihiro Yamanishi


Authors List: Show

  • Md Ishtiak Rashid, Hokkaido University, Japan, Bangladesh
  • Junya Sunagawa, Hokkaido University, Japan
  • Akari Matsuki, Hokkaido University, Japan
  • Shichijo Takafumi, Kumamoto University, Japan
  • Masao Matsuoka, Kumamoto University, Japan
  • Jun-Ichirou Yasunaga, Kumamoto University, Japan
  • Shinji Nakaoka, Hokkaido University, Japan

Presentation Overview: Show

HTLV-1-associated diseases like ATL and HAM/TSP affect a subset of HTLV-1-infected individuals, with most patients remaining asymptomatic until onset, making it difficult to identify those at higher risk. Despite advancements in understanding HTLV-1 infection, distinguishing high-risk carriers prone to developing HAM/TSP remains challenging. This study leverages a machine learning approach to identify and characterize high-risk carriers of HAM. Recent studies reported the immunogenic potential of mature Gag proteins, previously underexplored in HTLV-1 research. We integrated HTLV-1 antibody titers against Tax, Env, and the newly recognized immunogenic Gag p15, p19, and p24 proteins to create a predictive machine-learning model. Initially, the Isolation Forest anomaly detection method was applied to antibody titer data from 264 asymptomatic HTLV-1 carriers to identify anomalous data points. Classifier models capable of distinguishing between clinical subgroups (Carrier, ATL, and HAM) were then developed, validated, and used to predict the anomalous samples as unseen test data. Most samples were predicted as HAM, suggesting a progressively high risk of HAM. Comprehensive feature analysis identified Gag p15 as a novel biomarker alongside Env. Our two-tiered ML approach evaluating humoral immunity to Anti-Gag p15 and Anti-Env effectively identifies high-risk carriers and predicts HAM onset, suggesting Gag p15 as a potential immunotherapy target.

11:15-11:30
Development of Prediction Model for Optimal Treatment Selection in Cancer Therapy using Genetic Testing Data
Confirmed Presenter: Sakura Onozuka, Kyoto University, Japan

Room: Small Theatre
Format: In Person

Moderator(s): Yoshihiro Yamanishi


Authors List: Show

  • Sakura Onozuka, Kyoto University, Japan
  • Mayumi Kamada, Kitasato University, Japan
  • Yohei Harada, Kyoto University, Japan
  • Eiichiro Uchino, Kyoto University, Japan
  • Yasushi Okuno, Kyoto University, Japan

Presentation Overview: Show

In cancer treatment, precision medicine advances by testing genetic mutations to guide patient-specific treatment selection. However, accurate prediction of treatment outcomes is crucial due to the significant differences in treatment effectiveness among patients. Previous studies have faced challenges such as relying on non-routinely measured items for prediction and predicting only the effects of single agents. Therefore, this study developed prediction models for multi-drug combination therapies using real-world data from the Center for Cancer Genomics and Advanced Therapeutics (C-CAT). The dataset consists of 50,981 entries combining clinical data and genetic testing results. Treatment outcomes are described according to the Response Evaluation Criteria In Solid Tumors (RECIST), and in this study, SD, PR, and CR were classified as treatment successes, while PD was classified as a failure. Using this dataset, a binary classification model employing Random Forest was constructed. Four models with different feature sets were evaluated using 5-fold cross-validation. The most accurate model achieved a ROC-AUC of 0.694, indicating that treatment history and genetic features are important for predicting treatment outcomes. Future efforts will focus on enhancing the feature representation of mutations and cancer types to improve accuracy and explore factors influencing treatment outcomes.
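
The labeling rule and the evaluation metric described above can be sketched in a few lines. This is not the authors' code: the function names are hypothetical, and ROC-AUC is computed directly from its rank-based (Mann-Whitney) definition rather than with any particular library.

```python
SUCCESS = {"SD", "PR", "CR"}  # stable disease, partial response, complete response
FAILURE = {"PD"}              # progressive disease

def binarize_recist(label):
    """Map a RECIST category to 1 (success) or 0 (failure)."""
    if label in SUCCESS:
        return 1
    if label in FAILURE:
        return 0
    raise ValueError(f"unexpected RECIST label: {label!r}")

def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative,
    counting ties as one half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Under this definition a ROC-AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of successes from failures. The pairwise loop is quadratic in the number of samples, which is acceptable for a sketch but not for a dataset of tens of thousands of entries.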

11:30-11:45
From Static to Dynamic: Harnessing Protein Conformational Changes for Drug Development
Confirmed Presenter: Nikhil Pathak, National Tsing Hua University (NTHU), Taiwan

Room: Small Theatre
Format: In Person

Moderator(s): Yoshihiro Yamanishi


Authors List: Show

  • Nikhil Pathak, National Tsing Hua University (NTHU), Taiwan
  • Chi-Yuan Kao, National Tsing Hua University (NTHU), Taiwan
  • Dai-Wei Lin, National Tsing Hua University (NTHU), Taiwan
  • Hong-Ming Tseng, National Tsing Hua University (NTHU), Taiwan
  • Lee-Wei Yang, National Tsing Hua University (NTHU), Taiwan

Presentation Overview: Show

Proteins constantly reconfigure themselves for functional reasons; however, the importance of the dynamic nature of disease-relevant proteins has been underappreciated in drug discovery. We introduce "Target DynOmics", a platform integrating experimental and MD-simulated conformations of all current FDA drug targets, facilitating efficient discovery of new indications for a given drug. For 857 protein families, we curated representative apo and drug-complexed experimental structures and identified functional residues and cofactors. We performed 1 μs MD simulations and PCA to yield the most populated protein conformations, which will be used for purposes including finding possible repurposing targets (and their corresponding diseases) for 5895 FDA-approved drugs. Here, we present our results of MD-derived conformers for NSP16 from SARS-CoV-2 and DPP4 involved in diabetes. Our 144 drug-complexed protein structures reveal true binding poses of FDA drugs, enabling a log-odds (LOD) score to distinguish true-binder poses from decoys, leading to the automated drug screening pipeline DRDOCK. DRDOCK calculates specific features for docked poses to derive feature distributions of true-binders and decoys. We trained and validated DRDOCK using datasets derived from the 144 complex structures. In summary, Target DynOmics enables direct use of protein conformers in drug development, providing benchmarks for accurate scoring functions and AI model building.
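
The abstract does not specify how the LOD score is computed from the feature distributions. As a minimal sketch under assumptions, a single-feature log-odds score can be built from binned frequencies of that feature among known true-binder poses and decoys. The bin edges, the pseudocount, and the function names bin_frequencies and lod_score are hypothetical and are not taken from DRDOCK.

```python
import math
from bisect import bisect_right

def bin_frequencies(values, edges, pseudocount=1.0):
    """Smoothed relative frequency of values in bins defined by sorted edges.
    Bin k holds values v with edges[k-1] <= v < edges[k]; there are
    len(edges) + 1 bins including the two open-ended ones."""
    counts = [pseudocount] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    total = sum(counts)
    return [c / total for c in counts]

def lod_score(value, edges, freq_true, freq_decoy):
    """Log-odds that a pose with this feature value is a true binder."""
    k = bisect_right(edges, value)
    return math.log(freq_true[k] / freq_decoy[k])
```

A positive score means the feature value is more typical of true binders than of decoys in the training poses. If several features were scored this way, their LOD values could be summed under an independence assumption, which is a further simplification of this sketch.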