Machine Learning Driven Discovery of Ribosomal Biomarkers in PCOS
Confirmed Presenter: Ashitha Washington, Computational Biology and Bioinformatics Lab, National
Institute of Technology Calicut, India
Format: In person
Authors List:
- Ashitha Washington, Computational Biology and Bioinformatics Lab, National
Institute of Technology Calicut, India
- Ravindra Kumar, Computational Biology and Bioinformatics Lab, National
Institute of Technology Calicut, India
Presentation Overview:
Polycystic ovary syndrome (PCOS) represents a multifaceted
endocrine condition marked by genetic, molecular, and
phenotypic variability. To uncover consistent
transcriptomic biomarkers and prognostic gene networks
linked to PCOS, we performed an integrative analysis of
RNA-Seq data compiled from publicly available Gene
Expression Omnibus datasets, comprising 65 PCOS cases and
61 healthy controls across diverse cell types. Data
preprocessing involved normalization followed by
differential expression analysis. Feature selection was
then performed via Elastic Net regression, effectively
managing multicollinearity and refining the feature set to
83 candidate genes for subsequent modeling.
Multiple machine learning classifiers were trained and
validated using a 60:20:20 data split, with hyperparameter
optimization to enhance predictive performance. Among
these, the Support Vector Machine (SVM) model showed the
best classification performance, achieving 92.31%
accuracy on the internal validation set and an
AUC of 0.98. Model explainability was strengthened using
SHAP and LIME analyses, pinpointing the most influential
genes driving model predictions. Logistic regression based
on the key gene clusters produced a prognostic framework
with an AUC of 0.82 and precision of 0.8, suggesting their
robustness as biomarkers despite PCOS heterogeneity.
Functional enrichment results revealed that these genes are
predominantly involved in RNA-binding processes, ribosomal
machinery, and immune regulation. Overall, this integrative
multi-cohort analysis coupled with advanced machine
learning provides a powerful strategy for identifying
clinically actionable biomarkers and prognostic signatures
in PCOS, offering new avenues for molecular diagnosis and
therapeutic development.
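The 60:20:20 split described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the abstract does not say whether the split was stratified, so per-class splitting, the rounding rule, the seed, and the function name `stratified_split` are all assumptions made here. Only the class sizes (65 cases, 61 controls) come from the abstract.

```python
import random


def stratified_split(labels, percents=(60, 20, 20), seed=0):
    """Return (train, val, test) index lists, splitting each class separately.

    Assumption: partition sizes are floored per class and any remainder
    goes to the last (test) partition.
    """
    rng = random.Random(seed)
    train, val, test = [], [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = len(idx) * percents[0] // 100
        n_val = len(idx) * percents[1] // 100
        train.extend(idx[:n_train])
        val.extend(idx[n_train:n_train + n_val])
        test.extend(idx[n_train + n_val:])
    return train, val, test


# 65 PCOS cases (label 1) and 61 healthy controls (label 0), as in the abstract.
labels = [1] * 65 + [0] * 61
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))
```

Splitting within each class keeps the case-to-control ratio roughly equal across the three partitions, which matters when the cohort is this small.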
GEMINI-Mol: A SE(3)-Equivariant Diffusion Framework for
Generative Multi-Target Drug Design
Format: In person
Authors List:
- Michael Oluwasola, Universiti Putra Malaysia, Malaysia
- Noor Dina Muhd Noor, Universiti Putra Malaysia, Malaysia
- Thean Chor Leow, Universiti Putra Malaysia, Malaysia
Presentation Overview:
The generation of small molecules capable of simultaneously
modulating multiple biological targets is an emerging
frontier in polypharmacology, offering new opportunities
for treating complex and multifactorial diseases. However,
existing generative frameworks often rely on autoregressive
or fragment-based methods that inadequately capture the
intricate spatial correlations, stereochemical
dependencies, and multi-target constraints inherent in real
molecular systems. To overcome these limitations, we
introduce GEMINI-Mol, a unified deep generative framework
that combines SE(3)-equivariant transformers, diffusion
probabilistic modeling, and graph neural networks for fully
3D-aware multi-target molecular design. By enforcing
rotational and translational equivariance, GEMINI-Mol
preserves molecular geometry and stereochemistry while
modeling long-range atomic interactions underlying
multi-site binding. The architecture integrates built-in
chemical validity filters and uncertainty quantification
mechanisms to ensure the generation of physically plausible
and interpretable molecules. A progressive training
strategy enables efficient co-optimization of diffusion,
transformer, and graph modules, supported by gradient
checkpointing and optimized attention for large-scale
molecular modeling. Across comprehensive benchmarks,
GEMINI-Mol demonstrates superior performance in generating
structurally diverse, chemically sound, and drug-like
molecules with favorable predicted affinities toward
multiple protein targets. Collectively, GEMINI-Mol
establishes a next-generation generative paradigm that
unites geometric deep learning and probabilistic modeling
for rational, multi-target drug discovery.
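The rotational and translational equivariance mentioned above can be illustrated with a small numerical check. The sketch below is not GEMINI-Mol code; the helper names, the toy coordinates, and the restriction to rotations about the z-axis are assumptions made for illustration. It shows that interatomic distances are invariant under a rigid motion, while a vector-valued output such as the centroid is equivariant (it moves with the molecule).

```python
import math


def rotate_z(point, theta):
    """Rotate a 3D point about the z-axis by theta radians."""
    x, y, z = point
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y, z)


def rigid_transform(coords, theta, shift):
    """Apply a rotation about z followed by a translation to every point."""
    return [tuple(a + b for a, b in zip(rotate_z(p, theta), shift)) for p in coords]


def centroid(coords):
    n = len(coords)
    return tuple(sum(p[i] for p in coords) / n for i in range(3))


def pairwise_distances(coords):
    return [math.dist(a, b) for i, a in enumerate(coords) for b in coords[i + 1:]]


# Toy "molecule": four hypothetical atom positions.
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.3), (0.2, 1.0, -0.8)]
theta, shift = 0.7, (2.0, -1.0, 0.5)
moved = rigid_transform(coords, theta, shift)

# Invariance: distances do not change under the rigid motion.
same_distances = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(pairwise_distances(coords), pairwise_distances(moved))
)

# Equivariance: transforming then averaging equals averaging then transforming.
expected = rigid_transform([centroid(coords)], theta, shift)[0]
same_centroid = all(
    math.isclose(a, b, abs_tol=1e-9) for a, b in zip(centroid(moved), expected)
)
print(same_distances, same_centroid)
```

An equivariant network is built so that properties like these hold for its learned outputs by construction rather than having to be learned from augmented data.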
miTarCGR: A Deep Learning Framework for miRNA Target Prediction Using Frequency Chaos Game Representation
Format: In person
Authors List:
- Somenath Dutta, Pusan National University, South Korea
- Sudipta Sardar, Pusan National University, South Korea
Presentation Overview:
MicroRNAs (miRNAs) are critical 22-23 nucleotide regulatory
molecules that control gene expression through the
miRNA-induced silencing complex (miRISC), influencing
diverse cellular processes including development,
differentiation, and disease progression. Despite two
decades of research, the molecular mechanisms governing
miRNA-target interactions remain incompletely understood,
with functional targeting occurring through both canonical
seed-region pairing and non-canonical mechanisms involving
complex structural interactions. Current computational
approaches for miRNA target prediction have evolved from
early heuristic methods to machine learning and deep
learning frameworks, yet most rely on one-dimensional
sequence representations that may fail to capture the
intricate spatial relationships and long-range dependencies
critical for target recognition.
We present miTarCGR, a deep learning
framework that recasts miRNA target prediction as a
computer vision problem using Frequency Chaos Game
Representation (FCGR). This approach converts
linear miRNA and target sequences into two-dimensional
graphical representations that preserve both local and
global sequence characteristics, enabling convolutional
neural networks to identify complex patterns extending
beyond simple linear complementarity. By representing both
miRNA sequences and candidate target sites as FCGR images,
miTarCGR captures compositional features, structural
motifs, and discontinuous base pairing patterns that are
difficult to detect using conventional sequence-based
methods.
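To make the sequence-to-image step concrete, the following is a minimal sketch of a k-mer FCGR, not the miTarCGR implementation. The corner assignment (A, C, G, U), the choice of k, the function name `fcgr`, and the example sequence are assumptions; published FCGR variants differ in these conventions, and the abstract does not specify which one is used.

```python
# Assumed corner convention (x, y) for the four bases; conventions vary.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "U": (1, 0)}


def fcgr(seq, k=2):
    """Return a 2^k x 2^k grid of k-mer counts laid out by chaos game position."""
    seq = seq.upper().replace("T", "U")
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if any(base not in CORNERS for base in kmer):
            continue  # skip k-mers containing ambiguous bases such as N
        x = y = 0
        # The last base of the k-mer selects the coarsest quadrant,
        # earlier bases refine the position within it.
        for base in reversed(kmer):
            bx, by = CORNERS[base]
            x = 2 * x + bx
            y = 2 * y + by
        grid[y][x] += 1
    return grid


# Illustrative 22-nt RNA sequence (hypothetical input for this sketch).
image = fcgr("UGAGGUAGUAGGUUGUAUAGUU", k=2)
for row in image:
    print(row)
```

Dividing each cell by the total count turns the grid into a frequency image of fixed size regardless of sequence length, which is what lets a convolutional network take miRNAs and target sites of different lengths as same-shaped inputs.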
Through comprehensive evaluation on multiple benchmark
datasets, miTarCGR demonstrates superior performance
compared to state-of-the-art methods in both site-level and
gene-level target prediction tasks. The framework
incorporates advanced explainability techniques to provide
interpretable insights into learned features, crucial for
advancing biological understanding of miRNA targeting
mechanisms and building confidence in computational
predictions. Our results suggest that two-dimensional
sequence representation provides a more comprehensive view
of miRNA-target interactions, potentially leading to novel
targeting mechanism discovery and improved therapeutic
target identification. This work represents a paradigm
shift in miRNA target prediction, offering a powerful tool
for understanding post-transcriptional regulation and
advancing precision medicine applications.
Integrative multi-omics QTL colocalization maps regulatory architecture in aging human brain
Confirmed Presenter: Xuewei Cao, Center for Statistical Genetics, The Gertrude H. Sergievsky
Center, Columbia University, New York, NY, USA, United States
Format: Live Stream
Authors List:
- Xuewei Cao, Center for Statistical Genetics, The Gertrude H. Sergievsky
Center, Columbia University, New York, NY, USA, United States
- Haochen Sun, Computational and Systems Biology, Sloan Kettering
Institute, Memorial Sloan Kettering Cancer Center, New
York, NY, USA, United States
- Ru Feng, Center for Statistical Genetics, The Gertrude H. Sergievsky
Center, Columbia University, New York, NY, USA, United States
- Rahul Mazumder, Operations Research Center, Massachusetts Institute of
Technology, Cambridge, MA, USA, United States
- Carlos F Buen Abad Najar, Department of Human Genetics, University of Chicago,
Chicago, IL, USA, United States
- Yang Li, Department of Human Genetics, University of Chicago,
Chicago, IL, USA, United States
- Philip L. de Jager, Department of Neurology, Columbia University, New York, NY,
USA, United States
- David Bennett, Rush Alzheimer’s Disease Center and Department of
Neurological Sciences, Rush University Medical Center,
Chicago, IL, United States
- Kushal Dey, Computational and Systems Biology, Sloan Kettering
Institute, Memorial Sloan Kettering Cancer Center, New
York, NY, USA, United States
- Gao Wang, Center for Statistical Genetics, The Gertrude H. Sergievsky
Center, Columbia University, New York, NY, USA, United States
Presentation Overview:
Background: Multi-trait QTL (xQTL) colocalization has shown
great promise in identifying causal variants with shared
genetic etiology across multiple molecular modalities,
contexts, and complex diseases. However, the lack of
scalable and efficient methods to integrate large-scale
multi-omics data limits deeper insights into xQTL
regulation.
Methodology: We propose ColocBoost, a multi-task learning
colocalization method that can scale to hundreds of traits,
while accounting for multiple causal variants within a
genomic region of interest. ColocBoost employs a
specialized gradient boosting framework that can adaptively
couple colocalized traits while performing causal variant
selection, thereby enhancing the detection of weaker shared
signals compared to existing pairwise and multi-trait
colocalization methods.
Results: We applied ColocBoost genome-wide to 17 gene-level
single-nucleus and bulk xQTL datasets from the aging brain
cortex of ROSMAP individuals (average N=595), encompassing
6 cell types, 3 brain regions and 3 molecular modalities
(expression, splicing, and protein abundance). Across
molecular xQTLs, ColocBoost identified 16,503 distinct
colocalization events, exhibiting 10.7(±0.74)-fold
enrichment for heritability across 57 complex
diseases/traits and showing strong concordance with
element-gene pairs validated by CRISPR screening assays.
When colocalized against Alzheimer’s disease (AD) GWAS,
ColocBoost identified up to 2.5-fold more distinct
colocalized loci, explaining twice the AD disease
heritability compared to fine-mapping without xQTL
integration. This improvement is largely attributable to
ColocBoost’s enhanced sensitivity in detecting gene-distal
colocalizations, as supported by strong concordance with
known enhancer-gene links, highlighting its ability to
identify biologically plausible AD susceptibility loci with
underlying regulatory mechanisms. Notably, several genes
including BLNK and CTSH showed sub-threshold associations
in GWAS, but were identified through multi-omics
colocalizations which provide new functional support for
their involvement in AD pathogenesis.
Conclusions: Overall, ColocBoost provides a novel framework
to identify colocalized disease-critical functional signals
for a varying number of phenotypes. The R package colocboost is
freely available on CRAN
(https://CRAN.R-project.org/package=colocboost).
Structural Folds and Donor Motifs Illuminate the Evolution and Drug Target Potential of Ketal Pyruvyltransferases
Format: In person
Authors List:
- Shivani Singh, Sharda University, India
- Sunita Sharma, Sharda University, India
Presentation Overview:
Pyruvylation is the transfer of a pyruvate group to
a sugar moiety. It is found in the majority of
pathogenic and non-pathogenic species
across a broad range of bacteria, fungi, and yeast. Our
study explores the evolutionary relationship between
different classes of pyruvyltransferases across diverse
species, focusing on structural and sequence conservation
in bacteria. Structurally, these enzymes strongly resemble
the GT-A and GT-B classes of glycosyltransferases. The E. coli
enzyme belongs to the GT-B class, with a higher number of
positively charged residues such as histidine, lysine, and
arginine in the binding site. We performed principal component
analysis (PCA) and replica exchange molecular dynamics (REMD)
simulations, each 500 ns long with two replicates. This approach helps
investigate the catalytic site and acceptor substrate
specificity along with the global conformational space.
Bayesian inference was implemented on a characterized set
of 59 pyruvyltransferases to construct a phylogenetic tree,
and MEME was used to study conserved motifs and residues.
By correlating structural folds, donor binding motifs, and
phylogenetic relationships, we aim to understand the
evolutionary origin of ketal pyruvyltransferases and
identify structural features that could be exploited for
inhibitor design or drug repurposing against bacterial
pathogens such as E. coli, A. baumannii, M. tuberculosis,
and B. fragilis.