X-MAP: Explainable AI Platform for Genetic Variant Interpretation
Confirmed Presenter: Marco Anteghini, BioFolD Unit, Department of Pharmacy and Biotechnology, University of Bologna, 40126 Bologna, Italy
Room: 04AB
Format: In person
Moderator(s): Emidio Capriotti
Authors List:
- Marco Anteghini, BioFolD Unit, Department of Pharmacy and Biotechnology, University of Bologna, 40126 Bologna, Italy
- Andrea Zauli, BioFolD Unit, Department of Pharmacy and Biotechnology, University of Bologna, 40126 Bologna, Italy
- Emidio Capriotti, BioFolD Unit, Department of Pharmacy and Biotechnology, University of Bologna, 40126 Bologna, Italy
Presentation Overview:
Genetic variants, particularly missense mutations, can significantly affect protein function and contribute to disease development. Methods like CADD and AlphaMissense are widely used for pathogenicity prediction; however, their integration into existing resources remains limited due to compatibility issues and high computational demands.
We introduce X-MAP, an integrated platform that leverages protein language models to enhance variant effect prediction through a novel embedding-based strategy. This approach captures both local and global protein features, enabling more accurate interpretation of mutation impacts.
Our method generates embeddings for entire protein sequences using several state-of-the-art models (ESM2, ESMC, and ESM1v) and extracts contextual information around each mutation site using a dynamic window of four residues on each side. This window size was empirically optimized to balance local structural detail against computational efficiency.
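A minimal sketch of the windowed feature extraction described above, assuming per-residue embeddings have already been computed by a protein language model such as ESM2 (the model call itself is not shown). The abstract does not say how the window residues are summarized, so mean pooling, 0-based positions, and clipping the window at the sequence ends are assumptions here, as is the hypothetical function name `window_features`.

```python
import numpy as np


def window_features(per_residue: np.ndarray, pos: int, half_width: int = 4) -> np.ndarray:
    """Return [global mean, window mean] for an (L, D) per-residue embedding matrix.

    Hypothetical helper: `pos` is the 0-based index of the mutated residue, and the
    window spans pos - half_width .. pos + half_width, clipped at the sequence ends.
    Mean pooling is an assumption, not a detail stated in the abstract.
    """
    length = per_residue.shape[0]
    if not 0 <= pos < length:
        raise ValueError(f"position {pos} outside sequence of length {length}")
    start = max(0, pos - half_width)
    end = min(length, pos + half_width + 1)  # slice end is exclusive
    global_vec = per_residue.mean(axis=0)            # whole-protein context
    local_vec = per_residue[start:end].mean(axis=0)  # context around the mutation site
    return np.concatenate([global_vec, local_vec])
```

With D-dimensional embeddings the output has length 2D, so the feature size does not depend on protein length; near the termini the window simply contains fewer than nine residues.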
We evaluated both concatenation and difference-based embedding strategies using 10-fold cross-validation with XGBoost classifiers on a dataset of 71,595 genetic variants across 12,666 human proteins. The ESMC concatenation strategy with the 4-residue window achieved the highest performance of all configurations tested (accuracy: 0.84, MCC: 0.66, AUC: 0.90), outperforming the Esnp baseline (accuracy: 0.82, MCC: 0.64, AUC: 0.82), which relies on full-sequence concatenation.
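The two embedding strategies compared above can be sketched as follows. This assumes each variant is represented by a wild-type and a mutant feature vector of equal length (for example, pooled global and window embeddings); reading "concatenation" as joining the two vectors and "difference" as mutant minus wild-type is an interpretation, not a detail stated in the abstract. The names `combine` and `build_matrix` are hypothetical.

```python
import numpy as np


def combine(wt_features: np.ndarray, mut_features: np.ndarray, strategy: str) -> np.ndarray:
    """Combine wild-type and mutant feature vectors into one classifier input (hypothetical)."""
    if wt_features.shape != mut_features.shape:
        raise ValueError("wild-type and mutant features must have the same shape")
    if strategy == "concat":
        # Keep both representations side by side: output length is 2 * D.
        return np.concatenate([wt_features, mut_features])
    if strategy == "diff":
        # Keep only the change induced by the mutation: output length is D.
        return mut_features - wt_features
    raise ValueError(f"unknown strategy: {strategy!r}")


def build_matrix(pairs, strategy: str) -> np.ndarray:
    """Stack one combined vector per (wild-type, mutant) pair into an (n_variants, n_features) matrix."""
    return np.stack([combine(wt, mut, strategy) for wt, mut in pairs])
```

The resulting matrix would then be passed to an XGBoost classifier under 10-fold cross-validation; that training loop is omitted here. Concatenation doubles the feature count but lets the classifier see absolute wild-type and mutant values, whereas the difference keeps only the shift caused by the substitution.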
By concentrating on the regions directly affected by mutations while retaining global sequence context, X-MAP combines accuracy with computational efficiency. We are currently developing a hybrid Transformer-CNN model to further improve prediction accuracy and interpretability. X-MAP provides a scalable framework for variant analysis with applications in precision medicine and disease research.