Transformer-based large language models (LLMs) are changing the world. The capabilities they have demonstrated in sophisticated natural-language, vision, and multimodal tasks have inspired the development of large cellular models (LCMs) for single-cell transcriptomic data, such as scBERT, Geneformer, scGPT, scFoundation, GeneCompass, and scMulan. After pretraining on massive amounts of single-cell RNA-seq data agnostic to any downstream task, these Transformer-based models have demonstrated exceptional performance on tasks such as cell type annotation, data integration, gene network inference, and the prediction of drug sensitivity or perturbation responses. Such advances, albeit still at an early stage, suggest promising new approaches for leveraging AI to understand the complex system of the cell from datasets beyond human analytical capacity. In particular, such models have made it possible to conduct in silico perturbation of cells of various types, predicting their responses to gene perturbations without experiments on the cells themselves. These models provide prototypes of digital virtual cells that can be used to reconstruct and simulate living cells, with the potential to transform many aspects of future biomedical research.
Although the community is highly enthusiastic about this exciting progress, the structures and algorithms of LCMs and other AI models of similar scale remain opaque to many researchers without the relevant background. This tutorial aims to fill that gap. We will begin with an introduction to the basic principles of deep neural networks and explain the structure and training algorithm of the original Transformer for natural-language tasks. We will show attendees how to build such models on current machine-learning platforms. We will then introduce several successful ways to build large cellular models on top of the basic Transformer architecture, and give an overview of how such models are pretrained on single-cell RNA-seq data. Attendees will engage in hands-on activities, building basic Transformer models and executing downstream single-cell tasks, including cell type annotation and in silico perturbation. These activities will demystify LCMs and give attendees a concrete understanding of how LCMs can be built and applied.
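To give a flavor of the hands-on component, the sketch below shows a single Transformer encoder block in PyTorch, the kind of building block attendees will assemble in the tutorial. The platform choice, class names, and dimensions (d_model, n_heads, d_ff) are illustrative assumptions, not taken from any specific LCM; real models such as Geneformer or scGPT stack many such blocks and add model-specific embeddings for genes and expression values.

```python
# Minimal sketch of one Transformer encoder block in PyTorch.
# All hyperparameters here are illustrative placeholders.
import torch
import torch.nn as nn


class EncoderBlock(nn.Module):
    def __init__(self, d_model: int = 256, n_heads: int = 8, d_ff: int = 1024):
        super().__init__()
        # Multi-head self-attention: each token (e.g., a gene in an LCM,
        # or a word in an LLM) attends to every other token in the sequence.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Position-wise feed-forward network applied to each token independently.
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.GELU(),
            nn.Linear(d_ff, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection around self-attention, followed by layer norm.
        attn_out, _ = self.attn(x, x, x)
        x = self.norm1(x + attn_out)
        # Residual connection around the feed-forward network.
        x = self.norm2(x + self.ff(x))
        return x


# Example: a batch of 4 "cells", each represented as a sequence of
# 2,000 gene-token embeddings of dimension 256.
cells = torch.randn(4, 2000, 256)   # (batch, genes, d_model)
print(EncoderBlock()(cells).shape)  # torch.Size([4, 2000, 256])
```

In the tutorial, attendees will see how stacking such blocks, choosing a tokenization of genes and expression values, and adding a pretraining objective (e.g., masked-token prediction) turn this generic building block into a large cellular model.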