On the cover: Stereo-seq dissects cell-type composition in an E16.5 mouse embryo sagittal section. In this issue, Chen et al. combined DNA nanoball (DNB)-patterned arrays and in situ RNA capture to create spatial enhanced resolution omics sequencing (Stereo-seq), which was applied to generate the Mouse Organogenesis Spatiotemporal Transcriptomic Atlas (MOSTA) with high sensitivity at single-cell resolution. This cover is taken from one of the representative single-cell segmentation results in a developing E16.5 mouse embryo section, and each color represents a cell type.
On the cover: Proteins are essential to life, and understanding their 3D structure is key to unpicking their function. To date, only 17% of the human proteome is covered by an experimentally determined structure. Two papers in this week’s issue dramatically expand our structural understanding of proteins. Researchers at DeepMind, Google’s London-based sister company, present the latest version of their AlphaFold neural network. Using an entirely new architecture informed by intuitions about protein physics and geometry, it makes highly accurate structure predictions, and was recognized at the 14th Critical Assessment of Techniques for Protein Structure Prediction last December as a solution to the long-standing problem of protein-structure prediction. The team applied AlphaFold to 20,296 proteins, representing 98.5% of the human proteome. The predictions have been made freely available in partnership with the European Bioinformatics Institute, along with additional predictions for long human proteins and for 20 other model organisms. Work from DeepMind.
Single-cell (sc)RNA-seq, together with RNA velocity and metabolic labeling, reveals cellular states and transitions at unprecedented resolution. Fully exploiting these data, however, requires kinetic models capable of unveiling governing regulatory functions. Here, we introduce an analytical framework dynamo (https://github.com/aristoteleo/dynamo-release), which infers absolute RNA velocity, reconstructs continuous vector fields that predict cell fates, employs differential geometry to extract underlying regulations, and ultimately predicts optimal reprogramming paths and perturbation outcomes. We highlight dynamo’s power to overcome fundamental limitations of conventional splicing-based RNA velocity analyses to enable accurate velocity estimations on a metabolically labeled human hematopoiesis scRNA-seq dataset. Furthermore, differential geometry analyses reveal mechanisms driving early megakaryocyte appearance and elucidate asymmetrical regulation within the PU.1-GATA1 circuit. Leveraging the least-action-path method, dynamo accurately predicts drivers of numerous hematopoietic transitions. Finally, in silico perturbations predict cell-fate diversions induced by gene perturbations. Dynamo, thus, represents an important step in advancing quantitative and predictive theories of cell-state transitions.
In the Qiu lab, we leverage the latest machine learning (ML) approaches, such as auto-encoders, graph and physics-informed neural networks, transformers, and stable diffusion, to model how cells change their states across expression, epigenetic, proteomic, and even morphological levels during evolution, development, and disease. We integrate these scalable, yet often "black box," ML approaches with more explicit and interpretable "white box" systems biology methods, including dynamical systems and differential geometry, to derive mechanistic and functional biological insights.
We actively apply and develop novel single-cell and spatial genomics approaches. Our techniques include time-resolved, metabolic labeling-enabled scRNA-seq and ultra-high-resolution spatial transcriptomics that combines high RNA capture sensitivity with expansive fields of view. Pushing the boundaries of functional genomics, we're exploring multi-omics, long-read sequencing, imaging-based spatial transcriptomics, Perturb-seq, lineage tracing, and more. Our focus extends to heart evolution, organogenesis, congenital heart diseases, and others, including hematopoiesis and general embryogenesis.
We pioneered the use of differentiable machine learning models to make non-trivial predictions beyond the scope of traditional methods. With these models, we can quantify not just the velocity of cell state changes, but also their acceleration, curvature, Jacobian, divergence, and more. These differentiable approaches enable us to decipher gene regulatory network dynamics and predict optimal reprogramming paths, identifying key transcription factors. Ultimately, this leads to in silico predictions of cell states and fate trajectories following various genetic, epigenetic, and environmental perturbations.
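To make the quantities named above concrete, here is a minimal sketch on a toy two-gene mutual-inhibition circuit. The vector field, its parameters, and the helper names (`vector_field`, `jacobian`) are hypothetical illustrations, not Dynamo's API or a model learned from data. For an autonomous system dx/dt = f(x), the Jacobian J holds the partial derivatives of each velocity component with respect to each gene, the divergence is the trace of J, and the acceleration is J applied to the velocity.

```python
import numpy as np

def vector_field(x):
    """Toy velocity field for two mutually inhibiting genes (hypothetical parameters)."""
    a, n, k = 1.0, 4, 1.0  # max synthesis rate, Hill coefficient, degradation rate
    x1, x2 = x
    dx1 = a / (1.0 + x2**n) - k * x1  # gene 1 is repressed by gene 2
    dx2 = a / (1.0 + x1**n) - k * x2  # gene 2 is repressed by gene 1
    return np.array([dx1, dx2])

def jacobian(f, x, eps=1e-5):
    """Central finite-difference Jacobian, J[i, j] = d f_i / d x_j."""
    x = np.asarray(x, dtype=float)
    n = x.size
    J = np.zeros((n, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = eps
        J[:, j] = (f(x + step) - f(x - step)) / (2 * eps)
    return J

x = np.array([0.5, 1.0])           # an arbitrary cell state
v = vector_field(x)                # velocity at that state
J = jacobian(vector_field, x)      # regulation: sign of J[i, j] is the effect of gene j on gene i
divergence = np.trace(J)           # negative values indicate locally contracting flow
acceleration = J @ v               # d^2x/dt^2 = J(x) f(x) for an autonomous system

print("velocity:", v)
print("Jacobian:\n", J)
print("divergence:", divergence)
print("acceleration:", acceleration)
```

In this toy example both off-diagonal Jacobian entries are negative, which is how mutual repression shows up in the Jacobian. A differentiable model would obtain the same derivatives analytically or by automatic differentiation rather than by finite differences.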
Emerging single-cell genomics technologies have enabled profiling of cell-state transitions at unprecedented scale. However, because these assays are destructive, it is generally infeasible to follow the same cell over time. Advances in single-cell profiling have fueled the development of computational approaches for inferring cellular dynamics from snapshot measurements, such as pseudotime and RNA velocity approaches. Furthermore, exciting developments in metabolic labeling-enabled scRNA-seq now allow us to obtain time-resolved transcriptomic kinetics by directly measuring “new” and “old” RNAs in a controllable manner. We recently developed the Dynamo framework, which overcomes fundamental limitations of conventional splicing-based RNA velocity analyses to enable accurate velocity estimations for time-resolved scRNA-seq datasets. Furthermore, we go beyond discrete RNA velocity vectors to a continuous vector field function whose higher-order derivatives yield functional biological insights. Dynamo is also one of the first tools to make non-trivial predictions of optimal reprogramming paths and in silico perturbation outcomes from single-cell datasets. We are continuing this line of research in the lab.
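As a rough illustration of why measuring “new” and “old” RNA is informative, the sketch below estimates a degradation rate from the labeled fraction. It is a simplified textbook calculation, not Dynamo's estimator: it assumes a single labeling pulse of known duration and a gene at steady state, so that old RNA decays exponentially and new/total = 1 - exp(-gamma * t). The function name `degradation_rate` and all numbers are hypothetical.

```python
import numpy as np

def degradation_rate(new, total, t):
    """Estimate gamma from the labeled fraction, assuming steady state
    and one labeling pulse of duration t: new/total = 1 - exp(-gamma * t)."""
    frac = np.clip(np.asarray(new, dtype=float) / total, 0.0, 1.0 - 1e-12)
    return -np.log1p(-frac) / t

# Simulated example with made-up values: true gamma = 0.2 per hour, 2-hour pulse.
true_gamma, t, total = 0.2, 2.0, 100.0
new = total * (1.0 - np.exp(-true_gamma * t))  # expected labeled counts
gamma_hat = degradation_rate(new, total, t)
print("estimated gamma:", gamma_hat)
```

Splicing-based approaches cannot recover this rate in absolute time units without extra assumptions, which is one reason a known labeling duration helps.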
Major questions we are actively asking in the lab include:
Our Dynamo approach to RNA velocity vector fields gives us mechanistic insight into cell-state transitions along the temporal axis. However, complex cell-fate changes such as embryogenesis are controlled not only temporally but also spatially. Unfortunately, routine single-cell approaches dissociate the cells, resulting in complete loss of spatial information. To best model the spatial axis, as in the Dynamo work, we will apply, optimize, and develop novel technologies, such as Stereo-seq, STARmap, and PIXEL-seq, that can generate the ideal spatial data. Although these promising technologies now allow us to profile cell states across subcellular, cellular, tissue, organ, and even whole-embryo levels, analytical tools that fully leverage such data remain lacking. We recently developed a general analytical framework, Spateo, to model spatial transcriptomics at multiple scales, ranging from single-cell segmentation, spatial domain clustering and digitization, and cell-cell interactions to whole-organ and tissue 3D morphometric analysis. We will also continue this line of research in the lab.
Major questions we are actively asking in the lab include:
While Dynamo allows us to predict how cells transition in gene expression space along the temporal axis, the Spateo framework allows us to predict how cells migrate in physical space. But these two spaces are treated separately. It is thus critical to build predictive tools that unify both spaces and jointly learn spatiotemporal kinetics. However, spatiotemporal models, such as partial differential equations, are challenging to learn directly from data, especially for high-dimensional biological systems. Fortunately, breakthroughs in Fourier neural operators, diffusion models, and other generative deep learning approaches, which incorporate ideas such as functional learning, attention, and diffusion processes, have begun to show promise for addressing such challenges. We have achieved promising results with these approaches.
In the lab, we are pursuing the following key questions:
Recent advances in machine learning, particularly powerful, versatile foundation models like ChatGPT, have transformed various fields. These models use attention-based transformers and are trained on extensive datasets, enabling them to perform exceptionally well in areas like image, text, and video processing, even on tasks with limited data. Advances in single-cell and spatial genomics have accumulated massive datasets with tens of millions of single cells, providing the data needed to power foundation models. While there have been several exciting attempts to build such models in biology, including Geneformer, scGPT, scFoundation, and others, for specific downstream tasks like batch effect removal and gene regulatory network analysis, their predictive capabilities are still lacking. To address these gaps, we have formed a vibrant team to actively investigate these possibilities.
We are pursuing the following key questions: