Tangram: Stitching the Cellular Map Back Together

A deep learning framework that moves us from a “bag of cells” to a spatially-aware, predictive tissue model.

Oct 29, 2021 | Computational Methods

The Bottom Line Up Front

I’ve just finished reading Biancalani et al.’s 2021 Nature Methods paper on “Tangram,” and it’s a piece of work that sits squarely in my wheelhouse. In essence, they’ve built a computational tool that addresses a fundamental tradeoff in genomics between transcriptome-wide depth and spatial context: it takes the deep, genome-wide information from dissociated single-cell sequencing (scRNA-seq) and projects it back onto a spatial coordinate system derived from technologies like Visium, MERFISH, or even histological images.

The “Where” Problem in Single-Cell Genomics

The central challenge this paper tackles is one every computational biologist in this space wrestles with. We live in a world of powerful, but siloed, technologies. On one hand, scRNA-seq gives us an incredibly rich, transcriptome-wide view of individual cells, but to get it, we have to grind up the tissue, destroying all spatial context. We’re left with a ‘bag of cells’—we know what is there, but not where. On the other hand, spatial technologies like MERFISH or Visium tell us where genes are expressed, but they are either limited to a pre-selected panel of genes or have resolutions coarser than a single cell. The biological puzzle is clear: how do we get the best of both worlds? How do we put the rich molecular puzzle pieces from scRNA-seq back into their original spatial picture?

Under the Hood: A Jigsaw Puzzle Solver for Cells

Tangram’s approach is elegant. It’s a deep learning framework based on non-convex optimization that treats the problem like a massive jigsaw puzzle. The scRNA-seq profiles are the individual puzzle pieces, and the spatial data (e.g., a MERFISH slide) provides the outline or the image on the box. The algorithm computationally “places” each cell from the scRNA-seq data into a location on the spatial map. Rather than making hard assignments, it learns a soft mapping matrix of cell-by-voxel probabilities and scores each candidate mapping by how closely the resulting in-silico tissue’s gene expression matches the real spatial data on the genes the two datasets share. Gradient-based optimization then adjusts the mapping until that similarity (cosine similarity in the paper, alongside a term that matches the expected cell density per voxel) is maximized. The output is a probabilistic map, telling you the likelihood of finding each specific cell from your single-cell dataset at every voxel of the spatial dataset.
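To make the placement step concrete for myself, I find it helps to write the objective down on toy data. The following is a minimal NumPy sketch, not the Tangram package and not the authors' code: it assumes a soft cell-by-voxel matrix obtained by a row-wise softmax, keeps only a per-gene cosine-similarity term (no density prior or regularizers), and uses plain gradient ascent with a hand-derived gradient. Every name, size, and the learning rate are my own illustrative choices.

```python
import numpy as np

def softmax_rows(logits):
    """Row-wise softmax: each cell's placement probabilities sum to 1."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def gene_cosine_score(M, S, G):
    """Sum over genes of cosine similarity between predicted and measured maps."""
    P = M.T @ S                                    # (voxels x genes) predicted expression
    num = (P * G).sum(axis=0)
    den = np.linalg.norm(P, axis=0) * np.linalg.norm(G, axis=0)
    return float((num / den).sum())

def score_gradient(logits, S, G):
    """Analytic gradient of the score with respect to the unconstrained logits."""
    M = softmax_rows(logits)
    P = M.T @ S
    p_norm = np.linalg.norm(P, axis=0)
    g_norm = np.linalg.norm(G, axis=0)
    cos = (P * G).sum(axis=0) / (p_norm * g_norm)
    dP = G / (p_norm * g_norm) - P * (cos / p_norm**2)     # d score / d P
    dM = S @ dP.T                                           # d score / d M
    return M * (dM - (M * dM).sum(axis=1, keepdims=True))   # back through the softmax

# Toy data (all hypothetical): 30 cells, 10 voxels, 20 shared genes.
rng = np.random.default_rng(0)
n_cells, n_voxels, n_genes = 30, 10, 20
S = rng.gamma(shape=2.0, scale=1.0, size=(n_cells, n_genes))  # stand-in for scRNA-seq
true_voxel = np.arange(n_cells) % n_voxels
A = np.eye(n_voxels)[true_voxel]                              # one-hot ground truth
G = A.T @ S                                                   # stand-in for spatial data

logits = 0.01 * rng.standard_normal((n_cells, n_voxels))
score_before = gene_cosine_score(softmax_rows(logits), S, G)
for _ in range(2000):
    logits += 5.0 * score_gradient(logits, S, G)              # plain gradient ascent
M = softmax_rows(logits)
score_after = gene_cosine_score(M, S, G)
print(f"score before: {score_before:.3f}, after: {score_after:.3f} (max {n_genes})")
```

The softmax is what keeps each row of M a valid probability distribution over voxels while the optimizer works on unconstrained numbers. A toy problem this small is underdetermined, so a rising score should not be read as proof that individual cells landed in their true voxels.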

Beyond Mapping: Imputation as a Superpower

Here’s the finding that made me lean in. Tangram is more than just a cell-type mapping tool. Its real power lies in its ability to impute and predict. This is where it aligns perfectly with my mission of building predictive models.

First, they show they can take a targeted MERFISH experiment measuring only ~250 genes and use it as a scaffold to project a full snRNA-seq dataset, effectively creating a genome-wide spatial map with ~27,000 genes at single-cell resolution. This is a massive leap in data generation without running a new experiment.

Second, and even more compellingly, they map multi-omic SHARE-seq data (which profiles both RNA and chromatin accessibility in the same cell). By using the RNA modality as the “anchor” to align to the spatial MERFISH data, they successfully impute the spatial patterns of chromatin accessibility. This is a genuine act of prediction—projecting a data modality into a space where we currently have few tools to measure it directly. This transforms the tool from a descriptive aligner into a predictive engine.
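As I read it, both imputation results above come down to the same operation: once a cell-by-voxel mapping exists, any per-cell measurement can be pushed through it, whether that is the rest of the transcriptome or a paired chromatin accessibility profile. Here is a small sketch of that idea under my own assumptions; the mapping is a random placeholder rather than a fitted one, and the matrix sizes and function name are hypothetical.

```python
import numpy as np

def project_through_mapping(M, X):
    """Project a per-cell feature matrix X (cells x features) onto voxels."""
    return M.T @ X   # (voxels x features)

rng = np.random.default_rng(1)
n_cells, n_voxels = 6, 3
# Hypothetical mapping: each row holds one cell's placement probabilities.
M = rng.dirichlet(np.ones(n_voxels), size=n_cells)

# Per-cell data that the spatial assay never measured (toy values).
rna_full = rng.poisson(3.0, size=(n_cells, 8)).astype(float)  # genes outside the panel
atac = rng.integers(0, 2, size=(n_cells, 5)).astype(float)    # paired peak accessibility

rna_spatial = project_through_mapping(M, rna_full)   # imputed expression per voxel
atac_spatial = project_through_mapping(M, atac)      # imputed accessibility per voxel
print(rna_spatial.shape, atac_spatial.shape)         # (3, 8) (3, 5)
```

Because each row of M sums to 1, the projection only redistributes signal across voxels: the column totals of the imputed matrix equal those of the per-cell matrix. That is a handy sanity check, and also a reminder that the imputed values are only as good as the mapping itself.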

Weaving Tangram into My Digital Twin Blueprint

This work provides a foundational component for the predictive digital twins I aim to build. For my work in youth leukemia, it’s a game-changer. I can take a deep scRNA-seq dataset from a bone marrow aspirate—a classic ‘bag of cells’—and use Tangram to map those cells back onto a Visium slide from a patient’s bone marrow biopsy. This allows me to spatially resolve the locations of chemoresistant clones relative to supportive niches or immune cells. It’s the first step to modeling the spatial dependencies that drive drug resistance.

The same logic applies directly to my solid tumor pillar. I can build high-resolution maps of the tumor microenvironment, placing specific T-cell exhaustion states or macrophage polarization states in the context of the tumor’s architecture. Tangram is the bridge between my perturbation-based causal discovery (from Perturb-seq) and the spatial reality of the tissue. It helps transform a list of cell states and gene programs into a spatially coherent, systems-level model.

The Next Frontier: Mapping Causality in Space

This paper sets the stage for a critical next step, and it’s where I see my own research program making a unique contribution.

My Next Computational Step: The immediate opportunity is to integrate Perturb-seq data with spatial maps. My lab’s focus is on using CRISPR screens to build causal models of disease. The logical extension is to run a Perturb-seq screen on cancer cells, then use Tangram to map those perturbed cells back onto a spatial atlas of a tumor. This would allow me to ask questions like: if I knock out Gene X, which is critical for metastasis, where do those cells now localize in the tumor? How does that knockout causally influence the gene expression of its immediate neighbors? This fuses causal inference with spatial biology, moving us toward predictive models of how genetic perturbations rewire entire tissue ecosystems.
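A first pass at the "where do the knockout cells localize" question could be as simple as summing mapping probabilities by guide label. The sketch below is hypothetical throughout: the mapping matrix is hand-written, and the labels "GeneX_KO" and "control" are placeholders. It also assumes that a transcriptome-based mapping says something about physical position for perturbed cells, which is exactly the part that would need testing.

```python
import numpy as np

def label_density(M, labels, label):
    """Expected number of cells carrying `label` in each voxel."""
    mask = np.asarray(labels) == label
    return M[mask].sum(axis=0)

# Hypothetical mapping for 4 cells over 3 voxels (rows sum to 1).
M = np.array([
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.2, 0.2, 0.6],
])
guides = ["GeneX_KO", "GeneX_KO", "control", "control"]

ko = label_density(M, guides, "GeneX_KO")
ctrl = label_density(M, guides, "control")
ko_fraction = ko / (ko + ctrl)   # share of mapped mass per voxel that is knockout
print(np.round(ko_fraction, 2))
```

In this toy case the knockout cells concentrate in the first voxel and the controls in the last. On real data I would want to compare such a map against a permutation of the guide labels before reading anything into it.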

Key Experiment for the Field: The boldest claim in the paper is the imputation of spatial chromatin accessibility. This needs rigorous experimental validation. The key experiment would be to take a tissue block, perform MERFISH on one section, and use a next-generation spatial-ATAC technology on an adjacent serial section. This would create a ground-truth dataset to directly compare against Tangram’s imputed ATAC-seq patterns. Passing this test would be a powerful confirmation of the algorithm’s predictive capabilities across modalities.
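If such a ground-truth dataset existed, one plausible scoring scheme would be a per-peak correlation between imputed and measured accessibility across voxels. This sketch assumes the two serial sections have already been registered to a common voxel grid, which is a hard problem in its own right, and it uses synthetic numbers only; nothing here comes from the paper.

```python
import numpy as np

def per_feature_pearson(imputed, measured):
    """Pearson r across voxels for each feature (column); NaN if a column is constant."""
    a = imputed - imputed.mean(axis=0)
    b = measured - measured.mean(axis=0)
    den = np.sqrt((a**2).sum(axis=0) * (b**2).sum(axis=0))
    with np.errstate(invalid="ignore", divide="ignore"):
        return (a * b).sum(axis=0) / den

rng = np.random.default_rng(2)
measured = rng.random((50, 4))                   # synthetic spatial-ATAC: 50 voxels x 4 peaks
imputed = measured.copy()                        # peaks 0 and 2: perfect predictions
imputed[:, 1] += 0.5 * rng.standard_normal(50)   # peak 1: a noisier prediction
imputed[:, 3] = rng.random(50)                   # peak 3: an unrelated prediction
r = per_feature_pearson(imputed, measured)
print(np.round(r, 2))
```

Reporting the full distribution of per-peak scores, rather than a single average, would show whether the imputation works broadly or only for a handful of strongly patterned peaks.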

A Necessary Dose of Skepticism

As powerful as this is, it’s important to recognize the inherent assumptions. The model’s accuracy is fundamentally capped by the quality of its inputs. It assumes the scRNA-seq dataset is a comprehensive representation of the cell populations present in the spatial slice. If a rare but critical cell type is absent from your single-cell prep due to sampling bias or dissociation artifacts (the ‘missing puzzle piece’ scenario), Tangram has no way to place it. The model can’t invent what it hasn’t seen. Furthermore, its ability to deconvolve coarse technologies like Visium is powerful, but the placement of individual cells within a single 55-micron spot is a probabilistic estimate, not a direct measurement. We must be careful not to over-interpret the precision at the sub-spot level.
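One way I might keep myself honest about placement precision is to look at how spread out each cell's mapping probabilities are. The sketch below computes a normalized Shannon entropy per cell; this is my own diagnostic, not something the paper proposes, and the matrix is hand-written for illustration.

```python
import numpy as np

def placement_entropy(M):
    """Normalized Shannon entropy per cell (0 = one voxel, 1 = uniform over voxels)."""
    safe = np.where(M > 0, M, 1.0)   # log(1) = 0, so zero entries contribute nothing
    H = -(M * np.log(safe)).sum(axis=1)
    return H / np.log(M.shape[1])

M = np.array([
    [1.0, 0.0, 0.0, 0.0],       # confidently placed in one voxel
    [0.25, 0.25, 0.25, 0.25],   # no spatial information at all
    [0.5, 0.5, 0.0, 0.0],       # split evenly between two voxels
])
entropy = placement_entropy(M)
print(entropy)
```

Cells with entropy near 1 are effectively unplaced, and any downstream claim about their neighborhood deserves less weight. Low entropy is not proof of a correct placement, though; it only says the optimizer was decisive.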

Reference

Biancalani, Tommaso, et al. “Deep learning and alignment of spatially resolved single-cell transcriptomes with Tangram.” Nature Methods 18.11 (2021): 1352-1362.

Spatial-Transcriptomics Computational-Biology Deep-Learning Multi-Omics