Chapter 15: Spatial Transcriptomics and Coordinate Math
Johnson’s First Principle: A Cell Without Spatial Coordinates is Headless
Dissolving a tissue to isolate single cells destroys the biological map. A macrophage surrounded by healthy epithelium behaves entirely differently than one surrounded by necrotic tumor. Chapter 13 taught you to identify cell types; Chapter 14 taught you to order them in time. But development and function unfold not only in time — they are embedded in three-dimensional tissue architecture. Without spatial context, single-cell data reveals the molecules but obscures the anatomy.
Spatial transcriptomics solves this by measuring gene expression while preserving tissue coordinates. The core operation is coordinate math — mapping every measurement to a position \((x, y)\) in tissue space, then asking which genes vary across those coordinates, which cell types occupy which coordinates, and which neighboring coordinates communicate with each other.
Core Concepts
The Spatial Resolution Tradeoff
Spatial transcriptomics is defined by a fundamental tradeoff: whole-transcriptome coverage vs. single-cell resolution. No current technology fully achieves both at once, although newer platforms are narrowing the gap. Choosing a technology means choosing where on this tradeoff curve to operate, and that choice determines what biological questions can be addressed.
Spatial barcoding (10x Visium, Visium HD, Slide-seq) uses a capture surface carrying spatially barcoded oligonucleotides. A tissue cryosection is placed on the slide; cellular mRNA diffuses vertically into the capture probes. The barcode encodes the \((x, y)\) coordinate of each capture spot, enabling reconstruction of measurement positions after sequencing. Visium spots are 55 \(\mu\)m in diameter at 100 \(\mu\)m center-to-center spacing, each capturing RNA from ~1-10 cells. Visium HD (2 \(\mu\)m pixels) achieves near-single-cell resolution. The key property: whole-transcriptome coverage (in poly-A capture chemistries, all polyadenylated RNA is captured), enabling discovery of unanticipated gene expression patterns.
In situ hybridization (MERFISH, Xenium, CosMx) uses fluorescent probes that hybridize directly to mRNA in the intact tissue section. Sequential barcoding cycles decode transcripts: each gene receives a binary code, and 10-20 imaging cycles decode hundreds to thousands of genes at subcellular resolution (~100-200 nm). The key property: resolution sufficient to resolve individual transcripts, enabling subcellular localization and single-cell assignment. The tradeoff: limited panel size (hundreds to thousands of genes, not the full transcriptome).
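The decoding step can be illustrated with a minimal sketch. It assumes a toy 8-bit codebook with made-up gene names (real panels use longer codes, one bit per imaging cycle) and decodes an observed on/off pattern by nearest Hamming distance, tolerating a single misread bit. The codebook, the `decode` function, and the `max_distance` parameter are all hypothetical and do not reproduce any vendor's pipeline.

```python
from typing import Optional, Sequence

# Hypothetical codebook: each gene gets a binary word, one bit per imaging cycle.
# Any two codewords differ in at least 4 positions, so one misread bit is correctable.
CODEBOOK = {
    "GENE_A": (1, 1, 0, 0, 1, 0, 1, 0),
    "GENE_B": (0, 1, 1, 0, 0, 1, 0, 1),
    "GENE_C": (1, 0, 0, 1, 0, 1, 1, 0),
}


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(bits: Sequence[int], max_distance: int = 1) -> Optional[str]:
    """Return the uniquely closest gene, or None if no codeword is close enough."""
    ranked = sorted((hamming(bits, code), gene) for gene, code in CODEBOOK.items())
    best_distance, best_gene = ranked[0]
    if best_distance > max_distance:
        return None  # too many bit errors to trust the call
    if len(ranked) > 1 and ranked[1][0] == best_distance:
        return None  # tie between codewords: ambiguous
    return best_gene


if __name__ == "__main__":
    print(decode((1, 1, 0, 0, 1, 0, 1, 0)))  # exact match to GENE_A
    print(decode((1, 1, 0, 0, 1, 0, 1, 1)))  # one bit flipped, still GENE_A
    print(decode((0, 0, 0, 0, 0, 0, 0, 0)))  # far from every codeword
```

Spacing the codewords apart is what makes error correction possible: a spot whose pattern sits within one bit of exactly one codeword can still be assigned, while anything further away is discarded rather than guessed.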
| Property | Spatial Barcoding | In Situ Hybridization |
|---|---|---|
| Coverage | Whole transcriptome (~20,000 genes) | Targeted panel (100-5,000 genes) |
| Resolution | Multi-cell (55 \(\mu\)m spots) to near-single-cell (HD) | Subcellular (~100-200 nm) |
| Discovery power | High (unbiased gene discovery) | Low (only panel genes detected) |
| Bioinformatic challenge | Deconvolution of mixed spots | Molecular decoding, registration |
Deconvolution: Unmixing Mixed Spots
Each Visium spot captures RNA from multiple cells. Deconvolution is the inverse problem: given the mixed expression profile of a spot, estimate the cell type composition. Formally, each spot’s expression vector \(\mathbf{y}_s\) is a weighted sum of cell type reference profiles:
\[\mathbf{y}_s = \sum_{k=1}^K \beta_{sk} \cdot \boldsymbol{\mu}_k + \boldsymbol{\epsilon}_s\]
where \(\boldsymbol{\mu}_k\) is the reference expression profile of cell type \(k\), \(\beta_{sk}\) is the proportion of type \(k\) in spot \(s\) (so \(\beta_{sk} \geq 0\) and \(\sum_k \beta_{sk} = 1\)), \(K\) is the number of cell types in the reference, and \(\boldsymbol{\epsilon}_s\) is residual noise. This requires a reference scRNA-seq dataset with annotated cell types from the same tissue. Deconvolution solves for the \(\beta_{sk}\) coefficients in each spot.
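The linear mixing model above can be solved for a single spot with a minimal sketch. This is a plain nonnegative least-squares fit by projected gradient descent, followed by rescaling so the weights sum to 1. It is a generic stand-in, not the algorithm of RCTD, Cell2location, or CARD, and the 6-gene, 3-type reference matrix is invented for illustration.

```python
import numpy as np


def deconvolve_spot(y, mu, n_iter=2000):
    """Estimate cell type proportions for one spot.

    y  : (G,) expression vector of the spot
    mu : (G, K) reference profiles, one column per cell type
    Minimizes ||mu @ beta - y||^2 subject to beta >= 0 (projected gradient),
    then rescales beta to sum to 1.
    """
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    # Step size 1/L, where L is the largest eigenvalue of mu.T @ mu.
    step = 1.0 / np.linalg.norm(mu, 2) ** 2
    beta = np.full(mu.shape[1], 1.0 / mu.shape[1])
    for _ in range(n_iter):
        gradient = mu.T @ (mu @ beta - y)
        beta = np.maximum(beta - step * gradient, 0.0)  # project onto beta >= 0
    total = beta.sum()
    return beta / total if total > 0 else beta


# Hypothetical reference: 6 genes (rows) x 3 cell types (columns).
reference = np.array([
    [10.0, 1.0, 0.0],
    [8.0, 0.0, 1.0],
    [1.0, 9.0, 0.0],
    [0.0, 7.0, 2.0],
    [0.0, 1.0, 8.0],
    [1.0, 0.0, 6.0],
])
true_beta = np.array([0.6, 0.3, 0.1])
spot = reference @ true_beta  # noise-free mixture for illustration

estimate = deconvolve_spot(spot, reference)
print(np.round(estimate, 3))
```

With a noise-free spot and a complete reference the proportions are recovered; real data adds count noise and platform differences, which is what the dedicated methods below are designed to handle.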
RCTD (Robust Cell Type Decomposition) models each spot’s expression as a linear combination of cell type profiles, with statistical significance testing to determine which cell types are present in each spot. The “robust” in the name refers to its handling of genes that are poorly fit by the linear model — it downweights genes whose expression deviates from the reference, preventing a few discordant genes from driving the deconvolution result.
Cell2location uses a Bayesian Negative Binomial model that accounts for both technical variation (NB count distribution) and biological variation (cell type abundance heterogeneity). Unlike RCTD, which estimates relative proportions, Cell2location estimates the absolute number of cells of each type per spot. This is more interpretable but requires stronger modeling assumptions about cell type expression profiles.
CARD incorporates spatial correlation into deconvolution, assuming that cell type abundances vary smoothly across tissue sections. This makes biological sense — a B-cell zone in the lymph node grades into a T-cell zone gradually, not in isolated spots — and improves deconvolution accuracy for spatially structured tissues.
The reference dependency problem. All deconvolution methods depend critically on the reference scRNA-seq dataset. If the reference lacks a cell type present in the tissue, that cell type’s contribution will be misassigned to the most transcriptionally similar available type. If the reference and spatial data come from different tissue sections, different donors, or different processing protocols, batch effects between the two datasets will distort the deconvolution. A deconvolution result is only as good as the reference it builds on.
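The misassignment described above can be reproduced on a toy example. The sketch below assumes three invented 4-gene profiles, where type C resembles type B, and uses an unconstrained least-squares fit rescaled to sum to 1 as a stand-in for a real deconvolution method. The spot contains no type B cells at all.

```python
import numpy as np

# Hypothetical reference profiles over 4 genes.
type_a = np.array([10.0, 0.0, 0.0, 1.0])
type_b = np.array([0.0, 10.0, 2.0, 1.0])
type_c = np.array([0.0, 8.0, 4.0, 1.0])  # transcriptionally close to type_b

# True spot: half type A, half type C, no type B.
spot = 0.5 * type_a + 0.5 * type_c


def fit_proportions(spot, profiles):
    """Least-squares weights rescaled to sum to 1 (illustrative stand-in only)."""
    mu = np.column_stack(profiles)
    beta, *_ = np.linalg.lstsq(mu, spot, rcond=None)
    return beta / beta.sum()


full = fit_proportions(spot, [type_a, type_b, type_c])
incomplete = fit_proportions(spot, [type_a, type_b])

print("complete reference (A, B, C):", np.round(full, 3))
print("reference missing C (A, B): ", np.round(incomplete, 3))
```

With the complete reference, type B receives essentially zero weight. Once type C is removed from the reference, roughly 46% of the spot is attributed to type B, even though no type B cell is present.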
Neighborhood Analysis: From Spots to Tissue Organization
Once cell types are assigned to coordinates, the next question is tissue organization — which cell types are near which others, and how that proximity shapes function.
Spatial autocorrelation (Moran’s I) quantifies whether neighboring spots are more similar than expected by chance:
\[I = \frac{n}{W} \frac{\sum_i \sum_j w_{ij}(x_i - \bar{x})(x_j - \bar{x})}{\sum_i (x_i - \bar{x})^2}\]
where \(x_i\) is the expression of the gene at spot \(i\), \(\bar{x}\) is its mean across spots, \(w_{ij}\) is an entry of the spatial weight matrix (1 if spots \(i\) and \(j\) are neighbors, 0 otherwise), \(W = \sum_i \sum_j w_{ij}\) is the sum of all weights, and \(n\) is the number of spots. Moran’s \(I\) typically falls between \(-1\) and \(+1\), although the exact bounds depend on the weight matrix: \(I > 0\) indicates spatial clustering (neighboring spots are more similar than expected), \(I < 0\) indicates a “checkerboard” pattern (neighboring spots are systematically different), and \(I \approx 0\) indicates random spatial organization.
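As a worked instance of the formula, the sketch below computes Moran’s \(I\) on a toy row of six spots in which each spot neighbors only the spots directly beside it. The chain layout and the expression values are assumptions chosen so the arithmetic can be checked by hand.

```python
import numpy as np


def morans_i(x, w):
    """Moran's I for values x (n,) and a binary spatial weight matrix w (n, n)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    z = x - x.mean()                        # deviations from the mean
    numerator = (w * np.outer(z, z)).sum()  # sum_ij w_ij * z_i * z_j
    denominator = (z ** 2).sum()
    return (len(x) / w.sum()) * numerator / denominator


# Toy layout: 6 spots in a row, each linked to its immediate left/right neighbor.
n = 6
w = np.zeros((n, n))
for i in range(n - 1):
    w[i, i + 1] = w[i + 1, i] = 1.0

clustered = [1, 1, 1, 0, 0, 0]    # similar values sit next to each other
alternating = [1, 0, 1, 0, 1, 0]  # checkerboard-like pattern

print(morans_i(clustered, w))    # 0.6
print(morans_i(alternating, w))  # -1.0
```

For the clustered vector, four of the five neighbor pairs agree and one disagrees, giving \(I = (6/10)(1.5/1.5) = 0.6\); for the alternating vector every pair disagrees, giving \(I = -1\).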
The null expectation. Under the null hypothesis of no spatial autocorrelation, the expected value of Moran’s \(I\) is:
\[E[I] = -\frac{1}{n-1}\]
For large \(n\) (thousands of spots), \(E[I] \approx 0\), but for small tissue sections with \(n < 100\), the null expectation is noticeably negative (for \(n = 50\), \(E[I] \approx -0.02\)), so an observed value near zero can indicate weak positive autocorrelation. Significance is assessed by permutation testing: randomly shuffle expression values across spots (breaking spatial structure), recompute \(I\) for each of several thousand shuffles, and compare the observed \(I\) to this empirical null distribution. The \(p\)-value is the fraction of permuted \(I\) values at least as extreme as the observed \(I\); counting the observed value itself among the permutations, \((r + 1)/(N + 1)\) for \(r\) such values out of \(N\) permutations, avoids reporting \(p = 0\).
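The permutation procedure can be sketched as follows. The example assumes a toy chain of 20 spots, a one-sided test for positive autocorrelation, and 999 shuffles; the function name `permutation_test` and its parameters are illustrative rather than taken from any package.

```python
import numpy as np


def morans_i(x, w):
    """Moran's I for values x (n,) and spatial weight matrix w (n, n)."""
    z = np.asarray(x, dtype=float) - np.mean(x)
    return (len(z) / w.sum()) * (w * np.outer(z, z)).sum() / (z ** 2).sum()


def permutation_test(x, w, n_perm=999, seed=0):
    """One-sided permutation test for positive spatial autocorrelation.

    Returns (observed I, null expectation -1/(n-1), p-value).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    observed = morans_i(x, w)
    # Shuffling values across spots breaks any spatial structure.
    null = np.array([morans_i(rng.permutation(x), w) for _ in range(n_perm)])
    expected = -1.0 / (len(x) - 1)
    # (r + 1) / (N + 1): the observed value counts as one permutation.
    p_value = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return observed, expected, p_value


# Toy layout (assumption): 20 spots in a row, neighbors are adjacent spots.
n = 20
w = np.zeros((n, n))
for i in range(n - 1):
    w[i, i + 1] = w[i + 1, i] = 1.0

expression = np.array([0.0] * 10 + [1.0] * 10)  # two contiguous domains
observed, expected, p_value = permutation_test(expression, w)
print(f"I = {observed:.3f}, E[I] = {expected:.3f}, p = {p_value:.3f}")
```

Because the smallest attainable value is \(1/(N + 1)\), the number of permutations sets the resolution of the test; thresholds applied after multiple-testing correction across thousands of genes usually call for more shuffles than this sketch uses.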
Neighborhood definition. The weight matrix \(w_{ij}\) encodes which spots are considered neighbors. The choice of neighborhood definition determines \(W\) and directly affects the computed \(I\) value. Two common approaches exist: \(k\)-nearest neighbors (each spot connects to its \(k\) closest spots by Euclidean distance, so every spot has the same number of neighbors) and radius-based (all spots within a fixed distance threshold are neighbors, which can leave spots on tissue edges with few or no neighbors). For Visium’s hexagonal grid, \(k = 6\) is the natural choice: each interior spot has exactly six immediate neighbors, one across each of the six sides of a hexagon. Spots on the tissue or array edge have fewer than six immediate neighbors, so a strict \(k = 6\) rule connects them to more distant spots. Using fewer than 6 neighbors underweights spatial structure; using more than 6 dilutes local signal by including spots beyond the immediate tissue neighborhood.
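The difference between the two neighborhood definitions is easiest to see on a small synthetic grid. The sketch below assumes a 5 × 5 hexagonally packed layout with 100 \(\mu\)m center-to-center spacing and a 110 \(\mu\)m radius threshold; the helper names are hypothetical and the grid is a simplification of the real Visium array.

```python
import numpy as np


def hex_grid(n_rows, n_cols, spacing=100.0):
    """Spot centers on a hexagonal lattice (odd rows shifted by half a spacing)."""
    points = []
    for r in range(n_rows):
        for c in range(n_cols):
            x = c * spacing + (spacing / 2 if r % 2 else 0.0)
            y = r * spacing * np.sqrt(3) / 2
            points.append((x, y))
    return np.array(points)


def pairwise_distances(coords):
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def radius_weights(coords, radius):
    """w_ij = 1 if spot j lies within `radius` of spot i (excluding i itself)."""
    d = pairwise_distances(coords)
    return ((d > 0) & (d <= radius)).astype(float)


def knn_weights(coords, k):
    """w_ij = 1 if spot j is among the k nearest spots to i (may be asymmetric)."""
    d = pairwise_distances(coords)
    np.fill_diagonal(d, np.inf)  # a spot is not its own neighbor
    nearest = np.argsort(d, axis=1)[:, :k]
    w = np.zeros((len(coords), len(coords)))
    w[np.arange(len(coords))[:, None], nearest] = 1.0
    return w


coords = hex_grid(5, 5)
w_radius = radius_weights(coords, radius=110.0)
w_knn = knn_weights(coords, k=6)

corner, center = 0, 12  # spot indices: grid corner and grid center
print("radius neighbors (corner, center):", w_radius[corner].sum(), w_radius[center].sum())
print("kNN neighbors    (corner, center):", w_knn[corner].sum(), w_knn[center].sum())

# Farthest spot that kNN links to the corner, in micrometers.
dist = pairwise_distances(coords)
corner_reach = dist[corner][w_knn[corner] == 1].max()
print("kNN reach at corner:", round(corner_reach, 1))
```

The center spot gets six neighbors under both definitions. The corner spot gets two under the radius rule, while the \(k = 6\) rule reaches out to spots two spacings away to fill its quota.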
Moran’s \(I\) is widely used to identify spatially variable genes, but the signal must be interpreted cautiously. A gene may appear spatially variable due to: biological compartmentalization (e.g., crypt vs. villus in intestine), tissue edge artifacts (higher signal at tissue periphery due to probe penetration), or technical variation (uneven permeabilization across the section). Comparing Moran’s \(I\) across genes within the same tissue section helps control for the last two sources, because section-wide technical gradients tend to affect many genes at once, but it does not eliminate them.
Ligand-receptor inference (NicheNet, CellChat, SpaTalk) predicts cell-cell communication from co-expression of ligands and receptors in adjacent cell types. The approach: if cell type A expresses a ligand gene and adjacent cell type B expresses its cognate receptor, then signaling from A to B is plausible. Scores are based on prior knowledge databases of known ligand-receptor pairs and downstream signaling targets.
The communication inference gap. Detecting mRNA for a ligand in one cell type and its receptor in an adjacent cell type is consistent with signaling but does not prove it. mRNA abundance correlates weakly with protein secretion, and secretion does not guarantee binding or downstream pathway activation. Ligand-receptor inference identifies candidates for experimental validation, not confirmed interactions.
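To make the adjacency logic concrete, the sketch below scores a ligand-receptor pair as the average product of ligand expression in a sender spot and receptor expression in each neighboring spot. This scoring rule is a deliberately simple, hypothetical stand-in; it is not the statistic used by NicheNet, CellChat, or SpaTalk, and the four-spot chain and expression values are invented.

```python
import numpy as np


def lr_adjacency_score(ligand, receptor, w):
    """Mean of ligand_i * receptor_j over all directed neighbor pairs (i -> j)."""
    ligand = np.asarray(ligand, dtype=float)
    receptor = np.asarray(receptor, dtype=float)
    w = np.asarray(w, dtype=float)
    return float(ligand @ w @ receptor / w.sum())


# Toy layout (assumption): 4 spots in a row, neighbors are adjacent spots.
n = 4
w = np.zeros((n, n))
for i in range(n - 1):
    w[i, i + 1] = w[i + 1, i] = 1.0

ligand = [5, 0, 0, 0]             # ligand expressed only in spot 0
receptor_adjacent = [0, 4, 0, 0]  # receptor in the neighboring spot 1
receptor_distant = [0, 0, 0, 4]   # receptor three spots away

print(lr_adjacency_score(ligand, receptor_adjacent, w))
print(lr_adjacency_score(ligand, receptor_distant, w))
```

The same expression levels give a positive score when sender and receiver are adjacent and zero when they are not, which is the spatial constraint these methods add over dissociated data. As the paragraph above stresses, a positive score still only nominates a candidate interaction.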
Registration: Aligning Data to Anatomy
Spatial transcriptomics data measures gene expression, but the biological interpretation depends on knowing which anatomical region each spot corresponds to. Registration aligns the sequencing data to the tissue histology image (H&E or immunofluorescence), typically with an affine transformation that rotates, scales, shears, and translates the spot grid to match the tissue image coordinates.
Registration is not a biologically neutral preprocessing step. An affine transformation maps original coordinates \(\mathbf{x} = (x, y)\) to transformed coordinates \(\mathbf{x}' = (x', y')\) via a 2×2 linear matrix (encoding rotation, scaling, and shear) and a translation vector:
\[\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} t_x \\ t_y \end{pmatrix}\]
The six parameters \((a, b, c, d, t_x, t_y)\) are optimized to minimize mismatch between the spot grid and the tissue boundary in the histology image. The quality of this transformation directly determines whether transcriptomic measurements map to the correct anatomical regions. A misregistration of 100 \(\mu\)m (one center-to-center spacing between Visium spots) can shift a spot from tumor epithelium to stroma, entirely changing its biological interpretation. For in situ methods with subcellular resolution, registration errors of a few microns can misassign transcripts to the wrong cell.
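One common way to estimate the six parameters is from matched landmark pairs, such as fiducial markers visible in both coordinate systems. The sketch below takes that route with ordinary least squares; it is an assumption for illustration and differs from optimizing against the tissue boundary as described above. The landmark coordinates and the "true" transform are invented.

```python
import numpy as np


def fit_affine(src, dst):
    """Least-squares affine fit so that dst ≈ src @ A.T + t.

    src, dst : (N, 2) matched landmark coordinates, N >= 3 and not collinear.
    Returns the 2x2 matrix A = [[a, b], [c, d]] and translation t = (t_x, t_y).
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    design = np.hstack([src, np.ones((len(src), 1))])      # columns: x, y, 1
    params, *_ = np.linalg.lstsq(design, dst, rcond=None)  # shape (3, 2)
    return params[:2].T, params[2]


def apply_affine(points, A, t):
    """Map (N, 2) points through x' = A x + t."""
    return np.asarray(points, dtype=float) @ A.T + t


# Hypothetical ground truth: mild rotation and scaling plus an offset in pixels.
A_true = np.array([[0.5, -0.1],
                   [0.1, 0.5]])
t_true = np.array([100.0, 200.0])

landmarks_spot = np.array([[0.0, 0.0], [6500.0, 0.0], [0.0, 6500.0], [6500.0, 6500.0]])
landmarks_image = apply_affine(landmarks_spot, A_true, t_true)

A_fit, t_fit = fit_affine(landmarks_spot, landmarks_image)
residuals = np.linalg.norm(apply_affine(landmarks_spot, A_fit, t_fit) - landmarks_image, axis=1)
print(np.round(A_fit, 3), np.round(t_fit, 1), residuals.max())
```

The per-landmark residuals are the quantity to inspect in practice: converted to micrometers, they indicate whether the fit is tight enough relative to the spot spacing or cell size of the platform in use.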
The registration challenge is compounded by tissue deformation during sectioning. The tissue on the slide is not identical to the tissue in the fresh block — it has been stretched, folded, or torn. Computational registration must account for non-linear warping, not just affine transformations, to accurately map sequencing data to histology.
Biological Interpretation
Spatial data is a hypothesis generator, not proof. Detecting ligand-receptor pairs in adjacent cells generates candidates for cell-cell communication, but confirmation requires perturbation — knocking out the ligand or receptor and measuring the effect on the recipient cell. A correlation between a gene’s spatial expression pattern and a tissue compartment is suggestive of compartment-specific function, but the same correlation can arise from technical artifacts (tissue edge effects, permeabilization gradients) or from correlated but non-causal biological processes.
The deconvolution of multi-cell spots is a statistical inference, not a measurement. Results depend heavily on the quality and relevance of the scRNA-seq reference used. A deconvolution that assigns 30% T-cells to every tumor spot likely reflects a reference artifact (e.g., shared stress-response genes between T-cells and tumor cells) rather than true T-cell infiltration. Validating deconvolution results against independent methods (immunohistochemistry, flow cytometry) is essential but rarely done.
Spatial autocorrelation (Moran’s \(I\)) identifies spatially variable genes, but biological interpretation requires distinguishing genuine compartment-specific expression from technical artifacts. A gene with high Moran’s \(I\) that encodes a known structural protein of the extracellular matrix is plausibly compartment-specific; a gene with high Moran’s \(I\) that independent data suggest is expressed uniformly across cell types (for example, a housekeeping gene with stable bulk RNA-seq levels) is more likely an artifact of uneven tissue permeabilization.
Current Landscape (Q2 2026)
- Visium HD (2 \(\mu\)m pixels) narrows the resolution gap between sequencing-based and imaging-based spatial transcriptomics, achieving near-single-cell resolution with whole-transcriptome coverage; it is among the first commercial platforms to do so at scale.
- Xenium v2 (10x) supports panels up to 5,000 genes with subcellular resolution, enabling pathway-level analysis without whole-transcriptome requirement while maintaining the single-molecule resolution needed for subcellular localization.
- SpaHDmap (2026) integrates histology images with transcriptomics via multimodal NMF for high-resolution spatial domain detection, linking tissue morphology to molecular architecture.
- Spatial multi-omics (RNA + protein + chromatin from the same section) is emerging from MERFISH and Xenium platforms, enabling multi-modal spatial analysis that links gene expression to protein abundance and chromatin state in anatomical context.
- In silico spatial inference (CytoSPACE, SpaGE) links scRNA-seq data to spatial coordinates computationally, mapping dissociated cells onto reference spatial datasets or predicting spatial patterns for genes that were not measured in situ. This extends spatial analysis beyond what was directly measured, but the predictions require experimental validation.
Summary and Required Reading
- Spatial transcriptomics faces a fundamental tradeoff — whole-transcriptome coverage (Visium, Slide-seq) vs. single-cell resolution (MERFISH, Xenium). Technology choice determines what questions can be addressed.
- Deconvolution (RCTD, Cell2location, CARD) estimates cell type composition per spot using scRNA-seq references — results depend critically on reference quality and relevance.
- Moran’s \(I\) quantifies spatial autocorrelation of gene expression; ligand-receptor inference identifies candidate cell-cell communication. Both require validation against artifacts.
- Registration aligns sequencing data to histology via affine transformations — registration quality directly determines anatomical interpretability.
- Spatial data is a hypothesis generator — correlations between expression patterns and tissue compartments suggest function but do not prove it. Perturbation experiments are required for confirmation.
Required Reading
- Ståhl et al.: “Visualization and analysis of gene expression in tissue sections by spatial transcriptomics” (Science, 2016).
- Cable et al.: “Robust decomposition of cell type mixtures in spatial transcriptomics” (Nature Biotechnology, 2022).
Johnson’s Rule: Without spatial context, single-cell data is headless — you know the molecules but not the anatomy.