Perturb-Seq: Moving from 'What Is' to 'What If' in a Single Experiment
A landmark paper that combines CRISPR screens with single-cell RNA-seq, paving the way for causal, high-content genomics.
The Bottom Line Up Front
I’ve just revisited the foundational 2016 Cell paper from the Regev and Weissman labs describing Perturb-seq, and its importance has only grown over time. They ingeniously fused pooled CRISPR-based genetic screens with high-throughput single-cell RNA-sequencing. This creates a system where, in a single experiment, you can knock out hundreds of different genes across a population of cells and read out the full transcriptional consequence of each specific perturbation inside each individual cell. It’s a landmark in functional genomics.
Breaking the Screen-or-Profile Dilemma
The scientific quest here was to resolve a frustrating tradeoff that defined functional genomics for years. On one side, we had pooled CRISPR screens. These are incredibly scalable—you can test thousands of genes at once—but the readout is typically a single, low-content metric like cell survival or the expression of one fluorescent reporter. You learn that a gene is important, but not why. On the other side, we could perform RNA-seq on cells after knocking out a single gene. This gives an incredibly rich, high-content view of the molecular fallout, but it’s painstakingly slow and expensive, done one gene at a time. The central challenge was how to get the best of both: the massive scale of a pooled screen combined with the deep, mechanistic insight of whole-transcriptome profiling.
The Barcode is the Rosetta Stone
The core of Perturb-seq is an elegant technical and computational solution. The key is a lentiviral vector that delivers not only a specific single-guide RNA (sgRNA) to enact a CRISPR knockout but also a transcribed, polyadenylated “guide barcode” (GBC). Because the GBC carries a poly-A tail, droplet-based scRNA-seq captures it along with all the other mRNAs from that cell. The GBC is the Rosetta Stone: it allows the authors to link a specific genetic perturbation to a specific high-dimensional transcriptomic profile for thousands of cells in parallel.
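To make the barcode idea concrete, here is a minimal sketch of how one might call a perturbation per cell from GBC read-outs. It assumes you already have per-cell UMI counts for each barcode; the function name, the guide names, and both thresholds are hypothetical choices for illustration, not the paper's pipeline.

```python
def assign_guide(gbc_counts, min_umis=3, min_fraction=0.8):
    """Call one guide per cell from its guide-barcode UMI counts.

    gbc_counts maps guide name -> UMI count for a single cell.
    min_umis and min_fraction are illustrative thresholds (assumptions).
    """
    total = sum(gbc_counts.values())
    if total < min_umis:
        return "unassigned"  # too little barcode signal to trust
    top_guide, top_count = max(gbc_counts.items(), key=lambda item: item[1])
    if top_count / total >= min_fraction:
        return top_guide  # one barcode clearly dominates
    return "ambiguous"  # several barcodes at comparable levels


# Toy cells with made-up counts.
cells = {
    "cell_1": {"sgStat1": 12, "sgCtrl": 1},
    "cell_2": {"sgStat2": 5, "sgStat1": 4},
    "cell_3": {"sgCtrl": 1},
    "cell_4": {},
}
assignments = {cell: assign_guide(counts) for cell, counts in cells.items()}
for cell, guide in assignments.items():
    print(cell, guide)
```

Labeling mixed cells "ambiguous" is a simplification; an analysis of combinatorial perturbations would keep cells carrying several guides rather than set them aside.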
Computationally, they developed a framework, MIMOSCA (Multi-Input Multi-Output Single-Cell Analysis), to make sense of this massive dataset. It fits a regularized linear model in which each cell's expression profile is explained by the perturbations it carries plus a set of covariates, so the specific effect of each perturbation can be deconvolved from the complex, pooled data. Crucially, those covariates let the model separate true perturbation effects from other sources of variation, like the cell cycle, technical noise, or batch effects, which is essential for isolating a clean biological signal.
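For intuition about what such a model does, here is a toy sketch, not the MIMOSCA code: an elastic-net regression fit by coordinate descent, where each gene's expression is explained by guide indicators plus one covariate. The data are simulated and noise-free, and the column names, penalty settings, and the choice to penalize the covariate like any other column are my assumptions.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 part of the penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def center(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def elastic_net(design, response, alpha=0.01, l1_ratio=0.5, n_sweeps=200):
    """Coordinate-descent elastic net for one gene; one coefficient per column."""
    n_cells = len(design)
    # Centering columns and response absorbs the intercept (baseline expression).
    columns = [center(col) for col in zip(*design)]
    residual = center(response)
    scales = [sum(x * x for x in col) / n_cells for col in columns]
    beta = [0.0] * len(columns)
    for _ in range(n_sweeps):
        for j, col in enumerate(columns):
            # Correlation of column j with the residual, with j's own fit added back.
            rho = sum(x * (r + x * beta[j]) for x, r in zip(col, residual)) / n_cells
            updated = soft_threshold(rho, alpha * l1_ratio) / (
                scales[j] + alpha * (1 - l1_ratio)
            )
            shift = updated - beta[j]
            residual = [r - x * shift for x, r in zip(col, residual)]
            beta[j] = updated
    return beta


# Toy design: columns are [guide_A, guide_B, cell_cycle_score] (hypothetical names).
design, gene0, gene1 = [], [], []
for guide_a, guide_b in [(0, 0), (1, 0), (0, 1)]:  # control, guide A, guide B cells
    for cell_cycle in (0.0, 1.0, 0.5):
        design.append([guide_a, guide_b, cell_cycle])
        gene0.append(5.0 - 2.0 * guide_a + 1.0 * cell_cycle)  # guide A lowers gene0
        gene1.append(3.0 + 1.5 * guide_b)  # guide B raises gene1

effects = {"gene0": elastic_net(design, gene0), "gene1": elastic_net(design, gene1)}
for gene, beta in effects.items():
    print(gene, [round(b, 2) for b in beta])
```

The fitted coefficients land close to the simulated effects, slightly shrunk by the penalty, and the cell-cycle column soaks up the variation it is responsible for, so that variation is not misattributed to a guide.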
From Single Genes to a Network Diagram
The finding that truly drives home the power of this method isn’t just the ability to see which genes go up or down after a knockout. The ‘aha!’ moment is seeing how they use these rich phenotypes to reconstruct biological circuits. In their demonstration in bone marrow-derived dendritic cells (BMDCs), they perturbed 24 different transcription factors (TFs) involved in the immune response. By clustering the resulting transcriptional signatures, they could group TFs into functional modules—for example, correctly identifying Stat1 and Stat2 as a cooperative module in the anti-viral response. They went beyond a simple list of gene-phenotype links and generated a data-driven wiring diagram of the regulatory network, complete with activating and repressive relationships. This is a monumental step towards interpretability and systems-level understanding.
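The step from signatures to modules can be sketched in a few lines. This is a deliberately simple stand-in for the clustering in the paper: it correlates per-TF effect vectors and merges any pair above a threshold. The signature values are invented, TF_X and TF_Y are placeholder names, and the 0.8 cutoff is an arbitrary assumption.

```python
from math import sqrt


def pearson(u, v):
    """Pearson correlation between two equal-length effect vectors."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    du = [x - mean_u for x in u]
    dv = [x - mean_v for x in v]
    norm = sqrt(sum(x * x for x in du)) * sqrt(sum(x * x for x in dv))
    return sum(a * b for a, b in zip(du, dv)) / norm


def group_by_correlation(signatures, threshold=0.8):
    """Merge TFs whose knockout signatures correlate at or above threshold."""
    names = list(signatures)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the representative of this group.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if pearson(signatures[first], signatures[second]) >= threshold:
                parent[find(first)] = find(second)

    modules = {}
    for name in names:
        modules.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in modules.values())


# Invented signatures: effect of each knockout on five target genes.
signatures = {
    "Stat1": [-2.0, -1.8, -1.5, 0.1, 0.0],
    "Stat2": [-1.9, -2.1, -1.2, 0.0, 0.2],
    "TF_X": [0.1, 0.0, -0.2, -1.5, -1.7],
    "TF_Y": [0.0, 0.2, 0.1, -1.4, -1.9],
}
modules = group_by_correlation(signatures)
print(modules)
```

In this toy data, Stat1 and Stat2 fall into one module because their knockouts move the same target genes in the same direction, which is the logic behind grouping TFs by signature.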
This Isn’t Just a Paper; It’s My Lab’s Foundation
I can’t overstate this: Perturb-seq is the bedrock of my research philosophy and the engine for my entire lab’s mission. My goal is to build predictive, causal models of tissues, and this technology is the single most effective tool for generating the necessary causal data at scale.
I can use this exact framework to knock out hundreds of candidate genes in T-ALL cell lines, expose them to chemotherapy, and identify the complete transcriptional programs that are causally responsible for driving chemoresistance. I can also screen for genes in cancer cells that causally alter the expression of checkpoint ligands or, in co-culture systems, screen for genes in T cells that drive the exhaustion phenotype.
This method is the physical embodiment of my guiding principles. It moves beyond correlation to causality via direct perturbation. It enables prediction by showing what happens to the entire cellular system if a specific gene is removed. And the high-content transcriptomic readout provides the raw material needed to build interpretable models of the underlying gene regulatory networks.
Scaling Causality: Multi-omics and In Vivo Screens
The original Perturb-seq paper was a starting pistol, not a finish line. It immediately opened doors to even more powerful applications, which is where I plan to focus my own computational efforts.
My Next Computational Step: The logical evolution is to move from a single RNA readout to a multi-modal one. My immediate focus is on building the analytical frameworks to handle Perturb-Multiome, where we simultaneously profile the transcriptome and the epigenome (via scATAC-seq) from each perturbed cell. This allows us to ask deeper causal questions: how does knocking out a transcription factor causally rewire the chromatin landscape, and how do those changes then propagate to the transcriptome? This gives us a much more mechanistic, multi-layered view of gene regulation.
Key Experiment for the Field: The experiments in this paper were performed in cell culture. The ultimate challenge and next frontier is to take this technology in vivo. The key experiment would be to transduce a population of tumor cells with a Perturb-seq library, implant them into an immunocompetent mouse model, and allow a tumor to form. Upon excision, performing scRNA-seq on the tumor would reveal which genetic knockouts causally impact a cell’s fitness, metastatic potential, and its interaction with the native immune system within the true complexity of the tumor microenvironment.
Reading the Fine Print: Limitations and Lookouts
For all its power, the approach has its caveats. First, the model has to infer which cells actually had a successful gene knockout, as CRISPR is not 100% efficient. This introduces a potential source of noise, as the “unaffected” cells can dampen the measured average effect size for a given perturbation. Second, the biological context is critical. An effect observed in a cell line grown in a dish may not translate directly to a complex tissue in vivo, where extrinsic signals from neighboring cells can buffer or modify the outcome of a perturbation. Finally, while the paper demonstrated simple pairwise interactions, the combinatorial space of genetic interactions is astronomically large. Systematically screening all pairs, triplets, and higher-order combinations remains a monumental challenge, forcing us to rely on assumptions of sparsity (that most factors don’t interact), which may not always hold.
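Two of these caveats can be put in rough numbers. The sketch below assumes, as a simplification, that cells where editing failed look exactly like controls, so the measured average effect is the true effect scaled by the knockout fraction. It also counts how many combinations of each order exist for a library of a given size. The function names and the 70% knockout fraction are illustrative, not figures from the paper.

```python
from math import comb


def observed_effect(true_effect, knockout_fraction):
    """Average effect across guide-bearing cells when only a fraction are edited.

    Assumes unedited cells behave like controls (a simplifying assumption).
    """
    return knockout_fraction * true_effect


def count_combinations(n_genes, max_order):
    """Number of distinct gene sets of each size, from 1 up to max_order."""
    return {order: comb(n_genes, order) for order in range(1, max_order + 1)}


diluted = observed_effect(true_effect=-2.0, knockout_fraction=0.7)
small_library = count_combinations(n_genes=24, max_order=3)
genome_pairs = comb(20000, 2)

print(diluted)
print(small_library)
print(genome_pairs)
```

Even a 24-gene library has 276 pairs and 2,024 triplets, and a genome-scale set of 20,000 genes has roughly 200 million pairs, which is why exhaustive combinatorial screens are out of reach without some sparsity assumption.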
Reference
Dixit, Atray, et al. “Perturb-Seq: dissecting molecular circuits with scalable single-cell RNA profiling of pooled genetic screens.” Cell 167.7 (2016): 1853-1866.