Chapter 31: Epilogue — The Unresolved Problems
The Unresolved Problems
The Pangenome Reference — replacing the single linear reference genome with a graph genome that captures all human genetic variation. Algorithms for read alignment against a pangenome graph are still immature.
Multi-Omic Single-Cell Integration — measuring protein, RNA, and chromatin accessibility from the same single cell at throughput sufficient for clinical studies. Current co-assays exist but are low-throughput.
Longitudinal Molecular Dynamics — integrating wearables, continuous biomarkers, and periodic deep omics into a coherent model of human health trajectories over decades.
Interpretable AI for Biological Discovery — moving beyond SHAP values and attention maps to models that propose testable causal hypotheses, not just predictive features.
Spatial Multi-Omics at Single-Molecule Resolution — whole-transcriptome spatial profiling at subcellular resolution across entire tissue sections. No current platform achieves both breadth and resolution.
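Equitable Genomic Medicine — solving the ancestry bias problem in genomic databases. gnomAD samples are predominantly of European ancestry (on the order of 60%, varying by release), and polygenic risk scores derived from European cohorts lose much of their predictive accuracy in non-European populations, with the largest losses typically reported in African-ancestry populations.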
Genomic Data Privacy — clinical sequencing produces uniquely sensitive data about future disease risk, ancestry, and family relationships. Privacy-preserving methods (differential privacy, federated learning, encrypted computation) remain active research areas, with no deployed system simultaneously guaranteeing privacy, enabling clinical utility, and satisfying regulatory frameworks (HIPAA, GDPR, emerging AI governance laws).
Protein Design Validation Bottleneck — AI can generate millions of protein sequences, but we can experimentally test only a tiny fraction. Closing this loop requires either massive experimental automation or much better computational filters.
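To make one of the privacy-preserving methods named in the Genomic Data Privacy item concrete, here is a minimal sketch of the Laplace mechanism from differential privacy, applied to a single alternate-allele count. It is an illustration under stated assumptions, not a deployable system. The function names, the example cohort, and the choice of epsilon are hypothetical. The sensitivity of 2 assumes diploid individuals and a single query at one site; releasing many sites spends additional privacy budget, which this sketch does not track.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponentials with rate 1/scale
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_allele_count(genotypes, epsilon, rng=None):
    """Release the alternate-allele count at one site with epsilon-DP.

    genotypes: per-individual alt-allele dosages (0, 1, or 2).
    Assumption: adding or removing one diploid individual changes the
    count by at most 2, so the L1 sensitivity of this query is 2.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("dosages must be 0, 1, or 2")
    if rng is None:
        rng = random.Random()
    sensitivity = 2.0
    true_count = sum(genotypes)
    # Noise scale grows as epsilon shrinks: stronger privacy, noisier answer.
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    cohort = [0, 1, 2, 0, 0, 1, 0, 0, 2, 1]  # hypothetical dosages, true count 7
    for eps in (0.1, 1.0, 10.0):
        noisy = private_allele_count(cohort, eps, random.Random(0))
        print(f"epsilon={eps}: released count {noisy:.2f}")
```

The tension described in the list shows up directly in the noise scale: at small epsilon the released count for a ten-person cohort is dominated by noise, which is one reason a formal privacy guarantee and clinical utility are hard to satisfy together for rare variants.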
The common thread across these challenges is not any specific technology — it is the foundational principles established in this curriculum: the probability distributions that model biological noise (Chapter 4), the linear algebra that compresses high-dimensional signal (Chapter 5), the statistical reasoning that separates confounders from causes (Chapter 6), the physical constraints that govern sequencing and alignment (Chapters 7-9), and the biological understanding that connects molecular state to phenotype. These principles are the permanent toolkit; the tools are temporary.
The Closing Mandate
Assume every paper is wrong until you have reproduced it. Treat your data with respect — it represents a patient, a biopsy, a life. Teach what you learn. The field advances not through competition but through open methods, shared data, and rigorous peer review.
There is no final exam. There is only the next dataset, and it will not have a key.
Required Reading
All references from Chapters 1-30. The reading never stops.