Omics with Johnson 101

A practical guide to bioinformatics

Author

Johnson

Published

June 5, 2026

You are about to read a book about bioinformatics. But this is not a book about running software. It is about understanding what the software does — and why.

Omics with Johnson is built on a single belief: that the most important tool a bioinformatician carries is not any programming language or pipeline manager — it is a deep understanding of the physical, mathematical, and statistical principles that govern how biological data is generated, measured, and interpreted.

Whether you are a biologist learning to analyze your first sequencing experiment, a computer scientist curious about the algorithms that align billions of reads, a clinician seeking to understand genomic reports, a student wondering where to begin, or a self-taught enthusiast working through tutorials that never quite explain why — this book is written for you. The only prerequisites are curiosity and persistence.

Johnson’s First Principle of Bioinformatics

Biological data is physical data. A fluorescence-based sequencer is not a magic box: it is an optical instrument governed by fluorescence chemistry and total internal reflection. A read alignment is not a black-box mapping: it is an approximate string-matching problem, made fast by an index built on the Burrows-Wheeler Transform. A p-value is not a measure of truth: it is the probability of seeing data at least as extreme as yours, conditioned on a model that may be wrong. A deep neural network does not “think”: it follows a gradient down a loss landscape shaped by its architecture and training data.

The moment you understand the physics and mathematics behind each tool, you stop being a user and become an architect. You stop trusting default parameters and start questioning assumptions. You stop running pipelines and start designing experiments.
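To make the string-matching claim concrete, here is a minimal Python sketch of the Burrows-Wheeler Transform and the backward search that counts exact matches of a pattern. It is an illustration under simplifying assumptions, not how any production aligner is implemented: the function names `bwt` and `count_occurrences` are invented for this example, the transform is built by sorting every rotation (far too slow for a genome), and real tools add precomputed rank tables and support for mismatches.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler Transform by sorting all rotations (illustration only)."""
    s = text + "$"  # "$" is a sentinel that sorts before every letter
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_str: str, pattern: str) -> int:
    """Count exact occurrences of pattern in the original text by backward search."""
    # first_row_start[c] = number of characters in the text that sort before c
    first_row_start, total = {}, 0
    for c in sorted(set(bwt_str)):
        first_row_start[c] = total
        total += bwt_str.count(c)

    def rank(c: str, i: int) -> int:
        # Occurrences of c in bwt_str[:i]; real indexes precompute this table
        return bwt_str[:i].count(c)

    lo, hi = 0, len(bwt_str)  # half-open range of sorted rotations still matching
    for c in reversed(pattern):  # extend the match one character to the left
        if c not in first_row_start:
            return 0
        lo = first_row_start[c] + rank(c, lo)
        hi = first_row_start[c] + rank(c, hi)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    index = bwt("GATTACATTA")
    print(index)
    print(count_occurrences(index, "TTA"))
```

The point to notice is that the search never scans the text itself. Each pattern character only narrows a range over the sorted rotations, so the number of steps depends on the length of the pattern rather than on the length of the text.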

What This Book Will Bring You

By the end of this curriculum, you will be able to:

  • Read a sequencing experiment — understand what FASTQ, BAM, and VCF formats actually represent, which physical processes generated them, and where artifacts hide in each.
  • Choose the correct statistical model — know why a Negative Binomial describes RNA-seq counts, why a Poisson cannot, and how to evaluate whether your analysis accounts for biological overdispersion.
  • Build reproducible pipelines — design containerized, version-controlled computational workflows that survive personnel changes, software updates, and regulatory audits.
  • Interpret your own results — distinguish biological signal from batch effects, technical noise, multiple testing burden, and statistical flukes.
  • Speak across disciplines — communicate with molecular biologists about experimental design, with software engineers about computational architecture, and with clinicians about actionable findings.
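As a small taste of the first point in the list above, the sketch below decodes the quality line of a FASTQ record into per-base error probabilities. It assumes the Phred+33 encoding used by current Illumina output, in which a base with quality Q has error probability 10^(-Q/10). The function name `phred_to_error_prob` and the four-line record are made up for illustration.

```python
def phred_to_error_prob(quality_line: str, offset: int = 33) -> list[float]:
    """Convert a FASTQ quality string into per-base error probabilities.

    Assumes Phred+offset encoding: Q = ord(char) - offset, P(error) = 10 ** (-Q / 10).
    """
    return [10 ** (-(ord(ch) - offset) / 10) for ch in quality_line]


# A hypothetical FASTQ record: header, bases, separator, qualities
record = ["@read1", "GATTACA", "+", "IIII5+!"]
bases, qualities = record[1], record[3]

for base, p_error in zip(bases, phred_to_error_prob(qualities)):
    print(f"{base}\tP(error) = {p_error:.4f}")
```

The characters `I`, `5`, `+`, and `!` decode to Q40, Q20, Q10, and Q0, so the last base of this invented read is no better than a guess. The quality line is the instrument's own estimate of how far to trust each base call, and that estimate is what downstream tools weigh.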

For Readers From Different Backgrounds

You are a biologist who learned to code. You know your experimental system, but statistics feels unfamiliar. Start with Chapters 4 through 6: the probability, linear algebra, and inference material was written with you in mind. The mathematics is built step by step, with biological intuition guiding every equation. You will not be expected to derive proofs; you will be expected to understand why each distribution exists and what it means when its assumptions are violated.

You are a computer scientist entering biology. You can implement anything, but the biological context is new. The tier structure allows you to dive into sequencing physics, alignment, and assembly (Chapters 7-9) early, then loop back to the domain-specific chapters. Each chapter includes a Biological Interpretation section written for readers who need the context, not the syntax.

You are a clinician or clinical researcher. Your priority is variant interpretation, cancer genomics, and diagnostic accuracy. Tier 6 was built for you, but the earlier chapters on statistical inference (Chapter 6) and variant calling (Chapter 10) provide the essential foundation for understanding what a VCF entry means and when a variant classification is defensible.

You are a student. Take the tiers in order. Each chapter builds on the previous. The mathematics and code may feel difficult at first — that is normal, not a signal that you are in the wrong place. Reread passages that are unclear. Work through the derivations on paper. The discomfort of learning fundamentals is temporary; the confidence of mastering them is permanent.

A Note on the Mathematics

This book contains equations. You do not need to be a mathematician to read it, but you do need to be willing to engage with quantitative reasoning. Every equation in this book is accompanied by an intuitive explanation of what it means and why it matters.

Can you understand the concept of a mean? Then you can understand the Negative Binomial: it keeps the mean of a Poisson and adds a single dispersion parameter to capture biological variation. Do you know that a line has a slope? Then you can understand linear regression and the design matrices that model batch effects. Do you know that a table of numbers can be called a matrix? Then you can understand PCA: it finds the directions in that table that capture the most variation.
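The Negative Binomial claim can be made concrete in a few lines of Python. The sketch assumes the mean-dispersion parameterization common in RNA-seq analysis, where the variance is the mean plus the dispersion times the mean squared, while a Poisson forces the variance to equal the mean. The function names and the replicate counts are invented for illustration, and the method-of-moments estimate shown is deliberately crude; production tools use more careful estimators.

```python
from statistics import mean, variance


def poisson_variance(mu: float) -> float:
    """A Poisson has no free parameter for spread: variance equals the mean."""
    return mu


def negative_binomial_variance(mu: float, dispersion: float) -> float:
    """Mean-dispersion form: variance = mu + dispersion * mu**2."""
    return mu + dispersion * mu ** 2


def moment_dispersion(counts: list[float]) -> float:
    """Crude method-of-moments dispersion: (sample variance - mean) / mean**2."""
    mu = mean(counts)
    return (variance(counts) - mu) / mu ** 2


# Hypothetical counts for one gene across four biological replicates
replicates = [90, 110, 150, 50]
mu = mean(replicates)
alpha = moment_dispersion(replicates)

print(f"mean = {mu}, sample variance = {variance(replicates):.1f}")
print(f"Poisson would predict variance = {poisson_variance(mu)}")
print(f"estimated dispersion = {alpha:.3f}")
print(f"Negative Binomial variance = {negative_binomial_variance(mu, alpha):.1f}")
```

For these invented replicates the sample variance is many times the mean, which a Poisson cannot represent. The dispersion parameter absorbs that excess, and setting it to zero recovers the Poisson.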

The goal is not to turn you into a mathematician. The goal is to ensure that when you encounter a method in a paper, you can evaluate whether its assumptions match your data, whether its conclusions are supported by its statistics, and whether its results are reproducible.

A Note on the Code

This book contains code — short blocks in R, Python, and shell, embedded within chapters to illustrate concepts rather than to serve as tutorials. You are not expected to run every snippet on your first reading. Read them for structure: what data does this function expect? What transformation does it apply? What output does it produce?

The habit of reading code for understanding, not just for execution, is one of the most valuable skills this book teaches. It separates scientists who can evaluate a method from technicians who can only run it.

Course Structure

This curriculum spans 31 chapters organized into six tiers, designed to take you from bare-metal hardware to clinical decision-making:

  • Tier 1: Foundations (Chapters 1-6) — Computing, programming paradigms, reproducibility, probability distributions, linear algebra, and statistical inference. These six chapters are the prerequisite for everything that follows. Master them first.

  • Tier 2: Core Omics (Chapters 7-12) — Sequencing physics (Illumina, Nanopore, PacBio), alignment (Burrows-Wheeler Transform), de novo assembly (De Bruijn graphs), variant calling (Hidden Markov Models), transcriptomics, and epigenomics. The central technologies of modern genomics.

  • Tier 3: High-Resolution (Chapters 13-15) — Single-cell RNA-seq, trajectory inference, and spatial transcriptomics. Methods that resolve biology at cellular and subcellular resolution.

  • Tier 4: Systems (Chapters 16-20) — Network analysis (WGCNA), mass spectrometry proteomics, multi-omic integration (MOFA, CCA), 3D genome architecture (Hi-C), and functional genomics (Perturb-seq). Moving from individual molecules to system-level understanding.

  • Tier 5: AI and Machine Learning (Chapters 21-25) — Machine learning fundamentals, tree-based models, neural networks, transformers and protein foundation models, and DNA foundation models. Opening the black box of AI for biological discovery.

  • Tier 6: Clinical (Chapters 26-31) — ACMG variant interpretation, cancer genomics, pathogen genomics, clinical pipeline architecture, career dynamics, and the unresolved problems facing the field. Translating sequence to practice.

How to Read This Book

A chapter is not a lecture — it is a reference. Read actively: with a notebook, with the original papers cited in the Required Reading, with your own data in mind. Challenge every claim. Ask: “How would I test this? What would it take to convince me this method works?”

The chapters are designed to be revisited. The Core Concepts and Biological Interpretation sections are reference material you will return to as you encounter each method in practice. The Current Landscape sections are refreshed quarterly and are the only sections that become dated — the fundamentals do not change.

A Final Word Before You Begin

You will encounter material in these chapters that is difficult — probability distributions that feel abstract, matrix decompositions that seem disconnected from biology, algorithms whose complexity appears unnecessary. Every one of these concepts exists because someone, facing a real biological question, found that existing tools could not answer it and built something new.

This book teaches you to be that person.

Not every analysis you run will produce a publication. Not every model you build will converge. Not every experiment you design will work. But if you understand why a method works and when it fails, you can diagnose the problem, fix it, and move forward — which is more than any tutorial, any black-box pipeline, or any “one-click” analysis can offer.

Turn the page. There is only the next dataset, and it will not come with an answer key.

Let us begin.