2. Start a Typical Workflow
Qingzhou Zhang
2026-02-23
Source:vignettes/vignette_02-workflow.Rmd
vignette_02-workflow.RmdIntroduction
Protein complexes are the functional machinery of the cell.
High-throughput experimental methods can identify hundreds of putative
protein complexes in a single experiment. The ComplexMap
package provides a comprehensive, end-to-end workflow to process,
analyze, annotate, and visualize such datasets.
The version 2.0 workflow is built around a Physical-First philosophy: we prioritize the physical composition (protein identity) of complexes to define the map’s layout, while using functional enrichment to provide a “color landscape” that highlights biological diversity without collapsing distinct sub-complexes into generic blobs.
Step 1: Loading Data
The ComplexMap package includes the
demoComplexes dataset (622 putative human complexes). We
also provide a helper to locate an example BioCarta gene set (GMT)
file.
# Load example complexes
data("demoComplexes", package = "ComplexMap")
# Access the included example GMT file (BioCarta)
gmtPath <- getExampleGmt()
biocartaGmt <- getGmtFromFile(gmtPath, verbose = FALSE)
# Preview the input
message(sprintf("Loaded %d complexes and %d gene sets.",
length(demoComplexes), length(biocartaGmt)))
#> Loaded 622 complexes and 292 gene sets.Step 2: Running the Main Workflow
The core of the package is the createComplexMap()
function. This high-level wrapper handles:
Refinement: (Optional) Merges highly redundant complexes to clean the input.
Functional Enrichment: Calculates p-values and Fold Enrichment for every complex.
Network Construction: Links complexes based on shared proteins (
alpha=0.75gives 75% weight to physical overlap).Semantic Clustering: Groups functional terms to assign a consistent color palette that maximizes diversity.
Topology Calculation: Runs a Fruchterman-Reingold layout and calculates centrality metrics (betweenness and degree).
# Run the entire workflow
# We use a subset of 200 complexes for this vignette demonstration
cmap <- createComplexMap(
complexList = demoComplexes[1:200],
gmt = biocartaGmt,
ifRefineCpx = TRUE, # Clean up redundancy
mergeThreshold = 0.90, # Only merge nearly identical complexes
alpha = 0.75, # Layout priority: 75% Physical, 25% Functional
verbose = TRUE
)
#> --- Starting ComplexMap Workflow ---
#> Parameters: similarity='jaccard', alpha=0.75 (Diversity Priority)
#> Refinement: Enabled
#>
#> Step 1: Refining complex list...
#>
#> --- Refining Input Complex List (Minimal Merging Strategy) ---
#> Filtered 34 complexes by size. Retaining 166.
#> Identifying merge groups: method='jaccard', threshold >= 0.90...
#> Found 0 redundancy groups. Merging 166 complexes into 166.
#>
#> --- Refinement Complete ---
#>
#> Step 2: Running enrichment analysis...
#> Running enrichment for 166 complexes (Cutoff: 0.05)...
#> Annotation complete. Found significant terms for 64 complexes.
#>
#> Step 3: Building complex network...
#> Building complex network (combined mode, alpha=0.75)...
#> Processing 13695 pairs using 11 cores...
#> Network built: 94 edges retained.
#>
#> Step 4: Generating node attributes...
#> Generating node attributes (prioritizing functional specificity colors)...
#> -> Clustering 135 terms using co-occurrence (jaccard)
#> Metric: 'jaccard' with unit: 'log'; comparing: 135 vectors
#> -> Generating diverse palette: 25 functional domains (Average Linkage).
#>
#> Step 5: Computing map topology...
#> Computing map topology (layout and centrality)...
#> Topology computation complete.
#>
#> --- ComplexMap Workflow Complete ---The Systems Biology Dashboard
One of the key features of the ComplexMap object is its
print method, which provides immediate feedback on the
“health” of your landscape.
# Inspect the resulting object
print(cmap)
#> # ComplexMap Object (Physical-First Layout)
#> # -- Physical Structure: 166 nodes, 94 edges (0.57 edges/node)
#> # -- Functional Landscape:
#> # * Diversity: 24 distinct functional domains (colors)
#> # * Coverage: 38.6% of complexes annotated
#> # -- Accessors: `getNodeTable()`, `getEdgeTable()`
#> # -- Explore: `explore(cmap)` to launch interactive Cytoscape.js viewer.What to look for:
Diversity: A high count of distinct functional domains indicates a specific, high-resolution map.
Coverage: Shows what percentage of your experimental data could be biologically annotated.
Step 3: Exploring and Querying
You can interact with the map data using standard accessors or the
targeted queryMap tool.
3.1 Targeted Querying
Let’s find complexes containing “SMAD4” (a central mediator of TGF-beta signaling).
# Search for protein members
smad_complexes <- queryMap(cmap, query = "SMAD4", type = "protein")
#> Warning: No matches for 'SMAD4'.
if (nrow(smad_complexes) > 0) {
smad_complexes %>%
select(complexId, primaryFunctionalDomain, proteinCount) %>%
knitr::kable()
} else {
message("SMAD4 not found in this subset.")
}
#> SMAD4 not found in this subset.3.2 Accessing Tables
The node and edge tables are available as tibbles for custom
downstream analysis (e.g., in ggplot2).
nodes <- getNodeTable(cmap)
edges <- getEdgeTable(cmap)
# Example: Top 3 most central complexes by betweenness
nodes %>%
select(complexId, primaryFunctionalDomain, betweenness) %>%
arrange(desc(betweenness)) %>%
head(3)
#> # A tibble: 3 × 3
#> complexId primaryFunctionalDomain betweenness
#> <chr> <chr> <dbl>
#> 1 CpxMap_0090 BIOCARTA_SALMONELLA_PATHWAY 0.0434
#> 2 CpxMap_0121 BIOCARTA_HBX_PATHWAY 0.0344
#> 3 CpxMap_0088 BIOCARTA_HBX_PATHWAY 0.0212Step 4: Interactive Visualization
In ComplexMap >= 2.0.0, static R plots have been
replaced by a powerful interactive engine based on
Cytoscape.js.
To launch the explorer, simply call explore() on your
object. This will open a Shiny application where you can:
Zoom and Pan through the physical landscape.
Filter by functional domain or protein count.
Inspect specific nodes to see their full protein list and enrichment scores.
# Launch the interactive viewer
explore(cmap)Step 5: Exporting to External Tools
If you prefer to perform advanced network analysis in
Cytoscape Desktop or Gephi, use
exportNetwork. It sanitizes complex R data types into
strings and writes standard TSV or GraphML files.
# Export for Cytoscape Desktop
exportNetwork(cmap, filePrefix = "my_complex_landscape", format = "cytoscape")