Composition and comparative analysis workflow

This workflow covers the analyses that follow once clusters or cell types have been defined and the main question shifts from structure to comparison:

how cell states are distributed across samples
whether cluster proportions shift between groups
how QC or score columns differ across categories

library(Shennong)
library(dplyr)
library(knitr)
library(Seurat)

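# Load the pbmc_small example object from the package if it is not already defined here.
if (!exists("pbmc_small", inherits = FALSE)) {
  try(data("pbmc_small", package = "Shennong", envir = environment()), silent = TRUE)
}
# Fall back to a local data/pbmc_small.rda file when the packaged dataset could not be loaded.
if (!exists("pbmc_small", inherits = FALSE) && file.exists(file.path("data", "pbmc_small.rda"))) {
  load(file.path("data", "pbmc_small.rda"))
}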

1. Summarize cell-state proportions

sn_calculate_composition() is the main helper for grouped proportion summaries. The most common pattern is sample-by-cluster composition.

knitr::kable(composition_tbl, digits = 2)

sample	seurat_clusters	proportion
pbmc1k	0	16.25
pbmc1k	1	23.75
pbmc1k	2	43.75
pbmc1k	3	16.25
pbmc3k	0	65.00
pbmc3k	1	21.67
pbmc3k	3	13.33
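
The values in the table sum to 100 within each sample, so the proportion column reads as a within-sample percentage. As a rough illustration of that arithmetic, the sketch below computes the same kind of table with plain dplyr on a made-up metadata frame. It is not Shennong's implementation; `toy_meta`, `toy_composition`, and `n_cells` are hypothetical names introduced only for this example.

```r
library(dplyr)

# Hypothetical per-cell metadata (one row per cell); not the pbmc_small data.
toy_meta <- data.frame(
  sample = c(rep("A", 4), rep("B", 6)),
  seurat_clusters = c("0", "0", "1", "2", "0", "0", "0", "1", "1", "2")
)

# Count cells per sample and cluster, then convert counts to
# percentages of each sample's total.
toy_composition <- toy_meta %>%
  count(sample, seurat_clusters, name = "n_cells") %>%
  group_by(sample) %>%
  mutate(proportion = 100 * n_cells / sum(n_cells)) %>%
  ungroup()

print(toy_composition)
```

Because the denominator is the per-sample cell count, samples of different sizes become directly comparable, which is usually the point of a composition table.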

2. Plot proportions across groups

The returned composition table can be fed directly into sn_plot_barplot().

sn_plot_barplot(
  composition_tbl,
  x = sample,
  y = proportion,
  fill = seurat_clusters
)
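
For readers who want to see what such a chart looks like without the package helper, a stacked bar chart of a composition table can be sketched directly in ggplot2. This is only an approximation under the assumption that the helper draws one stacked bar per sample; the toy table and the axis labels below are invented for the example.

```r
library(ggplot2)

# Hypothetical composition table with within-sample percentages.
toy_composition <- data.frame(
  sample = c("A", "A", "A", "B", "B", "B"),
  seurat_clusters = factor(c("0", "1", "2", "0", "1", "2")),
  proportion = c(50, 25, 25, 50, 30, 20)
)

# geom_col() stacks the fill groups by default, so each bar totals 100.
p <- ggplot(
  toy_composition,
  aes(x = sample, y = proportion, fill = seurat_clusters)
) +
  geom_col() +
  labs(x = "Sample", y = "Cells (%)", fill = "Cluster")

print(p)
```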

3. Compare categorical structure beyond clusters

The same composition helper also works for cell-cycle phase, annotation labels, or any other metadata column.

knitr::kable(phase_tbl, digits = 2)

sample	Phase	proportion
pbmc1k	G1	58.75
pbmc1k	G2M	21.25
pbmc1k	S	20.00
pbmc3k	G1	49.17
pbmc3k	G2M	26.67
pbmc3k	S	24.17
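
The idea that one computation serves any categorical column can be made concrete with a small column-agnostic function. The sketch below is a hypothetical dplyr helper, `toy_percent_by()`, that takes column names as strings; it is not part of Shennong and its arguments are assumptions made for illustration.

```r
library(dplyr)

# Hypothetical helper: within-group percentages for any categorical column.
toy_percent_by <- function(meta, group_col, category_col) {
  meta %>%
    group_by(across(all_of(c(group_col, category_col)))) %>%
    summarise(n_cells = n(), .groups = "drop") %>%
    group_by(across(all_of(group_col))) %>%
    mutate(proportion = 100 * n_cells / sum(n_cells)) %>%
    ungroup()
}

# Made-up per-cell metadata with a cell-cycle phase column.
toy_meta <- data.frame(
  sample = c("A", "A", "A", "A", "B", "B"),
  Phase = c("G1", "G1", "S", "G2M", "G1", "S")
)

toy_phase <- toy_percent_by(toy_meta, "sample", "Phase")
print(toy_phase)
```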

4. Compare continuous scores across groups

Once a grouped table exists, Shennong’s plotting helpers can summarize how QC or score columns differ by sample or cluster.

sn_plot_boxplot(score_tbl, x = sample, y = nFeature_RNA)

sn_plot_boxplot(score_tbl, x = seurat_clusters, y = S.Score)
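
A boxplot is easier to interpret alongside the numbers behind it. As a hedged companion to the plots above, the following dplyr sketch tabulates the median and interquartile range of a QC column per sample. The data frame `toy_scores` and the summary column names are hypothetical; only `nFeature_RNA` is taken from the text.

```r
library(dplyr)

# Hypothetical per-cell table with one QC column.
toy_scores <- data.frame(
  sample = rep(c("A", "B"), each = 4),
  nFeature_RNA = c(800, 1000, 1200, 1400, 500, 700, 900, 1100)
)

# One row per sample: cell count, median, and interquartile range.
toy_summary <- toy_scores %>%
  group_by(sample) %>%
  summarise(
    n_cells = n(),
    median_features = median(nFeature_RNA),
    iqr_features = IQR(nFeature_RNA),
    .groups = "drop"
  )

print(toy_summary)
```

Reporting the cell count next to each median helps flag groups that are too small for the comparison to mean much.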

5. Keep comparative analysis downstream of stable labels

Composition and grouped-score analyses are much easier to interpret once the labels are stable. In practice, the recommended order is:

preprocessing and QC
clustering or integration
marker or reference-based annotation
composition and grouped comparisons

That order reduces the chance of over-interpreting composition shifts that are actually caused by unstable clustering.

Songqi Duan
