Skip to contents

This wrapper combines fast, broadly useful metrics for batch mixing, biological conservation, and cluster-level diagnostics. It reuses existing neighbor graphs when available and otherwise builds a kNN graph with Annoy or an exact fallback.

Usage

sn_assess_integration(
  x,
  batch,
  label = NULL,
  cluster = "seurat_clusters",
  reduction = .sn_default_metric_reduction(x),
  baseline = NULL,
  baseline_reduction = .sn_default_baseline_reduction(x, reduction),
  graph = NULL,
  dims = NULL,
  k = 20,
  metrics = NULL,
  neighbor_method = c("auto", "graph", "annoy", "exact"),
  max_cells = 5000,
  stratify_by = batch,
  rare_fraction = 0.02,
  rare_n = 50,
  challenge_threshold = 0.5,
  seed = 717,
  n_trees = 50
)

Arguments

x

A Seurat object.

batch

Metadata column containing batch labels.

label

Optional metadata column containing biological labels such as cell type annotations.

cluster

Optional metadata column containing the current clustering.

reduction

Reduction name used for the primary integrated embedding. Defaults to "harmony" when present, otherwise "pca".

baseline

Optional Seurat object used as the baseline reference for PCR batch scoring. If NULL, the baseline is taken from x.

baseline_reduction

Optional reduction name used as the baseline.

graph

Optional graph name stored in x@graphs.

dims

Optional integer vector of embedding dimensions to retain.

k

Number of neighbors used when a graph must be built from embeddings.

metrics

Optional character vector selecting which metrics to compute. Supported values are "batch_lisi", "label_lisi", "batch_silhouette", "label_silhouette", "graph_connectivity", "clustering_agreement", "isolated_label_score", "cluster_purity", "cluster_entropy", "pcr_batch", and "challenging_groups".

neighbor_method

Strategy used when a graph must be built. One of "auto", "graph", "annoy", or "exact".

max_cells

Optional integer cap used to subsample cells before running the metrics.

stratify_by

Optional metadata column used to preserve representation during subsampling. Defaults to batch.

rare_fraction

Fraction-of-cells threshold for flagging rare groups.

rare_n

Absolute cell-count threshold for flagging rare groups.

challenge_threshold

Threshold on the derived challenge_score used to flag difficult groups.

seed

Random seed used when max_cells triggers subsampling.

n_trees

Number of Annoy trees when neighbor_method = "annoy".

Value

A list with four top-level elements:

  • summary: one row per summary metric, plus aggregate scores.

  • per_cell: per-cell outputs such as LISI and silhouette.

  • per_group: per-group outputs such as connectivity, composition, and difficult-group diagnostics.

  • parameters: the effective runtime configuration.

Examples

if (FALSE) { # \dontrun{
data("pbmc_small", package = "Shennong")
pbmc <- sn_run_cluster(
  pbmc_small,
  batch = "sample",
  species = "human",
  verbose = FALSE
)
metrics <- sn_assess_integration(
  pbmc,
  batch = "sample",
  cluster = "seurat_clusters",
  reduction = "harmony",
  baseline_reduction = "pca"
)
metrics$summary
} # }