Scanpy integration

scDLKit is designed to fit naturally into a Scanpy-centered workflow.

The most important framing is simple:

Scanpy still owns the single-cell analysis workflow.
scDLKit provides the model-training, evaluation, comparison, and output-handoff layer.

If you want the full preprocessing-plus-clustering story, start with the official Scanpy basics tutorial:

https://scanpy.readthedocs.io/en/stable/tutorials/basics/clustering.html

Minimal integration pattern

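import scanpy as sc
from scdlkit import TaskRunner

# Processed PBMC 3k dataset; cluster labels live in adata.obs["louvain"].
adata = sc.datasets.pbmc3k_processed()

runner = TaskRunner(
    model="vae",
    task="representation",
    label_key="louvain",
    device="auto",
    epochs=20,
    batch_size=128,
    model_kwargs={"kl_weight": 1e-3},  # light KL term, explained below
)

runner.fit(adata)
# Store the latent embedding in obsm so Scanpy can use it by key.
adata.obsm["X_scdlkit_vae"] = runner.encode(adata)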

For this single-cell baseline, keep the VAE's KL weight small (here `kl_weight=1e-3`) so that PBMC populations remain visibly separable in the latent UMAP. The quickstart notebook exposes two profiles, quickstart and full; the full profile runs the same code path for longer when you want a stronger qualitative result.
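To make the role of the KL weight concrete, here is a minimal NumPy sketch of a weighted VAE objective. It assumes the common form `reconstruction + kl_weight * KL` with a mean-squared-error reconstruction term and a standard-normal prior; scDLKit's actual loss is not shown on this page and may differ. The function names `gaussian_kl` and `weighted_vae_loss` are hypothetical and are not part of scDLKit.

```python
import numpy as np


def gaussian_kl(mu, logvar):
    """Per-cell KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dims."""
    return 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar, axis=1)


def weighted_vae_loss(x, x_hat, mu, logvar, kl_weight=1e-3):
    """Return (total, reconstruction, kl), each averaged over cells.

    Assumed objective (not confirmed as scDLKit's): MSE + kl_weight * KL.
    """
    recon = np.mean((x - x_hat) ** 2, axis=1)  # per-cell mean squared error
    kl = gaussian_kl(mu, logvar)               # per-cell KL divergence
    total = np.mean(recon + kl_weight * kl)
    return float(total), float(np.mean(recon)), float(np.mean(kl))


# Synthetic stand-ins for one batch: 8 cells, 50 genes, 10 latent dimensions.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 50))
x_hat = x + rng.normal(scale=0.1, size=x.shape)
mu = rng.normal(size=(8, 10))
logvar = np.zeros((8, 10))

for w in (1.0, 1e-3):
    total, recon, kl = weighted_vae_loss(x, x_hat, mu, logvar, kl_weight=w)
    print(f"kl_weight={w:g}: total={total:.4f} recon={recon:.4f} kl={kl:.4f}")
```

Under this assumed objective, a small weight lets the reconstruction term dominate, so the encoder is pulled less strongly toward the prior and distinct populations are less likely to be squeezed together in latent space.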

Continue with Scanpy

Once the latent representation is in adata.obsm, use it like any other Scanpy embedding:

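# Build the neighbor graph on the scDLKit latent space.
sc.pp.neighbors(adata, use_rep="X_scdlkit_vae")
# UMAP is computed from that neighbor graph.
sc.tl.umap(adata)
sc.pl.umap(adata, color="louvain")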
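A UMAP plot is a qualitative check. As a rough numeric companion, the sketch below computes how often a cell's nearest neighbors in the embedding share its label. `knn_label_purity` is a hypothetical helper written for this page, not a scDLKit or Scanpy function, and it uses brute-force distances, which is only reasonable for datasets of a few thousand cells.

```python
import numpy as np


def knn_label_purity(embedding, labels, k=15):
    """Fraction of k-nearest-neighbor pairs (Euclidean) that share a label."""
    embedding = np.asarray(embedding, dtype=float)
    labels = np.asarray(labels)
    if embedding.shape[0] != labels.shape[0]:
        raise ValueError("embedding and labels must have the same number of cells")

    # Pairwise squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq_norms = np.sum(embedding ** 2, axis=1)
    dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * embedding @ embedding.T
    np.fill_diagonal(dist, np.inf)  # a cell is not its own neighbor

    neighbors = np.argsort(dist, axis=1)[:, :k]  # indices of the k nearest cells
    same_label = labels[neighbors] == labels[:, None]
    return float(same_label.mean())


# Synthetic stand-in: two well-separated groups in a 2-D "latent space".
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=0.1, size=(20, 2))
group_b = rng.normal(loc=10.0, scale=0.1, size=(20, 2))
toy_embedding = np.vstack([group_a, group_b])
toy_labels = np.array(["a"] * 20 + ["b"] * 20)

print(knn_label_purity(toy_embedding, toy_labels, k=5))
```

With the objects from this page, the call would look like `knn_label_purity(adata.obsm["X_scdlkit_vae"], adata.obs["louvain"].to_numpy())`. Treat the result as a sanity check rather than a benchmark, since the labels themselves come from an earlier clustering.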

For reconstruction-capable models, you can also retrieve reconstructed expression directly:

reconstructed = runner.reconstruct(adata)
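One simple way to sanity-check a reconstruction is a per-gene Pearson correlation between the input and the reconstructed matrix. The sketch below assumes that `runner.reconstruct(adata)` returns a dense cells-by-genes array aligned with `adata.X`; that return shape is an assumption, not something stated on this page. `per_gene_pearson` is a hypothetical helper, shown here on synthetic data.

```python
import numpy as np


def per_gene_pearson(original, reconstructed):
    """Pearson correlation per gene (column); NaN where a column is constant."""
    original = np.asarray(original, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)
    if original.shape != reconstructed.shape:
        raise ValueError("original and reconstructed must have the same shape")

    # Center each gene across cells.
    orig_c = original - original.mean(axis=0)
    recon_c = reconstructed - reconstructed.mean(axis=0)

    denom = np.sqrt((orig_c ** 2).sum(axis=0) * (recon_c ** 2).sum(axis=0))
    corr = np.full(original.shape[1], np.nan)
    valid = denom > 0  # skip genes with zero variance in either matrix
    corr[valid] = (orig_c * recon_c).sum(axis=0)[valid] / denom[valid]
    return corr


# Synthetic stand-in: 50 cells x 3 genes, with a noisy "reconstruction".
rng = np.random.default_rng(0)
toy_x = rng.normal(size=(50, 3))
toy_recon = toy_x + rng.normal(scale=0.2, size=toy_x.shape)

print(per_gene_pearson(toy_x, toy_recon))
```

On real data the call would be `per_gene_pearson(adata.X, reconstructed)`, assuming `adata.X` is dense. If it is a SciPy sparse matrix, densify it first with `adata.X.toarray()`.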

Workflow map

Workflow step	Owned by	Where to learn it
Raw QC, filtering, normalization, HVG selection	Scanpy	Official Scanpy preprocessing and clustering tutorials
Train a baseline model on processed PBMC	scDLKit	Scanpy PBMC quickstart
Push latent embeddings into `adata.obsm`	scDLKit + Scanpy handoff	Scanpy PBMC quickstart
Cluster and interpret the scDLKit embedding	Scanpy on top of scDLKit output	Downstream Scanpy after scDLKit
Inspect reconstructed expression	scDLKit	Reconstruction sanity check
Compare baseline models	scDLKit	PBMC model comparison
Wrap a custom PyTorch module	scDLKit	Custom model extension
Try the experimental frozen foundation path	scDLKit	Experimental scGPT PBMC embeddings

Current scope

scDLKit is not a replacement for Scanpy.

It is the model-training and evaluation layer you can drop into a standard single-cell analysis workflow when you want:

a baseline autoencoder or VAE
a quick benchmark before building a custom method
a consistent way to compare latent representations
a tutorial-backed way to inspect reconstructed outputs when the model supports them

Today that scope is still gene-expression-first. Spatial and multimodal workflows are intentionally deferred until the current baseline toolkit is better benchmarked.