scDLKit

scDLKit is shifting from a baseline toolkit toward a publication-first research program, with the software as its accompanying artifact.

Available now

Today the public repo supports two main entrypoints, plus a static docs page:

- stable baseline workflows through TaskRunner
- experimental labeled annotation adaptation through adapt_annotation(...)
- a static published annotation tutorial and status page for docs review

The current implemented scope is still narrower than the paper target:

- stable deep-learning baselines for single-cell workflows
- Scanpy handoff through adata.obsm
- experimental scGPT annotation adaptation on labeled human scRNA-seq
- beyond-PBMC annotation evidence on human pancreas

Paper target

The paper target is:

scDLKit is a minimal-code, AnnData-native framework for parameter-efficient adaptation and reproducible benchmarking of single-cell and spatial foundation models.

That target expands the repo in two directions:

- model breadth:
  - scGPT
  - scFoundation
  - CellFM
  - Nicheformer
- task breadth:
  - annotation
  - integration
  - perturbation
  - spatial

Use the roadmap for the full distinction between the paper target and what is currently implemented.
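The two breadth axes span a grid of model/task combinations. As a rough sketch, using only the names listed on this page, the grid can be enumerated and the single combination described here as running today (the experimental scGPT annotation pilot) marked apart from the rest. The `scope_grid` helper and the status strings are illustrative and are not part of scDLKit.

```python
from itertools import product

# Names copied from the model-breadth and task-breadth lists on this page.
MODELS = ["scGPT", "scFoundation", "CellFM", "Nicheformer"]
TASKS = ["annotation", "integration", "perturbation", "spatial"]

# The only combination this page describes as running today (experimental pilot).
PILOT = {("scGPT", "annotation")}


def scope_grid():
    """Return {(model, task): status} for every model/task combination."""
    return {
        (model, task): "pilot" if (model, task) in PILOT else "target"
        for model, task in product(MODELS, TASKS)
    }


grid = scope_grid()
for (model, task), status in grid.items():
    print(f"{model:<13} {task:<13} {status}")
```

Every cell marked "target" belongs to the paper target only; the roadmap remains the authoritative source for status.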

Main research task map

Cell type annotation

Status: Pilot

Main question: Can scDLKit already support a credible low-code adaptation story on labeled human data?

Current implementation note: The pilot currently runs on the experimental scGPT path only. The published quickstart tutorial compares frozen_probe and head, while the heavier annotation benchmark matrix extends to full fine-tuning, lora, adapter, prefix_tuning, and ia3.
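To give a feel for why a strategy such as lora counts as parameter-efficient, the sketch below compares trainable weights for one linear layer under full fine-tuning and under the standard LoRA formulation, where the frozen d_out × d_in weight matrix is augmented by two low-rank factors with rank × (d_in + d_out) trainable entries in total. The layer sizes and rank are illustrative assumptions and are not taken from scGPT or from scDLKit's benchmark configuration.

```python
def full_finetune_params(d_in: int, d_out: int) -> int:
    """Trainable weights when the whole d_out x d_in matrix is updated."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable weights for LoRA factors A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


# Illustrative sizes only (assumption, not an scGPT layer shape).
d_in, d_out, rank = 512, 512, 8

full = full_finetune_params(d_in, d_out)
lora = lora_params(d_in, d_out, rank)

print(f"full fine-tuning: {full} trainable weights")
print(f"LoRA (rank {rank}): {lora} trainable weights ({lora / full:.2%} of full)")
```

Biases and the task head are ignored here; the point is only the order-of-magnitude gap between the two strategies.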

Experimental scGPT human-pancreas annotation

Integration / representation transfer

Status: Planned

Main question: Can adapted representations transfer across studies and batches under a standardized benchmark?

See /roadmap#integration-pillar

Perturbation-response prediction

Status: Planned

Main question: Can the framework benchmark adaptation strategies on perturbation-response tasks?

See /roadmap#perturbation-pillar

Spatial domain / niche classification

Status: Planned

Main question: Can scDLKit support a real spatial pillar anchored by Nicheformer rather than a future placeholder?

See /roadmap#spatial-pillar

Current entrypoints

Stable baseline path

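import scanpy as sc
from scdlkit import TaskRunner

# Load the processed PBMC 3k dataset that ships with Scanpy.
adata = sc.datasets.pbmc3k_processed()

# Configure a VAE baseline for representation learning.
runner = TaskRunner(
    model="vae",
    task="representation",
    label_key="louvain",
    device="auto",
    epochs=20,
    batch_size=128,
    model_kwargs={"kl_weight": 1e-3},
)

# Train, then store the latent embedding in adata.obsm for Scanpy to use.
runner.fit(adata)
adata.obsm["X_scdlkit_vae"] = runner.encode(adata)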

Use this when you want the stable baseline workflow.
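Once the embedding sits in adata.obsm, control can pass back to Scanpy. The following is a minimal sketch of that handoff, assuming the "X_scdlkit_vae" key from the snippet above; the `handoff_to_scanpy` helper is hypothetical and not part of scDLKit, while `sc.pp.neighbors`, `sc.tl.umap`, and `sc.tl.leiden` are standard Scanpy calls (Leiden needs Scanpy's optional clustering dependency installed).

```python
def handoff_to_scanpy(adata, rep_key="X_scdlkit_vae", cluster_key="leiden"):
    """Build a neighbor graph, UMAP, and Leiden clusters from a stored embedding."""
    # Fail early if the embedding has not been written to adata.obsm yet.
    if rep_key not in adata.obsm:
        raise KeyError(f"{rep_key!r} not found in adata.obsm; run the encoder first")

    # Imported lazily so the helper can be defined without Scanpy installed.
    import scanpy as sc

    sc.pp.neighbors(adata, use_rep=rep_key)      # kNN graph on the learned latent space
    sc.tl.umap(adata)                            # 2-D layout from that graph
    sc.tl.leiden(adata, key_added=cluster_key)   # clusters stored in adata.obs[cluster_key]
    return adata
```

After `handoff_to_scanpy(adata)`, a call such as `sc.pl.umap(adata, color=["louvain", "leiden"])` would plot the reference labels next to the new clusters.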


Experimental annotation path

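from scdlkit import adapt_annotation

# adata: a labeled AnnData object with cell-type labels in adata.obs["cell_type"].
runner = adapt_annotation(
    adata,
    label_key="cell_type",
    output_dir="artifacts/scgpt_annotation",
)

# Annotate the AnnData object with the adapted model, then save the model.
runner.annotate_adata(adata)
runner.save("artifacts/scgpt_annotation/best_model")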

Use this when you want the current low-code research-facing adaptation path.
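Because this path depends on a labeled obs column, it can help to inspect that column before starting a run. The sketch below is one way to do so; `check_label_column` and its `min_cells` threshold are hypothetical conveniences, not scDLKit API or a documented scDLKit requirement. It relies only on `adata.obs` supporting `in` and item access, as a pandas DataFrame does.

```python
from collections import Counter


def check_label_column(adata, label_key="cell_type", min_cells=2):
    """Return (per-class cell counts, sorted names of classes below min_cells)."""
    if label_key not in adata.obs:
        raise KeyError(f"{label_key!r} is not a column of adata.obs")

    # Count cells per label; labels are cast to str so mixed types compare cleanly.
    counts = Counter(str(label) for label in adata.obs[label_key])

    # Flag classes that are probably too small to split into train and validation sets.
    rare = sorted(name for name, n in counts.items() if n < min_cells)
    return dict(counts), rare
```

For example, `counts, rare = check_label_column(adata)` gives a quick view of class balance, and a non-empty `rare` list suggests merging or dropping those labels before adaptation.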


Supporting workflows

Workflow snapshots

Figure: Latent UMAP from the Scanpy PBMC quickstart. Quickstart embedding colored by the PBMC reference labels.

Figure: Leiden UMAP from the downstream Scanpy tutorial. Leiden clustering on the same embedding after handing control back to Scanpy.

Current scope

- Scanpy still owns raw-data preprocessing, QC, and most exploratory analysis.
- scDLKit currently owns model training, evaluation, comparison, and output handoff.
- The current public implementation is still gene-expression-first.
- The paper target is broader than the current implementation and must remain labeled as such.