scDLKit#

scDLKit is moving from a baseline toolkit identity toward a publication-first research program with a software artifact attached to it.

Available now#

Today the public repo supports two main entrypoints:

  • stable baseline workflows through TaskRunner

  • experimental labeled annotation adaptation through adapt_annotation(...)

  • a static published annotation tutorial and status page for docs review

The current implemented scope is still narrower than the paper target:

  • stable deep-learning baselines for single-cell workflows

  • Scanpy handoff through adata.obsm

  • experimental scGPT annotation adaptation on labeled human scRNA-seq

  • beyond-PBMC annotation evidence on human pancreas

Paper target#

The paper target is:

scDLKit is a minimal-code, AnnData-native framework for parameter-efficient adaptation and reproducible benchmarking of single-cell and spatial foundation models.

That target expands the repo in two directions:

  • model breadth:

    • scGPT

    • scFoundation

    • CellFM

    • Nicheformer

  • task breadth:

    • annotation

    • integration

    • perturbation

    • spatial

Use the roadmap when you want the full distinction between paper target and current implementation truth.

Main research task map#

Cell type annotation

Status: Pilot

Main question: Can scDLKit already support a credible low-code adaptation story on labeled human data?

Current implementation note: The pilot currently runs on the experimental scGPT path only. The published quickstart tutorial compares frozen_probe and head, while the heavier annotation benchmark matrix extends to full fine-tuning, lora, adapter, prefix_tuning, and ia3.

Experimental scGPT human-pancreas annotation
Integration / representation transfer

Status: Planned

Main question: Can adapted representations transfer across studies and batches under a standardized benchmark?

/roadmap#integration-pillar
Perturbation-response prediction

Status: Planned

Main question: Can the framework benchmark adaptation strategies on perturbation-response tasks?

/roadmap#perturbation-pillar
Spatial domain / niche classification

Status: Planned

Main question: Can scDLKit support a real spatial pillar anchored by Nicheformer rather than a future placeholder?

/roadmap#spatial-pillar

Current entrypoints#

Stable baseline path#

import scanpy as sc
from scdlkit import TaskRunner

adata = sc.datasets.pbmc3k_processed()

runner = TaskRunner(
    model="vae",
    task="representation",
    label_key="louvain",
    device="auto",
    epochs=20,
    batch_size=128,
    model_kwargs={"kl_weight": 1e-3},
)

runner.fit(adata)
adata.obsm["X_scdlkit_vae"] = runner.encode(adata)

Use this when you want the stable baseline workflow.

Related docs:

Experimental annotation path#

from scdlkit import adapt_annotation

runner = adapt_annotation(
    adata,
    label_key="cell_type",
    output_dir="artifacts/scgpt_annotation",
)
runner.annotate_adata(adata)
runner.save("artifacts/scgpt_annotation/best_model")

Use this when you want the current low-code research-facing adaptation path.

Related docs:

Supporting workflows#

Workflow snapshots#

Latent UMAP from the Scanpy PBMC quickstart

Quickstart embedding colored by the PBMC reference labels.#

Leiden UMAP from the downstream Scanpy tutorial

Leiden clustering on the same embedding after handing control back to Scanpy.#

Current scope#

  • Scanpy still owns raw-data preprocessing, QC, and most exploratory analysis.

  • scDLKit currently owns model training, evaluation, comparison, and output handoff.

  • the current public implementation is still gene-expression-first

  • the paper target is broader than the current implementation and must remain labeled as such