Roadmap#

Paper vision#

scDLKit is moving toward a publication-first identity rather than a feature inventory. The paper target is:

scDLKit is a minimal-code, AnnData-native framework for parameter-efficient adaptation and reproducible benchmarking of single-cell and spatial foundation models.

That target has four model pillars:

scGPT
scFoundation
CellFM
Nicheformer

It also has four research-task pillars:

cell type annotation
integration / representation transfer
perturbation-response prediction
spatial domain / niche classification

And it has a common adaptation comparison set:

frozen embeddings plus linear probe
full fine-tuning
LoRA
adapters
prefix tuning
IA3-style scaling

The paper-level benchmark target should eventually evaluate those methods across:

full-label regimes
low-label regimes
cross-study regimes

Current implementation truth#

The current repo is intentionally narrower than the paper target.

Implemented#

stable baseline workflows through TaskRunner
lower-level training and extension through Trainer plus adapters
reproducible evaluation, reports, and docs-contract validation
experimental scGPT frozen embedding support
experimental scGPT annotation adaptation with a wrapper-first path
generic annotation PEFT configs:
- LoRAConfig
- AdapterConfig
- PrefixTuningConfig
- IA3Config
scGPT annotation strategies:
- frozen probe
- head-only tuning
- full fine-tuning
- LoRA
- adapters
- prefix tuning
- IA3
a dedicated annotation benchmark runner covering full-label, low-label, and cross-study regimes
beyond-PBMC annotation evidence on cached human-pancreas subsets

Pilot#

experimental foundation-model support is currently scGPT only
the strongest current research-task story is annotation
the current beyond-PBMC evidence story is annotation-focused rather than fully task-balanced
the annotation pillar still needs its first frozen benchmark artifact bundle before it should be promoted publicly from Pilot to Implemented

Planned#

scFoundation
CellFM
Nicheformer
integration benchmark pipeline
perturbation benchmark pipeline
spatial benchmark pipeline

The roadmap should never imply that paper-target scope is already available in the current release line.

Current objective#

The current active milestone is still the annotation pillar, but the work has shifted from interface design to evidence freeze.

Why annotation is next:

it is the strongest implemented research-facing capability already in the repo
it already has the generic PEFT layer, benchmark script, and main published tutorial in place
it sets the benchmark, artifact, and PEFT comparison conventions that later task pillars should reuse

Done for the annotation pillar means:

annotation has a task spec and a benchmark matrix with frozen, head, full-finetune, and PEFT comparisons on the current scGPT path
the benchmark workflow produces reviewable artifact bundles for full-label, low-label, cross-study, and Pareto reporting
the main annotation tutorial is the static executed human-pancreas notebook with visible last-run metadata
figure-ready outputs are defined, generated, and tracked
the milestone checklist can be closed without guessing

Milestone 1: Annotation pillar#

Status: Active

Primary objective:

make annotation the first paper-ready task pillar using the current scGPT adaptation path as the starting point

Required outcomes:

annotation task spec
dataset shortlist and registry requirements
frozen / full-FT / LoRA / adapters / prefix / IA3 comparison matrix
low-label and cross-study regime implementation
one main research-facing annotation tutorial
figure-ready artifact inventory
benchmark workflow plus artifact freeze

Milestone 2: Spatial pillar#

Status: Planned

Primary objective:

make spatial a real pillar of the paper rather than a future note

Required outcomes:

Nicheformer integration plan
spatial domain or niche classification task spec
spatial metric pipeline
first spatial tutorial
first spatial qualitative figure plan

Milestone 3: Integration pillar#

Status: Planned

Primary objective:

define representation-transfer benchmarking with task-specific metrics and datasets rather than treating integration as generic embedding inspection

Required outcomes:

integration task spec
metric pipeline for kBET, iLISI / cLISI, ASW, and clustering metrics where appropriate
dataset registry entries
one main integration tutorial

Milestone 4: Perturbation pillar#

Status: Planned

Primary objective:

define perturbation-response prediction as a first-class benchmark task

Required outcomes:

perturbation task spec
dataset shortlist
metric pipeline for correlation, error, and DE recovery
one main perturbation tutorial

Cross-model expansion#

After the task pillars are defined, model breadth should expand toward the paper target in a controlled way:

bring scFoundation, CellFM, and Nicheformer into the common wrapper and benchmark story
keep model parity explicit instead of implying equal maturity
track wrapper, inference, PEFT, tests, tutorials, and benchmarks separately

Maintenance rules#

public docs must distinguish Implemented, Pilot, and Planned
no model, PEFT method, or task should be described as supported until code, tutorial, tests, and benchmark artifacts exist
the high-level roadmap stays public and concise
execution detail belongs in repo-tracked checklist files under planning/