TaskRunner
What it is
Status: stable.
TaskRunner is the main beginner workflow in scDLKit. It keeps the common
baseline path compact:
start from a processed AnnData
choose a bundled scDLKit model
train and evaluate it
recover embeddings or reconstructed expression values
continue in Scanpy
When to use it
Use TaskRunner when:
you want the shortest stable path from AnnData to a model result
you are using bundled scDLKit baselines such as vae or transformer_ae
you want embeddings back for adata.obsm
you want reconstructed or predicted expression values from reconstruction-capable models
Use Trainer instead when you need lower-level control or custom wrapped models.
Minimal example
import scanpy as sc

from scdlkit import TaskRunner

# Load Scanpy's processed PBMC 3k example dataset.
adata = sc.datasets.pbmc3k_processed()

runner = TaskRunner(
    model="vae",
    task="representation",
    label_key="louvain",
    device="auto",
    epochs=20,
    batch_size=128,
    model_kwargs={"kl_weight": 1e-3},
)
runner.fit(adata)

# Store the latent embedding where Scanpy can use it.
adata.obsm["X_scdlkit_vae"] = runner.encode(adata)
# Recover reconstructed expression values from the fitted model.
reconstructed = runner.reconstruct(adata)
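The last workflow step, continuing in Scanpy, might look like the sketch below. It assumes the embedding was stored under adata.obsm["X_scdlkit_vae"] as in the example above. The helper name continue_in_scanpy is hypothetical and not part of scDLKit; it relies only on Scanpy's standard neighbor-graph and UMAP calls.

```python
def continue_in_scanpy(adata, rep_key="X_scdlkit_vae"):
    """Build a neighbor graph and UMAP from a stored embedding.

    Hypothetical helper (not part of scDLKit). Assumes adata.obsm[rep_key]
    already holds the matrix returned by runner.encode(adata).
    """
    # Imported lazily so the helper can be defined without Scanpy installed.
    import scanpy as sc

    if rep_key not in adata.obsm:
        raise KeyError(f"{rep_key!r} not found in adata.obsm; run encode() first")

    # Use the learned embedding instead of PCA for the neighbor graph.
    sc.pp.neighbors(adata, use_rep=rep_key)
    # UMAP coordinates are written to adata.obsm["X_umap"].
    sc.tl.umap(adata)
    return adata
```

After this, the usual Scanpy plotting and clustering tools operate on the graph built from the scDLKit embedding.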
Parameters
model: built-in model name such as vae, autoencoder, transformer_ae, or mlp_classifier, or an instantiated scDLKit model.
task: one of representation, reconstruction, or classification.
label_key: optional adata.obs column used for supervised metrics or classification.
batch_key: optional adata.obs column for batch-aware splits and metrics.
layer, use_hvg, normalize, log1p, scale: preprocessing controls applied before training.
epochs, batch_size, lr, device, mixed_precision: training and inference defaults.
output_dir: optional report/checkpoint directory.
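One way to keep these parameters organized is to assemble them in a plain dictionary and validate the task name before constructing the runner. The sketch below is a hypothetical helper, not part of scDLKit. It uses only the parameter names and task values listed above, and the default values are placeholders taken from the minimal example.

```python
VALID_TASKS = {"representation", "reconstruction", "classification"}


def build_runner_kwargs(model, task, label_key=None, **overrides):
    """Assemble keyword arguments for TaskRunner (hypothetical helper)."""
    if task not in VALID_TASKS:
        raise ValueError(f"task must be one of {sorted(VALID_TASKS)}, got {task!r}")
    if task == "classification" and label_key is None:
        raise ValueError("classification needs a label_key from adata.obs")

    kwargs = {
        # What to train and for which task.
        "model": model,
        "task": task,
        "label_key": label_key,
        # Training and inference defaults (placeholder values).
        "epochs": 20,
        "batch_size": 128,
        "device": "auto",
    }
    # Caller-supplied values win over the placeholders.
    kwargs.update(overrides)
    return kwargs


config = build_runner_kwargs("vae", "representation", label_key="louvain", epochs=5)
# With scDLKit installed: runner = TaskRunner(**config)
```

Checking the task name up front gives a clear error message before any data is loaded or any model is built.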
Input expectations
adata must be an anndata.AnnData object with cells in obs and genes in var.
adata.X or the selected layer must be numeric and feature-consistent between training and inference.
label_key must exist in adata.obs for classification tasks or supervised evaluation.
reconstruct(...) and encode(...) require a fitted runner and a task that exposes those outputs.
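These expectations can be checked before training with a small pre-flight function. The sketch below is a hypothetical helper, not part of scDLKit. It assumes an AnnData-like object exposing X, layers, obs, and var_names, and expected_genes is an illustrative argument holding the gene names seen at training time.

```python
def check_inputs(adata, label_key=None, layer=None, expected_genes=None):
    """Return a list of problems found before calling fit() or encode().

    Hypothetical pre-flight helper, not part of scDLKit.
    """
    # Pick adata.X or the requested layer.
    if layer is None:
        matrix = adata.X
    elif layer in adata.layers:
        matrix = adata.layers[layer]
    else:
        return [f"layer {layer!r} not found in adata.layers"]

    problems = []

    # NumPy arrays and SciPy sparse matrices both expose dtype.kind;
    # i/u/f cover signed integer, unsigned integer and floating-point data.
    if matrix.dtype.kind not in "iuf":
        problems.append(f"matrix dtype {matrix.dtype} is not numeric")

    # For a pandas DataFrame, `in` tests column names.
    if label_key is not None and label_key not in adata.obs:
        problems.append(f"label_key {label_key!r} not found in adata.obs")

    # Feature consistency: same genes, in the same order, as at training time.
    if expected_genes is not None and list(adata.var_names) != list(expected_genes):
        problems.append("genes differ from the training features")

    return problems
```

An empty list means none of the checked expectations were violated; it does not guarantee that the runner accepts the data.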
Returns / outputs
fit(...) returns the fitted TaskRunner.
encode(...) returns a numpy.ndarray latent matrix suitable for adata.obsm.
reconstruct(...) returns a numpy.ndarray of reconstructed or predicted expression values.
evaluate(...) returns a metric dictionary.
save_report(...) writes a Markdown report and sibling CSV table.
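Because fit(...) returns the fitted runner, training and encoding can be chained. The sketch below is a hypothetical convenience function, not part of scDLKit. It assumes fit and encode each accept the AnnData object as their only required argument, as in the minimal example, and the default key name is illustrative.

```python
def fit_and_embed(runner, adata, key="X_scdlkit"):
    """Fit the runner, then store its latent matrix in adata.obsm[key].

    Hypothetical helper, not part of scDLKit.
    """
    # fit(...) returns the fitted runner, so encode(...) can be chained.
    latent = runner.fit(adata).encode(adata)

    # adata.obsm entries need exactly one row per cell.
    if latent.shape[0] != adata.n_obs:
        raise ValueError(
            f"latent has {latent.shape[0]} rows but adata has {adata.n_obs} cells"
        )

    adata.obsm[key] = latent
    return latent
```

The row-count check catches the case where the embedding was computed on a different or subsetted object before it reaches adata.obsm.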
Failure modes / raises
ValueError if the selected model does not support the requested task.
ValueError if label_key or batch_key is missing from adata.obs.
RuntimeError if inference or plotting is called before fit(...).
ValueError if you request latent or reconstruction outputs from a classification-only workflow.
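Calling code can treat these exceptions differently depending on whether the problem is recoverable. The sketch below is a hypothetical wrapper, not part of scDLKit. It relies only on the exception types listed above and assumes that a ValueError raised by reconstruct(...) means the workflow does not expose reconstruction output.

```python
def try_reconstruct(runner, adata):
    """Return reconstructed values, or None if the runner cannot produce them.

    Hypothetical wrapper, not part of scDLKit.
    """
    try:
        return runner.reconstruct(adata)
    except RuntimeError as err:
        # Raised when inference is attempted before fit(...); re-raise
        # with a hint, since this is a usage error the caller must fix.
        raise RuntimeError("call runner.fit(adata) before reconstruct()") from err
    except ValueError:
        # Assumed here to mean a classification-only workflow, which
        # exposes class predictions rather than reconstructed expression.
        return None
```

Returning None for the unsupported case lets a pipeline skip reconstruction for classifiers while still failing loudly on an unfitted runner.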
Notes / caveats
TaskRunner is the stable bundled-model path; it is not the annotation fine-tuning surface.
Classification models expose class predictions rather than reconstructed expression.
For explicit lower-level control, use Trainer and Data preparation.