Experimental Annotation Quickstart API#
What it is#
Status: experimental.
This page documents the easiest public path for labeled annotation adaptation:
- `adapt_annotation(...)` for the one-call workflow
- `inspect_annotation_data(...)` for preflight checks
- `AnnotationRunner` for the explicit inspect-fit-predict-annotate-save-load flow
The current implementation routes only to the experimental scGPT whole-human
annotation path for human scRNA-seq data.
When to use it#
Use this page instead of TaskRunner when:
- you already have labels in `adata.obs`
- your goal is annotation adaptation, not just a baseline embedding
- you want predictions and embeddings written back into `AnnData`
- you want to compare frozen and tuned strategies with minimal code
Use Experimental foundation helpers when you want the lower-level scGPT-specific route underneath this alias layer.
Minimal example#
```python
from scdlkit import adapt_annotation

runner = adapt_annotation(
    adata,
    label_key="cell_type",
    output_dir="artifacts/scgpt_annotation",
)
runner.annotate_adata(adata, obs_key="scgpt_label", embedding_key="X_scgpt_best")
runner.save("artifacts/scgpt_annotation/best_model")
```
Parameters#
- `label_key`: required `adata.obs` column containing the target annotation labels.
- `checkpoint`: currently fixed to the experimental scGPT `whole-human` checkpoint.
- `strategies`: default quickstart is `("frozen_probe", "head")`. Additional opt-in strategy names are `"full_finetune"`, `"lora"`, `"adapter"`, `"prefix_tuning"`, and `"ia3"`.
- `strategy_configs`: optional per-strategy config mapping for the heavier PEFT comparison surface exposed under `scdlkit.foundation`.
- `lora_config`: backward-compatible alias for `strategy_configs={"lora": LoRAConfig(...)}` in the 0.1.x line.
- `batch_size`, `val_size`, `test_size`, `random_state`, `device`: wrapper training and split defaults.
- `output_dir`: optional artifact directory for reports, plots, and saved runner state.
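The `lora_config` alias rule can be pictured in plain Python. This is a minimal sketch, not the library's implementation; `resolve_strategy_configs` is a hypothetical helper name introduced only for illustration:

```python
def resolve_strategy_configs(strategy_configs=None, lora_config=None):
    """Sketch of the documented 0.1.x alias: lora_config is shorthand for
    strategy_configs={"lora": ...}. Supplying both is ambiguous and raises,
    matching the documented ValueError."""
    if strategy_configs is not None and lora_config is not None:
        raise ValueError("pass either strategy_configs or lora_config, not both")
    if lora_config is not None:
        # Expand the backward-compatible alias into the general mapping form.
        return {"lora": lora_config}
    return dict(strategy_configs or {})
```

The general `strategy_configs` mapping form is preferred going forward; the alias exists so 0.1.x call sites keep working.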
Input expectations#
- input must be human scRNA-seq stored in `anndata.AnnData`.
- `label_key` must exist in `adata.obs` and contain at least two label categories for training.
- the expression matrix must be non-negative for the scGPT tokenization path.
- gene overlap with the `whole-human` vocabulary must be sufficient; `inspect_annotation_data(...)` exposes that check before fitting.
- optional batch or study metadata can stay in `adata.obs` and will be carried through for downstream reporting when present.
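The expectations above can be sketched as a standalone preflight routine. This is an illustrative stand-in only, assuming plain lists rather than `AnnData`; the real check is `inspect_annotation_data(...)`, which returns a `ScGPTAnnotationDataReport`, and the 0.5 overlap threshold here is an assumption for the sketch:

```python
def preflight_check(labels, matrix, var_names, vocab, min_overlap=0.5):
    """Illustrative stand-in for the documented preflight checks."""
    if not labels:
        raise ValueError("label_key column is missing or empty")
    n_classes = len(set(labels))
    if n_classes < 2:
        raise ValueError("need at least two label categories for training")
    if any(v < 0 for row in matrix for v in row):
        raise ValueError("expression matrix must be non-negative")
    # Fraction of the dataset's genes covered by the model vocabulary.
    overlap = len(set(var_names) & set(vocab)) / len(var_names)
    if overlap < min_overlap:
        raise ValueError(f"gene overlap {overlap:.0%} is below {min_overlap:.0%}")
    return {"n_cells": len(labels), "n_classes": n_classes, "gene_overlap": overlap}
```

Running a check like this before fitting surfaces data problems early instead of partway through training.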
Returns / outputs#
- `inspect_annotation_data(...)` returns a `ScGPTAnnotationDataReport`.
- `adapt_annotation(...)` returns a fitted `AnnotationRunner`.
- `AnnotationRunner.predict(...)` returns `label_codes`, `labels`, `probabilities`, and `latent`.
- `AnnotationRunner.annotate_adata(...)` writes labels to `adata.obs` and embeddings to `adata.obsm`.
- `AnnotationRunner.save(...)` writes a directory with `manifest.json` and `model_state.pt`.
- strategy comparison artifacts include per-strategy metrics with `macro_f1`, `accuracy`, `balanced_accuracy`, and multiclass `auroc_ovr` when probability outputs make it valid.
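For orientation when reading the reports: `macro_f1` is the unweighted mean of per-class F1 scores, so rare classes weigh as much as common ones. A from-scratch sketch of that definition (the library computes these metrics for you):

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list."""
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1s.append(2 * precision * recall / (precision + recall) if (precision + recall) else 0.0)
    return sum(f1s) / len(f1s)
```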
Failure modes / raises#
- `ImportError` if the package was installed without the `foundation` extra.
- `ValueError` if labels are missing, class counts are too small, or gene overlap is insufficient.
- `ValueError` if an unsupported strategy name is requested or a strategy config does not match the selected strategy.
- `ValueError` if both `strategy_configs` and `lora_config` are supplied.
- `RuntimeError` if you try to predict, annotate, or save before fitting or loading a runner.
- `ValueError` if the saved runner manifest is incomplete or incompatible.
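The unsupported-strategy case amounts to validating names against the documented set. A sketch of that check, assuming the strategy names listed under Parameters; `ALLOWED_STRATEGIES` and `validate_strategies` are illustrative names, not public API:

```python
# The documented quickstart and opt-in strategy names.
ALLOWED_STRATEGIES = {"frozen_probe", "head", "full_finetune", "lora",
                      "adapter", "prefix_tuning", "ia3"}

def validate_strategies(strategies):
    """Raise the documented ValueError for unrecognized strategy names."""
    unknown = set(strategies) - ALLOWED_STRATEGIES
    if unknown:
        raise ValueError(f"unsupported strategy name(s): {sorted(unknown)}")
    return tuple(strategies)
```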
Notes / caveats#
- This surface is experimental even though the aliases live at `scdlkit`.
- The beginner default is intentionally CPU-friendly: frozen probe plus head-only tuning.
- The heavier annotation benchmark surface extends to:
  - full fine-tuning
  - LoRA
  - adapters
  - prefix tuning
  - IA3
- The current public model implementation is still scGPT only.
- `TaskRunner` is not extended for this path in the current release line.