Experimental Foundation Helpers#
What it is#
Status: experimental.
This page documents the explicit lower-level scGPT path underneath
scdlkit.adapt_annotation(...). It is the place to go when you want direct
control over frozen scGPT embeddings, tokenized datasets, split-aware
annotation training, or the underlying wrapper objects.
When to use it#
Use this page when you want to:
- extract frozen scGPT embeddings directly
- prepare tokenized scGPT data for your own workflow
- split tokenized data for annotation fine-tuning
- load a `Trainer`-compatible scGPT annotation model explicitly
- drop below the top-level beginner alias and inspect the scGPT-specific objects
Minimal example#
```python
from scdlkit.foundation import (
    AdapterConfig,
    load_scgpt_annotation_model,
    prepare_scgpt_data,
    split_scgpt_data,
)
from scdlkit import Trainer

prepared = prepare_scgpt_data(adata, label_key="cell_type")
split = split_scgpt_data(prepared)
model = load_scgpt_annotation_model(
    num_classes=len(prepared.label_categories or ()),
    label_categories=prepared.label_categories,
    tuning_strategy="adapter",
    strategy_config=AdapterConfig(bottleneck_dim=64, dropout=0.05),
)
trainer = Trainer(model=model, task="classification", batch_size=prepared.batch_size)
trainer.fit(split.train, split.val)
```
Parameters#
- `load_scgpt_model(...)` loads the official `whole-human` checkpoint for frozen embeddings.
- `prepare_scgpt_data(...)` tokenizes compatible human `AnnData` and optionally encodes labels.
- `split_scgpt_data(...)` creates train, validation, and test subsets without re-tokenizing.
- `load_scgpt_annotation_model(...)` builds a `head`, `full_finetune`, `lora`, `adapter`, `prefix_tuning`, or `ia3` scGPT classifier for `Trainer`.
- Generic PEFT configs are exposed under `scdlkit.foundation` as `PEFTConfig`, `LoRAConfig`, `AdapterConfig`, `PrefixTuningConfig`, and `IA3Config`.
- `ScGPTLoRAConfig` remains available as a compatibility alias in the `0.1.x` release line.
- `ScGPTAnnotationRunner` and `adapt_scgpt_annotation(...)` expose the explicit wrapper-first foundation path.
Input expectations#
- input must be human scRNA-seq in `AnnData`.
- the checkpoint scope is currently limited to scGPT `whole-human`.
- expression values must be non-negative.
- annotation tuning requires a valid `label_key` with at least two label categories.
- sufficient gene overlap with the checkpoint vocabulary is required; otherwise preparation raises a clear error.
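These expectations can be verified up front with plain Python before calling `prepare_scgpt_data(...)`. The sketch below is illustrative only: the helper name, the nested-list representation of the expression matrix, and the overlap threshold are assumptions, not part of scdlkit.

```python
def preflight_check(matrix, labels, var_names, checkpoint_vocab, min_overlap=0.3):
    """Illustrative pre-flight checks mirroring the input expectations above."""
    # expression values must be non-negative
    if any(value < 0 for row in matrix for value in row):
        raise ValueError("expression values must be non-negative")
    # annotation tuning needs at least two label categories
    if len(set(labels)) < 2:
        raise ValueError("label_key must contain at least two label categories")
    # sufficient gene overlap with the checkpoint vocabulary is required
    # (the 30% default here is a made-up threshold for illustration)
    overlap = len(set(var_names) & set(checkpoint_vocab)) / max(len(var_names), 1)
    if overlap < min_overlap:
        raise ValueError(f"gene overlap {overlap:.0%} is below {min_overlap:.0%}")
    return overlap
```

Running checks like these yourself can give earlier, more local error messages than waiting for preparation to fail.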
Returns / outputs#
- `ScGPTPreparedData` stores tokenized tensors plus checkpoint and label metadata.
- `ScGPTSplitData` stores split-aware token datasets for training and evaluation.
- `load_scgpt_model(...)` returns an embedding model for frozen inference.
- `load_scgpt_annotation_model(...)` returns a classification model ready for `Trainer(..., task="classification")`.
- `ScGPTAnnotationRunner` and `adapt_scgpt_annotation(...)` can emit reports, plots, predictions, and saved runner state.
- saved runner manifests now include strategy metadata and serialized strategy-config values so trainable strategies can be reloaded cleanly.
Failure modes / raises#
- `ImportError` if the package was installed without `scdlkit[foundation]`.
- `ValueError` if labels are missing, the tuning strategy is unsupported, or the checkpoint vocabulary overlap is too small.
- `ValueError` if expression values are negative.
- `RuntimeError` if wrapper prediction or save/load methods are called in the wrong lifecycle stage.
Notes / caveats#
The recommended beginner route is still Experimental annotation quickstart API.
This page documents the lower-level implementation and is intentionally narrower than a general foundation-model framework.
Supported scope remains:
- human scRNA-seq only
- scGPT `whole-human` only
- annotation tuning only
- the current model implementation is still scGPT only
The heavier scGPT annotation matrix now includes:
`head`, `full_finetune`, `lora`, `adapter`, `prefix_tuning`, and `ia3`.
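To build intuition for why the PEFT entries in this matrix are attractive, here is back-of-the-envelope arithmetic for the trainable parameters of a single linear layer, using the standard LoRA count r × (d_in + d_out). The hidden size is illustrative and not the real scGPT width.

```python
def full_params(d_in, d_out):
    # full fine-tuning trains the entire weight matrix (bias ignored)
    return d_in * d_out


def lora_params(d_in, d_out, r):
    # LoRA trains two low-rank factors: A (r x d_in) and B (d_out x r)
    return r * (d_in + d_out)


d = 512                          # illustrative hidden size
full = full_params(d, d)         # 262144 trainable weights
lora = lora_params(d, d, r=8)    # 8192 trainable weights, ~3% of the full matrix
```

The same trade-off motivates `adapter`, `prefix_tuning`, and `ia3`: each adds a small trainable module while the scGPT backbone stays frozen.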
Cross-model support for `scFoundation`, `CellFM`, and `Nicheformer` remains future work.