Evaluation and outputs#
What it is#
Status: stable.
This page documents the stable evaluation and export helpers that make scDLKit results comparable and reportable:
compare_models(...)
evaluate_predictions(...)
save_markdown_report(...)
save_metrics_table(...)
When to use it#
Use these helpers when:
you want consistent metrics across tasks
you need a quick benchmark table for several bundled models
you want Markdown and CSV outputs that match the tutorial artifacts
Minimal example#
from scdlkit import compare_models
from scdlkit.evaluation import evaluate_predictions, save_markdown_report

# `adata` is an AnnData object loaded beforehand.
benchmark = compare_models(
    adata,
    models=["autoencoder", "vae", "transformer_ae"],
    task="representation",
    shared_kwargs={"label_key": "louvain", "epochs": 10},
    output_dir="artifacts/pbmc_compare",
)

# `labels` (encoded) and `logits` come from an earlier classification run.
metrics = evaluate_predictions("classification", {"y": labels, "logits": logits})
save_markdown_report(metrics, path="artifacts/report.md", title="Classification report")
Parameters#
compare_models(...) expects a shared AnnData, a list of bundled model names, a task, and optional shared runner kwargs.
evaluate_predictions(...) expects a task name plus a prediction dictionary with task-specific keys.
save_markdown_report(...) and save_metrics_table(...) expect metric dictionaries and output paths.
Input expectations#
classification evaluation requires encoded labels under y and logits under logits.
reconstruction evaluation requires x and reconstruction.
representation evaluation requires latent and benefits from y or batch when present.
report helpers serialize scalar metrics directly and include structured values as-is in Markdown.
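The key requirements above can be expressed as a small pre-flight check. The sketch below is illustrative only: REQUIRED_KEYS and check_payload are hypothetical names, not part of scDLKit, and the validation inside evaluate_predictions(...) may differ in detail.

```python
# Hypothetical pre-flight check mirroring the documented key requirements.
# REQUIRED_KEYS and check_payload are illustrative names, not scDLKit APIs.
REQUIRED_KEYS = {
    "classification": ("y", "logits"),
    "reconstruction": ("x", "reconstruction"),
    "representation": ("latent",),
}


def check_payload(task, payload):
    """Raise ValueError if the payload lacks a key the task requires."""
    if task not in REQUIRED_KEYS:
        raise ValueError(f"unknown task: {task!r}")
    missing = [key for key in REQUIRED_KEYS[task] if key not in payload]
    if missing:
        raise ValueError(f"{task} payload is missing keys: {missing}")


# Toy payloads; real ones would hold arrays of labels and predictions.
check_payload("classification", {"y": [0, 1], "logits": [[2.0, 0.1], [0.3, 1.5]]})
# Optional keys such as "batch" are allowed alongside the required "latent".
check_payload("representation", {"latent": [[0.1, 0.2]], "batch": ["a"]})

# A reconstruction payload without "reconstruction" is rejected.
try:
    check_payload("reconstruction", {"x": [[1.0, 0.0]]})
except ValueError as err:
    message = str(err)
```

Running a check like this before evaluation surfaces a missing key early, with a message that names it.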
Returns / outputs#
compare_models(...) returns a BenchmarkResult with a metrics frame, fitted runners, and optional artifact paths.
evaluate_predictions(...) returns a task-specific metric dictionary.
save_markdown_report(...) writes a Markdown report.
save_metrics_table(...) writes a CSV with scalar metrics.
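To illustrate why the CSV output is limited to scalar metrics, here is a minimal sketch of splitting a metric dictionary into scalar and structured entries before writing a table. split_metrics and write_scalar_csv are hypothetical stand-ins built on the standard library, not the scDLKit implementation, and the column layout is an assumption.

```python
import csv
import io
from numbers import Number


def split_metrics(metrics):
    """Separate scalar metrics from structured ones (lists, dicts, ...)."""
    scalars, structured = {}, {}
    for name, value in metrics.items():
        # Numbers and strings fit in one CSV cell; everything else does not.
        target = scalars if isinstance(value, (Number, str)) else structured
        target[name] = value
    return scalars, structured


def write_scalar_csv(scalars, handle):
    """Write one 'metric,value' row per scalar metric."""
    writer = csv.writer(handle)
    writer.writerow(["metric", "value"])
    for name, value in scalars.items():
        writer.writerow([name, value])


# Made-up metric values, for illustration only.
metrics = {"accuracy": 0.91, "macro_f1": 0.88, "confusion_matrix": [[5, 1], [0, 6]]}
scalars, structured = split_metrics(metrics)

buffer = io.StringIO()
write_scalar_csv(scalars, buffer)
csv_text = buffer.getvalue()
```

In this sketch the confusion matrix ends up in structured and is left out of the table, which is the kind of value a Markdown report can carry as-is.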
Failure modes / raises#
ValueError if the prediction payload does not satisfy the selected task contract.
downstream file-writing errors propagate from the filesystem if the destination path is invalid.
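Because filesystem errors propagate unchanged, it can help to prepare the destination before calling an export helper. The sketch below uses a hypothetical stand-in writer (write_report) rather than scDLKit itself, and assumes the export helper does not create missing parent directories, which this page does not state.

```python
import tempfile
from pathlib import Path


def write_report(text, path):
    """Stand-in for an export helper: plain write, no directory creation."""
    Path(path).write_text(text, encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "artifacts" / "report.md"

    # Writing into a directory that does not exist raises FileNotFoundError,
    # which is a subclass of OSError.
    try:
        write_report("# Classification report\n", target)
        failed = False
    except OSError:
        failed = True

    # Creating the parent directory first avoids the error.
    target.parent.mkdir(parents=True, exist_ok=True)
    write_report("# Classification report\n", target)
    written = target.read_text(encoding="utf-8")
```

Catching OSError covers the common cases (missing directory, permission denied) without hiding a ValueError raised by payload validation.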
Notes / caveats#
compare_models(...) is the stable bundled-model comparison path, not the scGPT adaptation benchmark.
prediction payloads usually come from Trainer.predict_dataset(...) or a compatible wrapper.
the tutorial artifacts in artifacts/ are built on these helpers.