Evaluation and outputs#

What it is#

Status: stable.

This page documents the stable evaluation and export helpers that make scDLKit results comparable and reportable:

compare_models(...)
evaluate_predictions(...)
save_markdown_report(...)
save_metrics_table(...)

When to use it#

Use these helpers when:

you want consistent metrics across tasks
you need a quick benchmark table for several bundled models
you want Markdown and CSV outputs that match the tutorial artifacts

Minimal example#

from scdlkit import compare_models
from scdlkit.evaluation import evaluate_predictions, save_markdown_report

benchmark = compare_models(
    adata,
    models=["autoencoder", "vae", "transformer_ae"],
    task="representation",
    shared_kwargs={"label_key": "louvain", "epochs": 10},
    output_dir="artifacts/pbmc_compare",
)

metrics = evaluate_predictions("classification", {"y": labels, "logits": logits})
save_markdown_report(metrics, path="artifacts/report.md", title="Classification report")

Parameters#

compare_models(...) expects a shared AnnData, a list of bundled model names, a task, and optional shared runner kwargs.
evaluate_predictions(...) expects a task name plus a prediction dictionary with task-specific keys.
save_markdown_report(...) and save_metrics_table(...) expect metric dictionaries and output paths.

Input expectations#

classification evaluation requires encoded labels under y and logits under logits.
reconstruction evaluation requires x and reconstruction.
representation evaluation requires latent and benefits from y or batch when present.
report helpers serialize scalar metrics directly and include structured values as-is in Markdown.

Returns / outputs#

compare_models(...) returns a BenchmarkResult with a metrics frame, fitted runners, and optional artifact paths.
evaluate_predictions(...) returns a task-specific metric dictionary.
save_markdown_report(...) writes a Markdown report.
save_metrics_table(...) writes a CSV with scalar metrics.

Failure modes / raises#

ValueError if the prediction payload does not satisfy the selected task contract.
downstream file-writing errors propagate from the filesystem if the destination path is invalid.

Notes / caveats#

compare_models(...) is the stable bundled-model comparison path, not the scGPT adaptation benchmark.
prediction payloads usually come from Trainer.predict_dataset(...) or a compatible wrapper.
the tutorial artifacts in artifacts/ are built on these helpers.

Evaluation and outputs#

What it is#

When to use it#

Minimal example#

Parameters#

Input expectations#

Returns / outputs#

Failure modes / raises#

Notes / caveats#

Related tutorial(s)#