Data guide#
scDLKit works directly with AnnData.
What scDLKit expects#
- an AnnData object with a usable expression matrix in adata.X or a named layer
- optional label information in adata.obs
- optional batch information in adata.obs
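The expectations above can be checked before any data is passed to scDLKit. The following is a minimal sketch, not part of scDLKit: check_inputs is a hypothetical helper, and it assumes that layer="X" refers to adata.X, as the prepare_data example in this guide suggests. It relies only on the attribute names adata.X, adata.layers, and adata.obs, so a small stand-in object is used here to keep the snippet runnable without anndata installed.

```python
from types import SimpleNamespace


def check_inputs(adata, layer="X", label_key=None, batch_key=None):
    """Hypothetical helper: return a list of problems (empty means usable)."""
    problems = []
    # Expression matrix: either adata.X itself or a named entry in adata.layers.
    if layer == "X":
        if getattr(adata, "X", None) is None:
            problems.append("adata.X is empty")
    elif layer not in getattr(adata, "layers", {}):
        problems.append(f"layer {layer!r} not found in adata.layers")
    # Label and batch columns are optional, but must exist in adata.obs if named.
    for name, key in (("label_key", label_key), ("batch_key", batch_key)):
        if key is not None and key not in adata.obs:
            problems.append(f"{name}={key!r} not found in adata.obs")
    return problems


# Stand-in with the same attribute names as AnnData (assumption for illustration).
toy = SimpleNamespace(
    X=[[1, 0], [0, 3]],
    layers={"counts": [[1, 0], [0, 3]]},
    obs={"louvain": ["B cells", "T cells"]},
)

ok = check_inputs(toy, layer="counts", label_key="louvain")
missing = check_inputs(toy, label_key="louvain", batch_key="batch")
print(ok)       # no problems found
print(missing)  # reports the absent batch column
```

The same function should work on a real AnnData object, because membership tests on adata.layers and on the adata.obs data frame look up layer names and column names respectively.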
prepare_data#
Use prepare_data when you want lower-level control over preprocessing and split construction:
from scdlkit import prepare_data

prepared = prepare_data(
    adata,
    layer="X",
    label_key="louvain",
    batch_key="batch",
    use_hvg=True,
    n_top_genes=2000,
    normalize=True,
    log1p=True,
    batch_aware_split=True,
)
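This guide does not describe how batch_aware_split builds its splits. One common interpretation, sketched below purely as an assumption, is to split within each batch so that every batch contributes cells to the training, validation, and test sets. The function name batch_stratified_split, the fractions, and the seed are all hypothetical and are not taken from scDLKit.

```python
import random
from collections import defaultdict


def batch_stratified_split(batches, val_frac=0.1, test_frac=0.1, seed=0):
    """Hypothetical sketch: split cell indices separately inside each batch."""
    rng = random.Random(seed)

    # Group cell indices by their batch label.
    by_batch = defaultdict(list)
    for idx, batch in enumerate(batches):
        by_batch[batch].append(idx)

    split = {"train": [], "val": [], "test": []}
    for batch in sorted(by_batch):
        members = by_batch[batch]
        rng.shuffle(members)
        n_val = round(len(members) * val_frac)
        n_test = round(len(members) * test_frac)
        # Each batch hands a proportional share of its cells to every split.
        split["val"] += members[:n_val]
        split["test"] += members[n_val:n_val + n_test]
        split["train"] += members[n_val + n_test:]
    return split


# Toy batch column: 10 cells from batch "a", 20 cells from batch "b".
batches = ["a"] * 10 + ["b"] * 20
split = batch_stratified_split(batches)
print({name: len(indices) for name, indices in split.items()})
```

Splitting inside each batch keeps the batch composition of the evaluation sets close to that of the training set. Holding out entire batches is a different design that tests generalization to unseen batches instead.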
Scanpy-backed preprocessing#
When you request normalize, log1p, or use_hvg, scDLKit uses Scanpy-backed preprocessing. Scanpy is an optional dependency; install it through the scanpy extra:
python -m pip install "scdlkit[scanpy]"
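To make the normalize and log1p options concrete, the sketch below reproduces the arithmetic of library-size normalization followed by a log(1 + x) transform on a tiny count matrix. It is an illustration in plain Python, not Scanpy's or scDLKit's code. In Scanpy the corresponding functions are scanpy.pp.normalize_total and scanpy.pp.log1p, but whether scDLKit calls exactly those functions, and with which target sum, is an assumption here. The function name normalize_and_log1p and the target_sum values are chosen only for the example.

```python
import math


def normalize_and_log1p(counts, target_sum=1e4):
    """Scale each cell (row) to target_sum total counts, then apply log(1 + x)."""
    out = []
    for row in counts:
        total = sum(row)
        # Cells with no counts are left at zero rather than divided by zero.
        scale = target_sum / total if total > 0 else 0.0
        out.append([math.log1p(value * scale) for value in row])
    return out


# Two cells, three genes; the second cell was sequenced five times deeper.
counts = [[1, 3, 0], [10, 0, 10]]
transformed = normalize_and_log1p(counts, target_sum=4)

for row in transformed:
    print([round(value, 3) for value in row])
```

After scaling, both cells sum to the same total before the log transform, so differences in sequencing depth no longer dominate the values.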
Recommended practice#
For the public tutorials, keep preprocessing simple and standard:
- use scanpy.datasets.pbmc3k_processed() for the example notebooks
- use louvain as the label field for representation and classification demos
- focus on model behavior rather than broad biological analysis