User Guide

version: 1.0.0
status: active

The User Guide is the narrative companion to the auto-generated API reference. It explains the why and when for each top-level module, not just the how. Read these pages first to decide which gunz-cm module is right for your task, then consult the API reference for parameter details.

Pipeline overview

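gunz-cm follows a 5-stage pipeline: load → preprocess → convert → visualize → reconstruct/metrics. The table below maps each stage to the modules that implement it. Its last two rows, ML training and workflow orchestration, are not numbered stages: the first is optional and the second chains the stages together.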

Stage	Modules	Typical workflow
Load	:doc:`loaders`, :doc:`cli`	`load_cm_data(...)` reads any supported Hi-C format
Preprocess	:doc:`preprocs`, :doc:`normalization`	Filter empty bins, apply KR/ICE balancing
Convert	:doc:`converters`	Write to COO, GZCM, or MEMMAP
Visualize	:doc:`visualizations`	Render contact maps and 3D structures
Reconstruct / metrics	:doc:`reconstructions`, :doc:`metrics`	Infer 3D structures, score quality
ML training (optional)	:doc:`datasets`, :doc:`resolution_enhancements`, :doc:`samplers`	Train a super-resolution model
Workflow orchestration	:doc:`pipeline`	Chain steps into a reproducible recipe
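
The stage ordering in the table can be pictured as plain function composition. The following is a minimal sketch in pure Python, not the gunz-cm API: `drop_empty_bins`, `scale_to_unit_max`, and `run_stages` are hypothetical stand-ins invented for illustration, and a toy matrix takes the place of a real Hi-C file.

```python
from typing import Callable, List, Sequence

Matrix = List[List[float]]
Stage = Callable[[Matrix], Matrix]


def drop_empty_bins(matrix: Matrix) -> Matrix:
    """Preprocess stand-in: remove bins whose row holds no contacts."""
    keep = [i for i, row in enumerate(matrix) if sum(row) > 0]
    return [[matrix[i][j] for j in keep] for i in keep]


def scale_to_unit_max(matrix: Matrix) -> Matrix:
    """Second stand-in stage: rescale counts so the largest entry is 1.0."""
    peak = max((value for row in matrix for value in row), default=0.0)
    if peak == 0:
        return [row[:] for row in matrix]
    return [[value / peak for value in row] for row in matrix]


def run_stages(data: Matrix, stages: Sequence[Stage]) -> Matrix:
    """Feed the output of each stage into the next, in order."""
    for stage in stages:
        data = stage(data)
    return data


# Toy symmetric contact matrix; bin 0 has no contacts at all.
raw = [
    [0.0, 0.0, 0.0],
    [0.0, 4.0, 2.0],
    [0.0, 2.0, 8.0],
]
result = run_stages(raw, [drop_empty_bins, scale_to_unit_max])
print(result)  # [[0.5, 0.25], [0.25, 1.0]]
```

Keeping each stage a function from matrix to matrix is what makes the order explicit and the recipe repeatable; the real orchestration layer is described in :doc:`pipeline`.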

Module index

Entry-point trio

Every new user starts here: the top-level namespace first, then the three entry-point modules (CLI, loaders, converters).

:doc:`root`: The top-level gunz_cm namespace. load_cm_data, ContactMatrix, Balancing, Format, exceptions.
:doc:`cli`: Command-line interface. gunz-cm loaders, gunz-cm converters, gunz-cm recon.
:doc:`loaders`: File format loaders. .hic, .cool, .mcool, CSV, GZCM, PICKLE/NPY.
:doc:`converters`: Format conversion. HIC→COOL, COO text, GZCM v1/v2/v3, MEMMAP.
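
COO (coordinate) text, one of the conversion targets listed above, stores one row, column, value triple per nonzero entry. The sketch below shows that idea in pure Python. The helper names `dense_to_coo` and `coo_to_text`, the tab-separated layout, and the 0-based indices are assumptions made for illustration; the exact layout gunz-cm writes may differ, so check :doc:`converters` before relying on it.

```python
from typing import List, Tuple

Triple = Tuple[int, int, float]


def dense_to_coo(matrix: List[List[float]]) -> List[Triple]:
    """List (row, col, value) for nonzero entries of the upper triangle.

    A symmetric contact matrix is fully described by its upper triangle,
    so lower-triangle duplicates are skipped.
    """
    triples = []
    for i, row in enumerate(matrix):
        for j in range(i, len(row)):
            if row[j] != 0:
                triples.append((i, j, row[j]))
    return triples


def coo_to_text(triples: List[Triple]) -> str:
    """Render triples as tab-separated lines (hypothetical text layout)."""
    return "\n".join(f"{i}\t{j}\t{value:g}" for i, j, value in triples)


dense = [
    [4.0, 2.0, 0.0],
    [2.0, 8.0, 1.0],
    [0.0, 1.0, 0.0],
]
triples = dense_to_coo(dense)
print(coo_to_text(triples))
```

Only four of the nine cells are written here, which is why sparse layouts like this suit contact maps where most bin pairs have no contacts.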

Preprocessing layer

:doc:`preprocs`: Filter, downsample, and transform contact matrices and 3D points.
:doc:`normalization`: KR, ICE, and VC matrix balancing algorithms.
:doc:`structs`: Typed data shapes (CmFrameSchema, LoaderConfig, ConflictPolicy, RegionSpec, cm_to_coo/cm_to_csr/etc.). The canonical home for DOP-styled typed data; populated progressively by the v2.18.0–v2.25.0 DOP campaign.
:doc:`utils`: Internal helpers. Use with care: these are not part of the stable public API.
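
The balancing methods listed for the normalization module share one goal: rescale a contact matrix so that every bin ends up with the same total coverage. The sketch below shows the idea behind ICE-style iterative correction on a small dense matrix in pure Python. It is illustrative only: `ice_balance` is a hypothetical helper, not the function exposed by :doc:`normalization`, and production implementations typically add sparse storage, bin filtering, and a convergence check.

```python
from typing import List, Tuple


def ice_balance(
    matrix: List[List[float]], n_iter: int = 50
) -> Tuple[List[List[float]], List[float]]:
    """Iteratively divide out per-bin biases until row sums equalize.

    Returns the balanced matrix and the accumulated bias vector, so that
    balanced[i][j] * bias[i] * bias[j] recovers the input count.
    """
    balanced = [[float(value) for value in row] for row in matrix]
    n = len(balanced)
    bias = [1.0] * n
    for _ in range(n_iter):
        row_sums = [sum(row) for row in balanced]
        covered = [s for s in row_sums if s > 0]
        if not covered:
            break  # all-zero matrix: nothing to balance
        mean_sum = sum(covered) / len(covered)
        # Bins above the mean coverage get a step > 1, bins below get < 1.
        # Empty bins are left untouched (step 1.0).
        step = [s / mean_sum if s > 0 else 1.0 for s in row_sums]
        for i in range(n):
            for j in range(n):
                balanced[i][j] /= step[i] * step[j]
        bias = [b * s for b, s in zip(bias, step)]
    return balanced, bias


counts = [
    [10.0, 4.0, 2.0],
    [4.0, 8.0, 3.0],
    [2.0, 3.0, 6.0],
]
balanced, bias = ice_balance(counts)
print([round(sum(row), 6) for row in balanced])  # three equal row sums
```

Returning the bias vector alongside the matrix keeps the raw counts recoverable, which matters when a downstream step needs unbalanced data.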

Storage layer

:doc:`compressions`: 6 codecs for GZCM v3 tile compression (BSC, CMC, CMC_ZSTD, ZSTD, BSCM_CMC variants).
:doc:`io`: Low-level GZCM readers/writers. GZCMReader, GZCMWriter, GZCMChunkedReader/Writer.
:doc:`samplers`: SpatialBatchSampler for PyTorch DataLoader on genomic data.
:doc:`datasets`: PyTorch Dataset implementations (HiCSparseDataset, GzcmDataset).

Analysis layer

:doc:`metrics`: Reconstruction metrics (Procrustes RMSE, R²) and resolution-enhancement metrics (MSE, SSIM, PSNR).
:doc:`pipeline`: Workflow orchestration. Pipeline class, create_pipeline factory.
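
Of the resolution-enhancement metrics listed above, MSE and PSNR have simple closed forms: MSE averages the squared differences, and PSNR is 10·log10(MAX² / MSE), where MAX is the largest value the data can take. The sketch below writes both out in pure Python for two equally shaped matrices. The function names and the `data_range` parameter are hypothetical and are not taken from :doc:`metrics`.

```python
import math
from typing import List

Matrix = List[List[float]]


def mse(pred: Matrix, target: Matrix) -> float:
    """Mean squared error over all cells of two equally shaped matrices."""
    squared = [
        (p - t) ** 2
        for pred_row, target_row in zip(pred, target)
        for p, t in zip(pred_row, target_row)
    ]
    if not squared:
        raise ValueError("matrices must not be empty")
    return sum(squared) / len(squared)


def psnr(pred: Matrix, target: Matrix, data_range: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite for identical inputs."""
    error = mse(pred, target)
    if error == 0:
        return math.inf
    return 10.0 * math.log10(data_range ** 2 / error)


target = [[0.0, 1.0], [1.0, 0.0]]
pred = [[0.0, 0.5], [1.0, 0.0]]
print(mse(pred, target))   # 0.0625
print(psnr(pred, target))  # about 12.04 dB
```

Lower MSE and higher PSNR both indicate a closer match; PSNR depends on `data_range`, so scores are only comparable when the matrices share a scale.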

ML + UI layer

:doc:`reconstructions`: 3D structure inference. Classical MDS, PO-MDS. GPU support via torch.
:doc:`resolution_enhancements`: Super-resolution datasets and transforms. Requires the ren or ren-gpu extra.
:doc:`visualizations`: Display helpers (2D) and 3D structure plotters (Plotly WebGL).

How to use this guide

If you are new to gunz-cm, start with :doc:`root` to understand the top-level API, then read the Entry-point trio in order. The first tutorial in :doc:`../tutorials/index` (Load HIC) is the recommended first hands-on exercise.

If you are migrating from cooler or scvi-tools, focus on :doc:`loaders` and :doc:`converters`, which cover the same surface area with different conventions.

If you are building a custom pipeline (e.g., a multi-sample analysis), read :doc:`pipeline` after the entry-point trio.

If you are debugging an error, first check the Known issues section at the bottom of the relevant module page. Most known bugs are documented there with workarounds.

Where to go next

:doc:`../tutorials/index`: Step-by-step tutorials (load → convert → visualize → random downsample)
:doc:`../concepts`: Core concepts: lazy loading, balanced vs raw counts, region syntax
:doc:`../changelog`: Version history and breaking changes
:doc:`../gunz_cm`: Full auto-generated API reference