User Guide

version: 1.0.0
status: active

The User Guide is the narrative companion to the auto-generated API reference. It explains the why and when for each top-level module, not just the how. Read these pages first to decide which gunz-cm module is right for your task, then consult the API reference for parameter details.

Pipeline overview

gunz-cm follows a 5-stage pipeline: load → preprocess → convert → visualize → reconstruct/metrics. The table below maps each stage to the modules that implement it. Its last two rows, optional ML training and workflow orchestration, sit outside the five stages.

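======================  ==========================================================  =======================================
Stage                   Modules                                                     Typical workflow
======================  ==========================================================  =======================================
Load                    :doc:loaders, :doc:cli                                      load_cm_data(...) reads any Hi-C format
Preprocess              :doc:preprocs, :doc:normalization                           Filter empty bins, apply KR/ICE balancing
Convert                 :doc:converters                                             Write to COO, GZCM, or MEMMAP
Visualize               :doc:visualizations                                         Render contact maps and 3D structures
Reconstruct / metrics   :doc:reconstructions, :doc:metrics                          Infer 3D structures, score quality
ML training (optional)  :doc:datasets, :doc:resolution_enhancements, :doc:samplers  Train a super-resolution model
Workflow orchestration  :doc:pipeline                                               Chain steps into a reproducible recipe
======================  ==========================================================  =======================================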
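The stage-to-module mapping can also be kept as a small lookup when scripting around the documentation. The sketch below only restates the five core stages from the overview; the names `STAGE_MODULES` and `stages_for` are illustrative and are not part of gunz-cm.

```python
# Stage -> documentation pages, restating the five core pipeline stages.
# STAGE_MODULES and stages_for are hypothetical helper names, not gunz-cm API.
STAGE_MODULES = {
    "load": ["loaders", "cli"],
    "preprocess": ["preprocs", "normalization"],
    "convert": ["converters"],
    "visualize": ["visualizations"],
    "reconstruct/metrics": ["reconstructions", "metrics"],
}


def stages_for(module):
    """Return the pipeline stages that list `module`."""
    return [stage for stage, modules in STAGE_MODULES.items() if module in modules]


print(stages_for("normalization"))  # ['preprocess']
```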

Module index

Entry-point trio

Every new user starts here: the top-level :doc:root namespace, followed by the three entry-point modules (:doc:cli, :doc:loaders, :doc:converters).

  • :doc:root — The top-level gunz_cm namespace. load_cm_data, ContactMatrix, Balancing, Format, exceptions.

  • :doc:cli — Command-line interface. gunz-cm loaders, gunz-cm converters, gunz-cm recon.

  • :doc:loaders — File format loaders. .hic, .cool, .mcool, CSV, GZCM, PICKLE/NPY.

  • :doc:converters — Format conversion. HIC→COOL, COO text, GZCM v1/v2/v3, MEMMAP.
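To make the COO target mentioned above concrete, here is a minimal sketch of the general coordinate-list idea: store only the nonzero entries of a symmetric contact matrix as (row, col, value) triples. It assumes a dense nested-list matrix and does not reproduce the exact text layout that gunz-cm writes; `dense_to_coo` and `coo_to_dense` are hypothetical names.

```python
def dense_to_coo(matrix):
    """Return (row, col, value) triples for nonzero upper-triangle entries."""
    triples = []
    for i, row in enumerate(matrix):
        for j in range(i, len(row)):
            if row[j] != 0:
                triples.append((i, j, row[j]))
    return triples


def coo_to_dense(triples, n):
    """Rebuild the symmetric n x n matrix from upper-triangle triples."""
    dense = [[0] * n for _ in range(n)]
    for i, j, value in triples:
        dense[i][j] = value
        dense[j][i] = value
    return dense


contacts = [
    [4, 1, 0],
    [1, 3, 2],
    [0, 2, 5],
]
triples = dense_to_coo(contacts)
print(triples)  # [(0, 0, 4), (0, 1, 1), (1, 1, 3), (1, 2, 2), (2, 2, 5)]
```

Keeping only the upper triangle halves the storage for symmetric maps; the round trip through `coo_to_dense` restores the original matrix.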

Preprocessing layer

  • :doc:preprocs — Filter, downsample, transform contact matrices and 3D points.

  • :doc:normalization — KR, ICE, VC matrix balancing algorithms.

  • :doc:structs — Core data types (Region, Constant, ClosedInterval). Deprecated; use gunz_utils.

  • :doc:utils — Internal helpers. Use with care — these are not part of the stable public API.
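As a rough illustration of what the preprocessing stage does, the sketch below drops empty bins and then applies one common definition of vanilla coverage (VC), dividing each entry by the product of its row sum and column sum. It is a plain-Python sketch on nested lists, assuming a symmetric matrix; it is not the gunz-cm implementation, and both function names are hypothetical.

```python
def drop_empty_bins(matrix):
    """Remove rows and columns whose total contact count is zero.

    Assumes a symmetric matrix, so an empty row implies an empty column.
    Returns the filtered matrix and the indices of the bins that were kept.
    """
    keep = [i for i, row in enumerate(matrix) if sum(row) > 0]
    return [[matrix[i][j] for j in keep] for i in keep], keep


def vanilla_coverage(matrix):
    """Divide each entry by (row sum * column sum).

    Call this after drop_empty_bins so that no sum is zero.
    """
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(col) for col in zip(*matrix)]
    return [
        [value / (row_sums[i] * col_sums[j]) for j, value in enumerate(row)]
        for i, row in enumerate(matrix)
    ]


raw = [
    [2, 0, 2],
    [0, 0, 0],
    [2, 0, 6],
]
filtered, kept = drop_empty_bins(raw)
balanced = vanilla_coverage(filtered)
print(kept)      # [0, 2]
print(balanced)  # [[0.125, 0.0625], [0.0625, 0.09375]]
```

KR and ICE are iterative and aim for equal row and column sums, while VC is a single pass; the filtering step matters for all three because empty bins would otherwise cause division by zero.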

Storage layer

  • :doc:compressions — 6 codecs for GZCM v3 tile compression (BSC, CMC, CMC_ZSTD, ZSTD, BSCM_CMC variants).

  • :doc:io — Low-level GZCM readers/writers. GzcmReader, GzcmWriter, GzcmChunkedReader/Writer.

  • :doc:samplers — SpatialBatchSampler for PyTorch DataLoader on genomic data.

  • :doc:datasets — PyTorch Dataset implementations (HiCSparseDataset, GzcmDataset).

Analysis layer

  • :doc:metrics — Reconstruction metrics (Procrustes RMSE, R^2) and resolution-enhancement metrics (MSE, SSIM, PSNR).

  • :doc:pipeline — Workflow orchestration. Pipeline class, create_pipeline factory.
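Two of the resolution-enhancement metrics named above have short textbook definitions. The sketch below computes MSE and PSNR on nested lists using the standard formula PSNR = 10 * log10(max_value^2 / MSE). It is not the gunz-cm implementation, and the function names and the `max_value` parameter are assumptions made for illustration.

```python
import math


def mse(reference, estimate):
    """Mean squared error over all matrix entries."""
    squared = [
        (r - e) ** 2
        for ref_row, est_row in zip(reference, estimate)
        for r, e in zip(ref_row, est_row)
    ]
    return sum(squared) / len(squared)


def psnr(reference, estimate, max_value):
    """Peak signal-to-noise ratio in decibels; infinite for identical inputs."""
    error = mse(reference, estimate)
    if error == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / error)


reference = [[0, 10], [10, 0]]
estimate = [[0, 8], [10, 2]]
print(mse(reference, estimate))  # 2.0
print(psnr(reference, estimate, max_value=10))
```

Lower MSE and higher PSNR both indicate an estimate closer to the reference map.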

ML + UI layer

  • :doc:reconstructions — 3D structure inference. Classical MDS, PO-MDS. GPU support via torch.

  • :doc:resolution_enhancements — Super-resolution datasets and transforms. Requires ren or ren-gpu extra.

  • :doc:visualizations — Display helpers (2D) and 3D structure plotters (Plotly WebGL).

How to use this guide

If you are new to gunz-cm, start with :doc:root to understand the top-level API, then read the Entry-point trio in order. The first tutorial in :doc:../tutorials/index (Load HIC) is the recommended first hands-on exercise.

If you are migrating from cooler or scvi-tools, focus on :doc:loaders and :doc:converters — these cover the same surface area with different conventions.

If you are building a custom pipeline (e.g., a multi-sample analysis), read :doc:pipeline after the entry-point trio.
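The idea of chaining steps into a reproducible recipe and applying it to several samples can be sketched without any library support. The example below is a generic function-composition pattern; it does not use or describe the gunz-cm Pipeline class or create_pipeline factory, and every name in it (`run_recipe`, the two placeholder steps, the sample labels) is hypothetical.

```python
from functools import reduce


def run_recipe(steps, data):
    """Apply each step to the output of the previous one, in order."""
    return reduce(lambda value, step: step(value), steps, data)


# Placeholder steps standing in for real preprocessing functions.
def drop_zero_rows(matrix):
    """Keep only rows that contain at least one nonzero entry."""
    return [row for row in matrix if any(row)]


def scale_to_unit_max(matrix):
    """Divide every entry by the largest entry in the matrix."""
    peak = max(max(row) for row in matrix)
    return [[value / peak for value in row] for row in matrix]


recipe = [drop_zero_rows, scale_to_unit_max]
samples = {
    "sample_a": [[2, 4], [0, 0]],
    "sample_b": [[1, 0], [0, 5]],
}
# The same ordered recipe is applied to every sample.
results = {name: run_recipe(recipe, matrix) for name, matrix in samples.items()}
print(results["sample_a"])  # [[0.5, 1.0]]
```

Holding the recipe as an ordered list is what makes a multi-sample run reproducible: each sample passes through identical steps in an identical order.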

If you are debugging an error, check the Known issues section at the bottom of the relevant module page first. Most known bugs are documented there with workarounds.

Where to go next

  • :doc:../tutorials/index — Step-by-step tutorials (load → convert → visualize → random downsample)

  • :doc:../concepts — Core concepts: lazy loading, balanced vs raw counts, region syntax

  • :doc:../changelog — Version history and breaking changes

  • :doc:../gunz_cm — Full auto-generated API reference