Getting Started with Gunz-CM

Welcome to Gunz-CM, a high-performance Python library for loading, manipulating, and analyzing genomic contact matrices from Hi-C and chromatin conformation capture experiments.

This page helps you pick the right starting point based on what you want to do. Each workflow links to a step-by-step tutorial in notebooks/ that you can run locally with Jupyter.

Three common starting points

1. “I have a .hic / .cool file and want a matrix”

If you’re starting from raw Hi-C data, the loaders API gives you a single entry point for all formats. The unified load_cm_data(...) function dispatches to the right format-specific loader based on file extension.

Tutorials (in notebooks/, run with jupyter lab):

  • tutorial_load_hic.ipynb — load .hic/.cool files and select regions; covers the optional region1 argument (v2.7.0) and chromosome-name normalization (v2.8.0)

  • tutorial_load_cooler.ipynb — multi-resolution .mcool workflows; iterate over all zoom levels

  • tutorial_load_csv_coo.ipynb — sparse CSV/COO text files (with the base-pair vs bin-id gotcha documented)

  • tutorial_load_gzcm.ipynb — reading and writing GZCM v1/v2/v3 files

  • tutorial_load_pickle_npy.ipynb — pickled DataFrames and raw NumPy arrays
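The base-pair versus bin-id distinction mentioned for sparse text files can be illustrated without the library. The sketch below assumes three-column records (row position, column position, count) whose coordinates are base-pair positions, and converts them to zero-based bin indices at a given resolution. The helper name bp_to_bin and the sample records are hypothetical; they are not part of gunz-cm.

```python
def bp_to_bin(position_bp: int, resolution: int) -> int:
    """Map a base-pair coordinate to a zero-based bin index."""
    if resolution <= 0:
        raise ValueError("resolution must be positive")
    return position_bp // resolution


# Hypothetical COO records: (row position, column position, contact count),
# with positions given in base pairs.
records_bp = [(0, 10_000, 5.0), (10_000, 30_000, 2.0), (25_000, 25_000, 7.0)]
resolution = 10_000

# The same records expressed as bin indices.
records_bin = [
    (bp_to_bin(r, resolution), bp_to_bin(c, resolution), v)
    for r, c, v in records_bp
]
print(records_bin)  # [(0, 1, 5.0), (1, 3, 2.0), (2, 2, 7.0)]
```

Interpreting base-pair coordinates as bin ids (or the reverse) shifts every index by a factor of the resolution, so it is worth confirming which convention a text file uses before loading it.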

Quick code sketch:

from gunz_cm.loaders import load_cm_data, get_resolutions, get_chrom_infos
from gunz_cm.consts import DataStructure

# 1. Inspect the file
print(get_resolutions("data/sample.mcool"))   # [1000, 5000, 10000, ...]
print(get_chrom_infos("data/sample.mcool"))    # {'chr1': {...}, ...}

# 2. Load a region
cm = load_cm_data(
    fpath="data/sample.mcool",
    resolution=10_000,           # 10kb
    region1="chr1",
    region2="chr1",
    output_format=DataStructure.COO,  # sparse scipy matrix
)
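Dispatching on file extension, as described for load_cm_data, can be pictured roughly as follows. This is a minimal sketch of the general technique, not gunz-cm's actual implementation: the pick_loader function, the loader labels, and the extension table (including the .gzcm and .pkl suffixes) are all assumptions made for illustration.

```python
from pathlib import Path

# Hypothetical mapping from file extension to a loader label.
# gunz-cm's real dispatch table may differ.
LOADERS_BY_EXTENSION = {
    ".hic": "hic",
    ".cool": "cooler",
    ".mcool": "cooler",
    ".csv": "csv_coo",
    ".gzcm": "gzcm",
    ".pkl": "pickle",
    ".npy": "numpy",
}


def pick_loader(fpath: str) -> str:
    """Return the loader label for a file, based on its extension only."""
    suffix = Path(fpath).suffix.lower()
    try:
        return LOADERS_BY_EXTENSION[suffix]
    except KeyError:
        raise ValueError(f"unsupported file extension: {suffix!r}") from None


print(pick_loader("data/sample.mcool"))  # cooler
print(pick_loader("data/sample.HIC"))    # hic
```

A lookup table keeps the supported formats in one place and makes an unsupported extension fail early with a clear error instead of reaching a format-specific parser.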

2. “I want to normalize / filter / process the matrix”

Preprocessing in gunz-cm is a pipeline of functional transformations: load → filter (remove bad bins) → balance (KR/ICE) → analyze. Each step has a dedicated module.
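To make the filter and balance steps concrete, here is a plain-Python sketch on a small dense symmetric matrix. It drops all-zero bins and then applies a simplified ICE-style iterative correction. The function names filter_empty_bins and ice_balance are hypothetical and do not reflect gunz-cm's API or signatures. KR balancing is a different algorithm and is not shown.

```python
def filter_empty_bins(matrix):
    """Drop rows/columns whose total contact count is zero.

    Returns the reduced matrix and the indices of the bins that were kept.
    """
    keep = [i for i, row in enumerate(matrix) if sum(row) > 0]
    return [[matrix[i][j] for j in keep] for i in keep], keep


def ice_balance(matrix, n_iter=200, tol=1e-9):
    """Simplified ICE-style balancing of a symmetric, non-negative matrix.

    Each iteration divides entry (i, j) by the product of the relative row
    sums of bins i and j, so that all row sums approach a common value.
    Assumes every bin has a non-zero row sum (filter empty bins first).
    """
    n = len(matrix)
    balanced = [row[:] for row in matrix]
    for _ in range(n_iter):
        sums = [sum(row) for row in balanced]
        mean = sum(sums) / n
        bias = [s / mean for s in sums]
        if max(abs(b - 1.0) for b in bias) < tol:
            break
        balanced = [
            [balanced[i][j] / (bias[i] * bias[j]) for j in range(n)]
            for i in range(n)
        ]
    return balanced


# Toy matrix: bin 2 has no contacts and should be removed by the filter.
raw = [
    [4, 2, 0, 1],
    [2, 6, 0, 3],
    [0, 0, 0, 0],
    [1, 3, 0, 2],
]
filtered, kept = filter_empty_bins(raw)
balanced = ice_balance(filtered)
print(kept)  # [0, 1, 3]
print([round(sum(row), 6) for row in balanced])  # row sums are now equal
```

Filtering before balancing matters here because an all-zero bin would have a zero bias and cause a division by zero during the correction.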

Tutorials (in notebooks/):

  • tutorial_filter_normalize.ipynb — filter empty bins, then KR-normalize

  • tutorial_balance_kr.ipynb — KR and ICE balancing in detail, with before/after heatmaps

  • tutorial_downsample.ipynb — manual binning to downsample a matrix to a coarser resolution

  • tutorial_multi_resolution.ipynb — compare a contact matrix at multiple resolutions
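Manual binning merges adjacent bins, which moves a matrix to a coarser resolution. The sketch below sums factor-by-factor blocks of a dense square matrix in plain Python. The coarsen function is a hypothetical helper for illustration only, and the toy values are made up.

```python
def coarsen(matrix, factor):
    """Sum factor-by-factor blocks of a square matrix into single bins.

    A trailing partial block (when the size is not a multiple of `factor`)
    is kept as a smaller final bin.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    n = len(matrix)
    n_out = -(-n // factor)  # ceiling division
    out = [[0.0] * n_out for _ in range(n_out)]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            out[i // factor][j // factor] += value
    return out


# Four bins at 5 kb merged pairwise into two bins at 10 kb.
fine = [
    [8, 2, 1, 0],
    [2, 6, 1, 1],
    [1, 1, 9, 3],
    [0, 1, 3, 7],
]
print(coarsen(fine, 2))  # [[18.0, 3.0], [3.0, 22.0]]
```

Summing (rather than averaging) preserves the total contact count, which is usually what is wanted when comparing raw matrices across resolutions.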

3. “I want to do 3D structure reconstruction or compartment analysis”

Downstream analysis of contact matrices includes 3D chromosome reconstruction (MDS-based methods) and A/B compartment calling. The reconstructions and visualizations modules provide the core functions, plus quality-assessment metrics.

Tutorials (in notebooks/):

  • tutorial_3d_reconstruction.ipynb — basic 3D reconstruction from a contact matrix (uses scikit-learn's sklearn.manifold.MDS, since SciPy does not provide an mdscale function)

  • tutorial_3d_quality_check.ipynb — Procrustes alignment, RMSE, and shape similarity between two reconstructed structures

  • tutorial_compartments.ipynb — A/B compartment analysis and PC1 computation
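The idea behind PC1-based compartment calling can be sketched in plain Python: build the Pearson correlation matrix between bins, take its leading eigenvector, and split bins by sign. This is a simplified illustration under several assumptions: the toy matrix is treated as already distance-normalized, power iteration stands in for a full eigendecomposition, and the function names are hypothetical, not gunz-cm's API.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length rows (assumes non-constant rows)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def first_eigenvector(sym, n_iter=100):
    """Leading eigenvector of a symmetric matrix by power iteration."""
    n = len(sym)
    vec = [1.0 + i for i in range(n)]  # arbitrary non-constant start vector
    for _ in range(n_iter):
        nxt = [sum(sym[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    return vec


def compartment_signs(matrix):
    """Return the leading eigenvector and a per-bin sign label."""
    n = len(matrix)
    corr = [[pearson(matrix[i], matrix[j]) for j in range(n)] for i in range(n)]
    pc1 = first_eigenvector(corr)
    return pc1, ["+" if v >= 0 else "-" for v in pc1]


# Toy checkerboard: bins 0-1 interact mostly with each other, as do bins 2-3.
toy = [
    [9, 8, 1, 2],
    [8, 9, 2, 1],
    [1, 2, 9, 8],
    [2, 1, 8, 9],
]
pc1, signs = compartment_signs(toy)
print(signs)  # ['-', '-', '+', '+']
```

The sign of an eigenvector is arbitrary, so which group is labelled A and which B has to come from an external track such as gene density or GC content.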

What you need to get started

Before running the tutorials, make sure you have:

  1. Python 3.11+ (gunz-cm does not support earlier versions)

  2. gunz-cm installed in a virtual environment — see :doc:installation for setup instructions

  3. Jupyter to run the tutorial notebooks:

    pip install jupyter
    
  4. (Optional) Sample data — most tutorials generate synthetic data inline, so you can run them without any real Hi-C files. To use your own data, see :doc:quickstart for the file format requirements.
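A quick way to confirm the first prerequisite is to check the interpreter version from Python itself. The snippet below is a small standalone check; the python_is_supported helper is hypothetical and not part of gunz-cm.

```python
import sys

# Minimum version stated in the prerequisites above.
MIN_PYTHON = (3, 11)


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the given version tuple is at least MIN_PYTHON."""
    return tuple(version_info[:2]) >= MIN_PYTHON


if python_is_supported():
    print("Python version OK:", sys.version.split()[0])
else:
    print("Python 3.11+ is required; found", sys.version.split()[0])
```

Run it inside the virtual environment you plan to use for the tutorials, since the system interpreter may be a different version.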

Running a tutorial

The tutorials live in notebooks/ at the root of the gunz-cm repository. They are not rendered on the website because of a known limitation when myst_parser and myst_nb are used together (see docs/MANUAL_UPLOAD.md for details). To run one:

# Clone the repository (if you haven't already)
git clone https://github.com/sXperfect/gunz-cm.git
cd gunz-cm

# Make sure your virtual environment is active
mamba activate gunz_cm

# Launch Jupyter
jupyter lab notebooks/

Then open any tutorial_*.ipynb file and run the cells. Each tutorial is self-contained: it generates its own synthetic data and runs end-to-end in 1-3 minutes.

You can also browse the tutorials on GitHub without running them locally: https://github.com/sXperfect/gunz-cm/tree/main/notebooks

Where to go next

  • :doc:installation — full installation guide with mamba/venv options

  • :doc:quickstart — minimal 5-minute working example

  • :doc:concepts — mental model behind the library (unified ContactMatrix facade, lazy loading, preprocessing pipeline)

  • :doc:modules — full API reference (autodoc-generated from docstrings)

Need help?

  • GitHub issues: https://github.com/sXperfect/gunz-cm/issues

  • Live docs: https://gunz-cm.pages.dev/

  • Source: https://github.com/sXperfect/gunz-cm