Tutorials

Step-by-step Jupyter-notebook-style tutorials covering the gunz-cm processing pipeline: load → convert → visualize → random downsample.

The tutorials live as Jupyter notebooks in notebooks/ at the root of the gunz-cm repository. The pages below are generated from those notebooks at build time (via nbconvert --to markdown --execute) so the rendered HTML always shows fresh outputs.

.. toctree::
   :maxdepth: 1

   generated/tutorial_load_hic
   generated/tutorial_load_cooler
   generated/tutorial_convert
   generated/tutorial_visualize
   generated/tutorial_filter_normalize
   generated/tutorial_balance_kr
   generated/tutorial_downsample

To run the tutorials locally with fresh outputs:

.. code-block:: bash

   git clone https://github.com/sXperfect/gunz-cm.git
   cd gunz-cm
   mamba activate gunz_cm
   jupyter lab notebooks/

The pipeline

The pipeline stages map to the following six pages:

  1. :doc:`load`: read Hi-C data from any format (HIC, COOL/MCOOL, CSV/COO, GZCM, PICKLE/NPY) into a unified ContactMatrix

  2. :doc:`convert`: write to COO, GZCM v1/v2/v3, or MEMMAP for archival or downstream tooling

  3. :doc:`filter_normalize`: remove bad bins, then balance with KR (the standard preprocessing pipeline)

  4. :doc:`balance_kr`: focused tutorial on KR / ICE balancing, with before/after heatmaps

  5. :doc:`visualize`: display the contact matrix with display_matrix and display_triangle

  6. :doc:`random_downsample`: read-subsampling via gunz_cm.preprocs.rand_downsample (stochastic subsampling of the contact matrix for super-resolution or noise reduction)

Note: each tutorial page can also be opened on its own by its slug, e.g. docs/source/tutorials/load.md.
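To make the read-subsampling idea from the last stage concrete, here is a minimal standard-library sketch. It is not the implementation behind gunz_cm.preprocs.rand_downsample: the function name ``downsample_reads``, its parameters, and the toy matrix are hypothetical, and it assumes a small symmetric matrix of integer counts. Each read in the upper triangle is treated as one item, and a fixed fraction is drawn without replacement.

```python
import random


def downsample_reads(matrix, fraction, seed=0):
    """Keep a random `fraction` of the reads in a symmetric count matrix.

    Hypothetical helper for illustration; not part of gunz-cm.
    """
    n = len(matrix)
    # Expand the upper triangle (diagonal included) into one entry per read,
    # so that every read is equally likely to be kept.
    reads = [(i, j)
             for i in range(n)
             for j in range(i, n)
             for _ in range(matrix[i][j])]
    keep = round(len(reads) * fraction)
    rng = random.Random(seed)  # fixed seed makes the draw reproducible
    out = [[0] * n for _ in range(n)]
    for i, j in rng.sample(reads, keep):  # sampling without replacement
        out[i][j] += 1
        if i != j:
            out[j][i] += 1  # mirror to keep the matrix symmetric
    return out


# Toy symmetric contact matrix with 30 reads in its upper triangle.
full = [[8, 4, 1],
        [4, 6, 2],
        [1, 2, 9]]
half = downsample_reads(full, fraction=0.5)
```

Expanding every read into a list is only practical for toy inputs. For real Hi-C matrices a vectorised draw over the nonzero counts would be the more realistic choice.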

What you need

  1. Python 3.11+ (gunz-cm does not support earlier versions)

  2. gunz-cm installed in a virtual environment; see :doc:`installation` for setup instructions

  3. The 12+ tutorial notebooks in notebooks/ (cloned from the repository)

  4. (Optional) Sample data — every tutorial generates synthetic data inline, so you can run them without real Hi-C files

Why the tutorials are pre-rendered

The MyST Notebook ecosystem (myst_parser + myst_nb) has a known collision documented in commit da827f6: both extensions try to register the same config value, and no version combination resolves it. We work around this by:

  1. Executing the tutorials locally with jupyter nbconvert --execute (which captures current cell outputs)

  2. Converting to Markdown via nbconvert --to markdown (the output is plain Markdown, which myst_parser can read)

  3. Rendering the static .md in Sphinx via the standard myst_parser

This guarantees the tutorials show on the website with the right code and the right outputs, even without myst_nb.
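The execute-and-convert steps above could be scripted roughly as follows. This is a sketch, not the project's actual build script: the helper names ``nbconvert_command`` and ``render_all`` are hypothetical, and both directory constants are assumptions about the repository layout. Only the nbconvert flags ``--to markdown``, ``--execute``, and ``--output-dir`` are relied on.

```python
import subprocess
from pathlib import Path

# Assumed locations; adjust to the actual repository layout.
NOTEBOOK_DIR = Path("notebooks")
OUTPUT_DIR = Path("docs/source/tutorials/generated")


def nbconvert_command(notebook, output_dir=OUTPUT_DIR):
    """Build the CLI call that executes one notebook and writes Markdown."""
    return [
        "jupyter", "nbconvert",
        "--to", "markdown",
        "--execute",                      # run all cells so outputs are current
        "--output-dir", str(output_dir),
        str(notebook),
    ]


def render_all(notebook_dir=NOTEBOOK_DIR):
    """Execute and convert every notebook, stopping at the first failure."""
    for notebook in sorted(Path(notebook_dir).glob("*.ipynb")):
        subprocess.run(nbconvert_command(notebook), check=True)
```

Calling ``render_all()`` from the repository root would then write one Markdown file per notebook. By default nbconvert names each file after its notebook, so the notebook file names would need to match the entries in the toctree.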

Pre-existing bugs documented in the tutorials

The tutorials work around the following pre-existing gunz-cm bugs (per the CHANGELOG):

  • get_resolutions raises LoaderError for single-resolution .cool files

  • get_balancing facade raises an AttributeError for the COOLER format

  • Balancing.VC enum value exists but no vc_normalize function ships

  • scipy.spatial.distance.mdscale is not available in SciPy 1.16 (use sklearn.manifold.MDS instead)

  • GZCM v2: padded to block_size; original_shape not stored

  • GZCM v3 + CMC: upper-triangular only (lossy in lower)

  • NPY loader returns raw ndarray (not ContactMatrix)

  • CSV loader expects base-pair coordinates (not bin IDs)

  • display_contact_map and display_compartment_map are dead stubs in __init__.py
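As an illustration of the CSV item above, bin-indexed COO triples can be converted to base-pair coordinates before they are written out. The sketch below assumes bin-start coordinates (bin ID × resolution) and a headerless three-column layout. Both are assumptions, since the exact CSV schema gunz-cm expects is not specified here, and ``bins_to_bp`` is a hypothetical helper, not a gunz-cm function.

```python
import csv
import io


def bins_to_bp(triples, resolution):
    """Convert (row_bin, col_bin, count) triples to bin-start bp coordinates."""
    return [(r * resolution, c * resolution, v) for r, c, v in triples]


# Three contacts at 10 kb resolution, indexed by bin ID.
coo_bins = [(0, 0, 12), (0, 1, 5), (1, 2, 3)]
coo_bp = bins_to_bp(coo_bins, resolution=10_000)

# Write to an in-memory buffer; a real script would open a file instead.
buffer = io.StringIO()
csv.writer(buffer).writerows(coo_bp)
```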

Where to go next

  • :doc:`installation`: full installation guide

  • :doc:`quickstart`: minimal 5-minute working example

  • :doc:`concepts`: mental model behind the library

  • :doc:`modules`: full API reference