Tutorials
=========

Step-by-step Jupyter-notebook-style tutorials covering the gunz-cm processing pipeline: load → convert → visualize → random downsample.
The tutorials live as Jupyter notebooks in ``notebooks/`` at the root
of the gunz-cm repository. The pages below are generated from those
notebooks at build time (via ``nbconvert --to markdown --execute``), so
the rendered HTML always shows fresh outputs.

.. toctree::
   :maxdepth: 1

   generated/tutorial_load_hic
   generated/tutorial_load_cooler
   generated/tutorial_convert
   generated/tutorial_visualize
   generated/tutorial_filter_normalize
   generated/tutorial_balance_kr
   generated/tutorial_downsample

To run the tutorials locally with fresh outputs:

.. code-block:: bash

   git clone https://github.com/sXperfect/gunz-cm.git
   cd gunz-cm
   mamba activate gunz_cm
   jupyter lab notebooks/


The pipeline
------------

The four pipeline stages, together with two preprocessing tutorials (filtering and balancing), map to the six pages below:

- :doc:`load`: read Hi-C data from any format (HIC, COOL/MCOOL, CSV/COO, GZCM, PICKLE/NPY) into a unified ``ContactMatrix``
- :doc:`convert`: write to COO, GZCM v1/v2/v3, or MEMMAP for archival or downstream tooling
- :doc:`filter_normalize`: remove bad bins, then balance with KR (the standard preprocessing pipeline)
- :doc:`balance_kr`: focused tutorial on KR / ICE balancing, with before/after heatmaps
- :doc:`visualize`: display the contact matrix with ``display_matrix`` and ``display_triangle``
- :doc:`random_downsample`: read subsampling via ``gunz_cm.preprocs.rand_downsample`` (stochastic subsampling of the contact matrix for super-resolution or noise reduction)

Note: a single tutorial can be loaded by its slug, e.g.
``docs/source/tutorials/load.md``.
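
To make the random downsampling stage listed above more concrete, here is a minimal sketch of read subsampling by binomial thinning. It assumes a dense, symmetric matrix of integer read counts held in NumPy. The source names ``gunz_cm.preprocs.rand_downsample`` but does not show its signature or algorithm, so the sketch does not call it; ``binomial_downsample`` is a hypothetical helper written only for illustration.

```python
import numpy as np


def binomial_downsample(counts: np.ndarray, fraction: float, seed: int = 0) -> np.ndarray:
    """Keep each read independently with probability `fraction`.

    Illustrative only: this is NOT gunz_cm.preprocs.rand_downsample.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    rng = np.random.default_rng(seed)
    # Thin only the upper triangle (diagonal included) so that each
    # symmetric pair of bins is sampled once, then mirror the result.
    upper = np.triu(counts).astype(np.int64)
    thinned = rng.binomial(upper, fraction)
    return thinned + np.triu(thinned, k=1).T


# Synthetic symmetric 4x4 contact matrix of integer read counts.
rng = np.random.default_rng(42)
raw = rng.integers(0, 50, size=(4, 4))
matrix = np.triu(raw) + np.triu(raw, k=1).T

downsampled = binomial_downsample(matrix, fraction=0.25, seed=1)
print("reads before:", matrix.sum(), "reads after:", downsampled.sum())
```

Sampling the upper triangle once and mirroring it keeps the output symmetric. Thinning both triangles independently would break that symmetry.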

What you need
-------------


- Python 3.11+ (gunz-cm does not support earlier versions)
- gunz-cm installed in a virtual environment; see :doc:`installation` for setup instructions
- The 12+ tutorial notebooks in ``notebooks/`` (cloned from the repository)
- (Optional) Sample data: every tutorial generates synthetic data inline, so you can run them without real Hi-C files


Why the tutorials are pre-rendered
----------------------------------

The MyST Notebook ecosystem (myst_parser + myst_nb) has a known
collision documented in commit da827f6: both extensions try to
register the same config value, and no version combination resolves
it. We work around this by:

1. Executing the tutorials locally with ``jupyter nbconvert --execute`` (which captures current cell outputs)
2. Converting to MyST Markdown via ``nbconvert --to markdown``
3. Rendering the static .md in Sphinx via the standard ``myst_parser``

This guarantees the tutorials show on the website with the right code
and the right outputs, even without myst_nb.
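
The execute-and-convert steps above can be scripted. The sketch below builds one ``jupyter nbconvert`` call per notebook, using the flags named in this page (``--to markdown``, ``--execute``) plus nbconvert's ``--output-dir`` option. The helper names and the output directory ``docs/source/tutorials/generated`` are assumptions made for illustration. The project's actual build script is not shown in the source.

```python
import subprocess
from pathlib import Path


def nbconvert_command(notebook: Path, output_dir: Path) -> list[str]:
    """Build the nbconvert call that executes a notebook and writes Markdown."""
    return [
        "jupyter", "nbconvert",
        "--to", "markdown",
        "--execute",
        "--output-dir", str(output_dir),
        str(notebook),
    ]


def render_all(notebooks_dir: Path, output_dir: Path, dry_run: bool = True) -> list[list[str]]:
    """Convert every notebook in `notebooks_dir`.

    With dry_run=True (the default) the commands are only returned, not run.
    """
    commands = [
        nbconvert_command(nb, output_dir)
        for nb in sorted(notebooks_dir.glob("*.ipynb"))
    ]
    if not dry_run:
        for cmd in commands:
            # check=True stops the build on the first failing notebook.
            subprocess.run(cmd, check=True)
    return commands


# Hypothetical paths: adjust to the real repository layout.
for cmd in render_all(Path("notebooks"), Path("docs/source/tutorials/generated")):
    print(" ".join(cmd))
```

Passing ``dry_run=False`` runs the conversions. Because a failing notebook raises ``subprocess.CalledProcessError``, a broken tutorial halts the build and is not published with stale output.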

Pre-existing bugs documented in the tutorials
----------------------------------------------

The tutorials work around the following pre-existing gunz-cm bugs (per the CHANGELOG):

- ``get_resolutions`` raises ``LoaderError`` for single-resolution .cool files
- ``get_balancing`` facade has an ``AttributeError`` for COOLER format
- ``Balancing.VC`` enum value exists but no ``vc_normalize`` function ships
- ``scipy.spatial.distance.mdscale`` was removed in SciPy 1.16 (use ``sklearn.manifold.MDS`` instead)
- GZCM v2: padded to ``block_size``; ``original_shape`` not stored
- GZCM v3 + CMC: upper-triangular only (lossy in lower)
- NPY loader returns raw ``ndarray`` (not ``ContactMatrix``)
- CSV loader expects base-pair coordinates (not bin IDs)
- ``display_contact_map`` and ``display_compartment_map`` are dead stubs in ``__init__.py``

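
Two of the items above lend themselves to small NumPy workarounds, sketched below without calling gunz-cm itself. The first rebuilds a full matrix from upper-triangular storage, assuming the contact matrix is symmetric. The second converts bin IDs to base-pair coordinates, assuming the CSV loader wants the start position of each bin. Both helper names are hypothetical, and the tutorials may handle these cases differently.

```python
import numpy as np


def mirror_upper(upper: np.ndarray) -> np.ndarray:
    """Rebuild a symmetric matrix from one that only stores the upper triangle."""
    upper = np.triu(upper)  # discard anything below the diagonal
    # Add the strictly-upper part transposed; the diagonal is counted once.
    return upper + np.triu(upper, k=1).T


def bins_to_bp(bin_ids, resolution: int) -> np.ndarray:
    """Convert bin IDs to base-pair start coordinates (bin i starts at i * resolution)."""
    return np.asarray(bin_ids) * resolution


upper_only = np.array([[4, 2, 1],
                       [0, 5, 3],
                       [0, 0, 6]])
full = mirror_upper(upper_only)
print(full)

starts = bins_to_bp([0, 1, 7], resolution=10_000)
print(starts)
```

Mirroring recovers the lower triangle only when the original matrix was symmetric. Any asymmetric information that was present before the upper-triangular write is not recoverable.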

Where to go next
----------------


- :doc:`installation`: full installation guide
- :doc:`quickstart`: minimal 5-minute working example
- :doc:`concepts`: mental model behind the library
- :doc:`modules`: full API reference