# Tutorial: Load a real Micro-C dataset via GUNZ_CM_TUTORIAL_DATA

This tutorial walks through loading the canonical public mESC Micro-C chr19 dataset from 4DNucleome (accession 4DNFI9GMP7J3) using the gunz_cm.loaders API.

## Setup

```bash
mkdir -p ~/gunz_cm_tutorial_data
python scripts/download_tutorial_data.py --name mESC_microc_chr19_50kb \
    --target ~/gunz_cm_tutorial_data
export GUNZ_CM_TUTORIAL_DATA=~/gunz_cm_tutorial_data
```

## Learning Objectives

* Resolve a multi-resolution .mcool file via the helper.
* Inspect the available zoom levels (get_resolutions).
* Load chr19 at the finest resolution (50 kb) with KR balancing.
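The setup step relies on the `GUNZ_CM_TUTORIAL_DATA` environment variable pointing at the download directory. As a quick sanity check before running the cells that follow, a sketch like the one below reports whether the expected file is in place. The helper name `find_tutorial_file` is hypothetical and not part of gunz_cm, and the sketch assumes the file is saved under its accession name, `4DNFI9GMP7J3.mcool`.

```python
import os
from pathlib import Path
from typing import Optional


def find_tutorial_file(
    filename: str, env_var: str = "GUNZ_CM_TUTORIAL_DATA"
) -> Optional[Path]:
    """Return the path to `filename` in the directory named by `env_var`, or None.

    Hypothetical helper for illustration; not part of the gunz_cm API.
    """
    data_dir = os.environ.get(env_var)
    if not data_dir:
        # Variable unset or empty: the export step has not been run in this shell.
        return None
    candidate = Path(data_dir).expanduser() / filename
    return candidate if candidate.is_file() else None


# Assumed file name: the accession plus the .mcool extension.
path = find_tutorial_file("4DNFI9GMP7J3.mcool")
print(path if path is not None else "dataset not found; run the download script first")
```

If this prints the "not found" message, either the variable is not exported in the current shell or the download has not completed.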

import sys
from pathlib import Path

from gunz_cm.loaders import load_cm_data, get_resolutions
from gunz_cm.consts import Balancing, DataStructure

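# Walk up from the current directory to the first ancestor that contains both
# notebooks/_tutorial_data.py and pyproject.toml; treat that as the repo root.
_probe = Path.cwd().resolve()
_root = None
while _probe != _probe.parent:
    if (_probe / "notebooks" / "_tutorial_data.py").is_file() and (_probe / "pyproject.toml").is_file():
        _root = _probe
        break
    _probe = _probe.parent
if _root is None:
    raise RuntimeError(f"cannot find gunz-cm repo root from cwd {Path.cwd()}")
# Make the helper module in notebooks/ importable.
sys.path.insert(0, str(_root / "notebooks"))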

from _tutorial_data import load_tutorial_dataset, TutorialDataError

try:
    mcool_path = load_tutorial_dataset("mESC_microc_chr19_50kb")
    print(f"Using real dataset: {mcool_path}")
except TutorialDataError as exc:
    print(f"Skip: {exc}")
    mcool_path = None

Skip: dataset 'mESC_microc_chr19_50kb' expected at /home/adhisant/gunz_cm_tutorial_data/4DNFI9GMP7J3.mcool but file is missing; run scripts/download_tutorial_data.py --name mESC_microc_chr19_50kb

## 1. List the available zoom levels

if mcool_path is None:
    print("Skipping: dataset not downloaded")
else:
    resolutions = sorted(get_resolutions(mcool_path))
    print(f"Zoom levels: {resolutions}")
    print(f"Finest: {min(resolutions):,} bp; Coarsest: {max(resolutions):,} bp; count={len(resolutions)}")

Skipping: dataset not downloaded
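For orientation, a multi-resolution `.mcool` file conventionally stores one cooler per zoom level under HDF5 groups named `/resolutions/<bin size in bp>`. The sketch below shows how such group paths map to the sorted list of bin sizes printed in this step. It is not a description of how `get_resolutions` is implemented. The function name `bin_sizes_from_groups` and the example paths are made up for illustration and are not the zoom levels of the tutorial file.

```python
from typing import Iterable, List


def bin_sizes_from_groups(group_paths: Iterable[str]) -> List[int]:
    """Extract integer bin sizes from paths like '/resolutions/50000'.

    Hypothetical helper for illustration; not part of the gunz_cm API.
    """
    sizes = set()
    for path in group_paths:
        parts = path.strip("/").split("/")
        # Keep only groups of the form resolutions/<integer>.
        if len(parts) == 2 and parts[0] == "resolutions" and parts[1].isdigit():
            sizes.add(int(parts[1]))
    return sorted(sizes)


# Illustrative paths only; a real file may expose different zoom levels.
example_groups = ["/resolutions/100000", "/resolutions/50000", "/resolutions/1000000"]
resolutions = bin_sizes_from_groups(example_groups)
print(f"Finest: {min(resolutions):,} bp; Coarsest: {max(resolutions):,} bp; count={len(resolutions)}")
```

If the cooler package is installed, `cooler.fileops.list_coolers(path)` should return group paths of this form for a real file.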

## 2. Load the 50 kb KR-balanced chr19 contact matrix

if mcool_path is None:
    print("Skipping: dataset not downloaded")
else:
    cm_df = load_cm_data(
        fpath=mcool_path,
        bin_size_bp=50_000,
        region1="chr19",
        region2="chr19",
        balancing=Balancing.KR,
        output_format=DataStructure.COO,
    )
    print(f"Loaded: shape={cm_df.shape}, nnz={cm_df.nnz}")
    print(f"  count range: [{cm_df.data.min()}, {cm_df.data.max()}]")

Skipping: dataset not downloaded
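To make two of the options in the call above concrete: COO output stores the matrix as (row, column, value) triples for the nonzero bins only, and matrix balancing rescales each raw count using one weight per bin. The pure-Python sketch below applies per-bin weights to a toy COO matrix. It assumes multiplicative weights (balanced = raw × w[i] × w[j]); some file conventions store divisive normalization vectors instead, and this sketch does not claim which one `Balancing.KR` uses in gunz_cm. All names and numbers are illustrative.

```python
from typing import List, Tuple

# One COO entry: (row bin index, column bin index, value).
Triple = Tuple[int, int, float]


def balance_coo(triples: List[Triple], weights: List[float]) -> List[Triple]:
    """Scale each COO entry (i, j, count) by weights[i] * weights[j].

    Assumes multiplicative per-bin weights; hypothetical helper for illustration.
    """
    return [(i, j, count * weights[i] * weights[j]) for i, j, count in triples]


# Toy upper-triangle COO matrix over 3 bins (made-up counts).
raw = [(0, 0, 10.0), (0, 1, 4.0), (1, 2, 8.0)]
# Hypothetical per-bin balancing weights.
weights = [0.5, 1.0, 2.0]

balanced = balance_coo(raw, weights)
values = [v for _, _, v in balanced]
print(f"nnz={len(balanced)}, value range: [{min(values)}, {max(values)}]")
```

The number of stored entries is unchanged by balancing; only the values are rescaled, which is why the value range of a balanced matrix is no longer made of integer counts.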

## Where to go next

* Compare Micro-C to Hi-C on the same region (tutorial 01).
* Convert to .gzcm v4 for downstream NN training.