# Tutorial: Load a real Micro-C dataset via GUNZ_CM_TUTORIAL_DATA

This tutorial walks through loading the canonical public mESC Micro-C chr19 dataset from 4DNucleome (accession 4DNFI9GMP7J3) using the `gunz_cm.loaders` API.

## Setup

```bash
mkdir -p ~/gunz_cm_tutorial_data
python scripts/download_tutorial_data.py --name mESC_microc_chr19_50kb \
    --target ~/gunz_cm_tutorial_data
export GUNZ_CM_TUTORIAL_DATA=~/gunz_cm_tutorial_data
```

## Learning Objectives

* Resolve a multi-resolution `.mcool` file via the helper.
* Inspect the available zoom levels (`get_resolutions`).
* Load at the finest 50 kb resolution on chr19 with KR balancing.

import sys
from pathlib import Path
from gunz_cm.loaders import load_cm_data, get_resolutions
from gunz_cm.consts import Balancing, DataStructure
# Walk up from the current directory until the gunz-cm repo root is found.
_probe = Path.cwd().resolve()
_root = None
while _probe != _probe.parent:
    if (_probe / "notebooks" / "_tutorial_data.py").is_file() and (_probe / "pyproject.toml").is_file():
        _root = _probe
        break
    _probe = _probe.parent
if _root is None:
    raise RuntimeError(f"cannot find gunz-cm repo root from cwd {Path.cwd()}")
# Make the notebooks/ helper module importable.
sys.path.insert(0, str(_root / "notebooks"))
from _tutorial_data import load_tutorial_dataset, TutorialDataError
try:
    mcool_path = load_tutorial_dataset("mESC_microc_chr19_50kb")
    print(f"Using real dataset: {mcool_path}")
except TutorialDataError as exc:
    print(f"Skip: {exc}")
    mcool_path = None
Skip: dataset 'mESC_microc_chr19_50kb' expected at /home/adhisant/gunz_cm_tutorial_data/4DNFI9GMP7J3.mcool but file is missing; run scripts/download_tutorial_data.py --name mESC_microc_chr19_50kb
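
The skip message above suggests the helper looks for `4DNFI9GMP7J3.mcool` inside the directory named by `GUNZ_CM_TUTORIAL_DATA`. A minimal pre-flight check, assuming exactly that layout, might look like the sketch below. The function name `check_tutorial_file` is hypothetical and is not part of gunz_cm; it only uses the standard library.

```python
import os
from pathlib import Path


def check_tutorial_file(filename, env_var="GUNZ_CM_TUTORIAL_DATA", environ=None):
    """Return (path, problem); problem is None when the file is present.

    Hypothetical helper: assumes the dataset sits directly inside the
    directory named by the environment variable.
    """
    environ = os.environ if environ is None else environ
    root = environ.get(env_var)
    if not root:
        return None, f"{env_var} is not set"
    path = Path(root).expanduser() / filename
    if not path.is_file():
        return path, f"{path} is missing"
    return path, None


if __name__ == "__main__":
    path, problem = check_tutorial_file("4DNFI9GMP7J3.mcool")
    print(problem or f"found {path}")
```

If the check reports a missing file, rerunning the download command from the Setup section should resolve it.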
## 1. List the available zoom levels
if mcool_path is None:
    print("Skipping: dataset not downloaded")
else:
    resolutions = sorted(get_resolutions(mcool_path))
    print(f"Zoom levels: {resolutions}")
    print(f"Finest: {min(resolutions):,} bp; Coarsest: {max(resolutions):,} bp; count={len(resolutions)}")
Skipping: dataset not downloaded
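
For context, a multi-resolution `.mcool` file keeps each zoom level in its own HDF5 group named `/resolutions/<bin size in bp>`. The sketch below parses such group paths into sorted bin sizes, which is roughly the result a call like `get_resolutions` has to produce. It is not the gunz_cm implementation: `parse_resolutions` is a hypothetical name, and the group paths are made-up example values rather than the contents of the tutorial file.

```python
def parse_resolutions(group_paths):
    """Extract sorted bin sizes (bp) from '/resolutions/<bp>' group paths."""
    sizes = []
    for path in group_paths:
        parts = path.strip("/").split("/")
        # Keep only well-formed '/resolutions/<integer>' entries.
        if len(parts) == 2 and parts[0] == "resolutions" and parts[1].isdigit():
            sizes.append(int(parts[1]))
    return sorted(sizes)


# Illustrative group paths only; a real file may expose different zoom levels.
# With the cooler package installed, cooler.fileops.list_coolers(path) is one
# way to obtain such paths from an actual file.
example_groups = ["/resolutions/100000", "/resolutions/50000", "/resolutions/500000"]
resolutions = parse_resolutions(example_groups)
print(f"Finest: {min(resolutions):,} bp; Coarsest: {max(resolutions):,} bp")
```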
## 2. Load the 50 kb KR-balanced chr19 contact matrix
if mcool_path is None:
    print("Skipping: dataset not downloaded")
else:
    cm_df = load_cm_data(
        fpath=mcool_path,
        bin_size_bp=50_000,
        region1="chr19",
        region2="chr19",
        balancing=Balancing.KR,
        output_format=DataStructure.COO,
    )
    print(f"Loaded: shape={cm_df.shape}, nnz={cm_df.nnz}")
    print(f"  count range: [{cm_df.data.min()}, {cm_df.data.max()}]")
Skipping: dataset not downloaded
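
Once the dataset is available, the printed shape can be sanity-checked by hand: at a fixed bin size, a chromosome of length L bp spans ceil(L / bin size) bins, so an intra-chromosomal matrix should be square with that side length. The sketch below assumes the data are aligned to mm10, where chr19 is 61,431,566 bp long. Both the assembly and the length are assumptions to verify against the chromosome sizes stored in the file, and `expected_bins` is a hypothetical helper.

```python
def expected_bins(chrom_length_bp, bin_size_bp):
    """Number of fixed-size bins needed to cover a chromosome (integer ceiling)."""
    return -(-chrom_length_bp // bin_size_bp)


# ASSUMPTION: mm10 chr19 length; check against the chromsizes stored in the file.
CHR19_LENGTH_BP = 61_431_566
n_bins = expected_bins(CHR19_LENGTH_BP, 50_000)
print(f"Expected shape: ({n_bins}, {n_bins})")
```

If `cm_df.shape` differs from this estimate, the file may use a different assembly, or the loader may trim or pad the region.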