Changelog#

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[2.8.0] - 2026-06-12#

Added#

  • Chromosome name normalization across the loaders API. Users can now pass region1="chr1" (UCSC) or region1="1" (Ensembl) regardless of whether the underlying file uses the chr prefix or not. The facade transparently normalizes the input to match the file’s actual chrom names before dispatching to the format-specific loader.

  • Design document at docs/design/specs/chrom-name-normalization.md with user stories, edge cases, and the 6 design decisions.

  • 21 new tests in tests/loaders/test_chrom_name_normalization.py covering the chr-swap, no-op, interval preservation, error cases, format pass-through, and end-to-end facade integration.

Fixed#

  • cool_loader.get_chrom_infos failed on cooler 0.10.4 because it called cooler.is_mcool(), which does not exist in that version. Replaced with extension-based detection plus a fallback for single-resolution files. This pre-existing bug surfaced during v2.8.0 development.
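As a rough illustration of the extension-based detection described above, the following sketch decides from the file suffix whether a path is multi-resolution and builds a cooler URI accordingly. The function names (`looks_like_mcool`, `resolve_cooler_uri`) are hypothetical and are not taken from cool_loader; only the `path::/resolutions/<N>` URI layout is standard cooler convention.

```python
from pathlib import Path
from typing import Optional


def looks_like_mcool(fpath: str) -> bool:
    """Guess from the extension alone whether the file is multi-resolution."""
    return Path(fpath).suffix.lower() == ".mcool"


def resolve_cooler_uri(fpath: str, resolution: Optional[int] = None) -> str:
    """Return a URI that addresses one resolution inside the file.

    Hypothetical helper: multi-resolution files need a group path,
    single-resolution files fall back to the bare file path.
    """
    if looks_like_mcool(fpath):
        if resolution is None:
            raise ValueError("a resolution is required for .mcool files")
        return f"{fpath}::/resolutions/{resolution}"
    # Fallback for single-resolution .cool files: the root group holds the data.
    return fpath
```

Relying on the suffix avoids depending on helpers that only exist in some cooler versions, at the cost of misclassifying files that carry an unconventional extension.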

Notes#

  • Not normalized (out of scope per the design): chrM/MT/M (mitochondrial), chr1_KI270706v1_random (random contigs), GenBank accessions, case variants (Chr1 vs chr1). Users should use get_chrom_infos(fpath) to see the actual file chrom names.

  • Round-trip behavior: ContactMatrix.chromosome1 returns the file’s actual chrom name, not the user’s input. The name is passed through as stored in the file and is not converted back to the user’s form on output.
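A minimal sketch of the chr-prefix swap, assuming the file's chrom names are already available as a list (for example from get_chrom_infos). The function name `normalize_region`, the `_MITO_NAMES` set, and the use of `ValueError` are illustrative assumptions, not the facade's actual implementation.

```python
from typing import Sequence

# Assumption: mitochondrial aliases are never swapped, mirroring the
# out-of-scope note above.
_MITO_NAMES = frozenset({"chrM", "chrMT", "M", "MT"})


def normalize_region(region: str, file_chroms: Sequence[str]) -> str:
    """Rewrite the chrom part of `region` to match the file's naming.

    Any interval suffix (":start-end") is preserved unchanged.
    """
    chrom, sep, interval = region.partition(":")
    if chrom in file_chroms:
        return region  # no-op: the name already matches the file
    if chrom not in _MITO_NAMES:
        # Swap the prefix: "chr1" -> "1", "1" -> "chr1" (case-sensitive).
        swapped = chrom[3:] if chrom.startswith("chr") else "chr" + chrom
        if swapped in file_chroms:
            return swapped + sep + interval
    raise ValueError(
        f"chromosome {chrom!r} not found; file has {list(file_chroms)}"
    )
```

With this sketch, `normalize_region("1:1000-2000", ["chr1"])` yields `"chr1:1000-2000"`, while a case variant such as `"Chr1"` is rejected instead of being guessed at.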

[2.7.0] - 2026-06-12#

Added#

  • Loaders API: region1 and region2 are now truly optional at all four layers (facade, format-specific loaders, CLI, library). Previously, the facade accepted None but the 4 format-specific loaders all required str, causing a confusing pydantic ValidationError when calling load_cm_data(file, resolution) without a region.

  • Per-format full-genome support matrix:

    • HIC: rejects region1=None with UnsupportedLoaderFeatureError (no safe full-genome API in hictkpy/hicstraw/straw)

    • COOLER: implements per-chromosome iteration with a memory warning if estimated load > 1 GB

    • CSV/COO/MCOO: rejects with UnsupportedLoaderFeatureError (format has no chromosome column)

    • GINTERACTIONS: implements full-genome by dropping the chr1 filter; ContactMatrix.chromosome1 = "ALL"

    • PICKLE/NPY/MEMMAP: no change (already optional before v2.7.0)

  • Facade-level normalization: region1="ALL" / "all" / "All" / "" are silently normalized to None. This prevents the confusing “chromosome ALL not found” error.

  • CLI dual-discovery helper: when a region1-related error occurs in load-data, to-coo, to-bigmat, or to-gzcm, the CLI now automatically prints available chromosomes and resolutions (via get_chrom_infos and get_resolutions) to help the user recover. Opt out with --no-discovery.

  • 12 new tests in tests/loaders/test_region1_optional.py covering all per-format behaviors and the discovery helper.
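The sentinel normalization and the per-format matrix above could be sketched roughly as follows. The helper names (`normalize_region1`, `check_full_genome`), the `FULL_GENOME_SUPPORT` table, and the local `UnsupportedLoaderFeatureError` stand-in class are assumptions made for illustration; the real facade may structure this differently.

```python
from typing import Optional

# Spellings listed in the changelog entry; anything else is left untouched.
_FULL_GENOME_SENTINELS = frozenset({"ALL", "all", "All", ""})

# True means the format can load without a region (per the matrix above).
FULL_GENOME_SUPPORT = {
    "HIC": False, "COOLER": True, "CSV": False, "COO": False, "MCOO": False,
    "GINTERACTIONS": True, "PICKLE": True, "NPY": True, "MEMMAP": True,
}


class UnsupportedLoaderFeatureError(Exception):
    """Local stand-in for the library's exception of the same name."""


def normalize_region1(region1: Optional[str]) -> Optional[str]:
    """Map the "whole genome" spellings to None; pass everything else through."""
    if region1 is None or region1 in _FULL_GENOME_SENTINELS:
        return None
    return region1


def check_full_genome(fmt: str, region1: Optional[str]) -> Optional[str]:
    """Normalize region1, then reject formats that cannot load a full genome."""
    region1 = normalize_region1(region1)
    if region1 is None and not FULL_GENOME_SUPPORT[fmt]:
        raise UnsupportedLoaderFeatureError(
            f"{fmt} does not support loading without region1"
        )
    return region1
```

Normalizing before the format check means `region1="ALL"` on a HIC file reports the unsupported feature instead of a "chromosome ALL not found" lookup failure.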

Changed#

  • CLI surface: to-coo and to-gzcm changed from region1: str = typer.Argument(...) to region1: t.Optional[str] = typer.Option(None, ...). Existing scripts that pass region1 as a positional argument will need to switch to --region1.

  • docs/design/specs/loaders.md updated to document the v2.7.0 semantics and the format support matrix.

Notes#

  • NOT changed (deferred to a future release): chrom name normalization (chr1 vs 1). Needs separate design discussion per user feedback.

  • Pre-existing test failures (6 + 13) in compressions/, resolution_enhancements/, and visualizations/metrics/ are unchanged from v2.6.x.

2.6.3 - 2026-06-11#

Added#

  • CLI-level tests for all 6 gunz-cm converters subcommands (smoke test that catches import-chain regressions; would have caught the preprocs bug shipped in v2.6.2)

  • 6 CLI-level tests for gunz-cm converters hic2cool (regression coverage for the v2.6.0 migration): help, dry-run, single-resolution roundtrip, multi-resolution invocation, corrupt-file error message, nproc no-op behavior

  • Tests build their own synthetic .hic files at test time using hictkpy (no binary fixtures committed to the repo)

Fixed#

  • preprocs/__init__.py referenced infer_mat_shape_coo (which does not exist; only the private _infer_mat_shape_coo is defined). This broke the entire converters CLI (every subcommand failed at module-import time with ImportError). Caught by the new test_all_commands_help smoke test. The bug was introduced by commit 333a1fb.

[2.6.2] - 2026-06-11#

Skipped (released as part of v2.6.3; see the v2.6.3 entry above for the preprocs fix).

2.6.1 - 2026-06-10#

Changed#

  • Cleanup: removed 135 per-file __version__ declarations across the codebase. Kept only pyproject.toml and src/gunz_cm/__init__.py as canonical sources (the standard pkg.__version__ convention). The deleted lines were unused dead code that had drifted to 6+ different version strings.

  • Added AGENTS.md rule: do not add per-file or per-module __version__; if a module-internal version is genuinely needed, document it explicitly in the docstring and ask first.

2.6.0 - 2026-06-10#

Added#

  • gunz-cm converters hic2cool rewritten using hictkpy + cooler.create

  • .hic v8 and v9 support (was v6-v7 only)

  • Clear error diagnostic for corrupt-footered .hic files (names package, version, bug #41, workaround)

  • Source and attribution docstrings in the converter module

  • GZCM format user guide and improved Sphinx configuration

Changed#

  • conv extra is now empty; hic2cool PyPI package is no longer required

  • hictkpy and cooler (both already hard deps) handle hic-to-cool conversion in-process

  • README overhaul with professional structure

Fixed#

  • Constants ROW_IDS_COLNAME, COL_IDS_COLNAME, COUNTS_COLNAME were used in loaders/utils.py but not exported from gunz_cm.consts; added local definitions via DataFrameSpecs to avoid circular import

  • Balancing type cast in CLI-to-API bridge (converters.py)

  • 4 loader tests that were silently skipped now have documented root causes

  • Handle partial tiles at boundary in _get_compressed_patch

Removed#

  • hic2cool from conv, all, all-gpu extras in pyproject.toml

2.5.0 - 2026-05-03#

Added#

  • GZCM v3 multi-codec compression with streaming decode (GzcmChunkedWriter, sparse matrix support)

  • BSC-CMC codec with diag_mode parameter

  • Comprehensive resolution benchmark across 5 codecs and 6 resolutions

  • Transform experiment documentation (bsc_cmc concluded as optimal)

  • Task tracking files for sprint management

Changed#

  • Adaptive tiling (no compression benefit found - exp6)

  • Multi-resolution provides no compression benefit (exp7)

  • Updated codec guide with 5kb benchmarks, resolution-dependent recommendations

  • GZCM format specs updated to v1/v2/v3

Fixed#

  • KR normalization formula corrected to (n/sum)^0.25

  • CLI warnings with proper stacklevel

Removed#

  • Deprecated GNZ loader (replaced by GZCM v1-v3 support)

2.4.0 - 2026-04-15#

Added#

  • GNZ v2 chunked loader with 41 passing tests

  • GNZ v2 and normalization performance benchmarks

  • GZCM v1-v3 support in converters (replacing GNZ converter)

  • CMC codec implementation

  • GNZ layout space-speed tradeoff benchmark

Changed#

  • Converters refactored to replace GNZ with GZCM support

Fixed#

  • rc_filters import path

2.3.0 - 2026-03-20#

Added#

  • Centralized masking logic

  • Points module refactoring

  • Python best practices guidelines

  • Ruff and MyPy CI/CD integration

  • Chunked loader verification tests

Changed#

  • Third-party code restructured into implementations/ and originals/

  • CLI migrated and expanded using Typer

Fixed#

  • Hanging tests skipped

  • Test import issues resolved

Removed#

  • Hanging tests that block CI

2.2.0 - 2026-03-01#

Added#

  • Reliability standards for data submodules

  • Standardized logging and exception hierarchy

  • Sphinx documentation with furo theme

  • Cloudflare deployment via GitHub Actions

  • Vectorized HiCSpector and Numba JIT preprocessing helpers

Changed#

  • Metrics, preprocs, and reconstructions refactored for reliability

Fixed#

  • Missing test dependencies breaking unittest discover

  • Exception handling in tests for newly integrated specific exceptions

2.1.0 - 2026-02-10#

Added#

  • Multi-chromosome parallelism

  • Modernized coo.py

  • Centromere fetcher from UCSC

  • Fully Sparse HiCSparseDataset with on-the-fly augmentation

Changed#

  • preprocs NameError and shape assumption fixes in points reconstruction

Fixed#

  • Test suite discovery failures due to missing optional dependencies

  • gunz-utils dependency error in CI

2.0.0 - 2026-01-15#

Added#

  • GEMINI.md guide with multi-backend and multi-data loading info

  • Initial project structure with loaders, preprocs, converters, metrics, reconstructions

  • Support for HIC, COOLER, CSV, MEMMAP formats

  • GPU-accelerated 3D reconstruction (MDS-based)

  • Resolution enhancement models

  • Pipeline architecture for workflow composition

  • GZCM v1/v2 format specifications

Changed#

  • Project restructured from gunz to gunz-cm