Changelog#
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[2.8.0] - 2026-06-12#
Added#
Chromosome name normalization across the loaders API. Users can now pass region1="chr1" (UCSC) or region1="1" (Ensembl) regardless of whether the underlying file uses the chr prefix or not. The facade transparently normalizes the input to match the file’s actual chrom names before dispatching to the format-specific loader.
Design document at docs/design/specs/chrom-name-normalization.md with user stories, edge cases, and the 6 design decisions.
21 new tests in tests/loaders/test_chrom_name_normalization.py covering the chr-swap, no-op, interval preservation, error cases, format pass-through, and end-to-end facade integration.
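The chr-swap, no-op, and interval-preservation behaviors listed above can be illustrated with a minimal sketch. This is not the project's implementation: the helper name normalize_region, the "chrom:start-end" region syntax, and the choice to swap only numeric, X, and Y chromosome names are all assumptions made for illustration.

```python
from typing import Iterable


def normalize_region(region: str, file_chroms: Iterable[str]) -> str:
    """Hypothetical helper: rewrite `region` to use the file's chrom spelling.

    Assumes regions look like "chr1" or "chr1:1000-2000".
    """
    chroms = set(file_chroms)
    # Split off the optional interval so it can be re-attached untouched.
    chrom, sep, interval = region.partition(":")

    # No-op: the name already matches the file.
    if chrom in chroms:
        return region

    # Chr-swap: only for plain numeric, X, or Y names (an assumption here),
    # so mitochondrial names, random contigs, and case variants fall through.
    has_prefix = chrom.startswith("chr")
    bare = chrom[3:] if has_prefix else chrom
    if bare.isdigit() or bare in {"X", "Y"}:
        swapped = bare if has_prefix else "chr" + bare
        if swapped in chroms:
            return swapped + sep + interval

    raise ValueError(
        f"chromosome {chrom!r} not found; available: {sorted(chroms)}"
    )
```

Under these assumptions, normalize_region("chr1:1000-2000", ["1", "2", "X"]) yields "1:1000-2000", while a case variant such as "Chr1" raises instead of being silently rewritten.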
Fixed#
cool_loader.get_chrom_infos was failing on cooler 0.10.4 because it used cooler.is_mcool() which doesn’t exist in that version. Replaced with extension-based detection plus a fallback for single-resolution files. This was a pre-existing bug that surfaced during v2.8.0 development.
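As a rough illustration of extension-based detection with a single-resolution fallback, the sketch below builds a cooler URI from a file path. The function name cooler_uri and its error handling are hypothetical; only the ::/resolutions/<resolution> group layout of multi-resolution files follows the usual cooler convention. The actual fix in cool_loader may be structured differently.

```python
from pathlib import Path
from typing import Optional, Union


def cooler_uri(fpath: Union[str, Path], resolution: Optional[int] = None) -> str:
    """Hypothetical helper: pick a cooler URI based on the file extension."""
    path = Path(fpath)

    # Multi-resolution files are recognized by extension alone (assumption),
    # and each resolution lives under its own group inside the file.
    if path.suffix.lower() == ".mcool":
        if resolution is None:
            raise ValueError("a resolution is required for .mcool files")
        return f"{path}::/resolutions/{resolution}"

    # Fallback: treat anything else as a single-resolution file and
    # address it by its plain path.
    return str(path)
```

This avoids calling any library helper to classify the file, which is the property the fix needed on cooler 0.10.4.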
Notes#
Not normalized (out of scope per the design): chrM/MT/M (mitochondrial), chr1_KI270706v1_random (random contigs), GenBank accessions, case variants (Chr1 ≠ chr1). Users should use get_chrom_infos(fpath) to see the actual file chrom names.
Round-trip behavior: ContactMatrix.chromosome1 returns the file’s actual chrom name (not the user’s input). The name is not converted back to the user’s form on output; the file’s name is passed through unchanged.
[2.7.0] - 2026-06-12#
Added#
Loaders API: region1 and region2 are now truly optional at all four layers (facade, format-specific loaders, CLI, library). Previously, the facade accepted None but the 4 format-specific loaders all required str, causing a confusing pydantic ValidationError when calling load_cm_data(file, resolution) without a region.
Per-format full-genome support matrix:
  HIC: rejects region1=None with UnsupportedLoaderFeatureError (no safe full-genome API in hictkpy/hicstraw/straw)
  COOLER: implements per-chromosome iteration with a memory warning if estimated load > 1 GB
  CSV/COO/MCOO: rejects with UnsupportedLoaderFeatureError (format has no chromosome column)
  GINTERACTIONS: implements full-genome by dropping the chr1 filter; ContactMatrix.chromosome1 = "ALL"
  PICKLE/NPY/MEMMAP: no change (already optional before v2.7.0)
Facade-level normalization: region1="ALL"/"all"/"All"/"" are silently normalized to None. This prevents the confusing “chromosome ALL not found” error.
CLI dual-discovery helper: when a region1-related error occurs in load-data, to-coo, to-bigmat, or to-gzcm, the CLI now automatically prints available chromosomes and resolutions (via get_chrom_infos and get_resolutions) to help the user recover. Opt-out with --no-discovery.
12 new tests in tests/loaders/test_region1_optional.py covering all per-format behaviors and the discovery helper.
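The facade-level normalization described above amounts to mapping a small set of sentinel spellings to None before dispatch. A minimal sketch, assuming a hypothetical helper named normalize_region1 (the real facade may implement this differently):

```python
from typing import Optional

# The exact spellings the changelog lists as full-genome sentinels.
_FULL_GENOME_SENTINELS = frozenset({"ALL", "all", "All", ""})


def normalize_region1(region1: Optional[str]) -> Optional[str]:
    """Hypothetical helper: map full-genome sentinels to None."""
    if region1 is None or region1 in _FULL_GENOME_SENTINELS:
        return None
    # Anything else is passed through for the format-specific loader.
    return region1
```

Doing this once in the facade means the format-specific loaders only ever see None or a concrete region, so none of them has to look up a chromosome literally named "ALL".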
Changed#
CLI surface: to-coo and to-gzcm changed from region1: str = typer.Argument(...) to region1: t.Optional[str] = typer.Option(None, ...). Existing scripts that pass region1 as a positional argument will need to switch to --region1.
docs/design/specs/loaders.md updated to document the v2.7.0 semantics and the format support matrix.
Notes#
NOT changed (deferred to a future release): chrom name normalization (chr1 vs 1). Needs separate design discussion per user feedback.
Pre-existing test failures (6 + 13) in compressions/, resolution_enhancements/, and visualizations/metrics/ are unchanged from v2.6.x.
2.6.3 - 2026-06-11#
Added#
CLI-level tests for all 6 gunz-cm converters subcommands (smoke test that catches import-chain regressions; would have caught the preprocs bug shipped in v2.6.2)
6 CLI-level tests for gunz-cm converters hic2cool (regression coverage for the v2.6.0 migration): help, dry-run, single-resolution roundtrip, multi-resolution invocation, corrupt-file error message, nproc no-op behavior
Tests build their own synthetic .hic files at test time using hictkpy (no binary fixtures committed to the repo)
Fixed#
preprocs/__init__.py referenced infer_mat_shape_coo (which does not exist; only the private _infer_mat_shape_coo is defined). This broke the entire converters CLI (every subcommand failed at module-import time with ImportError). Caught by the new test_all_commands_help smoke test. The bug was introduced by commit 333a1fb.
[2.6.2] - 2026-06-11#
Skipped (released as part of v2.6.3 — see v2.6.3 above for the preprocs fix).
2.6.1 - 2026-06-10#
Changed#
Cleanup: removed 135 per-file __version__ declarations across the codebase. Kept only pyproject.toml and src/gunz_cm/__init__.py as canonical sources (the standard pkg.__version__ convention). The 134 deleted lines were unused dead code that had drifted to 6+ different version strings.
Added AGENTS.md rule: do not add per-file or per-module __version__; if a module-internal version is genuinely needed, document it explicitly in the docstring and ask first.
2.6.0 - 2026-06-10#
Added#
gunz-cm converters hic2cool rewritten using hictkpy + cooler.create
.hic v8 and v9 support (was v6-v7 only)
Clear error diagnostic for corrupt-footered .hic files (names package, version, bug #41, workaround)
Source and attribution docstrings in the converter module
GZCM format user guide and improved Sphinx configuration
Changed#
conv extra is now empty; hic2cool PyPI package is no longer required
hictkpy and cooler (both already hard deps) handle hic-to-cool conversion in-process
README overhaul with professional structure
Fixed#
Constants ROW_IDS_COLNAME, COL_IDS_COLNAME, COUNTS_COLNAME were used in loaders/utils.py but not exported from gunz_cm.consts; added local definitions via DataFrameSpecs to avoid circular import
Balancing type cast in CLI-to-API bridge (converters.py)
4 loader tests that were silently skipped now have documented root causes
Handle partial tiles at boundary in _get_compressed_patch
Removed#
hic2cool from conv, all, all-gpu extras in pyproject.toml
2.5.0 - 2026-05-03#
Added#
GZCM v3 multi-codec compression with streaming decode (GzcmChunkedWriter, sparse matrix support)
BSC-CMC codec with diag_mode parameter
Comprehensive resolution benchmark across 5 codecs and 6 resolutions
Transform experiment documentation (bsc_cmc concluded as optimal)
Task tracking files for sprint management
Changed#
Adaptive tiling (no compression benefit found - exp6)
Multi-resolution provides no compression benefit (exp7)
Updated codec guide with 5kb benchmarks, resolution-dependent recommendations
GZCM format specs updated to v1/v2/v3
Fixed#
KR normalization formula corrected to (n/sum)^0.25
CLI warnings with proper stacklevel
Removed#
Deprecated GNZ loader (replaced by GZCM v1-v3 support)
2.4.0 - 2026-04-15#
Added#
GNZ v2 chunked loader with 41 passing tests
GNZ v2 and normalization performance benchmarks
GZCM v1-v3 support in converters (replacing GNZ converter)
CMC codec implementation
GNZ layout space-speed tradeoff benchmark
Changed#
Converters refactored to replace GNZ with GZCM support
Fixed#
rc_filters import path
2.3.0 - 2026-03-20#
Added#
Centralized masking logic
Points module refactoring
Python best practices guidelines
Ruff and MyPy CI/CD integration
Chunked loader verification tests
Changed#
Third-party code restructured into implementations/ and originals/
CLI migrated and expanded using Typer
Fixed#
Hanging tests skipped
Test import issues resolved
Removed#
Hanging tests that block CI
2.2.0 - 2026-03-01#
Added#
Reliability standards for data submodules
Standardized logging and exception hierarchy
Sphinx documentation with furo theme
Cloudflare deployment via GitHub Actions
Vectorized HiCSpector and Numba JIT preprocessing helpers
Changed#
Metrics, preprocs, and reconstructions refactored for reliability
Fixed#
Missing test dependencies breaking unittest discover
Exception handling in tests for newly integrated specific exceptions
2.1.0 - 2026-02-10#
Added#
Multi-chromosome parallelism
Modernized coo.py
Centromere fetcher from UCSC
Fully Sparse HiCSparseDataset with on-the-fly augmentation
Changed#
preprocs NameError and shape assumption fixes in points reconstruction
Fixed#
Test suite discovery failures due to missing optional dependencies
gunz-utils dependency error in CI
2.0.0 - 2026-01-15#
Added#
GEMINI.md guide with multi-backend and multi-data loading info
Initial project structure with loaders, preprocs, converters, metrics, reconstructions
Support for HIC, COOLER, CSV, MEMMAP formats
GPU-accelerated 3D reconstruction (MDS-based)
Resolution enhancement models
Pipeline architecture for workflow composition
GZCM v1/v2 format specifications
Changed#
Project restructured from gunz to gunz-cm