gunz_cm.compressions

Compression codecs for GZCM v3 contact matrix tiles.

Benchmarks (GM12878 chr1 @ 50kb, tile_size=512, window=1Mb):

(*) cmc_zstd offers the best balance of compression ratio and conversion speed.

bsc_cmc: CMC transforms (binarization, diagonal transform) followed by BSC entropy coding. It matches CMC's compression ratio with faster access, which makes it the best overall codec for storage-constrained use.

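Examples

>>> import numpy as np
>>> from gunz_cm.compressions import CmcZstdEncoder, CmcZstdDecoder
>>> # Placeholder input (assumed): an upper-triangular 512x512 tile of uint32 counts
>>> tile_data = np.triu(np.random.randint(0, 10, size=(512, 512), dtype=np.uint32))
>>> encoder = CmcZstdEncoder(tile_size=512)
>>> encoded = encoder.encode_tile(tile_data)
>>> decoder = CmcZstdDecoder(tile_size=512)
>>> decoded = decoder.decode_tile(encoded)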
class gunz_cm.compressions.BscCmcDecoder(tile_size: int = 512, resolution: int = 50000, dtype: numpy.dtype = numpy.uint32, diag_mode: int = 0)

Bases: object

BSC + CMC Transforms decoder for contact matrix tiles.

Decodes BSC-compressed data that was encoded with CMC transforms. Reverses BSC entropy coding then CMC’s domain-specific transforms.

Parameters:
  • tile_size (int, default=512) – Tile size for block processing.

  • resolution (int, default=50000) – Hi-C resolution in bp.

  • dtype (np.dtype, default=np.uint32) – Data type for decoded tiles.

decode_tile(payload: bytes) → ndarray

Decode a single compressed tile.

Parameters:

payload (bytes) – Compressed bitstream (shape info + encoded data).

Returns:

Decoded contact matrix tile.

Return type:

np.ndarray

decode_tiles(payloads: list[bytes]) → ndarray

Decode multiple tiles into a 4D array.

Parameters:

payloads (list[bytes]) – List of encoded bitstreams.

Returns:

4D array of decoded tiles (n_tile_rows, n_tile_cols, tile_size, tile_size).

Return type:

np.ndarray

class gunz_cm.compressions.BscCmcEncoder(tile_size: int = 512, resolution: int = 50000, level: int = 3, diag_mode: int = 0)

Bases: object

BSC + CMC Transforms encoder for contact matrix tiles.

Applies CMC’s domain-specific transforms (diagonal transform, binarization) before BSC entropy coding. Combines BSC’s speed with CMC’s structured transforms.

Parameters:
  • tile_size (int, default=512) – Tile size for block processing.

  • resolution (int, default=50000) – Hi-C resolution in bp.

  • level (int, default=3) – BSC compression level (0-9, higher = better compression).

Examples
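
The two transforms named in the class description can be illustrated on a toy tile. This is a sketch of one common reading of the terms, not gunz_cm's implementation: the "diagonal transform" is assumed to regroup an upper-triangular tile so that each diagonal becomes one row, and "binarization" is assumed to split integer counts into bit planes. All function names are hypothetical.

```python
def diagonal_transform(mat):
    """Regroup an upper-triangular square matrix by diagonals.

    Row k of the result holds diagonal k, i.e. the entries mat[i][i + k].
    """
    n = len(mat)
    return [[mat[i][i + k] for i in range(n - k)] for k in range(n)]


def inverse_diagonal_transform(diags):
    """Rebuild the upper-triangular matrix; the lower triangle is zero."""
    n = len(diags)
    mat = [[0] * n for _ in range(n)]
    for k, diag in enumerate(diags):
        for i, value in enumerate(diag):
            mat[i][i + k] = value
    return mat


def binarize(values, n_bits):
    """Split non-negative integers into n_bits bit planes (LSB first)."""
    return [[(v >> b) & 1 for v in values] for b in range(n_bits)]


def debinarize(planes):
    """Recombine bit planes (LSB first) into integers."""
    return [sum(bit << b for b, bit in enumerate(bits)) for bits in zip(*planes)]


# Toy 4x4 upper-triangular tile (assumed input, not real Hi-C data).
tile = [
    [9, 4, 1, 0],
    [0, 8, 3, 1],
    [0, 0, 7, 5],
    [0, 0, 0, 6],
]
diags = diagonal_transform(tile)         # main diagonal first
planes = binarize(diags[0], n_bits=4)    # bit planes of the main diagonal
```

Grouping by diagonal is a natural fit for Hi-C data because contact counts tend to decay with genomic distance, so entries on the same diagonal have similar magnitudes.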

encode_tile(mat: ndarray) → bytes

Encode a single contact matrix tile.

Parameters:

mat (np.ndarray) – 2D contact matrix tile (upper triangular).

Returns:

Compressed bitstream (shape info + encoded data).

Return type:

bytes

Examples
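
The return value is described as shape info plus encoded data, but the byte layout is not documented here. The sketch below assumes a hypothetical layout (two little-endian uint32 values for rows and columns, followed by a compressed body) purely to show how such a payload can be self-describing. `pack_tile` and `unpack_tile` are illustrative names, and zlib stands in for the real codec.

```python
import struct
import zlib

HEADER = struct.Struct("<II")  # hypothetical header: n_rows, n_cols


def pack_tile(rows):
    """Serialize equal-length rows of integers in the range [0, 2**32)."""
    n_rows, n_cols = len(rows), len(rows[0])
    flat = [v for row in rows for v in row]
    body = struct.pack(f"<{len(flat)}I", *flat)
    # Shape header first, then the compressed values.
    return HEADER.pack(n_rows, n_cols) + zlib.compress(body, 3)


def unpack_tile(payload):
    """Read the shape header, then decompress and reshape the body."""
    n_rows, n_cols = HEADER.unpack_from(payload, 0)
    body = zlib.decompress(payload[HEADER.size:])
    flat = struct.unpack(f"<{n_rows * n_cols}I", body)
    return [list(flat[r * n_cols:(r + 1) * n_cols]) for r in range(n_rows)]


tile = [[5, 2, 0], [0, 4, 1], [0, 0, 3]]
payload = pack_tile(tile)
restored = unpack_tile(payload)
```

Carrying the shape inside the payload lets a decoder handle edge tiles that are smaller than tile_size without consulting external metadata.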

encode_tiles(tiles: ndarray) → list[bytes]

Encode multiple tiles.

Parameters:

tiles (np.ndarray) – 4D array of shape (n_tile_rows, n_tile_cols, tile_size, tile_size).

Returns:

List of encoded bitstreams, one per tile.

Return type:

list[bytes]

Examples
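
The 4D input layout can be pictured as a grid of tiles. The sketch below uses nested lists instead of NumPy so that it runs without dependencies: it splits a square matrix into the (n_tile_rows, n_tile_cols, tile_size, tile_size) layout and flattens the grid to one entry per tile. The row-major order of the flattened list and the helper names are assumptions, not documented behavior.

```python
def split_into_tiles(mat, tile_size):
    """Return tiles[r][c][i][j] == mat[r * tile_size + i][c * tile_size + j]."""
    n = len(mat)
    if n % tile_size:
        raise ValueError("matrix size must be a multiple of tile_size")
    n_tiles = n // tile_size
    return [
        [
            [row[c * tile_size:(c + 1) * tile_size]
             for row in mat[r * tile_size:(r + 1) * tile_size]]
            for c in range(n_tiles)
        ]
        for r in range(n_tiles)
    ]


def flatten_tiles(tiles):
    """List the tiles in row-major order, one entry per tile."""
    return [tile for tile_row in tiles for tile in tile_row]


# 4x4 matrix whose entries encode their own position (value = 4 * row + col).
mat = [[4 * i + j for j in range(4)] for i in range(4)]
tiles = split_into_tiles(mat, tile_size=2)
flat = flatten_tiles(tiles)
```

With NumPy, the same layout can be obtained from a square array with `mat.reshape(n_tiles, tile_size, n_tiles, tile_size).swapaxes(1, 2)`.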

get_compression_info() → dict

Return compression metadata.

Returns:

Compression parameters for header.

Return type:

dict

Examples
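
The keys of the returned dict are not listed in this reference. As a hedged illustration only, the sketch below builds a dict from the constructor parameters shown in the class signature and serializes it to JSON bytes, one plausible way such metadata could be stored in a file header. The `codec` key, the JSON encoding, and both helper names are assumptions.

```python
import json


def make_compression_info(tile_size=512, resolution=50000, level=3, diag_mode=0):
    """Hypothetical metadata dict mirroring the BscCmcEncoder constructor."""
    return {
        "codec": "bsc_cmc",  # assumed key and value
        "tile_size": tile_size,
        "resolution": resolution,
        "level": level,
        "diag_mode": diag_mode,
    }


def info_to_header_bytes(info):
    """Serialize deterministically so identical settings give identical bytes."""
    return json.dumps(info, sort_keys=True, separators=(",", ":")).encode("utf-8")


info = make_compression_info()
header = info_to_header_bytes(info)
restored = json.loads(header.decode("utf-8"))
```

Storing the encoder settings next to the data means a decoder can be constructed with matching parameters without the caller having to remember them.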

class gunz_cm.compressions.BscDecoder(tile_size: int = 512, resolution: int = 50000, dtype: numpy.dtype = numpy.uint32)

Bases: object

BSC decoder for contact matrix tiles.

Uses bsc CLI subprocess for true BSC (Block Sorting Compression) decompression.

Parameters:
  • tile_size (int, default=512) – Tile size for block processing.

  • resolution (int, default=50000) – Hi-C resolution in bp.

  • dtype (np.dtype, default=np.uint32) – Data type for decoded tiles.

decode_tile(payload: bytes) → ndarray

Decode a single BSC-compressed tile.

Parameters:

payload (bytes) – BSC-compressed bitstream.

Returns:

Decoded contact matrix tile.

Return type:

np.ndarray

decode_tiles(payloads: list[bytes]) → ndarray

Decode multiple tiles into a 4D array.

Parameters:

payloads (list[bytes]) – List of encoded bitstreams.

Returns:

4D array of decoded tiles (n_tile_rows, n_tile_cols, tile_size, tile_size).

Return type:

np.ndarray

class gunz_cm.compressions.BscEncoder(tile_size: int = 512, resolution: int = 50000, level: int = 3)

Bases: object

BSC encoder for contact matrix tiles.

Uses bsc CLI subprocess for true BSC (Block Sorting Compression).

Parameters:
  • tile_size (int, default=512) – Tile size for block processing.

  • resolution (int, default=50000) – Hi-C resolution in bp.

  • level (int, default=3) – Compression level (0-9, higher = better compression).

Examples
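
Because the encoder shells out to the bsc command-line tool, a call could look roughly like the sketch below. It assumes a `bsc` executable on PATH that accepts the `bsc e input output` and `bsc d input output` forms of the libbsc command-line tool. How gunz_cm actually invokes it (options, temp-file handling, level mapping) is not shown in this reference, and the helper names are hypothetical.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def bsc_command(mode, src, dst, exe="bsc"):
    """Build the argument list; mode is 'e' (encode) or 'd' (decode)."""
    if mode not in ("e", "d"):
        raise ValueError("mode must be 'e' or 'd'")
    return [exe, mode, str(src), str(dst)]


def run_bsc(data, mode):
    """Pass bytes through the bsc CLI using temporary files."""
    if shutil.which("bsc") is None:
        raise RuntimeError("bsc executable not found on PATH")
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = Path(tmp) / "in.bin", Path(tmp) / "out.bin"
        src.write_bytes(data)
        # check=True raises CalledProcessError if bsc exits with an error.
        subprocess.run(bsc_command(mode, src, dst), check=True,
                       stdout=subprocess.DEVNULL)
        return dst.read_bytes()


# Building the command does not require bsc to be installed.
cmd = bsc_command("e", "tile.raw", "tile.bsc")
```

Checking for the executable up front gives a clear error message instead of a FileNotFoundError from deep inside subprocess.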

encode_tile(mat: ndarray) → bytes

Encode a single contact matrix tile.

Parameters:

mat (np.ndarray) – 2D contact matrix tile.

Returns:

BSC-compressed bitstream.

Return type:

bytes

encode_tiles(tiles: ndarray) → list[bytes]

Encode multiple tiles.

Parameters:

tiles (np.ndarray) – 4D array of shape (n_tile_rows, n_tile_cols, tile_size, tile_size).

Returns:

List of encoded bitstreams, one per tile.

Return type:

list[bytes]

get_compression_info() → dict

Return compression metadata.

Returns:

Compression parameters for header.

Return type:

dict

class gunz_cm.compressions.CmcDecoder(tile_size: int = 256, resolution: int = 50000, diag_transform: bool = True)

Bases: object

CMC decoder for contact matrix tiles.

Parameters:
  • tile_size (int, default=256) – Tile size for block processing.

  • resolution (int, default=50000) – Hi-C resolution in bp.

  • diag_transform (bool, default=True) – Reverse diagonal transform after decoding.

decode_tile(payload: bytes) → ndarray

Decode a single CMC-encoded tile.

Parameters:

payload (bytes) – CMC-encoded bitstream.

Returns:

Decoded contact matrix tile.

Return type:

np.ndarray

decode_tiles(payloads: list[bytes]) → ndarray

Decode multiple tiles into a 4D array.

Parameters:

payloads (list[bytes]) – List of encoded bitstreams.

Returns:

4D array of decoded tiles (n_tile_rows, n_tile_cols, tile_size, tile_size).

Return type:

np.ndarray

class gunz_cm.compressions.CmcEncoder(tile_size: int = 256, resolution: int = 50000, diag_transform: bool = True)

Bases: object

CMC encoder for contact matrix tiles.

Parameters:
  • tile_size (int, default=256) – Tile size for block processing.

  • resolution (int, default=50000) – Hi-C resolution in bp.

  • diag_transform (bool, default=True) – Apply diagonal transform before encoding.

encode_tile(mat: ndarray) → bytes

Encode a single contact matrix tile.

Parameters:

mat (np.ndarray) – 2D contact matrix tile (upper triangular).

Returns:

CMC-encoded bitstream.

Return type:

bytes

encode_tiles(tiles: ndarray) → list[bytes]

Encode multiple tiles.

Parameters:

tiles (np.ndarray) – 4D array of shape (n_tile_rows, n_tile_cols, tile_size, tile_size).

Returns:

List of encoded bitstreams, one per tile.

Return type:

list[bytes]

get_compression_info() → dict

Return compression metadata.

Returns:

Compression parameters for header.

Return type:

dict

class gunz_cm.compressions.CmcZstdDecoder(tile_size: int = 256, resolution: int = 50000, dtype: numpy.dtype = numpy.uint32)

Bases: object

CMC Transforms + Zstd decoder for contact matrix tiles.

Uses Zstd decompression then reverses CMC’s domain-specific transforms.

Parameters:
  • tile_size (int, default=256) – Tile size for block processing.

  • resolution (int, default=50000) – Hi-C resolution in bp.

  • dtype (np.dtype, default=np.uint32) – Data type for decoded tiles.

decode_tile(payload: bytes) → ndarray

Decode a single compressed tile.

Parameters:

payload (bytes) – Compressed bitstream (shape info + encoded data).

Returns:

Decoded contact matrix tile.

Return type:

np.ndarray

decode_tiles(payloads: list[bytes]) → ndarray

Decode multiple tiles into a 4D array.

Parameters:

payloads (list[bytes]) – List of encoded bitstreams.

Returns:

4D array of decoded tiles (n_tile_rows, n_tile_cols, tile_size, tile_size).

Return type:

np.ndarray

class gunz_cm.compressions.CmcZstdEncoder(tile_size: int = 256, resolution: int = 50000, level: int = 3)

Bases: object

CMC Transforms + Zstd encoder for contact matrix tiles.

Applies CMC’s domain-specific transforms (diagonal transform, binarization), then compresses the result with Zstd for better compression and faster decoding.

Parameters:
  • tile_size (int, default=256) – Tile size for block processing.

  • resolution (int, default=50000) – Hi-C resolution in bp.

  • level (int, default=3) – Compression level (1-22 for zstd, 1-9 for zlib fallback).

encode_tile(mat: ndarray) → bytes

Encode a single contact matrix tile.

Parameters:

mat (np.ndarray) – 2D contact matrix tile (upper triangular).

Returns:

Compressed bitstream (shape info + encoded data).

Return type:

bytes

encode_tiles(tiles: ndarray) → list[bytes]

Encode multiple tiles.

Parameters:

tiles (np.ndarray) – 4D array of shape (n_tile_rows, n_tile_cols, tile_size, tile_size).

Returns:

List of encoded bitstreams, one per tile.

Return type:

list[bytes]

get_compression_info() → dict

Return compression metadata.

Returns:

Compression parameters for header.

Return type:

dict

class gunz_cm.compressions.ZstdDecoder(tile_size: int = 256, resolution: int = 50000, dtype: numpy.dtype = numpy.uint32, use_zstd: bool = True)

Bases: object

Zstd decoder for contact matrix tiles.

Parameters:
  • tile_size (int, default=256) – Tile size for block processing.

  • resolution (int, default=50000) – Hi-C resolution in bp.

  • dtype (np.dtype, default=np.uint32) – Data type for decoded tiles.

  • use_zstd (bool, default=True) – Use zstd if available, otherwise zlib fallback.

decode_tile(payload: bytes) → ndarray

Decode a single compressed tile.

Parameters:

payload (bytes) – Compressed bitstream.

Returns:

Decoded contact matrix tile.

Return type:

np.ndarray

decode_tiles(payloads: list[bytes]) → ndarray

Decode multiple tiles into a 4D array.

Parameters:

payloads (list[bytes]) – List of encoded bitstreams.

Returns:

4D array of decoded tiles (n_tile_rows, n_tile_cols, tile_size, tile_size).

Return type:

np.ndarray

class gunz_cm.compressions.ZstdEncoder(tile_size: int = 256, resolution: int = 50000, level: int = 3, use_zstd: bool = True)

Bases: object

Zstd encoder for contact matrix tiles.

Parameters:
  • tile_size (int, default=256) – Tile size for block processing.

  • resolution (int, default=50000) – Hi-C resolution in bp.

  • level (int, default=3) – Compression level (1-22 for zstd, 1-9 for zlib fallback).

  • use_zstd (bool, default=True) – Use zstd if available, otherwise zlib fallback.

Examples
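
The use_zstd flag and the two level ranges suggest a fallback pattern like the following. This sketch assumes the third-party `zstandard` package as the Zstd binding (the reference does not say which binding gunz_cm uses) and falls back to the standard-library zlib when it is missing. The function names and the level clamping are illustrative.

```python
import zlib

try:
    import zstandard  # assumed binding; optional third-party package
except ImportError:
    zstandard = None


def compress(data, level=3, use_zstd=True):
    """Return (codec_name, compressed_bytes)."""
    if use_zstd and zstandard is not None:
        level = max(1, min(level, 22))  # zstd range given in the parameter docs
        return "zstd", zstandard.ZstdCompressor(level=level).compress(data)
    level = max(1, min(level, 9))  # zlib range given in the parameter docs
    return "zlib", zlib.compress(data, level)


def decompress(codec, payload):
    """Invert compress(); the codec name selects the matching library."""
    if codec == "zstd":
        if zstandard is None:
            raise RuntimeError("zstd payload but zstandard is not installed")
        return zstandard.ZstdDecompressor().decompress(payload)
    return zlib.decompress(payload)


raw = bytes(range(16)) * 64
codec, payload = compress(raw, level=30)  # out-of-range level is clamped
```

Returning the codec name alongside the bytes matters because a payload written with the zlib fallback cannot be read by a Zstd decompressor, so the choice has to be recorded somewhere.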

encode_tile(mat: ndarray) → bytes

Encode a single contact matrix tile.

Parameters:

mat (np.ndarray) – 2D contact matrix tile.

Returns:

Compressed bitstream.

Return type:

bytes

encode_tiles(tiles: ndarray) → list[bytes]

Encode multiple tiles.

Parameters:

tiles (np.ndarray) – 4D array of shape (n_tile_rows, n_tile_cols, tile_size, tile_size).

Returns:

List of encoded bitstreams, one per tile.

Return type:

list[bytes]

get_compression_info() → dict

Return compression metadata.

Returns:

Compression parameters for header.

Return type:

dict
