gunz_cm.datasets package

Submodules

gunz_cm.datasets.gnz module

Dataset for .gnz unified container.

class gunz_cm.datasets.gnz.GnzSparseDataset(fpath: str, window_size: int, output_type: str = 'sparse', downsample_ratio: float | Tuple[float, float] | None = None)

Bases: Dataset

Dataset that reads from a .gnz unified container file.

gunz_cm.datasets.hic module

PyTorch Dataset implementation for Fully Sparse Hi-C data loading. Supports on-the-fly binomial downsampling and genomic window indexing.

Examples
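
The genomic window indexing mentioned above can be illustrated with a minimal sketch. It does not use gunz_cm itself: `build_window_index`, the non-overlapping tiling, the dictionary keys, and the `(chrom, start, end)` blacklist tuples are all assumptions made for illustration, and the library's actual index may be built differently.

```python
def build_window_index(chrom_sizes, resolution, window_size, blacklist=()):
    """Tile each chromosome with non-overlapping windows (hypothetical helper).

    window_size and resolution are both in base pairs; windows that overlap
    a blacklisted interval are dropped.
    """
    if window_size % resolution != 0:
        raise ValueError("window_size must be a multiple of resolution")
    n_bins = window_size // resolution  # bins along one side of a patch
    index = []
    for chrom, size in chrom_sizes.items():
        # Only keep windows that fit entirely inside the chromosome.
        for start in range(0, size - window_size + 1, window_size):
            end = start + window_size
            blocked = any(
                c == chrom and s < end and e > start for c, s, e in blacklist
            )
            if not blocked:
                index.append(
                    {"chrom": chrom, "start": start, "end": end, "n_bins": n_bins}
                )
    return index


# Toy chromosome sizes, chosen only for the example.
chrom_sizes = {"chr1": 5_000_000, "chr2": 2_500_000}
index = build_window_index(chrom_sizes, resolution=10_000, window_size=1_000_000)
filtered = build_window_index(
    chrom_sizes,
    resolution=10_000,
    window_size=1_000_000,
    blacklist=[("chr1", 1_200_000, 1_300_000)],
)
```

With a 1 Mb window at 10 kb resolution each patch spans 100 bins per side, and the blacklisted interval removes exactly one chr1 window from the index.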

class gunz_cm.datasets.hic.HiCSparseDataset(fpath: str, resolution: int, window_size: int, blacklist: DataFrame | None = None, downsample_ratio: float | Tuple[float, float] | None = None, balancing: Balancing | None = Balancing.NONE, output_type: str = 'sparse', **kwargs)

Bases: Dataset

A PyTorch Dataset for on-the-fly loading of Hi-C patches from sparse files.

fpath : str

Path to the .hic or .mcool file.

resolution : int

The resolution (bin size, in bp) to load.

window_size : int

Size of the genomic window (patch) in base pairs (bp).

index : pd.DataFrame

The binned and filtered index of training windows.

downsample_ratio : float or tuple, optional

Ratio for binomial subsampling. If a tuple (min, max) is given, a random ratio in that range is sampled per item.

Examples
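
As a rough illustration of what binomial subsampling with `downsample_ratio` means, the sketch below thins a vector of contact counts with NumPy. It is not the library's code: `binomial_downsample` is a hypothetical helper, and drawing the per-item ratio uniformly from (min, max) is an assumption, since the distribution is not stated here.

```python
import numpy as np


def binomial_downsample(counts, ratio, rng):
    """Keep each read independently with probability `ratio` (hypothetical helper)."""
    if isinstance(ratio, tuple):
        # Assumption: the per-item ratio is drawn uniformly from [min, max).
        low, high = ratio
        ratio = rng.uniform(low, high)
    # Each count n becomes a draw from Binomial(n, ratio).
    return rng.binomial(counts, ratio)


rng = np.random.default_rng(0)
counts = np.array([10, 0, 3, 250], dtype=np.int64)

thinned_fixed = binomial_downsample(counts, 0.5, rng)          # fixed ratio
thinned_random = binomial_downsample(counts, (0.1, 0.9), rng)  # ratio sampled per call
```

Because each read is kept or dropped independently, a downsampled count never exceeds the original and zero entries stay zero, so sparsity is preserved.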

gunz_cm.datasets.hic.sparse_collate_fn(batch: List[Dict[str, Any]]) -> Dict[str, Any]

Collate function for Sparse Tensors (MinkowskiEngine style). Prepends batch index to coordinates.

Examples
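
The idea of prepending a batch index to coordinates can be sketched as follows. This is a NumPy stand-in rather than the actual `sparse_collate_fn`: the helper name `collate_sparse` and the `"coords"` and `"feats"` keys are assumptions, and the real function presumably operates on PyTorch tensors.

```python
import numpy as np


def collate_sparse(batch):
    """Merge per-item sparse samples into one batch (hypothetical helper).

    Each item holds `coords` of shape (N_i, D) and `feats` of shape (N_i, C).
    The output coordinates have shape (sum N_i, D + 1), with the batch index
    stored in the first column.
    """
    coords, feats = [], []
    for batch_idx, item in enumerate(batch):
        c = np.asarray(item["coords"])
        batch_col = np.full((c.shape[0], 1), batch_idx, dtype=c.dtype)
        coords.append(np.hstack([batch_col, c]))
        feats.append(np.asarray(item["feats"]))
    return {
        "coords": np.concatenate(coords, axis=0),
        "feats": np.concatenate(feats, axis=0),
    }


batch = [
    {"coords": np.array([[0, 1], [2, 3]]), "feats": np.array([[5.0], [1.0]])},
    {"coords": np.array([[4, 4]]), "feats": np.array([[2.0]])},
]
out = collate_sparse(batch)
```

Items with different numbers of nonzero entries can be batched this way without padding, because the first coordinate column records which sample each row came from.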

gunz_cm.datasets.memmap module

PyTorch Dataset implementation for memory-mapped Hi-C data loading. The data is stored uncompressed and read directly from disk, so no decompression is needed at load time.

class gunz_cm.datasets.memmap.MemmapSparseDataset(fpath: str, window_size: int, downsample_ratio: float | Tuple[float, float] | None = None, balancing: Balancing | None = None, hic_path: str | None = None, output_type: str = 'sparse')

Bases: Dataset

A PyTorch Dataset for fast loading of Hi-C patches from uncompressed memory-mapped files (.npdat).

fpath : str

Path to the .npdat file (or base path).

window_size : int

Size of the genomic window (patch) in base pairs (bp).

downsample_ratio : float or tuple, optional

Ratio for binomial subsampling.

Examples
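
To show why memory mapping avoids a decompression step, here is a minimal sketch using `numpy.memmap` on a toy file. The record layout (row bin, column bin, count) and the file name `toy.npdat` are assumptions for illustration only; the actual .npdat layout is not described here and may differ.

```python
import os
import tempfile

import numpy as np

# Hypothetical record layout: one (row bin, col bin, count) entry per nonzero contact.
record = np.dtype([("row", np.int32), ("col", np.int32), ("count", np.int32)])

# Write a tiny uncompressed file to a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "toy.npdat")
writer = np.memmap(path, dtype=record, mode="w+", shape=(4,))
writer[:] = np.array([(0, 0, 7), (0, 2, 1), (3, 3, 4), (5, 6, 2)], dtype=record)
writer.flush()
del writer  # close the writable mapping

# Reopen read-only: nothing is decoded, and pages are loaded lazily by the OS.
reader = np.memmap(path, dtype=record, mode="r")

# Select the contacts that fall inside the bin window [lo, hi) on both axes.
lo, hi = 0, 4
mask = (
    (reader["row"] >= lo) & (reader["row"] < hi)
    & (reader["col"] >= lo) & (reader["col"] < hi)
)
patch = np.array(reader[mask])  # copy the selected records into memory
```

Only the records selected for the patch are copied into memory; the rest of the file stays on disk until it is touched.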

Module contents

class gunz_cm.datasets.HiCSparseDataset(fpath: str, resolution: int, window_size: int, blacklist: DataFrame | None = None, downsample_ratio: float | Tuple[float, float] | None = None, balancing: Balancing | None = Balancing.NONE, output_type: str = 'sparse', **kwargs)

Bases: Dataset

A PyTorch Dataset for on-the-fly loading of Hi-C patches from sparse files.

fpath : str

Path to the .hic or .mcool file.

resolution : int

The resolution (bin size, in bp) to load.

window_size : int

Size of the genomic window (patch) in base pairs (bp).

index : pd.DataFrame

The binned and filtered index of training windows.

downsample_ratio : float or tuple, optional

Ratio for binomial subsampling. If a tuple (min, max) is given, a random ratio in that range is sampled per item.

gunz_cm.datasets.sparse_collate_fn(batch: List[Dict[str, Any]]) -> Dict[str, Any]

Collate function for Sparse Tensors (MinkowskiEngine style). Prepends batch index to coordinates.