gunz_cm.datasets package
Submodules
gunz_cm.datasets.gnz module
Dataset for .gnz unified container.
gunz_cm.datasets.hic module
PyTorch Dataset implementation for Fully Sparse Hi-C data loading. Supports on-the-fly binomial downsampling and genomic window indexing.
Examples
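The on-the-fly binomial downsampling mentioned above can be illustrated with a small standalone sketch. It is not the package's implementation: the function name `binomial_downsample`, the dictionary-of-pixels representation, and the uniform draw used for a `(min, max)` tuple are all assumptions made for illustration, and only the standard library is used so the snippet runs anywhere.

```python
import random
from typing import Dict, Optional, Tuple, Union

Ratio = Union[float, Tuple[float, float]]
Pixels = Dict[Tuple[int, int], int]  # (row_bin, col_bin) -> contact count


def binomial_downsample(
    pixels: Pixels, ratio: Ratio, rng: Optional[random.Random] = None
) -> Pixels:
    """Thin each contact count with a Binomial(count, p) draw (hypothetical helper)."""
    rng = rng or random.Random()
    # Assumption: a (min, max) tuple means one ratio is drawn uniformly per item.
    p = rng.uniform(*ratio) if isinstance(ratio, tuple) else float(ratio)
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"ratio must lie in [0, 1], got {p}")
    out: Pixels = {}
    for key, count in pixels.items():
        # Binomial(count, p) as a sum of `count` Bernoulli(p) trials:
        # each observed contact is kept independently with probability p.
        kept = sum(rng.random() < p for _ in range(count))
        if kept:  # drop zero entries so the result stays sparse
            out[key] = kept
    return out


patch = {(0, 0): 120, (0, 1): 35, (1, 1): 98, (2, 5): 3}
thinned = binomial_downsample(patch, ratio=(0.1, 0.5), rng=random.Random(0))
```

With array-based data the same thinning is usually done in one vectorised call, for example NumPy's `Generator.binomial(counts, p)`.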
- class gunz_cm.datasets.hic.HiCSparseDataset(fpath: str, resolution: int, window_size: int, blacklist: DataFrame | None = None, downsample_ratio: float | Tuple[float, float] | None = None, balancing: Balancing | None = Balancing.NONE, output_type: str = 'sparse', **kwargs)
Bases: Dataset
A PyTorch Dataset for on-the-fly loading of Hi-C patches from sparse files.
- fpath : str
Path to the .hic or .mcool file.
- resolution : int
The resolution to load.
- window_size : int
Size of the genomic window (patch) in base pairs (bp).
- index : pd.DataFrame
The binnified and filtered index of training windows.
- downsample_ratio : float or tuple, optional
Ratio for binomial subsampling. If a tuple (min, max) is given, a random ratio is sampled per item.
Examples
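How a window index relates `resolution`, `window_size`, and a blacklist can be sketched as follows. This is a minimal illustration under stated assumptions, not the class's actual indexing logic: `build_window_index` is a hypothetical helper, windows are assumed to tile each chromosome without overlap, a window is assumed to be dropped if it overlaps any blacklisted interval, and plain tuples stand in for the DataFrame used by the real class.

```python
from typing import Dict, Iterable, List, Tuple

Window = Tuple[str, int, int]  # (chrom, start_bp, end_bp), end exclusive


def build_window_index(
    chrom_sizes: Dict[str, int],
    resolution: int,
    window_size: int,
    blacklist: Iterable[Window] = (),
) -> List[Window]:
    """Tile chromosomes into fixed-size windows and filter blacklisted ones."""
    if window_size % resolution != 0:
        raise ValueError("window_size must be a multiple of resolution")
    blacklist = list(blacklist)
    index: List[Window] = []
    for chrom, size in chrom_sizes.items():
        # Non-overlapping tiling; the trailing partial window is skipped
        # so that every patch has the same number of bins.
        for start in range(0, size - window_size + 1, window_size):
            end = start + window_size
            # Half-open interval overlap test against each blacklisted region.
            hit = any(c == chrom and s < end and e > start for c, s, e in blacklist)
            if not hit:
                index.append((chrom, start, end))
    return index


chrom_sizes = {"chr1": 5_000_000, "chr2": 2_500_000}  # toy sizes, not a real genome
index = build_window_index(
    chrom_sizes,
    resolution=10_000,
    window_size=1_000_000,
    blacklist=[("chr1", 1_200_000, 1_300_000)],
)
bins_per_window = 1_000_000 // 10_000  # patch side length in bins
```

Each surviving window corresponds to one dataset item, and `window_size // resolution` gives the side length of the patch in bins.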
gunz_cm.datasets.memmap module
PyTorch Dataset implementation for memory-mapped Hi-C data loading. Achieves high throughput by reading uncompressed data directly from disk, bypassing decompression.
Examples
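The idea of reading patches from an uncompressed, memory-mapped file can be sketched with the standard library alone. The record layout below (little-endian int32 triples of row bin, column bin, count) and the helper names are assumptions for illustration; the actual .npdat layout is not described here.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical layout: a flat sequence of (row_bin, col_bin, count) int32 triples.
RECORD = struct.Struct("<iii")


def write_records(path, records):
    """Write records back to back with no compression."""
    with open(path, "wb") as fh:
        for rec in records:
            fh.write(RECORD.pack(*rec))


def read_records(path, start, stop):
    """Read records [start, stop) straight from the mapped file."""
    with open(path, "rb") as fh, mmap.mmap(
        fh.fileno(), 0, access=mmap.ACCESS_READ
    ) as mm:
        # Slicing the map copies only the requested byte range; there is
        # no decompression step and no need to read the whole file.
        chunk = mm[start * RECORD.size : stop * RECORD.size]
    return list(RECORD.iter_unpack(chunk))


path = os.path.join(tempfile.mkdtemp(), "demo.bin")
records = [(0, 0, 12), (0, 1, 5), (1, 1, 9), (2, 5, 1)]
write_records(path, records)
middle = read_records(path, 1, 3)
```

Because records have a fixed size, the byte offset of any record is a simple multiplication, which is what makes random access cheap. For array data, `numpy.memmap` applies the same idea.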
- class gunz_cm.datasets.memmap.MemmapSparseDataset(fpath: str, window_size: int, downsample_ratio: float | Tuple[float, float] | None = None, balancing: Balancing | None = None, hic_path: str | None = None, output_type: str = 'sparse')
Bases: Dataset
A PyTorch Dataset for fast loading of Hi-C patches from uncompressed memory-mapped files (.npdat).
- fpath : str
Path to the .npdat file (or base path).
- window_size : int
Size of the genomic window (patch) in base pairs (bp).
- downsample_ratio : float or tuple, optional
Ratio for binomial subsampling.
Module contents
- class gunz_cm.datasets.HiCSparseDataset(fpath: str, resolution: int, window_size: int, blacklist: DataFrame | None = None, downsample_ratio: float | Tuple[float, float] | None = None, balancing: Balancing | None = Balancing.NONE, output_type: str = 'sparse', **kwargs)
Bases: Dataset
A PyTorch Dataset for on-the-fly loading of Hi-C patches from sparse files.
- fpath : str
Path to the .hic or .mcool file.
- resolution : int
The resolution to load.
- window_size : int
Size of the genomic window (patch) in base pairs (bp).
- index : pd.DataFrame
The binnified and filtered index of training windows.
- downsample_ratio : float or tuple, optional
Ratio for binomial subsampling. If a tuple (min, max) is given, a random ratio is sampled per item.