gunz_cm.samplers package

Submodules

gunz_cm.samplers.spatial module

Samplers for spatial data locality optimization.

class gunz_cm.samplers.spatial.SpatialBatchSampler(dataset_index: DataFrame, batch_size: int, block_size: int = 128, shuffle: bool = True, drop_last: bool = False)

Bases: Sampler[List[int]]

A batch sampler that yields mini-batches of indices ordered by spatial proximity, so that consecutive reads touch nearby regions of compressed files and cache hits are maximized.
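To illustrate why spatial ordering helps, the sketch below simulates a small LRU cache of decompressed chunks and counts cache misses for random versus sorted access order. It is not part of gunz_cm: the (chrom, start) record layout, CHUNK_SIZE, and the cache capacity are hypothetical assumptions chosen only for illustration.

```python
import random
from collections import OrderedDict

# Hypothetical assumption: one compressed chunk covers CHUNK_SIZE coordinate units.
CHUNK_SIZE = 10_000


def chunk_of(record):
    """Map a hypothetical (chrom, start) record to the chunk it would live in."""
    chrom, start = record
    return (chrom, start // CHUNK_SIZE)


def count_misses(records, cache_capacity=4):
    """Count chunk decompressions under an LRU cache holding `cache_capacity` chunks."""
    cache = OrderedDict()
    misses = 0
    for rec in records:
        key = chunk_of(rec)
        if key in cache:
            cache.move_to_end(key)  # mark chunk as most recently used
        else:
            misses += 1  # chunk must be decompressed
            cache[key] = True
            if len(cache) > cache_capacity:
                cache.popitem(last=False)  # evict least recently used chunk
    return misses


rng = random.Random(0)
records = [(c, rng.randrange(0, 1_000_000)) for c in ("chr1", "chr2") for _ in range(2_000)]
shuffled = records[:]
rng.shuffle(shuffled)

print("random order misses:", count_misses(shuffled))
print("sorted order misses:", count_misses(sorted(records)))
```

In sorted order every chunk is visited in one contiguous run, so each is decompressed exactly once and misses equal the number of distinct chunks; in random order most accesses miss.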

To maintain randomness for SGD, it performs ‘Block Shuffling’:

1. Sorts the dataset spatially.
2. Groups indices into ‘mega-blocks’ (e.g., 50-100 samples).
3. Shuffles the order of mega-blocks.
4. Yields sequential mini-batches from within each mega-block.

Examples
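A minimal pure-Python sketch of the block-shuffling steps described above, independent of PyTorch and of gunz_cm's actual implementation. The function name block_shuffled_batches, the list of (chrom, start) keys standing in for dataset_index, reading block_size as the number of samples per mega-block, the seed parameter, and applying drop_last per mega-block are all hypothetical assumptions.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def block_shuffled_batches(
    keys: Sequence[Tuple[str, int]],
    batch_size: int,
    block_size: int = 128,
    shuffle: bool = True,
    drop_last: bool = False,
    seed: int = 0,
) -> Iterator[List[int]]:
    """Hypothetical stand-in: yield lists of row indices via block shuffling.

    Assumes keys[i] is a sortable spatial key (e.g. (chrom, start)) for row i
    and that block_size counts samples per mega-block.
    """
    # 1. Sort row indices by their spatial key.
    order = sorted(range(len(keys)), key=lambda i: keys[i])
    # 2. Group the sorted indices into contiguous mega-blocks.
    blocks = [order[s:s + block_size] for s in range(0, len(order), block_size)]
    # 3. Shuffle the order of the mega-blocks, not their contents.
    if shuffle:
        random.Random(seed).shuffle(blocks)
    # 4. Yield sequential mini-batches from inside each mega-block.
    for block in blocks:
        for s in range(0, len(block), batch_size):
            batch = block[s:s + batch_size]
            if drop_last and len(batch) < batch_size:
                continue  # assumption: drop the short tail batch of each block
            yield batch


keys = [("chr1", 500), ("chr2", 10), ("chr1", 20), ("chr1", 300), ("chr2", 5)]
for batch in block_shuffled_batches(keys, batch_size=2, block_size=4, shuffle=False):
    print(batch)  # prints [2, 3], then [0, 4], then [1]
```

Because each batch is cut from inside a single mega-block, it stays spatially compact, while shuffling whole blocks still varies the batch order between epochs (here, by changing seed).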

Module contents

class gunz_cm.samplers.SpatialBatchSampler(dataset_index: DataFrame, batch_size: int, block_size: int = 128, shuffle: bool = True, drop_last: bool = False)

Package-level re-export of gunz_cm.samplers.spatial.SpatialBatchSampler; see that entry for details.