gunz_cm.loaders package
Submodules
gunz_cm.loaders.cool_loader module
- gunz_cm.loaders.cool_loader.get_assembly(fpath: str | Path, resolution: int) -> str
Retrieve the genome assembly information from a cooler file.
- fpath : str | pathlib.Path
File path of the cooler file.
- resolution : int
Resolution level to query from the file.
- str
The genome assembly name (e.g., “hg38”).
Examples
- gunz_cm.loaders.cool_loader.get_balancing_weights(fpath: str | Path, resolution: int, region: str, weight_name: str = 'weight') -> ndarray
Extracts a vector of balancing weights for a given region.
- fpath : str | pathlib.Path
File path of the cooler file (.cool or .mcool).
- resolution : int
The resolution to use.
- region : str
The chromosome or region to fetch balancing weights for (e.g., “chr1”).
- weight_name : str, optional
The name of the weight column in the bins table. Defaults to ‘weight’.
- np.ndarray
A NumPy array of balancing weights for the specified region. Returns an empty array if the weight column does not exist.
Examples
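The sketch below illustrates how a weight vector like the one returned here is typically used. It follows the cooler convention that a balanced value is the raw count multiplied by the weights of its two bins. The helper name `apply_weights` and the toy data are hypothetical and are not part of gunz_cm.

```python
import math


def apply_weights(pixels, weights):
    """Scale raw (row, col, count) pixels by the weights of both bins."""
    balanced = []
    for row, col, count in pixels:
        balanced.append((row, col, count * weights[row] * weights[col]))
    return balanced


# Toy data: NaN marks a bin that was masked out during balancing.
weights = [0.5, 2.0, math.nan]
pixels = [(0, 0, 8), (0, 1, 3), (1, 2, 5)]
balanced = apply_weights(pixels, weights)
```

Pixels that touch a masked bin come out as NaN, which is why balanced matrices often contain empty rows and columns.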
- gunz_cm.loaders.cool_loader.get_bins(fpath: str | Path, resolution: int) -> DataFrame
Retrieve the binnified index from a cooler file.
- fpath : str | pathlib.Path
Path to the cooler file.
- resolution : int
The resolution to use.
- pd.DataFrame
DataFrame with columns: ‘chrom’, ‘start’, ‘end’.
Examples
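As a rough illustration of what a binnified index contains, the sketch below builds (chrom, start, end) rows from chromosome lengths. It assumes fixed-width bins with the last bin of each chromosome truncated at the chromosome end. The function `binnify` and the toy lengths are hypothetical; the real function reads the bins from the file instead of computing them.

```python
def binnify(chrom_sizes, resolution):
    """Return (chrom, start, end) tuples covering each chromosome."""
    bins = []
    for chrom, length in chrom_sizes.items():
        for start in range(0, length, resolution):
            # The final bin is clipped so it never runs past the chromosome.
            bins.append((chrom, start, min(start + resolution, length)))
    return bins


bins = binnify({"chr1": 2500, "chr2": 1000}, resolution=1000)
```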
- gunz_cm.loaders.cool_loader.get_chrom_infos(fpath: str | Path) -> Dict[str, Dict[str, str | int]]
Extract chromosome information from a cooler file.
For .mcool files, the information is read from the coarsest resolution (the largest bin size).
- fpath : str | pathlib.Path
The file path to the cooler file (.cool or .mcool).
- t.Dict[str, t.Dict[str, t.Union[str, int]]]
A dictionary mapping chromosome names to their details.
Examples
- gunz_cm.loaders.cool_loader.get_resolutions(fpath: str | Path) -> List[int]
Retrieve available resolutions from a .mcool file.
- fpath : str | pathlib.Path
The file path to the multi-resolution cooler file (.mcool).
- t.List[int]
A list of integer resolutions available in the file.
- FileNotFoundError
If the specified fpath does not exist.
- LoaderError
If no resolutions can be found in the file.
Examples
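In the cooler multi-resolution layout, each resolution is stored under `/resolutions/<bin size>` and is addressed with a `path::/resolutions/N` URI. The sketch below picks the coarsest entry from a list of resolutions and builds that URI. The helper `cooler_uri`, the file name, and the resolution list are illustrative assumptions, not gunz_cm API.

```python
def cooler_uri(fpath, resolution=None):
    """Build a cooler URI; .mcool files need an explicit resolution."""
    fpath = str(fpath)
    if fpath.endswith(".mcool"):
        if resolution is None:
            raise ValueError("a resolution is required for .mcool files")
        return f"{fpath}::/resolutions/{resolution}"
    # Single-resolution .cool files are addressed by their path alone.
    return fpath


resolutions = [1000, 5000, 10000, 1000000]
coarsest = max(resolutions)  # largest bin size, i.e. lowest resolution
uri = cooler_uri("sample.mcool", coarsest)
```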
- gunz_cm.loaders.cool_loader.get_sparsity(fpath: str | Path, resolution: int, region1: str, region2: str | None = None) -> float
Calculates the sparsity of a cooler matrix for given regions.
Examples
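The documentation does not state how sparsity is defined. The sketch below assumes one common definition: the fraction of cells in the dense region1 x region2 rectangle that hold no contacts. The function `sparsity` and the toy pixels are hypothetical, and the library may count symmetric matrices differently.

```python
def sparsity(shape, pixels):
    """Fraction of zero cells in a dense matrix of the given shape."""
    n_rows, n_cols = shape
    total = n_rows * n_cols
    if total == 0:
        return 0.0
    # Only cells with a nonzero count are treated as filled.
    nonzero = sum(1 for count in pixels.values() if count != 0)
    return 1.0 - nonzero / total


pixels = {(0, 0): 5, (0, 1): 2, (1, 1): 7, (3, 3): 1}
value = sparsity((4, 4), pixels)
```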
- gunz_cm.loaders.cool_loader.load_cooler(fpath: str | Path, resolution: int, region1: str, region2: str | None = None, balancing: Balancing | list[Balancing] | None = Balancing.NONE, output_format: DataStructure = DataStructure.DF, return_raw_counts: bool = False, backend: Backend = Backend.COOLER, chunksize: int | None = None) -> ContactMatrix
Load contact matrix data from a cooler file lazily.
- fpath : str | pathlib.Path
Path to the cooler file (.cool or .mcool).
- resolution : int
The resolution level to load from the file.
- region1 : str
First genomic region (e.g., “chr1” or “chr1:1,000,000-2,000,000”).
- balancing : Balancing, optional
Balancing method to apply. Defaults to Balancing.NONE.
- region2 : str, optional
Second genomic region. Defaults to region1 for intra-chromosomal.
- output_format : DS, optional
The desired output format (DS.DF, DS.RCV, or DS.COO). Defaults to DS.DF.
- return_raw_counts : bool, optional
Whether to return raw counts alongside balanced counts. Only supported for output_format=DS.DF. Defaults to False.
- backend : Backend, optional
Select the underlying backend library for loading. Options: ‘cooler’ (default), ‘hictk’.
- chunksize : int | None, optional
If provided, the data will be loaded in chunks.
- ContactMatrix
A ContactMatrix object that can be used to load the data on demand.
Examples
gunz_cm.loaders.csv_loader module
- gunz_cm.loaders.csv_loader.load_csv(fpath: str | Path | BytesIO, region1: str, resolution: int, region2: str | None = None, balancing: Balancing | None = None, delimiter: str = '\\s+', encoding: str = 'utf-8', output_format: DataStructure = DataStructure.DF, column_names: List[str] | None = None) -> ContactMatrix
Loads contact data from a CSV-like file path or buffer lazily.
- fpath : str | pathlib.Path | io.BytesIO
The file path or an in-memory byte stream to read from.
- region1 : str
The chromosome to load (e.g., “chr1”).
- resolution : int
The resolution (bin size) to apply to the coordinate columns.
- region2 : str | None, optional
The second region for inter-chromosomal data. Currently not supported.
- balancing : Balancing | None, optional
The balancing method reflected in the data.
- delimiter : str, optional
The delimiter to use for parsing.
- encoding : str, optional
The character encoding of the file.
- output_format : DataStructure, optional
The desired output format.
- column_names : List[str], optional
Explicit column names.
Examples
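The sketch below shows one way a whitespace-delimited byte stream could be parsed and its coordinates mapped to bin indices. It assumes three columns (two base-pair positions and a count) and assumes that applying the resolution means floor-dividing positions by the bin size. Neither assumption is confirmed by this documentation, and `read_contacts` is a hypothetical helper.

```python
import io
import re


def read_contacts(buffer, resolution, delimiter=r"\s+", encoding="utf-8"):
    """Parse (pos1, pos2, count) rows and convert positions to bin indices."""
    rows = []
    for raw in buffer.read().decode(encoding).splitlines():
        line = raw.strip()
        if not line:
            continue
        # The delimiter is treated as a regular expression, like '\s+'.
        pos1, pos2, count = re.split(delimiter, line)
        rows.append((int(pos1) // resolution, int(pos2) // resolution, float(count)))
    return rows


data = io.BytesIO(b"0\t10000\t4\n10000  30000 2.5\n")
rows = read_contacts(data, resolution=10000)
```

Because the delimiter is a regular expression, tab-separated and space-separated rows are handled by the same call.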
gunz_cm.loaders.ginteractions_loader module
- gunz_cm.loaders.ginteractions_loader.load_ginteractions(fpath: str | Path | BytesIO, resolution: int, region1: str, region2: str | None = None, encoding: str = 'utf-8', output_format: DataStructure = DataStructure.DF, **kwargs) -> ContactMatrix
Loads and processes data from a GInteractions-like tabular file lazily.
- fpath : str | pathlib.Path | io.BytesIO
The file path or an in-memory byte stream to read from.
- resolution : int
The resolution for binning the genomic coordinates.
- region1 : str
The first chromosome to include in the output (e.g., ‘chr1’).
- region2 : str | None, optional
The second chromosome. If None, it defaults to region1 for intra-chromosomal interactions.
- encoding : str, optional
The file encoding to use when reading the file.
- output_format : DataStructure, optional
The desired output format.
- **kwargs
Catches extra keyword arguments.
Examples
gunz_cm.loaders.hic_loader module
Bases: object
Immutable container for Hi-C file footer information.
Examples
- class gunz_cm.loaders.hic_loader.HiCHeader(version: int, master_index: int, genome: str, metadata: Dict[str, str], chromosomes: Dict[str, Dict[str, int]], resolutions: List[int])
Bases: object
Immutable container for Hi-C file header information.
Examples
- chromosomes: Dict[str, Dict[str, int]]
- genome: str
- master_index: int
- metadata: Dict[str, str]
- resolutions: List[int]
- version: int
- class gunz_cm.loaders.hic_loader.HiCMetadata(header: HiCHeader, footer: HiCFooter)
Bases: object
A structured container for all Hi-C file metadata.
Examples
- gunz_cm.loaders.hic_loader.get_balancing_methods(fpath: str | Path) -> list[str]
Retrieves the available balancing (normalization) methods.
Examples
- gunz_cm.loaders.hic_loader.get_bins(fpath: str | Path, resolution: int) -> DataFrame
Retrieve the binnified index from a Hi-C file.
Examples
- gunz_cm.loaders.hic_loader.get_chrom_infos(fpath: str | Path, use_hicstraw: bool = False) -> Dict[str, Dict[str, int]]
Reads chromosome information from a Hi-C file.
Examples
- gunz_cm.loaders.hic_loader.get_resolutions(fpath: str | Path, use_hicstraw: bool = False) -> list[int]
Retrieves the available resolutions from a Hi-C file.
Examples
- gunz_cm.loaders.hic_loader.get_sparsity(fpath: str | Path, region1: str, resolution: int, region2: str | None = None) -> float
Calculates the sparsity of a Hi-C contact matrix for a given region.
Examples
- gunz_cm.loaders.hic_loader.load_hic(fpath: str | Path, region1: str, resolution: int, region2: str | None = None, balancing: Balancing | list[Balancing] | None = Balancing.NONE, output_format: DataStructure = DataStructure.DF, return_raw_counts: bool = False, backend: Backend = Backend.STRAW, chunksize: int | None = None) -> ContactMatrix
Loads contact matrix data from a .hic file lazily.
Examples
- gunz_cm.loaders.hic_loader.read_hic_metadata(fpath: str | Path, order: Literal['big', 'little'] = 'little') -> HiCMetadata
Reads the complete header and footer metadata from a Hi-C file.
Examples
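To show what the `order` argument controls, the sketch below decodes the first header fields from a synthetic in-memory buffer. It assumes the commonly documented .hic layout: a null-terminated "HIC" magic string, a 32-bit version, a 64-bit master index position, and a null-terminated genome name. The helpers `read_cstring` and `read_header_prefix` are hypothetical. The actual reader also parses metadata attributes, chromosomes, resolutions, and the footer.

```python
import io
import struct


def read_cstring(stream):
    """Read bytes up to (and consuming) the next null terminator."""
    chars = bytearray()
    while True:
        byte = stream.read(1)
        if byte in (b"", b"\x00"):
            break
        chars += byte
    return chars.decode("ascii")


def read_header_prefix(stream, order="little"):
    """Decode magic string, version, master index, and genome name."""
    prefix = "<" if order == "little" else ">"
    if read_cstring(stream) != "HIC":
        raise ValueError("missing HIC magic string")
    (version,) = struct.unpack(prefix + "i", stream.read(4))
    (master_index,) = struct.unpack(prefix + "q", stream.read(8))
    genome = read_cstring(stream)
    return {"version": version, "master_index": master_index, "genome": genome}


# Synthetic little-endian header prefix, built only for this illustration.
buffer = io.BytesIO(b"HIC\x00" + struct.pack("<iq", 8, 123456) + b"hg38\x00")
header = read_header_prefix(buffer)
```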
gunz_cm.loaders.memmap_loader module
- gunz_cm.loaders.memmap_loader.gen_memmap_fpaths(base_fpath: str | Path) -> tuple[Path, Path]
Generates paths for the binary data and JSON metadata files.
- base_fpath : str | pathlib.Path
The base path for the memmap, without an extension.
- tuple[pathlib.Path, pathlib.Path]
A tuple containing the path to the binary (.npdat) file and the metadata (.json) file.
Examples
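A minimal sketch of deriving the two paths from a base path with pathlib. It assumes the extensions are appended to the base name rather than replacing an existing suffix, which the documentation does not specify. `memmap_paths` and `memmap_exists` are hypothetical stand-ins for gen_memmap_fpaths and is_memmap_exists.

```python
from pathlib import Path


def memmap_paths(base_fpath):
    """Return the (.npdat, .json) paths that belong to a base path."""
    base = Path(base_fpath)
    # Append instead of using with_suffix so dots in the base name survive.
    data_path = base.with_name(base.name + ".npdat")
    meta_path = base.with_name(base.name + ".json")
    return data_path, meta_path


def memmap_exists(base_fpath):
    """True only if both the binary file and the metadata file exist."""
    data_path, meta_path = memmap_paths(base_fpath)
    return data_path.exists() and meta_path.exists()


data_path, meta_path = memmap_paths("cache/GM12878.chr1.10kb")
```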
gunz_cm.loaders.narrowpeaks module
- gunz_cm.loaders.narrowpeaks.get_chrom_infos(fpath: str | Path) -> dict[str, dict[str, str | None]]
Retrieves chromosome information from a narrowPeak file.
This function reads a narrowPeak file, extracts unique chromosome names, and initializes a dictionary with chromosome names as keys and their lengths set to None.
- fpath : str | pathlib.Path
The file path to the narrowPeak file.
- dict[str, dict[str, str | None]]
A dictionary with chromosome names as keys and their lengths set to None.
Examples
- gunz_cm.loaders.narrowpeaks.load_narrowpeak(fpath: str | Path, chromosome: str | None = None, resolution: int | None = None) -> DataFrame
Reads and processes a narrowPeak file.
This function reads a narrowPeak file, assigns column names, converts data types, validates the data, and filters by a specified region if provided.
- fpath : str | pathlib.Path
The file path to the narrowPeak file.
- chromosome : str | None, optional
The chromosome region to filter by (default is None).
- resolution : int | None, optional
The resolution parameter. If provided, the start and end coordinates will be binned to this resolution.
- pd.DataFrame
A DataFrame containing the processed narrowPeak data.
Examples
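The steps described above can be sketched with the standard library alone. The ten columns follow the ENCODE narrowPeak (BED6+4) layout. The column names used here, the helper `read_narrowpeak`, and the binning rule (start snapped down, end snapped up to a multiple of the resolution) are assumptions for illustration and may differ from what the library does.

```python
import csv
import io

NARROWPEAK_COLUMNS = [
    "chrom", "start", "end", "name", "score",
    "strand", "signal_value", "p_value", "q_value", "peak",
]


def read_narrowpeak(handle, chromosome=None, resolution=None):
    """Parse narrowPeak rows, optionally filtering and binning them."""
    records = []
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields:
            continue
        rec = dict(zip(NARROWPEAK_COLUMNS, fields))
        if chromosome is not None and rec["chrom"] != chromosome:
            continue
        rec["start"], rec["end"] = int(rec["start"]), int(rec["end"])
        if resolution is not None:
            # Floor the start and ceil the end to bin boundaries.
            rec["start"] = rec["start"] // resolution * resolution
            rec["end"] = -(-rec["end"] // resolution) * resolution
        records.append(rec)
    return records


text = (
    "chr1\t1200\t1850\tpeak1\t900\t.\t12.5\t8.1\t6.3\t310\n"
    "chr2\t400\t900\tpeak2\t500\t.\t7.0\t4.2\t3.0\t120\n"
)
peaks = read_narrowpeak(io.StringIO(text), chromosome="chr1", resolution=1000)
```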
gunz_cm.loaders.pickle_loader module
- gunz_cm.loaders.pickle_loader.load_pickle(fpath: str | Path, region1: str | None = None, resolution: int | None = None, region2: str | None = None, balancing: str | None = None, output_format: DataStructure = DataStructure.DF) -> ContactMatrix
Loads a pickle file containing a contact matrix object lazily.
Examples
gunz_cm.loaders.utils module
- class gunz_cm.loaders.utils.ClosedInterval(start, end)
Bases: tuple
- end
Alias for field number 1
- start
Alias for field number 0
- class gunz_cm.loaders.utils.Constant
Bases: object
Bind an instance to a name to get a unique object.
Examples
- class gunz_cm.loaders.utils.Region(chromosome: int | str, region: Tuple[int, int] | Constant)
Bases: object
Represents a range of loci and converts to and from the textual UCSC style, e.g. ‘chr22:1,000,000-1,500,000’.
Use the static method Region.from_string to parse a UCSC string. Converting back to a string canonicalizes it.
- chromosome : int or str
1-22 or string X/Y/M (use chromname() to get chrN string)
- region : ClosedInterval or the constant Region.ALL_LOCI
For ‘chr1:100-500’, region.start == 100 and region.end == 500
Parsing tries to be more lenient than the canonical form requires.
>>> str(Region.from_string('1:1,000-1,500')) == 'chr1:1000-1500'
True
>>> str(Region.from_string('chry')) == 'chrY'
True
Examples
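The lenient parsing and canonical formatting shown in the doctests above can be approximated with a regular expression. This is an independent sketch, not the Region implementation: `parse_region` and `format_region` are hypothetical, and `None` stands in for Region.ALL_LOCI.

```python
import re

# Optional "chr" prefix, a chromosome name, and an optional start-end range.
_REGION_RE = re.compile(
    r"^(?:chr)?([0-9]+|[XYM])(?::([\d,]+)-([\d,]+))?$", re.IGNORECASE
)


def parse_region(text):
    """Return (chromosome, (start, end)) or (chromosome, None) for all loci."""
    match = _REGION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a UCSC-style region: {text!r}")
    chrom, start, end = match.groups()
    chrom = int(chrom) if chrom.isdigit() else chrom.upper()
    if start is None:
        return chrom, None
    # Thousands separators are accepted on input and dropped.
    return chrom, (int(start.replace(",", "")), int(end.replace(",", "")))


def format_region(chrom, interval):
    """Render the canonical form without thousands separators."""
    name = f"chr{chrom}"
    if interval is None:
        return name
    return f"{name}:{interval[0]}-{interval[1]}"
```

Round-tripping `'1:1,000-1,500'` through both helpers yields `'chr1:1000-1500'`, matching the canonical form in the doctests.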
- ALL_LOCI = <gunz_cm.loaders.utils.Constant object>
Module contents
This module provides a unified interface to parse various contact matrix file formats and load them into memory.
It acts as a facade, dispatching calls to the appropriate format-specific loader (e.g., for .hic, .cool, .csv) while providing a consistent API to the user.
- Functions:
load_cm_data: Load a contact matrix from a file into memory.
get_chrom_infos: Query chromosome names and lengths from a file.
get_resolutions: List the available resolutions in a file.
get_balancing: List available balancing methods for a specific region.
- class gunz_cm.loaders.Balancing(value)
Bases: BaseStrEnum
Enumeration for matrix balancing (normalization) methods.
Examples
- KR = 'KR'
- NONE = 'NONE'
- VC = 'VC'
- VC_SQRT = 'VC_SQRT'
- class gunz_cm.loaders.BpFrag(value)
Bases: BaseStrEnum
Enumeration for binning units (Base Pairs vs. Fragments).
Examples
- BP = 'BP'
- FRAG = 'FRAG'
- class gunz_cm.loaders.Counts(value)
Bases: BaseStrEnum
Enumeration for different types of interaction counts.
Examples
- EXPECTED = 'expected'
- OBSERVED = 'observed'
- OE = 'oe'
- class gunz_cm.loaders.DataStructure(value)
Bases: BaseStrEnum
Enumeration for in-memory data representations.
Examples
- COO = 'coo'
- DF = 'df'
- RC = 'rc'
- RCV = 'rcv'
- class gunz_cm.loaders.Format(value)
Bases: BaseStrEnum
Enumeration for supported file formats.
Uses BaseStrEnum for case-insensitivity and aliases.
Examples
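How BaseStrEnum achieves case-insensitivity is not documented here. One common way to get that behavior with the standard library is to override `_missing_` on a string-valued enum, as sketched below. The classes `CaseInsensitiveStrEnum` and `FileFormat` are hypothetical and only mirror a few Format members.

```python
from enum import Enum


class CaseInsensitiveStrEnum(str, Enum):
    """String enum whose value lookup ignores letter case."""

    @classmethod
    def _missing_(cls, value):
        # Called only when the exact value lookup has already failed.
        if isinstance(value, str):
            lowered = value.lower()
            for member in cls:
                if member.value.lower() == lowered:
                    return member
        return None


class FileFormat(CaseInsensitiveStrEnum):
    HIC = "hic"
    COOLER = "cooler"
    COOL = "cooler"  # same value, so this name is an alias of COOLER


fmt = FileFormat("HiC")
```

An unknown value such as `FileFormat("bam")` still raises ValueError, because `_missing_` returns None when nothing matches.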
- COO = 'coo'
- COOLER = 'cooler'
- CSV = 'csv'
- GINTERACTIONS = 'ginteractions'
- HIC = 'hic'
- MCOO = 'mcoo'
- MCSV = 'mcsv'
- MEMMAP = 'npdat'
- NPY = 'npy'
- PICKLE = 'pickle'
- TSV = 'tsv'
- class gunz_cm.loaders.GenomeBuild(value)
Bases: BaseStrEnum
Enumeration for standard genome builds.
Examples
- HG19 = 'hg19'
- HG38 = 'hg38'
- MM10 = 'mm10'
- MM9 = 'mm9'
- class gunz_cm.loaders.Region(chromosome: int | str, region: Tuple[int, int] | Constant)
Bases: object
Represents a range of loci and converts to and from the textual UCSC style, e.g. ‘chr22:1,000,000-1,500,000’.
Use the static method Region.from_string to parse a UCSC string. Converting back to a string canonicalizes it.
- chromosome : int or str
1-22 or string X/Y/M (use chromname() to get chrN string)
- region : ClosedInterval or the constant Region.ALL_LOCI
For ‘chr1:100-500’, region.start == 100 and region.end == 500
Parsing tries to be more lenient than the canonical form requires.
>>> str(Region.from_string('1:1,000-1,500')) == 'chr1:1000-1500'
True
>>> str(Region.from_string('chry')) == 'chrY'
True
Examples
- ALL_LOCI = <gunz_cm.loaders.utils.Constant object>
- gunz_cm.loaders.gen_memmap_fpaths(base_fpath: str | Path) -> tuple[Path, Path]
Generates paths for the binary data and JSON metadata files.
- base_fpath : str | pathlib.Path
The base path for the memmap, without an extension.
- tuple[pathlib.Path, pathlib.Path]
A tuple containing the path to the binary (.npdat) file and the metadata (.json) file.
Examples
- gunz_cm.loaders.get_balancing(fpath: str, resolution: int, chrom: str) -> list[str]
Gets available balancing methods for a region in a .hic or .cool file.
- Parameters:
fpath (str) – The path to the contact matrix file.
resolution (int) – The resolution of the contact matrix.
chrom (str) – The chromosome of interest (e.g., ‘chr1’).
- Returns:
A list of available balancing methods (e.g., [‘KR’, ‘VC_SQRT’]).
- Return type:
list[str]
- gunz_cm.loaders.get_bins(fpath: str | Path, resolution: int) -> DataFrame
Gets the binnified index from a .hic or .cool file.
- Parameters:
fpath (str | pathlib.Path) – The path to the contact matrix file.
resolution (int) – The resolution to use for binnification.
- Returns:
A DataFrame with columns: ‘chrom’, ‘start’, ‘end’.
- Return type:
pd.DataFrame
- gunz_cm.loaders.get_chrom_infos(fpath: str) -> dict[str, int]
Queries chromosome names and lengths from a .hic or .cool file.
- Parameters:
fpath (str) – The path to the contact matrix file.
- Returns:
A mapping of chromosome names to their lengths.
- Return type:
dict[str, int]
- gunz_cm.loaders.get_resolutions(fpath: str) -> list[int]
Gets the resolutions available in a contact matrix file.
- Parameters:
fpath (str) – The path to the contact matrix file.
- Returns:
A list of available resolutions.
- Return type:
list[int]
- gunz_cm.loaders.is_file_standard_cm(fpath: str) -> bool
Checks if the file is a standard contact matrix file format.
- gunz_cm.loaders.is_memmap_exists(base_fpath: str | Path) -> bool
Checks if both the binary and metadata files for a memmap exist.
- base_fpath : str | pathlib.Path
The base path for the memmap to check.
- bool
True if both the .npdat and .json files exist, False otherwise.
Examples
- gunz_cm.loaders.load_cm_data(fpath: Path, resolution: int, region1: str | None = None, region2: str | None = None, balancing: Balancing | list[Balancing] | None = None, output_format: DataStructure = DataStructure.DF, fformat: Format | None = None, backend: Backend | None = None, use_fast_hic: bool = False, return_raw_counts: bool = False, **kwargs) -> DataFrame | tuple[ndarray, ...] | ndarray | tuple[Any, ...]
Loads contact matrix data from various file formats.
This function acts as a dispatcher, routing the call to the appropriate format-specific loader based on the file’s extension or the fformat argument.
- Parameters:
fpath (pathlib.Path) – Path to the contact matrix file.
resolution (int) – Resolution of the contact matrix to load.
region1 (str, optional) – First genomic region (e.g., ‘chr1’). Defaults to None.
region2 (str, optional) – Second genomic region. If None, loads intra-chromosomal data for region1. Defaults to None.
balancing (Balancing | list[Balancing], optional) – Balancing (normalization) method(s) to apply. Defaults to None.
output_format (DataStructure, optional) – Desired output format (‘df’ or ‘coo’). Defaults to DataStructure.DF.
fformat (Format, optional) – Explicitly specify file format, otherwise inferred from extension. Defaults to None.
backend (Backend, optional) – Select the underlying backend library for loading. For COOLER: ‘cooler’, ‘hictk’. For HIC: ‘hicstraw’, ‘hictk’, ‘straw’. Defaults to None (uses standard backend for format).
use_fast_hic (bool, optional) – If True and file is .hic, use the faster fast_hic_loader. Equivalent to setting backend=’straw’. Defaults to False.
return_raw_counts (bool, optional) – If True, return raw counts alongside the primary (balanced) counts. Defaults to False.
**kwargs – Additional keyword arguments passed to the specific loader (e.g., encoding for CSV files).
- Returns:
The loaded contact matrix data in the specified output format.
- Return type:
pd.DataFrame | tuple[np.ndarray, …] | np.ndarray | tuple[t.Any, …]
- Raises:
FormatError – If the file format is not recognized or supported, or if an invalid backend is selected for the format.
NotImplementedError – If return_raw_counts is True for unsupported formats.
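The dispatch pattern described for load_cm_data can be sketched as a lookup from file extension to loader, with an explicit format overriding the inference. Everything below is illustrative: the stub loaders, the extension table, and the `dispatch` function are hypothetical, and ValueError stands in for the library's FormatError.

```python
from pathlib import Path


def load_hic_stub(fpath, resolution, **kwargs):
    return ("hic", fpath.name, resolution)


def load_cooler_stub(fpath, resolution, **kwargs):
    return ("cooler", fpath.name, resolution)


def load_csv_stub(fpath, resolution, **kwargs):
    return ("csv", fpath.name, resolution)


# Hypothetical tables; the real mapping lives inside gunz_cm.
_LOADERS = {"hic": load_hic_stub, "cooler": load_cooler_stub, "csv": load_csv_stub}
_EXTENSIONS = {".hic": "hic", ".cool": "cooler", ".mcool": "cooler", ".csv": "csv"}


def dispatch(fpath, resolution, fformat=None, **kwargs):
    """Route to a loader by explicit format, else by file extension."""
    fpath = Path(fpath)
    key = fformat if fformat is not None else _EXTENSIONS.get(fpath.suffix.lower())
    if key not in _LOADERS:
        raise ValueError(f"unrecognized format for {fpath.name!r}")
    return _LOADERS[key](fpath, resolution, **kwargs)


result = dispatch("sample.mcool", 10000)
forced = dispatch("contacts.txt", 10000, fformat="csv")
```

Passing the format explicitly is what lets a file with an unrecognized extension, such as `contacts.txt`, still reach the intended loader.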