gunz_cm.converters package
Submodules
gunz_cm.converters.coo module
Module for converting contact matrix data from standard formats (like .hic or .cool) into a tabular, sparse COO (Coordinate List) text format.
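To make the target format concrete, here is a minimal, standard-library-only sketch that serializes a small dense contact matrix as delimited COO text, one line per nonzero entry. The helper `dense_to_coo_text`, the `row`/`col`/`count` header, and writing only the upper triangle are illustrative assumptions, not the exact layout gunz_cm produces.

```python
import io


def dense_to_coo_text(matrix, delimiter="\t", upper_only=True):
    """Serialize the nonzero entries of a square contact matrix as COO text.

    Hypothetical helper for illustration; the column names are assumed.
    """
    out = io.StringIO()
    out.write(delimiter.join(["row", "col", "count"]) + "\n")
    n = len(matrix)
    for i in range(n):
        # Contact matrices are symmetric, so (by assumption) keep only i <= j.
        start = i if upper_only else 0
        for j in range(start, n):
            value = matrix[i][j]
            if value != 0:  # sparse: zero entries are not written
                out.write(f"{i}{delimiter}{j}{delimiter}{value}\n")
    return out.getvalue()


dense = [
    [5, 2, 0],
    [2, 7, 1],
    [0, 1, 3],
]
print(dense_to_coo_text(dense), end="")
```

Only five of the nine cells are written here, which is where the COO representation saves space for sparse, high-resolution matrices.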
- gunz_cm.converters.coo.convert_all_intra_to_cm_coo(input_fpath: Path, output_dpath: Path, resolution: int, balancing: Balancing | None, overwrite: bool = False, res_to_one: bool = False, to_mcoo: bool = False, gen_pseudo_weights: bool = False, output_delimiter: str = '\t', columns_order: list[str] | None = None, n_jobs: int = 1) -> None
Converts all intra-chromosomal matrices in a file to COO format.
This function iterates through all chromosomes found in the input file, extracts the intra-chromosomal contact matrix for each, and saves it as a separate COO file in the specified output directory.
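The per-chromosome fan-out described above can be sketched as follows. `convert_one_chrom`, `convert_all_intra`, and the `<chrom>.coo.tsv` naming scheme are hypothetical, and a standard-library `ThreadPoolExecutor` stands in for whatever parallel backend gunz_cm actually uses for `n_jobs`.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def convert_one_chrom(chrom: str, output_dpath: Path) -> Path:
    """Hypothetical stand-in for converting one intra-chromosomal matrix."""
    out = output_dpath / f"{chrom}.coo.tsv"  # naming scheme is an assumption
    out.write_text(f"# intra-chromosomal contacts for {chrom}\n")
    return out


def convert_all_intra(chroms, output_dpath: Path, n_jobs: int = 1) -> list[Path]:
    """Convert every chromosome independently, optionally in parallel."""
    output_dpath.mkdir(parents=True, exist_ok=True)
    if n_jobs == 1:
        return [convert_one_chrom(c, output_dpath) for c in chroms]
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # map() preserves input order, so outputs line up with chroms.
        return list(pool.map(lambda c: convert_one_chrom(c, output_dpath), chroms))


with tempfile.TemporaryDirectory() as tmp:
    paths = convert_all_intra(["chr1", "chr2", "chrX"], Path(tmp), n_jobs=2)
    print([p.name for p in paths])  # ['chr1.coo.tsv', 'chr2.coo.tsv', 'chrX.coo.tsv']
```

Because each chromosome is extracted and written independently, the jobs share no state, which is what makes this kind of parallelism straightforward. For CPU-heavy loading, a process pool would usually be a better fit than threads.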
Parameters
- input_fpath : pathlib.Path
Path to the input contact matrix file.
- output_dpath : pathlib.Path
Directory where the output COO files will be saved.
- resolution : int
The resolution of the contact matrices.
- balancing : Balancing or None
The balancing method to apply. Pass None to skip balancing.
- overwrite : bool, optional
If True, overwrite existing output files. Defaults to False.
- res_to_one : bool, optional
If True, normalize bin coordinates by the resolution. Defaults to False.
- to_mcoo : bool, optional
If True, write modified COO (mCOO) files containing both raw and normalized counts. Defaults to False.
- gen_pseudo_weights : bool, optional
If True, also generate a corresponding .weights file for each output. Defaults to False.
- output_delimiter : str, optional
Delimiter for the output files. Defaults to a tab ('\t').
- columns_order : list[str], optional
The specific order of columns for the output files. Defaults to None.
- n_jobs : int, optional
The number of jobs to run in parallel. Defaults to 1.
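A minimal sketch of how `res_to_one`, `columns_order`, and `output_delimiter` could shape each output line. It assumes that bin coordinates are genomic start positions in base pairs and that `res_to_one` integer-divides them by the resolution to obtain bin indices; `format_coo_rows` and its default column names are hypothetical.

```python
def format_coo_rows(records, resolution, res_to_one=False,
                    columns_order=None, delimiter="\t"):
    """Format (row, col, count) records as delimited text lines.

    Hypothetical sketch: row/col are assumed to be bin start positions in bp,
    and res_to_one is assumed to divide them by the resolution.
    """
    order = columns_order if columns_order is not None else ["row", "col", "count"]
    lines = []
    for row, col, count in records:
        if res_to_one:
            row, col = row // resolution, col // resolution  # bp -> bin index
        fields = {"row": row, "col": col, "count": count}
        lines.append(delimiter.join(str(fields[name]) for name in order))
    return lines


records = [(0, 10_000, 4), (20_000, 50_000, 1)]
print(format_coo_rows(records, 10_000, res_to_one=True))
# ['0\t1\t4', '2\t5\t1']
print(format_coo_rows(records, 10_000, columns_order=["count", "row", "col"], delimiter=","))
# ['4,0,10000', '1,20000,50000']
```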
- gunz_cm.converters.coo.convert_to_cm_coo(input_fpath: Path, output_fpath: Path, region1: str, resolution: int, balancing: Balancing | None, region2: str | None = None, overwrite: bool = False, exist_ok: bool = False, res_to_one: bool = False, to_mcoo: bool = False, gen_pseudo_weights: bool = False, output_delimiter: str = '\t', columns_order: list[str] | None = None) -> None
Converts contact matrix data to a COO format and saves it to a file.
This function loads data using the main loader, optionally creating a “modified COO” (mCOO) format with both raw and normalized counts, and saves the result to a specified text file.
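The mCOO idea of storing raw and normalized counts side by side can be sketched as below. This assumes cooler-style balancing weights, where the balanced value of a contact is `raw * w[row] * w[col]`, and drops bins whose weight is NaN (masked bins); `.hic` normalization vectors are divided out instead, so a real conversion depends on the balancing method. `to_mcoo_records` is a hypothetical helper.

```python
import math


def to_mcoo_records(coo, weights):
    """Attach a normalized count to each (row, col, raw) triple.

    Hypothetical sketch: assumes cooler-style weights (balanced = raw * w_i * w_j)
    and skips contacts involving bins with a NaN weight.
    """
    out = []
    for row, col, raw in coo:
        w_row, w_col = weights[row], weights[col]
        if math.isnan(w_row) or math.isnan(w_col):
            continue  # masked bin: no balanced value is defined
        out.append((row, col, raw, raw * w_row * w_col))
    return out


coo = [(0, 0, 10), (0, 1, 4), (1, 2, 6)]
weights = [0.5, 0.25, float("nan")]
print(to_mcoo_records(coo, weights))
# [(0, 0, 10, 2.5), (0, 1, 4, 0.5)]
```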
Parameters
- input_fpath : pathlib.Path
Path to the input contact matrix file (e.g., .hic, .cool).
- output_fpath : pathlib.Path
Path where the output COO text file will be saved.
- region1 : str
The identifier for the first region/chromosome.
- resolution : int
The resolution for binning the contact matrix.
- balancing : Balancing or None
The balancing method to apply. Must not be None if to_mcoo is True.
- region2 : str, optional
The identifier for the second region, if applicable. Defaults to None.
- overwrite : bool, optional
If True, overwrite the output file if it exists. Defaults to False.
- exist_ok : bool, optional
If True, do nothing if the output file already exists. Defaults to False.
- res_to_one : bool, optional
If True, normalize bin coordinates by the resolution. Defaults to False.
- to_mcoo : bool, optional
If True, create a modified COO (mCOO) file with both raw and normalized counts. Defaults to False.
- gen_pseudo_weights : bool, optional
If True, generate a corresponding .weights file. Defaults to False.
- output_delimiter : str, optional
The delimiter for the output text file. Defaults to a tab ('\t').
- columns_order : list[str], optional
The specific order of columns for the output file. Defaults to None.
Raises
- FileExistsError
If the output file exists and neither overwrite nor exist_ok is True.
- ConverterError
If to_mcoo is True but balancing is not provided.
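The documented interplay of `overwrite`, `exist_ok`, and `FileExistsError` can be expressed as a small guard. `check_output_path` is hypothetical, and giving `overwrite` precedence when both flags are True is an assumption that the documentation does not settle.

```python
import tempfile
from pathlib import Path


def check_output_path(output_fpath: Path, overwrite=False, exist_ok=False) -> bool:
    """Return True if the conversion should run, False if it should be skipped.

    Raises FileExistsError when the file exists and neither flag is set.
    Letting overwrite win when both flags are True is an assumption.
    """
    if not output_fpath.exists():
        return True
    if overwrite:
        return True   # replace the existing file
    if exist_ok:
        return False  # leave the existing file untouched
    raise FileExistsError(f"{output_fpath} already exists")


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "chr1.tsv"
    print(check_output_path(target))                  # True: nothing there yet
    target.write_text("")
    print(check_output_path(target, exist_ok=True))   # False: skip
    print(check_output_path(target, overwrite=True))  # True: rewrite
    try:
        check_output_path(target)
    except FileExistsError as err:
        print("error:", type(err).__name__)           # error: FileExistsError
```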
gunz_cm.converters.gnz module
Converter to the .gnz unified container format.
- gunz_cm.converters.gnz.convert_to_gnz(fpath: Path, output_fpath: Path, region1: str, resolution: int, balancing: Balancing | None = None, backend: Backend = Backend.HICTK, dtype: str = 'float32', overwrite: bool = False, layout: str = 'dense', block_size: int = 1024) -> None
Converts a Hi-C file to a .gnz container holding the contact matrix and its weights. The layout argument selects the storage layout: 'dense', 'tiled', 'csr', or 'block_sparse'. Data is streamed to the output incrementally, which makes the conversion feasible for high-resolution data.
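To illustrate what a block-sparse layout with `block_size` means, the sketch below groups COO triples into square tiles of `block_size` bins, keeps only non-empty tiles, and stores tile-local coordinates. `to_block_sparse` is hypothetical and says nothing about the actual .gnz on-disk encoding.

```python
from collections import defaultdict


def to_block_sparse(coo, block_size):
    """Group (row, col, value) triples into tiles of block_size x block_size bins.

    Hypothetical illustration of a block-sparse layout, not the .gnz format.
    Only tiles containing at least one entry are created.
    """
    blocks = defaultdict(list)
    for row, col, value in coo:
        key = (row // block_size, col // block_size)  # which tile
        # Store coordinates relative to the tile's top-left corner.
        blocks[key].append((row % block_size, col % block_size, value))
    return dict(blocks)


coo = [(0, 1, 3.0), (2, 3, 1.5), (5, 9, 2.0), (9, 9, 4.0)]
for key, entries in sorted(to_block_sparse(coo, block_size=4).items()):
    print(key, entries)
# (0, 0) [(0, 1, 3.0), (2, 3, 1.5)]
# (1, 2) [(1, 1, 2.0)]
# (2, 2) [(1, 1, 4.0)]
```

Since tiles are independent, each one can be written as soon as its entries are collected, which is one way incremental writing can keep memory use bounded at high resolution.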
gunz_cm.converters.memmap module
Module for converting various contact matrix formats into a memory-mapped (memmap) file for efficient, on-disk matrix operations.
- gunz_cm.converters.memmap.convert_to_memmap(data: Path | DataFrame | Tuple[ArrayLike, ...], output_fpath: Path, **kwargs) -> None
- gunz_cm.converters.memmap.convert_to_memmap(data: DataFrame, output_fpath: Path, **kwargs) -> None
- gunz_cm.converters.memmap.convert_to_memmap(data: Tuple[ArrayLike, ArrayLike, ArrayLike], output_fpath: Path, output_full_matrix: bool = True, dtype: DTypeLike = None, shape: Tuple[int, int] | None = None, check_output: bool = True, overwrite: bool = False, metadata: Dict[str, Any] | None = None, **kwargs) -> None
- gunz_cm.converters.memmap.convert_to_memmap(data: Path, output_fpath: Path, region1: str, resolution: int, balancing: Balancing | None, **kwargs) -> None
Here ArrayLike and DTypeLike denote numpy.typing.ArrayLike and numpy.typing.DTypeLike.
Converts contact matrix data to a NumPy memory-mapped file (memmap).
This is a polymorphic function that dispatches to the appropriate implementation based on the type of the data argument.
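Type-based dispatch of this kind is commonly implemented with `functools.singledispatch`; this sketch assumes such a mechanism, and gunz_cm may dispatch differently. It registers a tuple and a `Path` implementation under a hypothetical `convert` function (the DataFrame case is omitted to avoid a pandas dependency) and returns a description instead of writing files.

```python
from functools import singledispatch
from pathlib import Path


@singledispatch
def convert(data, output_fpath, **kwargs):
    """Fallback for input types with no registered implementation."""
    raise TypeError(f"unsupported input type: {type(data).__name__}")


@convert.register
def _(data: tuple, output_fpath, **kwargs):
    rows, cols, values = data  # expects (rows, cols, values) arrays
    return f"tuple: {len(values)} entries -> {output_fpath}"


@convert.register
def _(data: Path, output_fpath, region1, resolution, balancing, **kwargs):
    # Path subclasses such as PosixPath also dispatch here via the MRO.
    return f"file: {data.name} {region1}@{resolution} -> {output_fpath}"


print(convert(([0, 1], [1, 2], [5.0, 3.0]), "out.mmap"))
# tuple: 2 entries -> out.mmap
print(convert(Path("sample.cool"), "out.mmap", region1="chr1", resolution=10_000, balancing=None))
# file: sample.cool chr1@10000 -> out.mmap
```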
Parameters
- data : pathlib.Path or pd.DataFrame or tuple
The input data to convert. It can be one of:
  - a path to a standard contact matrix file (.hic, .cool, etc.);
  - a pandas DataFrame in COO format;
  - a tuple of (rows, cols, values) arrays.
- output_fpath : pathlib.Path
The base path for the output memmap file.
- **kwargs
Additional keyword arguments for the selected implementation. For a Path input, region1, resolution, and balancing are required (balancing may be None). For a tuple input, the optional arguments are output_full_matrix, dtype, shape, check_output, overwrite, and metadata.
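For the tuple input, the core of the conversion can be sketched with `numpy.memmap`: allocate a zero-filled on-disk matrix, scatter the values, and optionally mirror them when `output_full_matrix` is True. `coo_to_memmap` is hypothetical, and it assumes the input holds one triangle of a symmetric matrix.

```python
import tempfile
from pathlib import Path

import numpy as np


def coo_to_memmap(rows, cols, values, output_fpath, shape=None,
                  dtype=np.float32, output_full_matrix=True):
    """Write COO triples into a dense on-disk matrix via numpy.memmap.

    Hypothetical sketch: assumes the COO input holds one triangle of a
    symmetric matrix and that output_full_matrix mirrors it.
    """
    rows, cols = np.asarray(rows), np.asarray(cols)
    values = np.asarray(values, dtype=dtype)
    if shape is None:
        n = int(max(rows.max(), cols.max())) + 1
        shape = (n, n)
    # mode="w+" creates (or overwrites) the file; a new file reads as zeros.
    mm = np.memmap(output_fpath, dtype=dtype, mode="w+", shape=shape)
    mm[rows, cols] = values
    if output_full_matrix:
        mm[cols, rows] = values  # fill the mirrored triangle
    mm.flush()
    del mm  # release the mapping
    return shape


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "chr1.mmap"
    shape = coo_to_memmap([0, 0, 1], [0, 2, 1], [4.0, 1.0, 2.0], path)
    mat = np.memmap(path, dtype=np.float32, mode="r", shape=shape)
    print(mat.tolist())  # [[4.0, 0.0, 1.0], [0.0, 2.0, 0.0], [1.0, 0.0, 0.0]]
    del mat
```

A raw memmap file stores no shape or dtype, so both must be known (or recorded elsewhere) to map it again for reading.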