gunz_cm.converters package

Submodules

gunz_cm.converters.coo module

Module for converting contact matrix data from standard formats (like .hic or .cool) into a tabular, sparse COO (Coordinate List) text format.

Examples

gunz_cm.converters.coo.convert_all_intra_to_cm_coo(input_fpath: Path, output_dpath: Path, resolution: int, balancing: Balancing | None, overwrite: bool = False, res_to_one: bool = False, to_mcoo: bool = False, gen_pseudo_weights: bool = False, output_delimiter: str = '\t', columns_order: list[str] | None = None, n_jobs: int = 1) -> None

Converts all intra-chromosomal matrices in a file to COO format.

This function iterates through all chromosomes found in the input file, extracts the intra-chromosomal contact matrix for each, and saves it as a separate COO file in the specified output directory.

Parameters

input_fpath : pathlib.Path

Path to the input contact matrix file.

output_dpath : pathlib.Path

Directory where the output COO files will be saved.

resolution : int

The resolution of the contact matrices.

balancing : Balancing or None

The balancing method to apply. This argument has no default and must be passed explicitly; it may be None.

overwrite : bool, optional

If True, overwrite existing output files. Defaults to False.

res_to_one : bool, optional

If True, normalize bin coordinates by the resolution. Defaults to False.

to_mcoo : bool, optional

If True, convert to the modified COO (mCOO) format. Defaults to False.

gen_pseudo_weights : bool, optional

If True, generate corresponding .weights files. Defaults to False.

output_delimiter : str, optional

Delimiter for the output files. Defaults to a tab.

columns_order : list[str], optional

The specific order of columns for the output files. Defaults to None.

n_jobs : int, optional

The number of jobs to run in parallel. Defaults to 1.

Examples
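The per-chromosome fan-out described above can be sketched with the standard library alone. This is not the package's implementation: the output file-name pattern, the helper names (plan_outputs, convert_one, convert_all), and the use of a thread pool for n_jobs are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def plan_outputs(chroms, output_dpath, resolution):
    """Map each chromosome name to a hypothetical per-chromosome output path."""
    # The "<chrom>_<resolution>.coo.tsv" pattern is an assumption of this sketch.
    return {c: Path(output_dpath) / f"{c}_{resolution}.coo.tsv" for c in chroms}


def convert_one(chrom, out_path):
    """Stand-in for one intra-chromosomal conversion; reports what it would write."""
    return f"{chrom} -> {out_path.name}"


def convert_all(chroms, output_dpath, resolution, n_jobs=1):
    """Run the stand-in conversion for every chromosome, at most n_jobs at a time."""
    plan = plan_outputs(chroms, output_dpath, resolution)
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # Executor.map() returns results in input order, not completion order.
        return list(pool.map(convert_one, plan.keys(), plan.values()))


results = convert_all(["chr1", "chr2", "chrX"], "out", 10000, n_jobs=2)
for line in results:
    print(line)
```

Each chromosome is independent of the others, which is why the work parallelizes cleanly across n_jobs workers.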

gunz_cm.converters.coo.convert_to_cm_coo(input_fpath: Path, output_fpath: Path, region1: str, resolution: int, balancing: Balancing | None, region2: str | None = None, overwrite: bool = False, exist_ok: bool = False, res_to_one: bool = False, to_mcoo: bool = False, gen_pseudo_weights: bool = False, output_delimiter: str = '\t', columns_order: list[str] | None = None) -> None

Converts contact matrix data to a COO format and saves it to a file.

This function loads data using the main loader, optionally creating a “modified COO” (mCOO) format with both raw and normalized counts, and saves the result to a specified text file.

Parameters

input_fpath : pathlib.Path

Path to the input contact matrix file (e.g., .hic, .cool).

output_fpath : pathlib.Path

Path where the output COO text file will be saved.

region1 : str

The identifier for the first region/chromosome.

resolution : int

The resolution for binning the contact matrix.

balancing : Balancing or None

The balancing method to apply. Must be provided (not None) if to_mcoo is True.

region2 : str, optional

The identifier for the second region, if applicable. Defaults to None.

overwrite : bool, optional

If True, overwrite the output file if it exists. Defaults to False.

exist_ok : bool, optional

If True, do nothing if the output file already exists. Defaults to False.

res_to_one : bool, optional

If True, normalize bin coordinates by the resolution. Defaults to False.

to_mcoo : bool, optional

If True, create a modified COO (mCOO) with raw and normalized counts. Defaults to False.

gen_pseudo_weights : bool, optional

If True, generate a corresponding .weights file. Defaults to False.

output_delimiter : str, optional

The delimiter for the output text file. Defaults to a tab.

columns_order : list[str], optional

The specific order of columns for the output file. Defaults to None.

Raises

FileExistsError

If the output file exists and neither overwrite nor exist_ok is True.

ConverterError

If to_mcoo is True but balancing is not provided.
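The interaction between overwrite, exist_ok, and FileExistsError can be sketched as a small guard. The helper name should_write is hypothetical, and giving overwrite precedence when both flags are True is an assumption; the documentation above does not say which flag wins in that case.

```python
import tempfile
from pathlib import Path


def should_write(output_fpath: Path, overwrite: bool = False,
                 exist_ok: bool = False) -> bool:
    """Return True to write, False to skip silently, or raise FileExistsError."""
    if not output_fpath.exists():
        return True
    if overwrite:
        # Assumption: overwrite takes precedence over exist_ok.
        return True
    if exist_ok:
        return False
    raise FileExistsError(f"{output_fpath} already exists")


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "chr1.coo.tsv"
    fresh = should_write(target)                      # nothing on disk yet
    target.write_text("0\t0\t1\n")
    replaced = should_write(target, overwrite=True)   # existing file is replaced
    skipped = should_write(target, exist_ok=True)     # existing file is kept
    try:
        should_write(target)
        raised = False
    except FileExistsError:
        raised = True
```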

Examples
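A minimal sketch of what res_to_one, output_delimiter, and columns_order plausibly do to the output, using only the standard library. The column names ("row", "col", "count"), the helper write_coo, and the assumption that input coordinates are base-pair offsets divided by the resolution are illustrative and not taken from the package.

```python
import csv
import io


def write_coo(triplets, resolution, res_to_one=False, delimiter="\t",
              columns_order=None, out=None):
    """Write (row, col, count) triplets as delimited text and return the stream."""
    # Column names are assumptions made for this sketch.
    columns = columns_order or ["row", "col", "count"]
    out = out if out is not None else io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    for row, col, count in triplets:
        if res_to_one:
            # Turn base-pair offsets into consecutive bin indices.
            row, col = row // resolution, col // resolution
        record = {"row": row, "col": col, "count": count}
        writer.writerow([record[name] for name in columns])
    return out


triplets = [(0, 0, 12.0), (0, 10000, 3.0), (10000, 20000, 1.5)]
text = write_coo(triplets, resolution=10000, res_to_one=True).getvalue()
print(text, end="")
```

With res_to_one enabled, a 10 kb matrix is written with bin indices 0, 1, 2 rather than offsets 0, 10000, 20000.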

gunz_cm.converters.gnz module

Converter to the unified .gnz container format.

Examples

gunz_cm.converters.gnz.convert_to_gnz(fpath: Path, output_fpath: Path, region1: str, resolution: int, balancing: Balancing | None = None, backend: Backend = Backend.HICTK, dtype: str = 'float32', overwrite: bool = False, layout: str = 'dense', block_size: int = 1024) -> None

Converts a Hi-C file to a .gnz container with matrix and weights. Supports multiple layouts: dense, tiled, csr, block_sparse. Uses streaming/incremental writing to support high-resolution data.

Examples
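To illustrate how a block_size can partition a square matrix so that it is processed one tile at a time, here is a small generator. It is a generic sketch of tiling, not the .gnz on-disk layout, which is not described here; the name iter_tiles is hypothetical.

```python
def iter_tiles(n_bins, block_size=1024):
    """Yield (row_start, row_stop, col_start, col_stop) for each tile."""
    for r0 in range(0, n_bins, block_size):
        r1 = min(r0 + block_size, n_bins)   # the last tile may be smaller
        for c0 in range(0, n_bins, block_size):
            c1 = min(c0 + block_size, n_bins)
            yield r0, r1, c0, c1


tiles = list(iter_tiles(n_bins=2500, block_size=1024))
print(len(tiles), tiles[0], tiles[-1])
```

Only one tile needs to be held in memory at a time, which is the general idea behind incremental writing of high-resolution matrices.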

gunz_cm.converters.memmap module

Module for converting various contact matrix formats into a memory-mapped (memmap) file for efficient, on-disk matrix operations.

Examples

gunz_cm.converters.memmap.convert_to_memmap(data: Path | DataFrame | Tuple[ArrayLike, ...], output_fpath: Path, **kwargs) -> None
gunz_cm.converters.memmap.convert_to_memmap(data: DataFrame, output_fpath: Path, **kwargs) -> None
gunz_cm.converters.memmap.convert_to_memmap(data: Tuple[ArrayLike, ArrayLike, ArrayLike], output_fpath: Path, output_full_matrix: bool = True, dtype: DTypeLike | None = None, shape: Tuple[int, int] | None = None, check_output: bool = True, overwrite: bool = False, metadata: Dict[str, Any] | None = None, **kwargs) -> None
gunz_cm.converters.memmap.convert_to_memmap(data: Path, output_fpath: Path, region1: str, resolution: int, balancing: Balancing | None, **kwargs) -> None

Converts contact matrix data to a NumPy memory-mapped file (memmap).

This is a polymorphic function that dispatches to the appropriate implementation based on the type of the data argument.

Parameters

data : pathlib.Path or pd.DataFrame or tuple

The input data to convert. Can be:

- A path to a standard contact matrix file (.hic, .cool, etc.).
- A pandas DataFrame in COO format.
- A tuple of (rows, cols, values) arrays.

output_fpath : pathlib.Path

The base path for the output memmap file.

**kwargs

Additional arguments specific to the conversion type, such as resolution, balancing, output_full_matrix, etc.

Examples
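Dispatching on the type of the first argument can be sketched with functools.singledispatch. Whether convert_to_memmap uses this mechanism is an assumption; the function describe_input and its return values are hypothetical, and the DataFrame branch is omitted to keep the sketch free of third-party imports.

```python
from functools import singledispatch
from pathlib import Path


@singledispatch
def describe_input(data, output_fpath, **kwargs):
    """Fallback for input types with no registered implementation."""
    raise TypeError(f"unsupported input type: {type(data).__name__}")


@describe_input.register
def _(data: Path, output_fpath, **kwargs):
    # File inputs would additionally need region1, resolution, balancing.
    return f"file:{data.suffix}"


@describe_input.register
def _(data: tuple, output_fpath, **kwargs):
    # Triplet inputs carry parallel (rows, cols, values) sequences.
    rows, cols, values = data
    return f"triplets:{len(values)}"


kind_file = describe_input(Path("sample.cool"), Path("out.bin"))
kind_triplets = describe_input(([0, 1], [0, 1], [5.0, 2.0]), Path("out.bin"))
print(kind_file, kind_triplets)
```

The caller always uses one entry point, and the type of data alone selects the implementation and the keyword arguments it expects.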

Module contents