gunz_cm.preprocs package

Submodules

gunz_cm.preprocs.band_matrix module

Module for creating a band matrix from various data structures.

This module provides a polymorphic function, create_band_matrix, that filters a matrix to retain only the elements within a specified distance from the main diagonal.

Examples

gunz_cm.preprocs.band_matrix.create_band_matrix(matrix: ndarray | coo_matrix | DataFrame, max_k: int | None = None, remove_main_diag: bool = False, *, row_ids_colname: str = 'row_ids', col_ids_colname: str = 'col_ids') -> ndarray | coo_matrix | DataFrame
gunz_cm.preprocs.band_matrix.create_band_matrix(matrix: ndarray, max_k: int | None, remove_main_diag: bool, **kwargs) -> ndarray
gunz_cm.preprocs.band_matrix.create_band_matrix(matrix: coo_matrix, max_k: int | None, remove_main_diag: bool, **kwargs) -> coo_matrix
gunz_cm.preprocs.band_matrix.create_band_matrix(matrix: DataFrame, max_k: int | None, remove_main_diag: bool, *, row_ids_colname: str, col_ids_colname: str) -> DataFrame

Creates a band matrix by keeping elements near the main diagonal.

This function filters a matrix to retain only the elements where the absolute difference between the row and column index is less than or equal to max_k.

matrix : np.ndarray, sp.coo_matrix, or pd.DataFrame

The input matrix to filter.

max_k : int, optional

The maximum distance from the main diagonal to keep. If None, all elements are kept (no filtering by distance). Defaults to None.

remove_main_diag : bool, optional

If True, elements on the main diagonal (k=0) are removed. Defaults to False.

row_ids_colname : str, optional

Column name for row IDs (for DataFrame input).

col_ids_colname : str, optional

Column name for column IDs (for DataFrame input).

np.ndarray, sp.coo_matrix, or pd.DataFrame

A new matrix of the same type as the input, containing only the elements within the specified band.

Examples
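
A minimal dense-array sketch of the band rule described above, using only NumPy. It is not the library's implementation: the helper name band_filter_dense is hypothetical, and zeroing out-of-band entries (rather than dropping them) is an assumption that only applies to dense input.

```python
import numpy as np


def band_filter_dense(matrix, max_k=None, remove_main_diag=False):
    """Keep entries with |row - col| <= max_k; zero everything else."""
    rows, cols = np.indices(matrix.shape)
    offset = np.abs(rows - cols)
    keep = np.ones(matrix.shape, dtype=bool)
    if max_k is not None:
        keep &= offset <= max_k  # distance filter
    if remove_main_diag:
        keep &= offset != 0  # drop k = 0
    return np.where(keep, matrix, 0)


dense = np.arange(1, 17).reshape(4, 4)
banded = band_filter_dense(dense, max_k=1)
banded_no_diag = band_filter_dense(dense, max_k=1, remove_main_diag=True)
```

With max_k=None and remove_main_diag=False the sketch returns the input values unchanged, matching the documented "no filtering" default.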

gunz_cm.preprocs.commons module

Matrix diagonal masking utilities.

This module provides helper functions for creating boolean masks to select or exclude elements based on their diagonal position within a sparse matrix. These utilities are optimized for performance using vectorized NumPy operations and feature robust input validation.

Examples

gunz_cm.preprocs.converters module

Module for converting between DataFrame and sparse matrix representations.

This module provides two primary, polymorphic functions: to_coo_matrix and to_dataframe. These functions use single dispatch to handle conversions from various data types (e.g., pandas DataFrame, tuples of arrays) to a standard sparse matrix or DataFrame format, with robust validation provided by Pydantic.

Examples

gunz_cm.preprocs.converters.to_coo_matrix(matrix: DataFrame | Tuple[ndarray, ndarray, ndarray], is_triu_sym: bool = True, *, row_ids_colname: str = 'row_ids', col_ids_colname: str = 'col_ids', vals_colname: str = 'counts', shape: Tuple[int, int] | None = None) -> coo_matrix
gunz_cm.preprocs.converters.to_coo_matrix(matrix: DataFrame, is_triu_sym: bool, *, row_ids_colname: str, col_ids_colname: str, vals_colname: str, shape: Tuple[int, int] | None = None) -> coo_matrix
gunz_cm.preprocs.converters.to_coo_matrix(matrix: Tuple[ndarray, ndarray, ndarray], is_triu_sym: bool, shape: Tuple[int, int] | None = None, **kwargs) -> coo_matrix

Convert various data types to a SciPy COO sparse matrix.

matrix : pd.DataFrame or tuple

Input data, which can be:
  • A pandas DataFrame with coordinate and value columns.
  • A tuple of (rows, columns, values) NumPy arrays.

is_triu_sym : bool, optional

If True, assumes the matrix is symmetric and stored in upper-triangular format, used for inferring the full matrix shape. Defaults to True.

row_ids_colname : str, optional

Column name for row IDs (for DataFrame input).

col_ids_colname : str, optional

Column name for column IDs (for DataFrame input).

vals_colname : str, optional

Column name for values (for DataFrame input).

shape : tuple of ints, optional

The shape of the matrix. If None, it is inferred from the data.

sp.coo_matrix

The COO format sparse matrix representation of the data.

Examples

gunz_cm.preprocs.converters.to_dataframe(matrix: coo_matrix, *, row_ids_colname: str = 'row_ids', col_ids_colname: str = 'col_ids', vals_colname: str = 'counts') -> DataFrame
gunz_cm.preprocs.converters.to_dataframe(matrix: coo_matrix, *, row_ids_colname: str, col_ids_colname: str, vals_colname: str) -> DataFrame

Convert a sparse matrix to a pandas DataFrame.

matrix : sp.coo_matrix

Input COO format sparse matrix.

row_ids_colname : str, optional

The desired column name for row IDs in the output DataFrame.

col_ids_colname : str, optional

The desired column name for column IDs in the output DataFrame.

vals_colname : str, optional

The desired column name for values in the output DataFrame.

pd.DataFrame

A DataFrame with columns for row IDs, column IDs, and values.

Examples
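
The round trip between the two representations can be sketched directly with pandas and SciPy. This is an illustration of what the conversion amounts to, not the library's code; it uses the documented default column names and assumes a square shape inferred from the largest index, mirroring is_triu_sym=True.

```python
import pandas as pd
from scipy import sparse as sp

# Coordinate-format table holding the upper triangle of a 3 x 3 symmetric matrix.
df = pd.DataFrame(
    {"row_ids": [0, 0, 1], "col_ids": [0, 2, 2], "counts": [5.0, 2.0, 7.0]}
)

# Assumed shape rule: square, with side = largest index + 1.
n = int(max(df["row_ids"].max(), df["col_ids"].max())) + 1

# DataFrame -> COO matrix.
coo = sp.coo_matrix(
    (df["counts"].to_numpy(), (df["row_ids"].to_numpy(), df["col_ids"].to_numpy())),
    shape=(n, n),
)

# COO matrix -> DataFrame with the same column names.
df_back = pd.DataFrame({"row_ids": coo.row, "col_ids": coo.col, "counts": coo.data})
```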

gunz_cm.preprocs.count_filters module

Provides functionality to filter matrix data based on raw counts.

This module uses functools.singledispatch and Pydantic’s validate_call to provide a single, robust filter_by_raw_counts function that can handle multiple data types safely.

Examples

gunz_cm.preprocs.count_filters.filter_by_raw_counts(matrix: DataFrame | coo_matrix | csr_matrix | ndarray, min_val: int | None = None, max_val: int | None = None, *, raw_counts_colname: str = 'raw_counts') -> DataFrame | coo_matrix | csr_matrix | ndarray
gunz_cm.preprocs.count_filters.filter_by_raw_counts(matrix: DataFrame, min_val: int | None, max_val: int | None, *, raw_counts_colname: str) -> DataFrame
gunz_cm.preprocs.count_filters.filter_by_raw_counts(matrix: coo_matrix, min_val: int | None, max_val: int | None, *, raw_counts_colname: str) -> coo_matrix
gunz_cm.preprocs.count_filters.filter_by_raw_counts(matrix: csr_matrix, min_val: int | None, max_val: int | None, *, raw_counts_colname: str) -> csr_matrix
gunz_cm.preprocs.count_filters.filter_by_raw_counts(matrix: ndarray, min_val: int | None, max_val: int | None, *, raw_counts_colname: str) -> ndarray

Filter entries of a matrix based on raw interaction counts.

This function uses Pydantic to validate inputs and single dispatch to route to the correct implementation based on the input data type.

matrix : pd.DataFrame, sp.coo_matrix, sp.csr_matrix, or np.ndarray

The input data. For NumPy arrays, this filters by setting values outside the range to 0. For sparse matrices and DataFrames, it removes the entries.

min_val : int, optional

The minimum raw count value to include (inclusive). Defaults to None.

max_val : int, optional

The maximum raw count value to include (inclusive). Defaults to None.

raw_counts_colname : str, optional

The name of the column containing raw counts. This is only used if the input is a pandas DataFrame. Defaults to DataFrameSpecs.RAW_COUNTS.

pd.DataFrame, sp.coo_matrix, sp.csr_matrix, or np.ndarray

A new data object of the same type as the input, containing only the filtered entries.

pydantic.ValidationError

If any argument’s type is incorrect.

ValueError

If min_val > max_val, or if raw_counts_colname is not found.

TypeError

If the target column in a DataFrame is not numeric.

Examples
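
The two documented behaviours (zeroing for dense arrays, dropping entries for sparse or tabular data) can be sketched with NumPy alone. The helper names below are hypothetical and the coordinate-array variant stands in for the sparse and DataFrame cases; this is not the library's implementation.

```python
import numpy as np


def _range_mask(values, min_val, max_val):
    """Boolean mask of values inside the inclusive [min_val, max_val] range."""
    if min_val is not None and max_val is not None and min_val > max_val:
        raise ValueError("min_val must not exceed max_val")
    keep = np.ones(values.shape, dtype=bool)
    if min_val is not None:
        keep &= values >= min_val
    if max_val is not None:
        keep &= values <= max_val
    return keep


def filter_dense_by_range(matrix, min_val=None, max_val=None):
    """Dense case: out-of-range values are set to 0, shape is preserved."""
    return np.where(_range_mask(matrix, min_val, max_val), matrix, 0)


def filter_coords_by_range(rows, cols, vals, min_val=None, max_val=None):
    """Coordinate case: out-of-range entries are removed."""
    keep = _range_mask(vals, min_val, max_val)
    return rows[keep], cols[keep], vals[keep]


counts = np.array([[1, 5], [10, 3]])
dense_filtered = filter_dense_by_range(counts, min_val=3, max_val=5)

rows, cols, vals = np.array([0, 0, 1]), np.array([0, 1, 1]), np.array([1, 5, 10])
kept_rows, kept_cols, kept_vals = filter_coords_by_range(rows, cols, vals, 3, 5)
```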

gunz_cm.preprocs.graphs module

Module.

Examples

gunz_cm.preprocs.graphs.comp_single_graph_adj_mat(data: ndarray | coo_matrix | DataFrame, allow_loop: bool = True, is_triu_sym: bool = True, row_ids_colname: str = 'row_ids', col_ids_colname: str = 'col_ids', counts_colname: str = 'counts') -> ndarray | coo_matrix | DataFrame
gunz_cm.preprocs.graphs.comp_single_graph_adj_mat(cm_coo: coo_matrix, allow_loop: bool = True, is_triu_sym: bool = True, **kwargs) -> coo_matrix
gunz_cm.preprocs.graphs.comp_single_graph_adj_mat(cm_df: DataFrame, allow_loop: bool = True, is_triu_sym: bool = True, row_ids_colname: str = 'row_ids', col_ids_colname: str = 'col_ids', counts_colname: str = 'counts') -> DataFrame

Compute the adjacency matrix from a given data structure.

This function assumes the input matrix is symmetric and processes only its upper triangular part and diagonal. If allow_loop is True, diagonal entries (self-loops) receive the value 2 in the adjacency matrix. If allow_loop is False, diagonal entries are set to 0, so no self-loop is encoded.

data : t.Union[np.ndarray, sp.coo_matrix, pd.DataFrame]

The input data structure.

allow_loop : bool, optional

Determines if a self-loop should be included in the resulting matrix. Default is True.

is_triu_sym : bool, optional

Determines if the input matrix is symmetric and only the upper triangular part is used. Default is True.

row_ids_colname : str, optional

The column name for row IDs in the input DataFrame. Default is cm_consts.ROW_IDS_COLNAME.

col_ids_colname : str, optional

The column name for column IDs in the input DataFrame. Default is cm_consts.COL_IDS_COLNAME.

counts_colname : str, optional

The column name for counts in the input DataFrame. Default is cm_consts.COUNTS_COLNAME.

adj_matrix : t.Union[np.ndarray, sp.coo_matrix, pd.DataFrame]

The adjacency matrix.

Examples

gunz_cm.preprocs.infer_shape module

Module for inferring matrix shapes from various data structures.

This module provides a utility function to determine the dimensions of a matrix represented as a DataFrame, a SciPy COO matrix, or a tuple of coordinate arrays. It correctly handles symmetric matrices to infer square dimensions.

Examples

gunz_cm.preprocs.infer_shape.infer_mat_shape(matrix: Tuple[ndarray, ndarray] | coo_matrix | DataFrame, is_triu_sym: bool = True, *, row_ids_colname: str = 'row_ids', col_ids_colname: str = 'col_ids') -> Tuple[int, int]
gunz_cm.preprocs.infer_shape.infer_mat_shape(matrix: Tuple[ndarray, ndarray], is_triu_sym: bool, **kwargs) -> Tuple[int, int]
gunz_cm.preprocs.infer_shape.infer_mat_shape(matrix: coo_matrix, is_triu_sym: bool, **kwargs) -> Tuple[int, int]
gunz_cm.preprocs.infer_shape.infer_mat_shape(matrix: DataFrame, is_triu_sym: bool, *, row_ids_colname: str, col_ids_colname: str) -> Tuple[int, int]

Infer the shape of a matrix from different data types.

This function uses Pydantic to validate inputs and single dispatch to route to the correct implementation based on the input data type.

matrix : tuple, sp.coo_matrix, or pd.DataFrame

Input data, which can be:
  • A tuple of (row_indices, column_indices) NumPy arrays.
  • A SciPy COO sparse matrix.
  • A pandas DataFrame with coordinate columns.

is_triu_sym : bool, optional

If True, the shape is inferred as a square matrix (N x N) based on the maximum index found. Defaults to True.

row_ids_colname : str, optional

Name of the column for row indices. Only used for DataFrames. Defaults to DataFrameSpecs.ROW_IDS.

col_ids_colname : str, optional

Name of the column for column indices. Only used for DataFrames. Defaults to DataFrameSpecs.COL_IDS.

t.Tuple[int, int]

The inferred (rows, columns) shape of the matrix.

pydantic.ValidationError

If any argument’s type is incorrect (e.g., a tuple with != 2 elements).

ValueError

If required columns are missing from a DataFrame.

TypeError

If the input data type is not supported or if index arrays are not of an integer dtype.

Examples
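
A sketch of the shape rule for the tuple-of-arrays case, using only NumPy. The helper name infer_shape is hypothetical, and the "largest index plus one" rule assumes zero-based indices, which the text above implies but does not state.

```python
import numpy as np


def infer_shape(row_ids, col_ids, is_triu_sym=True):
    """Infer (rows, columns) from zero-based coordinate arrays."""
    row_ids = np.asarray(row_ids)
    col_ids = np.asarray(col_ids)
    for ids in (row_ids, col_ids):
        if not np.issubdtype(ids.dtype, np.integer):
            raise TypeError("index arrays must have an integer dtype")
    n_rows = int(row_ids.max()) + 1
    n_cols = int(col_ids.max()) + 1
    if is_triu_sym:
        # Symmetric storage: both axes share one index space, so force N x N.
        n = max(n_rows, n_cols)
        return (n, n)
    return (n_rows, n_cols)


square_shape = infer_shape([0, 1], [2, 4])
rect_shape = infer_shape([0, 1], [2, 4], is_triu_sym=False)
```

Forcing a square shape matters for upper-triangular storage, where the largest row index is usually smaller than the largest column index.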

gunz_cm.preprocs.linear_scaler module

Module.

Examples

gunz_cm.preprocs.linear_scaler.scale_matrix(matrix: ndarray | coo_matrix | csr_matrix, scaling_method: str = 'minmax', min_val: float = 0, max_val: float = 1, exclude_diagonal: bool = False, inplace: bool = False) -> ndarray | coo_matrix | csr_matrix
gunz_cm.preprocs.linear_scaler.scale_matrix(matrix: ndarray, scaling_method: str, min_val: float, max_val: float, exclude_diagonal: bool = False, inplace: bool = False, **kwargs) -> ndarray
gunz_cm.preprocs.linear_scaler.scale_matrix(matrix: coo_matrix | csr_matrix, scaling_method: str, min_val: float, max_val: float, exclude_diagonal: bool = False, inplace: bool = False, **kwargs) -> coo_matrix | csr_matrix

Scales a matrix using the specified method.

This function supports both dense and sparse matrices and can scale using either min-max scaling or normalization. It can also exclude diagonal elements from scaling and perform operations in-place if specified.

matrix : Union[np.ndarray, coo_matrix, csr_matrix]

The matrix to scale.

scaling_method : str, optional

The scaling method to use (‘minmax’ or ‘normal’, default is ‘minmax’).

min_val : float, optional

The minimum value for min-max scaling (default is 0).

max_val : float, optional

The maximum value for min-max scaling (default is 1).

exclude_diagonal : bool, optional

Whether to exclude diagonal elements from scaling (default is False).

inplace : bool, optional

Whether to perform the scaling in-place (default is False).

Union[np.ndarray, coo_matrix, csr_matrix]

The scaled matrix.

Examples
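
For the 'minmax' method, a dense NumPy sketch of the usual min-max formula is shown below. The helper name minmax_scale is hypothetical, and two details are assumptions rather than documented behaviour: excluded diagonal entries are left untouched, and a constant matrix maps to min_val.

```python
import numpy as np


def minmax_scale(matrix, min_val=0.0, max_val=1.0, exclude_diagonal=False):
    """Linearly map the selected entries onto [min_val, max_val]."""
    out = matrix.astype(float)  # astype returns a copy, so the input is untouched
    selected = np.ones(out.shape, dtype=bool)
    if exclude_diagonal:
        np.fill_diagonal(selected, False)
    lo, hi = out[selected].min(), out[selected].max()
    if hi == lo:
        out[selected] = min_val  # avoid division by zero on constant input
    else:
        out[selected] = min_val + (out[selected] - lo) * (max_val - min_val) / (hi - lo)
    return out


m = np.array([[2.0, 4.0], [6.0, 10.0]])
scaled = minmax_scale(m)
scaled_off_diag = minmax_scale(m, exclude_diagonal=True)
```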

gunz_cm.preprocs.linear_scaler.transform_to_gaussian(matrix: ndarray | coo_matrix | csr_matrix, mu: float = 0, sigma: float = 1, inplace: bool = False) -> ndarray | coo_matrix | csr_matrix
gunz_cm.preprocs.linear_scaler.transform_to_gaussian(matrix: ndarray, mu: float, sigma: float, inplace: bool = False, **kwargs) -> ndarray
gunz_cm.preprocs.linear_scaler.transform_to_gaussian(matrix: coo_matrix | csr_matrix, mu: float, sigma: float, inplace: bool = False, **kwargs) -> coo_matrix | csr_matrix

Transforms a matrix to a Gaussian distribution.

This function transforms the matrix to have a Gaussian distribution with the specified mean and standard deviation. It supports both dense and sparse matrices and can perform operations in-place if specified.

matrix : Union[np.ndarray, coo_matrix, csr_matrix]

The matrix to transform.

mu : float, optional

The mean of the Gaussian distribution (default is 0).

sigma : float, optional

The standard deviation of the Gaussian distribution (default is 1).

inplace : bool, optional

Whether to perform the transformation in-place (default is False).

Union[np.ndarray, coo_matrix, csr_matrix]

The transformed matrix.

Examples

gunz_cm.preprocs.log_scaler module

Module.

Examples

gunz_cm.preprocs.log_scaler.log_scale_matrix(matrix: ndarray | coo_matrix | csr_matrix, exclude_diagonal: bool = False, inplace: bool = False) -> ndarray | coo_matrix | csr_matrix
gunz_cm.preprocs.log_scaler.log_scale_matrix(matrix: ndarray, exclude_diagonal: bool, inplace: bool, **kwargs) -> ndarray
gunz_cm.preprocs.log_scaler.log_scale_matrix(matrix: coo_matrix | csr_matrix, exclude_diagonal: bool, inplace: bool, **kwargs) -> coo_matrix | csr_matrix

Optimized log(1+v) scaling with in-place operation support.

This function applies a log(1+v) transformation to the input matrix. It supports both dense and sparse matrices. If exclude_diagonal is True, the diagonal elements are set to zero for dense matrices or removed for sparse matrices. The inplace parameter allows modifying the matrix in-place to save memory.

matrix : Union[np.ndarray, coo_matrix, csr_matrix]

Input matrix for log scaling.

exclude_diagonal : bool, optional

Zero diagonal (dense) or remove entries (sparse), default False.

inplace : bool, optional

Modify matrix in-place instead of creating new, default False.

Union[np.ndarray, coo_matrix, csr_matrix]

Log-scaled matrix (original if inplace=True).

Examples
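
The dense case of the log(1+v) transform can be sketched with np.log1p. The helper name log1p_scale is hypothetical, and the sketch assumes that in-place use is only attempted on floating-point arrays, since log values cannot be written into an integer buffer.

```python
import numpy as np


def log1p_scale(matrix, exclude_diagonal=False, inplace=False):
    """Apply log(1 + v) element-wise to a dense matrix."""
    out = matrix if inplace else matrix.astype(float)  # astype copies
    if exclude_diagonal:
        np.fill_diagonal(out, 0.0)  # log1p(0) == 0, so the diagonal stays zero
    np.log1p(out, out=out)
    return out


m = np.array([[0.0, 1.0], [np.e - 1.0, 3.0]])
scaled = log1p_scale(m)
scaled_no_diag = log1p_scale(m, exclude_diagonal=True)

buffer = m.copy()
same_object = log1p_scale(buffer, inplace=True)
```

np.log1p is preferred over np.log(1 + v) because it stays accurate for very small v.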

gunz_cm.preprocs.masks module

Centralized manager for genomic and structural masking logic.

gunz_cm.preprocs.masks.expand_with_nans(points_filtered: ndarray, mask: ndarray, full_length: int | None = None) -> ndarray

Expands a filtered point cloud back to genomic length, inserting NaNs where the mask is False.
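
A minimal NumPy sketch of this expansion follows. The helper name expand_sketch is hypothetical, and it assumes that points_filtered holds one row per True entry of the mask and that full_length, when given, is at least the mask length.

```python
import numpy as np


def expand_sketch(points_filtered, mask, full_length=None):
    """Scatter filtered rows back to their original positions, NaN elsewhere."""
    mask = np.asarray(mask, dtype=bool)
    n = mask.size if full_length is None else full_length
    out = np.full((n,) + points_filtered.shape[1:], np.nan)
    out[np.flatnonzero(mask)] = points_filtered
    return out


pts = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
full = expand_sketch(pts, [True, False, True])
padded = expand_sketch(pts, [True, False, True], full_length=5)
```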

gunz_cm.preprocs.masks.get_genomic_mask(resolution: int, region: str, hic_path: str | PathLike, balancing: str = 'KR', root: str | PathLike | None = None) -> ndarray

Identifies valid (aligned) bins from Hi-C data by inspecting non-zero contacts.

Parameters:
  • resolution (int) – Genomic resolution in bp.

  • region (str) – Chromosome/region identifier.

  • hic_path (str | os.PathLike) – Path to the .hic file.

  • balancing (str) – Normalization scheme (e.g., ‘KR’).

  • root (str | os.PathLike | None) – Project root directory.

Returns:

Boolean mask of valid bins.

Return type:

np.ndarray

gunz_cm.preprocs.masks.get_optimization_mask(points: ndarray, threshold: float = 1e-05) -> ndarray

Identifies points that have moved from the origin (stagnant noise filter).

gunz_cm.preprocs.masks.get_unified_mask(points: ndarray, resolution: int, region: str, hic_path: str | PathLike, balancing: str = 'KR', root: str | PathLike | None = None) -> ndarray

Combines Genomic (Hi-C) and Optimization (Movement) masks.

gunz_cm.preprocs.masks.intersect_masks(masks: list[ndarray]) -> ndarray

Computes bitwise-AND across multiple masks.
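
How an optimization mask and a mask intersection fit together can be sketched as follows. Both helper names are hypothetical, and measuring "moved from the origin" as a Euclidean norm above the threshold is an assumption; the genomic mask here is hand-written rather than read from a .hic file.

```python
import numpy as np


def moved_from_origin(points, threshold=1e-5):
    """True for points whose distance from the origin exceeds the threshold."""
    return np.linalg.norm(points, axis=1) > threshold


def intersect(masks):
    """Element-wise AND across equally shaped boolean masks."""
    return np.logical_and.reduce(masks)


points = np.array([[0.3, 0.0, 0.0], [0.0, 0.0, 0.0], [0.5, 0.2, 0.1]])
optimization_mask = moved_from_origin(points)

genomic_mask = np.array([False, True, True])  # stand-in for a Hi-C derived mask
unified = intersect([genomic_mask, optimization_mask])
```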

gunz_cm.preprocs.mirrors module

Module.

Examples

gunz_cm.preprocs.mirrors.mirror_upper_to_lower_triangle(mat: DataFrame | coo_matrix, remove_diag: bool = False, double_diag: bool = False) -> DataFrame | coo_matrix
gunz_cm.preprocs.mirrors.mirror_upper_to_lower_triangle(cm_coo: coo_matrix, remove_diag: bool = False, double_diag: bool = False) -> coo_matrix
gunz_cm.preprocs.mirrors.mirror_upper_to_lower_triangle(cm_df: DataFrame, remove_diag: bool = False, double_diag: bool = False, row_ids_colname: str = 'row_ids', col_ids_colname: str = 'col_ids', vals_colname: str = 'counts') -> DataFrame

Mirror the upper triangle part to the lower triangle part of a matrix.

mat : t.Union[pd.DataFrame, sp.coo_matrix]

Input matrix.

remove_diag : bool, optional

Whether to remove the main diagonal (default is False).

double_diag : bool, optional

Whether to double the diagonal entries (default is False). This is useful for preserving behavior of certain legacy implementations that sum (i, j) and (j, i) blindly even when i=j. Ignored if remove_diag is True.

output_data : t.Union[pd.DataFrame, sparse.coo_matrix]

Resulting matrix with the upper triangle mirrored to the lower triangle.

This function assumes the input matrix is a symmetric matrix. It delegates the operation to registered implementations based on the input type.

Examples
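
The interplay of remove_diag and double_diag is easiest to see on a small dense matrix. The sketch below is a dense NumPy stand-in with a hypothetical helper name; the documented function itself takes a DataFrame or COO matrix.

```python
import numpy as np


def mirror_dense(upper, remove_diag=False, double_diag=False):
    """Mirror the upper triangle of a dense matrix onto the lower triangle."""
    upper = np.triu(upper)
    diag = np.diag(np.diag(upper))
    full = upper + upper.T  # at this point the diagonal is counted twice
    if remove_diag:
        return full - 2 * diag  # takes precedence over double_diag
    if double_diag:
        return full  # keep the doubled diagonal (legacy summing behaviour)
    return full - diag  # diagonal counted once


u = np.array([[1, 2], [0, 3]])
mirrored = mirror_dense(u)
mirrored_doubled = mirror_dense(u, double_diag=True)
mirrored_no_diag = mirror_dense(u, remove_diag=True, double_diag=True)
```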

gunz_cm.preprocs.mirrors.mirror_upper_to_lower_triangle_coo(cm_coo: coo_matrix, remove_diag: bool = False, double_diag: bool = False) -> coo_matrix

Mirror the upper triangle part to the lower triangle part of a sparse matrix.

This function assumes the input matrix is a symmetric matrix.

cm_coo : scipy.sparse.coo_matrix

The input sparse matrix.

remove_diag : bool, optional

Whether to remove the main diagonal. Defaults to False.

double_diag : bool, optional

Whether to double the diagonal entries. Defaults to False.

out_mat : scipy.sparse.coo_matrix

The resulting sparse matrix with the upper triangle mirrored to the lower triangle.

Examples

gunz_cm.preprocs.mirrors.mirror_upper_to_lower_triangle_df(cm_df: DataFrame, remove_diag: bool = False, double_diag: bool = False, row_ids_colname: str = 'row_ids', col_ids_colname: str = 'col_ids', vals_colname: str = 'counts') -> DataFrame

Mirror the upper triangle part to the lower triangle part of a matrix.

This function assumes the input matrix is a symmetric matrix.

cm_df : pandas.DataFrame

The input DataFrame representing the matrix.

remove_diag : bool, optional

Whether to remove the main diagonal (default is False).

double_diag : bool, optional

Whether to double the diagonal entries (default is False).

output_df : pandas.DataFrame

The resulting DataFrame with the upper triangle mirrored to the lower triangle.

The function uses the consts module for column names.

Examples

gunz_cm.preprocs.mirrors.symmetrize_edges(rows: ndarray, cols: ndarray, data: ndarray, shape: Tuple[int, int], double_diag: bool = False) -> coo_matrix

Construct a symmetric COO matrix from directed edge arrays.

This function is more efficient than mirror_upper_to_lower_triangle for constructing matrices from raw edge lists because it skips the intermediate sparse matrix creation and filtering steps.

rows : np.ndarray

Row indices.

cols : np.ndarray

Column indices.

data : np.ndarray

Values.

shape : tuple[int, int]

Shape of the resulting matrix.

double_diag : bool, optional

Whether to include diagonal elements (i, i) twice (once as (i, i) and once as mirrored (i, i)). This preserves legacy behavior of blindly summing (i, j) and (j, i). Defaults to False.

sp.coo_matrix

The symmetric sparse matrix.

Examples
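
Working directly on edge arrays, symmetrization reduces to appending the swapped (col, row) pairs. The sketch below uses only NumPy and a hypothetical helper name; it returns coordinate arrays rather than a COO matrix, and the dense accumulation at the end only serves to show the result.

```python
import numpy as np


def symmetrize_coords(rows, cols, data, double_diag=False):
    """Append mirrored (col, row) entries to a directed edge list."""
    rows, cols, data = np.asarray(rows), np.asarray(cols), np.asarray(data)
    # Mirror every entry when double_diag is set, otherwise skip i == j.
    mirror = np.ones(rows.shape, dtype=bool) if double_diag else rows != cols
    sym_rows = np.concatenate([rows, cols[mirror]])
    sym_cols = np.concatenate([cols, rows[mirror]])
    sym_data = np.concatenate([data, data[mirror]])
    return sym_rows, sym_cols, sym_data


def to_dense(rows, cols, data, shape):
    """Sum duplicate coordinates into a dense array, as COO conversion does."""
    dense = np.zeros(shape)
    np.add.at(dense, (rows, cols), data)
    return dense


edges = ([0, 0, 1], [0, 1, 1], [1.0, 2.0, 3.0])
sym = to_dense(*symmetrize_coords(*edges), shape=(2, 2))
sym_doubled = to_dense(*symmetrize_coords(*edges, double_diag=True), shape=(2, 2))
```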

gunz_cm.preprocs.noises module

Module.

Examples

gunz_cm.preprocs.noises.add_rand_ligation_noise(data: ndarray | coo_matrix | DataFrame, ratio: float, use_pseudo: bool = False, is_triu_sym: bool = True, inplace: bool = False) -> ndarray | coo_matrix | DataFrame

Add random ligation noise to the input data.

data : Union[numpy.ndarray, scipy.sparse.coo_matrix, pandas.DataFrame]

Input data.

ratio : float

Noise ratio.

is_triu_sym : bool, optional

Whether the matrix is upper triangular and symmetric (default is True).

inplace : bool, optional

Whether to modify the input data in place (default is False).

Union[numpy.ndarray, scipy.sparse.coo_matrix, pandas.DataFrame]

Data with added random ligation noise.

This function adds random ligation noise to the input data. It supports NumPy arrays, SciPy sparse matrices, and pandas DataFrames. Note: the effect of the inplace parameter depends on the input type. For NumPy arrays and SciPy sparse matrices, inplace=True modifies the original data. For pandas DataFrames, the original data is not modified even when inplace=True.

Examples

gunz_cm.preprocs.noises.add_rand_ligation_noise_coo(cm_coo: coo_matrix, ratio: float, use_pseudo: bool = False, is_triu_sym: bool = True, inplace: bool = False) -> coo_matrix

Add random ligation noise to a scipy sparse matrix.

This function adds random ligation noise to the input SciPy sparse matrix. If inplace is False, a copy of the input matrix is created before adding noise. If is_triu_sym is True, the matrix is assumed to be upper triangular and symmetric.

cm_coo : scipy.sparse.coo_matrix

Input scipy sparse matrix.

ratio : float

Noise ratio.

is_triu_sym : bool, optional

Whether the matrix is upper triangular and symmetric (default is True).

inplace : bool, optional

Whether to modify the input data in place (default is False).

scipy.sparse.coo_matrix

Scipy sparse matrix with added random ligation noise.

Examples

gunz_cm.preprocs.noises.add_rand_ligation_noise_df(cm_df: DataFrame, ratio: float, use_pseudo: bool = False, is_triu_sym: bool = True, inplace: bool = False, row_ids_colname: str = 'row_ids', col_ids_colname: str = 'col_ids', vals_colname: str = 'counts') -> DataFrame

Add random ligation noise to a pandas DataFrame.

This function adds random ligation noise to the input pandas DataFrame. If inplace is False, a copy of the input DataFrame is created before adding noise. If is_triu_sym is True, the matrix is assumed to be upper triangular and symmetric.

cm_df : pd.DataFrame

Input pandas DataFrame.

ratio : float

Noise ratio.

is_triu_sym : bool, optional

Whether the matrix is upper triangular and symmetric (default is True).

inplace : bool, optional

Whether to modify the input data in place (default is False).

row_ids_colname : str, optional

Column name for row IDs (default is ‘row_ids’).

col_ids_colname : str, optional

Column name for column IDs (default is ‘col_ids’).

vals_colname : str, optional

Column name for values (default is ‘counts’).

pd.DataFrame

Pandas DataFrame with added random ligation noise.

Examples

gunz_cm.preprocs.noises.add_rand_ligation_noise_mat(cm_mat: coo_matrix, ratio: float, is_triu_sym: bool = True, inplace: bool = False)

Function add_rand_ligation_noise_mat.

Examples

Notes

gunz_cm.preprocs.rc_filters module

Module.

Examples

gunz_cm.preprocs.rc_filters.filter_empty_rowcols(data: ndarray | tuple | coo_matrix | DataFrame, is_triu_sym: bool = True, axis: int | None = None, ret_mapping: bool = False, ret_unique_ids: bool = False, row_ids_colname: str = 'row_ids', col_ids_colname: str = 'col_ids') -> ndarray | tuple | coo_matrix | DataFrame
gunz_cm.preprocs.rc_filters.filter_empty_rowcols(cm_mat: ndarray, is_triu_sym: bool = True, axis: int | None = None, ret_mapping: bool = False, ret_unique_ids: bool = False, **kwargs) -> ndarray
gunz_cm.preprocs.rc_filters.filter_empty_rowcols(data: Tuple[ndarray, ndarray], is_triu_sym: bool = True, axis: int | None = None, ret_mapping: bool = False, ret_unique_ids: bool = False, **kwargs) -> Tuple[ndarray, ndarray, ndarray | None, ndarray | None]
gunz_cm.preprocs.rc_filters.filter_empty_rowcols(cm_coo: coo_matrix, is_triu_sym: bool = True, axis: int | None = None, ret_mapping: bool = False, ret_unique_ids: bool = False, **kwargs) -> Tuple[coo_matrix, Tuple[ndarray, ...] | None]
gunz_cm.preprocs.rc_filters.filter_empty_rowcols(df: DataFrame, is_triu_sym: bool = True, axis: int = None, ret_mapping: bool = False, ret_unique_ids: bool = False, row_ids_colname: str = 'row_ids', col_ids_colname: str = 'col_ids', **kwargs) -> DataFrame | Tuple[DataFrame, ...]

Filter out rows or columns whose entries are all zero (unalignable regions) and reindex the remaining row and/or column IDs.

This function filters out empty rows and columns from the input data.

data : np.ndarray or tuple or scipy.sparse.coo_matrix or pd.DataFrame

The input data.

is_triu_sym : bool, optional

If the input is symmetric but only the upper triangle part of the matrix is given. Defaults to True.

axis : int, optional

The axis to filter on. Defaults to None.

ret_mapping : bool, optional

Whether to return the mapping of the original ids to the new ids. Defaults to False.

ret_unique_ids : bool, optional

Whether to return unique ids. Defaults to False.

filtered_data : np.ndarray or tuple or scipy.sparse.coo_matrix or pd.DataFrame

The filtered data.

Examples
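
For coordinate arrays, dropping empty rows and columns amounts to compacting the set of indices that actually occur. The sketch below covers only the symmetric case (is_triu_sym=True, one shared index space) and uses a hypothetical helper name; how the library handles axis, ret_mapping, and ret_unique_ids is not reproduced here.

```python
import numpy as np


def compact_ids(row_ids, col_ids):
    """Renumber the indices in use to 0..K-1, sharing one index space."""
    row_ids = np.asarray(row_ids)
    col_ids = np.asarray(col_ids)
    # unique_ids[i] is the original index that becomes new index i.
    unique_ids, inverse = np.unique(
        np.concatenate([row_ids, col_ids]), return_inverse=True
    )
    new_rows = inverse[: row_ids.size]
    new_cols = inverse[row_ids.size:]
    return new_rows, new_cols, unique_ids


# Bins 1, 2 and 4 never appear, so they are treated as empty.
new_rows, new_cols, unique_ids = compact_ids([0, 0, 3], [3, 5, 5])
```

The unique_ids array doubles as the mapping back to the original coordinates: unique_ids[new_rows] recovers the input row indices.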

gunz_cm.preprocs.rc_filters_common module

Module.

Examples

gunz_cm.preprocs.rc_filters_common.filter_common_empty_rowcols(data1, data2, op: str = 'union', is_triu_sym: bool = True, axis: int | None = None, ret_mapping: bool = False, ret_unique_ids: bool = False)
gunz_cm.preprocs.rc_filters_common.filter_common_empty_rowcols(cm_mat1: ndarray, cm_mat2: ndarray, is_triu_sym: bool = True, axis: int = None, ret_mapping: bool = False, **kwargs) -> ndarray
gunz_cm.preprocs.rc_filters_common.filter_common_empty_rowcols(data1: Tuple[ndarray, ndarray], data2: Tuple[ndarray, ndarray], op: str = 'union', is_triu_sym: bool = True, axis: int | None = None, ret_mapping: bool = False, ret_unique_ids: bool = False, **kwargs) -> Tuple[ndarray, ndarray, ndarray | None, ndarray | None]
gunz_cm.preprocs.rc_filters_common.filter_common_empty_rowcols(cm_coo1: coo_matrix, cm_coo2: coo_matrix, op: str = 'union', is_triu_sym: bool = True, axis: int | None = None, ret_mapping: bool = False, ret_unique_ids: bool = False, **kwargs) -> Tuple[spmatrix, Tuple[ndarray, ...] | None]
gunz_cm.preprocs.rc_filters_common.filter_common_empty_rowcols(cm_df1: DataFrame, cm_df2: DataFrame, op: str = 'union', is_triu_sym: bool = True, axis: int | None = None, ret_mapping: bool = False, ret_unique_ids: bool = False, row_ids_colname: str = 'row_ids', col_ids_colname: str = 'col_ids', **kwargs) -> Tuple[DataFrame, ...] | DataFrame

Filter out unalignable regions from the input data.

data1, data2 : pandas.DataFrame or scipy.sparse matrix

The input data.

is_triu_sym : bool, optional

If the input is symmetric but only the upper triangle part of the matrix is given. Defaults to True.

axis : int, optional

The axis to filter on. Defaults to None.

ret_mapping : bool, optional

Whether to return the mapping of the original ids to the new ids. Defaults to False.

filtered_data : pandas.DataFrame or scipy.sparse matrix

The filtered data.

Examples

gunz_cm.preprocs.resamples module

Module.

Examples

gunz_cm.preprocs.resamples.rand_downsample(data: ndarray | coo_matrix | DataFrame, ratio: int, val_colname: str = 'counts') -> ndarray | coo_matrix | DataFrame
gunz_cm.preprocs.resamples.rand_downsample(cm_mat: ndarray, ratio: int, **kwargs) -> ndarray
gunz_cm.preprocs.resamples.rand_downsample(cm_coo: coo_matrix, ratio: int, **kwargs) -> coo_matrix
gunz_cm.preprocs.resamples.rand_downsample(cm_df: DataFrame, ratio: int, val_colname: str = 'counts', **kwargs) -> DataFrame

Randomly downsample a matrix or dataframe by a specified ratio.

This function dispatches to different downsampling functions based on the input data type.

data : Union[np.ndarray, sp.coo_matrix, pd.DataFrame]

Input data to downsample.

ratio : int

Downsample ratio.

val_colname : str, optional

Column name for values in dataframe (default is cm_consts.COUNTS_COLNAME).

Union[np.ndarray, sp.coo_matrix, pd.DataFrame]

Downsampled data.

Examples

gunz_cm.preprocs.resamples.uniform_resample_mat(cm_mat: ndarray, target_rate: float) -> ndarray

Uniformly resample a matrix by a specified target rate.

This function simply multiplies the input matrix by the target rate.

cm_mat : np.ndarray

Input contact matrix to resample.

target_rate : float

Target rate for resampling.

np.ndarray

Resampled matrix.

Examples

gunz_cm.preprocs.sparse_wish_dist module

Module.

Examples

gunz_cm.preprocs.sparse_wish_dist.comp_sparse_wish_dist(data, alpha: float = -0.25, na_inf_val: float | None = None)
gunz_cm.preprocs.sparse_wish_dist.comp_sparse_wish_dist(cm_coo: coo_matrix, alpha: float = -0.25, na_inf_val: float | None = None, **kwargs)
gunz_cm.preprocs.sparse_wish_dist.comp_sparse_wish_dist(cm_df: DataFrame, alpha: float = -0.25, na_inf_val: float | None = None, **kwargs) -> DataFrame

Function comp_sparse_wish_dist.

Examples

Notes

gunz_cm.preprocs.sparse_wish_dist.comp_sparse_wish_dist_rc_ids(row_ids: ndarray | List[int], col_ids: ndarray | List[int], C_vals: ndarray, alpha: float = -0.25, na_inf_val: float | None = None) -> Tuple[ndarray, ndarray, ndarray, ndarray]

Calculate sparse form of Euclidean distance matrix from contact matrix.

Create a tuple of row indices, column indices, contact matrix values, and Euclidean distance values.

row_ids : t.Union[np.ndarray, t.List[int]]

Array of row indices.

col_ids : t.Union[np.ndarray, t.List[int]]

Array of column indices.

C_vals : np.ndarray

Array of contact matrix values.

alpha : float, optional

Conversion factor from contact matrix to Euclidean distance matrix (default is -0.25).

na_inf_val : t.Optional[float], optional

Value to replace NaN or infinite values (default is None).

t.Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]

A tuple containing (row_ids, col_ids, C_vals, D_vals).

Removes the main diagonal of the matrix. NaN handling is not yet implemented and will raise a NotImplementedError if invalid values are found.

Examples
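
A sketch of the contact-to-distance conversion on coordinate arrays follows. It assumes the power-law relation D = C ** alpha, a common convention in 3D genome reconstruction that the text above does not state explicitly, and it uses a hypothetical helper name. The diagonal removal and the NotImplementedError on non-finite values follow the notes above; na_inf_val is not modelled.

```python
import numpy as np


def wish_distance_coords(row_ids, col_ids, c_vals, alpha=-0.25):
    """Return (row_ids, col_ids, C_vals, D_vals) for off-diagonal entries."""
    row_ids = np.asarray(row_ids)
    col_ids = np.asarray(col_ids)
    c_vals = np.asarray(c_vals, dtype=float)
    off_diag = row_ids != col_ids  # the main diagonal is removed
    r, c, v = row_ids[off_diag], col_ids[off_diag], c_vals[off_diag]
    d = np.power(v, alpha)  # assumed conversion: D = C ** alpha
    if not np.all(np.isfinite(d)):
        raise NotImplementedError("NaN/inf handling is not part of this sketch")
    return r, c, v, d


r, c, v, d = wish_distance_coords([0, 0, 1], [0, 1, 2], [4.0, 16.0, 1.0])
```

With a negative alpha, larger contact counts map to smaller distances, which is the intended qualitative behaviour.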

gunz_cm.preprocs.triu_matrix module

Module.

Examples

gunz_cm.preprocs.triu_matrix.create_triu_matrix(data: ndarray | coo_matrix | DataFrame, min_k: int | None = None, max_k: int | None = None, remove_main_diag: bool = False) -> ndarray | tuple | coo_matrix | DataFrame
gunz_cm.preprocs.triu_matrix.create_triu_matrix(cm_mat: ndarray, min_k: int | None = None, max_k: int | None = None, remove_main_diag: bool = False, **kwargs) -> ndarray
gunz_cm.preprocs.triu_matrix.create_triu_matrix(cm_coo: coo_matrix, min_k: int | None = None, max_k: int | None = None, remove_main_diag: bool = False, **kwargs) -> coo_matrix
gunz_cm.preprocs.triu_matrix.create_triu_matrix(cm_df: DataFrame, min_k: int | None = None, max_k: int | None = None, remove_main_diag: bool = False, row_ids_colname: str = 'row_ids', col_ids_colname: str = 'col_ids', **kwargs) -> DataFrame

Creates a triangular matrix.

This function creates a triangular matrix based on the input data. The min_k and max_k parameters control the minimum and maximum distance from the main diagonal. If remove_main_diag is True, the main diagonal elements are removed.

data : t.Union[np.ndarray, sp.coo_matrix, pd.DataFrame]

The input data to be converted to a triangular matrix.

min_k : t.Optional[int], optional

The minimum distance from the main diagonal (default is None).

max_k : t.Optional[int], optional

The maximum distance from the main diagonal (default is None).

remove_main_diag : bool, optional

Whether to remove the main diagonal elements (default is False).

t.Union[np.ndarray, tuple, sp.coo_matrix, pd.DataFrame]

The triangular matrix.

Examples
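
On a dense array, the combination of min_k, max_k, and remove_main_diag can be sketched as a mask on the signed offset col - row. The helper name triu_band is hypothetical, and treating both bounds as inclusive is an assumption.

```python
import numpy as np


def triu_band(matrix, min_k=None, max_k=None, remove_main_diag=False):
    """Keep upper-triangular entries with min_k <= col - row <= max_k."""
    rows, cols = np.indices(matrix.shape)
    offset = cols - rows
    lower = 0 if min_k is None else max(min_k, 0)  # never below the diagonal
    if remove_main_diag:
        lower = max(lower, 1)
    keep = offset >= lower
    if max_k is not None:
        keep &= offset <= max_k
    return np.where(keep, matrix, 0)


m = np.arange(1, 10).reshape(3, 3)
upper = triu_band(m)
first_off_diag = triu_band(m, max_k=1, remove_main_diag=True)
```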

gunz_cm.preprocs.weight_filters module

Module.

Examples

gunz_cm.preprocs.weight_filters.filter_by_weights_quantile_df(cm_df: DataFrame, q1: float | None = 0, q3: float | None = 1.0, log: bool | None = True, val_colname: str | None = 'counts', orig_val_colname: str | None = 'raw_counts') -> DataFrame

Filter a DataFrame based on weight quantiles.

This function calculates weights based on the ratio of normalized counts to raw counts. It then applies log transformation if specified and filters the DataFrame based on the weight quantiles.

cm_df : pd.DataFrame

The input DataFrame containing count data.

q1 : float, optional

The lower quantile value (default is 0).

q3 : float, optional

The upper quantile value (default is 1.0).

log : bool, optional

Whether to apply log transformation to the weights. Default is True.

val_colname : str, optional

The column name for normalized counts. Default is cm_consts.COUNTS_COLNAME.

orig_val_colname : str, optional

The column name for raw counts. Default is cm_consts.RAW_COUNTS_COLNAME.

pd.DataFrame

A new DataFrame filtered based on the weight quantiles.

Examples
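
The weighting and quantile steps can be sketched on plain arrays. The helper name is hypothetical, and several details are assumptions rather than documented behaviour: a natural logarithm, inclusive quantile bounds, and strictly positive raw counts.

```python
import numpy as np


def quantile_weight_mask(counts, raw_counts, q1=0.0, q3=1.0, log=True):
    """Mask of entries whose weight lies within the [q1, q3] quantile range."""
    weights = np.asarray(counts, dtype=float) / np.asarray(raw_counts, dtype=float)
    if log:
        weights = np.log(weights)
    lo, hi = np.quantile(weights, [q1, q3])
    return (weights >= lo) & (weights <= hi)


normalized = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
raw = np.ones(5)
mask = quantile_weight_mask(normalized, raw, q1=0.2, q3=0.8)
kept = normalized[mask]
```

With the default q1=0 and q3=1.0 every entry is kept, so the filter only has an effect once the bounds are tightened.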

Module contents