gunz_cm.preprocs package
Submodules
gunz_cm.preprocs.band_matrix module
Module for creating a band matrix from various data structures.
This module provides a polymorphic function, create_band_matrix, that filters a matrix to retain only the elements within a specified distance from the main diagonal.
Examples
- gunz_cm.preprocs.band_matrix.create_band_matrix(matrix: ndarray | coo_matrix | DataFrame, max_k: int | None = None, remove_main_diag: bool = False, *, row_ids_colname: str = 'row_ids', col_ids_colname: str = 'col_ids') → ndarray | coo_matrix | DataFrame
- gunz_cm.preprocs.band_matrix.create_band_matrix(matrix: ndarray, max_k: int | None, remove_main_diag: bool, **kwargs) → ndarray
- gunz_cm.preprocs.band_matrix.create_band_matrix(matrix: coo_matrix, max_k: int | None, remove_main_diag: bool, **kwargs) → coo_matrix
- gunz_cm.preprocs.band_matrix.create_band_matrix(matrix: DataFrame, max_k: int | None, remove_main_diag: bool, *, row_ids_colname: str, col_ids_colname: str) → DataFrame
Creates a band matrix by keeping elements near the main diagonal.
This function filters a matrix to retain only the elements where the absolute difference between the row and column index is less than or equal to max_k.
Parameters
- matrix : np.ndarray, sp.coo_matrix, or pd.DataFrame
The input matrix to filter.
- max_k : int, optional
The maximum distance from the main diagonal to keep. If None, all elements are kept (no filtering by distance). Defaults to None.
- remove_main_diag : bool, optional
If True, elements on the main diagonal (k=0) are removed. Defaults to False.
- row_ids_colname : str, optional
Column name for row IDs (DataFrame input only). Defaults to ‘row_ids’.
- col_ids_colname : str, optional
Column name for column IDs (DataFrame input only). Defaults to ‘col_ids’.
Returns
- np.ndarray, sp.coo_matrix, or pd.DataFrame
A new matrix of the same type as the input, containing only the elements within the specified band.
Examples
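A minimal sketch of the band criterion above for COO input, written with NumPy and SciPy. `band_filter_coo` is a hypothetical helper for illustration, not the library's implementation of create_band_matrix; it assumes the input can be converted with `sp.coo_matrix`.

```python
import numpy as np
import scipy.sparse as sp


def band_filter_coo(mat, max_k=None, remove_main_diag=False):
    """Hypothetical helper: keep entries with |row - col| <= max_k."""
    mat = sp.coo_matrix(mat)
    offset = np.abs(mat.row - mat.col)  # distance from the main diagonal
    keep = np.ones(mat.nnz, dtype=bool)
    if max_k is not None:
        keep &= offset <= max_k
    if remove_main_diag:
        keep &= offset != 0
    return sp.coo_matrix(
        (mat.data[keep], (mat.row[keep], mat.col[keep])), shape=mat.shape
    )


dense = np.arange(1, 17).reshape(4, 4)
banded = band_filter_coo(dense, max_k=1, remove_main_diag=True)
print(banded.toarray())
```

Filtering the stored triplets with a boolean mask keeps the result sparse and never densifies the input.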
gunz_cm.preprocs.commons module
Matrix diagonal masking utilities.
This module provides helper functions for creating boolean masks to select or exclude elements based on their diagonal position within a sparse matrix. These utilities are optimized for performance using vectorized NumPy operations and feature robust input validation.
Examples
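The module itself lists no functions here, so the following is only a hypothetical illustration of the kind of vectorized diagonal mask described above: `diag_offset_mask` is an invented name, and it assumes entries are given as COO row/column index arrays with diagonal offset k = col - row.

```python
import numpy as np


def diag_offset_mask(rows, cols, offsets, exclude=False):
    """Hypothetical helper: mask COO entries by diagonal offset k = col - row."""
    rows, cols = np.asarray(rows), np.asarray(cols)
    if rows.shape != cols.shape:
        raise ValueError("rows and cols must have the same shape")
    k = cols - rows  # 0 = main diagonal, k > 0 = k-th superdiagonal
    mask = np.isin(k, offsets)
    return ~mask if exclude else mask


rows = np.array([0, 0, 1, 2, 3])
cols = np.array([0, 2, 1, 3, 3])
print(diag_offset_mask(rows, cols, offsets=[0]))                # main-diagonal entries
print(diag_offset_mask(rows, cols, offsets=[0], exclude=True))  # all other entries
```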
gunz_cm.preprocs.converters module
Module for converting between DataFrame and sparse matrix representations.
This module provides two primary, polymorphic functions: to_coo_matrix and to_dataframe. These functions use single dispatch to handle conversions from various data types (e.g., pandas DataFrame, tuples of arrays) to a standard sparse matrix or DataFrame format, with robust validation provided by Pydantic.
Examples
- gunz_cm.preprocs.converters.to_coo_matrix(matrix: DataFrame | Tuple[ndarray, ndarray, ndarray], is_triu_sym: bool = True, *, row_ids_colname: str = 'row_ids', col_ids_colname: str = 'col_ids', vals_colname: str = 'counts', shape: Tuple[int, int] | None = None) → coo_matrix
- gunz_cm.preprocs.converters.to_coo_matrix(matrix: DataFrame, is_triu_sym: bool, *, row_ids_colname: str, col_ids_colname: str, vals_colname: str, shape: Tuple[int, int] | None = None) → coo_matrix
- gunz_cm.preprocs.converters.to_coo_matrix(matrix: Tuple[ndarray, ndarray, ndarray], is_triu_sym: bool, shape: Tuple[int, int] | None = None, **kwargs) → coo_matrix
Convert various data types to a SciPy COO sparse matrix.
Parameters
- matrix : pd.DataFrame or tuple
Input data: either a pandas DataFrame with coordinate and value columns, or a tuple of (rows, columns, values) NumPy arrays.
- is_triu_sym : bool, optional
If True, the matrix is assumed to be symmetric and stored in upper-triangular form; this is used when inferring the full matrix shape. Defaults to True.
- row_ids_colname : str, optional
Column name for row IDs (DataFrame input only). Defaults to ‘row_ids’.
- col_ids_colname : str, optional
Column name for column IDs (DataFrame input only). Defaults to ‘col_ids’.
- vals_colname : str, optional
Column name for values (DataFrame input only). Defaults to ‘counts’.
- shape : tuple of ints, optional
The shape of the matrix. If None, it is inferred from the data.
Returns
- sp.coo_matrix
The COO format sparse matrix representation of the data.
Examples
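A minimal sketch of the DataFrame path, assuming the default column names and an explicitly given shape. `frame_to_coo` is a hypothetical helper, not to_coo_matrix itself; it skips the Pydantic validation and shape inference the library performs.

```python
import pandas as pd
import scipy.sparse as sp


def frame_to_coo(df, shape, row_ids_colname="row_ids", col_ids_colname="col_ids",
                 vals_colname="counts"):
    """Hypothetical helper: build a COO matrix from coordinate and value columns."""
    rows = df[row_ids_colname].to_numpy()
    cols = df[col_ids_colname].to_numpy()
    vals = df[vals_colname].to_numpy()
    return sp.coo_matrix((vals, (rows, cols)), shape=shape)


df = pd.DataFrame({"row_ids": [0, 1], "col_ids": [2, 1], "counts": [5.0, 7.0]})
m = frame_to_coo(df, shape=(3, 3))
print(m.toarray())
```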
- gunz_cm.preprocs.converters.to_dataframe(matrix: coo_matrix, *, row_ids_colname: str = 'row_ids', col_ids_colname: str = 'col_ids', vals_colname: str = 'counts') → DataFrame
- gunz_cm.preprocs.converters.to_dataframe(matrix: coo_matrix, *, row_ids_colname: str, col_ids_colname: str, vals_colname: str) → DataFrame
Convert a sparse matrix to a pandas DataFrame.
Parameters
- matrix : sp.coo_matrix
Input COO format sparse matrix.
- row_ids_colname : str, optional
The desired column name for row IDs in the output DataFrame.
- col_ids_colname : str, optional
The desired column name for column IDs in the output DataFrame.
- vals_colname : str, optional
The desired column name for values in the output DataFrame.
Returns
- pd.DataFrame
A DataFrame with columns for row IDs, column IDs, and values.
Examples
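The reverse direction can be sketched by emitting one DataFrame row per stored COO entry. `coo_to_frame` is a hypothetical helper assuming the default column names; implicit zeros of the sparse matrix produce no rows.

```python
import numpy as np
import pandas as pd
import scipy.sparse as sp


def coo_to_frame(matrix, row_ids_colname="row_ids", col_ids_colname="col_ids",
                 vals_colname="counts"):
    """Hypothetical helper: one DataFrame row per stored COO entry."""
    coo = sp.coo_matrix(matrix)
    return pd.DataFrame({
        row_ids_colname: coo.row,
        col_ids_colname: coo.col,
        vals_colname: coo.data,
    })


m = sp.coo_matrix((np.array([3.0, 4.0]), (np.array([0, 1]), np.array([1, 2]))), shape=(3, 3))
df = coo_to_frame(m)
print(df)
```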
gunz_cm.preprocs.count_filters module
Provides functionality to filter matrix data based on raw counts.
This module uses functools.singledispatch and Pydantic’s validate_call to provide a single, robust filter_by_raw_counts function that can handle multiple data types safely.
Examples
- gunz_cm.preprocs.count_filters.filter_by_raw_counts(matrix: DataFrame | coo_matrix | csr_matrix | ndarray, min_val: int | None = None, max_val: int | None = None, *, raw_counts_colname: str = 'raw_counts') → DataFrame | coo_matrix | csr_matrix | ndarray
- gunz_cm.preprocs.count_filters.filter_by_raw_counts(matrix: DataFrame, min_val: int | None, max_val: int | None, *, raw_counts_colname: str) → DataFrame
- gunz_cm.preprocs.count_filters.filter_by_raw_counts(matrix: coo_matrix, min_val: int | None, max_val: int | None, *, raw_counts_colname: str) → coo_matrix
- gunz_cm.preprocs.count_filters.filter_by_raw_counts(matrix: csr_matrix, min_val: int | None, max_val: int | None, *, raw_counts_colname: str) → csr_matrix
- gunz_cm.preprocs.count_filters.filter_by_raw_counts(matrix: ndarray, min_val: int | None, max_val: int | None, *, raw_counts_colname: str) → ndarray
Filter entries of a matrix based on raw interaction counts.
This function uses Pydantic to validate inputs and single dispatch to route to the correct implementation based on the input data type.
Parameters
- matrix : pd.DataFrame, sp.coo_matrix, sp.csr_matrix, or np.ndarray
The input data. For NumPy arrays, values outside the range are set to 0. For sparse matrices and DataFrames, entries outside the range are removed.
- min_val : int, optional
The minimum raw count value to include (inclusive). Defaults to None.
- max_val : int, optional
The maximum raw count value to include (inclusive). Defaults to None.
- raw_counts_colname : str, optional
The name of the column containing raw counts. Only used if the input is a pandas DataFrame. Defaults to DataFrameSpecs.RAW_COUNTS (‘raw_counts’).
Returns
- pd.DataFrame, sp.coo_matrix, sp.csr_matrix, or np.ndarray
A new data object of the same type as the input, containing only the filtered entries.
Raises
- pydantic.ValidationError
If any argument’s type is incorrect.
- ValueError
If min_val > max_val, or if raw_counts_colname is not found.
- TypeError
If the target column in a DataFrame is not numeric.
Examples
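A minimal sketch of the documented dense-versus-sparse behavior: dense values outside [min_val, max_val] become 0, while sparse entries outside the range are dropped and the input format is preserved. `filter_counts` is a hypothetical helper; it omits the DataFrame branch and the Pydantic validation.

```python
import numpy as np
import scipy.sparse as sp


def filter_counts(matrix, min_val=None, max_val=None):
    """Hypothetical helper: zero out (dense) or drop (sparse) out-of-range counts."""
    if min_val is not None and max_val is not None and min_val > max_val:
        raise ValueError("min_val must not exceed max_val")
    lo = -np.inf if min_val is None else min_val
    hi = np.inf if max_val is None else max_val
    if sp.issparse(matrix):
        coo = matrix.tocoo()
        keep = (coo.data >= lo) & (coo.data <= hi)
        out = sp.coo_matrix((coo.data[keep], (coo.row[keep], coo.col[keep])), shape=coo.shape)
        return out.asformat(matrix.format)  # return the same sparse format as the input
    arr = np.asarray(matrix)
    return np.where((arr >= lo) & (arr <= hi), arr, 0)


dense = np.array([[1, 5], [10, 2]])
print(filter_counts(dense, min_val=2, max_val=9))
print(filter_counts(sp.csr_matrix(dense), min_val=2).nnz)
```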
gunz_cm.preprocs.graphs module
Module for computing graph adjacency matrices from contact-matrix data.
Examples
- gunz_cm.preprocs.graphs.comp_single_graph_adj_mat(data: ndarray | coo_matrix | DataFrame, allow_loop: bool = True, is_triu_sym: bool = True, row_ids_colname: str = 'row_ids', col_ids_colname: str = 'col_ids', counts_colname: str = 'counts') → ndarray | coo_matrix | DataFrame
- gunz_cm.preprocs.graphs.comp_single_graph_adj_mat(cm_coo: coo_matrix, allow_loop: bool = True, is_triu_sym: bool = True, **kwargs) → coo_matrix
- gunz_cm.preprocs.graphs.comp_single_graph_adj_mat(cm_df: DataFrame, allow_loop: bool = True, is_triu_sym: bool = True, row_ids_colname: str = 'row_ids', col_ids_colname: str = 'col_ids', counts_colname: str = 'counts') → DataFrame
Compute the adjacency matrix from a given data structure.
This function assumes the input matrix is symmetric and uses only its upper-triangular part, including the diagonal. If allow_loop is True, diagonal (self-loop) entries receive the value 2 in the adjacency matrix. If allow_loop is False, the diagonal positions are set to 0, so no self-loops are encoded.
Parameters
- data : t.Union[np.ndarray, sp.coo_matrix, pd.DataFrame]
The input data structure.
- allow_loop : bool, optional
Whether self-loops are encoded on the diagonal of the resulting matrix. Default is True.
- is_triu_sym : bool, optional
Whether the input matrix is symmetric and stored as its upper triangle only. Default is True.
- row_ids_colname : str, optional
The column name for row IDs in the input DataFrame. Default is cm_consts.ROW_IDS_COLNAME.
- col_ids_colname : str, optional
The column name for column IDs in the input DataFrame. Default is cm_consts.COL_IDS_COLNAME.
- counts_colname : str, optional
The column name for counts in the input DataFrame. Default is cm_consts.COUNTS_COLNAME.
Returns
- adj_matrix : t.Union[np.ndarray, sp.coo_matrix, pd.DataFrame]
The adjacency matrix.
Examples
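A minimal dense sketch of the diagonal convention described above. `adjacency_from_triu` is a hypothetical helper that binarizes nonzero entries; adding the upper triangle to its transpose fills the lower triangle and counts each diagonal entry twice, which yields the value 2 for self-loops (matching the graph-theory convention that a loop adds 2 to a node's degree). Whether the library binarizes counts the same way is an assumption.

```python
import numpy as np


def adjacency_from_triu(cm, allow_loop=True):
    """Hypothetical helper: binary symmetric adjacency from an upper-triangular matrix."""
    upper = (np.triu(np.asarray(cm)) != 0).astype(int)
    # Adding the transpose mirrors off-diagonal edges and doubles the diagonal.
    adj = upper + upper.T
    if not allow_loop:
        np.fill_diagonal(adj, 0)
    return adj


cm = np.array([[4, 1, 0],
               [0, 0, 3],
               [0, 0, 2]])
print(adjacency_from_triu(cm))
print(adjacency_from_triu(cm, allow_loop=False))
```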
gunz_cm.preprocs.infer_shape module
Module for inferring matrix shapes from various data structures.
This module provides a utility function to determine the dimensions of a matrix represented as a DataFrame, a SciPy COO matrix, or a tuple of coordinate arrays. It correctly handles symmetric matrices to infer square dimensions.
Examples
- gunz_cm.preprocs.infer_shape.infer_mat_shape(matrix: Tuple[ndarray, ndarray] | coo_matrix | DataFrame, is_triu_sym: bool = True, *, row_ids_colname: str = 'row_ids', col_ids_colname: str = 'col_ids') → Tuple[int, int]
- gunz_cm.preprocs.infer_shape.infer_mat_shape(matrix: Tuple[ndarray, ndarray], is_triu_sym: bool, **kwargs) → Tuple[int, int]
- gunz_cm.preprocs.infer_shape.infer_mat_shape(matrix: coo_matrix, is_triu_sym: bool, **kwargs) → Tuple[int, int]
- gunz_cm.preprocs.infer_shape.infer_mat_shape(matrix: DataFrame, is_triu_sym: bool, *, row_ids_colname: str, col_ids_colname: str) → Tuple[int, int]
Infer the shape of a matrix from different data types.
This function uses Pydantic to validate inputs and single dispatch to route to the correct implementation based on the input data type.
Parameters
- matrix : tuple, sp.coo_matrix, or pd.DataFrame
Input data: a tuple of (row_indices, column_indices) NumPy arrays, a SciPy COO sparse matrix, or a pandas DataFrame with coordinate columns.
- is_triu_sym : bool, optional
If True, the shape is inferred as a square matrix (N x N) based on the maximum index found. Defaults to True.
- row_ids_colname : str, optional
Name of the column for row indices. Only used for DataFrames. Defaults to DataFrameSpecs.ROW_IDS.
- col_ids_colname : str, optional
Name of the column for column indices. Only used for DataFrames. Defaults to DataFrameSpecs.COL_IDS.
Returns
- t.Tuple[int, int]
The inferred (rows, columns) shape of the matrix.
Raises
- pydantic.ValidationError
If any argument’s type is incorrect (e.g., a tuple that does not have exactly 2 elements).
- ValueError
If required columns are missing from a DataFrame.
- TypeError
If the input data type is not supported or if index arrays are not of an integer dtype.
Examples
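A minimal sketch of shape inference for the tuple case, assuming each dimension is the largest index plus one and that is_triu_sym takes the larger of the two to form a square shape. `infer_shape` is a hypothetical helper; it mirrors the documented integer-dtype TypeError but not the Pydantic checks.

```python
import numpy as np


def infer_shape(rows, cols, is_triu_sym=True):
    """Hypothetical helper: infer (n_rows, n_cols) from coordinate arrays."""
    rows, cols = np.asarray(rows), np.asarray(cols)
    if not (np.issubdtype(rows.dtype, np.integer) and np.issubdtype(cols.dtype, np.integer)):
        raise TypeError("index arrays must have an integer dtype")
    n_rows, n_cols = int(rows.max()) + 1, int(cols.max()) + 1
    if is_triu_sym:
        n = max(n_rows, n_cols)  # symmetric: square matrix covering both axes
        return (n, n)
    return (n_rows, n_cols)


print(infer_shape([0, 1], [3, 2]))
print(infer_shape([0, 1], [3, 2], is_triu_sym=False))
```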
gunz_cm.preprocs.linear_scaler module
Module for linear scaling of dense and sparse matrices.
Examples
- gunz_cm.preprocs.linear_scaler.scale_matrix(matrix: ndarray | coo_matrix | csr_matrix, scaling_method: str = 'minmax', min_val: float = 0, max_val: float = 1, exclude_diagonal: bool = False, inplace: bool = False) → ndarray | coo_matrix | csr_matrix
- gunz_cm.preprocs.linear_scaler.scale_matrix(matrix: ndarray, scaling_method: str, min_val: float, max_val: float, exclude_diagonal: bool = False, inplace: bool = False, **kwargs) → ndarray
- gunz_cm.preprocs.linear_scaler.scale_matrix(matrix: coo_matrix | csr_matrix, scaling_method: str, min_val: float, max_val: float, exclude_diagonal: bool = False, inplace: bool = False, **kwargs) → coo_matrix | csr_matrix
Scales a matrix using the specified method.
This function supports both dense and sparse matrices and can scale using either min-max scaling or normalization. It can also exclude diagonal elements from scaling and perform the operation in place if specified.
Parameters
- matrix : Union[np.ndarray, coo_matrix, csr_matrix]
The matrix to scale.
- scaling_method : str, optional
The scaling method to use (‘minmax’ or ‘normal’, default is ‘minmax’).
- min_val : float, optional
The minimum value for min-max scaling (default is 0).
- max_val : float, optional
The maximum value for min-max scaling (default is 1).
- exclude_diagonal : bool, optional
Whether to exclude diagonal elements from scaling (default is False).
- inplace : bool, optional
Whether to perform the scaling in place (default is False).
Returns
- Union[np.ndarray, coo_matrix, csr_matrix]
The scaled matrix.
Examples
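A minimal sketch of ‘minmax’ scaling for sparse input, assuming that only stored values are rescaled (implicit zeros stay zero, so the matrix stays sparse) and that excluded diagonal entries keep their original values. `minmax_scale_sparse` is a hypothetical helper, not scale_matrix itself; it always works on a copy.

```python
import numpy as np
import scipy.sparse as sp


def minmax_scale_sparse(mat, min_val=0.0, max_val=1.0, exclude_diagonal=False):
    """Hypothetical helper: min-max scale the stored values of a sparse matrix."""
    coo = sp.coo_matrix(mat).astype(float)  # astype returns a copy
    if exclude_diagonal:
        target = coo.row != coo.col
    else:
        target = np.ones(coo.nnz, dtype=bool)
    vals = coo.data[target]
    lo, hi = vals.min(), vals.max()
    span = hi - lo if hi > lo else 1.0  # avoid division by zero for constant data
    coo.data[target] = min_val + (vals - lo) / span * (max_val - min_val)
    return coo


out = minmax_scale_sparse(np.array([[9, 2], [4, 9]]), exclude_diagonal=True)
print(out.toarray())
```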
- gunz_cm.preprocs.linear_scaler.transform_to_gaussian(matrix: ndarray | coo_matrix | csr_matrix, mu: float = 0, sigma: float = 1, inplace: bool = False) → ndarray | coo_matrix | csr_matrix
- gunz_cm.preprocs.linear_scaler.transform_to_gaussian(matrix: ndarray, mu: float, sigma: float, inplace: bool = False, **kwargs) → ndarray
- gunz_cm.preprocs.linear_scaler.transform_to_gaussian(matrix: coo_matrix | csr_matrix, mu: float, sigma: float, inplace: bool = False, **kwargs) → coo_matrix | csr_matrix
Transforms a matrix to a Gaussian distribution.
This function transforms the matrix to have a Gaussian distribution with the specified mean and standard deviation. It supports both dense and sparse matrices and can perform the operation in place if specified.
Parameters
- matrix : Union[np.ndarray, coo_matrix, csr_matrix]
The matrix to transform.
- mu : float, optional
The mean of the Gaussian distribution (default is 0).
- sigma : float, optional
The standard deviation of the Gaussian distribution (default is 1).
- inplace : bool, optional
Whether to perform the transformation in place (default is False).
Returns
- Union[np.ndarray, coo_matrix, csr_matrix]
The transformed matrix.
Examples
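A minimal dense sketch, assuming (as the linear_scaler module name suggests) that the transform is an affine rescale to mean mu and standard deviation sigma. Note that such a rescale fixes the first two moments but does not change the shape of the value distribution. `affine_to_mean_std` is a hypothetical helper.

```python
import numpy as np


def affine_to_mean_std(arr, mu=0.0, sigma=1.0):
    """Hypothetical helper: rescale values to mean mu and standard deviation sigma."""
    arr = np.asarray(arr, dtype=float)
    std = arr.std()
    z = (arr - arr.mean()) / (std if std > 0 else 1.0)  # z-scores (0 if constant)
    return mu + sigma * z


out = affine_to_mean_std([[1, 2], [3, 4]], mu=5, sigma=2)
print(out.mean(), out.std())
```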
gunz_cm.preprocs.log_scaler module
Module for log(1 + v) scaling of dense and sparse matrices.
Examples
- gunz_cm.preprocs.log_scaler.log_scale_matrix(matrix: ndarray | coo_matrix | csr_matrix, exclude_diagonal: bool = False, inplace: bool = False) → ndarray | coo_matrix | csr_matrix
- gunz_cm.preprocs.log_scaler.log_scale_matrix(matrix: ndarray, exclude_diagonal: bool, inplace: bool, **kwargs) → ndarray
- gunz_cm.preprocs.log_scaler.log_scale_matrix(matrix: coo_matrix | csr_matrix, exclude_diagonal: bool, inplace: bool, **kwargs) → coo_matrix | csr_matrix
Optimized log(1 + v) scaling with optional in-place operation.
This function applies a log(1 + v) transformation to the input matrix. It supports both dense and sparse matrices. If exclude_diagonal is True, the diagonal elements are set to zero for dense matrices or removed for sparse matrices. The inplace parameter allows modifying the matrix in place to save memory.
Parameters
- matrix : Union[np.ndarray, coo_matrix, csr_matrix]
Input matrix for log scaling.
- exclude_diagonal : bool, optional
If True, set the diagonal to zero (dense) or remove diagonal entries (sparse). Defaults to False.
- inplace : bool, optional
If True, modify the matrix in place instead of creating a new one. Defaults to False.
Returns
- Union[np.ndarray, coo_matrix, csr_matrix]
The log-scaled matrix (the input object itself if inplace=True).
Examples
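A minimal sketch of the dense and sparse branches described above. `log1p_scale` is a hypothetical helper that always returns a new object (no in-place mode). Because log1p(0) = 0, only stored sparse values need transforming, so sparsity is preserved.

```python
import numpy as np
import scipy.sparse as sp


def log1p_scale(matrix, exclude_diagonal=False):
    """Hypothetical helper: log(1 + v) scaling that always returns a new object."""
    if sp.issparse(matrix):
        coo = matrix.tocoo()
        if exclude_diagonal:
            keep = coo.row != coo.col  # sparse: drop diagonal entries
        else:
            keep = np.ones(coo.nnz, dtype=bool)
        return sp.coo_matrix(
            (np.log1p(coo.data[keep]), (coo.row[keep], coo.col[keep])), shape=coo.shape
        )
    out = np.log1p(np.asarray(matrix, dtype=float))
    if exclude_diagonal:
        np.fill_diagonal(out, 0.0)  # dense: zero the diagonal
    return out


dense = np.array([[1.0, 3.0], [0.0, 1.0]])
print(log1p_scale(dense, exclude_diagonal=True))
print(log1p_scale(sp.csr_matrix(dense), exclude_diagonal=True).toarray())
```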
gunz_cm.preprocs.masks module
Centralized manager for genomic and structural masking logic.
- gunz_cm.preprocs.masks.expand_with_nans(points_filtered: ndarray, mask: ndarray, full_length: int | None = None) → ndarray
Expands a filtered point cloud back to genomic length, inserting NaNs where the mask is False.
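A minimal sketch of the expansion, assuming one filtered point per True entry of the mask and an output length equal to len(mask) (the full_length parameter is not modeled). `expand_points` is a hypothetical helper.

```python
import numpy as np


def expand_points(points_filtered, mask):
    """Hypothetical helper: scatter kept points back to full length, NaN elsewhere."""
    mask = np.asarray(mask, dtype=bool)
    pts = np.asarray(points_filtered, dtype=float)
    if pts.shape[0] != mask.sum():
        raise ValueError("need one filtered point per True entry in mask")
    out = np.full((mask.size,) + pts.shape[1:], np.nan)
    out[mask] = pts  # rows where mask is False remain NaN
    return out


out = expand_points([[1, 1, 1], [2, 2, 2]], [True, False, True])
print(out)
```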
- gunz_cm.preprocs.masks.get_genomic_mask(resolution: int, region: str, hic_path: str | PathLike, balancing: str = 'KR', root: str | PathLike | None = None) → ndarray
Identifies valid (aligned) bins from Hi-C data by inspecting non-zero contacts.
- Parameters:
resolution (int) – Genomic resolution in bp.
region (str) – Chromosome/region identifier.
hic_path (str | os.PathLike) – Path to the .hic file.
balancing (str) – Normalization scheme (e.g., ‘KR’).
root (str | os.PathLike | None) – Project root directory.
- Returns:
Boolean mask of valid bins.
- Return type:
np.ndarray
- gunz_cm.preprocs.masks.get_optimization_mask(points: ndarray, threshold: float = 1e-05) → ndarray
Identifies points that have moved away from the origin, filtering out stagnant (noise) points that remain within threshold of it.
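A minimal sketch, assuming "moved from the origin" means a Euclidean norm strictly greater than threshold. `moved_points_mask` is a hypothetical helper, not get_optimization_mask itself.

```python
import numpy as np


def moved_points_mask(points, threshold=1e-5):
    """Hypothetical helper: True for points farther than threshold from the origin."""
    return np.linalg.norm(np.asarray(points, dtype=float), axis=1) > threshold


print(moved_points_mask([[0, 0, 0], [1e-7, 0, 0], [0.5, 0, 0]]))
```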
gunz_cm.preprocs.mirrors module
Module for mirroring the upper triangle of a symmetric matrix to its lower triangle.
Examples
- gunz_cm.preprocs.mirrors.mirror_upper_to_lower_triangle(mat: DataFrame | coo_matrix, remove_diag: bool = False, double_diag: bool = False) → DataFrame | coo_matrix
- gunz_cm.preprocs.mirrors.mirror_upper_to_lower_triangle(cm_coo: coo_matrix, remove_diag: bool = False, double_diag: bool = False) → coo_matrix
- gunz_cm.preprocs.mirrors.mirror_upper_to_lower_triangle(cm_df: DataFrame, remove_diag: bool = False, double_diag: bool = False, row_ids_colname: str = 'row_ids', col_ids_colname: str = 'col_ids', vals_colname: str = 'counts') → DataFrame
Mirror the upper triangle part to the lower triangle part of a matrix.
Parameters
- mat : t.Union[pd.DataFrame, sp.coo_matrix]
Input matrix.
- remove_diag : bool, optional
Whether to remove the main diagonal (default is False).
- double_diag : bool, optional
Whether to double the diagonal entries (default is False). This is useful for preserving the behavior of certain legacy implementations that blindly sum (i, j) and (j, i) even when i = j. Ignored if remove_diag is True.
Returns
- output_data : t.Union[pd.DataFrame, sp.coo_matrix]
Resulting matrix with the upper triangle mirrored to the lower triangle.
Notes
This function assumes the input matrix is symmetric. It delegates the operation to registered implementations based on the input type.
Examples
- gunz_cm.preprocs.mirrors.mirror_upper_to_lower_triangle_coo(cm_coo: coo_matrix, remove_diag: bool = False, double_diag: bool = False) → coo_matrix
Mirror the upper triangle part to the lower triangle part of a sparse matrix.
This function assumes the input matrix is symmetric.
Parameters
- cm_coo : scipy.sparse.coo_matrix
The input sparse matrix.
- remove_diag : bool, optional
Whether to remove the main diagonal. Defaults to False.
- double_diag : bool, optional
Whether to double the diagonal entries. Defaults to False.
Returns
- out_mat : scipy.sparse.coo_matrix
The resulting sparse matrix with the upper triangle mirrored to the lower triangle.
Examples
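A minimal sketch of the three diagonal modes using SciPy's `sp.triu`. Adding the upper triangle to its transpose counts the diagonal twice, which is the double_diag behavior; the default mode subtracts one copy back out. `mirror_triu` is a hypothetical helper, not the library's implementation.

```python
import numpy as np
import scipy.sparse as sp


def mirror_triu(cm_coo, remove_diag=False, double_diag=False):
    """Hypothetical helper: fill the lower triangle from the upper triangle."""
    upper = sp.triu(cm_coo, k=1 if remove_diag else 0, format="coo")
    full = upper + upper.T  # diagonal entries are counted twice here
    if not remove_diag and not double_diag:
        full = full - sp.diags(upper.diagonal())  # undo the double counting
    return full.tocoo()


m = sp.coo_matrix(np.array([[1, 2], [0, 3]]))
print(mirror_triu(m).toarray())
print(mirror_triu(m, double_diag=True).toarray())
print(mirror_triu(m, remove_diag=True).toarray())
```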
- gunz_cm.preprocs.mirrors.mirror_upper_to_lower_triangle_df(cm_df: DataFrame, remove_diag: bool = False, double_diag: bool = False, row_ids_colname: str = 'row_ids', col_ids_colname: str = 'col_ids', vals_colname: str = 'counts') → DataFrame
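Mirror the upper triangle part to the lower triangle part of a matrix stored as a DataFrame.
This function assumes the input matrix is symmetric.
Parameters
- cm_df : pandas.DataFrame
The input DataFrame representing the matrix.
- remove_diag : bool, optional
Whether to remove the main diagonal (default is False).
- double_diag : bool, optional
Whether to double the diagonal entries (default is False).
- row_ids_colname : str, optional
Column name for row IDs (default is ‘row_ids’).
- col_ids_colname : str, optional
Column name for column IDs (default is ‘col_ids’).
- vals_colname : str, optional
Column name for values (default is ‘counts’).
Returns
- output_df : pandas.DataFrame
The resulting DataFrame with the upper triangle mirrored to the lower triangle.
Notes
The default column names are taken from the consts module.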
Examples
- gunz_cm.preprocs.mirrors.symmetrize_edges(rows: ndarray, cols: ndarray, data: ndarray, shape: Tuple[int, int], double_diag: bool = False) → coo_matrix
Construct a symmetric COO matrix from directed edge arrays.
This function is more efficient than mirror_upper_to_lower_triangle for constructing matrices from raw edge lists because it skips the intermediate sparse matrix creation and filtering steps.
Parameters
- rows : np.ndarray
Row indices.
- cols : np.ndarray
Column indices.
- data : np.ndarray
Values.
- shape : tuple[int, int]
Shape of the resulting matrix.
- double_diag : bool, optional
Whether to include diagonal elements (i, i) twice (once as (i, i) and once as the mirrored (i, i)). This preserves the legacy behavior of blindly summing (i, j) and (j, i). Defaults to False.
Returns
- sp.coo_matrix
The symmetric sparse matrix.
Examples
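A minimal sketch of building the symmetric matrix directly from edge arrays: each off-diagonal edge is appended in both directions, and diagonal edges are appended a second time only when double_diag is set. `symmetric_from_edges` is a hypothetical helper; duplicate coordinates are summed when the COO matrix is densified or converted.

```python
import numpy as np
import scipy.sparse as sp


def symmetric_from_edges(rows, cols, data, shape, double_diag=False):
    """Hypothetical helper: add each edge (i, j) in both directions."""
    rows, cols, data = (np.asarray(a) for a in (rows, cols, data))
    mirror = rows != cols  # mirror only off-diagonal edges by default
    if double_diag:
        mirror = np.ones_like(mirror)  # mirror diagonal edges too (counted twice)
    all_rows = np.concatenate([rows, cols[mirror]])
    all_cols = np.concatenate([cols, rows[mirror]])
    all_data = np.concatenate([data, data[mirror]])
    return sp.coo_matrix((all_data, (all_rows, all_cols)), shape=shape)


s = symmetric_from_edges([0, 1], [1, 1], [5, 2], shape=(2, 2))
print(s.toarray())
```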
gunz_cm.preprocs.noises module
Module for adding random ligation noise to contact-matrix data.
Examples
- gunz_cm.preprocs.noises.add_rand_ligation_noise(data: ndarray | coo_matrix | DataFrame, ratio: float, use_pseudo: bool = False, is_triu_sym: bool = True, inplace: bool = False) → ndarray | coo_matrix | DataFrame
Add random ligation noise to the input data.
Parameters
- data : Union[numpy.ndarray, scipy.sparse.coo_matrix, pandas.DataFrame]
Input data.
- ratio : float
Noise ratio.
- is_triu_sym : bool, optional
Whether the matrix is symmetric and stored as its upper triangle (default is True).
- inplace : bool, optional
Whether to modify the input data in place (default is False).
Returns
- Union[numpy.ndarray, scipy.sparse.coo_matrix, pandas.DataFrame]
Data with added random ligation noise.
Notes
This function adds random ligation noise to the input data. It supports NumPy arrays, SciPy sparse matrices, and pandas DataFrames. The effect of inplace depends on the input type: for NumPy arrays and SciPy sparse matrices, inplace=True modifies the original data; for pandas DataFrames, inplace=True does not modify the original data.
Examples
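The docstring does not define the noise model, so the following is only one plausible reading, not the library's method: add round(ratio × total counts) extra contacts at uniformly random upper-triangular positions of a dense symmetric matrix. `add_uniform_ligation_noise` is a hypothetical helper; use_pseudo is not modeled and the input is never modified.

```python
import numpy as np


def add_uniform_ligation_noise(mat, ratio, seed=None):
    """Hypothetical helper: add ratio * total counts as uniform upper-triangular contacts."""
    rng = np.random.default_rng(seed)
    out = np.array(mat, dtype=float)  # copy; the input is left untouched
    iu_rows, iu_cols = np.triu_indices(out.shape[0])
    n_noise = int(round(ratio * out.sum()))
    picks = rng.integers(0, iu_rows.size, size=n_noise)
    # np.add.at accumulates correctly when the same position is picked twice.
    np.add.at(out, (iu_rows[picks], iu_cols[picks]), 1.0)
    return out


base = np.triu(np.ones((4, 4)))
noisy = add_uniform_ligation_noise(base, ratio=0.5, seed=0)
print(noisy.sum())
```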
- gunz_cm.preprocs.noises.add_rand_ligation_noise_coo(cm_coo: coo_matrix, ratio: float, use_pseudo: bool = False, is_triu_sym: bool = True, inplace: bool = False) → coo_matrix
Add random ligation noise to a SciPy sparse matrix.
This function adds random ligation noise to the input SciPy sparse matrix. If inplace is False, a copy of the input matrix is created before adding noise. If is_triu_sym is True, the matrix is assumed to be symmetric and stored as its upper triangle.
Parameters
- cm_coo : scipy.sparse.coo_matrix
Input SciPy sparse matrix.
- ratio : float
Noise ratio.
- is_triu_sym : bool, optional
Whether the matrix is symmetric and stored as its upper triangle (default is True).
- inplace : bool, optional
Whether to modify the input data in place (default is False).
Returns
- scipy.sparse.coo_matrix
SciPy sparse matrix with added random ligation noise.
Examples
- gunz_cm.preprocs.noises.add_rand_ligation_noise_df(cm_df: DataFrame, ratio: float, use_pseudo: bool = False, is_triu_sym: bool = True, inplace: bool = False, row_ids_colname: str = 'row_ids', col_ids_colname: str = 'col_ids', vals_colname: str = 'counts') → DataFrame
Add random ligation noise to a pandas DataFrame.
This function adds random ligation noise to the input pandas DataFrame. If inplace is False, a copy of the input DataFrame is created before adding noise. If is_triu_sym is True, the matrix is assumed to be symmetric and stored as its upper triangle.
Parameters
- cm_df : pd.DataFrame
Input pandas DataFrame.
- ratio : float
Noise ratio.
- is_triu_sym : bool, optional
Whether the matrix is symmetric and stored as its upper triangle (default is True).
- inplace : bool, optional
Whether to modify the input data in place (default is False).
- row_ids_colname : str, optional
Column name for row IDs (default is ‘row_ids’).
- col_ids_colname : str, optional
Column name for column IDs (default is ‘col_ids’).
- vals_colname : str, optional
Column name for values (default is ‘counts’).
Returns
- pd.DataFrame
Pandas DataFrame with added random ligation noise.
Examples
gunz_cm.preprocs.rc_filters module
Module for filtering out empty rows and columns.
Examples
- gunz_cm.preprocs.rc_filters.filter_empty_rowcols(data: ndarray | tuple | coo_matrix | DataFrame, is_triu_sym: bool = True, axis: int | None = None, ret_mapping: bool = False, ret_unique_ids: bool = False, row_ids_colname: str = 'row_ids', col_ids_colname: str = 'col_ids') → ndarray | tuple | coo_matrix | DataFrame
- gunz_cm.preprocs.rc_filters.filter_empty_rowcols(cm_mat: ndarray, is_triu_sym: bool = True, axis: int | None = None, ret_mapping: bool = False, ret_unique_ids: bool = False, **kwargs) → ndarray
- gunz_cm.preprocs.rc_filters.filter_empty_rowcols(data: Tuple[ndarray, ndarray], is_triu_sym: bool = True, axis: int | None = None, ret_mapping: bool = False, ret_unique_ids: bool = False, **kwargs) → Tuple[ndarray, ndarray, ndarray | None, ndarray | None]
- gunz_cm.preprocs.rc_filters.filter_empty_rowcols(cm_coo: coo_matrix, is_triu_sym: bool = True, axis: int | None = None, ret_mapping: bool = False, ret_unique_ids: bool = False, **kwargs) → Tuple[coo_matrix, Tuple[ndarray, ...] | None]
- gunz_cm.preprocs.rc_filters.filter_empty_rowcols(df: DataFrame, is_triu_sym: bool = True, axis: int = None, ret_mapping: bool = False, ret_unique_ids: bool = False, row_ids_colname: str = 'row_ids', col_ids_colname: str = 'col_ids', **kwargs) → DataFrame | Tuple[DataFrame, ...]
Filter out rows or columns whose entries are all zero (unalignable regions) and remap the row and/or column IDs.
This function removes empty rows and columns from the input data.
Parameters
- data : np.ndarray or tuple or scipy.sparse.coo_matrix or pd.DataFrame
The input data.
- is_triu_sym : bool, optional
Whether the input is symmetric with only the upper triangle of the matrix given. Defaults to True.
- axis : int, optional
The axis to filter on. Defaults to None.
- ret_mapping : bool, optional
Whether to return the mapping from the original IDs to the new IDs. Defaults to False.
- ret_unique_ids : bool, optional
Whether to return the unique IDs. Defaults to False.
Returns
- filtered_data : np.ndarray or tuple or scipy.sparse.coo_matrix or pd.DataFrame
The filtered data.
Examples
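A minimal sketch of the filter-and-remap step for an upper-triangular symmetric COO matrix, assuming an ID counts as non-empty if it appears as either a row or a column index. `drop_empty_ids_triu` is a hypothetical helper; its second return value (sorted kept IDs) acts as a new-to-original ID mapping, which may differ from the library's ret_mapping format.

```python
import numpy as np
import scipy.sparse as sp


def drop_empty_ids_triu(cm_coo):
    """Hypothetical helper: drop IDs with no entries and renumber the rest."""
    coo = sp.coo_matrix(cm_coo)
    # Symmetric upper-triangular storage: an ID is used if it is a row OR a column.
    kept_ids = np.unique(np.concatenate([coo.row, coo.col]))
    new_rows = np.searchsorted(kept_ids, coo.row)  # old ID -> position in kept_ids
    new_cols = np.searchsorted(kept_ids, coo.col)
    n = kept_ids.size
    return sp.coo_matrix((coo.data, (new_rows, new_cols)), shape=(n, n)), kept_ids


m = sp.coo_matrix((np.array([1, 3]), (np.array([0, 2]), np.array([2, 4]))), shape=(5, 5))
out, kept = drop_empty_ids_triu(m)
print(kept, out.toarray(), sep="\n")
```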
gunz_cm.preprocs.rc_filters_common module
Module for filtering out empty rows and columns jointly across two inputs.
Examples
- gunz_cm.preprocs.rc_filters_common.filter_common_empty_rowcols(data1, data2, op: str = 'union', is_triu_sym: bool = True, axis: int | None = None, ret_mapping: bool = False, ret_unique_ids: bool = False)
- gunz_cm.preprocs.rc_filters_common.filter_common_empty_rowcols(cm_mat1: ndarray, cm_mat2: ndarray, is_triu_sym: bool = True, axis: int = None, ret_mapping: bool = False, **kwargs) → ndarray
- gunz_cm.preprocs.rc_filters_common.filter_common_empty_rowcols(data1: Tuple[ndarray, ndarray], data2: Tuple[ndarray, ndarray], op: str = 'union', is_triu_sym: bool = True, axis: int | None = None, ret_mapping: bool = False, ret_unique_ids: bool = False, **kwargs) → Tuple[ndarray, ndarray, ndarray | None, ndarray | None]
- gunz_cm.preprocs.rc_filters_common.filter_common_empty_rowcols(cm_coo1: coo_matrix, cm_coo2: coo_matrix, op: str = 'union', is_triu_sym: bool = True, axis: int | None = None, ret_mapping: bool = False, ret_unique_ids: bool = False, **kwargs) → Tuple[spmatrix, Tuple[ndarray, ...] | None]
- gunz_cm.preprocs.rc_filters_common.filter_common_empty_rowcols(cm_df1: DataFrame, cm_df2: DataFrame, op: str = 'union', is_triu_sym: bool = True, axis: int | None = None, ret_mapping: bool = False, ret_unique_ids: bool = False, row_ids_colname: str = 'row_ids', col_ids_colname: str = 'col_ids', **kwargs) → Tuple[DataFrame, ...] | DataFrame
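Filter out unalignable regions from a pair of inputs.
Parameters
- data1, data2 : np.ndarray, tuple, scipy.sparse.coo_matrix, or pandas.DataFrame
The two input data objects.
- op : str, optional
Operation used to combine the two inputs. Defaults to ‘union’.
- is_triu_sym : bool, optional
Whether the inputs are symmetric with only the upper triangle of the matrix given. Defaults to True.
- axis : int, optional
The axis to filter on. Defaults to None.
- ret_mapping : bool, optional
Whether to return the mapping from the original IDs to the new IDs. Defaults to False.
- ret_unique_ids : bool, optional
Whether to return the unique IDs. Defaults to False.
Returns
- filtered_data : np.ndarray, tuple, scipy.sparse matrix, or pandas.DataFrame
The filtered data.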
Examples
gunz_cm.preprocs.resamples module
Module for downsampling and resampling contact matrices.
Examples
- gunz_cm.preprocs.resamples.rand_downsample(data: ndarray | coo_matrix | DataFrame, ratio: int, val_colname: str = 'counts') → ndarray | coo_matrix | DataFrame
- gunz_cm.preprocs.resamples.rand_downsample(cm_mat: ndarray, ratio: int, **kwargs) → ndarray
- gunz_cm.preprocs.resamples.rand_downsample(cm_coo: coo_matrix, ratio: int, **kwargs) → coo_matrix
- gunz_cm.preprocs.resamples.rand_downsample(cm_df: DataFrame, ratio: int, val_colname: str = 'counts', **kwargs) → DataFrame
Randomly downsample a matrix or DataFrame by a specified ratio.
This function dispatches to a type-specific downsampling implementation based on the input data type.
Parameters
- data : Union[np.ndarray, sp.coo_matrix, pd.DataFrame]
Input data to downsample.
- ratio : int
Downsampling ratio.
- val_colname : str, optional
Column name for values in a DataFrame (default is cm_consts.COUNTS_COLNAME, i.e. ‘counts’).
Returns
- Union[np.ndarray, sp.coo_matrix, pd.DataFrame]
Downsampled data.
Examples
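The docstring does not state how ratio is interpreted. One common way to downsample count data is binomial thinning; the sketch below assumes ratio means "keep each read independently with probability 1/ratio", which is an assumption, not confirmed behavior. `binomial_thin` is a hypothetical helper for dense count arrays.

```python
import numpy as np


def binomial_thin(counts, ratio, seed=None):
    """Hypothetical helper: keep each read independently with probability 1/ratio."""
    if ratio < 1:
        raise ValueError("ratio must be >= 1")
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts).astype(np.int64)
    return rng.binomial(counts, 1.0 / ratio)  # elementwise draw, same shape as counts


counts = np.array([[100, 0], [0, 50]])
thinned = binomial_thin(counts, ratio=2, seed=0)
print(thinned)
```

Thinning keeps counts integral and never creates contacts where there were none, so zero entries stay zero.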
- gunz_cm.preprocs.resamples.uniform_resample_mat(cm_mat: ndarray, target_rate: float) → ndarray
Uniformly resample a matrix by a specified target rate.
This function simply multiplies the input matrix by the target rate.
Parameters
- cm_mat : np.ndarray
Input contact matrix to resample.
- target_rate : float
Target rate for resampling.
Returns
- np.ndarray
Resampled matrix.
Examples
gunz_cm.preprocs.sparse_wish_dist module
Module for converting sparse contact matrices to sparse distance matrices.
Examples
- gunz_cm.preprocs.sparse_wish_dist.comp_sparse_wish_dist(data, alpha: float = -0.25, na_inf_val: float | None = None)
- gunz_cm.preprocs.sparse_wish_dist.comp_sparse_wish_dist(cm_coo: coo_matrix, alpha: float = -0.25, na_inf_val: float | None = None, **kwargs)
- gunz_cm.preprocs.sparse_wish_dist.comp_sparse_wish_dist(cm_df: DataFrame, alpha: float = -0.25, na_inf_val: float | None = None, **kwargs) → DataFrame
Compute the sparse distance matrix from a contact matrix given as a COO matrix or a DataFrame.
Examples
Notes
- gunz_cm.preprocs.sparse_wish_dist.comp_sparse_wish_dist_rc_ids(row_ids: ndarray | List[int], col_ids: ndarray | List[int], C_vals: ndarray, alpha: float = -0.25, na_inf_val: float | None = None) → Tuple[ndarray, ndarray, ndarray, ndarray]
Calculate the sparse form of the Euclidean distance matrix from a contact matrix.
Returns a tuple of row indices, column indices, contact matrix values, and Euclidean distance values.
Parameters
- row_ids : t.Union[np.ndarray, t.List[int]]
Array of row indices.
- col_ids : t.Union[np.ndarray, t.List[int]]
Array of column indices.
- C_vals : np.ndarray
Array of contact matrix values.
- alpha : float, optional
Conversion factor from contact matrix to Euclidean distance matrix (default is -0.25).
- na_inf_val : t.Optional[float], optional
Value to replace NaN or infinite values (default is None).
Returns
- t.Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]
A tuple containing (row_ids, col_ids, C_vals, D_vals).
Notes
The main diagonal of the matrix is removed. NaN handling is not yet implemented: a NotImplementedError is raised if invalid values are found.
Examples
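A minimal sketch, assuming the conversion is the power law D = C ** alpha (how the alpha parameter reads, but not confirmed by the docstring). It removes the main diagonal and, like the documented behavior, raises NotImplementedError on invalid values instead of using na_inf_val. `contacts_to_distances` is a hypothetical helper.

```python
import numpy as np


def contacts_to_distances(row_ids, col_ids, c_vals, alpha=-0.25):
    """Hypothetical helper: D = C ** alpha on off-diagonal entries."""
    row_ids, col_ids = np.asarray(row_ids), np.asarray(col_ids)
    c_vals = np.asarray(c_vals, dtype=float)
    off = row_ids != col_ids  # drop the main diagonal
    row_ids, col_ids, c_vals = row_ids[off], col_ids[off], c_vals[off]
    with np.errstate(divide="ignore"):
        d_vals = c_vals ** alpha  # zero contacts map to inf for negative alpha
    if not np.all(np.isfinite(d_vals)):
        raise NotImplementedError("NaN/inf handling is not implemented in this sketch")
    return row_ids, col_ids, c_vals, d_vals


r, c, cv, d = contacts_to_distances([0, 0, 1], [0, 1, 2], [5, 16, 81])
print(d)
```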
gunz_cm.preprocs.triu_matrix module
Module for extracting upper-triangular (optionally banded) matrices.
Examples
- gunz_cm.preprocs.triu_matrix.create_triu_matrix(data: ndarray | coo_matrix | DataFrame, min_k: int | None = None, max_k: int | None = None, remove_main_diag: bool = False) → ndarray | tuple | coo_matrix | DataFrame
- gunz_cm.preprocs.triu_matrix.create_triu_matrix(cm_mat: ndarray, min_k: int | None = None, max_k: int | None = None, remove_main_diag: bool = False, **kwargs) → ndarray
- gunz_cm.preprocs.triu_matrix.create_triu_matrix(cm_coo: coo_matrix, min_k: int | None = None, max_k: int | None = None, remove_main_diag: bool = False, **kwargs) → coo_matrix
- gunz_cm.preprocs.triu_matrix.create_triu_matrix(cm_df: DataFrame, min_k: int | None = None, max_k: int | None = None, remove_main_diag: bool = False, row_ids_colname: str = 'row_ids', col_ids_colname: str = 'col_ids', **kwargs) → DataFrame
Creates an upper-triangular matrix.
This function extracts the upper-triangular part of the input data. The min_k and max_k parameters set the minimum and maximum distance from the main diagonal of the entries to keep. If remove_main_diag is True, the main diagonal elements are removed.
Parameters
- data : t.Union[np.ndarray, sp.coo_matrix, pd.DataFrame]
The input data to be converted to an upper-triangular matrix.
- min_k : t.Optional[int], optional
The minimum distance from the main diagonal (default is None).
- max_k : t.Optional[int], optional
The maximum distance from the main diagonal (default is None).
- remove_main_diag : bool, optional
Whether to remove the main diagonal elements (default is False).
Returns
- t.Union[np.ndarray, tuple, sp.coo_matrix, pd.DataFrame]
The upper-triangular matrix.
Examples
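A minimal COO sketch of the selection rule, assuming entries are kept when min_k ≤ col − row ≤ max_k (only non-negative offsets, since the result is upper-triangular). `triu_band` is a hypothetical helper, not create_triu_matrix itself.

```python
import numpy as np
import scipy.sparse as sp


def triu_band(mat, min_k=None, max_k=None, remove_main_diag=False):
    """Hypothetical helper: keep entries with min_k <= col - row <= max_k."""
    coo = sp.coo_matrix(mat)
    k = coo.col - coo.row  # signed offset; k >= 0 is the upper triangle
    keep = k >= (1 if remove_main_diag else 0)
    if min_k is not None:
        keep &= k >= min_k
    if max_k is not None:
        keep &= k <= max_k
    return sp.coo_matrix((coo.data[keep], (coo.row[keep], coo.col[keep])), shape=coo.shape)


dense = np.arange(1, 17).reshape(4, 4)
print(triu_band(dense, min_k=1, max_k=2).toarray())
```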
gunz_cm.preprocs.weight_filters module
Module for filtering contact DataFrames by balancing-weight quantiles.
Examples
- gunz_cm.preprocs.weight_filters.filter_by_weights_quantile_df(cm_df: DataFrame, q1: float | None = 0, q3: float | None = 1.0, log: bool | None = True, val_colname: str | None = 'counts', orig_val_colname: str | None = 'raw_counts') → DataFrame
Filter a DataFrame based on weight quantiles.
This function calculates weights as the ratio of normalized counts to raw counts. It then applies a log transformation if specified and filters the DataFrame based on the weight quantiles.
Parameters
- cm_df : pd.DataFrame
The input DataFrame containing count data.
- q1 : float, optional
The lower quantile value (default is 0).
- q3 : float, optional
The upper quantile value (default is 1.0).
- log : bool, optional
Whether to apply a log transformation to the weights. Default is True.
- val_colname : str, optional
The column name for normalized counts. Default is cm_consts.COUNTS_COLNAME.
- orig_val_colname : str, optional
The column name for raw counts. Default is cm_consts.RAW_COUNTS_COLNAME.
Returns
- pd.DataFrame
A new DataFrame filtered based on the weight quantiles.
Examples
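A minimal pandas sketch of the procedure described above, assuming inclusive bounds at the q1 and q3 quantiles of the (optionally logged) weights and pandas' default linear quantile interpolation. `filter_by_weight_quantiles` is a hypothetical helper, not the library function.

```python
import numpy as np
import pandas as pd


def filter_by_weight_quantiles(df, q1=0.0, q3=1.0, log=True,
                               val_colname="counts", orig_val_colname="raw_counts"):
    """Hypothetical helper: keep rows whose weight lies between two quantiles."""
    weights = df[val_colname] / df[orig_val_colname]  # normalized / raw
    if log:
        weights = np.log(weights)
    lo, hi = weights.quantile(q1), weights.quantile(q3)
    return df[(weights >= lo) & (weights <= hi)].copy()


df = pd.DataFrame({"counts": [1.0, 2.0, 4.0, 8.0, 100.0], "raw_counts": [1.0] * 5})
out = filter_by_weight_quantiles(df, q1=0.0, q3=0.8)
print(out)
```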