gunz_cm.reconstructions.third_party package

Submodules

gunz_cm.reconstructions.third_party.flamingo module

Module.

Examples

gunz_cm.reconstructions.third_party.flamingo.comp_flamingo_obj_perf(region1: str, resolution: int, balancing: str, input_fpath: str, points_fpath: str, region2: str | None = None) -> dict

Computes the performance metrics (Spearman and Pearson correlation) for Euclidean distances predicted by FLAMINGO.

  • The function loads contact map data and point coordinates, filters out invalid data, and computes the Euclidean distance matrix.

  • If region2 is not provided, the function assumes the same region for both comparisons.

  • The function raises a FileNotFoundError if the points file does not exist.

Parameters

region1 : str

The first genomic region to analyze.

resolution : int

The resolution of the genomic data.

balancing : str

The balancing method to use for the contact map data.

input_fpath : str

The file path to the input contact map data.

points_fpath : str

The file path to the points data.

region2 : Optional[str], optional

The second genomic region to analyze, by default None.

Returns

dict

A dictionary containing the region, Spearman correlation, Pearson correlation, and the ratio of valid data.

Examples
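The computation described above can be sketched without the package itself. The following is a minimal, dependency-free illustration, not the library's implementation: the helper names (rank, pearson, spearman, distance_vs_count_perf), the (row, col, count) input layout, and the dictionary keys are all assumptions made for this example.

```python
import math


def rank(values):
    """Return 1-based average ranks; tied values share the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


def distance_vs_count_perf(contacts, points):
    """contacts: iterable of (row, col, count); points: list of (x, y, z)."""
    counts, dists = [], []
    total = 0
    for row, col, count in contacts:
        total += 1
        p, q = points[row], points[col]
        # Skip pairs with a missing count or a missing coordinate.
        if math.isnan(count) or any(math.isnan(v) for v in (*p, *q)):
            continue
        counts.append(count)
        dists.append(math.dist(p, q))
    return {
        "spearman": spearman(counts, dists),
        "pearson": pearson(counts, dists),
        "valid_ratio": len(counts) / total,
    }


nan = math.nan
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0), (nan, nan, nan)]
contacts = [(0, 1, 10.0), (0, 2, 2.0), (1, 2, 5.0), (0, 3, 1.0)]
perf = distance_vs_count_perf(contacts, points)
```

In this toy input, higher contact counts pair with shorter distances, so both correlations are negative, and one of the four pairs is dropped because its locus has no coordinates.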

gunz_cm.reconstructions.third_party.h3dg module

Module.

Examples

gunz_cm.reconstructions.third_party.h3dg.comp_h3dg_obj_perf(region1: str, resolution: int, balancing: str, cm_fpath: str, points_fpath: str, mappings_fpath: str, region2: str | None = None) -> dict

Computes the performance metrics (Spearman and Pearson correlation) for Euclidean distances predicted by Hierarchical 3D Genome.

This function computes the Spearman correlation between contact counts and Euclidean distances derived from 3D points. It handles edge cases such as invalid loci and missing data.

Parameters

region1 : str

The genomic region on the first chromosome.

resolution : int

The resolution of the data.

balancing : str

The balancing method to use.

cm_fpath : str

The file path to the input contact matrix data.

points_fpath : str

The file path to the points data (pdb or xyz).

mappings_fpath : str

The file path to the mappings data.

region2 : Optional[str], optional

The genomic region on the second chromosome, by default None.

Returns

dict

A dictionary containing the region, correlation, and data ratio.

Examples

gunz_cm.reconstructions.third_party.h3dg.find_h3dg_mapping_fpath(path: str) -> str

Finds the coordinate mapping file in the specified directory.

This function searches the directory for files whose names end in _coordinate_mapping.txt and returns the first one found.

Parameters

path : str

The directory path to search for mapping files.

Returns

str

The file path to the coordinate mapping file.

Examples
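A search of this kind can be sketched with the standard library. This is an illustration only: the name find_mapping_file is hypothetical, and both the sorting (used here so that "first" is deterministic) and the FileNotFoundError on an empty result are assumptions, since the documentation does not state either.

```python
import glob
import os


def find_mapping_file(directory):
    """Return the first *_coordinate_mapping.txt file in `directory`."""
    pattern = os.path.join(directory, "*_coordinate_mapping.txt")
    # Assumption: sort the matches so the result does not depend on
    # the order in which the file system lists the directory.
    matches = sorted(glob.glob(pattern))
    if not matches:
        # Assumption: fail loudly when no mapping file is present.
        raise FileNotFoundError(f"no coordinate mapping file in {directory}")
    return matches[0]
```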

gunz_cm.reconstructions.third_party.h3dg.find_h3dg_points_fpath(path: str) -> str

Finds the most recent points file in the specified directory.

This function searches for files with the .pdb extension and returns the most recent one.

Parameters

path : str

The directory path to search for points files.

Returns

str

The file path to the most recent points file.

Examples
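One plausible way to pick the most recent .pdb file is to compare modification times. The sketch below assumes that "most recent" means the latest modification time and that an empty directory is an error; the name find_latest_pdb is hypothetical and neither assumption is confirmed by the documentation.

```python
import glob
import os


def find_latest_pdb(directory):
    """Return the .pdb file in `directory` with the latest modification time."""
    candidates = glob.glob(os.path.join(directory, "*.pdb"))
    if not candidates:
        # Assumption: a directory without .pdb files is treated as an error.
        raise FileNotFoundError(f"no .pdb file in {directory}")
    # os.path.getmtime returns the modification time in seconds.
    return max(candidates, key=os.path.getmtime)
```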

gunz_cm.reconstructions.third_party.h3dg.gen_h3dg_coo(chr_region: str, resolution: int, balancing: str, input_fpath: str, output_fpath: str, overwrite: bool = False) -> None

Generates COO format files for H3DG from a contact matrix file.

  • This function generates both raw and normalized COO files.

  • The raw counts file is saved with the suffix .raw.

  • The normalized counts file is saved with the suffix specified by the balancing parameter.

Parameters

chr_region : str

The chromosome region to process.

resolution : int

The resolution of the contact matrix.

balancing : str

The balancing method to use for normalization.

input_fpath : str

The file path to the input contact matrix data.

output_fpath : str

The base file path for the output COO files.

overwrite : bool, optional

Whether to overwrite existing files (default is False).

Returns

None

Examples
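The naming scheme for the two output files can be illustrated as follows. This is a sketch under stated assumptions, not the library's code: write_coo_pair is a hypothetical helper, the tab-separated (row, col, value) layout is assumed, and raising FileExistsError when overwrite is False is a guess modeled on gen_h3dg_domain.

```python
import os


def write_coo_pair(entries, output_fpath, balancing, overwrite=False):
    """entries: iterable of (row, col, raw_count, norm_count)."""
    raw_fpath = f"{output_fpath}.raw"           # raw counts
    norm_fpath = f"{output_fpath}.{balancing}"  # normalized counts
    for fpath in (raw_fpath, norm_fpath):
        # Assumption: refuse to clobber existing files unless asked to.
        if os.path.exists(fpath) and not overwrite:
            raise FileExistsError(fpath)
    with open(raw_fpath, "w") as raw_f, open(norm_fpath, "w") as norm_f:
        for row, col, raw, norm in entries:
            # Assumed layout: one tab-separated triple per line.
            raw_f.write(f"{row}\t{col}\t{raw}\n")
            norm_f.write(f"{row}\t{col}\t{norm}\n")
    return raw_fpath, norm_fpath
```

With output_fpath set to out/chr1 and balancing set to KR (an example value), this writes out/chr1.raw and out/chr1.KR.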

gunz_cm.reconstructions.third_party.h3dg.gen_h3dg_domain(input_fpath: str, output_fpath: str, overwrite: bool = False, only_autosome: bool = False) -> None

Converts an arrowhead file to an H3DG domain file.

This function reads an input file containing domain information, filters for intra-chromosomal interactions, and optionally filters for autosomes only. It then writes the processed data to an output file in H3DG format. If overwrite is False and the output file already exists, the function will raise a FileExistsError.

Parameters

input_fpath : str

The file path to the input arrowhead file.

output_fpath : str

The file path to the output H3DG domain file.

overwrite : bool, optional

If True, allows overwriting the output file if it already exists. Default is False.

only_autosome : bool, optional

If True, filters the domains to include only autosomes. Default is False.

Returns

None

Examples
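The filtering steps described above can be sketched as below. Several details are assumptions made for illustration: the input is taken to be whitespace-separated with the first six columns chr1, x1, x2, chr2, y1, y2; the output layout (chromosome, start, end) is hypothetical and may not match the real H3DG domain format; and convert_domains is not a function of the package.

```python
import os


def convert_domains(input_fpath, output_fpath, overwrite=False, only_autosome=False):
    """Filter domain rows and write them as tab-separated (chrom, start, end)."""
    if os.path.exists(output_fpath) and not overwrite:
        raise FileExistsError(output_fpath)
    kept = []
    with open(input_fpath) as src:
        for line in src:
            fields = line.split()
            # Skip blank lines, comment lines, and a non-numeric header row.
            if len(fields) < 6 or fields[0].startswith("#") or not fields[1].isdigit():
                continue
            chr1, x1, x2, chr2 = fields[0], fields[1], fields[2], fields[3]
            if chr1 != chr2:
                continue  # keep intra-chromosomal domains only
            name = chr1[3:] if chr1.lower().startswith("chr") else chr1
            if only_autosome and not name.isdigit():
                continue  # drops non-numeric chromosomes such as X, Y, M
            kept.append((chr1, int(x1), int(x2)))
    with open(output_fpath, "w") as dst:
        for chrom, start, end in kept:
            dst.write(f"{chrom}\t{start}\t{end}\n")
    return len(kept)
```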

gunz_cm.reconstructions.third_party.h3dg.load_h3dg_points(res_path: str, points_fpath: str | None = None, parser: str = 'regex', mapping_fpath: str | None = None, resolution: int | None = None, num_bins: int | None = None, rc_ids: List | None = None, def_coor: float = nan) -> ndarray

Loads and processes 3D points from a PDB file and a coordinate mapping file.

  • If points_fpath is not provided, the function finds the most recent points file in res_path.

  • If mapping_fpath is not provided, the function finds the coordinate mapping file in res_path.

  • The function handles missing data and invalid loci by setting them to def_coor.

Parameters

res_path : str

The directory path containing the points and mapping files.

points_fpath : Optional[str], optional

The file path to the PDB file containing the points, default is None.

parser : str, optional

The parser to use (‘regex’ or ‘pandas’), default is ‘regex’.

mapping_fpath : Optional[str], optional

The file path to the coordinate mapping file, default is None.

resolution : Optional[int], optional

The resolution to use for normalization, default is None.

num_bins : Optional[int], optional

The number of bins, default is None.

rc_ids : Optional[List], optional

Row and column IDs to filter valid loci, default is None.

def_coor : float, optional

The default coordinate value for missing or invalid loci, default is np.nan.

Returns

np.ndarray

A 2D numpy array of shape (N, 3) containing the processed points.

Examples
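The handling of missing and invalid loci can be illustrated with a small, dependency-free sketch. It uses plain lists instead of a numpy array, and it assumes the mapping is a list of (bin_index, point_index) pairs; place_points and that layout are hypothetical and chosen only for this example.

```python
import math


def place_points(points, mapping, num_bins, rc_ids=None, def_coor=math.nan):
    """points: list of (x, y, z); mapping: list of (bin_index, point_index)."""
    # Start with every bin set to the default coordinate.
    coords = [[def_coor] * 3 for _ in range(num_bins)]
    valid = None if rc_ids is None else set(rc_ids)
    for bin_index, point_index in mapping:
        if not 0 <= bin_index < num_bins:
            continue  # locus falls outside the requested range
        if valid is not None and bin_index not in valid:
            continue  # locus is not among the valid row/column IDs
        coords[bin_index] = list(points[point_index])
    return coords


coords = place_points(
    points=[(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)],
    mapping=[(0, 0), (2, 1)],
    num_bins=4,
)
```

Bins 1 and 3 have no mapped point here, so they keep the default coordinate (NaN).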

gunz_cm.reconstructions.third_party.h3dg.parse_h3dg_mapping(mapping_fpath: str, resolution: int | None = None) -> ndarray

Parses the coordinate mapping file and returns a mapping matrix.

  • If resolution is not provided, it is inferred from the differences in the loci.

  • The function normalizes the loci by the resolution.

Parameters

mapping_fpath : str

The file path to the coordinate mapping file.

resolution : Optional[int], optional

The resolution to use for normalization, default is None.

Returns

np.ndarray

A 2D numpy array of shape (N, 2) containing the parsed mapping.

Examples
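Inferring the resolution and normalizing the loci might look like the sketch below. It assumes each mapping line holds two whitespace-separated integers (genomic position, point index) and takes the smallest gap between distinct loci as the resolution; the actual file layout and inference rule may differ, and parse_mapping is a hypothetical name.

```python
def parse_mapping(lines, resolution=None):
    """Return ([(bin_index, point_index), ...], resolution)."""
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        rows.append((int(fields[0]), int(fields[1])))
    if resolution is None:
        loci = sorted({locus for locus, _ in rows})
        diffs = [b - a for a, b in zip(loci, loci[1:])]
        if not diffs:
            raise ValueError("need at least two distinct loci to infer the resolution")
        # Assumption: the smallest gap between loci equals the bin size.
        resolution = min(diffs)
    # Normalize genomic positions to bin indices.
    return [(locus // resolution, index) for locus, index in rows], resolution


mapping, inferred = parse_mapping(["0 0", "50000 1", "150000 2"])
```

In this toy input the gaps are 50000 and 100000, so the inferred resolution is 50000 and the loci become bins 0, 1, and 3.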

gunz_cm.reconstructions.third_party.h3dg.parse_h3dg_points(points_fpath: str, parser: str = 'regex') -> ndarray

Parses 3D points from a PDB file using a specified parser.

  • The function supports two parsers: ‘regex’ and ‘pandas’.

  • The ‘regex’ parser uses regular expressions to extract coordinates.

  • The ‘pandas’ parser uses pandas to read the file and extract coordinates.

Parameters

points_fpath : str

The file path to the PDB file containing the points.

parser : str, optional

The parser to use (‘regex’ or ‘pandas’), default is ‘regex’.

Returns

np.ndarray

A 2D numpy array of shape (N, 3) containing the parsed points.

Examples
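A regex-based parser could be sketched as follows. The pattern is an assumption, not the one the package uses: it takes the first three whitespace-separated decimal numbers on each ATOM or HETATM line as x, y, z, which breaks down if coordinates run together in the fixed-width PDB columns. The name parse_pdb_points is hypothetical, and the sketch returns a list of tuples rather than a numpy array.

```python
import re

# Assumed pattern: first three whitespace-separated decimals on an atom record.
_COORD_RE = re.compile(
    r"^(?:ATOM|HETATM).*?(-?\d+\.\d+)\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)"
)


def parse_pdb_points(text):
    """Return a list of (x, y, z) tuples, one per atom record in `text`."""
    points = []
    for line in text.splitlines():
        match = _COORD_RE.match(line)
        if match:  # non-atom records such as CONECT or END are skipped
            points.append(tuple(float(g) for g in match.groups()))
    return points


sample = (
    "ATOM      1  CA  MET A   1      12.345  -3.210   0.500  1.00 75.00\n"
    "CONECT    1    2\n"
    "ATOM      2  CA  MET A   2       1.000   2.000   3.000  1.00 75.00\n"
    "END\n"
)
parsed = parse_pdb_points(sample)
```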

gunz_cm.reconstructions.third_party.shneigh module

Module.

Examples

gunz_cm.reconstructions.third_party.shneigh.comp_shneigh_obj_perf(region1: str, resolution: int, balancing: str, input_fpath: str, points_fpath: str, region2: str | None = None) -> dict

Computes the performance metrics (Spearman and Pearson correlation) for Euclidean distances predicted by SHNeigh.

This function computes the Spearman rank correlation between contact counts and Euclidean distances derived from 3D coordinates. It handles cases where some loci are invalid and ensures that only valid data points are used in the computation. If the points file does not exist, a FileNotFoundError is raised.

Parameters

region1 : str

The genomic region on the first chromosome.

resolution : int

The resolution of the data.

balancing : str

The balancing method.

input_fpath : str

The path to the input file containing contact counts.

points_fpath : str

The path to the file containing 3D coordinates.

region2 : Optional[str], optional

The genomic region on the second chromosome, by default None.

Returns

dict

A dictionary containing the region, Spearman rank correlation, Pearson correlation, and data ratio.

Examples

gunz_cm.reconstructions.third_party.shneigh.gen_shneigh_coo(chr_region: str, resolution: int, balancing: str, input_fpath: str, output_fpath: str, overwrite: bool = False, exist_ok: bool = False)

Generates a COO format file for SHNeigh.

This function converts the input data to a COO format file suitable for SHNeigh.

Parameters

chr_region : str

The chromosome region.

resolution : int

The resolution of the data.

balancing : str

The balancing method.

input_fpath : str

The path to the input file.

output_fpath : str

The path to the output file.

overwrite : bool, optional

Whether to overwrite the output file if it exists, by default False.

exist_ok : bool, optional

Whether to ignore the operation if the output file already exists, by default False.

Returns

None

Examples

gunz_cm.reconstructions.third_party.superrec module

Module.

Examples

gunz_cm.reconstructions.third_party.superrec.comp_superrec_obj_perf(region1: str, resolution: int, balancing: str, input_fpath: str, points_fpath: str, region2: str | None = None) -> dict

Computes the performance metrics (Spearman and Pearson correlation) for Euclidean distances predicted by SuperRec.

This function loads the count data for a specified region and resolution, computes the Euclidean distance matrix from the points file, and then calculates the Spearman and Pearson correlation coefficients between the counts and the distances. The function assumes that the points file contains valid points and that the row and column IDs are mapped to these valid points.

Parameters

region1 : str

The first region for which to compute the performance metrics.

resolution : int

The resolution at which to load the data.

balancing : str

The balancing method to use when loading the data.

input_fpath : str

The file path to the input data.

points_fpath : str

The file path to the points data.

region2 : Optional[str], optional

The second region for which to compute the performance metrics, by default None.

Returns

dict

A dictionary containing the region, Spearman correlation coefficient, Pearson correlation coefficient, and data ratio.

Examples

gunz_cm.reconstructions.third_party.superrec.gen_superrec_coo(chr_region: str, resolution: int, balancing: str, input_fpath: str, output_fpath: str, overwrite: bool = False, exist_ok: bool = False) -> None

Generates a COO format file for SuperRec.

This function converts the input data to a COO format file suitable for SuperRec. It uses the convert_to_cm_coo function with specific parameters tailored for SuperRec’s requirements.

Parameters

chr_region : str

The chromosome region.

resolution : int

The resolution of the data.

balancing : str

The balancing method.

input_fpath : str

The path to the input file.

output_fpath : str

The path to the output file.

overwrite : bool, optional

Whether to overwrite the output file if it exists, by default False.

exist_ok : bool, optional

Whether to ignore the operation if the output file already exists, by default False.

Returns

None

Examples
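How overwrite and exist_ok might interact can be sketched as a small guard. The documentation does not state which flag takes precedence or what happens when both are False, so the ordering below (overwrite first, then exist_ok, otherwise FileExistsError) is an assumption, and should_write is a hypothetical helper.

```python
import os


def should_write(output_fpath, overwrite=False, exist_ok=False):
    """Return True if the output file should be (re)written."""
    if not os.path.exists(output_fpath):
        return True   # nothing to protect, always write
    if overwrite:
        return True   # caller asked to replace the existing file
    if exist_ok:
        return False  # silently skip: the existing file is kept
    # Assumption: with both flags False, an existing file is an error.
    raise FileExistsError(output_fpath)
```

A caller would run the conversion only when should_write returns True, which makes repeated runs over the same output directory safe with exist_ok=True.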

Module contents