gunz_cm.reconstructions.third_party package
Submodules
gunz_cm.reconstructions.third_party.flamingo module
Helpers for evaluating 3D genome structures reconstructed by FLAMINGO.
- gunz_cm.reconstructions.third_party.flamingo.comp_flamingo_obj_perf(region1: str, resolution: int, balancing: str, input_fpath: str, points_fpath: str, region2: str | None = None) -> dict
Computes the performance metrics (Spearman and Pearson correlation) for Euclidean distances predicted by FLAMINGO.
The function loads contact map data and point coordinates, filters out invalid data, and computes the Euclidean distance matrix.
If region2 is not provided, region1 is used for both axes of the contact map.
The function raises a FileNotFoundError if the points file does not exist.
Parameters
- region1 : str
The first genomic region to analyze.
- resolution : int
The resolution of the genomic data.
- balancing : str
The balancing method to use for the contact map data.
- input_fpath : str
The file path to the input contact map data.
- points_fpath : str
The file path to the points data.
- region2 : Optional[str], optional
The second genomic region to analyze, by default None.
Returns
- dict
A dictionary containing the region, Spearman correlation, Pearson correlation, and the ratio of valid data.
Yeremia G. Adhisantoso (adhisant@tnt.uni-hannover.de)
Qwen2.5 72B - 4.25bpw
Examples
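The metric computation described above can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the library's implementation: the hypothetical helpers `rank_avg` and `eval_structure` take an already-loaded dense contact matrix `counts` that is aligned bin-for-bin with an `(N, 3)` coordinate array `points`; a locus pair is treated as valid when its count is finite and positive and its distance is finite; only the upper triangle is used; and the data ratio is assumed to be the fraction of valid pairs. Spearman's correlation is computed as Pearson's correlation of average ranks.

```python
import numpy as np


def rank_avg(a: np.ndarray) -> np.ndarray:
    """Rank values from 1..n, giving tied values their average rank."""
    a = np.asarray(a, dtype=float)
    order = np.argsort(a, kind="mergesort")
    sorted_a = a[order]
    ranks = np.empty(a.size, dtype=float)
    i = 0
    while i < a.size:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < a.size and sorted_a[j + 1] == sorted_a[i]:
            j += 1
        ranks[order[i : j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def eval_structure(counts: np.ndarray, points: np.ndarray) -> dict:
    """Correlate contact counts with Euclidean distances between 3D points."""
    # Pairwise Euclidean distance matrix, shape (N, N).
    diff = points[:, None, :] - points[None, :, :]
    dists = np.linalg.norm(diff, axis=-1)

    # Upper triangle without the diagonal: each locus pair is used once.
    iu = np.triu_indices(counts.shape[0], k=1)
    c, d = counts[iu], dists[iu]

    # Assumed validity rule: finite positive count and finite distance
    # (a NaN coordinate makes every distance involving that locus NaN).
    valid = np.isfinite(c) & (c > 0) & np.isfinite(d)
    c, d = c[valid], d[valid]

    pearson = float(np.corrcoef(c, d)[0, 1])
    spearman = float(np.corrcoef(rank_avg(c), rank_avg(d))[0, 1])
    return {
        "spearman": spearman,
        "pearson": pearson,
        "data_ratio": float(valid.mean()),
    }
```

Since loci in close spatial proximity are expected to contact more often, a good reconstruction typically yields negative correlations when raw distances are used.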
gunz_cm.reconstructions.third_party.h3dg module
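Helpers for preparing inputs for Hierarchical 3D Genome (H3DG), loading its reconstructed structures, and evaluating them.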
- gunz_cm.reconstructions.third_party.h3dg.comp_h3dg_obj_perf(region1: str, resolution: int, balancing: str, cm_fpath: str, points_fpath: str, mappings_fpath: str, region2: str | None = None) -> dict
Computes the performance metrics (Spearman and Pearson correlation) for Euclidean distances predicted by Hierarchical 3D Genome.
This function computes the Spearman correlation between contact counts and Euclidean distances derived from 3D points. It handles edge cases such as invalid loci and missing data.
Parameters
- region1 : str
The chromosome region for the first chromosome.
- resolution : int
The resolution of the data.
- balancing : str
The balancing method to use.
- cm_fpath : str
The file path to the input contact matrix data.
- points_fpath : str
The file path to the points data (PDB or XYZ).
- mappings_fpath : str
The file path to the mappings data.
- region2 : Optional[str], optional
The chromosome region for the second chromosome (default is None).
Returns
- dict
A dictionary containing the region, correlation, and data ratio.
Yeremia G. Adhisantoso (adhisant@tnt.uni-hannover.de)
Qwen2.5 Coder 32B - 6.5bpw
Examples
- gunz_cm.reconstructions.third_party.h3dg.find_h3dg_mapping_fpath(path: str) -> str
Finds the coordinate mapping file in the specified directory.
This function searches the directory for files whose names end with _coordinate_mapping.txt and returns the first one found.
Parameters
- path : str
The directory path to search for mapping files.
Returns
- str
The file path to the coordinate mapping file.
Yeremia G. Adhisantoso (adhisant@tnt.uni-hannover.de)
Qwen2.5 Coder 32B - 6.5bpw
Examples
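A minimal sketch of the search using `pathlib`, assuming a flat (non-recursive) search. The name `find_mapping_fpath` is hypothetical; sorting the matches to make "first" deterministic is an assumption, and so is raising `FileNotFoundError` when nothing matches, since the behavior in that case is not documented above.

```python
from pathlib import Path


def find_mapping_fpath(path: str) -> str:
    """Return the first file in `path` whose name ends with _coordinate_mapping.txt."""
    # sorted() makes "first" deterministic; the library may order differently.
    matches = sorted(Path(path).glob("*_coordinate_mapping.txt"))
    if not matches:
        # Assumed behavior when no mapping file exists.
        raise FileNotFoundError(f"No *_coordinate_mapping.txt file in {path}")
    return str(matches[0])
```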
- gunz_cm.reconstructions.third_party.h3dg.find_h3dg_points_fpath(path: str) -> str
Finds the most recent points file in the specified directory.
This function searches for files with the .pdb extension and returns the most recent one.
Parameters
- path : str
The directory path to search for points files.
Returns
- str
The file path to the most recent points file.
Yeremia G. Adhisantoso (adhisant@tnt.uni-hannover.de)
Qwen2.5 Coder 32B - 6.5bpw
Examples
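A minimal sketch, assuming that "most recent" means the newest modification time (the docstring does not say how recency is determined) and that a missing file should raise `FileNotFoundError`. The function name `find_latest_pdb` is hypothetical.

```python
from pathlib import Path


def find_latest_pdb(path: str) -> str:
    """Return the .pdb file in `path` with the newest modification time."""
    candidates = list(Path(path).glob("*.pdb"))
    if not candidates:
        raise FileNotFoundError(f"No .pdb file in {path}")
    # Assumption: "most recent" means latest mtime, not a timestamp in the name.
    return str(max(candidates, key=lambda p: p.stat().st_mtime))
```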
- gunz_cm.reconstructions.third_party.h3dg.gen_h3dg_coo(chr_region: str, resolution: int, balancing: str, input_fpath: str, output_fpath: str, overwrite: bool = False) -> None
Generates COO format files for H3DG from a contact matrix file.
This function generates both raw and normalized COO files.
The raw counts file is saved with the suffix .raw.
The normalized counts file is saved with the suffix specified by the balancing parameter.
Parameters
- chr_region : str
The chromosome region to process.
- resolution : int
The resolution of the contact matrix.
- balancing : str
The balancing method to use for normalization.
- input_fpath : str
The file path to the input contact matrix data.
- output_fpath : str
The base file path for the output COO files.
- overwrite : bool, optional
Whether to overwrite existing files (default is False).
Returns
None
Yeremia G. Adhisantoso (adhisant@tnt.uni-hannover.de)
Qwen2.5 Coder 32B - 6.5bpw
Examples
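The sketch below illustrates writing raw and balanced matrices as sparse COO triplets. It is not the library's implementation: it assumes both matrices are already loaded as dense NumPy arrays, that the suffixes are appended to `output_fpath` after a dot (for example `chr1.raw` and `chr1.KR`), that only finite, non-zero upper-triangle entries are written as tab-separated `row col value` lines with 0-based bin indices, and that an existing file raises `FileExistsError` unless `overwrite` is set. The exact format H3DG expects may differ; `write_coo` and `gen_coo_pair` are hypothetical names.

```python
import os

import numpy as np


def write_coo(mat: np.ndarray, fpath: str, overwrite: bool = False) -> None:
    """Write finite, non-zero upper-triangle entries as 'row<TAB>col<TAB>value' lines."""
    if os.path.exists(fpath) and not overwrite:
        raise FileExistsError(fpath)
    # Balanced matrices often contain NaN for filtered bins; skip those entries.
    keep = np.triu(np.isfinite(mat) & (mat != 0))
    rows, cols = np.nonzero(keep)
    with open(fpath, "w") as fh:
        for r, c in zip(rows, cols):
            fh.write(f"{r}\t{c}\t{mat[r, c]:g}\n")


def gen_coo_pair(raw: np.ndarray, balanced: np.ndarray, balancing: str,
                 base_fpath: str, overwrite: bool = False) -> None:
    # One file per normalization, distinguished by suffix (assumed naming).
    write_coo(raw, f"{base_fpath}.raw", overwrite)
    write_coo(balanced, f"{base_fpath}.{balancing}", overwrite)
```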
- gunz_cm.reconstructions.third_party.h3dg.gen_h3dg_domain(input_fpath: str, output_fpath: str, overwrite: bool = False, only_autosome: bool = False) -> None
Converts an arrowhead file to an H3DG domain file.
This function reads an input file containing domain information, filters for intra-chromosomal interactions, and optionally filters for autosomes only. It then writes the processed data to an output file in H3DG format. If overwrite is False and the output file already exists, the function will raise a FileExistsError.
Parameters
- input_fpath : str
The file path to the input arrowhead file.
- output_fpath : str
The file path to the output H3DG domain file.
- overwrite : bool, optional
If True, allows overwriting the output file if it already exists. Default is False.
- only_autosome : bool, optional
If True, filters the domains to include only autosomes. Default is False.
Returns
None
Yeremia G. Adhisantoso (adhisant@tnt.uni-hannover.de)
Qwen2.5 Coder 32B - 6.5bpw
Examples
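A minimal sketch of the conversion with the standard library. It assumes the input is a tab-separated Juicer-style 2D annotation file with a header row naming the columns `chr1`, `x1`, `x2` and `chr2`, that an autosome is a chromosome whose name (ignoring a `chr` prefix) is numeric, and that the H3DG domain file holds one tab-separated `chrom start end` line per domain. These formats are assumptions to check against the actual files; `arrowhead_to_domains` is a hypothetical name. The `FileExistsError` behavior follows the description above.

```python
import csv
import os


def arrowhead_to_domains(input_fpath: str, output_fpath: str,
                         overwrite: bool = False, only_autosome: bool = False) -> None:
    if os.path.exists(output_fpath) and not overwrite:
        raise FileExistsError(output_fpath)
    with open(input_fpath, newline="") as fin:
        # Assumed header: chr1, x1, x2, chr2, y1, y2, ...
        rows = list(csv.DictReader(fin, delimiter="\t"))
    with open(output_fpath, "w") as fout:
        for row in rows:
            if row["chr1"] != row["chr2"]:
                continue  # keep intra-chromosomal entries only
            chrom = row["chr1"]
            # Assumed autosome rule: numeric chromosome name, e.g. "1" or "chr1".
            if only_autosome and not chrom.removeprefix("chr").isdigit():
                continue
            fout.write(f"{chrom}\t{row['x1']}\t{row['x2']}\n")
```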
- gunz_cm.reconstructions.third_party.h3dg.load_h3dg_points(res_path: str, points_fpath: str | None = None, parser: str = 'regex', mapping_fpath: str | None = None, resolution: int | None = None, num_bins: int | None = None, rc_ids: List | None = None, def_coor: float = nan) -> ndarray
Loads and processes 3D points from a PDB file and a coordinate mapping file.
If points_fpath is not provided, the function finds the most recent points file in res_path.
If mapping_fpath is not provided, the function finds the coordinate mapping file in res_path.
The function handles missing data and invalid loci by setting them to def_coor.
Parameters
- res_path : str
The directory path containing the points and mapping files.
- points_fpath : Optional[str], optional
The file path to the PDB file containing the points, default is None.
- parser : str, optional
The parser to use (‘regex’ or ‘pandas’), default is ‘regex’.
- mapping_fpath : Optional[str], optional
The file path to the coordinate mapping file, default is None.
- resolution : Optional[int], optional
The resolution to use for normalization, default is None.
- num_bins : Optional[int], optional
The number of bins, default is None.
- rc_ids : Optional[List], optional
Row and column IDs to filter valid loci, default is None.
- def_coor : float, optional
The default coordinate value for missing or invalid loci, default is np.nan.
Returns
- np.ndarray
A 2D NumPy array of shape (N, 3) containing the processed points.
Yeremia G. Adhisantoso (adhisant@tnt.uni-hannover.de)
Qwen2.5 Coder 32B - 6.5bpw
Examples
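The core of the loading step, placing parsed coordinates into a bin-indexed array and filling every other bin with `def_coor`, might look like the sketch below. It assumes the mapping is an `(N, 2)` integer array whose first column is a bin index and whose second column is a row index into the parsed points, and that `rc_ids` lists the valid bins; both the column order and the hypothetical `place_points` helper are illustrative, not the library's API.

```python
import numpy as np


def place_points(points: np.ndarray, mapping: np.ndarray, num_bins: int,
                 rc_ids=None, def_coor: float = np.nan) -> np.ndarray:
    """Scatter parsed points into a bin-indexed (num_bins, 3) array."""
    out = np.full((num_bins, 3), def_coor, dtype=float)
    # Assumed mapping layout: column 0 = bin index, column 1 = point row.
    bins, idx = mapping[:, 0].astype(int), mapping[:, 1].astype(int)
    out[bins] = points[idx]
    if rc_ids is not None:
        # Bins not listed as valid loci are reset to the default coordinate.
        invalid = np.setdiff1d(np.arange(num_bins), np.asarray(rc_ids))
        out[invalid] = def_coor
    return out
```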
- gunz_cm.reconstructions.third_party.h3dg.parse_h3dg_mapping(mapping_fpath: str, resolution: int | None = None) -> ndarray
Parses the coordinate mapping file and returns a mapping matrix.
If resolution is not provided, it is inferred from the differences in the loci.
The function normalizes the loci by the resolution.
Parameters
- mapping_fpath : str
The file path to the coordinate mapping file.
- resolution : Optional[int], optional
The resolution to use for normalization, default is None.
Returns
- np.ndarray
A 2D NumPy array of shape (N, 2) containing the parsed mapping.
Yeremia G. Adhisantoso (adhisant@tnt.uni-hannover.de)
Qwen2.5 Coder 32B - 6.5bpw
Examples
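A minimal sketch of the parsing and resolution inference. It assumes each non-empty line holds at least two whitespace-separated integers, a genomic locus followed by a point index, and infers the resolution as the smallest gap between distinct loci; the file layout, the inference rule, and the name `parse_mapping` are all assumptions. It accepts any iterable of lines, such as an open file handle.

```python
import numpy as np


def parse_mapping(lines, resolution=None) -> np.ndarray:
    """Parse 'locus index' pairs and convert loci to bin indices."""
    data = np.array(
        [[int(v) for v in ln.split()[:2]] for ln in lines if ln.strip()],
        dtype=np.int64,
    )
    loci = data[:, 0]
    if resolution is None:
        # Assumed rule: the smallest gap between distinct, sorted loci.
        resolution = int(np.diff(np.unique(loci)).min())
    # Normalize genomic positions to bin indices.
    data[:, 0] = loci // resolution
    return data
```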
- gunz_cm.reconstructions.third_party.h3dg.parse_h3dg_points(points_fpath: str, parser: str = 'regex') -> ndarray
Parses 3D points from a PDB file using a specified parser.
The function supports two parsers: ‘regex’ and ‘pandas’.
The ‘regex’ parser uses regular expressions to extract coordinates.
The ‘pandas’ parser uses pandas to read the file and extract coordinates.
Parameters
- points_fpath : str
The file path to the PDB file containing the points.
- parser : str, optional
The parser to use (‘regex’ or ‘pandas’), default is ‘regex’.
Returns
- np.ndarray
A 2D NumPy array of shape (N, 3) containing the parsed points.
Yeremia G. Adhisantoso (adhisant@tnt.uni-hannover.de)
Qwen2.5 Coder 32B - 6.5bpw
Examples
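A sketch of what a regex-based parser can look like. PDB `ATOM`/`HETATM` records are fixed-width, with x, y and z in columns 31-38, 39-46 and 47-54, so the pattern skips the first 30 characters and captures three 8-character fields. The name `parse_pdb_coords` is hypothetical, and the library's own pattern may differ.

```python
import re

import numpy as np

# 6-character record name + 24 characters = the 30 columns before x.
_COORD_RE = re.compile(r"^(?:ATOM  |HETATM).{24}(.{8})(.{8})(.{8})")


def parse_pdb_coords(lines) -> np.ndarray:
    """Extract x, y, z from ATOM/HETATM records as an (N, 3) float array."""
    coords = [
        [float(g) for g in m.groups()]  # float() tolerates the padding spaces
        for m in map(_COORD_RE.match, lines)
        if m is not None
    ]
    # reshape keeps the (0, 3) shape when no records are found.
    return np.array(coords, dtype=float).reshape(-1, 3)
```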
gunz_cm.reconstructions.third_party.shneigh module
Helpers for preparing inputs for SHNeigh and evaluating its reconstructed structures.
- gunz_cm.reconstructions.third_party.shneigh.comp_shneigh_obj_perf(region1: str, resolution: int, balancing: str, input_fpath: str, points_fpath: str, region2: str | None = None) -> dict
Computes the performance metrics (Spearman and Pearson correlation) for Euclidean distances predicted by SHNeigh.
This function computes the Spearman rank correlation between contact counts and Euclidean distances derived from 3D coordinates. It handles cases where some loci are invalid and ensures that only valid data points are used in the computation. If the points file does not exist, a FileNotFoundError is raised.
Parameters
- region1 : str
The chromosome region for the first chromosome.
- resolution : int
The resolution of the data.
- balancing : str
The balancing method.
- input_fpath : str
The path to the input file containing contact counts.
- points_fpath : str
The path to the file containing 3D coordinates.
- region2 : Optional[str], optional
The chromosome region for the second chromosome, by default None.
Returns
- dict
A dictionary containing the region, Spearman rank correlation, Pearson correlation, and data ratio.
Yeremia G. Adhisantoso (adhisant@tnt.uni-hannover.de)
Qwen2.5 72B - 4.25bpw
Examples
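How `region2` changes the set of compared pairs can be sketched with NumPy. This is an illustration under assumptions, not the library's code: with a single region the contact map is symmetric, so only the upper triangle (excluding the diagonal) is used, whereas two different regions form a rectangular block in which every entry is a distinct locus pair. The helper `pair_values` is hypothetical and expects points already aligned with the rows and columns of `counts`.

```python
import numpy as np


def pair_values(counts: np.ndarray, pts1: np.ndarray, pts2=None):
    """Return flattened (count, distance) pairs for one or two regions."""
    same = pts2 is None
    pts2 = pts1 if same else pts2
    # Cross-distance matrix of shape (len(pts1), len(pts2)).
    dists = np.linalg.norm(pts1[:, None, :] - pts2[None, :, :], axis=-1)
    if same:
        # Symmetric intra-region map: use each locus pair once.
        iu = np.triu_indices(len(pts1), k=1)
        return counts[iu], dists[iu]
    # Two different regions: every entry of the block is a distinct pair.
    return counts.ravel(), dists.ravel()
```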
- gunz_cm.reconstructions.third_party.shneigh.gen_shneigh_coo(chr_region: str, resolution: int, balancing: str, input_fpath: str, output_fpath: str, overwrite: bool = False, exist_ok: bool = False)
Generates a COO format file for SHNeigh.
This function converts the input data to a COO format file suitable for SHNeigh.
Parameters
- chr_region : str
The chromosome region.
- resolution : int
The resolution of the data.
- balancing : str
The balancing method.
- input_fpath : str
The path to the input file.
- output_fpath : str
The path to the output file.
- overwrite : bool, optional
Whether to overwrite the output file if it exists, by default False.
- exist_ok : bool, optional
Whether to skip the operation if the output file already exists, by default False.
Returns
None
Yeremia G. Adhisantoso (adhisant@tnt.uni-hannover.de)
Qwen2.5 72B - 4.25bpw
Examples
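The interplay of `overwrite` and `exist_ok` could be handled by a guard like the one below. The precedence shown (`overwrite` wins over `exist_ok`, and an existing file with both flags off raises `FileExistsError`) is an assumption, as is the hypothetical helper name `should_write`.

```python
import os


def should_write(output_fpath: str, overwrite: bool = False, exist_ok: bool = False) -> bool:
    """Decide whether to (re)generate an output file that may already exist."""
    if not os.path.exists(output_fpath):
        return True
    if overwrite:
        return True  # replace the existing file
    if exist_ok:
        return False  # keep the existing file and skip the work
    raise FileExistsError(output_fpath)
```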
gunz_cm.reconstructions.third_party.superrec module
Helpers for preparing inputs for SuperRec and evaluating its reconstructed structures.
- gunz_cm.reconstructions.third_party.superrec.comp_superrec_obj_perf(region1: str, resolution: int, balancing: str, input_fpath: str, points_fpath: str, region2: str | None = None) -> dict
Computes the performance metrics (Spearman and Pearson correlation) for Euclidean distances predicted by SuperRec.
This function loads the count data for a specified region and resolution, computes the Euclidean distance matrix from the points file, and then calculates the Spearman and Pearson correlation coefficients between the counts and the distances. The function assumes that the points file contains valid points and that the row and column IDs are mapped to these valid points.
Parameters
- region1 : str
The first region for which to compute the performance metrics.
- resolution : int
The resolution at which to load the data.
- balancing : str
The balancing method to use when loading the data.
- input_fpath : str
The file path to the input data.
- points_fpath : str
The file path to the points data.
- region2 : Optional[str], optional
The second region for which to compute the performance metrics, by default None.
Returns
- dict
A dictionary containing the region, Spearman correlation coefficient, Pearson correlation coefficient, and data ratio.
Yeremia G. Adhisantoso (adhisant@tnt.uni-hannover.de)
Qwen2.5 72B - 4.25bpw
Examples
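Because the points file is assumed to contain only valid points, the contact counts must be restricted to the matching loci before counts and distances can be compared. A minimal NumPy sketch, assuming `rc_ids` lists the valid bin indices in the same order as the points; `align_counts_to_points` is a hypothetical helper.

```python
import numpy as np


def align_counts_to_points(counts: np.ndarray, rc_ids, points: np.ndarray) -> np.ndarray:
    """Select the sub-matrix of `counts` whose loci have a reconstructed point."""
    rc_ids = np.asarray(rc_ids)
    if len(rc_ids) != len(points):
        raise ValueError("expected one point per valid locus")
    # np.ix_ builds the row/column cross product, giving a (k, k) sub-matrix
    # whose i-th row and column correspond to points[i].
    return counts[np.ix_(rc_ids, rc_ids)]
```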
- gunz_cm.reconstructions.third_party.superrec.gen_superrec_coo(chr_region: str, resolution: int, balancing: str, input_fpath: str, output_fpath: str, overwrite: bool = False, exist_ok: bool = False) -> None
Generates a COO format file for SuperRec.
This function converts the input data to a COO format file suitable for SuperRec. It uses the convert_to_cm_coo function with specific parameters tailored for SuperRec’s requirements.
Parameters
- chr_region : str
The chromosome region.
- resolution : int
The resolution of the data.
- balancing : str
The balancing method.
- input_fpath : str
The path to the input file.
- output_fpath : str
The path to the output file.
- overwrite : bool, optional
Whether to overwrite the output file if it exists, by default False.
- exist_ok : bool, optional
Whether to skip the operation if the output file already exists, by default False.
Returns
None
Yeremia G. Adhisantoso (adhisant@tnt.uni-hannover.de)
Osiris v3.2
Examples