gunz_cm.utils package
Submodules
gunz_cm.utils.intervals module
Genomic interval utilities for binning and set operations, implemented to minimize dependencies on bioframe for core data-loading tasks.
- gunz_cm.utils.intervals.binnify(chromsizes: Dict[str, int], binsize: int) DataFrame[source]
Divide a genome into evenly sized bins. Matches bioframe.binnify logic.
- chromsizes : dict
Dictionary mapping chromosome names to lengths in bp.
- binsize : int
Size of bins in bp.
- pd.DataFrame
DataFrame with columns: ‘chrom’, ‘start’, ‘end’.
Examples
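The binning rule described above can be sketched without pandas. This is a minimal illustration, not the package's implementation: `binnify_sketch` is a hypothetical name, it returns plain tuples instead of a DataFrame, and it assumes the last bin of each chromosome is clipped to the chromosome length, as bioframe.binnify does.

```python
from typing import Dict, List, Tuple


def binnify_sketch(chromsizes: Dict[str, int], binsize: int) -> List[Tuple[str, int, int]]:
    """Return (chrom, start, end) bins; hypothetical stand-in for binnify."""
    if binsize <= 0:
        raise ValueError("binsize must be positive")
    bins = []
    for chrom, length in chromsizes.items():
        # Bin starts are multiples of binsize; the final bin is clipped
        # so that it never extends past the end of the chromosome.
        for start in range(0, length, binsize):
            bins.append((chrom, start, min(start + binsize, length)))
    return bins


bins = binnify_sketch({"chr1": 250, "chr2": 100}, 100)
```

Here chr1 yields two full bins plus a shorter final bin ending at 250, and chr2 yields a single bin.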
- gunz_cm.utils.intervals.subtract(df1: DataFrame, df2: DataFrame) DataFrame[source]
Remove intervals from df1 that overlap with any interval in df2. Simplified implementation of interval subtraction.
- df1 : pd.DataFrame
Target intervals (e.g., training windows).
- df2 : pd.DataFrame
Excluded intervals (e.g., centromeres, blacklisted regions).
- pd.DataFrame
Filtered df1 containing only intervals that do NOT overlap with df2.
Examples
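The filtering behaviour can be illustrated with plain tuples. This is a sketch under stated assumptions rather than the package's code: `subtract_sketch` and `overlaps` are hypothetical names, and intervals are assumed to be half-open (`start` inclusive, `end` exclusive), so intervals that merely touch do not count as overlapping.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), assumed half-open


def overlaps(a: Interval, b: Interval) -> bool:
    """True if a and b are on the same chromosome and share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def subtract_sketch(targets: List[Interval], excluded: List[Interval]) -> List[Interval]:
    """Keep only the target intervals that overlap no excluded interval."""
    return [iv for iv in targets if not any(overlaps(iv, ex) for ex in excluded)]


windows = [("chr1", 0, 100), ("chr1", 100, 200), ("chr2", 0, 100)]
excluded = [("chr1", 150, 180)]  # placeholder coordinates, not real annotations
kept = subtract_sketch(windows, excluded)
```

Note that a whole target interval is dropped when any part of it overlaps; the interval is not trimmed.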
gunz_cm.utils.logger module
Centralized logging configuration for the gunz_cm package.
gunz_cm.utils.matrix module
Module.
gunz_cm.utils.path module
Utilities for locating the root directory of the Git repository and adding it to sys.path.
- gunz_cm.utils.path.append_root_dir() None[source]
Append the root directory to sys.path.
This function first retrieves the root directory of the Git repository using get_root_dir(). If the root directory is not already in sys.path, it appends it and prints a confirmation message.
Takes no parameters and returns None.
Yeremia G. Adhisantoso (adhisant@tnt.uni-hannover.de)
Osiris v3.0
Examples
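The append-if-missing behaviour can be sketched as follows. `append_dir_sketch` is a hypothetical helper that takes the directory as an argument instead of calling get_root_dir(), and the path used is a placeholder; the exact confirmation message printed by the package is not documented here.

```python
import sys


def append_dir_sketch(root_dir: str) -> None:
    """Append root_dir to sys.path once; hypothetical stand-in for append_root_dir."""
    if root_dir not in sys.path:
        sys.path.append(root_dir)
        # The wording of the real confirmation message is an assumption.
        print(f"Appended {root_dir} to sys.path")


# Calling twice shows that the second call is a no-op.
append_dir_sketch("/tmp/example_repo_root")
append_dir_sketch("/tmp/example_repo_root")
```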
- gunz_cm.utils.path.get_root_dir() str[source]
Get the root directory of the Git repository.
This function attempts to find the root directory of the Git repository by searching from the current directory upwards. If no Git repository is found, it raises a RuntimeError.
Takes no parameters.
- str
The root directory of the Git repository.
Yeremia G. Adhisantoso (adhisant@tnt.uni-hannover.de)
Osiris v3.0
Examples
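One way to implement the upward search is to look for a `.git` entry in each parent directory. This is only a sketch: `find_git_root_sketch` is a hypothetical name, it takes an explicit starting directory, and the package may instead rely on the git command line or a Git library.

```python
import tempfile
from pathlib import Path


def find_git_root_sketch(start: Path) -> str:
    """Walk upwards from start until a directory containing .git is found."""
    current = start.resolve()
    for candidate in (current, *current.parents):
        if (candidate / ".git").exists():
            return str(candidate)
    raise RuntimeError(f"No Git repository found at or above {start}")


# Build a throwaway directory tree that looks like a repository.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp).resolve()
    (repo / ".git").mkdir()
    nested = repo / "src" / "pkg"
    nested.mkdir(parents=True)
    found = find_git_root_sketch(nested)
    expected = str(repo)
```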
gunz_cm.utils.resources module
Utilities for fetching genomic resources (centromeres, blacklists).
- gunz_cm.utils.resources.fetch_centromeres(genome: str, cache: bool = True, cache_dir: Path = PosixPath('/home/adhisant/.gunz_cm/resources')) DataFrame[source]
Fetch centromere coordinates for a given genome assembly from UCSC.
- genome : str
Genome assembly name (e.g., ‘hg19’, ‘hg38’, ‘mm10’).
- cache : bool, optional
Whether to cache the downloaded data. Defaults to True.
- cache_dir : pathlib.Path, optional
Directory to store cached files. Defaults to ~/.gunz_cm/resources.
- pd.DataFrame
DataFrame with columns: [‘chrom’, ‘start’, ‘end’, ‘name’, ‘gieStain’].
Examples
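The caching behaviour implied by the `cache` and `cache_dir` parameters can be sketched without network access. Everything here is an assumption made for illustration: `load_with_cache_sketch`, the injected `download` callable, the cache file name, and the placeholder table contents are hypothetical and do not reflect how the package talks to UCSC or names its cache files.

```python
import tempfile
from pathlib import Path
from typing import Callable


def load_with_cache_sketch(
    genome: str,
    download: Callable[[str], str],
    cache_dir: Path,
    cache: bool = True,
) -> str:
    """Return the table text for genome, reusing a cached copy when present."""
    cache_file = cache_dir / f"{genome}_centromeres.tsv"  # hypothetical file name
    if cache and cache_file.exists():
        return cache_file.read_text()
    text = download(genome)
    if cache:
        cache_dir.mkdir(parents=True, exist_ok=True)
        cache_file.write_text(text)
    return text


calls = []


def fake_download(genome: str) -> str:
    """Stand-in for the real download; returns made-up placeholder rows."""
    calls.append(genome)
    return "chrom\tstart\tend\nchr1\t100\t200\n"


with tempfile.TemporaryDirectory() as tmp:
    first = load_with_cache_sketch("hg38", fake_download, Path(tmp))
    second = load_with_cache_sketch("hg38", fake_download, Path(tmp))
```

The second call is served from the cache file, so the download callable runs only once.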
gunz_cm.utils.stream module
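Helpers for converting between integers and byte strings and for reading null-terminated strings from binary streams.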
- gunz_cm.utils.stream.bstr2int(payload: bytes, order: str | None = 'big') int[source]
Converts a byte string to an integer.
This function converts a byte string to an integer using a specified byte order. If the byte string is empty, a ValueError is raised.
- payload : bytes
The byte string to convert.
- order : t.Optional[str], optional
The byte order to use (default is ‘big’).
- int
The integer representation of the byte string.
Yeremia G. Adhisantoso (adhisant@tnt.uni-hannover.de)
Qwen2.5 72B - 4.25bpw
Examples
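The conversion can be sketched with the built-in `int.from_bytes`. `bstr2int_sketch` is a hypothetical stand-in, and treating the bytes as an unsigned value is an assumption; the package's own implementation may differ.

```python
def bstr2int_sketch(payload: bytes, order: str = "big") -> int:
    """Interpret payload as an unsigned integer in the given byte order."""
    if not payload:
        raise ValueError("payload must not be empty")
    return int.from_bytes(payload, byteorder=order)


big = bstr2int_sketch(b"\x01\x00")               # most significant byte first
little = bstr2int_sketch(b"\x01\x00", "little")  # least significant byte first
```

The same two bytes decode to 256 in big-endian order and to 1 in little-endian order.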
- gunz_cm.utils.stream.int2bstr(val: int, len_in_byte: int, order: Literal['big', 'little'] = 'big') bytes[source]
Converts an integer to a byte string.
This function converts an integer to a byte string of a specified length and byte order. If the integer is too large to fit in the specified length, it will be truncated.
- val : int
The integer to convert.
- len_in_byte : int
The length of the resulting byte string in bytes.
- order : t.Literal['big', 'little'], optional
The byte order to use (default is ‘big’).
- bytes
The byte string representation of the integer.
Yeremia G. Adhisantoso (adhisant@tnt.uni-hannover.de)
Qwen2.5 72B - 4.25bpw
Examples
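A sketch of the conversion using the built-in `int.to_bytes`. `int2bstr_sketch` is a hypothetical stand-in, and it assumes that "truncated" means keeping only the low-order `len_in_byte` bytes, which is implemented here with a bit mask because `int.to_bytes` on its own raises OverflowError for values that do not fit.

```python
def int2bstr_sketch(val: int, len_in_byte: int, order: str = "big") -> bytes:
    """Encode val in len_in_byte bytes, keeping only the low-order bytes."""
    # Assumption: truncation drops the high-order bits that do not fit.
    mask = (1 << (8 * len_in_byte)) - 1
    return (val & mask).to_bytes(len_in_byte, byteorder=order)


fits = int2bstr_sketch(256, 2)               # value fits in two bytes
truncated = int2bstr_sketch(0x1234, 1)       # only the low byte 0x34 survives
little = int2bstr_sketch(1, 2, "little")     # least significant byte first
```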
- gunz_cm.utils.stream.read_str(reader: BinaryIO, encoding: str = 'utf-8') str[source]
Reads a null-terminated string from a binary reader.
This function reads bytes from the reader until a null byte (b'\x00') is encountered. The read bytes are then decoded using the specified encoding.
- reader : t.BinaryIO
The binary reader to read from.
- encoding : str, optional
The encoding to use for decoding the read bytes (default is ‘utf-8’).
- str
The decoded string.
Yeremia G. Adhisantoso (adhisant@tnt.uni-hannover.de)
Qwen2.5 72B - 4.25bpw
Examples
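Reading up to the null terminator can be sketched one byte at a time. `read_str_sketch` is a hypothetical stand-in; stopping quietly at end of stream when no terminator is found is an assumption, and the package may handle that case differently.

```python
import io
from typing import BinaryIO


def read_str_sketch(reader: BinaryIO, encoding: str = "utf-8") -> str:
    """Read bytes up to (and consuming) the next null byte, then decode them."""
    buffer = bytearray()
    while True:
        byte = reader.read(1)
        # Stop at the terminator, or at end of stream (assumed behaviour).
        if byte in (b"", b"\x00"):
            break
        buffer += byte
    return buffer.decode(encoding)


stream = io.BytesIO(b"chr1\x00chr2\x00")
first = read_str_sketch(stream)
second = read_str_sketch(stream)
```

Because the terminator is consumed, consecutive calls return consecutive strings from the same stream.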