gunz_cm.utils package

Submodules

gunz_cm.utils.intervals module

Genomic interval utilities for binnification and set operations. Implemented to minimize dependencies on bioframe for core dataloading tasks.

gunz_cm.utils.intervals.binnify(chromsizes: Dict[str, int], binsize: int) -> DataFrame

Divide a genome into evenly sized bins. Matches bioframe.binnify logic.

chromsizes : dict

Dictionary mapping chromosome names to lengths in bp.

binsize : int

Size of bins in bp.

pd.DataFrame

DataFrame with columns: ‘chrom’, ‘start’, ‘end’.

Examples
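
The binning rule described above can be sketched without pandas. This is a minimal illustration, not the package's implementation: `binnify_rows` is a hypothetical stand-in that returns plain `(chrom, start, end)` tuples where the documented function returns a DataFrame, and it assumes that the last bin of each chromosome is clipped to the chromosome length.

```python
from typing import Dict, List, Tuple


def binnify_rows(chromsizes: Dict[str, int], binsize: int) -> List[Tuple[str, int, int]]:
    """Hypothetical stand-in: one (chrom, start, end) row per bin."""
    if binsize <= 0:
        raise ValueError("binsize must be a positive integer")
    rows = []
    for chrom, length in chromsizes.items():
        # Bins start at 0 and advance by binsize; the final bin is
        # clipped so that it never runs past the chromosome end.
        for start in range(0, length, binsize):
            rows.append((chrom, start, min(start + binsize, length)))
    return rows


bins = binnify_rows({"chr1": 250, "chr2": 100}, binsize=100)
for row in bins:
    print(row)
```

With these toy sizes, chr1 yields two full bins plus a shorter final bin, and chr2 yields a single bin.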

gunz_cm.utils.intervals.subtract(df1: DataFrame, df2: DataFrame) -> DataFrame

Remove intervals from df1 that overlap with any interval in df2. This is a simplified form of interval subtraction: an overlapping interval is dropped whole rather than trimmed.

df1 : pd.DataFrame

Target intervals (e.g., training windows).

df2 : pd.DataFrame

Excluded intervals (e.g., centromeres, blacklisted regions).

pd.DataFrame

Filtered df1 containing only intervals that do NOT overlap with df2.

Examples
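
The filtering behaviour can be illustrated with plain tuples. This is a sketch under stated assumptions: `drop_overlapping` is a hypothetical stand-in, intervals are treated as half-open `[start, end)` as in BED files, and two intervals only overlap when they lie on the same chromosome. The documented function operates on DataFrames instead.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), half-open (assumed)


def overlaps(a: Interval, b: Interval) -> bool:
    """True when a and b share a chromosome and at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def drop_overlapping(targets: List[Interval], excluded: List[Interval]) -> List[Interval]:
    """Keep only the targets that overlap none of the excluded intervals."""
    return [t for t in targets if not any(overlaps(t, e) for e in excluded)]


windows = [("chr1", 0, 100), ("chr1", 100, 200), ("chr1", 200, 300), ("chr2", 0, 100)]
blocked = [("chr1", 150, 210)]  # toy stand-in for a centromere
kept = drop_overlapping(windows, blocked)
print(kept)
```

Both chr1 windows that touch the blocked region are removed entirely; nothing is trimmed, which matches the "simplified" behaviour described above.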

gunz_cm.utils.logger module

Centralized logging configuration for the gunz_cm package.

gunz_cm.utils.logger.setup_logging(verbose: bool) -> None

Configures logging for the CLI and application.

verbose : bool

If True, sets the console log level to DEBUG. Otherwise, sets it to INFO.

None

Examples
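
The verbose switch can be sketched with the standard `logging` module. This is an assumption-laden illustration rather than the package's code: `setup_logging_sketch`, the logger name `gunz_cm_sketch`, and the format string are all hypothetical, and the sketch returns the logger for inspection whereas the documented function returns None.

```python
import logging


def setup_logging_sketch(verbose: bool) -> logging.Logger:
    """Hypothetical stand-in: console handler at DEBUG or INFO."""
    console_level = logging.DEBUG if verbose else logging.INFO
    logger = logging.getLogger("gunz_cm_sketch")  # hypothetical logger name
    logger.setLevel(logging.DEBUG)  # let the handler decide what is shown
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    console = logging.StreamHandler()
    console.setLevel(console_level)
    console.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    logger.addHandler(console)
    return logger


logger = setup_logging_sketch(verbose=True)
logger.debug("shown only when verbose is True")
logger.info("shown in both modes")
```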

gunz_cm.utils.matrix module

Module.

gunz_cm.utils.path module

Helpers for locating the root directory of the Git repository and adding it to sys.path.

gunz_cm.utils.path.append_root_dir() -> None

Append the root directory to sys.path.

This function first retrieves the root directory of the Git repository using get_root_dir(). If the root directory is not already in sys.path, it appends it and prints a confirmation message.

None

None

Examples
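
The "append only if missing" behaviour can be sketched as follows. `append_to_sys_path` is a hypothetical stand-in that takes the directory as an argument so it can run anywhere; the documented function obtains the directory from get_root_dir() instead. The boolean return value and the demo path are also assumptions made for illustration.

```python
import sys


def append_to_sys_path(root: str) -> bool:
    """Hypothetical stand-in: append root to sys.path once, report if added."""
    if root in sys.path:
        return False
    sys.path.append(root)
    print(f"Appended {root} to sys.path")
    return True


demo_root = "/tmp/hypothetical_repo_root"  # illustrative path, not a real repo
first = append_to_sys_path(demo_root)   # adds the entry
second = append_to_sys_path(demo_root)  # no-op: already present
```

Calling the helper twice leaves a single entry in sys.path, so repeated calls from notebooks or scripts are harmless.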

gunz_cm.utils.path.get_root_dir() -> str

Get the root directory of the Git repository.

This function attempts to find the root directory of the Git repository by searching from the current directory upwards. If no Git repository is found, it raises a RuntimeError.

None

str

The root directory of the Git repository.

Examples
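
The upward search can be sketched with `pathlib`. This is one possible approach, not necessarily the one the package uses: `find_git_root` is a hypothetical stand-in that takes an explicit start directory and assumes a repository root is any directory containing a `.git` entry. The demonstration builds a throwaway directory tree so it does not depend on where it is run.

```python
import tempfile
from pathlib import Path


def find_git_root(start: Path) -> str:
    """Hypothetical stand-in: walk upwards until a .git entry is found."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / ".git").exists():
            return str(candidate)
    raise RuntimeError(f"No Git repository found at or above {start}")


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp) / "repo"
    nested = repo / "src" / "pkg"
    nested.mkdir(parents=True)
    (repo / ".git").mkdir()  # fake repository marker for the demo
    found = find_git_root(nested)
    expected = str(repo.resolve())
    print(found == expected)
```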

gunz_cm.utils.resources module

Utilities for fetching genomic resources (centromeres, blacklists).

gunz_cm.utils.resources.fetch_centromeres(genome: str, cache: bool = True, cache_dir: Path = PosixPath('~/.gunz_cm/resources')) -> DataFrame

Fetch centromere coordinates for a given genome assembly from UCSC.

genome : str

Genome assembly name (e.g., ‘hg19’, ‘hg38’, ‘mm10’).

cache : bool, optional

Whether to cache the downloaded data. Defaults to True.

cache_dir : pathlib.Path, optional

Directory to store cached files. Defaults to ~/.gunz_cm/resources.

pd.DataFrame

DataFrame with columns: [‘chrom’, ‘start’, ‘end’, ‘name’, ‘gieStain’].

Examples
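
The returned columns match the layout of UCSC's cytoBand table, in which centromeric bands carry the gieStain value 'acen'. Whether fetch_centromeres derives its result from that table is an assumption here, not something this page states. Under that assumption, the filtering step could look like the offline sketch below; `centromere_rows` is a hypothetical helper and the coordinates are made-up toy values, not real assembly data.

```python
import csv
import io
from typing import Dict, List, Union

FIELDS = ["chrom", "start", "end", "name", "gieStain"]

# Toy tab-separated rows in cytoBand layout (coordinates are invented).
TOY_CYTOBAND = "\n".join(
    "\t".join(row)
    for row in [
        ("chr1", "0", "1000", "p12", "gpos50"),
        ("chr1", "1000", "1200", "p11.1", "acen"),
        ("chr1", "1200", "1400", "q11", "acen"),
        ("chr1", "1400", "3000", "q12", "gvar"),
    ]
)


def centromere_rows(text: str) -> List[Dict[str, Union[str, int]]]:
    """Hypothetical helper: keep only bands stained 'acen'."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=FIELDS, delimiter="\t")
    rows = []
    for record in reader:
        if record["gieStain"] == "acen":
            rows.append({**record, "start": int(record["start"]), "end": int(record["end"])})
    return rows


centromeres = centromere_rows(TOY_CYTOBAND)
for row in centromeres:
    print(row)
```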

gunz_cm.utils.stream module

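Helpers for converting between integers and byte strings and for reading null-terminated strings from binary streams.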

gunz_cm.utils.stream.bstr2int(payload: bytes, order: str | None = 'big') -> int

Converts a byte string to an integer.

This function converts a byte string to an integer using a specified byte order. If the byte string is empty, a ValueError is raised.

payload : bytes

The byte string to convert.

order : t.Optional[str], optional

The byte order to use (default is ‘big’).

int

The integer representation of the byte string.

Examples
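
The conversion can be sketched with the built-in `int.from_bytes`. `bstr2int_sketch` is a hypothetical stand-in; it assumes unsigned interpretation and does not model what the documented function does when order is None.

```python
def bstr2int_sketch(payload: bytes, order: str = "big") -> int:
    """Hypothetical stand-in: unsigned integer from a non-empty byte string."""
    if not payload:
        raise ValueError("payload must not be empty")
    return int.from_bytes(payload, byteorder=order)


big = bstr2int_sketch(b"\x01\x00")               # most significant byte first
little = bstr2int_sketch(b"\x01\x00", "little")  # least significant byte first
print(big, little)
```

The same two bytes decode to different integers depending on the byte order, which is why the order should match the file format being read.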

gunz_cm.utils.stream.int2bstr(val: int, len_in_byte: int, order: Literal['big', 'little'] = 'big') -> bytes

Converts an integer to a byte string.

This function converts an integer to a byte string of a specified length and byte order. If the integer is too large to fit in the specified length, it will be truncated.

val : int

The integer to convert.

len_in_byte : int

The length of the resulting byte string in bytes.

order : t.Literal['big', 'little'], optional

The byte order to use (default is ‘big’).

bytes

The byte string representation of the integer.

Examples
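
Python's built-in `int.to_bytes` raises OverflowError when the value does not fit, so the truncation described above needs an extra step. The sketch below assumes that truncation means keeping the low-order bytes, which it achieves by masking before conversion; `int2bstr_sketch` is a hypothetical stand-in and the package may truncate differently.

```python
def int2bstr_sketch(val: int, len_in_byte: int, order: str = "big") -> bytes:
    """Hypothetical stand-in: fixed-width bytes, keeping the low-order bytes."""
    # Mask off everything above len_in_byte bytes (assumed truncation rule).
    mask = (1 << (8 * len_in_byte)) - 1
    return (val & mask).to_bytes(len_in_byte, byteorder=order)


big = int2bstr_sketch(258, 2)                 # 258 == 0x0102
little = int2bstr_sketch(258, 2, "little")    # same value, bytes reversed
truncated = int2bstr_sketch(0x1234, 1)        # only the low byte 0x34 survives
print(big, little, truncated)
```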

gunz_cm.utils.stream.read_str(reader: BinaryIO, encoding: str = 'utf-8') -> str

Reads a null-terminated string from a binary reader.

This function reads bytes from the reader until a null byte (b'\x00') is encountered. The bytes read are then decoded using the specified encoding.

reader : t.BinaryIO

The binary reader to read from.

encoding : str, optional

The encoding to use for decoding the read bytes (default is ‘utf-8’).

str

The decoded string.

Examples
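
Reading up to a null terminator can be sketched with an in-memory stream. `read_str_sketch` is a hypothetical stand-in; it assumes the terminator is consumed but not included in the result, and that reaching the end of the stream simply ends the string, which the documented function may handle differently.

```python
import io
from typing import BinaryIO


def read_str_sketch(reader: BinaryIO, encoding: str = "utf-8") -> str:
    """Hypothetical stand-in: read bytes up to (not including) a null byte."""
    collected = bytearray()
    while True:
        byte = reader.read(1)
        # Stop at the terminator, or at end of stream (assumed behaviour).
        if byte in (b"", b"\x00"):
            break
        collected += byte
    return collected.decode(encoding)


stream = io.BytesIO(b"chr1\x00chr2\x00")
first = read_str_sketch(stream)
second = read_str_sketch(stream)
print(first, second)
```

Because the terminator is consumed, consecutive calls on the same reader return consecutive strings.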

Module contents