Quickstart

This guide provides a rapid introduction to loading and manipulating genomic contact matrices using Gunz-CM.

1. Loading a Contact Matrix

The load_cm_data function is the universal entry point for all supported formats (.cool, .mcool, .hic, .csv).

from gunz_cm.loaders import load_cm_data

# Load a specific genomic region from a cooler file
cm = load_cm_data(
    fpath="data/sample.cool",
    chromosome1="chr1",
    resolution=10000,  # 10 kb bins
    start1=0,
    end1=1000000       # first 1 Mb of chr1
)

print(f"Loaded {cm.chromosome1} at {cm.resolution}bp resolution.")
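Contacts are stored in fixed-width bins, so a coordinate window such as start1=0, end1=1000000 at resolution=10000 corresponds to a range of bin indices. The sketch below shows that arithmetic with a hypothetical helper, `region_to_bins`, which is not part of the Gunz-CM API. It assumes half-open [start, end) intervals and bins anchored at position 0.

```python
def region_to_bins(start: int, end: int, resolution: int) -> range:
    """Map a half-open genomic interval [start, end) to the bin indices it overlaps.

    Hypothetical helper for illustration; assumes bin i covers
    [i * resolution, (i + 1) * resolution).
    """
    if resolution <= 0:
        raise ValueError("resolution must be positive")
    if end <= start:
        raise ValueError("end must be greater than start")
    first_bin = start // resolution
    # Ceiling division so a partially covered trailing bin is still included.
    last_bin_exclusive = -(-end // resolution)
    return range(first_bin, last_bin_exclusive)


bins = region_to_bins(0, 1_000_000, 10_000)
print(len(bins), bins.start, bins.stop - 1)  # → 100 0 99
```

With these settings, the first 1 Mb of chr1 yields a 100 x 100 bin window.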

2. Accessing the Data

The resulting ContactMatrix object allows you to access the underlying data in multiple formats:

# As a Pandas DataFrame (COO format: bin1_id, bin2_id, count)
df = cm.to_dataframe()
print(df.head())

# As a SciPy Sparse Matrix (CSR)
sparse_matrix = cm.to_sparse()
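The sketch below shows how a COO table relates to a CSR matrix. It builds a `scipy.sparse` CSR matrix from a small pandas DataFrame, using the column names from the comment above (bin1_id, bin2_id, count). The `coo_df_to_csr` helper and the toy data are hypothetical, and this is not how Gunz-CM implements to_sparse(). The helper also assumes the table stores only the upper triangle, as cooler files do for intra-chromosomal data, so it mirrors that triangle to produce a full symmetric matrix. Pass symmetrize=False if your table already contains both triangles.

```python
import pandas as pd
import scipy.sparse as sp


def coo_df_to_csr(df: pd.DataFrame, n_bins: int, symmetrize: bool = True) -> sp.csr_matrix:
    """Build a CSR contact matrix from a COO-style DataFrame.

    Hypothetical helper: assumes columns bin1_id, bin2_id, count and, when
    symmetrize=True, that only the upper triangle (bin1_id <= bin2_id) is stored.
    """
    upper = sp.coo_matrix(
        (df["count"].to_numpy(), (df["bin1_id"].to_numpy(), df["bin2_id"].to_numpy())),
        shape=(n_bins, n_bins),
    ).tocsr()
    if not symmetrize:
        return upper
    # Mirror the upper triangle, then subtract the diagonal once so it is not double-counted.
    diag = sp.diags(upper.diagonal())
    return (upper + upper.T - diag).tocsr()


# Toy upper-triangle contacts for a 3-bin region (illustrative values only).
df = pd.DataFrame({"bin1_id": [0, 0, 1, 2], "bin2_id": [0, 1, 2, 2], "count": [5, 3, 2, 7]})
mat = coo_df_to_csr(df, n_bins=3)
print(mat.toarray())
```

CSR is a good fit for contact data because rows can be sliced and summed quickly. Converting the stored triangle into both halves doubles the number of stored off-diagonal entries.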

3. High-Performance Filtering

Gunz-CM provides optimized filters for cleaning your data before analysis:

from gunz_cm.preprocs import filter_empty_rowcols

# Remove bins with zero total contacts (typically unmappable regions)
cm_filtered = filter_empty_rowcols(cm)

print(f"Original size: {cm.data.shape}")
print(f"Filtered size: {cm_filtered.data.shape}")
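Conceptually, removing empty bins means dropping every row and column whose contact total is zero. The sketch below shows one way to do this on a SciPy sparse matrix. `drop_empty_bins` is a hypothetical helper, not the Gunz-CM implementation of filter_empty_rowcols. It assumes a square matrix of non-negative counts, so a single mask applies to both rows and columns. It also returns the kept indices, so that filtered results can be mapped back to their genomic bins.

```python
import numpy as np
import scipy.sparse as sp


def drop_empty_bins(mat: sp.spmatrix) -> tuple[sp.csr_matrix, np.ndarray]:
    """Remove bins whose row and column totals are both zero.

    Hypothetical helper; assumes a square matrix of non-negative counts.
    Returns the filtered matrix and the indices of the bins that were kept.
    """
    csr = sp.csr_matrix(mat)
    row_totals = np.asarray(csr.sum(axis=1)).ravel()
    col_totals = np.asarray(csr.sum(axis=0)).ravel()
    # Keep a bin if it has any contact, either as a row or as a column.
    keep = np.flatnonzero((row_totals + col_totals) > 0)
    return csr[keep][:, keep], keep


# Toy 4-bin matrix in which bin 2 has no contacts (illustrative values only).
dense = sp.csr_matrix(np.array([
    [4, 1, 0, 2],
    [1, 3, 0, 0],
    [0, 0, 0, 0],
    [2, 0, 0, 5],
]))
filtered, kept = drop_empty_bins(dense)
print(filtered.shape, kept.tolist())  # → (3, 3) [0, 1, 3]
```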

4. Metadata and Genomic Info

Query file-level information without loading the full matrix into memory:

from gunz_cm.loaders import get_chrom_infos, get_resolutions

# Check available resolutions in an .mcool file
res_list = get_resolutions("data/multires.mcool")
print(f"Available resolutions: {res_list}")

# Get chromosome sizes
chroms = get_chrom_infos("data/sample.cool")
print(f"Chromosome 1 size: {chroms['chr1']} bp")
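Chromosome size and resolution together determine matrix dimensions. A chromosome of length L at resolution r spans ceil(L / r) bins, and the last bin may be partial. The sketch below tabulates this for several resolutions. The `bins_per_resolution` helper and the resolution list are illustrative. The chr1 length is the GRCh38 assembly value (248,956,422 bp) and is not read from your file.

```python
def bins_per_resolution(chrom_size: int, resolutions: list[int]) -> dict[int, int]:
    """Return the number of bins a chromosome spans at each resolution.

    Hypothetical helper; integer ceiling division counts a partial last bin.
    """
    return {res: -(-chrom_size // res) for res in resolutions}


# GRCh38 chr1 length; in practice, take it from get_chrom_infos() as shown above.
chr1_size = 248_956_422
for res, n_bins in bins_per_resolution(chr1_size, [1_000_000, 100_000, 10_000]).items():
    # A dense float64 matrix needs n_bins**2 * 8 bytes, which is why sparse formats matter.
    dense_gb = n_bins ** 2 * 8 / 1e9
    print(f"{res:>9,} bp: {n_bins:>6,} bins, ~{dense_gb:,.2f} GB if stored dense")
```

At 10 kb, chr1 alone spans about 25,000 bins, so a dense intra-chromosomal matrix would take roughly 5 GB. This is one reason to load a region or work with the sparse representation.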