Quickstart
This guide provides a rapid introduction to loading and manipulating genomic contact matrices using Gunz-CM.
1. Loading a Contact Matrix
The load_cm_data function is the universal entry point for all supported formats (.cool, .mcool, .hic, .csv).
from gunz_cm.loaders import load_cm_data
# Load a specific genomic region from a cooler file
cm = load_cm_data(
    fpath="data/sample.cool",
    chromosome1="chr1",
    resolution=10000,  # 10 kb bins
    start1=0,
    end1=1000000,  # first 1 Mb of chr1
)
print(f"Loaded {cm.chromosome1} at {cm.resolution}bp resolution.")
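To see how much of the matrix a region request covers, it helps to translate the genomic interval into bin indices. The sketch below is illustrative only and is not part of Gunz-CM: `region_to_bins` is a hypothetical helper, and it assumes 0-based, half-open coordinates `[start, end)`, which may differ from how `load_cm_data` interprets `start1` and `end1`.

```python
def region_to_bins(start, end, resolution):
    """Map a half-open genomic interval [start, end) to a half-open bin range.

    Hypothetical helper for illustration; not a Gunz-CM function.
    """
    if resolution <= 0:
        raise ValueError("resolution must be positive")
    if not 0 <= start < end:
        raise ValueError("expected 0 <= start < end")
    first_bin = start // resolution
    # Ceiling division, so a partially covered final bin is still counted.
    last_bin = -(-end // resolution)
    return first_bin, last_bin


# Same region as the loading example: the first 1 Mb at 10 kb resolution.
first, last = region_to_bins(start=0, end=1_000_000, resolution=10_000)
n_bins = last - first
print(f"Bins {first} to {last - 1}: {n_bins} bins per axis")
```

Under these assumptions the region in the loading example spans 100 bins, so the corresponding submatrix is at most 100 x 100.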
2. Accessing the Data
The resulting ContactMatrix object allows you to access the underlying data in multiple formats:
# As a Pandas DataFrame (COO format: bin1_id, bin2_id, count)
df = cm.to_dataframe()
print(df.head())
# As a SciPy Sparse Matrix (CSR)
sparse_matrix = cm.to_sparse()
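The COO layout lists one record per non-zero pixel. As a rough illustration of what those records encode, the following standard-library sketch expands `(bin1_id, bin2_id, count)` triples into a dense symmetric matrix. It assumes only the upper triangle is stored, as is common for contact matrices; whether `to_dataframe()` returns upper-triangle or full records is not stated here. The helper name and the sample records are made up for the example.

```python
def coo_to_dense_symmetric(records, n_bins):
    """Expand (bin1_id, bin2_id, count) triples into a dense symmetric matrix.

    Illustrative helper; assumes each pixel appears once (upper triangle).
    """
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for bin1_id, bin2_id, count in records:
        matrix[bin1_id][bin2_id] = count
        matrix[bin2_id][bin1_id] = count  # mirror across the diagonal
    return matrix


# Made-up records for a 3-bin region.
records = [(0, 0, 12), (0, 1, 5), (1, 2, 3), (2, 2, 7)]
dense = coo_to_dense_symmetric(records, n_bins=3)
for row in dense:
    print(row)
```

A dense expansion like this is only practical for small regions; for whole chromosomes the sparse representation is the sensible choice.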
3. High-Performance Filtering
Gunz-CM provides optimized filters for cleaning your data before analysis:
from gunz_cm.preprocs import filter_empty_rowcols
# Remove bins with zero total contacts (typically unmappable regions)
cm_filtered = filter_empty_rowcols(cm)
print(f"Original size: {cm.data.shape}")
print(f"Filtered size: {cm_filtered.data.shape}")
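Conceptually, this kind of filter drops every bin whose row and column contain no contacts, then keeps the remaining submatrix. The plain-Python sketch below shows that idea on a small dense matrix. It is not the Gunz-CM implementation (which presumably operates on sparse data), and `drop_empty_rowcols` is a hypothetical name.

```python
def drop_empty_rowcols(matrix):
    """Remove bins whose row and column are entirely zero.

    Returns the filtered matrix and the original indices that were kept.
    Illustrative only; not how filter_empty_rowcols is necessarily implemented.
    """
    n = len(matrix)
    keep = [
        i for i in range(n)
        if any(matrix[i][j] != 0 or matrix[j][i] != 0 for j in range(n))
    ]
    filtered = [[matrix[i][j] for j in keep] for i in keep]
    return filtered, keep


# Made-up 4-bin matrix in which bin 2 has no contacts at all.
matrix = [
    [4, 1, 0, 2],
    [1, 3, 0, 0],
    [0, 0, 0, 0],
    [2, 0, 0, 5],
]
filtered, kept = drop_empty_rowcols(matrix)
print(f"Kept bins: {kept}")
print(f"Size: {len(matrix)} -> {len(filtered)}")
```

Returning the kept indices alongside the matrix makes it possible to map filtered rows back to their original genomic bins.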
4. Metadata and Genomic Info
Query file-level information without loading the full matrix into memory:
from gunz_cm.loaders import get_chrom_infos, get_resolutions
# Check available resolutions in an .mcool file
res_list = get_resolutions("data/multires.mcool")
print(f"Available resolutions: {res_list}")
# Get chromosome sizes
chroms = get_chrom_infos("data/sample.cool")
print(f"Chromosome 1 size: {chroms['chr1']} bp")
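One practical use of this metadata is to estimate how large a matrix would be before loading it. The sketch below computes the bin count and the memory a dense `float64` matrix would need at several resolutions. The helper is hypothetical, the resolution list is invented for illustration, and the chromosome size is the GRCh38 length of chr1 rather than a value read from a file.

```python
def dense_matrix_bytes(chrom_size, resolution, bytes_per_value=8):
    """Return (n_bins, bytes) for a dense intra-chromosomal matrix.

    Illustrative helper; bytes_per_value=8 corresponds to float64.
    """
    n_bins = -(-chrom_size // resolution)  # ceiling division
    return n_bins, n_bins * n_bins * bytes_per_value


chrom_size = 248_956_422  # chr1 length in GRCh38, used as an example value
for resolution in (10_000, 100_000, 1_000_000):  # assumed resolution list
    n_bins, n_bytes = dense_matrix_bytes(chrom_size, resolution)
    print(f"{resolution:>9} bp: {n_bins:>6} bins, {n_bytes / 1e9:.3f} GB dense")
```

Estimates like these indicate when to request a subregion or stay with the sparse representation instead of materializing a full dense matrix.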