# Core Concepts

Understanding the mental model of **Gunz-CM** is key to using it effectively for large-scale genomic analysis.

## 1. The Unified Matrix Facade

Genomic contact data comes in many formats: sparse COO text (CSV), HDF5-based binary (Cooler), and indexed binary (`.hic`).

Gunz-CM abstracts these details away. Whether you load a Cooler file at 10 kb resolution or a raw CSV, you interact with the same **`ContactMatrix`** object. This object provides a consistent API for:
*   Genomic metadata (chromosomes, resolution).
*   Data access (as DataFrames, CSR matrices, or NumPy arrays).
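
To make the facade idea concrete, here is a minimal sketch of one object exposing the same contacts in three views. `ToyContactMatrix` and its method names are hypothetical stand-ins written for illustration; they are not the Gunz-CM `ContactMatrix` API, and the sketch assumes the COO records hold only the upper triangle of a symmetric matrix.

```python
from dataclasses import dataclass

import numpy as np
import pandas as pd
from scipy import sparse


@dataclass
class ToyContactMatrix:
    """Hypothetical facade for illustration; not the Gunz-CM class."""

    coo: pd.DataFrame   # columns: row, col, count (upper triangle, bin indices)
    chromosome: str
    resolution: int     # bin size in base pairs
    n_bins: int

    def to_dataframe(self) -> pd.DataFrame:
        # COO view: one record per non-zero contact.
        return self.coo.copy()

    def to_csr(self) -> sparse.csr_matrix:
        upper = sparse.coo_matrix(
            (
                self.coo["count"].to_numpy(),
                (self.coo["row"].to_numpy(), self.coo["col"].to_numpy()),
            ),
            shape=(self.n_bins, self.n_bins),
        )
        # Mirror the upper triangle without double-counting the diagonal.
        full = upper + sparse.triu(upper, k=1).T
        return full.tocsr()

    def to_dense(self) -> np.ndarray:
        return self.to_csr().toarray()


records = pd.DataFrame(
    {"row": [0, 0, 1, 2], "col": [0, 2, 1, 3], "count": [5.0, 2.0, 7.0, 1.0]}
)
cm = ToyContactMatrix(records, chromosome="chr1", resolution=10_000, n_bins=4)
dense = cm.to_dense()   # symmetric 4 x 4 NumPy array
```

The caller never needs to know which file format produced `records`; only the views matter.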

## 2. Lazy Loading vs. In-Memory

The library is designed for memory efficiency:
*   **Lazy Loaders:** When you call `load_cm_data`, the library typically reads only the genomic range you request rather than the whole file.
*   **Memory-Map (Memmap):** For massive datasets, Gunz-CM supports memory-mapping, allowing you to treat a file on disk as if it were an in-memory NumPy array without consuming all your RAM.
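
The memory-mapping idea can be illustrated with plain `numpy.memmap`. This is a sketch of the general technique, not of Gunz-CM's own memmap interface; the file name, matrix size, and dtype are assumptions chosen for the example.

```python
import os
import tempfile

import numpy as np

n_bins = 1_000
path = os.path.join(tempfile.mkdtemp(), "contacts.dat")

# Create the on-disk matrix once (mode="w+" creates or overwrites the file).
writer = np.memmap(path, dtype=np.float32, mode="w+", shape=(n_bins, n_bins))
writer[100:200, 100:200] = 1.0
writer.flush()
del writer

# Reopen read-only. Nothing is loaded until a slice is actually accessed.
mm = np.memmap(path, dtype=np.float32, mode="r", shape=(n_bins, n_bins))

# Copy one window into RAM; the rest of the file stays on disk.
window = np.array(mm[100:200, 100:200])
window_sum = float(window.sum())
```

Slicing `mm` behaves like slicing an ordinary array, so downstream code does not need a separate code path for on-disk data.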

## 3. Preprocessing Pipelines

Gunz-CM treats preprocessing as a series of functional transformations. The typical workflow is:
1.  **Load:** Import raw data into a `ContactMatrix`.
2.  **Filter:** Use optimized functions like `filter_empty_rowcols` to remove non-informative regions (e.g., centromeres or unalignable bins).
3.  **Normalize:** Apply balancing weights (KR, ICE, or VC) to correct for systematic technical biases such as uneven coverage.
4.  **Analyze:** Pass the cleaned matrix to downstream metrics (Spearman R, IPR4, etc.).
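
The filter and normalize steps above can be sketched as plain functions on a dense NumPy array. `drop_empty_bins` and `vc_normalize` are hypothetical helpers written for this example; they are not Gunz-CM functions, and the normalization shown is a simplified vanilla-coverage step without the global rescaling that some tools apply afterwards.

```python
import numpy as np


def drop_empty_bins(matrix: np.ndarray):
    """Remove bins whose row (and, by symmetry, column) sums to zero."""
    keep = matrix.sum(axis=0) > 0
    return matrix[np.ix_(keep, keep)], keep


def vc_normalize(matrix: np.ndarray) -> np.ndarray:
    """Divide each entry by the coverage of its row bin and its column bin."""
    coverage = matrix.sum(axis=1)
    return matrix / np.outer(coverage, coverage)


# Load: a toy symmetric matrix with one empty bin (index 2).
raw = np.array([
    [4.0, 2.0, 0.0, 2.0],
    [2.0, 6.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [2.0, 0.0, 0.0, 2.0],
])

# Filter first, so the normalization never divides by zero coverage.
filtered, kept = drop_empty_bins(raw)
normalized = vc_normalize(filtered)
```

Because each step takes a matrix and returns a new one, the steps compose in order and the `kept` mask records which original bins survived filtering.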

## 4. Matrix Backends

You can choose your underlying data representation based on your task:
*   **Pandas (COO):** Best for sparse manipulation and filtering.
*   **SciPy (CSR/CSC):** Best for heavy linear algebra and loss calculations.
*   **PyTorch/Tensor:** Best for 3D reconstruction and gradient-based optimization.
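
As a rough illustration of matching the representation to the task, the sketch below filters contacts in a pandas COO table and then switches to SciPy CSR for arithmetic. It uses only pandas and SciPy directly, not Gunz-CM's conversion helpers; the column names and the one-bin distance cutoff are assumptions, and the tensor backend is left out to keep the example dependency-light.

```python
import numpy as np
import pandas as pd
from scipy import sparse

coo = pd.DataFrame({
    "row": [0, 0, 1, 1, 2],
    "col": [0, 3, 1, 2, 3],
    "count": [9.0, 1.0, 8.0, 3.0, 2.0],
})

# Pandas: record-level filtering, here keeping contacts at most one bin apart.
near = coo[(coo["col"] - coo["row"]) <= 1]

# SciPy: convert to CSR for fast arithmetic such as matrix-vector products.
csr = sparse.csr_matrix(
    (
        near["count"].to_numpy(),
        (near["row"].to_numpy(), near["col"].to_numpy()),
    ),
    shape=(4, 4),
)
row_sums = csr @ np.ones(4)
```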