Skip to content

forgeCol / forgeMat Developer Guide

Overview

forgeCol<T> and forgeMat<T> are forge's portable vector and matrix types. They support: - CPU scalar loops (all platforms) - OpenACC GPU offloading (NVIDIA, via nvc++) - Apple Metal GPU dispatch (float only, Apple Silicon)

Key Conventions

forgeMat set_size

IMPORTANT: forgeMat::set_size(nCols, nRows) - first argument is columns, second is rows. This is opposite to most matrix APIs. The constructor forgeMat(nRows, nCols) takes rows first (normal order), but internally calls set_size(nCols, nRows).

Column-Major Layout

Data is stored column-major: element (r, c) is at index r + n_rows * c. This matches Armadillo/LAPACK/Fortran conventions.

Type System

forgeCol type Scalar type Metal dispatch?
forgeCol<float> float Yes (N >= 4096)
forgeCol<double> double No (CPU only)
forgeCol<forgeComplex<float>> forgeComplex<float> Yes (N >= 4096)
forgeCol<forgeComplex<double>> forgeComplex<double> No (CPU only)

View Mode

A forgeCol view wraps external memory without owning it. Views enable zero-copy column extraction from forgeMat and subvec slicing.

Creating Views

// Factory method
forgeCol<float> v = forgeCol<float>::view(ptr, length);

// From forgeMat column
forgeCol<float> col = mat.col(2);      // view into matrix column 2
col.is_view();                       // true

// Subvec
forgeCol<float> sv = vec.subvec(10, 19); // view of elements 10..19

View Semantics

  • Write-through: Modifications via a view write to the underlying memory.
    mat.col(1) %= scale;  // modifies matrix column 1 in-place
    
  • Copy produces owning: forgeCol<float> copy(view) deep-copies; copy.is_view() == false.
  • Destructor skips free: Views don't free memory on destruction.
  • Lifetime: The view must not outlive the underlying memory.
  • set_size breaks view: Calling set_size() on a view breaks the view (sets isView_ = false) and allocates fresh memory. The original memory is NOT freed.

col_copy vs col

  • mat.col(ii) returns a non-owning view (fast, no allocation)
  • mat.col_copy(ii) returns a deep copy (safe for long-lived use)
  • mat.set_col(ii, src) writes a forgeCol into column ii (memcpy)

Arma <-> forgeCol Conversion

// arma -> forgeCol (copies data)
arma::Col<std::complex<float>> arma_vec = ...;
forgeCol<forgeComplex<float>> pg_vec(arma_vec);

// forgeCol -> arma (copies data)
arma::Col<std::complex<float>> back = pg_vec.getArma();

Both conversions are memcpy. forgeComplex<T> and std::complex<T> have identical memory layout.

Pattern for Gnufft/Robject Boundaries

// Forward: forgeCol -> arma -> Gnufft -> arma -> forgeCol
Col<CxT1> result = (*G_obj) * weighted_pg.getArma();
outData_pg.set_col(ii, forgeCol<forgeComplex<T1>>(result));

// Adjoint: same pattern
Col<CxT1> adjResult = (*G_obj) / slice_pg.getArma();
forgeCol<forgeComplex<T1>> adj_pg(adjResult);

Available Operators

Element-wise (forgeCol)

Operator Description Metal dispatch
A + B addition cvec_add / vec_add
A - B subtraction cvec_sub / vec_sub
A % B element-wise multiply cvec_mul / vec_mul
A / B element-wise divide cvec_div / vec_div
A + s scalar add vec_add_scalar
A % s scalar multiply cvec_mul_scalar / vec_mul_scalar
W % X real * complex rvec_cmul
A += B in-place add same kernels
A %= B in-place multiply same kernels

Free Functions

Function Description Metal dispatch
sum(A) sum of elements vec_sum
cdot(A, B) complex dot product cvec_cdot
norm(A) L2 norm cvec_norm2sq + sqrt
conj(A) element-wise conjugate CPU only
abs(A) element-wise magnitude CPU only
real(A) extract real parts CPU only
imag(A) extract imaginary parts CPU only

forgeMat Free Functions

Function Description
sum(M, dim) row/column sum (dim=0: col-wise, dim=1: row-wise)
vectorise(M) flatten to forgeCol
conj(M) element-wise conjugate

Aligned Allocation (METAL_COMPUTE)

When METAL_COMPUTE is defined, set_size() uses std::aligned_alloc(16384, ...) for page-aligned allocation (Apple Silicon page size = 16384). This enables Phase 3 newBufferWithBytesNoCopy for true zero-copy GPU buffers.

The destructor uses std::free() instead of delete[] accordingly.

has_nan()

Detects NaN values using bit-level inspection (works under -ffast-math):

forgeCol<float> v(100);
v.ones();
// ... some computation ...
if (v.has_nan()) { /* handle error */ }

Works for float, double, forgeComplex<float>, and forgeComplex<double>.