forgeCol / forgeMat Developer Guide¶

Overview¶

forgeCol<T> and forgeMat<T> are forge's portable vector and matrix types. They support: - CPU scalar loops (all platforms) - OpenACC GPU offloading (NVIDIA, via nvc++) - Apple Metal GPU dispatch (float only, Apple Silicon)

Key Conventions¶

forgeMat set_size¶

IMPORTANT: forgeMat::set_size(nCols, nRows) - first argument is columns, second is rows. This is opposite to most matrix APIs. The constructor forgeMat(nRows, nCols) takes rows first (normal order), but internally calls set_size(nCols, nRows).

Column-Major Layout¶

Data is stored column-major: element (r, c) is at index r + n_rows * c. This matches Armadillo/LAPACK/Fortran conventions.

Type System¶

forgeCol type	Scalar type	Metal dispatch?
`forgeCol<float>`	`float`	Yes (N >= 4096)
`forgeCol<double>`	`double`	No (CPU only)
`forgeCol<forgeComplex<float>>`	`forgeComplex<float>`	Yes (N >= 4096)
`forgeCol<forgeComplex<double>>`	`forgeComplex<double>`	No (CPU only)

View Mode¶

A forgeCol view wraps external memory without owning it. Views enable zero-copy column extraction from forgeMat and subvec slicing.

Creating Views¶

// Factory method
forgeCol<float> v = forgeCol<float>::view(ptr, length);

// From forgeMat column
forgeCol<float> col = mat.col(2);      // view into matrix column 2
col.is_view();                       // true

// Subvec
forgeCol<float> sv = vec.subvec(10, 19); // view of elements 10..19

View Semantics¶

Write-through: Modifications via a view write to the underlying memory.
```
mat.col(1) %= scale;  // modifies matrix column 1 in-place
```
Copy produces owning: forgeCol<float> copy(view) deep-copies; copy.is_view() == false.
Destructor skips free: Views don't free memory on destruction.
Lifetime: The view must not outlive the underlying memory.
set_size breaks view: Calling set_size() on a view breaks the view (sets isView_ = false) and allocates fresh memory. The original memory is NOT freed.

col_copy vs col¶

mat.col(ii) returns a non-owning view (fast, no allocation)
mat.col_copy(ii) returns a deep copy (safe for long-lived use)
mat.set_col(ii, src) writes a forgeCol into column ii (memcpy)

Arma <-> forgeCol Conversion¶

// arma -> forgeCol (copies data)
arma::Col<std::complex<float>> arma_vec = ...;
forgeCol<forgeComplex<float>> pg_vec(arma_vec);

// forgeCol -> arma (copies data)
arma::Col<std::complex<float>> back = pg_vec.getArma();

Both conversions are memcpy. forgeComplex<T> and std::complex<T> have identical memory layout.

Pattern for Gnufft/Robject Boundaries¶

// Forward: forgeCol -> arma -> Gnufft -> arma -> forgeCol
Col<CxT1> result = (*G_obj) * weighted_pg.getArma();
outData_pg.set_col(ii, forgeCol<forgeComplex<T1>>(result));

// Adjoint: same pattern
Col<CxT1> adjResult = (*G_obj) / slice_pg.getArma();
forgeCol<forgeComplex<T1>> adj_pg(adjResult);

Available Operators¶

Element-wise (forgeCol)¶

Operator	Description	Metal dispatch
`A + B`	addition	`cvec_add` / `vec_add`
`A - B`	subtraction	`cvec_sub` / `vec_sub`
`A % B`	element-wise multiply	`cvec_mul` / `vec_mul`
`A / B`	element-wise divide	`cvec_div` / `vec_div`
`A + s`	scalar add	`vec_add_scalar`
`A % s`	scalar multiply	`cvec_mul_scalar` / `vec_mul_scalar`
`W % X`	real * complex	`rvec_cmul`
`A += B`	in-place add	same kernels
`A %= B`	in-place multiply	same kernels

Free Functions¶

Function	Description	Metal dispatch
`sum(A)`	sum of elements	`vec_sum`
`cdot(A, B)`	complex dot product	`cvec_cdot`
`norm(A)`	L2 norm	`cvec_norm2sq` + sqrt
`conj(A)`	element-wise conjugate	CPU only
`abs(A)`	element-wise magnitude (complex or real)	CPU only
`real(A)`	extract real parts	CPU only
`imag(A)`	extract imaginary parts	CPU only
`accu_schur(A, B)`	element-wise dot product (no conjugation)	CPU only
`norm_grad(g, yi, W)`	scale-invariant gradient magnitude for PCG	CPU only
`to_complex(A)`	lift real vector to complex (zero imaginary)	CPU only
`log1p(A)`	element-wise log(1 + x)	CPU only
`sqrt(A)`	element-wise square root	CPU only
`square(A)`	element-wise square	CPU only
`s / A`	scalar divided by each element	CPU only
`accu(A)`	sum of elements (alias for `sum`)	CPU only

forgeMat Free Functions¶

Function	Description
`sum(M, dim)`	row/column sum (dim=0: col-wise, dim=1: row-wise)
`vectorise(M)`	flatten to forgeCol
`conj(M)`	element-wise conjugate
`matmul(A, B)`	matrix-matrix or matrix-vector multiply (via Armadillo)
`trans(A)`	conjugate transpose (complex) or transpose (real)
`strans(A)`	non-conjugate (simple) transpose
`reshape(A, rows, cols)`	reshape to new dimensions (column-major preserved)
`cols(A, first, last)`	deep-copy column subrange [first, last] (inclusive)

Aligned Allocation (METAL_COMPUTE)¶

When METAL_COMPUTE is defined, set_size() uses std::aligned_alloc(16384, ...) for page-aligned allocation (Apple Silicon page size = 16384). This enables Phase 3 newBufferWithBytesNoCopy for true zero-copy GPU buffers.

The destructor uses std::free() instead of delete[] accordingly.

has_nan()¶

Detects NaN values using bit-level inspection (works under -ffast-math):

forgeCol<float> v(100);
v.ones();
// ... some computation ...
if (v.has_nan()) { /* handle error */ }

Works for float, double, forgeComplex<float>, and forgeComplex<double>.