forgeCol / forgeMat Developer Guide¶
Overview¶
forgeCol<T> and forgeMat<T> are forge's portable vector and matrix types. They support:
- CPU scalar loops (all platforms)
- OpenACC GPU offloading (NVIDIA, via nvc++)
- Apple Metal GPU dispatch (float only, Apple Silicon)
Key Conventions¶
forgeMat set_size¶
IMPORTANT: forgeMat::set_size(nCols, nRows) - first argument is columns, second is rows. This is opposite to most matrix APIs. The constructor forgeMat(nRows, nCols) takes rows first (normal order), but internally calls set_size(nCols, nRows).
Column-Major Layout¶
Data is stored column-major: element (r, c) is at index r + n_rows * c. This matches Armadillo/LAPACK/Fortran conventions.
Type System¶
| forgeCol type | Scalar type | Metal dispatch? |
|---|---|---|
forgeCol<float> |
float |
Yes (N >= 4096) |
forgeCol<double> |
double |
No (CPU only) |
forgeCol<forgeComplex<float>> |
forgeComplex<float> |
Yes (N >= 4096) |
forgeCol<forgeComplex<double>> |
forgeComplex<double> |
No (CPU only) |
View Mode¶
A forgeCol view wraps external memory without owning it. Views enable zero-copy column extraction from forgeMat and subvec slicing.
Creating Views¶
// Factory method
forgeCol<float> v = forgeCol<float>::view(ptr, length);
// From forgeMat column
forgeCol<float> col = mat.col(2); // view into matrix column 2
col.is_view(); // true
// Subvec
forgeCol<float> sv = vec.subvec(10, 19); // view of elements 10..19
View Semantics¶
- Write-through: Modifications via a view write to the underlying memory.
mat.col(1) %= scale; // modifies matrix column 1 in-place - Copy produces owning:
forgeCol<float> copy(view)deep-copies;copy.is_view() == false. - Destructor skips free: Views don't free memory on destruction.
- Lifetime: The view must not outlive the underlying memory.
- set_size breaks view: Calling
set_size()on a view breaks the view (setsisView_ = false) and allocates fresh memory. The original memory is NOT freed.
col_copy vs col¶
mat.col(ii)returns a non-owning view (fast, no allocation)mat.col_copy(ii)returns a deep copy (safe for long-lived use)mat.set_col(ii, src)writes a forgeCol into column ii (memcpy)
Arma <-> forgeCol Conversion¶
// arma -> forgeCol (copies data)
arma::Col<std::complex<float>> arma_vec = ...;
forgeCol<forgeComplex<float>> pg_vec(arma_vec);
// forgeCol -> arma (copies data)
arma::Col<std::complex<float>> back = pg_vec.getArma();
Both conversions are memcpy. forgeComplex<T> and std::complex<T> have identical memory layout.
Pattern for Gnufft/Robject Boundaries¶
// Forward: forgeCol -> arma -> Gnufft -> arma -> forgeCol
Col<CxT1> result = (*G_obj) * weighted_pg.getArma();
outData_pg.set_col(ii, forgeCol<forgeComplex<T1>>(result));
// Adjoint: same pattern
Col<CxT1> adjResult = (*G_obj) / slice_pg.getArma();
forgeCol<forgeComplex<T1>> adj_pg(adjResult);
Available Operators¶
Element-wise (forgeCol)¶
| Operator | Description | Metal dispatch |
|---|---|---|
A + B |
addition | cvec_add / vec_add |
A - B |
subtraction | cvec_sub / vec_sub |
A % B |
element-wise multiply | cvec_mul / vec_mul |
A / B |
element-wise divide | cvec_div / vec_div |
A + s |
scalar add | vec_add_scalar |
A % s |
scalar multiply | cvec_mul_scalar / vec_mul_scalar |
W % X |
real * complex | rvec_cmul |
A += B |
in-place add | same kernels |
A %= B |
in-place multiply | same kernels |
Free Functions¶
| Function | Description | Metal dispatch |
|---|---|---|
sum(A) |
sum of elements | vec_sum |
cdot(A, B) |
complex dot product | cvec_cdot |
norm(A) |
L2 norm | cvec_norm2sq + sqrt |
conj(A) |
element-wise conjugate | CPU only |
abs(A) |
element-wise magnitude (complex or real) | CPU only |
real(A) |
extract real parts | CPU only |
imag(A) |
extract imaginary parts | CPU only |
accu_schur(A, B) |
element-wise dot product (no conjugation) | CPU only |
norm_grad(g, yi, W) |
scale-invariant gradient magnitude for PCG | CPU only |
to_complex(A) |
lift real vector to complex (zero imaginary) | CPU only |
log1p(A) |
element-wise log(1 + x) | CPU only |
sqrt(A) |
element-wise square root | CPU only |
square(A) |
element-wise square | CPU only |
s / A |
scalar divided by each element | CPU only |
accu(A) |
sum of elements (alias for sum) |
CPU only |
forgeMat Free Functions¶
| Function | Description |
|---|---|
sum(M, dim) |
row/column sum (dim=0: col-wise, dim=1: row-wise) |
vectorise(M) |
flatten to forgeCol |
conj(M) |
element-wise conjugate |
matmul(A, B) |
matrix-matrix or matrix-vector multiply (via Armadillo) |
trans(A) |
conjugate transpose (complex) or transpose (real) |
strans(A) |
non-conjugate (simple) transpose |
reshape(A, rows, cols) |
reshape to new dimensions (column-major preserved) |
cols(A, first, last) |
deep-copy column subrange [first, last] (inclusive) |
Aligned Allocation (METAL_COMPUTE)¶
When METAL_COMPUTE is defined, set_size() uses std::aligned_alloc(16384, ...) for page-aligned allocation (Apple Silicon page size = 16384). This enables Phase 3 newBufferWithBytesNoCopy for true zero-copy GPU buffers.
The destructor uses std::free() instead of delete[] accordingly.
has_nan()¶
Detects NaN values using bit-level inspection (works under -ffast-math):
forgeCol<float> v(100);
v.ones();
// ... some computation ...
if (v.has_nan()) { /* handle error */ }
Works for float, double, forgeComplex<float>, and forgeComplex<double>.