Documentation Quality Pass Implementation Plan

For agentic workers: REQUIRED: Use superpowers:subagent-driven-development (if subagents available) or superpowers:executing-plans to implement this plan. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Improve all public and private doc comments across forge to include LaTeX math, literature references with DOIs, usage examples, template constraints, and cross-references.

Architecture: Edit doc comments in C++ headers and implementation files in-place. No code logic changes. Validate with doxide build && mkdocs build after each task. Tier 1 (operators, penalties, solvers, and gridding) pauses for user review; all other tiers are autonomous.

Tech Stack: C++ doc comments (///), Doxide (Tree-sitter), MkDocs Material, MathJax (pymdownx.arithmatex)

Spec: docs/superpowers/specs/2026-03-17-doc-quality-pass-design.md

Conventions for All Tasks

Every task follows the same pattern:

Read the source file
Enhance doc comments per the spec (math, refs, examples, cross-refs)
Convert Unicode math (β, ½, ‖·‖) to LaTeX ($\beta$, $\frac{1}{2}$, $\|\cdot\|$)
Use $...$ for inline math, $$...$$ for display math
Use @see with DOI for literature refs
Add @code/@endcode blocks for examples
Field maps use $\omega$ in rad/s; do NOT write them as $2\pi f$ with $f$ in Hz
Do NOT change any code logic — only comments
Commit after each task

Validation command (run after each task):

source docs/.venv/bin/activate && rm -rf docs/api/*.md && doxide build && mkdocs build

Chunk 1: Tier 1 — Operators (user review checkpoint)

Task 1: Gdft.h — Field-corrected DFT operator

Files: - Modify: forge/Operators/Gdft.h

[ ] Step 1: Enhance class-level doc comment

Add to the existing class doc:
- Display math for the forward model: $$d_j = \sum_k x_k \, e^{-i(2\pi \mathbf{k}_j \cdot \mathbf{r}_k + \omega_k t_j)}$$ where $\omega_k$ = FM[k] (rad/s) and $t_j$ is the readout time of sample $j$ (s)
- Note that operator* computes the forward transform and operator/ computes the adjoint (conjugate transpose)
- Cross-refs: @see GdftR2 for R2* gradient extension, @see Gnufft for the fast gridding-based alternative
- Ref: @see Fessler & Sutton, "Nonuniform Fast Fourier Transforms Using Min-Max Interpolation," IEEE TSP, 2003. https://doi.org/10.1109/TSP.2002.807005
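A direct evaluation of the forward model can help sanity-check the LaTeX while the doc comment is being written. The sketch below is plain C++ and is not forge's Gdft. It assumes a 2D problem (no kz/iz) and uses std::complex<double> instead of forge's types. The names dft_forward, dft_adjoint and dft_phase are hypothetical. The adjoint uses the conjugate phase, which is what operator/ is documented to compute.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cxd = std::complex<double>;

// Phase of entry (j, k): 2*pi*(k_j . r_k) + omega_k * t_j  (2D only in this sketch).
static double dft_phase(double kx, double ky, double ix, double iy, double omega, double t) {
    const double pi = std::acos(-1.0);
    return 2.0 * pi * (kx * ix + ky * iy) + omega * t;
}

// Forward: d_j = sum_k x_k * exp(-i * phase_jk). omega is the field map in rad/s per voxel,
// t the readout time in seconds per k-space sample.
std::vector<cxd> dft_forward(const std::vector<cxd>& x,
                             const std::vector<double>& kx, const std::vector<double>& ky,
                             const std::vector<double>& ix, const std::vector<double>& iy,
                             const std::vector<double>& omega, const std::vector<double>& t) {
    assert(ix.size() == x.size() && iy.size() == x.size() && omega.size() == x.size());
    assert(ky.size() == kx.size() && t.size() == kx.size());
    std::vector<cxd> d(kx.size(), cxd(0.0, 0.0));
    for (std::size_t j = 0; j < kx.size(); ++j)
        for (std::size_t k = 0; k < x.size(); ++k)
            d[j] += x[k] * std::polar(1.0, -dft_phase(kx[j], ky[j], ix[k], iy[k], omega[k], t[j]));
    return d;
}

// Adjoint (conjugate transpose): x_k = sum_j d_j * exp(+i * phase_jk).
std::vector<cxd> dft_adjoint(const std::vector<cxd>& d,
                             const std::vector<double>& kx, const std::vector<double>& ky,
                             const std::vector<double>& ix, const std::vector<double>& iy,
                             const std::vector<double>& omega, const std::vector<double>& t) {
    assert(kx.size() == d.size() && ky.size() == d.size() && t.size() == d.size());
    assert(iy.size() == ix.size() && omega.size() == ix.size());
    std::vector<cxd> x(ix.size(), cxd(0.0, 0.0));
    for (std::size_t k = 0; k < ix.size(); ++k)
        for (std::size_t j = 0; j < d.size(); ++j)
            x[k] += d[j] * std::polar(1.0, dft_phase(kx[j], ky[j], ix[k], iy[k], omega[k], t[j]));
    return x;
}
```

The dot-product test $\langle G\mathbf{x}, \mathbf{y}\rangle = \langle \mathbf{x}, G^H\mathbf{y}\rangle$ is a quick way to confirm that a forward/adjoint pair like this is consistent.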

[ ] Step 2: Add usage example

Add a @code/@endcode block showing basic construction and forward/adjoint:

/// @code
/// // 2D field-corrected DFT: 64x64 image, 4096 k-space samples
/// Col<float> kx(4096), ky(4096), kz(4096, fill::zeros);  // k-space trajectory (fill in)
/// Col<float> ix(4096), iy(4096), iz(4096, fill::zeros);  // image coordinates (fill in)
/// Col<float> FM(4096, fill::zeros);  // field map (rad/s)
/// Col<float> t(4096, fill::zeros);   // readout times (s)
/// Col<cx_float> image(4096, fill::zeros);  // 64x64 image, flattened
/// Gdft<float> G(4096, 4096, kx, ky, kz, ix, iy, iz, FM, t);
/// Col<cx_float> kdata = G * image;   // forward
/// Col<cx_float> recon = G / kdata;   // adjoint
/// @endcode

[ ] Step 3: Convert existing Unicode math to LaTeX

Replace any inline Unicode symbols in existing comments with LaTeX equivalents.

[ ] Step 4: Validate and commit

Run: source docs/.venv/bin/activate && rm -rf docs/api/*.md && doxide build && mkdocs build

git add forge/Operators/Gdft.h
git commit -m "docs: enhance Gdft.h with LaTeX math, example, and literature ref"

Task 2: GdftR2.h — DFT with R2* gradients

Files: - Modify: forge/Operators/GdftR2.h

[ ] Step 1: Enhance class-level doc comment

Add:
- First-order Taylor expansion formula for the R2*-corrected signal model. The expansion improves accuracy by including the spatial gradient of the field map and R2* decay within each voxel.
- Explanation of calcGradientMaps and the finite-difference operator Cd
- When to use GdftR2 vs plain Gdft: when the field map or R2* decay varies rapidly from voxel to voxel, so intravoxel dephasing is significant
- Cross-refs: @see Gdft for the base DFT without gradients

[ ] Step 2: Add usage example

Show construction with gradient maps, similar pattern to Gdft but with the R2* extension.

[ ] Step 3: Convert Unicode math to LaTeX, validate, commit

git add forge/Operators/GdftR2.h
git commit -m "docs: enhance GdftR2.h with Taylor expansion formula and example"

Task 3: Gnufft.h — Kaiser-Bessel NUFFT

Files: - Modify: forge/Operators/Gnufft.h

[ ] Step 1: Enhance class-level doc comment

Add:
- Gridding algorithm description: convolution with KB kernel on oversampled grid, FFT, then crop/deapodize
- KB kernel formula: $$\kappa(u) = \frac{1}{W} I_0\!\left(\beta\sqrt{1 - (2u/W)^2}\right), \quad |u| \leq W/2$$ (zero otherwise), where $W$ is kernel width, $\beta$ is shape parameter, and $I_0$ is the zeroth-order modified Bessel function of the first kind
- Guidance: gridos (grid oversampling factor, typically 2.0), kernelWidth (typically 4-6), $\beta = \pi W (1 - 0.5/\text{gridos})$
- LUT accuracy notes: precomputed lookup table for kernel evaluation
- Refs:
  - @see Jackson et al., "Selection of a Convolution Function for Fourier Inversion Using Gridding," IEEE TMI, 1991. https://doi.org/10.1109/42.97598
  - @see Pipe & Menon, "Sampling Density Compensation in MRI," MRM, 1999. https://doi.org/10.1002/(SICI)1522-2594(199901)41:1<179::AID-MRM25>3.0.CO;2-V
  - @see Beatty et al., "Rapid Gridding Reconstruction with a Minimal Oversampling Ratio," IEEE TMI, 2005. https://doi.org/10.1109/TMI.2005.848376
- Cross-refs: @see Gdft for the brute-force DFT alternative, @see gridding.h for low-level gridding kernels
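A small numerical sketch of the kernel formula and the shape-parameter guidance can be used to check the LaTeX against real numbers. It is illustrative only. The names bessel_i0, kb_kernel, kb_beta_plan and kb_beta_beatty are hypothetical; forge's own helpers, per Task 19, include bessi0 and calculateLUT. $I_0$ is evaluated by its power series rather than a lookup table.

kb_beta_beatty assumes the Beatty et al. expression $\beta = \pi\sqrt{\frac{W^2}{\alpha^2}(\alpha - \frac{1}{2})^2 - 0.8}$ with $\alpha$ = gridos. Dropping the $-0.8$ term gives the guidance value above.

```cpp
#include <cassert>
#include <cmath>

// Zeroth-order modified Bessel function of the first kind via its power series,
// I0(x) = sum_k ((x/2)^k / k!)^2. Adequate for the moderate arguments used in gridding.
double bessel_i0(double x) {
    const double q = 0.25 * x * x;
    double sum = 1.0, term = 1.0;
    for (int k = 1; k < 500; ++k) {
        term *= q / (static_cast<double>(k) * k);  // term_k = q^k / (k!)^2
        sum += term;
        if (term < 1e-17 * sum) break;
    }
    return sum;
}

// kappa(u) = (1/W) I0(beta * sqrt(1 - (2u/W)^2)) for |u| <= W/2, zero otherwise.
double kb_kernel(double u, double W, double beta) {
    assert(W > 0.0);
    const double r = 2.0 * u / W;
    if (std::abs(r) > 1.0) return 0.0;
    return bessel_i0(beta * std::sqrt(1.0 - r * r)) / W;
}

// Shape parameter from the guidance above: beta = pi * W * (1 - 0.5 / gridos).
double kb_beta_plan(double W, double gridos) {
    return std::acos(-1.0) * W * (1.0 - 0.5 / gridos);
}

// Assumed Beatty et al. (2005) form: beta = pi * sqrt((W/alpha)^2 (alpha - 0.5)^2 - 0.8).
double kb_beta_beatty(double W, double alpha) {
    const double s = (W / alpha) * (W / alpha) * (alpha - 0.5) * (alpha - 0.5) - 0.8;
    assert(s > 0.0);
    return std::acos(-1.0) * std::sqrt(s);
}
```

For W = 4 and gridos = 2, kb_beta_plan gives $3\pi \approx 9.42$. The Beatty form gives a slightly smaller value of about 9.0.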

[ ] Step 2: Add usage example

/// @code
/// Gnufft<float> G(dataLength, 2.0f, Nx, Ny, Nz, kx, ky, kz, ix, iy, iz);
/// Col<cx_float> kdata = G * image;  // forward: gridding + FFT
/// Col<cx_float> recon = G / kdata;  // adjoint: FFT + degridding
/// @endcode

[ ] Step 3: Convert Unicode math to LaTeX, validate, commit

git add forge/Operators/Gnufft.h
git commit -m "docs: enhance Gnufft.h with KB kernel formula, refs, and example"

Task 4: Gfft.h — Uniform Cartesian FFT

Files: - Modify: forge/Operators/Gfft.h

[ ] Step 1: Enhance class-level doc comment

Minimal additions: - Note that this wraps FFTW (CPU) or cuFFT (GPU) for fully-sampled Cartesian data - Cross-refs: @see Gnufft for non-Cartesian data

[ ] Step 2: Add usage example

/// @code
/// Gfft<float> F(Nx * Ny);
/// Col<cx_float> kdata = F * image;  // forward FFT
/// Col<cx_float> recon = F / kdata;  // adjoint (inverse FFT, up to normalization)
/// @endcode

[ ] Step 3: Validate and commit

git add forge/Operators/Gfft.h
git commit -m "docs: enhance Gfft.h with example and cross-refs"

Task 5: SENSE.h — Multi-coil sensitivity encoding

Files: - Modify: forge/Operators/SENSE.h

[ ] Step 1: Enhance class-level doc comment

Add:
- Multi-coil model in LaTeX: $$\mathbf{d}_c = G(S_c \circ \mathbf{x}), \quad c = 1, \ldots, N_c$$ where $S_c$ is the sensitivity map for coil $c$ and $\circ$ is element-wise multiplication
- Stacking convention: the full data vector concatenates all coils: $\mathbf{d} = [\mathbf{d}_1^T, \ldots, \mathbf{d}_{N_c}^T]^T$
- Template constraint: Tobj must support operator* (forward) and operator/ (adjoint). Compatible types: Gdft, Gnufft, Gfft, TimeSegmentation
- Ref: @see Pruessmann et al., "SENSE: Sensitivity Encoding for Fast MRI," MRM, 1999. https://doi.org/10.1002/(SICI)1522-2594(199911)42:5<952::AID-MRM16>3.0.CO;2-S
- Cross-refs: @see pcSENSE for phase-corrected variant, @see TimeSegmentation for field-corrected wrapper
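The stacking convention and its adjoint can be sketched with a generic encoding callable. This is a hypothetical illustration: std::function stands in for Tobj's operator* and operator/, and sense_forward/sense_adjoint are not forge names. The adjoint shown, $\sum_c \bar{S}_c \circ G^H\mathbf{d}_c$, follows directly from the forward model but is not taken from forge's source.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <functional>
#include <vector>

using cx = std::complex<double>;
using Vec = std::vector<cx>;
using Op = std::function<Vec(const Vec&)>;  // stands in for Tobj's operator* / operator/

// Forward: d = [G(S_1 o x); ...; G(S_Nc o x)], coil blocks stored one after another.
Vec sense_forward(const Op& G, const std::vector<Vec>& S, const Vec& x) {
    Vec d;
    for (const Vec& Sc : S) {
        assert(Sc.size() == x.size());
        Vec weighted(x.size());
        for (std::size_t k = 0; k < x.size(); ++k) weighted[k] = Sc[k] * x[k];
        const Vec dc = G(weighted);
        d.insert(d.end(), dc.begin(), dc.end());
    }
    return d;
}

// Adjoint: x = sum_c conj(S_c) o G^H(d_c), where d_c is the c-th block of length nd.
Vec sense_adjoint(const Op& GH, const std::vector<Vec>& S, const Vec& d, std::size_t nd) {
    assert(!S.empty() && d.size() == nd * S.size());
    Vec x(S[0].size(), cx(0.0, 0.0));
    for (std::size_t c = 0; c < S.size(); ++c) {
        const auto off = static_cast<std::ptrdiff_t>(c * nd);
        const Vec dc(d.begin() + off, d.begin() + off + static_cast<std::ptrdiff_t>(nd));
        const Vec xc = GH(dc);
        for (std::size_t k = 0; k < x.size(); ++k) x[k] += std::conj(S[c][k]) * xc[k];
    }
    return x;
}
```

With an identity $G$, applying the adjoint after the forward operator returns $\sum_c |S_c|^2 \circ \mathbf{x}$. This makes a simple check that the coil blocks are stacked and unstacked in the same order.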

[ ] Step 2: Add usage example

/// @code
/// Gnufft<float> G(n1, 2.0f, Nx, Ny, Nz, kx, ky, kz, ix, iy, iz);  // n1 = samples per coil
/// SENSE<float, Gnufft<float>> S(G, SMap, nc);
/// Col<cx_float> allcoil_data = S * image;  // forward: all coils
/// Col<cx_float> recon = S / allcoil_data;  // adjoint: combined
/// @endcode

[ ] Step 3: Convert Unicode math to LaTeX, validate, commit

git add forge/Operators/SENSE.h
git commit -m "docs: enhance SENSE.h with multi-coil model, template constraints, and ref"

Task 6: pcSENSE.h — Phase-corrected SENSE

Files: - Modify: forge/Operators/pcSENSE.h

[ ] Step 1: Enhance class-level doc comment

Add:
- Per-shot phase-corrected model formula: for shot $s$, $\mathbf{d}_{c,s} = G_s(P_s \circ S_c \circ \mathbf{x})$ where $P_s = e^{i\phi_s(\mathbf{r})}$ is the shot-specific phase map
- Distinction from SENSE: handles multi-shot acquisitions where each shot has a different motion-induced phase error (e.g., diffusion-weighted imaging with navigator-based correction)
- Cross-refs: @see SENSE for single-phase variant, @see pcSenseTimeSeg for combined phase + time-segmentation
- Refs:
  - @see Liu, Moseley & Bammer, "Simultaneous phase correction and SENSE reconstruction for navigated multi-shot DWI," MRM, 2005. https://doi.org/10.1002/mrm.20706
  - @see Holtrop & Sutton, "High spatial resolution diffusion weighted imaging on clinical 3T MRI scanners using multislab spiral acquisitions," J Med Imaging, 2016. https://doi.org/10.1117/1.JMI.3.2.023501
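The image-domain part of the per-shot model, $P_s \circ S_c \circ \mathbf{x}$, is a pointwise product that is then handed to the shot's encoding operator $G_s$. A minimal sketch follows. It assumes a real phase map phi_s in radians, and pc_weight is a hypothetical helper name.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cx = std::complex<double>;

// Image-domain weighting for shot s and coil c: w = P_s o S_c o x, with P_s = exp(i * phi_s).
// The result would then be passed to the shot's encoding operator G_s.
std::vector<cx> pc_weight(const std::vector<double>& phi_s, const std::vector<cx>& S_c,
                          const std::vector<cx>& x) {
    assert(phi_s.size() == x.size() && S_c.size() == x.size());
    std::vector<cx> w(x.size());
    for (std::size_t k = 0; k < x.size(); ++k)
        w[k] = std::polar(1.0, phi_s[k]) * S_c[k] * x[k];
    return w;
}
```

The matching adjoint step multiplies the output of $G_s^H$ by the conjugate weights $\bar{P}_s \circ \bar{S}_c$ before summing over shots and coils.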

[ ] Step 2: Add usage example

Show construction with per-shot phase maps and sensitivity maps.

[ ] Step 3: Convert Unicode math to LaTeX, validate, commit

git add forge/Operators/pcSENSE.h
git commit -m "docs: enhance pcSENSE.h with phase model formula, refs (Liu 2005, Holtrop 2016)"

Task 7: pcSenseTimeSeg.h — pcSENSE + Time Segmentation

Files: - Modify: forge/Operators/pcSenseTimeSeg.h

[ ] Step 1: Enhance class-level doc comment

Add:
- Combined model: per-shot phase correction and time-segmented field correction applied together
- When to use: multi-shot non-Cartesian acquisitions with significant B0 inhomogeneity (e.g., spiral DWI)
- Cross-refs: @see pcSENSE for phase correction without time segmentation, @see TimeSegmentation for field correction without per-shot phases

[ ] Step 2: Add usage example
[ ] Step 3: Validate and commit

git add forge/Operators/pcSenseTimeSeg.h
git commit -m "docs: enhance pcSenseTimeSeg.h with combined model description and example"

Chunk 2: Tier 1 — Penalties, Solvers, Gridding (user review checkpoint)

Task 8: Robject.h — Abstract penalty base class

Files: - Modify: forge/Penalties/Robject.h

[ ] Step 1: Enhance class-level doc comment

Add:
- Potential function family in LaTeX:
  - $\psi(d)$: potential function (penalty applied to each difference)
  - $\dot\psi(d) = \psi'(d)$: first derivative
  - $\omega(d) = \psi'(d)/d$: weighting function used in surrogate optimization
- Penalty evaluation: $$R(\mathbf{x}) = \sum_j \beta_j \psi([C\mathbf{x}]_j)$$ where $C$ is the finite-difference operator
- Subclass contract: override wpot(), dpot(), pot() to define a custom penalty. Default implementations are quadratic ($\psi(d) = \frac{1}{2}d^2$).
- Explain Cd (forward finite differences) and Ctd (adjoint / transpose finite differences)
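The subclass contract can be illustrated with a 1D toy class. PenaltySketch and its members are hypothetical stand-ins that mirror the pot/dpot/wpot names from Step 1. The sketch uses a single uniform $\beta$ and first differences for $C$. forge's Robject works on 3D images, and its exact signatures may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch of the penalty contract: pot = psi(d), dpot = psi'(d),
// wpot = psi'(d)/d. The defaults are quadratic, psi(d) = d^2 / 2.
struct PenaltySketch {
    virtual ~PenaltySketch() = default;
    virtual double pot(double d) const { return 0.5 * d * d; }
    virtual double dpot(double d) const { return d; }
    virtual double wpot(double /*d*/) const { return 1.0; }

    // Cd: 1D forward differences, [Cx]_j = x[j+1] - x[j].
    static std::vector<double> Cd(const std::vector<double>& x) {
        std::vector<double> d;
        for (std::size_t j = 0; j + 1 < x.size(); ++j) d.push_back(x[j + 1] - x[j]);
        return d;
    }

    // Ctd: adjoint, (C^T d)[k] = d[k-1] - d[k], with out-of-range terms taken as zero.
    static std::vector<double> Ctd(const std::vector<double>& d) {
        std::vector<double> y(d.size() + 1, 0.0);
        for (std::size_t j = 0; j < d.size(); ++j) { y[j + 1] += d[j]; y[j] -= d[j]; }
        return y;
    }

    // R(x) = sum_j beta * psi([Cx]_j).
    double penalty(const std::vector<double>& x, double beta) const {
        assert(beta >= 0.0);
        double r = 0.0;
        for (double dj : Cd(x)) r += beta * pot(dj);
        return r;
    }

    // grad R(x) = C^T (beta * psi'(Cx)).
    std::vector<double> gradient(const std::vector<double>& x, double beta) const {
        assert(beta >= 0.0);
        std::vector<double> d = Cd(x);
        for (double& dj : d) dj = beta * dpot(dj);
        return Ctd(d);
    }
};
```

A subclass such as an edge-preserving penalty would override only the three potential functions. penalty() and gradient() stay unchanged, which is the point of the contract.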

[ ] Step 2: Add usage example

Show how a penalty plugs into the PCG solver.

[ ] Step 3: Convert Unicode math to LaTeX, validate, commit

git add forge/Penalties/Robject.h
git commit -m "docs: enhance Robject.h with potential function family and subclass contract"

Task 9: QuadPenalty.h — Quadratic (Tikhonov) penalty

Files: - Modify: forge/Penalties/QuadPenalty.h

[ ] Step 1: Enhance class-level doc comment

Add/convert:
- Penalty in LaTeX: $$R(\mathbf{x}) = \frac{\beta}{2}\|C\mathbf{x}\|^2$$
- Potential functions: $\psi(d) = \frac{1}{2}d^2$, $\dot\psi(d) = d$, $\omega(d) = 1$
- $\beta$ guidance: controls regularization strength. Larger $\beta$ gives smoother images (smaller $\|C\mathbf{x}\|$) at the cost of a larger data-fit residual. Smaller $\beta$ fits the data more closely but amplifies noise. Typical range depends on SNR and data scaling.
- Cross-ref: @see TVPenalty for edge-preserving alternative

[ ] Step 2: Add usage example

/// @code
/// QuadPenalty<float> R(Nx, Ny, Nz, beta);
/// auto xhat = solve_pwls_pcg<float>(x0, G, W, yi, R, niter);
/// @endcode

[ ] Step 3: Convert Unicode math to LaTeX, validate, commit

git add forge/Penalties/QuadPenalty.h
git commit -m "docs: enhance QuadPenalty.h with LaTeX formula and beta guidance"

Task 10: TVPenalty.h — Total Variation penalty

Files: - Modify: forge/Penalties/TVPenalty.h

[ ] Step 1: Enhance class-level doc comment

Add/convert:
- Hyperbola (Charbonnier) potential in LaTeX: $$\psi(d) = \delta^2\!\left(\sqrt{1 + (d/\delta)^2} - 1\right)$$
- Derivative: $$\dot\psi(d) = \frac{d}{\sqrt{1 + (d/\delta)^2}}$$
- Weight: $$\omega(d) = \frac{1}{\sqrt{1 + (d/\delta)^2}}$$
- Correct the existing header comment: this is NOT the Fair potential $|d| - \delta\log(1 + |d|/\delta)$. It is the hyperbola/Charbonnier penalty. It behaves quadratically near zero ($\psi(d) \approx d^2/2$) and linearly for large differences ($\psi(d) \approx \delta|d|$).
- $\delta$ guidance: controls the quadratic-to-linear transition. Smaller $\delta$ gives sharper edges and is closer to true TV, but makes the optimization harder.
- Ref: @see Rudin, Osher & Fatemi, "Nonlinear Total Variation Based Noise Removal Algorithms," Physica D, 1992. https://doi.org/10.1016/0167-2789(92)90242-F (describes exact TV; this implementation is a smooth approximation)
- Cross-ref: @see QuadPenalty for the simpler quadratic alternative
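The three formulas can be checked by evaluating them directly, along with the claim that the potential is quadratic near zero and linear far away. The function names are hypothetical. hyper_pot uses the algebraically equivalent form $d^2/(\sqrt{1+(d/\delta)^2}+1)$ to avoid cancellation when $|d| \ll \delta$.

```cpp
#include <cassert>
#include <cmath>

// psi(d) = delta^2 (sqrt(1 + (d/delta)^2) - 1), written as d^2 / (sqrt(1 + r^2) + 1)
// so that small |d| does not lose precision to cancellation.
double hyper_pot(double d, double delta) {
    assert(delta > 0.0);
    const double r = d / delta;
    return d * d / (std::sqrt(1.0 + r * r) + 1.0);
}

// psi'(d) = d / sqrt(1 + (d/delta)^2)
double hyper_dpot(double d, double delta) {
    assert(delta > 0.0);
    const double r = d / delta;
    return d / std::sqrt(1.0 + r * r);
}

// omega(d) = psi'(d) / d = 1 / sqrt(1 + (d/delta)^2)
double hyper_wpot(double d, double delta) {
    assert(delta > 0.0);
    const double r = d / delta;
    return 1.0 / std::sqrt(1.0 + r * r);
}
```

As $\delta$ shrinks, the weight $\omega(d)$ drops faster for large differences. That is why edges are penalized less, and also why the surrogate curvature varies more, which makes the optimization harder.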

[ ] Step 2: Add usage example
[ ] Step 3: Convert Unicode math to LaTeX, validate, commit

git add forge/Penalties/TVPenalty.h
git commit -m "docs: enhance TVPenalty.h with LaTeX formulas, delta guidance, and ROF ref"

Task 11: solve_pwls_pcg.hpp — PCG solver

Files: - Modify: forge/Solvers/solve_pwls_pcg.hpp

[ ] Step 1: Enhance doc comment on solve_pwls_pcg function

Add/convert:
- PWLS objective in LaTeX: $$\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \frac{1}{2}\|W^{1/2}(\mathbf{y} - A\mathbf{x})\|^2 + R(\mathbf{x})$$ (the factor $\frac{1}{2}$ makes the gradient below exact)
- PCG algorithm outline: gradient $A^H W(A\mathbf{x} - \mathbf{y}) + \nabla R(\mathbf{x})$, Polak-Ribière-Polyak direction update (its coefficient is unrelated to the regularization $\beta$), quadratic surrogate step size
- Convergence thresholds: 1e-10 for zero-gradient detection (normalized gradient magnitude), 1e-20 for denominator guard in step-size computation
- Thread safety note: reads g_should_stop atomic for early termination
- Ref: @see Fessler, "Penalized Weighted Least-Squares Image Reconstruction for Positron Emission Tomography," IEEE TMI, 1994. https://doi.org/10.1109/42.293921
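The outline can be made concrete on a tiny dense, real-valued problem with a quadratic penalty. This is a sketch under simplifying assumptions and is not forge's solve_pwls_pcg:

- no preconditioner;
- a 1D first-difference $C$;
- a stopping test on the raw gradient norm rather than norm_grad;
- an exact step size, since for a quadratic objective the quadratic surrogate is the objective itself.

Clipping the Polak-Ribière-Polyak coefficient at zero (PRP+) is a common restart safeguard chosen here. All function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // dense row-major matrix, illustration only

Vec matvec(const Mat& A, const Vec& x) {  // A x
    Vec y(A.size(), 0.0);
    for (std::size_t i = 0; i < A.size(); ++i)
        for (std::size_t k = 0; k < x.size(); ++k) y[i] += A[i][k] * x[k];
    return y;
}

Vec matTvec(const Mat& A, const Vec& r) {  // A^T r
    Vec y(A.empty() ? 0 : A[0].size(), 0.0);
    for (std::size_t i = 0; i < A.size(); ++i)
        for (std::size_t k = 0; k < y.size(); ++k) y[k] += A[i][k] * r[i];
    return y;
}

double dot(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

Vec diff(const Vec& x) {  // C x: 1D first differences
    Vec d;
    for (std::size_t j = 0; j + 1 < x.size(); ++j) d.push_back(x[j + 1] - x[j]);
    return d;
}

Vec diffT(const Vec& d) {  // C^T d
    Vec y(d.size() + 1, 0.0);
    for (std::size_t j = 0; j < d.size(); ++j) { y[j + 1] += d[j]; y[j] -= d[j]; }
    return y;
}

// Minimizes 0.5*||W^{1/2}(y - A x)||^2 + 0.5*beta*||C x||^2 by nonlinear CG with a
// PRP+ direction update and an exact step (the objective is quadratic).
Vec pwls_cg(const Mat& A, const Vec& W, const Vec& y, double beta, Vec x, int niter) {
    Vec gOld, dir;
    for (int it = 0; it < niter; ++it) {
        Vec r = matvec(A, x);
        for (std::size_t i = 0; i < r.size(); ++i) r[i] = W[i] * (r[i] - y[i]);
        Vec g = matTvec(A, r);                    // A^T W (A x - y)
        const Vec CtCx = diffT(diff(x));          // C^T C x
        for (std::size_t k = 0; k < g.size(); ++k) g[k] += beta * CtCx[k];
        if (std::sqrt(dot(g, g)) < 1e-10) break;  // zero-gradient stop (unnormalized)

        if (dir.empty()) {                        // first iteration: steepest descent
            dir = g;
            for (double& v : dir) v = -v;
        } else {                                  // PRP: g.(g - gOld) / (gOld.gOld)
            double num = 0.0;
            for (std::size_t k = 0; k < g.size(); ++k) num += g[k] * (g[k] - gOld[k]);
            const double gamma = std::fmax(0.0, num / dot(gOld, gOld));  // PRP+ restart
            for (std::size_t k = 0; k < g.size(); ++k) dir[k] = -g[k] + gamma * dir[k];
        }

        const Vec Ad = matvec(A, dir), Cdir = diff(dir);
        double denom = beta * dot(Cdir, Cdir);    // dir^T (A^T W A + beta C^T C) dir
        for (std::size_t i = 0; i < Ad.size(); ++i) denom += W[i] * Ad[i] * Ad[i];
        if (denom < 1e-20) break;                 // denominator guard
        const double step = -dot(g, dir) / denom;
        for (std::size_t k = 0; k < x.size(); ++k) x[k] += step * dir[k];
        gOld = g;
    }
    return x;
}
```

At convergence the iterate satisfies the normal equations $(A^T W A + \beta C^T C)\mathbf{x} = A^T W \mathbf{y}$. This is a convenient correctness check for any PWLS solver.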

[ ] Step 2: Enhance norm_grad doc comment

Add LaTeX for the normalized gradient: $\|\nabla\| = \frac{\|\mathbf{g}\|}{\mathbf{y}^H(W \circ \mathbf{y})}$

[ ] Step 3: Add usage example

/// @code
/// Gnufft<float> G(n1, 2.0f, Nx, Ny, Nz, kx, ky, kz, ix, iy, iz);  // n1 = samples per coil
/// SENSE<float, Gnufft<float>> S(G, SMap, nc);
/// QuadPenalty<float> R(Nx, Ny, 1, 0.001f);
/// Col<float> W(n1 * nc, fill::ones);
/// forgeCol<forgeComplex<float>> x0(n2);  x0.zeros();
/// auto xhat = solve_pwls_pcg<float>(x0, S, W, yi, R, 50);
/// @endcode

[ ] Step 4: Convert Unicode math to LaTeX, validate, commit

git add forge/Solvers/solve_pwls_pcg.hpp
git commit -m "docs: enhance solve_pwls_pcg with PWLS objective, algorithm outline, and ref"

Task 12: reconSolve.h — High-level reconstruction helper

Files: - Modify: forge/Solvers/reconSolve.h

[ ] Step 1: Enhance doc comments

Add:
- Workflow description: reconSolve initializes image-space coordinates, constructs the encoding operator, and calls solve_pwls_pcg
- Coordinate formula in LaTeX: image coordinates span $[0, (N-1)/N]$ in each dimension
- Cross-ref: @see solve_pwls_pcg for the underlying solver

[ ] Step 2: Add usage example showing full pipeline
[ ] Step 3: Validate and commit

git add forge/Solvers/reconSolve.h
git commit -m "docs: enhance reconSolve.h with workflow description and example"

Task 13: TimeSegmentation.h — Time-segmented field correction

Files: - Modify: forge/Gridding/TimeSegmentation.h

[ ] Step 1: Enhance class-level doc comment

Add:
- Time-segmentation approximation in LaTeX: $$e^{-i\omega(\mathbf{r})t} \approx \sum_{l=1}^{L} b_l(t)\, e^{-i\omega(\mathbf{r})\tau_l}$$ where $\omega(\mathbf{r})$ is the field map in rad/s, $\tau_l$ are segment centers, and $b_l(t)$ are interpolation coefficients
- Interpolation types: Hanning window vs min-max (Fessler). Min-max is more accurate but slower to precompute.
- $L$ selection guidance: more segments give better accuracy at the cost of more computation. Typical: 4-8 for moderate inhomogeneity.
- Refs:
  - @see Sutton et al., "Fast, Iterative Image Reconstruction for MRI in the Presence of Field Inhomogeneities," IEEE TMI, 2003. https://doi.org/10.1109/TMI.2002.808360
  - @see Man et al., "Multifrequency Interpolation for Fast Off-Resonance Correction," MRM, 1997. https://doi.org/10.1002/mrm.1910370523
- Cross-refs: @see Gdft for brute-force field correction (exact but slow)
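One common form of the Hanning interpolator can be sketched as follows. The sketch assumes $L \ge 2$ evenly spaced segment centers $\tau_l = lT/(L-1)$ over the readout duration $T$. It also assumes raised-cosine weights $b_l(t) = \frac{1}{2}\left(1 + \cos(\pi(t - \tau_l)/\Delta\tau)\right)$ for $|t - \tau_l| < \Delta\tau$, and zero otherwise. forge's TimeSegmentation may place segments differently, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hanning (raised-cosine) time-segmentation weights b_l(t), l = 0..L-1, assuming
// L >= 2 segment centers tau_l = l * T / (L - 1) spaced evenly over [0, T].
std::vector<double> hanning_ts_weights(double t, double T, std::size_t L) {
    assert(L >= 2 && T > 0.0);
    const double pi = std::acos(-1.0);
    const double dtau = T / static_cast<double>(L - 1);
    std::vector<double> b(L, 0.0);
    for (std::size_t l = 0; l < L; ++l) {
        const double u = (t - static_cast<double>(l) * dtau) / dtau;  // offset in segments
        if (std::abs(u) < 1.0) b[l] = 0.5 * (1.0 + std::cos(pi * u));
    }
    return b;
}

// Approximates exp(-i * omega * t) as sum_l b_l(t) * exp(-i * omega * tau_l) for one voxel.
std::complex<double> ts_approx(double omega, double t, double T, std::size_t L) {
    const std::vector<double> b = hanning_ts_weights(t, T, L);
    const double dtau = T / static_cast<double>(L - 1);
    std::complex<double> acc(0.0, 0.0);
    for (std::size_t l = 0; l < L; ++l)
        acc += b[l] * std::polar(1.0, -omega * static_cast<double>(l) * dtau);
    return acc;
}
```

Adjacent weights sum to one, so the approximation is exact at the segment centers. Increasing $L$ shrinks the error between centers. For a 50 Hz off-resonance over a 10 ms readout, going from 4 to 8 segments cuts the worst-case error substantially, which is the accuracy/cost trade-off described above.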

[ ] Step 2: Add usage example

/// @code
/// Gnufft<float> G(n1, gridos, Nx, Ny, Nz, kx, ky, kz, ix, iy, iz);
/// TimeSegmentation<float, Gnufft<float>> Gts(G, FM, t, L, interptype, gridos);
/// SENSE<float, TimeSegmentation<float, Gnufft<float>>> S(Gts, SMap, nc);
/// @endcode

[ ] Step 3: Convert Unicode math to LaTeX, validate, commit

git add forge/Gridding/TimeSegmentation.h
git commit -m "docs: enhance TimeSegmentation.h with approximation formula, L guidance, and refs"

Task 14: gridding.h — Low-level gridding kernels

Files: - Modify: forge/Gridding/gridding.h

[ ] Step 1: Enhance doc comments

Add:
- Gridding algorithm overview: adjoint = scatter k-space samples onto an oversampled Cartesian grid via KB convolution; forward = interpolate from the grid at the non-Cartesian locations
- KB kernel formula: $$\kappa(u) = \frac{1}{W} I_0\!\left(\beta\sqrt{1 - (2u/W)^2}\right)$$ for $|u| \leq W/2$, zero otherwise
- Oversampling factor: grid is $\text{gridos} \times N$ in each dimension (typically 2.0)
- Density compensation: weights $w_j$ compensate for non-uniform k-space sampling density (use $w_j$, not $W$, to avoid a clash with the kernel width)
- Refs: same Jackson, Pipe, Beatty refs as Gnufft
- Cross-ref: @see Gnufft for the high-level operator, @see griddingSupport.h for kernel helpers
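The adjoint (scatter) step can be sketched in 1D. This is an illustration under stated assumptions and is not forge's gridding.h:

- sample locations are in cycles/FOV within $[-N/2, N/2)$;
- the grid has $M = \text{gridos} \cdot N$ points with periodic wrap-around;
- the kernel width W is in grid cells;
- density compensation, the FFT and deapodization are omitted.

The names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cx = std::complex<double>;

static double i0_series(double x) {  // I0 via its power series, sum ((x/2)^k / k!)^2
    double sum = 1.0, term = 1.0;
    for (int k = 1; k < 500 && term > 1e-17 * sum; ++k) {
        term *= 0.25 * x * x / (static_cast<double>(k) * k);
        sum += term;
    }
    return sum;
}

// Adjoint gridding in 1D: spread each sample d[j] at location k[j] onto an oversampled
// periodic grid of M = round(gridos * N) points using a KB kernel of width W cells.
std::vector<cx> grid_adjoint_1d(const std::vector<double>& k, const std::vector<cx>& d,
                                std::size_t N, double gridos, double W, double beta) {
    assert(k.size() == d.size() && W > 0.0 && gridos >= 1.0);
    const long M = std::lround(gridos * static_cast<double>(N));
    std::vector<cx> grid(static_cast<std::size_t>(M), cx(0.0, 0.0));
    for (std::size_t j = 0; j < k.size(); ++j) {
        const double kg = k[j] * gridos + 0.5 * static_cast<double>(M);  // grid units
        const long lo = static_cast<long>(std::ceil(kg - 0.5 * W));
        const long hi = static_cast<long>(std::floor(kg + 0.5 * W));
        for (long g = lo; g <= hi; ++g) {
            const double r = 2.0 * (static_cast<double>(g) - kg) / W;
            if (std::abs(r) > 1.0) continue;
            const double w = i0_series(beta * std::sqrt(1.0 - r * r)) / W;  // kappa(g - kg)
            const long idx = ((g % M) + M) % M;                             // periodic wrap
            grid[static_cast<std::size_t>(idx)] += w * d[j];
        }
    }
    return grid;
}
```

The forward (interpolation) step visits the same kernel footprint but reads from the grid instead of accumulating into it. That is why forward and adjoint gridding share the same kernel evaluation code.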

[ ] Step 2: Validate and commit

git add forge/Gridding/gridding.h
git commit -m "docs: enhance gridding.h with KB kernel formula, algorithm overview, and refs"

Task 15: Tier 1 validation and user review checkpoint

[ ] Step 1: Full rebuild

source docs/.venv/bin/activate && rm -rf docs/api/*.md && doxide build && mkdocs build

[ ] Step 2: Start dev server for user review

source docs/.venv/bin/activate && mkdocs serve -a 127.0.0.1:8000

[ ] Step 3: Present diff to user

Run git diff main -- forge/Operators/ forge/Penalties/ forge/Solvers/ forge/Gridding/TimeSegmentation.h forge/Gridding/gridding.h and present for review. PAUSE HERE — wait for user approval before proceeding to Tier 2.

Chunk 3: Tiers 2-3 — Core Types and Utilities (autonomous)

Task 16: forgeCol.hpp — GPU-aware column vector

Files: - Modify: forge/Core/forgeCol.hpp

[ ] Step 1: Enhance class-level and method doc comments

Add:
- GPU/CPU sync: the isOnGPU flag tracks data location. After GPU operations, data is on the device; getArma() triggers a memcpy back to the host.
- View safety: views are non-owning, so the parent must outlive the view. Calling set_size() on a view breaks it (it allocates fresh memory).
- Metal dispatch: operations dispatch to the Metal GPU when METAL_COMPUTE is defined, T = float, and n_elem >= 4096.
- getArma() cost: the non-const version does an OpenACC device→host update; the const version is a zero-copy view.
- @warning getArmaComplex() returns a temporary view; never capture it with auto.

[ ] Step 2: Add usage example
[ ] Step 3: Validate and commit

git add forge/Core/forgeCol.hpp
git commit -m "docs: enhance forgeCol.hpp with GPU sync semantics, view safety, and example"

Task 17: forgeMat.hpp — GPU-aware matrix

Files: - Modify: forge/Core/forgeMat.hpp

[ ] Step 1: Enhance class-level and method doc comments

Add:
- @warning set_size(nCols, nRows): the first argument is COLUMNS, the second is ROWS. This is the opposite of most matrix APIs. The constructor forgeMat(nRows, nCols) takes rows first (normal order).
- Column-major layout: element $(r, c)$ is at index $r + n_{\text{rows}}\, c$
- col() returns a non-owning view (fast); col_copy() returns a deep copy (safe for long-lived use)
- getArma(): non-const does an OpenACC update; const is zero-copy
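The layout formula and the view semantics can be illustrated with plain C++. col_major_index, ColView and col_view are hypothetical names, not forge's API. Column $c$ occupies the contiguous range $[n_{\text{rows}}c,\ n_{\text{rows}}(c+1))$, and that contiguity is what lets col() return a cheap non-owning view.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Column-major linear index of element (r, c) in a matrix with n_rows rows.
constexpr std::size_t col_major_index(std::size_t r, std::size_t c, std::size_t n_rows) {
    return r + n_rows * c;
}

// Hypothetical non-owning column view: a pointer plus a length into the parent's buffer.
// It is valid only while the parent is alive and not reallocated.
struct ColView {
    const double* ptr;
    std::size_t n;
    double operator[](std::size_t i) const { return ptr[i]; }
};

ColView col_view(const std::vector<double>& data, std::size_t n_rows, std::size_t c) {
    assert(n_rows * (c + 1) <= data.size());
    return ColView{data.data() + n_rows * c, n_rows};
}
```

Because ColView holds only a pointer, it dangles once the parent buffer is destroyed or reallocated. This is the same hazard the view-safety notes for forgeCol describe.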

[ ] Step 2: Add usage example
[ ] Step 3: Validate and commit

git add forge/Core/forgeMat.hpp
git commit -m "docs: enhance forgeMat.hpp with set_size warning, layout docs, and example"

Task 18: forgeComplex.hpp — GPU-compatible complex type

Files: - Modify: forge/Core/forgeComplex.hpp

[ ] Step 1: Enhance doc comments

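Add:
- Memory layout note: same size and element order as std::complex<T> (interleaved real, imag). The C++ standard guarantees that std::complex<T> can be accessed as T[2]. A reinterpret_cast between forgeComplex<T> and std::complex<T> relies on this identical layout and works in practice, but it is not formally guaranteed by the standard.
- GPU compatibility: no virtual methods and no exceptions, so it is safe for OpenACC parallel regions and Metal shaders.
- Document key operators and free functions (abs, arg, norm, conj, polar)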
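The layout claim can be checked at compile time. The sketch uses a hypothetical PodComplex<T> in place of forgeComplex<T>. It checks the properties that matter for GPU code: standard layout, trivially copyable, and the same size as std::complex<T>. Otherwise it relies only on the one guarantee the C++ standard gives, that std::complex<T> may be accessed as an array T[2] with the real part first.

```cpp
#include <cassert>
#include <complex>
#include <type_traits>

// Hypothetical stand-in for a GPU-friendly complex type: a plain aggregate, no virtuals.
template <typename T>
struct PodComplex {
    T re;
    T im;
};

static_assert(std::is_standard_layout<PodComplex<float>>::value, "standard layout");
static_assert(std::is_trivially_copyable<PodComplex<float>>::value, "trivially copyable");
static_assert(sizeof(PodComplex<float>) == sizeof(std::complex<float>), "same size");

// Portable conversion through the components, with no type punning.
inline std::complex<float> to_std(const PodComplex<float>& z) { return {z.re, z.im}; }

// Standard-guaranteed array access: element [0] is the real part, [1] the imaginary part.
inline float real_via_array(const std::complex<float>& c) {
    return reinterpret_cast<const float*>(&c)[0];
}
```

Converting through the components keeps host code portable. Code that hands buffers to FFT libraries typically goes through the T[2] array view instead.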

[ ] Step 2: Add usage example
[ ] Step 3: Validate and commit

git add forge/Core/forgeComplex.hpp
git commit -m "docs: enhance forgeComplex.hpp with layout notes, GPU compat, and example"

Task 19: Tier 3 utilities — ForgeLog, IO, FFT, griddingSupport, griddingTypes

Files:
- Modify: forge/Core/ForgeLog.hpp
- Modify: forge/IO/processIsmrmrd.hpp
- Modify: forge/IO/processNIFTI.hpp
- Modify: forge/FFT/fftCPU.h
- Modify: forge/FFT/ftCpu.h
- Modify: forge/FFT/ftCpuWithGrads.h
- Modify: forge/FFT/fftAccelerate.h
- Modify: forge/FFT/fftGPU.h
- Modify: forge/Gridding/griddingSupport.h
- Modify: forge/Core/griddingTypes.h
- Modify: forge/IO/directRecon.h
- Modify: forge/IO/acqTracking.h
- Modify: forge/Solvers/solve_grad_desc.hpp

[ ] Step 1: Enhance each file per spec

For each file, add the items listed in the Tier 3 table of the spec. Key additions:
- ForgeLog.hpp: Progress bar lifecycle (add → tick → done), JSONL message types, example
- processIsmrmrd.hpp: Template function docs, data layout conventions
- processNIFTI.hpp: LPS → RAS coordinate system notes, quaternion conversion ref
- fftCPU.h: FFTW plan notes, memory layout (interleaved real/imag, length $2N$)
- ftCpu.h: LaTeX for DFT kernel formula, cross-ref to Gdft
- ftCpuWithGrads.h: R2* gradient kernel docs, cross-ref to GdftR2
- fftAccelerate.h/fftGPU.h: Minimal (example, cross-refs)
- griddingSupport.h: KB kernel helpers (bessi0, calculateLUT), LaTeX for KB formula, cross-ref to gridding.h
- griddingTypes.h: ReconstructionSample<T1> and parameters<T1> field descriptions
- directRecon.h: Density compensation, sum-of-squares (SoS) coil combination docs
- acqTracking.h: Acquisition tracking data organization
- solve_grad_desc.hpp: Gradient descent objective, cross-ref to solve_pwls_pcg

[ ] Step 2: Validate all

source docs/.venv/bin/activate && rm -rf docs/api/*.md && doxide build && mkdocs build

[ ] Step 3: Commit in groups

git add forge/Core/ForgeLog.hpp forge/Core/griddingTypes.h
git commit -m "docs: enhance Core utility headers (ForgeLog, griddingTypes)"

git add forge/FFT/
git commit -m "docs: enhance FFT headers with math, examples, and cross-refs"

git add forge/IO/ forge/Gridding/griddingSupport.h forge/Solvers/solve_grad_desc.hpp
git commit -m "docs: enhance IO, griddingSupport, and solve_grad_desc headers"

Chunk 4: Tiers 4-6 — Metal, Internals, Guides (autonomous)

Task 20: Tier 4 — Metal shader documentation

Files:
- Modify: forge/Metal/vectorops_metal.metal
- Modify: forge/Metal/MetalVectorOps.h
- Modify: forge/Metal/MetalVectorOps.mm
- Modify: forge/Metal/MetalNufftPipeline.h
- Modify: forge/Metal/MetalNufftPipeline.mm
- Modify: forge/Metal/MetalGridding.h (gridding dispatch wrapper)
- Modify: Any other *.metal files in forge/Metal/ (dft_metal.metal, gridding_metal.metal, nufft_support_metal.metal)

[ ] Step 1: Document Metal compute kernels

For each .metal file:
- Add per-kernel doc comment: inputs, outputs, dispatch semantics
- Data layout: interleaved complex (float2 = {real, imag})
- Threadgroup sizing and workgroup notes
- Cross-ref to the C++ dispatch wrapper

For .h/.mm files:
- Document dispatch strategy (buffer binding, command encoding)
- Error handling patterns
- Cross-ref to forgeCol operator dispatch

[ ] Step 2: Validate and commit

git add forge/Metal/
git commit -m "docs: add Metal shader and pipeline documentation for CUDA porting reference"

Task 21: Tier 5 — Private/internal method pass

Files: - All files already modified in Tiers 1-4

[ ] Step 1: Add one-liner docs to undocumented private methods

Go through each file modified in previous tasks and add brief doc comments to:
- Private helper functions in operator classes
- Internal state management (memory allocation, GPU sync)
- Implementation details in solvers (step-size selection, convergence checks)
- SFINAE/template machinery (detail::has_pgcol_ops, guard functions)

One-liner format: /// Compute the step size via quadratic surrogate approximation.

Do NOT add lengthy documentation — a "what and why" one-liner is sufficient.

[ ] Step 2: Validate and commit

source docs/.venv/bin/activate && rm -rf docs/api/*.md && doxide build && mkdocs build
git add forge/Operators/ forge/Penalties/ forge/Solvers/ forge/Gridding/ forge/Core/ forge/FFT/ forge/IO/ forge/Metal/
git commit -m "docs: add one-liner docs to private/internal methods across all tiers"

Task 22: Tier 6 — Hand-authored guides refresh

Files:
- Modify: docs/guides/operators-and-solvers.md
- Modify: docs/guides/forge-types.md
- Modify: docs/guides/metal-backend.md
- Modify: docs/guides/forgeview.md
- Modify: docs/guides/mpi.md
- Modify: docs/getting-started/building.md
- Modify: docs/getting-started/installation.md
- Modify: docs/getting-started/first-reconstruction.md

[ ] Step 1: Verify each guide against current code

For each file:
1. Read the guide
2. Cross-reference against the current source code (header files, CMakeLists.txt, Boost.ProgramOptions definitions)
3. Fix any stale information:
   - Operator tables that don't match current headers
   - CLI flags that have changed
   - Build commands that have changed
   - Docker image names that have changed
   - Dependency versions that have changed

[ ] Step 2: Update operator table in operators-and-solvers.md

Verify every row in the operator, regularization, and solver tables matches the current headers. Add any missing entries.

[ ] Step 3: Update forge-types.md

Verify operator tables and free function lists against current forgeCol.hpp and forgeMat.hpp.

[ ] Step 4: Update metal-backend.md

Verify kernel list against current Metal shader files. Update build commands. Check phase status (1/2/3).

[ ] Step 5: Validate and commit

source docs/.venv/bin/activate && mkdocs build
git add docs/
git commit -m "docs: refresh hand-authored guides to match current codebase"

Task 23: Final validation

[ ] Step 1: Clean full rebuild

source docs/.venv/bin/activate && rm -rf docs/api/*.md site/
doxide build && mkdocs build

[ ] Step 2: Spot-check rendered pages

Start mkdocs serve and verify:
- [ ] LaTeX math renders in API Reference pages (check solve_pwls_pcg, QuadPenalty, Gdft)
- [ ] @see references appear as formatted citations
- [ ] @code examples render as syntax-highlighted code blocks
- [ ] Cross-references between classes work
- [ ] No @brief or other raw Doxygen commands visible
- [ ] Guides content is current and accurate

[ ] Step 3: Push

git push

Intentionally Deferred Files

These headers exist but are excluded from this pass — their documentation is adequate or they are low-traffic internal files:

forge/Core/forge.h — aggregate header (just includes)
forge/Core/ForgeIncludes.h — internal includes/macros
forge/Core/ForgeExitCodes.hpp — enum definitions, self-documenting
forge/Core/SignalHandler.hpp — small utility, already has comments
forge/Core/Tracer.hpp — NVTX/Instruments wrapper, context clear from usage
forge/Core/AccelerateDispatch.hpp — Apple vDSP dispatch, internal to forgeCol
forge/Core/forgeSubview_Col.hpp — column subview, internal type
forge/FFT/fftshift.hpp — fftshift utility, self-explanatory