Skip to content

forge

Fast, Optimized Reconstruction via Gridding Engine

pg_detail :

Types

Name Description
Robject Abstract base class for regularization penalty functions.
GdftR2 Non-uniform DFT operator extended with R2* (transverse relaxation) gradient maps.
Gnufft Non-uniform Fast Fourier Transform operator using Kaiser-Bessel gridding.
pcSenseTimeSeg Phase-corrected multi-shot SENSE with time-segmented field-map correction.
TimeSegmentation Off-resonance correction via time-segmented interpolation.
Gfft Uniform Cartesian FFT encoding operator.
ReconstructionSample Single k-space sample with trajectory coordinates and density compensation.
SENSE Multi-coil sensitivity encoding (SENSE) operator.
PGProgressManager Singleton manager for stacked progress bar display.
PGImagePreviewContext Singleton storing image dimensions for preview extraction.
pcSENSE Phase-corrected multi-shot SENSE operator.
Tracer RAII wrapper that pushes/pops an NVTX named range for GPU profiling.
SignpostTracer RAII wrapper that emits os_signpost intervals for Apple Instruments.
QuadPenalty Quadratic (Tikhonov) regularization penalty.
NiftiOrientation Orientation metadata for NIfTI output files.
TVPenalty Approximate total-variation (TV) regularization penalty.
Gdft Non-uniform field-corrected discrete Fourier transform (DFT) operator.

Variables

Name Description
dest Query or set the JSONL output destination.
PG_NO_PARENT_BAR Sentinel value indicating no parent progress bar is active.

Operators

Name Description
operator% Real-weight element-wise multiply with a complex vector: out[i] = W[i] · X[i]. W is real (forgeCol<T>), X is complex (forgeCol<forgeComplex<T>>). Dispatches to Accelerate vDSP rvec_cmul on Apple platforms for float. T : Floating-point base type. W : Real weight vector. X : Complex vector, same length as W. Return : Complex vector of element-wise products.
operator/ Scalar divided by each element: out[i] = s / A[i]. T : Element type. s : Scalar numerator. A : Denominator vector. Return : New vector of quotients.

Functions

Name Description
gridCoilImages Grid all coil images from an ISMRMRD dataset.
calcSumOfSquaresImage Compute a sum-of-squares combination image from individual coil images.
metal_dft_create Create a Metal DFT context for Gdft (no gradient correction).
metal_dft_create_with_grads Create a Metal DFT context for GdftR2 (with gradient correction).
metal_dft_forward Forward DFT: image → k-space. ctx : Active Metal DFT context. idata_r : Input image real part (num_i floats). idata_i : Input image imag part (num_i floats). kdata_r : Output k-space real part (num_k floats). kdata_i : Output k-space imag part (num_k floats).
metal_dft_adjoint Adjoint DFT: k-space → image. ctx : Active Metal DFT context. kdata_r : Input k-space real part (num_k floats). kdata_i : Input k-space imag part (num_k floats). idata_r : Output image real part (num_i floats). idata_i : Output image imag part (num_i floats).
metal_dft_destroy Destroy a Metal DFT context and release resources.
metal_nufft_pipeline_create Create a full GPU NUFFT pipeline context.
metal_nufft_pipeline_destroy Destroy a pipeline context and release all GPU resources.
metal_nufft_forward Full forward NUFFT: image → k-space (all on GPU). ctx : Active pipeline context. imageIn : Input image, interleaved complex (2 * NxNyNz floats). samplesOut : Output k-space, interleaved complex (2 * numSamples floats).
metal_nufft_adjoint Full adjoint NUFFT: k-space → image (all on GPU). ctx : Active pipeline context. samplesIn : Input k-space, interleaved complex (2 * numSamples floats). imageOut : Output image, interleaved complex (2 * NxNyNz floats).
calculateLUT Precompute a Kaiser-Bessel kernel lookup table.
kernel_value_LUT Evaluate the Kaiser-Bessel kernel from a precomputed lookup table.
deinterleave_data2d Deinterleave a 2-D interleaved complex buffer into separate real and imaginary arrays.
deinterleave_data3d Deinterleave a 3-D interleaved complex buffer into separate real and imaginary arrays.
deapodization2d Apply 2-D deapodization correction (divide out the KB convolution kernel in image space).
deapodization3d Apply 3-D deapodization correction (divide out the KB convolution kernel in image space).
crop_center_region2d Crop the centre region of a 2-D oversampled grid to the target image size.
crop_center_region3d Crop the centre region of a 3-D oversampled grid to the target image size.
zero_pad2d Zero-pad a 2-D image onto an oversampled grid (inverse of crop_center_region2d).
zero_pad3d Zero-pad a 3-D image onto an oversampled grid (inverse of crop_center_region3d).
circshift2 2-D circular shift by (xshift, yshift) elements.
fftshift2 2-D forward FFT shift (swap quadrants so DC is at centre).
fftshift3 3-D forward FFT shift (swap octants so DC is at centre).
ifftshift2 2-D inverse FFT shift (undo fftshift2).
ifftshift3 3-D inverse FFT shift (undo fftshift3).
fft2shift_grid In-place 2-D FFT shift on a complex grid (std::complex variant).
fft3shift_grid In-place 3-D FFT shift on a complex grid (std::complex variant).
normalize_fft2d Normalise a 2-D interleaved buffer by \(1/(n_x \times n_y)\) after an inverse FFT.
normalize_fft3d Normalise a 3-D interleaved buffer by \(1/(n_x \times n_y \times n_z)\) after an inverse FFT.
ifft2dAccelerate In-place 2D inverse FFT (float, interleaved).
fft3dAccelerate In-place 3D forward FFT (float, interleaved).
ifft3dAccelerate In-place 3D inverse FFT (float, interleaved).
fft2dAccelerate In-place 2D forward FFT (double, interleaved) — delegates to FFTW.
ifft2dAccelerate In-place 2D inverse FFT (double, interleaved) — delegates to FFTW.
fft3dAccelerate In-place 3D forward FFT (double, interleaved) — delegates to FFTW.
ifft3dAccelerate In-place 3D inverse FFT (double, interleaved) — delegates to FFTW.
ifft2dCPU In-place 2-D inverse FFT (single precision).
fft2dCPU In-place 2-D forward FFT (single precision).
ifft3dCPU In-place 3-D inverse FFT (single precision).
fft3dCPU In-place 3-D forward FFT (single precision).
ifft2dCPU In-place 2-D inverse FFT (double precision).
fft1dCPU In-place 1-D forward FFT (double precision).
fft2dCPU In-place 2-D forward FFT (double precision).
ifft3dCPU In-place 3-D inverse FFT (double precision).
fft3dCPU In-place 3-D forward FFT (double precision).
gridding_adjoint_2D 2-D Kaiser-Bessel adjoint gridding (k-space -> oversampled grid).
gridding_adjoint_3D 3-D Kaiser-Bessel adjoint gridding (k-space -> oversampled grid).
gridding_forward_2D 2-D Kaiser-Bessel forward gridding (oversampled grid -> k-space samples).
gridding_forward_3D 3-D Kaiser-Bessel forward gridding (oversampled grid -> k-space samples).
computeFH_CPU_Grid Full adjoint NUFFT pipeline: k-space -> image (grid, IFFT, crop, deapodize). T1 : Floating-point precision type. numK_per_coil : Number of k-space samples per coil. kx : k-space x-coordinates. ky : k-space y-coordinates. kz : k-space z-coordinates. dIn : Input k-space data (interleaved real/imag). Nx : Image size in x. Ny : Image size in y. Nz : Image size in z. gridOS : Grid oversampling factor. kernelWidth : Kaiser-Bessel kernel width. beta : Kaiser-Bessel kernel shape parameter beta. LUT : Precomputed Kaiser-Bessel lookup table. sizeLUT : Number of LUT entries. stream : OpenACC/CUDA stream (NULL for CPU). plan : cuFFT plan pointer (NULL for CPU). pGridData_crop_deAp : Scratch: deapodized cropped image. pGridData_crop_d : Scratch: cropped grid data (GPU). pGridData : Scratch: full oversampled grid. pGridData_d : Scratch: GPU-side oversampled grid.
computeFd_CPU_Grid Full forward NUFFT pipeline: image -> k-space (deapodize, zero-pad, FFT, grid). T1 : Floating-point precision type. numK_per_coil : Number of k-space samples per coil. kx : k-space x-coordinates. ky : k-space y-coordinates. kz : k-space z-coordinates. dIn : Input image data (interleaved real/imag). Nx : Image size in x. Ny : Image size in y. Nz : Image size in z. gridOS : Grid oversampling factor. kernelWidth : Kaiser-Bessel kernel width. beta : Kaiser-Bessel kernel shape parameter beta. LUT : Precomputed Kaiser-Bessel lookup table. sizeLUT : Number of LUT entries. stream : OpenACC/CUDA stream (NULL for CPU). plan : cuFFT plan pointer (NULL for CPU). pGridData : Scratch: full oversampled grid (CPU). pGridData_d : Scratch: GPU-side oversampled grid. pGridData_os : Scratch: zero-padded oversampled grid (CPU). pGridData_os_d : Scratch: GPU-side zero-padded grid. pSamples : Output k-space samples array (interleaved real/imag).
sheppLogan2D Generate a 2D modified Shepp-Logan phantom image.
metal_gridding_create Allocate and initialise a Metal gridding context.
metal_gridding_destroy Destroy a Metal gridding context and release all Metal resources.
metal_gridding_adjoint_2D Adjoint (k-space → image) gridding for a 2D problem.
metal_gridding_adjoint_3D Adjoint (k-space → image) gridding for a 3D problem.
metal_gridding_forward_2D Forward (image → k-space) gridding for a 2D problem.
metal_gridding_forward_3D Forward (image → k-space) gridding for a 3D problem.
PG_TUI_SPAWN_MODE Query or set the forgeview spawn mode.
FORGE_LOG_INIT Initialise the forge logger.
FORGE_LOG_REINIT_CLASSIC Reinitialise the spdlog logger for classic (non-TUI) mode.
FORGE_TUI_START Emit a "start" lifecycle event and optionally auto-spawn forgeview.
FORGE_TUI_EXIT Emit an "exit" lifecycle event and clean up.
PG_PROGRESS Access the singleton progress manager.
PG_PROGRESS_ADD Register a new progress bar with label and max count.
FORGE_PROGRESS_TICK Advance bar at idx, reporting current absolute count.
FORGE_PROGRESS_DONE Mark bar at idx as completed (hidden in classic mode, "done" in TUI).
FORGE_METRICS Emit convergence metrics for a progress bar (TUI mode only).
FORGE_SET_IMAGE_DIMS Set the image dimensions for preview extraction.
FORGE_IMAGE_PREVIEW Emit an image preview for display in forgeview.
stackOfSpiralsTimingVector Generate timing vector for stack-of-spirals readout.
imageCoordinates3D Generate 3D image-space coordinates matching pcSENSE Cube(Nx,Ny,Nz) layout.
randomShotPhase3D Generate random per-shot linear phase ramps (models motion-induced phase).
fftshift 2-D FFT-shift: shift both rows and columns by floor(size/2). Equivalent to calling fftshift(X, 0) then fftshift(X, 1). T1 : Armadillo array type. X : Input 2-D array. Return : 2-D shifted copy of X.
reconSolve High-level iterative MRI image reconstruction via PCG.
memptr Return a raw pointer to the underlying data buffer.
reset_mem Release ownership of internal memory without freeing it (used by move semantics).
set_size Allocate (or reallocate) storage for length elements, freeing any prior allocation.
zeros Set all elements to zero (GPU-aware via OpenACC).
ones Set all elements to one (GPU-aware via OpenACC).
subvec Return a non-owning view of elements [first..last] (inclusive). The view must not outlive this forgeCol (or the underlying memory if this is itself a view).
pg_isnan_float Bit-level NaN detection that works even under -ffast-math / -Ofast.
has_nan Check if any element is NaN.
serialize Boost.Serialization support for MPI communication.
sum Sum all elements of a complex forgeCol (separate real/imag reductions for OpenACC).
sum Sum all elements of a real forgeCol, dispatching to Accelerate vDSP on Apple.
cdot Conjugate dot product of two complex vectors: Σᵢ conj(A[i]) · B[i]. Equivalent to the Hermitian inner product ⟨A, B⟩. Accelerate cblas_cdotc is dispatched on Apple platforms for float (faster than Metal for reductions). T : Floating-point base type. A : First complex vector (conjugated). B : Second complex vector, same length as A. Return : Scalar complex inner product.
norm L2 norm of a complex vector: √(Σᵢ
norm L2 norm of a real vector: √(Σᵢ A[i]²). T : Floating-point type (real only). A : Input real vector. Return : Non-negative scalar norm.
conj Element-wise complex conjugate.
abs Element-wise magnitude of a complex vector: out[i] =
real Extract real parts of a complex vector.
imag Extract imaginary parts of a complex vector.
accu_schur Element-wise dot product without conjugation: Σᵢ A[i] · B[i]. Unlike cdot, A is not conjugated. Used in the norm_grad denominator to compute yᵀ · (W ⊙ y) (the non-Hermitian form). T : Floating-point base type. A : First complex vector (not conjugated). B : Second complex vector, same length as A. Return : Scalar complex dot product.
accu_schur Element-wise dot product for real vectors: Σᵢ A[i] · B[i]. T : Floating-point type (real only; SFINAE-excluded for complex). A : First real vector. B : Second real vector, same length as A. Return : Scalar dot product.
norm_grad Scale-invariant gradient magnitude for PCG convergence testing.
to_complex Lift a real vector to complex with zero imaginary part.
log1p Element-wise log(1 + x) for a real vector.
sqrt Element-wise square root for a real vector.
square Element-wise square of a real vector: out[i] = A[i]². T : Floating-point type (real only). A : Input real vector. Return : New vector of squared values.
accu Sum all elements of a real vector (alias for sum).
abs Element-wise absolute value of a real vector.
openISMRMRDData Open an ISMRMRD dataset and initialise bookkeeping structures.
closeISMRMRDData Close an ISMRMRD dataset and free associated resources.
convertFromNDArrayToArma Convert an ISMRMRD NDArray to an Armadillo column vector (zero-copy view).
getISMRMRDFieldMap Read the off-resonance field map (rad/s) from the "FieldMap" NDArray.
getISMRMRDSenseMap Read the multi-coil SENSE sensitivity maps from the "SENSEMap" NDArray.
getISMRMRDPhaseMaps Read the shot-by-shot phase maps from the "PhaseMaps" NDArray.
getISMRMRDTemporalBasis Read the temporal basis vectors ("v" NDArray) for low-rank reconstruction.
getISMRMRDCompletePhaseMap Extract the phase map sub-volume for a specific acquisition combination.
getISMRMRDCompleteSENSEMap Extract the SENSE map sub-volume for a single slice from the full stacked maps.
getISMRMRDCompleteFieldMap Extract the field map sub-volume for a single slice from the full stacked maps.
processISMRMRDInput One-shot convenience: open dataset, read field map and SENSE map.
extractNiftiOrientation Extract NIfTI orientation metadata from an ISMRMRD dataset.
getISMRMRDAcqData Read a single acquisition's k-space data and trajectory from the dataset.
writeISMRMRDImageData Write a reconstructed complex image back into the ISMRMRD dataset.
getCompleteISMRMRDAcqData Read all acquisitions for a given (slice, rep, avg, echo, phase) combination.
metal_vector_get_context Get or create the process-wide singleton context.
metal_rvec_cmul Real weight * complex: C[i] = W[i] * X[i]. W has n floats; X and C have 2*n floats.
metal_cvec_axpy Complex AXPY: C[i] = A[i] + (alphaRe + i*alphaIm) * B[i].
metal_cvec_mul_scalar Complex scalar multiply: C[i] = (alphaRe + i*alphaIm) * A[i].
metal_cvec_cdot Complex dot product: sum(conj(A[i]) * B[i]).
metal_cvec_norm2sq L2 norm squared of complex vector: sum(
metal_vec_sum Sum of real float vector.
metal_vecops_dispatch_count Total number of Metal command buffer commits (vector ops only, not gridding).
metal_vecops_wait_seconds Cumulative time in seconds spent in waitUntilCompleted (vector ops only).
metal_vecops_reset_stats Reset both counters to zero.
ftCpu Non-uniform forward DFT kernel (CPU / OpenACC).
iftCpu Non-uniform adjoint DFT kernel (CPU / OpenACC).
brainPhantom3D Generate a 3D multi-compartment brain-like phantom.
spiralTrajectory2D Generate a 2D Archimedean spiral k-space trajectory.
imageCoordinates2D Generate image-space coordinates matching forge convention.
linearTimingVector Generate a linear timing vector for a single readout.
forge_signal_handler SIGINT handler: sets g_should_stop to true. Signal-safe: only sets an atomic flag; no logging or I/O.
forge_install_signal_handlers Install signal handlers for forge executables.
rotationMatrixToQuaternion Convert a 3x3 rotation matrix to a NIfTI quaternion (a, b, c, d).
writeNiftiRealImage Write a real-valued image volume to a NIfTI-1 (.nii) file. filename : Base filename (without .nii extension, which is appended automatically). imageData : Flat column vector of image voxels (column-major, NxNyNzNslabN_TR). Nx : Image size along readout. Ny : Image size along phase-encode. Nz : Image size along slice/partition (default 1 for 2-D). Nslab : Number of slabs (default 1). N_TR : Number of time frames (default 1). orient : Optional NiftiOrientation with voxel sizes and direction cosines.
writeNiftiMagPhsImage Write a complex-valued image as two NIfTI files: magnitude (_mag) and phase (_phs).
reset_mem Release ownership of internal memory without freeing it (used by move semantics).
set_size Allocate (or reallocate) storage for nCols x nRows elements.
zeros Set all elements to zero (GPU-aware via OpenACC).
ones Set all elements to one (GPU-aware via OpenACC).
getArma Const version — returns a non-owning zero-copy arma view (no OpenACC update).
getArmaComplex Return a zero-copy arma::Mat view for forgeMat> types. Since forgeComplex and std::complex have identical memory layout, this creates a non-owning arma view of the forgeMat's data.
col Return a non-owning view into column colIndx.
col Const version of column view.
col_copy Deep-copy a column (returns an owning forgeCol).
set_col Write a forgeCol into column colIndx (memcpy).
serialize Boost.Serialization support for MPI communication.
sum Sum a complex forgeMat along the given dimension (0=column-wise, 1=row-wise).
sum Sum a real forgeMat along the given dimension (0=column-wise, 1=row-wise).
vectorise Flatten a forgeMat into a single forgeCol in column-major order.
conj Element-wise complex conjugate of a forgeMat.
matmul Matrix-matrix multiply for complex forgeMat (conjugate-transpose NOT applied).
matmul Matrix-vector multiply for complex forgeMat × forgeCol.
matmul Matrix-matrix multiply for real forgeMat.
trans Conjugate transpose of a complex forgeMat.
strans Non-conjugate (simple) transpose of a complex forgeMat.
trans Transpose of a real forgeMat.
reshape Reshape a forgeMat to new dimensions (column-major order preserved).
cols Extract a deep-copy column subrange [first, last] (inclusive).
ifft2dGPU In-place 2-D inverse FFT on GPU (single precision).
fft2dGPU In-place 2-D forward FFT on GPU (single precision).
ifft3dGPU In-place 3-D inverse FFT on GPU (single precision).
fft3dGPU In-place 3-D forward FFT on GPU (single precision).
fft1dGPU In-place 1-D forward FFT on GPU (double precision).
ifft2dGPU In-place 2-D inverse FFT on GPU (double precision).
fft2dGPU In-place 2-D forward FFT on GPU (double precision).
ifft3dGPU In-place 3-D inverse FFT on GPU (double precision).
fft3dGPU In-place 3-D forward FFT on GPU (double precision).
dot_double Element-wise dot product (without conjugation) of two complex vectors.
norm_grad Normalized gradient magnitude for convergence testing.
solve_pwls_pcg Solve a penalized weighted least-squares problem via preconditioned CG.
ftCpuWithGrads Non-uniform forward DFT kernel with R2* field-map gradients (CPU / OpenACC).
iftCpuWithGrads Non-uniform adjoint DFT kernel with R2* field-map gradients (CPU / OpenACC).

Variable Details

PG_NO_PARENT_BAR

constexpr size_t PG_NO_PARENT_BAR

Sentinel value indicating no parent progress bar is active.

dest

static FILE* dest

Query or set the JSONL output destination.

Default is stderr. Set to stdout or a FILE* opened from a path. All JSONL-emitting code (spdlog sink + direct emitters) uses this.

Operator Details

operator%

template <typename T> forgeCol<forgeComplex<T>> operator%(const forgeCol<T>& W, const forgeCol<forgeComplex<T>>& X)

Real-weight element-wise multiply with a complex vector: out[i] = W[i] · X[i].

W is real (forgeCol<T>), X is complex (forgeCol<forgeComplex<T>>). Dispatches to Accelerate vDSP rvec_cmul on Apple platforms for float.

T : Floating-point base type.

W : Real weight vector.

X : Complex vector, same length as W.

Return : Complex vector of element-wise products.

operator/

template <typename T> forgeCol<T> operator/(const T& s, const forgeCol<T>& A)

Scalar divided by each element: out[i] = s / A[i].

T : Element type.

s : Scalar numerator.

A : Denominator vector.

Return : New vector of quotients.

Function Details

FORGE_IMAGE_PREVIEW

inline void FORGE_IMAGE_PREVIEW(size_t bar_id, size_t iter, const std::vector<pg_detail::ImagePlane>& planes)

Emit an image preview for display in forgeview.

In TUI mode: base64-encodes each plane and emits JSONL with a "planes" array. In classic mode: no-op (images aren't displayable on a scrolling terminal).

bar_id : Bar index from PG_PROGRESS_ADD.

iter : Current iteration number (1-based).

planes : Vector of image planes to send.

FORGE_LOG_INIT

inline void FORGE_LOG_INIT(const std::string& level_str = "info", const std::string& log_path = "forge.log", bool no_jsonl = false, const std::string& jsonl_output = "stderr")

Initialise the forge logger.

level_str : Log level string ("debug", "info", "warn", "error").

log_path : Path to the rotating log file (always human-readable).

no_jsonl : If true, disable JSONL and use classic spdlog+indicators.

jsonl_output : Where to send JSONL: "stderr" (default), "stdout", or a file path.

FORGE_LOG_REINIT_CLASSIC

inline void FORGE_LOG_REINIT_CLASSIC(const std::string& log_path = "forge.log")

Reinitialise the spdlog logger for classic (non-TUI) mode.

Called when forgeview fails to spawn, to swap the JSONL sink for a colored stderr sink so the user sees human-readable output instead of raw JSON.

FORGE_METRICS

inline void FORGE_METRICS(size_t bar_id, size_t iter, double error_norm, double penalty)

Emit convergence metrics for a progress bar (TUI mode only).

bar_id : Bar index from PG_PROGRESS_ADD.

iter : Current iteration number (1-based).

error_norm : Data-fidelity error norm (||yi - Ax||).

penalty : Roughness penalty value (R.Penalty(x)).

FORGE_PROGRESS_DONE

inline void FORGE_PROGRESS_DONE(size_t idx, const std::string& error = "")

Mark bar at `idx` as completed (hidden in classic mode, "done" in TUI).

idx : Bar index from PG_PROGRESS_ADD.

error : Optional error/cancellation message to include in the JSONL done event.

FORGE_PROGRESS_TICK

inline void FORGE_PROGRESS_TICK(size_t idx, size_t current, const std::string& postfix = "")

Advance bar at idx, reporting current absolute count.

idx : Bar index from PG_PROGRESS_ADD.

current : Absolute progress count (1-based, for ETA computation).

postfix : Optional postfix text.

FORGE_SET_IMAGE_DIMS

inline void FORGE_SET_IMAGE_DIMS(size_t nx, size_t ny, size_t nz = 1)

Set the image dimensions for preview extraction.

Call this in the app before entering the solver loop.

FORGE_TUI_EXIT

inline void FORGE_TUI_EXIT(int code)

Emit an "exit" lifecycle event and clean up.

Case 1 — !PG_TUI_MODE(): classic info log only.

Cases 2 & 3 — PG_TUI_MODE(): emit JSONL exit to PG_JSONL_DEST(), then if forgeview is active (tui.active), wait for the user to quit the TUI. After exit, if the JSONL destination is a file (not stderr or stdout), shuts down spdlog and fcloses the file.

FORGE_TUI_START

inline void FORGE_TUI_START(const std::string& app_name, const std::string& version = FORGE_VERSION_STRING)

Emit a "start" lifecycle event and optionally auto-spawn forgeview.

Three cases depending on PG_TUI_MODE() and PG_TUI_SPAWN_MODE():

Case 1 — !PG_TUI_MODE(): classic info log only (no JSONL, no forgeview).

Case 2 — PG_TUI_MODE() && !PG_TUI_SPAWN_MODE(): emit JSONL start to PG_JSONL_DEST() without spawning forgeview (piped/file output mode).

Case 3 — PG_TUI_MODE() && PG_TUI_SPAWN_MODE() (default): try to locate and fork forgeview, redirect stderr to it, then emit JSONL start. Falls back to classic mode if forgeview is not found or stdout is not a terminal.

PG_PROGRESS

inline PGProgressManager& PG_PROGRESS()

Access the singleton progress manager.

PG_PROGRESS_ADD

inline size_t PG_PROGRESS_ADD( std::shared_ptr<PGProgressBar> bar, const std::string& label = "", size_t max_progress = 0)

Register a new progress bar with label and max count.

In TUI mode, emits a JSONL "add" message. In classic mode, adds to indicators. Returns an index for use with FORGE_PROGRESS_TICK / FORGE_PROGRESS_DONE.

PG_TUI_SPAWN_MODE

inline bool& PG_TUI_SPAWN_MODE(bool set_value = true, bool do_set = false)

Query or set the forgeview spawn mode.

Default is true (spawn enabled). Set to false via --no-tui to disable forgeview spawning while keeping JSONL output active.

abs

template <typename T> forgeCol<T> abs(const forgeCol<forgeComplex<T>>& A)

Element-wise magnitude of a complex vector: out[i] = |A[i]|.

T : Floating-point base type.

A : Input complex vector.

Return : Real vector of magnitudes, same length as A.

template <typename T, typename std::enable_if< !std::is_same<T, forgeComplex<float>>::value && !std::is_same<T, forgeComplex<double>>::value, int>::type = 0> forgeCol<T> abs(const forgeCol<T>& A)

Element-wise absolute value of a real vector.

T : Floating-point type (real only).

A : Input real vector.

Return : New vector with std::abs(A[i]) at each position.

accu

template <typename T, typename std::enable_if< !std::is_same<T, forgeComplex<float>>::value && !std::is_same<T, forgeComplex<double>>::value, int>::type = 0> T accu(const forgeCol<T>& A)

Sum all elements of a real vector (alias for `sum`).

T : Floating-point type (real only).

A : Input real vector.

Return : Scalar sum of all elements.

accu_schur

template <typename T> forgeComplex<T> accu_schur(const forgeCol<forgeComplex<T>>& A, const forgeCol<forgeComplex<T>>& B)

Element-wise dot product without conjugation: Σᵢ A[i] · B[i].

Unlike cdot, A is not conjugated. Used in the norm_grad denominator to compute yᵀ · (W ⊙ y) (the non-Hermitian form).

T : Floating-point base type.

A : First complex vector (not conjugated).

B : Second complex vector, same length as A.

Return : Scalar complex dot product.

template <typename T, typename std::enable_if< !std::is_same<T, forgeComplex<float>>::value && !std::is_same<T, forgeComplex<double>>::value, int>::type = 0> T accu_schur(const forgeCol<T>& A, const forgeCol<T>& B)

Element-wise dot product for real vectors: Σᵢ A[i] · B[i].

T : Floating-point type (real only; SFINAE-excluded for complex).

A : First real vector.

B : Second real vector, same length as A.

Return : Scalar dot product.

brainPhantom3D

template <typename T1> arma::Col<std::complex<T1>> brainPhantom3D(arma::uword Nx, arma::uword Ny, arma::uword Nz)

Generate a 3D multi-compartment brain-like phantom.

Compartments (painted in order, later overwrites earlier): 1. Skull shell (0.20) 2. Gray matter (0.80) 3. White matter core (1.00) 4. Left lateral ventricle (0.30) 5. Right lateral ventricle (0.30)

Nx : Image width

Ny : Image height

Nz : Number of slices

Return : Vectorized complex column (NxNyNz), matching pcSENSE Cube layout

calcSumOfSquaresImage

template <typename T> forgeCol<T> calcSumOfSquaresImage( forgeCol<forgeComplex<T>> coilImages, uword Nx, uword Nz, uword NSliceMax, uword NCoils)

Compute a sum-of-squares combination image from individual coil images.

T : Floating-point precision type (float or double).

coilImages : Stacked coil images, length NxNzNSliceMax*NCoils.

Nx : In-plane image size (pixels per slice).

Nz : Number of z-partitions.

NSliceMax : Maximum number of slices.

NCoils : Number of receiver coils.

Return : Sum-of-squares combined image, length NxNzNSliceMax.

calculateLUT

template <typename T1> void calculateLUT(T1 beta, T1 width, T1*& LUT, uword& sizeLUT)

Precompute a Kaiser-Bessel kernel lookup table.

Evaluates the KB kernel on a uniform grid of sizeLUT points over [0, width/2] and stores the result. Caller takes ownership of the allocated LUT array.

T1 : Floating-point type.

beta : Kaiser-Bessel shape parameter beta.

width : Kernel half-width.

LUT : Output: pointer to newly allocated LUT array.

sizeLUT : Output: number of entries in the LUT.

cdot

template <typename T> forgeComplex<T> cdot(const forgeCol<forgeComplex<T>>& A, const forgeCol<forgeComplex<T>>& B)

Conjugate dot product of two complex vectors: Σᵢ conj(A[i]) · B[i].

Equivalent to the Hermitian inner product ⟨A, B⟩. Accelerate cblas_cdotc is dispatched on Apple platforms for float (faster than Metal for reductions).

T : Floating-point base type.

A : First complex vector (conjugated).

B : Second complex vector, same length as A.

Return : Scalar complex inner product.

circshift2

template <typename T> void circshift2(T* __restrict pDst, const T* __restrict pSrc, int xdim, int ydim, int xshift, int yshift)

2-D circular shift by (xshift, yshift) elements.

closeISMRMRDData

inline void closeISMRMRDData(ISMRMRD::Dataset*& d, ISMRMRD::IsmrmrdHeader& hdr, acqTracking*& acqTrack)

Close an ISMRMRD dataset and free associated resources.

col

forgeCol<T> col(const uword colIndx)

Return a non-owning view into column colIndx.
The returned forgeCol wraps the matrix's memory directly — writes through
the view modify the matrix (e.g., mat.col(ii) %= d).
The view must not outlive the matrix.

forgeCol<T> col(const uword colIndx) const

Const version of column view.

col_copy

forgeCol<T> col_copy(const uword colIndx) const

Deep-copy a column (returns an owning forgeCol).
Use this when you need the column to outlive the matrix.

cols

template <typename T> forgeMat<T> cols(const forgeMat<T>& A, uword first, uword last)

Extract a deep-copy column subrange [first, last] (inclusive).

computeFH_CPU_Grid

template <typename T1> void computeFH_CPU_Grid(int numK_per_coil, const T1* __restrict kx, const T1* __restrict ky, const T1* __restrict kz, const T1* __restrict dIn, int Nx, int Ny, int Nz, T1 gridOS, const T1 kernelWidth, const T1 beta, const T1* LUT, const uword sizeLUT, void* stream, CFTHandle* plan, T1* pGridData_crop_deAp, T1* pGridData_crop_d, T1* pGridData, T1* pGridData_d)

Full adjoint NUFFT pipeline: k-space -> image (grid, IFFT, crop, deapodize).

T1 : Floating-point precision type.

numK_per_coil : Number of k-space samples per coil.

kx : k-space x-coordinates.

ky : k-space y-coordinates.

kz : k-space z-coordinates.

dIn : Input k-space data (interleaved real/imag).

Nx : Image size in x.

Ny : Image size in y.

Nz : Image size in z.

gridOS : Grid oversampling factor.

kernelWidth : Kaiser-Bessel kernel width.

beta : Kaiser-Bessel kernel shape parameter beta.

LUT : Precomputed Kaiser-Bessel lookup table.

sizeLUT : Number of LUT entries.

stream : OpenACC/CUDA stream (NULL for CPU).

plan : cuFFT plan pointer (NULL for CPU).

pGridData_crop_deAp : Scratch: deapodized cropped image.

pGridData_crop_d : Scratch: cropped grid data (GPU).

pGridData : Scratch: full oversampled grid.

pGridData_d : Scratch: GPU-side oversampled grid.

computeFd_CPU_Grid

template <typename T1> void computeFd_CPU_Grid(int numK_per_coil, const T1* __restrict kx, const T1* __restrict ky, const T1* __restrict kz, const T1* __restrict dIn, int Nx, int Ny, int Nz, T1 gridOS, const T1 kernelWidth, const T1 beta, const T1* LUT, const uword sizeLUT, void* stream, CFTHandle* plan, T1* pGridData, T1* pGridData_d, T1* pGridData_os, T1* pGridData_os_d, T1* pSamples)

Full forward NUFFT pipeline: image -> k-space (deapodize, zero-pad, FFT, grid).

T1 : Floating-point precision type.

numK_per_coil : Number of k-space samples per coil.

kx : k-space x-coordinates.

ky : k-space y-coordinates.

kz : k-space z-coordinates.

dIn : Input image data (interleaved real/imag).

Nx : Image size in x.

Ny : Image size in y.

Nz : Image size in z.

gridOS : Grid oversampling factor.

kernelWidth : Kaiser-Bessel kernel width.

beta : Kaiser-Bessel kernel shape parameter beta.

LUT : Precomputed Kaiser-Bessel lookup table.

sizeLUT : Number of LUT entries.

stream : OpenACC/CUDA stream (NULL for CPU).

plan : cuFFT plan pointer (NULL for CPU).

pGridData : Scratch: full oversampled grid (CPU).

pGridData_d : Scratch: GPU-side oversampled grid.

pGridData_os : Scratch: zero-padded oversampled grid (CPU).

pGridData_os_d : Scratch: GPU-side zero-padded grid.

pSamples : Output k-space samples array (interleaved real/imag).

conj

template <typename T> forgeCol<forgeComplex<T>> conj(const forgeCol<forgeComplex<T>>& A)

Element-wise complex conjugate.

T : Floating-point base type.

A : Input complex vector.

Return : New vector with conj(A[i]) at each position.

template <typename T> forgeMat<forgeComplex<T>> conj(const forgeMat<forgeComplex<T>>& pgA)

Element-wise complex conjugate of a forgeMat.

convertFromNDArrayToArma

template <typename T1> arma::Col<T1> convertFromNDArrayToArma(ISMRMRD::NDArray<T1>& inArray)

Convert an ISMRMRD NDArray to an Armadillo column vector (zero-copy view).

crop_center_region2d

template <typename T1> void crop_center_region2d( T1* __restrict pDst, T1* __restrict pSrc, int imageSizeX, int imageSizeY, int gridSizeX, int gridSizeY)

Crop the centre region of a 2-D oversampled grid to the target image size.

crop_center_region3d

template <typename T1> void crop_center_region3d(T1* __restrict pDst, T1* __restrict pSrc, int imageSizeX, int imageSizeY, int imageSizeZ, int gridSizeX, int gridSizeY, int gridSizeZ)

Crop the centre region of a 3-D oversampled grid to the target image size.

deapodization2d

template <typename T1> void deapodization2d( T1* __restrict pDst, T1* __restrict pSrc, int imageX, int imageY, T1 kernelWidth, T1 beta, T1 gridOS)

Apply 2-D deapodization correction (divide out the KB convolution kernel in image space).

deapodization3d

template <typename T1> void deapodization3d( T1* __restrict pDst, T1* __restrict pSrc, int imageX, int imageY, int imageZ, T1 kernelWidth, T1 beta, T1 gridOS)

Apply 3-D deapodization correction (divide out the KB convolution kernel in image space).

deinterleave_data2d

template <typename T1> void deinterleave_data2d(T1* __restrict pSrc, T1* __restrict outR_d, T1* __restrict outI_d, int imageX, int imageY)

Deinterleave a 2-D interleaved complex buffer into separate real and imaginary arrays.

deinterleave_data3d

template <typename T1> void deinterleave_data3d( T1* __restrict pSrc, T1* __restrict outR_d, T1* __restrict outI_d, int imageX, int imageY, int imageZ)

Deinterleave a 3-D interleaved complex buffer into separate real and imaginary arrays.

dot_double

template <typename T1> inline complex<T1> dot_double(const Col<complex<T1>>& A, const Col<complex<T1>>& B)

Element-wise dot product (without conjugation) of two complex vectors.

Computes Σ_k A[k] · B[k] (no complex conjugate on A).

T1 : Floating-point base type.

A : First complex vector.

B : Second complex vector, same length as A.

Return : Scalar dot product.

extractNiftiOrientation

inline NiftiOrientation extractNiftiOrientation(const ISMRMRD::IsmrmrdHeader& hdr, ISMRMRD::Dataset* d, arma::uword Nx, arma::uword Ny, arma::uword Nz, acqTracking* acqTrack = nullptr, arma::uword sliceIdx = 0)

Extract NIfTI orientation metadata from an ISMRMRD dataset.

Reads voxel sizes from the reconstructed (or encoded) space FOV and image matrix dimensions, and reads direction cosines and position from the first relevant acquisition. Direction cosines are converted from ISMRMRD LPS convention to NIfTI RAS convention by negating rows 0 and 1.

hdr : Parsed ISMRMRD header.

d : Pointer to the open ISMRMRD dataset.

Nx : Image matrix size along readout direction.

Ny : Image matrix size along phase-encode direction.

Nz : Image matrix size along slice/partition direction.

acqTrack : Optional acquisition tracker; when provided, uses the acquisition for the given sliceIdx.

sliceIdx : Slice index used when acqTrack is non-null.

Return : A populated NiftiOrientation struct with valid == true on success.

fft1dCPU

template <typename T1, typename std::enable_if<std::is_same<T1, double>::value, uword>::type = 0> void fft1dCPU(T1* d_data, uword nx)

In-place 1-D forward FFT (double precision).

T1 : Must be double.

d_data : Pointer to interleaved real/imag double array of length 2·nx.

nx : Number of complex samples.

fft1dGPU

template <typename T1, typename std::enable_if<std::is_same<T1, double>::value, int>::type = 0> void fft1dGPU(T1* d_data, int nx, void* stream, cufftHandle* plan)

In-place 1-D forward FFT on GPU (double precision).

T1 : Must be double.

d_data : GPU pointer to interleaved real/imag double array, length 2·nx.

nx : Number of complex samples.

stream : OpenACC/CUDA stream pointer.

plan : Pointer to a pre-allocated cuFFT plan.

fft2dAccelerate

void fft2dAccelerate(double* d_data, uword nx, uword ny)

In-place 2D forward FFT (double, interleaved) — delegates to FFTW.

fft2dCPU

template <typename T1, typename std::enable_if<std::is_same<T1, float>::value, uword>::type = 0> void fft2dCPU(T1* d_data, uword nx, uword ny)

In-place 2-D forward FFT (single precision).

T1 : Must be float.

d_data : Pointer to interleaved real/imag float array of length 2·nx·ny.

nx : Transform size in x.

ny : Transform size in y.

template <typename T1, typename std::enable_if<std::is_same<T1, double>::value, uword>::type = 0> void fft2dCPU(T1* d_data, uword nx, uword ny)

In-place 2-D forward FFT (double precision).

T1 : Must be double.

d_data : Pointer to interleaved real/imag double array of length 2·nx·ny.

nx : Transform size in x.

ny : Transform size in y.

fft2dGPU

template <typename T1, typename std::enable_if<std::is_same<T1, float>::value, int>::type = 0> void fft2dGPU(T1* d_data, int nx, int ny, void* stream, cufftHandle* plan)

In-place 2-D forward FFT on GPU (single precision).

T1 : Must be float.

d_data : GPU pointer to interleaved real/imag float array, length 2·nx·ny.

nx : Transform size in x.

ny : Transform size in y.

stream : OpenACC/CUDA stream pointer.

plan : Pointer to a pre-allocated cuFFT plan.

template <typename T1, typename std::enable_if<std::is_same<T1, double>::value, int>::type = 0> void fft2dGPU(T1* d_data, int nx, int ny, void* stream, cufftHandle* plan)

In-place 2-D forward FFT on GPU (double precision).

T1 : Must be double.

d_data : GPU pointer to interleaved real/imag double array, length 2·nx·ny.

nx : Transform size in x.

ny : Transform size in y.

stream : OpenACC/CUDA stream pointer.

plan : Pointer to a pre-allocated cuFFT plan.

fft2shift_grid

template <typename T1> void fft2shift_grid(std::complex<T1>* __restrict src, int dimY, int dimX)

In-place 2-D FFT shift on a complex grid (std::complex variant).

fft3dAccelerate

void fft3dAccelerate(float* d_data, uword nx, uword ny, uword nz)

In-place 3D forward FFT (float, interleaved).

void fft3dAccelerate(double* d_data, uword nx, uword ny, uword nz)

In-place 3D forward FFT (double, interleaved) — delegates to FFTW.

fft3dCPU

template <typename T1, typename std::enable_if<std::is_same<T1, float>::value, uword>::type = 0> void fft3dCPU(T1* d_data, uword nx, uword ny, uword nz)

In-place 3-D forward FFT (single precision).

T1 : Must be float.

d_data : Pointer to interleaved real/imag float array of length 2·nx·ny·nz.

nx : Transform size in x.

ny : Transform size in y.

nz : Transform size in z.

template <typename T1, typename std::enable_if<std::is_same<T1, double>::value, uword>::type = 0> void fft3dCPU(T1* d_data, uword nx, uword ny, uword nz)

In-place 3-D forward FFT (double precision).

T1 : Must be double.

d_data : Pointer to interleaved real/imag double array of length 2·nx·ny·nz.

nx : Transform size in x.

ny : Transform size in y.

nz : Transform size in z.

fft3dGPU

template <typename T1, typename std::enable_if<std::is_same<T1, float>::value, int>::type = 0> void fft3dGPU(T1* d_data, int nx, int ny, int nz, void* stream, cufftHandle* plan)

In-place 3-D forward FFT on GPU (single precision).

T1 : Must be float.

d_data : GPU pointer to interleaved real/imag float array.

nx : Transform size in x.

ny : Transform size in y.

nz : Transform size in z.

stream : OpenACC/CUDA stream pointer.

plan : Pointer to a pre-allocated cuFFT plan.

template <typename T1, typename std::enable_if<std::is_same<T1, double>::value, int>::type = 0> void fft3dGPU(T1* d_data, int nx, int ny, int nz, void* stream, cufftHandle* plan)

In-place 3-D forward FFT on GPU (double precision).

T1 : Must be double.

d_data : GPU pointer to interleaved real/imag double array.

nx : Transform size in x.

ny : Transform size in y.

nz : Transform size in z.

stream : OpenACC/CUDA stream pointer.

plan : Pointer to a pre-allocated cuFFT plan.

fft3shift_grid

template <typename T1> void fft3shift_grid(std::complex<T1>* __restrict src, int dimY, int dimX, int dimZ)

In-place 3-D FFT shift on a complex grid (std::complex variant).

fftshift

template <typename T1> arma_inline T1 fftshift(T1 X)

2-D FFT-shift: shift both rows and columns by floor(size/2).

Equivalent to calling fftshift(X, 0) then fftshift(X, 1).

T1 : Armadillo array type.

X : Input 2-D array.

Return : 2-D shifted copy of X.

fftshift2

template <typename T> void fftshift2(T* __restrict out, const T* __restrict in, int xdim, int ydim)

2-D forward FFT shift (swap quadrants so DC is at centre).

fftshift3

template <typename T> void fftshift3(T* __restrict out, const T* __restrict in, int xdim, int ydim, int zdim)

3-D forward FFT shift (swap octants so DC is at centre).

forge_install_signal_handlers

inline void forge_install_signal_handlers()

Install signal handlers for forge executables.

Installs forge_signal_handler for SIGINT and ensures SIGPIPE is ignored (SIGPIPE is already SIG_IGN'd by ForgeLog.hpp when TUI mode spawns forgeview, but this is safe to call independently).

forge_signal_handler

inline void forge_signal_handler(int /*sig*/)

SIGINT handler: sets g_should_stop to true.

Signal-safe: only sets an atomic flag; no logging or I/O.

ftCpu

template <typename T1> void ftCpu(T1* kdata_r, T1* kdata_i, const T1* idata_r, const T1* idata_i, const T1* kx, const T1* ky, const T1* kz, const T1* ix, const T1* iy, const T1* iz, const T1* FM, const T1* t, const int num_k, const int num_i)

Non-uniform forward DFT kernel (CPU / OpenACC).

Computes kdata[j] = Sigma_k (idata_r[k] + iidata_i[k]) * exp(-i2pi(kx[j]ix[k] + ky[j]iy[k] + kz[j]iz[k]) - iFM[k]*t[j]).

T1 : Floating-point precision type.

kdata_r : Output k-space real part array, length num_k.

kdata_i : Output k-space imaginary part array, length num_k.

idata_r : Input image real part array, length num_i.

idata_i : Input image imaginary part array, length num_i.

kx : k-space x-coordinates, length num_k.

ky : k-space y-coordinates, length num_k.

kz : k-space z-coordinates, length num_k.

ix : Image-space x-coordinates, length num_i.

iy : Image-space y-coordinates, length num_i.

iz : Image-space z-coordinates, length num_i.

FM : Off-resonance field map (rad/s), length num_i.

t : Per-sample readout time (s), length num_k.

num_k : Number of k-space samples.

num_i : Number of image pixels.

ftCpuWithGrads

template <typename T1> void ftCpuWithGrads(T1* kdata_r, T1* kdata_i, const T1* idata_r, const T1* idata_i, const T1* kx, const T1* ky, const T1* kz, const T1* ix, const T1* iy, const T1* iz, const T1* FM, const T1* Gx, const T1* Gy, const T1* Gz, const T1* t, const int num_k, const int num_i, const int num_x, const int num_y, const int num_z)

Non-uniform forward DFT kernel with R2* field-map gradients (CPU / OpenACC).

Extends ftCpu by incorporating spatially-varying R2 decay via gradient field maps Gx, Gy, Gz. Computes: kdata[j] = Sigma_k image[k] * exp(-i2pikx - iFM[k]t[j] - R2(k)t[j]) where R2(k) is interpolated from the gradient maps.

T1 : Floating-point precision type.

kdata_r : Output k-space real part array, length num_k.

kdata_i : Output k-space imaginary part array, length num_k.

idata_r : Input image real part array, length num_i.

idata_i : Input image imaginary part array, length num_i.

kx : k-space x-coordinates, length num_k.

ky : k-space y-coordinates, length num_k.

kz : k-space z-coordinates, length num_k.

ix : Image-space x-coordinates, length num_i.

iy : Image-space y-coordinates, length num_i.

iz : Image-space z-coordinates, length num_i.

FM : Off-resonance field map (rad/s), length num_i.

Gx : R2* gradient in x, length num_x.

Gy : R2* gradient in y, length num_y.

Gz : R2* gradient in z, length num_z.

t : Per-sample readout time (s), length num_k.

num_k : Number of k-space samples.

num_i : Number of image pixels.

num_x : Number of R2* gradient samples in x.

num_y : Number of R2* gradient samples in y.

num_z : Number of R2* gradient samples in z.

getArma

arma::Mat<T> getArma() const

Const version — returns a non-owning zero-copy arma view (no OpenACC update).

getArmaComplex

template <typename U = T, typename std::enable_if< std::is_same<U, forgeComplex<float>>::value || std::is_same<U, forgeComplex<double>>::value, int>::type = 0> arma::Mat<std::complex<typename U::value_type>> getArmaComplex() const

Return a zero-copy arma::Mat view for forgeMat<forgeComplex<T>> types.
Since forgeComplex<T> and std::complex<T> have identical memory layout,
this creates a non-owning arma view of the forgeMat's data.

getCompleteISMRMRDAcqData

template <typename T1> void getCompleteISMRMRDAcqData(ISMRMRD::Dataset* d, acqTracking* acqTrack, uword NSlice, uword NRep, uword NAve, uword NEcho, uword NPhase, forgeCol<forgeComplex<T1>>& data, forgeCol<T1>& kx, forgeCol<T1>& ky, forgeCol<T1>& kz, forgeCol<T1>& tvec)

Read all acquisitions for a given (slice, rep, avg, echo, phase) combination.

Iterates over all shots and partitions using the acqTracking lookup array, collects k-space data and trajectory into flat column vectors ordered [readout, coil, shot].

getISMRMRDAcqData

template <typename T1> void getISMRMRDAcqData(ISMRMRD::Dataset* d, uword Nacq, forgeCol<forgeComplex<T1>>& data, forgeCol<T1>& kx, forgeCol<T1>& ky, forgeCol<T1>& kz, forgeCol<T1>& tvec)

Read a single acquisition's k-space data and trajectory from the dataset.

getISMRMRDCompleteFieldMap

template <typename T1> forgeCol<T1> getISMRMRDCompleteFieldMap( ISMRMRD::Dataset* d, const forgeCol<T1>& FieldMaps, uword NSlice, uword imageSize)

Extract the field map sub-volume for a single slice from the full stacked maps.

getISMRMRDCompletePhaseMap

template <typename T1> forgeCol<T1> getISMRMRDCompletePhaseMap(ISMRMRD::Dataset* d, uword NSlice, uword NSet, uword NRep, uword NAvg, uword NPhase, uword NEcho, uword NSeg, uword imageSize)

Extract the phase map sub-volume for a specific acquisition combination.

Slices the full phase-map array by (slice, set, rep, avg, phase, echo, seg) indices to return a single phase-map volume of size imageSize * NShotMax * NParMax.

getISMRMRDCompleteSENSEMap

template <typename T1> forgeCol<forgeComplex<T1>> getISMRMRDCompleteSENSEMap( ISMRMRD::Dataset* d, const forgeCol<forgeComplex<T1>>& SENSEMaps, uword NSlice, uword imageSize)

Extract the SENSE map sub-volume for a single slice from the full stacked maps.

getISMRMRDFieldMap

template <typename T1> forgeCol<T1> getISMRMRDFieldMap(ISMRMRD::Dataset* d)

Read the off-resonance field map (rad/s) from the "FieldMap" NDArray.

getISMRMRDPhaseMaps

template <typename T1> forgeCol<T1> getISMRMRDPhaseMaps(ISMRMRD::Dataset* d)

Read the shot-by-shot phase maps from the "PhaseMaps" NDArray.

getISMRMRDSenseMap

template <typename T1> forgeCol<forgeComplex<T1>> getISMRMRDSenseMap(ISMRMRD::Dataset* d)

Read the multi-coil SENSE sensitivity maps from the "SENSEMap" NDArray.

getISMRMRDTemporalBasis

template <typename T1> forgeCol<forgeComplex<T1>> getISMRMRDTemporalBasis(ISMRMRD::Dataset* d)

Read the temporal basis vectors ("v" NDArray) for low-rank reconstruction.

gridCoilImages

template <typename T> forgeCol<forgeComplex<T>> gridCoilImages(uword Ninplane, uword Nz, ISMRMRD::Dataset* d, ISMRMRD::IsmrmrdHeader* hdr, acqTracking* acqTrack, uword NPhase, uword NEcho, uword NAvg, uword NRep)

Grid all coil images from an ISMRMRD dataset.

Reads raw k-space data from the ISMRMRD dataset, applies density compensation, and grids each coil image using the NUFFT adjoint.

T : Floating-point precision type (float or double).

Ninplane : Number of in-plane readout samples.

Nz : Number of z-partitions / slices.

d : Pointer to the open ISMRMRD dataset.

hdr : Pointer to the parsed ISMRMRD header.

acqTrack : Pointer to acquisition tracking object.

NPhase : Phase index to reconstruct.

NEcho : Echo index to reconstruct.

NAvg : Average index to reconstruct.

NRep : Repetition index to reconstruct.

Return : Stacked coil images as a flat complex column vector.

gridding_adjoint_2D

template <typename T1> int gridding_adjoint_2D(unsigned int n, parameters<T1> params, T1 beta, ReconstructionSample<T1>* __restrict sample, const T1* LUT, const uword sizeLUT, T1* __restrict gridData)

2-D Kaiser-Bessel adjoint gridding (k-space -> oversampled grid).

T1 : Floating-point precision type.

n : Number of k-space samples.

params : Gridding parameters (image/grid size, oversampling, kernel).

beta : Kaiser-Bessel kernel shape parameter beta.

sample : Array of ReconstructionSample structs (k-space value + coordinates).

LUT : Precomputed Kaiser-Bessel lookup table.

sizeLUT : Number of entries in the LUT.

gridData : Output oversampled grid (interleaved real/imag), length 2gridXgridY.

Return : 0 on success.

gridding_adjoint_3D

template <typename T1> int gridding_adjoint_3D(unsigned int n, parameters<T1> params, T1 beta, ReconstructionSample<T1>* __restrict sample, const T1* LUT, const uword sizeLUT, T1* gridData)

3-D Kaiser-Bessel adjoint gridding (k-space -> oversampled grid).

T1 : Floating-point precision type.

n : Number of k-space samples.

params : Gridding parameters (image/grid size, oversampling, kernel).

beta : Kaiser-Bessel kernel shape parameter beta.

sample : Array of ReconstructionSample structs.

LUT : Precomputed Kaiser-Bessel lookup table.

sizeLUT : Number of entries in the LUT.

gridData : Output oversampled grid (interleaved real/imag), length 2gridXgridY*gridZ.

Return : 0 on success.

gridding_forward_2D

template <typename T1> int gridding_forward_2D(unsigned int n, parameters<T1> params, const T1* kx, const T1* ky, T1 beta, T1* __restrict pSamples, const T1* LUT, const uword sizeLUT, T1* __restrict pGridData)

2-D Kaiser-Bessel forward gridding (oversampled grid -> k-space samples).

T1 : Floating-point precision type.

n : Number of k-space samples.

params : Gridding parameters.

kx : k-space x-coordinates, length n.

ky : k-space y-coordinates, length n.

beta : Kaiser-Bessel kernel shape parameter beta.

pSamples : Output k-space samples array, length 2*n (interleaved).

LUT : Precomputed Kaiser-Bessel lookup table.

sizeLUT : Number of entries in the LUT.

pGridData : Input oversampled grid (interleaved real/imag).

Return : 0 on success.

gridding_forward_3D

template <typename T1> int gridding_forward_3D(unsigned int n, parameters<T1> params, const T1* kx, const T1* ky, const T1* kz, T1 beta, T1* __restrict pSamples, const T1* LUT, const uword sizeLUT, T1* __restrict pGridData)

3-D Kaiser-Bessel forward gridding (oversampled grid -> k-space samples).

T1 : Floating-point precision type.

n : Number of k-space samples.

params : Gridding parameters.

kx : k-space x-coordinates, length n.

ky : k-space y-coordinates, length n.

kz : k-space z-coordinates, length n.

beta : Kaiser-Bessel kernel shape parameter beta.

pSamples : Output k-space samples array, length 2*n (interleaved).

LUT : Precomputed Kaiser-Bessel lookup table.

sizeLUT : Number of entries in the LUT.

pGridData : Input oversampled grid (interleaved real/imag).

Return : 0 on success.

has_nan

template <typename U = T, typename std::enable_if<std::is_same<U, float>::value, int>::type = 0> bool has_nan() const

Check if any element is NaN. Works for float, double, and forgeComplex types.

ifft2dAccelerate

void ifft2dAccelerate(float* d_data, uword nx, uword ny)

In-place 2D inverse FFT (float, interleaved).  Normalises by 1/(nx*ny).

void ifft2dAccelerate(double* d_data, uword nx, uword ny)

In-place 2D inverse FFT (double, interleaved) — delegates to FFTW.

ifft2dCPU

template <typename T1, typename std::enable_if<std::is_same<T1, float>::value, uword>::type = 0> void ifft2dCPU(T1* d_data, uword nx, uword ny)

In-place 2-D inverse FFT (single precision).

T1 : Must be float.

d_data : Pointer to interleaved real/imag float array of length 2·nx·ny.

nx : Transform size in x.

ny : Transform size in y.

template <typename T1, typename std::enable_if<std::is_same<T1, double>::value, uword>::type = 0> void ifft2dCPU(T1* d_data, uword nx, uword ny)

In-place 2-D inverse FFT (double precision).

T1 : Must be double.

d_data : Pointer to interleaved real/imag double array of length 2·nx·ny.

nx : Transform size in x.

ny : Transform size in y.

ifft2dGPU

template <typename T1, typename std::enable_if<std::is_same<T1, float>::value, int>::type = 0> void ifft2dGPU(T1* d_data, int nx, int ny, void* stream, cufftHandle* plan)

In-place 2-D inverse FFT on GPU (single precision).

T1 : Must be float.

d_data : GPU pointer to interleaved real/imag float array, length 2·nx·ny.

nx : Transform size in x.

ny : Transform size in y.

stream : OpenACC/CUDA stream pointer.

plan : Pointer to a pre-allocated cuFFT plan.

template <typename T1, typename std::enable_if<std::is_same<T1, double>::value, int>::type = 0> void ifft2dGPU(T1* d_data, int nx, int ny, void* stream, cufftHandle* plan)

In-place 2-D inverse FFT on GPU (double precision).

T1 : Must be double.

d_data : GPU pointer to interleaved real/imag double array, length 2·nx·ny.

nx : Transform size in x.

ny : Transform size in y.

stream : OpenACC/CUDA stream pointer.

plan : Pointer to a pre-allocated cuFFT plan.

ifft3dAccelerate

void ifft3dAccelerate(float* d_data, uword nx, uword ny, uword nz)

In-place 3D inverse FFT (float, interleaved).  Normalises by 1/(nx*ny*nz).

void ifft3dAccelerate(double* d_data, uword nx, uword ny, uword nz)

In-place 3D inverse FFT (double, interleaved) — delegates to FFTW.

ifft3dCPU

template <typename T1, typename std::enable_if<std::is_same<T1, float>::value, uword>::type = 0> void ifft3dCPU(T1* d_data, uword nx, uword ny, uword nz)

In-place 3-D inverse FFT (single precision).

T1 : Must be float.

d_data : Pointer to interleaved real/imag float array of length 2·nx·ny·nz.

nx : Transform size in x.

ny : Transform size in y.

nz : Transform size in z.

template <typename T1, typename std::enable_if<std::is_same<T1, double>::value, uword>::type = 0> void ifft3dCPU(T1* d_data, uword nx, uword ny, uword nz)

In-place 3-D inverse FFT (double precision).

T1 : Must be double.

d_data : Pointer to interleaved real/imag double array of length 2·nx·ny·nz.

nx : Transform size in x.

ny : Transform size in y.

nz : Transform size in z.

ifft3dGPU

template <typename T1, typename std::enable_if<std::is_same<T1, float>::value, int>::type = 0> void ifft3dGPU(T1* d_data, int nx, int ny, int nz, void* stream, cufftHandle* plan)

In-place 3-D inverse FFT on GPU (single precision).

T1 : Must be float.

d_data : GPU pointer to interleaved real/imag float array.

nx : Transform size in x.

ny : Transform size in y.

nz : Transform size in z.

stream : OpenACC/CUDA stream pointer.

plan : Pointer to a pre-allocated cuFFT plan.

template <typename T1, typename std::enable_if<std::is_same<T1, double>::value, int>::type = 0> void ifft3dGPU(T1* d_data, int nx, int ny, int nz, void* stream, cufftHandle* plan)

In-place 3-D inverse FFT on GPU (double precision).

T1 : Must be double.

d_data : GPU pointer to interleaved real/imag double array.

nx : Transform size in x.

ny : Transform size in y.

nz : Transform size in z.

stream : OpenACC/CUDA stream pointer.

plan : Pointer to a pre-allocated cuFFT plan.

ifftshift2

template <typename T> void ifftshift2(T* __restrict out, const T* __restrict in, int xdim, int ydim)

2-D inverse FFT shift (undo fftshift2).

ifftshift3

template <typename T> void ifftshift3(T* __restrict out, const T* __restrict in, int xdim, int ydim, int zdim)

3-D inverse FFT shift (undo fftshift3).

iftCpu

template <typename T1> void iftCpu(T1* idata_r, T1* idata_i, const T1* kdata_r, const T1* kdata_i, const T1* kx, const T1* ky, const T1* kz, const T1* ix, const T1* iy, const T1* iz, const T1* FM, const T1* t, const int num_k, const int num_i)

Non-uniform adjoint DFT kernel (CPU / OpenACC).

Computes idata[k] = Sigma_j (kdata_r[j] + ikdata_i[j]) * exp(+i2pi(kx[j]ix[k] + ky[j]iy[k] + kz[j]iz[k]) + iFM[k]*t[j]).

T1 : Floating-point precision type.

idata_r : Output image real part array, length num_i.

idata_i : Output image imaginary part array, length num_i.

kdata_r : Input k-space real part array, length num_k.

kdata_i : Input k-space imaginary part array, length num_k.

kx : k-space x-coordinates, length num_k.

ky : k-space y-coordinates, length num_k.

kz : k-space z-coordinates, length num_k.

ix : Image-space x-coordinates, length num_i.

iy : Image-space y-coordinates, length num_i.

iz : Image-space z-coordinates, length num_i.

FM : Off-resonance field map (rad/s), length num_i.

t : Per-sample readout time (s), length num_k.

num_k : Number of k-space samples.

num_i : Number of image pixels.

iftCpuWithGrads

template <typename T1> void iftCpuWithGrads(T1* idata_r, T1* idata_i, const T1* kdata_r, const T1* kdata_i, const T1* kx, const T1* ky, const T1* kz, const T1* ix, const T1* iy, const T1* iz, const T1* FM, const T1* Gx, const T1* Gy, const T1* Gz, const T1* t, const int num_k, const int num_i, const int num_x, const int num_y, const int num_z)

Non-uniform adjoint DFT kernel with R2* field-map gradients (CPU / OpenACC).

Adjoint of ftCpuWithGrads; applies conjugate phase and R2* weighting.

T1 : Floating-point precision type.

idata_r : Output image real part array, length num_i.

idata_i : Output image imaginary part array, length num_i.

kdata_r : Input k-space real part array, length num_k.

kdata_i : Input k-space imaginary part array, length num_k.

kx : k-space x-coordinates, length num_k.

ky : k-space y-coordinates, length num_k.

kz : k-space z-coordinates, length num_k.

ix : Image-space x-coordinates, length num_i.

iy : Image-space y-coordinates, length num_i.

iz : Image-space z-coordinates, length num_i.

FM : Off-resonance field map (rad/s), length num_i.

Gx : R2* gradient in x, length num_x.

Gy : R2* gradient in y, length num_y.

Gz : R2* gradient in z, length num_z.

t : Per-sample readout time (s), length num_k.

num_k : Number of k-space samples.

num_i : Number of image pixels.

num_x : Number of R2* gradient samples in x.

num_y : Number of R2* gradient samples in y.

num_z : Number of R2* gradient samples in z.

imag

template <typename T> forgeCol<T> imag(const forgeCol<forgeComplex<T>>& A)

Extract imaginary parts of a complex vector.

T : Floating-point base type.

A : Input complex vector.

Return : Real vector of imaginary parts, same length as A.

imageCoordinates2D

template <typename T1> void imageCoordinates2D(arma::uword Nx, arma::uword Ny, arma::Col<T1>& ix, arma::Col<T1>& iy, arma::Col<T1>& iz)

Generate image-space coordinates matching forge convention.

For a 2D image of size Nx x Ny, returns ix, iy, iz vectors where: ix(j,i,k) = (i - Nx/2) / Nx iy(j,i,k) = (j - Ny/2) / Ny iz = 0

Nx : Image width

Ny : Image height

ix : x image coordinates (Nx*Ny elements)

iy : y image coordinates

iz : z image coordinates (all zeros)

imageCoordinates3D

template <typename T1> void imageCoordinates3D( arma::uword Nx, arma::uword Ny, arma::uword Nz, arma::Col<T1>& ix, arma::Col<T1>& iy, arma::Col<T1>& iz)

Generate 3D image-space coordinates matching pcSENSE Cube(Nx,Ny,Nz) layout.

Nx, : Ny, Nz Image dimensions

ix, : iy, iz Normalized coordinates in [-0.5, 0.5)

kernel_value_LUT

template <typename T1> T1 kernel_value_LUT(T1 dist, const T1* LUT, uword sizeLUT, T1 width)

Evaluate the Kaiser-Bessel kernel from a precomputed lookup table.

Linearly interpolates between LUT entries.

T1 : Floating-point type.

dist : Distance from kernel centre (must be < width/2).

LUT : Precomputed LUT array.

sizeLUT : Number of LUT entries.

width : Kernel half-width (same units as dist).

Return : Interpolated kernel value at dist.

linearTimingVector

template <typename T1> arma::Col<T1> linearTimingVector(arma::uword nSamples, T1 readoutTime)

Generate a linear timing vector for a single readout.

nSamples : Number of samples

readoutTime : Total readout duration in seconds (e.g. 0.01 for 10ms)

Return : Timing vector from 0 to readoutTime

log1p

template <typename T, typename std::enable_if< !std::is_same<T, forgeComplex<float>>::value && !std::is_same<T, forgeComplex<double>>::value, int>::type = 0> forgeCol<T> log1p(const forgeCol<T>& A)

Element-wise log(1 + x) for a real vector.

T : Floating-point type (real only).

A : Input real vector.

Return : New vector with std::log1p(A[i]) at each position.

matmul

template <typename T> forgeMat<forgeComplex<T>> matmul(const forgeMat<forgeComplex<T>>& A, const forgeMat<forgeComplex<T>>& B)

Matrix-matrix multiply for complex forgeMat (conjugate-transpose NOT applied).

template <typename T> forgeCol<forgeComplex<T>> matmul(const forgeMat<forgeComplex<T>>& A, const forgeCol<forgeComplex<T>>& x)

Matrix-vector multiply for complex forgeMat × forgeCol.

template <typename T> forgeMat<T> matmul(const forgeMat<T>& A, const forgeMat<T>& B)

Matrix-matrix multiply for real forgeMat.

memptr

T* memptr() const

Return a raw pointer to the underlying data buffer.

metal_cvec_axpy

void metal_cvec_axpy( MetalVectorContext* ctx, const float* A, const float* B, float* C, float alphaRe, float alphaIm, size_t n)

Complex AXPY: C[i] = A[i] + (alphaRe + i*alphaIm) * B[i].

metal_cvec_cdot

void metal_cvec_cdot(MetalVectorContext* ctx, const float* A, const float* B, float* outRe, float* outIm, size_t n)

Complex dot product: sum(conj(A[i]) * B[i]).

metal_cvec_mul_scalar

void metal_cvec_mul_scalar(MetalVectorContext* ctx, const float* A, float alphaRe, float alphaIm, float* C, size_t n)

Complex scalar multiply: C[i] = (alphaRe + i*alphaIm) * A[i].

metal_cvec_norm2sq

float metal_cvec_norm2sq(MetalVectorContext* ctx, const float* A, size_t n)

L2 norm squared of complex vector: sum(|A[i]|^2).

metal_dft_adjoint

void metal_dft_adjoint( MetalDFTContext* ctx, const float* kdata_r, const float* kdata_i, float* idata_r, float* idata_i)

Adjoint DFT: k-space → image.

ctx : Active Metal DFT context.

kdata_r : Input k-space real part (num_k floats).

kdata_i : Input k-space imag part (num_k floats).

idata_r : Output image real part (num_i floats).

idata_i : Output image imag part (num_i floats).

metal_dft_create

MetalDFTContext* metal_dft_create(const float* kx, const float* ky, const float* kz, const float* ix, const float* iy, const float* iz, const float* FM, const float* t, unsigned int num_k, unsigned int num_i)

Create a Metal DFT context for Gdft (no gradient correction).

Uploads k-space trajectory, image coordinates, field map, and timing vector to persistent Metal buffers.

kx,ky,kz : k-space coordinates (num_k elements each)

ix,iy,iz : image-space coordinates (num_i elements each)

FM : field map in radians/sec (num_i elements)

t : timing vector in seconds (num_k elements)

num_k : number of k-space samples

num_i : number of image pixels

Return : Pointer to new context, or nullptr on failure.

metal_dft_create_with_grads

MetalDFTContext* metal_dft_create_with_grads(const float* kx, const float* ky, const float* kz, const float* ix, const float* iy, const float* iz, const float* FM, const float* t, const float* Gx, const float* Gy, const float* Gz, unsigned int num_k, unsigned int num_i, unsigned int num_x, unsigned int num_y, unsigned int num_z)

Create a Metal DFT context for GdftR2 (with gradient correction).

Same as metal_dft_create() but also uploads gradient maps and grid dims.

metal_dft_destroy

void metal_dft_destroy(MetalDFTContext* ctx)

Destroy a Metal DFT context and release resources.

metal_dft_forward

void metal_dft_forward( MetalDFTContext* ctx, const float* idata_r, const float* idata_i, float* kdata_r, float* kdata_i)

Forward DFT: image → k-space.

ctx : Active Metal DFT context.

idata_r : Input image real part (num_i floats).

idata_i : Input image imag part (num_i floats).

kdata_r : Output k-space real part (num_k floats).

kdata_i : Output k-space imag part (num_k floats).

metal_gridding_adjoint_2D

void metal_gridding_adjoint_2D(MetalGriddingContext* ctx, const float* dIn, float* pGridOut)

Adjoint (k-space → image) gridding for a 2D problem.

Zeroes the grid buffer, runs the adjoint scatter kernel on the GPU, and leaves the result in pGridOut (interleaved real/imag floats, gridNx * gridNy complex values).

ctx : Active Metal context.

dIn : Input k-space data, interleaved real/imag (2*numSamples floats).

pGridOut : Output oversampled grid buffer (2 * gridNx * gridNy floats).

metal_gridding_adjoint_3D

void metal_gridding_adjoint_3D(MetalGriddingContext* ctx, const float* dIn, float* pGridOut)

Adjoint (k-space → image) gridding for a 3D problem.

ctx : Active Metal context.

dIn : Input k-space data, interleaved real/imag (2*numSamples floats).

pGridOut : Output oversampled grid buffer (2 * gridNx * gridNy * gridNz floats).

metal_gridding_create

MetalGriddingContext* metal_gridding_create(int gridNx, int gridNy, int gridNz, int imageNx, int imageNy, int imageNz, float gridOS, float kernelWidth, const float* LUT, int sizeLUT, const float* kx, const float* ky, const float* kz, int numSamples)

Allocate and initialise a Metal gridding context.

Upload the Kaiser-Bessel LUT and k-space trajectory to Metal buffers once here so they need not be transferred on every operator call.

gridNx : Oversampled grid size in X (= ceil(gridOS * Nx))

gridNy : Oversampled grid size in Y (= ceil(gridOS * Ny))

gridNz : Oversampled grid size in Z (1 for 2D)

imageNx : Image size in X

imageNy : Image size in Y

imageNz : Image size in Z (1 for 2D)

gridOS : Grid oversampling factor

kernelWidth : Kaiser-Bessel kernel width in grid cells

LUT : Kaiser-Bessel lookup table (host pointer)

sizeLUT : Number of entries in LUT

kx : k-space coordinates in X (host pointer, numSamples elements)

ky : k-space coordinates in Y (host pointer, numSamples elements)

kz : k-space coordinates in Z (host pointer, numSamples elements; may be all-zero for 2D)

numSamples : Number of k-space samples per coil

Return : Pointer to a new context, or nullptr if Metal is unavailable (e.g. device does not support MTLGPUFamilyApple6).

metal_gridding_destroy

void metal_gridding_destroy(MetalGriddingContext* ctx)

Destroy a Metal gridding context and release all Metal resources.

ctx : Context previously created by metal_gridding_create().

metal_gridding_forward_2D

void metal_gridding_forward_2D(MetalGriddingContext* ctx, const float* pGridIn, float* pSamplesOut)

Forward (image → k-space) gridding for a 2D problem.

Reads from the oversampled grid pGridIn and accumulates into pSamplesOut (one complex value per k-space sample).

ctx : Active Metal context.

pGridIn : Input oversampled grid (2 * gridNx * gridNy floats).

pSamplesOut : Output k-space samples, interleaved real/imag (2*numSamples floats).

metal_gridding_forward_3D

void metal_gridding_forward_3D(MetalGriddingContext* ctx, const float* pGridIn, float* pSamplesOut)

Forward (image → k-space) gridding for a 3D problem.

ctx : Active Metal context.

pGridIn : Input oversampled grid (2 * gridNx * gridNy * gridNz floats).

pSamplesOut : Output k-space samples, interleaved real/imag (2*numSamples floats).

metal_nufft_adjoint

void metal_nufft_adjoint(MetalNufftPipelineContext* ctx, const float* samplesIn, float* imageOut)

Full adjoint NUFFT: k-space → image (all on GPU).

ctx : Active pipeline context.

samplesIn : Input k-space, interleaved complex (2 * numSamples floats).

imageOut : Output image, interleaved complex (2 * NxNyNz floats).

metal_nufft_forward

void metal_nufft_forward(MetalNufftPipelineContext* ctx, const float* imageIn, float* samplesOut)

Full forward NUFFT: image → k-space (all on GPU).

ctx : Active pipeline context.

imageIn : Input image, interleaved complex (2 * NxNyNz floats).

samplesOut : Output k-space, interleaved complex (2 * numSamples floats).

metal_nufft_pipeline_create

MetalNufftPipelineContext* metal_nufft_pipeline_create(int gridNx, int gridNy, int gridNz, int imageNx, int imageNy, int imageNz, float gridOS, float kernelWidth, float beta, const float* LUT, int sizeLUT, const float* kx, const float* ky, const float* kz, int numSamples)

Create a full GPU NUFFT pipeline context.

Uploads the Kaiser-Bessel LUT and k-space trajectory to Metal shared buffers, allocates working buffers for all intermediate stages, and pre-compiles MPSGraph FFT executables for the oversampled grid dimensions.

gridNx : Oversampled grid size in X (= ceil(gridOS * Nx), made even)

gridNy : Oversampled grid size in Y

gridNz : Oversampled grid size in Z (1 for 2D)

imageNx : Image size in X

imageNy : Image size in Y

imageNz : Image size in Z (1 for 2D)

gridOS : Grid oversampling factor

kernelWidth : Kaiser-Bessel kernel width in grid cells

beta : Kaiser-Bessel beta parameter

LUT : Kaiser-Bessel lookup table (host pointer)

sizeLUT : Number of entries in LUT

kx : k-space X coordinates (host, numSamples elements)

ky : k-space Y coordinates (host, numSamples elements)

kz : k-space Z coordinates (host, numSamples elements)

numSamples : Number of k-space samples per coil

Return : Pointer to a new context, or nullptr on failure.

metal_nufft_pipeline_destroy

void metal_nufft_pipeline_destroy(MetalNufftPipelineContext* ctx)

Destroy a pipeline context and release all GPU resources.

metal_rvec_cmul

void metal_rvec_cmul(MetalVectorContext* ctx, const float* W, const float* X, float* C, size_t n)

Real weight * complex: C[i] = W[i] * X[i]. W has n floats; X and C have 2*n floats.

metal_vec_sum

float metal_vec_sum(MetalVectorContext* ctx, const float* A, size_t n)

Sum of real float vector.

metal_vecops_dispatch_count

uint64_t metal_vecops_dispatch_count()

Total number of Metal command buffer commits (vector ops only, not gridding).

metal_vecops_reset_stats

void metal_vecops_reset_stats()

Reset both counters to zero.

metal_vecops_wait_seconds

double metal_vecops_wait_seconds()

Cumulative time in seconds spent in waitUntilCompleted (vector ops only).

metal_vector_get_context

MetalVectorContext* metal_vector_get_context()

Get or create the process-wide singleton context. Thread-safe (uses std::call_once internally). Returns nullptr if no Apple GPU is available.

norm

template <typename T> T norm(const forgeCol<forgeComplex<T>>& A)

L2 norm of a complex vector: √(Σᵢ |A[i]|²).

T : Floating-point base type.

A : Input complex vector.

Return : Non-negative scalar norm.

template <typename T> T norm(const forgeCol<T>& A)

L2 norm of a real vector: √(Σᵢ A[i]²).

T : Floating-point type (real only).

A : Input real vector.

Return : Non-negative scalar norm.

norm_grad

template <typename T> T norm_grad(const forgeCol<forgeComplex<T>>& g, const forgeCol<forgeComplex<T>>& yi, const forgeCol<T>& W)

Scale-invariant gradient magnitude for PCG convergence testing.

Returns ‖g‖ / |Re(yᵀ · (W ⊙ y))|, where yᵀ is the non-conjugate transpose. This normalisation makes the convergence criterion independent of the data scale.

T : Floating-point base type.

g : Current gradient vector.

yi : Measured k-space data vector.

W : Non-negative data weight vector, same length as yi.

Return : Normalised gradient magnitude (returns 0 if denominator is ~0).

template <typename T1> inline T1 norm_grad(const Col<complex<T1>>& g, const Col<complex<T1>>& yi, const Col<T1>& W)

Normalized gradient magnitude for convergence testing.

Returns \(\|\mathbf{g}\| / (\mathbf{y}_i^H \mathbf{W} \mathbf{y}_i)\), providing a scale-invariant stopping criterion.

T1 : Floating-point base type.

g : Current gradient vector.

yi : Measured k-space data vector.

W : Non-negative data weights vector, same length as yi.

Return : Normalized gradient magnitude.

normalize_fft2d

template <typename T1> void normalize_fft2d(T1* __restrict pDst, T1* __restrict pSrc, int gridSizeX, int gridSizeY)

Normalise a 2-D interleaved buffer by $1/(n_x \times n_y)$ after an inverse FFT.

normalize_fft3d

template <typename T1> void normalize_fft3d(T1* __restrict pDst, T1* __restrict pSrc, int gridSizeX, int gridSizeY, int gridSizeZ)

Normalise a 3-D interleaved buffer by $1/(n_x \times n_y \times n_z)$ after an inverse FFT.

ones

void ones()

Set all elements to one (GPU-aware via OpenACC).

void ones()

Set all elements to one (GPU-aware via OpenACC).

openISMRMRDData

inline void openISMRMRDData( std::string inputDataFile, ISMRMRD::Dataset*& d, ISMRMRD::IsmrmrdHeader& hdr, acqTracking*& acqTrack)

Open an ISMRMRD dataset and initialise bookkeeping structures.

Allocates and opens the ISMRMRD::Dataset object, reads and deserialises the XML header, and constructs an acqTracking object for the file.

inputDataFile : Path to the ISMRMRD HDF5 file.

d : Output: pointer to the newly opened dataset.

hdr : Output: parsed ISMRMRD header.

acqTrack : Output: pointer to the newly created acqTracking object.

pg_isnan_float

static inline bool pg_isnan_float(float x)

Bit-level NaN detection that works even under -ffast-math / -Ofast.
IEEE 754: NaN has all exponent bits set and non-zero mantissa.

processISMRMRDInput

template <typename T1> void processISMRMRDInput(std::string inputDataFile, ISMRMRD::Dataset*& d, ISMRMRD::IsmrmrdHeader& hdr, forgeCol<T1>& FM, forgeCol<forgeComplex<T1>>& sen, acqTracking*& acqTrack)

One-shot convenience: open dataset, read field map and SENSE map.

randomShotPhase3D

template <typename T1> arma::Col<T1> randomShotPhase3D( arma::uword Nx, arma::uword Ny, arma::uword Nz, arma::uword Ns, T1 maxPhase, unsigned int seed = 42)

Generate random per-shot linear phase ramps (models motion-induced phase).

Each shot gets a random linear ramp: PMap(x,y,z,ss) = gxx + gyy + gz*z with random coefficients in [-maxPhase, maxPhase].

Nx, : Ny, Nz Image dimensions

Ns : Number of shots

maxPhase : Maximum phase gradient magnitude (radians)

seed : RNG seed for reproducibility

Return : Vectorized phase map (Ni*Ns elements, radians)

real

template <typename T> forgeCol<T> real(const forgeCol<forgeComplex<T>>& A)

Extract real parts of a complex vector.

T : Floating-point base type.

A : Input complex vector.

Return : Real vector of real parts, same length as A.

reconSolve

template <typename T1, typename TObj, typename RObj> forgeCol<forgeComplex<T1>> reconSolve(forgeCol<forgeComplex<T1>> data, TObj& Sg, RObj R, Col<T1> kx, Col<T1> ky, Col<T1> kz, uword Nx, uword Ny, uword Nz, Col<T1> tvec, uword niter, size_t parent_bar_idx = PG_NO_PARENT_BAR, size_t parent_bar_offset = 0)

High-level iterative MRI image reconstruction via PCG.

Workflow: 1. Constructs uniform data weights \(\mathbf{W} = \mathbf{1}\) (length equal to the data vector). 2. Creates a zero-filled initial image estimate \(\mathbf{x}_0\) of size \(N_x \times N_y \times N_z\). 3. Calls solve_pwls_pcg to solve the PWLS objective.

Image-space coordinates (set up by initImageSpaceCoords) span \([0, (N-1)/N]\) in each dimension with column-major (x-fastest) ordering.

// Full reconstruction pipeline
forgeCol<float> ix, iy, iz;
initImageSpaceCoords<float>(ix, iy, iz, Nx, Ny, Nz);
Gnufft<float> G(n1, 2.0f, Nx, Ny, Nz, kx, ky, kz, ix, iy, iz);
SENSE<float, Gnufft<float>> S(G, SMap, nc);
QuadPenalty<float> R(Nx, Ny, Nz, beta);
auto xhat = reconSolve<float>(data, S, R, kx, ky, kz, Nx, Ny, Nz, tvec, niter);

See : solve_pwls_pcg

See : initImageSpaceCoords

T1 : Floating-point precision type (float or double).

TObj : Encoding operator type (e.g., SENSE<T1, Gnufft<T1>>).

RObj : Regularization object type (e.g., QuadPenalty<T1>).

data : Measured k-space data (stacked multi-coil if using SENSE), length n1.

Sg : Encoding operator.

R : Regularization penalty object.

kx : k-space x-coordinates, length n1 (unused by this function; provided for context).

ky : k-space y-coordinates, length n1.

kz : k-space z-coordinates, length n1.

Nx : Image size in x.

Ny : Image size in y.

Nz : Image size in z (use 1 for 2-D).

tvec : Per-sample readout time vector in seconds, length n1.

niter : Maximum number of PCG iterations.

Return : Reconstructed image vector, length \(N_x \cdot N_y \cdot N_z\).

reset_mem

void reset_mem()

Release ownership of internal memory without freeing it (used by move semantics).

void reset_mem()

Release ownership of internal memory without freeing it (used by move semantics).

reshape

template <typename T> forgeMat<T> reshape(const forgeMat<T>& A, uword rows, uword cols)

Reshape a forgeMat to new dimensions (column-major order preserved).

rotationMatrixToQuaternion

inline void rotationMatrixToQuaternion(const float R[3][3], float q[4])

Convert a 3x3 rotation matrix to a NIfTI quaternion (a, b, c, d).

Follows Shepperd's method as described in the NIfTI-1 specification. The output quaternion satisfies q[0] >= 0 (canonical form). Only valid for proper rotations (det = +1). The caller must apply qfac to the slice column before calling this function.

R : 3x3 rotation matrix.

q : Output quaternion: q[0]=a (w), q[1]=b, q[2]=c, q[3]=d.

serialize

template <class Archive> void serialize(Archive& ar, const unsigned int /*version*/)

Boost.Serialization support for MPI communication.

template <class Archive> void serialize(Archive& ar, const unsigned int /*version*/)

Boost.Serialization support for MPI communication.

set_col

void set_col(uword colIndx, const forgeCol<T>& src)

Write a forgeCol into column colIndx (memcpy).

set_size

void set_size(uword length)

Allocate (or reallocate) storage for `length` elements, freeing any prior allocation.

void set_size(uword nCols, uword nRows)

Allocate (or reallocate) storage for nCols x nRows elements. Note: first arg is COLUMNS.

sheppLogan2D

template <typename T1> arma::Col<std::complex<T1>> sheppLogan2D(arma::uword Nx, arma::uword Ny)

Generate a 2D modified Shepp-Logan phantom image.

Nx : Number of pixels in x direction

Ny : Number of pixels in y direction

Return : Nx*Ny x 1 complex column vector (vectorized image, column-major)

solve_pwls_pcg

template <typename T1, typename Tobj, typename Robj> forgeCol<forgeComplex<T1>> solve_pwls_pcg(const forgeCol<forgeComplex<T1>>& xInitial, Tobj const& A, forgeCol<T1> const& W_pg, forgeCol<forgeComplex<T1>> const& yi_pg, Robj const& R, uword niter, size_t parent_bar_idx = PG_NO_PARENT_BAR, size_t parent_bar_offset = 0)

Solve a penalized weighted least-squares problem via preconditioned CG.
\[\hat{\mathbf{x}} = - \min_{\mathbf{x}} \|\mathbf{W}^{1/2}(\mathbf{y} - A\mathbf{x})\|^2 + R(\mathbf{x})\]

The gradient is \(\nabla = A^H W(A\mathbf{x} - \mathbf{y}) + \nabla R(\mathbf{x})\). Uses Polak-Ribiere-Polyak conjugate direction update with a quadratic surrogate step-size selection.

This is the primary overload accepting and returning forgeCol types.

Convergence thresholds: - 1e-10: normalized gradient magnitude below this triggers early stop - 1e-20: denominator guard in step-size computation to avoid division by zero

Thread safety: reads the g_should_stop atomic for cooperative early termination (e.g., on SIGINT).

Gnufft<float> G(n1, 2.0f, Nx, Ny, Nz, kx, ky, kz, ix, iy, iz);
SENSE<float, Gnufft<float>> S(G, SMap, nc);
QuadPenalty<float> R(Nx, Ny, 1, 0.001f);
forgeCol<float> W(n1 * nc);  W.ones();
forgeCol<forgeComplex<float>> x0(Nx*Ny);  x0.zeros();
auto xhat = solve_pwls_pcg<float>(x0, S, W, yi, R, 50);

See : Fessler, "Penalized Weighted Least-Squares Image Reconstruction for Positron Emission Tomography," IEEE TMI, 1994. https://doi.org/10.1109/42.363108

T1 : Floating-point precision type (float or double).

Tobj : Encoding operator type (must support A * x and A / y).

Robj : Regularization object type (must support Gradient, Denom).

xInitial : Initial image estimate, length n2.

A : Encoding operator (forward: image->data, adjoint: data->image).

W_pg : Non-negative data weights, length n1.

yi_pg : Measured k-space data, length n1.

R : Regularization penalty object.

niter : Maximum number of CG iterations.

parent_bar_idx : Index of an enclosing forgeview progress bar to tick after each iteration, or PG_NO_PARENT_BAR (default).

parent_bar_offset : Tick offset within the parent bar (default 0).

Return : Reconstructed image estimate as forgeCol, length n2.

spiralTrajectory2D

template <typename T1> void spiralTrajectory2D(arma::uword nInterleaves, arma::uword nSamplesPerArm, arma::uword Nx, arma::Col<T1>& kx, arma::Col<T1>& ky, arma::Col<T1>& kz)

Generate a 2D Archimedean spiral k-space trajectory.

nInterleaves : Number of spiral interleaves

nSamplesPerArm : Samples per spiral arm

Nx : Image dimension (determines kmax = Nx/2)

kx : k-space x coordinates

ky : k-space y coordinates

kz : k-space z coordinates (all zeros)

sqrt

template <typename T, typename std::enable_if< !std::is_same<T, forgeComplex<float>>::value && !std::is_same<T, forgeComplex<double>>::value, int>::type = 0> forgeCol<T> sqrt(const forgeCol<T>& A)

Element-wise square root for a real vector.

T : Floating-point type (real only).

A : Non-negative real input vector.

Return : New vector with std::sqrt(A[i]) at each position.

square

template <typename T, typename std::enable_if< !std::is_same<T, forgeComplex<float>>::value && !std::is_same<T, forgeComplex<double>>::value, int>::type = 0> forgeCol<T> square(const forgeCol<T>& A)

Element-wise square of a real vector: out[i] = A[i]².

T : Floating-point type (real only).

A : Input real vector.

Return : New vector of squared values.

stackOfSpiralsTimingVector

template <typename T1> arma::Col<T1> stackOfSpiralsTimingVector( arma::uword nSamplesPerArm, arma::uword nKzAcq, arma::uword Ns, T1 readoutTimePerArm)

Generate timing vector for stack-of-spirals readout.

Each shot acquires kz encodes sequentially, each taking readoutTimePerArm. Timing resets to 0 for each shot (pcSENSE reshapes to Mat(Nd,Ns) columns).

nSamplesPerArm : Samples per spiral arm

nKzAcq : Acquired kz encodes per shot

Ns : Number of shots

readoutTimePerArm : Readout duration per spiral arm (seconds)

Return : Timing vector of length Nd_per_shot * Ns

strans

template <typename T> forgeMat<forgeComplex<T>> strans(const forgeMat<forgeComplex<T>>& A)

Non-conjugate (simple) transpose of a complex forgeMat.

subvec

forgeCol<T> subvec(uword first, uword last)

Return a non-owning view of elements [first..last] (inclusive).
The view must not outlive this forgeCol (or the underlying memory if this is itself a view).

sum

template <typename T> const forgeComplex<T> sum(const forgeCol<forgeComplex<T>>& pgA)

Sum all elements of a complex forgeCol (separate real/imag reductions for OpenACC).

template <typename T> const T sum(const forgeCol<T>& pgA)

Sum all elements of a real forgeCol, dispatching to Accelerate vDSP on Apple.

template <typename T> const forgeCol<forgeComplex<T>> sum(const forgeMat<forgeComplex<T>>& pgA, const uword dim = 0)

Sum a complex forgeMat along the given dimension (0=column-wise, 1=row-wise).

template <typename T> const forgeCol<T> sum(const forgeMat<T>& pgA, const uword dim = 0)

Sum a real forgeMat along the given dimension (0=column-wise, 1=row-wise).

to_complex

template <typename T, typename std::enable_if< !std::is_same<T, forgeComplex<float>>::value && !std::is_same<T, forgeComplex<double>>::value, int>::type = 0> forgeCol<forgeComplex<T>> to_complex(const forgeCol<T>& A)

Lift a real vector to complex with zero imaginary part.

T : Floating-point type (real only).

A : Real input vector.

Return : Complex vector with A[i] as real part and 0 as imaginary part.

trans

template <typename T> forgeMat<forgeComplex<T>> trans(const forgeMat<forgeComplex<T>>& A)

Conjugate transpose of a complex forgeMat.

template <typename T> forgeMat<T> trans(const forgeMat<T>& A)

Transpose of a real forgeMat.

vectorise

template <typename T> const forgeCol<T> vectorise(const forgeMat<T>& pgA)

Flatten a forgeMat into a single forgeCol in column-major order.

writeISMRMRDImageData

template <typename T1> void writeISMRMRDImageData(ISMRMRD::Dataset* d, forgeCol<forgeComplex<T1>>& image, uword Nx, uword Ny, uword Nz)

Write a reconstructed complex image back into the ISMRMRD dataset.

writeNiftiMagPhsImage

template <typename T1> void writeNiftiMagPhsImage(std::string filename, forgeCol<forgeComplex<T1>> imageData, uword Nx, uword Ny, uword Nz, uword Nslab = 1, uword N_TR = 1, const NiftiOrientation& orient = NiftiOrientation { })

Write a complex-valued image as two NIfTI files: magnitude (`_mag`) and phase (`_phs`).

writeNiftiRealImage

template <typename T1> void writeNiftiRealImage(std::string filename, forgeCol<T1> imageData, uword Nx, uword Ny, uword Nz = 1, uword Nslab = 1, uword N_TR = 1, const NiftiOrientation& orient = NiftiOrientation { })

Write a real-valued image volume to a NIfTI-1 (.nii) file.

filename : Base filename (without .nii extension, which is appended automatically).

imageData : Flat column vector of image voxels (column-major, NxNyNzNslabN_TR).

Nx : Image size along readout.

Ny : Image size along phase-encode.

Nz : Image size along slice/partition (default 1 for 2-D).

Nslab : Number of slabs (default 1).

N_TR : Number of time frames (default 1).

orient : Optional NiftiOrientation with voxel sizes and direction cosines.

zero_pad2d

template <typename T1> void zero_pad2d(T1* __restrict pDst, T1* __restrict pSrc, int imageSizeX, int imageSizeY, T1 gridOS)

Zero-pad a 2-D image onto an oversampled grid (inverse of crop_center_region2d).

zero_pad3d

template <typename T1> void zero_pad3d(T1* __restrict pDst, T1* __restrict pSrc, int imageSizeX, int imageSizeY, int imageSizeZ, T1 gridOS)

Zero-pad a 3-D image onto an oversampled grid (inverse of crop_center_region3d).

zeros

void zeros()

Set all elements to zero (GPU-aware via OpenACC).

void zeros()

Set all elements to zero (GPU-aware via OpenACC).