
MPI Executable Modernization

Date: 2026-03-16
Status: Approved
Closes: Issue #2

Problem

The MPI executables (forgeSenseMPI, forgePcSenseMPI, forgePcSenseMPI_TS) fail to build due to two errors:

  1. op_circshift redefinition — Armadillo 15.2.3 now natively defines arma::op_circshift, but the custom Support/ArmaExtensions/op_circshift_bones.hpp also defines it inside namespace arma {}. The custom version has a different interface (apply() signature, positive_modulo helper) than the native one. The existing #ifndef include guard in arma_extensions.h does NOT prevent this because the conflict is between Armadillo's own definition and the custom one, not double-inclusion of the custom file.

  2. reconSolve extern template signature mismatch — MPI headers declare old Col<complex<T1>> signatures, but reconSolve now uses forgeCol<forgeComplex<T1>> with two additional size_t parameters.

Beyond the build errors, the MPI code is pre-2024 vintage: raw Armadillo types, std::cout logging, no exit codes, no signal handling, no CLI parity with the primary executables. This modernization brings them up to the same standard before public release.

Approach

Full modernization of all MPI executables and helper classes to match the primary executables (forgeSense, forgePcSense, forgePcSenseTimeSeg).

Components

1. Remove custom op_circshift (use Armadillo native)

Root cause: Armadillo 15.2.3 natively provides arma::circshift() and arma::op_circshift. The custom op_circshift_bones.hpp, op_circshift_meat.hpp, and fn_circshift.hpp in Support/ArmaExtensions/ conflict with the native definitions.

Fix: Remove the custom op_circshift files entirely and update call sites to use Armadillo's native circshift() API.

API difference:

  • Custom (forge): circshift(X, dim, shift) (dimension first, shift second)
  • Native (Armadillo): circshift(X, shift, dim) (shift first, dimension second)

Call sites to update (only in forge/FFT/fftshift.hpp):

// Old: circshift(X, dim, std::floor(size / 2))
// New: circshift(X, static_cast<sword>(std::floor(size / 2)), dim)

// Old: circshift(circshift(X, 0, std::floor(X.n_rows / 2)), 1, std::floor(X.n_cols / 2))
// New: circshift(circshift(X, static_cast<sword>(std::floor(X.n_rows / 2)), 0),
//                static_cast<sword>(std::floor(X.n_cols / 2)), 1)

Note: Armadillo's circshift takes sword (signed) for the shift amount, not uword. The static_cast<sword> ensures correct type matching.
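The argument order and the signed shift can be illustrated with a minimal standard-library sketch. The names demo_circshift and demo_fftshift and the use of std::vector are assumptions made for illustration; this is not Armadillo's implementation, only a stand-in that takes the data first and a signed shift second.

```cpp
#include <cstddef>
#include <vector>

// Illustrative stand-in (not Armadillo code): circularly shift a 1-D sequence.
// In this sketch a positive shift moves elements toward higher indices and a
// negative shift moves them toward lower indices, so the shift must be signed.
template <typename T>
std::vector<T> demo_circshift(const std::vector<T>& x, long shift)
{
    const long n = static_cast<long>(x.size());
    std::vector<T> out(x.size());
    if (n == 0) {
        return out;
    }
    // Wrap the shift into [0, n) so negative values are handled correctly.
    const long s = ((shift % n) + n) % n;
    for (long i = 0; i < n; ++i) {
        out[static_cast<std::size_t>((i + s) % n)] = x[static_cast<std::size_t>(i)];
    }
    return out;
}

// fftshift-style helper: rotate by floor(n / 2), as the call sites above do.
template <typename T>
std::vector<T> demo_fftshift(const std::vector<T>& x)
{
    return demo_circshift(x, static_cast<long>(x.size() / 2));
}
```

With an unsigned shift type, a call such as demo_circshift(x, -1) would silently become a very large positive shift, which is the kind of mismatch the static_cast<sword> in the updated call sites guards against.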

Files to modify:

  • Support/ArmaExtensions/arma_extensions.h: remove the #include of the three op_circshift / fn_circshift headers from the namespace arma {} block
  • forge/FFT/fftshift.hpp: swap argument order at 2 call sites

Files to delete:

  • Support/ArmaExtensions/op_circshift_bones.hpp
  • Support/ArmaExtensions/op_circshift_meat.hpp
  • Support/ArmaExtensions/fn_circshift.hpp

2. Fix reconSolve extern templates

Update declarations in mpipcSENSE.h, mpipcSENSETimeSeg.h, and explicit instantiations in mpipcSENSE.cpp, mpipcSENSETimeSeg.cpp to match the current reconSolve signature using forgeCol<forgeComplex<T1>> types and the two trailing size_t parameters.

The mpipcSENSE and mpipcSENSETimeSeg operator classes' operator* and operator/ must also be updated to accept and return forgeCol<forgeComplex<T1>> to match what reconSolve passes them.
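The matching requirement between an extern template declaration and its explicit instantiation can be sketched in isolation. Everything below is hypothetical: DemoCol and DemoComplex stand in for forgeCol and forgeComplex, demoSolve is a drastically simplified placeholder for reconSolve, and the two size_t parameter names are assumptions.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins; the real forgeCol / forgeComplex are not reproduced.
template <typename T> using DemoComplex = std::complex<T>;
template <typename T> using DemoCol = std::vector<T>;

// Simplified solver template with two trailing size_t parameters.
// The parameter names are assumptions for illustration only.
template <typename T1>
DemoCol<DemoComplex<T1>> demoSolve(const DemoCol<DemoComplex<T1>>& data,
                                   std::size_t barIdx, std::size_t iterOffset)
{
    (void)barIdx;
    (void)iterOffset;
    return data; // placeholder body: a real solver would iterate here
}

// Header side: declare that an instantiation with exactly this signature
// is provided by some translation unit.
extern template DemoCol<DemoComplex<float>>
demoSolve<float>(const DemoCol<DemoComplex<float>>&, std::size_t, std::size_t);

// Source side: the explicit instantiation repeats the same signature.
template DemoCol<DemoComplex<float>>
demoSolve<float>(const DemoCol<DemoComplex<float>>&, std::size_t, std::size_t);
```

If the header still named the old argument types or omitted the trailing parameters, the declaration would no longer match any specialization of the template, which is the class of error described in the Problem section.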

3. Type modernization

Executables (forgeSenseMPI.cpp, forgePcSenseMPI.cpp, forgePcSenseMPI_TS.cpp): Convert local variables from raw Armadillo to forge types:

| Old | New |
|---|---|
| Col<float> | forgeCol<float> |
| Col<std::complex<float>> | forgeCol<forgeComplex<float>> |
| std::complex<float> | forgeComplex<float> |

Update the template parameters of processISMRMRDInput, getISMRMRDCompleteSENSEMap, and getCompleteISMRMRDAcqData to use the base type float, matching the primary executables.

MPI helper classes (mpipcSENSE.h/.cpp, mpipcSENSETimeSeg.h/.cpp):

  • The operator interface (operator*, operator/) converts to forgeCol<forgeComplex<T1>> to match reconSolve.
  • All internal member variables and computations use forge types throughout.
  • MPI communication (bmpi::gather, bmpi::broadcast) works directly with forge types via new Boost.Serialization support (see section 3a).

3a. Boost.Serialization support for forgeCol and forgeMat

Files: forge/Core/forgeCol.hpp, forge/Core/forgeMat.hpp

Add serialize() methods to enable bmpi::gather/bmpi::broadcast with forge types directly, eliminating the need for Armadillo types at MPI boundaries.

forgeCol:

#ifdef ForgeMPI
template <class Archive>
void serialize(Archive& ar, const unsigned int /*version*/)
{
    arma::uword len = n_elem;
    ar & len;
    if (Archive::is_loading::value) {
        set_size(len);
    }
    ar & boost::serialization::make_array(mem, n_elem);
}
#endif

forgeMat:

#ifdef ForgeMPI
template <class Archive>
void serialize(Archive& ar, const unsigned int /*version*/)
{
    arma::uword nr = n_rows, nc = n_cols;
    ar & nr;
    ar & nc;
    if (Archive::is_loading::value) {
        set_size(nc, nr); // note: set_size takes (nCols, nRows)
    }
    ar & boost::serialization::make_array(mem, n_rows * n_cols);
}
#endif

These are guarded by #ifdef ForgeMPI so non-MPI builds don't pull in Boost headers. The forgeComplex<T> type is layout-compatible with two consecutive T values, so make_array on the raw memory works correctly for both real and complex element types.
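The length-prefix pattern and the layout assumption can be sketched without Boost. The code below is a hedged illustration only: DemoComplex is a hypothetical stand-in for forgeComplex<T>, std::vector replaces forgeCol, and a byte buffer replaces the archive. It mirrors the same three steps: write the element count, resize on load, then copy the raw element memory.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for forgeComplex<T>: two consecutive T values.
template <typename T>
struct DemoComplex {
    T re;
    T im;
};

// The raw-memory approach relies on these two properties.
static_assert(sizeof(DemoComplex<float>) == 2 * sizeof(float),
              "no padding expected between re and im");
static_assert(std::is_trivially_copyable<DemoComplex<float>>::value,
              "raw byte copies must be valid");

// Save: write the element count first, then the raw element bytes.
template <typename T>
std::vector<unsigned char> demo_save(const std::vector<T>& col)
{
    const std::size_t len = col.size();
    std::vector<unsigned char> buf(sizeof(len) + len * sizeof(T));
    std::memcpy(buf.data(), &len, sizeof(len));
    if (len > 0) {
        std::memcpy(buf.data() + sizeof(len), col.data(), len * sizeof(T));
    }
    return buf;
}

// Load: read the count, resize the destination, then copy the bytes back.
template <typename T>
std::vector<T> demo_load(const std::vector<unsigned char>& buf)
{
    std::size_t len = 0;
    std::memcpy(&len, buf.data(), sizeof(len));
    std::vector<T> col(len); // mirrors set_size(len) on the loading side
    if (len > 0) {
        std::memcpy(col.data(), buf.data() + sizeof(len), len * sizeof(T));
    }
    return col;
}
```

Static assertions of this kind could also be placed next to the real forgeComplex definition so that a future layout change fails at compile time rather than corrupting gathered data.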

The ARMA_EXTRA_COL_PROTO/MEAT and ARMA_EXTRA_MAT_PROTO/MEAT macros in arma_extensions.h are retained for backward compatibility with any code that serializes raw Armadillo types, but MPI helper classes will use forge types exclusively.

Encoding limits: Change from the +1 convention to the raw maximum with <= loops. Note: forgePcSenseMPI.cpp currently combines the +1 limit with a <= comparison, which iterates one index past the true maximum; this change corrects it.
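The effect of mixing the two conventions can be seen by counting loop iterations. This is a small illustrative sketch with hypothetical function names, not code from the executables.

```cpp
#include <cstddef>

// Count loop iterations when the limit is stored as the raw maximum index
// and the loop uses <=, which is the convention this change adopts.
inline std::size_t demo_count_raw_max(std::size_t maxIndex)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i <= maxIndex; ++i) {
        ++count;
    }
    return count;
}

// Count iterations when the limit was already stored as max + 1 and the
// loop still uses <=: one extra index is visited.
inline std::size_t demo_count_plus_one_with_le(std::size_t maxIndex)
{
    const std::size_t limit = maxIndex + 1;
    std::size_t count = 0;
    for (std::size_t i = 0; i <= limit; ++i) {
        ++count;
    }
    return count;
}
```

For a maximum index of 7, the first function visits indices 0 through 7, while the second also visits index 8.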

4. CLI and infrastructure

Each MPI executable gains the full modern CLI and infrastructure.

New includes:

#include "../forge/Core/ForgeExitCodes.hpp"
#include "../forge/Core/ForgeLog.hpp"
#include "../forge/Core/SignalHandler.hpp"

New CLI flags:

| Flag | Behavior |
|---|---|
| --version | Print FORGE v{VERSION} ({binary}) built {DATE} and exit 0 |
| --device | auto (default): rank-based assignment (rank % ngpus). Integer N: explicit GPU. |
| --log-level | spdlog level (trace, debug, info, warn, error). Default: info |
| --no-tui | Disable forgeview spawning (JSONL continues) |
| --no-jsonl | Disable JSONL, use classic spdlog+indicators |
| --jsonl-output | JSONL destination (stderr/stdout/file). Implies --no-tui. |
| --mpi-log-all | Enable all ranks logging to stderr/JSONL (default: rank 0 only) |

--device auto implementation:

std::string device_str = vm["device"].as<std::string>();
if (device_str == "auto") {
    #ifdef OPENACC_GPU
    int ngpus = acc_get_num_devices(acc_device_nvidia);
    if (ngpus > 0) {
        acc_set_device_num(world.rank() % ngpus, acc_device_nvidia);
    }
    #endif
} else {
    int device = std::stoi(device_str);
    // Same validation as primary executables
}

Note: MPI executables use po::value<std::string>()->default_value("auto") for --device (unlike primary executables which use po::value<int>()).
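The string-valued --device option needs both the auto path and the integer path to be resolved to a GPU index. The sketch below isolates that decision in a pure function so it can be tested without MPI or OpenACC. The name demo_resolve_device, the -1 sentinel, and the range check are assumptions; the validation actually used by the primary executables is not reproduced here.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Illustrative helper (hypothetical name): map a --device string to a GPU
// index. Returns -1 when no valid device can be chosen.
inline int demo_resolve_device(const std::string& deviceStr, int rank, int ngpus)
{
    if (deviceStr == "auto") {
        // Round-robin assignment of ranks to the visible GPUs.
        return (ngpus > 0) ? (rank % ngpus) : -1;
    }
    try {
        std::size_t pos = 0;
        const int device = std::stoi(deviceStr, &pos);
        // Assumed validation: whole string numeric and index within range.
        if (pos != deviceStr.size() || device < 0 || device >= ngpus) {
            return -1;
        }
        return device;
    } catch (const std::exception&) {
        return -1; // std::stoi throws on non-numeric or out-of-range input
    }
}
```

With four GPUs per node, rank 5 resolves to device 1 under auto, while an explicit value such as "7" is rejected in this sketch.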

MPI logging:

  • FORGE_LOG_INIT called on all ranks.
  • Rank 0: full stderr/JSONL + file logging.
  • Other ranks: file-only logging (suppress stderr/JSONL sink) unless --mpi-log-all.
  • When --mpi-log-all is set, use spdlog::set_pattern(fmt::format("[rank {}] %+", world.rank())) to prefix log messages with rank number.
  • FORGE_TUI_START/FORGE_TUI_EXIT called on rank 0 only.
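The per-rank logging rules above reduce to a small decision table, sketched here as a pure function. DemoSinkPlan, demo_plan_sinks, and demo_owns_tui are hypothetical names; the real setup goes through FORGE_LOG_INIT and spdlog, which are not modeled.

```cpp
#include <string>

// Hypothetical description of which sinks a rank should attach.
struct DemoSinkPlan {
    bool fileSink;      // per-rank log file
    bool consoleSink;   // stderr / JSONL stream
    std::string prefix; // text prepended to each message, empty if none
};

inline DemoSinkPlan demo_plan_sinks(int rank, bool mpiLogAll)
{
    DemoSinkPlan plan;
    plan.fileSink = true;                        // every rank logs to file
    plan.consoleSink = (rank == 0) || mpiLogAll; // rank 0 always, others on request
    plan.prefix = mpiLogAll ? "[rank " + std::to_string(rank) + "] " : std::string();
    return plan;
}

// The TUI start/exit hooks run on rank 0 only.
inline bool demo_owns_tui(int rank)
{
    return rank == 0;
}
```

Keeping the decision separate from the sink construction makes the rank 0 versus --mpi-log-all behavior easy to check in a single-process unit test.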

Exit codes: Use the ForgeExitCode enum. --help exits with Success (0).

Important: Fix po::notify(vm) ordering — currently called BEFORE --help check in all 3 MPI executables. Must be moved AFTER --help/--version checks (matching primary executables) to avoid throwing on missing required args.
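The ordering issue can be shown without Boost by treating the required-argument check as the analogue of po::notify(vm). In this sketch the enum member names are hypothetical (only the values 0 and 2 appear in this document), treating -i and -o as the required options is an assumption, and returning 2 for a missing argument is likewise assumed.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical subset of exit codes; only the values 0 and 2 are taken
// from this document.
enum DemoExitCode { DemoSuccess = 0, DemoValidationError = 2 };

inline bool demo_has_flag(const std::vector<std::string>& args, const std::string& flag)
{
    return std::find(args.begin(), args.end(), flag) != args.end();
}

// Handle informational flags first, then enforce required arguments.
// The second check plays the role that po::notify(vm) plays with
// Boost.Program_options.
inline int demo_dispatch(const std::vector<std::string>& args)
{
    if (demo_has_flag(args, "--help") || demo_has_flag(args, "--version")) {
        return DemoSuccess; // print usage or version and stop
    }
    if (!demo_has_flag(args, "-i") || !demo_has_flag(args, "-o")) {
        return DemoValidationError; // required inputs are missing
    }
    return DemoSuccess;
}
```

Reversing the two checks would make a bare --help invocation fail on the missing -i and -o, which is the behavior the reordering is meant to remove.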

Signal handling: forge_install_signal_handlers() on all ranks. Note: in MPI environments, mpirun sends SIGINT to all processes. If g_should_stop is detected and ranks exit at different times while others are blocked in MPI collectives, MPI_Finalize() may hang. Document this as a known limitation; recommend mpirun --signal SIGINT for clean shutdown.
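A cooperative stop flag of the kind described here can be sketched with the standard <csignal> facilities. The flag name g_should_stop comes from this document, but its definition below, the handler, and the demo_ functions are stand-ins; the actual contents of SignalHandler.hpp are not reproduced. The loop raises SIGINT on itself purely to demonstrate the early exit.

```cpp
#include <csignal>

// Flag polled by long-running loops. The name is taken from this document;
// the definition here is a stand-in.
static volatile std::sig_atomic_t g_should_stop = 0;

extern "C" void demo_on_signal(int /*signum*/)
{
    g_should_stop = 1; // only async-signal-safe work belongs in a handler
}

// Hypothetical counterpart to forge_install_signal_handlers().
// Returns true when both handlers were installed.
inline bool demo_install_signal_handlers()
{
    const bool okInt = std::signal(SIGINT, demo_on_signal) != SIG_ERR;
    const bool okTerm = std::signal(SIGTERM, demo_on_signal) != SIG_ERR;
    return okInt && okTerm;
}

// Run up to maxIters steps, stopping early once a signal has been received.
// Handlers must be installed before calling this with a valid raiseAt.
inline int demo_run_loop(int maxIters, int raiseAt)
{
    int done = 0;
    for (int i = 0; i < maxIters && !g_should_stop; ++i) {
        if (i == raiseAt) {
            std::raise(SIGINT); // simulate Ctrl-C for demonstration
        }
        ++done;
    }
    return done;
}
```

In an MPI run each rank would observe the flag independently, so a rank that leaves its loop early while others wait in a collective is exactly the hang scenario noted above.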

Progress tracking: Rank 0 creates progress bars via PG_PROGRESS_ADD and passes recon_bar_idx/iter_offset to reconSolve. Other ranks pass PG_NO_PARENT_BAR.

Image dims: FORGE_SET_IMAGE_DIMS(Nx, Ny, Nz) on rank 0.

5. Replace std::cout with structured logging

All std::cout statements in the MPI executables and helper classes are replaced with FORGE_DEBUG/FORGE_INFO. The using namespace arma and using namespace std directives are removed.

Note: forgeSenseMPI.cpp has more std::cout instances than the other two — all must be converted, gated to rank 0 where appropriate.

Testing

Build verification

cmake -B build_mpi -S . -DMETAL_COMPUTE=ON -DOPENACC_GPU=OFF -DMPISupport=ON
cmake --build build_mpi -j4

All MPI targets must compile and link without errors.

Existing test suites

cpu_tests and metal_tests must continue to pass — the op_circshift removal and fftshift.hpp argument swap affect shared code.

CLI tests

Add [CLI_MPI] tagged tests:

  • forgeSenseMPI --version exits 0 with correct version string
  • forgeSenseMPI --help exits 0
  • forgeSenseMPI --device auto --no-jsonl -i /nonexistent.h5 -o /tmp/ -F NUFFT exits with validation error (2)

Single-process execution (no mpirun needed).

Functional test

mpirun -np 2 forgeSenseMPI --version — verify both ranks print version and exit 0.

Documentation

  • README.md: Add MPI section with --device auto, --mpi-log-all.
  • CHANGELOG.md: MPI modernization entry, op_circshift removal note.
  • CLAUDE.md: Note MPI executables are modernized.

Files Modified

| File | Change |
|---|---|
| Support/ArmaExtensions/arma_extensions.h | Remove op_circshift/fn_circshift includes |
| forge/FFT/fftshift.hpp | Swap circshift argument order to match Armadillo native |
| forge/Core/forgeCol.hpp | Add Boost.Serialization serialize() method (guarded by ForgeMPI) |
| forge/Core/forgeMat.hpp | Add Boost.Serialization serialize() method (guarded by ForgeMPI) |
| apps/MPI/forgeSenseMPI.cpp | Full modernization: types, CLI, logging, exit codes, signals |
| apps/MPI/forgePcSenseMPI.cpp | Same |
| apps/MPI/forgePcSenseMPI_TS.cpp | Same |
| apps/MPI/mpipcSENSE.h | Update operator interface to forgeCol, extern template, logging |
| apps/MPI/mpipcSENSE.cpp | Update types (forge types, including at MPI boundaries), instantiation, logging |
| apps/MPI/mpipcSENSETimeSeg.h | Same as mpipcSENSE.h |
| apps/MPI/mpipcSENSETimeSeg.cpp | Same as mpipcSENSE.cpp |
| forge/Tests/CLITests.cpp | Add MPI CLI tests |
| README.md | MPI documentation |
| CHANGELOG.md | MPI modernization + op_circshift removal |
| CLAUDE.md | Update MPI status |

Files deleted

| File | Reason |
|---|---|
| Support/ArmaExtensions/op_circshift_bones.hpp | Replaced by Armadillo native |
| Support/ArmaExtensions/op_circshift_meat.hpp | Replaced by Armadillo native |
| Support/ArmaExtensions/fn_circshift.hpp | Replaced by Armadillo native |

Files reviewed, unchanged

| File | Reason |
|---|---|
| forge/Core/ForgeIncludes.h | Still includes arma_extensions.h for MPI serialization support |
| forge/Solvers/reconSolve.h | Signature unchanged; MPI declarations updated to match |
| apps/MPI/CMakeLists.txt | Build targets unchanged |