MPI Executable Modernization¶
Date: 2026-03-16
Status: Approved
Closes: Issue #2
Problem¶
The MPI executables (forgeSenseMPI, forgePcSenseMPI, forgePcSenseMPI_TS) fail
to build due to two errors:
- op_circshift redefinition: Armadillo 15.2.3 now natively defines arma::op_circshift, but the custom Support/ArmaExtensions/op_circshift_bones.hpp also defines it inside namespace arma {}. The custom version has a different interface (apply() signature, positive_modulo helper) than the native one. The existing #ifndef include guard in arma_extensions.h does NOT prevent this because the conflict is between Armadillo's own definition and the custom one, not double-inclusion of the custom file.
- reconSolve extern template signature mismatch: MPI headers declare old Col<complex<T1>> signatures, but reconSolve now uses forgeCol<forgeComplex<T1>> with two additional size_t parameters.
Beyond the build errors, the MPI code is pre-2024 vintage: raw Armadillo types,
std::cout logging, no exit codes, no signal handling, no CLI parity with the
primary executables. This modernization brings them up to the same standard before
public release.
Approach¶
Full modernization of all MPI executables and helper classes to match the primary executables (forgeSense, forgePcSense, forgePcSenseTimeSeg).
Components¶
1. Remove custom op_circshift (use Armadillo native)¶
Root cause: Armadillo 15.2.3 natively provides arma::circshift() and
arma::op_circshift. The custom op_circshift_bones.hpp, op_circshift_meat.hpp,
and fn_circshift.hpp in Support/ArmaExtensions/ conflict with the native
definitions.
Fix: Remove the custom op_circshift files entirely and update call sites
to use Armadillo's native circshift() API.
API difference:
- Custom (forge): circshift(X, dim, shift) — dimension first, shift second
- Native (Armadillo): circshift(X, shift, dim) — shift first, dimension second
Call sites to update (only in forge/FFT/fftshift.hpp):
// Old: circshift(X, dim, std::floor(size / 2))
// New: circshift(X, static_cast<sword>(std::floor(size / 2)), dim)
// Old: circshift(circshift(X, 0, std::floor(X.n_rows / 2)), 1, std::floor(X.n_cols / 2))
// New: circshift(circshift(X, static_cast<sword>(std::floor(X.n_rows / 2)), 0),
// static_cast<sword>(std::floor(X.n_cols / 2)), 1)
Note: Armadillo's circshift takes sword (signed) for the shift amount, not
uword. The static_cast<sword> ensures correct type matching.
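To make the role of the signed shift concrete, here is a standard-library sketch of a one-dimensional circular shift and an fftshift built on it. It does not call Armadillo; circshift1d and fftshift1d are hypothetical names used only for illustration, and the direction of a positive shift (elements move toward higher indices) is assumed to match the native circshift().

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Circularly shift `v` by `shift` positions. A positive shift moves elements
// toward higher indices; a negative shift moves them toward lower indices.
// The shift is signed, which is why an unsigned size must be cast first.
template <typename T>
std::vector<T> circshift1d(std::vector<T> v, long long shift) {
    if (v.empty()) {
        return v;
    }
    const long long n = static_cast<long long>(v.size());
    const long long k = ((shift % n) + n) % n;  // wrap any shift into [0, n)
    std::rotate(v.begin(), v.end() - k, v.end());
    return v;
}

// fftshift along one dimension: circular shift by floor(size / 2).
// Integer division already floors for non-negative sizes.
template <typename T>
std::vector<T> fftshift1d(const std::vector<T>& v) {
    return circshift1d(v, static_cast<long long>(v.size() / 2));
}
```

In the real call sites the same idea applies per dimension: the shift amount comes first and is cast to a signed type, and the dimension index comes second.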
Files to modify:
- Support/ArmaExtensions/arma_extensions.h — remove #include of the three
op_circshift / fn_circshift headers from the namespace arma {} block
- forge/FFT/fftshift.hpp — swap argument order at 2 call sites
Files to delete:
- Support/ArmaExtensions/op_circshift_bones.hpp
- Support/ArmaExtensions/op_circshift_meat.hpp
- Support/ArmaExtensions/fn_circshift.hpp
2. Fix reconSolve extern templates¶
Update declarations in mpipcSENSE.h, mpipcSENSETimeSeg.h, and explicit
instantiations in mpipcSENSE.cpp, mpipcSENSETimeSeg.cpp to match the current
reconSolve signature using forgeCol<forgeComplex<T1>> types and the two
trailing size_t parameters.
The mpipcSENSE and mpipcSENSETimeSeg operator classes' operator* and
operator/ must also be updated to accept and return forgeCol<forgeComplex<T1>>
to match what reconSolve passes them.
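The general mechanism behind this fix can be sketched with a toy template. Everything below is hypothetical: sumWindow stands in for a solver template whose signature gained two trailing std::size_t parameters, and it is not the real reconSolve signature. The point is only that an extern template declaration and its explicit instantiation must both spell out the template's current parameter list.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a solver template. The two trailing std::size_t
// parameters mirror the kind of signature change described above.
template <typename T>
T sumWindow(const std::vector<T>& x, std::size_t first, std::size_t count) {
    T acc{};
    for (std::size_t i = first; i < x.size() && i < first + count; ++i) {
        acc += x[i];
    }
    return acc;
}

// Header side: promise that an instantiation exists in some translation unit.
// The parameter list must match the template above exactly.
extern template float sumWindow<float>(const std::vector<float>&,
                                       std::size_t, std::size_t);

// Source side: the single explicit instantiation that fulfils the promise.
template float sumWindow<float>(const std::vector<float>&,
                                std::size_t, std::size_t);
```

If either line still named an older parameter list, it would no longer match any template declaration and the compiler would reject it, which is the class of error the MPI headers currently hit.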
3. Type modernization¶
Executables (forgeSenseMPI.cpp, forgePcSenseMPI.cpp, forgePcSenseMPI_TS.cpp):
Convert local variables from raw Armadillo to forge types:
| Old | New |
|---|---|
| Col<float> | forgeCol<float> |
| Col<std::complex<float>> | forgeCol<forgeComplex<float>> |
| std::complex<float> | forgeComplex<float> |
Update processISMRMRDInput, getISMRMRDCompleteSENSEMap,
getCompleteISMRMRDAcqData template parameters to use float (base type)
matching primary executables.
MPI helper classes (mpipcSENSE.h/.cpp, mpipcSENSETimeSeg.h/.cpp):
- Operator interface (operator*, operator/) converts to
forgeCol<forgeComplex<T1>> to match reconSolve.
- All internal member variables and computation uses forge types throughout.
- MPI communication (bmpi::gather, bmpi::broadcast) works directly with
forge types via new Boost.Serialization support (see section 3a).
3a. Boost.Serialization support for forgeCol and forgeMat¶
Files: forge/Core/forgeCol.hpp, forge/Core/forgeMat.hpp
Add serialize() methods to enable bmpi::gather/bmpi::broadcast with forge
types directly, eliminating the need for Armadillo types at MPI boundaries.
forgeCol:
#ifdef ForgeMPI
template <class Archive>
void serialize(Archive& ar, const unsigned int /*version*/)
{
    arma::uword len = n_elem;
    ar & len;
    if (Archive::is_loading::value) {
        set_size(len);
    }
    ar & boost::serialization::make_array(mem, n_elem);
}
#endif
forgeMat:
#ifdef ForgeMPI
template <class Archive>
void serialize(Archive& ar, const unsigned int /*version*/)
{
    arma::uword nr = n_rows, nc = n_cols;
    ar & nr;
    ar & nc;
    if (Archive::is_loading::value) {
        set_size(nc, nr); // note: set_size takes (nCols, nRows)
    }
    ar & boost::serialization::make_array(mem, n_rows * n_cols);
}
#endif
These are guarded by #ifdef ForgeMPI so non-MPI builds don't pull in Boost
headers. The forgeComplex<T> type is layout-compatible with two consecutive
T values, so make_array on the raw memory works correctly for both real
and complex element types.
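The length-then-payload pattern and the layout assumption can be sketched without Boost. ComplexPair, packColumn, and unpackColumn below are hypothetical stand-ins, not forge or Boost APIs; the static_asserts express the property the raw-memory approach relies on, namely that the complex type is exactly two consecutive T values and can be copied bytewise.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for a complex type stored as two consecutive T values.
template <typename T>
struct ComplexPair {
    T re;
    T im;
};

static_assert(sizeof(ComplexPair<float>) == 2 * sizeof(float),
              "complex type must have no padding");
static_assert(std::is_trivially_copyable<ComplexPair<float>>::value,
              "raw byte copies must be valid for the element type");

// Write the element count first, then the raw element memory.
template <typename T>
std::vector<unsigned char> packColumn(const std::vector<T>& col) {
    const std::uint64_t len = col.size();
    std::vector<unsigned char> bytes(sizeof(len) + col.size() * sizeof(T));
    std::memcpy(bytes.data(), &len, sizeof(len));
    if (!col.empty()) {
        std::memcpy(bytes.data() + sizeof(len), col.data(), col.size() * sizeof(T));
    }
    return bytes;
}

// Read the count, size the destination, then copy the payload into it.
template <typename T>
std::vector<T> unpackColumn(const std::vector<unsigned char>& bytes) {
    std::uint64_t len = 0;
    if (bytes.size() < sizeof(len)) {
        throw std::runtime_error("truncated header");
    }
    std::memcpy(&len, bytes.data(), sizeof(len));
    if ((bytes.size() - sizeof(len)) / sizeof(T) < len) {
        throw std::runtime_error("truncated payload");
    }
    std::vector<T> col(static_cast<std::size_t>(len));  // resize before reading
    if (len != 0) {
        std::memcpy(col.data(), bytes.data() + sizeof(len), col.size() * sizeof(T));
    }
    return col;
}
```

The ordering mirrors the serialize() methods above: the receiver must know the length and allocate before the payload is read, so the count always travels first.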
The ARMA_EXTRA_COL_PROTO/MEAT and ARMA_EXTRA_MAT_PROTO/MEAT macros in
arma_extensions.h are retained for backward compatibility with any code
that serializes raw Armadillo types, but MPI helper classes will use forge
types exclusively.
Encoding limits: Change from the +1 convention (limit stored as max + 1) to the raw
maximum with <= loops. Note: forgePcSenseMPI.cpp currently combines +1 with <=, which
applies the off-by-one adjustment twice and runs one iteration too many; this change
corrects it.
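The effect of combining both adjustments is easy to see by counting iterations. This sketch assumes zero-based encoding indices whose limit is the largest valid index; the function names are hypothetical.

```cpp
#include <cstddef>

// Intended form: raw maximum index with an inclusive loop bound.
std::size_t countRawMaxInclusive(std::size_t maxIndex) {
    std::size_t iterations = 0;
    for (std::size_t i = 0; i <= maxIndex; ++i) {
        ++iterations;
    }
    return iterations;  // maxIndex + 1 iterations
}

// Buggy form: the limit is already max + 1, and the loop is still inclusive.
std::size_t countPlusOneInclusive(std::size_t maxIndex) {
    const std::size_t limit = maxIndex + 1;
    std::size_t iterations = 0;
    for (std::size_t i = 0; i <= limit; ++i) {
        ++iterations;
    }
    return iterations;  // maxIndex + 2 iterations: one past the last index
}
```

For a maximum index of 3 the first loop visits indices 0 through 3, while the second also visits index 4, which does not exist in the data.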
4. CLI and infrastructure¶
Each MPI executable gains the full modern CLI and infrastructure.
New includes:
#include "../forge/Core/ForgeExitCodes.hpp"
#include "../forge/Core/ForgeLog.hpp"
#include "../forge/Core/SignalHandler.hpp"
New CLI flags:
| Flag | Behavior |
|---|---|
| --version | Print FORGE v{VERSION} ({binary}) built {DATE} and exit 0 |
| --device | auto (default): rank-based assignment (rank % ngpus). Integer N: explicit GPU. |
| --log-level | spdlog level (trace, debug, info, warn, error). Default: info |
| --no-tui | Disable forgeview spawning (JSONL continues) |
| --no-jsonl | Disable JSONL, use classic spdlog+indicators |
| --jsonl-output | JSONL destination (stderr/stdout/file). Implies --no-tui. |
| --mpi-log-all | Enable all ranks logging to stderr/JSONL (default: rank 0 only) |
--device auto implementation:
std::string device_str = vm["device"].as<std::string>();
if (device_str == "auto") {
#ifdef OPENACC_GPU
    int ngpus = acc_get_num_devices(acc_device_nvidia);
    if (ngpus > 0) {
        acc_set_device_num(world.rank() % ngpus, acc_device_nvidia);
    }
#endif
} else {
    int device = std::stoi(device_str);
    // Same validation as primary executables
}
Note: MPI executables use po::value<std::string>()->default_value("auto") for
--device (unlike primary executables which use po::value<int>()).
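The selection logic can be isolated from OpenACC and MPI and tested on its own. The sketch below is one possible shape, not the planned implementation: resolveDevice is a hypothetical helper, returning -1 for "no GPU available" is an assumption, and the checks on explicit values (non-negative, no trailing characters) are illustrative rather than the validation used by the primary executables.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Map a --device string to a device index.
// "auto": round-robin by rank over the available GPUs, or -1 when there are none.
// Integer N: use N as given, after basic validation.
int resolveDevice(const std::string& deviceStr, int rank, int ngpus) {
    if (deviceStr == "auto") {
        return ngpus > 0 ? rank % ngpus : -1;
    }
    std::size_t parsed = 0;
    int device = 0;
    try {
        device = std::stoi(deviceStr, &parsed);
    } catch (const std::exception&) {
        throw std::invalid_argument("--device expects 'auto' or an integer");
    }
    // Reject inputs such as "3x" that std::stoi would otherwise accept.
    if (parsed != deviceStr.size() || device < 0) {
        throw std::invalid_argument("--device expects 'auto' or a non-negative integer");
    }
    return device;
}
```

With four GPUs per node, ranks 0 through 7 map to devices 0, 1, 2, 3, 0, 1, 2, 3, which is the rank-based assignment the flag table describes.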
MPI logging:
- FORGE_LOG_INIT called on all ranks.
- Rank 0: full stderr/JSONL + file logging.
- Other ranks: file-only logging (suppress stderr/JSONL sink) unless --mpi-log-all.
- When --mpi-log-all is set, use spdlog::set_pattern(fmt::format("[rank {}] %+", world.rank())) to prefix log messages with rank number.
- FORGE_TUI_START/FORGE_TUI_EXIT called on rank 0 only.
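The per-rank policy in the list above reduces to a small decision function. The sketch below models only that decision with the standard library; SinkPlan and planSinks are hypothetical names, and the actual sinks and pattern would be configured through spdlog as described.

```cpp
#include <string>

// Which outputs a rank should log to, and what to prepend to each message.
struct SinkPlan {
    bool console;        // stderr/JSONL sink enabled
    bool file;           // file sink enabled
    std::string prefix;  // text prepended to each message
};

SinkPlan planSinks(int rank, bool mpiLogAll) {
    SinkPlan plan;
    plan.file = true;                          // every rank keeps file logging
    plan.console = (rank == 0) || mpiLogAll;   // rank 0 always, others on request
    plan.prefix = mpiLogAll ? "[rank " + std::to_string(rank) + "] "
                            : std::string();
    return plan;
}
```

Keeping the decision separate from the logging backend makes it straightforward to unit test without launching multiple ranks.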
Exit codes: Use ForgeExitCode enum. --help → Success (0).
Important: Fix po::notify(vm) ordering — currently called BEFORE --help
check in all 3 MPI executables. Must be moved AFTER --help/--version checks
(matching primary executables) to avoid throwing on missing required args.
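The reason the ordering matters is that po::notify enforces required options, so it throws when a user passes only --help. The following sketch models the corrected order with the standard library instead of Boost.Program_options; CliAction, hasFlag, and decideAction are hypothetical names, and the required-option check stands in for the step that po::notify performs.

```cpp
#include <algorithm>
#include <string>
#include <vector>

enum class CliAction { ShowHelp, ShowVersion, MissingRequired, Run };

static bool hasFlag(const std::vector<std::string>& args, const std::string& flag) {
    return std::find(args.begin(), args.end(), flag) != args.end();
}

CliAction decideAction(const std::vector<std::string>& args,
                       const std::vector<std::string>& required) {
    // Informational flags are handled before any validation runs.
    if (hasFlag(args, "--help")) {
        return CliAction::ShowHelp;
    }
    if (hasFlag(args, "--version")) {
        return CliAction::ShowVersion;
    }
    // Only now enforce required options.
    for (const std::string& flag : required) {
        if (!hasFlag(args, flag)) {
            return CliAction::MissingRequired;
        }
    }
    return CliAction::Run;
}
```

With the checks in the opposite order, an invocation consisting of --help alone would be reported as missing required arguments instead of printing usage.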
Signal handling: forge_install_signal_handlers() on all ranks. Note: in MPI
environments, mpirun sends SIGINT to all processes. If g_should_stop is
detected and ranks exit at different times while others are blocked in MPI
collectives, MPI_Finalize() may hang. Document this as a known limitation;
recommend mpirun --signal SIGINT for clean shutdown.
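A cooperative stop flag of the kind described can be sketched with the standard library alone. The names below are hypothetical and are not the forge implementation; the sketch shows only the single-process mechanism (handler sets a flag, loop polls it) and does nothing to coordinate ranks, which is exactly where the limitation above arises.

```cpp
#include <csignal>

// Written by the signal handler, polled by the main loop. sig_atomic_t is the
// type the standard guarantees can be safely written from a handler.
volatile std::sig_atomic_t g_stop_requested = 0;

extern "C" void onStopSignal(int /*signum*/) {
    g_stop_requested = 1;  // no other work is done inside the handler
}

void installStopHandlers() {
    std::signal(SIGINT, onStopSignal);
    std::signal(SIGTERM, onStopSignal);
}

bool stopRequested() {
    return g_stop_requested != 0;
}

// Run up to maxIters iterations, checking the flag between iterations.
// Returns the number of iterations actually completed.
int runIterations(int maxIters) {
    int done = 0;
    while (done < maxIters && !stopRequested()) {
        ++done;  // one unit of work would go here
    }
    return done;
}
```

Because each rank polls its own flag independently, two ranks can leave the loop at different iterations; any rank that exits early leaves the others waiting in their next collective.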
Progress tracking: Rank 0 creates progress bars via PG_PROGRESS_ADD and
passes recon_bar_idx/iter_offset to reconSolve. Other ranks pass
PG_NO_PARENT_BAR.
Image dims: FORGE_SET_IMAGE_DIMS(Nx, Ny, Nz) on rank 0.
5. Replace std::cout with structured logging¶
All std::cout statements in MPI executables and helper classes replaced with
FORGE_DEBUG/FORGE_INFO. Remove using namespace arma and using namespace std.
Note: forgeSenseMPI.cpp has more std::cout instances than the other two —
all must be converted, gated to rank 0 where appropriate.
Testing¶
Build verification¶
cmake -B build_mpi -S . -DMETAL_COMPUTE=ON -DOPENACC_GPU=OFF -DMPISupport=ON
cmake --build build_mpi -j4
All MPI targets must compile and link without errors.
Existing test suites¶
cpu_tests and metal_tests must continue to pass — the op_circshift removal
and fftshift.hpp argument swap affect shared code.
CLI tests¶
Add [CLI_MPI] tagged tests:
- forgeSenseMPI --version exits 0 with correct version string
- forgeSenseMPI --help exits 0
- forgeSenseMPI --device auto --no-jsonl -i /nonexistent.h5 -o /tmp/ -F NUFFT exits with validation error (2)
Single-process execution (no mpirun needed).
Functional test¶
mpirun -np 2 forgeSenseMPI --version — verify both ranks print version and exit 0.
Documentation¶
- README.md: Add MPI section with --device auto, --mpi-log-all.
- CHANGELOG.md: MPI modernization entry, op_circshift removal note.
- CLAUDE.md: Note MPI executables are modernized.
Files Modified¶
| File | Change |
|---|---|
| Support/ArmaExtensions/arma_extensions.h | Remove op_circshift/fn_circshift includes |
| forge/FFT/fftshift.hpp | Swap circshift argument order to match Armadillo native |
| forge/Core/forgeCol.hpp | Add Boost.Serialization serialize() method (guarded by ForgeMPI) |
| forge/Core/forgeMat.hpp | Add Boost.Serialization serialize() method (guarded by ForgeMPI) |
| apps/MPI/forgeSenseMPI.cpp | Full modernization: types, CLI, logging, exit codes, signals |
| apps/MPI/forgePcSenseMPI.cpp | Same |
| apps/MPI/forgePcSenseMPI_TS.cpp | Same |
| apps/MPI/mpipcSENSE.h | Update operator interface to forgeCol, extern template, logging |
| apps/MPI/mpipcSENSE.cpp | Update types (forge types throughout, including the MPI boundary), instantiation, logging |
| apps/MPI/mpipcSENSETimeSeg.h | Same as mpipcSENSE.h |
| apps/MPI/mpipcSENSETimeSeg.cpp | Same as mpipcSENSE.cpp |
| forge/Tests/CLITests.cpp | Add MPI CLI tests |
| README.md | MPI documentation |
| CHANGELOG.md | MPI modernization + op_circshift removal |
| CLAUDE.md | Update MPI status |
Files deleted¶
| File | Reason |
|---|---|
| Support/ArmaExtensions/op_circshift_bones.hpp | Replaced by Armadillo native |
| Support/ArmaExtensions/op_circshift_meat.hpp | Replaced by Armadillo native |
| Support/ArmaExtensions/fn_circshift.hpp | Replaced by Armadillo native |
Files reviewed, unchanged¶
| File | Reason |
|---|---|
| forge/Core/ForgeIncludes.h | Still includes arma_extensions.h for MPI serialization support |
| forge/Solvers/reconSolve.h | Signature unchanged; MPI declarations updated to match |
| apps/MPI/CMakeLists.txt | Build targets unchanged |