# MPI Modernization Implementation Plan
For agentic workers: REQUIRED: Use superpowers:subagent-driven-development (if subagents available) or superpowers:executing-plans to implement this plan. Steps use checkbox (`- [ ]`) syntax for tracking.
Goal: Fix MPI build errors and fully modernize all MPI executables and helper classes to match the primary executables' patterns (forge types, structured logging, CLI parity, exit codes, signal handling).
Architecture: Remove conflicting custom op_circshift (Armadillo native suffices). Add Boost.Serialization to forgeCol/forgeMat for MPI communication. Convert all MPI code from raw Armadillo to forge types. Add full modern CLI infrastructure to all 3 MPI executables.
Tech Stack: C++17, Boost.MPI, Boost.Serialization, Armadillo 15.2.3, spdlog, Catch2
Spec: docs/superpowers/specs/2026-03-16-mpi-modernization-design.md
## Chunk 1: Shared Code Fixes (prerequisites for all builds)
### Task 1: Remove custom op_circshift and update fftshift.hpp
Files:
- Modify: Support/ArmaExtensions/arma_extensions.h
- Modify: forge/FFT/fftshift.hpp
- Delete: Support/ArmaExtensions/op_circshift_bones.hpp
- Delete: Support/ArmaExtensions/op_circshift_meat.hpp
- Delete: Support/ArmaExtensions/fn_circshift.hpp
Context: Armadillo 15.2.3 natively defines arma::op_circshift and
arma::circshift(). The custom versions in Support/ArmaExtensions/ conflict,
causing a redefinition error when both are in scope (MPI builds). The custom
circshift(X, dim, shift) has swapped argument order vs Armadillo's native
circshift(X, shift, dim).
- [ ] Step 1: Remove circshift includes from arma_extensions.h

  In Support/ArmaExtensions/arma_extensions.h, remove lines 36-42:

  ```cpp
  namespace arma {
  #include "op_circshift_bones.hpp"
  #include "fn_circshift.hpp"
  #include "op_circshift_meat.hpp"
  //#include "permute.hpp"
  }
  ```

  The permute.hpp include is already commented out, so remove the entire `namespace arma {}` block rather than keeping it for that.
- [ ] Step 2: Update fftshift.hpp call sites

  In forge/FFT/fftshift.hpp, change line 58 from:

  ```cpp
  T1 out = circshift(X, dim, std::floor(size / 2));
  ```

  to:

  ```cpp
  T1 out = arma::circshift(X, static_cast<arma::sword>(std::floor(size / 2)), dim);
  ```

  Change line 65 from:

  ```cpp
  T1 out = circshift(circshift(X, 0, std::floor(X.n_rows / 2)), 1, std::floor(X.n_cols / 2));
  ```

  to:

  ```cpp
  T1 out = arma::circshift(
      arma::circshift(X, static_cast<arma::sword>(std::floor(X.n_rows / 2)), 0),
      static_cast<arma::sword>(std::floor(X.n_cols / 2)), 1);
  ```

  Note: arma::circshift takes sword (signed word) for the shift amount. The explicit namespace prefix avoids ambiguity with any lingering `using namespace arma;`.
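As a sanity check on the argument-order change, here is a small standalone sketch (plain std::vector and std::rotate, no Armadillo; circshift1d and fftshift1d are invented names for illustration) of what a circular shift by `shift` along one dimension does, and why fftshift is just a circshift by floor(n/2):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Semantics Armadillo's native circshift(X, shift, dim) applies along
// one dimension: element i of the input ends up at index (i + shift) mod n.
std::vector<int> circshift1d(std::vector<int> v, std::ptrdiff_t shift)
{
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(v.size());
    if (n == 0) return v;
    const std::ptrdiff_t k = ((shift % n) + n) % n; // normalize negative shifts
    std::rotate(v.begin(), v.begin() + (n - k), v.end());
    return v;
}

// fftshift is circshift by floor(n/2): the two halves swap, moving the
// zero-frequency bin to the center. This matches the floor(size / 2)
// shift amount used in fftshift.hpp.
std::vector<int> fftshift1d(std::vector<int> v)
{
    const std::ptrdiff_t half = static_cast<std::ptrdiff_t>(v.size() / 2);
    return circshift1d(std::move(v), half);
}
```

The custom version's swapped `(X, dim, shift)` order would silently pass the shift amount as the dimension, which is why the call sites must be updated rather than just re-namespaced.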
- [ ] Step 3: Delete the custom circshift files

  ```sh
  git rm Support/ArmaExtensions/op_circshift_bones.hpp
  git rm Support/ArmaExtensions/op_circshift_meat.hpp
  git rm Support/ArmaExtensions/fn_circshift.hpp
  ```
- [ ] Step 4: Build and run ALL tests (non-MPI)

  ```sh
  cmake --build build --target cpu_tests --target metal_tests -j4
  ./build/cpu_tests '~[Benchmark]'
  ./build/metal_tests '~[Benchmark]'
  ```
- [ ] Step 5: Commit

  ```sh
  git add Support/ArmaExtensions/arma_extensions.h forge/FFT/fftshift.hpp
  git commit -m "fix: remove custom op_circshift, use Armadillo native circshift"
  ```
### Task 2: Add Boost.Serialization support to forgeCol and forgeMat
Files:
- Modify: forge/Core/forgeCol.hpp
- Modify: forge/Core/forgeMat.hpp
- [ ] Step 1: Add serialize() to forgeCol

  In forge/Core/forgeCol.hpp, add the following inside the forgeCol class (near the end, before the closing `};`), guarded with `#ifdef ForgeMPI`:

  ```cpp
  #ifdef ForgeMPI
      /// @brief Boost.Serialization support for MPI communication.
      template <class Archive>
      void serialize(Archive& ar, const unsigned int /*version*/)
      {
          arma::uword len = n_elem;
          ar & len;
          if (Archive::is_loading::value) {
              set_size(len);
          }
          ar & boost::serialization::make_array(mem, n_elem);
      }
  #endif
  ```

  Also add the necessary Boost.Serialization includes, guarded by `#ifdef ForgeMPI`, near the top of the file (after the existing includes):

  ```cpp
  #ifdef ForgeMPI
  #include <boost/serialization/array.hpp>
  #include <boost/serialization/serialization.hpp>
  #endif
  ```
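The save/load branching above can be exercised without Boost by mocking the archive interface. The sketch below (MockSaveArchive, MockLoadArchive, and ToyCol are invented stand-ins, not the real Boost or forge types) shows how a single serialize() drives both directions: length first, set_size() only on the load path, then the payload.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for Boost archives, to illustrate how one
// serialize() method handles both directions via is_loading.
struct MockSaveArchive {
    struct is_loading { static const bool value = false; };
    std::vector<double> buf;
    MockSaveArchive& operator&(double v) { buf.push_back(v); return *this; }
};

struct MockLoadArchive {
    struct is_loading { static const bool value = true; };
    std::vector<double> buf;
    std::size_t pos = 0;
    MockLoadArchive& operator&(double& v) { v = buf[pos++]; return *this; }
};

// A toy column mirroring the forgeCol pattern: serialize the length,
// resize only when loading, then serialize the payload.
struct ToyCol {
    std::vector<double> data;
    void set_size(std::size_t n) { data.resize(n); }

    template <class Archive>
    void serialize(Archive& ar, unsigned /*version*/)
    {
        double len = static_cast<double>(data.size());
        ar & len;
        if (Archive::is_loading::value) {
            set_size(static_cast<std::size_t>(len));
        }
        for (std::size_t i = 0; i < data.size(); ++i) {
            ar & data[i];
        }
    }
};
```

The same shape is what Boost's binary/MPI archives drive at the other end of a send/receive pair.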
- [ ] Step 2: Add serialize() to forgeMat

  Same pattern in forge/Core/forgeMat.hpp:

  ```cpp
  #ifdef ForgeMPI
      /// @brief Boost.Serialization support for MPI communication.
      template <class Archive>
      void serialize(Archive& ar, const unsigned int /*version*/)
      {
          arma::uword nr = n_rows;
          arma::uword nc = n_cols;
          ar & nr;
          ar & nc;
          if (Archive::is_loading::value) {
              set_size(nc, nr); // note: set_size takes (nCols, nRows)
          }
          ar & boost::serialization::make_array(mem, n_rows * n_cols);
      }
  #endif
  ```

  Add the same guarded includes near the top.
- [ ] Step 3: Build non-MPI tests to verify no regressions

  ```sh
  cmake --build build --target cpu_tests -j4
  ./build/cpu_tests '~[Benchmark]'
  ```
- [ ] Step 4: Build MPI to verify serialization compiles

  ```sh
  cmake -B build_mpi -S . -DMETAL_COMPUTE=ON -DOPENACC_GPU=OFF -DMPISupport=ON
  cmake --build build_mpi -j4 2>&1 | grep "error:" | head -10
  ```
Note: forgeCol::n_elem and forgeMat::n_rows/n_cols are const members. The
serialize() method uses a local copy for the ar & operation, then calls
set_size() on load. Verify that set_size() works on a const-member class —
if n_elem is truly const and set_size modifies it via placement-new or similar,
this will work. If not, you may need to use const_cast or a different approach.
Check the actual forgeCol::set_size() implementation.
- [ ] Step 5: Commit

  ```sh
  git add forge/Core/forgeCol.hpp forge/Core/forgeMat.hpp
  git commit -m "feat: add Boost.Serialization support to forgeCol and forgeMat for MPI"
  ```
## Chunk 2: MPI Helper Class Modernization
### Task 3: Modernize mpipcSENSE types and operator signatures
Files:
- Modify: apps/MPI/mpipcSENSE.h
- Modify: apps/MPI/mpipcSENSE.cpp
- [ ] Step 1: Update mpipcSENSE.h
  - Replace `typedef std::complex<T1> CxT1;` with `typedef forgeComplex<T1> CxT1;`
  - Convert member variables from Armadillo to forge types:
    - `Mat<CxT1> SMap` → `forgeMat<CxT1> SMap`
    - `Mat<T1> PMap` → `forgeMat<T1> PMap`
    - `Col<T1> FMap` → `forgeCol<T1> FMap`
    - `Mat<T1> Kx, Ky, Kz, Tvec` → `forgeMat<T1> Kx, Ky, Kz, Tvec`
    - `Col<T1> Ix, Iy, Iz` → `forgeCol<T1> Ix, Iy, Iz`
    - `Col<uword> shotList, coilList` → keep as `arma::Col<arma::uword>` (MPI serializes these)
  - Update operator signatures:

    ```cpp
    forgeCol<CxT1> operator*(const forgeCol<CxT1>& d) const;
    forgeCol<CxT1> operator/(const forgeCol<CxT1>& d) const;
    ```

  - Update the constructor signature to accept forge types:

    ```cpp
    mpipcSENSE(forgeCol<T1> kx, forgeCol<T1> ky, forgeCol<T1> kz,
        uword nx, uword ny, uword nz, uword nc,
        forgeCol<T1> t, forgeCol<CxT1> SENSEmap, forgeCol<T1> FieldMap,
        forgeCol<T1> ShotPhaseMap, bmpi::environment& en, bmpi::communicator& wor);
    ```

  - Update the extern template declaration to match the current `reconSolve` signature:

    ```cpp
    extern template forgeCol<forgeComplex<float>> reconSolve(
        forgeCol<forgeComplex<float>>, mpipcSENSE<float>&, QuadPenalty<float>,
        arma::Col<float>, arma::Col<float>, arma::Col<float>,
        arma::uword, arma::uword, arma::uword,
        arma::Col<float>, arma::uword, size_t, size_t);
    ```
- [ ] Step 2: Update mpipcSENSE.cpp
  - Update operator* and operator/ implementations:
    - Change return types and parameter types to `forgeCol<forgeComplex<T1>>`.
    - Internal computation: the operators use Armadillo expression templates heavily (`%`, `*`, `exp`, `conj`, `vectorise`, `reshape`). Keep the internal computation in Armadillo types by extracting views via `.getArma()`/`.getArmaComplex()` at the top of operator* and operator/, then wrap the final result back into forge types before returning.
    - Armadillo expression template pitfall: never use `auto` to capture expressions involving `.getArmaComplex()`; force evaluation into named `arma::Col<std::complex<T>>` variables.
  - For MPI gather/broadcast: with Boost.Serialization now on forge types, communication can use forge types directly. However, if the internal computation already uses Armadillo temporaries (`Mat<complex<T1>>`), those can be gathered/broadcast as-is since they already have serialization support via `ARMA_EXTRA` macros. Use whichever is most natural at each communication point.
  - Update the explicit template instantiation:

    ```cpp
    template forgeCol<forgeComplex<float>> reconSolve(
        forgeCol<forgeComplex<float>>, mpipcSENSE<float>&, QuadPenalty<float>,
        arma::Col<float>, arma::Col<float>, arma::Col<float>,
        arma::uword, arma::uword, arma::uword,
        arma::Col<float>, arma::uword, size_t, size_t);
    ```

  - Replace all `std::cout` with `FORGE_DEBUG`:

    ```cpp
    // Old: std::cout << "Task Number = " << tasks << std::endl;
    // New: FORGE_DEBUG("Task Number = {}", tasks);
    ```
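The expression-template pitfall can be made concrete with a toy lazy-add type (AddExpr and eval are invented for illustration; Armadillo's expression nodes are analogous but more elaborate): the expression object stores references to its operands, so capturing it with `auto` past the operands' lifetimes dangles, while assigning into a named concrete vector forces evaluation while the operands are still alive.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal expression-template sketch: AddExpr holds REFERENCES to its
// operands and computes elements lazily, like Armadillo's % and * nodes.
struct AddExpr {
    const std::vector<double>& a;
    const std::vector<double>& b;
    double operator[](std::size_t i) const { return a[i] + b[i]; }
    std::size_t size() const { return a.size(); }
};

AddExpr operator+(const std::vector<double>& a, const std::vector<double>& b)
{
    return AddExpr{a, b};
}

// Forcing evaluation into a concrete vector (the safe pattern mandated
// above for .getArmaComplex() results) copies the values out while the
// operands are alive.
std::vector<double> eval(const AddExpr& e)
{
    std::vector<double> out(e.size());
    for (std::size_t i = 0; i < e.size(); ++i) out[i] = e[i];
    return out;
}
```

The unsafe form is `auto bad = x + make_temp();` — once the full statement ends, the temporary dies and `bad` holds a dangling reference, exactly the failure mode `auto` + `.getArmaComplex()` produces.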
- [ ] Step 3: Verify CMakeLists links Boost.MPI and Boost.Serialization

  Check that apps/MPI/CMakeLists.txt links Boost::mpi and Boost::serialization. These may already be pulled in transitively through ForgeCommon (the main CMakeLists.txt adds them to LIBS). If not, add:

  ```cmake
  target_link_libraries(forgePcSenseMPI ${mpiLIBS} ForgeCommon Boost::mpi Boost::serialization)
  ```
- [ ] Step 4: Build MPI to verify

  ```sh
  cmake --build build_mpi -j4 2>&1 | grep "error:" | head -20
  ```
- [ ] Step 5: Commit

  ```sh
  git add apps/MPI/mpipcSENSE.h apps/MPI/mpipcSENSE.cpp
  git commit -m "refactor(mpipcSENSE): modernize to forge types, update reconSolve signature"
  ```
### Task 4: Modernize mpipcSENSETimeSeg types and operator signatures
Files:
- Modify: apps/MPI/mpipcSENSETimeSeg.h
- Modify: apps/MPI/mpipcSENSETimeSeg.cpp
Same pattern as Task 3. Key differences:
- Has additional GObj (Gnufft) and AObj (TimeSegmentation) members
- Constructor takes extra L and type parameters
- Same MPI gather/broadcast patterns
- [ ] Step 1: Update mpipcSENSETimeSeg.h (same type conversions as Task 3)
- [ ] Step 2: Update mpipcSENSETimeSeg.cpp (same pattern, update explicit instantiation)
- [ ] Step 3: Build MPI to verify
- [ ] Step 4: Commit

  ```sh
  git add apps/MPI/mpipcSENSETimeSeg.h apps/MPI/mpipcSENSETimeSeg.cpp
  git commit -m "refactor(mpipcSENSETimeSeg): modernize to forge types, update reconSolve signature"
  ```
## Chunk 3: MPI Executable Modernization
Common pattern for all 3 executables. Each gets:
- New includes: `ForgeExitCodes.hpp`, `ForgeLog.hpp`, `SignalHandler.hpp`, `<cstdio>`
- Full modern CLI: `--version`, `--device` (string, default "auto"), `--log-level`, `--no-tui`, `--no-jsonl`, `--jsonl-output`, `--mpi-log-all`, `--voxel-size`, `--no-orient`
- `po::notify()` moved AFTER `--help`/`--version` checks
- FORGE_LOG_INIT + PG_TUI_SPAWN_MODE (rank 0 only for TUI)
- Rank-based logging suppression (non-rank-0 gets file-only unless `--mpi-log-all`)
- `forge_install_signal_handlers()` on all ranks
- ForgeExitCode enum for all return values
- Encoding limits: raw `.maximum` with `<=` loops (remove `+1` convention)
- All `std::cout` → `FORGE_DEBUG`/`FORGE_INFO`
- Convert local variables to forge types
- Progress tracking on rank 0
- FORGE_SET_IMAGE_DIMS, NIfTI orientation support
### Task 5: Modernize forgeSenseMPI.cpp
Files:
- Modify: apps/MPI/forgeSenseMPI.cpp
- [ ] Step 1: Add new includes

  After the existing includes, add:

  ```cpp
  #include "../forge/Core/ForgeExitCodes.hpp"
  #include "../forge/Core/ForgeLog.hpp"
  #include "../forge/Core/SignalHandler.hpp"
  #include <cstdio>
  ```
- [ ] Step 2: Add full modern CLI options
Replace the current desc.add_options() block with the full set matching primary
executables plus MPI-specific flags. Add: --version, --device (string,
default "auto"), --log-level, --no-tui, --no-jsonl, --jsonl-output,
--mpi-log-all, --voxel-size, --no-orient.
- [ ] Step 3: Fix po::notify ordering

  Move po::notify(vm) AFTER the --help/--version checks:

  ```cpp
  po::store(po::parse_command_line(argc, argv, desc), vm);
  if (vm.count("help")) {
      std::cout << desc << std::endl;
      return static_cast<int>(ForgeExitCode::Success);
  }
  if (vm.count("version")) {
      std::cout << "FORGE v" << FORGE_VERSION_STRING << " (forgeSenseMPI) built "
                << __DATE__ << std::endl;
      return static_cast<int>(ForgeExitCode::Success);
  }
  po::notify(vm);
  ```
- [ ] Step 4: Add logging and signal handler initialization

  After po::notify, add the JSONL flag processing block (same as the primary executables), then:

  ```cpp
  FORGE_LOG_INIT(vm["log-level"].as<std::string>(), "forge.log", no_jsonl, jsonl_output);
  PG_TUI_SPAWN_MODE(!no_tui, true);
  // Suppress non-rank-0 stderr output unless --mpi-log-all
  if (world.rank() != 0 && !vm["mpi-log-all"].as<bool>()) {
      // Reinit with file-only logging (no stderr/JSONL sink)
      auto level = spdlog::get("powergrid") ? spdlog::get("powergrid")->level() : spdlog::level::info;
      auto file_sink = std::make_shared<spdlog::sinks::rotating_file_sink_mt>(
          "forge.log", 10UL * 1024UL * 1024UL, 3);
      file_sink->set_level(spdlog::level::debug);
      auto logger = std::make_shared<spdlog::logger>("powergrid",
          spdlog::sinks_init_list{file_sink});
      logger->set_level(level);
      spdlog::set_default_logger(logger);
  }
  if (world.rank() == 0) {
      FORGE_TUI_START("forgeSenseMPI");
  }
  forge_install_signal_handlers();
  ```
- [ ] Step 5: Add --device auto handling

  ```cpp
  std::string device_str = vm["device"].as<std::string>();
  if (device_str == "auto") {
  #ifdef OPENACC_GPU
      int ngpus = acc_get_num_devices(acc_device_nvidia);
      if (ngpus > 0) {
          int gpunum = world.rank() % ngpus;
          acc_set_device_num(gpunum, acc_device_nvidia);
          FORGE_INFO("Rank {}: using GPU device {} (auto, {} GPUs available)",
              world.rank(), gpunum, ngpus);
      } else {
          acc_set_device_type(acc_device_host);
          FORGE_INFO("Rank {}: no GPUs available, using host", world.rank());
      }
  #elif defined(METAL_COMPUTE)
      FORGE_INFO("Rank {}: using Metal GPU backend (device 0)", world.rank());
  #else
      FORGE_INFO("Rank {}: CPU-only build", world.rank());
  #endif
  } else {
      int device = std::stoi(device_str);
      if (device < 0) {
          FORGE_ERROR("Invalid --device value: {}. Must be >= 0.", device);
          return static_cast<int>(ForgeExitCode::ValidationError);
      }
  #ifdef OPENACC_GPU
      acc_set_device_num(device, acc_device_default);
  #endif
  }
  ```
- [ ] Step 6: Convert local variables to forge types
Key variables that MUST change (will not compile otherwise):
- ix, iy, iz → forgeCol<float> (required by initImageSpaceCoords)
- data → forgeCol<forgeComplex<float>> (required by reconSolve first param)
- ImageTemp → forgeCol<forgeComplex<float>> (return type of reconSolve)
- FM, sen → forgeCol<float>, forgeCol<forgeComplex<float>> (required by
processISMRMRDInput)
Variables that CAN remain Armadillo (reconSolve still takes Col<T1> for these):
- kx, ky, kz, tvec — Col<float> is fine
- senSlice, fmSlice — Armadillo types are fine (SENSE constructor takes them)
Armadillo expression template pitfall (from MEMORY.md): Never use auto to
capture getArmaComplex() * getArma() — must use explicit
arma::Col<std::complex<T>> result = ... to force evaluation before temporaries
are destroyed.
Update processISMRMRDInput<float>() call to match primary executable pattern.
- [ ] Step 7: Fix encoding limits convention

  Change all encoding limit extractions from +1 to the raw .maximum:

  ```cpp
  int NShotMax = hdr.encoding[0].encodingLimits.kspace_encoding_step_1->maximum;
  int NParMax = hdr.encoding[0].encodingLimits.kspace_encoding_step_2->maximum;
  // etc.
  ```

  Update loop bounds from < NSliceMax to <= NSliceMax (matching the primary executables).
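The two conventions visit exactly the same indices because ISMRMRD maxima are inclusive (a maximum of N means indices 0..N exist). A tiny sketch (function names invented) confirming the equivalence of looping `i <= maximum` against the old store-max+1, loop-with-< style:

```cpp
#include <cassert>

// New convention: store the raw inclusive maximum, loop with <=.
int count_inclusive(int maximum)
{
    int count = 0;
    for (int i = 0; i <= maximum; ++i) ++count;
    return count;
}

// Old convention: store maximum + 1 as an exclusive bound, loop with <.
int count_plus_one(int maximum)
{
    int count = 0;
    const int bound = maximum + 1;
    for (int i = 0; i < bound; ++i) ++count;
    return count;
}
```

This is why Task 6 calls the normalization a style change rather than a bug fix.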
- [ ] Step 8: Replace all std::cout with FORGE_DEBUG/FORGE_INFO
- [ ] Step 9: Add progress tracking (rank 0 only)

  ```cpp
  if (world.rank() == 0) {
      FORGE_SET_IMAGE_DIMS(Nx, Ny, Nz);
      // Create progress bar and pass recon_bar_idx to reconSolve
  }
  ```
- [ ] Step 10: Add NIfTI orientation support
Same pattern as primary executables: extractNiftiOrientation, --voxel-size
parsing, --no-orient, pass orient to writeNiftiMagPhsImage.
- [ ] Step 11: Add FORGE_TUI_EXIT

  ```cpp
  if (world.rank() == 0) {
      FORGE_TUI_EXIT(static_cast<int>(ForgeExitCode::Success));
  }
  return static_cast<int>(ForgeExitCode::Success);
  ```
- [ ] Step 12: Build MPI to verify

  ```sh
  cmake --build build_mpi --target forgeSenseMPI -j4 2>&1 | tail -10
  ```
- [ ] Step 13: Commit

  ```sh
  git add apps/MPI/forgeSenseMPI.cpp
  git commit -m "feat(forgeSenseMPI): full modernization — forge types, CLI, logging, signals"
  ```
### Task 6: Modernize forgePcSenseMPI.cpp
Files:
- Modify: apps/MPI/forgePcSenseMPI.cpp
Same pattern as Task 5. Key differences:
- Uses mpipcSENSE<float> operator (MPI-distributed)
- Has broadcast-then-reconstruct pattern (rank 0 reads data, broadcasts to all)
- Encoding limits already use raw .maximum (correct). Loops already use <=
(correct) except for NRep < NRepMax + 1 at line 204 — normalize to
NRep <= NRepMax for consistency. This is NOT a bug fix (functionally
equivalent), just a style normalization.
- [ ] Steps 1-13: Same as Task 5, adapted for forgePcSenseMPI patterns.

  ```sh
  git add apps/MPI/forgePcSenseMPI.cpp
  git commit -m "feat(forgePcSenseMPI): full modernization — forge types, CLI, logging, signals"
  ```
### Task 7: Modernize forgePcSenseMPI_TS.cpp
Files:
- Modify: apps/MPI/forgePcSenseMPI_TS.cpp
Same pattern as Task 5. Key differences:
- Uses NON-MPI pcSenseTimeSeg<float> operator (not mpipcSENSETimeSeg)
- Has same task-distribution pattern as forgeSenseMPI
- Encoding limits use +1 convention — normalize
- [ ] Steps 1-13: Same as Task 5.

  ```sh
  git add apps/MPI/forgePcSenseMPI_TS.cpp
  git commit -m "feat(forgePcSenseMPI_TS): full modernization — forge types, CLI, logging, signals"
  ```
## Chunk 4: Testing, Documentation, Verification
### Task 8: Verify full MPI build
- [ ] Step 1: Clean MPI build

  ```sh
  rm -rf build_mpi
  cmake -B build_mpi -S . -DMETAL_COMPUTE=ON -DOPENACC_GPU=OFF -DMPISupport=ON
  cmake --build build_mpi -j4 2>&1 | tail -20
  ```
- [ ] Step 2: Verify non-MPI tests still pass

  ```sh
  cmake --build build --target cpu_tests --target metal_tests -j4
  ./build/cpu_tests '~[Benchmark]'
  ./build/metal_tests '~[Benchmark]'
  ```
- [ ] Step 3: Verify MPI executables run

  ```sh
  ./build_mpi/forgeSenseMPI --version 2>&1
  ./build_mpi/forgePcSenseMPI --version 2>&1
  ./build_mpi/forgePcSenseMPI_TS --version 2>&1
  ```

  Each should print `FORGE v0.10.0 ({binary}) built ...`. Also spot-check help output:

  ```sh
  ./build_mpi/forgeSenseMPI --help 2>&1 | head -5
  ```
- [ ] Step 4: Commit any build fixes
### Task 9: Add CLI tests for MPI executables
Files:
- Modify: forge/Tests/CLITests.cpp (or apps/MPI/CMakeLists.txt for paths)
- [ ] Step 1: Add MPI binary path defines
In CMakeLists.txt, add compile definitions for MPI test binary paths (similar
to existing FORGE_SENSE_PATH etc.), or use runtime detection.
- [ ] Step 2: Add CLI tests

  ```cpp
  #ifdef FORGE_SENSE_MPI_PATH
  TEST_CASE("forgeSenseMPI --version exits 0", "[CLI_MPI]")
  {
      auto [code, output] = run_command(
          std::string(FORGE_SENSE_MPI_PATH) + " --version");
      REQUIRE(code == 0);
      REQUIRE(output.find("FORGE v") != std::string::npos);
  }

  TEST_CASE("forgeSenseMPI --help exits 0", "[CLI_MPI]")
  {
      auto [code, output] = run_command(
          std::string(FORGE_SENSE_MPI_PATH) + " --help");
      REQUIRE(code == 0);
  }
  #endif
  ```
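For reference, a run_command() helper of the shape these tests assume could be sketched with POSIX popen as below. This is an assumption about the helper's signature; the actual implementation in CLITests.cpp may differ.

```cpp
#include <array>
#include <cassert>
#include <cstdio>
#include <string>
#include <sys/wait.h>
#include <utility>

// Hedged sketch: run a shell command, capture its combined output, and
// return (exit code, output). Hypothetical stand-in for the existing
// test helper, not its real implementation.
std::pair<int, std::string> run_command(const std::string& cmd)
{
    std::array<char, 256> chunk{};
    std::string output;
    FILE* pipe = popen((cmd + " 2>&1").c_str(), "r");
    if (!pipe) return {-1, output};
    while (fgets(chunk.data(), chunk.size(), pipe) != nullptr) {
        output += chunk.data();
    }
    // pclose() returns a wait()-style status; extract the exit code.
    const int status = pclose(pipe);
    const int code = (status != -1 && WIFEXITED(status)) ? WEXITSTATUS(status) : -1;
    return {code, output};
}
```

The `2>&1` redirection matters for the `--version` test, since some CLIs print version banners to stderr.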
- [ ] Step 3: Build and run

  ```sh
  cmake --build build_mpi --target cpu_tests -j4
  ./build_mpi/cpu_tests "[CLI_MPI]"
  ```
- [ ] Step 4: Commit

  ```sh
  git add forge/Tests/CLITests.cpp CMakeLists.txt
  git commit -m "test: add CLI tests for MPI executables"
  ```
### Task 10: Update documentation
Files:
- Modify: README.md
- Modify: CHANGELOG.md
- Modify: CLAUDE.md
- [ ] Step 1: Update README.md
Add an MPI section documenting:
- --device auto (rank-based GPU assignment) vs explicit device selection
- --mpi-log-all for multi-rank logging
- Example: `mpirun -np 4 forgeSenseMPI --device auto --no-tui [args]`
- [ ] Step 2: Update CHANGELOG.md

  Add under [Unreleased]:

  ```markdown
  ### Fixed
  - MPI build: removed custom op_circshift (conflicts with Armadillo 15.2.3 native).
  - MPI build: updated reconSolve extern template signatures to match modern forgeCol types.
  - MPI executables: fixed po::notify() ordering (was before --help check).
  - MPI executables: normalized encoding limit conventions (raw .maximum with <= loops).

  ### Changed
  - MPI executables fully modernized: forge types, ForgeLog structured logging,
    ForgeExitCodes, signal handling, --version/--device/--log-level/--no-tui/
    --no-jsonl/--jsonl-output/--mpi-log-all flags.
  - Added Boost.Serialization support to forgeCol and forgeMat for MPI communication.
  - fftshift.hpp: uses Armadillo native circshift() (argument order: shift, dim).
  ```
- [ ] Step 3: Update CLAUDE.md
Update the Active Branches section and note MPI executables are modernized.
- [ ] Step 4: Commit

  ```sh
  git add README.md CHANGELOG.md CLAUDE.md
  git commit -m "docs: document MPI modernization and op_circshift removal"
  ```
### Task 11: Final verification
- [ ] Step 1: Run clang-format

  ```sh
  find forge apps tools \( -name "*.h" -o -name "*.hpp" -o -name "*.cpp" \) ! -name "*.mm" -print0 | xargs -0 clang-format -i
  find forge apps tools \( -name "*.h" -o -name "*.hpp" -o -name "*.cpp" \) ! -name "*.mm" -print0 | xargs -0 clang-format --dry-run --Werror 2>&1
  ```

  Commit if needed: `git commit -m "style: apply clang-format"`
- [ ] Step 2: Full non-MPI test run

  ```sh
  ./build/cpu_tests '~[Benchmark]'
  ./build/metal_tests '~[Benchmark]'
  ```
- [ ] Step 3: Full MPI build verification

  ```sh
  cmake --build build_mpi -j4 2>&1 | grep "error:" | head -5
  ```
- [ ] Step 4: MPI executable smoke test

  ```sh
  ./build_mpi/forgeSenseMPI --version
  ./build_mpi/forgePcSenseMPI --version
  ./build_mpi/forgePcSenseMPI_TS --version
  ```