Wednesday, 21 December 2011

Compiling Rmpi and doMPI for Windows HPC’s MSMPI

For 32-bit only:

To get the Rmpi and doMPI packages working on Windows HPC, using Microsoft’s MSMPI libraries:

PKG_CFLAGS   = -I"C:\Program Files\Microsoft HPC Pack 2008 SDK\Include" -DMPI2 -DWin32 "-D__int64=long long"
PKG_LIBS     = -L"C:\Program Files\Microsoft HPC Pack 2008 SDK\Lib\i386" -L"C:\Program Files\Microsoft HPC Pack 2008 SDK\Lib\amd64" -lmsmpi

  • Open an administrative command prompt (so that your R installation can be updated) and do the following:

cd <folder above the extracted packages>

set path=c:\Rtools\bin;c:\Rtools\MinGW\bin;c:\Rtools\MinGW64\bin;%PATH%

"c:\Program Files\R\R-2.14.0\bin\R.exe" --vanilla CMD INSTALL --build Rmpi

"c:\Program Files\R\R-2.14.0\bin\R.exe" --vanilla CMD INSTALL --build doMPI

  • Copy the zip files created, Rmpi_0.6-0.zip and doMPI_0.1-5.zip, to your cluster head node and pass the file paths to R’s install.packages function together with “foreach”.
  • library(doMPI) should then state that it has loaded Rmpi.
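As a rough sketch of that install step on the head node: the folder `C:/temp/rpkgs` is a hypothetical location for the copied zip files, and the CRAN mirror URL is just one choice. The two local zips are installed with `repos = NULL`, and foreach is fetched from CRAN in a separate call, since a single call cannot mix local files and repository packages.

```r
# Hypothetical folder on the head node holding the two zip files built above
pkg_dir <- "C:/temp/rpkgs"
pkg_files <- file.path(pkg_dir, c("Rmpi_0.6-0.zip", "doMPI_0.1-5.zip"))

if (all(file.exists(pkg_files))) {
  # foreach comes from CRAN (the mirror chosen here is an assumption)
  install.packages("foreach", repos = "https://cloud.r-project.org")
  # repos = NULL makes install.packages treat the arguments as local files
  install.packages(pkg_files, repos = NULL)
  # Should report that Rmpi has been loaded
  library(doMPI)
} else {
  missing_files <- pkg_files[!file.exists(pkg_files)]
  message("Missing: ", paste(missing_files, collapse = ", "))
}
```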

For 64-bit:

That all worked for i386 (32-bit), but I got an access violation when trying to load the Rmpi package for x64. So instead, I built the Rmpi sources using the Visual C++ 2010 compiler (some fixing required), and dropped the new DLLs over the top of the ones installed in the R library folder.

Using parallel foreach on Windows HPC

MSMPI (Microsoft’s implementation of MPI) doesn’t support spawning (at least not in the 2008 R2 version), so you need to use the non-spawning method:

library(doMPI)

cl <- startMPIcluster()

registerDoMPI(cl)

… use of foreach and %dopar% …

closeCluster(cl)

mpi.quit()
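To make the elided middle step concrete, here is a minimal sketch of a complete script using the non-spawning pattern. The `square` task and the 1:10 range are made up for illustration, and the `requireNamespace` guard is only there so the script does nothing on a machine without doMPI. It is meant to be launched through mpiexec, so that the worker ranks already exist when `startMPIcluster()` is called.

```r
# Toy task (hypothetical): any function of the loop index would do here
square <- function(i) i^2

if (requireNamespace("doMPI", quietly = TRUE)) {
  library(doMPI)  # attaches foreach and Rmpi as well

  # Non-spawning start: the workers are the other ranks started by mpiexec.
  # On those ranks this call runs the worker loop and does not return.
  cl <- startMPIcluster()
  registerDoMPI(cl)

  # Each iteration is sent to a worker; .combine = c gathers a plain vector
  squares <- foreach(i = 1:10, .combine = c) %dopar% square(i)
  print(squares)

  # Shut the workers down, then finalize MPI and exit R
  closeCluster(cl)
  mpi.quit()
}
```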

Finally, queue the job with a command line that uses mpiexec to launch R (or a batch file that calls R).

To run one worker per core:

Set the job resource type to core, and set the minimum and maximum number of cores for the task to the number of cores on your worker nodes.

Use the command mpiexec -n * myRrunner.bat

To run one worker per node:

Set the job resource type to node, and set the minimum and maximum number of nodes for the task to the number of nodes in your cluster.

Use the command mpiexec -cores 1 myRrunner.bat