OpenMPI ignoring slots and max-slots in hostfile
I am using Open MPI 1.4.5 on a cluster of 16 nodes, each with 32 cores.
Here is the command I use to launch my program:
$ /path_to_mpi/bin/mpirun --prefix /path_to_mpi --hostfile openmpi_hosts \
    --display-allocation -np 16 my_program
Here is what openmpi_hosts looks like:
$ cat openmpi_hosts
amc01.cluster slots=1 max-slots=1
amc02.cluster slots=1 max-slots=1
amc03.cluster slots=1 max-slots=1
amc04.cluster slots=1 max-slots=1
nd01.cluster slots=1 max-slots=1
nd02.cluster slots=1 max-slots=1
nd03.cluster slots=1 max-slots=1
nd04.cluster slots=1 max-slots=1
nd05.cluster slots=1 max-slots=1
nd06.cluster slots=1 max-slots=1
nd07.cluster slots=1 max-slots=1
nd08.cluster slots=1 max-slots=1
nd09.cluster slots=1 max-slots=1
nd10.cluster slots=1 max-slots=1
nd11.cluster slots=1 max-slots=1
nd12.cluster slots=1 max-slots=1
However, I get the following allocation:
====================== ALLOCATED NODES ======================
Data for node: Name: nd02.cluster Num slots: 32 Max slots: 0
Data for node: Name: nd03.cluster Num slots: 32 Max slots: 0
Data for node: Name: nd04.cluster Num slots: 32 Max slots: 0
Data for node: Name: nd05.cluster Num slots: 32 Max slots: 0
Data for node: Name: nd06.cluster Num slots: 32 Max slots: 0
Data for node: Name: nd07.cluster Num slots: 32 Max slots: 0
Data for node: Name: nd08.cluster Num slots: 32 Max slots: 0
Data for node: Name: nd09.cluster Num slots: 32 Max slots: 0
Data for node: Name: nd10.cluster Num slots: 32 Max slots: 0
Data for node: Name: nd11.cluster Num slots: 32 Max slots: 0
Data for node: Name: nd12.cluster Num slots: 32 Max slots: 0
Data for node: Name: amc01.cluster Num slots: 32 Max slots: 0
Data for node: Name: amc02.cluster Num slots: 32 Max slots: 0
Data for node: Name: amc03.cluster Num slots: 32 Max slots: 0
Data for node: Name: amc04.cluster Num slots: 32 Max slots: 0
Data for node: Name: nd01.cluster Num slots: 32 Max slots: 0
=================================================================
When I run my_program on a different cluster (with the same Open MPI 1.4.5
installation), I get "Num slots: 1 Max slots: 1" for each node in that
cluster, which is what I expect.
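To make the mismatch easy to check on both clusters, here is a minimal Python sketch that compares what the hostfile requests with what --display-allocation reports. The function names (parse_hostfile, parse_allocation, find_mismatches) are made up for illustration. The parsing assumes the simple "host key=value ..." hostfile layout and the "Name: ... Num slots: ... Max slots: ..." line format shown above, and the sample text is a two-node excerpt of the listings in this post.

```python
import re

# Matches one line of the --display-allocation output as it appears in this post.
ALLOC_RE = re.compile(r"Name:\s*(\S+)\s+Num slots:\s*(\d+)\s+Max slots:\s*(\d+)")


def parse_hostfile(text):
    """Return {hostname: {option: int}} for lines like 'host slots=1 max-slots=1'."""
    hosts = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        name, *options = line.split()
        hosts[name] = {
            key: int(value) for key, value in (opt.split("=", 1) for opt in options)
        }
    return hosts


def parse_allocation(text):
    """Return {hostname: {'slots': int, 'max-slots': int}} from the allocation dump."""
    return {
        m.group(1): {"slots": int(m.group(2)), "max-slots": int(m.group(3))}
        for m in ALLOC_RE.finditer(text)
    }


def find_mismatches(requested, allocated):
    """Return hosts whose allocated values differ from the requested ones."""
    mismatches = {}
    for host, want in requested.items():
        got = allocated.get(host)
        # Only compare the options that the hostfile actually sets.
        if got is None or any(got.get(key) != value for key, value in want.items()):
            mismatches[host] = (want, got)
    return mismatches


hostfile_text = """\
amc01.cluster slots=1 max-slots=1
nd01.cluster slots=1 max-slots=1
"""

allocation_text = """\
Data for node: Name: amc01.cluster Num slots: 32 Max slots: 0
Data for node: Name: nd01.cluster Num slots: 32 Max slots: 0
"""

requested = parse_hostfile(hostfile_text)
allocated = parse_allocation(allocation_text)
for host, (want, got) in sorted(find_mismatches(requested, allocated).items()):
    print(f"{host}: requested {want}, allocated {got}")
```

On the cluster that behaves correctly, find_mismatches should return an empty dict for the same hostfile. The sketch does not handle hostfile options written with spaces around the equals sign.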
Questions:
Why is Open MPI ignoring the slots and max-slots values I set in the hostfile?
How do I get the correct (expected) allocation?
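One possible cause, which nothing in this post confirms, is that mpirun on the first cluster is started inside a resource-manager job. In that case Open MPI can take the node list and slot counts from the scheduler and treat the hostfile only as a filter on which nodes to use. The Python sketch below is a diagnostic under that assumption. It checks for environment variables that SLURM, Torque/PBS, Grid Engine and LSF commonly export inside a job. The name detect_resource_managers and the marker table are illustrative, not part of Open MPI.

```python
import os

# Environment variables that common resource managers export inside a job.
# The table is illustrative and not exhaustive.
RM_MARKERS = {
    "SLURM": ("SLURM_JOB_ID", "SLURM_JOBID"),
    "Torque/PBS": ("PBS_JOBID", "PBS_NODEFILE"),
    "Grid Engine": ("PE_HOSTFILE",),
    "LSF": ("LSB_JOBID",),
}


def detect_resource_managers(env=None):
    """Return the names of resource managers whose marker variables are set."""
    env = os.environ if env is None else env
    return [
        name
        for name, variables in RM_MARKERS.items()
        if any(variable in env for variable in variables)
    ]


if __name__ == "__main__":
    found = detect_resource_managers()
    if found:
        print("Resource manager environment detected:", ", ".join(found))
    else:
        print("No known resource manager variables are set in this shell.")
```

Running this in the same shell (or job script) that launches mpirun on each cluster would show whether the two environments differ in this respect. An empty result does not rule out other causes.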