Subject: Re: [OMPI users] Machinefile option in opempi-1.3.2
From: Rajesh Sudarsan (rsudarsan_at_[hidden])
Date: 2009-06-20 00:24:59
Hi Ralph,
Thanks for the reply. The default mapper does round-robin assignment
as long as I do not specify the machinefile in the following format:
n1
n2
n2
n1
where n1 and n2 are two nodes in the cluster and I use two
slots on each node.
I have pasted the output and the display map for runs on 2, 4, 8,
and 16 processors. The mapper does not use the nodes in the order in
which they are listed in the file.
The machinefile I tested with (m3) uses two nodes, n105 and n106, with 8
cores in each node.
n105
n105
n105
n105
n106
n106
n106
n106
n106
n106
n106
n106
n105
n105
n105
n105
When I run a hello world program that prints the hostname on 2
processors, the output and the display map are as follows:
$ mpiexec --display-map -machinefile m3 -np 2 ./hello
======================== JOB MAP ========================
Data for node: Name: n106 Num procs: 2
Process OMPI jobid: [7838,1] Process rank: 0
Process OMPI jobid: [7838,1] Process rank: 1
=============================================================
Rank 0 is present in C version of Hello World...hostname = n106
Rank 1 of C version says: Hello world!..hostname = n106
On 4 processors, the output is as follows:
$ mpiexec --display-map -machinefile m3 -np 4 ./hello
======================== JOB MAP ========================
Data for node: Name: n106 Num procs: 4
Process OMPI jobid: [7294,1] Process rank: 0
Process OMPI jobid: [7294,1] Process rank: 1
Process OMPI jobid: [7294,1] Process rank: 2
Process OMPI jobid: [7294,1] Process rank: 3
=============================================================
Rank 0 is present in C version of Hello World...hostname = n106
Rank 1 of C version says: Hello world!..hostname = n106
Rank 3 of C version says: Hello world!..hostname = n106
Rank 2 of C version says: Hello world!..hostname = n106
On 8 processors the output is as follows:
$ mpiexec --display-map -machinefile m3 -np 8 ./hello
======================== JOB MAP ========================
Data for node: Name: n106 Num procs: 8
Process OMPI jobid: [7264,1] Process rank: 0
Process OMPI jobid: [7264,1] Process rank: 1
Process OMPI jobid: [7264,1] Process rank: 2
Process OMPI jobid: [7264,1] Process rank: 3
Process OMPI jobid: [7264,1] Process rank: 4
Process OMPI jobid: [7264,1] Process rank: 5
Process OMPI jobid: [7264,1] Process rank: 6
Process OMPI jobid: [7264,1] Process rank: 7
=============================================================
Rank 3 of C version says: Hello world!..hostname = n106
Rank 7 of C version says: Hello world!..hostname = n106
Rank 0 is present in C version of Hello World...hostname = n106
Rank 2 of C version says: Hello world!..hostname = n106
Rank 4 of C version says: Hello world!..hostname = n106
Rank 6 of C version says: Hello world!..hostname = n106
Rank 5 of C version says: Hello world!..hostname = n106
Rank 1 of C version says: Hello world!..hostname = n106
On 16 processors the output is as follows:
$ mpiexec --display-map -machinefile m3 -np 16 ./hello
======================== JOB MAP ========================
Data for node: Name: n106 Num procs: 8
Process OMPI jobid: [7266,1] Process rank: 0
Process OMPI jobid: [7266,1] Process rank: 1
Process OMPI jobid: [7266,1] Process rank: 2
Process OMPI jobid: [7266,1] Process rank: 3
Process OMPI jobid: [7266,1] Process rank: 4
Process OMPI jobid: [7266,1] Process rank: 5
Process OMPI jobid: [7266,1] Process rank: 6
Process OMPI jobid: [7266,1] Process rank: 7
Data for node: Name: n105 Num procs: 8
Process OMPI jobid: [7266,1] Process rank: 8
Process OMPI jobid: [7266,1] Process rank: 9
Process OMPI jobid: [7266,1] Process rank: 10
Process OMPI jobid: [7266,1] Process rank: 11
Process OMPI jobid: [7266,1] Process rank: 12
Process OMPI jobid: [7266,1] Process rank: 13
Process OMPI jobid: [7266,1] Process rank: 14
Process OMPI jobid: [7266,1] Process rank: 15
=============================================================
Rank 10 of C version says: Hello world!..hostname = n105
Rank 12 of C version says: Hello world!..hostname = n105
Rank 13 of C version says: Hello world!..hostname = n105
Rank 14 of C version says: Hello world!..hostname = n105
Rank 0 is present in C version of Hello World...hostname = n106
Rank 1 of C version says: Hello world!..hostname = n106
Rank 3 of C version says: Hello world!..hostname = n106
Rank 6 of C version says: Hello world!..hostname = n106
Rank 7 of C version says: Hello world!..hostname = n106
Rank 15 of C version says: Hello world!..hostname = n105
Rank 8 of C version says: Hello world!..hostname = n105
Rank 11 of C version says: Hello world!..hostname = n105
Rank 4 of C version says: Hello world!..hostname = n106
Rank 2 of C version says: Hello world!..hostname = n106
Rank 5 of C version says: Hello world!..hostname = n106
Rank 9 of C version says: Hello world!..hostname = n105
Thanks,
Rajesh
On Fri, Jun 19, 2009 at 10:40 PM, Ralph Castain<rhc_at_[hidden]> wrote:
> If you do "man orte_hosts", you'll see a full explanation of how the various
> machinefile options work.
> The default mapper doesn't do any type of sorting - it is a round-robin
> mapper that just works its way through the provided nodes. We don't reorder
> them in any way.
> However, it does depend on the number of slots we are told each node has, so
> that might be what you are encountering. If you do a --display-map and send
> it along, I might be able to spot the issue.
> Thanks
> Ralph
>
> On Fri, Jun 19, 2009 at 1:35 PM, Rajesh Sudarsan <rsudarsan_at_[hidden]>
> wrote:
>>
>> Hi,
>>
>> I tested a simple hello world program on 5 nodes each with dual
>> quad-core processors. I noticed that openmpi does not always follow
>> the order of the processors indicated in the machinefile. Depending
>> upon the number of processors requested, openmpi does some type of
>> sorting to find the best node fit for a particular job and runs on
>> them. Is there a way to make openmpi turn off this sorting and
>> strictly follow the order indicated in the machinefile?
>>
>> mpiexec supports three options to specify the machinefile -
>> default-machinefile, hostfile, and machinefile. Can anyone tell me
>> what the difference is between these three options?
>>
>> Any help would be greatly appreciated.
>>
>> Thanks,
>> Rajesh
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>