Subject: Re: [OMPI users] Machinefile option in opempi-1.3.2
From: Rajesh Sudarsan (rsudarsan_at_[hidden])
Date: 2009-06-20 10:57:27


Thanks Ralph. It worked.

Regards,
Rajesh

On Sat, Jun 20, 2009 at 10:28 AM, Ralph Castain<rhc_at_[hidden]> wrote:
> Ah, yes - that is definitely true. What you need to use is the "seq" (for
> "sequential") mapper. Add the following to your command line:
> --hostfile hostfile -mca rmaps seq
> This will cause OMPI to map the process ranks according to the order in the
> hostfile. You need to specify one line for each node/rank, just as you have
> done.
> Ralph
>
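The sequential mapping described above can be pictured with a small Python sketch. It is not Open MPI code: `map_sequential` is a hypothetical helper that models only the stated rule, namely that rank i goes to the host named on line i of the hostfile. Rejecting a request for more ranks than there are lines is an assumption of the sketch.

```python
def map_sequential(hostfile_lines, num_procs):
    """Assign rank i to the host named on the i-th non-empty hostfile line."""
    hosts = [line.strip() for line in hostfile_lines if line.strip()]
    if num_procs > len(hosts):
        # Assumption of this sketch: one line is required per rank.
        raise ValueError("hostfile has fewer lines than requested ranks")
    return {rank: hosts[rank] for rank in range(num_procs)}


# Same layout as the machinefile quoted in this thread.
machinefile = ["n105"] * 4 + ["n106"] * 8 + ["n105"] * 4

for rank, host in map_sequential(machinefile, 6).items():
    print(f"rank {rank} -> {host}")
```

With this rule, ranks 0 to 3 land on n105 and ranks 4 and 5 on n106, which is the file order the original question asked for.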
> On Fri, Jun 19, 2009 at 10:24 PM, Rajesh Sudarsan <rsudarsan_at_[hidden]>
> wrote:
>>
>> Hi Ralph,
>>
>> Thanks for the reply. The default mapper does round-robin assignment
>> as long as I do not specify the machinefile in the following format:
>>
>> n1
>> n2
>> n2
>> n1
>>
>> where n1 and n2 are two nodes in the cluster and I use two slots
>> within each node.
>>
>>
>> I have pasted the output and the display map for execution on 2, 4, 8
>> and 16 processors. The mapper does not use the nodes in the order in
>> which they are listed in the file.
>>
>> The machinefile that I tested with uses two nodes n105 and n106 with 8
>> cores in each node.
>>
>> n105
>> n105
>> n105
>> n105
>> n106
>> n106
>> n106
>> n106
>> n106
>> n106
>> n106
>> n106
>> n105
>> n105
>> n105
>> n105
>>
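One possible reading of the maps that follow: if repeated hostnames in a machinefile are accumulated into a per-node slot count, the position of each line in the file is lost before mapping starts. That accumulation is an assumption here, though it is consistent with the "Num procs: 8" entries in the 16-process map. A minimal Python sketch of such a tally, with `count_slots` as a hypothetical helper name:

```python
from collections import Counter


def count_slots(hostfile_lines):
    """Tally how many times each hostname appears (one line = one slot)."""
    return Counter(line.strip() for line in hostfile_lines if line.strip())


# Same layout as the machinefile above: 4 x n105, 8 x n106, 4 x n105.
machinefile = ["n105"] * 4 + ["n106"] * 8 + ["n105"] * 4

for host, slots in count_slots(machinefile).items():
    print(f"{host}: {slots} slots")
```

After the tally, the two separate groups of n105 lines are indistinguishable from a single group of eight.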
>> When I run a hello world program on 2 processors which prints the
>> hostname, the output and the display map are  as follows:
>>
>>
>> $ mpiexec --display-map -machinefile m3 -np 2 ./hello
>>
>>  ========================   JOB MAP   ========================
>>
>>  Data for node: Name: n106      Num procs: 2
>>        Process OMPI jobid: [7838,1] Process rank: 0
>>        Process OMPI jobid: [7838,1] Process rank: 1
>>
>>  =============================================================
>> Rank 0 is present in C version of Hello World...hostname = n106
>>  Rank 1 of C version says: Hello world!..hostname = n106
>>
>>
>>
>>
>> On 4 processors the output is as follows:
>>
>> $ mpiexec --display-map -machinefile m3 -np 4 ./hello
>>
>>  ========================   JOB MAP   ========================
>>
>>  Data for node: Name: n106      Num procs: 4
>>        Process OMPI jobid: [7294,1] Process rank: 0
>>        Process OMPI jobid: [7294,1] Process rank: 1
>>        Process OMPI jobid: [7294,1] Process rank: 2
>>        Process OMPI jobid: [7294,1] Process rank: 3
>>
>>  =============================================================
>> Rank 0 is present in C version of Hello World...hostname = n106
>>  Rank 1 of C version says: Hello world!..hostname = n106
>>  Rank 3 of C version says: Hello world!..hostname = n106
>>  Rank 2 of C version says: Hello world!..hostname = n106
>>
>>
>>
>>
>> On 8 processors the output is as follows:
>>
>> $ mpiexec --display-map -machinefile m3 -np 8 ./hello
>>
>>  ========================   JOB MAP   ========================
>>
>>  Data for node: Name: n106      Num procs: 8
>>        Process OMPI jobid: [7264,1] Process rank: 0
>>        Process OMPI jobid: [7264,1] Process rank: 1
>>        Process OMPI jobid: [7264,1] Process rank: 2
>>        Process OMPI jobid: [7264,1] Process rank: 3
>>        Process OMPI jobid: [7264,1] Process rank: 4
>>        Process OMPI jobid: [7264,1] Process rank: 5
>>        Process OMPI jobid: [7264,1] Process rank: 6
>>        Process OMPI jobid: [7264,1] Process rank: 7
>>
>>  =============================================================
>>  Rank 3 of C version says: Hello world!..hostname = n106
>>  Rank 7 of C version says: Hello world!..hostname = n106
>> Rank 0 is present in C version of Hello World...hostname = n106
>>  Rank 2 of C version says: Hello world!..hostname = n106
>>  Rank 4 of C version says: Hello world!..hostname = n106
>>  Rank 6 of C version says: Hello world!..hostname = n106
>>  Rank 5 of C version says: Hello world!..hostname = n106
>>  Rank 1 of C version says: Hello world!..hostname = n106
>>
>>
>>
>> On 16 processors the output is as follows:
>>
>> $ mpiexec --display-map -machinefile m3 -np 16 ./hello
>>
>>  ========================   JOB MAP   ========================
>>
>>  Data for node: Name: n106      Num procs: 8
>>        Process OMPI jobid: [7266,1] Process rank: 0
>>        Process OMPI jobid: [7266,1] Process rank: 1
>>        Process OMPI jobid: [7266,1] Process rank: 2
>>        Process OMPI jobid: [7266,1] Process rank: 3
>>        Process OMPI jobid: [7266,1] Process rank: 4
>>        Process OMPI jobid: [7266,1] Process rank: 5
>>        Process OMPI jobid: [7266,1] Process rank: 6
>>        Process OMPI jobid: [7266,1] Process rank: 7
>>
>>  Data for node: Name: n105      Num procs: 8
>>        Process OMPI jobid: [7266,1] Process rank: 8
>>        Process OMPI jobid: [7266,1] Process rank: 9
>>        Process OMPI jobid: [7266,1] Process rank: 10
>>        Process OMPI jobid: [7266,1] Process rank: 11
>>        Process OMPI jobid: [7266,1] Process rank: 12
>>        Process OMPI jobid: [7266,1] Process rank: 13
>>        Process OMPI jobid: [7266,1] Process rank: 14
>>        Process OMPI jobid: [7266,1] Process rank: 15
>>
>>  =============================================================
>>  Rank 10 of C version says: Hello world!..hostname = n105
>>  Rank 12 of C version says: Hello world!..hostname = n105
>>  Rank 13 of C version says: Hello world!..hostname = n105
>>  Rank 14 of C version says: Hello world!..hostname = n105
>> Rank 0 is present in C version of Hello World...hostname = n106
>>  Rank 1 of C version says: Hello world!..hostname = n106
>>  Rank 3 of C version says: Hello world!..hostname = n106
>>  Rank 6 of C version says: Hello world!..hostname = n106
>>  Rank 7 of C version says: Hello world!..hostname = n106
>>  Rank 15 of C version says: Hello world!..hostname = n105
>>  Rank 8 of C version says: Hello world!..hostname = n105
>>  Rank 11 of C version says: Hello world!..hostname = n105
>>  Rank 4 of C version says: Hello world!..hostname = n106
>>  Rank 2 of C version says: Hello world!..hostname = n106
>>  Rank 5 of C version says: Hello world!..hostname = n106
>>  Rank 9 of C version says: Hello world!..hostname = n105
>>
>>
>>
>> Thanks,
>> Rajesh
>>
>>
>>
>>
>>
>> On Fri, Jun 19, 2009 at 10:40 PM, Ralph Castain<rhc_at_[hidden]> wrote:
>> > If you do "man orte_hosts", you'll see a full explanation of how the
>> > various machinefile options work.
>> > The default mapper doesn't do any type of sorting - it is a
>> > round-robin mapper that just works its way through the provided
>> > nodes. We don't reorder them in any way.
>> > However, it does depend on the number of slots we are told each node
>> > has, so that might be what you are encountering. If you do a
>> > --display-map and send it along, I might be able to spot the issue.
>> > Thanks
>> > Ralph
>> >
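The dependence on slot counts mentioned above can be sketched in Python. This is not Open MPI code: `map_by_slot` is a hypothetical helper that fills every slot on one node before moving to the next, and the order of the node list is supplied by the caller. Refusing to place more ranks than there are slots is a simplification of the sketch.

```python
def map_by_slot(nodes, num_procs):
    """Place ranks node by node, using up each node's slots before moving on.

    nodes: list of (hostname, slots) pairs in the order the mapper visits them.
    """
    placement = {}
    rank = 0
    for host, slots in nodes:
        for _ in range(slots):
            if rank == num_procs:
                return placement
            placement[rank] = host
            rank += 1
    if rank < num_procs:
        # Simplification: this sketch does not model oversubscription.
        raise ValueError("not enough slots for the requested ranks")
    return placement


# Hypothetical input: two nodes with 8 slots each.
nodes = [("n1", 8), ("n2", 8)]

for np in (2, 4, 8, 16):
    used = sorted(set(map_by_slot(nodes, np).values()))
    print(f"-np {np}: nodes used = {used}")
```

Under this rule a job of up to 8 ranks never leaves the first node, so the result depends on how many slots each node is credited with and on which node is visited first, not on how individual lines are arranged in the file.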
>> > On Fri, Jun 19, 2009 at 1:35 PM, Rajesh Sudarsan <rsudarsan_at_[hidden]>
>> > wrote:
>> >>
>> >> Hi,
>> >>
>> >> I tested a simple hello world program on 5 nodes each with dual
>> >> quad-core processors. I noticed that openmpi does not always follow
>> >> the order of the processors indicated in the machinefile. Depending
>> >> upon the number of processors requested, openmpi does some type of
>> >> sorting to find the best node fit for a particular job and runs on
>> >> them.  Is there a way to make openmpi turn off this sorting and
>> >> strictly follow the order indicated in the machinefile?
>> >>
>> >> mpiexec supports three options to specify the machinefile:
>> >> default-machinefile, hostfile, and machinefile. Can anyone tell me
>> >> what the difference is between these three options?
>> >>
>> >> Any help would be greatly appreciated.
>> >>
>> >> Thanks,
>> >> Rajesh
>> >> _______________________________________________
>> >> users mailing list
>> >> users_at_[hidden]
>> >> http://www.open-mpi.org/mailman/listinfo.cgi/users