Subject: Re: [OMPI users] Bug in 1.3.2?: sm btl and isend is serializes
From: George Bosilca (bosilca_at_[hidden])
Date: 2009-06-19 16:55:59


Mark,

MPI does not impose any global order on the messages. The only
requirement is that between two peers on the same communicator the
messages (or at least the parts required for the matching) are
delivered in order. This makes both execution traces you sent with your
original email (shared memory and TCP) valid from the MPI perspective.

Moreover, MPI doesn't impose any order on the matching when ANY_SOURCE
is used. In Open MPI we _ALWAYS_ do the matching starting from rank 0
up to n in the specified communicator. BEWARE: The remainder of this
paragraph is deep black magic about the internals of an MPI
implementation. The main difference between the behavior of SM and TCP
here directly reflects their eager sizes, 4K for SM and 64K for TCP.
Therefore, for your example, with TCP all your messages are eager
messages (i.e. they are completely transferred to the destination
process in just one go), while with SM they all require a rendezvous.
This directly impacts the ordering of the messages on the receiver, and
therefore the order of the matching. However, and I have to insist on
this, this behavior is correct according to the MPI standard.
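
To make the eager/rendezvous arithmetic concrete, here is a minimal
sketch in C. It assumes the limits are exactly 4096 and 65536 bytes and
that a message is eager when its size does not exceed the limit. The
real limits are tunable and the exact cutoff may differ, so the helper
name is_eager and the two constants are illustrative only, not part of
any Open MPI API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Eager limits as quoted in this thread. These are illustrative
 * constants (assumption), not values read from an Open MPI install. */
#define SM_EAGER_LIMIT  ((size_t)4 * 1024)   /* 4K for the sm BTL   */
#define TCP_EAGER_LIMIT ((size_t)64 * 1024)  /* 64K for the tcp BTL */

/* Hypothetical helper: returns true if a message of msg_bytes would be
 * sent eagerly (in one go) under the given limit, false if it would
 * need a rendezvous. Treating "size <= limit" as eager is an
 * assumption made for this sketch. */
bool is_eager(size_t msg_bytes, size_t eager_limit)
{
    assert(eager_limit > 0);
    return msg_bytes <= eager_limit;
}
```

Under these assumptions, is_eager(25000, SM_EAGER_LIMIT) is false while
is_eager(25000, TCP_EAGER_LIMIT) is true, which is the difference
between your two traces. The 2500-byte case Eugene suggested comes out
eager for SM as well.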

   george.

On Jun 19, 2009, at 13:28 , Mark Bolstad wrote:

>
> Thanks, but that won't help. In the real application the messages
> are at least 25,000 bytes long, mostly much larger.
>
> Thanks,
> Mark
>
>
> On Fri, Jun 19, 2009 at 1:17 PM, Eugene Loh <Eugene.Loh_at_[hidden]>
> wrote:
> Mark Bolstad wrote:
>
> I have a small test code with which I've managed to duplicate the
> results from a larger code. In essence, using the sm btl with ISend, I wind
> up with the communication being completely serialized, i.e., all the
> calls from process 1 complete, then all from 2, ...
>
> I need to do some other stuff, but might spend time on this later.
> For now, I'll just observe that your sends are rendezvous sends.
> E.g., if you decrease BUFLEN from 25000 to 2500 (namely, from over
> 4K to under 4K), the behavior should change (to what you'd expect).
> That may or may not help you, but I think it's an important
> observation in reasoning about this behavior.
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users