Subject: [OMPI users] Best way to overlap computation and transfer using MPI over TCP/Ethernet?
From: Lars Andersson (larsand_at_[hidden])
Date: 2009-06-04 03:53:52


Hi all,

I've been trying to get overlapping computation and data transfer to
work, without much success so far. What I'm trying to achieve is:

NODE 1:

   * Post nonblocking send (30MB data)

NODE 2:

   1) Post nonblocking receive

   2) do local work, while data is being received

   3) complete transfer posted in 1) (MPI_Wait)

   4) use received data
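
In code, the structure I'm after looks roughly like this (a minimal
sketch only; do_local_work() is just a placeholder for step 2)):

/* Minimal sketch of the pattern above. do_local_work() is a
   placeholder for whatever computation runs at step 2). */
#include <mpi.h>
#include <stdlib.h>

#define MSG_BYTES (30 * 1024 * 1024)            /* ~30MB payload */

static void do_local_work(void) { /* step 2) */ }

int main(int argc, char **argv)
{
    int rank;
    char *buf = malloc(MSG_BYTES);
    MPI_Request req;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* NODE 1: post the nonblocking send, then complete it */
        MPI_Isend(buf, MSG_BYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    } else if (rank == 1) {
        MPI_Irecv(buf, MSG_BYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &req); /* 1) */
        do_local_work();                      /* 2) work while data arrives? */
        MPI_Wait(&req, MPI_STATUS_IGNORE);    /* 3) complete the transfer    */
        /* 4) use the received data */
    }

    free(buf);
    MPI_Finalize();
    return 0;
}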

In my first test, using a message size of 30MB and doing nothing at
point 2) above, completing the transfer in 3) takes about 0.8s.

In my second test, I simply put a sleep(3) at point 2), expecting the
MPI_Wait() call at 3) to finish almost instantly, since I assumed the
message would have been transferred during the sleep. To my
disappointment, though, MPI_Wait() took more or less the same time to
finish as it did without the sleep.

After browsing the forums, I realized that to make any communication
progress with large messages like this, I usually need to either block
in MPI_Wait or call MPI_Test repeatedly. I guess that makes sense.

So, my question is: how would you get around this and achieve optimal
computation/transfer overlap?

Would you try to intersperse the local work in 2) with calls to
MPI_Test()? If so, how frequently would those calls have to be made?
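
For instance, something along these lines, slotting into the NODE 2
branch of the sketch above (again only a sketch; do_work_chunk() is a
made-up stand-in for a slice of the local work):

/* Sketch: split the local work of step 2) into chunks and call
   MPI_Test between chunks to drive the progress engine.
   do_work_chunk() is a made-up placeholder. */
#include <mpi.h>

static void do_work_chunk(int i) { (void)i; /* a slice of step 2) */ }

static void work_while_receiving(MPI_Request *req, int nchunks)
{
    int done = 0;

    for (int i = 0; i < nchunks; i++) {
        do_work_chunk(i);
        if (!done)
            MPI_Test(req, &done, MPI_STATUS_IGNORE);   /* poke progress */
    }
    if (!done)
        MPI_Wait(req, MPI_STATUS_IGNORE);              /* step 3) */
}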

Another possible solution that comes to mind is to spawn a separate
thread that just calls MPI_Wait(). With Open MPI over Ethernet, would
that thread busy-poll inside MPI_Wait, and thus steal up to 50% of the
CPU from the main thread doing the local computation work?
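
In other words, something like this (only a sketch; it assumes MPI was
initialized with MPI_Init_thread() requesting MPI_THREAD_MULTIPLE, and
do_local_work() is again a placeholder):

/* Sketch: a helper thread blocks in MPI_Wait while the main thread
   computes. Assumes MPI_Init_thread() was called requesting (and
   actually providing) MPI_THREAD_MULTIPLE. */
#include <mpi.h>
#include <pthread.h>

static void do_local_work(void) { /* step 2) */ }

static void *wait_thread(void *arg)
{
    MPI_Wait((MPI_Request *)arg, MPI_STATUS_IGNORE);  /* busy-polls over TCP? */
    return NULL;
}

/* Called on NODE 2 after the MPI_Irecv has been posted into *req. */
static void recv_with_helper_thread(MPI_Request *req)
{
    pthread_t tid;

    pthread_create(&tid, NULL, wait_thread, req);
    do_local_work();              /* runs concurrently with the wait  */
    pthread_join(&tid, NULL);     /* transfer is complete after this  */
}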

Lots of questions, but I think this is a pretty common scenario.
Still, after a lot of browsing, I haven't been able to find any
concrete advice.

Thanks,

Lars