Subject: Re: [OMPI users] Problem with OpenMPI (MX btl and mtl) and threads
From: Scott Atchley (atchley_at_[hidden])
Date: 2009-06-09 11:06:47


Hi Francois,

I am not familiar with the internals of the OMPI code. Are you sure,
however, that threads are fully supported yet? I was under the
impression that thread support was still partial.

Can anyone else comment?

Scott

On Jun 8, 2009, at 8:43 AM, François Trahay wrote:

> Hi,
> I'm encountering some issues when running a multithreaded program with
> OpenMPI (trunk rev. 21380, configured with --enable-mpi-threads).
> My program (included in the tar.bz2) uses several pthreads that perform
> ping-pongs concurrently (thread #1 uses tag #1, thread #2 uses tag #2,
> etc.).
> This program crashes over MX (either btl or mtl) with the following
> backtrace:
>
> concurrent_ping_v2: pml_cm_recvreq.c:53:
> mca_pml_cm_recv_request_completion: Assertion `0 ==
> ((mca_pml_cm_thin_recv_request_t*)base_request)->req_base.req_pml_complete'
> failed.
> [joe0:01709] *** Process received signal ***
> [joe0:01709] *** Process received signal ***
> [joe0:01709] Signal: Segmentation fault (11)
> [joe0:01709] Signal code: Address not mapped (1)
> [joe0:01709] Failing at address: 0x1238949c4
> [joe0:01709] Signal: Aborted (6)
> [joe0:01709] Signal code: (-6)
> [joe0:01709] [ 0] /lib/libpthread.so.0 [0x7f57240be7b0]
> [joe0:01709] [ 1] /lib/libc.so.6(gsignal+0x35) [0x7f5722cba065]
> [joe0:01709] [ 2] /lib/libc.so.6(abort+0x183) [0x7f5722cbd153]
> [joe0:01709] [ 3] /lib/libc.so.6(__assert_fail+0xe9) [0x7f5722cb3159]
> [joe0:01709] [ 0] /lib/libpthread.so.0 [0x7f57240be7b0]
> [joe0:01709] [ 1] /home/ftrahay/sources/openmpi/trunk/install//lib/libopen-pal.so.0 [0x7f57238d0a08]
> [joe0:01709] [ 2] /home/ftrahay/sources/openmpi/trunk/install//lib/libopen-pal.so.0 [0x7f57238cf8cc]
> [joe0:01709] [ 3] /home/ftrahay/sources/openmpi/trunk/install//lib/libopen-pal.so.0(opal_free+0x4e) [0x7f57238bdc69]
> [joe0:01709] [ 4] /home/ftrahay/sources/openmpi/trunk/install/lib/openmpi/mca_mtl_mx.so [0x7f572060b72f]
> [joe0:01709] [ 5] /home/ftrahay/sources/openmpi/trunk/install//lib/libopen-pal.so.0(opal_progress+0xbc) [0x7f57238948e0]
> [joe0:01709] [ 6] /home/ftrahay/sources/openmpi/trunk/install/lib/openmpi/mca_pml_cm.so [0x7f572081145a]
> [joe0:01709] [ 7] /home/ftrahay/sources/openmpi/trunk/install/lib/openmpi/mca_pml_cm.so [0x7f57208113b7]
> [joe0:01709] [ 8] /home/ftrahay/sources/openmpi/trunk/install/lib/openmpi/mca_pml_cm.so [0x7f57208112e7]
> [joe0:01709] [ 9] /home/ftrahay/sources/openmpi/trunk/install//lib/libmpi.so.0(MPI_Recv+0x2bc) [0x7f5723e07690]
> [joe0:01709] [10] ./concurrent_ping_v2(client+0x123) [0x401404]
> [joe0:01709] [11] /lib/libpthread.so.0 [0x7f57240b6faa]
> [joe0:01709] [12] /lib/libc.so.6(clone+0x6d) [0x7f5722d5629d]
> [joe0:01709] *** End of error message ***
> [joe0:01709] [ 4] /home/ftrahay/sources/openmpi/trunk/install/lib/openmpi/mca_pml_cm.so [0x7f57208120bb]
> [joe0:01709] [ 5] /home/ftrahay/sources/openmpi/trunk/install/lib/openmpi/mca_mtl_mx.so [0x7f572060b80a]
> [joe0:01709] [ 6] /home/ftrahay/sources/openmpi/trunk/install//lib/libopen-pal.so.0(opal_progress+0xbc) [0x7f57238948e0]
> [joe0:01709] [ 7] /home/ftrahay/sources/openmpi/trunk/install/lib/openmpi/mca_pml_cm.so [0x7f572081147a]
> [joe0:01709] [ 8] /home/ftrahay/sources/openmpi/trunk/install/lib/openmpi/mca_pml_cm.so [0x7f57208113b7]
> [joe0:01709] [ 9] /home/ftrahay/sources/openmpi/trunk/install/lib/openmpi/mca_pml_cm.so [0x7f57208112e7]
> [joe0:01709] [10] /home/ftrahay/sources/openmpi/trunk/install//lib/libmpi.so.0(MPI_Recv+0x2bc) [0x7f5723e07690]
> [joe0:01709] [11] ./concurrent_ping_v2(client+0x123) [0x401404]
> [joe0:01709] [12] /lib/libpthread.so.0 [0x7f57240b6faa]
> [joe0:01709] [13] /lib/libc.so.6(clone+0x6d) [0x7f5722d5629d]
> [joe0:01709] *** End of error message ***
> --------------------------------------------------------------------------
> mpirun noticed that process rank 1 with PID 1709 on node joe0 exited on
> signal 6 (Aborted).
> --------------------------------------------------------------------------
>
>
> Any ideas?
>
> Francois Trahay
>
> <bug-report.tar.bz2>