From: Jelena Pjesivac-Grbovic (pjesa_at_[hidden])
Date: 2007-05-30 15:19:53


Hi Marcin,

What version of Open MPI are you using?
Is the crash still occurring?
It is also possible that the connection went down during execution,
although a segfault really should not occur even in that case.
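
To tell a routing problem apart from an Open MPI problem, a small check
like the one below may help. It is only a sketch and not part of Open MPI:
the function name can_connect and the 3-second timeout are my own choices,
while "our-host" and port 22 are taken from your ssh test.

```python
import socket


def can_connect(host, port=22, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # create_connection resolves the name and attempts the connect,
        # so DNS failures, refused connections, missing routes and
        # timeouts all end up in the except branch below.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError as exc:
        print(f"{host}:{port} unreachable: {exc}")
        return False


# Hypothetical usage on each node of the cluster:
#     can_connect("our-host")
```

If this returns False for a node's own hostname, the routing change is
worth raising with your administrator before looking further at the MPI side.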

Thanks,
Jelena

On Tue, 29 May 2007, Marcin Skoczylas wrote:

> Hello,
>
> Recently my administrator made some changes to our cluster, and now I
> get a crash during MPI_Barrier:
>
> [our-host:12566] *** Process received signal ***
> [our-host:12566] Signal: Segmentation fault (11)
> [our-host:12566] Signal code: Address not mapped (1)
> [our-host:12566] Failing at address: 0x4
> [our-host:12566] [ 0] /lib/tls/libpthread.so.0 [0xa22f80]
> [our-host:12566] [ 1] /usr/lib/openmpi/mca_btl_sm.so(mca_btl_sm_component_progress+0x68f) [0xcd86d7]
> [our-host:12566] [ 2] /usr/lib/openmpi/mca_bml_r2.so(mca_bml_r2_progress+0x32) [0xcb7e3a]
> [our-host:12566] [ 3] /usr/lib/libopen-pal.so.0(opal_progress+0xed) [0xc2b221]
> [our-host:12566] [ 4] /usr/lib/libmpi.so.0 [0x3aecc5]
> [our-host:12566] [ 5] /usr/lib/libmpi.so.0(ompi_request_wait_all+0xec) [0x3ae784]
> [our-host:12566] [ 6] /usr/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_sendrecv_actual+0x77) [0xd025bb]
> [our-host:12566] [ 7] /usr/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_barrier_intra_recursivedoubling+0xde) [0xd05e3a]
> [our-host:12566] [ 8] /usr/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_barrier_intra_dec_fixed+0x44) [0xd027d8]
> [our-host:12566] [ 9] /usr/lib/libmpi.so.0(PMPI_Barrier+0x176) [0x3c0cea]
>
> I did a small investigation and found that:
>
> [user_at_our-host]$ ssh our-host
> ssh(12704) ssh: connect to host our-host port 22: No route to host
>
> That could be the cause. I'm going to talk with my admin soon about this
> routing change. However, if this really is the problem, shouldn't it be
> detected during startup, e.g. in MPI_Init? Actually, I'm not sure...
> Your comments?
>
> greetings, Marcin

--
Jelena Pjesivac-Grbovic, Pjesa
Graduate Research Assistant
Innovative Computing Laboratory
Computer Science Department, UTK
Claxton Complex 350
(865) 974 - 6722 
(865) 974 - 6321
jpjesiva_at_[hidden]
Murphy's Law of Research:
         Enough research will tend to support your theory.