Subject: Re: [OMPI users] Intermittent corruption
From: George Bosilca (bosilca_at_[hidden])
Date: 2009-06-11 17:36:59


Did you try following the advice from the LAPACK forum thread, i.e.
upgrading your compiler from the Mac OS X default (gcc 4.0.1) to 4.3.0?

Btw, what is the test you're running? Can you create a small test case
so I can try to reproduce it?
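
When reducing the problem to a small test case, one thing worth ruling
out first is a mismatch between the struct's in-memory layout and the
datatype description handed to MPI. This thread does not establish that
as the cause; the following is only a minimal sketch, and the struct
`Agent` and the helper names are hypothetical, since the struct from the
original report is not shown. It contains no MPI calls and just reports
the offsets and padding the compiler chose:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical payload; the struct from the original report is not shown. */
typedef struct {
    char   tag;
    int    id;
    double energy;
} Agent;

/* Bytes of padding the compiler inserted between or after the members. */
size_t agent_padding_bytes(void)
{
    size_t members = sizeof(char) + sizeof(int) + sizeof(double);
    return sizeof(Agent) - members;
}

/* Fill disp[0..2] with each member's byte offset, in declaration order.
 * These are the values one would cast to MPI_Aint and pass as the
 * displacements argument of MPI_Type_create_struct. */
void agent_displacements(size_t disp[3])
{
    disp[0] = offsetof(Agent, tag);
    disp[1] = offsetof(Agent, id);
    disp[2] = offsetof(Agent, energy);

    /* Sanity check: members must not overlap and must fit in the struct. */
    assert(disp[1] >= disp[0] + sizeof(char));
    assert(disp[2] >= disp[1] + sizeof(int));
    assert(sizeof(Agent) >= disp[2] + sizeof(double));
}

/* Print the layout so it can be compared across compilers and ranks. */
void agent_print_layout(void)
{
    size_t disp[3];
    agent_displacements(disp);
    printf("sizeof(Agent) = %zu, padding = %zu\n",
           sizeof(Agent), agent_padding_bytes());
    printf("offsets: tag=%zu id=%zu energy=%zu\n",
           disp[0], disp[1], disp[2]);
}
```

Calling agent_print_layout() from main() under both compilers, and
comparing the offsets with the displacements used to build the MPI
datatype, would show whether hand-computed displacements disagree with
what the compiler actually produced.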

Thanks,
   george.

On Jun 11, 2009, at 17:02 , Nick Collier wrote:

> Hi,
>
> I'm developing under OS X 10.5.7 with Open MPI 1.3.2 and am running
> into intermittent corruption when sending and receiving a
> user-defined data type. When running with fewer than four processes
> (i.e. mpirun -np [2,3]), the data is fine; when running with 4 or
> more, the received data is intermittently corrupted. By corrupted, I
> mean that what should be small integer values in a struct are very
> large, as if the memory hasn't been assigned properly. This occurs
> intermittently: some runs are fine and others are not, leading to
> crashes like:
>
> [belafonte:30191] *** Process received signal ***
> [belafonte:30191] Signal: Bus error (10)
> [belafonte:30191] Signal code: (2)
> [belafonte:30191] Failing at address: 0x9
> [belafonte:30191] [ 0] 2 libSystem.B.dylib 0x945af2bb _sigtramp + 43
> [belafonte:30191] [ 1] 3 ??? 0xffffffff 0x0 + 4294967295
>
> I'm not sure how to proceed or what might be wrong. The closest
> thing I could find on Google was
> http://icl.cs.utk.edu/lapack-forum/viewtopic.php?f=2&t=614
> where someone reports issues with ScaLAPACK in combination with
> Open MPI and OS X's stock gcc 4.0.1 that were fixed by using gcc
> 4.3.1.
>
> At any rate, any suggestions on how to move forward would be
> appreciated.
>
> thanks,
>
> Nick