Subject: Re: [OMPI users] Did you break MPI_Abort recently?
From: Mostyn Lewis (Mostyn.Lewis_at_[hidden])
Date: 2009-06-27 21:37:45


Thank you.

DM

On Fri, 26 Jun 2009, Ralph Castain wrote:

> Man, was this a PITA to chase down. Finally found it, though. Fixed on trunk
> as of r21549
>
> Thanks!
> Ralph
>
> On Jun 25, 2009, at 3:19 PM, Mostyn Lewis wrote:
>
>> Just local machine - direct from the command line with a script like
>> the one below. So, no launch mechanism.
>>
>> Fails on SUSE Linux Enterprise Server 10 (x86_64) - SP2 and
>> Fedora release 10 (Cambridge), for example.
>>
>> DM
>>
>> On Thu, 25 Jun 2009, Ralph Castain wrote:
>>
>>> Sorry - should have been more clear. Are you using rsh, qrsh (i.e., SGE),
>>> SLURM, Torque, ....?
>>>
>>> On Jun 25, 2009, at 2:54 PM, Mostyn Lewis wrote:
>>>
>>>> Something like:
>>>> #!/bin/ksh
>>>> set -x
>>>> PREFIX=$OPENMPI_GCC_SVN
>>>> export PATH=$OPENMPI_GCC_SVN/bin:$PATH
>>>> MCA="--mca btl tcp,self"
>>>> mpicc -g -O6 mpiabort.c
>>>> NPROCS=4
>>>> mpirun --prefix $PREFIX -x LD_LIBRARY_PATH $MCA -np $NPROCS \
>>>>        -machinefile fred ./a.out
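>>>> For a single local machine the machinefile can be dropped, so the
>>>> same test reduces to roughly:
>>>>
>>>> mpicc -g -O2 mpiabort.c
>>>> mpirun --mca btl tcp,self -np 4 ./a.out
>>>>
>>>> (-O2 is just the more conventional level here; gcc treats anything
>>>> above -O3 as -O3 anyway.)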
>>>> DM
>>>> On Thu, 25 Jun 2009, Ralph Castain wrote:
>>>>> Using what launch environment?
>>>>> On Jun 25, 2009, at 2:29 PM, Mostyn Lewis wrote:
>>>>>> While using the BLACS test programs, I've seen that with recent
>>>>>> SVN checkouts (including today's) the MPI_Abort test left procs
>>>>>> running. The last SVN I have where it worked was 1.4a1r20936. By
>>>>>> 1.4a1r21246 it fails. Works O.K. in the standard 1.3.2 release.
>>>>>> A test program is below. GCC was used.
>>>>>> DM
>>>>>> #include <stdio.h>
>>>>>> #include <sys/types.h>
>>>>>> #include <unistd.h>
>>>>>> #include <math.h>
>>>>>> #include <mpi.h>
>>>>>>
>>>>>> #define NUM_ITERS 100000
>>>>>>
>>>>>> /* Prototype the function that we'll use below. */
>>>>>> static double f(double);
>>>>>>
>>>>>> int
>>>>>> main(int argc, char *argv[])
>>>>>> {
>>>>>>     int iter, rank, size, i;
>>>>>>     int foo;
>>>>>>     double PI25DT = 3.141592653589793238462643;
>>>>>>     double mypi, pi, h, sum, x;
>>>>>>     double startwtime = 0.0, endwtime;
>>>>>>     int namelen;
>>>>>>     char processor_name[MPI_MAX_PROCESSOR_NAME];
>>>>>>
>>>>>>     /* Normal MPI startup */
>>>>>>     MPI_Init(&argc, &argv);
>>>>>>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>>>>>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>>>>     MPI_Get_processor_name(processor_name, &namelen);
>>>>>>     printf("Process %d of %d on %s\n", rank, size, processor_name);
>>>>>>
>>>>>>     /* Do approximations for 2 to NUM_ITERS-1 points */
>>>>>>     /* sleep(5); */
>>>>>>     for (iter = 2; iter < NUM_ITERS; ++iter) {
>>>>>>         h = 1.0 / (double) iter;
>>>>>>         sum = 0.0;
>>>>>>         /* A slightly better approach starts from large i and works back */
>>>>>>         if (rank == 0)
>>>>>>             startwtime = MPI_Wtime();
>>>>>>         for (i = rank + 1; i <= iter; i += size) {
>>>>>>             x = h * ((double) i - 0.5);
>>>>>>             sum += f(x);
>>>>>>         }
>>>>>>         mypi = h * sum;
>>>>>>         if (iter == (NUM_ITERS - 1000)) {
>>>>>>             MPI_Barrier(MPI_COMM_WORLD);
>>>>>>             if (rank == 2) {
>>>>>>                 MPI_Abort(MPI_COMM_WORLD, -1);
>>>>>>             } else {
>>>>>>                 /* Just loop */
>>>>>>                 foo = 1;
>>>>>>                 while (foo == 1) {
>>>>>>                     foo = foo + 3;
>>>>>>                     foo = foo - 2;
>>>>>>                     foo = foo - 1;
>>>>>>                 }
>>>>>>             }
>>>>>>         }
>>>>>>         MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
>>>>>>     }
>>>>>>
>>>>>>     /* All done */
>>>>>>     if (rank == 0) {
>>>>>>         printf("%d points: pi is approximately %.16f, error = %.16f\n",
>>>>>>                iter, pi, fabs(pi - PI25DT));
>>>>>>         endwtime = MPI_Wtime();
>>>>>>         printf("wall clock time = %f\n", endwtime - startwtime);
>>>>>>         fflush(stdout);
>>>>>>     }
>>>>>>     MPI_Finalize();
>>>>>>     return 0;
>>>>>> }
>>>>>>
>>>>>> static double
>>>>>> f(double a)
>>>>>> {
>>>>>>     return (4.0 / (1.0 + a * a));
>>>>>> }
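>>>>>>
>>>>>> When the abort works, rank 2's MPI_Abort at iteration
>>>>>> NUM_ITERS - 1000 should also take down the other ranks, which are
>>>>>> spinning in the while loop. A quick check for survivors after
>>>>>> mpirun returns, assuming pgrep/pkill are available on the nodes,
>>>>>> is something like:
>>>>>>
>>>>>> pgrep -lf a.out   # should print nothing on a good build
>>>>>> pkill -f a.out    # clean up any leftovers by hand
>>>>>>
>>>>>> On the broken revisions the spinning ranks show up here.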
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users