Subject: Re: [OMPI users] Did you break MPI_Abort recently?
From: Ralph Castain (rhc_at_[hidden])
Date: 2009-06-26 18:40:44


Man, was this a PITA to chase down. Finally found it, though. Fixed on trunk as of r21549.

Thanks!
Ralph

On Jun 25, 2009, at 3:19 PM, Mostyn Lewis wrote:

> Just the local machine - run directly from the command line with a script like
> the one below. So, no launch mechanism.
>
> Fails on SUSE Linux Enterprise Server 10 (x86_64) - SP2 and
> Fedora release 10 (Cambridge), for example.
>
> DM
>
> On Thu, 25 Jun 2009, Ralph Castain wrote:
>
>> Sorry - I should have been clearer. Are you using rsh, qrsh (i.e.,
>> SGE), SLURM, Torque, ...?
>>
>> On Jun 25, 2009, at 2:54 PM, Mostyn Lewis wrote:
>>
>>> Something like:
>>> #!/bin/ksh
>>> set -x
>>> PREFIX=$OPENMPI_GCC_SVN
>>> export PATH=$OPENMPI_GCC_SVN/bin:$PATH
>>> MCA="--mca btl tcp,self"
>>> mpicc -g -O6 mpiabort.c
>>> NPROCS=4
>>> mpirun --prefix $PREFIX -x LD_LIBRARY_PATH $MCA -np $NPROCS -machinefile fred ./a.out
>>> DM
>>> On Thu, 25 Jun 2009, Ralph Castain wrote:
>>>> Using what launch environment?
>>>> On Jun 25, 2009, at 2:29 PM, Mostyn Lewis wrote:
>>>>> While using the BLACS test programs, I've seen that with recent SVN
>>>>> checkouts (including today's) the MPI_Abort test left procs running.
>>>>> The last SVN revision I have where it worked was 1.4a1r20936. By
>>>>> 1.4a1r21246 it fails. Works O.K. in the standard 1.3.2 release.
>>>>> A test program is below. GCC was used.
>>>>> DM
>>>>> #include <stdio.h>
>>>>> #include <sys/types.h>
>>>>> #include <unistd.h>
>>>>> #include <math.h>
>>>>> #include <mpi.h>
>>>>>
>>>>> #define NUM_ITERS 100000
>>>>>
>>>>> /* Prototype the function that we'll use below. */
>>>>> static double f(double);
>>>>>
>>>>> int
>>>>> main(int argc, char *argv[])
>>>>> {
>>>>>     int iter, rank, size, i;
>>>>>     int foo;
>>>>>     double PI25DT = 3.141592653589793238462643;
>>>>>     double mypi, pi, h, sum, x;
>>>>>     double startwtime = 0.0, endwtime;
>>>>>     int namelen;
>>>>>     char processor_name[MPI_MAX_PROCESSOR_NAME];
>>>>>
>>>>>     /* Normal MPI startup */
>>>>>     MPI_Init(&argc, &argv);
>>>>>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>>>>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>>>     MPI_Get_processor_name(processor_name, &namelen);
>>>>>     printf("Process %d of %d on %s\n", rank, size, processor_name);
>>>>>
>>>>>     /* Do approximations for 1 to 100 points */
>>>>>     /* sleep(5); */
>>>>>     for (iter = 2; iter < NUM_ITERS; ++iter) {
>>>>>         h = 1.0 / (double) iter;
>>>>>         sum = 0.0;
>>>>>         /* A slightly better approach starts from large i and works back */
>>>>>         if (rank == 0)
>>>>>             startwtime = MPI_Wtime();
>>>>>         for (i = rank + 1; i <= iter; i += size) {
>>>>>             x = h * ((double) i - 0.5);
>>>>>             sum += f(x);
>>>>>         }
>>>>>         mypi = h * sum;
>>>>>         if (iter == (NUM_ITERS - 1000)) {
>>>>>             MPI_Barrier(MPI_COMM_WORLD);
>>>>>             if (rank == 2) {
>>>>>                 MPI_Abort(MPI_COMM_WORLD, -1);
>>>>>             } else {
>>>>>                 /* Just loop */
>>>>>                 foo = 1;
>>>>>                 while (foo == 1) {
>>>>>                     foo = foo + 3;
>>>>>                     foo = foo - 2;
>>>>>                     foo = foo - 1;
>>>>>                 }
>>>>>             }
>>>>>         }
>>>>>         MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
>>>>>     }
>>>>>
>>>>>     /* All done */
>>>>>     if (rank == 0) {
>>>>>         printf("%d points: pi is approximately %.16f, error = %.16f\n",
>>>>>                iter, pi, fabs(pi - PI25DT));
>>>>>         endwtime = MPI_Wtime();
>>>>>         printf("wall clock time = %f\n", endwtime - startwtime);
>>>>>         fflush(stdout);
>>>>>     }
>>>>>
>>>>>     MPI_Finalize();
>>>>>     return 0;
>>>>> }
>>>>>
>>>>> static double
>>>>> f(double a)
>>>>> {
>>>>>     return (4.0 / (1.0 + a * a));
>>>>> }
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users