Subject: Re: [OMPI users] Did you break MPI_Abort recently?
From: Ralph Castain (rhc_at_[hidden])
Date: 2009-06-25 17:01:03


Sorry - I should have been clearer. Are you using rsh, qrsh (i.e.,
SGE), SLURM, Torque, ...?
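
If you're not sure, something like the following should show which launch
component mpirun is picking up (plm_base_verbose is the usual MCA verbosity
knob for the launcher; the "hostname" run is just a throwaway example):

  # Ask mpirun to report which process launch module (plm) it selects
  mpirun --mca plm_base_verbose 5 -np 2 hostname

  # List the launch components that were built into this install
  ompi_info | grep plm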

On Jun 25, 2009, at 2:54 PM, Mostyn Lewis wrote:

> Something like:
>
> #!/bin/ksh
> set -x
>
> PREFIX=$OPENMPI_GCC_SVN
> export PATH=$OPENMPI_GCC_SVN/bin:$PATH
> MCA="--mca btl tcp,self"
> mpicc -g -O6 mpiabort.c
> NPROCS=4
> mpirun --prefix $PREFIX -x LD_LIBRARY_PATH $MCA -np $NPROCS -machinefile fred ./a.out
>
> DM
>
> On Thu, 25 Jun 2009, Ralph Castain wrote:
>
>> Using what launch environment?
>>
>> On Jun 25, 2009, at 2:29 PM, Mostyn Lewis wrote:
>>
>>> While using the BLACS test programs, I've seen that with recent SVN
>>> checkouts (including today's) the MPI_Abort test leaves procs running.
>>> The last SVN revision I have where it worked was 1.4a1r20936; by
>>> 1.4a1r21246 it fails. It works O.K. in the standard 1.3.2 release.
>>> A test program is below. GCC was used.
>>> DM
>>> #include <stdio.h>
>>> #include <sys/types.h>
>>> #include <unistd.h>
>>> #include <math.h>
>>> #include <mpi.h>
>>>
>>> #define NUM_ITERS 100000
>>>
>>> /* Prototype the function that we'll use below. */
>>> static double f(double);
>>>
>>> int
>>> main(int argc, char *argv[])
>>> {
>>>     int iter, rank, size, i;
>>>     int foo;
>>>     double PI25DT = 3.141592653589793238462643;
>>>     double mypi, pi, h, sum, x;
>>>     double startwtime = 0.0, endwtime;
>>>     int namelen;
>>>     char processor_name[MPI_MAX_PROCESSOR_NAME];
>>>
>>>     /* Normal MPI startup */
>>>     MPI_Init(&argc, &argv);
>>>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>     MPI_Get_processor_name(processor_name, &namelen);
>>>     printf("Process %d of %d on %s\n", rank, size, processor_name);
>>>
>>>     /* Do approximations for 1 to 100 points */
>>>     /* sleep(5); */
>>>     for (iter = 2; iter < NUM_ITERS; ++iter) {
>>>         h = 1.0 / (double) iter;
>>>         sum = 0.0;
>>>
>>>         /* A slightly better approach starts from large i and works back */
>>>
>>>         if (rank == 0)
>>>             startwtime = MPI_Wtime();
>>>
>>>         for (i = rank + 1; i <= iter; i += size) {
>>>             x = h * ((double) i - 0.5);
>>>             sum += f(x);
>>>         }
>>>         mypi = h * sum;
>>>
>>>         if (iter == (NUM_ITERS - 1000)) {
>>>             MPI_Barrier(MPI_COMM_WORLD);
>>>             if (rank == 2) {
>>>                 MPI_Abort(MPI_COMM_WORLD, -1);
>>>             } else {
>>>                 /* Just loop */
>>>                 foo = 1;
>>>                 while (foo == 1) {
>>>                     foo = foo + 3;
>>>                     foo = foo - 2;
>>>                     foo = foo - 1;
>>>                 }
>>>             }
>>>         }
>>>         MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
>>>     }
>>>
>>>     /* All done */
>>>     if (rank == 0) {
>>>         printf("%d points: pi is approximately %.16f, error = %.16f\n",
>>>                iter, pi, fabs(pi - PI25DT));
>>>         endwtime = MPI_Wtime();
>>>         printf("wall clock time = %f\n", endwtime - startwtime);
>>>         fflush(stdout);
>>>     }
>>>
>>>     MPI_Finalize();
>>>     return 0;
>>> }
>>>
>>> static double
>>> f(double a)
>>> {
>>>     return (4.0 / (1.0 + a * a));
>>> }
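>>>
>>> A quick way to confirm the orphaned procs after mpirun returns is just
>>> to ps for a.out on each node; a rough sketch (the machinefile name
>>> "fred" and the a.out binary are only placeholders from the run here):
>>>
>>> #!/bin/ksh
>>> # Check each node for ranks left behind after the MPI_Abort test;
>>> # assumes the nodes are listed one per line in ./fred.
>>> for host in $(cat fred); do
>>>     echo "== $host =="
>>>     ssh $host "ps -ef | grep '[a]\.out'"   # bracket trick avoids matching grep itself
>>> done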
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users