Subject: Re: [OMPI users] Did you break MPI_Abort recently?
From: Mostyn Lewis (Mostyn.Lewis_at_[hidden])
Date: 2009-06-25 17:19:39


Just the local machine, run directly from the command line with a script
like the one below. So, no launch mechanism.

Fails on SUSE Linux Enterprise Server 10 (x86_64) - SP2 and
Fedora release 10 (Cambridge), for example.
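
For anyone wanting to check the "left procs running" symptom on their own box, here is a rough sketch of what I mean. It is only an illustration: `leftover` is a made-up helper name, and it assumes a Linux `ps` that accepts `-e -o comm=` and that the test binary is still called a.out.

```shell
#!/bin/sh
# Sketch only: count processes whose command name is exactly $1.
# "leftover" is a hypothetical helper, not part of Open MPI.
leftover() {
    name=${1:-a.out}
    # ps prints one command name per line; grep -c counts exact matches.
    # grep exits non-zero when the count is 0, hence the "|| true".
    ps -e -o comm= | grep -c -x -F -- "$name" || true
}
```

Calling `leftover a.out` a few seconds after mpirun has returned should print 0 if the abort cleaned everything up; anything else means ranks are still spinning.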

DM

On Thu, 25 Jun 2009, Ralph Castain wrote:

> Sorry - should have been more clear. Are you using rsh, qrsh (i.e., SGE),
> SLURM, Torque, ....?
>
> On Jun 25, 2009, at 2:54 PM, Mostyn Lewis wrote:
>
>> Something like:
>>
>> #!/bin/ksh
>> set -x
>>
>> PREFIX=$OPENMPI_GCC_SVN
>> export PATH=$OPENMPI_GCC_SVN/bin:$PATH
>> MCA="--mca btl tcp,self"
>> mpicc -g -O6 mpiabort.c
>> NPROCS=4
>> mpirun --prefix $PREFIX -x LD_LIBRARY_PATH $MCA -np $NPROCS -machinefile fred ./a.out
>>
>> DM
>>
>> On Thu, 25 Jun 2009, Ralph Castain wrote:
>>
>>> Using what launch environment?
>>>
>>> On Jun 25, 2009, at 2:29 PM, Mostyn Lewis wrote:
>>>
>>>> While using the BLACS test programs, I've seen that with recent SVN
>>>> checkouts (including today's) the MPI_Abort test left procs running.
>>>> The last SVN I have where it worked was 1.4a1r20936. By 1.4a1r21246
>>>> it fails.
>>>> Works O.K. in the standard 1.3.2 release.
>>>> A test program is below. GCC was used.
>>>> DM
>>>> #include <stdio.h>
>>>> #include <sys/types.h>
>>>> #include <unistd.h>
>>>> #include <math.h>
>>>> #include <mpi.h>
>>>> #define NUM_ITERS 100000
>>>> /* Prototype the function that we'll use below. */
>>>> static double f(double);
>>>> int
>>>> main(int argc, char *argv[])
>>>> {
>>>>     int iter, rank, size, i;
>>>>     int foo;
>>>>     double PI25DT = 3.141592653589793238462643;
>>>>     double mypi, pi, h, sum, x;
>>>>     double startwtime = 0.0, endwtime;
>>>>     int namelen;
>>>>     char processor_name[MPI_MAX_PROCESSOR_NAME];
>>>>     /* Normal MPI startup */
>>>>     MPI_Init(&argc, &argv);
>>>>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>>>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>>     MPI_Get_processor_name(processor_name, &namelen);
>>>>     printf("Process %d of %d on %s\n", rank, size, processor_name);
>>>>     /* Do approximations for 1 to 100 points */
>>>>     /* sleep(5); */
>>>>     for (iter = 2; iter < NUM_ITERS; ++iter) {
>>>>         h = 1.0 / (double) iter;
>>>>         sum = 0.0;
>>>>
>>>>         /* A slightly better approach starts from large i and works back */
>>>>
>>>>         if (rank == 0)
>>>>             startwtime = MPI_Wtime();
>>>>
>>>>         for (i = rank + 1; i <= iter; i += size) {
>>>>             x = h * ((double) i - 0.5);
>>>>             sum += f(x);
>>>>         }
>>>>         mypi = h * sum;
>>>>
>>>>         if (iter == (NUM_ITERS - 1000)) {
>>>>             MPI_Barrier(MPI_COMM_WORLD);
>>>>             if (rank == 2) {
>>>>                 MPI_Abort(MPI_COMM_WORLD, -1);
>>>>             } else {
>>>>                 /* Just loop */
>>>>                 foo = 1;
>>>>                 while (foo == 1) {
>>>>                     foo = foo + 3;
>>>>                     foo = foo - 2;
>>>>                     foo = foo - 1;
>>>>                 }
>>>>             }
>>>>         }
>>>>         MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
>>>>     }
>>>>     /* All done */
>>>>     if (rank == 0) {
>>>>         printf("%d points: pi is approximately %.16f, error = %.16f\n",
>>>>                iter, pi, fabs(pi - PI25DT));
>>>>         endwtime = MPI_Wtime();
>>>>         printf("wall clock time = %f\n", endwtime - startwtime);
>>>>         fflush(stdout);
>>>>     }
>>>>     MPI_Finalize();
>>>>     return 0;
>>>> }
>>>> static double
>>>> f(double a)
>>>> {
>>>>     return (4.0 / (1.0 + a * a));
>>>> }
>>>> _______________________________________________
>>>> users mailing list
>>>> users_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users