From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2007-05-16 15:31:53
Unfortunately, I do not have these compiler versions available to test
this particular scenario.
I don't see anything obvious in the stack trace that would be causing
a problem.
I'm assuming that /sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/openmpi
exists and is populated with all the components for the 1.2.1
installation (and no other plugins), right?
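
A quick way to sanity-check that directory is sketched here. It assumes a POSIX shell and that the components are installed as mca_*.so files (the usual naming); count_components is a hypothetical helper name, not an Open MPI tool.

```shell
# Hypothetical helper: count the MCA component plugins in a directory.
# Assumption: components are installed as mca_<framework>_<name>.so files.
count_components() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    n=0
    for f in "$dir"/mca_*.so; do
        # An unmatched glob stays literal, so check that the file exists.
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}
```

For example, `count_components /sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/openmpi` should print a non-zero count. Listing the same directory with `ls -lt` and looking for files with noticeably older timestamps is another quick way to spot leftovers from a previous install.
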
Can you run ompi_info, or does it also segfault? (Based on the stack
trace, I'm guessing that it will: the crash is in the code that is
trying to open Open MPI's plugins.)
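
If it is not obvious whether a command died from a segfault, the exit status will tell you. A rough sketch, assuming a Linux shell where SIGSEGV is signal 11 (so the reported status is 128 + 11 = 139); report_segv is a hypothetical helper name:

```shell
# Hypothetical helper: run a command and report whether SIGSEGV killed it.
# Assumption: SIGSEGV is signal 11, so the shell sees status 128 + 11 = 139.
report_segv() {
    status=0
    "$@" >/dev/null 2>&1 || status=$?
    if [ "$status" -eq 139 ]; then
        echo "segv"
    else
        echo "exit status $status"
    fi
}
```

Running `report_segv ompi_info` with the 1.2.1 installation first in your PATH would then print "segv" if it crashes the same way mpiexec does.
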
On May 8, 2007, at 12:27 AM, Luis Kornblueh wrote:
> Hi everybody,
>
> we've got some problems on our cluster with openmpi versions 1.2 and
> upward.
>
> The following setup does work:
>
> openmpi-1.2b3: SLES 9 SP3 with gcc/g++ 4.1.1 and PGI f95 6.1-1
>
> The following two setups give a SIGSEGV in mpiexec (stack trace below):
>
> openmpi-1.2: SLES 9 SP3 with gcc/g++ 4.1.1 and PGI f95 6.1-1
> openmpi-1.2.1: SLES 9 SP3 with gcc/g++ 4.1.1 and PGI f95 6.1-1
>
> All have been compiled with
>
> export F77=pgf95
> export FC=pgf95
>
> ./configure --prefix=/sw/sles9-x64/voltaire/openmpi-1.2b3-pgi \
> --enable-pretty-print-stacktrace \
> --with-libnuma=/usr \
> --with-mvapi=/usr \
> --with-mvapi-libdir=/usr/lib64
>
> (with changing prefix, of course)
>
> The stack trace:
>
> Starting program: /scratch/work/system/sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/bin/mpiexec -host tornado1 --prefix=$MPIROOT -v -np 8 `pwd`/osu_bw
> [Thread debugging using libthread_db enabled]
> [New Thread 182906198784 (LWP 30805)]
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 182906198784 (LWP 30805)]
> 0x0000002a957f1b5b in _int_free () from /sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-pal.so.0
> (gdb) where
> #0 0x0000002a957f1b5b in _int_free () from /sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-pal.so.0
> #1 0x0000002a957f1e7d in free () from /sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-pal.so.0
> #2 0x0000002a95563b72 in __tls_get_addr () from /lib64/ld-linux-x86-64.so.2
> #3 0x0000002a95fb51ec in __libc_dl_error_tsd () from /lib64/tls/libc.so.6
> #4 0x0000002a95dba6ec in __pthread_initialize_minimal_internal () from /lib64/tls/libpthread.so.0
> #5 0x0000002a95dba419 in call_initialize_minimal () from /lib64/tls/libpthread.so.0
> #6 0x0000002a95ec9000 in ?? ()
> #7 0x0000002a95db9fe9 in _init () from /lib64/tls/libpthread.so.0
> #8 0x0000007fbfffe7c0 in ?? ()
> #9 0x0000002a9556168d in call_init () from /lib64/ld-linux-x86-64.so.2
> #10 0x0000002a9556179b in _dl_init_internal () from /lib64/ld-linux-x86-64.so.2
> #11 0x0000002a95fb39ac in dl_open_worker () from /lib64/tls/libc.so.6
> #12 0x0000002a955612de in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
> #13 0x0000002a95fb3160 in _dl_open () from /lib64/tls/libc.so.6
> #14 0x0000002a959413b5 in dlopen_doit () from /lib64/libdl.so.2
> #15 0x0000002a955612de in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
> #16 0x0000002a959416fa in _dlerror_run () from /lib64/libdl.so.2
> #17 0x0000002a95941362 in dlopen@@GLIBC_2.2.5 () from /lib64/libdl.so.2
> #18 0x0000002a957db2ee in vm_open () from /sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-pal.so.0
> #19 0x0000002a957d9645 in tryall_dlopen () from /sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-pal.so.0
> #20 0x0000002a957d981e in tryall_dlopen_module () from /sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-pal.so.0
> #21 0x0000002a957daab1 in try_dlopen () from /sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-pal.so.0
> #22 0x0000002a957dacd6 in lt_dlopenext () from /sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-pal.so.0
> #23 0x0000002a957e04f5 in open_component () from /sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-pal.so.0
> #24 0x0000002a957e0f60 in mca_base_component_find () from /sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-pal.so.0
> #25 0x0000002a957e189c in mca_base_components_open () from /sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-pal.so.0
> #26 0x0000002a956a6119 in orte_rds_base_open () from /sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-rte.so.0
> #27 0x0000002a95681d18 in orte_init_stage1 () from /sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-rte.so.0
> #28 0x0000002a95684eba in orte_system_init () from /sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-rte.so.0
> #29 0x0000002a9568179d in orte_init () from /sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-rte.so.0
> #30 0x0000000000402a3a in orterun (argc=8, argv=0x7fbfffe778) at orterun.c:374
> #31 0x00000000004028d3 in main (argc=8, argv=0x7fbfffe778) at main.c:13
> (gdb) quit
>
> In case access to our cluster could help, we would be happy to
> provide an account.
>
> Cheerio,
> Luis
> --
> \\\\\\
> (-0^0-)
> --------------------------oOO--(_)--OOo-----------------------------
>
> Luis Kornblueh Tel. : +49-40-41173289
> Max-Planck-Institute for Meteorology Fax. : +49-40-41173298
> Bundesstr. 53
> D-20146 Hamburg Email: luis.kornblueh_at_[hidden]
> Federal Republic of Germany
--
Jeff Squyres
Cisco Systems