include("../../include/msg-header.inc"); ?>
From: Luis Kornblueh (luis.kornblueh_at_[hidden])
Date: 2007-05-08 03:27:02
Hi everybody,
we've got some problems on our cluster with Open MPI versions 1.2 and upward.
The following setup does work:
openmpi-1.2b3: SLES 9 SP3 with gcc/g++ 4.1.1 and PGI f95 6.1-1
The following two setups give a SIGSEGV in mpiexec (stack trace below):
openmpi-1.2: SLES 9 SP3 with gcc/g++ 4.1.1 and PGI f95 6.1-1
openmpi-1.2.1: SLES 9 SP3 with gcc/g++ 4.1.1 and PGI f95 6.1-1
All have been compiled with
export F77=pgf95
export FC=pgf95
./configure --prefix=/sw/sles9-x64/voltaire/openmpi-1.2b3-pgi \
--enable-pretty-print-stacktrace \
--with-libnuma=/usr \
--with-mvapi=/usr \
--with-mvapi-libdir=/usr/lib64
(with the prefix changed for each version, of course)
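Each tree was then built and installed in the usual way, roughly:

  make all
  make install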
The stack trace:
Starting program: /scratch/work/system/sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/bin/mpiexec -host tornado1 --prefix=$MPIROOT -v -np 8 `pwd`/osu_bw
[Thread debugging using libthread_db enabled]
[New Thread 182906198784 (LWP 30805)]
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 182906198784 (LWP 30805)]
0x0000002a957f1b5b in _int_free () from /sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-pal.so.0
(gdb) where
#0 0x0000002a957f1b5b in _int_free () from /sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-pal.so.0
#1 0x0000002a957f1e7d in free () from /sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-pal.so.0
#2 0x0000002a95563b72 in __tls_get_addr () from /lib64/ld-linux-x86-64.so.2
#3 0x0000002a95fb51ec in __libc_dl_error_tsd () from /lib64/tls/libc.so.6
#4 0x0000002a95dba6ec in __pthread_initialize_minimal_internal () from /lib64/tls/libpthread.so.0
#5 0x0000002a95dba419 in call_initialize_minimal () from /lib64/tls/libpthread.so.0
#6 0x0000002a95ec9000 in ?? ()
#7 0x0000002a95db9fe9 in _init () from /lib64/tls/libpthread.so.0
#8 0x0000007fbfffe7c0 in ?? ()
#9 0x0000002a9556168d in call_init () from /lib64/ld-linux-x86-64.so.2
#10 0x0000002a9556179b in _dl_init_internal () from /lib64/ld-linux-x86-64.so.2
#11 0x0000002a95fb39ac in dl_open_worker () from /lib64/tls/libc.so.6
#12 0x0000002a955612de in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#13 0x0000002a95fb3160 in _dl_open () from /lib64/tls/libc.so.6
#14 0x0000002a959413b5 in dlopen_doit () from /lib64/libdl.so.2
#15 0x0000002a955612de in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#16 0x0000002a959416fa in _dlerror_run () from /lib64/libdl.so.2
#17 0x0000002a95941362 in dlopen@@GLIBC_2.2.5 () from /lib64/libdl.so.2
#18 0x0000002a957db2ee in vm_open () from /sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-pal.so.0
#19 0x0000002a957d9645 in tryall_dlopen () from /sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-pal.so.0
#20 0x0000002a957d981e in tryall_dlopen_module () from /sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-pal.so.0
#21 0x0000002a957daab1 in try_dlopen () from /sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-pal.so.0
#22 0x0000002a957dacd6 in lt_dlopenext () from /sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-pal.so.0
#23 0x0000002a957e04f5 in open_component () from /sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-pal.so.0
#24 0x0000002a957e0f60 in mca_base_component_find () from /sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-pal.so.0
#25 0x0000002a957e189c in mca_base_components_open () from /sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-pal.so.0
#26 0x0000002a956a6119 in orte_rds_base_open () from /sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-rte.so.0
#27 0x0000002a95681d18 in orte_init_stage1 () from /sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-rte.so.0
#28 0x0000002a95684eba in orte_system_init () from /sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-rte.so.0
#29 0x0000002a9568179d in orte_init () from /sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-rte.so.0
#30 0x0000000000402a3a in orterun (argc=8, argv=0x7fbfffe778) at orterun.c:374
#31 0x00000000004028d3 in main (argc=8, argv=0x7fbfffe778) at main.c:13
(gdb) quit
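Note that the segfault happens inside mpiexec itself while orte_init is opening components, i.e. apparently before any application process is even launched, so osu_bw itself should not matter. Any trivial MPI program ought to trigger it; a minimal reproducer could look like this (hypothetical sketch, not the benchmark we actually ran):

#include <stdio.h>
#include <mpi.h>

/* Minimal MPI program. If our reading of the trace is right,
 * mpiexec dies during its own initialization, so this code is
 * never even reached. */
int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}

Compiled with mpicc hello.c -o hello and started via mpiexec -np 2 ./hello, this should show the same crash.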
If access to our cluster would help with debugging this, we would be happy to
provide an account.
Cheerio,
Luis
--
      \\\\\\
     (-0^0-)
--------------------------oOO--(_)--OOo-----------------------------
Luis Kornblueh                          Tel. : +49-40-41173289
Max-Planck-Institute for Meteorology    Fax. : +49-40-41173298
Bundesstr. 53
D-20146 Hamburg                         Email: luis.kornblueh_at_[hidden]
Federal Republic of Germany