Subject: Re: [OMPI users] Segmentation fault (11)
From: Josh Hursey (jjhursey_at_[hidden])
Date: 2009-06-16 13:39:56
(Sorry for the delay. I have been traveling and am just now getting
caught up on email.)
It looks like the checkpoint is corrupted. This can be caused by a
number of things. Usually it is caused by memory corruption in the
application that then further muddles the checkpoint generated. Are
you able to get a stack trace from the core dump resulting from the
segfault on restart?
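In case it helps, here is a minimal sketch of one way to pull a backtrace out of a core file. It assumes gdb is installed and that the core file is written to the current directory as "core" (the name and location vary by system). The helper name core_backtrace is hypothetical, not part of Open MPI or BLCR.

```shell
# Hypothetical helper: print a backtrace from a core file using gdb in
# batch mode. $1 is the binary that crashed, $2 is the core file.
core_backtrace() {
    cb_binary="$1"
    cb_core="$2"
    if [ ! -f "$cb_core" ]; then
        # Nothing to inspect; core dumps may be disabled (see ulimit -c).
        echo "no core file at $cb_core" >&2
        return 1
    fi
    # -batch exits after running the commands; -ex "bt" prints the stack.
    gdb -batch -ex "bt" "$cb_binary" "$cb_core"
}

# Usage (commented out, since it needs the failing setup):
#   ulimit -c unlimited                          # allow core dumps in this shell
#   ompi-restart ompi_global_snapshot_22390.ckpt # reproduce the segfault
#   core_backtrace "$(command -v opal-restart)" ./core
```

The trace is more readable if Open MPI was built with debugging symbols, since the frames in opal-restart will otherwise show only raw addresses.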
What do you mean by the checkpoint "hangs forever just before ending"?
Do you have to CTRL-C the application, or is the checkpoint just
taking a long time to finish?
-- Josh
On Jun 15, 2009, at 11:30 AM, Kritiraj Sajadah wrote:
>
> Dear All,
> I have installed BLCR 0.8.1 and Open MPI 1.3 on a Linux
> platform. However, when I tried checkpointing an application, it
> hangs forever just before ending.
>
> A checkpoint file is generated. However, when I try restarting it, I
> get the following error:
>
> raj_at_sun06:~$ ompi-restart ompi_global_snapshot_22390.ckpt
> [sun06:22423] *** Process received signal ***
> [sun06:22423] Signal: Segmentation fault (11)
> [sun06:22423] Signal code: Address not mapped (1)
> [sun06:22423] Failing at address: (nil)
> [sun06:22423] [ 0] [0xb7fb640c]
> [sun06:22423] [ 1] /usr/local/openmpi/lib/libopen-pal.so.0(opal_crs_blcr_restart+0x103) [0xb7f76925]
> [sun06:22423] [ 2] opal-restart [0x8049435]
> [sun06:22423] [ 3] /lib/libc.so.6(__libc_start_main+0xe5) [0xb7d9a455]
> [sun06:22423] [ 4] opal-restart [0x8049001]
> [sun06:22423] *** End of error message ***
> --------------------------------------------------------------------------
> mpirun noticed that process rank 0 with PID 22423 on node sun06
> exited on signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
>
> Any help would be much appreciated.
>
> kind regards,
>
> Raj