Subject: Re: [OMPI users] vfs_write returned -14
From: Josh Hursey (jjhursey_at_[hidden])
Date: 2009-06-16 20:42:07


Did you try checkpointing a non-MPI application with BLCR on the
cluster? If that does not work then I would suspect that BLCR is not
working properly on the system.

However, if a non-MPI application can be checkpointed and restarted
correctly on this machine, then it may be something odd with the Open
MPI installation or runtime environment. To help debug this, I would
need to know how Open MPI was configured and how the application was
run on the machine (command line arguments, environment variables, ...).

I should note that, for the program you sent, it is important to
compile Open MPI with the Fault Tolerance Thread enabled to ensure a
timely checkpoint. The program makes no MPI calls while it sleeps, so
without that thread the checkpoint will be delayed until the program
enters the MPI_Finalize function.

Let me know what you find out.

Josh

On Jun 16, 2009, at 5:08 PM, Kritiraj Sajadah wrote:

>
> Hi Josh,
>
> Thanks for the email. I have installed BLCR 0.8.1 and Open MPI 1.3 on
> my laptop with Ubuntu 8.04 on it. It works fine.
>
> I have now tried the installation on the cluster at my university (on
> one machine for now). The administrator installed it, and I am not
> sure whether he followed the steps I gave him.
>
> I am checkpointing a simple MPI application, which looks as follows:
>
> #include <mpi.h>
> #include <stdio.h>
> #include <stdlib.h>   /* declares system() */
>
> int main(int argc, char **argv)
> {
>     int rank, size;
>
>     MPI_Init(&argc, &argv);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>
>     /* Print and sleep three times to leave time for a checkpoint. */
>     printf("I am processor no %d of a total of %d procs \n", rank, size);
>     system("sleep 30");
>     printf("I am processor no %d of a total of %d procs \n", rank, size);
>     system("sleep 30");
>     printf("I am processor no %d of a total of %d procs \n", rank, size);
>     system("sleep 30");
>     printf("bye \n");
>
>     MPI_Finalize();
>     return 0;
> }
>
> Do you think it's better to reinstall BLCR?
>
>
> Thanks
>
> Raj
> --- On Tue, 6/16/09, Josh Hursey <jjhursey_at_[hidden]> wrote:
>
>> From: Josh Hursey <jjhursey_at_[hidden]>
>> Subject: Re: [OMPI users] vfs_write returned -14
>> To: "Open MPI Users" <users_at_[hidden]>
>> Date: Tuesday, June 16, 2009, 6:42 PM
>>
>> These are errors from BLCR. It may be a problem with your
>> BLCR installation and/or your application. Are you able to
>> checkpoint/restart a non-MPI application with BLCR on these
>> machines?
>>
>> What kind of MPI application are you trying to checkpoint?
>> Some of the MPI interfaces are not fully supported at the
>> moment (outlined in the FT User Document that I mentioned in
>> a previous email).
>>
>> -- Josh
>>
>> On Jun 16, 2009, at 11:30 AM, Kritiraj Sajadah wrote:
>>
>>>
>>> Dear All,
>>> I have installed Open MPI 1.3 and BLCR 0.8.1 on a Linux machine
>>> (Ubuntu). However, when I try checkpointing an MPI application, I
>>> get the following error:
>>>
>>> - vfs_write returned -14
>>> - file_header: write returned -14
>>>
>>> Can someone help, please?
>>>
>>> Regards,
>>>
>>> Raj
>>>
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users