From: Laurent Nguyen (laurent.nguyen_at_[hidden])
Date: 2007-05-10 12:35:36
Hi Tim,
Ok, thank you for all these clarifications. I also defined "static int
pls_poe_cancel_operation(void)" the same way you did, and the compilation
could continue. But then I hit another problem. In ompi/mpi/cxx/mpicxx.cc,
three constants clash with names that are already defined: the preprocessor
replaces them with the C constants of the same name. So I commented out
these lines:
//const int SEEK_SET = MPI_SEEK_SET;
//const int SEEK_CUR = MPI_SEEK_CUR;
//const int SEEK_END = MPI_SEEK_END;
After that, I managed to compile OpenMPI. I didn't try to launch it
in rsh mode, but I did try to launch it with POE.
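
For anyone hitting the same clash, here is a minimal sketch of what I believe is going on, and of a gentler workaround than commenting the lines out. It assumes the cause is that <stdio.h> defines SEEK_SET, SEEK_CUR and SEEK_END as macros, so a declaration such as "const int SEEK_SET = ..." no longer declares a name once the preprocessor has run. The namespace demo_mpi and the DEMO_MPI_SEEK_* values are hypothetical stand-ins, not the real Open MPI definitions.

```cpp
#include <cassert>
#include <cstdio>   // brings in the C macros SEEK_SET, SEEK_CUR, SEEK_END

// Hypothetical stand-ins for MPI_SEEK_SET and friends (arbitrary values).
#define DEMO_MPI_SEEK_SET 1000
#define DEMO_MPI_SEEK_CUR 1001
#define DEMO_MPI_SEEK_END 1002

// Keep the C values reachable under other names before removing the macros.
static const int c_seek_set = SEEK_SET;
static const int c_seek_cur = SEEK_CUR;
static const int c_seek_end = SEEK_END;

// Without these three lines, the declarations below would not compile,
// because each SEEK_* name would be replaced by its C macro value.
#undef SEEK_SET
#undef SEEK_CUR
#undef SEEK_END

namespace demo_mpi {
// Same shape as the lines commented out in mpicxx.cc, in a demo namespace.
const int SEEK_SET = DEMO_MPI_SEEK_SET;
const int SEEK_CUR = DEMO_MPI_SEEK_CUR;
const int SEEK_END = DEMO_MPI_SEEK_END;
}
```

The trade-off is that the plain C macros are no longer available after the #undef lines in that translation unit, which is why the sketch saves their values first.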
But first, let me recall my experience with OpenMPI 1.1.x on IBM. My
machine has some restrictions, but I have two ways of launching an
application:
- interactive mode: OpenMPI didn't work in this mode. I got this error:
$ export MP_PROCS=2
$ mpiexec -n 2 myprog.exe
ERROR: 0031-125 Fewer nodes (1) specified in
/tmpdir/inter/int.ssos181-130093928631562/a-UWUb than tasks (2).
I think it is because of my machine configuration
- batch mode (for queuing): OpenMPI worked, but some functions didn't
work (like MPI_Comm_spawn). Communication performance also seemed
very bad (although within a node it matched the performance of the
vendor's MPI).
So I hoped OpenMPI 1.2.x would work on the SP4, but I have the same problem
in interactive mode. And in batch mode, I get this error:
[0,0,0] ORTE_ERROR_LOG: Not implemented in file errmgr_hnp.c at line 90
--------------------------------------------------------------------------
mpiexec was unable to cleanly terminate the daemons for this job.
Returned value Not implemented instead of ORTE_SUCCESS.
--------------------------------------------------------------------------
I think it is as you said before: POE isn't implemented yet.
I was interested in OpenMPI because it supports MPI-2. Since OpenMPI
1.1.1, I have installed every version on my SP4 for testing. My impressions are:
- it seems to be very difficult for the developers to support OpenMPI on
SP4, and I hope one day they manage it ;)
- in my context, my institution puts many restrictions on the use of our
machine, which is why my tests are incomplete. (Likewise, the rsh
command is forbidden between our nodes...)
So, thank you very much for your explanations and clarifications.
Best Regards,
**************************************
NGUYEN Anh-Khai Laurent
User Support Team
Email : laurent.nguyen_at_[hidden]
Tel : 01.69.35.85.66
Address : IDRIS - Institut du Développement et des Ressources en
Informatique Scientifique
CNRS
Batiment 506
BP 167
F - 91403 ORSAY Cedex
Website : http://www.idris.fr
**************************************
Tim Prins wrote:
> Hi Laurent,
>
> Unfortunately, as far as I know, none of the current Open MPI developers has
> access to a system with POE, so the POE process launcher has fallen into
> disrepair. Attached is a patch that should allow you to compile (however, you
> may also need to add #include <signal.h> to pls_poe_module.c).
>
> Though this should allow the compile to succeed, launching with POE may not
> work (it has not been tested for quite a while). If it doesn't work, you
> should use the rsh launcher instead (pass -mca pls rsh on the command line,
> or set the parameter using one of the methods here:
> http://www.open-mpi.org/faq/?category=tuning#setting-mca-params).
>
> Sorry about this. We have an IBM machine at my institution which I am told
> will have POE on it 'soon', but I am not sure when. Once it does, we will be
> working on getting POE well supported again.
>
> I should mention that we do use LoadLeveler on one of our machines and Open
> MPI seems to work with it quite well. I would be interested in hearing how it
> works for you.
>
> Hope this helps, let me know if this works.
>
> Thanks,
>
> Tim
>
> On Thursday 10 May 2007 02:57 am, Laurent Nguyen wrote:
>> Hello,
>>
>> I tried to install OpenMPI 1.2 but ran into some problems when
>> compiling the POE files. When OpenMPI 1.2.1 was released, I saw in the
>> bug fixes that this problem was fixed. So I tried again, but it still
>> doesn't work. The problem comes from orte/mca/pls/poe/pls_poe_module.c.
>> A static function "static int pls_poe_cancel_operation(void);" is
>> declared but not defined in the files. I don't know whether my
>> configuration is what triggers the bug.
>>
>> So, if someone has managed to install OpenMPI 1.2.1 on IBM, I would like
>> some advice.
>>
>> Thank you for your help,
>>
>> PS: I attached some output files of my installation
>>
>> ------------------------------------------------------------------------
>>
>> Index: orte/mca/pls/poe/pls_poe_module.c
>> ===================================================================
>> --- orte/mca/pls/poe/pls_poe_module.c (revision 14640)
>> +++ orte/mca/pls/poe/pls_poe_module.c (working copy)
>> @@ -37,6 +37,7 @@
>> #include "opal/mca/base/mca_base_param.h"
>> #include "opal/util/argv.h"
>> #include "opal/util/opal_environ.h"
>> +#include "opal/util/output.h"
>>
>> #include "orte/mca/errmgr/errmgr.h"
>> #include "orte/mca/gpr/gpr.h"
>> @@ -69,7 +70,10 @@
>> static int pls_poe_signal_job(orte_jobid_t jobid, int32_t signal, opal_list_t *attrs);
>> static int pls_poe_signal_proc(const orte_process_name_t *name, int32_t signal);
>> static int pls_poe_finalize(void);
>> -static int pls_poe_cancel_operation(void);
>> +static int pls_poe_cancel_operation(void) {
>> + return ORTE_ERR_NOT_IMPLEMENTED;
>> +}
>> +
>>
>> orte_pls_base_module_t orte_pls_poe_module = {
>> pls_poe_launch_job,