From: Andrew Friedley (afriedle_at_[hidden])
Date: 2007-05-09 10:02:52
OK, strange but good. Yeah I wouldn't be surprised if something has
been changed, though I wouldn't know what, and I don't have time right
now to go digging :( Maybe Don Kerr knows something?
Andrew
Boris Bierbaum wrote:
> I've run the whole IMB Benchmark Suite on 2, 3, and 4 nodes with 2
> processes per node and --mca btl udapl,self. I didn't encounter any problems.
>
> The comment above line 197 says that dat_ep_query() returns wrong port
> numbers (which it does indeed), but I can't find any call to
> dat_ep_query() in the uDAPL BTL code. Maybe the comment is out of date?
>
> Boris
>
>
> Andrew Friedley wrote:
>> You say that fixes the problem; does it work even when running more than
>> one MPI process per node? (That is the case the hack fixes.) Simply
>> running mpirun with a -np parameter higher than the number of nodes you
>> have set up should trigger this case, provided you use '-mca btl
>> udapl,self' (i.e., not SM or anything else).
>>
>> Andrew
>>
>> Boris Bierbaum wrote:
>>> It has been explained in a different thread on [ofa-general] that the
>>> problem lies in a combination of the OpenIB-cma provider not setting the
>>> local and remote port numbers on endpoints correctly and Open MPI
>>> stepping over the IA to save the port number to circumvent this problem,
>>> thereby confusing the provider.
>>>
>>> I commented out line 197 in ompi/mca/btl/udapl/btl_udapl.c (Open MPI
>>> 1.2.1 release) and this fixes the problem. As the problem in the
>>> provider is currently being fixed, the whole saving of the port number
>>> in the uDAPL BTL code will be unnecessary in the future.
>>>
>>> Steve Wise wrote:
>>>>>> Can the UDAPL OFED wizards shed any light on the error messages that
>>>>>> are listed below? In particular, these seem to be worrisome:
>>>>>>
>>>>>>> setup_listener Permission denied
>>>>>> setup_listener Address already in use
>>>>> These failures are from rdma_cm_bind indicating the port is already
>>>>> bound to this IA address. How are you creating the service point?
>>>>> dat_psp_create or dat_psp_create_any? If it is psp_create_any then you
>>>>> will see some failures until it gets to a free port. That is normal.
>>>>> Just make sure your create call returns DAT_SUCCESS.
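The "try until a free port is found" behaviour described above can be illustrated with a minimal sketch. Note the assumptions: plain TCP sockets stand in for the rdma_cm here, and bind_first_free_port() is a hypothetical helper, not part of uDAPL, the OFED provider, or Open MPI. It only shows why "Address already in use" and "Permission denied" messages are expected noise as long as the overall call eventually succeeds.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not a uDAPL or rdma_cm call): try each port in
 * [first, last] until bind() succeeds. Returns the bound socket and
 * stores the chosen port in *port_out, or returns -1 on failure. */
static int bind_first_free_port(unsigned short first, unsigned short last,
                                unsigned short *port_out)
{
    for (unsigned int port = first; port <= last; port++) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;

        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        addr.sin_port = htons((unsigned short)port);

        if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0) {
            *port_out = (unsigned short)port;
            return fd;              /* success: caller owns the socket */
        }

        int err = errno;            /* save errno before close() can change it */
        close(fd);

        /* A busy or privileged port is expected while scanning;
         * any other error is treated as fatal. */
        if (err != EADDRINUSE && err != EACCES)
            return -1;
        fprintf(stderr, "port %u: %s (trying next)\n", port, strerror(err));
    }
    return -1;                      /* no free port in the range */
}
```

Calling bind_first_free_port(20000, 20100, &port) twice without closing the first socket prints one "Address already in use" line on the second call and still returns a valid socket on the next port.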
>>>>>
>>>> Arlin, why doesn't dapl_psp_create_any() just pass a port of zero down
>>>> and let the rdma-cma pick an available port number?
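The port-zero idea asked about above can be sketched as follows. This is an analogy using ordinary TCP sockets, where binding to port 0 lets the kernel pick a free port; whether the provider can do the same through the rdma_cm is exactly the open question in this thread. bind_any_port() is a hypothetical name introduced only for this example.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not a uDAPL or rdma_cm call): bind to port 0 so
 * the kernel chooses a free port, then read the chosen port back.
 * Returns the bound socket, or -1 on failure. */
static int bind_any_port(unsigned short *port_out)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);       /* 0 means "any available port" */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) != 0) {
        close(fd);
        return -1;
    }

    /* Ask the kernel which port it actually assigned. */
    socklen_t len = sizeof(addr);
    if (getsockname(fd, (struct sockaddr *)&addr, &len) != 0) {
        close(fd);
        return -1;
    }

    *port_out = ntohs(addr.sin_port);
    return fd;
}
```

With this approach there is no scan and therefore no stream of expected bind failures; the cost is that the caller has to query the assigned port afterwards and publish it to its peers.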
>>>>