Subject: Re: [OMPI users] Problem with OpenMPI (MX btl and mtl) and threads
From: Brian Barrett (brbarret_at_[hidden])
Date: 2009-06-11 14:21:01
Almost assuredly, the MTL is not thread safe, and such support is
unlikely to happen in the short term. You might be better off
concentrating on the BTL, as George has done significant work on that
front.
Brian
On Jun 11, 2009, at 12:20 PM, François Trahay wrote:
> The stack trace is from the MX MTL (I attach the backtraces I get
> with both the MX MTL and the MX BTL).
> Here is the program that I use. It is quite simple: it runs
> ping-pongs concurrently (with one thread per node, then with two
> threads per node, etc.).
> The error occurs when two threads run concurrently.
>
> Francois
>
> Scott Atchley wrote:
>> Brian and George,
>>
>> I do not know if the stack trace is complete, but I do not see any
>> mx_* functions called which would indicate a crash inside MX due to
>> multiple threads trying to complete the same request. It does show
>> a failed assertion, however.
>>
>> Francois, is the stack trace from the MX MTL or BTL? Can you send a
>> small program that reproduces this abort?
>>
>> Scott
>>
>>
>> On Jun 11, 2009, at 12:25 PM, Brian Barrett wrote:
>>
>>> Neither the CM PML nor the MX MTL has been looked at for thread
>>> safety. There's not much code to cause problems in the CM PML.
>>> The MX MTL would likely need some work to ensure the restrictions
>>> Scott mentioned are met (currently, there's no such guarantee in
>>> the MX MTL).
>>>
>>> Brian
>>>
>>> On Jun 11, 2009, at 10:21 AM, George Bosilca wrote:
>>>
>>>> The comment in the FAQ (and on the other thread) is only true for
>>>> some BTLs (TCP, SM, and MX). I don't have the resources to test
>>>> the other BTLs; it is their developers' responsibility to make
>>>> the modifications required for thread safety.
>>>>
>>>> In addition, I have to confess that I never tested the MTL for
>>>> thread safety. It is a completely different implementation of
>>>> message passing, meant to map directly onto the underlying
>>>> network's capabilities. However, there are clearly a few places
>>>> where thread safety should be enforced in the MTL layer, and I
>>>> don't know whether that is the case.
>>>>
>>>> george.
>>>>
>>>> On Jun 11, 2009, at 09:35, Scott Atchley wrote:
>>>>
>>>>> Francois,
>>>>>
>>>>> For threads, the FAQ has:
>>>>>
>>>>> http://www.open-mpi.org/faq/?category=supported-systems#thread-support
>>>>>
>>>>> It mentions that thread support is designed in, but lightly
>>>>> tested. It is also possible that the FAQ is out of date and
>>>>> MPI_THREAD_MULTIPLE is fully supported.
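>>>>>
>>>>> For what it's worth, the provided thread level can be checked at
>>>>> run time. A minimal sketch (assuming the usual argc/argv from
>>>>> main()):
>>>>>
>>>>> int provided;
>>>>> MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
>>>>> /* provided holds the level the library actually supports */
>>>>> if (provided < MPI_THREAD_MULTIPLE)
>>>>> fprintf(stderr, "only thread level %d provided\n", provided);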
>>>>>
>>>>> The stack trace below shows:
>>>>>
>>>>> opal_free()
>>>>> opal_progress()
>>>>> MPI_Recv()
>>>>>
>>>>> I do not know this code, but the problem may be in the
>>>>> higher-level code that calls the BTLs and/or MTLs; that would be
>>>>> the place to check whether the TCP BTL is handled differently
>>>>> from the MX BTL/MTL.
>>>>>
>>>>> MX is thread safe, with the caveat that two threads may not try
>>>>> to complete the same request at the same time. This includes
>>>>> calling mx_test(), mx_wait(), mx_test_any() and/or mx_wait_any(),
>>>>> where the latter two take match bits and a match mask that could
>>>>> complete a request being tested/waited on by another thread.
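>>>>>
>>>>> To illustrate the restriction at the MPI level (a sketch only;
>>>>> buf, len, peer and my_tag stand for whatever the caller uses),
>>>>> each thread should complete only the requests it posted itself:
>>>>>
>>>>> MPI_Request req;
>>>>> MPI_Irecv(buf, len, MPI_CHAR, peer, my_tag, MPI_COMM_WORLD, &req);
>>>>> /* safe: no other thread calls MPI_Test/MPI_Wait on req */
>>>>> MPI_Wait(&req, MPI_STATUS_IGNORE);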
>>>>>
>>>>> Scott
>>>>>
>>>>> On Jun 11, 2009, at 6:00 AM, François Trahay wrote:
>>>>>
>>>>>> Well, according to George Bosilca (http://www.open-mpi.org/community/lists/users/2005/02/0005.php
>>>>>> ), threads are supported in OpenMPI.
>>>>>> The program I am trying to run works with the TCP stack, and the
>>>>>> MX driver is thread-safe, so I guess the problem comes from the
>>>>>> MX BTL or MTL.
>>>>>>
>>>>>> Francois
>>>>>>
>>>>>>
>>>>>> Scott Atchley wrote:
>>>>>>> Hi Francois,
>>>>>>>
>>>>>>> I am not familiar with the internals of the OMPI code. Are you
>>>>>>> sure, however, that threads are fully supported yet? I was
>>>>>>> under the impression that thread support was still partial.
>>>>>>>
>>>>>>> Can anyone else comment?
>>>>>>>
>>>>>>> Scott
>>>>>>>
>>>>>>> On Jun 8, 2009, at 8:43 AM, François Trahay wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>> I'm encountering some issues when running a multithreaded
>>>>>>>> program with OpenMPI (trunk rev. 21380, configured with
>>>>>>>> --enable-mpi-threads).
>>>>>>>> My program (included in the tar.bz2) uses several pthreads
>>>>>>>> that perform ping-pongs concurrently (thread #1 uses tag #1,
>>>>>>>> thread #2 uses tag #2, etc.).
>>>>>>>> This program crashes over MX (either BTL or MTL) with the
>>>>>>>> following backtrace:
>>>>>>>>
>>>>>>>> concurrent_ping_v2: pml_cm_recvreq.c:53: mca_pml_cm_recv_request_completion: Assertion `0 == ((mca_pml_cm_thin_recv_request_t*)base_request)->req_base.req_pml_complete' failed.
>>>>>>>> [joe0:01709] *** Process received signal ***
>>>>>>>> [joe0:01709] *** Process received signal ***
>>>>>>>> [joe0:01709] Signal: Segmentation fault (11)
>>>>>>>> [joe0:01709] Signal code: Address not mapped (1)
>>>>>>>> [joe0:01709] Failing at address: 0x1238949c4
>>>>>>>> [joe0:01709] Signal: Aborted (6)
>>>>>>>> [joe0:01709] Signal code: (-6)
>>>>>>>> [joe0:01709] [ 0] /lib/libpthread.so.0 [0x7f57240be7b0]
>>>>>>>> [joe0:01709] [ 1] /lib/libc.so.6(gsignal+0x35) [0x7f5722cba065]
>>>>>>>> [joe0:01709] [ 2] /lib/libc.so.6(abort+0x183) [0x7f5722cbd153]
>>>>>>>> [joe0:01709] [ 3] /lib/libc.so.6(__assert_fail+0xe9) [0x7f5722cb3159]
>>>>>>>> [joe0:01709] [ 0] /lib/libpthread.so.0 [0x7f57240be7b0]
>>>>>>>> [joe0:01709] [ 1] /home/ftrahay/sources/openmpi/trunk/install//lib/libopen-pal.so.0 [0x7f57238d0a08]
>>>>>>>> [joe0:01709] [ 2] /home/ftrahay/sources/openmpi/trunk/install//lib/libopen-pal.so.0 [0x7f57238cf8cc]
>>>>>>>> [joe0:01709] [ 3] /home/ftrahay/sources/openmpi/trunk/install//lib/libopen-pal.so.0(opal_free+0x4e) [0x7f57238bdc69]
>>>>>>>> [joe0:01709] [ 4] /home/ftrahay/sources/openmpi/trunk/install/lib/openmpi/mca_mtl_mx.so [0x7f572060b72f]
>>>>>>>> [joe0:01709] [ 5] /home/ftrahay/sources/openmpi/trunk/install//lib/libopen-pal.so.0(opal_progress+0xbc) [0x7f57238948e0]
>>>>>>>> [joe0:01709] [ 6] /home/ftrahay/sources/openmpi/trunk/install/lib/openmpi/mca_pml_cm.so [0x7f572081145a]
>>>>>>>> [joe0:01709] [ 7] /home/ftrahay/sources/openmpi/trunk/install/lib/openmpi/mca_pml_cm.so [0x7f57208113b7]
>>>>>>>> [joe0:01709] [ 8] /home/ftrahay/sources/openmpi/trunk/install/lib/openmpi/mca_pml_cm.so [0x7f57208112e7]
>>>>>>>> [joe0:01709] [ 9] /home/ftrahay/sources/openmpi/trunk/install//lib/libmpi.so.0(MPI_Recv+0x2bc) [0x7f5723e07690]
>>>>>>>> [joe0:01709] [10] ./concurrent_ping_v2(client+0x123) [0x401404]
>>>>>>>> [joe0:01709] [11] /lib/libpthread.so.0 [0x7f57240b6faa]
>>>>>>>> [joe0:01709] [12] /lib/libc.so.6(clone+0x6d) [0x7f5722d5629d]
>>>>>>>> [joe0:01709] *** End of error message ***
>>>>>>>> [joe0:01709] [ 4] /home/ftrahay/sources/openmpi/trunk/install/lib/openmpi/mca_pml_cm.so [0x7f57208120bb]
>>>>>>>> [joe0:01709] [ 5] /home/ftrahay/sources/openmpi/trunk/install/lib/openmpi/mca_mtl_mx.so [0x7f572060b80a]
>>>>>>>> [joe0:01709] [ 6] /home/ftrahay/sources/openmpi/trunk/install//lib/libopen-pal.so.0(opal_progress+0xbc) [0x7f57238948e0]
>>>>>>>> [joe0:01709] [ 7] /home/ftrahay/sources/openmpi/trunk/install/lib/openmpi/mca_pml_cm.so [0x7f572081147a]
>>>>>>>> [joe0:01709] [ 8] /home/ftrahay/sources/openmpi/trunk/install/lib/openmpi/mca_pml_cm.so [0x7f57208113b7]
>>>>>>>> [joe0:01709] [ 9] /home/ftrahay/sources/openmpi/trunk/install/lib/openmpi/mca_pml_cm.so [0x7f57208112e7]
>>>>>>>> [joe0:01709] [10] /home/ftrahay/sources/openmpi/trunk/install//lib/libmpi.so.0(MPI_Recv+0x2bc) [0x7f5723e07690]
>>>>>>>> [joe0:01709] [11] ./concurrent_ping_v2(client+0x123) [0x401404]
>>>>>>>> [joe0:01709] [12] /lib/libpthread.so.0 [0x7f57240b6faa]
>>>>>>>> [joe0:01709] [13] /lib/libc.so.6(clone+0x6d) [0x7f5722d5629d]
>>>>>>>> [joe0:01709] *** End of error message ***
>>>>>>>> --------------------------------------------------------------------------
>>>>>>>> mpirun noticed that process rank 1 with PID 1709 on node joe0 exited on signal 6 (Aborted).
>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>
>>>>>>>>
>>>>>>>> Any idea?
>>>>>>>>
>>>>>>>> Francois Trahay
>>>>>>>>
>>>>>>>> <bug-report.tar.bz2>
>
> ftrahay_at_joe0 mpiexec --mca pml cm --mca btl mx -machinefile ./joe -np 2 ./concurrent_ping
> [1 communicating threads]
> thread #0
> [1 communicating threads]
> [0] 1 3.071749 0.326 0.310
> [0] 2 3.065395 0.652 0.622
> [0] 4 3.064346 1.305 1.245
> [0] 8 3.058898 2.615 2.494
> [2 communicating threads]
> thread #1
> [2 communicating threads]
> concurrent_ping: pml_cm_recvreq.c:53: mca_pml_cm_recv_request_completion: Assertion `0 == ((mca_pml_cm_thin_recv_request_t*)base_request)->req_base.req_pml_complete' failed.
> [joe0:16355] *** Process received signal ***
> [joe0:16355] Signal: Segmentation fault (11)
> [joe0:16355] Signal code: Address not mapped (1)
> [joe0:16355] Failing at address: 0x1b3f769c4
> [joe0:16355] *** Process received signal ***
> [joe0:16355] Signal: Aborted (6)
> [joe0:16355] Signal code: (-6)
> [joe0:16355] [ 0] /lib/libpthread.so.0 [0x7f64b34b07b0]
> [joe0:16355] [ 1] /lib/libc.so.6(gsignal+0x35) [0x7f64b3181065]
> [joe0:16355] [ 2] /lib/libc.so.6(abort+0x183) [0x7f64b3184153]
> [joe0:16355] [ 3] /lib/libc.so.6(__assert_fail+0xe9) [0x7f64b317a159]
> [joe0:16355] [ 0] /lib/libpthread.so.0 [0x7f64b34b07b0]
> [joe0:16355] [ 1] /home/ftrahay/sources/openmpi/trunk/install//lib/libopen-pal.so.0 [0x7f64b3fb2a08]
> [joe0:16355] [ 2] /home/ftrahay/sources/openmpi/trunk/install//lib/libopen-pal.so.0 [0x7f64b3fb18cc]
> [joe0:16355] [ 3] /home/ftrahay/sources/openmpi/trunk/install//lib/libopen-pal.so.0(opal_free+0x4e) [0x7f64b3f9fc69]
> [joe0:16355] [ 4] /home/ftrahay/sources/openmpi/trunk/install/lib/openmpi/mca_mtl_mx.so [0x7f64b0ad272f]
> [joe0:16355] [ 5] /home/ftrahay/sources/openmpi/trunk/install//lib/libopen-pal.so.0(opal_progress+0xbc) [0x7f64b3f768e0]
> [joe0:16355] [ 6] /home/ftrahay/sources/openmpi/trunk/install/lib/openmpi/mca_pml_cm.so [0x7f64b0cd845a]
> [joe0:16355] [ 7] /home/ftrahay/sources/openmpi/trunk/install/lib/openmpi/mca_pml_cm.so [0x7f64b0cd83b7]
> [joe0:16355] [ 8] /home/ftrahay/sources/openmpi/trunk/install/lib/openmpi/mca_pml_cm.so [0x7f64b0cd82e7]
> [joe0:16355] [ 9] /home/ftrahay/sources/openmpi/trunk/install//lib/libmpi.so.0(MPI_Recv+0x2bc) [0x7f64b44e9690]
> [joe0:16355] [10] ./concurrent_ping(client+0xf5) [0x401185]
> [joe0:16355] [11] /lib/libpthread.so.0 [0x7f64b34a8faa]
> [joe0:16355] [12] /lib/libc.so.6(clone+0x6d) [0x7f64b321d29d]
> [joe0:16355] *** End of error message ***
> [joe0:16355] [ 4] /home/ftrahay/sources/openmpi/trunk/install/lib/openmpi/mca_pml_cm.so [0x7f64b0cd90bb]
> [joe0:16355] [ 5] /home/ftrahay/sources/openmpi/trunk/install/lib/openmpi/mca_mtl_mx.so [0x7f64b0ad280a]
> [joe0:16355] [ 6] /home/ftrahay/sources/openmpi/trunk/install//lib/libopen-pal.so.0(opal_progress+0xbc) [0x7f64b3f768e0]
> [joe0:16355] [ 7] /home/ftrahay/sources/openmpi/trunk/install/lib/openmpi/mca_pml_cm.so [0x7f64b0cd847a]
> [joe0:16355] [ 8] /home/ftrahay/sources/openmpi/trunk/install/lib/openmpi/mca_pml_cm.so [0x7f64b0cd83b7]
> [joe0:16355] [ 9] /home/ftrahay/sources/openmpi/trunk/install/lib/openmpi/mca_pml_cm.so [0x7f64b0cd82e7]
> [joe0:16355] [10] /home/ftrahay/sources/openmpi/trunk/install//lib/libmpi.so.0(MPI_Recv+0x2bc) [0x7f64b44e9690]
> [joe0:16355] [11] ./concurrent_ping(client+0xf5) [0x401185]
> [joe0:16355] [12] /lib/libpthread.so.0 [0x7f64b34a8faa]
> [joe0:16355] [13] /lib/libc.so.6(clone+0x6d) [0x7f64b321d29d]
> [joe0:16355] *** End of error message ***
> --------------------------------------------------------------------------
> mpiexec noticed that process rank 1 with PID 16355 on node joe0 exited on signal 6 (Aborted).
> --------------------------------------------------------------------------
> ftrahay_at_joe0$
> ftrahay_at_joe0$ mpiexec --mca pml cm --mca mtl mx -machinefile ./joe -np 2 ./concurrent_ping
> [1 communicating threads]
> thread #0
> [1 communicating threads]
> [0] 1 3.066409 0.326 0.311
> [0] 2 3.054011 0.655 0.625
> [0] 4 3.055394 1.309 1.249
> [0] 8 3.057003 2.617 2.496
> [2 communicating threads]
> thread #1
> [2 communicating threads]
> unknown request type 4
> concurrent_ping: pml_cm_recvreq.c:53: mca_pml_cm_recv_request_completion: Assertion `0 == ((mca_pml_cm_thin_recv_request_t*)base_request)->req_base.req_pml_complete' failed.
> [joe0:16337] *** Process received signal ***
> [joe0:16337] Signal: Aborted (6)
> [joe0:16337] Signal code: (-6)
> [joe0:16337] [ 0] /lib/libpthread.so.0 [0x7f5ed8efc7b0]
> [joe0:16337] [ 1] /lib/libc.so.6(gsignal+0x35) [0x7f5ed8bcd065]
> [joe0:16337] [ 2] /lib/libc.so.6(abort+0x183) [0x7f5ed8bd0153]
> [joe0:16337] [ 3] /lib/libc.so.6(__assert_fail+0xe9) [0x7f5ed8bc6159]
> [joe0:16337] [ 4] /home/ftrahay/sources/openmpi/trunk/install/lib/openmpi/mca_pml_cm.so [0x7f5ed67250bb]
> [joe0:16337] [ 5] /home/ftrahay/sources/openmpi/trunk/install/lib/openmpi/mca_mtl_mx.so [0x7f5ed651e80a]
> [joe0:16337] [ 6] /home/ftrahay/sources/openmpi/trunk/install//lib/libopen-pal.so.0(opal_progress+0xbc) [0x7f5ed99c28e0]
> [joe0:16337] [ 7] /home/ftrahay/sources/openmpi/trunk/install/lib/openmpi/mca_pml_cm.so [0x7f5ed672447a]
> [joe0:16337] [ 8] /home/ftrahay/sources/openmpi/trunk/install/lib/openmpi/mca_pml_cm.so [0x7f5ed67243b7]
> [joe0:16337] [ 9] /home/ftrahay/sources/openmpi/trunk/install/lib/openmpi/mca_pml_cm.so [0x7f5ed67242e7]
> [joe0:16337] [10] /home/ftrahay/sources/openmpi/trunk/install//lib/libmpi.so.0(MPI_Recv+0x2bc) [0x7f5ed9f35690]
> [joe0:16337] [11] ./concurrent_ping(client+0xf5) [0x401185]
> [joe0:16337] [12] /lib/libpthread.so.0 [0x7f5ed8ef4faa]
> [joe0:16337] [13] /lib/libc.so.6(clone+0x6d) [0x7f5ed8c6929d]
> [joe0:16337] *** End of error message ***
> --------------------------------------------------------------------------
> mpiexec noticed that process rank 1 with PID 16337 on node joe0 exited on signal 6 (Aborted).
> --------------------------------------------------------------------------
> ftrahay_at_joe0:$
> /*
> * NewMadeleine
> * Copyright (C) 2006 (see AUTHORS file)
> *
> * This program is free software; you can redistribute it and/or modify
> * it under the terms of the GNU General Public License as published by
> * the Free Software Foundation; either version 2 of the License, or
> * (at your option) any later version.
> *
> * This program is distributed in the hope that it will be useful, but
> * WITHOUT ANY WARRANTY; without even the implied warranty of
> * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> * General Public License for more details.
> */
> #include <stdlib.h>
> #include <stdio.h>
> #include <stdint.h>
> #include <string.h>
> #include <unistd.h>
> #include "mpi.h"
>
> #include <semaphore.h>
> #include <sched.h>
> #include <pthread.h>
>
> /* This program performs several ping-pongs in parallel.
>  * It evaluates the efficiency of accessing nmad from 1, 2, 3, ... n
>  * threads simultaneously.
>  */
>
> #define LEN_DEFAULT 4
> #define WARMUPS_DEFAULT 1000
> #define LOOPS_DEFAULT 10000
> #define THREADS_DEFAULT 16
> #define DATA_CONTROL_ACTIVATED 0
>
> static int comm_rank = -1;
> static int comm_size = -1;
> static char host_name[1024] = "";
>
> static int max_len = 16;
> static int loops;
> static int threads;
> static int warmups;
>
> static sem_t ready_sem;
>
> static int go;
>
> static __inline__
> uint32_t _next(uint32_t len, uint32_t multiplier, uint32_t increment)
> {
> if (!len)
> return 1+increment;
> else
> return len*multiplier+increment;
> }
>
> void usage_ping() {
> fprintf(stderr, "-L len - packet length [%d]\n", LEN_DEFAULT);
> fprintf(stderr, "-N iterations - iterations [%d]\n", LOOPS_DEFAULT);
> fprintf(stderr, "-T thread - number of communicating threads [%d]\n", THREADS_DEFAULT);
> fprintf(stderr, "-W warmup - number of warmup iterations [%d]\n", WARMUPS_DEFAULT);
> }
>
> static void fill_buffer(char *buffer, int len) {
> unsigned int i = 0;
>
> for (i = 0; i < len; i++) {
> buffer[i] = 'a'+(i%26);
> }
> }
>
> static void clear_buffer(char *buffer, int len) {
> memset(buffer, 0, len);
> }
>
> #if DATA_CONTROL_ACTIVATED
> static void control_buffer(char *msg, char *buffer, int len) {
> tbx_bool_t ok = tbx_true;
> unsigned char expected_char;
> unsigned int i = 0;
>
> for(i = 0; i < len; i++){
> expected_char = 'a'+(i%26);
>
> if(buffer[i] != expected_char){
> printf("Bad data at byte %d: expected %c, received %c\n",
> i, expected_char, buffer[i]);
> ok = tbx_false;
> }
> }
>
>
> if (!ok) {
> printf("Controling %s - ", msg);
> printf("%d bytes reception failed\n", len);
>
> TBX_FAILURE("data corruption");
> } else {
> printf("ok\n");
> }
> }
> #endif
>
>
> void *
> server(void* arg) {
> int my_pos = (int)(intptr_t)arg;
> char *buf = NULL;
> uint8_t tag = (uint8_t)(intptr_t)arg;
> int i, k;
> int len;
>
> buf = malloc(max_len);
> clear_buffer(buf, max_len);
> for(i = my_pos; i <= threads; i++) {
> /* Be sure all the communicating threads have been created before we start */
> while(go < i)
> sched_yield();
>
> for(len=1; len < max_len; len*=2){
> for(k = 0; k < loops + warmups; k++) {
>
> MPI_Recv(buf, len, MPI_CHAR, (comm_rank+1)%2, tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>
> #if DATA_CONTROL_ACTIVATED
> control_buffer("received", buf, len);
> #endif
> MPI_Send(buf, len, MPI_CHAR, (comm_rank+1)%2, tag, MPI_COMM_WORLD);
>
> }
> }
>
> sem_post(&ready_sem);
> }
> return NULL;
> }
>
> void *
> client(void *arg) {
> int my_pos = (int)(intptr_t)arg;
> uint8_t tag = (uint8_t)my_pos;
> char *buf = NULL;
> double t1, t2;
> double sum, lat, bw_million_byte, bw_mbyte;
> int i, k;
> int len;
>
> fprintf(stderr, "thread #%d\n", my_pos);
> buf = malloc(max_len);
> clear_buffer(buf, max_len);
>
> fill_buffer(buf, max_len);
> for(i = my_pos; i <= threads; i++) {
> /* Be sure all the communicating threads have been created before we start */
> while(go < i)
> sched_yield();
>
> for(len=1; len < max_len; len*=2){
> for(k = 0; k < warmups; k++) {
> #if DATA_CONTROL_ACTIVATED
> control_buffer("sending", buf, len);
> #endif
> MPI_Send(buf, len, MPI_CHAR, (comm_rank+1)%2, tag, MPI_COMM_WORLD);
>
> MPI_Recv(buf, len, MPI_CHAR, (comm_rank+1)%2, tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
> #if DATA_CONTROL_ACTIVATED
> control_buffer("received", buf, len);
> #endif
> }
>
> t1= MPI_Wtime();
>
> for(k = 0; k < loops; k++) {
> #if DATA_CONTROL_ACTIVATED
> control_buffer("sending", buf, len);
> #endif
> MPI_Send(buf, len, MPI_CHAR, (comm_rank+1)%2, tag, MPI_COMM_WORLD);
> MPI_Recv(buf, len, MPI_CHAR, (comm_rank+1)%2, tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
> #if DATA_CONTROL_ACTIVATED
> control_buffer("received", buf, len);
> #endif
> }
>
> t2 = MPI_Wtime();
>
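> /* sum: total round-trip time in microseconds over all loops;
>  * lat: one-way latency, i.e. half the average round trip;
>  * bw_million_byte: bytes per microsecond, i.e. MB/s (powers of ten);
>  * bw_mbyte: the same converted to MiB/s (1 MiB = 1.048576 MB). */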
> sum = (t2 - t1)*1e6;
>
> lat = sum / (2 * loops);
> bw_million_byte = len * (loops / (sum / 2));
> bw_mbyte = bw_million_byte / 1.048576;
>
> printf("[%d]\t%d\t%lf\t%8.3f\t%8.3f\n", my_pos, len, lat,
> bw_million_byte, bw_mbyte);
> fflush(stdout);
> }
>
> sem_post(&ready_sem);
> }
> return NULL;
> }
> int
> main(int argc,
> char **argv) {
> int i, j;
> pthread_t * pid;
> pthread_attr_t attr;
>
> //len = LEN_DEFAULT;
> loops = LOOPS_DEFAULT;
> threads = THREADS_DEFAULT;
> warmups = WARMUPS_DEFAULT;
>
> int provided;
> int needed = MPI_THREAD_MULTIPLE;
> MPI_Init_thread(&argc, &argv, needed, &provided);
> if(provided < needed){
> fprintf(stderr, "needed: %d, provided: %d\n", needed, provided);
> exit(1);
> }
> MPI_Comm_size(MPI_COMM_WORLD, &comm_size);
> MPI_Comm_rank(MPI_COMM_WORLD, &comm_rank);
>
>
> if (argc > 1 && !strcmp(argv[1], "--help")) {
> usage_ping();
> exit(0);
> }
>
> for(i=1 ; i<argc ; i+=2) {
> if (!strcmp(argv[i], "-N")) {
> loops = atoi(argv[i+1]);
> }
> else if (!strcmp(argv[i], "-L")) {
> //len = atoi(argv[i+1]);
> }
> else if (!strcmp(argv[i], "-T")) {
> threads = atoi(argv[i+1]);
> }
> else if (!strcmp(argv[i], "-W")) {
> warmups = atoi(argv[i+1]);
> }
> else {
> fprintf(stderr, "Illegal argument %s\n", argv[i]);
> usage_ping();
> exit(1);
> }
> }
>
> pthread_attr_init(&attr);
> pid = malloc(sizeof(pthread_t) * threads);
> sem_init(&ready_sem, 0, 0);
>
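> /* Spawn the threads one at a time: after creating thread i, wait
>  * until each of the i+1 running threads has posted ready_sem (one
>  * post per thread per round), then advance go so that they all
>  * enter the next round together. */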
> go = 0;
> for (i = 0 ; i< threads ; i++) {
> printf("[%d communicating threads]\n", i+1);
> if (comm_rank == 0) {
> pthread_create(&pid[i], &attr, server, (void*)(intptr_t)i);
> } else {
> pthread_create(&pid[i], &attr, client, (void*)(intptr_t)i);
> }
>
> for( j = 0; j <= i; j++){
> sem_wait(&ready_sem);
> go=j;
> }
> go++;
> }
>
> for(i=0;i<threads;i++)
> pthread_join(pid[i],NULL);
>
> MPI_Finalize();
> exit(0);
> }
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users