Subject: Re: [OMPI users] Valgrind writev() errors with 1.3.2.
From: tom fogal (tfogal_at_[hidden])
Date: 2009-06-09 01:41:16


George Bosilca <bosilca_at_[hidden]> writes:
> There is a whole page on valgrind web page about this topic. Please
> read http://valgrind.org/docs/manual/manual-core.html#manual-core.suppress
> for more information.

Even better, Ralph (et al.), would be if we could just make valgrind
think this is defined memory. One can do this with client requests:

  http://valgrind.org/docs/manual/mc-manual.html#mc-manual.clientreqs

In particular, see the VALGRIND_MAKE_MEM_DEFINED request. This would
prevent valgrind from warning about the buffer, without having to
memset the whole thing or similar.
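
To make the idea concrete, here is a rough sketch of what I have in
mind. The struct and the msg_buf_fill() helper are made up for
illustration and are not OMPI's actual types or code; only
VALGRIND_MAKE_MEM_DEFINED itself comes from valgrind/memcheck.h. I've
guarded the include so it still builds (as a no-op) on machines
without the valgrind headers, on the assumption that the compiler
supports __has_include.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Use the real client request when the valgrind headers are installed;
 * otherwise fall back to a no-op so the code still compiles. */
#if defined(__has_include)
#  if __has_include(<valgrind/memcheck.h>)
#    include <valgrind/memcheck.h>
#  endif
#endif
#ifndef VALGRIND_MAKE_MEM_DEFINED
#  define VALGRIND_MAKE_MEM_DEFINED(addr, len) ((void)(addr), (void)(len), 0)
#endif

/* Hypothetical message buffer, for illustration only. */
struct msg_buf {
  char   data[256];
  size_t used;      /* number of bytes filled in so far */
};

/* Hypothetical helper: copy a payload into the buffer, then tell
 * memcheck to treat the whole buffer as defined. Nothing is written
 * to the unused tail, so there is no memset cost. */
static size_t msg_buf_fill(struct msg_buf *b, const void *payload, size_t len) {
  assert(b != NULL && payload != NULL);
  if (len > sizeof(b->data)) {
    len = sizeof(b->data);  /* truncate rather than overflow */
  }
  memcpy(b->data, payload, len);
  b->used = len;
  (void)VALGRIND_MAKE_MEM_DEFINED(b->data, sizeof(b->data));
  return len;
}
```

Outside of valgrind the client request costs only a handful of
instructions. The catch is that memcheck will no longer flag a genuine
uninitialised read anywhere in the marked range, so the call wants to
be scoped as tightly as possible.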

Is requesting that be done here enough? Or shall I open a ticket?

Thanks,

-tom

> On Jun 8, 2009, at 15:24 , Ralph Castain wrote:
>
> > We deliberately choose to not initialize our msg buffers as this
> > takes considerable time. Instead, we fill in only the portion
> > required by a given message, and then send only that much of the
> > buffer. Thus, the uninitialized portion is ignored.
> >
> > I don't know of a way to tell valgrind to ignore it, I'm afraid -
> > perhaps a valgrind guru can be of help. :-/
> >
> > Ralph
> >
> >
> > On Mon, Jun 8, 2009 at 1:09 PM, tom fogal <tfogal_at_[hidden]>
> > wrote:
> > Hi all,
> >
> > I've configured a source build of OpenMPI 1.3.2 with valgrind enabled
> > [1], and I'm seeing a lot of errors with writev() when I run this
> > under valgrind. For example, with the following `hello, world' program:
> >
> > #include <stdio.h>
> > #include <mpi.h>
> >
> > int main(int argc, char *argv[]) {
> > MPI_Init(&argc, &argv);
> >
> > puts("Hello, world!");
> > MPI_Finalize();
> > return 0;
> > }
> >
> > I see errors like the following:
> >
> > ==12342== Syscall param writev(vector[...]) points to uninitialised
> > byte(s)
> > ==12342== at 0x61DF733: writev (in /lib/libc-2.7.so)
> > ==12342== by 0x7889AB9: mca_oob_tcp_msg_send_handler
> > (oob_tcp_msg.c:265)
> > ==12342== by 0x788B1A0: mca_oob_tcp_peer_send (oob_tcp_peer.c:197)
> > ==12342== by 0x788FF2A: mca_oob_tcp_send_nb (oob_tcp_send.c:167)
> > ==12342== by 0x767C7EC: orte_rml_oob_send (rml_oob_send.c:137)
> > ==12342== by 0x767D19A: orte_rml_oob_send_buffer (rml_oob_send.c:
> > 269)
> > ==12342== by 0x7C9F3DF: allgather (grpcomm_bad_module.c:369)
> > ==12342== by 0x7C9FD9E: modex (grpcomm_bad_module.c:497)
> > ==12342== by 0x4E6DCAF: ompi_mpi_init (ompi_mpi_init.c:626)
> >
> > The full vg log is appended [2]. Of course, I could just suppress
> > this error, but I get this for a lot (every?) MPI call which does
> > communication, it seems (broadcasts, sends, recv's, allgathers, etc.).
> > I'm worried a suppression would suppress too much / suppress an error
> > I've caused.
> >
> > Have others seen this? Can I suppress perhaps from the
> > orte_rml_oob_send_buffer down (safely)?
> >
> > -tom
> >
> > [1] configured via: gnu_pkg \
> > --enable-debug \
> > --enable-memchecker \
> > --disable-mpi-f77 \
> > --enable-pretty-print-stacktrace \
> > --enable-cxx-exceptions \
> > --enable-mpi-threads \
> > --with-valgrind=${PREFIX} \
> > --without-gm \
> > --without-mx \
> > --without-openib \
> > --without-psm \
> > --with-pic \
> > --with-gnu-ld
> > where gnu_pkg is basically a function which calls configure with
> > --prefix=${PREFIX}.
> >
> > [2]
> > ==12342== Memcheck, a memory error detector.
> > ==12342== Copyright (C) 2002-2008, and GNU GPL'd, by Julian Seward
> > et al.
> > ==12342== Using LibVEX rev 1884, a library for dynamic binary
> > translation.
> > ==12342== Copyright (C) 2004-2008, and GNU GPL'd, by OpenWorks LLP.
> > ==12342== Using valgrind-3.4.1, a dynamic binary instrumentation
> > framework.
> > ==12342== Copyright (C) 2000-2008, and GNU GPL'd, by Julian Seward
> > et al.
> > ==12342== For more details, rerun with: -v
> > ==12342==
> > ==12342== My PID = 12342, parent PID = 12341. Prog and args are:
> > ==12342== ./a.out
> > ==12342==
> > ==12342== Warning: client syscall munmap tried to modify addresses
> > 0xffffffffffffffff-0xffe
> > ==12342== Syscall param writev(vector[...]) points to uninitialised
> > byte(s)
> > ==12342== at 0x61DF733: writev (in /lib/libc-2.7.so)
> > ==12342== by 0x7889AB9: mca_oob_tcp_msg_send_handler
> > (oob_tcp_msg.c:265)
> > ==12342== by 0x788B1A0: mca_oob_tcp_peer_send (oob_tcp_peer.c:197)
> > ==12342== by 0x788FF2A: mca_oob_tcp_send_nb (oob_tcp_send.c:167)
> > ==12342== by 0x767C7EC: orte_rml_oob_send (rml_oob_send.c:137)
> > ==12342== by 0x767D19A: orte_rml_oob_send_buffer (rml_oob_send.c:
> > 269)
> > ==12342== by 0x7C9F3DF: allgather (grpcomm_bad_module.c:369)
> > ==12342== by 0x7C9FD9E: modex (grpcomm_bad_module.c:497)
> > ==12342== by 0x4E6DCAF: ompi_mpi_init (ompi_mpi_init.c:626)
> > ==12342== by 0x4EAAC88: PMPI_Init (pinit.c:80)
> > ==12342== by 0x400857: main (hello.c:5)
> > ==12342== Address 0x677697b is 107 bytes inside a block of size 256
> > alloc'd
> > ==12342== at 0x4C22A51: realloc (vg_replace_malloc.c:429)
> > ==12342== by 0x53DCBE0: opal_dss_buffer_extend
> > (dss_internal_functions.c:63)
> > ==12342== by 0x53DE4BA: opal_dss_copy_payload (dss_load_unload.c:
> > 164)
> > ==12342== by 0x7C9F314: allgather (grpcomm_bad_module.c:363)
> > ==12342== by 0x7C9FD9E: modex (grpcomm_bad_module.c:497)
> > ==12342== by 0x4E6DCAF: ompi_mpi_init (ompi_mpi_init.c:626)
> > ==12342== by 0x4EAAC88: PMPI_Init (pinit.c:80)
> > ==12342== by 0x400857: main (hello.c:5)
> > ==12342== Uninitialised value was created by a stack allocation
> > ==12342== at 0x53FFA60: opal_ifinit (if.c:147)
> > {
> > <insert a suppression name here>
> > Memcheck:Param
> > writev(vector[...])
> > fun:writev
> > fun:mca_oob_tcp_msg_send_handler
> > fun:mca_oob_tcp_peer_send
> > fun:mca_oob_tcp_send_nb
> > fun:orte_rml_oob_send
> > fun:orte_rml_oob_send_buffer
> > fun:allgather
> > fun:modex
> > fun:ompi_mpi_init
> > fun:PMPI_Init
> > fun:main
> > }
> > ==12342==
> > ==12342== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 307
> > from 3)
> > ==12342== malloc/free: in use at exit: 204,012 bytes in 2,022 blocks.
> > ==12342== malloc/free: 10,382 allocs, 8,360 frees, 14,603,162 bytes
> > allocated.
> > ==12342== For a detailed leak analysis, rerun with: --leak-check=yes
> > ==12342== For counts of detected errors, rerun with: -v
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users