Subject: Re: [OMPI users] Valgrind writev() errors with 1.3.2.
From: Ralph Castain (rhc_at_[hidden])
Date: 2009-06-09 18:31:49


Sounds like the better solution to me! And far less work... ;-)

Perhaps if we post a valgrind suppression file on the OMPI web site,
and/or include it in our releases, we could help users avoid the
problems. We could update the file as more areas are identified so we
eventually have a really good suppression file for people to use!

Make sense?
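
For anyone following along, here is a minimal C sketch of the kind of
situation George describes below: a buffer that is only partially
initialized gets handed to writev(). The struct, the function names and
the zero_first flag are all hypothetical and are not taken from the OMPI
source; the flag only stands in for the "valgrind-friendly" build idea.
It assumes a POSIX system and a compiler that pads the struct after the
one-byte field, which is typical but not guaranteed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical message layout. Most ABIs insert padding bytes after
 * 'tag' so that 'length' is aligned; assigning the two fields never
 * writes those padding bytes. */
struct demo_msg {
    uint8_t  tag;
    uint32_t length;
};

/* Fill a message. When zero_first is non-zero, every byte of the
 * struct (padding included) is defined before the fields are set. */
static void demo_msg_fill(struct demo_msg *m, int zero_first)
{
    if (zero_first) {
        memset(m, 0, sizeof *m);
    }
    m->tag = 7;
    m->length = 42;
}

/* Send the raw bytes of the struct with writev().
 * Returns the number of bytes written, or -1 on error. */
static ssize_t demo_msg_send(int fd, struct demo_msg *m)
{
    struct iovec iov;
    iov.iov_base = m;
    iov.iov_len  = sizeof *m;
    return writev(fd, &iov, 1);
}
```

With zero_first set to 0 the fields are still correct on the wire, but I
would expect Memcheck to complain about the padding bytes in the
writev() call. Zeroing costs a memset per message, which is the
performance hit we would rather not pay in a normal build.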

On Jun 9, 2009, at 1:22 PM, George Bosilca wrote:

> It is not as simple as it sounds. The problem does not come from the
> OOB; it just surfaces there. The header we add on the wire is well
> aligned and completely initialized. The problem comes from the
> buffer that the OOB TCP is asked to send, which is only
> partially initialized. This buffer is not something that the OOB can
> set, so the proposed approach will not work. Unfortunately, in order
> to completely remove these false positives, all layers using the OOB
> would have to be scanned to make sure that they avoid
> sending uninitialized data. This is way too much work for so
> little benefit.
>
> As the user level is not supposed to use the OOB to send data, all
> calls going through orte_rml_oob_send can be safely ignored by
> valgrind. I would advocate using the following suppression rule
> with valgrind. This will save a lot of output for the user, and save
> us (ompi developers) a lot of time!
>
> {
> ORTE OOB suppression rule
> Memcheck:Param
> writev(vector[...])
> fun:writev
> ...
> fun:orte_rml_oob_send
> ...
> fun:main
> }
>
> george.
>
> On Jun 9, 2009, at 11:01 , Ralph Castain wrote:
>
>> I can't speak to all of the OMPI code, but I can certainly create a
>> new configure option --valgrind-friendly that would initialize the
>> OOB comm buffers and other RTE-related memory to eliminate such
>> warnings.
>>
>> I would prefer to configure it out rather than adding a bunch of
>> "if-then" checks for envars to avoid having the performance hit
>> when not needed.
>>
>> Would that help?
>>
>> On Tue, Jun 9, 2009 at 11:40 AM, tom fogal <tfogal_at_[hidden]>
>> wrote:
>> jody <jody.xha_at_[hidden]> writes:
>> > I made a suppression file for the irrelevant memory leaks of ompi: I
>> > make no claim that it catches all possible ones, but it catches all
>> > that appear in my code.
>> [snip]
>>
>> Thanks, Jody.
>>
>> What are the chances something like this could be added / maintained in
>> the OpenMPI tree? It would be great to have something 1) maintained by
>> someone more knowledgeable about these errors than me, and 2) installed
>> by default when I set up my toolchain for parallel debugging.
>>
>> > On Tue, Jun 9, 2009 at 3:28 PM, Jeff Squyres <jsquyres_at_[hidden]> wrote:
>> > > This is worth adding to the FAQ.
>> > >
>> > > On Jun 9, 2009, at 2:31 AM, Ashley Pittman wrote:
>> > >
>> > >> On Mon, 2009-06-08 at 23:41 -0600, tom fogal wrote:
>> > >> > George Bosilca <bosilca_at_[hidden]> writes:
>> > >> > > There is a whole page on the valgrind web site about this
>> > >> > > topic. Please read
>> > >> > > http://valgrind.org/docs/manual/manual-core.html#manual-core.suppress
>> > >> > > for more information.
>> > >> >
>> > >> > Even better, Ralph (et al.), would be if we could just make
>> > >> > valgrind think this is defined memory. One can do this with
>> > >> > client requests:
>> > >> >
>> > >> > http://valgrind.org/docs/manual/mc-manual.html#mc-manual.clientreqs
>> > >>
>> > >> Using the Valgrind client requests unnecessarily is a very bad idea,
>> > >> they are intended for where applications use their own memory
>> > >> allocator (i.e. replace malloc/free) or are using custom kernel
>> > >> modules or hardware which Valgrind doesn't know about.
>>
>> Okay, sure, I realize it was a bit of an abuse of the intended use of
>> the tool.
>>
>> > >> The correct solution is either to not send un-initialised memory
>> > >> or to suppress the error using a suppression file as George
>> > >> said. As the error is from MPI_Init() you can safely ignore it
>> > >> from an end-user perspective.
>>
>> As I mentioned in my initial message, MPI_Init is only one such
>> error; I get them in a lot of MPI calls, seemingly anything that does
>> communication. Though I've heard differently on this list, this led me
>> to believe I was doing something wrong in my code.
>>
>> It seems like the only way I could verify that I'm not causing these
>> errors myself is to grok the call stacks I'm given for each vg error
>> and figure out where the uninitialized memory comes from, and then make
>> a judgement call for myself whether this makes sense to suppress. Or
>> I could mail the list about every error I see and ask for confirmation
>> that it's benign/suppressible. Most likely, I'll take the simple
>> approach and just use the suppression file I was given, but that's
>> prone to be fragile and break with a future OpenMPI release.
>>
>> What about an environment variable which enables slower,
>> valgrind-friendly behavior? There's precedent in other libraries, e.g.
>> glib [1].
>>
>> -tom
>>
>> [1] http://library.gnome.org/devel/glib/stable/glib-running.html
>>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users