Subject: Re: [OMPI users] Valgrind writev() errors with 1.3.2.
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-06-12 17:21:48


Yes, makes sense. I opened a ticket a few days ago that pointed to
the beginning of this thread. It's just a matter of someone actually
doing it. It might be useful to do both: maintain a suppression
file in the distribution and put something in the FAQ.
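
For the FAQ entry, it may help to show why writev() gets flagged even
when the header is fully initialized. The following is only a minimal C
sketch, not Open MPI code: demo_payload, fill_fields_only, fill_zeroed,
and send_with_header are hypothetical names made up for illustration,
and it assumes a typical ABI where the compiler places padding bytes
after the one-byte tag field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical payload. On most ABIs there are 3 padding bytes
 * between `tag` and `value`, so sizeof is 8 rather than 5. */
struct demo_payload {
    uint8_t  tag;
    uint32_t value;
};

/* Sets only the named fields. Any padding bytes keep whatever was in
 * memory before, i.e. the buffer is only partially initialized. */
void fill_fields_only(struct demo_payload *p, uint8_t tag, uint32_t value)
{
    p->tag = tag;
    p->value = value;
}

/* Zeroes the whole object first, then sets the fields, so the padding
 * bytes start out defined as well. */
void fill_zeroed(struct demo_payload *p, uint8_t tag, uint32_t value)
{
    memset(p, 0, sizeof *p);
    p->tag = tag;
    p->value = value;
}

/* Sends a fully initialized header plus the caller's payload with a
 * single writev(). Returns the number of bytes written, or -1. */
ssize_t send_with_header(int fd, uint32_t header, const struct demo_payload *p)
{
    struct iovec iov[2];

    assert(p != NULL);
    iov[0].iov_base = &header;
    iov[0].iov_len  = sizeof header;
    iov[1].iov_base = (void *)p;      /* writev() only reads from it */
    iov[1].iov_len  = sizeof *p;      /* includes the padding bytes */
    return writev(fd, iov, 2);
}
```

A caller that uses fill_fields_only() and then send_with_header() would
typically be reported by memcheck as a writev(vector[...]) parameter
error pointing at the padding, even though the sending function did
nothing wrong; the fill_zeroed() variant would not. The cost is one
memset per buffer, which is the trade-off discussed further down in
this thread.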

On Jun 9, 2009, at 6:31 PM, Ralph Castain wrote:

> Sounds like the better solution to me! And far less work... ;-)
>
> Perhaps if we post a valgrind suppression file on the OMPI web site,
> and/or include it in our releases, we could help users avoid the
> problems. We could update the file as more areas are identified so we
> eventually have a really good suppression file for people to use!
>
> Make sense?
>
> On Jun 9, 2009, at 1:22 PM, George Bosilca wrote:
>
> > It is not as simple as it sounds. The problem does not come from the
> > OOB; it just surfaces there. The header we add on the wire is well
> > aligned and completely initialized. The problem comes from the
> > buffer that the OOB TCP is asked to send, which is only partially
> > initialized. The OOB does not control the contents of this buffer,
> > so the proposed approach will not work. Unfortunately, to
> > completely remove these false positives, every layer using the OOB
> > would have to be audited to make sure it avoids sending
> > uninitialized data. This is far too much work for so little
> > benefit.
> >
> > As the user level is not supposed to use the OOB to send data, all
> > calls going through orte_rml_oob_send can be safely ignored by
> > valgrind. I advocate using the following suppression rule with
> > valgrind. It will save users a lot of output, and save us (ompi
> > developers) a lot of time!
> >
> > {
> > ORTE OOB suppression rule
> > Memcheck:Param
> > writev(vector[...])
> > fun:writev
> > ...
> > fun:orte_rml_oob_send
> > ...
> > fun:main
> > }
> >
> > george.
> >
> > On Jun 9, 2009, at 11:01 , Ralph Castain wrote:
> >
> >> I can't speak to all of the OMPI code, but I can certainly create a
> >> new configure option --valgrind-friendly that would initialize the
> >> OOB comm buffers and other RTE-related memory to eliminate such
> >> warnings.
> >>
> >> I would prefer to configure it out rather than adding a bunch of
> >> "if-then" checks for envars to avoid having the performance hit
> >> when not needed.
> >>
> >> Would that help?
> >>
> >> On Tue, Jun 9, 2009 at 11:40 AM, tom fogal <tfogal_at_[hidden]> wrote:
> >> jody <jody.xha_at_[hidden]> writes:
> >> > I made a suppression file for the irrelevant memory leaks of ompi:
> >> > I make no claim that it catches all possible ones, but it catches
> >> > all that appear in my code.
> >> [snip]
> >>
> >> Thanks, Jody.
> >>
> >> What are the chances something like this could be added /
> >> maintained in the OpenMPI tree? It would be great to have
> >> something 1) maintained by someone more knowledgeable about these
> >> errors than me, and 2) installed by default when I set up my
> >> toolchain for parallel debugging.
> >>
> >> > On Tue, Jun 9, 2009 at 3:28 PM, Jeff Squyres <jsquyres_at_[hidden]> wrote:
> >> > > This is worth adding to the FAQ.
> >> > >
> >> > > On Jun 9, 2009, at 2:31 AM, Ashley Pittman wrote:
> >> > >
> >> > >> On Mon, 2009-06-08 at 23:41 -0600, tom fogal wrote:
> >> > >> > George Bosilca <bosilca_at_[hidden]> writes:
> >> > >> > > There is a whole page on the valgrind web site about this
> >> > >> > > topic. Please read
> >> > >> > > http://valgrind.org/docs/manual/manual-core.html#manual-core.suppress
> >> > >> > > for more information.
> >> > >> >
> >> > >> > Even better, Ralph (et al.), is if we could just make valgrind
> >> > >> > think this is defined memory. One can do this with client
> >> > >> > requests:
> >> > >> >
> >> > >> > http://valgrind.org/docs/manual/mc-manual.html#mc-manual.clientreqs
> >> > >>
> >> > >> Using the Valgrind client requests unnecessarily is a very bad
> >> > >> idea; they are intended for cases where applications use their
> >> > >> own memory allocator (i.e. replace malloc/free) or use custom
> >> > >> kernel modules or hardware which Valgrind doesn't know about.
> >>
> >> Okay, sure, I realize it was a bit of an abuse of the intended use
> >> of the tool.
> >>
> >> > >> The correct solution is either to not send uninitialised memory
> >> > >> or to suppress the error using a suppression file as George
> >> > >> said. As the error is from MPI_Init() you can safely ignore it
> >> > >> from an end-user perspective.
> >>
> >> As I mentioned in my initial message, MPI_Init is only one such
> >> error; I get them in a lot of MPI calls, seemingly anything that
> >> does communication. Though I've heard differently on this list,
> >> this led me to believe I was doing something wrong in my code.
> >>
> >> It seems like the only way I could verify that I'm not causing
> >> these errors myself is to grok the call stacks I'm given for each
> >> vg error, figure out where the uninitialized memory comes from,
> >> and then make a judgement call for myself whether this makes sense
> >> to suppress. Or I could mail the list about every error I see and
> >> ask for confirmation that it's benign/suppressible. Most likely,
> >> I'll take the simple approach and just use the suppression file I
> >> was given, but that's prone to be fragile and break with a future
> >> OpenMPI release.
> >>
> >> What about an environment variable which enables slower,
> >> valgrind-friendly behavior? There's precedent in other libraries,
> >> e.g. glib [1].
> >>
> >> -tom
> >>
> >> [1] http://library.gnome.org/devel/glib/stable/glib-running.html
> >>
> >> _______________________________________________
> >> users mailing list
> >> users_at_[hidden]
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Jeff Squyres
Cisco Systems