Subject: Re: [OMPI users] Valgrind writev() errors with 1.3.2.
From: tom fogal (tfogal_at_[hidden])
Date: 2009-06-09 13:40:06


jody <jody.xha_at_[hidden]> writes:
> I made a suppression file for the irrelevant memory leaks of ompi: I
> make no claim that it catches all possible ones, but it catches all
> that appear in my code.
[snip]

Thanks, Jody.

What are the chances something like this could be added / maintained in
the OpenMPI tree? It would be great to have something 1) maintained by
someone more knowledgeable about these errors than me, and 2) installed
by default when I set up my toolchain for parallel debugging.

> On Tue, Jun 9, 2009 at 3:28 PM, Jeff Squyres<jsquyres_at_[hidden]> wrote:
> > This is worth adding to the FAQ.
> >
> > On Jun 9, 2009, at 2:31 AM, Ashley Pittman wrote:
> >
> >> On Mon, 2009-06-08 at 23:41 -0600, tom fogal wrote:
> >> > George Bosilca <bosilca_at_[hidden]> writes:
> >> > > There is a whole page on valgrind web page about this topic. Please
> >> > > read
> >> > > http://valgrind.org/docs/manual/manual-core.html#manual-core.suppress
> >> > >   for more information.
> >> >
> >> > Even better, Ralph (et al.), would be if we could just make valgrind think
> >> > this is defined memory.  One can do this with client requests:
> >> >
> >> >   http://valgrind.org/docs/manual/mc-manual.html#mc-manual.clientreqs
> >>
> >> Using the Valgrind client requests unnecessarily is a very bad idea,
> >> they are intended for where applications use their own memory allocator
> >> (i.e. replace malloc/free) or are using custom kernel modules or
> >> hardware which Valgrind doesn't know about.

Okay, sure, I realize it was a bit of an abuse of the intended use of
the tool.

> >> The correct solution is either to not send un-initialised memory
> >> or to suppress the error using a suppression file as George
> >> said.  As the error is from MPI_Init() you can safely ignore it
> >> from an end-user perspective.

As I mentioned in my initial message, MPI_Init is only one such
error; I get them in a lot of MPI calls, seemingly anything that does
communication. Though I've heard differently on this list, this led me
to believe I was doing something wrong in my code.

It seems like the only way I could verify that I'm not causing these
errors myself is to grok the call stacks I'm given for each vg error
and figure out where the uninitialized memory comes from, and then make
a judgement call about whether each one makes sense to suppress. Or
I could mail the list about every error I see and ask for confirmation
that it's benign/suppressible. Most likely, I'll take the simple
approach and just use the suppression file I was given, but that's
likely to be fragile and break with a future OpenMPI release.
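
To make "where the uninitialized memory comes from" concrete: one common
source of writev() warnings in general is struct padding that never gets
written before the struct goes out on the wire. I have not confirmed that
this is what happens inside OpenMPI; the struct and function names below
are made up purely for illustration. A minimal sketch of the "don't send
uninitialized memory" fix would be:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wire header; NOT an actual OpenMPI structure. */
struct msg_header {
    char type;    /* 1 byte; most ABIs insert padding after this field */
    int  length;  /* payload length in bytes */
};

/*
 * Fill in a header so that every byte of it is defined before it is
 * handed to write()/writev().  Assigning the fields alone would leave
 * the padding bytes untouched, and memcheck would flag them when the
 * whole struct is passed to the kernel.
 */
static void msg_header_init(struct msg_header *h, char type, int length)
{
    memset(h, 0, sizeof *h);  /* in practice this defines the padding too */
    h->type = type;
    h->length = length;
}
```

The cost is one extra memset per header, which is presumably why a
library might skip it on a fast path.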

What about an environment variable which enables slower,
valgrind-friendly behavior? There's precedent in other libraries, e.g.
glib [1].
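
Roughly what I have in mind, as a sketch only: the variable name
EXAMPLE_VALGRIND_FRIENDLY and both functions are hypothetical, not
anything that exists in OpenMPI today. The idea is that buffers get
zero-filled when the variable is set, and the normal fast path is used
otherwise.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable name; NOT an existing OpenMPI setting. */
#define VG_FRIENDLY_ENV "EXAMPLE_VALGRIND_FRIENDLY"

/* Return 1 if the variable is set to anything other than "" or "0". */
static int vg_friendly(void)
{
    const char *v = getenv(VG_FRIENDLY_ENV);
    return v != NULL && v[0] != '\0' && strcmp(v, "0") != 0;
}

/*
 * Allocate a message buffer.  In valgrind-friendly mode the buffer is
 * zero-filled so that memcheck sees every byte as defined; otherwise
 * plain malloc() is used and the contents are left uninitialized.
 */
static void *msg_buffer_alloc(size_t len)
{
    if (vg_friendly())
        return calloc(1, len);  /* slower, but fully defined */
    return malloc(len);         /* fast path */
}
```

A real implementation would presumably read the variable once at
startup rather than on every allocation; I left that out to keep the
sketch short.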

-tom

[1] http://library.gnome.org/devel/glib/stable/glib-running.html