Subject: Re: [OMPI users] Problem with qlogic cards InfiniPath_QLE7240 and AlltoAll call
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-06-27 07:39:30


For the web archives, the user posted a similar question on the
OpenFabrics list and had their question answered by someone from QLogic.

On Jun 26, 2009, at 9:46 PM, Nifty Tom Mitchell wrote:

> On Thu, Jun 25, 2009 at 10:29:39AM -0700, D'Auria, Raffaella wrote:
> >
> > Dear All,
> > I have been encountering a fatal error of the type "error polling LP CQ
> > with status RETRY EXCEEDED ERROR status number 12" whenever I try to
> > run a simple MPI code (see below) that performs an AlltoAll call.
> > We are running the OpenMPI 1.3.2 stack on top of the OFED 1.4.1 stack.
> > Our cluster is composed of mostly Mellanox HCAs (MT_03B0140001) and
> > some QLogic (InfiniPath_QLE7240) cards.
> > The problem manifests itself as soon as the size of the vector whose
> > components are being swapped between processes with the all-to-all
> > call is equal to or larger than 68MB.
> > Please note that I have this problem only when at least one of the
> > computational nodes in the host list of mpiexec is a node with the
> > QLogic card InfiniPath_QLE7240.
>
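[For the archives: the MPI code referred to above was not included in this
quoted excerpt. A minimal sketch of the kind of MPI_Alltoall test described
might look like the following; this is not the poster's original program,
and the way the ~68MB send buffer is sized here is an assumption.]

    /* Illustrative MPI_Alltoall reproducer -- NOT the poster's original code. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        int rank, nprocs;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* Total send buffer of ~68 MB of doubles, split evenly across ranks
           (assumed sizing; the report only gives the 68MB threshold). */
        size_t total_bytes    = 68UL * 1024 * 1024;
        size_t count_per_rank = total_bytes / sizeof(double) / (size_t)nprocs;

        double *sendbuf = malloc(count_per_rank * (size_t)nprocs * sizeof(double));
        double *recvbuf = malloc(count_per_rank * (size_t)nprocs * sizeof(double));
        for (size_t i = 0; i < count_per_rank * (size_t)nprocs; i++)
            sendbuf[i] = (double)rank;

        /* Every rank exchanges count_per_rank doubles with every other rank. */
        MPI_Alltoall(sendbuf, (int)count_per_rank, MPI_DOUBLE,
                     recvbuf, (int)count_per_rank, MPI_DOUBLE, MPI_COMM_WORLD);

        if (rank == 0)
            printf("MPI_Alltoall of %zu doubles per rank completed\n",
                   count_per_rank);

        free(sendbuf);
        free(recvbuf);
        MPI_Finalize();
        return 0;
    }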
> Look at the btl flags....
> It is possible that the InfiniPath_QLE7240 fast transport path for MPI
> is not connecting to the Mellanox HCA. The default fast path for cards
> like the QLE7240 uses the PSM library, which Mellanox does not know
> about.
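[For the archives: one common way to test Tom's theory is to force every
node onto the verbs-based openib BTL through the ob1 PML, instead of
letting the QLogic nodes select the PSM-based transport. A sketch of such
a command line, assuming Open MPI 1.3.x component names and a placeholder
hostfile and executable:

    shell$ mpirun --mca pml ob1 --mca btl openib,sm,self \
               -hostfile myhosts -np 16 ./alltoall_test

Whether this resolves the particular interoperability problem reported
here is not confirmed in this thread.]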
>
> The mpirun man page hints at this but does not divulge what a btl is
> or how to explore the Modular Component Architecture (MCA).
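[For the archives: the usual way to explore the MCA from the command line
is ompi_info. A couple of sketch commands, assuming a standard Open MPI
1.3.x installation:

    shell$ ompi_info | grep btl          # which BTL components are built
    shell$ ompi_info --param btl all     # every MCA parameter for all BTLs
    shell$ ompi_info --param btl openib  # parameters for the openib BTL only

The component and parameter names printed there are what the --mca options
on the mpirun command line refer to.]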
>
>
> --
> T o m M i t c h e l l
> Found me a new hat, now what?
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>

-- 
Jeff Squyres
Cisco Systems