Subject: Re: [OMPI users] Openmpi and processor affinity
From: Iftikhar Rathore (irathore_at_[hidden])
Date: 2009-06-02 23:25:05
Gus,
Thanks for the reply; it was a typo (I'm sick). I have updated to
1.3.2 since my last post and have tried checking CPU affinity by using
"f" and "j" in top. It shows the processes spread across all 8 cores in
the beginning, but it does eventually show all processes running on core 0.
My P's and Q's are set up for an 890 run; I have done this run with other
MPI implementations before. I have made sure that I am using the right
mpirun, but as Jeff pointed out I may have a mixed build; I am
investigating it further and will post my findings.
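(A quick sanity check for a mixed build might look roughly like this; the
xhpl path is just the one from the command line quoted below:)

    which mpirun
    mpirun -V
    ldd /mnt/apps/bin/xhpl | grep libmpi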
Regards
On Tue, 2009-06-02 at 20:58 -0400, Gus Correa wrote:
> Hi Iftikhar
>
> Iftikhar Rathore wrote:
> > Hi
> > We are using OpenMPI version 1.2.8 (packaged with OFED-1.4). I am trying
> > to run hpl-2.0 (Linpack). We have two Intel quad-core CPUs in all our
> > servers (8 cores total), and all hosts in the hostfile have lines that
> > look like "10.100.0.227 slots=8max_slots=8".
>
> Is this a typo on your email or on your hostfile?
>
> > look like "10.100.0.227 slots=8max_slots=8".
>
> There should be a blank space between the number of slots and max_slots:
>
> 10.100.0.227 slots=8 max_slots=8
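>
> Just as an illustration, a minimal hostfile sketch with the corrected
> syntax (the second address below is only a placeholder for another node):
>
>     10.100.0.227 slots=8 max_slots=8
>     10.100.0.228 slots=8 max_slots=8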
>
> Another possibility is that you may be inadvertently using another
> mpirun on the system.
>
> A third possibility:
> Does your HPL.dat file require 896 processors?
> The product P x Q for each (P,Q) pair should match 896.
> If it is less, HPL will run on fewer processors, i.e., on P x Q only.
> (If it is more, HPL will issue an error message and stop.)
> Is this what is happening?
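>
> As a concrete sketch (the 28 x 32 grid is just one possible
> factorization of 896; the actual values in your HPL.dat may differ),
> the relevant lines would look something like:
>
>     1          # of process grids (P x Q)
>     28         Ps
>     32         Qs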
>
> A fourth one ...:
> Are you sure processor affinity is not correct?
> Do the processes drift across the cores?
> Typing "1" in top is not enough to clarify this.
> To see the process-to-core map in top,
> type "f" (for fields),
> then "j" (to display the CPU/core number),
> and watch for several minutes to see whether the processor/core
> (header "P") for each process ID (header "PID")
> drifts or not.
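>
> Another quick way to check, outside of top (just a sketch; adjust the
> process name if your binary is not called xhpl):
>
>     # print the allowed-CPU list for every running xhpl process
>     for pid in $(pgrep xhpl); do taskset -cp $pid; done
>
> With affinity working, each process should report a single core.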
>
> Even when I launch fewer processes than the available/requested cores,
> "--mca mpi_paffinity_alone 1" works right here,
> as I just checked, with P=4 and Q=1 in HPL.dat
> and with -np 8 on mpiexec.
>
> **
>
> I recently ran a bunch of HPL tests with --mca mpi_paffinity_alone 1
> and OpenMPI 1.3.2, built from source, and the processor affinity seems
> to work (i.e., the processes stick to the cores).
> Building from source is quite simple, and would give you the latest OpenMPI.
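>
> Roughly along these lines (only a sketch; the install prefix and the
> configure options, e.g. InfiniBand support, depend on your site):
>
>     ./configure --prefix=/opt/openmpi-1.3.2 --with-openib
>     make -j8 all
>     make install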
>
> I don't know if 1.2.8 (which you are using)
> has a problem with mpi_paffinity_alone,
> but the OpenMPI developers may clarify this.
>
>
> I hope this helps,
> Gus Correa
> ---------------------------------------------------------------------
> Gustavo Correa
> Lamont-Doherty Earth Observatory - Columbia University
> Palisades, NY, 10964-8000 - USA
> ---------------------------------------------------------------------
>
> >
> > Now when I use mpirun (even with --mca mpi_paffinity_alone 1) it does
> > not keep the affinity; the processes seem to gravitate towards the
> > first four cores (using top and hitting "1"). I know I do have MCA
> > paffinity available.
> >
> > [root_at_devi DLR_WB_88]# ompi_info | grep paffinity
> > [devi.cisco.com:26178] mca: base: component_find: unable to open btl openib: file not found (ignored)
> > MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.2.8)
> >
> > The command line I am using is:
> >
> > # mpirun -nolocal -np 896 -v --mca mpi_paffinity_alone 1 -hostfile /mnt/apps/hosts/896_8slots /mnt/apps/bin/xhpl
> >
> > Am I doing something wrong, and is there a way to confirm CPU affinity besides hitting "1" in top?
> >
> >
> > [root_at_devi DLR_WB_88]# mpirun -V
> > mpirun (Open MPI) 1.2.8
> >
> > Report bugs to http://www.open-mpi.org/community/help/
> >
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
--
Iftikhar Rathore
Technical Marketing Engineer
Server Access Virtualization BU.
Cisco Systems, Inc.
Phone: +1 408 853 5322
Mobile: +1 636 675 2982