Subject: Re: [OMPI users] CPU user time vs. system time
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-06-29 07:25:28


My $0.02: there is not much useful information you can get from system
time vs. user time. The only meaningful metric is total wall clock
execution time.
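
For example, one simple way to compare runs is just to time the whole
job from the shell (only a sketch -- substitute your actual process
count, hostfile, and program):

    time mpirun -np 24 --hostfile <my-hosts> <my-program>

and then compare the "real" value between your oversubscribed and
non-oversubscribed runs.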

Open MPI's progression engine is designed to poll aggressively; this
approach does not work well in oversubscribed environments. You can
set "mpi_yield_when_idle" (BTW, I assume you mean "--mca
mpi_yield_when_idle 1"), but all that does it make every MPI process
call sched_yield() frequently to voluntarily yield its position in the
kernel's scheduling algorithm. So every MPI process will still poll
aggressively, but they'll give up their run slot at just about every
iteration. No matter how you do it, this guarantees a loss of
performance when running in oversubscribed scenarios.
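
To be explicit, the full command line would look something like this
(illustrative only -- keep the rest of your options as they are):

    mpirun --mca mpi_yield_when_idle 1 -np 24 --hostfile <my-hosts> <my-program>

The important part is that mpi_yield_when_idle is an MCA parameter, so
it has to be passed via --mca rather than as a standalone option.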

It's not clear from your text; are *all* of your runs in
oversubscribed environments?

On Jun 28, 2009, at 5:57 PM, Qiming He wrote:

> I tried a couple of things, including your suggestion. I also found
> out that this has been reported before,
> http://www.open-mpi.org/community/lists/users/2007/03/2904.php
> but there seems to be no clear solution so far:
>
> Here is what I observe:
> I keep the problem size fixed at 24 processes. I use two nodes,
> either with 8 cores each or with 2 cores each.
>
> 1. When it is oversubscribed (12 processes/processor), system time
> relative to user time is much higher than when it is less subscribed
> (1.5 processes/processor). The wall clock time does not improve much
> either :-(
>
> 2. I tried the following options, individually and together; no
> difference:
> mpirun --mpi_yield_when_idle 1 --mca btl tcp,sm,self --mca
> coll_hierarch_priority 100 ...
>
> 3. The older Open MPI version (1.3) seems to be better than the newer
> version (1.3.2), but not significantly.
>
> By the way, I am working on Amazon EC2 (VM hosts). Will that make any
> difference?
>
> Please advise
>
> Thanks
>
>
>
> On Fri, Jun 26, 2009 at 11:28 PM, Ralph Castain <rhc_at_[hidden]>
> wrote:
> If you are running fewer processes on your nodes than they have
> processors, then you can improve performance by adding
>
> -mca mpi_paffinity_alone 1
>
> to your cmd line. This will bind your processes to individual cores,
> which helps with latency. If your program involves collectives, then
> you can try setting
>
> -mca coll_hierarch_priority 100
>
> This will activate the hierarchical collectives, which utilize
> shared memory for messages between procs on the same node.
>
> Ralph
>
>
>
> On Jun 26, 2009, at 9:09 PM, Qiming He wrote:
>
> Hi all,
>
> I am new to Open MPI and have an urgent run-time question. I have
> openmpi-1.3.2 compiled with the Intel Fortran compiler v11, simply by
>
> ./configure --prefix=<my-dir> F77=ifort FC=ifort
>
> Then I set my LD_LIBRARY_PATH to include <openmpi-lib> and <intel-lib>
> and compile my Fortran program. No compilation errors.
>
> I run my program on a single node and everything looks OK. However,
> when I run it on multiple nodes with
>
> mpirun -np <num> --hostfile <my-hosts> <my-program>
>
> the performance is much worse than on a single node with the same
> problem size (MPICH2 shows a 50% improvement).
>
> I use top and saidar and find that user time (CPU user) is much lower
> than system time (CPU system), i.e., only a small portion of CPU time
> is used by the user application, while the rest is spent in the
> system. No wonder I get bad performance. I am assuming "CPU system"
> is used for MPI communication. I notice the total traffic (on eth0)
> is not that big (~5 Mb/sec).
> What is the CPU system time busy with?
>
> Can anyone help? Anything I need to tune?
>
> Thanks in advance
>
> -Qiming

-- 
Jeff Squyres
Cisco Systems