Subject: Re: [OMPI users] PBSPro/OpenMPI Errors
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-06-27 08:45:31
On Jun 25, 2009, at 12:06 PM, Robert Jackson wrote:
> When using OpenMPI and nwchem standalone (mpirun --byslot --mca btl
> self,sm,tcp --mca btl_base_verbose 30 --mca btl_tcp_if_exclude
> lo,eth1 $NWCHEM h2o.nw > & h2o.nwo.$$) the job runs fine.
>
> When running the same job via the PBSPro scheduler I get errors. The
> PBS script is called nwrun and is run with the following command:
> qsub -V -S /bin/bash ./nwrun
Odd.
I'm unfortunately unfamiliar with nwchem; it looks like the error is
coming from ARMCI. Have you checked with the nwchem authors to see
what this error means?
> Error listing from error file:
> ARMCI configured for 4 cluster nodes. Network protocol is 'TCP/IP
> Sockets'.
> 1:trying connect to host=compute-1-4.local, port=35506 t=5 111
> 1:armci_CreateSocketAndConnect: connect failed: -1
> trying to connect:: Connection refused
> 1:armci_CreateSocketAndConnect: connect failed: -1
> Last System Error Message from Task 1:: Connection refused
> [compute-1-4.local:04739] MPI_ABORT invoked on rank 1 in
> communicator MPI_COMM_WORLD with errorcode -1
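
As a small illustration of what the log above reports (this is not nwchem or ARMCI code): the trailing 111 on the "trying connect" line appears to match the Linux errno value for ECONNREFUSED, which is what a TCP connect returns when nothing is listening on the target port. A minimal Python sketch of that behaviour follows; the helper names try_connect and unused_local_port are hypothetical, and the example assumes a Linux-like host where a loopback connect to a closed port is refused immediately.

```python
import errno
import socket


def try_connect(host, port, timeout=5.0):
    """Attempt a TCP connect; return 0 on success, otherwise the errno."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        # connect_ex returns an error code instead of raising for
        # errors reported by the underlying connect() call.
        return s.connect_ex((host, port))
    finally:
        s.close()


def unused_local_port():
    """Pick a loopback port that (very likely) has no listener on it."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("127.0.0.1", 0))      # let the kernel choose a free port
    port = s.getsockname()[1]
    s.close()                     # closed without listen(): nothing accepts here
    return port


if __name__ == "__main__":
    rc = try_connect("127.0.0.1", unused_local_port())
    print(rc, errno.errorcode.get(rc, "OK"), rc == errno.ECONNREFUSED)
```

Running try_connect from inside a PBS job against a host and port where a listener is known to be up would show whether plain TCP connects between the compute nodes behave differently under the scheduler than in the standalone run.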
--
Jeff Squyres
Cisco Systems