Subject: Re: [OMPI users] MPI-IO: reading an unformatted binary fortran file
From: Rob Latham (robl_at_[hidden])
Date: 2009-06-16 16:22:28


On Thu, Jun 11, 2009 at 05:33:58PM -0400, Greg Fischer wrote:
> I'm attempting to wrap my brain around the MPI I/O mechanisms, and I was
> hoping to find some guidance. I'm trying to read a file that contains a
> 117-character string, followed by a series of records that contain integers and
> reals. The following code would read it in serial:
>
> ---
> character(len=117) :: cfx1
>
> read (nin) cfx1
> do i=1,end_of_file
> read(nin) integer1,integer2,real1,real2,real3,real4,real5,real6,real7
> enddo
> ---

Please note that raw binary Fortran I/O acts nothing like raw binary C
I/O. What I mean is that you have a Fortran read there, and it's
pulling out records from your Fortran file, but who knows how much
padding or bookkeeping (record-length markers, for instance) your
compiler put in and around each of these records.
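
To make that concrete, here is a small Python sketch (not Fortran, and
not taken from your code) of the layout many compilers use for
sequential unformatted files: each record is framed by a leading and a
trailing byte count. The 4-byte little-endian markers and the 4-byte
integers and reals are assumptions (they match common gfortran
defaults, but check your compiler), and the helper names write_record
and read_record are made up for illustration.

```python
import io
import struct

MARKER = "<i"      # assumed: 4-byte little-endian record-length marker
PAYLOAD = "<2i7f"  # two integers and seven reals, assumed 4 bytes each


def write_record(f, payload):
    """Write one record framed by leading and trailing length markers."""
    n = struct.pack(MARKER, len(payload))
    f.write(n + payload + n)


def read_record(f):
    """Read one framed record; return its payload, or None at end of file."""
    head = f.read(4)
    if len(head) < 4:
        return None
    (n,) = struct.unpack(MARKER, head)
    payload = f.read(n)
    (tail,) = struct.unpack(MARKER, f.read(4))
    if tail != n:
        raise ValueError("markers disagree: wrong marker size or byte order?")
    return payload


# Build an in-memory file shaped like the one described: one record holding
# a 117-character string, then records of two integers and seven reals.
f = io.BytesIO()
write_record(f, b"x" * 117)
for i in range(3):
    reals = [float(k) for k in range(7)]
    write_record(f, struct.pack(PAYLOAD, i, 10 * i, *reals))

# Walk the file back, record by record.
f.seek(0)
header = read_record(f)
records = []
while True:
    payload = read_record(f)
    if payload is None:
        break
    records.append(struct.unpack(PAYLOAD, payload))

print(len(header), len(records), len(f.getvalue()))
```

Under those assumptions each data record occupies 4 + 36 + 4 = 44
bytes on disk, not the 36 bytes of payload, which is the kind of
mismatch that makes a datatype "almost" line up.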

> To simplify the problem, I removed the "cfx1" string from the file I'm
> reading, and created an MPI_TYPE_STRUCT as follows:
>
> ---
> length( 1 ) = 1
> length( 2 ) = 2
> length( 3 ) = 7
> length( 4 ) = 1
> disp( 1 ) = 0
> disp( 2 ) = sizeof( MPI_LB )
> disp( 3 ) = disp( 2 ) + 2*sizeof(MPI_INTEGER)
> disp( 4 ) = disp( 3 ) + 7*sizeof(MPI_REAL)
> type( 1 ) = MPI_LB
> type( 2 ) = MPI_INTEGER
> type( 3 ) = MPI_REAL
> type( 4 ) = MPI_UB
>
> call MPI_TYPE_STRUCT( 4, length, disp, type, sptype, ierr )
> call MPI_TYPE_COMMIT( sptype, ierr )

There's absolutely no guarantee that records line up in memory like
they do in unformatted binary Fortran files. Fortran could put more,
less, or the same padding between records.

> This almost works. With some fiddling (I can't seem to make it work right
> now), I'm able to get most of the reals and integers into "sourcepart", but
> something doesn't line up quite correctly. I've spent a lot of time looking
> at the documentation and tutorials on the web, but haven't found a resource
> that helps me work through this problem.

Yup. Take into consideration that I'm a shameless C dude, but Fortran
i/o is pure evil!

> Ultimately, the objective will be to allow an arbitrary number of processes
> to read this file, with each record being uniquely read by a single process
> (e.g. process 1 reads record 1, process 2 reads record 2, process 1 reads
> record 3, process 2 reads record 4, etc.).
>
> What's the best way to skin this cat? Any assistance would be greatly
> appreciated.

Well, you could use something like parallel-netcdf or parallel-HDF5,
which do everything you want to do already, with the added advantage
of a self-describing, portable file format that you could exchange
with collaborators or visualize with a whole ecosystem of netcdf
viewers.

How did you create this file? I'm kind of surprised you cannot
MPI_FILE_READ back what you've written. The MPI-IO library just
provides a wrapper around C system calls and knows nothing about
Fortran's record structure, so if you created this file with Fortran,
you'll have to read it back with Fortran.

Since you eventually want to do parallel I/O, I'd suggest creating the
file with MPI-IO (even if it is MPI_FILE_WRITE from rank 0 or a
single process) as well as reading it back (perhaps with
MPI_FILE_READ_AT_ALL).
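
For the round-robin split you describe, the per-rank byte offsets are
simple arithmetic once the record size is fixed. A rough Python sketch
follows, just to show the arithmetic; the function name
payload_offsets and its parameters are hypothetical, and the 4-byte
integer and real sizes are assumptions. With marker_bytes=0 it
describes a file written as raw bytes (as MPI-IO would); a nonzero
marker_bytes and header_bytes describe a Fortran-written file under
the usual record-marker convention.

```python
INT_BYTES = 4    # assumed size of a default Fortran integer
REAL_BYTES = 4   # assumed size of a default Fortran real
PAYLOAD_BYTES = 2 * INT_BYTES + 7 * REAL_BYTES  # one record's data


def payload_offsets(rank, nprocs, nrecords, marker_bytes=0, header_bytes=0):
    """Byte offsets of the record payloads assigned to `rank`.

    Records are dealt out round-robin with zero-based ranks and record
    indices: rank r gets records r, r + nprocs, r + 2 * nprocs, ...
    """
    stride = marker_bytes + PAYLOAD_BYTES + marker_bytes
    return [header_bytes + k * stride + marker_bytes
            for k in range(rank, nrecords, nprocs)]


# Raw file with no framing, 4 records, 2 processes.
raw_rank0 = payload_offsets(0, 2, 4)
raw_rank1 = payload_offsets(1, 2, 4)

# Fortran-style file: 4-byte markers, preceded by the framed 117-byte string.
framed_rank0 = payload_offsets(0, 2, 4, marker_bytes=4, header_bytes=4 + 117 + 4)

print(raw_rank0, raw_rank1, framed_rank0)
```

Each offset is the sort of value one would hand to a call such as
MPI_FILE_READ_AT, reading PAYLOAD_BYTES bytes per record. Note that
MPI ranks start at 0, whereas your example counts processes from 1.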

==rob

-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA