MPI intercommunicator collective operation problem (Fortran code)

Post by rlnaf » Thu, 24 Jun 2004 02:01:08

I am posting this question to the Fortran forum as well; it was
previously submitted to the MPI forum but hasn't received any
attention. (Are all MPI users C programmers?)

I'm having difficulty using the collective operation MPI_Bcast across
an intercommunicator. We have a Linux cluster and are running
lam-7.0.5-1 for MPI. The operations that I am attempting are spelled
out in the book "Using MPI-2". In particular, I am spawning new
processes using the MPI_COMM_SPAWN function:

call MPI_COMM_SPAWN('comm_test2.ex', MPI_ARGV_NULL, numslaves, &
     MPI_INFO_NULL, 0, MPI_COMM_WORLD, slavecomm, &
     MPI_ERRCODES_IGNORE, ierr)

This call appears in the sample program on page 236. The child has
the corresponding MPI_COMM_GET_PARENT function (page 241):

call MPI_COMM_GET_PARENT(parentcomm, ierr)

From the parent, I am attempting to broadcast a single, double
precision variable to all the child processes (page 236):

call MPI_BCAST(value, no_values, MPI_DOUBLE_PRECISION, MPI_ROOT, &
     slavecomm, errcode)

The child has the corresponding function (page 241):

call MPI_BCAST(value, no_values, MPI_DOUBLE_PRECISION, parent, &
parentcomm, errcode)
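
For reference, the root-argument convention for an intercommunicator
broadcast (per the MPI standard, and as summarized in "Using MPI-2") is:
the parent rank that actually sends passes MPI_ROOT, any other ranks in
the parent communicator pass MPI_PROC_NULL, and every child passes the
rank of the sender within the parent's group (0 here). A minimal sketch
of both sides (variable names are mine, not from the original programs):

```fortran
! Parent side: only the sending rank passes MPI_ROOT; any other
! parent ranks must pass MPI_PROC_NULL as the root argument.
if (myrank == 0) then
   call MPI_BCAST(value, no_values, MPI_DOUBLE_PRECISION, MPI_ROOT, &
        slavecomm, errcode)
else
   call MPI_BCAST(value, no_values, MPI_DOUBLE_PRECISION, &
        MPI_PROC_NULL, slavecomm, errcode)
end if

! Child side: the root argument is the RANK of the sender in the
! remote (parent) group -- a plain integer 0, not MPI_ROOT.
call MPI_BCAST(value, no_values, MPI_DOUBLE_PRECISION, 0, &
     parentcomm, errcode)
```

A common failure mode is passing a communicator handle or MPI_ROOT on
the child side where the plain remote rank is expected.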

These operations are all contained in Fortran programs. The sample
programs and the reference itself (page 238, section 7.2.3) indicate
that using MPI_BCAST this way is supported. Hopefully it is a dumb
error on my part and not a problem in the LAM installation.
I am using LF95 to compile the programs:

(bash) lobo.pts/2% mpif77 spawn_test2.f90 -o spawn_test2.ex
-R/usr/local/lf95/lib -lpthread
f95: warning: -pthread cannot be specified.
Internal subprogram name(error_class)
2005-W: "spawn_test2.f90", line 48: MPI_SUCESS is used but never
Encountered 0 errors, 1 warning in file spawn_test2.f90.

(bash) lobo.pts/2% mpif77 comm_test2.f90 -o comm_test2.ex
-R/usr/local/lf95/lib -lpthread
f95: warning: -pthread cannot be specified.
Encountered 0 errors, 0 warnings in file comm_test2.f90.

I have four processors active:

(bash) lobo.pts/2% lamnodes
n0 lobo01:1:origin,this_node
n1 lobo02:1:
n2 lobo03:1:
n3 lobo04:1:

Execution produces the following results:

(bash) lobo.pts/2% mpiexec -n 1 spawn_test2.ex
**MPI_COMM_WORLD= 0 spawn loc 1
process_id= 0
input numslaves
**MPI_COMM_WORLD= 0 spawn loc 2 slavecomm=
**MPI_BCAST: MPI_ROOT= 0 slavecomm= 48 value= 2.000000000000000
no_values= 1
**MPI_BCAST loc 3 errcode= 16
known error not in this list
End of error message

child name=lobo02 child rank= 1 child size= 2 parentcomm= 48
child child rank= 0 child size= 2
parentcomm= 48
MPI_Bcast: unclassified: error code (rank 1, MPI_COMM_PARENT)
Rank (1, MPI_COMM_WORLD): Call stack within LAM:
Rank (1, MPI_COMM_WORLD): - MPI_Bcast()
Rank (1, MPI_COMM_WORLD): - main()
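
Incidentally, a raw code such as errcode= 16 can be turned into
readable text with the standard error-introspection routines
MPI_ERROR_CLASS and MPI_ERROR_STRING. A sketch (errcode is assumed to
hold the value returned by the failing MPI_BCAST):

```fortran
integer :: errclass, reslen, ierr2
character(len=MPI_MAX_ERROR_STRING) :: errmsg

! Map the implementation-specific code to a standard error class,
! then fetch the implementation's human-readable message for it.
call MPI_ERROR_CLASS(errcode, errclass, ierr2)
call MPI_ERROR_STRING(errcode, errmsg, reslen, ierr2)
print *, 'error class =', errclass, ' message = ', errmsg(1:reslen)
```

Note that a collective call only returns an error code (rather than
aborting) if the communicator's error handler has been set to
MPI_ERRORS_RETURN, which the output above suggests is already the case.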

The two programs (parent: spawn_test2.f90, child: comm_test2.f90)
follow. If anyone can help, I would appreciate it.

Richard L Naff

program spawn
include 'mpif.h'
! ... kv: precision of variables used in assembly
integer, parameter :: kv=selected_real_kind(p=8)
integer :: numslaves, slavecomm, str_len, errcode, errcla