The pghpf compiler supports the EXTRINSIC keyword with the F77_LOCAL, HPF_SERIAL, or HPF_LOCAL argument. Extrinsics allow an HPF programmer to call non-HPF procedures or local HPF procedures and to have procedure arguments mapped from the caller to the called procedure. The EXTRINSIC prefix on an INTERFACE definition declares the interface to use when calling the procedure; to use an extrinsic, the program must supply an explicit interface in an INTERFACE block.
A called local Fortran 77 procedure (or a C routine with matching arguments) may use the underlying communication primitives upon which the pghpf runtime is based, or it may use the generic pghpf send and receive routines. The set of generic routines listed in this section is supported on all systems and may be expanded in the future.
For example, the following code defines an interface to the DOTP_BLK procedure:
      SUBROUTINE DOTP (N, X, Y, A)
      USE HPF_LIBRARY
      INTEGER*4 N
      REAL*8 X(N), Y(N), A
C
!HPF$ DISTRIBUTE (BLOCK) :: X
!HPF$ ALIGN (:) WITH X(:) :: Y
C
      INTERFACE
      EXTRINSIC (F77_LOCAL) SUBROUTINE DOTP_BLK
     &   (RANK, SHAPE, N, X, Y, A)
      INTEGER*4, INTENT(IN) :: RANK
      INTEGER*4, INTENT(IN) :: SHAPE(RANK)
      INTEGER*4, INTENT(IN) :: N
      REAL*8, INTENT(IN) :: X(N)
      REAL*8, INTENT(IN) :: Y(N)
      REAL*8, INTENT(OUT) :: A
!HPF$ DISTRIBUTE (BLOCK) :: X
!HPF$ ALIGN (:) WITH X(:) :: Y
      END SUBROUTINE DOTP_BLK
      END INTERFACE
C
      CALL DOTP_BLK (SIZE(PROCESSORS_SHAPE()), PROCESSORS_SHAPE(),
     &   N, X, Y, A)
C
      RETURN
      END
DOTP_BLK computes the dot product of the global vectors X and Y and returns the result in A on each processor. Each processor determines which portions of X and Y it owns, computes the dot product of its local portion, and then performs the communication necessary to complete the dot product on every processor.
The called local routine DOTP_BLK is shown below. Note that because the data is distributed, the called routine must determine which data it owns and handle the communication and computation for that portion of the data.
C D O T P R O D U C T
C
C PURPOSE:
C To compute the dot product of two
C block-distributed double-precision vectors.
C
C
      SUBROUTINE DOTP_BLK (RANK, SHAPE, N, X, Y, A)
      INTEGER*4 RANK, SHAPE(RANK), N
      REAL*8 X(*), Y(*), A
C
      INCLUDE '/usr/pgi/arch/include/pglocal.f'
      INTEGER MAXCPUS
      PARAMETER (MAXCPUS = 2048)
      INTEGER MYCPU, NCPUS, COORD(7)
      INTEGER BLKSZ, MYCT
      INTEGER I, J
      DOUBLE PRECISION TA(0:MAXCPUS-1)
C
C     Get my processor number and the number of processors.
C
      MYCPU = PGHPF_MYPROCNUM()
      NCPUS = PGHPF_NPROCS()
C
C     Determine processor arrangement information.
C
      CALL PGHPF_PROCNUM_TO_COORD (MYCPU, RANK, SHAPE, COORD)
C
C     Check for error conditions.
C
      IF (RANK .NE. 1) THEN
         PRINT *, 'DOTP: Processor arrangement must be rank 1'
         STOP
      ENDIF
      IF (N .LE. 0) RETURN
C
      IF (SHAPE(1) .GT. MAXCPUS) THEN
         PRINT *, 'DOTP: # CPUs must not exceed:', MAXCPUS
         STOP
      ENDIF
C
C     Determine how many elements reside on this processor.
C
      BLKSZ = (N + SHAPE(1) - 1) / SHAPE(1)
      MYCT = MIN((N - MYCPU * BLKSZ), BLKSZ)
      MYCT = MAX(MYCT, 0)
C
C     Allocate an array to hold intermediate results and do the
C     local dot product.
C
      TA(MYCPU) = 0.0D0
      DO I = 1, MYCT
         TA(MYCPU) = TA(MYCPU) + X(I) * Y(I)
      ENDDO
C
C     Broadcast the results to all other processors.
C
      IF (SHAPE(1) .GT. 1) THEN
         DO I = 0, SHAPE(1) - 1
            IF (I .EQ. MYCPU) THEN
               DO J = 0, SHAPE(1) - 1
                  IF (J .NE. MYCPU) THEN
                     CALL PGHPF_CSEND (J, TA(MYCPU), 1, 1, PGLCL_REAL8)
                  ENDIF
               ENDDO
            ELSE
               CALL PGHPF_CRECV (I, TA(I), 1, 1, PGLCL_REAL8)
            ENDIF
         ENDDO
      ENDIF
C
C     Complete the global sum of the intermediate results.
C
      A = 0.0D0
      DO I = 0, SHAPE(1) - 1
         A = A + TA(I)
      ENDDO
C
      RETURN
      END
When using EXTRINSIC(F77_LOCAL), the extrinsic is a Fortran 77 program unit and must be compiled with a Fortran 77 compiler rather than with pghpf. The object (.o) file produced by compiling the extrinsic with the -c option is then linked with the HPF calling program by including it on the pghpf link line. For example, if DOTP_BLK.F contains the local Fortran 77 routine, compile it as follows:
% pgf77 -c DOTP_BLK.F
Then compile the HPF main program and link in the extrinsic as follows:
% pghpf DOTP.hpf DOTP_BLK.o
If desired, the local routine can instead be compiled using pghpf with the -Mnohpfc option on the compile line:
% pghpf -Mnohpfc -c DOTP_BLK.F
% pghpf DOTP.hpf DOTP_BLK.o
In addition to the underlying communication support routines and the generic routines, there are a few routines common to both; this section covers those common routines.
The pghpf_nprocs routine (__hpf_nprocs in C) returns the pghpf runtime's notion of the number of processors for the current execution of the program.
C interface:
int __hpf_nprocs()
nprocs = __hpf_nprocs()
Fortran interface:
integer pghpf_nprocs
external pghpf_nprocs
nprocs = pghpf_nprocs()
The pghpf_myprocnum routine (__hpf_myprocnum in C) returns the pghpf runtime's notion of the current processor number; this value is between 0 and number_of_processors()-1.
C interface:
int __hpf_myprocnum()
myprocnum = __hpf_myprocnum()
Fortran interface:
integer pghpf_myprocnum
external pghpf_myprocnum
myprocnum = pghpf_myprocnum()
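As a minimal sketch of how a local routine might use these two queries (the WHOAMI name is illustrative, not part of any library):

      SUBROUTINE WHOAMI ()
      INTEGER MYCPU, NCPUS
      INTEGER PGHPF_MYPROCNUM, PGHPF_NPROCS
      EXTERNAL PGHPF_MYPROCNUM, PGHPF_NPROCS
C
C     Ask the pghpf runtime for this processor's number (0-based)
C     and the total number of processors.
C
      MYCPU = PGHPF_MYPROCNUM()
      NCPUS = PGHPF_NPROCS()
      PRINT *, 'Processor', MYCPU, 'of', NCPUS
      RETURN
      END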
C interface:
void __hpf_procnum_to_coord(int procnum, int rank, int *shape, int *coord)
Fortran interface:
integer procnum, rank, shape(rank), coord(rank)
call pghpf_procnum_to_coord(procnum, rank, shape, coord)
The rank and shape arguments describe the processor grid. The pghpf processor number given by procnum is translated to grid coordinates returned in coord. Grid coordinates are integers between 1 and the size of the corresponding grid dimension. If the processor number is outside the bounds of the processor grid, zeroes are returned in coord.
C interface:
int __hpf_coord_to_procnum(int rank, int *shape, int *coord)
Fortran interface:
integer procnum, rank, shape(rank), coord(rank)
integer pghpf_coord_to_procnum
external pghpf_coord_to_procnum
procnum = pghpf_coord_to_procnum(rank, shape, coord)
The rank and shape arguments describe the processor grid. The processor grid coordinates in coord are translated to a pghpf processor number. Grid coordinates are integers between 1 and the size of the corresponding grid dimension. If the coordinates are outside the bounds of the processor grid, -1 is returned.
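For illustration only, the sketch below converts each processor number of an assumed 2 x 4 processor grid to grid coordinates and back; the GRIDMAP name and the grid shape are assumptions, not part of the interface:

      SUBROUTINE GRIDMAP ()
      INTEGER RANK
      PARAMETER (RANK = 2)
      INTEGER SHAPE(RANK), COORD(RANK)
      INTEGER I, IPROC
      INTEGER PGHPF_COORD_TO_PROCNUM
      EXTERNAL PGHPF_COORD_TO_PROCNUM
C
C     Describe a 2 x 4 processor grid (8 processors in all).
C
      SHAPE(1) = 2
      SHAPE(2) = 4
      DO I = 0, 7
C        Translate processor number I to 1-based grid coordinates,
C        then translate the coordinates back; IPROC recovers I.
         CALL PGHPF_PROCNUM_TO_COORD (I, RANK, SHAPE, COORD)
         IPROC = PGHPF_COORD_TO_PROCNUM (RANK, SHAPE, COORD)
         PRINT *, I, COORD(1), COORD(2), IPROC
      ENDDO
      RETURN
      END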
These are the generic pghpf local communication routines. They are available on all systems.
The data types for the C interface to the generic local communication routines are defined in the file /usr/pgi/arch/include/pglocal.h, where arch is your system's architecture (for example, SOLARIS, RS6000, or SGI). The data types for the Fortran interface are defined in /usr/pgi/arch/include/pglocal.f.
The pghpf_csend and pghpf_crecv routines allow the local program to send or receive non-character data. These routines block until the data is delivered.
C interface:
void __hpf_csend(int cpu, void *adr, int cnt, int str, int typ)
void __hpf_crecv(int cpu, void *adr, int cnt, int str, int typ)
Fortran interface:
integer cpu, cnt, str, typ
integer adr(*)
call pghpf_csend(cpu, adr, cnt, str, typ)
call pghpf_crecv(cpu, adr, cnt, str, typ)
The cpu argument is the pghpf processor number of the remote partner, adr is the local data address, cnt is the number of data items to transfer, typ is the data type, and str is the stride between items in the local array (in item units).
For performance reasons, data transferred by pghpf_csend and pghpf_crecv is no longer buffered as it was in releases of pghpf older than 2.0, so programs that ran under pghpf 1.3 may hang with release 2.0 or newer. The solution is to change the F77_LOCAL routine so that processors "pair off" when exchanging messages: when one processor calls pghpf_csend, its partner must call pghpf_crecv. A simple way to decide which processor sends first is to compare processor numbers, for example:
old:
      call pghpf_csend(partner, x, ...)
      call pghpf_crecv(partner, y, ...)
new:
      me = pghpf_myprocnum()
      if (partner .lt. me) then
         call pghpf_csend(partner, x, ...)
         call pghpf_crecv(partner, y, ...)
      else
         call pghpf_crecv(partner, y, ...)
         call pghpf_csend(partner, x, ...)
      endif
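As a concrete sketch of this pairing pattern with the elided arguments filled in (the EXCHANGE name and the single REAL*8 item per transfer are assumptions; the case where PARTNER equals ME is addressed below):

      SUBROUTINE EXCHANGE (PARTNER, X, Y)
      INTEGER PARTNER
      REAL*8 X, Y
C     pglocal.f is assumed to supply the PGLCL_* type codes (and, as
C     in DOTP_BLK above, the runtime function declarations).
      INCLUDE '/usr/pgi/arch/include/pglocal.f'
      INTEGER ME
C
      ME = PGHPF_MYPROCNUM()
      IF (PARTNER .LT. ME) THEN
C        The higher-numbered processor of the pair sends first.
         CALL PGHPF_CSEND (PARTNER, X, 1, 1, PGLCL_REAL8)
         CALL PGHPF_CRECV (PARTNER, Y, 1, 1, PGLCL_REAL8)
      ELSE
C        The lower-numbered processor receives first.
         CALL PGHPF_CRECV (PARTNER, Y, 1, 1, PGLCL_REAL8)
         CALL PGHPF_CSEND (PARTNER, X, 1, 1, PGLCL_REAL8)
      ENDIF
      RETURN
      END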
The pghpf_csendchar and pghpf_crecvchar routines send or receive character data. These routines block until the data is delivered.
Fortran interface:
integer cpu, cnt, str
character*(*) adr(*)
call pghpf_csendchar(cpu, adr, cnt, str)
call pghpf_crecvchar(cpu, adr, cnt, str)
The cpu argument is the pghpf processor number of the remote partner, adr is the local data address, cnt is the number of character items to transfer, and str is the stride between items in the local character array (in item units). Each character item is a fixed-length sequence of characters.
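A minimal sketch of a character transfer between two processors (the TAGPASS name, the 8-character item length, and the fixed partner numbers 0 and 1 are assumptions):

      SUBROUTINE TAGPASS ()
      CHARACTER*8 TAG(1)
      INTEGER ME
      INTEGER PGHPF_MYPROCNUM
      EXTERNAL PGHPF_MYPROCNUM
C
      ME = PGHPF_MYPROCNUM()
      IF (ME .EQ. 0) THEN
C        Processor 0 sends one 8-character item with stride 1.
         TAG(1) = 'RESULTS '
         CALL PGHPF_CSENDCHAR (1, TAG, 1, 1)
      ELSE IF (ME .EQ. 1) THEN
C        Processor 1 posts the matching receive.
         CALL PGHPF_CRECVCHAR (0, TAG, 1, 1)
      ENDIF
      RETURN
      END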
Note that pghpf_csend and pghpf_crecv do not allow a processor to send a message to itself; the code must handle this case if it can arise in the user's algorithm. For example, the pairing code above can be extended as shown here:
      me = pghpf_myprocnum()
      if (partner .eq. me) then
         y = x
      else if (partner .lt. me) then
         call pghpf_csend(partner, x, ...)
         call pghpf_crecv(partner, y, ...)
      else
         call pghpf_crecv(partner, y, ...)
         call pghpf_csend(partner, x, ...)
      endif
pghpf implementations using MPI provide the following additional routine.
C interface:
int __hpf_tid(int procnum)
tid = __hpf_tid(procnum)
Fortran interface:
integer pghpf_tid
external pghpf_tid
itid = pghpf_tid(iprocnum)
This routine translates a pghpf processor number to the processor identifier used by MPI.
pghpf implementations using PVM provide the following additional routine.
C interface:
int __hpf_tid(int procnum)
tid = __hpf_tid(procnum)
Fortran interface:
integer pghpf_tid
external pghpf_tid
itid = pghpf_tid(iprocnum)
This routine translates a pghpf processor number to the processor identifier used by PVM (the tid).
A called local HPF_LOCAL routine is similar to a called F77_LOCAL routine, except that the language of the local routine is HPF with restrictions, as specified in section 8.4, "Local Routines Written in HPF," of The High Performance Fortran Handbook. An HPF_LOCAL routine may use the underlying communication primitives or the generic pghpf send and receive routines described in the previous sections. An HPF_LOCAL routine may also use the HPF_LOCAL_LIBRARY procedures to query global arguments or to determine processor information. The supported HPF_LOCAL_LIBRARY routines are listed in Appendix C of The pghpf Reference Manual.
An HPF_SERIAL routine should be compiled using pghpf and will execute on only one processor. The caller treats the extrinsic HPF_SERIAL procedure the same as an identically coded HPF procedure, although performance may differ.
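As a minimal sketch (the REPORT routine and its arguments are hypothetical), an HPF_SERIAL extrinsic is declared with the same kind of explicit interface as the F77_LOCAL example above:

      INTERFACE
      EXTRINSIC (HPF_SERIAL) SUBROUTINE REPORT (N, X)
      INTEGER, INTENT(IN) :: N
      REAL*8, INTENT(IN) :: X(N)
      END SUBROUTINE REPORT
      END INTERFACE

Because the routine executes on only one processor, a call such as CALL REPORT (N, X) behaves as if X were wholly mapped to that processor for the duration of the call.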