The pghpf compiler supports the EXTRINSIC keyword with the F77_LOCAL, HPF_SERIAL, or HPF_LOCAL argument. Extrinsics allow an HPF programmer to call non-HPF procedures or local HPF procedures and to have procedure arguments mapped from the caller to the called procedure. The EXTRINSIC prefix on an INTERFACE definition declares the interface to use when calling the procedure; to use an extrinsic, the program must supply an explicit interface in an INTERFACE block.
A called local Fortran 77 procedure (or a C routine with matching arguments) may use the underlying communication primitives upon which the pghpf runtime is based, or it may use the generic pghpf send and receive routines. The set of generic routines listed in this section is supported on all systems and may be expanded in the future.
For example, the following code defines an interface to the DOTP_BLK procedure:
      SUBROUTINE DOTP (N, X, Y, A)
      USE HPF_LIBRARY
      INTEGER*4 N
      REAL*8 X(N), Y(N), A
C
!HPF$ DISTRIBUTE (BLOCK) :: X
!HPF$ ALIGN (:) WITH X(:) :: Y
C
      INTERFACE
      EXTRINSIC (F77_LOCAL) SUBROUTINE DOTP_BLK
     &   (RANK, SHAPE, N, X, Y, A)
      INTEGER*4, INTENT(IN) :: RANK
      INTEGER*4, INTENT(IN) :: SHAPE(RANK)
      INTEGER*4, INTENT(IN) :: N
      REAL*8, INTENT(IN) :: X(N)
      REAL*8, INTENT(IN) :: Y(N)
      REAL*8, INTENT(OUT) :: A
!HPF$ DISTRIBUTE (BLOCK) :: X
!HPF$ ALIGN (:) WITH X(:) :: Y
      END SUBROUTINE DOTP_BLK
      END INTERFACE
C
      CALL DOTP_BLK (SIZE(PROCESSORS_SHAPE()), PROCESSORS_SHAPE(),
     &   N, X, Y, A)
C
      RETURN
      END
DOTP_BLK computes the dot product of the global vectors X and Y and returns the result in A on each processor. Each processor determines which portions of X and Y it owns, computes the dot product of its local portion, and then performs the communication necessary to complete the dot product on every processor.
The called local routine DOTP_BLK is shown below. Note that because the data is distributed, the called routine must determine which data it owns and handle the communication and computation for that portion of the data.
C D O T P R O D U C T
C
C PURPOSE:
C To compute the dot product of two
C block-distributed double-precision vectors.
C
C
      SUBROUTINE DOTP_BLK (RANK, SHAPE, N, X, Y, A)
      INTEGER*4 RANK, SHAPE(RANK), N
      REAL*8 X(*), Y(*), A
C
      INCLUDE '/usr/pgi/arch/include/pglocal.f'
      INTEGER MAXCPUS
      PARAMETER (MAXCPUS = 2048)
      INTEGER MYCPU, NCPUS, COORD(7)
      INTEGER BLKSZ, MYCT
      INTEGER I, J
      DOUBLE PRECISION TA(0:MAXCPUS-1)
C
C     Get my processor number and the number of processors.
C
      MYCPU = PGHPF_MYPROCNUM()
      NCPUS = PGHPF_NPROCS()
C
C     Determine processor arrangement information.
C
      CALL PGHPF_PROCNUM_TO_COORD (MYCPU, RANK, SHAPE, COORD)
C
C     Check for error conditions.
C
      IF (RANK .NE. 1) THEN
         PRINT *, 'DOTP: Processor arrangement must be rank 1'
         STOP
      ENDIF
      IF (N .LE. 0) RETURN
C
      IF (SHAPE(1) .GT. MAXCPUS) THEN
         PRINT *, 'DOTP: # CPUs must not exceed:', MAXCPUS
         STOP
      ENDIF
C
C     Determine how many elements reside on this processor.
C
      BLKSZ = (N + SHAPE(1) - 1) / SHAPE(1)
      MYCT = MIN((N - MYCPU * BLKSZ), BLKSZ)
      MYCT = MAX(MYCT, 0)
C
C     Allocate an array to hold intermediate results and do the
C     local dot product.
C
      TA(MYCPU) = 0.0D0
      DO I = 1, MYCT
         TA(MYCPU) = TA(MYCPU) + X(I) * Y(I)
      ENDDO
C
C     Broadcast the results to all other processors.
C
      IF (SHAPE(1) .GT. 1) THEN
         DO I = 0, SHAPE(1) - 1
            IF (I .EQ. MYCPU) THEN
               DO J = 0, SHAPE(1) - 1
                  IF (J .NE. MYCPU) THEN
                     CALL PGHPF_CSEND (J, TA(MYCPU), 1, 1, PGLCL_REAL8)
                  ENDIF
               ENDDO
            ELSE
               CALL PGHPF_CRECV (I, TA(I), 1, 1, PGLCL_REAL8)
            ENDIF
         ENDDO
      ENDIF
C
C     Complete the global sum of the intermediate results.
C
      A = 0.0D0
      DO I = 0, SHAPE(1) - 1
         A = A + TA(I)
      ENDDO
C
      RETURN
      END
When using EXTRINSIC(F77_LOCAL), the extrinsic is a Fortran 77 program unit and must be compiled with a Fortran 77 compiler rather than with pghpf. The object (.o) file produced by compiling the extrinsic with the -c option is then linked with the HPF calling program by including it on the pghpf link line. For example, if DOTP_BLK.F contains the local Fortran 77 routine, compile it as follows:
% pgf77 -c DOTP_BLK.F
Then compile the HPF main program and link in the extrinsic as follows:
% pghpf DOTP.hpf DOTP_BLK.o
If desired, the local routine can instead be compiled using pghpf with the -Mnohpfc option on the compile line:
% pghpf -Mnohpfc -c DOTP_BLK.F
% pghpf DOTP.hpf DOTP_BLK.o
In addition to the underlying communication support routines and the generic routines, there are a few routines common to both; this section covers those common routines.
The pghpf_nprocs routine (__hpf_nprocs in C) returns the pghpf runtime's notion of the number of processors for the current execution of the program.
C interface:
int __hpf_nprocs()
nprocs = __hpf_nprocs()
Fortran interface:
integer pghpf_nprocs
external pghpf_nprocs
nprocs = pghpf_nprocs()
The pghpf_myprocnum routine (__hpf_myprocnum in C) returns the pghpf runtime's notion of the current processor number; this value is between 0 and number_of_processors()-1.
C interface:
int __hpf_myprocnum()
myprocnum = __hpf_myprocnum()
Fortran interface:
integer pghpf_myprocnum
external pghpf_myprocnum
myprocnum = pghpf_myprocnum()
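As a minimal sketch of how a local routine might use these two queries (the WHOAMI name is illustrative, not part of any library):

      SUBROUTINE WHOAMI ()
      INTEGER MYCPU, NCPUS
      INTEGER PGHPF_MYPROCNUM, PGHPF_NPROCS
      EXTERNAL PGHPF_MYPROCNUM, PGHPF_NPROCS
C
C     Ask the pghpf runtime for this processor's number (0-based)
C     and the total number of processors.
C
      MYCPU = PGHPF_MYPROCNUM()
      NCPUS = PGHPF_NPROCS()
      PRINT *, 'Processor', MYCPU, 'of', NCPUS
      RETURN
      END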
C interface:
void __hpf_procnum_to_coord(int procnum, int rank, int *shape, int *coord)
Fortran interface:
integer procnum, rank, shape(rank), coord(rank)
call pghpf_procnum_to_coord(procnum, rank, shape, coord)
The rank and shape arguments describe the processor grid. The pghpf processor number given by procnum is translated to grid coordinates returned in coord. Grid coordinates are integers between 1 and the size of the corresponding grid dimension. If the processor number is outside the bounds of the processor grid, zeroes are returned in coord.
C interface:
int __hpf_coord_to_procnum(int rank, int *shape, int *coord)
Fortran interface:
integer procnum, rank, shape(rank), coord(rank)
integer pghpf_coord_to_procnum
external pghpf_coord_to_procnum
procnum = pghpf_coord_to_procnum(rank, shape, coord)
The rank and shape arguments describe the processor grid. The processor grid coordinates in coord are translated to a pghpf processor number. Grid coordinates are integers between 1 and the size of the corresponding grid dimension. If the coordinates are outside the bounds of the processor grid, -1 is returned.
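For illustration only, the sketch below converts each processor number of an assumed 2 x 4 processor grid to grid coordinates and back; the GRIDMAP name and the grid shape are assumptions, not part of the interface:

      SUBROUTINE GRIDMAP ()
      INTEGER RANK
      PARAMETER (RANK = 2)
      INTEGER SHAPE(RANK), COORD(RANK)
      INTEGER I, IPROC
      INTEGER PGHPF_COORD_TO_PROCNUM
      EXTERNAL PGHPF_COORD_TO_PROCNUM
C
C     Describe a 2 x 4 processor grid (8 processors in all).
C
      SHAPE(1) = 2
      SHAPE(2) = 4
      DO I = 0, 7
C        Translate processor number I to 1-based grid coordinates,
C        then translate the coordinates back; IPROC recovers I.
         CALL PGHPF_PROCNUM_TO_COORD (I, RANK, SHAPE, COORD)
         IPROC = PGHPF_COORD_TO_PROCNUM (RANK, SHAPE, COORD)
         PRINT *, I, COORD(1), COORD(2), IPROC
      ENDDO
      RETURN
      END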
These are the generic pghpf local communication routines. They are available on all systems.
The data types for the C interface to the generic local communication routines are defined in the file /usr/pgi/arch/include/pglocal.h, where arch is your system's architecture (for example, SOLARIS, RS6000, or SGI). The data types for the Fortran interface are defined in /usr/pgi/arch/include/pglocal.f.
The pghpf_csend and pghpf_crecv routines allow the local program to send or receive non-character data. These routines block until the data is delivered.
C interface:
void __hpf_csend(int cpu, void *adr, int cnt, int str, int typ)
void __hpf_crecv(int cpu, void *adr, int cnt, int str, int typ)
Fortran interface:
integer cpu, cnt, str, typ
integer adr(*)
call pghpf_csend(cpu, adr, cnt, str, typ)
call pghpf_crecv(cpu, adr, cnt, str, typ)
The cpu argument is the pghpf processor number of the remote partner, adr is the local data address, cnt is the number of data items to transfer, typ is the data type, and str is the stride between items in the local array (in item units).
For performance reasons, data transferred by pghpf_csend and pghpf_crecv is no longer buffered as it was in releases of pghpf older than 2.0, so programs that ran under pghpf 1.3 may hang with release 2.0 or newer. The solution is to change the F77_LOCAL routine so that processors "pair off" when exchanging messages: when one processor calls pghpf_csend, its partner must call pghpf_crecv. A simple way to decide which processor sends first is to compare processor numbers, for example:
old:
      call pghpf_csend(partner, x, ...)
      call pghpf_crecv(partner, y, ...)
new:
      me = pghpf_myprocnum()
      if (partner .lt. me) then
         call pghpf_csend(partner, x, ...)
         call pghpf_crecv(partner, y, ...)
      else
         call pghpf_crecv(partner, y, ...)
         call pghpf_csend(partner, x, ...)
      endif
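As a concrete sketch of this pairing pattern with the elided arguments filled in (the EXCHANGE name and the single REAL*8 item per transfer are assumptions; the case where PARTNER equals ME is addressed below):

      SUBROUTINE EXCHANGE (PARTNER, X, Y)
      INTEGER PARTNER
      REAL*8 X, Y
C     pglocal.f is assumed to supply the PGLCL_* type codes (and, as
C     in DOTP_BLK above, the runtime function declarations).
      INCLUDE '/usr/pgi/arch/include/pglocal.f'
      INTEGER ME
C
      ME = PGHPF_MYPROCNUM()
      IF (PARTNER .LT. ME) THEN
C        The higher-numbered processor of the pair sends first.
         CALL PGHPF_CSEND (PARTNER, X, 1, 1, PGLCL_REAL8)
         CALL PGHPF_CRECV (PARTNER, Y, 1, 1, PGLCL_REAL8)
      ELSE
C        The lower-numbered processor receives first.
         CALL PGHPF_CRECV (PARTNER, Y, 1, 1, PGLCL_REAL8)
         CALL PGHPF_CSEND (PARTNER, X, 1, 1, PGLCL_REAL8)
      ENDIF
      RETURN
      END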
The pghpf_csendchar and pghpf_crecvchar routines send or receive character data. These routines block until the data is delivered.
Fortran interface:
integer cpu, cnt, str
character*(*) adr(*)
call pghpf_csendchar(cpu, adr, cnt, str)
call pghpf_crecvchar(cpu, adr, cnt, str)
The cpu argument is the pghpf processor number of the remote partner, adr is the local data address, cnt is the number of character items to transfer, and str is the stride between items in the local character array (in item units). Each character item is a fixed-length sequence of characters.
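A minimal sketch of a character transfer between two processors (the TAGPASS name, the 8-character item length, and the fixed partner numbers 0 and 1 are assumptions):

      SUBROUTINE TAGPASS ()
      CHARACTER*8 TAG(1)
      INTEGER ME
      INTEGER PGHPF_MYPROCNUM
      EXTERNAL PGHPF_MYPROCNUM
C
      ME = PGHPF_MYPROCNUM()
      IF (ME .EQ. 0) THEN
C        Processor 0 sends one 8-character item with stride 1.
         TAG(1) = 'RESULTS '
         CALL PGHPF_CSENDCHAR (1, TAG, 1, 1)
      ELSE IF (ME .EQ. 1) THEN
C        Processor 1 posts the matching receive.
         CALL PGHPF_CRECVCHAR (0, TAG, 1, 1)
      ENDIF
      RETURN
      END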
Note that pghpf_csend and pghpf_crecv do not allow a processor to send a message to itself; the code must handle this case if it can arise in the user's algorithm. For example, the pairing code above can be extended as shown here:
      me = pghpf_myprocnum()
      if (partner .eq. me) then
         y = x
      else if (partner .lt. me) then
         call pghpf_csend(partner, x, ...)
         call pghpf_crecv(partner, y, ...)
      else
         call pghpf_crecv(partner, y, ...)
         call pghpf_csend(partner, x, ...)
      endif
pghpf implementations using MPI provide the following additional routine.
C interface:
int __hpf_tid(int procnum)
tid = __hpf_tid(procnum)
Fortran interface:
integer pghpf_tid
external pghpf_tid
itid = pghpf_tid(iprocnum)
This routine translates a pghpf processor number to the processor identifier used by MPI.
pghpf implementations using PVM provide the following additional routine.
C interface:
int __hpf_tid(int procnum)
tid = __hpf_tid(procnum)
Fortran interface:
integer pghpf_tid
external pghpf_tid
itid = pghpf_tid(iprocnum)
This routine translates a pghpf processor number to the processor identifier used by PVM (the tid).
A called local HPF_LOCAL routine is similar to a called F77_LOCAL routine, except that the language of the local routine is HPF with restrictions, as specified in section 8.4, "Local Routines Written in HPF," of The High Performance Fortran Handbook. An HPF_LOCAL routine may use the underlying communication primitives or the generic pghpf send and receive routines described in the previous sections. An HPF_LOCAL routine may also use the HPF_LOCAL_LIBRARY procedures to query global arguments or to determine processor information. The supported HPF_LOCAL_LIBRARY routines are listed in Appendix C of The pghpf Reference Manual.
An HPF_SERIAL routine should be compiled using pghpf and will execute on only one processor. The caller treats the extrinsic HPF_SERIAL procedure the same as an identically coded HPF procedure, although performance may differ.
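As a minimal sketch (the REPORT routine and its arguments are hypothetical), an HPF_SERIAL extrinsic is declared with the same kind of explicit interface as the F77_LOCAL example above:

      INTERFACE
      EXTRINSIC (HPF_SERIAL) SUBROUTINE REPORT (N, X)
      INTEGER, INTENT(IN) :: N
      REAL*8, INTENT(IN) :: X(N)
      END SUBROUTINE REPORT
      END INTERFACE

Because the routine executes on only one processor, a call such as CALL REPORT (N, X) behaves as if X were wholly mapped to that processor for the duration of the call.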