Re: NEC-LIST: NEC2 on a Supercomputer

From: Ulrich Jakobus <u.jakobus_at_email.domain.hidden>
Date: Thu, 30 Nov 2000 18:04:55 +0200


This reply comes quite late, but since I haven't seen an answer before
on the list, I hope it is still useful.

On Fri, 10 Nov 2000 13:22:25 +0100, Abraham Rubinstein wrote:

>I intend to compile NEC2 on the Swiss-T1, a supercomputer developed by
>Federal Institute of Technology. The prototype has 64 Gflop/s peak
>performance and a total of 36 GByte of RAM.
>I have compiled NEC2 with Compaq Visual Fortran using LAPACK. For this
>supercomputer, we have MPI (Message Passing Interface), in order to
>take advantage of the parallel processing. I have found 2 instructions
>in MPI: PZGETRF and PZGETRS. These two instructions are the
>equivalents for the ZGETRF and ZGETRS instructions contained in
>LAPACK, the ones used to optimize NEC2 performance.
>The question is: does anyone knows if this is true?

In principle, you are right. There are the LAPACK functions ZGETRF and
ZGETRS, and the corresponding parallel equivalents in the ScaLAPACK
library are PZGETRF and PZGETRS (in the out-of-core prototype of
ScaLAPACK, there are even functions PFZGETRF and PFZGETRS for an
out-of-core solution). More information on ScaLAPACK and the contained
function interfaces can be found at

>Has anyone ever worked with MPI?

Yes, our company has long experience porting a MoM code to different
parallel machines (Linux clusters, HP, SUN, SGI SMPs, but also
massively parallel supercomputers such as CRAY T3E or NEC SX/5), and
from this experience ...

>Could it be as simple as changing FACTR and SOLVE
>routines in NEC2 and then changing the names of the LAPACK
>instructions with the names of the MPI instructions?

... I have to tell you that it is not that simple. Even if for large
problems you want to parallelise only the matrix solution and not the
setup of the MoM matrix or the near- and far-field computation etc.,
the subroutines PZGETRF and PZGETRS you are referring to above assume
that your MoM matrix is distributed in memory according to a certain
scheme amongst the various processes (e.g. a two-dimensional block
cyclic distribution with a certain block size). I don't know whether
this Swiss-T1 computer is actually a shared or distributed memory
parallel machine, but in general you must make sure, already during
the MoM setup, that each node stores the matrix elements assigned to
it according to the specified scheme (ScaLAPACK is actually based on
BLACS for this; information on BLACS can be found at
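
To make the distribution scheme a bit more concrete, here is a minimal
sketch of how a two-dimensional block cyclic layout maps a global matrix
element to the process that stores it. It is only an illustration in
Python, not code from NEC2 or ScaLAPACK: the function names, the 0-based
indexing and the assumption that the first block sits on process (0, 0)
are all chosen here for the example.

```python
def block_cyclic_owner(g, nb, nprocs):
    """Map a 0-based global index g to (process coordinate, local index).

    nb is the block size and nprocs the number of processes along this
    dimension of the process grid. Assumes the first block belongs to
    process 0 (an assumption of this sketch).
    """
    block = g // nb                      # which block the index falls into
    proc = block % nprocs                # blocks are dealt out cyclically
    local = (block // nprocs) * nb + g % nb  # position in the local array
    return proc, local


def owner_of_element(i, j, mb, nb, prow, pcol):
    """Locate global element (i, j) on a prow x pcol process grid.

    mb x nb is the block size. Returns ((process row, process column),
    (local row, local column)), all 0-based.
    """
    pr, li = block_cyclic_owner(i, mb, prow)
    pc, lj = block_cyclic_owner(j, nb, pcol)
    return (pr, pc), (li, lj)


if __name__ == "__main__":
    # Show which process owns each element of a small 6 x 6 matrix
    # distributed in 2 x 2 blocks over a 2 x 3 process grid.
    for i in range(6):
        print(" ".join("%d%d" % owner_of_element(i, j, 2, 2, 2, 3)[0]
                       for j in range(6)))
```

During the matrix setup each process would then only compute and store
those elements (i, j) for which the returned process coordinates match
its own position in the grid.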

Hope this helps a bit

Ulrich Jakobus

Dr.-Ing. Ulrich Jakobus Electromagnetic Software & Systems (EMSS)
Postal address: PO Box 1354, Stellenbosch 7599, South Africa
Visiting addr.: Quantum House, Quantum Street 6, Technopark,
                Stellenbosch 7600, South Africa
Phone: +27 21 880-1880 Fax: +27 21 880-1936 Cell: +27 82 788-3729
E-Mail: WWW:
Received on Thu Nov 30 2000 - 12:46:07 EST
