NEC-LIST: Using LAPACK with NEC

From: Tom Wallace <twallace_at_email.domain.hidden>
Date: Mon, 29 Nov 1999 13:20:51 -0500

Over the past year or two, several people have independently realized
that replacing NEC's linear algebra routines (FACTR and SOLVE) with
routines from the LAPACK linear algebra library is both extremely
simple and very rewarding. I've put a little description together,
with benchmarks and example code, to show how beneficial this can be.

LAPACK is a standardized set of routines for linear algebra that has
been developed to provide high quality, high speed implementations of
commonly needed functions. There are FORTRAN versions available on the
net at http://www.netlib.org/lapack, as well as a FAQ
(http://www.netlib.org/lapack/faq.html).

The big payoff from using LAPACK comes from using a version of the
library optimized for your processor, rather than simply compiling the
generic FORTRAN versions. Because it is the standard library for
linear algebra, all workstation and supercomputer vendors provide an
optimized version of LAPACK for their machines; some of them charge
for it (e.g., Sun, which provides it as part of the Sun Performance
Workshop) and some of them give it away (e.g., SGI, which lets you
download the SCSL Scientific Library for free).

Luckily for the many people using PCs under Windows 9X or NT, Intel
has a free version which supports Digital Visual Fortran (and the
Intel Fortran Compiler, which isn't terribly popular). It's called the
Intel Math Kernel Library, and it can be downloaded from
http://www.intel.com/vtune/perflib/mkl/index.htm. [Linux versions are
available from http://www.cs.utk.edu/~ghenry/distrib/ - mod.]

What can an optimized LAPACK interface do for you? Here's a list of
runtimes for NEC problems of various sizes, using the original NEC
code and a version modified to use Intel's optimized LAPACK
library. As you can see, the benefits are dramatic:

 Size (segments)   Original   LAPACK (1 proc.)   LAPACK (2 proc.)
       400             4.0          1.9                1.7
       800            25.4         10.0                7.8
      1600           243           60                 43
      3200          1917          382                244

All of these problems had no symmetry, and were run on the same dual
processor 400 MHz Pentium II.

Note that using an optimized library may also allow you to use
multiple processors if you have them (the Intel library does this, but
only under NT, of course). The speed increase is significant, and it
gets larger as the problem size increases. This is because using
LAPACK speeds up the factoring of the interaction matrix (the FACTOR
time reported by NEC), and that time increases as the cube of the
number of segments. For 4000-5000 segment problems, the speed
improvement can easily be a factor of 10.
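
As a rough illustration of that trend, the short program below works
out the speedup ratios from the table, and also how much the original
run time grows each time the segment count doubles (a pure N**3 step
would give a factor of 8). It is only a sketch: the program name and
array names are made up for this example, and the times are simply
copied from the table.

```fortran
      PROGRAM SPEEDUP
C     Illustrative only: the times are copied from the table above.
      INTEGER I, NSEG(4)
      REAL TORIG(4), TLAP1(4), TLAP2(4)
      DATA NSEG  /400, 800, 1600, 3200/
      DATA TORIG /4.0, 25.4, 243.0, 1917.0/
      DATA TLAP1 /1.9, 10.0, 60.0, 382.0/
      DATA TLAP2 /1.7, 7.8, 43.0, 244.0/
C     Speedup of each LAPACK build relative to the original code.
      DO 10 I = 1, 4
        WRITE(*,100) NSEG(I), TORIG(I)/TLAP1(I), TORIG(I)/TLAP2(I)
   10 CONTINUE
C     Growth of the original run time for each doubling of the
C     segment count; a pure N**3 step would give a factor of 8.
      DO 20 I = 2, 4
        WRITE(*,100) NSEG(I), TORIG(I)/TORIG(I-1)
   20 CONTINUE
  100 FORMAT(1X,I6,2F8.2)
      END
```

From the table, the one-processor speedup goes from about 2 at 400
segments to about 5 at 3200 segments, and the two-processor speedup
reaches nearly 8.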

How do you get this to work? You will need the NEC source code, and
you must already be able to compile it into a working version of the
original program. You will also need to locate a
LAPACK library compatible with your compiler, or choose a compiler for
which an optimized LAPACK library is available (check the LAPACK FAQ
under Vendor-Supplied BLAS; many of those also provide the LAPACK
library). If you use Digital Fortran on a PC, for example, get the
Intel library; on a Mac, Absoft provides a LAPACK library with Pro
Fortran 6.0.

If you use a Mac or a PC, you may find that your favorite compiler
doesn't have an optimized LAPACK library; it's up to you to decide
whether the speed improvement is worth changing compilers. You may
also download the FORTRAN versions from http://www.netlib.org/ and
compile them (there are a LARGE number of files), but don't expect the
dramatic speed improvements you can get with an optimized library.

Take your copy of NEC and locate the FACTR and SOLVE subroutines. Cut
SUBROUTINE FACTR and SUBROUTINE SOLVE (only!) out of the file and save
them in a separate file. Don't modify any other routines (like FACTRS
or SOLGF)!

Next, copy the two replacement routines below into another file:

- Cut Here -

      SUBROUTINE FACTR (N,A,IP,NDIM)
C
C Performs LU decomposition of the COMPLEX*16 matrix A, using
C routines from the LAPACK linear algebra library.
C
C Although these implementations of FACTR and SOLVE are
C functionally equivalent to the original NEC routines, they
C do not produce identical intermediate results. This means
C you MUST replace both FACTR and SOLVE in the NEC code, and
C recompute any numerical Green's function files, for NEC to
C operate correctly.
C
      INTEGER N,NDIM,IP(*)
      COMPLEX*16 A(NDIM,*)
      INTEGER INFO
      CALL ZGETRF(N,N,A,NDIM,IP,INFO)
      IF(INFO .NE. 0) THEN
        WRITE(3,*) ' ERROR IN FACTR'
        STOP
      ENDIF
      RETURN
      END

      SUBROUTINE SOLVE (N,A,IP,B,NDIM)
C
C Solves the linear equation transpose(A) * x = b (which is what
C NEC wants) from the LU decomposition computed by FACTR, using
C LAPACK.
C
      INTEGER N,NDIM,IP(*)
      COMPLEX*16 A(NDIM,*),B(*)
      INTEGER INFO
      CALL ZGETRS('T',N,1,A,NDIM,IP,B,NDIM,INFO)
      IF(INFO .NE. 0) THEN
        WRITE(3,*) ' ERROR IN SOLVE'
        STOP
      ENDIF
      RETURN
      END

- Cut Here -

READ the warning in the source code! You must recompute any NGF files
before you can use the LAPACK interface version! (And don't use an NGF
problem as a test problem below.)

Now, you can easily make two versions of NEC: one using the original
routines, by linking the main NEC file (FACTR and SOLVE removed) with
the original FACTR/SOLVE file, and a LAPACK version, by linking the
main NEC file with the LAPACK interface file and the LAPACK
library. Run the two versions on a simple test problem. The output
files should be identical except for the lines giving the run
times. (Of course, we also hope the run times are significantly
shorter for the LAPACK version.) That's it -- you should have a faster
version of NEC!

Of course, you must do careful testing using both versions before you
start using the LAPACK version in production work -- it is YOUR
responsibility to verify that the interface is operating properly on
your system.
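
One possible way to exercise the interface on its own, before running
full NEC problems, is a tiny driver along the lines of the sketch
below. It is not part of NEC: the program name TSTSLV, the 3x3 test
matrix, and the array sizes are all made up for illustration, and it
assumes your compiler accepts DCMPLX and the generic ABS on COMPLEX*16
values (common extensions). It factors a small non-symmetric matrix
with FACTR, solves with SOLVE, and prints the largest residual of
transpose(A) * x = b.

```fortran
      PROGRAM TSTSLV
C     Hypothetical check of the FACTR/SOLVE interface; link with
C     those two routines and a LAPACK library.
      INTEGER N, NDIM
      PARAMETER (N=3, NDIM=5)
C     B is dimensioned NDIM because SOLVE passes NDIM as its
C     leading dimension to ZGETRS.
      COMPLEX*16 A(NDIM,NDIM), A0(N,N), B(NDIM), B0(N), R
      DOUBLE PRECISION RESMAX
      INTEGER IP(NDIM), I, J
C     Small diagonally dominant, non-symmetric test matrix.
      DO 20 J = 1, N
        DO 10 I = 1, N
          A0(I,J) = DCMPLX(DBLE(I), DBLE(J-I))
          IF (I .EQ. J) A0(I,J) = A0(I,J) + 10.0D0
          A(I,J) = A0(I,J)
   10   CONTINUE
        B0(J) = DCMPLX(1.0D0, DBLE(J))
        B(J) = B0(J)
   20 CONTINUE
      CALL FACTR(N, A, IP, NDIM)
      CALL SOLVE(N, A, IP, B, NDIM)
C     Residual of transpose(A0)*x = b0: row I of transpose(A0) is
C     column I of A0, hence the A0(J,I) below.
      RESMAX = 0.0D0
      DO 40 I = 1, N
        R = -B0(I)
        DO 30 J = 1, N
          R = R + A0(J,I)*B(J)
   30   CONTINUE
        RESMAX = MAX(RESMAX, ABS(R))
   40 CONTINUE
      WRITE(*,*) 'MAX RESIDUAL = ', RESMAX
      END
```

If the interface is working, the printed residual should be down at
the level of double precision roundoff. Because the test matrix is not
symmetric, a mistake such as solving with A instead of transpose(A)
should show up as a residual that is far from zero.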

I hope that this is beneficial to the NEC user community; I think it's
an important update to NEC that should save many users many hundreds
of hours of waiting for their runs to finish.

---------------------------------------------------------------
  Tom Wallace (twallace_at_apti.com) phone: (202) 223-8808
  Advanced Power Technologies, Inc. fax: (202) 223-1377
  1250 24th St. NW, Suite 850
  Washington, DC 20037
Received on Mon Nov 29 1999 - 20:01:43 EST