NEC-LIST: NEC4 Optimum Desktop Platform

From: Carlier, Paul (E-mail) <pcarlier_at_email.domain.hidden>
Date: Tue, 6 Feb 2001 12:17:44 -0000

Dear Fellow CEMists,

This is in response to John Wood's enquiry re a platform for running
NEC with many segments.

I have been experimenting with many PC-based platforms, using the
LAPACK routines from the Intel Math Kernel Library to factor and solve
the matrix, and have obtained some very impressive execution times.
When time permits, I hope to write up more detailed notes, but for the
time being I can give you the following information.

A Pentium III 800MHz single-processor PC running NEC2 compiled with
Compaq (formerly Digital) Visual Fortran, using the LAPACK routines
instead of the original FACTR and SOLVE, runs 11,000 segments (no
symmetry) in about 2 hours. To run this in double precision you need
2GB of RAM, but you can run it in single precision with only 1GB. For
most applications, single precision is enough. I find that in any case
where there is more than a negligible difference between the double
and single precision results, the model is badly conditioned and the
results (DP or SP) are suspect.
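As a rough check on these RAM figures, here is a minimal sketch that assumes the full N x N complex interaction matrix dominates memory use (all other NEC2 arrays are ignored) and that it is held as COMPLEX*8 in single precision or COMPLEX*16 in double precision, the types the LAPACK complex LU routines (cgetrf/cgetrs and zgetrf/zgetrs) work on. Which LAPACK routines the modified NEC2 actually calls is an assumption here; the function name matrix_bytes is hypothetical.

```python
# Rough memory estimate for the dense NEC2 interaction matrix (no symmetry).
# Assumption: the N x N complex matrix dominates; other arrays are ignored.

BYTES_PER_COMPLEX = {
    "single": 8,   # COMPLEX*8: two 4-byte reals (e.g. LAPACK cgetrf/cgetrs)
    "double": 16,  # COMPLEX*16: two 8-byte reals (e.g. LAPACK zgetrf/zgetrs)
}


def matrix_bytes(segments, precision):
    """Bytes needed to hold a full segments x segments complex matrix."""
    return segments * segments * BYTES_PER_COMPLEX[precision]


if __name__ == "__main__":
    n = 11_000
    for prec in ("single", "double"):
        gb = matrix_bytes(n, prec) / 1e9
        # prints 0.97 GB for single and 1.94 GB for double precision
        print(f"{n} segments, {prec} precision: {gb:.2f} GB")
```

The results, about 0.97GB and 1.94GB, line up with the 1GB and 2GB quoted above, since the matrix grows as N squared and doubling the precision doubles the bytes per element.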

The motherboard I was using has a front side bus speed of 133MHz and
the SDRAM was standard 168-pin PC133 DIMMs (not ECC registered). Using
PC100 SDRAM adds about 10% to the execution time.

With UK prices for generic PC133 SDRAM at around US$100 for a 256MB
DIMM, the problem is not getting enough RAM but finding a motherboard
that will hold the requisite amount. Most have only 3 DIMM slots,
although some are available with 4. You also have to look out for the
maximum amount of RAM that the board will support. This may be only
1.5GB, even for a 4-slot board. You pay a premium for 512MB DIMMs,
which work out at about US$500 each in the UK, but these should be
dropping in price.
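The slot-count and board-limit trade-off above can be explored with a small brute-force search. This is a sketch only: the prices are the approximate UK figures quoted in this post, the function name cheapest_fit is hypothetical, and it assumes that a configuration whose total exceeds the board's stated maximum is simply not allowed.

```python
from itertools import combinations_with_replacement

# Approximate UK prices quoted in the post (US$ per DIMM, early 2001).
DIMM_PRICES = {256: 100, 512: 500}  # DIMM size in MB -> price


def cheapest_fit(target_mb, slots, board_max_mb, prices=DIMM_PRICES):
    """Return (cost, dimms) for the cheapest DIMM set that reaches
    target_mb, or None if the slot count or board limit rules it out.
    Assumption: totals above board_max_mb are treated as unusable."""
    best = None
    for count in range(1, slots + 1):
        for dimms in combinations_with_replacement(sorted(prices), count):
            total = sum(dimms)
            if total < target_mb or total > board_max_mb:
                continue
            cost = sum(prices[d] for d in dimms)
            if best is None or cost < best[0]:
                best = (cost, dimms)
    return best


if __name__ == "__main__":
    # 1GB target (single precision, 11,000 segments) on two board types.
    print(cheapest_fit(1024, slots=3, board_max_mb=1536))
    print(cheapest_fit(1024, slots=4, board_max_mb=2048))
    # 2GB target (double precision) on a 3-slot, 1.5GB board: not possible.
    print(cheapest_fit(2048, slots=3, board_max_mb=1536))
```

With these prices, a 3-slot board needs one expensive 512MB DIMM to reach 1GB (US$700 in total), whereas a 4-slot board gets there with four 256MB DIMMs for US$400. No 3-slot, 1.5GB board can reach the 2GB needed for double precision.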

The machine quoted above was running Windows 98 SE. Windows 2000 will
give a slight improvement, although not a great deal. It will, however,
allow the use of dual Pentium processors.

The interesting thing I have discovered is that the Intel optimised
LAPACK routines work just as well on AMD's Athlon Thunderbird
processor as they do on the Pentium. I think that this is because
most of the improvement comes from the optimised use of the on-die
cache, of which the Athlon has at least as much as the Pentium. The
following results are all for a standard 4000 segment test run:

    Processor                             Fill (sec)   Factor (sec)
    Pentium III 800MHz                        50           323
    Athlon 900MHz                             39           235
    Pentium 450MHz                            76           537
    Celeron 466MHz                           200           800
    Pentium III 800MHz, dual processor        45           290
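To relate the 4000-segment Pentium III 800MHz figures to the 11,000-segment run quoted earlier, here is a rough extrapolation. It assumes the matrix fill scales as N squared and the LU factorisation as N cubed (the usual dense-matrix costs), ignores the SOLVE step and any cache or paging effects at larger N, and assumes both runs used the same precision, which the post does not state. The function name extrapolate is hypothetical.

```python
# Rough run-time extrapolation, assuming fill ~ N^2 and factor ~ N^3.


def extrapolate(fill_s, factor_s, n_ref, n_new):
    """Scale measured fill/factor times from n_ref to n_new segments."""
    ratio = n_new / n_ref
    return fill_s * ratio**2, factor_s * ratio**3


if __name__ == "__main__":
    # Pentium III 800MHz, standard 4000-segment test run (from the post).
    fill, factor = extrapolate(50, 323, 4000, 11_000)
    total_h = (fill + factor) / 3600
    # prints: fill ~378 s, factor ~6717 s, total ~2.0 h
    print(f"fill ~{fill:.0f} s, factor ~{factor:.0f} s, total ~{total_h:.1f} h")
```

The estimate of roughly 2 hours is consistent with the 11,000-segment timing given earlier, and it shows that at this size the factorisation accounts for almost all of the run time.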

The Celeron figures are a bit slower than they should be because the
motherboard was underperforming for some reason. I am still looking
into this, but other tests have shown that the Celeron is not much
slower than a Pentium of equivalent clock speed.

The dual-processor Pentium was only fractionally faster than the
single because the EXE file was not compiled for a dual-processor
machine. Processor utilisation was only 53% throughout the run. I
think that most of its speed advantage over the single-processor
Pentium was due to the fact that it was running Windows 2000.

The Athlon figures look good. For equivalent clock speed, the Athlon
is generally slightly ahead of the Pentium in the standard benchmark
tests, due to its slightly faster FPU. The Athlon is generally
cheaper (I could get a 1.1GHz Athlon PC for the price of a similar
spec P800 machine), but I believe that the only motherboards available
for it have no more than 3 DIMM slots, which limits the maximum RAM to
1.5GB with 512MB DIMMs (or 768MB with the more readily available 256MB
DIMMs).
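One rough way to compare the 4000-segment results per clock is to multiply each factor time by the clock frequency, giving the number of cycles (on one CPU) spent factoring; fewer cycles means more work per clock. This sketch uses only the figures tabulated above; the "Gcycles" measure and the function names are illustrative, not from the post. It counts one CPU for the dual machine because the executable was not built for two processors, and it ignores memory bus differences and the Celeron board problem mentioned above, so treat it as indicative only.

```python
# 4000-segment benchmark figures quoted in the post:
# (machine, clock in MHz, fill seconds, factor seconds)
RESULTS = [
    ("Pentium III 800MHz", 800, 50, 323),
    ("Athlon 900MHz", 900, 39, 235),
    ("Pentium 450MHz", 450, 76, 537),
    ("Celeron 466MHz", 466, 200, 800),
    ("Pentium III 800MHz dual", 800, 45, 290),
]


def factor_gcycles(mhz, factor_s):
    """Billions of clock cycles (one CPU) spent in the factor step."""
    return mhz * 1e6 * factor_s / 1e9


def speedup(ref_s, new_s):
    """How many times faster new_s is than ref_s."""
    return ref_s / new_s


if __name__ == "__main__":
    _, _, ref_fill, ref_factor = RESULTS[0]
    ref_total = ref_fill + ref_factor
    for name, mhz, fill, factor in RESULTS:
        print(f"{name:24s} factor {factor_gcycles(mhz, factor):6.1f} Gcycles, "
              f"total {speedup(ref_total, fill + factor):.2f}x vs P-III 800")
```

On these numbers the Athlon needs about 211.5 billion cycles for the factor step against 258.4 billion for the Pentium III 800MHz, in line with its per-clock lead. The dual-processor machine is only about 1.11 times faster overall (373 sec against 335 sec), as expected from a single-threaded build.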

I hope that this information is of use. Please feel free to ask any
further questions, which I shall do my best to answer.

Regards,

Paul Carlier
FanField Ltd.
Braxted Park
Witham
Essex CM8 3XB England
(Tel. +44 (0) 1621 893500)
www.fanfield.co.uk
Received on Fri Feb 09 2001 - 14:29:29 EST