Robert Harley (Robert.Harley@inria.fr)
Sun, 21 Feb 1999 17:02:55 +0100 (MET)
We have a problem with sqrt on Alpha Linux.
1. The original square root gave bogus answers for very small inputs.
2. The current one is ten times too slow.
3. The fast one by Wesner & Goto doesn't get the last bit correct.
Ideally we should have one that's correct (well duh), fast, and gets
the last bit right (as required by the IEEE standard, so we can
replace the default one in libm). Then most people will be happy and
those who want to lose last-bit accuracy in exchange for the extra bit
of speed can still do so.
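One way to check whether an implementation gets the last bit right is to compare its output against a correctly rounded reference and count how many representable doubles apart the two results are. For non-negative finite doubles, the IEEE 754 bit patterns read as unsigned 64-bit integers are ordered the same way as the values, so that distance is simply the difference of the patterns. Here is a minimal sketch. The name `ulp_distance` is hypothetical and not from the original code, and negative values and NaNs are deliberately not handled.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: number of one-ulp steps between a and b.
   Assumes a and b are non-negative and finite (+0 is fine, -0 and NaN are not). */
uint64_t ulp_distance(double a, double b)
{
    uint64_t ia, ib;
    /* memcpy reinterprets the bits without violating aliasing rules */
    memcpy(&ia, &a, sizeof ia);
    memcpy(&ib, &b, sizeof ib);
    /* For non-negative doubles, bit patterns are monotonic in the value */
    return ia > ib ? ia - ib : ib - ia;
}
```

A distance of 0 means the last bit is right. An error confined to the last bit shows up as a distance of 1.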
Anyway, we should grab this particular bull by the horns, IMHO.
In the next messages I'll send a C implementation (using 64-bit longs)
and an ASM one (for Alpha of course).
I left out the code to handle negative inputs, NaNs and so on for now
to avoid getting distracted by them.
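For a concrete picture of a square root built from 64-bit integer arithmetic, here is a minimal sketch. It is not the code from the follow-up messages. The hypothetical `cr_sqrt` computes a correctly rounded (round-to-nearest only) square root of a positive finite double using the classic bit-by-bit restoring method, the same basic idea fdlibm's `e_sqrt.c` uses with 32-bit words. Like the original, it leaves out negative inputs, zeros, infinities and NaNs, and an `assert` marks that precondition. Subnormal inputs (the "very small inputs" of item 1) are normalised first.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reference: correctly rounded sqrt (round-to-nearest) of a
   positive finite double, using only 64-bit integer arithmetic. */
double cr_sqrt(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    /* Special cases (zero, negatives, infinities, NaNs) are left out. */
    assert(x > 0.0 && (bits >> 52) < 0x7FF);

    int e = (int)(bits >> 52);          /* sign is 0, so this is the biased exponent */
    uint64_t m = bits & ((UINT64_C(1) << 52) - 1);

    if (e == 0) {                       /* subnormal: shift until the leading 1 is at bit 52 */
        e = 1;
        while (!(m & (UINT64_C(1) << 52))) {
            m <<= 1;
            e--;
        }
    } else {
        m |= UINT64_C(1) << 52;         /* restore the implicit leading 1 */
    }

    /* Now x = m * 2^(E-52) with m in [2^52, 2^53). Make E even so it halves exactly. */
    int E = e - 1023;
    if (E % 2 != 0) {
        m <<= 1;
        E -= 1;
    }

    /* Restoring square root of N = m * 2^54, consuming two bits of N per step:
       27 pairs from m, then 27 zero pairs. r ends as floor(sqrt(N)), in [2^53, 2^54),
       and rem stays below 2^55, so everything fits in 64 bits. */
    uint64_t r = 0, rem = 0;
    for (int i = 53; i >= 0; i--) {
        uint64_t pair = (i >= 27) ? (m >> (2 * (i - 27))) & 3 : 0;
        rem = (rem << 2) | pair;
        uint64_t trial = (r << 2) | 1;  /* (2r+1)^2 - (2r)^2 */
        r <<= 1;
        if (rem >= trial) {
            rem -= trial;
            r |= 1;
        }
    }

    /* The low bit of r is the round bit. An exact tie is impossible, because
       N is even and so cannot be the square of an odd r. Rounding half up
       is therefore round-to-nearest. q is in [2^52, 2^53]. */
    uint64_t q = (r + 1) >> 1;

    /* Result is q * 2^(E/2 - 52). Adding q (implicit 1 at bit 52) to
       (biased exponent - 1) << 52 lets a rounding carry reach the exponent. */
    uint64_t out = ((uint64_t)(E / 2 + 1023 - 1) << 52) + q;
    double y;
    memcpy(&y, &out, sizeof y);
    return y;
}
```

Because it produces one bit per iteration, this is far slower than the roughly 100 cycles being aimed for here. Its role would be as a slow but exact reference to test a fast version against, for example with a bit-pattern comparison of the two results.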
What I'm hoping is that some of you will see ways to simplify and/or
speed up the code. I've tried tuning the ASM a bit, but I don't
understand the 21164's scheduling well. For instance, there's one
place where I have an FNOP, and putting a UNOP there instead costs five
extra cycles. I've no idea why!
The total time seems to be 101 or 103 cycles or thereabouts. Anyone
want to knock off a few cycles to get under the 100-cycle barrier?
This archive was generated by hypermail 2.0b3 on Sun Feb 21 1999 - 09:00:26 PST