Copyright 1999, 2001, 2002, 2004 Free Software Foundation, Inc.

This file is part of the GNU MP Library.

The GNU MP Library is free software; you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as published by
the Free Software Foundation; either version 3 of the License, or (at your
option) any later version.

The GNU MP Library is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public
License for more details.

You should have received a copy of the GNU Lesser General Public License
along with the GNU MP Library.  If not, see http://www.gnu.org/licenses/.




This directory contains mpn functions for 64-bit PA-RISC 2.0.

PIPELINE SUMMARY

The PA8x00 processors have an orthogonal 4-way out-of-order pipeline.  Each
cycle two ALU operations and two MEM operations can issue, but just one of the
MEM operations may be a store.  The two ALU operations can be almost any
combination of non-memory operations.  Unlike on every other processor,
integer and fp operations are completely equal here; they both count as just
ALU operations.

Unfortunately, some operations cause hiccups in the pipeline.  Combining
carry-consuming operations like ADD,DC with operations that do not set carry,
like ADD,L, causes long delays.  Skip operations also seem to cause hiccups.
If several ADD,DC are issued consecutively, or if a plain carry-generating ADD
feeds ADD,DC, stalling does not occur.  We can effectively issue two ADD,DC
operations/cycle.

Latency scheduling is not as important as making sure to have a mix of ALU and
MEM operations, but for full pipeline utilization, it is still a good idea to
do some amount of latency scheduling.

As on all other processors, read-after-write (RAW) memory scheduling is
critically important.  Since integer multiplication takes place in the
floating-point unit, the GMP code needs to handle this problem frequently.

STATUS

* mpn_lshift and mpn_rshift run at 1.5 cycles/limb on PA8000 and at 1.0
  cycles/limb on PA8500.  With latency scheduling, the numbers could
  probably be improved to 1.0 cycles/limb for all PA8x00 chips.

* mpn_add_n and mpn_sub_n run at 2.0 cycles/limb on PA8000 and at about
  1.6875 cycles/limb on PA8500.  With latency scheduling, this could
  probably be improved to get close to 1.5 cycles/limb.  A problem is the
  stalling of carry-inputting instructions after instructions that do not
  write to carry.

* mpn_mul_1, mpn_addmul_1, and mpn_submul_1 run at between 5.625 and 6.375
  cycles/limb on PA8500 and later, and about a cycle/limb slower on older
  chips.  The code uses ADD,DC for adjacent limbs, and relies heavily on
  reordering.


REFERENCES

Hewlett Packard, "64-Bit Runtime Architecture for PA-RISC 2.0", version 3.3,
October 1997.
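

CARRY CHAIN SKETCH

The carry dependency mentioned under STATUS for mpn_add_n can be sketched in
portable C.  This is only an illustration, not the assembly code in this
directory: the name add_n_sketch and the use of uint64_t limbs are assumptions
made for the example.  Each iteration consumes the carry produced by the
previous one, which is the kind of dependency that a carry-consuming
instruction such as ADD,DC expresses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, NOT GMP's implementation.  Adds the n-limb numbers
   at ap and bp (least significant limb first), stores the n-limb sum at rp,
   and returns the carry out of the most significant limb (0 or 1).  */
uint64_t
add_n_sketch (uint64_t *rp, const uint64_t *ap, const uint64_t *bp, size_t n)
{
  uint64_t carry = 0;
  size_t i;

  for (i = 0; i < n; i++)
    {
      uint64_t a = ap[i];
      uint64_t s = a + bp[i];   /* limb sum, may wrap around */
      uint64_t c1 = s < a;      /* 1 if a + b wrapped */
      uint64_t r = s + carry;   /* add the carry from the previous limb */
      uint64_t c2 = r < s;      /* 1 if adding the carry wrapped */

      rp[i] = r;
      carry = c1 | c2;          /* at most one of c1 and c2 can be set */
    }
  return carry;
}
```

The loop-carried variable carry is what limits how freely the additions can
be reordered; the load and store for each limb are independent of it.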