GMP Itemized Development Tasks
Copyright 2000, 2001, 2002, 2003, 2004, 2006, 2008, 2009 Free Software
Foundation, Inc.

This file is part of the GNU MP Library.

The GNU MP Library is free software; you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as published
by the Free Software Foundation; either version 3 of the License, or (at
your option) any later version.

The GNU MP Library is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public
License for more details.

You should have received a copy of the GNU Lesser General Public License
along with the GNU MP Library. If not, see http://www.gnu.org/licenses/.
These are itemized GMP development tasks. Not all the tasks + listed here are suitable for volunteers, but many of them are. + Please see the projects file for more + sizeable projects. + +
CAUTION: This file needs updating. Many of the tasks here have +either already been taken care of, or have become irrelevant. + +
_LONG_LONG_LIMB
in gmp.h is not namespace clean. Reported
+ by Patrick Pelissier.
+ _LONG_LONG_LIMB
in past releases, so
+ need to be careful about changing it. It used to be a define
+ applications had to set for long long limb systems, but that in
+ particular is no longer relevant now that it's established automatically.
+_mpz_realloc
with a small (1 limb) size.
+mpz_XXX(a,a,a)
.
+mpf_t
numbers with exponents >2^53 on
+ machines with 64-bit mp_exp_t
, the precision of
+ __mp_bases[base].chars_per_bit_exactly
is insufficient and
+ mpf_get_str
aborts. Detect and compensate. Alternately,
+ think seriously about using some sort of fixed-point integer value.
+ Avoiding unnecessary floating point is probably a good thing in general,
+ and it might be faster on some CPUs.
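A fixed-point sketch of that alternative (illustrative only; the 32.32
scaling and the name chars_per_bit_fx are assumptions, not GMP's current
representation):

  #include <stdint.h>
  /* chars_per_bit_fx = ceil(chars_per_bit_exactly * 2^32), precomputed per
     base.  Since chars_per_bit <= 1 for bases >= 2, the product fits in 64
     bits for any 32-bit count; covering the full 64-bit mp_exp_t range
     would need a two-limb product instead. */
  static size_t
  chars_for_bits (uint32_t bit_count, uint64_t chars_per_bit_fx)
  {
    return (size_t) (((uint64_t) bit_count * chars_per_bit_fx) >> 32) + 1;
  }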
+mpf_eq
is not always correct, when one operand is
+ 1000000000... and the other operand is 0111111111..., i.e., extremely
+ close. There is a special case in mpf_sub
for this
+ situation; put similar code in mpf_eq
. [In progress.]
+mpf_eq
doesn't implement what gmp.texi specifies. It should
+ not use just whole limbs, but partial limbs. [In progress.]
+mpf_set_str
doesn't validate its exponent, for instance
+ garbage 123.456eX789X is accepted (and an exponent 0 used), and overflow
+ of a long
is not detected.
+mpf_add
doesn't check for a carry from truncated portions of
+ the inputs, and in that respect doesn't implement the "infinite precision
+ followed by truncate" specified in the manual.
Some tests use an array of function pointers containing mpz_add
etc, which doesn't work
+ when those routines are coming from a DLL (because they're effectively
+ function pointer global variables themselves). Need to rearrange perhaps
+ to a set of calls to a test function rather than iterating over an array.
+mpz_pow_ui
: Detect when the result would be more memory than
+ a size_t
can represent and raise some suitable exception,
+ probably an alloc call asking for SIZE_T_MAX
, and if that
+ somehow succeeds then an abort
. Various size overflows of
+ this kind are not handled gracefully, probably resulting in segvs.
+ mpz_n_pow_ui
, detect when the count of low zero bits
+ exceeds an unsigned long
. There's a (small) chance of this
+ happening but still having enough memory to represent the value.
Reported by Winfried Dreckmann with, for instance, mpz_ui_pow_ui (x,
+ 4UL, 1431655766UL)
.
+mpf
: Detect exponent overflow and raise some exception.
+ It'd be nice to allow the full mp_exp_t
range since that's
+ how it's been in the past, but maybe dropping one bit would make it
+ easier to test if e1+e2 goes out of bounds.
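A sketch of that test, with the one-bit headroom making the sum safe to
form (the bound is a placeholder, not an existing GMP constant):

  /* Both exponents are assumed confined to [-bound, bound], where bound is
     at most half the mp_exp_t range, so e1 + e2 cannot wrap before the
     comparison. */
  static int
  exp_sum_in_range (mp_exp_t e1, mp_exp_t e2, mp_exp_t bound)
  {
    mp_exp_t e = e1 + e2;
    return e <= bound && e >= -bound;
  }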
+mpf_cmp
: For better cache locality, don't test for low zero
+ limbs until the high limbs fail to give an ordering. Reduce code size by
+ turning the three mpn_cmp
's into a single loop stopping when
+ the end of one operand is reached (and then looking for a non-zero in the
+ rest of the other).
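A sketch of the single-loop structure (illustrative, not the library's
code; limbs are stored least significant first, operands assumed aligned
at the high end):

  static int
  cmp_mant (mp_srcptr up, mp_size_t un, mp_srcptr vp, mp_size_t vn)
  {
    mp_size_t i, n = un < vn ? un : vn;
    for (i = 1; i <= n; i++)          /* high limbs first */
      {
        mp_limb_t u = up[un - i], v = vp[vn - i];
        if (u != v)
          return u > v ? 1 : -1;
      }
    /* equal so far: low zero limbs are only examined when needed */
    for (i = 0; i < un - n; i++) if (up[i] != 0) return 1;
    for (i = 0; i < vn - n; i++) if (vp[i] != 0) return -1;
    return 0;
  }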
+mpf_mul_2exp
, mpf_div_2exp
: The use of
+ mpn_lshift
for any size<=prec means repeated
+ mul_2exp
and div_2exp
calls accumulate low zero
+ limbs until size==prec+1 is reached. Those zeros will slow down
+ subsequent operations, especially if the value is otherwise only small.
+ If low bits of the low limb are zero, use mpn_rshift
so as
+ to not increase the size.
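Applied to mpf_mul_2exp the suggestion might look like this (a sketch,
assuming 0 < cnt < GMP_NUMB_BITS):

  /* If the low limb ends in at least GMP_NUMB_BITS-cnt zero bits, a right
     shift by that amount plus one extra limb of exponent equals a left
     shift by cnt, but without growing the size. */
  if ((up[0] & ((CNST_LIMB(1) << (GMP_NUMB_BITS - cnt)) - 1)) == 0)
    {
      mpn_rshift (rp, up, size, GMP_NUMB_BITS - cnt);
      exp += 1;
    }
  else
    cy = mpn_lshift (rp, up, size, cnt);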
+mpn_dc_sqrtrem
: Don't use mpn_addmul_1
with
+ multiplier==2, instead either mpn_addlsh1_n
when available,
+ or mpn_lshift
+mpn_add_n
if not.
+mpn_dc_sqrtrem
, mpn_sqrtrem2
: Don't use
+ mpn_add_1
and mpn_sub_1
for 1 limb operations,
+ instead ADDC_LIMB
and SUBC_LIMB
.
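For reference, those gmp-impl.h macros in use (sketch):

  mp_limb_t w, cy;
  ADDC_LIMB (cy, w, x, y);   /* w = x + y, cy = carry out */
  SUBC_LIMB (cy, w, x, y);   /* w = x - y, cy = borrow out */

saving the call and loop overhead of mpn_add_1/mpn_sub_1 on one limb.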
+mpn_sqrtrem2
: Use plain variables for sp[0]
and
+ rp[0]
calculations, so the compiler needn't worry about
+ aliasing between sp
and rp
.
+mpn_sqrtrem
: Some work can be saved in the last step when
+ the remainder is not required, as noted in Paul's paper.
+mpq_add
, mpq_sub
: The division "op1.den / gcd"
+ is done twice, where of course only once is necessary. Reported by Larry
+ Lambe.
+mpq_add
, mpq_sub
: The gcd fits a single limb
+ with high probability and in this case modlimb_invert
could
+ be used to calculate the inverse just once for the two exact divisions
+ "op1.den / gcd" and "op2.den / gcd", rather than letting
+ mpn_divexact_1
do it each time. This would require a new
+ mpn_preinv_divexact_1
interface. Not sure if it'd be worth
+ the trouble.
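A sketch of what such an mpn_preinv_divexact_1 could be (the interface is
hypothetical; dinv = 1/d mod 2^GMP_NUMB_BITS from modlimb_invert, d odd,
division known to be exact):

  static void
  preinv_divexact_1 (mp_ptr qp, mp_srcptr up, mp_size_t n,
                     mp_limb_t d, mp_limb_t dinv)
  {
    mp_limb_t c = 0, h, l, t, q;
    mp_size_t i;
    for (i = 0; i < n; i++)
      {
        t = up[i] - c;
        c = (up[i] < c);          /* borrow out of the subtraction */
        q = t * dinv;             /* exact quotient limb mod 2^GMP_NUMB_BITS */
        qp[i] = q;
        umul_ppmm (h, l, q, d);   /* high half becomes the next borrow */
        c += h;
      }
  }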
+mpq_add
, mpq_sub
: The use of
+ mpz_mul(x,y,x)
causes temp allocation or copying in
+ mpz_mul
which can probably be avoided. A rewrite using
+ mpn
might be best.
+mpn_gcdext
: Don't test count_leading_zeros
for
+ zero, instead check the high bit of the operand and avoid invoking
+ count_leading_zeros
. This is an optimization on all
+ machines, and significant on machines with slow
+ count_leading_zeros
, though it's possible an already
+ normalized operand might not be encountered very often.
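The test is one line (illustrative):

  if ((xlimb & GMP_NUMB_HIGHBIT) != 0)
    cnt = 0;                          /* already normalized */
  else
    count_leading_zeros (cnt, xlimb);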
+umul_ppmm
to use floating-point for generating the
+ most significant limb (if BITS_PER_MP_LIMB
<= 52 bits).
+ (Peter Montgomery has some ideas on this subject.)
+umul_ppmm
code in longlong.h: Add partial
+ products with fewer operations.
+mpz_set_ui
. This would be both small and
+ fast, especially for compile-time constants, but would make application
+ binaries depend on having 1 limb allocated to an mpz_t
,
+ preventing the "lazy" allocation scheme below.
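Such an inline might look like this (hypothetical macro name; assumes an
unsigned long fits in one limb and that one limb is always allocated,
which is exactly the dependency the item warns about):

  #define MPZ_SET_UI_INLINE(z, n)          \
    do {                                   \
      (z)->_mp_d[0] = (n);                 \
      (z)->_mp_size = ((n) != 0);          \
    } while (0)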
+mpz_[cft]div_ui
and maybe
+ mpz_[cft]div_r_ui
. A __gmp_divide_by_zero
+ would be needed for the divide by zero test, unless that could be left to
+ mpn_mod_1
(not sure currently whether all the risc chips
+ provoke the right exception there if using mul-by-inverse).
+mpz_fits_s*_p
. The setups for
+ LONG_MAX
etc would need to go into gmp.h, and on Cray it
+ might, unfortunately, be necessary to forcibly include <limits.h>
+ since there's no apparent way to get SHRT_MAX
with an
+ expression (since short
and unsigned short
can
+ be different sizes).
+mpz_powm
and mpz_powm_ui
aren't very
+ fast on one or two limb moduli, due to a lot of function call
+ overheads. These could perhaps be handled as special cases.
+mpz_powm
and mpz_powm_ui
want better
+ algorithm selection, and the latter should use REDC. Both could
+ change to use an mpn_powm
and mpn_redc
.
+mpz_powm
REDC should do multiplications by g[]
+ using the division method when they're small, since the REDC form of a
+ small multiplier is normally a full size product. Probably would need a
+ new tuned parameter to say what size multiplier is "small", as a function
+ of the size of the modulus.
+mpz_powm
REDC should handle even moduli if possible. Maybe
+ this would mean for m=n*2^k doing mod n using REDC and an auxiliary
+ calculation mod 2^k, then putting them together at the end.
+mpn_gcd
might be able to be sped up on small to
+ moderate sizes by improving find_a
, possibly just by
+ providing an alternate implementation for CPUs with slowish
+ count_leading_zeros
.
+mpn_divexact_by3c
exists.
+mpf_set_str
produces low zero limbs when a string has a
+ fraction but is exactly representable, eg. 0.5 in decimal. These could be
+ stripped to save work in later operations.
+mpz_and
, mpz_ior
and mpz_xor
should
+ use mpn_and_n
etc for the benefit of the small number of
+ targets with native versions of those routines. Need to be careful not to
+ pass size==0. Is some code sharing possible between the mpz
+ routines?
+mpf_add
: Don't do a copy to avoid overlapping operands
+ unless it's really necessary (currently only sizes are tested, not
+ whether r really is u or v).
+mpf_add
: Under the check for v having no effect on the
+ result, perhaps test for r==u and do nothing in that case, rather than
+ currently it looks like an MPN_COPY_INCR
will be done to
+ reduce prec+1 limbs to prec.
+mpf_div_ui
: Instead of padding with low zeros, call
+ mpn_divrem_1
asking for fractional quotient limbs.
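mpn_divrem_1 already supports this through its qxn argument (sketch):

  /* qp gets un integer limbs plus qxn fraction limbs; no zero-padding of
     the dividend is needed */
  rem = mpn_divrem_1 (qp, qxn, up, un, dlimb);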
+mpf_div_ui
: Eliminate TMP_ALLOC
. When r!=u
+ there's no overlap and the division can be called on those operands.
+ When r==u and is prec+1 limbs, then it's an in-place division. If r==u
+ and not prec+1 limbs, then move the available limbs up to prec+1 and do
+ an in-place there.
+mpf_div_ui
: Whether the high quotient limb is zero can be
+ determined by testing the dividend for high<divisor. When non-zero, the
division can be done on prec dividend limbs instead of prec+1. The result
+ size is also known before the division, so that can be a tail call (once
+ the TMP_ALLOC
is eliminated).
+mpn_divrem_2
could usefully accept unnormalized divisors and
+ shift the dividend on-the-fly, since this should cost nothing on
+ superscalar processors and avoid the need for temporary copying in
+ mpn_tdiv_qr
.
+mpf_sqrt
: If r!=u, and if u doesn't need to be padded with
+ zeros, then there's no need for the tp temporary.
+mpq_cmp_ui
could form the num1*den2
and
+ num2*den1
products limb-by-limb from high to low and look at
+ each step for values differing by more than the possible carry bit from
+ the uncalculated portion.
+mpq_cmp
could do the same high-to-low progressive multiply
+ and compare. The benefits of karatsuba and higher multiplication
+ algorithms are lost, but if it's assumed only a few high limbs will be
+ needed to determine an order then that's fine.
+mpn_add_1
, mpn_sub_1
, mpn_add
,
+ mpn_sub
: Internally use __GMPN_ADD_1
etc
+ instead of the functions, so they get inlined on all compilers, not just
+ gcc and others with inline
recognised in gmp.h.
+ __GMPN_ADD_1
etc are meant mostly to support application
+ inline mpn_add_1
etc and if they don't come out good for
+ internal uses then special forms can be introduced, for instance many
+ internal uses are in-place. Sometimes a block of code is executed based
+ on the carry-out, rather than using it arithmetically, and those places
+ might want to do their own loops entirely.
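For reference, the macro form from gmp.h (sketch):

  mp_limb_t cy;
  __GMPN_ADD_1 (cy, rp, up, n, vlimb);   /* inlines under any compiler */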
+__gmp_extract_double
on 64-bit systems could use just one
+ bitfield for the mantissa extraction, not two, when endianness permits.
+ Might depend on the compiler allowing long long
bit fields
+ when that's the only actual 64-bit type.
+TMP_FREE
releases all memory, so
+ there's an allocate and free every time a top-level function using
+ TMP
is called. Would need
+ mp_set_memory_functions
to tell tal-notreent.c to release
+ any cached memory when changing allocation functions though.
+__gmp_tmp_alloc
from tal-notreent.c could be partially
+ inlined. If the current chunk has enough room then a couple of pointers
+ can be updated. Only if more space is required then a call to some sort
+ of __gmp_tmp_increase
would be needed. The requirement that
+ TMP_ALLOC
is an expression might make the implementation a
+ bit ugly and/or a bit sub-optimal.
  #define TMP_ALLOC(n)                                      \
    ((ROUND_UP (n) > current->end - current->point          \
        ? __gmp_tmp_increase (ROUND_UP (n)) : 0),           \
     current->point += ROUND_UP (n),                        \
     current->point - ROUND_UP (n))
__mp_bases
has a lot of data for bases which are pretty much
+ never used. Perhaps the table should just go up to base 16, and have
+ code to generate data above that, if and when required. Naturally this
+ assumes the code would be smaller than the data saved.
+__mp_bases
field big_base_inverted
is only used
+ if USE_PREINV_DIVREM_1
is true, and could be omitted
+ otherwise, to save space.
+mpz_get_str
, mtox
: For power-of-2 bases, which
+ are of course fast, it seems a little silly to make a second pass over
+ the mpn_get_str
output to convert to ASCII. Perhaps combine
+ that with the bit extractions.
+mpz_gcdext
: If the caller requests only the S cofactor (of
+ A), and A<B, then the code ends up generating the cofactor T (of B) and
+ deriving S from that. Perhaps it'd be possible to arrange to get S in
+ the first place by calling mpn_gcdext
with A+B,B. This
+ might only be an advantage if A and B are about the same size.
+mpz_n_pow_ui
does a good job with small bases and stripping
+ powers of 2, but it's perhaps a bit too complicated for what it gains.
+ The simpler mpn_pow_1
is a little faster on small exponents.
+ (Note some of the ugliness in mpz_n_pow_ui
is due to
+ supporting mpn_mul_2
.)
+ mpz_n_pow_ui
should be
+ confined to single limb operands for simplicity and since that's where
+ the greatest gain would be.
+ mpn_pow_1
and mpz_n_pow_ui
would be
+ merged. The reason mpz_n_pow_ui
writes to an
+ mpz_t
is that its callers leave it to make a good estimate
+ of the result size. Callers of mpn_pow_1
already know the
+ size by separate means (mp_bases
).
+mpz_invert
should call mpn_gcdext
directly.
+invert_limb
on various processors might benefit from the
+ little Newton iteration done for alpha and ia64.
+mpn_addlsh1_n
could be implemented with
+ mpn_addmul_1
, since that code at 3.5 is a touch faster than
+ a separate lshift
and add_n
at
+ 1.75+2.125=3.875. Or very likely some specific addlsh1_n
+ code could beat both.
+mpn_mul_1
,
+ mpn_addmul_1
, and mpn_submul_1
.
+mpn_mul_1
, mpn_addmul_1
,
+ and mpn_submul_1
for the 21164. This should use both integer
+ multiplies and floating-point multiplies. For the floating-point
+ operations, the single-limb multiplier should be split into three 21-bit
+ chunks, or perhaps even better in four 16-bit chunks. Probably possible
+ to reach 9 cycles/limb.
+__builtin_ctzl
,
+ __builtin_clzl
and __builtin_popcountl
using
+ the corresponding CIX ct
instructions, and
+ __builtin_alpha_cmpbge
. These should give GCC more
information about scheduling etc than the asm
blocks
+ currently used in longlong.h and gmp-impl.h.
+alloca
on this system,
+ making configure
choose the slower
+ malloc-reentrant
allocation method. Is there a better way?
+ Maybe variable-length arrays per notes below.
+.align
is not used since it pads
+ with garbage. Does the code get the intended slotting required for the
+ claimed speeds? .align
at the start of a function would
+ presumably be safe no matter how it pads.
+count_leading_zeros
can use the clz
+ instruction. For GCC 3.4 and up, do this via __builtin_clzl
+ since then gcc knows it's "predicable".
+__builtin_popcount
which can be
+ used instead of an asm
block. The builtin should give gcc
+ more opportunities for scheduling, bundling and predication.
+ __builtin_ctz
similarly (it just uses popcount as per
+ current longlong.h).
+mpn_mul_1
, mpn_addmul_1
,
+ for s2 < 2^32 (or perhaps for any zero 16-bit s2 chunk). Not sure how
+ much this can improve the speed, though, since the symmetry that we rely
+ on is lost. Perhaps we can just gain cycles when s2 < 2^16, or more
+ accurately, when two 16-bit s2 chunks which are 16 bits apart are zero.
+mpn_submul_1
, analogous to
+ mpn_addmul_1
.
+umul_ppmm
. Using four
+ "mulx
"s either with an asm block or via the generic C code is
+ about 90 cycles. Try using fp operations, and also try using karatsuba
+ for just three "mulx
"s.
+mpn_lshift
, mpn_rshift
.
+ Will give 2 cycles/limb. Trivial modifications of mpn/sparc64 should do.
+mulx
for umul_ppmm
if
+ possible (see commented out code in longlong.h). This is unlikely to
+ save more than a couple of cycles, so perhaps isn't worth bothering with.
+__sparc_v9__
+ or anything to indicate V9 support when -mcpu=v9 is selected. See
+ gcc/config/sol2-sld-64.h. Will need to pass something through from
+ ./configure to select the right code in longlong.h. (Currently nothing
+ is lost because mulx
for multiplying is commented out.)
+mpn_divexact_1
and
+ mpn_modexact_1c_odd
can use a 64-bit inverse and take
+ 64-bits at a time from the dividend, as per the 32-bit divisor case in
+ mpn/sparc64/mode1o.c. This must be done in assembler, since the full
+ 64-bit registers (%gN
) are not available from C.
+mpn_divexact_by3c
can work 64-bits at a time
+ using mulx
, in assembler. This would be the same as for
+ sparc64.
+modlimb_invert
might save a few cycles from
+ masking down to just the useful bits at each point in the calculation,
+ since mulx
speed depends on the highest bit set. Either
+ explicit masks or small types like short
and
+ int
ought to work.
+popc
: This chip reputedly implements
+ popc
properly (see gcc sparc.md). Would need to recognise
+ it as sparchalr1
or something in configure / config.sub /
+ config.guess. popc_limb
in gmp-impl.h could use this (per
+ commented out code). count_trailing_zeros
could use it too.
+mpn_addmul_1
, mpn_submul_1
, and
+ mpn_mul_1
. The current code runs at 11 cycles/limb. It
+ should be possible to saturate the cache, which will happen at 8
+ cycles/limb (7.5 for mpn_mul_1). Write special loops for s2 < 2^32;
+ it should be possible to make them run at about 5 cycles/limb.
+powerpc*
.
+mpn_addmul_1
, mpn_submul_1
, and
+ mpn_mul_1
. Use both integer and floating-point operations,
+ possibly two floating-point and one integer limb per loop. Split operands
+ into four 16-bit chunks for fast fp operations. Should easily reach 9
+ cycles/limb (using one int + one fp), but perhaps even 7 cycles/limb
+ (using one int + two fp).
+mpn_rshift
could do the same sort of unrolled loop
+ as mpn_lshift
. Some judicious use of m4 might let the two
+ share source code, or with a register to control the loop direction
+ perhaps even share object code.
+mpn_mul_basecase
and mpn_sqr_basecase
+ for important machines. Helping the generic sqr_basecase.c with an
+ mpn_sqr_diagonal
might be enough for some of the RISCs.
+mpn_lshift
/mpn_rshift
.
+ Will bring time from 1.75 to 1.25 cycles/limb.
+mpn_lshift
for shifts by 1. (See
+ Pentium code.)
+rep
+ movs
would upset GCC register allocation for the whole function.
+ Is this still true in GCC 3? It uses rep movs
itself for
+ __builtin_memcpy
. Examine the code for some simple and
+ complex functions to find out. Inlining rep movs
would be
+ desirable, it'd be both smaller and faster.
+mpn_lshift
and mpn_rshift
can come
+ down from 6.0 c/l to 5.5 or 5.375 by paying attention to pairing after
+ shrdl
and shldl
, see mpn/x86/pentium/README.
+mpn_lshift
and mpn_rshift
+ might benefit from some destination prefetching.
+mpn_divrem_1
might be able to use a
+ mul-by-inverse, hoping for maybe 30 c/l.
+mpn_lshift
and mpn_rshift
might be able to
+ do something branch-free for unaligned startups, and shaving one insn
+ from the loop with alternative indexing might save a cycle.
+mpn_lshift
.
+ The pipeline is now extremely deep, perhaps unnecessarily deep.
+mpn_mul_basecase
and
+ mpn_sqr_basecase
. This should use a "vertical multiplication
+ method", to avoid carry propagation. splitting one of the operands in
+ 11-bit chunks.
+mpn_lshift
by 31 should use the special rshift
+ by 1 code, and vice versa mpn_rshift
by 31 should use the
+ special lshift by 1. This would be best as a jump across to the other
+ routine, could let both live in lshift.asm and omit rshift.asm on finding
+ mpn_rshift
already provided.
+mpn_com_n
and mpn_and_n
etc very probably
+ wants a pragma like MPN_COPY_INCR
.
+mpn_lshift
, mpn_rshift
,
+ mpn_popcount
and mpn_hamdist
are nice and small
+ and could be inlined to avoid function calls.
+TMP_ALLOC
to use them, or introduce a new scheme. Memory
+ blocks wanted unconditionally are easy enough, those wanted only
+ sometimes are a problem. Perhaps a special size calculation to ask for a
+ dummy length 1 when unwanted, or perhaps an inlined subroutine
+ duplicating code under each conditional. Don't really want to turn
+ everything into a dog's dinner just because Cray don't offer an
+ alloca
.
+mpn_get_str
on power-of-2 bases ought to vectorize.
+ Does it? bits_per_digit
and the inner loop over bits in a
+ limb might prevent it. Perhaps special cases for binary, octal and hex
+ would be worthwhile (very possibly for all processors too).
+BSWAP_LIMB_FETCH
looks like it could be done with
+ lrvg
, as per glibc sysdeps/s390/s390-64/bits/byteswap.h.
Is this only for 64-bit mode, since 32-bit mode has
+ other code? Also, is it worth using for BSWAP_LIMB
too, or
+ would that mean a store and re-fetch? Presumably that's what comes out
+ in glibc.
+count_leading_zeros
for 64-bit machines:
  if ((x >> 32) == 0) { x <<= 32; cnt += 32; }
  if ((x >> 48) == 0) { x <<= 16; cnt += 16; }
  ...
__inline
which could perhaps
+ be used in __GMP_EXTERN_INLINE
. What would be the right way
+ to identify suitable versions of that compiler?
+cc
is rumoured to have an _int_mult_upper
+ (in <intrinsics.h>
like Cray), but it didn't seem to
+ exist on some IRIX 6.5 systems tried. If it does actually exist
+ somewhere it would very likely be an improvement over a function call to
+ umul.asm.
+mpn_get_str
final divisions by the base with
+ udiv_qrnd_unnorm
could use some sort of multiply-by-inverse
+ on suitable machines. This ends up happening for decimal by presenting
+ the compiler with a run-time constant, but the same for other bases would
+ be good. Perhaps use could be made of the fact base<256.
+mpn_umul_ppmm
, mpn_udiv_qrnnd
: Return a
+ structure like div_t
to avoid going through memory, in
+ particular helping RISCs that don't do store-to-load forwarding. Clearly
+ this is only possible if the ABI returns a structure of two
+ mp_limb_t
s in registers.
+ mpz_crr
(Chinese Remainder Reconstruction).
+mpz_init
and mpq_init
could do lazy allocation.
+ Set ALLOC(var)
to 0 to indicate nothing allocated, and let
+ _mpz_realloc
do the initial alloc. Set
+ z->_mp_d
to a dummy that mpz_get_ui
and
+ similar can unconditionally fetch from. Niels Möller has had a go at
+ this.
+ mpz_init
and then
+ more or less immediately reallocating.
+ mpz_init
would only store magic values in the
+ mpz_t
fields, and could be inlined.
+ mpz_t z = MPZ_INITIALIZER;
, which might be convenient
for globals (see the sketch after this list).
+ mpz_set_ui
and other similar routines needn't check the
+ size allocated and can just store unconditionally.
+ mpz_set_ui
and perhaps others like
+ mpz_tdiv_r_ui
and a prospective
+ mpz_set_ull
could be inlined.
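A sketch of the initializer idea from the list above (all names
hypothetical; depends on the internal field order int _mp_alloc,
int _mp_size, mp_limb_t *_mp_d):

  static mp_limb_t dummy_limb = 0;   /* safe target for reads via _mp_d */
  #define MPZ_INITIALIZER  {{ 0, 0, &dummy_limb }}

  mpz_t z = MPZ_INITIALIZER;         /* a usable zero, no mpz_init call */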
+ mpf_out_raw
and mpf_inp_raw
. Make sure
+ format is portable between 32-bit and 64-bit machines, and between
+ little-endian and big-endian machines. A format which MPFR can use too
+ would be good.
+mpn_and_n
... mpn_copyd
: Perhaps make the mpn
+ logops and copys available in gmp.h, either as library functions or
+ inlines, with the availability of library functions instantiated in the
+ generated gmp.h at build time.
+mpz_set_str
etc variants taking string lengths rather than
+ null-terminators.
+mpz_andn
, mpz_iorn
, mpz_nand
,
+ mpz_nior
, mpz_xnor
might be useful additions,
+ if they could share code with the current such functions (which should be
+ possible).
+mpz_and_ui
etc might be of use sometimes. Suggested by
+ Niels Möller.
+mpf_set_str
and mpf_inp_str
could usefully
+ accept 0x, 0b etc when base==0. Perhaps the exponent could default to
+ decimal in this case, with a further 0x, 0b etc allowed there.
+ Eg. 0xFFAA@0x5A. A leading "0" for octal would match the integers, but
+ probably something like "0.123" ought not mean octal.
+GMP_LONG_LONG_LIMB
or some such could become a documented
+ feature of gmp.h, so applications could know whether to
+ printf
a limb using %lu
or %Lu
.
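With such a define documented, an application could write (sketch;
GMP_LONG_LONG_LIMB is the proposed name, only the internal
_LONG_LONG_LIMB exists today):

  #ifdef GMP_LONG_LONG_LIMB
    printf ("%llu\n", (unsigned long long) limb);
  #else
    printf ("%lu\n", (unsigned long) limb);
  #endif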
+GMP_PRIdMP_LIMB
and similar defines following C99
+ <inttypes.h> might be of use to applications printing limbs. But
+ if GMP_LONG_LONG_LIMB
or whatever is added then perhaps this
+ can easily enough be left to applications.
+gmp_printf
could accept %b
for binary output.
+ It'd be nice if it worked for plain int
etc too, not just
+ mpz_t
etc.
+gmp_printf
in fact could usefully accept an arbitrary base,
+ for both integer and float conversions. A base either in the format
+ string or as a parameter with *
should be allowed. Maybe
+ &13b
(b for base) or something like that.
+gmp_printf
could perhaps accept mpq_t
for float
+ conversions, eg. "%.4Qf"
. This would be merely for
+ convenience, but still might be useful. Rounding would be the same as
+ for an mpf_t
(ie. currently round-to-nearest, but not
+ actually documented). Alternately, perhaps a separate
+ mpq_get_str_point
or some such might be more use. Suggested
+ by Pedro Gimeno.
+mpz_rscan0
or mpz_revscan0
or some such
+ searching towards the low end of an integer might match
+ mpz_scan0
nicely. Likewise for scan1
.
+ Suggested by Roberto Bagnara.
+mpz_bit_subset
or some such to test whether one integer is a
+ bitwise subset of another might be of use. Some sort of return value
+ indicating whether it's a proper or non-proper subset would be good and
+ wouldn't cost anything in the implementation. Suggested by Roberto
+ Bagnara.
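The operation is already expressible with the public API; the new function
would just package something like this (hypothetical name):

  int
  mpz_bit_subset_p (const mpz_t a, const mpz_t b)  /* is a a subset of b? */
  {
    mpz_t t;
    int r;
    mpz_init (t);
    mpz_and (t, a, b);            /* the bits of a that also occur in b */
    r = (mpz_cmp (t, a) == 0);    /* all of them? */
    mpz_clear (t);
    return r;
  }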
+mpf_get_ld
, mpf_set_ld
: Conversions between
+ mpf_t
and long double
, suggested by Dan
+ Christensen. Other long double
routines might be desirable
+ too, but mpf
would be a start.
+ long double
is an ANSI-ism, so everything involving it would
+ need to be suppressed on a K&R compiler.
+ configure
to recognise
+ the format in use, MPFR has a start on this. Often long
+ double
is the same as double
, which is easy but
+ pretty pointless. A single float format detector macro could look at
+ double
then long double
+ long
+ double
, eg. xlc on AIX can use either 64-bit or 128-bit. It's
+ probably simplest to regard this as a compiler compatibility issue, and
+ leave it to users or sysadmins to ensure application and library code is
+ built the same.
+mpz_sqrt_if_perfect_square
: When
+ mpz_perfect_square_p
does its tests it calculates a square
+ root and then discards it. For some applications it might be useful to
+ return that root. Suggested by Jason Moxham.
+mpz_get_ull
, mpz_set_ull
,
+ mpz_get_sll
, mpz_get_sll
: Conversions for
+ long long
. These would aid interoperability, though a
+ mixture of GMP and long long
would probably not be too
+ common. Since long long
is not always available (it's in
+ C99 and GCC though), disadvantages of using long long
in
+ libgmp.a would be
+ #ifdef
block to decide if the
+ application compiler could take the long long
+ prototypes.
+ LIBGMP_HAS_LONGLONG
might be wanted to
+ indicate whether the functions are available. (Applications using
+ autoconf could probe the library too.)
+ long long
to
+ application compile time, by having something like
+ mpz_set_2ui
called with two halves of a long
+ long
. Disadvantages of this would be,
+ long
+ long
is normally passed as two halves anyway.
+ mpz_get_ull
would be a rather big inline, or would have
+ to be two function calls.
+ mpz_get_sll
would be a worse inline, and would put the
+ treatment of -0x10..00
into applications (see
+ mpz_get_si
correctness above).
+ long long
is probably the lesser evil, if only
+ because it makes best use of gcc. In fact perhaps it would suffice to
+ guarantee long long
conversions only when using GCC for both
+ application and library. That would cover free software, and we can
+ worry about selected vendor compilers later.
+ long long
should be available always. We'd probably prefer
+ to have the C and C++ the same in respect of long long
+ support, but it would be possible to have it unconditionally in gmpxx.h,
+ by some means or another.
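For reference, the two-halves technique as an application-side helper
(hypothetical function; assumes a 64-bit unsigned long long and an
unsigned long of at least 32 bits):

  static void
  my_mpz_set_ull (mpz_t z, unsigned long long v)
  {
    mpz_set_ui (z, (unsigned long) (v >> 32));             /* high half */
    mpz_mul_2exp (z, z, 32);
    mpz_add_ui (z, z, (unsigned long) (v & 0xffffffff));   /* low half */
  }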
+mpz_strtoz
parsing the same as strtol
.
+ Suggested by Alexander Kruppa.
+umul_ppmm
in longlong.h always uses umull
,
+ but is that available only for M series chips or some such? Perhaps it
+ should be configured in some way.
+-mschedule=7200
etc parameter,
+ which could be driven by an exact hppa cpu type.
+AC_C_BIGENDIAN
seems the best way to handle that for GMP.
+*-*-aix*
. It might be more reliable to do some sort of
+ feature test, examining the compiler output perhaps. It might also be
+ nice to merge the aix.m4 files into powerpc-defs.m4.
+AC_OUTPUT
+ would work, but it might upset "make" to have things like L$
+ get into the Makefiles through AC_SUBST
.
+ AC_CONFIG_COMMANDS
would be the alternative. With some
+ careful m4 quoting the changequote
calls might not be
+ needed, which might free up the order in which things had to be output.
+CCAS
, CCASFLAGS
+ scheme. Though we probably wouldn't be using its assembler support we
+ could try to use those variables in compatible ways.
+GMP_LDFLAGS
could probably be done with plain
+ LDFLAGS
already used by automake for all linking. But with
+ a bit of luck the next libtool will pass pretty much all
+ CFLAGS
through to the compiler when linking, making
+ GMP_LDFLAGS
unnecessary.
+-c
and -o
together in the
+ .S and .asm rules, but apparently that isn't completely portable (there's
+ an autoconf AC_PROG_CC_C_O
test for it). So far we've not
+ had problems, but perhaps the rules could be rewritten to use "foo.s" as
+ the temporary, or to do a suitable "mv" of the result. The only danger
+ from using foo.s would be if a compile failed and the temporary foo.s
+ then looked like the primary source. Hopefully if the
+ SUFFIXES
are ordered to have .S and .asm ahead of .s that
+ wouldn't happen. Might need to check.
+_gmp_rand
is not particularly fast on the linear
+ congruential algorithm and could stand various improvements.
+ gmp_randstate_t
(or
+ _mp_algdata
rather) to save some copying.
+ 2exp
modulus, to
+ avoid mpn_mul
calls. Perhaps the same for two limbs.
+ lc
code, to avoid a function call and
+ TMP_ALLOC
for every chunk.
+ 2exp
and general LC cases should be split,
+ for clarity (if the general case is retained).
+ gmp_randstate_t
used for parameters perhaps should become
+ gmp_randstate_ptr
the same as other types.
+mpz_class(string)
, etc: Use the C++ global locale to
+ identify whitespace.
+ mpf_class(string)
: Use the C++ global locale decimal point,
+ rather than the C one.
+ mpz_set_str
etc forms
+ available for mpz_t
too, not just mpz_class
+ etc.
+mpq_class operator+=
: Don't emit an unnecessary
+ mpq_set(q,q)
before mpz_addmul
etc.
+mpz_class(const char *)
, etc: since they're normally
+ not fast anyway, and we can hide the exception throw
.
+ mpz_class(string)
, etc: to hide the cstr
+ needed to get to the C conversion function.
+ mpz_class string, char*
etc constructors: likewise to
+ hide the throws and conversions.
+ mpz_class::get_str
, etc: to hide the char*
+ to string
conversion and free. Perhaps
+ mpz_get_str
can write directly into a
+ string
, to avoid copying.
+ string
returning variants
+ available for use with plain mpz_t
etc too.
+ mpz_gcdext
and mpn_gcdext
ought to document
+ what range of values the generated cofactors can take, and preferably
+ ensure the definition uniquely specifies the cofactors for given inputs.
+ A basic extended Euclidean algorithm or multi-step variant leads to
+ |x|<|b| and |y|<|a| or something like that, but there's probably
+ two solutions under just those restrictions.
+mpz_divisible_ui_p
rather than
+ mpz_tdiv_qr_ui
. (Of course dividing multiple primes at a
+ time would be better still.)
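In terms of the current API the suggestion is simply (sketch):

  for (i = 0; i < nprimes; i++)
    if (mpz_divisible_ui_p (n, primes[i]))  /* no quotient/remainder built */
      return 0;                             /* factor found */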
+libgmp
. This establishes good cross-checks, but it might be
+ better to use simple reference routines where possible. Where it's not
+ possible some attention could be paid to the order of the tests, so a
+ libgmp
routine is only used for tests once it seems to be
+ good.
+MUL_FFT_THRESHOLD
etc: the FFT thresholds should allow a
+ return to a previous k at certain sizes. This arises basically due to
+ the step effect caused by size multiples effectively used for each k.
+ Looking at a graph makes it fairly clear.
+__gmp_doprnt_mpf
does a rather unattractive round-to-nearest
+ on the string returned by mpf_get_str
. Perhaps some variant
+ of mpf_get_str
could be made which would better suit.
+ASSERT
s at the start of each user-visible mpz/mpq/mpf
+ function to check the validity of each mp?_t
parameter, in
+ particular to check they've been mp?_init
ed. This might
+ catch elementary mistakes in user programs. Care would need to be taken
+ over MPZ_TMP_INIT
ed variables used internally. If nothing
+ else then consistency checks like size<=alloc, ptr not
+ NULL
and ptr+size not wrapping around the address space,
+ would be possible. A more sophisticated scheme could track
+ _mp_d
pointers and ensure only a valid one is used. Such a
+ scheme probably wouldn't be reentrant, not without some help from the
+ system.
+getrusage
and gettimeofday
are reliable.
+ Currently we pretend in configure that the dodgy m68k netbsd 1.4.1
+ getrusage
doesn't exist. If a test might take a long time
+ to run then perhaps cache the result in a file somewhere.
+speed_unittime
determined, independent of the method in use.
+sysconf(_SC_CLK_TCK)
, since it seems to be clock cycle
+ based. Is this true for all Cray systems? Would like some documentation
+ or something to confirm.
+mpz_inp_str
(etc) doesn't say when it stops reading digits.
+mpn_get_str
isn't terribly clear about how many digits it
+ produces. It'd probably be possible to say at most one leading zero,
+ which is what both it and mpz_get_str
currently do. But
+ want to be careful not to bind ourselves to something that might not suit
+ another implementation.
+va_arg
doesn't do the right thing with mpz_t
+ etc directly, but instead needs a pointer type like MP_INT*
.
+ It'd be good to show how to do this, but we'd either need to document
+ mpz_ptr
and friends, or perhaps fallback on something
+ slightly nasty with void*
.
+The following may or may not be feasible, and aren't likely to get done in the +near future, but are at least worth thinking about. + +
mpn_umul_ppmm
, and the corresponding umul.asm file could be
+ included in libgmp only in that case, the same as is effectively done for
+ __clz_tab
. Likewise udiv.asm and perhaps cntlz.asm. This
+ would only be a very small space saving, so perhaps not worth the
+ complexity.
+mpz_get_si
returns 0x80000000 for -0x100000000, whereas it's
+ sort of supposed to return the low 31 (or 63) bits. But this is
+ undocumented, and perhaps not too important.
+mpz_init_set*
and mpz_realloc
could allocate
+ say an extra 16 limbs over what's needed, so as to reduce the chance of
+ having to do a reallocate if the mpz_t
grows a bit more.
+ This could only be an option, since it'd badly bloat memory usage in
+ applications using many small values.
+mpq
functions could perhaps check for numerator or
+ denominator equal to 1, on the assumption that integers or
+ denominator-only values might be expected to occur reasonably often.
+count_trailing_zeros
is used on more or less uniformly
+ distributed numbers in a couple of places. For some CPUs
+ count_trailing_zeros
is slow and it's probably worth handling
+ the frequently occurring 0 to 2 trailing zeros cases specially.
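The special cases are cheap to peel off (illustrative):

  if (x & 1)       c = 0;
  else if (x & 2)  c = 1;
  else if (x & 4)  c = 2;
  else             count_trailing_zeros (c, x);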
+mpf_t
might like to let the exponent be undefined when
+ size==0, instead of requiring it 0 as now. It should be possible to do
+ size==0 tests before paying attention to the exponent. The advantage is
+ not needing to set exp in the various places a zero result can arise,
+ which avoids some tedium but is otherwise perhaps not too important.
+ Currently mpz_set_f
and mpf_cmp_ui
depend on
+ exp==0, maybe elsewhere too.
+__gmp_allocate_func
: Could use GCC __attribute__
+ ((malloc))
on this, though don't know if it'd do much. GCC 3.0
+ allows that attribute on functions, but not function pointers (see info
+ node "Attribute Syntax"), so would need a new autoconf test. This can
+ wait until there's a GCC that supports it.
+mpz_add_ui
contains two __GMPN_COPY
s, one from
+ mpn_add_1
and one from mpn_sub_1
. If those two
+ routines were opened up a bit maybe that code could be shared. When a
+ copy needs to be done there's no carry to append for the add, and if the
+ copy is non-empty no high zero for the sub.
+The following tasks apply to chips or systems that are old and/or obsolete. +It's unlikely anything will be done about them unless anyone is actively using +them. + +
configure --nfp
but that option is gone now that autoconf is
+ used. The file could go somewhere suitable in the mpn search if any
+ chips might benefit from it, though it's possible we don't currently
+ differentiate enough exact cpu types to do this properly.
+double
floats are straightforward and
+ could perhaps be handled directly in __gmp_extract_double
+ and maybe in mpn_get_d
, rather than falling back on the
+ generic code. (Both formats are detected by configure
.)
+