This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: new alpha division routines


Richard Henderson <rth@redhat.com> writes:

> On Sun, Mar 28, 2004 at 03:24:31PM +0200, Falk Hueffner wrote:
>> incidentally, I was also playing with the division routines lately, or
>> more precisely __remqu, since it seems to take most time (a whopping
>> 30% of all libc cycles when compiling stuff with gcc).
>
> Indeed.  I have a patch to the hashing routines to pre-compute
> a multiplictive inverse of the primes that we use.  I've been
> meaning to do some timing tests to see how much it helps.

Another option would be to just use powers of two as table size and
use a non-sucking hash function. I think that is already used in some
of the special-cases hash tables in gcc.

>> * Special case power-of-two divisor. People do that quite often.
>
> I ran my system for a while with an LD_PRELOAD library to capture
> divisions.

Could you perhaps send me the source for that? I'd be interested in
some __remqu stats.

> So yes, Y a power of two does occur moderately frequently.

I'd expect it to be more frequent for the remainder functions. It's
probably not worth doing for divisions.

>> * Special case small dividends (compared to divisor).
>
> In what way?

x < y and x < 2y is detected early. This seems to only apply to
modulo. 

> I wonder if we can use divs for the second division?  It would save
> about three cycles on ev6, but between 7-29 on ev5 and 29 cycles on ev4.
> We'd have to be able to show that the remainder has no more than 24
> bits set at this point...

At first glance, that should actually work. There ought to be only
about 12 bits left.

> Another option might be to unroll the fixup loop N times such that N
> covers the majority of operands that we actually see.  That would 
> cure the bulk of the branch predicion problem and might even come
> in at less than ev6 12 cycle divs.

I tried that briefly, but it didn't work so well considering each
unrolled loop takes 3 cycles. It should be better on ev5 though where
cmov is cheaper and division more expensive.

>> By the way, what do you think about inlining the 32-bit ops from gcc?
>
> It's a good idea.  Need to think about how to do it though, since
> now we've got to inline the trap-if-zero code too.

Well, division by zero is undefined behaviour... do you think there's
code which relies on it being trappable and resumable?

Anyway, here's my not very well tested patch for gcc... if you think
I'm on the right track I could try to clean it up and submit it to
gcc-patches.

-- 
	Falk
? alpha.md.foo
Index: alpha.md
===================================================================
RCS file: /cvs/gcc/gcc/gcc/config/alpha/alpha.md,v
retrieving revision 1.194.2.16.2.1
diff -u -p -r1.194.2.16.2.1 alpha.md
--- alpha.md	21 Feb 2004 23:10:34 -0000	1.194.2.16.2.1
+++ alpha.md	11 Mar 2004 14:00:36 -0000
@@ -58,6 +58,7 @@
    (UNSPEC_PERR		26)
    (UNSPEC_CTLZ		27)
    (UNSPEC_CTPOP	28)
+   (UNSPEC_DIVC		29)
   ])
 
 ;; UNSPEC_VOLATILE:
@@ -771,6 +772,32 @@
   [(set_attr "type" "imul")
    (set_attr "opsize" "udi")])
 
+(define_expand "divsi3_fp"
+  [(set (match_dup 3)
+	(sign_extend:DI (match_operand:SI 1 "nonimmediate_operand" "")))
+   (set (match_dup 4)
+	(sign_extend:DI (match_operand:SI 2 "nonimmediate_operand" "")))
+   (set (match_dup 5)
+	(float:DF (match_dup 3)))
+   (set (match_dup 6)
+	(float:DF (match_dup 4)))
+   (set (match_dup 7)
+	(unspec:DF [(match_dup 5)
+	       	    (match_dup 6)] UNSPEC_DIVC))
+   (set (match_dup 8)
+	(fix:DI (match_dup 7)))
+   (set (match_operand:SI 0 "nonimmediate_operand" "")
+	(subreg:SI (match_dup 8) 0))]
+  "TARGET_FP"
+{
+  operands[3] = gen_reg_rtx (DImode);
+  operands[4] = gen_reg_rtx (DImode);
+  operands[5] = gen_reg_rtx (DFmode);
+  operands[6] = gen_reg_rtx (DFmode);
+  operands[7] = gen_reg_rtx (DFmode);
+  operands[8] = gen_reg_rtx (DImode);
+})
+
 ;; The divide and remainder operations take their inputs from r24 and
 ;; r25, put their output in r27, and clobber r23 and r28 on all
 ;; systems except Unicos/Mk. On Unicos, the standard library provides
@@ -783,7 +810,7 @@
 ;; problem.  Is it worth the complication here to eliminate the sign
 ;; extension?
 
-(define_expand "divsi3"
+(define_expand "divsi3_libcall"
   [(set (match_dup 3)
 	(sign_extend:DI (match_operand:SI 1 "nonimmediate_operand" "")))
    (set (match_dup 4)
@@ -801,7 +828,48 @@
   operands[5] = gen_reg_rtx (DImode);
 })
 
-(define_expand "udivsi3"
+(define_expand "divsi3"
+  [(set (match_operand:SI 0 "register_operand" "")
+	(div:SI (match_operand:SI 1 "register_operand" "")
+		(match_operand:SI 2 "register_operand" "")))]
+  ""
+{
+  if (TARGET_FP && !optimize_size)
+    emit_insn (gen_divsi3_fp (operands[0], operands[1], operands[2]));
+  else if (!TARGET_ABI_OPEN_VMS && !TARGET_ABI_UNICOSMK)
+    emit_insn (gen_divsi3_libcall (operands[0], operands[1], operands[2]));
+  else
+    FAIL;
+  DONE;
+})
+
+(define_expand "udivsi3_fp"
+  [(set (match_dup 3)
+	(zero_extend:DI (match_operand:SI 1 "nonimmediate_operand" "")))
+   (set (match_dup 4)
+	(zero_extend:DI (match_operand:SI 2 "nonimmediate_operand" "")))
+   (set (match_dup 5)
+	(float:DF (match_dup 3)))
+   (set (match_dup 6)
+	(float:DF (match_dup 4)))
+   (set (match_dup 7)
+	(unspec:DF [(match_dup 5)
+	       	    (match_dup 6)] UNSPEC_DIVC))
+   (set (match_dup 8)
+	(fix:DI (match_dup 7)))
+   (set (match_operand:SI 0 "nonimmediate_operand" "")
+	(subreg:SI (match_dup 8) 0))]
+  "TARGET_FP"
+{
+  operands[3] = gen_reg_rtx (DImode);
+  operands[4] = gen_reg_rtx (DImode);
+  operands[5] = gen_reg_rtx (DFmode);
+  operands[6] = gen_reg_rtx (DFmode);
+  operands[7] = gen_reg_rtx (DFmode);
+  operands[8] = gen_reg_rtx (DImode);
+})
+
+(define_expand "udivsi3_libcall"
   [(set (match_dup 3)
 	(sign_extend:DI (match_operand:SI 1 "nonimmediate_operand" "")))
    (set (match_dup 4)
@@ -819,7 +887,99 @@
   operands[5] = gen_reg_rtx (DImode);
 })
 
-(define_expand "modsi3"
+(define_expand "udivsi3"
+  [(set (match_operand:SI 0 "register_operand" "")
+	(udiv:SI (match_operand:SI 1 "register_operand" "")
+		(match_operand:SI 2 "register_operand" "")))]
+  ""
+{
+  if (TARGET_FP && !optimize_size)
+    emit_insn (gen_udivsi3_fp (operands[0], operands[1], operands[2]));
+  else if (!TARGET_ABI_OPEN_VMS && !TARGET_ABI_UNICOSMK)
+    emit_insn (gen_udivsi3_libcall (operands[0], operands[1], operands[2]));
+  else
+    FAIL;
+  DONE;
+})
+
+;(define_expand "modsi3_fp"
+;  [(set (match_dup 3)
+;	(sign_extend:DI (match_operand:SI 1 "nonimmediate_operand" "")))
+;   (set (match_dup 4)
+;	(sign_extend:DI (match_operand:SI 2 "nonimmediate_operand" "")))
+;   (set (match_dup 5)
+;	(float:DF (match_dup 3)))
+;   (set (match_dup 6)
+;	(float:DF (match_dup 4)))
+;   (set (match_dup 7)
+;	(unspec:DF [(match_dup 5)
+;	       	    (match_dup 6)] UNSPEC_DIVC))
+;   (set (match_dup 8)
+;	(fix:DI (match_dup 7)))
+;   (set (match_dup 9)
+;	(mult:SI (subreg:SI (match_dup 8) 0)
+;		 (match_dup 2)))
+;   (set (match_operand:SI 0 "nonimmediate_operand" "")
+;	(minus:SI (match_dup 1)
+;		  (match_dup 9)))]
+;  "TARGET_FP"
+;{
+;  operands[3] = gen_reg_rtx (DImode);
+;  operands[4] = gen_reg_rtx (DImode);
+;  operands[5] = gen_reg_rtx (DFmode);
+;  operands[6] = gen_reg_rtx (DFmode);
+;  operands[7] = gen_reg_rtx (DFmode);
+;  operands[8] = gen_reg_rtx (DImode);
+;  operands[9] = gen_reg_rtx (SImode);
+;})
+
+(define_expand "modsi3_fp"
+  [(use (match_operand:SI 0 "nonimmediate_operand" ""))
+   (use (match_operand:SI 1 "nonimmediate_operand" ""))
+   (use (match_operand:SI 2 "nonimmediate_operand" ""))]
+  "TARGET_FP"
+{
+  rtx tmp1, tmp2, tmp3, tmp4, tmp5, tmp6, tmp7;
+  rtx skip1, skip2, cmp;
+  rtx skip_label = gen_label_rtx ();
+  tmp1 = gen_reg_rtx (DImode);
+  tmp2 = gen_reg_rtx (DImode);
+  tmp3 = gen_reg_rtx (DFmode);
+  tmp4 = gen_reg_rtx (DFmode);
+  tmp5 = gen_reg_rtx (DFmode);
+  tmp6 = gen_reg_rtx (DImode);
+  tmp7 = gen_reg_rtx (SImode);
+  skip1 = gen_reg_rtx (DImode);
+  skip2 = gen_reg_rtx (DImode);
+  cmp1 = gen_reg_rtx (DImode);
+  cmp2 = gen_reg_rtx (DImode);
+
+  emit_insn (gen_extendsidi2 (tmp1, operands[1]));
+  emit_insn (gen_extendsidi2 (tmp2, operands[2])); 
+  emit_insn (gen_floatdidf2 (tmp3, tmp1));
+  emit_insn (gen_floatdidf2 (tmp4, tmp2));
+
+  emit_insn (gen_subdi3 (skip1, tmp2, const1_rtx));
+  emit_insn (gen_anddi3 (skip2, tmp2, skip1));
+  emit_insn (gen_anddi3 (gen_lowpart (DImode, operands[0]), tmp1, skip1));
+
+  emit_insn (gen_rtx_SET (VOIDmode, cmp1,
+			  gen_rtx_fmt_ee (LT, DImode, operands[1], const0_rtx));
+  emit_insn (gen_rtx_SET (VOIDmode, cmp2,
+			  gen_rtx_fmt_ee (LT, DImode, operands[2], const0_rtx));
+
+  emit_insn (gen_cmpdi (skip2, const0_rtx));
+  emit_jump_insn (gen_beq (skip_label));
+
+  emit_insn (gen_divc (tmp5, tmp3, tmp4));
+  emit_insn (gen_fix_truncdfdi2 (tmp6, tmp5));
+  emit_insn (gen_mulsi3 (tmp7, operands[2], gen_lowpart (SImode, tmp6)));
+  emit_insn (gen_subsi3 (operands[0], operands[1], tmp7));
+  emit_label (skip_label);
+  DONE;
+})
+
+(define_expand "modsi3_libcall"
   [(set (match_dup 3)
 	(sign_extend:DI (match_operand:SI 1 "nonimmediate_operand" "")))
    (set (match_dup 4)
@@ -837,7 +997,53 @@
   operands[5] = gen_reg_rtx (DImode);
 })
 
-(define_expand "umodsi3"
+(define_expand "modsi3"
+  [(set (match_operand:SI 0 "register_operand" "")
+	(mod:SI (match_operand:SI 1 "register_operand" "")
+		(match_operand:SI 2 "register_operand" "")))]
+  ""
+{
+  if (TARGET_FP && !optimize_size)
+    emit_insn (gen_modsi3_fp (operands[0], operands[1], operands[2]));
+  else if (!TARGET_ABI_OPEN_VMS && !TARGET_ABI_UNICOSMK)
+    emit_insn (gen_modsi3_libcall (operands[0], operands[1], operands[2]));
+  else
+    FAIL;
+  DONE;
+})
+
+(define_expand "umodsi3_fp"
+  [(set (match_dup 3)
+	(zero_extend:DI (match_operand:SI 1 "nonimmediate_operand" "")))
+   (set (match_dup 4)
+	(zero_extend:DI (match_operand:SI 2 "nonimmediate_operand" "")))
+   (set (match_dup 5)
+	(float:DF (match_dup 3)))
+   (set (match_dup 6)
+	(float:DF (match_dup 4)))
+   (set (match_dup 7)
+	(unspec:DF [(match_dup 5)
+	       	    (match_dup 6)] UNSPEC_DIVC))
+   (set (match_dup 8)
+	(fix:DI (match_dup 7)))
+   (set (match_dup 9)
+	(mult:SI (subreg:SI (match_dup 8) 0)
+		 (match_dup 2)))
+   (set (match_operand:SI 0 "nonimmediate_operand" "")
+	(minus:SI (match_dup 1)
+		  (match_dup 9)))]
+  "TARGET_FP"
+{
+  operands[3] = gen_reg_rtx (DImode);
+  operands[4] = gen_reg_rtx (DImode);
+  operands[5] = gen_reg_rtx (DFmode);
+  operands[6] = gen_reg_rtx (DFmode);
+  operands[7] = gen_reg_rtx (DFmode);
+  operands[8] = gen_reg_rtx (DImode);
+  operands[9] = gen_reg_rtx (SImode);
+})
+
+(define_expand "umodsi3_libcall"
   [(set (match_dup 3)
 	(sign_extend:DI (match_operand:SI 1 "nonimmediate_operand" "")))
    (set (match_dup 4)
@@ -855,6 +1061,21 @@
   operands[5] = gen_reg_rtx (DImode);
 })
 
+(define_expand "umodsi3"
+  [(set (match_operand:SI 0 "register_operand" "")
+	(umod:SI (match_operand:SI 1 "register_operand" "")
+		(match_operand:SI 2 "register_operand" "")))]
+  ""
+{
+  if (TARGET_FP && !optimize_size)
+    emit_insn (gen_umodsi3_fp (operands[0], operands[1], operands[2]));
+  else if (!TARGET_ABI_OPEN_VMS && !TARGET_ABI_UNICOSMK)
+    emit_insn (gen_umodsi3_libcall (operands[0], operands[1], operands[2]));
+  else
+    FAIL;
+  DONE;
+})
+
 (define_expand "divdi3"
   [(parallel [(set (match_operand:DI 0 "register_operand" "")
 		   (div:DI (match_operand:DI 1 "register_operand" "")
@@ -2743,6 +2964,17 @@
   [(set_attr "type" "fdiv")
    (set_attr "trap" "yes")
    (set_attr "round_suffix" "normal")
+   (set_attr "trap_suffix" "u_su_sui")])
+
+(define_insn "divc"
+  [(set (match_operand:DF 0 "register_operand" "=f")
+	(unspec:DF [(match_operand:DF 1 "reg_or_0_operand" "fG")
+		    (match_operand:DF 2 "reg_or_0_operand" "fG")] UNSPEC_DIVC))]
+  "TARGET_FP"
+  "div%-%/ %R1,%R2,%0"
+  [(set_attr "type" "fdiv")
+   (set_attr "trap" "yes")
+   (set_attr "round_suffix" "c")
    (set_attr "trap_suffix" "u_su_sui")])
 
 (define_insn "*divdf_ext1"

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]