• AVR Freaks

Helpful ReplyHot!Unsigned 32 by 32 divide routine in assembler

Page: << < ..67 Showing page 7 of 7
Author
1and0
Access is Denied
  • Total Posts : 11523
  • Reward points : 0
  • Joined: 2007/05/06 12:03:20
  • Location: Harry's Gray Matter
  • Status: offline
Re: Final Drop for: Unsigned 32 by 32 divide routine in assembler 2020/09/19 16:56:23 (permalink)
0
Spinlectrix
See thread post #117 above.

Will check it out later. :)
 
Can this operation (w0 = w0 - Carry_bit) be performed with one instruction?
 
 
Spinlectrix
'E' Core Performance Stats:  Average Cycles      Worst-Case Cycles
No Remainder Returned            54.06                     55
With Remainder Returned         56.00                     59

How are you determining the worse case cycles?  According to my code above, it specs at ...
; For PIC24E and dsPIC33E:
;   Tcy(max) = 4 + 2*18 + 6 + 7     = 53 (quotient & remainder)
;            = 4 + 2*18 + 6 + 7 - 1 = 52 (quotient only)
 

post edited by 1and0 - 2020/09/19 17:01:46
Spinlectrix
Junior Member
  • Total Posts : 72
  • Reward points : 0
  • Joined: 2016/03/20 14:48:37
  • Location: 0
  • Status: offline
Re: Final Drop for: Unsigned 32 by 32 divide routine in assembler 2020/09/19 17:18:30 (permalink)
0
1and0
How are you determining the worse case cycles? 



I am running the routines on real hardware (a dsPIC33EP64GS504, specifically) and capturing instruction timing from a 'c' call to the assembly routine.
 
    #if CountCycles 
        StartTime = ReadInstructionCounter();
    #endif //CountCycles
        quotient.LongWord = sdiv3232(numerator.LongWord,denominator.LongWord);
    #if CountCycles
        StopTime = ReadInstructionCounter();
        deltaT = StopTime - (StartTime + CaptureTimeOverhead);
        ElapsedTime += deltaT;
        if (deltaT > WorstTime)
            WorstTime = deltaT;

 
The CaptureTimeOverhead variable is initialized earlier in the main, to compensate for all the overhead from the inline ReadInstructionCounter() routine, (which is implemented by just reading TMR3:TMR2 as a 32-bit counter, incrementing at the processor's clock.)
static inline __attribute__ ((always_inline))
long ReadInstructionCounter(void){
    union DoubleWordUnion result;
    result.Words.LoWord = TMR2;
    result.Words.HiWord = TMR3HLD;
    return (result.LongWord);
}

When using this method, I have accurately measured everything from in-line assembly to library calls.
 
In the .lst file, I can see that before the RCALL instruction, there are 3 mov.d instructions (representing 6 cycles), and that upon return there are another 2 movs, storing away the results, so I have compensated for those cycles in my numbers posted.
 
Remember that worst-case times may be a due to the double-divide calculation path, as well!
And also that the:
   btss   SR,#0
   mov.d w6,w2
sequence takes 3 cycles, regardless of the state of the test-bit!
 
Hmmmm.  The other method I just tried involves using the Simulator:  The 'Stopwatch' feature can be reset just before an RCALL instruction by using the Window->Debugging->Disassembly feature in MPLAB-X. When I just did that, the numbers look 4 cycles shorter.   Hmmm.
 
post edited by Spinlectrix - 2020/09/20 08:46:09
Spinlectrix
Junior Member
  • Total Posts : 72
  • Reward points : 0
  • Joined: 2016/03/20 14:48:37
  • Location: 0
  • Status: offline
Re: Final Drop for: Unsigned 32 by 32 divide routine in assembler 2020/09/20 10:05:38 (permalink)
0
1and0Please post your random dividend and divisor generator.



        RandShifts = _Q15random();
        
        denominator.UWords.UHiWord = _Q15random();
        // Don't divide by zero:
        do denominator.UWords.ULoWord = _Q15random();
        while (denominator.ULongword == 0);
        // Shift-right the magnitude of demonimator randomly by up to 31 times, so long as
       //  the shifts don't reduce the magnitude to zero.
        denomshifts = 0;
        while ( (denomshifts < (RandShifts & 31)) && ((denominator.ULongword>>1) !=0) ){
           denominator.ULongword >>= 1;
           denomshifts += 1;
        }
        
        numerator.UWords.UHiWord = _Q15random();
        numerator.UWords.ULoWord = _Q15random();
 
        // Randomly right-shift the numerator by up to 31 times
        numerator.ULongword >>= (RandShifts >> 11);
        
        // If we simply do true random number generation, half the time the numerator will be
        // smaller than the denominator, resulting in an answer of 0 at least half the time.
        // This would not represent ordinary data. Here, we shift to make more realistic divides.
        while ( (numerator.ULongword != 0) &&
                (numerator.ULongword < denominator.ULongword) ) {
           numerator.ULongword = (numerator.ULongword<<1) + RandShifts;
           RandShifts >>= 1;
        }
        
    #if CollectStats
        if ((denominator.Words.HiWord == 0) || (denominator.Words.HiWord == -1))
            ZeroHiWord += 1;
        else
            ShiftDivideMethod += 1;
    #endif

 
The concept is to randomly generate dividends and divisors equally spaced across all magnitudes of long values, while never generating a divisor of zero, while (virtually) always having the dividend larger than the divisor (If an application were just looking to pass a threshold of the dividend growing larger than the divisor, it would likely use a compare, not a long divide), so the generator ensures dividend >= divisor.
 
When I check the stats over 1M divides, it almost perfectly exercises each branch of the divide by 50%.
post edited by Spinlectrix - 2020/09/20 10:27:16
1and0
Access is Denied
  • Total Posts : 11523
  • Reward points : 0
  • Joined: 2007/05/06 12:03:20
  • Location: Harry's Gray Matter
  • Status: offline
Re: Final Drop for: Unsigned 32 by 32 divide routine in assembler 2020/09/20 10:33:16 (permalink)
0
Very nice and impressed with your optimization to the algorithm here. I assume your tests also check and verify the results are correct; i.e.

  Quotient * Divisor + Remainder = Dividend

and for no remainder,

  Quotient * Divisor <= Dividend
 
and
 
  (Quotient + 1) * Divisor > Dividend
Spinlectrix
Junior Member
  • Total Posts : 72
  • Reward points : 0
  • Joined: 2016/03/20 14:48:37
  • Location: 0
  • Status: offline
Re: Final Drop for: Unsigned 32 by 32 divide routine in assembler 2020/09/20 10:44:57 (permalink)
0
1and0
...
I assume your tests also check and verify the results are correct

 
Ummm, actually I've only been testing equivalence of the quotients against the library divide result, not of the remainders!
 
I will correct this now...
 
<Edit:> In addition to your checks, I added one that for any signed quotients which return non-zero remainders, the sign of the remainder must match the sign of the dividend.
 
And (at least over the first 1M randomly generated divides) the remainders and quotients pass all the tests.
post edited by Spinlectrix - 2020/09/20 12:12:41
1and0
Access is Denied
  • Total Posts : 11523
  • Reward points : 0
  • Joined: 2007/05/06 12:03:20
  • Location: Harry's Gray Matter
  • Status: offline
Re: Final Drop for: Unsigned 32 by 32 divide routine in assembler 2020/09/20 10:57:16 (permalink)
5 (1)
And make sure the

Remainder < Divisor
1and0
Access is Denied
  • Total Posts : 11523
  • Reward points : 0
  • Joined: 2007/05/06 12:03:20
  • Location: Harry's Gray Matter
  • Status: offline
Re: Final Drop for: Unsigned 32 by 32 divide routine in assembler 2020/09/20 11:39:51 (permalink)
0
Spinlectrix
... not of the remainders!

Also, I'm not quite sure what the C modulo % operator does when its operands are signed. That is, modulo and remainder are not the same.
 
Spinlectrix
Junior Member
  • Total Posts : 72
  • Reward points : 0
  • Joined: 2016/03/20 14:48:37
  • Location: 0
  • Status: offline
Re: Final Drop for: Unsigned 32 by 32 divide routine in assembler 2020/09/20 12:14:55 (permalink)
0
1and0
And make sure the
Remainder < Divisor

 Darn.  Forgot this obvious one; will add it for completeness, but I am not worried.
 
<Edit:  Added, run and tested over the (same) first 1M randomly generated divides >
post edited by Spinlectrix - 2020/09/20 12:57:44
1and0
Access is Denied
  • Total Posts : 11523
  • Reward points : 0
  • Joined: 2007/05/06 12:03:20
  • Location: Harry's Gray Matter
  • Status: offline
Re: Final Drop for: Unsigned 32 by 32 divide routine in assembler 2020/09/20 13:41:23 (permalink)
0
Spinlectrix
... for completeness, but I am not worried.

I agreed. I think our routines are good, just don't want to miss a rare case if there's any.
Spinlectrix
Junior Member
  • Total Posts : 72
  • Reward points : 0
  • Joined: 2016/03/20 14:48:37
  • Location: 0
  • Status: offline
Re: Final Drop for: Unsigned 32 by 32 divide routine in assembler 2020/09/20 16:40:33 (permalink)
5 (1)
And, to firmly prove that I am dodging my work that is really more important, here is an inline c callable function that will do a udiv3232 operation.  Since it will be in-lined, the compiler is free to choose whatever input registers it want to for the operands, so in addition to eliminating the RCALL/RET overhead there's often (depending on situation) no mov.d's to load up registers before the call:
struct DoubleWord {
    int16_t LoWord;
    int16_t HiWord;
};
struct UDoubleWord {
    uint16_t ULoWord;
    uint16_t UHiWord;
};
union DoubleWordUnion {
    long Longword;
    unsigned long ULongword;
    struct DoubleWord Words;
    struct UDoubleWord UWords;
};

#define HasMULW true
#define MulRepeats 17

static inline __attribute__ ((always_inline))
unsigned long UlongDIV(unsigned long numerator, unsigned long denominator) {
union DoubleWordUnion quotient;
asm __volatile__ (
   " mov %2,w4 \n ff1l %d3,w7 \n bra NC, 0f \n"
   " repeat #%4 \n div.uw %d2,%3 \n exch w0,w4 \n"
   " repeat #%4 \n div.ud w0,%3 \n mov w4,w1 \n bra 9f \n"
 "0: mov %d2,w5 \n subr w7,#17,w7 \n subr w7,#16,w0 \n"
   " sl %d3,w0,w0 \n lsr %3,w7,w6 \n ior w0,w6,w6 \n"
   " lsr w5,w1 \n rrc w4,w0 \n"
   " repeat #%4 \n div.ud w0, w6 \n"
   " dec w7,w7 \n lsr w0,w7,w1 \n"
#if HasMULW
   " mulw.uu %d3,w1,w0 \n"
#else
   " mul.uu %d3,w1,w6 \n mov w6, w0 "
#endif
   " mul.uu %3,w1,w6 \n"
   " sub w4,w6,w6 \n subb w5,w7,w7 \n sub w7,w0,w7 \n"
   " subb w1,#0,w0 \n clr w1 \n"
 "9: "
 /*                        w0 %0                            w1 %1    */
:/*outputs*/ "=a" (quotient.UWords.ULoWord), "=b" (quotient.UWords.UHiWord)
/*                %d2:%2            %d3:%3               %4      */
:/*inputs*/ "C" (numerator), "e" (denominator), "i" (MulRepeats) :/*clobbers*/ "cc", "w4","w5","w6","w7");
return (quotient.ULongword);
}

 
When the whole fast ulong divide routine is displayed like this, its kind of hard to appreciate just how difficult generating the precisely correct 29 instructions can be!
post edited by Spinlectrix - 2020/09/20 18:19:47
Spinlectrix
Junior Member
  • Total Posts : 72
  • Reward points : 0
  • Joined: 2016/03/20 14:48:37
  • Location: 0
  • Status: offline
Re: Final Drop for: Unsigned 32 by 32 divide routine in assembler 2020/09/20 19:10:48 (permalink)
5 (1)
Lastly (hopefully), since we are defining an inline c assembly instruction, the Dividend does not have to come in as w1:w0.  Since we are going to clobber w1:w0 with the quotient anyway, by insisting that the Dividend operands are passed in registers other than w1:w0 allows us to save yet another cycle, and give the compiler still more choices of operand registers! (Notice that in the following code only w2 and w3 are marked as clobbered) meaning that not only will the divisor registers remain unaltered, but the dividend registers will as well:  Unsigned long divide now down to 28 instructions and 47 cycles! Not merely shorter than the standard library ulong = ulong/ulong call, but over 10 times faster as well! :

struct DoubleWord {
    int16_t LoWord;
    int16_t HiWord;
};

struct UDoubleWord {
    uint16_t ULoWord;
    uint16_t UHiWord;
};

union DoubleWordUnion {
    long Longword;
    unsigned long ULongword;
    struct DoubleWord Words;
    struct UDoubleWord UWords;
};

static inline __attribute__ ((always_inline))
unsigned long UlongDIV(unsigned long numerator, unsigned long denominator) {
union DoubleWordUnion quotient;
asm __volatile__ (
   " ff1l %d3,w2 \n bra NC, 0f \n"
   " repeat #%4 \n div.uw %d2,%3 \n mov w0,w2 \n mov %2,w0 \n"
   " repeat #%4 \n div.ud w0,%3 \n mov w2,w1 \n bra 9f \n"
 "0: subr w2,#17,w3 \n subr w3,#16,w0 \n"
   " sl %d3,w0,w0 \n lsr %3,w3,w2 \n ior w0,w2,w2 \n"
   " lsr %d2,w1 \n rrc %2,w0 \n"
   " repeat #%4 \n div.ud w0, w2 \n"
   " dec w3,w3 \n lsr w0,w3,w1 \n"
#if HasMULW
   " mulw.uu %d3,w1,w0 \n"
#else
   " mul.uu %d3,w1,w2 \n mov w2,w0 "
#endif
   " mul.uu %3,w1,w2 \n"
   " sub %2,w2,w2 \n subb %d2,w3,w3 \n sub w3,w0,w3 \n"
   " subb w1,#0,w0 \n clr w1 \n"
 "9: "
/*                            w0 %0                        w1 %1     */
:/*outputs*/ "=&a" (quotient.UWords.ULoWord), "=&b" (quotient.UWords.UHiWord)
/*                 %d2:%2            %d3:%3              %4      */
:/*inputs*/ "e" (numerator), "e" (denominator), "i" (MulRepeats) :/*clobbers*/ "cc","w2","w3");
return (quotient.ULongword);
}

post edited by Spinlectrix - 2020/09/20 20:42:10
1and0
Access is Denied
  • Total Posts : 11523
  • Reward points : 0
  • Joined: 2007/05/06 12:03:20
  • Location: Harry's Gray Matter
  • Status: offline
Re: Final Drop for: Unsigned 32 by 32 divide routine in assembler 2020/09/23 20:12:51 (permalink)
5 (1)
Finally got some free time to look over your latest optimization. The following shaves two more cycles when the remainder is returned in W7:W6 and the divisor in W3:W2. The dividend can be returned in W5:W4 at the expense of two more cycles in the Divisor <= 0xFFFF leg.
; Return Values
;   w1:w0 = Quotient  = Dividend / Divisor
;   w3:w2 = Divisor
;   w7:w6 = Remainder = Dividend % Divisor  , if remainder is requested
;
; Clobbered Registers
;   w5:w4   (Dividend can be returned at expense of +2 Tcy)

 
Now, for E-core,
 
Tcy(max) = 49,  no remainder
              = 51,  with remainder
         
;
; Unsigned Long Integer Division of 32-bit by 32-bit
;
;   uint32_t
;   udivmod3232 (uint32_t dividend, uint32_t divisor, uint32_t *remainder)
;
;   uint32_t
;   udiv3232 (uint32_t dividend, uint32_t divisor)
;
;
; Dividend = Quotient * Divisor + Remainder
;
; Calling Parameters
;   w1:w0 = Dividend
;   w3:w2 = Divisor
;
; Return Values
;   w1:w0 = Quotient  = Dividend / Divisor
;   w3:w2 = Divisor
;   w7:w6 = Remainder = Dividend % Divisor  , if remainder is requested
;
; Clobbered Registers
;   w5:w4   (Dividend can be returned at expense of +2 Tcy)
;
; Tcy(max) = 7 + 2*DIV + RET + [1]             , Divisor <= 0xFFFF
; Tcy(max) = 20 + BRA + DIV + RET + [4] - <1>  , Divisor >  0xFFFF
;
; Notes:
; * Add the cycle in [] when computing the remainder.
; * Minus the cycle in <> when using the MULW instruction.
;
; For PIC24F, PIC24H, dsPIC30F and dsPIC33F:
;   Tcy(max) =
;
; For PIC24E and dsPIC33E:
;   Tcy(max) =
;
; For dsPIC33C:
;   Tcy(max) =
;
; Ref: https://www.microchip.com/forums/FindPost/1151719
;
 
; Conditional assembly:
; * Define NO_REMAINDER to return only the quotient and no remainder.
; * Adding the remainder takes 3 more program memory words and 3 more
;   instruction cycles maximum which are negligible, so it is the
;   default by having NO_REMAINDER not defined.
;
; NO_REMAINDER = 1
 
.ifdef  __PIC24E
PIC_24E_33C_33E = 1
.endif
.ifdef  __dsPIC33E
PIC_24E_33C_33E = 1
.endif
.ifdef  __dsPIC33C
PIC_24E_33C_33E = 1
.endif
 
; Loop counter value for the REPEAT instruction of a DIV instruction.
 
.ifdef __dsPIC33C
DIV_RCOUNT = (6-1)          ; execute DIV 6 consecutive times
.else
DIV_RCOUNT = (18-1)         ; execute DIV 18 consecutive times
.endif
 
; Division Routine Names
 
.ifndef NO_REMAINDER
udivmod3232:
.else
udiv3232:
.endif
 
        mov     w0,w4       ; save dividend LSW
        ff1l    w3,w7       ; w7 = first one left of divisor MSW
        bra     nc,0f       ; if (divisor <= 0xFFFF) branch
;-------
; Case of Divisor <= 0xFFFF
;
; Here the high word of the divisor is zero, so the quotient may overflow a
; 16-bit word.
;
;   w1:w0 = w1:w0 / w2
;         = [(w1 / w2) << 16] + [(w1 % w2):w0 / w2]
;
;   w7:w6 = w1:w0 % w2
        
; Divide high word of dividend by the divisor.
        
        repeat  #DIV_RCOUNT ; calculate quotient MSW (16/16 divide)
        div.uw  w1,w2       ; w0 = w1 / w2, w1 = w1 % w2
                            ; remainder becomes dividend MSW of next divide
        exch    w0,w4       ; save quotient MSW, restore dividend LSW
        
; Divide remainder and low word of dividend by the divisor.
        
        repeat  #DIV_RCOUNT ; calculate quotient LSW (32/16 divide)
        div.ud  w0,w2       ; w0 = (w1:w0) / w2, w1 = (w1:w0) % w2
        
    .ifndef NO_REMAINDER
        mul.uu  w1,#1,w6    ; w7:w6 = remainder
    .endif
        mov     w4,w1       ; restore quotient MSW
        return              ; return with divisor MSW = 0
;-------
; Case of Divisor > 0xFFFF
;
; Here the high word of the divisor is non-zero, so the quotient will fit in
; a 16-bit word; i.e. the high word of the quotient is guaranteed to be zero.
;
;   w1:w0 = w1:w0 / w3:w2
;         = w5:w4 / w3:w2
;         = [(w5:w4) >> n] / [(w3:w2) >> n]
;
;   w7:w6 = w1:w0 % w3:w2
        
; Save dividend for correction of the quotient later.
        
0:      mov     w1,w5       ; save dividend MSW
        
; Calculate shift counts for normalization.
        
        subr    w7,#17,w7   ; right shift >> (w7 = 17 - w7), w7 = n = [1..16]
        subr    w7,#16,w0   ; left  shift << (w0 = 16 - w7)
        
; Normalize the divisor so its left-most 1 is at the MSb of its LSW.
        
        sl      w3,w0,w0    ; divisor' = divisor >> n
        lsr     w2,w7,w6    ; "
        ior     w0,w6,w6    ; w6 = (w3:w2) >> n
        
; Consequently, scale the dividend to ensure the quotient fits in its LSW;
; also, keeping more dividend bits minimizes the quotient rounding error.
        
        lsr     w5,w1       ; dividend' = dividend >> 1
        rrc     w4,w0       ;     w1:w0 = w5:w4 >> 1
        
; Divide scaled dividend by the normalized divisor to get an estimated quotient
; where the quotient error Err(q) = estimated quotient - correct quotient.
        
        repeat  #DIV_RCOUNT ; calculate the 16-bit quotient (32/16 divide)
        div.ud  w0,w6       ; w0 = (w1:w0) / w6, no overflow here
        
; Undo the scaling of dividend by 2 and apply the same ratio as the divisor.
        
        dec     w7,w7       ; new shift to account for quotient <<= 1
        lsr     w0,w7,w1    ; w1 = quotient >>= (n - 1), Err(q) = [0,+1]
        
; Calculate (quotient * divisor) separately so it won't overflow.
    
    .ifdef  PIC_24E_33C_33E
        mulw.uu w1,w3,w0    ;    w0    = w1 * w3
    .else
        mul.uu  w1,w3,w6    ; w7:w6    = w1 * w3
        mov     w6,w0       ;    w0    = w1 * w3 = quotient * divisor MSW
    .endif
        mul.uu  w1,w2,w6    ;    w7:w6 = w1 * w2 = quotient * divisor LSW
        
; Calculate remainder = (dividend - quotient * divisor).
        
        sub     w4,w6,w6    ; w7:w6 = dividend - quotient * divisor
        subb    w5,w7,w7    ;       = dividend - quotient * divisor LSW
        sub     w7,w0,w7    ;                  - quotient * divisor MSW
    
        subb    w1,#0,w0    ; if (negative) quotient is 1 too big, so quotient--
    
; Correct remainder if quotient was corrected.
    
    .ifndef NO_REMAINDER
        cpsne   w1,w0       ; if (quotient did not need correction)
        retlw   #0,w1       ;   return with quotient MSW = 0, w7:w6 = remainder
        
        add     w6,w2,w6    ; else one too many divisor was subtracted
        addc    w7,w3,w7    ;   so add one divisor back into remainder
    .endif
        retlw   #0,w1       ; return with quotient MSW = 0

1and0
Access is Denied
  • Total Posts : 11523
  • Reward points : 0
  • Joined: 2007/05/06 12:03:20
  • Location: Harry's Gray Matter
  • Status: offline
Re: Final Drop for: Unsigned 32 by 32 divide routine in assembler 2020/09/24 02:25:56 (permalink)
5 (2)
Here's one that does this:
; Calling Parameters
;   w1:w0 = Dividend
;   w3:w2 = Divisor
;
; Return Values
;   w1:w0 = Quotient  = Dividend / Divisor
;   w3:w2 = Divisor
;   w5:w4 = Remainder = Dividend % Divisor  , when remainder is requested
;   w7:w6 = Dividend


;
; Unsigned Long Integer Division of 32-bit by 32-bit
;
;   uint32_t
;   udivmod3232 (uint32_t dividend, uint32_t divisor, uint32_t *remainder)
;
;   uint32_t
;   udiv3232 (uint32_t dividend, uint32_t divisor)
;
;
; Dividend = Quotient * Divisor + Remainder
;
; Calling Parameters
;   w1:w0 = Dividend
;   w3:w2 = Divisor
;
; Return Values
;   w1:w0 = Quotient  = Dividend / Divisor
;   w3:w2 = Divisor
;   w5:w4 = Remainder = Dividend % Divisor  , when remainder is requested
;   w7:w6 = Dividend
;
; Tcy(max) =  9       +     2*DIV + RET + [1]  , Divisor <= 0xFFFF
; Tcy(max) = 20 - <1> + BRA + DIV + RET + [4]  , Divisor >  0xFFFF
;
; Notes:
; * Tcy counts do not include the CALL/RCALL instruction.
; * Minus the cycle in <> when using the MULW instruction.
; * Add the cycle in [] when computing the remainder.
;
; For PIC24F, PIC24H, dsPIC30F and dsPIC33F:
;   Tcy1 =  9 +   2*18 + 3 + [1] = 48 [49]
;   Tcy2 = 20 + 2 + 18 + 3 + [4] = 43 [47]
;   Tcy(max) = 48 [49]
;
; For PIC24E and dsPIC33E:
;   Tcy1 =  9 +   2*18 + 6 + [1] = 51 [52]
;   Tcy2 = 19 + 4 + 18 + 6 + [4] = 47 [51]
;   Tcy(max) = 51 [52]
;
; For dsPIC33C:
;   Tcy1 =  9 +    2*6 + 6 + [1] = 27 [28]
;   Tcy2 = 19 + 4 +  6 + 6 + [4] = 35 [39]
;   Tcy(max) = 35 [39]  , can swap the two cases to reduce Tcy.
;
; Ref: https://www.microchip.com/forums/FindPost/1151719
;
 
; Conditional assembly:
; * Define NO_REMAINDER to return only the quotient and no remainder.
; * Adding the remainder takes 3 more program memory words and 3 more
;   instruction cycles maximum which are negligible, so it is the
;   default by having NO_REMAINDER not defined.
;
; NO_REMAINDER = 1
 
.ifdef  __PIC24E
PIC_24E_33C_33E = 1
.endif
.ifdef  __dsPIC33E
PIC_24E_33C_33E = 1
.endif
.ifdef  __dsPIC33C
PIC_24E_33C_33E = 1
.endif

; Loop counter value for the REPEAT instruction of a DIV instruction.
   
.ifdef __dsPIC33C
DIV_RCOUNT = (6-1)          ; execute DIV 6 consecutive times
.else
DIV_RCOUNT = (18-1)         ; execute DIV 18 consecutive times
.endif
 
; Division Routine Names
 
.ifndef NO_REMAINDER
udivmod3232:
.else
udiv3232:
.endif
 
        mov.d   w0,w6       ; save dividend
        ff1l    w3,w5       ; w5 = first one left of divisor MSW
        bra     nc,0f       ; if (divisor <= 0xFFFF) branch
;-------
; Case of Divisor <= 0xFFFF
;
; Here the high word of the divisor is zero, so the quotient may overflow a
; 16-bit word.
;
;   w1:w0 = w1:w0 / w2
;         = [(w1 / w2) << 16] + [(w1 % w2):w0 / w2]
;
;   w5:w4 = w1:w0 % w2
        
; Divide high word of dividend by the divisor.
        
        repeat  #DIV_RCOUNT ; calculate quotient MSW (16/16 divide)
        div.uw  w1,w2       ; w0 = w1 / w2, w1 = w1 % w2
                            ; remainder becomes dividend MSW of next divide
        mov     w0,w3       ; save quotient MSW
        mov     w6,w0       ; restore dividend LSW
        
; Divide remainder and low word of dividend by the divisor.
        
        repeat  #DIV_RCOUNT ; calculate quotient LSW (32/16 divide)
        div.ud  w0,w2       ; w0 = (w1:w0) / w2, w1 = (w1:w0) % w2
        
    .ifndef NO_REMAINDER
        mul.uu  w1,#1,w4    ; w5:w4 = remainder
    .endif
        mov     w3,w1       ; restore quotient MSW
        retlw   #0,w3       ; return with divisor MSW = 0
;-------
; Case of Divisor > 0xFFFF
;
; Here the high word of the divisor is non-zero, so the quotient will fit in
; a 16-bit word; i.e. the high word of the quotient is guaranteed to be zero.
;
;   w1:w0 = w1:w0 / w3:w2
;         = w7:w6 / w3:w2
;         = {[(w7:w6) >> n] / [(w3:w2) >> n]}
;         = {[(w7:w6) >> 1] / [(w3:w2) >> n]} >> (n - 1)
;
;   w5:w4 = w1:w0 % w3:w2
        
; Calculate shift counts for normalization.
        
0:      subr    w5,#17,w5   ; right shift >> (w5 = 17 - w5), w5 = n = [1..16]
        subr    w5,#16,w0   ; left  shift << (w0 = 16 - w5)
        
; Normalize the divisor so its left-most 1 is at the MSb of its LSW.
        
        sl      w3,w0,w0    ; divisor' = divisor >> n
        lsr     w2,w5,w4    ; "
        ior     w0,w4,w4    ; w4 = (w3:w2) >> n
        
; Consequently, scale the dividend to ensure the quotient fits in its LSW,
; while retaining maximum precision to minimize quotient rounding error.
        
        lsr     w7,w1       ; dividend' = dividend >> 1
        rrc     w6,w0       ;     w1:w0 = w7:w6 >> 1
        
; Divide scaled dividend by the normalized divisor to get an estimated quotient
; where the quotient error Err(q) = estimated quotient - correct quotient.
        
        repeat  #DIV_RCOUNT ; calculate the 16-bit quotient (32/16 divide)
        div.ud  w0,w4       ; w0 = (w1:w0) / w4, no overflow here
        
; Undo the scaling of dividend by 2 and apply the same ratio as the divisor.
        
        dec     w5,w5       ; new shift to account for quotient <<= 1
        lsr     w0,w5,w1    ; w1 = quotient >>= (n - 1), Err(q) = [0,+1]
        
; Calculate (quotient * divisor) separately so it won't overflow.
        
    .ifdef  PIC_24E_33C_33E
        mulw.uu w1,w3,w0    ;    w0    = w1 * w3
    .else
        mul.uu  w1,w3,w4    ; w5:w4    = w1 * w3
        mov     w4,w0       ;    w0    = w1 * w3 = quotient * divisor MSW
    .endif
        mul.uu  w1,w2,w4    ;    w5:w4 = w1 * w2 = quotient * divisor LSW
        
; Calculate remainder = (dividend - quotient * divisor).
        
        sub     w6,w4,w4    ; w5:w4 = dividend - quotient * divisor = remainder
        subb    w7,w5,w5    ;       = dividend - quotient * divisor LSW
        sub     w5,w0,w5    ;                  - quotient * divisor MSW
        
        subb    w1,#0,w0    ; if (negative) quotient is 1 too big, so quotient--
        
; Correct remainder if quotient was corrected.
        
    .ifndef NO_REMAINDER
        cpsne   w1,w0       ; if (quotient did not need correction)
        retlw   #0,w1       ;   return with quotient MSW = 0, w5:w4 = remainder
        
        add     w2,w4,w4    ; else one too many divisor was subtracted
        addc    w3,w5,w5    ;   so add one divisor back into remainder
    .endif
        retlw   #0,w1       ; return with quotient MSW = 0



1and0
Access is Denied
  • Total Posts : 11523
  • Reward points : 0
  • Joined: 2007/05/06 12:03:20
  • Location: Harry's Gray Matter
  • Status: offline
Re: Final Drop for: Unsigned 32 by 32 divide routine in assembler 2020/09/25 16:26:07 (permalink)
0
And here is the corresponding signed version:
;
; Signed Long Integer Division of 32-bit by 32-bit
;
;   int32_t
;   divmod3232 (int32_t dividend, int32_t divisor, int32_t *remainder)
;
;   int32_t
;   div3232 (int32_t dividend, int32_t divisor)
;
;
; Dividend = Quotient * Divisor + Remainder
;
; Calling Parameters
;   w1:w0 = Dividend
;   w3:w2 = Divisor
;
; Return Values
;   w1:w0 = Quotient  = Dividend / Divisor
;   w3:w2 = Divisor
;   w5:w4 = Remainder = Dividend % Divisor  , when remainder is requested
;   w7:w6 = Dividend
;
 
.ifndef NO_REMAINDER
divmod3232:
.else
div3232:
.endif
 
        push    w1          ; save signed dividend MSW
        push    w3          ; save signed divisor MSW
        
    .ifdef  PIC_24E_33C_33E
        btsc    w1,#15      ; if (dividend < 0) negate dividend
        neg     w0,w0       ; "
        btsc    w1,#15      ; "
        subbr   w1,#0,w1    ; "
        
        btsc    w3,#15      ; if (divisor < 0) negate divisor
        neg     w2,w2       ; "
        btsc    w3,#15      ; "
        subbr   w3,#0,w3    ; "
    .else
        btss    w1,#15      ; if (dividend < 0) negate dividend
        bra     0f          ; "
        neg     w0,w0       ; "
        subbr   w1,#0,w1    ; "
0:      
        btss    w3,#15      ; if (divisor < 0) negate divisor
        bra     1f          ; "
        neg     w2,w2       ; "
        subbr   w3,#0,w3    ; "
1:
    .endif
        
    .ifndef NO_REMAINDER
        rcall   udivmod3232 ; call unsigned 32/32
    .else
        rcall   udiv3232    ; call unsigned 32/32
    .endif
        
        pop     w3          ; restore signed divisor MSW
        btsc    w3,#15      ; "
        neg     w2,w2       ; and its LSW
        
  .ifndef NO_REMAINDER
        pop     w7          ; restore signed dividend MSW
        btss    w7,#15      ; if (dividend is negative)
        bra     2f          ;   "
        neg     w6,w6       ;   restore signed dividend LSW
        neg     w4,w4       ;   negate remainder
        subbr   w5,#0,w5    ;   "
2:        
        xor     w7,w3,w7    ; calculate sign of quotient
        
    .ifdef  PIC_24E_33C_33E
        btsc    w7,#15      ; if (sign is negative) negate quotient
        neg     w0,w0       ; "
        btsc    w7,#15      ; "
        subbr   w1,#0,w1    ; "
    .else
        btss    w7,#15      ; if (sign is negative) negate quotient
        bra     3f
        neg     w0,w0       ; "
        subbr   w1,#0,w1    ; "
3:
    .endif
        
        xor     w7,w3,w7    ; restore signed dividend MSW
        return              ; return
  .else
        pop     w7          ; restore signed dividend MSW
        btsc    w7,#15      ; "
        neg     w6,w6       ; and its LSW
        
        xor     w7,w3,w5    ; calculate sign of quotient
       
        btss    w5,#15      ; if (sign is negative) negate quotient
        return              ; "
        neg     w0,w0       ; "
        subbr   w1,#0,w1    ; "
        return              ; return
  .endif

Edit: Added conditional assembly.
post edited by 1and0 - 2020/09/26 09:19:10
1and0
Access is Denied
  • Total Posts : 11523
  • Reward points : 0
  • Joined: 2007/05/06 12:03:20
  • Location: Harry's Gray Matter
  • Status: offline
Re: Unsigned 32 by 32 divide routine in assembler 2020/10/22 07:03:13 (permalink)
0
Spinlectrix
.. it may not be too soon to start thinking about how/where to best locate a repository of similar useful routines:  For example, I have an ATAN2(y,x) routine which calculates both arc tangents and vector magnitudes executing in less than 65 cycles.  Is there already such a location?

Could you please post your ATAN2(y,x) routine?  Thanks.
Page: << < ..67 Showing page 7 of 7
Jump to:
© 2020 APG vNext Commercial Version 4.5