• Forums
• Posts
Latest Posts
Active Posts
Recently Visited
Search Results
• Page Extras
• Forum Themes
• AVR Freaks

Page: << < ..67 Showing page 7 of 7
Author
1and0
Access is Denied
• Total Posts : 11523
• Reward points : 0
• Joined: 2007/05/06 12:03:20
• Location: Harry's Gray Matter
• Status: offline
Re: Final Drop for: Unsigned 32 by 32 divide routine in assembler 2020/09/19 16:56:23 (permalink)
0
Spinlectrix

Will check it out later. :)

Can this operation (w0 = w0 - Carry_bit) be performed with one instruction?

Spinlectrix
'E' Core Performance Stats:  Average Cycles      Worst-Case Cycles
No Remainder Returned            54.06                     55
With Remainder Returned         56.00                     59

How are you determining the worse case cycles?  According to my code above, it specs at ...
`; For PIC24E and dsPIC33E:;   Tcy(max) = 4 + 2*18 + 6 + 7     = 53 (quotient & remainder);            = 4 + 2*18 + 6 + 7 - 1 = 52 (quotient only) `

post edited by 1and0 - 2020/09/19 17:01:46
Spinlectrix
Junior Member
• Total Posts : 72
• Reward points : 0
• Joined: 2016/03/20 14:48:37
• Location: 0
• Status: offline
Re: Final Drop for: Unsigned 32 by 32 divide routine in assembler 2020/09/19 17:18:30 (permalink)
0
1and0
How are you determining the worse case cycles?

I am running the routines on real hardware (a dsPIC33EP64GS504, specifically) and capturing instruction timing from a 'c' call to the assembly routine.

`    #if CountCycles         StartTime = ReadInstructionCounter();    #endif //CountCycles        quotient.LongWord = sdiv3232(numerator.LongWord,denominator.LongWord);    #if CountCycles         StopTime = ReadInstructionCounter();         deltaT = StopTime - (StartTime + CaptureTimeOverhead);        ElapsedTime += deltaT;        if (deltaT > WorstTime)            WorstTime = deltaT;`

The CaptureTimeOverhead variable is initialized earlier in the main, to compensate for all the overhead from the inline ReadInstructionCounter() routine, (which is implemented by just reading TMR3:TMR2 as a 32-bit counter, incrementing at the processor's clock.)
`static inline __attribute__ ((always_inline))long ReadInstructionCounter(void){    union DoubleWordUnion result;    result.Words.LoWord = TMR2;    result.Words.HiWord = TMR3HLD;    return (result.LongWord);}`

When using this method, I have accurately measured everything from in-line assembly to library calls.

In the .lst file, I can see that before the RCALL instruction, there are 3 mov.d instructions (representing 6 cycles), and that upon return there are another 2 movs, storing away the results, so I have compensated for those cycles in my numbers posted.

Remember that worst-case times may be a due to the double-divide calculation path, as well!
And also that the:
btss   SR,#0
mov.d w6,w2
sequence takes 3 cycles, regardless of the state of the test-bit!

Hmmmm.  The other method I just tried involves using the Simulator:  The 'Stopwatch' feature can be reset just before an RCALL instruction by using the Window->Debugging->Disassembly feature in MPLAB-X. When I just did that, the numbers look 4 cycles shorter.   Hmmm.

post edited by Spinlectrix - 2020/09/20 08:46:09
Spinlectrix
Junior Member
• Total Posts : 72
• Reward points : 0
• Joined: 2016/03/20 14:48:37
• Location: 0
• Status: offline
Re: Final Drop for: Unsigned 32 by 32 divide routine in assembler 2020/09/20 10:05:38 (permalink)
0

`        RandShifts = _Q15random();                denominator.UWords.UHiWord = _Q15random();        // Don't divide by zero:        do denominator.UWords.ULoWord = _Q15random();        while (denominator.ULongword == 0);        // Shift-right the magnitude of demonimator randomly by up to 31 times, so long as       //  the shifts don't reduce the magnitude to zero.        denomshifts = 0;        while ( (denomshifts < (RandShifts & 31)) && ((denominator.ULongword>>1) !=0) ){           denominator.ULongword >>= 1;           denomshifts += 1;        }                numerator.UWords.UHiWord = _Q15random();        numerator.UWords.ULoWord = _Q15random();         // Randomly right-shift the numerator by up to 31 times        numerator.ULongword >>= (RandShifts >> 11);                // If we simply do true random number generation, half the time the numerator will be        // smaller than the denominator, resulting in an answer of 0 at least half the time.        // This would not represent ordinary data. Here, we shift to make more realistic divides.        while ( (numerator.ULongword != 0) &&                (numerator.ULongword < denominator.ULongword) ) {           numerator.ULongword = (numerator.ULongword<<1) + RandShifts;           RandShifts >>= 1;        }            #if CollectStats        if ((denominator.Words.HiWord == 0) || (denominator.Words.HiWord == -1))            ZeroHiWord += 1;        else            ShiftDivideMethod += 1;    #endif`

The concept is to randomly generate dividends and divisors equally spaced across all magnitudes of long values, while never generating a divisor of zero, while (virtually) always having the dividend larger than the divisor (If an application were just looking to pass a threshold of the dividend growing larger than the divisor, it would likely use a compare, not a long divide), so the generator ensures dividend >= divisor.

When I check the stats over 1M divides, it almost perfectly exercises each branch of the divide by 50%.
post edited by Spinlectrix - 2020/09/20 10:27:16
1and0
Access is Denied
• Total Posts : 11523
• Reward points : 0
• Joined: 2007/05/06 12:03:20
• Location: Harry's Gray Matter
• Status: offline
Re: Final Drop for: Unsigned 32 by 32 divide routine in assembler 2020/09/20 10:33:16 (permalink)
0
Very nice and impressed with your optimization to the algorithm here. I assume your tests also check and verify the results are correct; i.e.

Quotient * Divisor + Remainder = Dividend

and for no remainder,

Quotient * Divisor <= Dividend

and

(Quotient + 1) * Divisor > Dividend
Spinlectrix
Junior Member
• Total Posts : 72
• Reward points : 0
• Joined: 2016/03/20 14:48:37
• Location: 0
• Status: offline
Re: Final Drop for: Unsigned 32 by 32 divide routine in assembler 2020/09/20 10:44:57 (permalink)
0
1and0
...
I assume your tests also check and verify the results are correct

Ummm, actually I've only been testing equivalence of the quotients against the library divide result, not of the remainders!

I will correct this now...

<Edit:> In addition to your checks, I added one that for any signed quotients which return non-zero remainders, the sign of the remainder must match the sign of the dividend.

And (at least over the first 1M randomly generated divides) the remainders and quotients pass all the tests.
post edited by Spinlectrix - 2020/09/20 12:12:41
1and0
Access is Denied
• Total Posts : 11523
• Reward points : 0
• Joined: 2007/05/06 12:03:20
• Location: Harry's Gray Matter
• Status: offline
Re: Final Drop for: Unsigned 32 by 32 divide routine in assembler 2020/09/20 10:57:16 (permalink)
5 (1)
And make sure the

Remainder < Divisor
1and0
Access is Denied
• Total Posts : 11523
• Reward points : 0
• Joined: 2007/05/06 12:03:20
• Location: Harry's Gray Matter
• Status: offline
Re: Final Drop for: Unsigned 32 by 32 divide routine in assembler 2020/09/20 11:39:51 (permalink)
0
Spinlectrix
... not of the remainders!

Also, I'm not quite sure what the C modulo % operator does when its operands are signed. That is, modulo and remainder are not the same.

Spinlectrix
Junior Member
• Total Posts : 72
• Reward points : 0
• Joined: 2016/03/20 14:48:37
• Location: 0
• Status: offline
Re: Final Drop for: Unsigned 32 by 32 divide routine in assembler 2020/09/20 12:14:55 (permalink)
0
1and0
And make sure the
Remainder < Divisor

Darn.  Forgot this obvious one; will add it for completeness, but I am not worried.

<Edit:  Added, run and tested over the (same) first 1M randomly generated divides >
post edited by Spinlectrix - 2020/09/20 12:57:44
1and0
Access is Denied
• Total Posts : 11523
• Reward points : 0
• Joined: 2007/05/06 12:03:20
• Location: Harry's Gray Matter
• Status: offline
Re: Final Drop for: Unsigned 32 by 32 divide routine in assembler 2020/09/20 13:41:23 (permalink)
0
Spinlectrix
... for completeness, but I am not worried.

I agreed. I think our routines are good, just don't want to miss a rare case if there's any.
Spinlectrix
Junior Member
• Total Posts : 72
• Reward points : 0
• Joined: 2016/03/20 14:48:37
• Location: 0
• Status: offline
Re: Final Drop for: Unsigned 32 by 32 divide routine in assembler 2020/09/20 16:40:33 (permalink)
5 (1)
And, to firmly prove that I am dodging my work that is really more important, here is an inline c callable function that will do a udiv3232 operation.  Since it will be in-lined, the compiler is free to choose whatever input registers it want to for the operands, so in addition to eliminating the RCALL/RET overhead there's often (depending on situation) no mov.d's to load up registers before the call:
`struct DoubleWord {    int16_t LoWord;    int16_t HiWord;};struct UDoubleWord {    uint16_t ULoWord;    uint16_t UHiWord;};union DoubleWordUnion {    long Longword;    unsigned long ULongword;    struct DoubleWord Words;    struct UDoubleWord UWords;};#define HasMULW true#define MulRepeats 17static inline __attribute__ ((always_inline))unsigned long UlongDIV(unsigned long numerator, unsigned long denominator) {union DoubleWordUnion quotient;asm __volatile__ (   " mov %2,w4 \n ff1l %d3,w7 \n bra NC, 0f \n"   " repeat #%4 \n div.uw %d2,%3 \n exch w0,w4 \n"   " repeat #%4 \n div.ud w0,%3 \n mov w4,w1 \n bra 9f \n" "0: mov %d2,w5 \n subr w7,#17,w7 \n subr w7,#16,w0 \n"   " sl %d3,w0,w0 \n lsr %3,w7,w6 \n ior w0,w6,w6 \n"   " lsr w5,w1 \n rrc w4,w0 \n"   " repeat #%4 \n div.ud w0, w6 \n"   " dec w7,w7 \n lsr w0,w7,w1 \n"#if HasMULW   " mulw.uu %d3,w1,w0 \n"#else   " mul.uu %d3,w1,w6 \n mov w6, w0 "#endif   " mul.uu %3,w1,w6 \n"   " sub w4,w6,w6 \n subb w5,w7,w7 \n sub w7,w0,w7 \n"   " subb w1,#0,w0 \n clr w1 \n" "9: " /*                        w0 %0                            w1 %1    */:/*outputs*/ "=a" (quotient.UWords.ULoWord), "=b" (quotient.UWords.UHiWord)/*                %d2:%2            %d3:%3               %4      */:/*inputs*/ "C" (numerator), "e" (denominator), "i" (MulRepeats) :/*clobbers*/ "cc", "w4","w5","w6","w7");return (quotient.ULongword);}`

When the whole fast ulong divide routine is displayed like this, its kind of hard to appreciate just how difficult generating the precisely correct 29 instructions can be!
post edited by Spinlectrix - 2020/09/20 18:19:47
Spinlectrix
Junior Member
• Total Posts : 72
• Reward points : 0
• Joined: 2016/03/20 14:48:37
• Location: 0
• Status: offline
Re: Final Drop for: Unsigned 32 by 32 divide routine in assembler 2020/09/20 19:10:48 (permalink)
5 (1)
Lastly (hopefully), since we are defining an inline c assembly instruction, the Dividend does not have to come in as w1:w0.  Since we are going to clobber w1:w0 with the quotient anyway, by insisting that the Dividend operands are passed in registers other than w1:w0 allows us to save yet another cycle, and give the compiler still more choices of operand registers! (Notice that in the following code only w2 and w3 are marked as clobbered) meaning that not only will the divisor registers remain unaltered, but the dividend registers will as well:  Unsigned long divide now down to 28 instructions and 47 cycles! Not merely shorter than the standard library ulong = ulong/ulong call, but over 10 times faster as well! :

`struct DoubleWord {    int16_t LoWord;    int16_t HiWord;};struct UDoubleWord {    uint16_t ULoWord;    uint16_t UHiWord;};union DoubleWordUnion {    long Longword;    unsigned long ULongword;    struct DoubleWord Words;    struct UDoubleWord UWords;};static inline __attribute__ ((always_inline))unsigned long UlongDIV(unsigned long numerator, unsigned long denominator) {union DoubleWordUnion quotient;asm __volatile__ (   " ff1l %d3,w2 \n bra NC, 0f \n"   " repeat #%4 \n div.uw %d2,%3 \n mov w0,w2 \n mov %2,w0 \n"   " repeat #%4 \n div.ud w0,%3 \n mov w2,w1 \n bra 9f \n" "0: subr w2,#17,w3 \n subr w3,#16,w0 \n"   " sl %d3,w0,w0 \n lsr %3,w3,w2 \n ior w0,w2,w2 \n"   " lsr %d2,w1 \n rrc %2,w0 \n"   " repeat #%4 \n div.ud w0, w2 \n"   " dec w3,w3 \n lsr w0,w3,w1 \n"#if HasMULW   " mulw.uu %d3,w1,w0 \n"#else   " mul.uu %d3,w1,w2 \n mov w2,w0 "#endif   " mul.uu %3,w1,w2 \n"   " sub %2,w2,w2 \n subb %d2,w3,w3 \n sub w3,w0,w3 \n"   " subb w1,#0,w0 \n clr w1 \n" "9: "/*                            w0 %0                        w1 %1     */:/*outputs*/ "=&a" (quotient.UWords.ULoWord), "=&b" (quotient.UWords.UHiWord)/*                 %d2:%2            %d3:%3              %4      */:/*inputs*/ "e" (numerator), "e" (denominator), "i" (MulRepeats) :/*clobbers*/ "cc","w2","w3");return (quotient.ULongword);}`

post edited by Spinlectrix - 2020/09/20 20:42:10
1and0
Access is Denied
• Total Posts : 11523
• Reward points : 0
• Joined: 2007/05/06 12:03:20
• Location: Harry's Gray Matter
• Status: offline
Re: Final Drop for: Unsigned 32 by 32 divide routine in assembler 2020/09/23 20:12:51 (permalink)
5 (1)
Finally got some free time to look over your latest optimization. The following shaves two more cycles when the remainder is returned in W7:W6 and the divisor in W3:W2. The dividend can be returned in W5:W4 at the expense of two more cycles in the Divisor <= 0xFFFF leg.
`; Return Values;   w1:w0 = Quotient  = Dividend / Divisor;   w3:w2 = Divisor;   w7:w6 = Remainder = Dividend % Divisor  , if remainder is requested;; Clobbered Registers;   w5:w4   (Dividend can be returned at expense of +2 Tcy)`

Now, for E-core,

Tcy(max) = 49,  no remainder
= 51,  with remainder

`;; Unsigned Long Integer Division of 32-bit by 32-bit;;   uint32_t;   udivmod3232 (uint32_t dividend, uint32_t divisor, uint32_t *remainder);;   uint32_t;   udiv3232 (uint32_t dividend, uint32_t divisor);;; Dividend = Quotient * Divisor + Remainder;; Calling Parameters;   w1:w0 = Dividend;   w3:w2 = Divisor;; Return Values;   w1:w0 = Quotient  = Dividend / Divisor;   w3:w2 = Divisor;   w7:w6 = Remainder = Dividend % Divisor  , if remainder is requested;; Clobbered Registers;   w5:w4   (Dividend can be returned at expense of +2 Tcy);; Tcy(max) = 7 + 2*DIV + RET + [1]             , Divisor <= 0xFFFF; Tcy(max) = 20 + BRA + DIV + RET + [4] - <1>  , Divisor >  0xFFFF;; Notes:; * Add the cycle in [] when computing the remainder.; * Minus the cycle in <> when using the MULW instruction.;; For PIC24F, PIC24H, dsPIC30F and dsPIC33F:;   Tcy(max) = ;; For PIC24E and dsPIC33E:;   Tcy(max) = ;; For dsPIC33C:;   Tcy(max) = ;; Ref: https://www.microchip.com/forums/FindPost/1151719;  ; Conditional assembly:; * Define NO_REMAINDER to return only the quotient and no remainder.; * Adding the remainder takes 3 more program memory words and 3 more;   instruction cycles maximum which are negligible, so it is the;   default by having NO_REMAINDER not defined.; ; NO_REMAINDER = 1  .ifdef  __PIC24EPIC_24E_33C_33E = 1.endif.ifdef  __dsPIC33EPIC_24E_33C_33E = 1.endif.ifdef  __dsPIC33CPIC_24E_33C_33E = 1.endif  ; Loop counter value for the REPEAT instruction of a DIV instruction.  .ifdef __dsPIC33CDIV_RCOUNT = (6-1)          ; execute DIV 6 consecutive times.elseDIV_RCOUNT = (18-1)         ; execute DIV 18 consecutive times.endif  ; Division Routine Names  .ifndef NO_REMAINDERudivmod3232:.elseudiv3232:.endif          mov     w0,w4       ; save dividend LSW        ff1l    w3,w7       ; w7 = first one left of divisor MSW        bra     nc,0f       ; if (divisor <= 0xFFFF) branch;-------; Case of Divisor <= 0xFFFF;; Here the high word of the divisor is zero, so the quotient may overflow a; 16-bit word.;;   w1:w0 = w1:w0 / w2;         = [(w1 / w2) << 16] + [(w1 % w2):w0 / w2];;   w7:w6 = w1:w0 % w2        ; Divide high word of dividend by the divisor.                repeat  #DIV_RCOUNT ; calculate quotient MSW (16/16 divide)        div.uw  w1,w2       ; w0 = w1 / w2, w1 = w1 % w2                            ; remainder becomes dividend MSW of next divide        exch    w0,w4       ; save quotient MSW, restore dividend LSW        ; Divide remainder and low word of dividend by the divisor.                repeat  #DIV_RCOUNT ; calculate quotient LSW (32/16 divide)        div.ud  w0,w2       ; w0 = (w1:w0) / w2, w1 = (w1:w0) % w2            .ifndef NO_REMAINDER        mul.uu  w1,#1,w6    ; w7:w6 = remainder    .endif        mov     w4,w1       ; restore quotient MSW        return              ; return with divisor MSW = 0;-------; Case of Divisor > 0xFFFF;; Here the high word of the divisor is non-zero, so the quotient will fit in; a 16-bit word; i.e. the high word of the quotient is guaranteed to be zero.;;   w1:w0 = w1:w0 / w3:w2;         = w5:w4 / w3:w2;         = [(w5:w4) >> n] / [(w3:w2) >> n];;   w7:w6 = w1:w0 % w3:w2        ; Save dividend for correction of the quotient later.        0:      mov     w1,w5       ; save dividend MSW        ; Calculate shift counts for normalization.                subr    w7,#17,w7   ; right shift >> (w7 = 17 - w7), w7 = n = [1..16]        subr    w7,#16,w0   ; left  shift << (w0 = 16 - w7)        ; Normalize the divisor so its left-most 1 is at the MSb of its LSW.                sl      w3,w0,w0    ; divisor' = divisor >> n        lsr     w2,w7,w6    ; "        ior     w0,w6,w6    ; w6 = (w3:w2) >> n        ; Consequently, scale the dividend to ensure the quotient fits in its LSW;; also, keeping more dividend bits minimizes the quotient rounding error.                lsr     w5,w1       ; dividend' = dividend >> 1        rrc     w4,w0       ;     w1:w0 = w5:w4 >> 1        ; Divide scaled dividend by the normalized divisor to get an estimated quotient; where the quotient error Err(q) = estimated quotient - correct quotient.                repeat  #DIV_RCOUNT ; calculate the 16-bit quotient (32/16 divide)        div.ud  w0,w6       ; w0 = (w1:w0) / w6, no overflow here        ; Undo the scaling of dividend by 2 and apply the same ratio as the divisor.                dec     w7,w7       ; new shift to account for quotient <<= 1        lsr     w0,w7,w1    ; w1 = quotient >>= (n - 1), Err(q) = [0,+1]        ; Calculate (quotient * divisor) separately so it won't overflow.        .ifdef  PIC_24E_33C_33E        mulw.uu w1,w3,w0    ;    w0    = w1 * w3    .else        mul.uu  w1,w3,w6    ; w7:w6    = w1 * w3        mov     w6,w0       ;    w0    = w1 * w3 = quotient * divisor MSW    .endif        mul.uu  w1,w2,w6    ;    w7:w6 = w1 * w2 = quotient * divisor LSW        ; Calculate remainder = (dividend - quotient * divisor).                sub     w4,w6,w6    ; w7:w6 = dividend - quotient * divisor        subb    w5,w7,w7    ;       = dividend - quotient * divisor LSW        sub     w7,w0,w7    ;                  - quotient * divisor MSW            subb    w1,#0,w0    ; if (negative) quotient is 1 too big, so quotient--    ; Correct remainder if quotient was corrected.        .ifndef NO_REMAINDER        cpsne   w1,w0       ; if (quotient did not need correction)         retlw   #0,w1       ;   return with quotient MSW = 0, w7:w6 = remainder                add     w6,w2,w6    ; else one too many divisor was subtracted        addc    w7,w3,w7    ;   so add one divisor back into remainder    .endif        retlw   #0,w1       ; return with quotient MSW = 0`

1and0
Access is Denied
• Total Posts : 11523
• Reward points : 0
• Joined: 2007/05/06 12:03:20
• Location: Harry's Gray Matter
• Status: offline
Re: Final Drop for: Unsigned 32 by 32 divide routine in assembler 2020/09/24 02:25:56 (permalink)
5 (2)
Here's one that does this:
`; Calling Parameters;   w1:w0 = Dividend;   w3:w2 = Divisor;; Return Values;   w1:w0 = Quotient  = Dividend / Divisor;   w3:w2 = Divisor;   w5:w4 = Remainder = Dividend % Divisor  , when remainder is requested;   w7:w6 = Dividend`

`;; Unsigned Long Integer Division of 32-bit by 32-bit;;   uint32_t;   udivmod3232 (uint32_t dividend, uint32_t divisor, uint32_t *remainder);;   uint32_t;   udiv3232 (uint32_t dividend, uint32_t divisor);;; Dividend = Quotient * Divisor + Remainder;; Calling Parameters;   w1:w0 = Dividend;   w3:w2 = Divisor;; Return Values;   w1:w0 = Quotient  = Dividend / Divisor;   w3:w2 = Divisor;   w5:w4 = Remainder = Dividend % Divisor  , when remainder is requested;   w7:w6 = Dividend;; Tcy(max) =  9       +     2*DIV + RET + [1]  , Divisor <= 0xFFFF; Tcy(max) = 20 - <1> + BRA + DIV + RET + [4]  , Divisor >  0xFFFF;; Notes:; * Tcy counts do not include the CALL/RCALL instruction.; * Minus the cycle in <> when using the MULW instruction.; * Add the cycle in [] when computing the remainder.;; For PIC24F, PIC24H, dsPIC30F and dsPIC33F:;   Tcy1 =  9 +   2*18 + 3 + [1] = 48 [49];   Tcy2 = 20 + 2 + 18 + 3 + [4] = 43 [47];   Tcy(max) = 48 [49];; For PIC24E and dsPIC33E:;   Tcy1 =  9 +   2*18 + 6 + [1] = 51 [52];   Tcy2 = 19 + 4 + 18 + 6 + [4] = 47 [51];   Tcy(max) = 51 [52];; For dsPIC33C:;   Tcy1 =  9 +    2*6 + 6 + [1] = 27 [28];   Tcy2 = 19 + 4 +  6 + 6 + [4] = 35 [39];   Tcy(max) = 35 [39]  , can swap the two cases to reduce Tcy.;; Ref: https://www.microchip.com/forums/FindPost/1151719;  ; Conditional assembly:; * Define NO_REMAINDER to return only the quotient and no remainder.; * Adding the remainder takes 3 more program memory words and 3 more;   instruction cycles maximum which are negligible, so it is the;   default by having NO_REMAINDER not defined.; ; NO_REMAINDER = 1  .ifdef  __PIC24EPIC_24E_33C_33E = 1.endif.ifdef  __dsPIC33EPIC_24E_33C_33E = 1.endif.ifdef  __dsPIC33CPIC_24E_33C_33E = 1.endif; Loop counter value for the REPEAT instruction of a DIV instruction.   .ifdef __dsPIC33CDIV_RCOUNT = (6-1)          ; execute DIV 6 consecutive times.elseDIV_RCOUNT = (18-1)         ; execute DIV 18 consecutive times.endif  ; Division Routine Names  .ifndef NO_REMAINDERudivmod3232:.elseudiv3232:.endif          mov.d   w0,w6       ; save dividend        ff1l    w3,w5       ; w5 = first one left of divisor MSW        bra     nc,0f       ; if (divisor <= 0xFFFF) branch;-------; Case of Divisor <= 0xFFFF;; Here the high word of the divisor is zero, so the quotient may overflow a; 16-bit word.;;   w1:w0 = w1:w0 / w2;         = [(w1 / w2) << 16] + [(w1 % w2):w0 / w2];;   w5:w4 = w1:w0 % w2        ; Divide high word of dividend by the divisor.                repeat  #DIV_RCOUNT ; calculate quotient MSW (16/16 divide)        div.uw  w1,w2       ; w0 = w1 / w2, w1 = w1 % w2                            ; remainder becomes dividend MSW of next divide        mov     w0,w3       ; save quotient MSW        mov     w6,w0       ; restore dividend LSW        ; Divide remainder and low word of dividend by the divisor.                repeat  #DIV_RCOUNT ; calculate quotient LSW (32/16 divide)        div.ud  w0,w2       ; w0 = (w1:w0) / w2, w1 = (w1:w0) % w2            .ifndef NO_REMAINDER        mul.uu  w1,#1,w4    ; w5:w4 = remainder    .endif        mov     w3,w1       ; restore quotient MSW        retlw   #0,w3       ; return with divisor MSW = 0;-------; Case of Divisor > 0xFFFF;; Here the high word of the divisor is non-zero, so the quotient will fit in; a 16-bit word; i.e. the high word of the quotient is guaranteed to be zero.;;   w1:w0 = w1:w0 / w3:w2;         = w7:w6 / w3:w2;         = {[(w7:w6) >> n] / [(w3:w2) >> n]};         = {[(w7:w6) >> 1] / [(w3:w2) >> n]} >> (n - 1);;   w5:w4 = w1:w0 % w3:w2        ; Calculate shift counts for normalization.        0:      subr    w5,#17,w5   ; right shift >> (w5 = 17 - w5), w5 = n = [1..16]        subr    w5,#16,w0   ; left  shift << (w0 = 16 - w5)        ; Normalize the divisor so its left-most 1 is at the MSb of its LSW.                sl      w3,w0,w0    ; divisor' = divisor >> n        lsr     w2,w5,w4    ; "        ior     w0,w4,w4    ; w4 = (w3:w2) >> n        ; Consequently, scale the dividend to ensure the quotient fits in its LSW,; while retaining maximum precision to minimize quotient rounding error.                lsr     w7,w1       ; dividend' = dividend >> 1        rrc     w6,w0       ;     w1:w0 = w7:w6 >> 1        ; Divide scaled dividend by the normalized divisor to get an estimated quotient; where the quotient error Err(q) = estimated quotient - correct quotient.                repeat  #DIV_RCOUNT ; calculate the 16-bit quotient (32/16 divide)        div.ud  w0,w4       ; w0 = (w1:w0) / w4, no overflow here        ; Undo the scaling of dividend by 2 and apply the same ratio as the divisor.                dec     w5,w5       ; new shift to account for quotient <<= 1        lsr     w0,w5,w1    ; w1 = quotient >>= (n - 1), Err(q) = [0,+1]        ; Calculate (quotient * divisor) separately so it won't overflow.            .ifdef  PIC_24E_33C_33E        mulw.uu w1,w3,w0    ;    w0    = w1 * w3    .else        mul.uu  w1,w3,w4    ; w5:w4    = w1 * w3        mov     w4,w0       ;    w0    = w1 * w3 = quotient * divisor MSW    .endif        mul.uu  w1,w2,w4    ;    w5:w4 = w1 * w2 = quotient * divisor LSW        ; Calculate remainder = (dividend - quotient * divisor).                sub     w6,w4,w4    ; w5:w4 = dividend - quotient * divisor = remainder        subb    w7,w5,w5    ;       = dividend - quotient * divisor LSW        sub     w5,w0,w5    ;                  - quotient * divisor MSW                subb    w1,#0,w0    ; if (negative) quotient is 1 too big, so quotient--        ; Correct remainder if quotient was corrected.            .ifndef NO_REMAINDER        cpsne   w1,w0       ; if (quotient did not need correction)         retlw   #0,w1       ;   return with quotient MSW = 0, w5:w4 = remainder                add     w2,w4,w4    ; else one too many divisor was subtracted        addc    w3,w5,w5    ;   so add one divisor back into remainder    .endif        retlw   #0,w1       ; return with quotient MSW = 0`

1and0
Access is Denied
• Total Posts : 11523
• Reward points : 0
• Joined: 2007/05/06 12:03:20
• Location: Harry's Gray Matter
• Status: offline
Re: Final Drop for: Unsigned 32 by 32 divide routine in assembler 2020/09/25 16:26:07 (permalink)
0
And here is the corresponding signed version:
`;; Signed Long Integer Division of 32-bit by 32-bit;;   int32_t;   divmod3232 (int32_t dividend, int32_t divisor, int32_t *remainder);;   int32_t;   div3232 (int32_t dividend, int32_t divisor);;; Dividend = Quotient * Divisor + Remainder;; Calling Parameters;   w1:w0 = Dividend;   w3:w2 = Divisor;; Return Values;   w1:w0 = Quotient  = Dividend / Divisor;   w3:w2 = Divisor;   w5:w4 = Remainder = Dividend % Divisor  , when remainder is requested;   w7:w6 = Dividend;  .ifndef NO_REMAINDERdivmod3232:.elsediv3232:.endif          push    w1          ; save signed dividend MSW        push    w3          ; save signed divisor MSW            .ifdef  PIC_24E_33C_33E        btsc    w1,#15      ; if (dividend < 0) negate dividend        neg     w0,w0       ; "        btsc    w1,#15      ; "        subbr   w1,#0,w1    ; "                btsc    w3,#15      ; if (divisor < 0) negate divisor        neg     w2,w2       ; "        btsc    w3,#15      ; "        subbr   w3,#0,w3    ; "    .else        btss    w1,#15      ; if (dividend < 0) negate dividend        bra     0f          ; "        neg     w0,w0       ; "        subbr   w1,#0,w1    ; "0:              btss    w3,#15      ; if (divisor < 0) negate divisor        bra     1f          ; "        neg     w2,w2       ; "        subbr   w3,#0,w3    ; "1:    .endif            .ifndef NO_REMAINDER        rcall   udivmod3232 ; call unsigned 32/32    .else        rcall   udiv3232    ; call unsigned 32/32    .endif                pop     w3          ; restore signed divisor MSW        btsc    w3,#15      ; "        neg     w2,w2       ; and its LSW          .ifndef NO_REMAINDER        pop     w7          ; restore signed dividend MSW        btss    w7,#15      ; if (dividend is negative)        bra     2f          ;   "        neg     w6,w6       ;   restore signed dividend LSW        neg     w4,w4       ;   negate remainder        subbr   w5,#0,w5    ;   "2:                xor     w7,w3,w7    ; calculate sign of quotient            .ifdef  PIC_24E_33C_33E        btsc    w7,#15      ; if (sign is negative) negate quotient        neg     w0,w0       ; "        btsc    w7,#15      ; "        subbr   w1,#0,w1    ; "    .else        btss    w7,#15      ; if (sign is negative) negate quotient        bra     3f        neg     w0,w0       ; "        subbr   w1,#0,w1    ; "3:    .endif                xor     w7,w3,w7    ; restore signed dividend MSW        return              ; return  .else        pop     w7          ; restore signed dividend MSW        btsc    w7,#15      ; "        neg     w6,w6       ; and its LSW                xor     w7,w3,w5    ; calculate sign of quotient                btss    w5,#15      ; if (sign is negative) negate quotient        return              ; "        neg     w0,w0       ; "        subbr   w1,#0,w1    ; "        return              ; return  .endif`

post edited by 1and0 - 2020/09/26 09:19:10
1and0
Access is Denied
• Total Posts : 11523
• Reward points : 0
• Joined: 2007/05/06 12:03:20
• Location: Harry's Gray Matter
• Status: offline
Re: Unsigned 32 by 32 divide routine in assembler 2020/10/22 07:03:13 (permalink)
0
Spinlectrix
.. it may not be too soon to start thinking about how/where to best locate a repository of similar useful routines:  For example, I have an ATAN2(y,x) routine which calculates both arc tangents and vector magnitudes executing in less than 65 cycles.  Is there already such a location?