• AVR Freaks

Hot!Signed 32 by 32 divide routine in assembler

Page: 12 > Showing page 1 of 2
Author
Spinlectrix
Junior Member
  • Total Posts : 72
  • Reward points : 0
  • Joined: 2016/03/20 14:48:37
  • Location: 0
  • Status: offline
2020/09/07 11:42:13 (permalink)
0

Signed 32 by 32 divide routine in assembler

Endevoring to expand upon the collaboration of thread 1151719 ("Unsigned 32 by 32 divide routine in assembler"), I have attempted to adapt the algorithm there to work with signed longs.
 
The code still needs testing, and it is missing a segment which optionally calculates accurate Remainders, but (as one would expect) it produces long divides an order of magnitude faster than standard long divide libraries on PIC24/dsPIC33 class processors:
 
 .section .text, code ; Normal Code Section.
 .align 2
 .global _sdiv3232 ; Make Visible Externally.
 .type _sdiv3232, @function ; Debugger Information in ELS.

; Signed Integer Division of 32-bit by 32-bit
;
; Credits (All Microchip Forum Usernames):
; Spinlectrix - Original Coding. Trim Register Usage. Speedups. Adaptation to Signed op.
; 1and0 - Fix Off-by-One Bug, Several Speedups.
; GlennP17321 - Off-by-One Bug Found, One Speedup, Conditionalize Remainder Calculations.
;
; Calling Parameters
; W1:W0 = Dividend.
; W3:W2 = Divisor.
;
; Return Values
; W1:W0 = Quotient.
; W3:W2 = Remainder (optional - adds instructions and cycles).
;
; Used Registers
; W4-W7.
;
; Ref: https://www.microchip.com/forums/FindPost/1151719
;
; Instruction Counts Depend on the Build Parameter: "Remainder".
; +Remainder: ?? Instructions.
; -Remainder: 33 Instructions.
;

; Select Code Generation Option: Compute / Ignore Remainder.
; Comment next line entirely out to avoid extra cycles of Remainder calculation
;Remainder = 1

; Specify Number of Repeats to Perform a Divide.
; This Should be in a Microchip Supplied Include File, But Was NOT Found.
.ifdef __dsPIC33C
DivideRepeatCount = 6        ;  6 for 33C Models.
.else
DivideRepeatCount = 18        ; 18 for All Other Models.
.endif

; When Divisor.HO == 0:
; Perform Long Division using 16/16 and 32/16 Hardware Instructions.
;
; +----------------+----------------+
; | DividendHO DividendLO |
; +----------------+----------------+
;
; 1. QuotientHO = DividendHO / DivisorLO. Use 16/16 Divide.
; 2. Replace DividendHO with Remainder from 1:
;
; +----------------+----------------+
; | Remainder DividendLO |
; +----------------+----------------+
;
; 3. QuotientLO = (Remainder | DividendLO) / DivisorLO. Use 32/16 Divide.
; 4. RemainderHO = 0, RemainderLO = Remainder from 3.
; 5. Quotient = (QuotientHO | QuotiendLO).


; When Divisor.HO != 0:
;
; 1. Form Approximate (Normalized) Divisor per Diagram Below.
;
; Sample Divisor (FindFirstOne = 6, RightShift = 11, LeftShift = 5):
; +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
; |0 0 0 0 0 1 a b c d e f g h i j k l m n o p q r s t u v w x y z|
; +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
; | H O | L O |
; 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
; 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
; | |
; +---------------+---------------+
; |
; | Approximate 16-Bit Divisor:
; | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
; +--> |1 a b c d e f g h i j k l m n o|
; +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
; From HO | From LO
; 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
; 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
;
; 2. Shift the U32 Dividend >> 'Right Shift' to Match Approximate Divisor.
; 3. Divide using 32/16 Instruction. Quotient MAY be One Greater due to Divisor Truncation.
; 4. Calculate Remainder = Dividend - (Divisor * Quotient). If Negative, Quotient Too Large.
; 5. If Remainder Negative, THEN Quotient--; AND Remainder += Divisor.

_sdiv3232:
 mov.w W0, W4 ; Copy Dividend.LO in W4.

; Find first bit-change in Divisor.HO (if any).
 fbcl W3, W5 ; W5 = First bit-change from left.
 bra NC, 0f ; !C(arry) Implies at least one bit-change; Divisor.HO != 0 or -1.


; |Divisor| <= 0x7fff. [Divisor.HO == 0] Perform Long Division because Quotient MAY Not Fit in Word,
; Divide Dividend.HO / Divisor.LO.; (but the Divisor _does_ fit in a word!)
 repeat #(DivideRepeatCount-1) ; Division Repeat Count Depends on 33C (6) or 'Other' (18).
 div.sw W1, W2 ; Divide Step. At End: W0 = W1 / W2 and W1 = W1 % W2.

; Save Quotient.HO; Restore Dividend.LO.
 exch W0, W4 ; Save Quotient.HO; Restore Dividend.LO.

; Divide (Remainder:Dividend.LO) / Divisor.
 repeat #(DivideRepeatCount-1) ; Division Repeat Count Depends on 33C (6) or 'Other' (18).
 div.sd W0, W2 ; Divide Step. At End: W0 = (W1:W0) / W2 and W1 = (W1:W0) % W2.

; Set W3:W2 to Remainder if Requested.
 .ifdef Remainder
 mul.su W1, #1, W2 ; W3:W2 is Remainder.
 .endif

; Recover Quotient.HO.
 mov W4, W1 ; W1 = Quotient.HO (from W4 - Temp Storage).

; Done. Return to Caller.
 return ; Return to Caller.

0:
; Divisor > 0x0FFFF. [Divisor.HO > 0] Perform Simple Division and Possible Correction.
; Here Divisor.HO > 0, so the Result Will Fit into a Word (Quotient.HO must be zero).
    
; Copy divisor for possible correction of the quotient later.
; Preserve Dividend.HO in W5.
 exch W1, W5 ; W1 = FirstOne Position; W5 = Dividend.HO.

; Calculate Shift Counts.
 add W1, #16, W1 ; W1 = Right Shift Count.
 subr W1, #16, W0 ; W0 = Left Shift Count.

; 'Normalized' 16-Bit Divisor Created by Shifting the Actual Divisor RIGHT by "RightShiftCount".
 sl W3, W0, W7 ; Divisor.HO <<= LeftShiftCount.
 lsr W2, W1, W6 ; Divisor.LO >>= RightShiftCount.
 ior W6, W7, W6 ; Divisor.LO = (Divisor >> RightShiftCount). Note: MSb = 1.

; Apply Corresponding Shifts to Dividend to preserve the ratio of Dividend to the Divisor.
 sl W5, W0, W7 ; W7 = Dividend.HO << LeftShiftCount.
 lsr W4, W1, W0 ; Dividend.LO >>= RightShiftCount.
 asr W5, W1, W1 ; Dividend.HO >>= RightShiftCount.
 ior W7, W0, W0 ; Dividend (W1:W0) = (Dividend >> RightShiftCount).

; Perform Division of 32/16 Unsigned using "div.sd" Instruction.
 repeat #(DivideRepeatCount-1) ; Division Repeat Count Depends on 33C (6) or 'Other' (18).
 div.sd W0, W6 ; Divide Step. At End: W0 = (W1:W0) / W6 and W1 = (W1:W0) % W6.

.ifndef Remainder ; No Remainder required -- Just produce a bit-accurate quotient:
 bra NZ,1f ; No need to correct quotient unless divide result sets Z flag (Remainder == 0).
 
; Form Quotient * Divisor.
 mul.ss W3, W0, W6 ; W7:W6 = Divisor.HO * Quotient.
 mul.us W2, W0, W2 ; W3:W2 = Divisor.LO * Quotient.
 add W3, W6, W3 ; W3:W2 = Divisor * Quotient
 
 asr W5,#15, W1 ; Set W1 = -1 if Dividend is negative, 0 otherwise.
 
; Remainder = Dividend - (Divisor * Quotient). Correct Quotient.LO (and Remainder?) if Negative.
 sub W4, W2, W2 ; Subtract LO Words.
 subb W5, W3, W3 ; Subtract HO Words. Borrow bit is set iff Remainder is Negative
 
 btss SR, #1 ; If result was 0, no adjustment is required (skip iff Z-flag)
 subb W0, W1, W0 ; Adjust Quotient as necessary (subtract 1, or add one)
  
.else ; Always calculate accurate Remainder
 ; NOT YET WRITTEN!!
.endif

; Done. Return to Caller.
1: asr W0,#15, W1 ; Extend sign of answer into quotient.HO
 return
 
.end

post edited by Spinlectrix - 2020/09/07 12:51:01
#1

24 Replies Related Threads

    Spinlectrix
    Junior Member
    • Total Posts : 72
    • Reward points : 0
    • Joined: 2016/03/20 14:48:37
    • Location: 0
    • Status: offline
    Re: Signed 32 by 32 divide routine in assembler 2020/09/07 13:05:13 (permalink)
    0
    !! This code has serious problems when the first bit change occurs across the word break.  I am investigating fixes now... :-(  !!
     
    <Updated Edit:>
    I'm still working on this; unfortunately the case where the bit-change occurs across the word boundaries would appear to necessitate an entire alternate method of calculating the signed divide, involving pre-calculating the sign of the result, taking absolute values, performing an unsigned divide, and correcting the quotient's sign.  I also would have to work in 1and0's (Harry's) updates to the routine overall, so it may be a while!
    post edited by Spinlectrix - 2020/09/08 23:33:04
    #2
    Spinlectrix
    Junior Member
    • Total Posts : 72
    • Reward points : 0
    • Joined: 2016/03/20 14:48:37
    • Location: 0
    • Status: offline
    Re: Signed 32 by 32 divide routine in assembler 2020/09/13 14:32:33 (permalink)
    0
    Ok, look out; there's been some code bloat:  In trying to keep signed division as fast as possible, there are now four methods, instead of just the two that were used in the unsigned version: One for positive results where hi word of quotient is 0, and one for negative, and then one for when the divisor fits entirely into a word, but then one more for when the MSB of the one-word divisor is where the first bit-change occurs, so the div.sd instruction cannot be effectively utilized.  (This last condition will be the longest path through the code -- we take abs() of the operands, perform the div op, and finally restore the appropriate sign.)  While the worst-case path is thus significantly worse, the typical case should be very much like the unsigned 32 divide:
     
     <Edit:>  Crap.  I found some cases which yield an erroneous result — still needs more work before its ready for prime-time!
    I'm going to have to extend the tester a bit to make it more robust as well...
     
                            ----- DO NOT USE THIS ROUTINE!!  MANY ERRORS!! --- LOOK FURTHER DOWN THIS THREAD!
    <>
     

     
        .section    .text, code                ; Normal Code Section.
        .align    2

    ;
    ; Signed Long Integer Division of 32-bit by 32-bit
    ;
    ;   sint32_t
    ;   sdivmod3232 (sint32_t dividend, sint32_t divisor, sint32_t *remainder)
    ;
    ;   sint32_t
    ;   sdiv3232 (suint32_t dividend, sint32_t divisor)
    ;
    ;
    ; Dividend = Quotient * Divisor + Remainder
    ;
    ; Calling Parameters
    ;   w1:w0 = Dividend
    ;   w3:w2 = Divisor
    ;
    ; Return Values
    ;   w1:w0 = Quotient
    ;   w3:w2 = Remainder
    ;
    ; Used Registers
    ;   w7,w6, w5,w4
    ;
    ; Tcy(max) = 12 + DIV + 16 + RETLW (quotient & remainder) ??
    ;          = 12 + DIV + 13 + RETLW (quotient only)    ??
    ;
    ; Tcy = 48 to 49    , w/ remainder for PIC24F device family
    ;
    ; Ref: https://www.microchip.com/forums/FindPost/1151719
    ;
    ; Credits (All Microchip Forum Usernames):
    ; * Spinlectrix, 1and0, and GlennP17321.
    ;
     
    ; Conditional assembly:
    ; * #define NO_REMAINDER to return only the quotient and no remainder.
    ; * Adding the remainder takes 3 more program memory words and 3 more
    ;   instruction cycles max which are negligible, so it's the default.
    ;
      .ifndef Remainder
        Remainder = 0            ; Compute Remainder (if Value != 0).
      .endif
     
    ; Loop counter value for the REPEAT instruction of a DIV instruction.
     
    .ifdef __dsPIC33C
       DIV_RCOUNT = (6-1)    ; execute DIV 6 consecutive times
    .else
       DIV_RCOUNT = (18-1)   ; execute DIV 18 consecutive times
    .endif
     
    ; Division Routine Names
     
    .if Remainder
        .global    sdivmod3232                ; Make Visible Externally.
        .type    sdivmod3232, @function        ; Debugger Information in ELS.
    _sdivmod3232:
    .else
        .global    _sdiv3232            ; Make Visible Externally.
        .type    _sdiv3232, @function        ; Debugger Information in ELS.
    _sdiv3232:
    .endif
     
            mov     w0,w4       ; save dividend LSW
            fbcl    w3,w5       ; w5 = first one left of divisor MSW
            bra     c,0f        ; if (divisor <= 0xFFFF) branch
    ;-------
    ; Case of Divisor > 0xFFFF
    ;
    ; Here the high word of the divisor is non-zero, so the quotient will fit in
    ; a 16-bit word; i.e. the high word of the quotient is guaranteed to be zero.
    ;
    ;   w1:w0 = w1:w0 / w3:w2
    ;         = w5:w4 / w3:w2
    ;         = [(w5:w4) >> n] / [(w3:w2) >> n]
            
    ; Copy divisor for correction of the quotient later
            exch    w1,w5       ; w1 = first one of divisor, w5 = dividend MSW
            
    ; Calculate shift counts for normalization
        add    w1,#16,w7 ; W1 = Right Shift Count.
        subr    w7,#16,w0 ; W0 = Left  Shift Count.
            
    ; Normalize the divisor so its left-most 1 is at the MSb of its LSW
            sl      w3,w0,w0    ; divisor = divisor >> n
            lsr     w2,w7,w6    ; "
            ior     w0,w6,w6    ; w6 = (w3:w2) >> n
            
    ; Consequently, scale the dividend to ensure the quotient fits in its LSW
            asr     w5,w1       ; dividend = dividend >> 1
            rrc     w4,w0       ;     w1:w0 = w5:w4 >> 1
            
    ; Divide scaled dividend by the normalized divisor
            repeat  #DIV_RCOUNT ; calculate the 16-bit quotient (32/16 divide)
            div.sd  w0,w6       ; w0 = (w1:w0) / w6, no overflow here
            
    ; Undo the scaling of dividend by 2 and apply the same ratio as the divisor
            dec     w7,w7       ; new shift to account for quotient <<= 1
            asr     w0,w7,w1    ; quotient >>= (n - 1)
            
        bra    N,1f            ; Take care of negative quotients
    ; Correct quotient for rounding error due to LSbs discarded by the shifts
    ; Adjust magnitude of quotient smaller by one, unless it is already zero
        btss    SR,#1        ; Skip if Z-flag is set (skip if == 0)
        dec    w1,w1        ; Decrease magnitude of positive quotients by one
                                ; So (quotient * divisor) will not overflow below
                    
            mul.ss  w1,w3,w6    ; quotient * divisor
            mov     w6,w0       ;   w0    = w1 * w3
            mul.su  w1,w2,w6    ; + w7:w6 = w1 * w2
            add     w7,w0,w7    ; = w7:w6 = quotient * divisor, no overflow here
            
    ; Calculate the remainder to correct the final quotient
            sub     w4,w6,w6    ; w7:w6 = dividend - quotient * divisor
            subb    w5,w7,w7    ;       = remainder #1, if it's < divisor
            
        sub     w6,w2,w2    ; if ((dividend - quotient * divisor) >= divisor)
            subb    w7,w3,w3    ;   w3:w2 = remainder #2,        Carry = 1
                            
    .if Remainder
            btss    SR,#0       ; get the correct remainder
            mov.d   w6,w2       ;
    .endif

        addc    w1,#0,w0    ; correct the quotient += Carry    
        retlw    #0,w1  ; 0 into quotient.HO


    1:    ; This is much like a mirror of what happens if the quotient is positive
        inc    w1,w1        ; Start by decreasing the magnitude of the quotient
            mul.ss  w1,w3,w6    ; quotient * divisor
            mov     w6,w0       ;   w0    = w1 * w3
            mul.su  w1,w2,w6    ; + w7:w6 = w1 * w2
            add     w7,w0,w7    ; = w7:w6 = quotient * divisor, no overflow here
            
    ; Calculate the remainder to correct the final quotient
            sub     w4,w6,w6    ; w7:w6 = dividend - quotient * divisor
            subb    w5,w7,w7    ;       = Remainder #1, if it's < divisor
            
        add     w2,w6,w2    ; if ((dividend - quotient * divisor) >= divisor)
            addc    w3,w7,w3    ;   w3:w2 = Remainder #2,        Carry = 1
        
        btsc    SR,#1        ; Iff result was zero,
        bclr    SR,#0        ; Clear carry bit
    .if Remainder    ;
            btsc    SR,#0       ; get the correct remainder
            mov.d   w6,w2       ; switch to Remainder #2
    .endif
        subb    w1,#0,w0    ; correct the quotient -= 1 if Z or /C    
        asr    w0,#15,w1   ; Extend sign of answer into quotient.HO
        return
        
    ;-------
    ; Case of Divisor <= 0xFFFF
    ;
    ; Here the high word of the divisor is zero, so the quotient may overflow a
    ; 16-bit word.
    ;
    ;   w1:w0 = w1:w0 / w2
    ;         = [(w1 / w2) << 16] + [(w1 % w2):w0 / w2]
            
    ; Divide high word of dividend by the one-word divisor
    0:      xor    w3,w2,w5    ; Is the first bit-change at the word-break?
        bra    N,2f        ; If so, use the alternate method, starting at 2:
        
            repeat  #DIV_RCOUNT ; calculate quotient MSW (16/16 divide)
            div.sw  w1,w2       ; w0 = w1 / w2, w1 = w1 % w2
                                ; remainder becomes dividend MSW of next divide
            exch    w0,w4       ; save quotient MSW, restore dividend LSW
            
    ; Divide remainder and low word of dividend by the divisor
            repeat  #DIV_RCOUNT ; calculate quotient LSW (32/16 divide)
            div.sd  w0,w2       ; w0 = (w1:w0) / w2, w1 = (w1:w0) % w2
            
    .if Remainder
            mul.su  w1,#1,w2    ; w3:w2 = remainder
    .endif
            mov     w4,w1       ; restore quotient MSW
            return              ; return
        
    2:  ; Here the first bit-change is at the word break.
         ; Perform a division of the absolute values of Dividend and Divisor, and
         ; follow-up by correcting the quotient (and remainder) to its proper sign.
          xor      w3,w1,w5     ; Store sign of answer in top bit of w5
          btsc     w3,#15        ; Skip this negate iff the Divisor is positive
          neg      w2,w2        ; Make divisor lower word an unsigned value
    .if Remainder
          asr      w1,#15,w6    ; Preserve the sign bit of the Dividend in w6=-1 or 0.
          btss     w1,#15       ; Sign of remainder is same as sign of Dividend:
          mov      #1,w6        ; Make w6 = +1, or leave it at -1
    .endif
          btsc     w1,#15        ; Skip next negate iff Dividend is positive
          neg      w4,w4        ; Make Dividend lower word an unsigned value
          btsc     w1,#15        ; Again, skip next negate iff Dividend is positive
          subbr    w1,#0,w1     ;

          repeat  #DIV_RCOUNT ; calculate quotient MSW (16/16 divide)
          div.uw  w1,w2       ; w0 = w1 / w2, w1 = w1 % w2
                                ; remainder becomes dividend MSW of next divide
          exch    w0,w4       ; save quotient MSW, restore dividend LSW
            
    ; Divide remainder and low word of dividend by the divisor
          repeat  #DIV_RCOUNT ; calculate quotient LSW (32/16 divide)
          div.ud  w0,w2       ; w0 = (w1:w0) / w2, w1 = (w1:w0) % w2
           
          
    .if Remainder
            mul.us  w1,w6,w2    ; w3:w2 = remainder
    .endif
            mov     w4,w1       ; restore quotient MSW
        btsc    w5,#15        ; Restore proper sign for quotient
        neg    w0,w0        
        btsc    w5,#15
        subbr   w1,#0,w1
            return              ; return
        
    .end

     
    post edited by Spinlectrix - 2020/09/17 20:26:01
    #3
    Spinlectrix
    Junior Member
    • Total Posts : 72
    • Reward points : 0
    • Joined: 2016/03/20 14:48:37
    • Location: 0
    • Status: offline
    Re: Signed 32 by 32 divide routine in assembler 2020/09/15 17:42:41 (permalink)
    0
    Arg.  Grumble, grumble, *!%##. 
     
    (Still struggling...)
    #4
    1and0
    Access is Denied
    • Total Posts : 11312
    • Reward points : 0
    • Joined: 2007/05/06 12:03:20
    • Location: Harry's Gray Matter
    • Status: online
    Re: Signed 32 by 32 divide routine in assembler 2020/09/15 18:30:26 (permalink)
    0
    Spinlectrix, I think I have the case of divisor > 0xFFFF figured out. I will do more testing and optimizing before I post the code. ;)  Then work on the other case.  Unfortunately, I don't have much free time lately to work on this.
     
    #5
    Gort2015
    Klaatu Barada Nikto
    • Total Posts : 4003
    • Reward points : 0
    • Joined: 2015/04/30 10:49:57
    • Location: 0
    • Status: online
    Re: Signed 32 by 32 divide routine in assembler 2020/09/16 06:23:21 (permalink)
    0
    This was written on an old FJ33.
    .global _div32, _divs32
    ; -------------------------------------
    .section Divide32, code
    ; -------------------------------------
    ; structure
    Div32_quotient_L  = 0
    Div32_quotient_H  = 2
    Div32_Modular_L   = 4
    Div32_Modular_H   = 6
    ; -------------------------------------
    arg_Divident_L    = w0
    arg_Divident_H    = w1
    arg_Divisor_L     = w2
    arg_Divisor_H     = w3
    arg_pointer_Div32 = w4
    x                 = w5
    t_L               = w6
    t_H               = w7
    borrow            = w8

    _divs32: ; int divs32(long Divident, long Devisor, long quotient[2])  ; and Modular
        ; if both neg, make pos
        and     arg_Divident_H, arg_Divisor_H, x
        btss    x, #15
        bra     not_both_neg
        subr    arg_Divident_L, #0, arg_Divident_L
        subbr   arg_Divident_H, #0, arg_Divident_H
        subr    arg_Divisor_L,  #0, arg_Divisor_L
        subbr   arg_Divisor_H,  #0, arg_Divisor_H
        ; - - - - - - - - - - - - - - - -
    not_both_neg:
        push    w9
        clr     w9
        ; - - - - - - - - - - - - - - - -
        btss    arg_Divident_H, #15
        bra     dent_not_neg
        subr    arg_Divident_L, #0, arg_Divident_L
        subbr   arg_Divident_H, #0, arg_Divident_H
        setm    w9  ; at least one is neg
        ; - - - - - - - - - - - - - - - -
    dent_not_neg:
        btss    arg_Divisor_H, #15
        bra     sor_not_neg
        subr    arg_Divisor_L, #0, arg_Divisor_L
        subbr   arg_Divisor_H, #0, arg_Divisor_H
        setm    w9  ; at least one is neg
        ; - - - - - - - - - - - - - - - -
    sor_not_neg:
        rcall   _div32
        ; test at least one neg
        cp0     w9
        bra     z, 1f
        ; - - - - - - - - - - - - - - - -
        ; make both quote and mod neg
        clr     w1
        sub     w1, [arg_pointer_Div32],   [arg_pointer_Div32]
        subb    w1, [++arg_pointer_Div32], [arg_pointer_Div32]
        sub     w1, [++arg_pointer_Div32], [arg_pointer_Div32]
        subb    w1, [++arg_pointer_Div32], [arg_pointer_Div32]
        ; - - - - - - - - - - - - - - - -
    1:  pop     w9
        return
    ; -------------------------------------
    ; -------------------------------------
    _div32: ; int div32(unsigned long Divident, unsigned long Devisor, unsigned long quotient[2])  ; and Modular
        push    w8
        mov     #1, t_L
        clr     t_H
        ; - - - - - - - - - - - - - - - -
        ; check high word for 1
        ff1l    arg_Divisor_H, x
        bra     c, checkLow    ; not found, check low word
        dec     x, x
        ; - - - - - - - - - - - - - - - -
        ; shift 32bit
        sl      arg_Divisor_H, x, arg_Divisor_H
        subr    x, #16, borrow
        lsr     arg_Divisor_L, borrow, borrow
        sl      arg_Divisor_L, x, arg_Divisor_L
        ior     arg_Divisor_H, borrow, arg_Divisor_H
        ; - - - - - - - - - - - - - - - -
        subr    x, #16, borrow
        lsr     t_L, borrow, borrow
        sl      t_L, x, t_L
        ior     t_H, borrow, t_H
        bra     done_shift  
        ; - - - - - - - - - - - - - - - -
    checkLow:
        ; check low word for 1
        ff1l    arg_Divisor_L, x
        bra     nc, notzero
        mov     #ERR_DIVIDE_BY_ZERO, w0   ; divide by zero
        bra     exit
        ; - - - - - - - - - - - - - - - -
    notzero:
        ; shift 32bit
        dec     x, x
        mov     arg_Divisor_L, arg_Divisor_H
        clr     arg_Divisor_L
        sl      arg_Divisor_H, x, arg_Divisor_H
        ; - - - - - - - - - - - - - - - -
        mov     t_L, t_H
        clr     t_L
        sl      t_H, x, t_H
        ; - - - - - - - - - - - - - - - -
    done_shift:
        clr     [arg_pointer_Div32++]
        clr     [arg_pointer_Div32--]
    loop:
        cp      arg_Divident_L, arg_Divisor_L
        cpb     arg_Divident_H, arg_Divisor_H
        bra     nc, noadd
        ; - - - - - - - - - - - - - - - -
        sub     arg_Divident_L, arg_Divisor_L, arg_Divident_L
        subb    arg_Divident_H, arg_Divisor_H, arg_Divident_H
        add     t_L, [arg_pointer_Div32], [arg_pointer_Div32++]
        addc    t_H, [arg_pointer_Div32], [arg_pointer_Div32--]
        ; - - - - - - - - - - - - - - - -
    noadd:
        lsr     arg_Divisor_H, arg_Divisor_H
        rrc     arg_Divisor_L, arg_Divisor_L
        lsr     t_H, t_H
        rrc     t_L, t_L
        ; - - - - - - - - - - - - - - - -
        ; complete?
        cp0     t_L
        cpb     t_H, #0
        bra     nz, loop
        ; store modular
        mov     arg_Divident_L, [arg_pointer_Div32+4]
        mov     arg_Divident_H, [arg_pointer_Div32+6]
        mov     ERR_OK, w0
        ; - - - - - - - - - - - - - - - -
    exit:
        pop     w8
        return  ; ERROR
    ; -------------------------------------
    .end

     
    ERR_OK             = 0
    ERR_DIVIDE_BY_ZERO = 1

     

    MPLab X playing up, bug in your code? Nevermind, Star Trek:Discovery will be with us soon.
    https://www.youtube.com/watch?v=Iu1qa8N2ID0
    + ST:Continues, "What Ships are Made for", Q's back.
    #6
    GlennP
    Super Member
    • Total Posts : 833
    • Reward points : 0
    • Joined: 2009/03/29 15:04:55
    • Location: El Paso County, CO, USA
    • Status: offline
    Re: Signed 32 by 32 divide routine in assembler 2020/09/16 12:38:35 (permalink)
    0
    This was written last night before Gort's Post, but I thought I'd include it in the mix.  The handling of sign is brute-force (similar to Gort's), but it uses Harry's last Unsigned algorithm.
     
    Like Harry, I haven't had much time available for this.
     
    The test code checks all paths of interest.
     
    GP
    #7
    Gort2015
    Klaatu Barada Nikto
    • Total Posts : 4003
    • Reward points : 0
    • Joined: 2015/04/30 10:49:57
    • Location: 0
    • Status: online
    Re: Signed 32 by 32 divide routine in assembler 2020/09/16 12:58:48 (permalink)
    0
    Code I wrote 6 Years ago, my dsc double dabble function can be used to display 32 results.
     
    As in
     
    int qvalue32 (long Value, int Format, int Justify);

    MPLab X playing up, bug in your code? Nevermind, Star Trek:Discovery will be with us soon.
    https://www.youtube.com/watch?v=Iu1qa8N2ID0
    + ST:Continues, "What Ships are Made for", Q's back.
    #8
    Spinlectrix
    Junior Member
    • Total Posts : 72
    • Reward points : 0
    • Joined: 2016/03/20 14:48:37
    • Location: 0
    • Status: offline
    Re: Signed 32 by 32 divide routine in assembler 2020/09/17 12:08:24 (permalink)
    0
    Ok, I'm pretty sure this is the best I can do:  After a stupid amount of trying to optimize using actual signed divide routines, there kept being so many special-cases which added an instruction or branch here or there, it finally turned out to be more optimal to just do the obvious move and take the ABS(operators) before the op, and then correct for the final sign.
     
    There is a much steeper cycle penalty when calculating the Remainder with the signed routines, so don't (unless you really need it).
     
    I've successfully done a significant amount of testing on this routine without Remainder calculation, but by no means an exhaustive test.
     
    In doing the optimization, I also managed to shave off 3 cycles of the unsigned divide routine on what turns out to be by far the most common path (just 2 cycles for core 'E' variants), which I will post on the unsigned 32 by 32 thread.
     
        .section    .text, code                ; Normal Code Section.
        .align    2

    ;
    ; Signed Long Integer Division of 32-bit by 32-bit
    ;
    ;   sint32_t
    ;   sdivmod3232 (sint32_t dividend, sint32_t divisor, sint32_t *remainder)
    ;
    ;   sint32_t
    ;   sdiv3232 (suint32_t dividend, sint32_t divisor)
    ;
    ;
    ; Dividend = Quotient * Divisor + Remainder
    ;
    ; Calling Parameters
    ;   w1:w0 = Dividend
    ;   w3:w2 = Divisor
    ;
    ; Return Values
    ;   w1:w0 = Quotient
    ;   w3:w2 = Remainder
    ;
    ; Used Registers
    ;   w7,w6, w5,w4 and up to 2 words on the stack
    ;
    ;
    ; Ref: https://www.microchip.com/forums/FindPost/1151719
    ;
    ; Credits (All Microchip Forum Usernames):
    ; * Spinlectrix, 1and0, and GlennP17321.
    ;
     
    ; Conditional assembly:
    ; * #define Remainder = 0 to return only the quotient and no remainder.
    .ifndef Remainder
       Remainder = 0            ; Compute Remainder (if Value != 0).
    .endif
     
    ; Loop counter value for the REPEAT instruction of a DIV instruction.
    .ifdef __dsPIC33C
       DIV_RCOUNT = (6-1)    ; execute DIV 6 consecutive times
    .else
       DIV_RCOUNT = (18-1)   ; execute DIV 18 consecutive times
    .endif
     
    ; Division Routine Names
    .if Remainder
        .global  _sdivmod3232            ; Make Visible Externally.
        .type    _sdivmod3232, @function ; Debugger Information in ELS.
    _sdivmod3232:
    .else
        .global  _sdiv3232               ; Make Visible Externally.
        .type    _sdiv3232, @function    ; Debugger Information in ELS.
    _sdiv3232:
    .endif
     
        xor    w1, w3,[w15++]   ; Preserve sign of Quotient on stack for restoration later

    ; Take ABS() of Divisor (negate the divisor, iff it is passed in as a negative value)
        btsc   w3,#15
        neg    w2,w2
        btsc   w3,#15
        subbr  w3,#0,w3
        
    .if Remainder ; For optimal remainder sign calcs, ABS(Dividend) is done seperately in each section
        ; Find First '1' Bit in Divisor.HO (if Any).
            ff1l   w3, w7           ; w7 = First One Position; 1 = HO, 16 = LO.
            bra    C, 0f            ; C(arry) Implies NO Ones in w3, so Divisor.HO == 0.
        
            mov    w1, [w15++]      ; Preserve sign of Dividend on stack (to restore sign of Remainder)
        
        ; Take ABS() of Dividend (negate the dividend, iff it is passed in as a negative value)
            btsc   w1,#15
            neg    w0,w0
            btsc   w1,#15
            subbr  w1,#0,w1
        
    .else ; Case for no remainder: Only need one copy of Dividend ABS().
    ; Take ABS() of Dividend (negate the dividend, iff it is passed in as a negative value)
        btsc   w1,#15
        neg    w0,w0
        btsc   w1,#15
        subbr  w1,#0,w1
        
        ff1l   w3, w7      ; w7 = First One Position; 1 = HO, 16 = LO.
        bra    C, 0f       ; C(arry) Implies NO Ones in w3, so Divisor.HO == 0.
    .endif
        
        mov.d  w0,w4       ; Copy ABS() of original Dividend to w5:w4
        
    ;And from here on, we are just operating on unsigned values (until we restore sign of Quotient):

    ; Divisor > 0x0FFFF.  [Divisor.HO > 0]  Perform Simple Division and Possible Correction.
    ; Here Divisor.HO > 0, so the Result Will Fit into a Word (Quotient.HO must be zero).

    ; Calculate Shift Counts for Normalizing the Divisor.
        subr   w7, #17, w7          ; w7 = Right Shift Count.
        subr   w7, #16, w0          ; w0 = Left  Shift Count.

    ; 'Normalized' 16-Bit Divisor Created by Shifting the Actual Divisor RIGHT by "RightShiftCount".
        sl    w3, w0, w0           ; Divisor.HO <<=  LeftShiftCount.
        lsr    w2, w7, w6           ; Divisor.LO >>= RightShiftCount.
        ior    w0, w6, w6           ; Divisor.LO = (Divisor >> RightShiftCount).  Note: MSb = 1.

    ; Shift Dividend Right by 1: W1:W0 = W5:W4 >> 1.
    ; Retains Maximum Precision in Quotient (Next Step) while Assuring Quotient Fits in One Word.
        lsr    w5, w1               ; Dividend.HO >>= 1.
        rrc    w4, w0               ; Dividend.LO >>= 1 (and HO Bit from Dividend.HO Bit 0).

    ; Perform Division of 32/16 Unsigned using "div.ud" Instruction.
        repeat  #DIV_RCOUNT         ; Division Repeat Count Depends on 33C (6) or 'Other' (18).
        div.ud  w0, w6              ; Divide Step.  At End: w0 = (w1:w0) / w6 and w1 = (w1:w0) % w6.

    ; Scale Quotient by RightShiftCount-1 to Compensate for Divisor Shifting.
        dec    w7, w7               ; Reduce RightShiftCount (Dividend Already Shifted Right One Above).
        lsr    w0, w7, w0           ; Quotient (w0) >> (RightShiftCount-1).  Quotient Now Scaled Back to 1:1.

    ; Form Quotient * Divisor.
        mul.uu w3, w0, w6           ; w7:w6 = Divisor.HO * Quotient.
        mov    w6, w1               ; Save (Divisor.HO * Quotient).LO in w1.
        mul.uu w2, w0, w6           ; w7:w6 = Divisor.LO * Quotient.
        add    w7, w1, w7           ; w7:w6 = Divisor * Quotient.
     
    .if Remainder
            exch    w2, w6              ; Swap divisor with (divisor*quotient) product
            exch    w3, w7              ; (To preserve Divisor in case we have to recalculate remainder)
            bra     C,  1f              ; Go decrease Quotient iff add caused Carry
     
    ; Calculate the default remainder for the probable quotient
            sub     w4, w2, w2          ; w3:w2 = dividend - quotient * divisor
            subb    w5, w3, w3          ;       = Remainder (probably)

            bra     C,3f                ; Magnitude of Remainder correct -- go restore sign.
     
    ; Correct quotient, Calculate the remainder for the corrected quotient
    1:      dec     w0, w0              ; Adjust quotient down by one
            mul.uu  w7, w0, w2          ; w3:w2 = Divisor.HO * Quotient.
            mov     w2, w1              ; Save (Divisor.HO * Quotient).LO in w1.
            mul.uu  w6, w0, w2          ; w3:w2 = Divisor.LO * Quotient.
            add     w3, w1, w3          ; w3:w2 = Divisor * Quotient.
            sub     w4, w2, w2          ; w3:w2 = dividend - quotient * divisor
            subb    w5, w3, w3          ;       = Remainder
     
    3:      cp0 [--w15]                 ; Recall original sign of Divided
            bra NN, 5f                  ; Iff it was positive, no need to change
            neg     w2, w2              ; Negate sign of Remainder
            subbr   w3,#0, w3           ; to match Divided
    5:      btss    [--w15],#15         ; Skip return iff Quotient is negative
            retlw   #0,w1               ; Return if Quotient is positive
        
            clr     w1                  ; Negate Quotient
            neg     w0,w0
            subbr   w1, #0, w1
            return                      ; return with negated Quotient
     
    .else ; Case for no Remainder necessary
        bra    C, 1f                ; Go decrease Quotient iff add caused Carry
     
    ; Determine if dividend - (quotient * divisor) is non-negative
        sub     w4, w6, w6          ; w7:w6 = dividend - quotient * divisor
        subb    w5, w7, w7          ;       = Remainder (probably)

        btss    SR, #0              ; Iff Carry is clear, we have to adjust the quotient, so do not return
    1:  dec     w0, w0              ; Adjust quotient down by one
        btss    [--w15],#15         ; Skip return iff Quotient is negative
        retlw   #0,w1               ; Return if Quotient is positive
        
        clr     w1                  ; Negate Quotient
        neg     w0,w0
        subbr   w1, #0, w1
        return                      ; return with negated Quotient
    .endif
        
    0:; |Divisor| < 0x10000.  [Divisor.HO == 0 or -1]  Perform Long Division because Quotient MAY Not Fit in Word.
    ; Divide Dividend.HO / Divisor.LO.
    .if Remainder
        asr      w1,#15,w6    ; Preserve the sign bit of the Dividend in w6=-1 or 0.
        rlc      w6,w6        ; This will change a w6=0 to w6=1, or leave w6=-1 unchanged.
                              ; (To get here, Carry flag must have been set)
        
    ; Take ABS() of Dividend (negate the dividend, iff it is passed in as a negative value)
        btsc   w1,#15
        neg    w0,w0
        btsc   w1,#15
        subbr  w1,#0,w1
    .endif
        
        mov    w0,w4        ; Preserve Dividend.LO

        repeat  #DIV_RCOUNT ; calculate quotient MSW (16/16 divide)
        div.uw  w1,w2       ; w0 = w1 / w2, w1 = w1 % w2
                            ; remainder becomes dividend MSW of next divide
        exch    w0,w4       ; save quotient MSW, restore dividend LSW
            
    ; Divide remainder and low word of dividend by the divisor
        repeat  #DIV_RCOUNT ; calculate quotient LSW (32/16 divide)
        div.ud  w0,w2       ; w0 = (w1:w0) / w2, w1 = (w1:w0) % w2    
          
    .if Remainder
        mul.us  w1,w6,w2    ; w3:w2 = remainder
    .endif
        mov     w4,w1       ; restore quotient MSW
        btss    [--w15],#15 ; Skip return iff Quotient is negative
        return              ; Return if Quotient is positive
        neg     w0,w0       ; Negate Quotient
        subbr   w1,#0,w1
        return              ; return negated Quotient
    .end
     
     
     

    post edited by Spinlectrix - 2020/09/17 12:20:11
    #9
    1and0
    Access is Denied
    • Total Posts : 11312
    • Reward points : 0
    • Joined: 2007/05/06 12:03:20
    • Location: Harry's Gray Matter
    • Status: online
    Re: Signed 32 by 32 divide routine in assembler 2020/09/17 18:50:22 (permalink)
    0
    All you guys are taking the easy way out; i.e. pre-calculating the sign of the result, taking absolute values, performing an unsigned divide, and then correcting the signs. But, you guys handle the sign bits in complicated ways. ;)

    To use the unsigned divide to perform the signed divide is simple:
    ;
    ; Signed Long Integer Division of 32-bit by 32-bit
    ;
    ; int32_t
    ; divmod3232 (int32_t dividend, int32_t divisor, int32_t *remainder)
    ;
    ; Dividend = Quotient * Divisor + Remainder
    ;
    ; Calling Parameters
    ;   w1:w0 = Dividend
    ;   w3:w2 = Divisor
    ;
    ; Return Values
    ;   w1:w0 = Quotient  = Dividend / Divisor
    ;   w3:w2 = Remainder = Dividend % Divisor
    ;
    ; Used Registers
    ;   w7,w6, w5,w4
    ;
    divmod3232:
            xor     w1,w3,w5    ; w8<15> = sign bit (0 = positive, 1 = negative)
            rlc     w1,w4       ; Carry  = dividend sign
            rrc     w5,w5       ; w8<15> = dividend sign, w8<14> = quotient sign
            push    w5          ; push sign bits to stack

            btsc    w1,#15      ; if (dividend < 0) negate dividend
            neg     w0,w0       ; "
            btsc    w1,#15      ; "
            subbr   w1,#0,w1    ; "
            
            btsc    w3,#15      ; if (divisor < 0) negate divisor
            neg     w2,w2       ; "
            btsc    w3,#15      ; "
            subbr   w3,#0,w3    ; "

            rcall   udivmod3232 ; call unsigned 32/32 divide
            pop     w5          ; pop sign bits from stack
            
        .ifndef NO_REMAINDER
            btsc    w5,#15      ; if (dividend is negative) remainder is negative
            neg     w2,w2       ; "
            btsc    w5,#15      ; "
            subbr   w3,#0,w3    ; "
        .endif
            
            btsc    w5,#14      ; if (sign is negative) negate the quotient
            neg     w0,w0       ; "
            btsc    w5,#14      ; "
            subbr   w1,#0,w1    ; "
            return              ; return

    #10
    1and0
    Access is Denied
    • Total Posts : 11312
    • Reward points : 0
    • Joined: 2007/05/06 12:03:20
    • Location: Harry's Gray Matter
    • Status: online
    Re: Signed 32 by 32 divide routine in assembler 2020/09/17 19:03:52 (permalink)
    0
    As I've said before, I have the case of (divisor > 0xFFFF) of the signed division mostly figured out; it seems to work but has issues when the first bit change occurs on word boundary. This case has a maximum cycle time of 56 cycles on a PIC24F. I haven't spent much time on the other case of (divisor <= 0xFFFF) yet.

    It takes at least 18 cycles, which is a big percentage of the total cycles, to pre-calculate the sign of the result, take absolute values, perform an unsigned divide, and then correct the signs. So, it will be more efficient if we can avoid this route. ;)
     
     
     
    post edited by 1and0 - 2020/09/17 19:09:34
    #11
    Spinlectrix
    Junior Member
    • Total Posts : 72
    • Reward points : 0
    • Joined: 2016/03/20 14:48:37
    • Location: 0
    • Status: offline
    Re: Signed 32 by 32 divide routine in assembler 2020/09/17 20:07:43 (permalink)
    0
    Glenn —
    I checked your code out through my tester, and found no errors after over 1M random signed long divisions without remainder calculation (the good news).  Over those million divisions, the routine's longest load-call-return took 91 cycles on an 'E'-class core, while the average path was 89.68 cycles (Optimization level set at O1).
     
    This shines against the default (long) = (long)/(long) operation on the PIC, which is 555 cycles Worst-case, and averaging 553.67 cycles per divide; a vast improvement.
     
    But over those same million divides, the code in my previous post shows 81 Worst-case cycles, with an average of 77.53, shaving over 12 cycles per average call!
     
    [As a basis for comparison, the numbers for my unsigned long divide are Worst-case 68, average 65.18, so the signed version averages about 12 cycles/call slower than unsigned.  This makes some intuitive sense:  It takes four instructions to abs() the dividend, 4 for the divisor, and 4 to ultimately restore the quotient's sign.  The numbers listed here being slightly larger than raw routine cycle-counts are due to these also including cycles to load the dividend and divisor into w0 -- w3, (in each case, done with mov.d's) as well as the call, the routine and then the return. ]
     
    I'd love it if this were another group effort:  Routines as basic as a long divide should have been optimized years ago!  Can anyone cut off a few more?  Everyone who ever uses these routines will then benefit!
     
    <Edit update> Two things:
    First of all, I've noticed by examining the listing of the testing framework code that the values above are all uniformly 10 cycles too pessimistic.  Each divide call was preceded by  6 cycles worth of register loading (nothing to do with the div routine -- properly designed the Dividend and Divisor may already be present in the appropriate registers) and 4 cycles of storing the results (also outside of the routine, and which may be avoided if the quotient produced in w1:w0 is going to be chained to another operation anyway).  In the future I will post only the values within the div routine.
    Secondly, I've improved the cycles required when calculating remainders, and will post shortly.
    <\Edit>
    post edited by Spinlectrix - 2020/09/18 19:29:57
    #12
    Spinlectrix
    Junior Member
    • Total Posts : 72
    • Reward points : 0
    • Joined: 2016/03/20 14:48:37
    • Location: 0
    • Status: offline
    Re: Signed 32 by 32 divide routine in assembler 2020/09/18 20:47:53 (permalink)
    0
    Updated Version (well, the first half anyway.  I think the Forum has stopped allowing me to post large code snippets (?), claiming permissions issues:
        .section .text, code ; Normal Code Section.
        .align 2

    ;
    ; Signed Long Integer Division of 32-bit by 32-bit
    ;
    ; sint32_t
    ; sdivmod3232 (sint32_t dividend, sint32_t divisor, sint32_t *remainder)
    ;
    ; sint32_t
    ; sdiv3232 (suint32_t dividend, sint32_t divisor)
    ;
    ;
    ; Dividend = Quotient * Divisor + Remainder
    ;
    ; Calling Parameters
    ; w1:w0 = Dividend
    ; w3:w2 = Divisor
    ;
    ; Return Values
    ; w1:w0 = Quotient
    ; w3:w2 = Remainder
    ;
    ; Used Registers
    ; w7,w6, w5,w4 and up to 2 words on the stack
    ;
    ;
    ; Ref: https://www.microchip.com/forums/FindPost/1151719
    ;
    ; Credits (All Microchip Forum Usernames):
    ; * Spinlectrix, 1and0, and GlennP17321.
    ;
      
    ; Conditional assembly:
    ; * #define Remainder = 0 to return only the quotient and no remainder.
    .ifndef Remainder
       Remainder = 0 ; Compute Remainder (if Value != 0).
    .endif
      
    ; Loop counter value for the REPEAT instruction of a DIV instruction.
    .ifdef __dsPIC33C
       DIV_RCOUNT = (6-1) ; execute DIV 6 consecutive times
    .else
       DIV_RCOUNT = (18-1) ; execute DIV 18 consecutive times
    .endif
      
    ; Division Routine Names
    .if Remainder
        .global _sdivmod3232 ; Make Visible Externally.
        .type _sdivmod3232, @function ; Debugger Information in ELS.
    _sdivmod3232:
    .endif
        .global _sdiv3232 ; Make Visible Externally.
        .type _sdiv3232, @function ; Debugger Information in ELS.

    post edited by Spinlectrix - 2020/09/18 21:00:12
    #13
    Spinlectrix
    Junior Member
    • Total Posts : 72
    • Reward points : 0
    • Joined: 2016/03/20 14:48:37
    • Location: 0
    • Status: offline
    Re: Signed 32 by 32 divide routine in assembler 2020/09/18 21:00:52 (permalink)
    0
    Operational sdiv3232 continues:
    _sdiv3232:
     
        xor    w1, w3,[w15++]   ; Preserve sign of Quotient on stack for restoration later

    ; Always start by taking ABS() of Divisor (negate the divisor, iff it is passed in as a negative value)
        btsc   w3,#15
        neg    w2,w2
        btsc   w3,#15
        subbr  w3,#0,w3
        
    .if !(Remainder)
    ; For the no remainder case we also do the ABS(Dividend) here at the top.
    ; Take ABS() of Dividend (negate the dividend, iff it is passed in as a negative value)
        btsc   w1,#15
        neg    w0,w0
        btsc   w1,#15
        subbr  w1,#0,w1
    ;And from here on, we are just operating on unsigned values (until we restore sign of Quotient):
        
        ff1l   w3, w7      ; w7 = First One Position; 1 = HO, 16 = LO.
        bra    NC, 0f      ; C(arry) Implies NO Ones in w3, so Divisor.HO == 0.
        
    .else ; Case for when Remainder is calculated
        ; For optimal remainder sign cycles, ABS(Dividend) is done differently in different methods
        ; Find First '1' Bit in Divisor.HO (if Any).
        ff1l   w3, w7           ; w7 = First One Position; 1 = HO, 16 = LO.
        bra    NC, 0f            ; C(arry) Implies NO Ones in w3, so Divisor.HO == 0.
                
    ; |Divisor| < 0x10000.  [Divisor.HO == 0 or -1]  Perform Long Division because Quotient MAY Not Fit in Word.
    ; Divide Dividend.HO / Divisor.LO.
        asr      w1,#15,w6    ; Preserve the sign bit of the Dividend in w6=-1 or 0.
        rlc      w6,w6        ; This will change a w6=0 to w6=1, or leave w6=-1 unchanged.
                              ; (To get here, Carry flag must have been set)
        
    ; Take ABS() of Dividend (negate the dividend, iff it is passed in as a negative value)
        bra    NN,2f
        neg    w0,w0
        subbr  w1,#0,w1
    2:  ; This method of negation saves one or two cycles on 'F'-class cores, and 0 or 1 on 'E'-class cores
        ; (It can save at least some cycles on all cores when the SR condition flags have already been set by a previous reference.)
    .endif
        
        mov    w0,w4        ; Preserve Dividend.LO

        repeat  #DIV_RCOUNT ; calculate quotient MSW (16/16 divide)
        div.uw  w1,w2       ; w0 = w1 / w2, w1 = w1 % w2
                            ; remainder becomes dividend MSW of next divide
        exch    w0,w4       ; save quotient MSW, restore dividend LSW
            
    ; Divide remainder and low word of dividend by the divisor
        repeat  #DIV_RCOUNT ; calculate quotient LSW (32/16 divide)
        div.ud  w0,w2       ; w0 = (w1:w0) / w2, w1 = (w1:w0) % w2    
          
    .if Remainder
        mul.us  w1,w6,w2    ; w3:w2 = remainder
    .endif
        mov     w4,w1       ; restore quotient MSW
        btss    [--w15],#15 ; Skip return iff Quotient is negative
        return              ; Return if Quotient is positive
        neg     w0,w0       ; Negate Quotient
        subbr   w1,#0,w1
        return              ; return negated Quotient

    post edited by Spinlectrix - 2020/09/18 21:19:46
    #14
    Spinlectrix
    Junior Member
    • Total Posts : 72
    • Reward points : 0
    • Joined: 2016/03/20 14:48:37
    • Location: 0
    • Status: offline
    Re: Signed 32 by 32 divide routine in assembler 2020/09/18 21:14:44 (permalink)
    0
    Damn, this is getting tedious:  With both Firefox and Safari, the Forum denies me permission to post code blocks large enough for the entire routine -- and this time its making me break it up into 3 pieces!!  Grrrrrrr!
     
    sdiv3232, the final (3 of 3) section:
    0:; Divisor > 0x0FFFF. [Divisor.HO > 0] Perform Simple Division and Possible Correction.
    ; Here Divisor.HO > 0, so the Result Will Fit into a Word (Quotient.HO must be zero).

    .if Remainder ; Finish up operand preparation by dealing with Dividend
     add w1,#0,[w15++] ; Preserve sign of Dividend on stack (while setting N-flag)
     
        ; Take ABS() of Dividend (negate the dividend, iff it is passed in as a negative value)
     bra NN,1f
     neg w0,w0
            subbr w1,#0,w1
    1: ; This method of negation saves one or two cycles on 'F'-class cores, and 0 or 1 on 'E'-class cores
        ; (It can save at least some cycles on all cores when the SR condition flags have already been set by a previous reference.)
    .endif
        
        mov.d w0,w4 ; Copy ABS() of original Dividend to w5:w4
        
    ; Calculate Shift Counts for Normalizing the Divisor.
        subr w7, #17, w7 ; w7 = Right Shift Count.
        subr w7, #16, w0 ; w0 = Left Shift Count.

    ; 'Normalized' 16-Bit Divisor Created by Shifting the Actual Divisor RIGHT by "RightShiftCount".
        sl w3, w0, w0 ; Divisor.HO <<= LeftShiftCount.
        lsr w2, w7, w6 ; Divisor.LO >>= RightShiftCount.
        ior w0, w6, w6 ; Divisor.LO = (Divisor >> RightShiftCount). Note: MSb = 1.

    ; Shift Dividend Right by 1: W1:W0 = W5:W4 >> 1.
    ; Retains Maximum Precision in Quotient (Next Step) while Assuring Quotient Fits in One Word.
        lsr w5, w1 ; Dividend.HO >>= 1.
        rrc w4, w0 ; Dividend.LO >>= 1 (and HO Bit from Dividend.HO Bit 0).

    ; Perform Division of 32/16 Unsigned using "div.ud" Instruction.
        repeat #DIV_RCOUNT ; Division Repeat Count Depends on 33C (6) or 'Other' (18).
        div.ud w0, w6 ; Divide Step. At End: w0 = (w1:w0) / w6 and w1 = (w1:w0) % w6.

    ; Scale Quotient by RightShiftCount-1 to Compensate for Divisor Shifting.
        dec w7, w7 ; Reduce RightShiftCount (Dividend Already Shifted Right One Above).
        lsr w0, w7, w0 ; Quotient (w0) >> (RightShiftCount-1). Quotient Now Scaled Back to 1:1.

    ; Form Quotient * Divisor.
        mul.uu w3, w0, w6 ; w7:w6 = Divisor.HO * Quotient.
        mov w6, w1 ; Save (Divisor.HO * Quotient).LO in w1.
        mul.uu w2, w0, w6 ; w7:w6 = Divisor.LO * Quotient.
        add w7, w1, w7 ; w7:w6 = Divisor * Quotient.
    ; For the unsigned case, we would have to check for overflow here. But for this signed case,
    ; we already have an extra bit-position, since the Dividend's magnitude can only go up to 0x7fffffff.
        sub w4, w6, w6 ; w7:w6 = dividend - quotient * divisor
        subb w5, w7, w7 ; = Remainder (probably)

    .if Remainder
            exch w2, w6 ; Swap divisor with (divisor*quotient) product
            exch w3, w7 ; (Preserve the Divisor in case we have to adjust remainder)
     
            bra C,3f ; Magnitude of Remainder correct -- go restore Remainder's sign.
     
    ; Correct quotient, Calculate the remainder for the corrected quotient
            dec w0, w0 ; Adjust quotient down by one
            add w2, w6, w2 ; We had subtracted one too many divisors:
            addc w3, w7, w3 ; Add one divisor back into Remainder
     
    3: cp0 [--w15] ; Recall original sign of Divided
            bra NN, 5f ; Iff it was positive, no need to change
            neg w2, w2 ; Negate sign of Remainder
            subbr w3, #0, w3 ; to match Divided
     
    .else ; Case for no Remainder necessary
     
    ; Determine if dividend - (quotient * divisor) is non-negative
        btss SR, #0 ; Iff Carry is clear, we have to adjust the quotient, so do not return
        dec w0, w0 ; Adjust quotient down by one
        
    .endif
        
    5: btss [--w15],#15 ; Skip return on next statement iff Quotient should be negative
        retlw #0,w1 ; Return if Quotient should be positive
        
        clr w1 ; Negate Quotient
        neg w0,w0
        subbr w1, #0, w1
        return ; return with negated Quotient
        
    .end

    post edited by Spinlectrix - 2020/09/18 21:16:36
    #15
    Spinlectrix
    Junior Member
    • Total Posts : 72
    • Reward points : 0
    • Joined: 2016/03/20 14:48:37
    • Location: 0
    • Status: offline
    Re: Signed 32 by 32 divide routine in assembler 2020/09/19 18:15:19 (permalink)
    0
    I've given up on posting using the nice, visible 'code' method.  Attached are the latest sdiv3232 and udiv3232 in zip files.
     
                                                  udiv3232 Code    udiv3232 Code    sdiv3232 Code    sdiv3232 Code       
    'E' Core Performance Stats:  Average Cycles      Worst-Case       Average Cycles     Worst-Case

    No Remainder Returned               52.06                     53                     63.85                    66           +11.79
    With Remainder Returned            54.00                     57                     70.35                    77           +16.35
                                                        (+1.94)                 (+4)                     (+6.5)                (+11)
     
    These values include the RCALL/RET overhead involved with invoking the routine. (But not any cycles spent loading the Dividend or Divisor, and not any cycles spent transferring the results elsewhere.)
     
    <Edit:  Thanks to Harry (1and0) for pointing out that previously in this post, all the cycle-counts were uniformly 2 cycles too large due to an oversight in correcting execution times caused by misuse of my debugger operating on E-core hardware.)
    post edited by Spinlectrix - 2020/09/20 13:06:38
    #16
    1and0
    Access is Denied
    • Total Posts : 11312
    • Reward points : 0
    • Joined: 2007/05/06 12:03:20
    • Location: Harry's Gray Matter
    • Status: online
    Re: Signed 32 by 32 divide routine in assembler 2020/09/20 10:21:39 (permalink)
    0
    By manually counting the cycles of your "DivideU32ByU32.s" in Post #117 from the unsigned thread for E-core, the two legs where [] is added cycles for the remainder take

    Tcy = 7 + 2*DIV + RET + [1]            
        = 7 + 2*18 + 6 + [1]
        = 49 + [1]
        = 49 [50]
        
    Tcy = 19 + BRA + DIV + RET + [6]
        = 19 + 4 + 18 + 6 + [6]
        = 47 + [6]
        = 47 [53]
     
    So,

    Tcy(max) = 49,  no remainder
             = 53,  with remainder
     
    which seems to agree with your (+4) in Post #16.
    #17
    Spinlectrix
    Junior Member
    • Total Posts : 72
    • Reward points : 0
    • Joined: 2016/03/20 14:48:37
    • Location: 0
    • Status: offline
    Re: Signed 32 by 32 divide routine in assembler 2020/09/20 10:36:14 (permalink)
    0
    1and0
    Tcy(max) = 49,  no remainder
            = 53,  with remainder

    The 4 cycle differences in our values represent the cycles used by the RCALL.  I figure you can't have a RET without having a CALL, so counted the whole duration of the routine.  (I can see your perspective that the CALL/RET overhead is not something that should count against the worst-case path of the routine, but if you're not going to count the CALL, why count the RET?  In any case, our numbers are consistent, and differ by precisely 4 cycles!
    #18
    1and0
    Access is Denied
    • Total Posts : 11312
    • Reward points : 0
    • Joined: 2007/05/06 12:03:20
    • Location: Harry's Gray Matter
    • Status: online
    Re: Signed 32 by 32 divide routine in assembler 2020/09/20 11:30:04 (permalink)
    0
    I've debated with myself in the past rather to include the CALL in the timing specs. There is nothing wrong either way; I just pick the one excluding the CALL as the instruction is not "inside" the routine. I wonder if there is a standard convention. I may check the Microchip math app notes to see what they do. In any case, add a note in the comment section of the routine will clarify this.
    #19
    Spinlectrix
    Junior Member
    • Total Posts : 72
    • Reward points : 0
    • Joined: 2016/03/20 14:48:37
    • Location: 0
    • Status: offline
    Re: Signed 32 by 32 divide routine in assembler 2020/09/20 13:23:56 (permalink)
    0
    After more contemplation (and if I squint a little) I can almost rationalize why just giving cycles counting the RET but not the RCALL is justifiable:
     
    For a routine as short as this one (< 30 instr, for udiv3232), it would be natural to in-line all occurrences of the routine wherever it is invoked.  In that case, there would be no CALL overhead, but (as long as there is more than 1 exit point) wherever a RET is located which is not at the routine's end, the RET would have to be replaced by a BRA to the end of the routine.  Even though the BRA cycles != RET cycles, its still a useful guideline...
    post edited by Spinlectrix - 2020/09/22 08:43:36
    #20
    Page: 12 > Showing page 1 of 2
    Jump to:
    © 2020 APG vNext Commercial Version 4.5