• AVR Freaks

Helpful ReplyHot!Unsigned 32 by 32 divide routine in assembler

Page: 12345.. > >> Showing page 1 of 7
Author
peterg1000
Super Member
  • Total Posts : 192
  • Reward points : 0
  • Joined: 2009/01/29 13:07:52
  • Location: Flamstead, Herts, UK
  • Status: offline
2020/09/01 02:35:02 (permalink)
5 (1)

Unsigned 32 by 32 divide routine in assembler

I found I needed a 32 by 32 bit divide routine in order to preserve resolution in a motor control application.  This involved stepper motor accel/decel calculations in less than 50uS for a 16MHz cycle frequency.  Processor is a 24FJ64GA102.
 
Couldn't find anything after an internet search so it became a DIY solution.   Attached code below seems to be bug free as far as I have tested it - but who knows!!   Totally unsophisticated brute force solution but it does the job.
 
 .global Div_32_32
;
;Divide 32 by 32 Execution time:- max 538 cycles
;Numerator w1(H):w2(L)
;Divisor w3(H):w4(L)
;Base index w5(H);w6(L)
;Result w7(H):w8(L)
;Scratch W0, w9(H):w10(L)
;
Div_32_32:
; Set base index
 mov #0x0000,w5 ; Base index high
 mov #0x0001,w6 ; base index low
;
; Align divisor and base index with most significant bit of numerator
; Reject invalid data (worst case alignment approx 128 cycles 8uS)
 bclr Systat1,#9 ; Reset flag word before proceeding
 cp0 w1 ; Numerator MSW = 0 ?
 bra z,Div_xit ; Numerator 16 bits or less - 16 bit divide
 cp0 w3 ; Divisor MSW = 0
 bra NZ,Align ; OK to align MSW's directly
 mov w4,w3 ; Swap data words
 clr w4 ;
 mov w6,w5 ; Swap index words
 clr w6 ;
 bset Systat1,#9 ; Use as flag word
;
; Both numerator and divisor now in high word position
Align: ff1l w1,w0 ; Find first from left - Numerator
 ff1l w3,w7 ; Find first from left - divisor
 sub w7,w0,w0 ; Difference
 bra NN,Lshift ; Divisor < numerator - shift left
;
Rshift: btst Systat1,#9 ; Check flag word
 bra Z,Div_xit ; Exit if original divisor > numerator
;
; Align most signifiant bits by shifting divisor and base index right
 clr w7 ; Clear result
 clr w8 ;
Rrpt: bclr SR,#C ; Clear carry in preparation for right rotate.
 rrc w3,w3 ; Shift divisor right
 rrc w4,w4 ;
 bclr SR,#C ; Clear carry
 rrc w5,w5 ; Shift index right
 rrc w6,w6 ;
 inc w0,w0 ; Done?
 bra NZ,Rrpt ; Loop
 bra Div_2
;
Lshift: clr w7 ; Clear result
 clr w8 ;
Lrpt: sl w4,w4 ; Divisor low word msb to C
 rlc w3,w3 ; C to Divisor high word lsb
 sl w6,w6 ; msb to C
 rlc w5,w5 ; C to lsb
 dec w0,w0 ; Done?
 bra NZ,Lrpt ; Loop
;
; Now do the divide with msb of numerator and divisor aligned
; Trial subtract - discard result and shift divisor right if result negative
Div_2: sub w2,w4,w10
 subb w1,w3,w9
 bra NN,Div_3 ; Do it again for real if result is positive
 bclr SR,#C ; Clear carry in preparation for right rotate.
 rrc w3,w3 ; Shift divisor right for next try
 rrc w4,w4 ;
 bclr SR,#C ; Clear carry
 rrc w5,w5 ; Shift base index right
 rrc w6,w6 ;
;
 add w5,w6,w0 ; If base index now zero - divide is complete
 bra Z,Div_xit ; Exit if index shifted out
 bra Div_2 ; Loop until subtraction leaves a positive result
;
; Real subtract
Div_3: sub w2,w4,w2 ; Nunerator updated after subtract
 subb w1,w3,w1 ;
;
; Add base index to result ;
 add w6,w8,w8 ;
 addc w5,w7,w7 ;
;
; Shift both divisor and base index right
 bclr SR,#C ; Clear carry in preparation for right shift
 rrc w3,w3 ; Shift divisor right and try again
 rrc w4,w4 ;
 bclr SR,#C ; Clear carry
 rrc w5,w5 ; Shift index right
 rrc w6,w6 ;
;
 add w5,w6,w0 ; If base index now zero - divide completed
 bra NZ,Div_2 ; Do next trial subtraction
;
Div_xit:
 nop
 nop
 bra Div_32_32

 
Hope it might be useful to some-one else
#1
GlennP
Super Member
  • Total Posts : 833
  • Reward points : 0
  • Joined: 2009/03/29 15:04:55
  • Location: El Paso County, CO, USA
  • Status: offline
Re: Unsigned 32 by 32 divide routine in assembler 2020/09/01 03:01:53 (permalink)
5 (2)
This seems like the classic "restoring" division algorithm.  I have three comments:
 
  1. Non-Restoring Division has a more predictable (and usually lower) cycle count.
    See Wikipedia.
  2. Many related PICs have a "Divide Step" and "Repeat" combination to perform 32/16 divisions.
    I can't help but think that one could somehow use them for for 32/32.
  3. Look at the code the XC16 compiler generates.
 
Currently the microsoft.com site is unavailable from here, so I was unable to look at the datasheet for your microcontroller to see if it has the "divide step" instruction.
 
GP
 
Edit 1: Change 16/16 to 32/16 after looking at the family reference manual.
post edited by GlennP - 2020/09/01 03:09:46
#2
peterg1000
Super Member
  • Total Posts : 192
  • Reward points : 0
  • Joined: 2009/01/29 13:07:52
  • Location: Flamstead, Herts, UK
  • Status: offline
Re: Unsigned 32 by 32 divide routine in assembler 2020/09/01 03:55:10 (permalink)
0
Hi GlenP,
 
Thanks for your comments - I too wondered if the inbuilt "div.ud" instruction could be adapted, but I don't have the necessary skills to pursue this possibility.  The inbuilt divide instructions on the 24FJ processors execute in 17 cycles for all flavours of the instruction.
 
I'll have a look at the Wikipedia page out of curiosity, but since my brute force solution executes well within the available time I'll put this on the back burner.
 
Peter.
#3
NorthGuy
Super Member
  • Total Posts : 6350
  • Reward points : 0
  • Joined: 2014/02/23 14:23:23
  • Location: Northern Canada
  • Status: offline
Re: Unsigned 32 by 32 divide routine in assembler 2020/09/01 06:14:50 (permalink) ☄ Helpfulby peterg1000 2020/09/01 09:15:59
5 (2)
Harry (1and0) has posted the solution few years back which uses the division command - it's just few lines of assembler.
post edited by NorthGuy - 2020/09/01 09:34:38
#4
peterg1000
Super Member
  • Total Posts : 192
  • Reward points : 0
  • Joined: 2009/01/29 13:07:52
  • Location: Flamstead, Herts, UK
  • Status: offline
Re: Unsigned 32 by 32 divide routine in assembler 2020/09/01 13:02:10 (permalink)
0
Thanks for that pointer - do you by any chance have a reference?  Haven't been able to locate it so far, but am still searching.
#5
NorthGuy
Super Member
  • Total Posts : 6350
  • Reward points : 0
  • Joined: 2014/02/23 14:23:23
  • Location: Northern Canada
  • Status: offline
Re: Unsigned 32 by 32 divide routine in assembler 2020/09/01 13:54:15 (permalink)
4 (1)
peterg1000
Thanks for that pointer - do you by any chance have a reference?  Haven't been able to locate it so far, but am still searching.



I couldn't find it. Hopefully 1and0 will join the thread and post a link. You can PM him too.
#6
GlennP
Super Member
  • Total Posts : 833
  • Reward points : 0
  • Joined: 2009/03/29 15:04:55
  • Location: El Paso County, CO, USA
  • Status: offline
Re: Unsigned 32 by 32 divide routine in assembler 2020/09/01 14:51:50 (permalink)
0
peterg1000 ...
I'll have a look at the Wikipedia page out of curiosity, but since my brute force solution executes well within the available time I'll put this on the back burner.  ...

  1. Non-Restoring Long Division is just as (or a bit more) brute force.
    The section has been recently re-written, I preferred the previous version
    as I thought it was a bit more clear (perhaps because I wrote it as MMM).
    https://en.wikipedia.org/...Non-restoring_division
  2. If you do PM 1and0, please post the link here.
GP
#7
Spinlectrix
Junior Member
  • Total Posts : 72
  • Reward points : 0
  • Joined: 2016/03/20 14:48:37
  • Location: 0
  • Status: offline
Re: Unsigned 32 by 32 divide routine in assembler 2020/09/01 15:31:07 (permalink)
0
I've written and found the following code snippet to be useful for the vast majority of cases — that is the cases where the result (the quotient) is guaranteed to be no larger than a ushort.  This is kind of like the inverse of the __builtin_muluu(,) function, where the product of 2 ushorts produce a ulong, but for this snippet any ulong can be divided by any other ulong, so long as the result (the answer) is known to fit into a ushort (you'll have to work out scaling, or do error handling yourself if such is not the case).  Since you are probably figuring out Duty-Cycle ratios for your motor-control application, it seems unlikely you need better than 1 part in 65536 resolution, so this should be applicable to your situation.
 
I've written this as an inline c function, so it can be included in a header file and be called in normal c code, like:
quotient_variable_name = UlongDIVfitsInUshort( numerator_variable_name, denominator_variable_name);
In-lining should not be a problem, because the entire ulong-by-ulong divide routine is only 13 instructions, so all call/return overhead can also be eliminated.  If you only want extract the 13 assembly instructions, they are pretty obvious in the code below; the result is in WREG.
 
// The following 32-bit divide optimization works only if the result (quotient) is guaranteed to fit in a ushort,
// (an unsigned int). Since the divide result above will always fit into a word, we can use the PIC's div.ud
// instruction to perform the divide if we first shift bits to make sure the magnitudes of the inputs are correct.
// We figure out where the MSb of the denominator is, and shift to put that bit at the top bit of the divisor, while
// also adjusting the numerator by the corresponding number of shifts. This reduces > 530 instruction cycles
// (occupying over 10 usec) to do the divide down to 30 cycles (less than half a usec, and even fewer for the new 6-cycle divide processors)
// [Note for newer-core dsPIC33Cx 6-cycle divide processors, the repeat #17 must be altered to repeat #5 !!]
static inline __attribute__ ((always_inline))
unsigned short UlongDIVfitsInUshort(unsigned long numerator, unsigned long denominator) {
unsigned short quotient;
asm __volatile__ (
     "ff1l %d2,w1 \n bra C,0f \n subr w1,#17,w1 \n subr w1,#16,w0 \n"      // Calculate # of shifts required.
     "lsr %2,w1,%2 \n sl %d2,w0,%d2 \n ior %d2,%2,%2 \n"                   // Shift denominator into place.
     "lsr %1,w1,%1 \n sl %d1,w0,%d2 \n lsr %d1,w1,%d1 \n ior %1,%d2,%1 \n" // Shift numerator hi and lo words.
     "0: repeat #17 \n div.ud %1,%2"                                       // Perform division.
/*             w0                         %d1:%1              %d2:%2    */
:/*outputs*/ "=&a" (quotient) :/*inputs*/ "C" (numerator), "e" (denominator)
:/*clobbers*/ "cc","w1");
return (quotient);
}

post edited by Spinlectrix - 2020/09/01 23:30:07
#8
nigelwright7557
Super Member
  • Total Posts : 471
  • Reward points : 0
  • Joined: 2006/11/06 08:15:51
  • Location: 0
  • Status: offline
Re: Unsigned 32 by 32 divide routine in assembler 2020/09/01 22:13:56 (permalink)
1 (1)
Here is some assembler code I wrote in 1990's and been used many times.
DIV32 is the main part but I think it calls other parts of the code too.
ANSER_0 _1 _2 _3 is sequential memory locations etc
 
;MATHS.ASM
;*********
NO2_TO_NO1_P2
MOVF NO2_0,W
MOVWF NO1_0
MOVF NO2_1,W
MOVWF NO1_1
MOVF NO2_2,W
MOVWF NO1_2
MOVF NO2_3,W
MOVWF NO1_3
RETURN
;*****************
CLEARANSER_P2 CLRF ANSER_0
CLRF ANSER_1
CLRF ANSER_2
CLRF ANSER_3
RETURN
;************************
CLEARNO1_P2
CLRF NO1_0
CLRF NO1_1
CLRF NO1_2
CLRF NO1_3
RETURN
;***************************
CLEARNO2_P2
CLRF NO2_0
CLRF NO2_1
CLRF NO2_2
CLRF NO2_3
RETURN
;***************************
NO1_TO_NO2_P2
MOVF NO1_0,W
MOVWF NO2_0
MOVF NO1_1,W
MOVWF NO2_1
MOVF NO1_2,W
MOVWF NO2_2
MOVF NO1_3,W
MOVWF NO2_3
RETURN
;*****************
ANSER_TO_NO1_P2
MOVF ANSER_0,W
MOVWF NO1_0
MOVF ANSER_1,W
MOVWF NO1_1
MOVF ANSER_2,W
MOVWF NO1_2
MOVF ANSER_3,W
MOVWF NO1_3
RETURN
;*****************************
TRIAL_TO_NO1_P2 MOVF TRIAL_0,W
MOVWF NO1_0
MOVF TRIAL_1,W
MOVWF NO1_1
MOVF TRIAL_2,W
MOVWF NO1_2
MOVF TRIAL_3,W
MOVWF NO1_3
RETURN
;*****************************
;NO1 / NO2 TO ANSER
;RETURNS REMAINDER IN TRIAL
;32*32 DIVIDE WITH 40 BIT RESULT
DIV32_P2
CALL CLEARANSER_P2
CALL CLEARTRIAL_P2
MOVLW 32
MOVWF LOOPCOUNT
D32LOOP
;ANSER*2
CLRC
RLF ANSER_0,F
RLF ANSER_1,F
RLF ANSER_2,F
RLF ANSER_3,F
;RLF NO1 INTO TRIAL
CLRC
RLF NO1_0,F
RLF NO1_1,F
RLF NO1_2,F
RLF NO1_3,F
RLF TRIAL_0,F
RLF TRIAL_1,F
RLF TRIAL_2,F
RLF TRIAL_3,F
;NO2 COMPLEMENTED TO TEMP
COMF NO2_0,W
MOVWF TEMP_0
COMF NO2_1,W
MOVWF TEMP_1
COMF NO2_2,W
MOVWF TEMP_2
COMF NO2_3,W
MOVWF TEMP_3
;ADD 1 TO COMPLETE NEGATION OF TEMP
INCFSZ TEMP_0,F
GOTO OVER
INCFSZ TEMP_1,F
GOTO OVER
INCFSZ TEMP_2,F
GOTO OVER
INCF TEMP_3,F
OVER
;NEGATED TEMP + TRIAL TO TEMP
MOVF TRIAL_0,W ;SIMPLE ADD OF 1ST TWO BYTES
ADDWF TEMP_0,F
MOVF TRIAL_1,W
BTFSC CARRYFLAG ;IF CARRY THEN INC NEXT BYTE UP
INCFSZ TRIAL_1,W ;DONT ADD IF WAS INCED TO ZERO FROM 255 ELSE CARRY LOST
ADDWF TEMP_1,F ;IF THIS IS SKIPPED THEN CARRY IS SENT TO NEXT STAGE
MOVF TRIAL_2,W
BTFSC CARRYFLAG
INCFSZ TRIAL_2,W
ADDWF TEMP_2,F
MOVF TRIAL_3,W
BTFSC CARRYFLAG
INCFSZ TRIAL_3,W
ADDWF TEMP_3,F
BNC JSROT32
;NEW VALUE TO TRIAL
MOVF TEMP_0,W
MOVWF TRIAL_0
MOVF TEMP_1,W
MOVWF TRIAL_1
MOVF TEMP_2,W
MOVWF TRIAL_2
MOVF TEMP_3,W
MOVWF TRIAL_3
BSF ANSER_0,0
JSROT32 DECFSZ LOOPCOUNT,F
GOTO D32LOOP
RETURN
;*************************************
CLEARTRIAL_P2 CLRF TRIAL_0
CLRF TRIAL_1
CLRF TRIAL_2
CLRF TRIAL_3
RETURN
;**************************************
;MPY32_P2
;NO1 * NO2 TO ANSER
;32*32 BIT MPY WITH 32 BIT ANSWER
; CALL CLEARANSER_P2
;
; MOVLW 32
; MOVWF LOOPCOUNT
;SHADD_32
; CLRC
; RLF ANSER_0,F
; RLF ANSER_1,F
; RLF ANSER_2,F
; RLF ANSER_3,F
;NO1 SHIFT LEFT ONE BIT INTO CARRY
; CLRC
; RLF NO1_0,F
; RLF NO1_1,F
; RLF NO1_2,F
; RLF NO1_3,F
;
; BNC NOADD_32
;;;;;;;;;;;;;;;
;ADD NO2 TO ANSER
; MOVF NO2_0,W ;SIMPLE ADD OF 1ST TWO BYTES
; ADDWF ANSER_0,F
;
; MOVF NO2_1,W
; BTFSC CARRYFLAG ;IF CARRY THEN INC NEXT BYTE UP
; INCFSZ NO2_1,W ;DONT ADD IF WAS INCED TO ZERO FROM 255 ELSE CARRY LOST
; ADDWF ANSER_1,F ;IF THIS IS SKIPPED THEN CARRY IS SENT TO NEXT STAGE
;
; MOVF NO2_2,W
; BTFSC CARRYFLAG
; INCFSZ NO2_2,W
; ADDWF ANSER_2,F
;
; MOVF NO2_3,W
; BTFSC CARRYFLAG
; INCFSZ NO2_3,W
; ADDWF ANSER_3,F
;
;;;;;;
;NOADD_32
; DECFSZ LOOPCOUNT,F
; GOTO SHADD_32
;
; RETURN
;*****************************************
ANSER_TO_NO2_P2 MOVF ANSER_0,W
MOVWF NO2_0
MOVF ANSER_1,W
MOVWF NO2_1
MOVF ANSER_2,W
MOVWF NO2_2
MOVF ANSER_3,W
MOVWF NO2_3
RETURN
;******************************
;ADD NO1+NO2 TO ANSER
;NO1 AND NO2 ARE LEFT INTACT
;CARRYOUT HOLDS CARRY STATUS
;ADD32_P2 ;
;
;MOVE NO2 TO ANSER
; MOVF NO2_0,W
; MOVWF ANSER_0
; MOVF NO2_1,W
; MOVWF ANSER_1
; MOVF NO2_2,W
; MOVWF ANSER_2
; MOVF NO2_3,W
; MOVWF ANSER_3
;ADD NO1 TO ANSER
; MOVF NO1_0,W ;SIMPLE ADD OF 1ST TWO BYTES
; ADDWF ANSER_0,F
;
; MOVF NO1_1,W
; BTFSC CARRYFLAG ;IF CARRY THEN INC NEXT BYTE UP
; INCFSZ NO1_1,W ;DONT ADD IF WAS INCED TO ZERO FROM 255 ELSE CARRY LOST
; ADDWF ANSER_1,F ;IF THIS IS SKIPPED THEN CARRY IS SENT TO NEXT STAGE
;
; MOVF NO1_2,W
; BTFSC CARRYFLAG
; INCFSZ NO1_2,W
; ADDWF ANSER_2,F
;
; MOVF NO1_3,W
; BTFSC CARRYFLAG
; INCFSZ NO1_3,W
; ADDWF ANSER_3,F
;
; BCF CARRYOUT ;STORE CARRY FOR LATER USE
; SKPNC
; BSF CARRYOUT
; RETURN
;*************************************
CLEARNO_P2W
CLRF NO2_0
CLRF NO2_1
CLRF NO2_2
CLRF NO2_3
MOVWF NO2_0
RETURN
;************************************
#9
nigelwright7557
Super Member
  • Total Posts : 471
  • Reward points : 0
  • Joined: 2006/11/06 08:15:51
  • Location: 0
  • Status: offline
Re: Unsigned 32 by 32 divide routine in assembler 2020/09/01 22:27:42 (permalink)
4 (2)
There are application notes on maths routines.
 
I once did a project using application note fast fourier transform.
Worked very badly.
I worked through it and in the end it was Microchip AN multiply routine didnt set sign flag correctly after multiply.
 
#10
MBedder
Circuit breaker
  • Total Posts : 6919
  • Reward points : 0
  • Joined: 2008/05/30 11:24:01
  • Location: Zelenograd, Russia
  • Status: offline
Re: Unsigned 32 by 32 divide routine in assembler 2020/09/01 23:30:04 (permalink)
5 (3)
nigelwright7557Here is some assembler code I wrote in 1990's...
This is a PIC24 forum!


#11
nigelwright7557
Super Member
  • Total Posts : 471
  • Reward points : 0
  • Joined: 2006/11/06 08:15:51
  • Location: 0
  • Status: offline
Re: Unsigned 32 by 32 divide routine in assembler 2020/09/01 23:59:33 (permalink)
1 (4)
MBedder
This is a PIC24 forum!

Come on, then convert it !
I am not doing the job for free, have too many more fruitful projects on the go.
 
#12
peterg1000
Super Member
  • Total Posts : 192
  • Reward points : 0
  • Joined: 2009/01/29 13:07:52
  • Location: Flamstead, Herts, UK
  • Status: offline
Re: Unsigned 32 by 32 divide routine in assembler 2020/09/02 01:55:44 (permalink)
3 (1)
Thanks guys for all the replies - I've sent a PM to "1and0" on the subject and will, with his permission, pass on any useful information.
 
All this is ito calculate a stepper motor speed profile using the "standard" equation :-
  Cn = Cn-1 - (2Cn-1)/(4n -1). 
This quickly runs out of resolution for large values of n, so the extra quotient bits will give me a "partial step accumulator" to stretch the acceleration profile.
 
 
post edited by peterg1000 - 2020/09/02 01:58:02
#13
1and0
Access is Denied
  • Total Posts : 11340
  • Reward points : 0
  • Joined: 2007/05/06 12:03:20
  • Location: Harry's Gray Matter
  • Status: offline
Re: Unsigned 32 by 32 divide routine in assembler 2020/09/02 05:17:10 (permalink)
0
NorthGuy
Harry (1and0) has posted the solution few years back which uses the division command - it's just few lines of assembler.


I wish I could take credit, but I don't recall doing it.  Amnesia perhaps :(
 
#14
1and0
Access is Denied
  • Total Posts : 11340
  • Reward points : 0
  • Joined: 2007/05/06 12:03:20
  • Location: Harry's Gray Matter
  • Status: offline
Re: Unsigned 32 by 32 divide routine in assembler 2020/09/02 05:25:02 (permalink)
5 (4)
peterg1000
I found I needed a 32 by 32 bit divide routine ... Processor is a 24FJ64GA102.
 
 .global Div_32_32
;
;Divide 32 by 32 Execution time:- max 538 cycles
;Numerator w1(H):w2(L)
;Divisor w3(H):w4(L)
;Base index w5(H);w6(L)
;Result w7(H):w8(L)
;Scratch W0, w9(H):w10(L)
;
Div_32_32:
; Set base index
 mov #0x0000,w5 ; Base index high
 mov #0x0001,w6 ; base index low
;
; Align divisor and base index with most significant bit of numerator
; Reject invalid data (worst case alignment approx 128 cycles 8uS)
 bclr Systat1,#9 ; Reset flag word before proceeding
 cp0 w1 ; Numerator MSW = 0 ?
 bra z,Div_xit ; Numerator 16 bits or less - 16 bit divide
 cp0 w3 ; Divisor MSW = 0
 bra NZ,Align ; OK to align MSW's directly
 mov w4,w3 ; Swap data words
 clr w4 ;
 mov w6,w5 ; Swap index words
 clr w6 ;
 bset Systat1,#9 ; Use as flag word
;
; Both numerator and divisor now in high word position
Align: ff1l w1,w0 ; Find first from left - Numerator
 ff1l w3,w7 ; Find first from left - divisor
 sub w7,w0,w0 ; Difference
 bra NN,Lshift ; Divisor < numerator - shift left
;
Rshift: btst Systat1,#9 ; Check flag word
 bra Z,Div_xit ; Exit if original divisor > numerator
;
; Align most signifiant bits by shifting divisor and base index right
 clr w7 ; Clear result
 clr w8 ;
Rrpt: bclr SR,#C ; Clear carry in preparation for right rotate.
 rrc w3,w3 ; Shift divisor right
 rrc w4,w4 ;
 bclr SR,#C ; Clear carry
 rrc w5,w5 ; Shift index right
 rrc w6,w6 ;
 inc w0,w0 ; Done?
 bra NZ,Rrpt ; Loop
 bra Div_2
;
Lshift: clr w7 ; Clear result
 clr w8 ;
Lrpt: sl w4,w4 ; Divisor low word msb to C
 rlc w3,w3 ; C to Divisor high word lsb
 sl w6,w6 ; msb to C
 rlc w5,w5 ; C to lsb
 dec w0,w0 ; Done?
 bra NZ,Lrpt ; Loop
;
; Now do the divide with msb of numerator and divisor aligned
; Trial subtract - discard result and shift divisor right if result negative
Div_2: sub w2,w4,w10
 subb w1,w3,w9
 bra NN,Div_3 ; Do it again for real if result is positive
 bclr SR,#C ; Clear carry in preparation for right rotate.
 rrc w3,w3 ; Shift divisor right for next try
 rrc w4,w4 ;
 bclr SR,#C ; Clear carry
 rrc w5,w5 ; Shift base index right
 rrc w6,w6 ;
;
 add w5,w6,w0 ; If base index now zero - divide is complete
 bra Z,Div_xit ; Exit if index shifted out
 bra Div_2 ; Loop until subtraction leaves a positive result
;
; Real subtract
Div_3: sub w2,w4,w2 ; Nunerator updated after subtract
 subb w1,w3,w1 ;
;
; Add base index to result ;
 add w6,w8,w8 ;
 addc w5,w7,w7 ;
;
; Shift both divisor and base index right
 bclr SR,#C ; Clear carry in preparation for right shift
 rrc w3,w3 ; Shift divisor right and try again
 rrc w4,w4 ;
 bclr SR,#C ; Clear carry
 rrc w5,w5 ; Shift index right
 rrc w6,w6 ;
;
 add w5,w6,w0 ; If base index now zero - divide completed
 bra NZ,Div_2 ; Do next trial subtraction
;
Div_xit:
 nop
 nop
 bra Div_32_32


 
Here's one I just wrote that takes 421 cycles maximum and a lot less memory:
;
; Unsigned Integer Division of 32-bit by 32-bit
;
; Calling Parameters
;   w1:w0 = Dividend
;   w3:w2 = Divisor
;
; Return Values
;   w1:w0 = Quotient
;
; Used Registers
;   w6,w5,w4
;
; Tcy = 3 + (32*13-1) + 3 = 421 max
;
udiv3232:
        mul.uu  w4,#0,w4 ; clear upper dividend
        mov     #32,w6   ; count = 32
        bset    SR,#C    ; assume quotient bit = 1

0:      rlc     w0,w0    ; shift out dividend bit
        rlc     w1,w1
        rlc     w4,w4
        rlc     w5,w5

        sub     w4,w2,w4 ; subtract
        subb    w5,w3,w5
        bra     c,1f     ; if bit is 1, no restore needed

        add     w4,w2,w4 ; restore
        addc    w5,w3,w5
        bclr    w0,#0    ; quotient bit = 0

1:      dec     w6,w6    ; decrement counter
        bra     nz,0b    ; loop till finish
        return

#15
peterg1000
Super Member
  • Total Posts : 192
  • Reward points : 0
  • Joined: 2009/01/29 13:07:52
  • Location: Flamstead, Herts, UK
  • Status: offline
Re: Unsigned 32 by 32 divide routine in assembler 2020/09/02 07:13:40 (permalink)
0
Hi 1and0,
 
Thanks so much for your quick response to my request - much appreciated.  Your solution looks amazingly compact - problem now is to get an ageing brain to understand the subtleties!!
 
Cheers.
Peter.
#16
GlennP
Super Member
  • Total Posts : 833
  • Reward points : 0
  • Joined: 2009/03/29 15:04:55
  • Location: El Paso County, CO, USA
  • Status: offline
Re: Unsigned 32 by 32 divide routine in assembler 2020/09/02 12:42:16 (permalink)
0
1and0 ...
Here's one I just wrote that takes 421 cycles maximum and a lot less memory:
...



I compared Harry's and the XC16 code (see attachment).  They are the same length, but (if my analysis is correct) Harry's is 32 cycles faster.  Double-plus-good.  5 Stars!
 
[Dear XC16 Team: Check out Harry's version.]
 
Non-Restoring would be faster still, but longer and more difficult to understand.
 
I'll code up Non-Restoring and see how many cycles can be saved although that doesn't seem to be an issue for the OP.
 
GP
post edited by GlennP - 2020/09/02 12:54:00

Attached Image(s)

#17
Spinlectrix
Junior Member
  • Total Posts : 72
  • Reward points : 0
  • Joined: 2016/03/20 14:48:37
  • Location: 0
  • Status: offline
Re: Unsigned 32 by 32 divide routine in assembler 2020/09/02 13:45:34 (permalink)
5 (3)
Ok, I haven't fully tested this routine yet, and it does somehow seem a bit too simple (am I missing something?), but it does appear to function properly at first inspection.
It uses one fewer register, and is approximately ten times faster than using looped trial subtraction:

; Unsigned Integer Division of 32-bit by 32-bit
;
; Calling Parameters
;   w1:w0 = Dividend
;   w3:w2 = Divisor
;
; Return Values
;   w1:w0 = Quotient
;
; Used Registers
;  w5,w4
;
; Tcy (max) = 4 + 19 + 2 + 19 + 1 = 45
;
udiv3232:
   mov.d w0,w4
   ff1l w3,w1
   bra C,0f
; Here the hi word of the divisor is non-zero, so the result of this long division would
; fit into a ushort (hi word of quotient must be zero) -- calculate shifts required:
   subr w1,#17,w1
   subr w1,#16,w0
;Shift divisor to move the most-significant-set bit of its value as a long to the MSb of its lo word
   lsr w2,w1,w2
   sl w3,w0,w3
   ior w2,w3,w2
;Apply corresponding shifts to dividend (as a result preserving the ratio of numerator to denominator)
   lsr w4,w1,w4
   sl w5,w0,w3
   lsr w5,w1,w5
   ior w4,w3,w4
;Perform division
   repeat #17
   div.ud w4,w2
   clr w1
   bra 1f
; Handle case where result _may_ not fit into a word -- Here the hi word of the divisor is zero:
; First divide hi word of the dividend by the divisor
0: repeat #17
   div.uw w5,w2
   mov w0,w3 ; Save result (hi word of quotient)
   mov w1,w5 ; Replace the hi word of the dividend with the remainder.
;Then follow up with a long divide:
   repeat #17
   div.ud w4,w2 ; Calculate lo word of quotient
   mov w3,w1    ; Restore hi word of quotient
1:  return

post edited by Spinlectrix - 2020/09/02 15:28:34
#18
GlennP
Super Member
  • Total Posts : 833
  • Reward points : 0
  • Joined: 2009/03/29 15:04:55
  • Location: El Paso County, CO, USA
  • Status: offline
Re: Unsigned 32 by 32 divide routine in assembler 2020/09/02 15:09:34 (permalink)
0
Excellent.  I'll do more analysis tonight, but this is exactly what I was hoping for in Post #2.
 
I tried this as a PM, but they are blocked: Change ";Shift devisor" to ";Shift divisor".
 
GP
post edited by GlennP - 2020/09/02 15:13:06

Attached Image(s)

#19
Spinlectrix
Junior Member
  • Total Posts : 72
  • Reward points : 0
  • Joined: 2016/03/20 14:48:37
  • Location: 0
  • Status: offline
Re: Unsigned 32 by 32 divide routine in assembler 2020/09/02 16:26:43 (permalink)
5 (1)
Here is an inline c function call version of that code, for those who'd rather just write in c.
(unsigned long) quotient_variable_name =  UlongDIV( (unsigned long) numerator_variable_name, (unsigned long) denominator_variable_name);
 
Provides a unsigned long by unsigned long divide in a maximum of 43 cycles on a PIC24F:
 
struct DoubleWord {
  int16_t LoWord;
  int16_t HiWord;
};
struct UDoubleWord {
   uint16_t ULoWord;
   uint16_t UHiWord;
};
union DoubleWordUnion {
   long LongWord;
   unsigned long ULongword;
   struct DoubleWord Words;
   struct UDoubleWord UWords;
};
static inline __attribute__ ((always_inline))
unsigned long UlongDIV(unsigned long numerator, unsigned long denominator) {
union DoubleWordUnion quotient;
asm __volatile__ (
   " ff1l %d3,w1 \n bra C,0f \n" // Determine # of significant bits in hi word of denominator (bra iff none)
   " subr w1,#17,w1 \n subr w1,#16,w0 \n" // Hi word of quotient will be zero; Calculate # of shifts required.
   " lsr %3,w1,%3 \n sl %d3,w0,%d3 \n ior %d3,%3,%3 \n" // Shift all significant bits into denominator lo word.
   " lsr %2,w1,%2 \n sl %d2,w0,%d3 \n lsr %d2,w1,%d2 \n ior %2,%d3,%2 \n" // Correspondingly shift numerator hi and lo words.
   " clr %d3 \n  bra 1f \n" // Go perform division.
"0:  repeat #17 \n div.uw %d2,%3 \n" // Hi word of denominator is zero. Perform division of numerator hi word by denominator
   " mov w1,%d2 \n mov w0,%d3 \n" // Save hi word of quotient; replace hi word of numerator with remainder
"1:  repeat #17 \n div.ud %2,%3 \n mov %d3,w1" // Long divide to calculate lo word of quotient; restore hi word of quotient
/*             w0                               w1                                         %d2:%2            %d3:%3    */
:/*outputs*/ "=&a" (quotient.UWords.ULoWord), "=&b" (quotient.UWords.UHiWord): /*inputs*/ "C" (numerator), "e" (denominator) :/*clobbers*/ "cc");
return (quotient.ULongword);
}

post edited by Spinlectrix - 2020/09/02 21:43:27
#20
Page: 12345.. > >> Showing page 1 of 7
Jump to:
© 2020 APG vNext Commercial Version 4.5