How to stop PIC18F26K40 32-bit multiply from using software loop with XC8 V2.1

Author
trossin
Starting Member
  • Total Posts : 38
  • Reward points : 0
  • Joined: 2006/06/02 11:31:50
  • Location: 0
  • Status: offline
2020/02/28 16:29:10 (permalink)
0

I tried optimization levels 0 and 2 with the same result: the 32-bit multiply does not use the 8-bit hardware multiplier.
When I step through with the debugger, the 16-bit and 32-bit multiplies go to Umul16.c and Umul32.c respectively.
The 16-bit multiply uses the 8-bit hardware multiplier and gets the job done pretty fast, while the 32-bit
multiply does not use the hardware at all and slowly calculates the answer using a software
shift-and-add loop. The source code in Umul32.c behaves as though _PIC18 is not defined. Defining
this macro in the compiler options has no effect. As a test, I placed a deliberate syntax error between
#ifdef _PIC18 and #endif, which showed that _PIC18 is already properly defined.
 
Is this a compiler bug? For the short term, I'll work around it by breaking my 32-bit multiplies into four
16-bit multiplies with some long adds, in the hope that I can get it down to 150 cycles from the 416 cycles.
 
Here is my test code:
 
unsigned char Val,Val2,PosD;
Val = 0x1f; Val2 = 0x2f;
PosD = Val*Val2; // 4 cycles (HW mult)

int a=0x2345, b=0x3456;
//uses "C:\Program Files (x86)\Microchip\xc8\v2.10\pic\sources\c99\common\Umul16.c"
Integral = a*b; // 26 cycles (HW mult with software help Umul16.c seems that _PIC18 is defined)

long c=0x12345, d=0x23456;
// uses "C:\Program Files (x86)\Microchip\xc8\v2.10\pic\sources\c99\common\Umul32.c"
// which does not seem to have _PIC18 defined so it uses software multiplication instead of HW
// and this takes 416 cycles
Kd= c*d; 
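
The short-term workaround mentioned above (building the 32-bit product from 16-bit multiplies plus long adds) could look roughly like the sketch below. The name mul32_by_parts is hypothetical and not an XC8 library function, and its cycle count on a PIC18 has not been measured. Because only the low 32 bits of the result are kept, the high-by-high term drops out, so three 16-bit multiplies are enough rather than four. Whether XC8 compiles the widening multiplies here into hardware-assisted code is an untested assumption.

```c
#include <stdint.h>

/* Hypothetical helper, not part of XC8: low 32 bits of a * b,
   built from the 16-bit halves of each operand. */
uint32_t mul32_by_parts(uint32_t a, uint32_t b)
{
    uint16_t a_lo = (uint16_t)a, a_hi = (uint16_t)(a >> 16);
    uint16_t b_lo = (uint16_t)b, b_hi = (uint16_t)(b >> 16);

    /* 16x16 -> 32: the only partial product that is needed in full. */
    uint32_t low = (uint32_t)a_lo * b_lo;

    /* Cross products: only their low 16 bits survive the shift by 16. */
    uint16_t cross = (uint16_t)((uint32_t)a_lo * b_hi + (uint32_t)a_hi * b_lo);

    /* a_hi * b_hi would be shifted left by 32 bits, so it is dropped. */
    return low + ((uint32_t)cross << 16);
}
```

With the test values from the post, mul32_by_parts(0x12345, 0x23456) returns the same truncated 32-bit result that c*d produces with unsigned operands.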
 
 
 
#1

    trossin
    Re: How to stop PIC18F26K40 32-bit multiply from using software loop with XC8 V2.1 2020/02/28 16:49:46 (permalink)
    0
    So, it seems that one has to pay more money to get fast 32-bit multiplies, or else copy the source (Umul32.c) and either define
    __OPTIMIZE_SPEED__ or remove that term from the condition below.
    This line is in the source for Umul32.c. I had missed that __OPTIMIZE_SPEED__ (double underscore before and after)
    must be defined.
     
        #if (_Has_hardware_multiply || _Has_large_call_stack) && defined(__OPTIMIZE_SPEED__)
     
    I tried defining this macro in the compiler options and it still used the software loop. So the copy method is the only way, and you can't use the * operator on 32-bit variables. Instead you have to call a function.
    Change: 
        Kd= c*d; // 426 cycles
    to:
        Kd = Mylmul(c,d);  // 78 cycles.
     
    This should speed up my PID control loop by a crap load!
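
For readers wondering what a hardware-friendly 32-bit multiply looks like, here is a minimal sketch of longhand multiplication built from 8x8 partial products, the operand size the PIC18 hardware multiplier handles. It is not the actual contents of Umul32.c or of the Mylmul copy mentioned above; the name lmul_bytewise is hypothetical and the code only illustrates the general idea.

```c
#include <stdint.h>

/* Hypothetical sketch: low 32 bits of a * b from 8x8 -> 16 partial products.
   Each partial product maps onto one hardware 8x8 multiply on a PIC18. */
uint32_t lmul_bytewise(uint32_t a, uint32_t b)
{
    uint8_t x[4], y[4];
    uint32_t product = 0;
    int i, j;

    /* Split both operands into bytes, least significant first. */
    for (i = 0; i < 4; i++) {
        x[i] = (uint8_t)(a >> (8 * i));
        y[i] = (uint8_t)(b >> (8 * i));
    }

    /* Sum x[i] * y[j] * 256^(i+j); terms with i + j >= 4 fall entirely
       outside the low 32 bits, leaving 10 partial products. */
    for (i = 0; i < 4; i++) {
        for (j = 0; i + j < 4; j++) {
            uint32_t partial = (uint32_t)((uint16_t)x[i] * y[j]);
            product += partial << (8 * (i + j));
        }
    }
    return product;
}
```

A fixed count of 10 small multiplies, instead of up to 32 loop passes, is the kind of difference that would explain a drop in cycle count like the one reported above.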
     
     
     
    #2
    Mysil
    Super Member
    • Total Posts : 3666
    • Reward points : 0
    • Joined: 2012/07/01 04:19:50
    • Location: Norway
    • Status: offline
    Re: How to stop PIC18F26K40 32-bit multiply from using software loop with XC8 V2.1 2020/02/28 19:45:28 (permalink)
    +2 (2)
    Hi,
    By copying the source code for Umul32.c
    from the compiler installation into the project's source code, and including it in the project,
    you can define __OPTIMIZE_SPEED__ directly in the source,
    or in MPLAB X under 'Project Properties' > 'XC8 Compiler' > 'Preprocessing and Messages' > 'Define Macros': __OPTIMIZE_SPEED__=1
     
    Then it works for me, using the hardware multiplier.
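
As a rough illustration of why the define matters, the sketch below repeats the guard condition quoted from Umul32.c and reports which branch the preprocessor picks. The stand-in value for _Has_hardware_multiply and the helper lmul_uses_fast_path are assumptions added only so the example compiles on its own; in a real project the define would sit at the top of the copied Umul32.c or in the project's macro settings.

```c
/* Stand-in so the sketch also compiles on a host compiler. On XC8 this
   feature macro is assumed to come from the compiler, not from user code. */
#ifndef _Has_hardware_multiply
#define _Has_hardware_multiply 1
#endif

/* The define added at the top of the project's copy of Umul32.c. */
#ifndef __OPTIMIZE_SPEED__
#define __OPTIMIZE_SPEED__ 1
#endif

/* Same condition as the guard quoted from Umul32.c. An undefined
   _Has_large_call_stack evaluates as 0 inside #if. */
#if (_Has_hardware_multiply || _Has_large_call_stack) && defined(__OPTIMIZE_SPEED__)
#define LMUL_FAST_PATH 1
#else
#define LMUL_FAST_PATH 0
#endif

/* Hypothetical helper: reports which branch the preprocessor selected. */
int lmul_uses_fast_path(void)
{
    return LMUL_FAST_PATH;
}
```

Removing the __OPTIMIZE_SPEED__ define makes the condition false, which matches the behaviour seen when the library copy is built without it.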
     
        Mysil
    post edited by Mysil - 2020/02/28 19:46:57

    #3
    trossin
    Re: How to stop PIC18F26K40 32-bit multiply from using software loop with XC8 V2.1 2020/02/28 22:52:14 (permalink)
    0
    Super. Thanks for the work-around
    #4
    mlp
    boots too small
    • Total Posts : 884
    • Reward points : 0
    • Joined: 2012/09/10 15:12:07
    • Location: previously Microchip XC8 team
    • Status: offline
    Re: How to stop PIC18F26K40 32-bit multiply from using software loop with XC8 V2.1 2020/03/01 21:54:49 (permalink)
    +1 (1)
    trossin
    and define
    __OPTIMIZE_SPEED__

    Did you select the project option (unsure of the exact name) that tells the compiler to optimize for speed rather than space? That option should cause the macro __OPTIMIZE_SPEED__ to be defined when building, and I thought it was orthogonal to the (license limited) options for optimization level.
     
    Umul32.c
    ...
    This should speed up my PID control loop by a crap load!

    Watch out, I'll get a swelled head.
    I'm just pleased somebody gets some benefit from my little exercise in longhand multiplication.

    Mark (this opinion available for hire)
    #5
    Mysil
    Re: How to stop PIC18F26K40 32-bit multiply from using software loop with XC8 V2.1 2020/03/05 10:34:37 (permalink)
    0
    Hi,
    In MPLAB X v5.30, using C99 mode,
    there no longer seems to be any 'optimize speed' option flag available under 'XC8 Compiler'.
    And when Umul32.c is not compiled as part of the project source code,
    command-line macros defined with a -D argument do not seem to be applied when the library function is compiled.
     
    Here is snippet from .lst file:
     1527 ;; *************** function ___lmul *****************
      1528 ;; Defined at:
      1529 ;;        line 15 in file "C:\Program Files (x86)\Microchip\xc8\v2.10\pic\sources\c99\common\Umul32.c"
      1530 ;; Parameters:    Size  Location     Type
      1531 ;;  multiplier      4    4[COMRAM] unsigned long
      1532 ;;  multiplicand    4    8[COMRAM] unsigned long
      1533 ;; Auto vars:     Size  Location     Type
      1534 ;;  product         4   12[COMRAM] unsigned long
      1535 ;; Return value:  Size  Location     Type
      1536 ;;                  4    4[COMRAM] unsigned long
      1537 ;; Registers used:
      1538 ;;        wreg, status,2, status,0
      1539 ;; Tracked objects:
      1540 ;;        On entry : 0/0
      1541 ;;        On exit  : 0/0
      1542 ;;        Unchanged: 0/0
      1543 ;; Data sizes:     COMRAM   BANK0   BANK1   BANK2   BANK3   BANK4   BANK5   BANK6   BANK7   BANK8   BANK9  BANK10  BANK1
          +1  BANK12  BANK13  BANK14  BANK15
      1544 ;;      Params:         8       0       0       0       0       0       0       0       0       0       0       0       
          +0       0       0       0       0
      1545 ;;      Locals:         4       0       0       0       0       0       0       0       0       0       0       0       
          +0       0       0       0       0
      1546 ;;      Temps:          0       0       0       0       0       0       0       0       0       0       0       0       
          +0       0       0       0       0
      1547 ;;      Totals:        12       0       0       0       0       0       0       0       0       0       0       0       
          +0       0       0       0       0
      1548 ;;Total ram usage:       12 bytes
      1549 ;; Hardware stack levels used:    1
      1550 ;; Hardware stack levels required when called:    3
      1551 ;; This function calls:
      1552 ;;        Nothing
      1553 ;; This function is called by:
      1554 ;;        _MultiplicationTest
      1555 ;; This function uses a non-reentrant model
      1556 ;;
      1557                           
      1558                               psect    text14
      1559  0015A4                     __ptext14:
      1560                               opt callstack 0
      1561  0015A4                     ___lmul:
      1562                               opt callstack 26
      1563                           
      1564                           ;incstack = 0
      1565  0015A4  0E00                   movlw    0
      1566  0015A6  6E0D                   movwf    ___lmul@product^0,c
      1567  0015A8  0E00                   movlw    0
      1568  0015AA  6E0E                   movwf    (___lmul@product+1)^0,c
      1569  0015AC  0E00                   movlw    0
      1570  0015AE  6E0F                   movwf    (___lmul@product+2)^0,c
      1571  0015B0  0E00                   movlw    0
      1572  0015B2  6E10                   movwf    (___lmul@product+3)^0,c
      1573  0015B4                     l3027:
      1574  0015B4  A005                   btfss    ___lmul@multiplier^0,0,c
      1575  0015B6  D008                   goto    l3031
      1576  0015B8  5009                   movf    ___lmul@multiplicand^0,w,c
      1577  0015BA  260D                   addwf    ___lmul@product^0,f,c
      1578  0015BC  500A                   movf    (___lmul@multiplicand+1)^0,w,c
      1579  0015BE  220E                   addwfc    (___lmul@product+1)^0,f,c
      1580  0015C0  500B                   movf    (___lmul@multiplicand+2)^0,w,c
      1581  0015C2  220F                   addwfc    (___lmul@product+2)^0,f,c
      1582  0015C4  500C                   movf    (___lmul@multiplicand+3)^0,w,c
      1583  0015C6  2210                   addwfc    (___lmul@product+3)^0,f,c
      1584  0015C8                     l3031:
      1585  0015C8  90D8                   bcf    status,0,c
      1586  0015CA  3609                   rlcf    ___lmul@multiplicand^0,f,c
      1587  0015CC  360A                   rlcf    (___lmul@multiplicand+1)^0,f,c
      1588  0015CE  360B                   rlcf    (___lmul@multiplicand+2)^0,f,c
      1589  0015D0  360C                   rlcf    (___lmul@multiplicand+3)^0,f,c
      1590  0015D2  90D8                   bcf    status,0,c
      1591  0015D4  3208                   rrcf    (___lmul@multiplier+3)^0,f,c
      1592  0015D6  3207                   rrcf    (___lmul@multiplier+2)^0,f,c
      1593  0015D8  3206                   rrcf    (___lmul@multiplier+1)^0,f,c
      1594  0015DA  3205                   rrcf    ___lmul@multiplier^0,f,c
      1595  0015DC  5005                   movf    ___lmul@multiplier^0,w,c
      1596  0015DE  1006                   iorwf    (___lmul@multiplier+1)^0,w,c
      1597  0015E0  1007                   iorwf    (___lmul@multiplier+2)^0,w,c
      1598  0015E2  1008                   iorwf    (___lmul@multiplier+3)^0,w,c
      1599  0015E4  A4D8                   btfss    status,2,c
      1600  0015E6  D7E6                   goto    l3027
      1601  0015E8  C00D  F005             movff    ___lmul@product,?___lmul
      1602  0015EC  C00E  F006             movff    ___lmul@product+1,?___lmul+1
      1603  0015F0  C00F  F007             movff    ___lmul@product+2,?___lmul+2
      1604  0015F4  C010  F008             movff    ___lmul@product+3,?___lmul+3
      1605  0015F8  0012                   return        ;funcret
      1606  0015FA                     __end_of___lmul:
      1607                               opt callstack 0
      1608 

    The code above shows the software shift-and-add algorithm being used.
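
Read as C, the listing corresponds roughly to the loop below. This is a hand translation for illustration, not the actual contents of Umul32.c, and the name lmul_shift_add is hypothetical. The loop runs once per bit, up to the highest set bit of the multiplier, which is why the cycle count is large and depends on the operand.

```c
#include <stdint.h>

/* Hand translation of the listed ___lmul loop (hypothetical name). */
uint32_t lmul_shift_add(uint32_t multiplier, uint32_t multiplicand)
{
    uint32_t product = 0;

    do {
        /* btfss multiplier,0: add only when the low bit is set. */
        if (multiplier & 1u)
            product += multiplicand;

        /* rlcf chain: shift the multiplicand left by one bit. */
        multiplicand <<= 1;

        /* rrcf chain: shift the multiplier right by one bit. */
        multiplier >>= 1;

        /* iorwf chain + btfss status,2: repeat until no bits remain. */
    } while (multiplier != 0);

    return product;
}
```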
     
        Mysil
    #6
    1and0
    Access is Denied
    • Total Posts : 10550
    • Reward points : 0
    • Joined: 2007/05/06 12:03:20
    • Location: Harry's Gray Matter
    • Status: offline
    Re: How to stop PIC18F26K40 32-bit multiply from using software loop with XC8 V2.1 2020/03/05 11:20:07 (permalink)
    0
    trossin
    Here is my test code:
     
    unsigned char Val,Val2,PosD;
    Val = 0x1f; Val2 = 0x2f;
    PosD = Val*Val2; // 4 cycles (HW mult)

    int a=0x2345, b=0x3456;
    //uses "C:\Program Files (x86)\Microchip\xc8\v2.10\pic\sources\c99\common\Umul16.c"
    Integral = a*b; // 26 cycles (HW mult with software help Umul16.c seems that _PIC18 is defined)

    long c=0x12345, d=0x23456;
    // uses "C:\Program Files (x86)\Microchip\xc8\v2.10\pic\sources\c99\common\Umul32.c"
    // which does not seem to have _PIC18 defined so it uses software multiplication instead of HW
    // and this takes 416 cycles
    Kd= c*d; 

    All these three multiplications will cause overflow in the respective Umulxx() functions.
     
    If you want guaranteed speed, code the function yourself in assembly. It is just simple summations of cross products. ;)
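
The overflow point can be checked with a small sketch that compares each truncated product against the full-width result. The function names are hypothetical, and unsigned types are used here so the wraparound is well defined in C (the test code in the original post uses signed int and long).

```c
#include <stdint.h>

/* Products truncated to the operand width, as in the quoted test code. */
uint8_t  mul8(uint8_t a, uint8_t b)    { return (uint8_t)(a * b); }
uint16_t mul16(uint16_t a, uint16_t b) { return (uint16_t)((uint32_t)a * b); }
uint32_t mul32(uint32_t a, uint32_t b) { return a * b; }

/* Full-width reference product for comparison. */
uint64_t mul_full(uint32_t a, uint32_t b) { return (uint64_t)a * b; }
```

For example, 0x1f * 0x2f is 0x5B1 in full, but mul8(0x1f, 0x2f) keeps only 0xB1. The 16-bit and 32-bit test values lose their upper bits in the same way.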
     
    #7
    mlp
    Re: How to stop PIC18F26K40 32-bit multiply from using software loop with XC8 V2.1 2020/03/05 20:01:55 (permalink)
    0
    Mysil
    In MPLAB X v5.30, and using C99 mode,
    There seem to no longer be any 'optimize speed' option flag available in 'XC8 Compiler',

    Argh!
     
    Oh yes, now I remember: the Clang front-end uses the GCC-derived options "-Os" to optimize for speed, and "-O1" through "-O3" for size-related optimization.
    Try selecting optimization level "s".

    Mark (this opinion available for hire)
    #8
    1and0
    Re: How to stop PIC18F26K40 32-bit multiply from using software loop with XC8 V2.1 2020/03/05 20:49:36 (permalink)
    0
    mark.pappin
    Oh yes, now I remember: the Clang front-end uses the GCC-derived options "-Os" to optimize for speed, and "-O1" through "-O3" for size-related optimization.
    Try selecting optimization level "s".

    I thought "-Os" optimizes for size, not speed.
    #9
    ric
    Super Member
    • Total Posts : 26088
    • Reward points : 0
    • Joined: 2003/11/07 12:41:26
    • Location: Australia, Melbourne
    • Status: online
    Re: How to stop PIC18F26K40 32-bit multiply from using software loop with XC8 V2.1 2020/03/05 21:41:22 (permalink)
    +1 (1)
    XC8 User Guide
    3.7.6.6 OS: LEVEL S OPTIMIZATIONS
    The -Os option requests level s optimizations.
    This option requests all supported optimizations that decrease program size. This level
    is available only for licensed compilers.

     
    XC8 User Guide
    3.7.6.1 O0: LEVEL 0 OPTIMIZATIONS
    The -O0 option disables optimization.
    With no optimizations, the compiler’s goal is to reduce the cost of compilation and to
    make debugging produce the expected results.
    3.7.6.2 O1: LEVEL 1 OPTIMIZATIONS
    The -O or -O1 options request level 1 optimizations.
    The optimizations performed when using -O1 aims to reduce code size and execution
    time, but still allows a reasonable level of debugability. This level is available for unlicensed
    as well as licensed compilers.
    3.7.6.3 O2: LEVEL 2 OPTIMIZATIONS
    The -O2 option requests level 2 optimizations.
    At this level, the compiler performs nearly all supported optimizations. This level is
    available for unlicensed as well as licensed compilers.
    3.7.6.4 O3: LEVEL 3 OPTIMIZATIONS
    The -O3 option requests level 3 optimizations.
    This option requests all supported optimizations, including procedural abstraction, that
    reduces execution time but which might increase program size. This level is available
    only for licensed compilers.

     

    #10
    mlp
    Re: How to stop PIC18F26K40 32-bit multiply from using software loop with XC8 V2.1 2020/03/06 15:46:06 (permalink)
    0
    ric
    XC8 User Guide
    3.7.6.4 O3: LEVEL 3 OPTIMIZATIONS
    The -O3 option requests level 3 optimizations.
    This option requests all supported optimizations, including procedural abstraction, that
    reduces execution time but which might increase program size. This level is available
    only for licensed compilers.


    the Friendly Manual knows all and sees all
     
    As noted, some improve-speed options can significantly increase code size. As you'll see (if I recall correctly, I left comments in UmulXX.c with speed-and-size numbers) the expanded-long-multiplication code is somewhat larger than the bitwise-loop.
     
    At least you have a workaround by which you can use the fast multiply code even with the unlicensed compiler.

    Mark (this opinion available for hire)
    #11
    trossin
    Re: How to stop PIC18F26K40 32-bit multiply from using software loop with XC8 V2.1 2020/03/06 18:41:41 (permalink)
    -1 (1)
    So it seems that what I said before is true:
    “So, it seems that one has to pay more money to get fast 32-bit multiplies”, that is, unless you copy and paste code.
    #12