
Stack underflow in C possible ?

Author
aurelienr
Starting Member
  • Total Posts : 52
  • Reward points : 0
  • Joined: 2006/05/28 06:27:02
  • Location: France
  • Status: offline
2019/05/27 08:34:19 (permalink)
0

Stack underflow in C possible ?

Hello,
I have encountered a bug in software that is already in production. So far we have not found a way to reproduce it.
Some variables are modified even though no function should be able to modify them. I can see at least about 15 modified bytes, all close to each other in address. I can't tell whether other variables are affected, since I have no means of reading them. The program seems to keep working, but something happens that should never happen.
Program runs on PIC24FJ512GA610, compiled with XC16 v1.32.
 
I'm investigating several leads:
- Buffer overflow that writes data outside a buffer. This is not the most probable case, because in the map file all the buffers are far from the variables I know are modified. The whole RAM is not corrupted, otherwise the program would no longer run. So it would more likely be an error in the MSB of a buffer pointer. But this is not the subject of my question right now.
- Stack underflow. The modified variables are located between addresses 0x4410 and 0x4413 (inclusive). The stack begins at address 0x4436. Stack length = 15306 bytes. No heap is declared and I don't use malloc functions. If a stack underflow occurred, about 10 other variables (between 0x4414 and 0x4435) should also be corrupted, but even if they were, the software could continue to work (with some edge-case bugs).
But I don't know how I could get a stack underflow condition on a PIC24 with code written entirely in C, with no inline assembly (except __builtin_nop(), __builtin_clrwdt(), __builtin_pwrsav(0), __builtin_disi(xxx))...

Any idea ?
 
Thank you
Aurelien
#1

19 Replies

    du00000001
    Just Some Member
    • Total Posts : 2792
    • Reward points : 0
    • Joined: 2016/05/03 13:52:42
    • Location: Germany
    • Status: offline
    Re: Stack underflow in C possible ? 2019/05/27 09:08:24 (permalink)
    +1 (1)
    Stack underflow? Not unless you manipulate the stack pointer (W15) or the frame pointer (W14). Manipulations of the latter might result in data corruption without any other significant effects (as W14 is discarded on return from some subroutine/interrupt).
     
    Anyway, I wouldn't expect the software to continue operating even somewhat properly once a stack underflow has occurred.
    Better to look for "free-running pointers", faulty table writes or the like.

    PEBKAC / EBKAC / POBCAK / PICNIC (eventually see en.wikipedia.org)
    #2
    NKurzman
    A Guy on the Net
    • Total Posts : 17519
    • Reward points : 0
    • Joined: 2008/01/16 19:33:48
    • Location: 0
    • Status: online
    Re: Stack underflow in C possible ? 2019/05/27 10:26:33 (permalink)
    +1 (1)
    The debugger can be set to break on a write to a specific memory location.
    I’m going to bet it’s a pointer problem also.
    #3
    aurelienr
    Starting Member
    • Total Posts : 52
    • Reward points : 0
    • Joined: 2006/05/28 06:27:02
    • Location: France
    • Status: offline
    Re: Stack underflow in C possible ? 2019/05/27 14:05:47 (permalink)
    0
    Hi,
    Thanks :)
    So far we have not managed to reproduce the issue. We only see it after some period of service (on a non-negligible number of products), through data uploaded over a daily 3G connection. Running a debugger on a product for several days or even weeks is just impossible with the ICD3; it will crash after a few hours :D My customer has had 2 products under test for 1 week, and nothing has happened so far.
    It would be easier if I could trigger a software break at the exact moment of the data corruption, to examine the current program position and memory (and send the full data contents to a UART, for example), but I don't think that's possible?
    The program is large, and checking every array access from scratch seems just impossible. I ran into an array pointer issue on another program once; the code was simple, yet I spent a lot of time finding it, even though it had only a few array accesses and I knew what kind of problem I was looking for...
     
    Aurelien
    #4
    du00000001
    Just Some Member
    • Total Posts : 2792
    • Reward points : 0
    • Joined: 2016/05/03 13:52:42
    • Location: Germany
    • Status: offline
    Re: Stack underflow in C possible ? 2019/05/27 15:37:36 (permalink)
    +1 (1)
    You can feed your code to a static code checker (e.g. Lint, some MISRA checker or even doxygen). These tools are not bad at detecting errors and/or bad coding.

    #5
    Aussie Susan
    Super Member
    • Total Posts : 3591
    • Reward points : 0
    • Joined: 2008/08/18 22:20:40
    • Location: Melbourne, Australia
    • Status: offline
    Re: Stack underflow in C possible ? 2019/05/27 19:20:39 (permalink)
    +1 (1)
    Perhaps try adding a bit of code into the main loop to look at the locations that are changing and report when they differ from your expected values. Something like the equivalent of the 'watch breakpoint' but with perhaps a few more smarts so it doesn't stop when 'expected' changes occur - especially if they might occur frequently.
    I know that this changes the 'production' code but it might give you a starting point.
    Susan
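A minimal sketch of the kind of main-loop check described above, assuming the watched variables sit in one small contiguous region. All names here (`ram_watch_t`, `ram_watch_arm`, `ram_watch_poll`, `WATCH_LEN`) are hypothetical and do not come from the original project; the idea is only that the code re-arms the watch after every legitimate update, so that "expected" changes do not trigger it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical size of the watched area, e.g. the bytes at 0x4410..0x4413. */
#define WATCH_LEN 4u

typedef struct {
    const volatile uint8_t *region;  /* start of the watched RAM area        */
    uint8_t expected[WATCH_LEN];     /* copy taken at the last legal update  */
    bool tripped;                    /* latched once a mismatch has been seen */
} ram_watch_t;

/* Call after every legitimate write to the watched variables. */
void ram_watch_arm(ram_watch_t *w, const volatile uint8_t *region)
{
    w->region = region;
    for (size_t i = 0; i < WATCH_LEN; i++) {
        w->expected[i] = region[i];
    }
    w->tripped = false;
}

/* Call from the main loop. Returns true only the first time a mismatch
 * is found, so the caller can report or dump exactly once. */
bool ram_watch_poll(ram_watch_t *w)
{
    if (w->tripped) {
        return false;
    }
    for (size_t i = 0; i < WATCH_LEN; i++) {
        if (w->region[i] != w->expected[i]) {
            w->tripped = true;
            return true;
        }
    }
    return false;
}
```

Polling only tells you that the corruption happened since the previous poll, not which instruction caused it, so the shorter the interval between polls, the narrower the window to search.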
    #6
    aurelienr
    Starting Member
    • Total Posts : 52
    • Reward points : 0
    • Joined: 2006/05/28 06:27:02
    • Location: France
    • Status: offline
    Re: Stack underflow in C possible ? 2019/05/28 01:40:16 (permalink)
    0
    Hello,
    Since the program continues to work even when some variables are corrupted, I guess this is not a large overwrite. Some variables located close to the corrupted ones are not modified (and the stack just after them does not seem corrupted), so it seems only a few bytes are affected. The buffers are far away, so I would expect a problem like an uninitialized pointer rather than a buffer write overflow.
    Something strange is that the corruption often hits the variables with the same corruption value. That may help if I can get a dump and identify the data written.
    I can easily make a software modification that polls for the two classic corruptions I see (always the same), but I'm not sure that would help me understand what caused the corruption. Moreover, we may wait weeks before the problem happens... I'm also afraid that adding code to dump RAM may change the code timings and change the way the corruption happens...
    So I'm searching for a bug that only causes corruption with a few different values, and under rare conditions.
     
    I'm installing the PC-lint plugin for MPLAB X and I will see what it does...
     
    Aurelien
     
    #7
    du00000001
    Just Some Member
    • Total Posts : 2792
    • Reward points : 0
    • Joined: 2016/05/03 13:52:42
    • Location: Germany
    • Status: offline
    Re: Stack underflow in C possible ? 2019/05/28 02:19:28 (permalink)
    +1 (1)
    Provided you're calculating array indexes dynamically, your problem could stem from improper index calculation: especially using signed values for index calculation often results in "unexpected" results :)
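To illustrate the signed-index point: if an index is computed from signed values and the result goes negative, the write lands below the start of the array, which on a small MCU means in whatever variables the linker placed there. The sketch below is only an assumption about how such a check could look; `index_in_range` and `TABLE_SIZE` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 16u  /* hypothetical table length */

/* Validate a computed index before it is used to access a table.
 * The sum is done in 32 bits so that it cannot overflow a 16-bit int. */
bool index_in_range(int16_t base, int16_t offset, size_t *out)
{
    int32_t idx = (int32_t)base + (int32_t)offset;

    if (idx < 0 || idx >= (int32_t)TABLE_SIZE) {
        return false;           /* would read or write outside the table */
    }
    *out = (size_t)idx;         /* only written when the index is valid  */
    return true;
}
```

With `base = 2` and `offset = -5` the raw sum is -3; used directly as `table[base + offset]`, that would touch memory three elements before the table.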

    #8
    Chris A
    Super Member
    • Total Posts : 834
    • Reward points : 0
    • Joined: 2010/07/20 04:37:07
    • Location: 0
    • Status: offline
    Re: Stack underflow in C possible ? 2019/05/28 03:10:43 (permalink)
    0
    I'll mention this because it bit me once, and it is specific to PIC24, although your result sounds a bit too consistent for it.
     
    Check you do not have more than one interrupt that uses the 'shadow' attribute at different priorities. If you do and one preempts the other, the original register values will be trashed!
    #9
    aurelienr
    Starting Member
    • Total Posts : 52
    • Reward points : 0
    • Joined: 2006/05/28 06:27:02
    • Location: France
    • Status: offline
    Re: Stack underflow in C possible ? 2019/05/28 08:51:26 (permalink)
    0
    I don't have any shadow attribute explicitly written in the code.
    I have several interrupt sources with different priorities (with nesting enabled): 1 timer and 6 UARTs. The UART interrupts are likely to occur at the same time and interrupt each other.

    All the interrupts are generated by MCC and defined like this:
    void __attribute__ ( ( interrupt, no_auto_psv ) ) _T3Interrupt ( )
    Could that be an issue?
    #10
    du00000001
    Just Some Member
    • Total Posts : 2792
    • Reward points : 0
    • Joined: 2016/05/03 13:52:42
    • Location: Germany
    • Status: offline
    Re: Stack underflow in C possible ? 2019/05/28 09:18:25 (permalink)
    +2 (2)
    Even without the use of shadow registers, interrupts interrupting each other might pose a problem if e.g. global variables are shared between the ISRs: one might want to increment the variable, the other decrement it. If incrementing/decrementing is non-atomic and the other interrupt interrupts "at the wrong moment", "surprises" might happen.
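The "wrong moment" can be made visible on a host PC by writing the load, modify and store steps out by hand and calling the "ISR" between them. This is a deterministic simulation, not real interrupt code; `counter`, `isr_decrement` and `lost_update_demo` are hypothetical names.

```c
#include <stdint.h>

/* Shared variable, as it might be shared between two interrupt levels. */
static volatile int16_t counter;

/* Stands in for a higher-priority ISR that decrements the counter. */
static void isr_decrement(void)
{
    counter--;
}

/* Simulates "counter++" compiled as load / add / store, with the other
 * interrupt firing between the load and the store. */
int16_t lost_update_demo(int16_t start)
{
    counter = start;

    int16_t tmp = counter;      /* 1. load the current value           */
    isr_decrement();            /* 2. the other interrupt fires here   */
    tmp = (int16_t)(tmp + 1);   /* 3. modify the now stale copy        */
    counter = tmp;              /* 4. store: the decrement is lost     */

    return counter;
}
```

Starting from 10, one increment plus one decrement should leave 10, but this interleaving leaves 11. On the target, the usual remedy is to make the read-modify-write uninterruptible, for example by briefly disabling interrupts around it (the `__builtin_disi()` mentioned in the first post is one way to do that with XC16).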

    #11
    aurelienr
    Starting Member
    • Total Posts : 52
    • Reward points : 0
    • Joined: 2006/05/28 06:27:02
    • Location: France
    • Status: offline
    Re: Stack underflow in C possible ? 2019/05/28 14:59:59 (permalink)
    0
    None of the interrupts shares variables with the others. Each UART interrupt has its own buffers and pointers, and the timer interrupt only increments task/time counter variables.
    However, I noticed that some SFRs are shared between the UART interrupts, for example the IFS5 register, and also IEC5. But after analysing the listing, the operations performed on these registers in the interrupts are atomic:
    IEC5bits.U4TXIE = false; => bclr.b 0xa3, #0x1
    IEC5bits.U4RXIE = 0; => bclr.b 0xa3, #0x0
    I may keep investigating these interrupts, but I don't understand how they could corrupt just a few variables far away from the objects they handle...
    #12
    Aussie Susan
    Super Member
    • Total Posts : 3591
    • Reward points : 0
    • Joined: 2008/08/18 22:20:40
    • Location: Melbourne, Australia
    • Status: offline
    Re: Stack underflow in C possible ? 2019/05/28 19:04:17 (permalink)
    0
    Without any code it is hard to help you.
    However you can also check that parameters passed to functions are always of the correct type (no casts etc. to push wrong sized values onto the stack), near/far pointers getting mixed, pointer arithmetic and array indexing being wrong...
    Also don't assume that everything has a direct cause. For example, if you assume that your array index is always right, then perhaps a rogue pointer is actually corrupting the index before it is used.
    Are all variables that are updated in an ISR declared volatile?
    In all likelihood it will be something "obvious" when you find it - using the wrong similarly named variables etc..
    Susan
     
    #13
    aurelienr
    Starting Member
    • Total Posts : 52
    • Reward points : 0
    • Joined: 2006/05/28 06:27:02
    • Location: France
    • Status: offline
    Re: Stack underflow in C possible ? 2019/05/29 01:48:20 (permalink)
    0
    I know :) But even if it were possible for me to publish the code, I doubt anybody would read through 10K lines of code or more :) I just came here to get ideas or advice on possible origins of the corruption.
    What do you mean by "no cast" for pointers passed in functions ? I often use for example this kind of call :
    if (strncmp((const char *)shell_rx_frame, "SMODE", 5) == 0)   Shell_rx_data.cmd = CMD_SETMODE;
     
     Yes, all the variables used in interrupts are volatile. Moreover, I often declare my variables as volatile to ease debugging.
     
    I'm also curious to know how a pointer can go wild or become unreferenced?
    post edited by aurelienr - 2019/05/29 03:12:52
    #14
    aurelienr
    Starting Member
    • Total Posts : 52
    • Reward points : 0
    • Joined: 2006/05/28 06:27:02
    • Location: France
    • Status: offline
    Re: Stack underflow in C possible ? 2019/05/29 03:28:41 (permalink)
    0
    Looking further at the interrupts, I see the following code generated by MCC.
     
     

    typedef struct _TMR_OBJ_STRUCT
    {
        /* Timer Elapsed */
        bool timerElapsed;
        /* Software Counter value */
        uint8_t count;
    } TMR_OBJ;

    static TMR_OBJ tmr3_obj;

    /**
      Section: Driver Interface
    */

    void TMR3_Initialize (void)
    {
        //TMR3 0;
        TMR3 = 0x0000;
        //Period = 0.01 s; Frequency = 14745600 Hz; PR3 576;
        PR3 = 0x0240;
        //TCKPS 1:256; TON enabled; TSIDL disabled; TCS FOSC/2; TECS SOSC; TGATE disabled;
        T3CON = 0x8030;

        IFS0bits.T3IF = false;
        IEC0bits.T3IE = true;

        tmr3_obj.timerElapsed = false;
    }

    void __attribute__ ( ( interrupt, no_auto_psv ) ) _T3Interrupt ( )
    {
        /* Check if the Timer Interrupt/Status is set */

        //***User Area Begin

        // ticker function call;
        // ticker is 1 -> Callback function gets called everytime this ISR executes
        TMR3_CallBack();

        //***User Area End

        tmr3_obj.count++;
        tmr3_obj.timerElapsed = true;
        IFS0bits.T3IF = false;
    }

    void TMR3_Period16BitSet( uint16_t value )
    {
        /* Update the counter values */
        PR3 = value;
        /* Reset the status information */
        tmr3_obj.timerElapsed = false;
    }

    uint16_t TMR3_Period16BitGet( void )
    {
        return( PR3 );
    }

    void TMR3_Counter16BitSet ( uint16_t value )
    {
        /* Update the counter values */
        TMR3 = value;
        /* Reset the status information */
        tmr3_obj.timerElapsed = false;
    }

    uint16_t TMR3_Counter16BitGet( void )
    {
        return( TMR3 );
    }

    void __attribute__ ((weak)) TMR3_CallBack(void)
    {
        // Add your custom callback code here
    }

    void TMR3_Start( void )
    {
        /* Reset the status information */
        tmr3_obj.timerElapsed = false;

        /* Enable the interrupt */
        IEC0bits.T3IE = true;

        /* Start the Timer */
        T3CONbits.TON = 1;
    }

    void TMR3_Stop( void )
    {
        /* Stop the Timer */
        T3CONbits.TON = false;

        /* Disable the interrupt */
        IEC0bits.T3IE = false;
    }

    bool TMR3_GetElapsedThenClear(void)
    {
        bool status;

        status = tmr3_obj.timerElapsed;

        if(status == true)
        {
            tmr3_obj.timerElapsed = false;
        }
        return status;
    }

    int TMR3_SoftwareCounterGet(void)
    {
        return tmr3_obj.count;
    }

    void TMR3_SoftwareCounterClear(void)
    {
        tmr3_obj.count = 0;
    }




     
     
    I don't understand why the object is declared as static while it is accessed in both the ISR and normal program operations. In my case I don't care about the object since I don't use it, but I'd like to understand why it is not declared as volatile.
    #15
    NKurzman
    A Guy on the Net
    • Total Posts : 17519
    • Reward points : 0
    • Joined: 2008/01/16 19:33:48
    • Location: 0
    • Status: online
    Re: Stack underflow in C possible ? 2019/05/29 06:05:35 (permalink)
    0
    It should be static and volatile.
    Static limits the scope.
    Not the most efficient code.
    #16
    aurelienr
    Starting Member
    • Total Posts : 52
    • Reward points : 0
    • Joined: 2006/05/28 06:27:02
    • Location: France
    • Status: offline
    Re: Stack underflow in C possible ? 2019/05/29 08:13:36 (permalink)
    0
    I'm developing a software version that dumps approx. 100 variables, including state machines etc., into a RAM buffer that already exists (and is not used by other functions), to avoid any change to the RAM memory map.
    I will write the dump only once, as soon as the bug is detected (I will check for it after every task execution). I hope I will find something and that "this something" will happen soon...
    This is the only lead I have for getting more information on what's happening...
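A rough sketch of the "write the dump only once" idea, assuming the state to capture can be copied as a block of bytes. `dump_t`, `dump_capture_once` and `DUMP_CAPACITY` are hypothetical names; the real firmware's buffer and detection test are not shown in the thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DUMP_CAPACITY 64u  /* hypothetical size of the reused RAM buffer */

typedef struct {
    uint8_t data[DUMP_CAPACITY];  /* snapshot of the captured state   */
    size_t  used;                 /* number of valid bytes in data    */
    bool    latched;              /* true once a snapshot was taken   */
} dump_t;

/* Call after every task execution. Copies the state the first time
 * bug_detected is true and ignores all later calls, so the evidence
 * closest to the corruption is not overwritten. */
bool dump_capture_once(dump_t *d, bool bug_detected,
                       const void *src, size_t len)
{
    if (!bug_detected || d->latched) {
        return false;
    }
    if (len > DUMP_CAPACITY) {
        len = DUMP_CAPACITY;      /* truncate rather than overflow */
    }
    memcpy(d->data, src, len);
    d->used = len;
    d->latched = true;
    return true;
}
```

Latching matters because the state keeps changing after the corruption; a dump that is refreshed on every check would show the system long after the interesting moment.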
    #17
    dan1138
    Super Member
    • Total Posts : 3123
    • Reward points : 0
    • Joined: 2007/02/21 23:04:16
    • Location: 0
    • Status: offline
    Re: Stack underflow in C possible ? 2019/05/29 21:06:43 (permalink)
    +2 (2)
    aurelienr
    What do you mean by "no cast" for pointers passed in functions ? I often use for example this kind of call :
    if (strncmp((const char *)shell_rx_frame, "SMODE", 5) == 0)   Shell_rx_data.cmd = CMD_SETMODE;

    This is really bad style because if you have a typo in the symbol name shell_rx_frame the explicit cast tells the compiler to NOT CHECK that the symbol matches the argument prototype.
     
    In your too short example it's impossible to tell what type of object shell_rx_frame is. It could be an integer or a pointer.
     
    If you phat-phingered the name then the cast would cause strncmp to dereference an invalid pointer.
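One way to keep the compiler's type checking is to confine the cast to a small wrapper with an exact parameter type. The sketch below assumes the frame is an array of `uint8_t`, which the thread does not confirm; `frame_starts_with` is a hypothetical helper, not something from the original code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The only place where the byte buffer is reinterpreted as char.
 * Call sites pass the buffer without a cast, so handing in an integer
 * or a pointer to some other type by mistake gets a compiler diagnostic. */
static inline bool frame_starts_with(const uint8_t *frame,
                                     const char *cmd, size_t n)
{
    return strncmp((const char *)frame, cmd, n) == 0;
}
```

A call such as `frame_starts_with(shell_rx_frame, "SMODE", 5)` then only compiles cleanly if `shell_rx_frame` really is a byte buffer.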
    #18
    LdB_ECM
    Senior Member
    • Total Posts : 110
    • Reward points : 0
    • Joined: 2019/04/16 22:01:25
    • Location: 0
    • Status: offline
    Re: Stack underflow in C possible ? 2019/06/26 20:50:03 (permalink)
    0
    aurelienr
    I don't understand why the object is declared as static while it is accessed in both ISR and normal program operations. In my case I don't care about the object I don't use it, but I'd want to understand why it is not declared as volatile.

    It is set up so that the only shared access happens in the callback function, while interrupts are off and not nested; that region is marked "//***User Area Begin" to "//***User Area End". Implicit in those markings is that you can only access the object during that time. Do that and it is guaranteed that you and the ISR can never access the object data at the same time, so the volatile is redundant and interferes with optimization of the handler code.
     
    The things you can do wrong are accessing the object outside that period (i.e. not in the handler) or changing the interrupt to nest; then all bets are off.
    post edited by LdB_ECM - 2019/06/26 20:54:28
    #19
    aurelienr
    Starting Member
    • Total Posts : 52
    • Reward points : 0
    • Joined: 2006/05/28 06:27:02
    • Location: France
    • Status: offline
    Re: Stack underflow in C possible ? 2019/07/04 12:05:44 (permalink)
    +2 (2)
    Hello,
    I finally found the bug.
    One of the products that has been under test for 3 weeks, with an automatic memory dump feature that triggers when the bug is present, reported today that part of the memory was corrupted with data coming from an external peripheral (UART). Fortunately the frames were in ASCII and easily identifiable. The source of the leak is not an interrupt problem but my application. Under very specific conditions, my frame buffer overflow protection was not working, and the pointer was incremented with each byte received (up to a limit of 255 bytes). Once the conditions were understood, I could easily reproduce the issue. Most of the time, the product reboots by itself after the corruption (traps or watchdog protection, depending on what is corrupted), but in some cases the corruption does not fully affect the product and it keeps working...
     
    Thank you folks for your suggestions and tips !
     
    Aurelien
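For readers who land here with a similar symptom, this is a small sketch of a receive path in which the bounds check cannot be bypassed, because it sits in the same function as the write and depends on nothing but the current length. It is not the original code, which was not posted; `frame_rx_t`, `frame_rx_push`, `frame_rx_reset` and `FRAME_MAX` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FRAME_MAX 32u  /* hypothetical frame buffer size */

typedef struct {
    uint8_t buf[FRAME_MAX];
    uint8_t len;        /* number of bytes stored so far            */
    bool    overflow;   /* set when a byte had to be dropped        */
} frame_rx_t;

/* Start a new frame. */
void frame_rx_reset(frame_rx_t *f)
{
    f->len = 0;
    f->overflow = false;
}

/* Store one received byte. The limit is checked on every call,
 * whatever state the protocol parser is in. */
bool frame_rx_push(frame_rx_t *f, uint8_t byte)
{
    if (f->len >= FRAME_MAX) {
        f->overflow = true;     /* remember it, but never write */
        return false;
    }
    f->buf[f->len++] = byte;
    return true;
}
```

An 8-bit index that is incremented without such a check can run up to 255 before it wraps, which fits the "limit of 255 bytes" described above.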
    #20