• AVR Freaks

Helpful ReplyThis drives me nuts! realloc

Author
muellernick
Super Member
  • Total Posts : 474
  • Reward points : 0
  • Joined: 2015/01/06 23:58:23
  • Location: Germany
  • Status: offline
2018/08/02 07:30:03 (permalink)
4.5 (2)

This drives me nuts! realloc

Hi!
 
I know, chances are extremely low that there's a bug in realloc.
And I know, that some will point me to the XC32 forum, only to hear what library I have to use that doesn't work in Harmony.
 
So:
XC32 V2.05 Pro (tried older versions too, no change), Harmony V1.09. Optimization = 2
 
So I have written my own dynamicArrays that do grow and shrink (but in adjustable steps).
And here is the odd thing:
When I do a realloc from 220 bytes to 440 (other number might work as well) and there is one odd circumstance, the first byte is set to zero.
The circumstance:
In front of the current allocation is just enough memory for the increased size.  realloc will take that space AND INCLUDE the currently allocated memory.
Example (with slightly different size, 240 IIRC):
before realloc 240 bytes: A000375C
after realloc, 480 bytes: A0003664
 
 
Now I suspect, <wild guess> that realloc should make a memmove instead of a memcpy, because memory overlaps. <wild guess/>
I'm unable to make a simple example with that behavior, just because I don't know how alloc works.
And the frustrating thing is, that this error often takes hours to happen.
 
I also had a different configuration (different threshold for array grow) where the very first 4 bytes where overwritten. And what really is the odd thing that these four bytes were the address of the next memory block on the heap!
 
I have really nailed it down to the point where data is OK (just looking at the first bytes) before the realloc and is corrupted right after it.
 
I know all this sounds absolutely crazy. But whatever I try to find an other explanation for fails.
 
I checked:
Dangling pointers: All set to NULL after a free
Writes through pointers with 1 byte and zero: No such thing.
Linted all files of the project.
Reviewed all allocs and frees.
Reviewed all memcopies and memmoves.
 <edit>
Looking to the left from that first byte, I see the memory manager's data: Next block plus bytes allocated. And that data is correct. So no overwrite.
I also tested with my memory functions wrapper that adds guards (0xAA55) left and right from the allocated memory and they are also both untouched.
 
OH, BTW. I'm doing about 90 allocs / second. And it might take a million allocs for this to happen.
And yes, there is plenty of room on the heap. 10k used, 40k free is typical. I always have less than 30 allocs active at a time.
 
Nick
 
 
 
post edited by muellernick - 2018/08/02 07:46:48
#1
jtemples
عُضْوٌ جَدِيد
  • Total Posts : 11329
  • Reward points : 0
  • Joined: 2004/02/13 12:31:19
  • Location: Southern California
  • Status: offline
Re: This drives me nuts! realloc 2018/08/02 10:37:46 (permalink)
4 (1)
in front of the current allocation is just enough memory for the increased size.  realloc will take that space AND INCLUDE the currently allocated memory.

 
So you're saying, for example, you have 220 bytes allocated at address 220, and you realloc to 440 bytes, and you now have 440 bytes at address 0?
 
I'm looking at the uClib implementation of realloc and they don't attempt do an "optmization" like that.  Can you drop in a different implementation and see if the problem goes away?
#2
jtemples
عُضْوٌ جَدِيد
  • Total Posts : 11329
  • Reward points : 0
  • Joined: 2004/02/13 12:31:19
  • Location: Southern California
  • Status: offline
Re: This drives me nuts! realloc 2018/08/02 10:40:22 (permalink)
4 (2)
void *realloc(void *ptr, size_t size)
{
    void *newptr = NULL;

    if (!ptr)
        return malloc(size);
    if (!size) {
        free(ptr);
        return malloc(0);
    }

    newptr = malloc(size);
    if (newptr) {
        size_t old_size = *((size_t *) (ptr - sizeof(size_t)));
        memcpy(newptr, ptr, (old_size < size ? old_size : size));
        free(ptr);
    }
    return newptr;
}

#3
moser
Super Member
  • Total Posts : 504
  • Reward points : 0
  • Joined: 2015/06/16 02:53:47
  • Location: Germany
  • Status: offline
Re: This drives me nuts! realloc 2018/08/03 03:23:33 (permalink)
4 (2)
You wrote on the one hand:
"the first byte is set to zero."
You wrote on the other hand:
"I also tested with my memory functions wrapper that adds guards (0xAA55) left and right from the allocated memory and they are also both untouched."
 
By guards, do you mean you allocate 8 bytes more than needed, and place the 4 byte guard in the beginning and end? And with guards, do you still see the first "data" byte corrupted? And not the guard byte is corrupted? Or how did you do it?
 
And have you tried reproducing the circumstance with tests like:
allocate 240 bytes as p1
allocate 240 bytes as p2 (hopefully now p2 = p1 + 240 + 8)
verify that p2 = p1 + 240 + 8
free p1
reallocate p2 with 480 bytes

 
 
#4
muellernick
Super Member
  • Total Posts : 474
  • Reward points : 0
  • Joined: 2015/01/06 23:58:23
  • Location: Germany
  • Status: offline
Re: This drives me nuts! realloc 2018/08/06 00:22:23 (permalink)
4 (1)
So first of all, I am fully aware that this is highly unlikely. But still I observe it.
 
So you're saying, for example, you have 220 bytes allocated at address 220, and you realloc to 440 bytes, and you now have 440 bytes at address 0?

 
No. 220 bytes at 440. A realloc with 440 and the address is 220. No complaining about that. In fact, that is quite neat!
 
Just to make you aware: The problem happens with and without my guard thingy.
 
Re the guards:
With in the allocated block, I do that 2 bytes size information (actually allocated bytes, not size of the block) a 0xAA55 then the user memory and finally an other guard 0xAA55. The pointers passed to my alloc/realloc/free ("APP_MemAlloc") are are checked for the guards, then the pointer is offset to the left by 4 bytes (size + guard) and passed to the native functions. So this resides within the original memory allocation without any harm. And that way, I can implement a heap check that can be called after suspicious actions or within the task loop, or check certain pointers etc. Actually, this lib is quite simple to implement.
And no, the guard was OK. The corrupted byte was within the usable allocated memory. And always at the first place.
But again, this happens with or without (= plain libc calls) guards.
 
There is something, that makes me suspicious:
In the c-library, there is a fault with sscanf with hex to binary. Also with scanf with 64 bit integers. I reported that over in the XC32 forum. But the answer was to use the "legacy-libc". But this one, I can't get working with Harmony. So I wrote my own wrapper for HexToBin conversion. Now my wild guess is, that the libc is kind of an neglected child or so.
If someone could shed some light on that, I'd like to hear!
 
So:
I let the software run over an extended weekend (I had Friday off). It crashed, but I didn't have a look at the log yet.
I'll write my own realloc (alloc + memcopy + free), that's a no-brainer, and see what happens ...
 
Thanks,
Nick
#5
moser
Super Member
  • Total Posts : 504
  • Reward points : 0
  • Joined: 2015/06/16 02:53:47
  • Location: Germany
  • Status: offline
Re: This drives me nuts! realloc 2018/08/06 03:43:23 (permalink)
4 (1)
Since your size and guard is OK, to me this does not sound like a problem with libc realloc. As you described: from the perspective of libc realloc the byte 1-4 (size and guard) is OK and byte 5-8 (first four data bytes) is corrupted. And although it is of course possible, that would be a quite strange bug for realloc.
 
However, I think another cause is more likely. It somehow sounds like a problem "instead of setting a pointer to NULL the app accidentally sets the pointed thing to NULL" (= the first four data bytes). I would double check for those things first:
- Your app has an error with pointer handling (like this "instead of the pointer it sets the pointed thing to NULL" issue).
- A small error in your allocation library in your APP_MemAlloc/Realloc/Free function.
- Maybe at some place in your app the normal libc function is called instead of your wrapper library, and you alloc with one, and realloc or free with the other.
- Maybe you are using your APP_MemAlloc/Realloc/Free functions both in main loop and interrupts, and forgot in the main loop that alloc and realloc are not interrupt safe.
- Your app has another problem which corrupts the first four data bytes on its own.
- When alloc or realloc returns a NULL pointer it is not correctly handled in APP_MemAlloc/Realloc.
- When APP_MemAlloc/Realloc returns a NULL pointer it is not correctly handled in the app.
 
 
If this doesn't lead to anything, then I would still try to directly create a sequence of allocs, frees and reallocs as described in my post #4. Either do it directly by libc calls, or do indirectly by your APP_MemAlloc lib calls, where you need to adjust the numbers. Try to directly create a situation like described in your example in the first post, and check if you can reproduce this problem more directly. I assume the addresses in your example in the first post (with a difference of 248) are those which were returned by libc alloc/realloc, and not those of your APP_MemAlloc/Realloc, is this correct? 
 
Other ideas which you could test are:
- Instead of "2 bytes size + 2 bytes guard" try "2 bytes size + 6 bytes guard" in front, and see if still hits your first four data bytes or if it now hits the extended guard.
- Try reducing optimization to o1 and see if this has any impact.
PS:
- When the error happens try to debug output all your active allocations, and check manually if some of them are overlapping.
post edited by moser - 2018/08/06 03:45:33
#6
muellernick
Super Member
  • Total Posts : 474
  • Reward points : 0
  • Joined: 2015/01/06 23:58:23
  • Location: Germany
  • Status: offline
Re: This drives me nuts! realloc 2018/08/07 02:23:24 (permalink)
0
Short question:
 
What libc are you using?
Project Properties -> XC32 (Global Options) -> "Use Legacy libc".
Is that one checked or not?
I can't find any "official information" about that.
 
TIA,
Nick
#7
muellernick
Super Member
  • Total Posts : 474
  • Reward points : 0
  • Joined: 2015/01/06 23:58:23
  • Location: Germany
  • Status: offline
Re: This drives me nuts! realloc 2018/08/07 04:52:05 (permalink)
4 (1)
Hi jtemples!
Some editing ...
 
Thank you for posting the source code. I suppose, it is from the libc Microchip is using.
If so, I'm even more puzzled.
I quite misunderstood your posting.
I'll copy/paste your source of realloc and use this one.
 
 
I traced all mallocs, frees and reallocs and often enough, realloc does return the same pointer. Especially, when doing an realloc without changing the size (OK, that is an useless realloc).
According to the source, this can never happen.
 
So the source can't be from the libc I am using. I'm using Harmony 1.09. and XC32 V2.05.
What libc should be linked (xc32/V2.05/pic32mx/lib/$what)
 
Thanks,
Nick
A feeling that I'm a complete idiot is creeping up my back ...
post edited by muellernick - 2018/08/07 05:05:50
#8
jtemples
عُضْوٌ جَدِيد
  • Total Posts : 11329
  • Reward points : 0
  • Joined: 2004/02/13 12:31:19
  • Location: Southern California
  • Status: offline
Re: This drives me nuts! realloc 2018/08/07 09:59:45 (permalink)
3 (1)
That function I posted was from uClib, not from any MCHP source.
#9
friesen
Super Member
  • Total Posts : 2080
  • Reward points : 0
  • Joined: 2008/05/08 05:23:35
  • Location: Indiana, USA
  • Status: offline
Re: This drives me nuts! realloc 2018/08/07 15:50:52 (permalink)
0
Are you working in cached space only?

Erik Friesen
#10
muellernick
Super Member
  • Total Posts : 474
  • Reward points : 0
  • Joined: 2015/01/06 23:58:23
  • Location: Germany
  • Status: offline
Re: This drives me nuts! realloc 2018/08/08 01:05:56 (permalink)
3 (1)
 Are you working in cached space only?

 
Uh?! I suppose. Addresses are 0xA000xxxx.
 
I changed the initialization, so reallocs no longer do occur. And then it works.
Yes, I once more checked, that all frees are followed with a set to NULL.
 
@Mouser:
You have some valid points.
But mixing malloc and APP_MemAlloc doesn't work. The first heap check would fail.
APP_MemAlloc/ReAlloc/Free do send messages to the debug console if something goes wrong or is wrong. An APP_MemFree for an already freed block would be detected, because of instead of the guard ('0xAA55') it writes a '0xDEAD'.
 
The flow is dead simple:
check for first byte != 0
Call APP_DynArrayAppend
  APP_DynArrayAppend checks for enough room (APP_DynArrayGrow) for the new element and makes an realloc.
  APP_DynArrayAppend then copies over the new element to the allocated data (at the end)
Right after that, the first byte is 0
And this only occurs in the case described in the first posting. And this occurs, no matter if I have my heap-stuff enabled (#define APP_MemReAlloc(a, b) (realloc(a, b)) or not.
 
static bool APP_DynArrayGrow(APP_DynArrayData_t* daData) {
  if (daData->allocated > daData->elements)
    return true;

  // Now out of space ...
  #ifdef APP_DynArrayDebug
  APP_DynArrayDbg(daData, "grow pre");
  #endif

  APP_MemCheckPtr(daData->pData, __LINE__, __FILE__);
  uint8_t* pNew = APP_MemReAlloc(daData->pData, (daData->allocated + daData->threshold) * daData->elementSize);
  if (pNew == NULL) {
    APP_DebugWriteOutOfMem(0, 1000);

    return false;
  }

  daData->allocated += daData->threshold;
  daData->pData = pNew;

  #ifdef APP_DynArrayDebug
  APP_DynArrayDbg(daData, "grow post");
  #endif

  return true;
// ---------------------------------------------------------------------------
bool APP_DynArrayAppend(APP_DynArrayData_t* daData, const void* pElement) {
  if (APP_DynArrayGrow(daData) == false)
    return false;

  memcpy(&daData->pData[daData->elements * daData->elementSize], pElement, daData->elementSize);
  daData->elements++;

  return true;
}
// ------------------------------------------------------------------------------
*** SNIP ***
 
  if (appTLSProtData.stackFG1T222.da.elements > 0 && appTLSProtData.stackFG1T222.da.pData[0] != APP_TLSFG1T222LenDEParamIO)
    APP_DebugWrite("********* now! 1\n", 0, 0);

  APP_MemCheckPtr(appTLSProtData.stackFG1T222.da.pData, __LINE__, __FILE__);
  APP_DynArrayAppend(&appTLSProtData.stackFG1T222.da, & se222);
  APP_MemCheckPtr(appTLSProtData.stackFG1T222.da.pData, __LINE__, __FILE__);

  if (appTLSProtData.stackFG1T222.da.elements > 0 && appTLSProtData.stackFG1T222.da.pData[0] != APP_TLSFG1T222LenDEParamIO) {
    APP_DebugWrite("********* now! 2\n", 0, 0);


 
*** SNIP ***
}

There occurs no "**** now! 1", only a "*** now! 2"! So, before all was OK.
 
Edit:
APP_DynArrayData_t is considered private. There is no direct access to pData from the user's side. Only elements (number of elements in array) is read from the user's side.
All read/write/delete/inserts are done through an interface. All memmoves to pData are strictly restricted by elementSize.
 
 
I have to give up for now hunting for that bug. It takes too much time.
 
I'll come back later ...
post edited by muellernick - 2018/08/08 01:51:32
#11
moser
Super Member
  • Total Posts : 504
  • Reward points : 0
  • Joined: 2015/06/16 02:53:47
  • Location: Germany
  • Status: offline
Re: This drives me nuts! realloc 2018/08/08 03:21:58 (permalink)
0
When "********* now! 2" happens, did you check if something has previously corrupted da.elementSize or da.elements (e.g.  such famous code like: if (da.elements = 0) { ... } )?
 
If something has previously overwritten da.elementsSize or da.elements with 0, or you had a type overflow at da.elements, then your "*** SNIP ***"-code causes exactly your problem: You won't hit "********* now! 1", then APP_DynArrayAppend() overwrites pData[0], then you hit "********* now! 2" 
 
 
PS: Oh, and friesen is right: mixing cached and uncached access could also cause such problems, because your cache and your real memory might get out of sync. Just make sure, nothing is doing uncached accesses (e.g. by using coherent, or by converting pointers to uncached space, or by using DMA, or by using a library function on a buffer which uses DMA, or stuff like that).
post edited by moser - 2018/08/08 03:37:45
#12
muellernick
Super Member
  • Total Posts : 474
  • Reward points : 0
  • Joined: 2015/01/06 23:58:23
  • Location: Germany
  • Status: offline
Re: This drives me nuts! realloc 2018/08/09 00:16:53 (permalink) ☄ Helpfulby twelve12pm 2018/08/09 08:20:45
4.5 (2)
So here we go!
 
Implemented a new realloc and let the app run over night. The result is:
932022 reallocs, 3941784 allocs in 15 hours 33 minutes and NO corrupted memory.
 
So, here is my realloc:
First we need to know how the info in front of each allocated block looks like (that's a bit unconventional, but no complaining).
typedef struct {
  struct heapChain_t* pNext;
  char flag; // = 1 when in use?
} heapChain_t;

That struct is 8 bytes. The "char flag" is not my style, I would have used uint32_t to make things clear.
 
pNext points to the next info in that chain. flag tells whether that block is used or not. To determine the size of the (user's) memory, take pNext and subtract your info and minus sizeof heapChain_t
 
The code actually is quite simple:

void* APP_MemReAllocAlt(void* pOld, const size_t requestSize) {
  if (pOld == NULL)
    return malloc(requestSize);

  heapChain_t* pOldLink = (heapChain_t*) pOld - 1;
  
  // Safety check. This can't be a unused memory block
  if (pOldLink->flag != 1) {
    APP_MemDbgCB("**** APP_MemReAllocAlt: points to a freed allocation\n");
    return NULL;
  }
  
  // get the size of the previous allocation
  size_t oldSize = (uint32_t) pOldLink->pNext - (uint32_t) pOldLink - sizeof (heapChain_t);

  // determine size of new block (will be at least requestSize, as we have 8 byte alignment/size increments)
  size_t newSize = requestSize & 0xFFFFFFF8;
  if ((requestSize & 0x07) != 0)
    newSize += 8;

  // avoid realloc with same block sizes (but requestSize might be a bit different)
  if (newSize == oldSize)
    return pOld;

  // allocate new memory
  void* pNew = malloc(newSize);
  if (pNew == NULL) // keep the old allocation
    return NULL;

  // copy over from old block to new block
  memcpy(pNew, pOld, min(newSize, oldSize));

  free(pOld);
 

  return pNew;
}

 
And how I call it (remember, I do have a wrapper around malloc, free and realloc and NEVER use the libc calls in my code.
 
  // Use alternative realloc?
#define APP_MemReallocAlternative
  
#ifdef APP_MemReallocAlternative
#define realloc(a, b) (APP_MemReAllocAlt((a), (b)))
#endif

You could easily make an optimization when shrinking memory. Set a copy of pOldLink behind the shrunk block (with flag set to 0) and set your pOldLink to point to that address. This way, you can avoid a superfluous alloc and memcopy.
 
Now I'm exited about what Microchip has to say.
I'm aware that there is the argument that allocated memory moved around in a different pattern that prevented the error to reoccur. But the number of allocs and reallocs speaks against that.
Don't worry, I continue to test it ...
 
Thanks for all your input and suggestions!
 
Edit:
OH! The fine print:
No guarantee whatsoever for what version of libc that will work with. I have no control and knowledge over that whatsoever.
 
Nick
post edited by muellernick - 2018/08/09 00:29:22
#13
muellernick
Super Member
  • Total Posts : 474
  • Reward points : 0
  • Joined: 2015/01/06 23:58:23
  • Location: Germany
  • Status: offline
Re: This drives me nuts! realloc 2018/08/12 23:42:32 (permalink)
4 (1)
Update:
5.6 millions reallocs and 23.4 million allocs later ...
Still all fine. No crash, no corrupted memory.
 
Nick
#14
Jump to:
© 2019 APG vNext Commercial Version 4.5