
Code execution from Flash and RAM (max speed?)

toms
Starting Member
  • Total Posts : 86
  • Reward points : 0
  • Joined: 2006/03/07 18:06:24
  • Location: London, UK
  • Status: offline
2020/03/26 03:24:38 (permalink)
0

Code execution from Flash and RAM (max speed?)

Hi all,
 
I've been working on a long-running project which mostly deals with 32-bit data. I made the mistake early on of spec'ing a PIC24 for this task, and I sort of regret it now since it doesn't handle 32-bit data natively. Basically, I would like the processor to handle the data as natively as possible to help execution speed, as I have a sizeable routine that runs once every 10ms, and I would potentially like to run it even more often if I can.

So I'm looking to replace the PIC24 with a PIC32 (MX series), and I've got myself a PIC32MX170F256B to play around with for now, to familiarise myself before I go full speed into this.

I've had this burning question that I haven't been able to find an answer for: how fast can code execute out of Flash? I'm maxing out the system clock: the PIC24 is running at its limit of 72MHz, and I will be looking to have the PIC32 run at full speed too. I intend to choose a slightly beefier PIC32 for my final design and run at 100MHz or more.

But that's pretty quick, and I don't suppose the Flash can necessarily supply data that quickly... or can it?

A very simple test I did today involved a load of LATAINV statements back to back, toggling a pin, which I then observed with my logic analyser. Much to my surprise, this appeared to execute at full speed. The period between edges (once I enabled optimisation to remove a lot of unnecessary instructions) was 20ns, which matches the frequency I have the PIC32 running at (48MHz).
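As a quick sanity check on that figure, one pin write per system clock cycle at 48MHz works out to roughly 20.8ns between edges, which is in line with the ~20ns measured. A minimal sketch of the arithmetic follows; `clock_period_ps` is a made-up helper name used only for illustration.

```c
#include <stdint.h>

/* Period of one clock cycle in picoseconds for a clock given in Hz.
 * Integer division, so the result is rounded down. */
uint32_t clock_period_ps(uint32_t clk_hz)
{
    return (uint32_t)(1000000000000ULL / clk_hz);
}
```

For example, `clock_period_ps(48000000u)` gives 20833, i.e. about 20.8ns per cycle.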

I also tried running the toggle code out of RAM using the __longramfunc__ attribute, but even more to my surprise this was actually slower by 2-3x. Edges are now toggling once every 40-50ns.

I always assumed that SRAM was going to be faster than Flash at this...
 
Does anyone know of any sources where I can find data about the speed these different memories can run at, or what else I might need to know about code execution on the PIC32MX series, e.g. caching perhaps? I'm still digging through the vast number of datasheets/FRMs for this series, so maybe I have missed or haven't yet got to something that is in there (maybe you can point me to specifics).
 
Thanks!
#1
crosland
Super Member
  • Total Posts : 1886
  • Reward points : 0
  • Joined: 2005/05/10 10:55:05
  • Location: Warks, UK
  • Status: offline
Re: Code execution from Flash and RAM (max speed?) 2020/03/26 05:43:45 (permalink)
0
You might need to look at the cache setup.
#2
jdeguire
Super Member
  • Total Posts : 520
  • Reward points : 0
  • Joined: 2012/01/13 07:48:44
  • Location: United States
  • Status: offline
Re: Code execution from Flash and RAM (max speed?) 2020/03/26 09:59:54 (permalink) ☄ Helpfulby toms 2020/03/26 15:36:45
0
You can look at the Electrical Characteristics section of the datasheets to see the number of flash wait states required to access flash memory at a given processor frequency.  Just search for "Wait States" in that section to find it.  Some of the slower PIC32 devices do not require any wait states--meaning the flash can run at the max-rated CPU frequency--and so those datasheets will not have any mention of flash wait states.  If your device does require flash wait states, then you would set that in the CHECON register.
 
As for RAM, you should also check that you are setting the number of RAM wait states to 0 as it defaults to whatever the max is.  Set this in the BMXCON register.  Also, the __longramfunc__ attribute is probably putting a longer function-call assembly sequence into your code in order to allow the call to occur to a far away address.  This would explain your slowdown if you are calling your function in a loop.
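To make the wait-state lookup concrete, here is a minimal sketch in C. The frequency limits in the table are placeholders in the style of a larger PIC32MX datasheet, not values for any specific part, and `flash_wait_states` is a made-up helper name. Take the real limits from the Electrical Characteristics section of your own device's datasheet.

```c
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: highest SYSCLK (Hz) permitted at 0, 1 and 2 flash wait states.
 * Placeholder figures only; copy the real ones from your datasheet. */
static const uint32_t max_sysclk_hz[] = { 30000000u, 60000000u, 80000000u };

/* Returns the smallest wait-state count that supports sysclk_hz,
 * or -1 if the clock is above every limit in the table. */
int flash_wait_states(uint32_t sysclk_hz)
{
    size_t n = sizeof max_sysclk_hz / sizeof max_sysclk_hz[0];
    for (size_t ws = 0; ws < n; ws++) {
        if (sysclk_hz <= max_sysclk_hz[ws]) {
            return (int)ws;
        }
    }
    return -1;
}
```

On a device that needs flash wait states, the returned value is what would go into the wait-state setting in CHECON; the data RAM wait state is a separate setting in BMXCON.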
#3
NorthGuy
Super Member
  • Total Posts : 5970
  • Reward points : 0
  • Joined: 2014/02/23 14:23:23
  • Location: Northern Canada
  • Status: offline
Re: Code execution from Flash and RAM (max speed?) 2020/03/26 12:49:43 (permalink) ☄ Helpfulby toms 2020/03/26 16:04:12
0
When you execute from RAM, the command fetching and data operations will compete for the same bus, which may make things slower.
 
Unlike PIC24, it is generally impossible to predict PIC32 speed as it depends on many factors, such as caching, bus contention, peripheral clocks etc.
 
Whether you can get any speed benefits from "native" 32-bit operations of PIC32 depends on what your routine does and how it is written.
 
#4
toms
Starting Member
  • Total Posts : 86
  • Reward points : 0
  • Joined: 2006/03/07 18:06:24
  • Location: London, UK
  • Status: offline
Re: Code execution from Flash and RAM (max speed?) 2020/03/26 15:36:32 (permalink)
0
jdeguire
You can look at the Electrical Characteristics section of the datasheets to see the number of flash wait states required to access flash memory at a given processor frequency.  Just search for "Wait States" in that section to find it.  Some of the slower PIC32 devices do not require any wait states--meaning the flash can run at the max-rated CPU frequency--and so those datasheets will not have any mention of flash wait states.  If your device does require flash wait states, then you would set that in the CHECON register.
 
As for RAM, you should also check that you are setting the number of RAM wait states to 0 as it defaults to whatever the max is.  Set this in the BMXCON register.  Also, the __longramfunc__ attribute is probably putting a longer function-call assembly sequence into your code in order to allow the call to occur to a far away address.  This would explain your slowdown if you are calling your function in a loop.


Ah yeah, brilliant.

I noticed just before going to bed last night that a higher-spec PIC32 had a section about wait states, but my current PIC32 does not, so I guess its Flash can run that fast after all. Faster PICs are then going to need wait states, so I imagine there's going to be some combination of tradeoffs between MHz, wait states, cache etc.

The function I placed in RAM was an ISR (I was playing around with the Core Timer), so it only gets called when the interrupt fires. It isn't called from elsewhere or in a loop, so hopefully the call overhead is minimal.
 
Thanks!
#5
toms
Starting Member
  • Total Posts : 86
  • Reward points : 0
  • Joined: 2006/03/07 18:06:24
  • Location: London, UK
  • Status: offline
Re: Code execution from Flash and RAM (max speed?) 2020/03/26 16:03:54 (permalink)
0
NorthGuy
When you execute from RAM, the command fetching and data operations will compete for the same bus, which may make things slower.
 
Unlike PIC24, it is generally impossible to predict PIC32 speed as it depends on many factors, such as caching, bus contention, peripheral clocks etc.
 
Whether you can get any speed benefits from "native" 32-bit operations of PIC32 depends on what your routine does and how it is written.



Yeah, I can see this isn't going to be quite so clear cut.

Maybe I could assign some particular variables to specific registers, as some are fiddled with a lot more than others, so if they can stay out of RAM that might help a bit. I've never done anything like that before, so no idea if that's even a good idea...

I will need to write some tests to compare execution speed and work out whether there's much point moving away from the PIC24. I suppose it's entirely possible that I may just end up with similar performance. Still a lot more to discover. I'm stuck in Australia for another month due to flight cancellations, so plenty of time to look into this.
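For comparison tests like these, one option on the PIC32 is to read the Core Timer before and after the routine and convert the difference to time. The sketch below only does the conversion and takes the two count values as parameters, so it runs anywhere. It assumes the Core Timer counts once every two SYSCLK cycles, as on the PIC32MX, and `core_ticks_to_ns` is a made-up name. On the target the counts would come from the CP0 Count register, which XC32 exposes as `_CP0_GET_COUNT()`.

```c
#include <stdint.h>

/* Convert a pair of Core Timer readings into elapsed nanoseconds.
 * ASSUMPTION: the timer increments once per two SYSCLK cycles. */
uint64_t core_ticks_to_ns(uint32_t start, uint32_t end, uint32_t sysclk_hz)
{
    /* Unsigned subtraction gives the right tick count even if the
     * 32-bit counter wrapped once between the two readings. */
    uint32_t ticks = end - start;
    return (uint64_t)ticks * 2u * 1000000000ULL / sysclk_hz;
}
```

At 48MHz, 24 ticks correspond to 48 clock cycles, i.e. 1000ns. The same measurement on the PIC24 would need its own timer, so the comparison is only as good as the two timing methods are equivalent.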
#6
ric
Super Member
  • Total Posts : 26159
  • Reward points : 0
  • Joined: 2003/11/07 12:41:26
  • Location: Australia, Melbourne
  • Status: online
Re: Code execution from Flash and RAM (max speed?) 2020/03/26 16:05:55 (permalink)
0
toms
The function I placed in RAM was an ISR, so it only gets called when the interrupt fires (was playing around with the Core Timer), so not actually calling it from elsewhere or in a loop, so hopefully minimal overhead.

Wouldn't it be the same overhead, so would be adding a lot of latency to all your interrupts...
 

I also post at: PicForum
Links to useful PIC information: http://picforum.ric323.co...opic.php?f=59&t=15
NEW USERS: Posting images, links and code - workaround for restrictions.
To get a useful answer, always state which PIC you are using!
#7
toms
Starting Member
  • Total Posts : 86
  • Reward points : 0
  • Joined: 2006/03/07 18:06:24
  • Location: London, UK
  • Status: offline
Re: Code execution from Flash and RAM (max speed?) 2020/03/26 16:46:56 (permalink)
0
crosland
You might need to look at the cache setup.

The current PIC32 that I'm playing with doesn't seem to have any cache, but I can see it will be a thing for higher-spec PIC32s. I might get myself a dev board with a higher-spec MCU, as that may be more appropriate for experimenting with this.
#8
toms
Starting Member
  • Total Posts : 86
  • Reward points : 0
  • Joined: 2006/03/07 18:06:24
  • Location: London, UK
  • Status: offline
Re: Code execution from Flash and RAM (max speed?) 2020/03/26 16:52:12 (permalink)
0
ric
Wouldn't it be the same overhead, so would be adding a lot of latency to all your interrupts...

 
Yeah, probably. This wasn't any specific test, just playing with __longramfunc__.
#9
simong123
Lab Member No. 003
  • Total Posts : 1374
  • Reward points : 0
  • Joined: 2012/02/07 18:21:03
  • Location: Future Gadget Lab (UK Branch)
  • Status: online
Re: Code execution from Flash and RAM (max speed?) 2020/03/26 16:55:37 (permalink)
4 (1)
ric
Wouldn't it be the same overhead, so would be adding a lot of latency to all your interrupts...

Actually it's even worse. The smaller MXs don't have shadow registers, so not only do all registers used in the ISR have to be pushed to the stack, but for a ramfunc this happens over the same bus as the instruction fetch. So I would definitely not use ramfuncs on the small MXs with 0 wait states.

The larger MXs have a 128-bit bus to the flash and a prefetch cache*, so most of the wait states to flash will be hidden. The benefit of RAM functions will be minimal in most circumstances.
The shadow register set in the bigger MXs can be used to speed up critical interrupts.

The MZs have a proper Icache, which can be prefetched and locked as long as the functions aren't too big. They also have auto prologue/epilogue, which reduces interrupt latency immensely (measured ~11clks @200MHz). However, we are far from PIC24 territory here.
 
I think the MXs will be faster if the OP's code uses lots of 32-bit multiplies/divides, but not much faster, if at all, otherwise.

*But watch out for the errata on some MXs with prefetch cache. Some are nasty.
#10
toms
Starting Member
  • Total Posts : 86
  • Reward points : 0
  • Joined: 2006/03/07 18:06:24
  • Location: London, UK
  • Status: offline
Re: Code execution from Flash and RAM (max speed?) 2020/03/26 17:32:51 (permalink)
0
No arithmetic, it's all input and output via shift registers in 32bit chunks with a lot of bit manipulation.
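To illustrate what "32-bit chunks" means for the two architectures, here is a minimal sketch of shifting one bit into a 32-bit word, written once as a 16-bit core has to treat it (two halves with a carry between them) and once as a native 32-bit operation. The type and function names are made up, and how a given compiler actually lowers either version on a PIC24 or PIC32 is not shown here.

```c
#include <stdint.h>

/* A 32-bit value held as two 16-bit halves, as on a 16-bit core. */
typedef struct {
    uint16_t lo;
    uint16_t hi;
} word32_halves;

/* Shift left by one and insert `bit` at the LSB, one half at a time.
 * The top bit of the low half is carried into the high half. */
word32_halves shift_in_bit_16(word32_halves w, unsigned bit)
{
    uint16_t carry = (uint16_t)(w.lo >> 15);
    w.hi = (uint16_t)((w.hi << 1) | carry);
    w.lo = (uint16_t)((w.lo << 1) | (bit & 1u));
    return w;
}

/* The same operation on a native 32-bit word. */
uint32_t shift_in_bit_32(uint32_t w, unsigned bit)
{
    return (w << 1) | (bit & 1u);
}
```

Both versions produce the same bits. The difference is only in how many operations each step takes, which is what would decide whether the move to a 32-bit core pays off for this workload.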
#11
ric
Super Member
  • Total Posts : 26159
  • Reward points : 0
  • Joined: 2003/11/07 12:41:26
  • Location: Australia, Melbourne
  • Status: online
Re: Code execution from Flash and RAM (max speed?) 2020/03/26 17:39:59 (permalink)
4 (1)
My gut feeling is that 16-bit code on the PIC24 could be tweaked to handle that data more efficiently than the PIC32 ...
 

#12