• AVR Freaks

Simulator vs. Stopwatch: Cycle Counts Wrong for btss and mov.d (dsPIC33EP32MC202).

Author
GlennP
Super Member
  • Total Posts : 844
  • Reward points : 0
  • Joined: 2009/03/29 15:04:55
  • Location: El Paso County, CO, USA
  • Status: offline
2020/09/12 04:36:05 (permalink)
5 (1)

Simulator vs. Stopwatch: Cycle Counts Wrong for btss and mov.d (dsPIC33EP32MC202).

Attached is a complete MPLAB X project which seems to show excessive cycle counts for the two instructions in the thread title.  Specifically, a btss when the skip IS taken shows three (3) cycles.  Similarly, a mov.d W0, W0 shows three (3) cycles.  My reading of DS70000157G (16-bit MCU and DSC Programmer's Reference Manual) says both should take two (2) cycles.
 
I suspect all skip instructions will have similar behavior, but only btss was tested.
 
Detailed instructions for reproducing the issue are in comments in the assembler file.
 
I've also attached an image with the code and stopwatch when "step into" is done after the initial breakpoint.
 
MPLAB X v5.40 and XC16 v1.60 (both current).
 
EDIT 1: On re-reading 3.2.1, Note 3, I guess the STATUS register bit tests take an extra cycle for the MCU in question.  [The wording could be improved for me.]  But as far as I can tell, the mov.d issue is still valid.
 
EDIT 2: On re-re-reading 3.2.1, Note 3, I don't understand why the skip (not taken) shows only one cycle.  Note 3 implies a read of STATUS will take two cycles.  See lines 31 and 32 (cycles 69 and 70) of the code.
 
GlennP
post edited by GlennP - 2020/09/12 05:01:39


#1


    1and0
    Access is Denied
    • Total Posts : 11808
    • Reward points : 0
    • Joined: 2007/05/06 12:03:20
    • Location: Harry's Gray Matter
    • Status: offline
    Re: Simulator vs. Stopwatch: Cycle Counts Wrong for btss and mov.d (dsPIC33EP32MC202). 2020/09/12 10:50:03 (permalink)
    +1 (1)
    glennp17321
    Attached is a complete MPLAB X project which seems to show excessive cycle counts for the two instructions in the thread title.  Specifically, a btss when the skip IS taken shows three (3) cycles.  Similarly, a mov.d W0, W0 shows three (3) cycles.  My reading of DS70000157G (16-bit MCU and DSC Programmer's Reference Manual) says both should take two (2) cycles.

    Both "btss SR,#x" when the skip is taken and "mov.d w0,w0" should take two (2) cycles each.
     

    EDIT 1: On re-reading 3.2.1, Note 3, I guess the STATUS register bit tests take an extra cycle for the MCU in question.  [The wording could be improved for me.]  But as far as I can tell, the mov.d issue is still valid.
     
    EDIT 2: On re-re-reading 3.2.1, Note 3, I don't understand why the skip (not taken) shows only one cycle.  Note 3 implies a read of STATUS will take two cycles.  See lines 31 and 32 (cycles 69 and 70) of the code.

    The SR register is a CPU Special Function Register. The non-CPU "STATUS" register referred to in that Note is the peripheral status.
     
    Edit: The use of all uppercase letters for "STATUS" in the manual is misleading. :(
     
    post edited by 1and0 - 2020/09/12 11:07:07
    #2
    GlennP
    Re: Simulator vs. Stopwatch: Cycle Counts Wrong for btss and mov.d (dsPIC33EP32MC202). 2020/09/12 14:21:40 (permalink)
    0
    1and0 ...
    The SR register is a CPU Special Function Register. The non-CPU "STATUS" register referred to in that Note is the peripheral status.
     
    Edit: The use of all uppercase letters for "STATUS" in the manual is misleading. :(



    I went back and forth on the interpretation of "STATUS" and finally decided the case made the case.  But now I think I was wrong and you are correct.  "SR" is a CPU register.
     
    The simulator/stopwatch combination is inconsistent for bit-test-skip instructions.  Not only does the SR have issues (1 vs 3 instead of 1 vs 2 or 2 vs 3), but when I changed it to W0, the "skip taken" case takes one (1) cycle, not two.
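To put the readings reported so far side by side, here is a minimal sketch in Python. The observed numbers are the stopwatch readings quoted in this thread; the expected numbers (1 cycle when the skip is not taken, 2 cycles when it is taken, 2 cycles for mov.d) are the thread's reading of the manual. The function name `cycle_error` and the layout of the list are only illustrative.

```python
def cycle_error(observed, expected):
    """Return (difference in cycles, percent error relative to expected)."""
    diff = observed - expected
    return diff, 100.0 * diff / expected

# (description, stopwatch reading, expected per the thread's reading of the manual)
readings = [
    ("btss SR, skip not taken", 1, 1),
    ("btss SR, skip taken", 3, 2),
    ("btss W0, skip taken", 1, 2),
    ("mov.d W0, W0", 3, 2),
]

for name, observed, expected in readings:
    diff, pct = cycle_error(observed, expected)
    print(f"{name}: {observed} vs {expected} ({diff:+d} cycles, {pct:+.0f}%)")
```

The two skip-taken cases are off in opposite directions (one cycle too many for SR, one too few for W0), which is why no single correction factor reconciles the stopwatch with the manual.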
     
    This is likely a low-priority issue for MCHP, but it really makes stating timing for different sub-families difficult.  I have an update of the timing tables for Unsigned 32 by 32 divide that I don't trust.  I think I'll post it anyway (with a caveat) as it has some clarifications in other comments that seem worthwhile.
     
    GP
    #3
    dan1138
    Super Member
    • Total Posts : 4184
    • Reward points : 0
    • Joined: 2007/02/21 23:04:16
    • Location: 0
    • Status: offline
    Re: Simulator vs. Stopwatch: Cycle Counts Wrong for btss and mov.d (dsPIC33EP32MC202). 2020/09/13 14:16:36 (permalink)
    +1 (1)
    glennp17321
    The simulator/stopwatch combination is inconsistent for bit-test-skip instructions. 

    Don't sugar coat it: the MPLAB X simulation tool is crap when it comes to counting cycles.
    For all of the PIC controllers where instruction cycle counting makes sense (everything without virtual memory or an L1 instruction cache) the simulator will get the wrong count versus the real hardware.
     
    At present I know of two methods:
    1. Write an instrumented unit test and run it on real hardware.
    2. Count the instruction cycles for each opcode in the function by hand per execution target.
    Anyone got another method that works?
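Method 2 can at least be made less error-prone with a small table-driven tally. The following is only a sketch in Python: the target name, the table layout, and the `nop` entry are assumptions, and only the btss and mov.d values come from this thread. Every entry would need to be checked against the programmer's reference manual for each execution target.

```python
# Per-target cycle tables. Only the btss and mov.d values come from the
# thread; the nop value is an assumption to be verified against the manual.
CYCLES = {
    "dsPIC33EP": {
        "nop": 1,        # assumed
        "mov.d": 2,      # per the thread's reading of the manual
        "btss": 1,       # skip not taken
        "btss_skip": 2,  # skip taken over a one-word instruction
    },
}

def count_cycles(target, trace):
    """Sum the cycles of an executed instruction trace for one target."""
    table = CYCLES[target]
    return sum(table[op] for op in trace)

# Bit set: btss skips the nop, then mov.d executes.
taken = ["btss_skip", "mov.d"]
# Bit clear: btss falls through, so nop and mov.d both execute.
not_taken = ["btss", "nop", "mov.d"]

print(count_cycles("dsPIC33EP", taken))
print(count_cycles("dsPIC33EP", not_taken))
```

Adding another sub-family means adding another table rather than recounting by hand, but the tally is only as good as the table entries.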
    #4
    GlennP
    Re: Simulator vs. Stopwatch: Cycle Counts Wrong for btss and mov.d (dsPIC33EP32MC202). 2020/09/13 21:35:58 (permalink)
    0
    dan1138 ...
    Don't sugar coat it: the MPLAB X simulation tool is crap when it comes to counting cycles.
    For all of the PIC controllers where instruction cycle counting makes sense (everything without virtual memory or an L1 instruction cache) the simulator will get the wrong count versus the real hardware.
     
    At present I know of two methods:
    1. Write an instrumented unit test and run it on real hardware.
    2. Count the instruction cycles for each opcode in the function by hand per execution target.
    Anyone got another method that works?

    Not me.
     
    I now agree with Dan: the simulator is not useful for counting cycles.
     
    Both of Dan's suggested methods are difficult when the objective is to list the cycle counts for six or seven different sub-families.  Also, I don't know how to get cycle counts in "instrumented unit tests with real hardware" unless it's by flipping a port pin and tracing on an oscilloscope or a logic analyzer.  I can do that but not everyone can.  But it's a very difficult way to compare sub-families as I don't usually have the parts on hand and in a breadboard (or other "kit").
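For the port-pin approach, the arithmetic from scope reading to cycle count is simple. Here is a minimal sketch in Python, assuming the pin is set just before the code under test and cleared just after it. The function name, the 40 MHz instruction clock, the 500 ns reading, and the 2-cycle overhead are all hypothetical; the overhead in particular should be calibrated by measuring a back-to-back set/clear with nothing in between.

```python
def pulse_to_cycles(pulse_width_s, fcy_hz, overhead_cycles=0):
    """Convert a measured pin-high time into instruction cycles.

    pulse_width_s:   pulse width read from the scope or logic analyzer
    fcy_hz:          instruction clock frequency of the part under test
    overhead_cycles: cycles attributed to the pin set/clear themselves
    """
    total = round(pulse_width_s * fcy_hz)
    return total - overhead_cycles

# Hypothetical reading: a 500 ns pulse at Fcy = 40 MHz is 20 cycles in total;
# subtracting an assumed 2 cycles of overhead leaves the code under test.
print(pulse_to_cycles(500e-9, 40e6, overhead_cycles=2))
```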
     
    Having only recently tried to get cycle counts using the simulator, I now see it was a distracting approach.  I agree that in its current state, the simulator is not helpful - in fact I wasted a bunch of time trying to reconcile the results with my manual counts.  I was surprised my manual counts were superior - I thought I'd correct my manual counts with the simulator/stopwatch not the other way around.
     
    I will say I have found the 16-bit simulator more accurate than the 8-bit simulator in some ways.  Specifically, the timers (in the modes I used them at least) seem to be decently modeled.
     
    Anyway, this thread can be used by MCHP for simulator feedback if it wishes.  I have had such poor results with the formal mechanism (create a technical support case) that I no longer use it.  Just informing MCHP of a simple documentation error (two statements conflicted) took 18 messages back and forth.  The assigned support employee asked for a code example!
     
    GlennP
    #5
    GeorgePauley
    Moderator
    • Total Posts : 1278
    • Reward points : 0
    • Joined: 2009/12/01 13:59:30
    • Location: Chandler AZ
    • Status: offline
    Re: Simulator vs. Stopwatch: Cycle Counts Wrong for btss and mov.d (dsPIC33EP32MC202). 2020/09/14 08:55:17 (permalink)
    +1 (3)
    You all are just scratching the surface of the issues with timing.

    As you have already noticed, the datasheets are often... confusing.  It's easy to miss all those footnotes.  Different versions of datasheets may have different notes.  And don't get me started on datasheet versus family reference.  As professionals (simulator developers) we SHOULD get it right.  Realistically, we're NEVER going to get it all right.

    We used to worry about this more.  But as we dug deeper and deeper we finally gave up in despair.  Different memories will have different delays for reading and writing.  Clocks drift.  Peripherals can end up having BUS collisions.  DMA brings a whole new set of timing issues.  Timing stacked interrupts is... interesting.  When we got to caching, we just gave up.
     
    For a while we thought we would just use real devices to answer our timing questions.  But as you have observed, this is a lot harder than one would think.

    These days we code towards "reasonably" accurate timing.  And yeah that's vague.  I'd describe it as attempting to do the 20% of the work that gets us 80% of the way there.  We will apply cycle times out of the datasheet, but won't spend 2 weeks trying to handle delay cycles caused by CPU vs SFR (vs out of bank SFR) etc.

    I wish we could do a better job here.  But there are real time and money constraints on the team working on a simulator that we give away for free.  The simulator is great for rapidly developing algorithms.  It's amazing at automated regression testing.  But you need to test on real devices before going to production.

    We have looked at technology that builds highly accurate simulators based on the RTL code used to create the actual silicon.  What we found was that accuracy comes with a heavy speed price.  We were getting way less than 100 instructions a second.  For most users this would be far more unacceptable than having cycle timing that is only 95% accurate.
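To give a feel for that speed price, here is a rough back-of-the-envelope sketch in Python. The 70 MIPS target rate is an assumption chosen for illustration, and 100 instructions a second is taken as an upper bound from the post above; the function name is hypothetical.

```python
def simulated_wall_time_s(target_seconds, target_ips, simulator_ips):
    """Wall-clock seconds needed to simulate target_seconds of execution."""
    return target_seconds * target_ips / simulator_ips

# One second of execution on an assumed 70 MIPS part, simulated at
# 100 instructions per second.
seconds = simulated_wall_time_s(1.0, 70e6, 100)
print(seconds)          # wall-clock seconds
print(seconds / 86400)  # the same figure in days
```

Under those assumptions, one second of target time costs 700,000 seconds of wall-clock time, which is a little over eight days.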
    #6
    GlennP
    Re: Simulator vs. Stopwatch: Cycle Counts Wrong for btss and mov.d (dsPIC33EP32MC202). 2020/09/14 10:34:30 (permalink)
    +1 (1)
    GeorgePauley ...
    We used to worry about this more.  But as we dug deeper and deeper we finally gave up in despair.  Different memories will have different delays for reading and writing.  Clocks drift.  Peripherals can end up having BUS collisions.  DMA brings a whole new set of timing issues.  Timing stacked interrupts is... interesting.  When we got to caching, we just gave up.

     
    These excuses are just that.  None of those issues apply to 1) the MPUs in question and 2) the task (which is comparing algorithms - not estimating real-world performance).
     
    Normally I consider George reasonable.  Here I do not.  And this is not 95% accurate.  The error is 50% when you say a btss (skip taken) is one cycle (it's two).  These are flat-out bugs.  If they are not going to be addressed, then get rid of the sham of cycle counts.
     
    GlennP
     
    #7
    Antipodean
    Super Member
    • Total Posts : 1999
    • Reward points : 0
    • Joined: 2008/12/09 10:19:08
    • Location: Didcot, United Kingdom
    • Status: offline
    Re: Simulator vs. Stopwatch: Cycle Counts Wrong for btss and mov.d (dsPIC33EP32MC202). 2020/09/14 11:58:40 (permalink)
    -1 (1)
    glennp17321
    GeorgePauley ...
    We used to worry about this more.  But as we dug deeper and deeper we finally gave up in despair.  Different memories will have different delays for reading and writing.  Clocks drift.  Peripherals can end up having BUS collisions.  DMA brings a whole new set of timing issues.  Timing stacked interrupts is... interesting.  When we got to caching, we just gave up.

     
    These excuses are just that.  None of those issues apply to 1) the MPUs in question and 2) the task (which is comparing algorithms - not estimating real-world performance).
    Normally I consider George reasonable.  Here I do not.  And this is not 95% accurate.  The error is 50% when you say a btss (skip taken) is one cycle (it's two).  These are flat-out bugs.  If they are not going to be addressed, then get rid of the sham of cycle counts.
    GlennP

     
    Fine, write your own simulator. Having tried to do so in the past I can tell you just getting it to simulate the instructions correctly is hard enough without even trying to deal with cycle counts.
     

    Do not use my alias in your message body when replying, your message will disappear ...

    Alan
    #8
    dan1138
    Re: Simulator vs. Stopwatch: Cycle Counts Wrong for btss and mov.d (dsPIC33EP32MC202). 2020/09/14 13:03:19 (permalink)
    0
    Antïpodean
    GeorgePauley ...
    We used to worry about this more.  But as we dug deeper and deeper we finally gave up in despair.  Different memories will have different delays for reading and writing.  Clocks drift.  Peripherals can end up having BUS collisions.  DMA brings a whole new set of timing issues.  Timing stacked interrupts is... interesting.  When we got to caching, we just gave up.

    glennp17321
    These excuses are just that.  None of those issues apply to:
    1. the MPUs in question and
    2. the task (which is comparing algorithms - not estimating real-world performance).
    Normally I consider George reasonable.  Here I do not.  And this is not 95% accurate.  The error is 50% when you say a btss (skip taken) is one cycle (it's two).  These are flat-out bugs.  If they are not going to be addressed, then get rid of the sham of cycle counts.
    GlennP

    Fine, write your own simulator. Having tried to do so in the past I can tell you just getting it to simulate the instructions correctly is hard enough without even trying to deal with cycle counts.

     
    Alan (aka Antïpodean),
     
    I disagree in that Microchip made this mess and they should clean it up or throw the simulation tool in the bin and never offer a simulator.

    This will open a space in the market for third parties to sell a proper simulator for a profit.
    #9
    GlennP
    Re: Simulator vs. Stopwatch: Cycle Counts Wrong for btss and mov.d (dsPIC33EP32MC202). 2020/09/14 17:02:54 (permalink)
    0
    dan1138
    Antïpodean
    GeorgePauley ...
    We used to worry about this more.  But as we dug deeper and deeper we finally gave up in despair.  Different memories will have different delays for reading and writing.  Clocks drift.  Peripherals can end up having BUS collisions.  DMA brings a whole new set of timing issues.  Timing stacked interrupts is... interesting.  When we got to caching, we just gave up.

    glennp17321
    These excuses are just that.  None of those issues apply to:
    1. the MPUs in question and
    2. the task (which is comparing algorithms - not estimating real-world performance).
    Normally I consider George reasonable.  Here I do not.  And this is not 95% accurate.  The error is 50% when you say a btss (skip taken) is one cycle (it's two).  These are flat-out bugs.  If they are not going to be addressed, then get rid of the sham of cycle counts.
    GlennP

    Fine, write your own simulator. Having tried to do so in the past I can tell you just getting it to simulate the instructions correctly is hard enough without even trying to deal with cycle counts.

    Alan (aka Antïpodean),
     
    I disagree in that Microchip made this mess and they should clean it up or throw the simulation tool in the bin and never offer a simulator.

    This will open a space in the market for third parties to sell a proper simulator for a profit.

     
    Mr. Anti: I agree simulators are difficult.  But getting cycle counts correct in the simple world of the 16-bit PICs is easy.  For the complicated worlds (caches, DMA, ...) just make stated assumptions to allow algorithms to be compared (for instance: No DMA activity and All Hits in L1 cache and ...).  Hardware is always the ultimate simulator but isn't easy if one's objective is to write code that runs well across a broad swath of semi-compatible sub-families.
     
    Mr. Dan: I'd leave the simulator (I think it does a fair-to-good job on results) but I suggest MCHP withdraw the cycle count (stopwatch) part as it's poor.  It's not all-or-nothing.
     
    But I started the thread with three purposes:
    0. To make sure I wasn't misunderstanding the results.
    1. To warn other users of the issues.
    2. To see if MCHP was aware of the issues and might fix them (yes and no I guess).
     
    One additional outcome was interesting.  The documentation is contradictory and confusing - even to the simulator folks at MCHP.  That too seems to be too difficult to rectify.
     
    Overall, an illuminating set of responses.
     
    GlennP
     
    Edit 1: Added the - typo.
     
    post edited by GlennP - 2020/09/15 14:34:22
    #10
    GeorgePauley
    Re: Simulator vs. Stopwatch: Cycle Counts Wrong for btss and mov.d (dsPIC33EP32MC202). 2020/09/15 10:37:06 (permalink)
    +1 (1)
    glennp17321
    ...
    These excuses are just that.  None of those issues apply to 1) the MPUs in question and 2) the task (which is comparing algorithms - not estimating real-world performance).
     
    Normally I consider George reasonable.  Here I do not.  And this is not 95% accurate.  The error is 50% when you say a btss (skip taken) is one cycle (it's two).  These are flat-out bugs.  If they are not going to be addressed, then get rid of the sham of cycle counts.

     
    I knew this post would generate some heat.

    Unfortunately I don't always have time to craft the perfect response.  But I do try to respond and inform.  Even (especially) when the answer is potentially an unpopular one.

    The point I wanted to make is that one cannot rely on cycle timing accuracy within the simulator.

    I then tried to explain why Microchip and the simulator team had taken this position.  That explanation was probably a mistake, and I certainly could have done a better job.  My thought is that the more users understand about the simulator, including underlying design decisions, the more likely the simulator can be used effectively.

    Yes, cycle accuracy is easier for 8-bit devices, harder for 16-bit devices, and outrageously difficult for 32-bit devices.  When we realized that we couldn't be accurate with 32-bit devices, and that the simulator was still useful despite that, it caused us to rethink the importance of cycle accuracy.  While accuracy is certainly desirable, it simply, and demonstrably, isn't the most important design consideration.

    I have written a JIRA to address the BTSS cycle accuracy, and we will likely fix it at some point in the future.

    But for now we have multiple device families and architectures that have no simulator support at all, and critical, often used, peripherals are missing across all device families.  Those issues should, and will, be addressed before the BTSS cycle issue.
    post edited by GeorgePauley - 2020/09/16 08:11:20
    #11
    GeorgePauley
    Re: Simulator vs. Stopwatch: Cycle Counts Wrong for btss and mov.d (dsPIC33EP32MC202). 2020/09/15 11:03:20 (permalink)
    +1 (1)
    dan1138
    This will open a space in the market for third parties to sell a proper simulator for a profit.



    I love this statement as it touches upon so many issues.

    There are many at Microchip (myself included) who have considered discontinuing the simulator.  It is expensive, and we keep sacrificing accuracy and features in the race to support ever more devices and architectures.  But the truth is that the simulator, despite its flaws, is very useful, and is used by (literally) tens of thousands of our customers. 

    Microchip has always welcomed and supported third party tools.  And we would enthusiastically cheer a third party simulator. (Certainly Microchip isn't going to lose any money to a third party simulator.)  And there are a few simulators out there, but they typically only support a small subset of the devices, and their features, and don't integrate with MPLAB X.
     
    Perhaps providing a free simulator does act as a barrier to a third party provider entering the market.  But we can't just abandon the multitude of existing users who are right now using the simulator to advance their projects.  So we keep on keeping on, developing the simulator the best we can with the resources we have.  And the longer we keep adding to the simulator, the harder and more expensive it becomes to replace it (for either Microchip or a third party.)




    #12
    GlennP
    Re: Simulator vs. Stopwatch: Cycle Counts Wrong for btss and mov.d (dsPIC33EP32MC202). 2020/09/16 12:41:58 (permalink)
    +1 (1)
    GeorgePauley ...
    I have written a JIRA to address the BTSS cycle accuracy, and we will likely fix it at some point in the future.
    ...



    The mov.d is an even easier fix (only one path).  Might you ask for that one too?
     
    GP
    #13
    GeorgePauley
    Re: Simulator vs. Stopwatch: Cycle Counts Wrong for btss and mov.d (dsPIC33EP32MC202). 2020/09/17 08:16:13 (permalink)
    +1 (1)
    mov.d is in the same JIRA, which is linked to this forum thread.  We'll examine the related instructions to make sure we fix them all.
    #14