CAN bus with some nodes not responding

Author
Mamonetti
Starting Member
  • Total Posts : 32
  • Reward points : 0
  • Joined: 2006/09/11 13:46:32
  • Location: 0
  • Status: offline
2018/07/27 01:37:10 (permalink)
4 (1)

CAN bus with some nodes not responding

Hi


I'm developing a CANopen master, which is going to manage a number of CANopen slaves (one or more).


Regardless of other details of the protocol, it uses a master / multislave topology, where the master can send commands such as RESET or START to slaves based on their addresses, encoded in the SID.
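As a minimal sketch of how addressing usually looks in CANopen: in the predefined connection set, the 11-bit identifier is a 4-bit function code followed by a 7-bit node ID. NMT commands such as START or RESET are a special case: they use identifier 0x000 and carry the command and the target node ID in two data bytes (node ID 0 addresses all nodes). The `can_frame_t` type and the function names below are hypothetical, for illustration only.

```c
#include <stdint.h>

/* Function codes of the CANopen predefined connection set
   (upper 4 bits of the 11-bit identifier). */
enum { FC_NMT = 0x0, FC_EMCY = 0x1, FC_TPDO1 = 0x3,
       FC_SDO_TX = 0xB, FC_SDO_RX = 0xC, FC_HEARTBEAT = 0xE };

/* NMT command specifiers (first data byte of an NMT frame). */
enum { NMT_START = 0x01, NMT_STOP = 0x02, NMT_ENTER_PREOP = 0x80,
       NMT_RESET_NODE = 0x81, NMT_RESET_COMM = 0x82 };

/* Hypothetical frame container, not a PIC32 message buffer layout. */
typedef struct {
    uint16_t sid;      /* 11-bit standard identifier */
    uint8_t  dlc;      /* number of data bytes */
    uint8_t  data[8];
} can_frame_t;

/* Identifier = function code (4 bits) << 7 | node ID (7 bits). */
uint16_t canopen_cob_id(uint8_t function_code, uint8_t node_id)
{
    return (uint16_t)(((function_code & 0x0Fu) << 7) | (node_id & 0x7Fu));
}

/* NMT command: identifier 0x000, data = { command, node ID }. */
can_frame_t canopen_nmt_frame(uint8_t command, uint8_t node_id)
{
    can_frame_t f = { .sid = 0x000, .dlc = 2, .data = { command, node_id } };
    return f;
}
```

For example, `canopen_cob_id(FC_SDO_RX, 5)` gives 0x605, the identifier a master would use for an SDO request to node 5.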

As far as I know, on a CAN bus a message sent by a node must be acknowledged by at least one other node. In some cases I've seen the master's TX FIFO full of messages waiting to be sent, apparently because no ACK has been received, or that's what it looks like.

A similar scenario arises if I configure a slave in auto baud rate mode: the slave usually needs to analyse several packets to detect the baud rate, and if there's just one packet on the bus while the TX FIFO is blocked, no detection can be performed.

So, in order to prevent the master from getting blocked when a slave doesn't respond, my idea is to use a 1-message TX FIFO and abort the current transfer if I detect a timeout. When this happens, the steps I think I should follow are (PIC32MX):
1. Set CxCON.ABAT
2. Wait until CxCON.ABAT is 0 (the transfer has been aborted)
3. Set CxFIFOCONy.FRESET
4. Wait until CxFIFOCONy.FRESET is 0 (the FIFO has been reset)
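The four steps above can be sketched as follows. This is a host-runnable model, not device code: `can_bits_t` is a hypothetical stand-in for the real CxCON.ABAT and CxFIFOCONy.FRESET bits, and the `sim_hw_*` hooks simulate the peripheral clearing (or failing to clear) those bits. The bounded polling is an addition, so the caller is never stuck in a wait loop if the hardware does not respond.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the two bits named in the steps above. On the
   target these would be the real CxCON.ABAT and CxFIFOCONy.FRESET bits. */
typedef struct {
    volatile bool abat;
    volatile bool freset;
} can_bits_t;

/* Called once per poll. On hardware this could be a short delay; on the
   host it lets a simulation clear the bits as the peripheral would. */
typedef void (*poll_hook_t)(can_bits_t *bits);

/* Simulated peripheral that completes every request. */
void sim_hw_responds(can_bits_t *bits) { bits->abat = false; bits->freset = false; }

/* Simulated peripheral that never completes a request. */
void sim_hw_stuck(can_bits_t *bits) { (void)bits; }

/* Poll until *bit reads false, giving up after max_polls attempts. */
static bool wait_until_clear(can_bits_t *bits, volatile bool *bit,
                             poll_hook_t hook, uint32_t max_polls)
{
    for (uint32_t i = 0; i < max_polls; i++) {
        if (!*bit)
            return true;
        hook(bits);
    }
    return !*bit;
}

/* Returns true if both the abort and the FIFO reset completed in time. */
bool can_abort_and_reset_fifo(can_bits_t *bits, poll_hook_t hook, uint32_t max_polls)
{
    bits->abat = true;                                          /* step 1 */
    if (!wait_until_clear(bits, &bits->abat, hook, max_polls))  /* step 2 */
        return false;
    bits->freset = true;                                        /* step 3 */
    return wait_until_clear(bits, &bits->freset, hook, max_polls); /* step 4 */
}
```

A `false` return tells the upper layer that the abort did not complete, which is the point where a heavier recovery would have to be considered.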

Using a 1-message FIFO would make it easy to handle all the pointer stuff and would allow me to reset the FIFO without having to pay attention to which slave node is the one not working properly or what happens with the nodes that already have messages enqueued. This would need some extra control from upper layers, I know, but that's not a problem.

And the questions:
- Am I right when I say the FIFO is not consuming packets as a consequence of not receiving an ACK?
- Is this the right way to abort a transfer?
- What is expected to happen with the message being sent? Is it going to stay on the bus "forever"?

Regards
 
post edited by Mamonetti - 2018/07/27 06:56:44
#1


    DarioG
    Allmächtig.
    • Total Posts : 54081
    • Reward points : 0
    • Joined: 2006/02/25 08:58:22
    • Location: Oesterreich
    • Status: offline
    Re: CAN bus with some nodes not responding (bring it here) 2018/07/27 02:36:07 (permalink)
    0
    no admin, no hope.

    GENOVA :D :D ! GODO
    #2
    jcandle
    Super Member
    • Total Posts : 344
    • Reward points : 0
    • Joined: 2011/09/19 22:01:53
    • Location: Rockledge, FL
    • Status: offline
    Re: CAN bus with some nodes not responding 2018/07/27 20:47:58 (permalink)
    3 (1)
    I have not used the CAN on this chip yet, but 
    1) if the transmitting node wins the arbitration in the MSID, which it will if it is the only node transmitting or if it has the first zero bit in the MSID, it finishes transmission.
    2) if no one acks, it retries (at hw layer) intermittently.
    3) After some number of un ack'ed transmissions, it reaches bus light and then bus heavy error states and eventually takes itself off the bus.
    4) If you are sending to autobaud, that one packet will chirp until a recipient syncs up and ack's.
    5) I *think* an ack only happens if a recipient likes that msid in its filters... so make everyone have a filter that accepts the autobaud msid and then have whatever other filters.
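For reference on point 3, here is a minimal sketch of the fault confinement states defined by the CAN protocol: a node is error-active while both error counters are at most 127, error-passive once either exceeds 127, and bus-off once the transmit error counter exceeds 255. The second function models a lone transmitter that never gets an ACK: each ACK error adds 8 to the transmit counter, but once the node is error-passive the protocol stops counting ACK errors, so such a node is expected to stay error-passive rather than go bus-off. The function names are made up for this example.

```c
#include <stdint.h>

typedef enum { CAN_ERROR_ACTIVE, CAN_ERROR_PASSIVE, CAN_BUS_OFF } can_error_state_t;

/* Classify the node state from the transmit (tec) and receive (rec)
   error counters, following the standard thresholds. */
can_error_state_t can_error_state(uint16_t tec, uint16_t rec)
{
    if (tec > 255)
        return CAN_BUS_OFF;
    if (tec > 127 || rec > 127)
        return CAN_ERROR_PASSIVE;
    return CAN_ERROR_ACTIVE;
}

/* Transmit error counter of a node that is alone on the bus after it
   has tried to send `unacked_frames` times without any ACK. */
uint16_t lone_transmitter_tec(uint32_t unacked_frames)
{
    uint16_t tec = 0;
    for (uint32_t i = 0; i < unacked_frames; i++) {
        if (can_error_state(tec, 0) == CAN_ERROR_ACTIVE)
            tec = (uint16_t)(tec + 8);  /* ACK error while error-active */
        /* ACK error while error-passive: counter is left unchanged */
    }
    return tec;
}
```

In this model the counter stops at 128 after 16 failed attempts, no matter how many more retries follow.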
     
    That said, set the system up with a low fixed baud rate and prove that out.  Then speed it up to understand your topology limits.  Find a CAN calculator to make sure you have the best bit timing for your cable lengths etc.  Only then should you mess with autobauding and with devices coming and going.
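As a rough illustration of what a CAN calculator works out, the sketch below derives a prescaler and a sample point. It assumes the relation bit rate = f_sys / (2 * (BRP + 1) * N), with N the number of time quanta per bit, and a 6-bit BRP field; both should be checked against the reference manual of the actual device. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Find BRP for an exact bit rate, assuming
   bit_rate = f_sys_hz / (2 * (BRP + 1) * tq_per_bit).
   Returns false if no exact setting exists in the assumed 6-bit range. */
bool can_find_brp(uint32_t f_sys_hz, uint32_t bit_rate,
                  uint32_t tq_per_bit, uint32_t *brp_out)
{
    if (bit_rate == 0 || tq_per_bit == 0)
        return false;
    uint32_t divisor = 2u * bit_rate * tq_per_bit;
    if (f_sys_hz % divisor != 0)
        return false;                       /* bit rate not exactly reachable */
    uint32_t prescale = f_sys_hz / divisor; /* equals BRP + 1 */
    if (prescale < 1 || prescale > 64)
        return false;                       /* outside assumed 6-bit BRP field */
    *brp_out = prescale - 1;
    return true;
}

/* Sample point in per mille: the sample is taken after the sync segment
   (1 TQ), the propagation segment and phase segment 1. */
uint32_t can_sample_point_permille(uint32_t prop, uint32_t ph1, uint32_t ph2)
{
    uint32_t before = 1 + prop + ph1;
    return 1000u * before / (before + ph2);
}
```

Under these assumptions, an 80 MHz system clock with 10 time quanta per bit at 500 kbit/s gives BRP = 7, and segments of 3 + 3 + 3 time quanta put the sample point at 70 %.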
    #3
    Mamonetti
    Starting Member
    • Total Posts : 32
    • Reward points : 0
    • Joined: 2006/09/11 13:46:32
    • Location: 0
    • Status: offline
    Re: CAN bus with some nodes not responding 2018/07/28 13:46:36 (permalink)
    3 (1)
    According to this, it looks like the master keeps retrying the same packet and eventually gets disconnected from the bus. This would be consistent with the fact that the TX FIFO keeps filling up until it's full (I noticed this with a 32-message FIFO: I was inserting messages whenever the FIFO wasn't full, and for some time I could see the FIFO neither full nor empty, but as the packets weren't being sent it ended up full).
     
    The question here would be how to fully reset the CAN controller in order to start from scratch when this problem is detected. I can tell that switching from normal operation mode to disable mode and back to normal operation doesn't do the trick. I'll try to combine this with other FIFO-related actions to see whether I can bring the bus back to life or not.
     
    Regards
    post edited by Mamonetti - 2018/07/29 06:12:17
    #4
    jcandle
    Super Member
    • Total Posts : 344
    • Reward points : 0
    • Joined: 2011/09/19 22:01:53
    • Location: Rockledge, FL
    • Status: offline
    Re: CAN bus with some nodes not responding 2018/07/30 04:16:27 (permalink)
    3 (1)
    Really, the problem is how to stop the problem from happening.  Again, you have some complications like autobauding that should be gotten rid of first to test the basic physical layer and node filters, etc.
     
    Just resetting the CAN and starting over is the firmware equivalent of the definition of insanity - doing the same thing over and over but expecting a different result.
     
    Yes, once you know it usually works, then resetting makes sense in the anomalous error case.
    #5
    crosland
    Super Member
    • Total Posts : 1284
    • Reward points : 0
    • Joined: 2005/05/10 10:55:05
    • Location: Bucks, UK
    • Status: offline
    Re: CAN bus with some nodes not responding 2018/07/30 04:29:28 (permalink)
    3 (1)
    A slave doing autobaud detection must be in listen only mode.
     
    There must be at least two other CAN nodes operating normally on the bus.
     
    The autobaud'ing slave can then listen to (but plays no part in) the messages on the bus until it receives a valid message and determines it has chosen the correct baud rate. Only then can it start interacting as normal on the bus.
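A minimal sketch of that listen-only scan: try each candidate bit rate in turn and keep the first one at which a valid frame is seen. The `listen_fn` hook is hypothetical; on a real node it would put the controller in listen-only mode at the given rate, wait for a while, and report whether a frame was received without error. The two `sim_listen_*` functions are host-side stand-ins for a bus running at 250 kbit/s and for a silent bus.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hook: listen (without transmitting or acknowledging) at
   bit_rate and return true if a valid frame was received. */
typedef bool (*listen_fn)(uint32_t bit_rate);

/* Returns the first candidate rate at which traffic was decoded,
   or 0 if none matched (for example because the bus is silent). */
uint32_t can_autobaud_scan(const uint32_t *candidates, size_t count, listen_fn listen)
{
    for (size_t i = 0; i < count; i++) {
        if (listen(candidates[i]))
            return candidates[i];
    }
    return 0;
}

/* Simulation: other nodes are exchanging frames at 250 kbit/s. */
bool sim_listen_250k(uint32_t bit_rate) { return bit_rate == 250000u; }

/* Simulation: no acknowledged traffic on the bus at all. */
bool sim_listen_silent(uint32_t bit_rate) { (void)bit_rate; return false; }
```

The scan can only succeed if valid frames are actually on the bus while the node listens, which is why the other nodes must already be communicating.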
    #6
    Mamonetti
    Starting Member
    • Total Posts : 32
    • Reward points : 0
    • Joined: 2006/09/11 13:46:32
    • Location: 0
    • Status: offline
    Re: CAN bus with some nodes not responding 2018/07/30 07:36:57 (permalink)
    0
    How about the master itself being the only one sending data to the bus in order for the slave (assuming there's just one) to detect the baudrate by listening to the master packets? Is this really possible based on the CAN hardware layer which is going to be waiting for an ACK that it's not going to receive? If the answer is no, I'll have to set a well known slave speed.
     
    On the other hand, I would also like to know how to recover from run-time slave malfunctions or disconnections. That is, let's say the baudrate problem is solved (maybe by disabling the auto baudrate) and for some reason a slave doesn't respond for some time. Detecting the problem is easy, but I think I tried to follow the steps I mentioned in the first post to abort the current transfer and I couldn't clean the TX FIFO.
     
    So, is this sequence ok? How can I make sure the FIFO has gone back to square 1? I remember having tried this sequence but the TXEMPTY flag wasn't set, so the FIFO still had some message pending.
     
    Regards
     
    #7
    crosland
    Super Member
    • Total Posts : 1284
    • Reward points : 0
    • Joined: 2005/05/10 10:55:05
    • Location: Bucks, UK
    • Status: offline
    Re: CAN bus with some nodes not responding 2018/07/30 10:01:58 (permalink)
    0
    Mamonetti
    How about the master itself being the only one sending data to the bus in order for the slave (assuming there's just one) to detect the baudrate by listening to the master packets? Is this really possible based on the CAN hardware layer which is going to be waiting for an ACK that it's not going to receive? If the answer is no, I'll have to set a well known slave speed.

     
    re-read what I wrote.
     
    You must have *at least* two working modules. If you only have two, why are you even bothering with CAN?
     
    On the other hand, I would also like to know how to recover from run-time slave malfunctions or disconnections. That is, let's say the baudrate problem is solved (maybe by disabling the auto baudrate) and for some reason a slave doesn't respond for some time. Detecting the problem is easy, but I think I tried to follow the steps I mentioned in the first post to abort the current transfer and I couldn't clean the TX FIFO.

     
    A malfunction could take the whole bus down, then you are stuck.
     
    If you have at least two slaves then one of the slaves can be replaced with a new slave that can do autobaud detection by listening to the traffic between the master and the other slave. Once the correct baud rate is determined the new slave can announce itself. How it does that and how you detect the failed slave is something for your chosen protocol or application.
     
     
     
     
    #8
    jcandle
    Super Member
    • Total Posts : 344
    • Reward points : 0
    • Joined: 2011/09/19 22:01:53
    • Location: Rockledge, FL
    • Status: offline
    Re: CAN bus with some nodes not responding 2018/07/30 14:28:02 (permalink)
    3 (1)
    crosland:
    You must have *at least* two working modules. If you only have two, why are you even bothering with CAN?
    CAN is a hardware layer FIFO between two processors allowing buffered, async transfer of small data messages.  I use it 'transceiverless' on some boards just for this purpose.
     
    #9
    DarioG
    Allmächtig.
    • Total Posts : 54081
    • Reward points : 0
    • Joined: 2006/02/25 08:58:22
    • Location: Oesterreich
    • Status: offline
    Re: CAN bus with some nodes not responding 2018/07/30 14:29:52 (permalink)
    0
    interesting :)

    GENOVA :D :D ! GODO
    #10
    Mamonetti
    Starting Member
    • Total Posts : 32
    • Reward points : 0
    • Joined: 2006/09/11 13:46:32
    • Location: 0
    • Status: offline
    Re: CAN bus with some nodes not responding 2018/07/31 01:02:31 (permalink)
    0
    I'll switch to fixed baudrate then, just in case there's only one slave.
     
    However, I have detected a few times a slave not responding and the FIFO getting full in a single slave configuration. That's something I didn't expect to happen considering I'm communicating with a commercial CANopen slave, but it happens from time to time.
     
    The problem normally appears (randomly) when I switch the slave on and wait for some time before switching the master on. By some time I mean 10 or 15 minutes.
     
    I sent a request to the manufacturer support team, and they sent me a video simulating this scenario with one of their commercial masters being able to communicate with the slave.
     
    So, I'll plug a third board in listen mode to see what's really going on. That's why I was trying to find out what happens when there's a node malfunction. I believe this should be quite similar from the master's point of view to my test scenario, that is, the master somehow not receiving the ACK.
     
    Regards
     
    #11
    jcandle
    Super Member
    • Total Posts : 344
    • Reward points : 0
    • Joined: 2011/09/19 22:01:53
    • Location: Rockledge, FL
    • Status: offline
    Re: CAN bus with some nodes not responding 2018/07/31 06:21:59 (permalink)
    0
    I forget if a slave will ack a packet that does not meet its acceptance filters...
    #12
    DarioG
    Allmächtig.
    • Total Posts : 54081
    • Reward points : 0
    • Joined: 2006/02/25 08:58:22
    • Location: Oesterreich
    • Status: offline
    Re: CAN bus with some nodes not responding 2018/07/31 06:41:21 (permalink)
    0
    should not, if I'm not wrong

    GENOVA :D :D ! GODO
    #13
    Mamonetti
    Starting Member
    • Total Posts : 32
    • Reward points : 0
    • Joined: 2006/09/11 13:46:32
    • Location: 0
    • Status: offline
    Re: CAN bus with some nodes not responding 2018/11/09 04:23:48 (permalink)
    0
    After some time, I've noticed a similar problem with a slightly different situation:
    - I have just the master and one slave.
    - The master is connected to a UPS (uninterruptible power supply) while the slave is not.
    - There is a temporary power failure during which the slave is reset but the master isn't.
    - By the time the slave is back, the master's TX FIFO is full (TXNFULLIF is active) and remains like that even though the slave is up again (it should acknowledge every message sent by the master at CAN level, which would eventually drain all the messages queued by the master). At this point the slave can send messages to the master, which shows that the bus is operational, but for some reason the master can't get out of this blocked state.
     
    So, is there any effective way to abort all the transmissions for a FIFO? I've followed these steps:
    - CxCON.ABAT = 1
    - Wait until CxCON.ABAT == 0
    - CxFIFOCONy.FRESET = 1
    - Wait until CxFIFOCONy.FRESET == 0
     
    Is this the right procedure? As you can imagine, and based on these specific conditions, I'd like the whole system to be able to get back to normal operation by itself, without having to reboot the master.
     
    Regards
     
    #14
    crosland
    Super Member
    • Total Posts : 1284
    • Reward points : 0
    • Joined: 2005/05/10 10:55:05
    • Location: Bucks, UK
    • Status: offline
    Re: CAN bus with some nodes not responding 2018/11/09 05:30:16 (permalink)
    0
    Does your firmware on the master monitor ALL the CAN error conditions?
     
    What do you do when it goes error-passive or bus-off?
     
    With no other working CAN modules does it even see the necessary bits to recover from bus-off?
    #15
    jcandle
    Super Member
    • Total Posts : 344
    • Reward points : 0
    • Joined: 2011/09/19 22:01:53
    • Location: Rockledge, FL
    • Status: offline
    Re: CAN bus with some nodes not responding 2018/11/09 06:00:24 (permalink)
    0
    While your application may think a node is a master, CAN is physically peer to peer.  Your master is no more special than any other device and may enter bus heavy or other error states depending on conditions outside of its control.  There may be cases where a complete reset of the peripheral is the only sane response.
    #16
    Jim Nickerson
    User 452
    • Total Posts : 5447
    • Reward points : 0
    • Joined: 2003/11/07 12:35:10
    • Location: San Diego, CA
    • Status: online
    Re: CAN bus with some nodes not responding 2018/11/09 07:05:45 (permalink)
    0
    I find it of great use to have a functioning CAN device to debug my CAN devices.
    I like the https://www.peak-system.com/PCAN-USB.199.0.html
    #17
    Mamonetti
    Starting Member
    • Total Posts : 32
    • Reward points : 0
    • Joined: 2006/09/11 13:46:32
    • Location: 0
    • Status: offline
    Re: CAN bus with some nodes not responding 2018/11/12 08:56:01 (permalink)
    0
    crosland
    Does your firmware on the master monitor ALL the CAN error conditions?
     
    What do you do when it goes error-passive or bus-off?
     
    With no other working CAN modules does it even see the necessary bits to recover from bus-off?




    As you said, it had to do with the master's transmitter being in the error-passive state (Transmitter in Error State Bus Passive, CxTREC.TXBP = 1). Based on this, I'll monitor TXBO, TXBP and RXBP so that I can catch this condition.
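A minimal sketch of that monitoring, working on a raw 32-bit snapshot of CxTREC. The bit positions used here (RXBP = bit 19, TXBP = bit 20, TXBO = bit 21, receive counter in bits 7:0, transmit counter in bits 15:8) are an assumption and must be checked against the device datasheet; the enum and function names are made up for the example.

```c
#include <stdint.h>

/* Assumed CxTREC layout; verify against the datasheet before use. */
#define TREC_RERRCNT(v) ((uint8_t)((v) & 0xFFu))         /* receive error counter  */
#define TREC_TERRCNT(v) ((uint8_t)(((v) >> 8) & 0xFFu))  /* transmit error counter */
#define TREC_RXBP (1u << 19)  /* receiver in error-passive state    */
#define TREC_TXBP (1u << 20)  /* transmitter in error-passive state */
#define TREC_TXBO (1u << 21)  /* transmitter in bus-off state       */

typedef enum { CAN_HEALTH_OK, CAN_HEALTH_PASSIVE, CAN_HEALTH_BUS_OFF } can_health_t;

/* Reduce a CxTREC snapshot to the worst condition it reports. */
can_health_t can_health_from_trec(uint32_t trec)
{
    if (trec & TREC_TXBO)
        return CAN_HEALTH_BUS_OFF;
    if (trec & (TREC_TXBP | TREC_RXBP))
        return CAN_HEALTH_PASSIVE;
    return CAN_HEALTH_OK;
}
```

On the target, the master could call `can_health_from_trec()` with the current register value from its main loop or from the CAN error interrupt, and start recovery when the result is not `CAN_HEALTH_OK`.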
     
    The way I've found to fix this is following these steps:
    - Set Disable Mode (CxCON.REQOP = 1).
    - Switch the CAN module off (CxCON.ON = 0).
    - Switch it back on (CxCON.ON = 1).
    - Go back to Normal Operation Mode (CxCON.REQOP = 0).
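The four steps above can be sketched as a host-runnable model. `can_module_t` is a hypothetical stand-in for the CxCON fields and the TXBP flag, not the real register layout. Two things are additions to the steps as posted: the code waits, with a poll limit, until the reported mode (OPMOD) matches the requested one, and the simulated hardware assumes that switching the module off clears the error-passive flag, which matches what is reported here but should be confirmed on the device.

```c
#include <stdbool.h>
#include <stdint.h>

enum { CAN_MODE_NORMAL = 0, CAN_MODE_DISABLE = 1 };  /* REQOP/OPMOD values used above */

/* Hypothetical model of the relevant CxCON fields plus CxTREC.TXBP. */
typedef struct {
    volatile uint8_t reqop;  /* requested operating mode */
    volatile uint8_t opmod;  /* mode reported by the module */
    volatile bool    on;     /* module enable */
    volatile bool    txbp;   /* transmitter error-passive flag */
} can_module_t;

typedef void (*poll_hook_t)(can_module_t *m);

/* Simulation: the module follows mode requests while enabled and
   (assumption) drops its error state while switched off. */
void sim_hw_responds(can_module_t *m)
{
    if (m->on)
        m->opmod = m->reqop;
    else
        m->txbp = false;
}

/* Simulation: the module ignores every request. */
void sim_hw_stuck(can_module_t *m) { (void)m; }

static bool wait_for_mode(can_module_t *m, uint8_t mode,
                          poll_hook_t hook, uint32_t max_polls)
{
    for (uint32_t i = 0; i < max_polls; i++) {
        if (m->opmod == mode)
            return true;
        hook(m);
    }
    return m->opmod == mode;
}

/* Returns true if the module is back in normal mode afterwards. */
bool can_power_cycle(can_module_t *m, poll_hook_t hook, uint32_t max_polls)
{
    m->reqop = CAN_MODE_DISABLE;                              /* disable mode */
    if (!wait_for_mode(m, CAN_MODE_DISABLE, hook, max_polls))
        return false;
    m->on = false;                                            /* module off */
    hook(m);                                                  /* let it settle */
    m->on = true;                                             /* module on */
    m->reqop = CAN_MODE_NORMAL;                               /* normal mode */
    return wait_for_mode(m, CAN_MODE_NORMAL, hook, max_polls);
}
```

The poll limit keeps the master from hanging inside the recovery routine if the module never reaches the requested mode.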
     
    It looks like there's no way to get rid of error-passive or bus-off other than by doing it the hard way.
     
    Thanks and regards
     
    post edited by Mamonetti - 2018/11/12 11:45:40
    #18