IP Instance stops working SSP 1.7.12

Hi.

On our proprietary board we have two Ethernet interfaces (one for the LAN and one for the WAN). After a random time, anywhere from minutes to days, the interfaces stop working (mainly the WAN one, which runs a Telnet server). Once an interface has crashed, it does not even respond to ping.

I tried to debug the problem, but unfortunately e2 studio crashes after a while. What I have seen is that the IP helper threads keep running even when the interfaces no longer respond to ping.

I have now also enabled TraceX, hoping to learn something from it. What do you think I should focus on?

Best Regards

Paolo

  • Hi Paolo,

    Have you tried using Wireshark to see at which point the interface stops responding to pings? That might help narrow down your search.

    Seemingly random crashes can often be attributed to stack or buffer overflows. Perhaps that would be the place to start with TraceX: see if you can spot any buffers or stacks that keep growing over time, as this could help identify the source of the problem.

    Hopefully others in the forum will offer additional advice. This is always a tricky type of bug to chase down.

  • Hi Warren.

    I have done several tests. As you suggested, I also tried sniffing with Wireshark. When the interface stops working, it does not even respond to ARP requests (the attached file contains two ping tests: one to 192.168.1.33, which works, and one to 192.168.1.34, which is the port that is blocked in this case). I have also attached screenshots of the RTOS Resources windows.

    Every now and then the Ethernet port starts working again for a while. From what I understand, the stack is not broken and the threads keep running. I have also attached the TraceX files captured when it works and when it crashes (but I can't make much sense of them).

    Thanks

    Paolo

    Ethernet problem.zip

  • Hi,

    How are the packet pools configured? It sounds like you could be running out of packet space, which would cause the hangs. If you are handling the packets at the application level (i.e. not using one of the NetX applications), then it is up to the application to release received packets. If this is not done in a timely manner, or there are not enough packets available, the network will hang waiting for packet space to become available. It is also worth considering separate packet pools for the different applications using the network, so that one application cannot consume most of the packet space and starve other parts of the system.

    Use the API below to check the state of the packet pools, which may help determine whether this is the problem.

    nx_packet_pool_info_get()

    Regards,

    Ian.
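A minimal sketch of how the counters reported by nx_packet_pool_info_get() could be interpreted once they have been read. The struct and helper names (pool_snapshot_t, pool_in_use, pool_ran_empty_since) are hypothetical and not part of NetX; on target, the fields would be filled from the five counters that nx_packet_pool_info_get() returns. That call is deliberately left out here so the example stays self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the five counters that
 * nx_packet_pool_info_get() reports for one packet pool. */
typedef struct {
    uint32_t total_packets;
    uint32_t free_packets;
    uint32_t empty_pool_requests;
    uint32_t empty_pool_suspensions;
    uint32_t invalid_packet_releases;
} pool_snapshot_t;

/* Packets currently held by the stack, the driver, or the application. */
uint32_t pool_in_use(const pool_snapshot_t *s)
{
    assert(s->free_packets <= s->total_packets);
    return s->total_packets - s->free_packets;
}

/* True if at least one allocation found the pool empty between two samples.
 * The counter only ever increases, so any change means a new exhaustion. */
bool pool_ran_empty_since(const pool_snapshot_t *prev,
                          const pool_snapshot_t *now)
{
    return now->empty_pool_requests != prev->empty_pool_requests;
}
```

Sampling the pool periodically and comparing consecutive snapshots shows whether exhaustion is still happening or only happened once in the past. A non-zero empty-request count next to a healthy free count, as in a pool reporting 64 total and 48 free packets, means the pool was empty at some earlier moment but is not empty at the time of the sample.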

  • Hi Ian.

    I also have the packet pool statistics from when the Ethernet interface crashes.

    WAN:
            Total packets sent: 15627
            Total bytes sent: 8752012
            Send packets dropped: 1449
            Total packets received: 2908634
            Total bytes received: 7306566
            Received packets dropped: 2853380
            Received checksum errors: 0
            Invalid packets: 9
            Total fragments sent: 0
            Total fragments received: 0
            Total pool packets: 64
            Free packets pool: 48
            Empty pool request: 39
            Empty pool suspensions: 0
            Invalid packets release: 0

    Sniffing the Ethernet port shows that the ARP requests arrive, but the machine does not respond (see attached image).

    Any ideas?

    Thanks

    Paolo

    NOTE: I am now testing the system on its own (before, it was connected to the corporate LAN) to see whether it still crashes. Even so, I need to make it work in every scenario.
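To put the WAN counters above into proportion, here is a small worked calculation of the receive drop ratio. The function name drop_permille is hypothetical, and the calculation assumes that dropped packets are counted within "Total packets received", which the thread does not confirm.

```c
#include <assert.h>
#include <stdint.h>

/* Dropped packets per thousand received packets, rounded down.
 * Assumption: 'dropped' is a subset of 'total'. */
uint32_t drop_permille(uint32_t dropped, uint32_t total)
{
    assert(dropped <= total);
    if (total == 0U) {
        return 0U; /* nothing received yet, so no ratio to report */
    }
    /* Widen to 64 bits so the multiplication cannot overflow. */
    return (uint32_t)(((uint64_t)dropped * 1000U) / total);
}
```

With the posted figures, drop_permille(2853380, 2908634) evaluates to 981, so roughly 98% of the received packets were dropped and only 55254 were accepted. Under the stated assumption, that points at the receive path rather than at the 64-packet pool, which still had 48 free packets at the time.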

  • Hi Paolo,

    Same problem here with SSP 1.7.0. What I have found is that on a network with very intense traffic, the Ethernet interface stops receiving packets after a while. Note that my network has a lot of broadcast and multicast packets moving around (I don't know exactly why), with cyclic peaks of thousands of packets per second.

    (As an additional data point about my network: the ENC28J60 PHY in another device also locks up under this intense traffic, but in that case it is a known issue with that component.)

    The problem first arose on a custom board, so I moved to a PK-S5D9 board to see whether it also happens there. It does: it takes a little longer, but the interface blocks in the end.

    Since it happens on both devices (with different PHYs), my conclusion is that the cause is something in SSP.

    What I've found:

    1 - When the board stops replying to ping requests, data is still arriving at the Ethernet stack; you can see the packets updating the memory pool area.

    2 - For an unknown reason, 'packet_type' seems to be corrupted, so new packets are not processed by the stack.

    From 'nx_renesas_synergy.c':1245

        /** Route the incoming packet according to its Ethernet type. */
    #ifdef FEATURE_NX_IPV6
        if (packet_type == NX_ETHERNET_IP || packet_type == NX_ETHERNET_IPV6)
    #else
        if (packet_type == NX_ETHERNET_IP)
    #endif
        {
            _nx_ip_packet_deferred_receive(nx_rec_ptr->ip_ptr, packet_ptr);
        }
        else if (packet_type == NX_ETHERNET_ARP)
        {
            _nx_arp_packet_deferred_receive(nx_rec_ptr->ip_ptr, packet_ptr);
        }
        else if (packet_type == NX_ETHERNET_RARP)
        {
            _nx_rarp_packet_deferred_receive(nx_rec_ptr->ip_ptr, packet_ptr);
        }
        else
        {
            /** Call the callback for unsupported packet type, if defined. */
            if ((NULL != nx_rec_ptr->p_callback_rec) && (NULL != nx_rec_ptr->p_callback_rec->nx_ether_unknown_packet_receive_callback))
            {
                nx_rec_ptr->p_callback_rec->nx_ether_unknown_packet_receive_callback(packet_ptr, packet_type);
            }
            else
            {
                /** If Ethernet header id invalid, release the packet. */
                nx_packet_release(packet_ptr); /* << ALWAYS GETTING HERE AFTER THE FAILURE */
            }
        }
        return;
    }
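One way to check whether 'packet_type' is really corrupted is to decode the type field of a few captured frames independently and compare it with what the driver sees. The sketch below is not taken from nx_renesas_synergy.c; the function names are hypothetical, the EtherType constants are the standard assigned values, and frames are assumed to be untagged Ethernet II unless the type field says otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETHERTYPE_IPV4 0x0800U
#define ETHERTYPE_ARP  0x0806U
#define ETHERTYPE_RARP 0x8035U
#define ETHERTYPE_VLAN 0x8100U /* 802.1Q tag; the real type follows 4 bytes later */
#define ETHERTYPE_IPV6 0x86DDU

#define ETH_HEADER_LEN 14U /* 6 dst MAC + 6 src MAC + 2 type/length */

/* Read the big-endian type/length field at bytes 12..13 of a frame.
 * Returns 0 if the buffer is too short to hold an Ethernet header. */
uint16_t frame_ethertype(const uint8_t *frame, size_t len)
{
    assert(frame != NULL);
    if (len < ETH_HEADER_LEN) {
        return 0U;
    }
    return (uint16_t)(((uint16_t)frame[12] << 8) | frame[13]);
}

/* Human-readable class of a type/length value, for logging. */
const char *ethertype_name(uint16_t type)
{
    if (type <= 1500U) {
        return "802.3 length"; /* LLC frame (e.g. spanning tree), not a type */
    }
    switch (type) {
    case ETHERTYPE_IPV4: return "IPv4";
    case ETHERTYPE_ARP:  return "ARP";
    case ETHERTYPE_RARP: return "RARP";
    case ETHERTYPE_VLAN: return "VLAN tag";
    case ETHERTYPE_IPV6: return "IPv6";
    default:             return "other";
    }
}
```

On a busy LAN, many frames legitimately carry a type that is neither IP, ARP nor RARP, so reaching the final else branch is not by itself proof of corruption. It becomes suspicious when a frame that Wireshark shows as ARP (0x0806) is decoded as something else on the board.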

    I have seen on the forum that other people have run into this issue without getting a clear solution.

    As a temporary workaround, is there any way to safely restart the IP stack?

    RENESAS PEOPLE: Note that I suspect this only happens in a high network traffic environment, so there is probably an unknown bug in the stack.

    Thanks!

    Oscar.

  • Hi Oscar.

    I also have a lot of broadcast and multicast traffic on my network. For the moment, using a dedicated VLAN seems to work without problems. I'll keep you up to date.

    Paolo

  • Empty pool requests means the pool ran out of packets 39 times.

  • Hi Larry.
    I had also increased the pool to 2048 packets, but it still crashed. Also consider that the system was not using the Ethernet interfaces (they were active, but the application was not sending or receiving user data).
    As Oscar confirmed, the problem seems to occur when too many packets arrive (perhaps broadcast or multicast): the Ethernet interface stops working.

    For the moment, with a separate network, the problem has not occurred again. In my case this is acceptable, because the device works on networks with few devices, but for other projects we should find out whether there is a bug in one of the libraries.

    Thanks

    Paolo

  • Larry,

    As Paolo says, the problem is that the stack crashes, not that it temporarily struggles to handle a high traffic volume.

    In my case, solving the problem is mandatory because I don't know where the product will be used.

    It is important to get some feedback from RENESAS PEOPLE on whether this is a potential stack bug.

    Best!

    Oscar.