Hi.
On our proprietary board we have two Ethernet interfaces (one for the LAN and one for the WAN). After a random time, anywhere from minutes to days, the interfaces stop working (mainly the WAN one, which runs a Telnet server). Once an interface has crashed, it does not even respond to ping.
I tried to debug the problem, but unfortunately e2 studio crashes after a while. What I have seen is that the IP helper threads keep running even though the interfaces no longer respond to ping.
I have now also enabled TraceX, hoping to learn more. What do you think I should focus on?
Best Regards
Paolo
Hi Paolo,
Same problem here with SSP 1.7.0. What I have found is that on a network with very intense traffic, Ethernet stops receiving packets after a while. Note that my network has a lot of broadcast and multicast packets moving around (I don't know exactly why), with cyclic peaks of thousands of packets per second.
(As an additional reference for my network: the ENC28J60 PHY on another device also locks up under this intense traffic, but in that case it is a known issue with that component.)
The problem first arose on a custom board, so I moved to a PK-S5D9 board to see whether it also happens there. It does: it takes a little longer, but it locks up in the end.
Since it happens on both devices (with different PHYs), my conclusion is that the cause is something in SSP.
What I've found:
1 - When it stops replying to ping requests, data is still arriving at the Ethernet stack; you can see the packets updating the memory pool area.
2 - For an unknown reason, 'packet_type' seems to be corrupted, so new packets are not processed by the stack.
From 'nx_renesas_synergy.c':1245
    /** Route the incoming packet according to its Ethernet type. */
#ifdef FEATURE_NX_IPV6
    if (packet_type == NX_ETHERNET_IP || packet_type == NX_ETHERNET_IPV6)
#else
    if (packet_type == NX_ETHERNET_IP)
#endif
    {
        _nx_ip_packet_deferred_receive(nx_rec_ptr->ip_ptr, packet_ptr);
    }
    else if (packet_type == NX_ETHERNET_ARP)
    {
        _nx_arp_packet_deferred_receive(nx_rec_ptr->ip_ptr, packet_ptr);
    }
    else if (packet_type == NX_ETHERNET_RARP)
    {
        _nx_rarp_packet_deferred_receive(nx_rec_ptr->ip_ptr, packet_ptr);
    }
    else
    {
        /** Call the callback for unsupported packet type, if defined. */
        if ((NULL != nx_rec_ptr->p_callback_rec) &&
            (NULL != nx_rec_ptr->p_callback_rec->nx_ether_unknown_packet_receive_callback))
        {
            nx_rec_ptr->p_callback_rec->nx_ether_unknown_packet_receive_callback(packet_ptr, packet_type);
        }
        else
        {
            /** If Ethernet header id invalid, release the packet. */
            nx_packet_release(packet_ptr);   /* << ALWAYS GETTING HERE AFTER THE FAILURE */
        }
    }
    return;
}
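To tell whether 'packet_type' is really corrupted or the frame itself carries an unexpected type, one option is to re-read the EtherType straight from the raw frame bytes and compare it with what the driver extracted. The following is only a minimal sketch, not SSP code: the function and macro names are hypothetical, the EtherType values are the standard ones (0x0800 IPv4, 0x0806 ARP, 0x8035 RARP, 0x86DD IPv6), and it assumes an untagged frame whose 14-byte Ethernet header is still present in the buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Standard EtherType values. The names are local to this sketch. */
#define ETHERTYPE_IPV4 0x0800u
#define ETHERTYPE_ARP  0x0806u
#define ETHERTYPE_RARP 0x8035u
#define ETHERTYPE_IPV6 0x86DDu

/* Destination MAC (6) + source MAC (6) + EtherType (2). */
#define ETH_HEADER_LEN 14u

/* Return the EtherType of an untagged Ethernet frame in host byte order,
 * or 0 if the buffer is missing or too short to hold a full header. */
static uint16_t ether_type_of(const uint8_t *frame, size_t len)
{
    if (frame == NULL || len < ETH_HEADER_LEN)
    {
        return 0u;
    }
    /* The EtherType is stored big-endian at byte offsets 12 and 13. */
    return (uint16_t)(((uint16_t)frame[12] << 8) | frame[13]);
}

/* Return 1 if the type is one of those the routing code above dispatches. */
static int ether_type_is_routable(uint16_t type)
{
    return type == ETHERTYPE_IPV4 || type == ETHERTYPE_ARP ||
           type == ETHERTYPE_RARP || type == ETHERTYPE_IPV6;
}
```

If a helper like this, called from the unknown-packet callback, reports a routable type while the driver's 'packet_type' does not, the corruption would be in the driver's extraction rather than in the received data.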
I have seen on the forum that other people have run into this issue without getting a clear solution.
As a temporary workaround: is there any way to safely restart the IP stack?
RENESAS PEOPLE: note that I suspect it only happens in a high network traffic environment, so there is probably an unknown bug in the stack.
Thanks!
Oscar.
Hi Oscar.
I also have a lot of broadcast and multicast traffic on my network. For the moment, putting the device on a dedicated VLAN seems to work without problems. I'll keep you up to date.
Hi Jeremy. Unfortunately, changing the variable reset didn't work in my case (if I understood it correctly). I made the following change:
Ethernet statistics:
Total packets sent: 166525
Total bytes sent: 120059522
Send packets dropped: 0
Total packets received: 3065627
Total bytes received: 8576741
Received packets dropped: 2938960
Received checksum errors: 0
Invalid packets: 2
Total fragments sent: 0
Total fragments received: 0
Packets pool (free/tot): 48/64
Empty pool request: 0
Empty pool suspensions: 0
Invalid packets release: 0
As you can see, this time the Empty pool request count was 0. The device that is on a separate VLAN continues to work for the moment.
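To put the counters above in proportion, the share of received packets that were dropped can be computed directly. This is just a small sketch: the function name is hypothetical, and it assumes that "Received packets dropped" is a subset of "Total packets received".

```c
#include <stdint.h>

/* Percentage of received packets that were dropped.
 * Returns 0.0 when nothing has been received, to avoid dividing by zero. */
static double rx_drop_percent(uint32_t received, uint32_t dropped)
{
    if (received == 0u)
    {
        return 0.0;
    }
    return 100.0 * (double)dropped / (double)received;
}
```

With the figures above, rx_drop_percent(3065627u, 2938960u) comes out at roughly 96%, so under that assumption almost all incoming traffic is discarded before it reaches the stack.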
Thanks
Paolo,
Can you confirm whether, in your case, you are also always getting a corrupted packet_type?
Thanks,
"Invalid packets: 2"
I think they fall into those two packets.
There can be an issue if an unaligned access occurs across the boundary between SRAMHS and SRAM0 at 0x20000000. memcpy can sometimes cause an unaligned access across that memory boundary :-
https://renesasrulz.com/synergy/synergy_tech_notes/f/technical-bulletin-board-notification-postings/15610/gcc-newlib-nano-memory-operations-memcpy-and-memmove-cause-incorrect-data-read-when-crossing-sram-boundaries
and :-
https://renesasrulz.com/synergy/f/synergy---forum/15841/memcpy-fails-on-s5d9
and memcpy is used in a couple of places in NetX and NetX DUO :-
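The boundary condition described in this post can be checked for a given buffer before copying. The sketch below is only an illustration with hypothetical names; the one value taken from the thread is the SRAMHS/SRAM0 boundary address 0x20000000. The address is passed as a plain 32-bit integer so the check can also be exercised off-target.

```c
#include <stddef.h>
#include <stdint.h>

/* Boundary between SRAMHS and SRAM0, as stated in the post above. */
#define SRAM_BOUNDARY 0x20000000u

/* Return 1 if the byte range [addr, addr + len) has bytes on both sides
 * of the boundary, 0 otherwise. */
static int spans_sram_boundary(uint32_t addr, size_t len)
{
    if (len == 0u || addr >= SRAM_BOUNDARY)
    {
        return 0;
    }
    /* Widen to 64 bits so addr + len cannot wrap around. */
    return ((uint64_t)addr + (uint64_t)len) > (uint64_t)SRAM_BOUNDARY;
}
```

A range that ends exactly at 0x20000000 does not span the boundary; only ranges with at least one byte on each side do. On the target, such a check could be used to log or assert on packet buffers and copy destinations that straddle the two SRAM regions.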
Hi Jeremy. I wrote my own version of memcpy, which shouldn't suffer from the unaligned access problem.
I have also set the -mno-unaligned-access option. I'll let you know if it solves the problem.
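For readers who want to try the same experiment, a replacement copy routine that only ever performs single-byte accesses might look like the sketch below. This is not the version used in this thread; the name is hypothetical. The volatile qualifiers are an assumption-driven choice: they are there to keep the compiler from merging the byte accesses into wider ones or turning the loop back into a call to the library memcpy.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy len bytes from src to dst one byte at a time.
 * Every access is a single byte, so none can be unaligned.
 * The regions must not overlap, as with memcpy. */
static void *copy_bytewise(void *dst, const void *src, size_t len)
{
    volatile uint8_t       *d = (volatile uint8_t *)dst;
    const volatile uint8_t *s = (const volatile uint8_t *)src;

    while (len-- > 0u)
    {
        *d++ = *s++;
    }
    return dst;
}
```

This trades speed for predictability, so it is better suited to narrowing down the problem than to permanent use in a high-traffic receive path.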
There is also an issue with malloc, as it is not thread safe. If malloc is used, then the lock mechanism needs to be implemented.
I don't use malloc in my code. A search through the files also suggests that the libraries don't use it either. I will keep you updated.
Hi.
Unfortunately the -mno-unaligned-access option did nothing, whereas the device on the separate network continues to work. Since the changes didn't help, I have gone back to the default options and code.
In this case that shouldn't be a problem, since a separate network will be used with one or at most two devices connected, but it could be a problem for other implementations we plan to develop with these libraries.
You're not alone on this. I'm also digging into Synergy code in order to find the root cause... but some help from RENESAS is welcome!
Best!
Hi Oscar-
Let us know if you find anything you want to ask about. You might consider starting a new thread if your environment is different enough from Paolo's that you think they might not be exactly the same....
Thanx, Warren
Hi all.
I confirm that isolating the network solved my problem, but the issue remains when using NetX on crowded networks.
Hi Warren,
Yes, of course. I will open a separate thread with the proper SSP version and environment.
My intention is also to provide you with a network capture that can reproduce the failure on your side.
Thank you in advance,