FreeRTOS critical section causes hard fault

I have a custom board running code (RA6M5/FSP4.5.0/FreeRTOS) which can occasionally generate a hard-fault.

The hard fault is related to memory access performed inside a critical section. When the system uses only taskENTER_CRITICAL() or only taskENTER_CRITICAL_FROM_ISR(), everything works fine, but mixing the two eventually causes a hard fault.

When the hard fault occurs, the call stack shown in the debugger looks nonsensical. It's hard to get any insight once you are in the hard fault handler.

While investigating taskENTER_CRITICAL, I saw this on line 198:

```
NOTE: This may alter the stack (depending on the portable implementation)
so must be used with care!
```

This led me to assume that this was the case and to try methods that might fix the issue. In particular, I added calls to __DSB() at the beginning of each critical section. These seem to work but only when I run the program using the debugger.

I'm open to trying a different primitive instead of a critical section. Inside that critical section is an access to a simple circular buffer.
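
For reference, here is a minimal sketch of the pattern under discussion: a small circular buffer whose accessors take the task-level critical section in task context and the `_FROM_ISR` variant in interrupt context. The names `ring_t`, `RING_CAPACITY`, `ring_put_from_task()`, `ring_put_from_isr()` and `ring_get_from_isr()` are hypothetical and not taken from the project. The `#ifndef` block only provides no-op stand-ins so the sketch compiles off-target; on the board the real macros would come from FreeRTOS.h and task.h.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Host-only stand-ins (assumption): on target, include FreeRTOS.h and task.h
 * before this point so the real macros and UBaseType_t are used instead. */
#ifndef taskENTER_CRITICAL
typedef unsigned long UBaseType_t;
#define taskENTER_CRITICAL()            ((void) 0)
#define taskEXIT_CRITICAL()             ((void) 0)
#define taskENTER_CRITICAL_FROM_ISR()   (0UL)
#define taskEXIT_CRITICAL_FROM_ISR(x)   ((void) (x))
#endif

#define RING_CAPACITY 8u /* hypothetical size */

typedef struct {
    uint8_t data[RING_CAPACITY];
    size_t  head;   /* index of the next write */
    size_t  tail;   /* index of the next read  */
    size_t  count;  /* number of stored bytes  */
} ring_t;

/* Unprotected helpers: the caller must already hold a critical section. */
static bool ring_put_unlocked(ring_t *r, uint8_t byte)
{
    if (r->count == RING_CAPACITY) {
        return false; /* buffer full */
    }
    r->data[r->head] = byte;
    r->head = (r->head + 1u) % RING_CAPACITY;
    r->count++;
    return true;
}

static bool ring_get_unlocked(ring_t *r, uint8_t *out)
{
    if (r->count == 0u) {
        return false; /* buffer empty */
    }
    *out = r->data[r->tail];
    r->tail = (r->tail + 1u) % RING_CAPACITY;
    r->count--;
    return true;
}

/* Call from task context only. */
bool ring_put_from_task(ring_t *r, uint8_t byte)
{
    taskENTER_CRITICAL();
    bool ok = ring_put_unlocked(r, byte);
    taskEXIT_CRITICAL();
    return ok;
}

/* Call from interrupt context only: the previous interrupt mask is saved
 * on entry and handed back on exit. */
bool ring_put_from_isr(ring_t *r, uint8_t byte)
{
    UBaseType_t saved = taskENTER_CRITICAL_FROM_ISR();
    bool ok = ring_put_unlocked(r, byte);
    taskEXIT_CRITICAL_FROM_ISR(saved);
    return ok;
}

bool ring_get_from_isr(ring_t *r, uint8_t *out)
{
    UBaseType_t saved = taskENTER_CRITICAL_FROM_ISR();
    bool ok = ring_get_unlocked(r, out);
    taskEXIT_CRITICAL_FROM_ISR(saved);
    return ok;
}
```

Keeping the unprotected helpers separate from the context-specific wrappers makes it harder to call the task variant from an ISR by accident, which is one way the two macros can end up mixed.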

  • These seem to work but only when I run the program using the debugger.

    It seems more related to my use of a bootloader. With no bootloader the __DSB() workaround works. When I start the application via the bootloader I see the issue.

  • I've aligned the stack size in the bootloader (BSP\RA Common\Main Stack Size) to be the same as the application; it was 0x800, now it is 0x400.

  • I've aligned the stack size in the bootloader (BSP\RA Common\Main Stack Size) to be the same as the application; it was 0x800, now it is 0x400.

    This did not help.

  • Hello,

    What kind of hard fault do you get?

    Also, how do you mix taskENTER_CRITICAL and taskENTER_CRITICAL_FROM_ISR?

    taskENTER_CRITICAL_FROM_ISR should be called in an interrupt service routine (ISR) only.

  • What kind of hard fault do you get?

    I broke out the handlers below and the HardFault_Handler was the one being called.

    From startup.c:

    void NMI_Handler(void); // NMI has many sources and is handled by BSP
    void HardFault_Handler(void) WEAK_REF_ATTRIBUTE;
    void MemManage_Handler(void) WEAK_REF_ATTRIBUTE;
    void BusFault_Handler(void) WEAK_REF_ATTRIBUTE;
    void UsageFault_Handler(void) WEAK_REF_ATTRIBUTE;
    void SecureFault_Handler(void) WEAK_REF_ATTRIBUTE;
    void SVC_Handler(void) WEAK_REF_ATTRIBUTE;
    void DebugMon_Handler(void) WEAK_REF_ATTRIBUTE;
    void PendSV_Handler(void) WEAK_REF_ATTRIBUTE;
    void SysTick_Handler(void) WEAK_REF_ATTRIBUTE;

    Also, how do you mix taskENTER_CRITICAL and taskENTER_CRITICAL_FROM_ISR?

    The shared resources protected by these calls are accessed from both contexts, tasks and ISRs.

    A timer interrupt is a producer of data, a timer interrupt is a consumer of data, and sporadically a task is also a producer.

  • I've switched to running both the bootloader and the application with 0x800; the bootloader/tinycrypt library warns when you build with a stack smaller than this. I've been running for a while without issue and will report back once a number of hours have passed. In my setup it usually fails in < 1000 seconds.
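
One way to check whether 0x400 bytes was actually too small, rather than inferring it from run time, is to fill the stack with a known pattern at startup and later count how much of the pattern is still intact. The sketch below shows the idea on a plain array. `STACK_FILL`, `stack_paint()` and `stack_unused_words()` are hypothetical names, and how the base address and size of the main stack are obtained on the RA6M5 (linker symbols or the BSP's stack array) is left as an assumption.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xA5A5A5A5u /* hypothetical fill pattern */

/* Fill a stack region with the pattern. On target this must only cover the
 * part of the stack below the current stack pointer, otherwise live data
 * would be overwritten. */
void stack_paint(uint32_t *base, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        base[i] = STACK_FILL;
    }
}

/* Cortex-M stacks grow downward, so untouched words sit at the low end.
 * Count how many words, starting from the base, still hold the pattern. */
size_t stack_unused_words(const uint32_t *base, size_t words)
{
    size_t unused = 0;
    while ((unused < words) && (base[unused] == STACK_FILL)) {
        unused++;
    }
    return unused;
}
```

If the count ever reaches zero, or comes close to it, the stack has been used down to its lowest address and an overflow is likely.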

  • I succeeded in running for 26K seconds. Having said that, a post elsewhere (https://community.renesas.com/mcu-mpu/ra/f/forum/34955/the-firmware-has-hard-fault-and-hung-some-times/125093#125093) makes me think I should revisit the priorities of my interrupts. I'll post a reply higher up to clarify some things.

  • I've been reading https://www.freertos.org/RTOS-Cortex-M3-M4.html and, since I'm currently using a CM33 (RA6M5), I believe it applies.

    #ifndef configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY
    #define configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY ((1))
    #endif
    #ifndef configMAX_SYSCALL_INTERRUPT_PRIORITY
    #define configMAX_SYSCALL_INTERRUPT_PRIORITY (configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY << (8 - __NVIC_PRIO_BITS))
    #endif

    R7FA6M5BH.h defines __NVIC_PRIO_BITS as 4, which would mean that configMAX_SYSCALL_INTERRUPT_PRIORITY = 1 << (8 - 4) = 16.

    If I read the documentation correctly, this means that any interrupt that uses a critical section would have to be set to priority 16 or above... Trouble is, 15 is the highest priority value I can set with the FSP.

    Am I missing something here?
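
One possible reading of the linked FreeRTOS page is that 16 is the value as it appears in the 8-bit hardware priority field, where only the top 4 bits are implemented, so it corresponds to logical priority 1 rather than to a 17th level. The sketch below works through that arithmetic. It assumes that the 0 to 15 values offered by the FSP configurator are unshifted logical priorities; `PRIO_BITS`, `LIBRARY_MAX_SYSCALL_PRIO`, `raw_priority()` and `isr_may_use_freertos_api()` are hypothetical names standing in for the real configuration.

```c
#include <stdbool.h>
#include <stdint.h>

#define PRIO_BITS                4u /* stands in for __NVIC_PRIO_BITS */
#define LIBRARY_MAX_SYSCALL_PRIO 1u /* stands in for configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY */

/* Shift a logical priority (0..15 with 4 priority bits) into the
 * implemented top bits of the 8-bit priority field. */
static inline uint8_t raw_priority(uint8_t logical)
{
    return (uint8_t) (logical << (8u - PRIO_BITS));
}

/* An ISR may call the FromISR API and critical-section macros only if its
 * priority is numerically equal to or greater than (that is, logically no
 * more urgent than) the max syscall priority. */
static inline bool isr_may_use_freertos_api(uint8_t logical)
{
    return raw_priority(logical) >= raw_priority(LIBRARY_MAX_SYSCALL_PRIO);
}
```

Under that assumption, `raw_priority(1)` is 16 and `raw_priority(15)` is 240, so interrupts configured at 1 through 15 would satisfy the rule and only priority 0 would be excluded.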

  • If the program runs for a long time and then hangs, it could be because of a stack overflow. Please try increasing the stack size.

    Also, when the hard fault occurs, what information do you get from the Fault Stats window in e2studio?

  • If the program runs for a long time and then hangs, it could be because of a stack overflow. Please try increasing the stack size.

    I have tried increasing the stack; it did not change the behaviour.

    Also, when the hard fault occurs, what information do you get from the Fault Stats window in e2studio?

    I don't think it is available to me using the RA6M5/FSP4.5.0. At least I can't find it.
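
If the fault view is not available, much of the same information can be read by hand: the Configurable Fault Status Register (SCB->CFSR, at address 0xE000ED28) records why the fault escalated, and the exception frame pushed by the hardware holds the program counter at the time of the fault. The sketch below decodes a few CFSR bits and picks the stacked PC out of a frame. `cfsr_decode()` and `stacked_pc()` are hypothetical helpers; obtaining the frame pointer inside HardFault_Handler needs a small assembly shim that is not shown here.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t    mask;
    const char *name;
} cfsr_flag_t;

/* A selection of CFSR bits on Armv8-M Mainline (Cortex-M33). */
static const cfsr_flag_t cfsr_flags[] = {
    { UINT32_C(1) << 1,  "DACCVIOL: data access violation" },
    { UINT32_C(1) << 4,  "MSTKERR: MemManage fault on exception stacking" },
    { UINT32_C(1) << 9,  "PRECISERR: precise data bus error" },
    { UINT32_C(1) << 10, "IMPRECISERR: imprecise data bus error" },
    { UINT32_C(1) << 12, "STKERR: bus fault on exception stacking" },
    { UINT32_C(1) << 16, "UNDEFINSTR: undefined instruction" },
    { UINT32_C(1) << 17, "INVSTATE: invalid execution state" },
    { UINT32_C(1) << 18, "INVPC: invalid PC load on exception return" },
    { UINT32_C(1) << 20, "STKOF: stack limit exceeded" },
};

/* Store the names of the flags set in cfsr into names[] (at most max
 * entries) and return how many were stored. */
size_t cfsr_decode(uint32_t cfsr, const char *names[], size_t max)
{
    size_t found = 0;
    for (size_t i = 0; i < sizeof cfsr_flags / sizeof cfsr_flags[0]; i++) {
        if ((cfsr & cfsr_flags[i].mask) != 0u && found < max) {
            names[found++] = cfsr_flags[i].name;
        }
    }
    return found;
}

/* The basic exception frame is r0, r1, r2, r3, r12, lr, pc, xpsr, lowest
 * address first, so the stacked PC is the seventh word. */
uint32_t stacked_pc(const uint32_t *frame)
{
    return frame[6];
}
```

A STKOF or STKERR flag would point toward the stack-size theory, while PRECISERR together with the stacked PC would point at a specific faulting access.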