Watchdog Timer Design for Reliable Embedded Systems

By Jithin Tom

Published in Embedded Concepts

June 03, 2026

3 min read

Introduction

The Two Common STM32 Watchdog Architectures

Common Design Mistakes

The Task Monitor Pattern

Early Wakeup Interrupt: Last Chance to Log

Configuration Checklist

Summary

References

Frequently Asked Questions

Introduction

Every embedded system ships with bugs. Not the obvious ones, but the subtle, field-only variety: a cosmic-ray bit flip at 3 AM, a timing margin violated only at 85°C, or a race condition that manifests once in ten thousand power cycles. When these bugs cause firmware to hang, the last line of defense is the watchdog timer.

A watchdog timer (WDT) is a hardware peripheral that resets the processor if software fails to periodically service it. The concept is simple: a counter that software must “kick” before it expires. If the software is stuck in an infinite loop or has crashed, the counter expires and the hardware resets the system. But implementing a reliable watchdog is far more nuanced than sprinkling HAL_IWDG_Refresh() in your main loop. This article walks through the two main watchdog architectures found on STM32 (ARM Cortex-M) microcontrollers, common pitfalls, and a robust task-monitoring pattern.

The Two Common STM32 Watchdog Architectures

STM32 microcontrollers typically provide two distinct watchdog peripherals.

Independent Watchdog (IWDG)

The IWDG is driven by the Low-Speed Internal (LSI) oscillator, nominally around 32 kHz on many STM32 devices. This clock independence is its defining trait: even if the main oscillator fails or the PLL locks up, the IWDG continues counting. It is the last thing standing when everything else has fallen.

The timeout is configured via a prescaler and a 12-bit reload value:

Timeout (ms) = (Prescaler × Reload) / LSI_Frequency (kHz)

With LSI = 32 kHz, prescaler = 64, reload = 500: timeout = 1 second.
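The arithmetic above can be wrapped in two small helpers. This is a minimal sketch that follows the article's formula as written; the function names are hypothetical, and some reference manuals state the period in terms of reload + 1, so treat the result as an estimate and confirm it against your device's documentation.

```c
#include <assert.h>
#include <stdint.h>

/* Timeout in milliseconds, using timeout = prescaler * reload / f_LSI. */
uint32_t iwdg_timeout_ms(uint32_t lsi_hz, uint32_t prescaler, uint32_t reload)
{
    assert(lsi_hz > 0 && reload <= 0x0FFFu);   /* reload is a 12-bit field */
    return (uint32_t)(((uint64_t)prescaler * reload * 1000u) / lsi_hz);
}

/* Smallest reload value that gives at least timeout_ms,
 * or 0 if the request does not fit in 12 bits at this prescaler. */
uint32_t iwdg_reload_for_ms(uint32_t lsi_hz, uint32_t prescaler, uint32_t timeout_ms)
{
    assert(prescaler > 0);
    uint64_t num = (uint64_t)timeout_ms * lsi_hz;
    uint64_t den = (uint64_t)prescaler * 1000u;
    uint64_t reload = (num + den - 1u) / den;  /* round up */
    return (reload <= 0x0FFFu) ? (uint32_t)reload : 0u;
}
```

Calling iwdg_timeout_ms(32000, 64, 500) reproduces the 1 second figure above. The LSI is an uncalibrated RC oscillator, so the real timeout can differ noticeably from the nominal one; size it with margin.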

Once enabled, the IWDG cannot be disabled except by a system reset. Whether it starts again automatically after reset depends on the device’s option bytes: in hardware-watchdog mode the IWDG starts counting at power-on, while in software mode it stays off until firmware enables it. Bootloader code must be aware of this: a slow firmware update can trigger a watchdog reset mid-flash if no one kicks the dog.

Window Watchdog (WWDG)

The WWDG is clocked from an APB bus clock, often APB1 depending on the STM32 family. It gives up the IWDG's clock independence and in exchange buys finer timing resolution and a unique feature: the refresh window. Software must refresh the counter within a specific time band, not before and not after. Refresh too early, too late, or not at all, and the system resets.

WWDG Refresh Window
====================

  Counter (counts down)
  0x7F  +------------------+  Max (T[6:0])
        |  TOO EARLY       |  Refresh here => RESET
        |                  |
  W     +------------------+  Window value (W[6:0])
        |                  |
        |   REFRESH HERE   |  Valid window
        |   (not too early,|
        |    not too late) |
        |                  |
  0x40  +------------------+  EWI fires here (if enabled)
        |  LAST COUNT      |
  0x3F  +------------------+  Reset threshold
        |  SYSTEM RESET    |
        +------------------+

  Refresh too early (counter > W[6:0]) => RESET
  Refresh too late   (counter < 0x40)  => RESET
  No refresh at all                    => RESET

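This catches failures a basic timeout-only IWDG cannot, most notably a runaway loop that refreshes the watchdog as fast as it can. An ISR that keeps refreshing while the main application is dead is caught only if its refreshes land outside the window, so the window complements health checks rather than replacing them.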
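The window rule can be modelled on a host without any hardware. The sketch below is only an illustration: the names are hypothetical, and the tick-time helper assumes the fixed divide-by-4096 stage plus prescaler described in classic STM32 reference manuals, which should be checked for your family.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    WWDG_REFRESH_OK,         /* Inside the window: the counter is reloaded */
    WWDG_REFRESH_TOO_EARLY,  /* Counter still above W[6:0]: reset */
    WWDG_EXPIRED             /* Counter at or below 0x3F: reset already occurred */
} wwdg_refresh_result_t;

/* Classify a refresh attempt made when the down-counter holds `counter`. */
wwdg_refresh_result_t wwdg_classify_refresh(uint8_t counter, uint8_t window)
{
    assert(counter <= 0x7Fu && window <= 0x7Fu);   /* Both are 7-bit fields */
    if (counter <= 0x3Fu) {
        return WWDG_EXPIRED;
    }
    if (counter > window) {
        return WWDG_REFRESH_TOO_EARLY;
    }
    return WWDG_REFRESH_OK;
}

/* Microseconds for the counter to fall by `ticks` counts.
 * Assumes one count = 4096 * prescaler / PCLK. */
uint32_t wwdg_ticks_to_us(uint32_t pclk_hz, uint32_t prescaler, uint32_t ticks)
{
    assert(pclk_hz > 0);
    return (uint32_t)(((uint64_t)4096u * prescaler * ticks * 1000000u) / pclk_hz);
}
```

With a start value of 0x7F and a window of 0x50, the window opens after 0x7F - 0x50 = 47 counts, and wwdg_ticks_to_us() converts that into the earliest time a refresh is allowed.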

Comparison

Feature              IWDG                      WWDG
-------------------  ------------------------  -----------------------------------
Clock Source         LSI (~32 kHz)             APB bus clock
Clock Independence   Runs if main clock fails  Depends on APB clock domain
Reset on Timeout     Yes                       Yes
Reset on Early Kick  Device-dependent          Yes (window mode)
Early Wakeup IRQ     Device-dependent          Yes (at counter 0x40)
Debug Freeze         Configurable via DBGMCU   Configurable via DBGMCU
Best For             System-level liveness     Timing-critical software monitoring

Common Design Mistakes

Unconditional refresh in the main loop. Kicking the watchdog regardless of system health proves only that the CPU is ticking — not that the system is working. A sensor returning garbage or a communication peripheral locked up won’t prevent the refresh.

Ignoring bootloader implications. If the IWDG is active before the bootloader runs, the bootloader must refresh it. A 30-second firmware flash with a 10-second timeout will reset mid-write.

Refreshing from ISRs. A high-priority ISR that refreshes the watchdog masks failures in the main application. The ISR keeps firing even when the application has crashed, preventing the watchdog from ever expiring.

Disabling during development. Code that works with an absent watchdog may behave differently when it is active. Enable the watchdog early and configure DBGMCU to freeze it during debug halts.
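To make the bootloader point concrete, here is a minimal host-runnable sketch of a flash loop that refreshes the watchdog once per completed chunk. Everything named here is hypothetical: kick_watchdog() and flash_write_chunk() are stand-ins for a real refresh call and flash driver, and the counters exist only so the behaviour can be observed off-target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins so the sketch runs on a host. On target these
 * would wrap the watchdog refresh and the flash programming routine. */
unsigned g_kick_count;
size_t   g_bytes_written;

void kick_watchdog(void) { g_kick_count++; }

int flash_write_chunk(size_t offset, const uint8_t *data, size_t len)
{
    (void)offset;
    (void)data;
    g_bytes_written += len;
    return 0;   /* 0 = success */
}

/* Write an image in chunks, refreshing the watchdog after each one. */
int flash_image(const uint8_t *image, size_t size, size_t chunk_size)
{
    assert(chunk_size > 0);
    for (size_t offset = 0; offset < size; offset += chunk_size) {
        size_t remaining = size - offset;
        size_t len = (remaining < chunk_size) ? remaining : chunk_size;
        if (flash_write_chunk(offset, image + offset, len) != 0) {
            return -1;    /* Stop kicking on failure; let the watchdog reset */
        }
        kick_watchdog();  /* Earned: this chunk completed */
    }
    return 0;
}
```

The chunk size is the design lever: the worst-case time to erase and program one chunk has to stay comfortably below the watchdog timeout, otherwise the 30-second-flash scenario above still resets mid-write.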

The Task Monitor Pattern

The robust approach treats the watchdog as a health-check aggregation point. Each critical task reports its health to a central monitor, which only refreshes the watchdog when all tasks have checked in within their expected period.

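#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include "FreeRTOS.h"
#include "task.h"
#include "main.h"  /* Assumption: project header that pulls in the device HAL */

extern IWDG_HandleTypeDef hiwdg;  /* Assumption: initialised before the scheduler starts */

typedef struct {
    volatile uint32_t last_checkin_tick;  /* HAL tick at the most recent check-in */
    volatile bool checked_in;             /* Stays false until the first check-in */
    uint32_t max_allowed_ticks;           /* Longest tolerated gap between check-ins */
    const char *name;                     /* For diagnostics only */
} watchdog_task_t;

static watchdog_task_t monitored_tasks[] = {
    { 0, false, 2000, "sensor"  },
    { 0, false, 5000, "comm"    },
    { 0, false, 1000, "control" },
};

#define MONITORED_TASK_COUNT (sizeof(monitored_tasks) / sizeof(monitored_tasks[0]))

/* Called by each monitored task once per healthy iteration of its loop */
void watchdog_task_checkin(watchdog_task_t *task) {
    task->last_checkin_tick = HAL_GetTick();
    task->checked_in = true;  /* A flag avoids treating tick 0 as "never" */
}

bool watchdog_all_tasks_healthy(void) {
    for (size_t i = 0; i < MONITORED_TASK_COUNT; i++) {
        if (!monitored_tasks[i].checked_in) {
            return false;  /* Task never checked in */
        }
        /* Read the check-in time before the current time so a check-in that
         * preempts this loop cannot make the elapsed time appear negative. */
        uint32_t last = monitored_tasks[i].last_checkin_tick;
        uint32_t now = HAL_GetTick();
        /* Unsigned subtraction stays correct when the tick counter wraps */
        uint32_t elapsed = now - last;
        if (elapsed > monitored_tasks[i].max_allowed_ticks) {
            return false;  /* Task missed its deadline */
        }
    }
    return true;
}

void watchdog_manager_task(void *param) {
    (void)param;
    for (;;) {
        if (watchdog_all_tasks_healthy()) {
            HAL_IWDG_Refresh(&hiwdg);
        }
        /* Not healthy? Withhold the refresh and let the IWDG reset the system */
        vTaskDelay(pdMS_TO_TICKS(50));
    }
}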

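The watchdog manager runs at the lowest priority so all monitored tasks get CPU time first. If any task hangs, its check-ins stop, and once its deadline has passed the manager withholds the refresh. The counter drains and the system resets. The same happens if a higher-priority task hogs the CPU and starves the manager. The manager's 50 ms polling period must stay well below the IWDG timeout, or a healthy system will reset between refreshes.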
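The health check is easier to trust once it has been exercised off-target. The sketch below is a hypothetical host-side variant that takes the current tick as a parameter instead of reading a hardware tick, so a test can inject a hang and a tick-counter wraparound; the sim_ names are illustrative and not part of any library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t last_checkin_tick;   /* Tick of the most recent check-in */
    uint32_t max_allowed_ticks;   /* Longest tolerated gap between check-ins */
    bool     checked_in;          /* False until the first check-in */
} sim_task_t;

void sim_checkin(sim_task_t *task, uint32_t now)
{
    assert(task != NULL);
    task->last_checkin_tick = now;
    task->checked_in = true;
}

/* True only if every task has checked in within its allowed period. */
bool sim_all_healthy(const sim_task_t *tasks, size_t count, uint32_t now)
{
    for (size_t i = 0; i < count; i++) {
        if (!tasks[i].checked_in) {
            return false;
        }
        /* Unsigned subtraction gives the right gap across a 32-bit wrap */
        uint32_t elapsed = now - tasks[i].last_checkin_tick;
        if (elapsed > tasks[i].max_allowed_ticks) {
            return false;
        }
    }
    return true;
}
```

A test can check both tasks in just before the 32-bit tick wraps, confirm the monitor still reports healthy after the wrap, then stop checking one task in and confirm the monitor reports unhealthy once its deadline has passed.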

Critical detail: where this code needs a critical section, keep it short and think about which interrupts it masks. On Cortex-M devices with BASEPRI support, use __set_BASEPRI() instead of __disable_irq() when you need to preserve high-priority interrupt latency. BASEPRI is not available on Cortex-M0/M0+, and it does not allow a task-based watchdog manager to run inside a critical section; it only allows selected higher-priority interrupts to preempt.

Early Wakeup Interrupt: Last Chance to Log

The WWDG’s Early Wakeup Interrupt fires at counter 0x40 — one count before reset at 0x3F. Use it to capture diagnostics before the inevitable reboot.

void HAL_WWDG_EarlyWakeupCallback(WWDG_HandleTypeDef *hwwdg) {
    /* Log reset cause, active task, stack pointer to NV memory */
    log_watchdog_fault_to_nvram();
    /* Do NOT refresh here — use exclusively for diagnostics */
}

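Do not refresh the watchdog in the EWI handler. The system is already faulted, and reset follows one counter tick later, so keep the handler short and write to storage that is fast enough to finish in that time, such as backup registers or a RAM region that survives reset. Record the program counter, active task ID, and fault status registers. After reset, the bootloader reads the RCC status register and the saved diagnostic data for post-mortem analysis.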
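One way to make the saved diagnostics trustworthy after reset is to wrap them in a record with a marker and a checksum, so that uninitialised memory is not mistaken for a real fault log. The layout, names, and marker value below are hypothetical; placing the record in memory that survives reset is device-specific and not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WDT_RECORD_MAGIC 0x57445446u   /* Arbitrary marker (hypothetical) */

typedef struct {
    uint32_t magic;         /* Marks the record as written */
    uint32_t pc;            /* Program counter at the time of the fault */
    uint32_t task_id;       /* Task that was running */
    uint32_t fault_status;  /* Snapshot of a fault status register */
    uint32_t checksum;      /* Simple integrity check over the fields above */
} wdt_fault_record_t;

static uint32_t wdt_record_checksum(const wdt_fault_record_t *r)
{
    return r->magic ^ r->pc ^ r->task_id ^ r->fault_status ^ 0xA5A5A5A5u;
}

/* Called from the early wakeup handler: a handful of word writes only. */
void wdt_record_store(wdt_fault_record_t *r, uint32_t pc,
                      uint32_t task_id, uint32_t fault_status)
{
    assert(r != 0);
    r->pc = pc;
    r->task_id = task_id;
    r->fault_status = fault_status;
    r->magic = WDT_RECORD_MAGIC;
    r->checksum = wdt_record_checksum(r);
}

/* Called after reset before trusting the record's contents. */
bool wdt_record_is_valid(const wdt_fault_record_t *r)
{
    return r->magic == WDT_RECORD_MAGIC && r->checksum == wdt_record_checksum(r);
}
```

The store path is deliberately a few plain word writes so that it fits into the single counter tick left before reset. An XOR checksum is a weak integrity check; a CRC would be a stronger choice if time allows.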

Configuration Checklist

WATCHDOG IMPLEMENTATION CHECKLIST
----------------------------------
[ ] Watchdog is NEVER refreshed unconditionally
[ ] Every critical task must check in to earn a refresh
[ ] IWDG used with independent LSI clock
[ ] Window mode used where supported and appropriate
[ ] Timeout tuned for worst-case legitimate operation
[ ] Watchdog enabled during development (freeze in debug)
[ ] Reset cause logged (RCC_CSR flags) for post-mortem
[ ] Verified by deliberately injecting hangs in each task
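For the reset-cause item, a small decoder keeps the logging logic testable off-target. This is a sketch under a stated assumption: the two bit positions match the RCC_CSR layout documented for STM32F4 parts and differ on other families, so check the reference manual before reusing them. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions (STM32F4 RCC_CSR layout); verify for your device. */
#define CSR_IWDGRSTF (1u << 29)
#define CSR_WWDGRSTF (1u << 30)

typedef enum {
    RESET_CAUSE_OTHER,
    RESET_CAUSE_IWDG,
    RESET_CAUSE_WWDG
} reset_cause_t;

typedef struct {
    uint32_t iwdg_resets;   /* Count of independent watchdog resets */
    uint32_t wwdg_resets;   /* Count of window watchdog resets */
} reset_stats_t;

/* Classify a raw status register value read early in startup. */
reset_cause_t classify_reset(uint32_t csr)
{
    if (csr & CSR_IWDGRSTF) {
        return RESET_CAUSE_IWDG;
    }
    if (csr & CSR_WWDGRSTF) {
        return RESET_CAUSE_WWDG;
    }
    return RESET_CAUSE_OTHER;
}

/* Update running counters that the caller persists to non-volatile memory. */
void reset_stats_update(reset_stats_t *stats, uint32_t csr)
{
    assert(stats != 0);
    switch (classify_reset(csr)) {
    case RESET_CAUSE_IWDG: stats->iwdg_resets++; break;
    case RESET_CAUSE_WWDG: stats->wwdg_resets++; break;
    default: break;
    }
}
```

On target, the raw value would be read once at startup and the flags cleared afterwards so the next boot does not see stale causes.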

Summary

A watchdog timer is a system-level architectural decision that constrains bootloader design, task scheduling, and error handling. Key takeaways:

Use the IWDG for system-level crash recovery with clock independence
Use the WWDG (or IWDG window mode, where supported) to enforce timing constraints and catch runaway execution
Never refresh unconditionally — aggregate task health and only kick when the entire system is healthy
Log every watchdog reset to non-volatile memory; without logs, it is just a mystery reboot
Test by injecting faults — deliberately hang each task and verify the watchdog catches it

The watchdog is your firmware’s insurance policy. Like all insurance, you hope never to use it — but when you need it, you need it to work on the first try.

References

Frequently Asked Questions

What is the main purpose of a watchdog timer (WDT)?

A watchdog timer is a hardware counter that resets the microcontroller if the software fails to periodically 'kick' or clear it, recovering the system from lockups, infinite loops, or noise-induced crashes.

What is a windowed watchdog timer (WWDT)?

A windowed watchdog requires the software to refresh it within a specific open window. Refreshing it too early (which might happen in a broken control loop) or too late will trigger a system reset, adding extra safety.

Why should you avoid kicking the watchdog in a timer interrupt?

If the main application task gets stuck in an infinite loop, timer interrupts may continue running in the background. Kicking the watchdog in an interrupt would prevent it from resetting the failed system.