
Watchdog Timer Design for Reliable Embedded Systems

By embeddedSoft
Published in Embedded Concepts
June 03, 2026

Table Of Contents

01 Introduction
02 The Two Watchdog Architectures
03 Common Design Mistakes
04 The Task Monitor Pattern
05 Early Wakeup Interrupt: Last Chance to Log
06 Configuration Checklist
07 Summary
08 References

Introduction

Every embedded system ships with bugs. Not the obvious ones — the subtle, field-only variety triggered by a cosmic ray bit-flip at 3 AM, a timing margin violated only at 85°C, or a race condition that manifests once in ten thousand power cycles. When these bugs cause firmware to hang, there is only one last line of defense: the watchdog timer.

A watchdog timer (WDT) is a hardware peripheral that resets the processor if software fails to periodically service it. The concept is simple — a counter that software must “kick” before it expires. If the software is stuck in an infinite loop or crashed, the counter resets the system. But implementing a reliable watchdog is far more nuanced than sprinkling HAL_IWDG_Refresh() in your main loop. This article walks through the two main watchdog architectures on ARM Cortex-M microcontrollers, common pitfalls, and a robust task-monitoring pattern.

The Two Watchdog Architectures

Most modern microcontrollers — particularly the STM32 family — provide two distinct watchdog peripherals.

Independent Watchdog (IWDG)

The IWDG is driven by its own Low-Speed Internal (LSI) oscillator, typically 32 kHz. This clock independence is its defining trait: even if the main oscillator fails or the PLL locks up, the IWDG continues counting. It is the last thing standing when everything else has fallen.

The timeout is configured via a prescaler and a 12-bit reload value:

Timeout (ms) = (Prescaler × Reload) / LSI_Frequency (kHz)

With LSI = 32 kHz, prescaler = 64, reload = 500: timeout = 1 second.
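The arithmetic above can be wrapped in two small helpers, one for each direction of the calculation. This is a minimal sketch: the function names are hypothetical, the LSI frequency is passed in because its real value varies from part to part, and the 12-bit limit comes from the reload width stated above.

```c
#include <assert.h>
#include <stdint.h>

#define IWDG_RELOAD_MAX 0x0FFFu   /* reload value is 12 bits wide */

/* Timeout in milliseconds: (prescaler * reload) / LSI frequency in kHz. */
uint32_t iwdg_timeout_ms(uint32_t prescaler, uint32_t reload, uint32_t lsi_khz)
{
    assert(lsi_khz > 0u);
    assert(reload <= IWDG_RELOAD_MAX);
    return (prescaler * reload) / lsi_khz;
}

/* Smallest reload value that gives at least timeout_ms with this prescaler,
 * or 0 if the request does not fit in 12 bits. */
uint32_t iwdg_reload_for_ms(uint32_t timeout_ms, uint32_t prescaler,
                            uint32_t lsi_khz)
{
    assert(prescaler > 0u);
    /* 64-bit intermediate avoids overflow; round up so the real timeout
     * is never shorter than the one requested. */
    uint64_t ticks  = (uint64_t)timeout_ms * lsi_khz;
    uint64_t reload = (ticks + prescaler - 1u) / prescaler;
    return (reload <= IWDG_RELOAD_MAX) ? (uint32_t)reload : 0u;
}
```

With the numbers from the example, iwdg_timeout_ms(64, 500, 32) gives 1000 ms. A 10-second timeout does not fit with a prescaler of 64, which is the cue to move to a larger prescaler.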

Once enabled, the IWDG cannot be disabled by software; only a reset stops it. If the hardware-watchdog option is selected in the option bytes, the counter starts again automatically as soon as the device comes out of reset. Bootloader code must be aware of this: a slow firmware update will trigger a watchdog reset mid-flash if no one kicks the dog.
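One way a bootloader can cope is to split the update into chunks that each finish well inside the timeout and refresh once per chunk. The sketch below is host-runnable and uses hypothetical names throughout: the two function-pointer hooks stand in for a real flash driver and for HAL_IWDG_Refresh(), and the fake_* functions exist only to exercise the loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hooks: on target these would wrap the flash driver and
 * the watchdog refresh call. */
typedef bool (*flash_write_fn)(uint32_t offset, const uint8_t *data, uint32_t len);
typedef void (*wdt_kick_fn)(void);

/* Write an image chunk by chunk, refreshing the watchdog before each chunk.
 * chunk_len must be small enough that one chunk always completes inside
 * the watchdog timeout. Returns false on the first write error. */
bool bootloader_flash_image(const uint8_t *image, uint32_t image_len,
                            uint32_t chunk_len,
                            flash_write_fn write_chunk, wdt_kick_fn kick)
{
    assert(image != NULL && write_chunk != NULL && kick != NULL);
    if (chunk_len == 0u) {
        return false;
    }

    uint32_t offset = 0u;
    while (offset < image_len) {
        uint32_t remaining = image_len - offset;
        uint32_t len = (remaining < chunk_len) ? remaining : chunk_len;

        kick();   /* one refresh per bounded unit of work */
        if (!write_chunk(offset, image + offset, len)) {
            return false;   /* no further refresh: a wedged flash ends in a reset */
        }
        offset += len;
    }
    return true;
}

/* Host-side stand-ins, for demonstration only. */
uint32_t fake_kick_count;
uint32_t fake_bytes_written;

void fake_kick(void) { fake_kick_count++; }

bool fake_write(uint32_t offset, const uint8_t *data, uint32_t len)
{
    (void)offset;
    (void)data;
    fake_bytes_written += len;
    return true;
}
```

The refresh is tied to forward progress rather than to a timer, so a flash write that never returns still lets the watchdog expire.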

Window Watchdog (WWDG)

The WWDG is clocked from the APB1 bus clock, so unlike the IWDG it stops if the system clock fails. Giving up clock independence buys precision and a unique feature: the refresh window. Software must refresh the counter within a specific time band, neither before it opens nor after it closes. Refresh too early, too late, or not at all, and the system resets.

WWDG Refresh Window
====================
Counter (7-bit down-counter)
0x7F +------------------+ Max reload (T[6:0])
     |                  |
     |  TOO EARLY       | Refresh here => RESET
     |                  |
W    +------------------+ Window value (W[6:0])
     |                  |
     |  REFRESH HERE    | Valid window
     |  (not too early, |
     |  not too late)   |
     |                  |
0x40 +------------------+ EARLY WAKEUP INTERRUPT (EWI) fires here
     |                  |
0x3F +------------------+ Reset threshold
     |  SYSTEM RESET    |
     +------------------+
Refresh too early (counter > W[6:0]) => RESET
Refresh too late (counter < 0x40) => RESET
No refresh at all => RESET

This catches failures a plain timeout watchdog cannot: a runaway loop that refreshes the watchdog as fast as it can, or an ISR that keeps refreshing it while the main application is dead, provided those refreshes fall outside the window.
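The three reset rules can be expressed as a small classifier, which is handy in host-side unit tests of refresh timing. This is a sketch with hypothetical names: it takes the counter value at the moment of the refresh and the configured window value, and it models only the rules listed above, not the peripheral itself.

```c
#include <assert.h>
#include <stdint.h>

#define WWDG_COUNTER_MAX     0x7Fu   /* 7-bit down-counter */
#define WWDG_RESET_THRESHOLD 0x3Fu   /* reaching this value resets the system */

typedef enum {
    WWDG_REFRESH_OK,
    WWDG_REFRESH_TOO_EARLY,   /* counter still above the window value */
    WWDG_REFRESH_TOO_LATE     /* counter already at or below the threshold */
} wwdg_refresh_result_t;

/* Classify a refresh attempt made when the down-counter holds `counter`
 * and the window register holds `window`. */
wwdg_refresh_result_t wwdg_classify_refresh(uint8_t counter, uint8_t window)
{
    assert(counter <= WWDG_COUNTER_MAX);
    assert(window <= WWDG_COUNTER_MAX);

    if (counter <= WWDG_RESET_THRESHOLD) {
        return WWDG_REFRESH_TOO_LATE;
    }
    if (counter > window) {
        return WWDG_REFRESH_TOO_EARLY;
    }
    return WWDG_REFRESH_OK;
}
```

With a window value of 0x50, a refresh at counter 0x60 is too early, anything from 0x50 down to 0x40 is accepted, and 0x3F is too late.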

Comparison

Feature             | IWDG                                        | WWDG
Clock Source        | LSI (~32 kHz)                               | APB1 bus clock
Clock Independence  | Runs if main clock fails                    | Dies with system clock
Reset on Timeout    | Yes                                         | Yes
Reset on Early Kick | No, except on parts with IWDG window mode   | Yes (window mode)
Early Wakeup IRQ    | Generally no                                | Yes (at counter 0x40)
Debug Freeze        | Configurable via DBGMCU                     | Configurable via DBGMCU
Best For            | System-level liveness                       | Timing-critical software monitoring

Common Design Mistakes

Unconditional refresh in the main loop. Kicking the watchdog regardless of system health proves only that the CPU is ticking — not that the system is working. A sensor returning garbage or a communication peripheral locked up won’t prevent the refresh.

Ignoring bootloader implications. If the IWDG is active before the bootloader runs, the bootloader must refresh it. A 30-second firmware flash with a 10-second timeout will reset mid-write.

Refreshing from ISRs. A high-priority ISR that refreshes the watchdog masks failures in the main application. The ISR keeps firing even when the application has crashed, preventing the watchdog from ever expiring.

Disabling during development. Code that works with an absent watchdog may behave differently when it is active. Enable the watchdog early and configure DBGMCU to freeze it during debug halts.

The Task Monitor Pattern

The robust approach treats the watchdog as a health-check aggregation point. Each critical task reports its health to a central monitor, which only refreshes the watchdog when all tasks have checked in within their expected period.

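#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include "stm32f4xx_hal.h"   /* assumption: use the HAL header for your device family */
#include "FreeRTOS.h"
#include "task.h"

extern IWDG_HandleTypeDef hiwdg;   /* assumed to be initialized elsewhere */

typedef struct {
    volatile uint32_t last_checkin_tick;   /* tick of the most recent check-in */
    uint32_t max_allowed_ticks;            /* longest tolerated gap between check-ins */
    const char *name;
    volatile bool has_checked_in;          /* explicit flag: tick 0 is a valid timestamp */
} watchdog_task_t;

static watchdog_task_t monitored_tasks[] = {
    { 0, 2000, "sensor",  false },
    { 0, 5000, "comm",    false },
    { 0, 1000, "control", false },
};

#define MONITORED_TASK_COUNT (sizeof(monitored_tasks) / sizeof(monitored_tasks[0]))

/* Called by each monitored task once per healthy iteration */
void watchdog_task_checkin(watchdog_task_t *task) {
    task->last_checkin_tick = HAL_GetTick();
    task->has_checked_in = true;
}

bool watchdog_all_tasks_healthy(void) {
    for (size_t i = 0; i < MONITORED_TASK_COUNT; i++) {
        if (!monitored_tasks[i].has_checked_in) {
            return false; /* Task never checked in */
        }
        /* Read the check-in time first, then the current tick, so a check-in
           that lands in between cannot make the elapsed time go negative.
           Unsigned subtraction stays correct across tick wraparound. */
        uint32_t last = monitored_tasks[i].last_checkin_tick;
        uint32_t elapsed = HAL_GetTick() - last;
        if (elapsed > monitored_tasks[i].max_allowed_ticks) {
            return false; /* Task missed its deadline */
        }
    }
    return true;
}

void watchdog_manager_task(void *param) {
    (void)param;
    for (;;) {
        if (watchdog_all_tasks_healthy()) {
            HAL_IWDG_Refresh(&hiwdg);
        }
        /* Not healthy? Withhold the refresh and let the watchdog reset */
        vTaskDelay(pdMS_TO_TICKS(50));
    }
}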

The watchdog manager runs at the lowest priority so all monitored tasks get CPU time first. If any task hangs, its checkin stops, and the next manager iteration withholds the refresh. The counter drains and the system resets.
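The last checklist-style step, deliberately hanging a task, can be rehearsed on a host before touching hardware. The sketch below is a simplified stand-in for the monitor, not the code above: the sim_* names are hypothetical, the current tick is passed in as a parameter instead of coming from HAL_GetTick(), and an explicit has_checked_in flag marks tasks that have reported at least once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t last_checkin_tick;
    uint32_t max_allowed_ticks;
    bool     has_checked_in;
} sim_task_t;

/* Record a check-in at the simulated time `now`. */
void sim_checkin(sim_task_t *task, uint32_t now)
{
    assert(task != NULL);
    task->last_checkin_tick = now;
    task->has_checked_in = true;
}

/* True only if every task has checked in within its allowed period. */
bool sim_all_healthy(const sim_task_t *tasks, size_t count, uint32_t now)
{
    for (size_t i = 0; i < count; i++) {
        if (!tasks[i].has_checked_in) {
            return false;
        }
        /* Unsigned subtraction gives the right answer even after the
         * 32-bit tick counter wraps around. */
        uint32_t elapsed = now - tasks[i].last_checkin_tick;
        if (elapsed > tasks[i].max_allowed_ticks) {
            return false;
        }
    }
    return true;
}
```

To inject a hang, stop calling sim_checkin() for one task and advance `now`: the check keeps passing until that task's period is exceeded by one tick, then fails. Checking in at 0xFFFFFF00 and evaluating at 0x00000100 exercises the wraparound case, where the elapsed time is 512 ticks.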

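Critical detail: on Cortex-M devices with BASEPRI support (Cortex-M3 and later ARMv7-M cores; not the ARMv6-M Cortex-M0 and M0+), use __set_BASEPRI() instead of __disable_irq() for critical sections. BASEPRI masks only interrupts at or below a chosen urgency, so a higher-priority interrupt such as the WWDG early wakeup can still preempt a critical section and record a hang inside HAL_ENTER_CRITICAL() blocks. The hardware reset itself does not depend on interrupts being enabled.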
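The BASEPRI masking rule is easy to get backwards, because a lower number means a more urgent interrupt. The following is a host-side model of that rule rather than target code, with hypothetical function names; on a real device the encoded value would be the argument to __set_BASEPRI(), and the number of implemented priority bits would come from the device header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Shift a logical priority level into the implemented (upper) bits of the
 * 8-bit priority field. prio_bits is the number of implemented bits,
 * for example 4 on many STM32 parts. */
uint8_t basepri_encode(uint8_t level, uint8_t prio_bits)
{
    assert(prio_bits >= 1u && prio_bits <= 8u);
    assert(level < (1u << prio_bits));
    return (uint8_t)(level << (8u - prio_bits));
}

/* Model of the masking rule: BASEPRI == 0 masks nothing; otherwise every
 * interrupt whose priority value is numerically greater than or equal to
 * BASEPRI is blocked. Lower numbers are more urgent and still get through. */
bool irq_masked_by_basepri(uint8_t basepri, uint8_t irq_priority)
{
    if (basepri == 0u) {
        return false;
    }
    return irq_priority >= basepri;
}
```

With four priority bits and a critical section that masks level 5 and below in urgency, a diagnostic interrupt placed at level 2 still fires, while interrupts at levels 5 and 10 wait until the mask is lifted.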

Early Wakeup Interrupt: Last Chance to Log

The WWDG’s Early Wakeup Interrupt fires at counter 0x40 — one count before reset at 0x3F. Use it to capture diagnostics before the inevitable reboot.

void HAL_WWDG_EarlyWakeupCallback(WWDG_HandleTypeDef *hwwdg) {
    /* Log reset cause, active task, stack pointer to NV memory */
    log_watchdog_fault_to_nvram();
    /* Do NOT refresh here; use exclusively for diagnostics */
}

Do not refresh the watchdog in the EWI handler. The system is faulted. Record the program counter, active task ID, and fault status registers. After reset, the bootloader reads the RCC status register and diagnostic data for post-mortem analysis.
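The post-reset half of that flow can be sketched as a decoder for the reset flags plus a counter per cause. Everything here is illustrative: the type and function names are hypothetical, the bit positions are those of RCC_CSR on STM32F4 parts and differ on some other families, and on target the register value would be read once at boot and the counters kept in non-volatile or backup memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: RCC_CSR flag positions as on STM32F4; check the reference
 * manual for your device. */
#define CSR_PINRSTF   (1uL << 26)
#define CSR_PORRSTF   (1uL << 27)
#define CSR_SFTRSTF   (1uL << 28)
#define CSR_IWDGRSTF  (1uL << 29)
#define CSR_WWDGRSTF  (1uL << 30)

typedef enum {
    RESET_CAUSE_OTHER,
    RESET_CAUSE_IWDG,
    RESET_CAUSE_WWDG,
    RESET_CAUSE_SOFTWARE,
    RESET_CAUSE_POWER_ON,
    RESET_CAUSE_PIN,
    RESET_CAUSE_COUNT
} reset_cause_t;

/* Several flags can be set at once, so test the most specific first:
 * the pin flag is commonly set together with other causes. */
reset_cause_t decode_reset_cause(uint32_t csr)
{
    if (csr & CSR_IWDGRSTF) return RESET_CAUSE_IWDG;
    if (csr & CSR_WWDGRSTF) return RESET_CAUSE_WWDG;
    if (csr & CSR_SFTRSTF)  return RESET_CAUSE_SOFTWARE;
    if (csr & CSR_PORRSTF)  return RESET_CAUSE_POWER_ON;
    if (csr & CSR_PINRSTF)  return RESET_CAUSE_PIN;
    return RESET_CAUSE_OTHER;
}

/* Per-cause counters for post-mortem analysis. */
typedef struct {
    uint32_t counts[RESET_CAUSE_COUNT];
} reset_log_t;

void reset_log_record(reset_log_t *log, uint32_t csr)
{
    assert(log != NULL);
    log->counts[decode_reset_cause(csr)]++;
}
```

A rising IWDG or WWDG count in the field is then visible without a debugger attached, which is what turns a mystery reboot into a trackable defect.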

Configuration Checklist

WATCHDOG IMPLEMENTATION CHECKLIST
----------------------------------
[ ] Watchdog is NEVER refreshed unconditionally
[ ] Every critical task must check in to earn a refresh
[ ] IWDG used with independent LSI clock
[ ] Window mode enabled to catch early refresh
[ ] Timeout tuned for worst-case legitimate operation
[ ] Watchdog enabled during development (freeze in debug)
[ ] Reset cause logged (RCC_CSR flags) for post-mortem
[ ] Verified by deliberately injecting hangs in each task

Summary

A watchdog timer is a system-level architectural decision that constrains bootloader design, task scheduling, and error handling. Key takeaways:

  • Use the IWDG for system-level crash recovery with clock independence
  • Use the WWDG (or IWDG window mode) to enforce timing constraints and catch runaway execution
  • Never refresh unconditionally — aggregate task health and only kick when the entire system is healthy
  • Log every watchdog reset to non-volatile memory; without logs, it is just a mystery reboot
  • Test by injecting faults — deliberately hang each task and verify the watchdog catches it

The watchdog is your firmware’s insurance policy. Like all insurance, you hope never to use it — but when you need it, you need it to work on the first try.

References



embeddedSoft

Embedded Systems Articles by Jithin Tom & Hermes (AI Agent)
