JTAG and SWD Debugging Strategies for Embedded Systems

By embeddedSoft
Published in Embedded Concepts
June 11, 2026
4 min read

Table Of Contents

1. Understanding JTAG and SWD
2. Choosing Between JTAG and SWD
3. Core Debugging Strategies
4. Debug Probe Selection
5. Summary

Debugging is one of the most critical skills in embedded systems development. Unlike desktop applications, embedded firmware runs on resource-constrained hardware with limited visibility into internal state. Two debug interfaces dominate the ARM Cortex-M ecosystem: JTAG (Joint Test Action Group) and SWD (Serial Wire Debug). Understanding their differences, trade-offs, and practical debugging strategies can dramatically improve your development workflow.

Understanding JTAG and SWD

JTAG, standardized as IEEE 1149.1, has been the industry standard for decades. It uses a Test Access Port (TAP) controller with a state machine that shifts data through a scan chain. A typical JTAG connection requires four mandatory pins — TCK (clock), TMS (mode select), TDI (data in), and TDO (data out) — plus an optional TRST (reset). JTAG supports boundary scan testing, which is invaluable for PCB manufacturing tests and verifying interconnections between devices.
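
The TAP state machine mentioned above is small enough to model on a host PC. The following is a minimal sketch, not a driver: the type and function names (`tap_state_t`, `tap_step`, `tap_walk`) are made up for illustration, while the 16 states and their TMS-driven transitions follow IEEE 1149.1.

```c
#include <assert.h>
#include <stdint.h>

/* The 16 TAP controller states defined by IEEE 1149.1. */
typedef enum {
    TAP_RESET, TAP_IDLE,
    TAP_SELECT_DR, TAP_CAPTURE_DR, TAP_SHIFT_DR, TAP_EXIT1_DR,
    TAP_PAUSE_DR, TAP_EXIT2_DR, TAP_UPDATE_DR,
    TAP_SELECT_IR, TAP_CAPTURE_IR, TAP_SHIFT_IR, TAP_EXIT1_IR,
    TAP_PAUSE_IR, TAP_EXIT2_IR, TAP_UPDATE_IR,
    TAP_STATE_COUNT
} tap_state_t;

/* tap_next[state][tms]: state entered on the next rising TCK edge. */
static const tap_state_t tap_next[TAP_STATE_COUNT][2] = {
    [TAP_RESET]      = { TAP_IDLE,       TAP_RESET     },
    [TAP_IDLE]       = { TAP_IDLE,       TAP_SELECT_DR },
    [TAP_SELECT_DR]  = { TAP_CAPTURE_DR, TAP_SELECT_IR },
    [TAP_CAPTURE_DR] = { TAP_SHIFT_DR,   TAP_EXIT1_DR  },
    [TAP_SHIFT_DR]   = { TAP_SHIFT_DR,   TAP_EXIT1_DR  },
    [TAP_EXIT1_DR]   = { TAP_PAUSE_DR,   TAP_UPDATE_DR },
    [TAP_PAUSE_DR]   = { TAP_PAUSE_DR,   TAP_EXIT2_DR  },
    [TAP_EXIT2_DR]   = { TAP_SHIFT_DR,   TAP_UPDATE_DR },
    [TAP_UPDATE_DR]  = { TAP_IDLE,       TAP_SELECT_DR },
    [TAP_SELECT_IR]  = { TAP_CAPTURE_IR, TAP_RESET     },
    [TAP_CAPTURE_IR] = { TAP_SHIFT_IR,   TAP_EXIT1_IR  },
    [TAP_SHIFT_IR]   = { TAP_SHIFT_IR,   TAP_EXIT1_IR  },
    [TAP_EXIT1_IR]   = { TAP_PAUSE_IR,   TAP_UPDATE_IR },
    [TAP_PAUSE_IR]   = { TAP_PAUSE_IR,   TAP_EXIT2_IR  },
    [TAP_EXIT2_IR]   = { TAP_SHIFT_IR,   TAP_UPDATE_IR },
    [TAP_UPDATE_IR]  = { TAP_IDLE,       TAP_SELECT_DR },
};

/* Advance the model by one TCK cycle with the given TMS level. */
tap_state_t tap_step(tap_state_t state, int tms)
{
    assert((unsigned)state < (unsigned)TAP_STATE_COUNT);
    return tap_next[state][tms ? 1 : 0];
}

/* Apply `count` TMS bits taken from `tms_bits`, least significant bit first. */
tap_state_t tap_walk(tap_state_t state, uint32_t tms_bits, unsigned count)
{
    assert(count <= 32u);
    for (unsigned i = 0; i < count; i++) {
        state = tap_step(state, (int)((tms_bits >> i) & 1u));
    }
    return state;
}
```

Two properties are easy to check with this model: five clocks with TMS high reach Test-Logic-Reset from any state, and the TMS pattern 0, 1, 0, 0 moves from reset to Shift-DR.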

SWD, developed by ARM as part of the CoreSight debug architecture, is a two-pin alternative that provides the same debug capabilities as JTAG (minus boundary scan). It uses SWDIO (bidirectional data) and SWCLK (clock). Most modern Cortex-M MCUs implement a combined SWJ-DP (Serial Wire/JTAG Debug Port) that supports both protocols on shared pins: SWDIO overlays TMS and SWCLK overlays TCK. The port does not negotiate on its own; the debugger selects the protocol by sending a dedicated switching sequence on those two pins.
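
To make the protocol selection on an SWJ-DP concrete, here is a sketch of the JTAG-to-SWD switching sequence as a byte buffer that a bit-banging probe could clock out on SWDIO/TMS, least significant bit first. The Arm Debug Interface specification describes it as at least 50 clocks with SWDIO high, the 16-bit value 0xE79E sent LSB first, and another line reset; the 56-clock runs, the trailing idle byte, and the function name `swj_copy_jtag_to_swd` are choices made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bytes are clocked out on SWDIO/TMS, least significant bit first. */
static const uint8_t swj_jtag_to_swd_seq[] = {
    0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, /* 56 clocks with SWDIO high (>= 50 needed) */
    0x9E, 0xE7,                               /* select value 0xE79E, low byte first      */
    0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, /* second line reset                        */
    0x00                                      /* 8 idle clocks with SWDIO low             */
};

/*
 * Copy the sequence into `out`.
 * Returns the number of bits to clock out, or 0 if the buffer is too small.
 */
size_t swj_copy_jtag_to_swd(uint8_t *out, size_t out_len)
{
    assert(out != NULL);
    if (out_len < sizeof swj_jtag_to_swd_seq) {
        return 0;
    }
    memcpy(out, swj_jtag_to_swd_seq, sizeof swj_jtag_to_swd_seq);
    return sizeof swj_jtag_to_swd_seq * 8u;
}
```

Off-the-shelf probes send this sequence for you; it mainly matters when you write your own probe firmware or decode a logic-analyzer capture of a connection attempt.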

+----------------------------------------------------------+
| Debug Interface Comparison                               |
+---------------------------+------------------------------+
| JTAG (IEEE 1149.1)        | SWD (ARM CoreSight)          |
+---------------------------+------------------------------+
| 4-5 pins (TCK, TMS, TDI,  | 2 pins (SWDIO, SWCLK)        |
| TDO, TRST)                |                              |
+---------------------------+------------------------------+
| Boundary scan supported   | No boundary scan             |
+---------------------------+------------------------------+
| Up to ~20 MHz clock       | Up to 30 MHz clock           |
+---------------------------+------------------------------+
| TAP state machine         | Packet-based protocol        |
+---------------------------+------------------------------+
| Multi-device scan chain   | Multi-drop (SWD v2)          |
+---------------------------+------------------------------+
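
The "packet-based protocol" entry for SWD refers to the fact that every transfer starts with an 8-bit request from the host. As a rough sketch, the request byte can be built as shown below; the function name `swd_request` is hypothetical, and the bit layout (start, APnDP, RnW, A[3:2], parity, stop, park, sent LSB first) follows the Arm Debug Interface specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Build the 8-bit SWD request header, to be sent LSB first.
 *   ap   : true for an Access Port register, false for a Debug Port register
 *   read : true for a read, false for a write
 *   addr : register address 0x0, 0x4, 0x8 or 0xC (only A[3:2] are sent)
 */
uint8_t swd_request(bool ap, bool read, uint8_t addr)
{
    assert((addr & 0x3u) == 0u && addr <= 0xCu);

    uint8_t a2 = (uint8_t)((addr >> 2) & 1u);
    uint8_t a3 = (uint8_t)((addr >> 3) & 1u);
    /* Even parity over the four payload bits APnDP, RnW, A2, A3. */
    uint8_t parity = (uint8_t)(((ap ? 1u : 0u) ^ (read ? 1u : 0u) ^ a2 ^ a3) & 1u);

    return (uint8_t)(0x01u                         /* bit 0: start = 1 */
                     | ((ap ? 1u : 0u) << 1)       /* bit 1: APnDP     */
                     | ((read ? 1u : 0u) << 2)     /* bit 2: RnW       */
                     | ((unsigned)a2 << 3)         /* bit 3: A[2]      */
                     | ((unsigned)a3 << 4)         /* bit 4: A[3]      */
                     | ((unsigned)parity << 5)     /* bit 5: parity    */
                     | 0x80u);                     /* bit 6: stop = 0, bit 7: park = 1 */
}
```

For example, a read of Debug Port address 0x0 (the ID register) yields the byte 0xA5, which is usually the first request you see in a capture of a successful SWD connection.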

Choosing Between JTAG and SWD

For most Cortex-M microcontroller projects, SWD is the recommended default. The two-pin interface conserves precious GPIO pins — a critical consideration on small packages like QFN-32 or WLCSP. SWD also offers higher clock rates, which translates to faster flash programming and smoother stepping through code during debug sessions.

However, JTAG remains essential in several scenarios. If your design requires boundary scan testing for production PCB verification, JTAG is the only option. When debugging multi-processor SoCs with legacy ARM cores (ARM7, ARM9, ARM11) alongside Cortex-M cores, JTAG’s scan chain architecture allows a single debug probe to access all devices. Some older debug tools also only support JTAG.

A practical strategy is to design your board with an SWJ-DP-compatible header that exposes both protocols. ARM recommends the 10-pin Cortex Debug Connector (0.05” pitch) for new designs: it supports SWD, JTAG, and Serial Wire Viewer (SWV) trace through a compact, low-cost connector.

Core Debugging Strategies

1. Hardware Breakpoints and Watchpoints

ARM Cortex-M processors provide a Flash Patch and Breakpoint (FPB) unit that supports hardware breakpoints. The number of available breakpoints is implementation-defined: Cortex-M3/M4 devices typically provide 6 instruction comparators, while Cortex-M7 can support up to 8 (check your MCU’s FP_CTRL register for the exact count). These are essential for debugging code in flash memory, because the debugger cannot simply overwrite an instruction in flash with a software breakpoint (BKPT) the way it can in RAM; doing so would require erasing and reprogramming the flash.
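
The comparator count can be read out of FP_CTRL directly. The sketch below decodes a raw register value using the ARMv7-M field layout (NUM_CODE split across bits [14:12] and [7:4], NUM_LIT in bits [11:8], REV in bits [31:28]); the struct and function names are invented for this example, and it takes the raw value as a parameter so it can be tried on a host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    unsigned num_code; /* instruction address comparators (usable as breakpoints) */
    unsigned num_lit;  /* literal address comparators                              */
    unsigned rev;      /* FPB architecture revision field                          */
    bool     enabled;  /* ENABLE bit                                               */
} fpb_info_t;

/* Decode a raw FP_CTRL value (on target, read it from address 0xE0002000). */
fpb_info_t fpb_decode_ctrl(uint32_t fp_ctrl)
{
    fpb_info_t info;
    info.num_code = ((fp_ctrl >> 4) & 0xFu) | (((fp_ctrl >> 12) & 0x7u) << 4);
    info.num_lit  = (fp_ctrl >> 8) & 0xFu;
    info.rev      = (fp_ctrl >> 28) & 0xFu;
    info.enabled  = (fp_ctrl & 1u) != 0u;
    return info;
}

/* Usage example with a value that encodes NUM_CODE = 6 and NUM_LIT = 2. */
void fpb_decode_example(void)
{
    fpb_info_t info = fpb_decode_ctrl(0x00000261u);
    assert(info.num_code == 6u && info.num_lit == 2u && info.enabled);
}
```

Most debuggers perform this read automatically and report the result when they connect, so a helper like this is mainly useful in self-hosted debug monitors or test firmware.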

For data watchpoints, the Data Watchpoint and Trace (DWT) unit allows you to halt execution when a specific memory address is read or written. This is invaluable for tracking down stack overflows, buffer overruns, and unexpected peripheral register modifications.

// Example: setting a data watchpoint on a variable using the DWT
// (Cortex-M4 with 4 comparators, ARMv7-M FUNCTION encoding)
#include <stdint.h>
// Also include your device's CMSIS header here; it provides CoreDebug and DWT.

volatile uint32_t sensor_value = 0;

void enable_data_watchpoint(void)
{
    // Enable the trace and debug blocks (DWT, ITM)
    CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;

    // Configure DWT comparator 0
    DWT->COMP0 = (uint32_t)&sensor_value; // Address to watch
    DWT->MASK0 = 0;                       // No address bits ignored: match this exact address
    DWT->FUNCTION0 = 0x5;                 // Data read watchpoint (FUNCTION = 0b0101)

    // With a debugger attached (halting debug enabled), any read of
    // sensor_value now halts the core.
}
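
Watching a whole buffer rather than a single word means choosing a MASK value and a FUNCTION code. The helpers below are a host-testable sketch under two assumptions: the ARMv7-M DWT encoding (0x5 read, 0x6 write, 0x7 read or write) and a region that is a power of two in size and aligned to that size. The function names are hypothetical, and the largest supported MASK value is implementation-defined, so check it on your part.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ARMv7-M DWT_FUNCTIONn values for data address watchpoints. */
enum {
    DWT_FUNC_READ  = 0x5,
    DWT_FUNC_WRITE = 0x6,
    DWT_FUNC_RW    = 0x7
};

/* Pick the FUNCTION value for the requested access types. */
uint32_t dwt_function_for(bool on_read, bool on_write)
{
    assert(on_read || on_write);
    if (on_read && on_write) {
        return DWT_FUNC_RW;
    }
    return on_read ? DWT_FUNC_READ : DWT_FUNC_WRITE;
}

/*
 * MASK is the number of low address bits the comparator ignores, so a
 * region of 2^n bytes needs MASK = n.
 * Returns -1 if size is not a power of two or base is not aligned to size.
 */
int dwt_mask_for_region(uint32_t base, uint32_t size)
{
    if (size == 0u || (size & (size - 1u)) != 0u) {
        return -1;
    }
    if ((base & (size - 1u)) != 0u) {
        return -1;
    }
    int mask = 0;
    while ((1u << mask) < size) {
        mask++;
    }
    return mask;
}
```

A 16-byte buffer at a 16-byte-aligned address would therefore use MASK = 4, and a write-only watchpoint would use FUNCTION = 0x6.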

2. Semihosting and SWO Trace

Semihosting allows the target to use the debugger’s I/O facilities — for example, printing to the host console via printf. While convenient, semihosting is slow (it halts the CPU for each operation) and should be avoided in timing-sensitive code.

A better alternative is SWO (Serial Wire Output), which provides a dedicated trace pin for streaming data with minimal CPU overhead. Combined with the Instrumentation Trace Macrocell (ITM), SWO can output printf-style messages, data trace events, and exception entry/exit information without halting the target.

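// ITM-based printf redirection via SWO (does not halt the core)
#include <stdint.h>
// Also include your device's CMSIS header here; it provides ITM_SendChar().

int _write(int file, char *ptr, int len)
{
    (void)file; // All streams go to ITM stimulus port 0
    for (int i = 0; i < len; i++)
    {
        // Busy-waits while the stimulus port is full; returns at once if ITM is disabled
        ITM_SendChar((uint32_t)*ptr++);
    }
    return len;
}
// Usage: printf("Sensor: %lu\r\n", (unsigned long)sensor_value);
// SWO bit rate (UART/NRZ mode) = TPIU trace clock (often HCLK) / (TPI->ACPR + 1)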
3. Exception and Fault Debugging

HardFaults are among the most common and frustrating issues in embedded development. On exception entry, the processor pushes a basic eight-word stack frame containing, from the lowest address upward, R0–R3, R12, the link register (LR), the program counter (PC), and xPSR. Additionally, the System Control Block (SCB) provides the Configurable Fault Status Register (CFSR) and fault address registers that indicate the fault cause.

A systematic approach to fault debugging:

  1. Capture the stacked PC from the exception stack frame to identify the exact instruction that caused the fault.
  2. Read the CFSR: its sub-registers (UFSR bits[31:16], BFSR bits[15:8], MMFSR bits[7:0]) pinpoint whether the fault was a usage fault (such as an undefined instruction), a bus error, or a memory management violation.
  3. Check the BFAR/MMFAR: the Bus Fault Address Register (BFAR) or MemManage Fault Address Register (MMFAR) contains the faulting address, but only when the corresponding BFARVALID or MMARVALID bit in the CFSR is set.
  4. Examine the LR value on exception entry: the EXC_RETURN bits indicate which stack pointer (MSP or PSP) was in use.
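
Step 4 can be illustrated with a small decoder for the EXC_RETURN value found in LR. This is a sketch that assumes the ARMv7-M encoding (Cortex-M3/M4/M7), where bit 2 selects the stack pointer, bit 3 the mode returned to, and bit 4 the frame type; ARMv8-M cores define additional bits. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    bool thread_mode; /* true: returns to Thread mode, false: Handler mode      */
    bool uses_psp;    /* true: frame was pushed on PSP, false: on MSP           */
    bool fp_frame;    /* true: extended frame with floating-point state         */
} exc_return_info_t;

/*
 * Decode an ARMv7-M EXC_RETURN value.
 * Returns false if `lr` does not look like an EXC_RETURN value at all.
 */
bool exc_return_decode(uint32_t lr, exc_return_info_t *out)
{
    assert(out != NULL);
    if ((lr & 0xFFFFFFE0u) != 0xFFFFFFE0u) {
        return false; /* an ordinary return address, not EXC_RETURN */
    }
    out->uses_psp    = (lr & (1u << 2)) != 0u;
    out->thread_mode = (lr & (1u << 3)) != 0u;
    out->fp_frame    = (lr & (1u << 4)) == 0u; /* bit 4 clear means extended frame */
    return true;
}
```

For instance, 0xFFFFFFFD decodes to "Thread mode, PSP, basic frame", which is what you would expect when a fault interrupts an RTOS task.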
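
// HardFault handler that captures diagnostic information (GCC/Clang syntax)
#include <stdint.h>
// Also include your device's CMSIS header here; it provides SCB.

void hard_fault_handler_c(uint32_t *hardfault_args);

// naked: no compiler prologue, so LR and SP still hold their exception-entry values
__attribute__((naked)) void HardFault_Handler(void)
{
    __asm volatile (
        "TST LR, #4 \n"             // Check EXC_RETURN bit 2
        "ITE EQ \n"
        "MRSEQ R0, MSP \n"          // Use MSP if bit 2 == 0
        "MRSNE R0, PSP \n"          // Use PSP if bit 2 == 1
        "B hard_fault_handler_c \n" // R0 = pointer to the stacked frame
    );
}

// Stacked frame layout: [0..3] = R0-R3, [4] = R12, [5] = LR, [6] = PC, [7] = xPSR
void hard_fault_handler_c(uint32_t *hardfault_args)
{
    volatile uint32_t stacked_pc = hardfault_args[6];
    volatile uint32_t stacked_lr = hardfault_args[5];
    volatile uint32_t cfsr = SCB->CFSR;
    volatile uint32_t bfar = SCB->BFAR;   // Meaningful only if CFSR.BFARVALID is set
    volatile uint32_t mmfar = SCB->MMFAR; // Meaningful only if CFSR.MMARVALID is set

    // Set a breakpoint here in your debugger to inspect these values
    (void)stacked_pc;
    (void)stacked_lr;
    (void)cfsr;
    (void)bfar;
    (void)mmfar;
    while (1) { }
}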
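
Once the CFSR value has been captured, turning it into flag names saves a trip to the reference manual. The decoder below is a sketch with hypothetical function names; the bit positions are the ARMv7-M ones, and the two lazy floating-point flags (MLSPERR, LSPERR) only apply to cores with an FPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t    mask;
    const char *name;
} cfsr_flag_t;

/* ARMv7-M CFSR flags, lowest bit first. */
static const cfsr_flag_t cfsr_flags[] = {
    /* MMFSR, bits [7:0] */
    { 1u << 0,  "IACCVIOL" },  { 1u << 1,  "DACCVIOL" },    { 1u << 3,  "MUNSTKERR" },
    { 1u << 4,  "MSTKERR" },   { 1u << 5,  "MLSPERR" },     { 1u << 7,  "MMARVALID" },
    /* BFSR, bits [15:8] */
    { 1u << 8,  "IBUSERR" },   { 1u << 9,  "PRECISERR" },   { 1u << 10, "IMPRECISERR" },
    { 1u << 11, "UNSTKERR" },  { 1u << 12, "STKERR" },      { 1u << 13, "LSPERR" },
    { 1u << 15, "BFARVALID" },
    /* UFSR, bits [31:16] */
    { 1u << 16, "UNDEFINSTR" }, { 1u << 17, "INVSTATE" },   { 1u << 18, "INVPC" },
    { 1u << 19, "NOCP" },       { 1u << 24, "UNALIGNED" },  { 1u << 25, "DIVBYZERO" },
};

/* Store the names of all set flags in `names`; returns how many were stored. */
size_t cfsr_decode(uint32_t cfsr, const char *names[], size_t max_names)
{
    assert(names != NULL || max_names == 0u);
    size_t count = 0;
    for (size_t i = 0; i < sizeof cfsr_flags / sizeof cfsr_flags[0]; i++) {
        if ((cfsr & cfsr_flags[i].mask) != 0u && count < max_names) {
            names[count++] = cfsr_flags[i].name;
        }
    }
    return count;
}

/* The fault address registers are only meaningful when these bits are set. */
bool cfsr_bfar_valid(uint32_t cfsr)  { return (cfsr & (1u << 15)) != 0u; }
bool cfsr_mmfar_valid(uint32_t cfsr) { return (cfsr & (1u << 7)) != 0u; }
```

A captured value of 0x00008200, for example, decodes to PRECISERR plus BFARVALID: a precise bus fault whose address can be read from BFAR.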

4. Multi-Core and Multi-Drop Debugging

Modern embedded designs increasingly use multi-core MCUs (e.g., Cortex-M4 + Cortex-M0+). SWD version 2 introduces Multi-Drop, allowing a single debug probe to access multiple DAPs (Debug Access Ports) on the same SWD bus. Each DAP has a unique target ID, and the debugger selects which core to communicate with using a target selection protocol.
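
The target selection step boils down to writing a 32-bit value to the Debug Port's TARGETSEL register after a line reset. As a sketch, the value can be assembled from its fields as shown below; the layout (instance in bits [31:28], part number in [27:12], designer code in [11:1], bit 0 set) follows the Arm Debug Interface specification, while the function name and the designer and part numbers in the usage note are placeholders, not values for any particular chip.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assemble a TARGETSEL value for SWD multi-drop.
 *   instance : 4-bit instance ID, distinguishes identical devices on one bus
 *   partno   : 16-bit part number from the device's target ID
 *   designer : 11-bit designer code from the device's target ID
 */
uint32_t swd_targetsel(uint8_t instance, uint16_t partno, uint16_t designer)
{
    assert(instance <= 0xFu);
    assert(designer <= 0x7FFu);
    return ((uint32_t)instance << 28)
         | ((uint32_t)partno << 12)
         | ((uint32_t)designer << 1)
         | 1u; /* bit 0 always reads as one */
}
```

With the placeholder values part number 0x1002 and designer 0x493, instance 0 gives 0x01002927 and instance 1 gives 0x11002927, so two otherwise identical access ports differ only in the top nibble.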

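For JTAG-based multi-core debugging, devices are daisy-chained in a scan chain. The debugger must know the position and instruction register length of each device in the chain. This is typically supplied through the debugger's target configuration (for example, OpenOCD scripts declare each TAP and its IR length) or derived from the devices' BSDL files; an SVF file, by contrast, holds a recorded sequence of scan operations rather than a description of the chain.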
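
To see why position and IR length matter, consider addressing one device while all others sit in BYPASS. The sketch below computes how many padding bits must be shifted before and after the target's own bits. It assumes the chain array is ordered from the device nearest TDI (index 0) to the device nearest TDO, and that a bypassed device contributes one bit to every DR scan; all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    size_t ir_lead;  /* BYPASS opcode bits shifted before the target's IR bits */
    size_t ir_trail; /* BYPASS opcode bits shifted after the target's IR bits  */
    size_t dr_lead;  /* bypass bits shifted before the target's DR bits        */
    size_t dr_trail; /* bypass bits shifted after the target's DR bits         */
} jtag_padding_t;

/*
 * irlen[i] is the IR length of device i, with index 0 nearest TDI.
 * The first bit shifted in travels furthest, so bits for devices between
 * the target and TDO (index > target) must be shifted first.
 */
bool jtag_chain_padding(const uint8_t *irlen, size_t count, size_t target,
                        jtag_padding_t *pad)
{
    assert(irlen != NULL && pad != NULL);
    if (target >= count) {
        return false;
    }
    jtag_padding_t p = { 0, 0, 0, 0 };
    for (size_t i = 0; i < count; i++) {
        if (i > target) {
            p.ir_lead += irlen[i];
            p.dr_lead += 1;
        } else if (i < target) {
            p.ir_trail += irlen[i];
            p.dr_trail += 1;
        }
    }
    *pad = p;
    return true;
}
```

For a three-device chain with IR lengths 4, 5 and 4, targeting the middle device means 4 IR padding bits and 1 DR padding bit on each side of the real data. A wrong IR length for any device shifts every later bit, which is why a misconfigured chain tends to fail for all devices at once.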

Debug Probe Selection

The choice of debug probe significantly impacts your debugging experience. Popular options include:

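+---------------------+------------+--------------+----------------------------------------------------+
| Probe               | Protocols  | Speed        | Notes                                              |
+---------------------+------------+--------------+----------------------------------------------------+
| SEGGER J-Link       | JTAG + SWD | Up to 50 MHz | Industry standard, excellent IDE integration       |
+---------------------+------------+--------------+----------------------------------------------------+
| CMSIS-DAP / DAPLink | JTAG + SWD | Up to 10 MHz | Open source, no drivers needed (USB HID)           |
+---------------------+------------+--------------+----------------------------------------------------+
| ST-LINK/V3          | JTAG + SWD | Up to 24 MHz | Bundled with STM32 Nucleo/Discovery boards         |
+---------------------+------------+--------------+----------------------------------------------------+
| ULINKpro            | JTAG + SWD | Up to 50 MHz | ARM/KEIL, supports ETM trace (100 MHz trace clock) |
+---------------------+------------+--------------+----------------------------------------------------+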

For professional development, SEGGER J-Link offers the best combination of speed, reliability, and tool integration. For hobbyist or educational use, CMSIS-DAP based probes (often built into development boards) provide a zero-cost, driver-free solution.

Summary

Effective embedded debugging requires understanding both the hardware interfaces (JTAG vs SWD) and the architectural debug features built into your target processor. SWD is the practical choice for most Cortex-M projects due to its lower pin count and higher speed, while JTAG remains necessary for boundary scan and legacy device support. Leveraging hardware breakpoints, ITM/SWO trace, and systematic fault analysis techniques will help you diagnose issues faster and ship more reliable firmware.
