
A real-time operating system (RTOS) gives the illusion of running multiple tasks simultaneously on a single-core microcontroller. Behind this illusion lies one of the most critical mechanisms in embedded systems: the context switch. Understanding how and when context switches occur is essential for writing predictable, deterministic firmware — and for debugging the subtle timing bugs that plague complex RTOS-based applications.
A context switch is the mechanism by which the RTOS saves the execution state of the currently running task and restores the state of another task, allowing that task to resume execution from where it left off. The “context” includes the contents of the CPU registers, the stack pointer, the program counter, and the program status register, plus the floating-point registers if the task uses the FPU. On ARM Cortex-M processors, the hardware automates part of this work. On entry to any exception, including the PendSV (Pendable Service Call) exception that the RTOS uses for switching, it pushes R0-R3, R12, LR, PC, and xPSR onto the current stack. The RTOS must handle the software side: saving and restoring the remaining registers (R4-R11) and selecting the next task to run.
The Task Control Block (TCB) is the data structure at the heart of context switching. Each task has its own TCB, which stores the task’s stack pointer, priority, state, and other metadata. In FreeRTOS, the first member of the TCB (pxTopOfStack) holds the saved top of the task’s stack. Placing it at offset zero makes context switching efficient: the handler saves the stack pointer with a single store to the TCB’s address and restores it with a single load, with no offset calculation.
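The offset-zero layout can be sketched in C. This is a minimal, hypothetical TCB (the real FreeRTOS tskTCB has many more fields). Only the name and position of pxTopOfStack follow FreeRTOS, and the helper functions are illustrative. Because a pointer to a structure also points to its first member, the context-switch code can treat the TCB address itself as the address of the saved stack pointer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified task control block. Only the placement of the
 * first member mirrors FreeRTOS (pxTopOfStack); the rest is illustrative. */
typedef struct TaskControlBlock {
    volatile uint32_t *pxTopOfStack; /* MUST be first: saved stack pointer */
    uint32_t uxPriority;             /* task priority */
    uint32_t uxState;                /* hypothetical state code */
    char pcTaskName[16];             /* human-readable name */
} TCB_t;

/* Fail the build if someone reorders the structure. */
_Static_assert(offsetof(TCB_t, pxTopOfStack) == 0,
               "pxTopOfStack must be the first TCB member");

/* Equivalent of "str r0, [r2]": write the stack pointer at offset 0. */
void save_top_of_stack(TCB_t *tcb, volatile uint32_t *sp) {
    *(volatile uint32_t **)tcb = sp;   /* same as tcb->pxTopOfStack = sp */
}

/* Equivalent of "ldr r0, [r1]": read the stack pointer at offset 0. */
volatile uint32_t *load_top_of_stack(const TCB_t *tcb) {
    return *(volatile uint32_t *const *)tcb;
}
```

The `_Static_assert` turns an accidental reordering of the structure into a build error rather than a corrupted stack at run time.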
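On ARM Cortex-M, FreeRTOS and most other RTOS kernels use three system exceptions for scheduling:
SVCall (SVC): in the FreeRTOS Cortex-M3 and Cortex-M4 ports, used to start the first task when the scheduler starts.
PendSV: performs the actual context switch. It is configured at the lowest exception priority so that it runs only after every other interrupt has finished.
SysTick: generates the periodic tick interrupt that drives delays, timeouts, and time slicing.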
When the SysTick handler determines that a context switch is needed (for example, because a higher-priority task has become ready), it does not switch immediately. Instead, it sets the PendSV pending bit. This deferred approach is critical. Because PendSV runs at the lowest exception priority, the switch happens only after all other active and pending ISRs have completed, so interrupt latency stays low and the kernel never switches tasks in the middle of a nested interrupt.
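Setting the pending bit is a single register write. The sketch below assumes the Cortex-M system control block layout, in which the Interrupt Control and State Register (ICSR) sits at 0xE000ED04 and writing 1 to bit 28 (PENDSVSET) pends PendSV. FreeRTOS's Cortex-M ports define equivalent macros named portNVIC_INT_CTRL_REG and portNVIC_PENDSVSET_BIT. The function name is hypothetical, and it takes the register address as a parameter so the logic can be checked on a host.

```c
#include <stdint.h>

/* Cortex-M Interrupt Control and State Register (SCB->ICSR). */
#define ICSR_ADDRESS   0xE000ED04UL
#define ICSR_PENDSVSET (1UL << 28)   /* write 1: set PendSV pending */

/* Hypothetical helper: request a context switch by pending PendSV.
 * On target, call it as request_context_switch((volatile uint32_t *)ICSR_ADDRESS)
 * and follow it with DSB and ISB barriers, as FreeRTOS's portYIELD() does. */
void request_context_switch(volatile uint32_t *icsr) {
    /* A plain write, not read-modify-write: the ICSR set/clear bits only act
     * when written with 1, so writing 0 to the other bits has no effect. */
    *icsr = ICSR_PENDSVSET;
}
```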
Here is the core of the FreeRTOS PendSV handler for Cortex-M3, annotated:
```asm
xPortPendSVHandler:
    mrs     r0, psp                 ; Read the Process Stack Pointer (task stack)
    isb                             ; Instruction Synchronization Barrier
    ldr     r3, =pxCurrentTCB       ; r3 = &pxCurrentTCB
    ldr     r2, [r3]                ; r2 = current TCB address
    stmdb   r0!, {r4-r11}           ; Save R4-R11 onto the task's stack
    str     r0, [r2]                ; Save new top of stack into the TCB's first member
    stmdb   sp!, {r3, r14}          ; Preserve r3 and EXC_RETURN (LR) on the MSP
    mov     r0, #MAX_SYSCALL        ; MAX_SYSCALL stands for configMAX_SYSCALL_INTERRUPT_PRIORITY
    msr     basepri, r0             ; Mask interrupts that may call the kernel API
    bl      vTaskSwitchContext      ; Select next task (updates pxCurrentTCB)
    mov     r0, #0                  ; Clear BASEPRI to re-enable interrupts
    msr     basepri, r0
    ldmia   sp!, {r3, r14}          ; Restore r3 and EXC_RETURN
    ldr     r1, [r3]                ; r1 = new TCB address
    ldr     r0, [r1]                ; r0 = new task's saved stack pointer
    ldmia   r0!, {r4-r11}           ; Restore R4-R11 from the new task's stack
    msr     psp, r0                 ; Point PSP at the hardware-stacked frame
    isb
    bx      r14                     ; Exception return: hardware restores R0-R3, R12, LR, PC, xPSR
```
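On entry, the hardware has already stacked R0-R3, R12, LR, PC, and xPSR on the task's stack. The handler saves the remaining callee-saved registers (R4-R11) there as well and stores the resulting stack pointer in the TCB. Then, with BASEPRI raised so that interrupts which may call the kernel API are masked, it calls vTaskSwitchContext() to select the highest-priority ready task. Finally, it restores the new task’s R4-R11, updates PSP, and returns from the exception; the hardware unstacks the rest of the frame, and the new task resumes exactly where it was interrupted.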
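The save/restore half of the handler can be modeled on a host to make the data movement visible. This is a hypothetical simulation, not kernel code. Registers are struct fields, stacks are arrays, the hardware-stacked frame (R0-R3, R12, LR, PC, xPSR) is left out, and the next task is passed in rather than chosen by vTaskSwitchContext().

```c
#include <stdint.h>

#define STACK_WORDS 64

/* Hypothetical model of a task: saved stack pointer first, as in FreeRTOS. */
typedef struct {
    uint32_t *top_of_stack;        /* saved stack pointer */
    uint32_t stack[STACK_WORDS];   /* the task's private stack */
} ModelTCB;

/* Hypothetical model of the CPU state the handler touches. */
typedef struct {
    uint32_t r4_r11[8];            /* callee-saved registers R4..R11 */
    uint32_t *psp;                 /* process stack pointer */
} ModelCPU;

/* Mirrors "stmdb r0!, {r4-r11}": full-descending push of 8 words,
 * lowest-numbered register at the lowest address. */
static uint32_t *push_r4_r11(uint32_t *sp, const uint32_t regs[8]) {
    sp -= 8;
    for (int i = 0; i < 8; i++) sp[i] = regs[i];
    return sp;
}

/* Mirrors "ldmia r0!, {r4-r11}". */
static uint32_t *pop_r4_r11(uint32_t *sp, uint32_t regs[8]) {
    for (int i = 0; i < 8; i++) regs[i] = sp[i];
    return sp + 8;
}

/* One context switch: save the outgoing task, switch the current-task
 * pointer (standing in for vTaskSwitchContext), restore the incoming task. */
void model_context_switch(ModelCPU *cpu, ModelTCB **current, ModelTCB *next) {
    (*current)->top_of_stack = push_r4_r11(cpu->psp, cpu->r4_r11);
    *current = next;
    cpu->psp = pop_r4_r11((*current)->top_of_stack, cpu->r4_r11);
}
```

Switching away and back again returns every register to its original value, which is the property the real handler must preserve for each task.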
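For Cortex-M4F and other processors with a Floating Point Unit (FPU), the context switch is more complex. If the interrupted task has an active floating-point context, the hardware pushes an extended exception frame that also contains S0-S15 and FPSCR, and it clears bit 4 of the EXC_RETURN value in R14. The handler tests that bit and, when it is 0, saves and restores S16-S31 in addition to the core registers. Tasks that have never used the FPU have no active floating-point context, so they get the standard frame and the handler skips the FPU registers entirely. The lazy stacking feature of the Cortex-M architecture further defers the hardware's write of S0-S15 and FPSCR until an exception handler actually executes a floating-point instruction.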
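The frame-type test reduces to one bit. The helper below is a hypothetical C rendering of what the FreeRTOS Cortex-M4F handler does in assembly with `tst r14, #0x10` followed by a conditional `vstmdbeq r0!, {s16-s31}`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 4 of EXC_RETURN (the value in LR on exception entry) on an
 * FPU-equipped Cortex-M: 0 = extended frame (FP context stacked),
 * 1 = standard frame (no FP context). */
#define EXC_RETURN_FRAME_TYPE_BIT (1UL << 4)

/* Hypothetical helper: true when the interrupted task had an active FP
 * context, so the switch must also save S16-S31 (hardware covers S0-S15). */
bool task_used_fpu(uint32_t exc_return) {
    return (exc_return & EXC_RETURN_FRAME_TYPE_BIT) == 0;
}
```

For example, EXC_RETURN 0xFFFFFFFD (return to Thread mode on PSP, standard frame) yields false, while 0xFFFFFFED (same return, extended frame) yields true.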
Most RTOS kernels use a priority-based preemptive scheduler. At any given moment, the highest-priority task that is in the Ready state will be executing. If a higher-priority task becomes ready (for example, because a semaphore it was waiting on was given by an ISR), the RTOS will immediately preempt the current task and switch to the higher-priority one.
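The preemption rule itself is a strict comparison. The helper below is hypothetical and follows the FreeRTOS convention that a larger number means a higher priority. Because the comparison is strict, a newly ready task of equal priority does not preempt the running task; it waits for a time slice to expire, a yield, or a block.

```c
#include <stdbool.h>

/* Hypothetical helper: should a task that just became ready preempt the
 * running task? Larger number = higher priority. */
bool should_preempt(unsigned ready_priority, unsigned running_priority,
                    bool preemption_enabled) {
    return preemption_enabled && ready_priority > running_priority;
}
```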
When multiple tasks share the same priority, the RTOS can optionally use round-robin scheduling. In this mode, each task gets a time slice (typically one tick period). When the time slice expires, the scheduler moves the current task to the end of the ready queue for that priority and switches to the next task. This ensures fair CPU sharing among equal-priority tasks.
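The rotation can be modeled with a small array. This is a hypothetical sketch of the policy, not of FreeRTOS internals; FreeRTOS gets the same effect by advancing an index around a circular ready list instead of moving entries.

```c
#include <stddef.h>

#define MAX_READY 8

/* Hypothetical ready queue for one priority level. Element 0 is the
 * running task; the others wait in FIFO order. */
typedef struct {
    int task_ids[MAX_READY];
    size_t count;
} ReadyQueue;

/* Called when the running task's time slice expires: move it to the back
 * of the queue and return the id of the task that runs next (-1 if empty). */
int rotate_on_tick(ReadyQueue *q) {
    if (q->count < 2) {
        return q->count ? q->task_ids[0] : -1;  /* nobody to share with */
    }
    int expired = q->task_ids[0];
    for (size_t i = 1; i < q->count; i++) {
        q->task_ids[i - 1] = q->task_ids[i];
    }
    q->task_ids[q->count - 1] = expired;
    return q->task_ids[0];
}
```

With three equal-priority tasks 1, 2, 3, successive ticks run 2, then 3, then 1 again; a task alone at its priority simply keeps running.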
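The choice between preemptive-only and round-robin scheduling is controlled by configuration constants. In FreeRTOS, configUSE_PREEMPTION enables preemption, and configUSE_TIME_SLICING enables round-robin time slicing among equal-priority tasks; time slicing only takes effect when preemption is enabled. With time slicing disabled, the running task keeps the CPU until it blocks or yields, or until a higher-priority task becomes ready. Other ready tasks of the same priority wait instead of being switched in on each tick.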
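A minimal FreeRTOSConfig.h excerpt with both options enabled follows. The values are an example, not a recommendation for every project. If configUSE_TIME_SLICING is left undefined, FreeRTOS defaults it to 1.

```c
/* FreeRTOSConfig.h (excerpt) */
#define configUSE_PREEMPTION     1  /* 1 = preemptive scheduler, 0 = cooperative */
#define configUSE_TIME_SLICING   1  /* 1 = round-robin among equal-priority ready tasks */
```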
Context switches can be triggered by several events:
Tick interrupt (time-based): The SysTick handler increments the tick counter and checks if any blocked tasks have timed out. If a higher-priority task becomes ready, or time slicing is enabled and another task of the same priority is ready, a context switch is pended.
Task yields (voluntary): A task calls taskYIELD() to voluntarily give up the CPU. This triggers a PendSV exception to request a context switch.
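Blocking operations (implicit): When a task calls a blocking API such as vTaskDelay(), xSemaphoreTake(), or xQueueReceive() and the call cannot complete immediately, the task moves to the Blocked state and the RTOS switches to the highest-priority task that is still ready, even if that task has a lower priority than the one that blocked.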
ISR-triggered (deferred): When an ISR unblocks a higher-priority task (e.g., by giving a semaphore), it sets the PendSV pending bit. The context switch is deferred until all ISRs have completed, then PendSV fires at the lowest priority.
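In FreeRTOS firmware, the ISR-triggered path usually looks like `xSemaphoreGiveFromISR(sem, &woken); portYIELD_FROM_ISR(woken);`, with `woken` initialized to pdFALSE. Because that requires a kernel and a target, the sketch below models the same logic on a host, and all names in it are hypothetical stand-ins. The give call only reports whether a higher-priority task was woken. The switch is requested once, at the end of the ISR, and happens after the ISR returns.

```c
#include <stdbool.h>

/* Hypothetical kernel state for the model. */
typedef struct {
    unsigned running_priority;   /* priority of the interrupted task */
    bool pendsv_pending;         /* stands in for ICSR.PENDSVSET */
} ModelKernel;

/* Give a semaphore that a task of waiter_priority is blocked on. Like the
 * FreeRTOS FromISR APIs, it sets the flag when needed but never clears it,
 * so several calls in one ISR can share one flag. */
void model_give_from_isr(const ModelKernel *k, unsigned waiter_priority,
                         bool *higher_priority_task_woken) {
    if (waiter_priority > k->running_priority) {
        *higher_priority_task_woken = true;
    }
}

/* Called once at the end of the ISR: pend PendSV only if needed. */
void model_yield_from_isr(ModelKernel *k, bool higher_priority_task_woken) {
    if (higher_priority_task_woken) {
        k->pendsv_pending = true;
    }
}
```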
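In FreeRTOS, every task exists in one of four states:
Running: the task currently executing on the CPU. On a single-core microcontroller, exactly one task is Running at any time.
Ready: able to run, but waiting because a task of equal or higher priority is Running.
Blocked: waiting for an event (such as a semaphore, queue, or notification) or for a delay to expire, usually with a timeout.
Suspended: removed from scheduling by a call to vTaskSuspend(). Unlike blocked tasks, suspended tasks have no timeout and can only be resumed by an explicit API call (vTaskResume(), or xTaskResumeFromISR() from an interrupt).
The scheduler maintains a separate ready list for each priority level. When selecting the next task, it scans the ready lists from highest priority to lowest, picking the first task in the highest non-empty list. The cost of this decision depends only on the number of priority levels, not on the number of tasks, so it is effectively constant time when the number of priorities is small.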
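Both selection strategies fit in a few lines. The sketch below is hypothetical: NUM_PRIORITIES stands in for configMAX_PRIORITIES, and a larger number means a higher priority. The first function scans per-priority ready counts. The second keeps one bit per priority, which on Cortex-M3 and later lets a single CLZ (count leading zeros) instruction find the highest ready priority; FreeRTOS offers that variant through configUSE_PORT_OPTIMISED_TASK_SELECTION.

```c
#include <stdint.h>

#define NUM_PRIORITIES 8   /* hypothetical stand-in for configMAX_PRIORITIES */

/* Scan from the highest priority down and return the first non-empty
 * level, or -1 if nothing is ready (a real kernel always has the idle
 * task ready). Cost grows with NUM_PRIORITIES, not with the task count. */
int highest_ready_priority_scan(const unsigned ready_count[NUM_PRIORITIES]) {
    for (int p = NUM_PRIORITIES - 1; p >= 0; p--) {
        if (ready_count[p] > 0) {
            return p;
        }
    }
    return -1;
}

/* Bitmap variant: bit p is set when priority p has a ready task. This
 * portable loop finds the highest set bit; a CLZ instruction does the
 * same job in one step on targets that have it. */
int highest_ready_priority_bitmap(uint32_t ready_bitmap) {
    if (ready_bitmap == 0) {
        return -1;
    }
    int p = 31;
    while ((ready_bitmap & (1UL << p)) == 0) {
        p--;
    }
    return p;
}
```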
Tick rate selection: The tick rate (configTICK_RATE_HZ) determines the timing resolution of the RTOS. A 1000 Hz tick gives 1 ms resolution but services ten times as many tick interrupts per second as a 100 Hz tick, which gives 10 ms resolution. Choose the lowest tick rate that meets your timing requirements. For battery-powered devices, consider tickless idle mode (configUSE_TICKLESS_IDLE in FreeRTOS), which stops the periodic tick interrupt during idle periods to save power.
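The resolution trade-off shows up directly in the delay conversion. The function below uses the same integer arithmetic as FreeRTOS's pdMS_TO_TICKS() macro (milliseconds times tick rate, divided by 1000, truncated). The function name is hypothetical, and it widens to 64 bits to avoid overflow for long delays.

```c
#include <stdint.h>

/* Hypothetical helper: convert a delay in milliseconds to ticks,
 * truncating toward zero like pdMS_TO_TICKS(). */
uint32_t ms_to_ticks(uint32_t ms, uint32_t tick_rate_hz) {
    return (uint32_t)(((uint64_t)ms * tick_rate_hz) / 1000u);
}
```

At 100 Hz, ms_to_ticks(15, 100) is 1 tick (10 ms) and ms_to_ticks(5, 100) is 0, so a short delay can silently round away. At 1000 Hz both conversions are exact.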
Context switch overhead: On a Cortex-M4 at 100 MHz, a typical FreeRTOS context switch takes approximately 2-5 microseconds. This overhead is negligible for most applications but can become significant if context switches occur thousands of times per second. Minimize unnecessary switches by not assigning more distinct priority levels than the design needs (every wake-up of a higher-priority task forces a preemption), and keep critical sections short so that a pending switch is not delayed.
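The overhead estimate is simple arithmetic: time per switch multiplied by switches per second. The helper is hypothetical, and the inputs should come from measurements on your own target. The figures below reuse the upper end of the range quoted above.

```c
/* Hypothetical helper: fraction of CPU time spent in context switches. */
double switch_overhead_fraction(double switch_time_us, double switches_per_second) {
    return switch_time_us * switches_per_second / 1e6;  /* us per s -> fraction */
}
```

At 5 microseconds per switch, 1,000 switches per second cost 0.5% of the CPU and 10,000 cost 5%.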
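Priority inversion: When a low-priority task holds a mutex that a high-priority task is waiting for, the high-priority task stays blocked until the mutex is released, and any medium-priority task that preempts the holder stretches that wait without bound. FreeRTOS mutexes (but not binary semaphores) implement priority inheritance to mitigate this: while the high-priority task waits, the holder temporarily runs at the waiter's priority. However, FreeRTOS’s priority inheritance is simplified and does not handle nested mutexes perfectly; for example, a task holding several mutexes may keep its inherited priority until it has released all of them. Design your mutex usage to avoid holding multiple mutexes simultaneously when possible.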
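Basic priority inheritance can be modeled in a few lines. This is a hypothetical sketch, not FreeRTOS's implementation. A task that fails to take a held mutex raises the holder to its own priority, and giving the mutex drops the holder back to its base priority. It assumes the holder owns only this one mutex; handling several would require recomputing the priority from all remaining waiters.

```c
#include <stddef.h>

/* Hypothetical task model: assigned and effective priority. */
typedef struct {
    unsigned base_priority;   /* priority assigned by the application */
    unsigned priority;        /* effective priority used by the scheduler */
} ModelTask;

/* Hypothetical mutex model. */
typedef struct {
    ModelTask *holder;        /* NULL when the mutex is free */
} ModelMutex;

/* Try to take the mutex. Returns 1 on success. On failure the caller
 * would block; first, the holder inherits the caller's higher priority. */
int model_mutex_take(ModelMutex *m, ModelTask *t) {
    if (m->holder == NULL) {
        m->holder = t;
        return 1;
    }
    if (t->priority > m->holder->priority) {
        m->holder->priority = t->priority;   /* inherit */
    }
    return 0;
}

/* Release the mutex and drop the holder back to its base priority
 * (correct only because the holder owns a single mutex in this model). */
void model_mutex_give(ModelMutex *m) {
    if (m->holder == NULL) {
        return;
    }
    m->holder->priority = m->holder->base_priority;
    m->holder = NULL;
}
```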
Context switching is the fundamental mechanism that enables multitasking in an RTOS. On ARM Cortex-M processors, the PendSV exception provides a hardware-assisted, low-latency context switch that saves and restores task state efficiently. The priority-based preemptive scheduler ensures that the most critical task always runs, while round-robin time slicing provides fairness among equal-priority tasks. Understanding these mechanisms — what triggers a switch, how the PendSV handler works, and how task states interact with the scheduler — is essential for building reliable, deterministic embedded systems.