
Memory alignment and structure padding are among the most misunderstood yet critical concepts in embedded C programming. Misaligned memory accesses can cause hard faults on ARM Cortex-M processors, waste precious SRAM, and introduce subtle bugs when sharing data structures between different compilers or architectures. This article provides a thorough explanation of why alignment matters, how compilers insert padding, and what embedded engineers can do to write correct and memory-efficient code.
Modern processors access memory most efficiently when data is aligned to addresses that are multiples of the data type’s natural size. A 32-bit integer, for example, is naturally aligned when its address is a multiple of 4. ARM Cortex-M cores differ in how they treat misalignment. Cortex-M0 does not support unaligned accesses: an unaligned 16-bit or 32-bit access raises a HardFault. Cortex-M3/M4 handle unaligned accesses in hardware for single loads and stores (e.g., LDR/STR), but not for multi-word instructions such as LDM/STM and LDRD/STRD, and not in every memory region. Even where they are supported, unaligned accesses can be configured to fault (the UNALIGN_TRP bit in the Configuration and Control Register) and can incur a performance penalty. RISC-V cores may also trap on unaligned accesses depending on the implementation.
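Whether a given pointer satisfies an alignment requirement can be checked at run time. The following is a minimal sketch, not taken from any vendor library: the names is_aligned and require_word_aligned are hypothetical, and the check assumes that converting a pointer to uintptr_t yields its numeric address, which holds on the flat-address targets discussed here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if p is a multiple of `alignment`.
   `alignment` must be a nonzero power of two. */
static inline bool is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p & (alignment - 1u)) == 0u;
}

/* Hypothetical debug guard for routines that use 32-bit accesses:
   fails the assertion instead of faulting later on the target. */
static inline void require_word_aligned(const void *p)
{
    assert(is_aligned(p, sizeof(uint32_t)));
}
```

A guard like this is cheap in debug builds and compiles away when NDEBUG is defined, so it can sit at the entry of any routine that casts a byte buffer to a wider type.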
The C standard (C11, §6.2.8) defines alignment requirements for types; the exact layout of structures is implementation-defined and governed by the platform ABI. Objects of a particular type must be located at addresses that are multiples of a boundary value that is a power of two. The compiler is responsible for ensuring these constraints are met, and it does so by inserting padding bytes between structure members and at the end of structures.
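C11 exposes these alignment requirements as compile-time constants through alignof (from <stdalign.h>). The sketch below uses a hypothetical struct sample to show two properties that follow from the rules above; the exact numeric alignment of uint32_t is left to the platform ABI and is not assumed here.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure used only for illustration. */
struct sample {
    uint8_t  tag;
    uint32_t value;
};

/* Every alignment is a power of two. */
static_assert((alignof(uint32_t) & (alignof(uint32_t) - 1)) == 0,
              "alignment must be a power of two");

/* A structure is at least as strictly aligned as its strictest member. */
static_assert(alignof(struct sample) >= alignof(uint32_t),
              "struct alignment must cover its members");

/* sizeof is a multiple of the alignment, so array elements stay aligned. */
static_assert(sizeof(struct sample) % alignof(struct sample) == 0,
              "size must be a multiple of alignment");
```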
Consider the following structure:
    struct example {
        uint8_t  a;   // 1 byte
        uint32_t b;   // 4 bytes
        uint8_t  c;   // 1 byte
    };
An inexperienced programmer might assume this structure occupies 6 bytes (1 + 4 + 1). In practice, on a 32-bit system, the compiler typically allocates 12 bytes. Here is why:
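- a occupies byte offset 0 (1 byte).
- Offsets 1–3 are padding (3 bytes), inserted so that b starts at offset 4, which is 4-byte aligned.
- b occupies bytes 4–7 (4 bytes).
- c occupies byte 8 (1 byte).
- Offsets 9–11 are trailing padding (3 bytes), which rounds the total size up to 12, a multiple of the structure's 4-byte alignment.

This trailing padding ensures that in an array of struct example, each element’s b member is properly aligned.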
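The layout described above can be confirmed by the compiler itself. This is a small sketch that assumes uint32_t has 4-byte alignment, as it does under the common ARM and x86 ABIs; on a platform with a different ABI the assertions would fail at compile time rather than silently mislead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct example {
    uint8_t  a;   /* offset 0, followed by 3 padding bytes */
    uint32_t b;   /* offset 4 */
    uint8_t  c;   /* offset 8, followed by 3 trailing padding bytes */
};

static_assert(offsetof(struct example, a) == 0,  "a should be at offset 0");
static_assert(offsetof(struct example, b) == 4,  "b should be at offset 4");
static_assert(offsetof(struct example, c) == 8,  "c should be at offset 8");
static_assert(sizeof(struct example)      == 12, "total size should be 12");
```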
Compilers provide non-standard extensions to control packing and alignment. In GCC and Clang, __attribute__((packed)) tells the compiler to use the minimum possible padding:
    struct __attribute__((packed)) packed_example {
        uint8_t  a;
        uint32_t b;
        uint8_t  c;
    };
This packed structure occupies 6 bytes on GCC and Clang, with b at an unaligned offset of 1. When b is accessed directly through the packed type, the compiler knows it may be unaligned and generates suitable code. A plain uint32_t * that points at b carries no such information, however, and dereferencing it on a Cortex-M0 may generate a HardFault. A portable and safe way to access a potentially unaligned 32-bit value is to use memcpy:
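    #include <stdint.h>
    #include <string.h>

    // Copy the (possibly unaligned) member into an aligned local.
    uint32_t read_b(const struct packed_example *s) {
        uint32_t val;
        memcpy(&val, &s->b, sizeof(val));
        return val;
    }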
The memcpy approach is often optimized by modern compilers into efficient load/store sequences when alignment permits, or into safe byte-wise accesses on architectures that require it.
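The same idea extends to writes and to raw byte buffers such as received protocol frames. The helpers below are a hedged sketch: write_b and load_u32 are hypothetical names, not part of any library, and the values are copied in the host's native byte order, so no endianness conversion is implied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct __attribute__((packed)) packed_example {
    uint8_t  a;
    uint32_t b;
    uint8_t  c;
};

/* Packing removes all padding on GCC and Clang. */
static_assert(sizeof(struct packed_example) == 6, "expected 6 bytes");
static_assert(offsetof(struct packed_example, b) == 1, "b should be at offset 1");

/* Hypothetical counterpart to read_b: store into the unaligned member.
   Going through a byte pointer avoids forming a uint32_t * to it. */
static inline void write_b(struct packed_example *s, uint32_t val)
{
    memcpy((uint8_t *)s + offsetof(struct packed_example, b), &val, sizeof(val));
}

/* Hypothetical helper: load a native-endian 32-bit value from any
   byte address, aligned or not. */
static inline uint32_t load_u32(const uint8_t *p)
{
    uint32_t val;
    memcpy(&val, p, sizeof(val));
    return val;
}
```

Keeping all unaligned traffic inside a few helpers like these makes it easy to audit where the firmware depends on byte-wise access.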
The aligned attribute can be used to enforce a minimum alignment for a member or the entire structure:
    struct __attribute__((aligned(8))) cache_aligned_struct {
        uint32_t a;
        uint32_t b;
    };
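This guarantees that instances of cache_aligned_struct are placed at addresses that are multiples of 8 bytes, which is useful for 64-bit or SIMD alignment requirements. DMA buffers on processors with a data cache usually need the same attribute with a larger value, matching the cache line size of the core (32 bytes on Cortex-M7, for example).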
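C11 also offers a standard spelling for per-object alignment, alignas from <stdalign.h>. The sketch below checks the effect of the attribute and declares a buffer aligned to a cache line; CACHE_LINE_SIZE and dma_rx_buffer are hypothetical names, and the value 32 is an assumption that must be replaced with the line size of the actual core.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

struct __attribute__((aligned(8))) cache_aligned_struct {
    uint32_t a;
    uint32_t b;
};

/* The attribute raises the alignment of the whole type to 8. */
static_assert(alignof(struct cache_aligned_struct) == 8, "expected 8-byte alignment");
static_assert(sizeof(struct cache_aligned_struct)  == 8, "expected 8 bytes");

/* Assumed cache line size; check the reference manual of the real part. */
#define CACHE_LINE_SIZE 32

/* Hypothetical DMA receive buffer: starts on a cache line boundary and
   spans a whole number of lines, so cache maintenance on it does not
   touch neighbouring variables. */
alignas(CACHE_LINE_SIZE) uint8_t dma_rx_buffer[2 * CACHE_LINE_SIZE];
```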
A simple but effective technique is to order structure members by decreasing alignment requirement. Members with the same alignment can be grouped together:
    // Poor order: 20 bytes on a 32-bit system
    struct poor_order {
        uint8_t  a;   // 1 byte + 3 padding
        uint32_t b;   // 4 bytes
        uint8_t  c;   // 1 byte + 3 padding
        uint32_t d;   // 4 bytes
        uint8_t  e;   // 1 byte + 3 padding
    };

    // Optimal order: 12 bytes on a 32-bit system
    struct optimal_order {
        uint32_t b;   // 4 bytes
        uint32_t d;   // 4 bytes
        uint8_t  a;   // 1 byte
        uint8_t  c;   // 1 byte
        uint8_t  e;   // 1 byte + 1 padding
    };
In this example, simply reordering the members saves 8 bytes — a 40% reduction in memory footprint. For embedded systems with limited SRAM (often 16 KB to 512 KB), these savings can be significant when the structure is used in arrays or as frequently allocated objects.
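The saving scales linearly with the number of instances. As a rough sketch, assuming the 4-byte uint32_t alignment used throughout this article, the sizes can be pinned down with static assertions and the total saving for an array computed directly; bytes_saved is a hypothetical helper written for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct poor_order {
    uint8_t  a;
    uint32_t b;
    uint8_t  c;
    uint32_t d;
    uint8_t  e;
};

struct optimal_order {
    uint32_t b;
    uint32_t d;
    uint8_t  a;
    uint8_t  c;
    uint8_t  e;
};

static_assert(sizeof(struct poor_order)    == 20, "expected 20 bytes");
static_assert(sizeof(struct optimal_order) == 12, "expected 12 bytes");

/* Hypothetical helper: SRAM saved by reordering, for `count` elements. */
static inline size_t bytes_saved(size_t count)
{
    return count * (sizeof(struct poor_order) - sizeof(struct optimal_order));
}
```

For a table of 256 such records the difference is 2048 bytes, which is a noticeable share of a 16 KB SRAM.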
Embedded systems frequently map peripheral registers to fixed memory addresses. The ARM Cortex-M SysTick timer consists of four 32-bit registers (16 bytes total) at a specific address in the system control space. When defining register structures for memory-mapped I/O, precise control over padding is essential:
    typedef struct {
        volatile uint32_t CTRL;
        volatile uint32_t LOAD;
        volatile uint32_t VAL;
        volatile const uint32_t CALIB;
    } SysTick_Type;

    #define SysTick ((SysTick_Type *)0xE000E010UL)
Here, the members are contiguous because all are uint32_t (4-byte aligned naturally). No padding is inserted. If a peripheral mixes register widths or has reserved gaps between registers, the layout no longer falls out automatically: the engineer must either suppress compiler padding with __attribute__((packed)) or, more commonly, add explicit placeholder members so that every register lands at its documented offset:
    typedef struct {
        volatile uint32_t DATA;
        volatile uint32_t CTRL;
        uint32_t RESERVED0[2];
        volatile uint32_t BRR;
    } UART_Type;
The CMSIS (Cortex Microcontroller Software Interface Standard) headers use exactly this pattern for all ARM peripheral register structures.
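Because a wrong offset in a register structure silently addresses the wrong register, it is worth pinning each offset down at compile time. The sketch below does this for the illustrative UART_Type above; the offsets are simply those implied by the declaration, not taken from a real device's reference manual, where the documented register map would be the source of truth.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    volatile uint32_t DATA;          /* offset 0x00 */
    volatile uint32_t CTRL;          /* offset 0x04 */
    uint32_t          RESERVED0[2];  /* offsets 0x08-0x0F, reserved gap */
    volatile uint32_t BRR;           /* offset 0x10 */
} UART_Type;

/* Each assertion ties a member to its expected register offset. */
static_assert(offsetof(UART_Type, DATA) == 0x00, "DATA offset mismatch");
static_assert(offsetof(UART_Type, CTRL) == 0x04, "CTRL offset mismatch");
static_assert(offsetof(UART_Type, BRR)  == 0x10, "BRR offset mismatch");
static_assert(sizeof(UART_Type)         == 0x14, "UART_Type size mismatch");
```

If someone later changes the size of RESERVED0 or inserts a member, the build fails immediately instead of producing firmware that writes to the wrong address.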
- Use sizeof() and offsetof(): Always verify the actual size and member offsets of structures using sizeof() and offsetof() at compile time or in unit tests, rather than relying on assumptions.
- Use _Static_assert (C11) or static_assert (C++11, or C11 via <assert.h>) to catch unexpected structure sizes at compile time:

      static_assert(sizeof(struct optimal_order) == 12, "Unexpected size");

- Use __attribute__((packed)) sparingly and consistently: mixed packed and unpacked definitions of the same structure lead to undefined behavior due to type mismatch.

Memory alignment and structure padding are not merely theoretical concerns: they directly affect the correctness, performance, and memory footprint of embedded firmware. By understanding how compilers lay out structures, using compiler attributes wisely, and ordering members strategically, embedded engineers can write code that is both correct and efficient. Always verify assumptions with sizeof() and offsetof(), and use static assertions to enforce expected layouts at compile time.
__attribute__ Extensions: https://gcc.gnu.org/onlinedocs/gcc/Common-Type-Attributes.html