
Memory Alignment and Padding in Embedded C Demystified

By embeddedSoft
Published in Embedded C/C++
June 02, 2026
4 min read

Table Of Contents

  1. Why Alignment Exists
  2. How Structure Padding Works
  3. Controlling Padding with Compiler Attributes
  4. Reordering Members to Minimize Padding
  5. Alignment in the Context of Hardware Registers
  6. Practical Guidelines for Embedded Engineers
  7. Summary
  8. References

Memory alignment and structure padding are among the most misunderstood yet critical concepts in embedded C programming. Misaligned memory accesses can cause hard faults on ARM Cortex-M processors, waste precious SRAM, and introduce subtle bugs when sharing data structures between different compilers or architectures. This article provides a thorough explanation of why alignment matters, how compilers insert padding, and what embedded engineers can do to write correct and memory-efficient code.

Why Alignment Exists

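Modern processors access memory most efficiently when data is aligned to an address that is a multiple of the data type’s natural size. A 32-bit integer, for example, is naturally aligned when its address is a multiple of 4. How strictly this is enforced depends on the core. Cortex-M0/M0+ (ARMv6-M) does not support unaligned accesses at all: any unaligned halfword or word load or store raises a HardFault. Cortex-M3/M4 (ARMv7-M) handle unaligned single loads and stores (LDR, LDRH, STR, STRH) in hardware, usually at the cost of extra bus cycles, but multi-word and exclusive accesses such as LDM/STM, LDRD/STRD, and LDREX/STREX still fault when unaligned (a UsageFault, escalated to HardFault if that handler is disabled). Setting the UNALIGN_TRP bit in the Configuration and Control Register (CCR) makes every unaligned access trap, and some memory regions may not support unaligned accesses depending on the device’s bus fabric. RISC-V cores may also trap on misaligned accesses: the specification lets each implementation handle them in hardware, emulate them in a trap handler, or raise an exception.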

The C standard (C11, §6.2.8) gives every complete object type an alignment requirement: an implementation-defined power of two, such that objects of that type must be placed at addresses that are multiples of it. The exact layout of structures, however, is implementation-defined and governed by the platform ABI (on Cortex-M, the Procedure Call Standard for the Arm Architecture, AAPCS). The compiler satisfies each member’s alignment by inserting unnamed padding bytes between members and, where needed, at the end of the structure; the standard does not allow padding before the first member (C11, §6.7.2.1).
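C11 exposes a type’s alignment requirement directly through the _Alignof operator (spelled alignof when <stdalign.h> is included). The sketch below prints the requirements on whatever target it is compiled for and adds a small helper that checks whether an address meets a given alignment. The names print_alignments and is_aligned are illustrative, not library functions.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Character types always have the weakest alignment requirement, 1. */
static_assert(alignof(unsigned char) == 1, "char alignment must be 1");

/* True if p is a multiple of align. align must be a power of two,
   which C11 guarantees for every valid alignment value. */
static bool is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1u)) == 0u;
}

/* Print the alignment requirement of common types on this target.
   On an AAPCS (Cortex-M) target, uint64_t and double are typically 8. */
static void print_alignments(void)
{
    printf("uint8_t : %zu\n", alignof(uint8_t));
    printf("uint16_t: %zu\n", alignof(uint16_t));
    printf("uint32_t: %zu\n", alignof(uint32_t));
    printf("uint64_t: %zu\n", alignof(uint64_t));
    printf("double  : %zu\n", alignof(double));
}
```

Running the same function on the host and on the target is a quick way to spot ABI differences before sharing structures between them.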

How Structure Padding Works

Consider the following structure:

#include <stdint.h>

struct example {
    uint8_t  a; // 1 byte
    uint32_t b; // 4 bytes
    uint8_t  c; // 1 byte
};

An inexperienced programmer might assume this structure occupies 6 bytes (1 + 4 + 1). On an ABI where uint32_t requires 4-byte alignment, which includes the AAPCS used on 32-bit Cortex-M parts, the compiler allocates 12 bytes. Here is why:

  • a occupies byte offset 0 (1 byte).
  • The compiler inserts 3 bytes of padding at offsets 1–3 so that b starts at offset 4, which is 4-byte aligned.
  • b occupies bytes 4–7 (4 bytes).
  • c occupies byte 8 (1 byte).
  • The compiler adds 3 bytes of trailing padding at offsets 9–11 so that the total structure size is a multiple of 4, the structure’s alignment (the strictest alignment of any member, here that of b).

This trailing padding ensures that in an array of struct example, each element’s b member is properly aligned.
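The padding walk-through above can be checked mechanically. This sketch recreates struct example, pins each offset with offsetof and static_assert, and adds a hypothetical dump_example_layout helper that prints what the compiler actually chose. The asserted values assume a 4-byte-aligned uint32_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct example {
    uint8_t  a; /* offset 0 */
    uint32_t b; /* offset 4, after 3 padding bytes */
    uint8_t  c; /* offset 8, followed by 3 trailing padding bytes */
};

/* Compile-time checks of the layout described above. They assume an ABI
   where uint32_t is 4-byte aligned, such as the AAPCS used on Cortex-M. */
static_assert(offsetof(struct example, b) == 4, "expected 3 padding bytes before b");
static_assert(offsetof(struct example, c) == 8, "expected c right after b");
static_assert(sizeof(struct example) == 12, "expected 3 bytes of trailing padding");

/* Print the layout the compiler actually chose. */
static void dump_example_layout(void)
{
    printf("a at %zu, b at %zu, c at %zu, sizeof = %zu\n",
           offsetof(struct example, a), offsetof(struct example, b),
           offsetof(struct example, c), sizeof(struct example));
}
```

Because sizeof is 12, the second element of an array starts at offset 12 and its b member lands at offset 16, again a multiple of 4.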

Controlling Padding with Compiler Attributes

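Compilers provide non-standard extensions to control packing and alignment. In GCC and Clang, __attribute__((packed)) removes the padding between members and lowers the structure’s alignment to 1 byte, so each member follows the previous one directly. MSVC offers #pragma pack for the same purpose, and GCC and Clang accept that pragma as well: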

struct __attribute__((packed)) packed_example {
    uint8_t  a;
    uint32_t b;
    uint8_t  c;
};

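With GCC and Clang this packed structure occupies 6 bytes, with b at offset 1, but packing is an extension and the exact layout is implementation-defined. Accessing b directly as s->b is handled correctly by these compilers, because they know the member may be misaligned and emit byte-wise loads on cores such as the Cortex-M0. The HardFault risk appears when the address of b escapes as an ordinary uint32_t *: code that dereferences such a pointer assumes 4-byte alignment, and on a Cortex-M0 the resulting word load faults. GCC and Clang flag this pattern with -Waddress-of-packed-member. A portable and safe way to access a potentially unaligned 32-bit value is to use memcpy: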

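#include <string.h>

// Copy the possibly misaligned member into an aligned local; the compiler
// picks a load sequence that is safe for the target.
uint32_t read_b(const struct packed_example *s) {
    uint32_t val;
    memcpy(&val, &s->b, sizeof(val));
    return val;
}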

The memcpy approach is often optimized by modern compilers into efficient load/store sequences when alignment permits, or into safe byte-wise accesses on architectures that require it.
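A write counterpart follows the same idea. The sketch below pairs a reader and a writer (packed_read_b and packed_write_b are hypothetical names) and computes the member’s address from offsetof on a byte pointer, so no misaligned uint32_t * is ever formed, even temporarily. It assumes GCC or Clang, since packed is their extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* GCC/Clang extension: no padding, structure alignment becomes 1. */
struct __attribute__((packed)) packed_example {
    uint8_t  a;
    uint32_t b;
    uint8_t  c;
};

static_assert(sizeof(struct packed_example) == 6, "packed layout expected");
static_assert(offsetof(struct packed_example, b) == 1, "b sits at an odd offset");

/* Read b byte-wise via memcpy; no 4-byte-aligned load is assumed. */
static uint32_t packed_read_b(const struct packed_example *s)
{
    uint32_t val;
    memcpy(&val, (const unsigned char *)s + offsetof(struct packed_example, b),
           sizeof val);
    return val;
}

/* Write b the same way, leaving the neighbouring members untouched. */
static void packed_write_b(struct packed_example *s, uint32_t val)
{
    memcpy((unsigned char *)s + offsetof(struct packed_example, b), &val,
           sizeof val);
}
```

On cores with hardware unaligned support the compiler can turn these 4-byte memcpy calls into a single load or store; on a Cortex-M0 it falls back to byte accesses.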

The aligned attribute can be used to enforce a minimum alignment for a member or the entire structure:

struct __attribute__((aligned(8))) cache_aligned_struct {
    uint32_t a;
    uint32_t b;
};

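For statically and automatically allocated objects, this guarantees that instances of cache_aligned_struct start at addresses that are multiples of 8 bytes; the attribute also rounds sizeof up to a multiple of 8. Heap memory from malloc is only guaranteed the platform’s fundamental alignment, so stricter requirements need aligned_alloc (C11) or an equivalent allocator. An 8-byte boundary suits DMA controllers or SIMD loads that require it; buffers that undergo data-cache maintenance need the full cache-line alignment described in the guidelines below.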
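Since C11, the standard alignas specifier (from <stdalign.h>) can express the same requirement without a compiler extension, and aligned_alloc covers heap memory. In this sketch aligned8_struct and alloc_aligned_block are illustrative names; aligned_alloc support varies between embedded C libraries, so check your toolchain’s runtime before relying on it.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Standard C11 alternative to __attribute__((aligned(8))): an alignment
   specifier on a member raises the whole structure's alignment. */
struct aligned8_struct {
    alignas(8) uint32_t a;
    uint32_t b;
};

static_assert(alignof(struct aligned8_struct) == 8, "structure should be 8-byte aligned");
static_assert(sizeof(struct aligned8_struct) == 8, "no extra padding expected");

/* Heap allocation with a stricter-than-malloc alignment. C11 requires the
   size passed to aligned_alloc to be a multiple of the alignment, so round
   up first. alignment must be a power of two supported by the C library. */
static void *alloc_aligned_block(size_t alignment, size_t size)
{
    size_t rounded = (size + alignment - 1u) / alignment * alignment;
    return aligned_alloc(alignment, rounded);
}
```

Memory from alloc_aligned_block is released with the ordinary free.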

Reordering Members to Minimize Padding

A simple but effective technique is to order structure members by decreasing alignment requirement. Members with the same alignment can be grouped together:

// Poor order: 20 bytes on a 32-bit system
struct poor_order {
    uint8_t  a; // 1 byte + 3 padding
    uint32_t b; // 4 bytes
    uint8_t  c; // 1 byte + 3 padding
    uint32_t d; // 4 bytes
    uint8_t  e; // 1 byte + 3 padding
};

// Optimal order: 12 bytes on a 32-bit system
struct optimal_order {
    uint32_t b; // 4 bytes
    uint32_t d; // 4 bytes
    uint8_t  a; // 1 byte
    uint8_t  c; // 1 byte
    uint8_t  e; // 1 byte + 1 padding
};

In this example, simply reordering the members saves 8 bytes per instance, a 40% reduction in memory footprint. On microcontrollers with limited SRAM (often 16 KB to 512 KB), these savings add up quickly when the structure is used in arrays or allocated frequently.
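To see how the per-instance saving scales, the sketch below assumes a hypothetical table of N_RECORDS = 100 entries and computes the SRAM each layout would consume; table_bytes and report_savings are illustrative helpers, and the static assertions restate the sizes claimed above for a 4-byte-aligned uint32_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct poor_order {
    uint8_t  a;
    uint32_t b;
    uint8_t  c;
    uint32_t d;
    uint8_t  e;
};

struct optimal_order {
    uint32_t b;
    uint32_t d;
    uint8_t  a;
    uint8_t  c;
    uint8_t  e;
};

/* Hypothetical number of records kept in SRAM. */
#define N_RECORDS 100u

/* Sizes assume 4-byte-aligned uint32_t, as in the text above. */
static_assert(sizeof(struct poor_order) == 20, "poor_order layout changed");
static_assert(sizeof(struct optimal_order) == 12, "optimal_order layout changed");

/* SRAM consumed by a table of N_RECORDS entries of the given element size. */
static size_t table_bytes(size_t element_size)
{
    return N_RECORDS * element_size;
}

static void report_savings(void)
{
    size_t poor = table_bytes(sizeof(struct poor_order));
    size_t optimal = table_bytes(sizeof(struct optimal_order));
    printf("poor: %zu bytes, optimal: %zu bytes, saved: %zu bytes\n",
           poor, optimal, poor - optimal);
}
```

For 100 records this works out to 2000 versus 1200 bytes, an 800-byte saving from reordering alone.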

Alignment in the Context of Hardware Registers

Embedded systems frequently map peripheral registers to fixed memory addresses. The ARM Cortex-M SysTick timer, for example, consists of four 32-bit registers (16 bytes total) starting at 0xE000E010 in the System Control Space. When a structure is overlaid on memory-mapped registers, every member’s offset must match the documented register map exactly, so padding has to be under the engineer’s control:

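typedef struct {
    volatile uint32_t CTRL;        // 0xE000E010: control and status
    volatile uint32_t LOAD;        // 0xE000E014: reload value
    volatile uint32_t VAL;         // 0xE000E018: current value
    volatile const uint32_t CALIB; // 0xE000E01C: calibration (read-only)
} SysTick_Type;
#define SysTick ((SysTick_Type *)0xE000E010UL)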

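Here, the members are contiguous: each is a uint32_t with 4-byte natural alignment, so the compiler inserts no padding and the offsets 0x00, 0x04, 0x08, and 0x0C match the hardware. When a peripheral has reserved gaps between registers, or mixes 16-bit and 32-bit registers, the gaps must be modeled explicitly with placeholder members so that each register lands at its documented offset. Avoid __attribute__((packed)) for register structures: it lowers the structure’s alignment to 1, so the compiler may split a 32-bit register access into byte accesses, which many peripherals do not tolerate. In the UART layout below, an 8-byte reserved block places BRR at offset 0x10: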

typedef struct {
    volatile uint32_t DATA;   // offset 0x00
    volatile uint32_t CTRL;   // offset 0x04
    uint32_t RESERVED0[2];    // offsets 0x08-0x0F, reserved
    volatile uint32_t BRR;    // offset 0x10
} UART_Type;

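The CMSIS (Cortex Microcontroller Software Interface Standard) core headers use exactly this pattern, with RESERVEDn arrays filling the gaps, in core peripheral structures such as NVIC_Type, and vendor device headers follow the same convention.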
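Because one misplaced member shifts every register after it, register structures are a natural place for compile-time layout checks. This sketch repeats the two structures from this section and pins each offset with offsetof; it touches no hardware, so it also compiles on a host PC. The UART offsets are those of the example layout above, not of any particular device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    volatile uint32_t CTRL;
    volatile uint32_t LOAD;
    volatile uint32_t VAL;
    volatile const uint32_t CALIB;
} SysTick_Type;

typedef struct {
    volatile uint32_t DATA;
    volatile uint32_t CTRL;
    uint32_t RESERVED0[2];
    volatile uint32_t BRR;
} UART_Type;

/* Offsets from the SysTick register map (base address 0xE000E010). */
static_assert(offsetof(SysTick_Type, LOAD)  == 0x04, "SysTick LOAD offset");
static_assert(offsetof(SysTick_Type, VAL)   == 0x08, "SysTick VAL offset");
static_assert(offsetof(SysTick_Type, CALIB) == 0x0C, "SysTick CALIB offset");
static_assert(sizeof(SysTick_Type) == 16, "SysTick block is 16 bytes");

/* Offsets of the example UART; real values come from the reference manual. */
static_assert(offsetof(UART_Type, CTRL) == 0x04, "UART CTRL offset");
static_assert(offsetof(UART_Type, BRR)  == 0x10, "UART BRR offset");
```

If a reserved array is sized wrongly, the build fails at the assertion instead of the firmware silently writing to the wrong register.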

Practical Guidelines for Embedded Engineers

  1. Use sizeof() and offsetof(): Always verify the actual size and member offsets of structures using sizeof() and offsetof() at compile time or in unit tests, rather than relying on assumptions.
  2. Static assertions: Use _Static_assert (C11; also available as the static_assert macro from <assert.h>) or static_assert (C++11) to catch unexpected structure sizes at compile time:
    static_assert(sizeof(struct optimal_order) == 12, "Unexpected size");
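  3. Pack only when necessary: Use __attribute__((packed)) sparingly and consistently. If one translation unit sees a packed definition of a structure and another sees an unpacked one, they disagree on its size and member offsets, and data shared between them is silently misread (formally, undefined behavior). Keep each definition in a single shared header.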
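  4. DMA buffers: On systems with a data cache (e.g., Cortex-M7), align DMA buffers to the cache line size (32 bytes on the Cortex-M7) and round their size up to a whole number of cache lines. Cache clean and invalidate operations act on entire lines, so a buffer that shares a line with other variables can corrupt them or lose freshly received DMA data.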
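  5. Endianness awareness: When binary structures are shared between processors of different endianness, use packed structures together with explicit byte-order conversion of every multi-byte field, or serialize each field into a byte buffer in a defined byte order, so that neither padding nor host byte order leaks into the data format.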
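The DMA guideline can be met where the buffer is defined. Cache maintenance operates on whole lines, so besides aligning the buffer it helps to round its size up to a whole number of lines. This sketch assumes a 32-byte line (as on the Cortex-M7) and a hypothetical 100-byte receive length; CACHE_LINE_SIZE, DMA_RX_LEN, ROUND_UP_POW2, and dma_rx_buf are illustrative names, and the aligned attribute assumes GCC or Clang.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 32-byte data-cache lines, as on the Cortex-M7. */
#define CACHE_LINE_SIZE 32u

/* Hypothetical receive length that is not a multiple of the line size. */
#define DMA_RX_LEN 100u

/* Round n up to a multiple of a, where a is a power of two. */
#define ROUND_UP_POW2(n, a) (((n) + (a) - 1u) & ~((a) - 1u))

/* Starts on a cache-line boundary and spans whole lines only, so cleaning
   or invalidating it never affects neighboring variables. */
static uint8_t dma_rx_buf[ROUND_UP_POW2(DMA_RX_LEN, CACHE_LINE_SIZE)]
    __attribute__((aligned(CACHE_LINE_SIZE)));

static_assert(sizeof(dma_rx_buf) % CACHE_LINE_SIZE == 0u,
              "DMA buffer must occupy whole cache lines");
```

Here the 100-byte request grows to 128 bytes. On a Cortex-M7 using CMSIS, such a buffer would typically be cleaned with SCB_CleanDCache_by_Addr before a memory-to-peripheral transfer and invalidated with SCB_InvalidateDCache_by_Addr after a peripheral-to-memory transfer.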

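An alternative to packed structures for cross-endian data is to serialize each field with shifts, which fixes the byte order in the data format itself, independent of the host. The sketch assumes a hypothetical sensor_msg record sent as a 1-byte id followed by a 4-byte little-endian value; put_le32, get_le32, and the encode/decode functions are illustrative names, not a library API.

```c
#include <assert.h>
#include <stdint.h>

/* Encode v as 4 little-endian bytes, whatever the host byte order. */
static void put_le32(uint8_t *dst, uint32_t v)
{
    dst[0] = (uint8_t)(v);
    dst[1] = (uint8_t)(v >> 8);
    dst[2] = (uint8_t)(v >> 16);
    dst[3] = (uint8_t)(v >> 24);
}

/* Decode 4 little-endian bytes; works at any alignment of src. */
static uint32_t get_le32(const uint8_t *src)
{
    return (uint32_t)src[0]
         | ((uint32_t)src[1] << 8)
         | ((uint32_t)src[2] << 16)
         | ((uint32_t)src[3] << 24);
}

/* Hypothetical record; its in-memory layout may contain padding. */
struct sensor_msg {
    uint8_t  id;
    uint32_t value;
};

/* Wire format: id (1 byte) followed by value (4 bytes, little-endian). */
#define SENSOR_MSG_WIRE_SIZE 5u
static_assert(SENSOR_MSG_WIRE_SIZE == sizeof(uint8_t) + sizeof(uint32_t),
              "wire format has no padding");

static void sensor_msg_encode(const struct sensor_msg *m,
                              uint8_t out[SENSOR_MSG_WIRE_SIZE])
{
    out[0] = m->id;
    put_le32(&out[1], m->value);
}

static void sensor_msg_decode(struct sensor_msg *m,
                              const uint8_t in[SENSOR_MSG_WIRE_SIZE])
{
    m->id = in[0];
    m->value = get_le32(&in[1]);
}
```

Because the structure is never copied as raw memory, its padding and the host’s endianness never reach the wire.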
Summary

Memory alignment and structure padding are not merely theoretical concerns — they directly affect the correctness, performance, and memory footprint of embedded firmware. By understanding how compilers lay out structures, using compiler attributes wisely, and ordering members strategically, embedded engineers can write code that is both correct and efficient. Always verify assumptions with sizeof() and offsetof(), and use static assertions to enforce expected layouts at compile time.

References

  • ARM Architecture Reference Manual — Memory Alignment and Endianness: https://developer.arm.com/documentation/
  • C11 Standard §6.2.8 — Alignment of objects: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf
  • ARM CMSIS — Cortex-M Device Header Files with Struct Definitions: https://arm-software.github.io/CMSIS_6/latest/Core/index.html
  • GCC Manual — Structure Packing Pragmas and __attribute__ Extensions: https://gcc.gnu.org/onlinedocs/gcc/Common-Type-Attributes.html

embeddedSoft

Embedded Systems Articles by Jithin Tom & Hermes (AI Agent)

© 2026, All Rights Reserved.