
When embedded systems communicate — over UART, SPI, CAN, or Ethernet — they exchange binary data in tightly defined formats. A sensor might pack a 16-bit temperature reading and a 32-bit timestamp into a 6-byte frame. A bootloader expects a specific command structure with headers, payload lengths, and CRC fields. In all these cases, the C struct is the natural way to model the data layout.
But there is a problem: the compiler inserts padding bytes between struct members for alignment. A struct that you expect to be 6 bytes might be 8 or 12. When that struct gets sent over a wire or written to a memory-mapped register, the padding corrupts the protocol. This article covers the techniques to control struct layout, serialize data safely, and avoid the endianness and alignment pitfalls that cause subtle, hard-to-debug failures in embedded systems.
On ARM Cortex-M and most 32-bit architectures, the CPU accesses multi-byte values most efficiently when they are naturally aligned — a uint32_t at an address divisible by 4, a uint16_t at an even address. The compiler inserts padding bytes to guarantee this alignment for every struct member.
Consider this struct:

    typedef struct {
        uint8_t  id;        // 1 byte + 1 padding
        uint16_t payload;   // 2 bytes
        uint8_t  flags;     // 1 byte + 3 padding
        uint32_t timestamp; // 4 bytes
    } sensor_packet_t;

On a 32-bit system, sizeof(sensor_packet_t) is likely 12 bytes, not 8. The compiler inserted 1 byte of padding after id to align payload to a 2-byte boundary, and 3 bytes after flags to align timestamp to a 4-byte boundary.
You can detect padding at compile time using static_assert and offsetof:

    #include <stddef.h>
    #include <stdint.h>
    #include <assert.h>

    // Expected: 1 + 1(pad) + 2 + 1 + 3(pad) + 4 = 12
    static_assert(sizeof(sensor_packet_t) == 12,
                  "Unexpected padding in sensor_packet_t");
    static_assert(offsetof(sensor_packet_t, timestamp) == 8,
                  "timestamp field offset is wrong");

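To see where the padding sits rather than only asserting the total, the gap after each member can be computed from offsetof and sizeof. The sketch below is a minimal illustration: the struct is a local copy named padded_packet_t, and the GAP_AFTER macro and helper function are hypothetical names, not part of the code above. It assumes a typical ABI in which uint16_t is 2-byte aligned and uint32_t is 4-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Local copy of the padded layout, renamed so it does not clash with
   sensor_packet_t (hypothetical name, for illustration only). */
typedef struct {
    uint8_t  id;
    uint16_t payload;
    uint8_t  flags;
    uint32_t timestamp;
} padded_packet_t;

/* Padding bytes between the end of member `a` and the start of member `b`.
   sizeof does not evaluate its operand, so the null pointer is never used. */
#define GAP_AFTER(type, a, b) \
    (offsetof(type, b) - (offsetof(type, a) + sizeof(((type *)0)->a)))

/* Total padding = struct size minus the sum of the member sizes. */
static size_t padded_packet_total_padding(void)
{
    return sizeof(padded_packet_t)
         - (sizeof(uint8_t) + sizeof(uint16_t) + sizeof(uint8_t) + sizeof(uint32_t));
}
```

Under the stated ABI assumption, GAP_AFTER(padded_packet_t, id, payload) is the single byte after id, and GAP_AFTER(padded_packet_t, flags, timestamp) is the three bytes after flags.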
__attribute__((packed)) to Eliminate Padding
GCC and Clang provide __attribute__((packed)) to tell the compiler to lay out struct members with zero padding:

    typedef struct __attribute__((packed)) {
        uint8_t  id;        // 1 byte
        uint16_t payload;   // 2 bytes
        uint8_t  flags;     // 1 byte
        uint32_t timestamp; // 4 bytes
    } sensor_packet_t_packed; // sizeof == 8

Now sizeof(sensor_packet_t_packed) is exactly 8 bytes — the sum of its members. This is essential when the struct maps directly to a wire protocol or hardware register layout.
However, packed structs come with two important caveats:
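Unaligned access: A packed struct has an alignment of 1, so it can start at any address. The uint16_t payload at offset 1 is misaligned whenever the struct starts on an even address, and the uint32_t timestamp at offset 4 is misaligned whenever the struct itself is not 4-byte aligned. On ARM Cortex-M0/M0+ (ARMv6-M), an unaligned access causes a HardFault. On Cortex-M3, M4, and M7 (ARMv7-M), the hardware handles unaligned 16-bit and 32-bit single loads and stores, but with a performance penalty; load/store-multiple and doubleword instructions (LDM, STM, LDRD, STRD) still fault on unaligned addresses.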
Pointer danger: Taking the address of a packed struct member creates a pointer that may be unaligned. Dereferencing it through a normal pointer type can fault on strict-alignment architectures:

    sensor_packet_t_packed pkt;
    uint16_t *p = &pkt.payload; // DANGEROUS: p may be unaligned
    // On Cortex-M0+, *p could HardFault

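The compiler handles direct member access (pkt.payload) correctly for packed structs: it knows the member may be unaligned and generates byte-by-byte accesses on targets that need them. A raw pointer loses that information, which is why recent GCC and Clang releases warn about this pattern with -Waddress-of-packed-member.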
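When other code needs the value through a pointer, one common workaround is to copy the packed member into an aligned local with memcpy and hand out a pointer to the copy. A minimal sketch, assuming GCC or Clang; packed_packet_t and both accessor names are hypothetical and only mirror the packed layout shown earlier:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local copy of the packed layout (hypothetical name). */
typedef struct __attribute__((packed)) {
    uint8_t  id;
    uint16_t payload;
    uint8_t  flags;
    uint32_t timestamp;
} packed_packet_t;

/* Copy the possibly unaligned member into an aligned local.
   memcpy makes no alignment assumptions about its arguments. */
static uint16_t packed_get_payload(const packed_packet_t *pkt)
{
    uint16_t v;
    memcpy(&v, (const uint8_t *)pkt + offsetof(packed_packet_t, payload), sizeof v);
    return v;
}

/* Store a value into the packed member without forming a uint16_t pointer. */
static void packed_set_payload(packed_packet_t *pkt, uint16_t v)
{
    memcpy((uint8_t *)pkt + offsetof(packed_packet_t, payload), &v, sizeof v);
}
```

The caller can then take the address of its own aligned copy, for example `uint16_t v = packed_get_payload(&pkt);` followed by passing `&v` to whatever function expects a `uint16_t *`.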
Instead of using __attribute__((packed)), you can reorder struct members to minimize padding naturally:

    // Poorly ordered: 12 bytes (with padding)
    typedef struct {
        uint8_t  id;        // offset 0, 1 byte + 3 pad
        uint32_t timestamp; // offset 4, 4 bytes
        uint8_t  flags;     // offset 8, 1 byte + 1 pad
        uint16_t payload;   // offset 10, 2 bytes
    } sensor_packet_unordered_t; // sizeof == 12

    // Reordered: 8 bytes (no padding needed)
    typedef struct {
        uint32_t timestamp; // offset 0, 4 bytes
        uint16_t payload;   // offset 4, 2 bytes
        uint8_t  id;        // offset 6, 1 byte
        uint8_t  flags;     // offset 7, 1 byte
    } sensor_packet_reordered_t; // sizeof == 8

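Rule of thumb: Place the largest members first, then continue in descending order of size. On virtually all architectures this removes the padding between members without needing __attribute__((packed)) and without any unaligned access risk. The compiler may still add trailing padding so that the struct's size is a multiple of its strictest member alignment.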
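Reordering removes gaps between members, but it cannot remove padding at the end of a struct. The sketch below shows that case; tail_padded_t is a hypothetical example type, and the asserted sizes assume a typical ABI in which uint32_t is 4-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t timestamp; /* offset 0, 4 bytes */
    uint8_t  id;        /* offset 4, 1 byte  */
    /* 3 bytes of trailing padding, so that in an array of these structs
       every timestamp stays 4-byte aligned */
} tail_padded_t;

/* Members are already in descending size order, yet 5 bytes of data
   occupy 8 bytes of storage. */
static_assert(offsetof(tail_padded_t, id) == 4,
              "id should directly follow timestamp");
static_assert(sizeof(tail_padded_t) == 8,
              "expected 3 bytes of trailing padding");
```

If the extra three bytes must not appear on the wire, explicit serialization is the safer way to drop them.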
For protocol structs where layout must match exactly — especially across different architectures or compilers — the most robust approach is explicit serialization. Instead of casting a struct to a byte array, write each field byte by byte:

    #include <stdint.h>
    #include <string.h>

    typedef struct {
        uint32_t sequence;
        uint16_t command;
        uint8_t  channel;
        uint8_t  status;
    } protocol_cmd_t;

    /* Serialize a 32-bit value into a byte buffer in big-endian order */
    static inline void serialize_u32_be(uint8_t *dst, uint32_t val) {
        dst[0] = (uint8_t)(val >> 24);
        dst[1] = (uint8_t)(val >> 16);
        dst[2] = (uint8_t)(val >> 8);
        dst[3] = (uint8_t)(val);
    }

    /* Serialize a 16-bit value into a byte buffer in big-endian order */
    static inline void serialize_u16_be(uint8_t *dst, uint16_t val) {
        dst[0] = (uint8_t)(val >> 8);
        dst[1] = (uint8_t)(val);
    }

    /* Pack a protocol_cmd_t into an 8-byte wire frame (big-endian) */
    void protocol_cmd_serialize(const protocol_cmd_t *cmd,
                                uint8_t frame[8])
    {
        serialize_u32_be(&frame[0], cmd->sequence);
        serialize_u16_be(&frame[4], cmd->command);
        frame[6] = cmd->channel;
        frame[7] = cmd->status;
    }

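This approach has several advantages: the wire layout no longer depends on compiler padding or alignment, the code never performs an unaligned multi-byte access, and the byte order is explicit. To emit little-endian instead, change serialize_u32_be to shift in reverse order.
The corresponding deserialization function reads bytes back: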

    static inline uint32_t deserialize_u32_be(const uint8_t *src) {
        return ((uint32_t)src[0] << 24) |
               ((uint32_t)src[1] << 16) |
               ((uint32_t)src[2] << 8)  |
               ((uint32_t)src[3]);
    }

    static inline uint16_t deserialize_u16_be(const uint8_t *src) {
        return (uint16_t)((src[0] << 8) | src[1]);
    }

    void protocol_cmd_deserialize(protocol_cmd_t *cmd,
                                  const uint8_t frame[8])
    {
        cmd->sequence = deserialize_u32_be(&frame[0]);
        cmd->command  = deserialize_u16_be(&frame[4]);
        cmd->channel  = frame[6];
        cmd->status   = frame[7];
    }

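For a protocol that specifies little-endian byte order, the same helpers can be written with the shifts reversed. This is a sketch that mirrors the big-endian functions above; the _le function names are hypothetical additions, not part of the original code.

```c
#include <stdint.h>

/* Serialize a 32-bit value in little-endian order (least significant byte first) */
static inline void serialize_u32_le(uint8_t *dst, uint32_t val)
{
    dst[0] = (uint8_t)(val);
    dst[1] = (uint8_t)(val >> 8);
    dst[2] = (uint8_t)(val >> 16);
    dst[3] = (uint8_t)(val >> 24);
}

/* Rebuild a 32-bit value from four little-endian bytes */
static inline uint32_t deserialize_u32_le(const uint8_t *src)
{
    return  (uint32_t)src[0]
         | ((uint32_t)src[1] << 8)
         | ((uint32_t)src[2] << 16)
         | ((uint32_t)src[3] << 24);
}
```

Because these functions work on individual bytes and shifts of the value, they produce the same frame on little-endian and big-endian hosts.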
ARM Cortex-M processors are little-endian by default: the least significant byte of a multi-byte value is stored at the lowest address. Network protocols like TCP/IP use big-endian (most significant byte first). If you cast a uint32_t to a byte array on a little-endian system and send it over the network, the receiver will interpret the bytes in the wrong order.
The standard conversion functions handle this:
| Function | Meaning | Example |
|---|---|---|
| htons() | Host to network (16-bit) | 0x1234 → {0x12, 0x34} |
| htonl() | Host to network (32-bit) | 0x12345678 → {0x12, 0x34, 0x56, 0x78} |
| ntohs() | Network to host (16-bit) | {0x12, 0x34} → 0x1234 |
| ntohl() | Network to host (32-bit) | {0x12, 0x34, 0x56, 0x78} → 0x12345678 |
On a big-endian system, these functions are no-ops. On little-endian (most ARM MCUs), they perform byte swaps.
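On a platform that does provide the POSIX header <arpa/inet.h>, these functions combine naturally with memcpy to move values in and out of a byte buffer. The following is a sketch under that assumption; put_u32_net and get_u32_net are hypothetical helper names.

```c
#include <arpa/inet.h> /* POSIX: htonl(), ntohl() */
#include <stdint.h>
#include <string.h>

/* Write a 32-bit host value into a buffer in network (big-endian) order. */
static void put_u32_net(uint8_t *dst, uint32_t host_val)
{
    uint32_t net_val = htonl(host_val);    /* byte swap on little-endian hosts */
    memcpy(dst, &net_val, sizeof net_val); /* in-memory bytes are now MSB first */
}

/* Read a 32-bit network-order value from a buffer back into host order. */
static uint32_t get_u32_net(const uint8_t *src)
{
    uint32_t net_val;
    memcpy(&net_val, src, sizeof net_val); /* safe even if src is unaligned */
    return ntohl(net_val);
}
```

Whatever the host byte order, the buffer written by put_u32_net holds the most significant byte first, which matches the table above.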
For bare-metal systems without a POSIX layer, you can implement portable versions:
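
    #include <stdint.h>

    /* Compile-time endianness detection (GCC/Clang) */
    #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    #define cpu_to_le16(x) ((uint16_t)(x))
    #define cpu_to_be16(x) __builtin_bswap16(x)
    #define le16_to_cpu(x) ((uint16_t)(x))
    #define be16_to_cpu(x) __builtin_bswap16(x)
    /* 32-bit big-endian counterparts */
    #define cpu_to_be32(x) __builtin_bswap32(x)
    #define be32_to_cpu(x) __builtin_bswap32(x)
    #else
    #define cpu_to_le16(x) __builtin_bswap16(x)
    #define cpu_to_be16(x) ((uint16_t)(x))
    #define le16_to_cpu(x) __builtin_bswap16(x)
    #define be16_to_cpu(x) ((uint16_t)(x))
    /* 32-bit big-endian counterparts */
    #define cpu_to_be32(x) ((uint32_t)(x))
    #define be32_to_cpu(x) ((uint32_t)(x))
    #endif
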
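The __builtin_bswap16 builtin and the __BYTE_ORDER__ macro are specific to GCC and Clang. On other toolchains the swap itself can be written with shifts and masks, and optimizing compilers commonly recognize the pattern and emit a byte-reverse instruction where the target has one. A sketch with hypothetical helper names:

```c
#include <stdint.h>

/* Swap the two bytes of a 16-bit value. */
static inline uint16_t swap_u16(uint16_t x)
{
    return (uint16_t)((x >> 8) | (x << 8));
}

/* Reverse the four bytes of a 32-bit value. */
static inline uint32_t swap_u32(uint32_t x)
{
    return ((x & 0x000000FFu) << 24)
         | ((x & 0x0000FF00u) << 8)
         | ((x & 0x00FF0000u) >> 8)
         | ((x & 0xFF000000u) >> 24);
}
```

These helpers could stand in for the builtins in the macros above; the endianness detection itself would still need a toolchain-specific or project-defined macro.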
Here is how the same logical data is laid out in memory under three approaches:

    (each cell is one byte; ---- continues the field to its left)

    Natural struct (12 bytes, padded):
    +---------+---------+---------+---------+
    |   id    |   pad   | payload |  ----   |  offset 0-3
    +---------+---------+---------+---------+
    |  flags  |   pad   |   pad   |   pad   |  offset 4-7
    +---------+---------+---------+---------+
    |timestamp|  ----   |  ----   |  ----   |  offset 8-11
    +---------+---------+---------+---------+

    Packed struct (8 bytes, unaligned):
    +---------+---------+---------+---------+
    |   id    | payload |  ----   |  flags  |  offset 0-3
    +---------+---------+---------+---------+
    |timestamp|  ----   |  ----   |  ----   |  offset 4-7
    +---------+---------+---------+---------+

    Reordered struct (8 bytes, aligned):
    +---------+---------+---------+---------+
    |timestamp|  ----   |  ----   |  ----   |  offset 0-3
    +---------+---------+---------+---------+
    | payload |  ----   |   id    |  flags  |  offset 4-7
    +---------+---------+---------+---------+

    Serialized frame (8 bytes, wire format, big-endian):
    +---------+---------+---------+---------+
    |sequence |  ----   |  ----   |  ----   |  offset 0-3
    +---------+---------+---------+---------+
    |   cmd   |  ----   |  chan   |  stat   |  offset 4-7
    +---------+---------+---------+---------+

Reorder fields first: Always try to arrange struct members in descending size order before reaching for __attribute__((packed)). This gives you compact layout with zero alignment risk.
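Use packed structs for register maps only when needed: When mapping a struct to hardware registers at a fixed base address, the register layout is defined by the hardware, not the compiler. Most register blocks are naturally aligned, so a plain volatile struct with static assertions on its offsets is usually enough. Reserve __attribute__((packed)) for layouts that are genuinely unaligned, because the compiler may split accesses to packed members into byte operations, which many peripherals do not tolerate.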
Use explicit serialization for wire protocols: When sending data over UART, SPI, CAN, or Ethernet, serialize field by field. This eliminates all ambiguity about padding, alignment, and byte order.
Use fixed-width types: Always use uint8_t, uint16_t, uint32_t from <stdint.h> for protocol and register structs. Never use int or long — their sizes vary across architectures.
Validate with static assertions: Add static_assert(sizeof(your_struct) == expected_size) to catch layout changes at compile time when you change toolchains or target architectures.
Beware of pointer casts: Casting a byte buffer or a packed struct's address to a pointer to a larger type (e.g., casting uint8_t* to uint32_t*) is undefined behavior if the resulting pointer is unaligned, and reading through it can also violate strict aliasing rules. Use memcpy() instead:

    uint32_t value;
    memcpy(&value, &frame[0], sizeof(value)); // Safe, even if unaligned
    value = be32_to_cpu(value);               // Then convert endianness
                                              // (be32_to_cpu: 32-bit analogue of be16_to_cpu)

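The register-map and static-assertion practices above can be combined. The sketch below describes a hypothetical peripheral: demo_timer_regs_t, its field names, and the base address are all invented for illustration, and a real layout comes from the device's reference manual. Every register here is a naturally aligned 32-bit word, so this particular layout needs no packing attribute, and the assertions catch layout drift at compile time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical timer peripheral: four 32-bit registers and a reserved gap. */
typedef struct {
    volatile uint32_t CTRL;        /* 0x00: control     */
    volatile uint32_t STATUS;      /* 0x04: status      */
    uint32_t          reserved[2]; /* 0x08-0x0F: unused */
    volatile uint32_t COUNT;       /* 0x10: counter     */
    volatile uint32_t RELOAD;      /* 0x14: reload      */
} demo_timer_regs_t;

static_assert(offsetof(demo_timer_regs_t, STATUS) == 0x04, "STATUS offset");
static_assert(offsetof(demo_timer_regs_t, COUNT)  == 0x10, "COUNT offset");
static_assert(offsetof(demo_timer_regs_t, RELOAD) == 0x14, "RELOAD offset");
static_assert(sizeof(demo_timer_regs_t) == 0x18, "register block size");

/* Invented base address; on real hardware it comes from the memory map. */
#define DEMO_TIMER ((demo_timer_regs_t *)(uintptr_t)0x40001000u)
```

On a target where that address is a real peripheral, an access would look like `DEMO_TIMER->RELOAD = 1000u;`, and the volatile qualifier keeps the compiler from caching or eliding it.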
Struct packing and serialization are fundamental skills for embedded engineers working with binary protocols. The compiler’s default padding behavior, while correct for general-purpose code, can silently break protocol compatibility. By understanding alignment, using __attribute__((packed)) judiciously, reordering fields to minimize padding, and employing explicit serialization for wire formats, you can write robust, portable embedded code that communicates reliably across any architecture.
__attribute__((packed)) documentation: https://gcc.gnu.org/onlinedocs/gcc/Common-Type-Attributes.html





