Struct Packing and Serialization for Embedded Protocols

By embeddedSoft

Published in Embedded C/C++

June 09, 2026

Introduction

Why Padding Exists and How to Detect It

Using __attribute__((packed)) to Eliminate Padding

Field Reordering: The Zero-Cost Alternative

Serialization: The Portable Approach

Endianness: The Invisible Byte Swap

Memory Layout Comparison

Practical Recommendations

Summary

References

Introduction

When embedded systems communicate — over UART, SPI, CAN, or Ethernet — they exchange binary data in tightly defined formats. A sensor might pack a 16-bit temperature reading and a 32-bit timestamp into a 6-byte frame. A bootloader expects a specific command structure with headers, payload lengths, and CRC fields. In all these cases, the C struct is the natural way to model the data layout.

But there is a problem: the compiler inserts padding bytes between struct members for alignment. A struct that you expect to be 6 bytes might be 8 or 12. When that struct gets sent over a wire or written to a memory-mapped register, the padding corrupts the protocol. This article covers the techniques to control struct layout, serialize data safely, and avoid the endianness and alignment pitfalls that cause subtle, hard-to-debug failures in embedded systems.

Why Padding Exists and How to Detect It

On ARM Cortex-M and most 32-bit architectures, the CPU accesses multi-byte values most efficiently when they are naturally aligned — a uint32_t at an address divisible by 4, a uint16_t at an even address. The compiler inserts padding bytes to guarantee this alignment for every struct member.

Consider this struct:

typedef struct {
    uint8_t  id;        // 1 byte  + 1 padding
    uint16_t payload;   // 2 bytes
    uint8_t  flags;     // 1 byte  + 3 padding
    uint32_t timestamp; // 4 bytes
} sensor_packet_t;

On a 32-bit system, sizeof(sensor_packet_t) is likely 12 bytes, not 8. The compiler inserted 1 byte of padding after id to align payload to a 2-byte boundary, and 3 bytes after flags to align timestamp to a 4-byte boundary.
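
To make the arithmetic concrete, the padding can also be computed from sizeof and offsetof. The following is a minimal sketch, assuming a target where uint16_t is 2-byte aligned and uint32_t is 4-byte aligned; the names SENSOR_PACKET_MEMBER_BYTES, sensor_packet_padding, and GAP_AFTER are hypothetical helpers introduced only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t  id;
    uint16_t payload;
    uint8_t  flags;
    uint32_t timestamp;
} sensor_packet_t;

/* Sum of the member sizes: the size the struct would have with no padding. */
enum {
    SENSOR_PACKET_MEMBER_BYTES =
        sizeof(uint8_t) + sizeof(uint16_t) + sizeof(uint8_t) + sizeof(uint32_t)
};

/* Hypothetical helper: total number of padding bytes the compiler added. */
static inline size_t sensor_packet_padding(void) {
    return sizeof(sensor_packet_t) - SENSOR_PACKET_MEMBER_BYTES;
}

/* Hypothetical helper: bytes between the end of `member` and the start of `next`. */
#define GAP_AFTER(type, member, next) \
    (offsetof(type, next) - (offsetof(type, member) + sizeof(((type *)0)->member)))
```

Under the stated alignment assumptions, GAP_AFTER(sensor_packet_t, id, payload) evaluates to 1 and GAP_AFTER(sensor_packet_t, flags, timestamp) to 3, matching the comments in the struct definition.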

You can detect padding at compile time using static_assert and offsetof:

#include <stddef.h>
#include <stdint.h>
#include <assert.h>

// Expected: 1 + 1(pad) + 2 + 1 + 3(pad) + 4 = 12
static_assert(sizeof(sensor_packet_t) == 12,
              "Unexpected padding in sensor_packet_t");

static_assert(offsetof(sensor_packet_t, timestamp) == 8,
              "timestamp field offset is wrong");

Using __attribute__((packed)) to Eliminate Padding

GCC and Clang provide __attribute__((packed)) to tell the compiler to lay out struct members with zero padding:

typedef struct __attribute__((packed)) {
    uint8_t  id;        // 1 byte
    uint16_t payload;   // 2 bytes
    uint8_t  flags;     // 1 byte
    uint32_t timestamp; // 4 bytes
} sensor_packet_t_packed;  // sizeof == 8

Now sizeof(sensor_packet_t_packed) is exactly 8 bytes — the sum of its members. This is essential when the struct maps directly to a wire protocol or hardware register layout.

However, packed structs come with two important caveats:

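Unaligned access: A packed struct has an alignment requirement of 1, so it can start at any address, and a member such as the uint16_t payload at offset 1 is misaligned even when the struct itself starts on a 4-byte boundary. Cortex-M0/M0+ cores do not support unaligned access at all, so an unaligned 16-bit or 32-bit load or store raises a HardFault. Cortex-M3, M4, and M7 handle unaligned single loads and stores (LDR, STR, LDRH, STRH) in hardware at the cost of extra bus cycles, but multi-register instructions such as LDM, STM, LDRD, and STRD still fault on unaligned addresses, and setting the UNALIGN_TRP bit in the CCR register makes every unaligned access trap.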
Pointer danger: Taking the address of a packed struct member creates a pointer that may be unaligned. Dereferencing it through a normal pointer type can fault on strict-alignment architectures:

sensor_packet_t_packed pkt;
uint16_t *p = &pkt.payload;  // DANGEROUS: p may be unaligned
// On Cortex-M0+, *p could HardFault

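The compiler handles direct member access (pkt.payload) correctly for packed structs: it knows the member may be misaligned and, on targets without hardware support for unaligned access, emits byte-by-byte loads and stores. A raw uint16_t * carries no such information, so the compiler assumes natural alignment when it is dereferenced. Recent GCC and Clang versions flag this pattern with -Waddress-of-packed-member.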
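
One way to avoid the unaligned pointer is to copy the member into an ordinary local variable, pass a pointer to that local, and write the result back. This is a sketch only: scale_payload stands in for any hypothetical function that expects a normally aligned uint16_t *, and scale_packet_payload is an illustrative wrapper, not part of any library.

```c
#include <stdint.h>

typedef struct __attribute__((packed)) {
    uint8_t  id;
    uint16_t payload;
    uint8_t  flags;
    uint32_t timestamp;
} sensor_packet_t_packed;

/* Hypothetical consumer that expects an ordinary, aligned pointer. */
static void scale_payload(uint16_t *value) {
    *value = (uint16_t)(*value * 2u);
}

/* Copy the member out, operate on the aligned local, then copy it back. */
static void scale_packet_payload(sensor_packet_t_packed *pkt) {
    uint16_t tmp = pkt->payload;  /* direct member access: safe load */
    scale_payload(&tmp);          /* &tmp is naturally aligned */
    pkt->payload = tmp;           /* direct member access: safe store */
}
```

The extra copy costs a few instructions, but no pointer to the packed member is ever created.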

Field Reordering: The Zero-Cost Alternative

Instead of using __attribute__((packed)), you can reorder struct members to minimize padding naturally:

// Poorly ordered: 12 bytes (with padding)
typedef struct {
    uint8_t  id;        // offset 0, 1 byte  + 3 pad
    uint32_t timestamp; // offset 4, 4 bytes
    uint8_t  flags;     // offset 8, 1 byte  + 1 pad
    uint16_t payload;   // offset 10, 2 bytes
} sensor_packet_unordered_t;  // sizeof == 12

// Reordered: 8 bytes (no padding needed)
typedef struct {
    uint32_t timestamp; // offset 0, 4 bytes
    uint16_t payload;   // offset 4, 2 bytes
    uint8_t  id;        // offset 6, 1 byte
    uint8_t  flags;     // offset 7, 1 byte
} sensor_packet_reordered_t;  // sizeof == 8

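Rule of thumb: Place the largest members first, then the rest in descending order of size. For this struct, that removes all padding without __attribute__((packed)) and without any unaligned access risk. In general, descending order removes the padding between members, but the compiler may still add tail padding so that sizeof is a multiple of the strictest member alignment. Reordering is also only an option when you control the layout; if an external protocol or register map fixes the field order, it does not apply.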
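
Tail padding is easy to overlook, so here is a small illustration. The struct stamped_id_t is a hypothetical example, and the asserted numbers assume a typical ABI in which uint32_t is 4-byte aligned (true for ARM Cortex-M, but not for every 8-bit target).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Members are already in descending size order. */
typedef struct {
    uint32_t timestamp; /* offset 0, 4 bytes */
    uint8_t  id;        /* offset 4, 1 byte, followed by 3 bytes of tail padding */
} stamped_id_t;

/* No padding between members... */
static_assert(offsetof(stamped_id_t, id) == 4, "id follows timestamp directly");

/* ...but sizeof is rounded up to a multiple of the strictest alignment (4). */
static_assert(sizeof(stamped_id_t) == 8, "expected 3 bytes of tail padding");

/* The rounding keeps timestamp aligned in every element of an array. */
static_assert(sizeof(stamped_id_t[2]) == 16, "array stride equals sizeof");
```

If such a struct were sent over a wire as-is, the 3 trailing bytes would travel with it, which is one more reason to prefer explicit serialization for protocol frames.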

Serialization: The Portable Approach

For protocol structs where layout must match exactly — especially across different architectures or compilers — the most robust approach is explicit serialization. Instead of casting a struct to a byte array, write each field byte by byte:

#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t sequence;
    uint16_t command;
    uint8_t  channel;
    uint8_t  status;
} protocol_cmd_t;

/* Serialize a 32-bit value into a byte buffer in big-endian order */
static inline void serialize_u32_be(uint8_t *dst, uint32_t val) {
    dst[0] = (uint8_t)(val >> 24);
    dst[1] = (uint8_t)(val >> 16);
    dst[2] = (uint8_t)(val >> 8);
    dst[3] = (uint8_t)(val);
}

/* Serialize a 16-bit value into a byte buffer in big-endian order */
static inline void serialize_u16_be(uint8_t *dst, uint16_t val) {
    dst[0] = (uint8_t)(val >> 8);
    dst[1] = (uint8_t)(val);
}

/* Pack a protocol_cmd_t into an 8-byte wire frame (big-endian) */
void protocol_cmd_serialize(const protocol_cmd_t *cmd,
                            uint8_t frame[8])
{
    serialize_u32_be(&frame[0], cmd->sequence);
    serialize_u16_be(&frame[4], cmd->command);
    frame[6] = cmd->channel;
    frame[7] = cmd->status;
}

This approach has several advantages:

No struct padding issues: The wire frame is always exactly 8 bytes.
Endianness is explicit: Big-endian is used here (common for network protocols). For a little-endian wire format, write the least significant byte first, i.e. reverse the order of the shifts.
No unaligned access: Each field is written as individual bytes.
Portable: Works identically on every architecture and compiler.
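
For a wire format that is little-endian instead, the reversed shift order might look like the sketch below. The names serialize_u32_le and serialize_u16_le are assumptions chosen to mirror the big-endian helpers; they are not part of the original code.

```c
#include <stdint.h>

/* Serialize a 32-bit value into a byte buffer in little-endian order */
static inline void serialize_u32_le(uint8_t *dst, uint32_t val) {
    dst[0] = (uint8_t)(val);        /* least significant byte first */
    dst[1] = (uint8_t)(val >> 8);
    dst[2] = (uint8_t)(val >> 16);
    dst[3] = (uint8_t)(val >> 24);  /* most significant byte last */
}

/* Serialize a 16-bit value into a byte buffer in little-endian order */
static inline void serialize_u16_le(uint8_t *dst, uint16_t val) {
    dst[0] = (uint8_t)(val);
    dst[1] = (uint8_t)(val >> 8);
}
```

Because the byte order lives in the helper's name and body, the result does not depend on the endianness of the CPU that runs the code.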

The corresponding big-endian deserialization functions read the bytes back:

static inline uint32_t deserialize_u32_be(const uint8_t *src) {
    return ((uint32_t)src[0] << 24) |
           ((uint32_t)src[1] << 16) |
           ((uint32_t)src[2] << 8)  |
           ((uint32_t)src[3]);
}

static inline uint16_t deserialize_u16_be(const uint8_t *src) {
    /* Shift as unsigned so the code is also safe where int is 16 bits wide */
    return (uint16_t)(((unsigned)src[0] << 8) | (unsigned)src[1]);
}

void protocol_cmd_deserialize(protocol_cmd_t *cmd,
                              const uint8_t frame[8])
{
    cmd->sequence = deserialize_u32_be(&frame[0]);
    cmd->command  = deserialize_u16_be(&frame[4]);
    cmd->channel  = frame[6];
    cmd->status   = frame[7];
}
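
A quick way to gain confidence in a serializer and deserializer pair is a round-trip check on the host. The sketch below is self-contained and therefore repeats the encoding in condensed form; cmd_to_frame, frame_to_cmd, and protocol_cmd_roundtrip_ok are hypothetical names used only here.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t sequence;
    uint16_t command;
    uint8_t  channel;
    uint8_t  status;
} protocol_cmd_t;

/* Condensed big-endian encoder with the same 8-byte frame layout as above */
static void cmd_to_frame(const protocol_cmd_t *cmd, uint8_t frame[8]) {
    frame[0] = (uint8_t)(cmd->sequence >> 24);
    frame[1] = (uint8_t)(cmd->sequence >> 16);
    frame[2] = (uint8_t)(cmd->sequence >> 8);
    frame[3] = (uint8_t)(cmd->sequence);
    frame[4] = (uint8_t)(cmd->command >> 8);
    frame[5] = (uint8_t)(cmd->command);
    frame[6] = cmd->channel;
    frame[7] = cmd->status;
}

/* Condensed big-endian decoder, the inverse of cmd_to_frame */
static void frame_to_cmd(protocol_cmd_t *cmd, const uint8_t frame[8]) {
    cmd->sequence = ((uint32_t)frame[0] << 24) | ((uint32_t)frame[1] << 16) |
                    ((uint32_t)frame[2] << 8)  |  (uint32_t)frame[3];
    cmd->command  = (uint16_t)(((uint32_t)frame[4] << 8) | frame[5]);
    cmd->channel  = frame[6];
    cmd->status   = frame[7];
}

/* Returns true if encoding and then decoding reproduces every field. */
static bool protocol_cmd_roundtrip_ok(const protocol_cmd_t *in) {
    uint8_t frame[8];
    protocol_cmd_t out;
    cmd_to_frame(in, frame);
    frame_to_cmd(&out, frame);
    return out.sequence == in->sequence && out.command == in->command &&
           out.channel == in->channel && out.status == in->status;
}
```

The fields are compared one by one rather than with memcmp on the structs, since struct padding bytes have unspecified contents.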

Endianness: The Invisible Byte Swap

ARM Cortex-M processors are little-endian by default: the least significant byte of a multi-byte value is stored at the lowest address. Network protocols like TCP/IP use big-endian (most significant byte first). If you cast a uint32_t to a byte array on a little-endian system and send it over the network, the receiver will interpret the bytes in the wrong order.

The standard POSIX conversion functions, declared in <arpa/inet.h>, handle this:

Function	Meaning	Example (host value ↔ bytes in memory order)
`htons()`	Host to network (16-bit)	`0x1234` → `{0x12, 0x34}`
`htonl()`	Host to network (32-bit)	`0x12345678` → `{0x12, 0x34, 0x56, 0x78}`
`ntohs()`	Network to host (16-bit)	`{0x12, 0x34}` → `0x1234`
`ntohl()`	Network to host (32-bit)	`{0x12, 0x34, 0x56, 0x78}` → `0x12345678`

On a big-endian system, these functions are no-ops. On little-endian (most ARM MCUs), they perform byte swaps.
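
To see which case applies on a given target, the byte order can be observed directly by copying a known value into a byte array. This is a minimal sketch; host_is_little_endian is a hypothetical helper name.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host and 0 otherwise (runtime check). */
static int host_is_little_endian(void) {
    const uint32_t probe = 0x12345678u;
    uint8_t bytes[sizeof probe];

    /* memcpy exposes the in-memory representation without pointer casts */
    memcpy(bytes, &probe, sizeof probe);

    /* Little-endian stores the least significant byte (0x78) at the lowest address */
    return bytes[0] == 0x78;
}
```

On a typical little-endian Cortex-M build this returns 1, meaning htons() and htonl() would have to swap bytes.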

For bare-metal systems without a POSIX layer, you can implement portable versions:

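#include <stdint.h>

/* Compile-time endianness detection (GCC/Clang predefined macros) */
#if !defined(__BYTE_ORDER__) || !defined(__ORDER_LITTLE_ENDIAN__)
  #error "Byte-order macros unavailable; define the conversions manually"
#endif

#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  #define cpu_to_le16(x) ((uint16_t)(x))
  #define cpu_to_be16(x) __builtin_bswap16(x)
  #define le16_to_cpu(x) ((uint16_t)(x))
  #define be16_to_cpu(x) __builtin_bswap16(x)
  #define cpu_to_le32(x) ((uint32_t)(x))
  #define cpu_to_be32(x) __builtin_bswap32(x)
  #define le32_to_cpu(x) ((uint32_t)(x))
  #define be32_to_cpu(x) __builtin_bswap32(x)
#else
  #define cpu_to_le16(x) __builtin_bswap16(x)
  #define cpu_to_be16(x) ((uint16_t)(x))
  #define le16_to_cpu(x) __builtin_bswap16(x)
  #define be16_to_cpu(x) ((uint16_t)(x))
  #define cpu_to_le32(x) __builtin_bswap32(x)
  #define cpu_to_be32(x) ((uint32_t)(x))
  #define le32_to_cpu(x) __builtin_bswap32(x)
  #define be32_to_cpu(x) ((uint32_t)(x))
#endif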
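
Such macros only swap bytes inside an integer; the result still has to reach the buffer without an unaligned store. A possible usage sketch, assuming GCC or Clang and using the hypothetical helper names put_be16 and get_be16, combines the conversion with memcpy:

```c
#include <stdint.h>
#include <string.h>

/* Same 16-bit macros as above, repeated so the sketch is self-contained */
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  #define cpu_to_be16(x) __builtin_bswap16(x)
  #define be16_to_cpu(x) __builtin_bswap16(x)
#else
  #define cpu_to_be16(x) ((uint16_t)(x))
  #define be16_to_cpu(x) ((uint16_t)(x))
#endif

/* Store a host-order value at dst in big-endian order; dst may be unaligned. */
static void put_be16(uint8_t *dst, uint16_t host_value) {
    uint16_t wire = cpu_to_be16(host_value);
    memcpy(dst, &wire, sizeof wire);
}

/* Load a big-endian value from src and return it in host order. */
static uint16_t get_be16(const uint8_t *src) {
    uint16_t wire;
    memcpy(&wire, src, sizeof wire);
    return be16_to_cpu(wire);
}
```

Compilers typically reduce the fixed-size memcpy to a plain load or store where the target allows it, so the indirection usually costs nothing.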

Memory Layout Comparison

Here is how the same logical data is laid out in memory under the three struct layouts, followed by the serialized wire frame:

Natural struct (12 bytes, padded):
+---------+---------+---------+---------+
|    id   |   pad   | payload |   ----  |  offset 0-3
+---------+---------+---------+---------+
|  flags  |   pad   |   pad   |   pad   |  offset 4-7
+---------+---------+---------+---------+
|timestamp|   ----  |   ----  |   ----  |  offset 8-11
+---------+---------+---------+---------+

Packed struct (8 bytes, unaligned):
+---------+---------+---------+---------+
|    id   | payload |   ----  |  flags  |  offset 0-3
+---------+---------+---------+---------+
|timestamp|   ----  |   ----  |   ----  |  offset 4-7
+---------+---------+---------+---------+

Reordered struct (8 bytes, aligned):
+---------+---------+---------+---------+
|timestamp|   ----  |   ----  |   ----  |  offset 0-3
+---------+---------+---------+---------+
| payload |   ----  |    id   |  flags  |  offset 4-7
+---------+---------+---------+---------+

Serialized frame (8 bytes, wire format):
+---------+---------+---------+---------+
| sequence|(big-end)|   ----  |   ----  |  offset 0-3
+---------+---------+---------+---------+
|   cmd   |(big-end)|   chan  |   stat  |  offset 4-7
+---------+---------+---------+---------+
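
The three struct layouts can be checked at compile time, so that a toolchain change that alters them breaks the build. This is a sketch assuming GCC or Clang (for the packed attribute) and an ABI with naturally aligned uint16_t and uint32_t; the layout_*_t type names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t id; uint16_t payload; uint8_t flags; uint32_t timestamp;
} layout_natural_t;

typedef struct __attribute__((packed)) {
    uint8_t id; uint16_t payload; uint8_t flags; uint32_t timestamp;
} layout_packed_t;

typedef struct {
    uint32_t timestamp; uint16_t payload; uint8_t id; uint8_t flags;
} layout_reordered_t;

/* Natural: payload pushed to offset 2, timestamp to offset 8 */
static_assert(offsetof(layout_natural_t, payload) == 2, "natural: payload");
static_assert(offsetof(layout_natural_t, flags) == 4, "natural: flags");
static_assert(offsetof(layout_natural_t, timestamp) == 8, "natural: timestamp");
static_assert(sizeof(layout_natural_t) == 12, "natural: size");

/* Packed: members follow each other with no gaps */
static_assert(offsetof(layout_packed_t, payload) == 1, "packed: payload");
static_assert(offsetof(layout_packed_t, flags) == 3, "packed: flags");
static_assert(offsetof(layout_packed_t, timestamp) == 4, "packed: timestamp");
static_assert(sizeof(layout_packed_t) == 8, "packed: size");

/* Reordered: no gaps and every member naturally aligned */
static_assert(offsetof(layout_reordered_t, payload) == 4, "reordered: payload");
static_assert(offsetof(layout_reordered_t, id) == 6, "reordered: id");
static_assert(offsetof(layout_reordered_t, flags) == 7, "reordered: flags");
static_assert(sizeof(layout_reordered_t) == 8, "reordered: size");
```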

Practical Recommendations

Reorder fields first: When you control the layout, try arranging struct members in descending size order before reaching for __attribute__((packed)). This gives a compact layout with no alignment risk.
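Be careful with packed structs for register maps: When mapping a struct to hardware registers at a fixed base address, the layout is defined by the hardware, so __attribute__((packed)) can look attractive. However, packing also lowers the struct's alignment to 1, and on strict-alignment targets the compiler may then split a 32-bit register access into byte accesses, which many peripherals do not tolerate. Registers are normally naturally aligned, so a plain struct with explicit reserved members, checked with static_assert on sizeof and offsetof, is usually sufficient.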
Use explicit serialization for wire protocols: When sending data over UART, SPI, CAN, or Ethernet, serialize field by field. This eliminates all ambiguity about padding, alignment, and byte order.
Use fixed-width types: Always use uint8_t, uint16_t, uint32_t from <stdint.h> for protocol and register structs. Never use int or long — their sizes vary across architectures.
Validate with static assertions: Add static_assert(sizeof(your_struct) == expected_size) to catch layout changes at compile time when you change toolchains or target architectures.
Beware of pointer casts: Casting a packed struct’s address to a pointer to a larger type (e.g., casting uint8_t* to uint32_t*) is undefined behavior if the resulting pointer is unaligned. Use memcpy() instead:

uint32_t value;
memcpy(&value, &frame[0], sizeof(value));  // Safe, even if frame is unaligned
value = be32_to_cpu(value);                // Wire is big-endian; be32_to_cpu is the 32-bit analogue of be16_to_cpu

Summary

Struct packing and serialization are fundamental skills for embedded engineers working with binary protocols. The compiler’s default padding behavior, while correct for general-purpose code, can silently break protocol compatibility. By understanding alignment, using __attribute__((packed)) judiciously, reordering fields to minimize padding, and employing explicit serialization for wire formats, you can write robust, portable embedded code that communicates reliably across any architecture.