
Struct Packing and Serialization for Embedded Protocols

By embeddedSoft
Published in Embedded C/C++
June 09, 2026
4 min read

Table Of Contents

01 Introduction
02 Why Padding Exists and How to Detect It
03 Using __attribute__((packed)) to Eliminate Padding
04 Field Reordering: The Zero-Cost Alternative
05 Serialization: The Portable Approach
06 Endianness: The Invisible Byte Swap
07 Memory Layout Comparison
08 Practical Recommendations
09 Summary

Introduction

When embedded systems communicate — over UART, SPI, CAN, or Ethernet — they exchange binary data in tightly defined formats. A sensor might pack a 16-bit temperature reading and a 32-bit timestamp into a 6-byte frame. A bootloader expects a specific command structure with headers, payload lengths, and CRC fields. In all these cases, the C struct is the natural way to model the data layout.

But there is a problem: the compiler may insert padding bytes between struct members, and after the last one, to satisfy alignment requirements. A struct whose members add up to 6 bytes can end up 8 bytes long. When that struct is sent over a wire or written to a memory-mapped register, the padding bytes go along with it, and every field after the first gap lands at the wrong offset for the receiver. This article covers the techniques to control struct layout, serialize data safely, and avoid the endianness and alignment pitfalls that cause subtle, hard-to-debug failures in embedded systems.

Why Padding Exists and How to Detect It

On ARM Cortex-M and most 32-bit architectures, the CPU accesses multi-byte values most efficiently when they are naturally aligned — a uint32_t at an address divisible by 4, a uint16_t at an even address. The compiler inserts padding bytes to guarantee this alignment for every struct member.

Consider this struct:

typedef struct {
    uint8_t  id;        // 1 byte + 1 padding
    uint16_t payload;   // 2 bytes
    uint8_t  flags;     // 1 byte + 3 padding
    uint32_t timestamp; // 4 bytes
} sensor_packet_t;

On a 32-bit system, sizeof(sensor_packet_t) is likely 12 bytes, not 8. The compiler inserted 1 byte of padding after id to align payload to a 2-byte boundary, and 3 bytes after flags to align timestamp to a 4-byte boundary.

You can detect padding at compile time using static_assert and offsetof:

#include <stddef.h>
#include <stdint.h>
#include <assert.h>   // C11: provides the static_assert macro

// Expected: 1 + 1(pad) + 2 + 1 + 3(pad) + 4 = 12
static_assert(sizeof(sensor_packet_t) == 12,
              "Unexpected padding in sensor_packet_t");
static_assert(offsetof(sensor_packet_t, timestamp) == 8,
              "timestamp field offset is wrong");
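The static assertions break the build when the layout changes, but they do not tell you what the new layout is. When bringing up a new toolchain, a small host-side helper that prints every member's offset, size and the struct's alignment can save time. The sketch below re-declares sensor_packet_t so it compiles on its own; the helper names sensor_packet_padding_bytes and sensor_packet_dump_layout are hypothetical, and the expected numbers assume an ABI where uint16_t and uint32_t are naturally aligned (true for GCC on Cortex-M and on common desktop targets).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint8_t  id;
    uint16_t payload;
    uint8_t  flags;
    uint32_t timestamp;
} sensor_packet_t;

/* Hypothetical helper: bytes the compiler added beyond the members' own sizes. */
static size_t sensor_packet_padding_bytes(void)
{
    size_t members = sizeof(uint8_t) + sizeof(uint16_t)
                   + sizeof(uint8_t) + sizeof(uint32_t);   /* 8 */
    return sizeof(sensor_packet_t) - members;
}

/* Print offset and size of every member, then total size, padding, alignment. */
static void sensor_packet_dump_layout(void)
{
#define DUMP_FIELD(f)                                          \
    printf("%-10s offset %2zu  size %zu\n", #f,                \
           offsetof(sensor_packet_t, f),                       \
           sizeof(((sensor_packet_t *)0)->f))
    DUMP_FIELD(id);
    DUMP_FIELD(payload);
    DUMP_FIELD(flags);
    DUMP_FIELD(timestamp);
#undef DUMP_FIELD
    printf("sizeof = %zu, padding = %zu, alignment = %zu\n",
           sizeof(sensor_packet_t), sensor_packet_padding_bytes(),
           (size_t)_Alignof(sensor_packet_t));
}
```

Running sensor_packet_dump_layout() on the host and on the target (for example over a debug UART) is a quick way to confirm that both compilers agree before any frames are exchanged.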

Using __attribute__((packed)) to Eliminate Padding

GCC and Clang provide __attribute__((packed)) to tell the compiler to lay out struct members with zero padding:

typedef struct __attribute__((packed)) {
    uint8_t  id;        // 1 byte
    uint16_t payload;   // 2 bytes
    uint8_t  flags;     // 1 byte
    uint32_t timestamp; // 4 bytes
} sensor_packet_t_packed; // sizeof == 8

Now sizeof(sensor_packet_t_packed) is exactly 8 bytes, the sum of its members. This is what you need when a struct must mirror an externally defined byte layout, such as a wire protocol frame, whose fields are not naturally aligned.

However, packed structs come with two important caveats:

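  1. Unaligned access: A packed struct has an alignment requirement of 1, so it can start at any address. The uint16_t payload at offset 1 is misaligned whenever the struct starts on an even address, and the uint32_t timestamp at offset 4 is aligned only if the struct itself happens to start on a 4-byte boundary. On ARM Cortex-M0/M0+ (ARMv6-M), an unaligned load or store instruction causes a HardFault. Cortex-M3 and later (ARMv7-M) handle unaligned single 16-bit and 32-bit loads and stores (LDR, LDRH, STR, STRH) in hardware at a performance cost, but multi-word instructions such as LDRD, LDM and STM still fault on unaligned addresses.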

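  2. Pointer danger: Taking the address of a packed struct member creates a pointer that may be unaligned. Dereferencing it through a normal pointer type can fault on strict-alignment architectures, and it is undefined behavior in C on any architecture. GCC 9 and later warn about this with -Waddress-of-packed-member, which is enabled by default: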

sensor_packet_t_packed pkt;
uint16_t *p = &pkt.payload; // DANGEROUS: p may be unaligned
// On Cortex-M0+, *p could HardFault

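The compiler handles direct member access (pkt.payload) correctly for packed structs: because it knows the member may be misaligned, it emits byte-by-byte loads and stores on cores that cannot access unaligned data, or unaligned-capable instructions where the target allows them. A plain uint16_t * carries no such information, so the compiler assumes the pointee is naturally aligned.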
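When another part of the program needs a packed member's value, two patterns avoid ever creating a misaligned uint16_t *: read or write the member through the packed struct type, or compute a byte address (alignment 1) and copy with memcpy. The sketch below re-declares the packed struct from above; the accessor names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct __attribute__((packed)) {
    uint8_t  id;
    uint16_t payload;
    uint8_t  flags;
    uint32_t timestamp;
} sensor_packet_t_packed;

/* Layout checks: members sit back to back with no padding. */
_Static_assert(sizeof(sensor_packet_t_packed) == 8, "packed size");
_Static_assert(offsetof(sensor_packet_t_packed, payload) == 1, "payload offset");
_Static_assert(offsetof(sensor_packet_t_packed, timestamp) == 4, "timestamp offset");

/* Option 1: access the member through the packed struct type. The compiler
 * knows the member may be misaligned and emits a safe access for the target. */
static uint16_t packet_get_payload(const sensor_packet_t_packed *pkt)
{
    return pkt->payload;
}

/* Option 2: when code needs a raw address, keep it as uint8_t * and copy the
 * bytes with memcpy instead of dereferencing a uint16_t *. */
static uint16_t packet_get_payload_memcpy(const sensor_packet_t_packed *pkt)
{
    const uint8_t *base = (const uint8_t *)pkt;
    uint16_t value;
    memcpy(&value, base + offsetof(sensor_packet_t_packed, payload), sizeof value);
    return value;   /* host byte order, identical to pkt->payload */
}

/* To modify a member, pass the struct (or the new value), never &pkt->payload. */
static void packet_set_payload(sensor_packet_t_packed *pkt, uint16_t value)
{
    pkt->payload = value;
}
```

The design choice is simply to keep the "packed" knowledge attached to the type: as long as every access goes through sensor_packet_t_packed or through a byte pointer, the compiler never has to guess the alignment.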

Field Reordering: The Zero-Cost Alternative

Instead of using __attribute__((packed)), you can reorder struct members to minimize padding naturally:

// Original order: 12 bytes (with padding)
typedef struct {
    uint8_t  id;        // offset 0, 1 byte + 1 pad
    uint16_t payload;   // offset 2, 2 bytes
    uint8_t  flags;     // offset 4, 1 byte + 3 pad
    uint32_t timestamp; // offset 8, 4 bytes
} sensor_packet_padded_t;    // sizeof == 12

// Reordered: 8 bytes (no padding needed)
typedef struct {
    uint32_t timestamp; // offset 0, 4 bytes
    uint16_t payload;   // offset 4, 2 bytes
    uint8_t  id;        // offset 6, 1 byte
    uint8_t  flags;     // offset 7, 1 byte
} sensor_packet_reordered_t; // sizeof == 8

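Rule of thumb: Place the members with the largest size (and alignment) first, then continue in descending order. On common ABIs this removes interior padding without needing __attribute__((packed)) and without any unaligned access risk. Trailing padding can still remain when the members' total size is not a multiple of the largest alignment, and reordering is only an option for structs whose layout you control: if a protocol specification or datasheet fixes the field order, you cannot rearrange it.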
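A sketch of locking in the reordered layout with static assertions, plus a hypothetical status_record_t showing that descending order does not remove trailing padding. The expected sizes assume an ABI where uint32_t has 4-byte alignment, as on ARM Cortex-M.

```c
#include <stddef.h>
#include <stdint.h>

/* Reordered sensor packet: largest members first. */
typedef struct {
    uint32_t timestamp;
    uint16_t payload;
    uint8_t  id;
    uint8_t  flags;
} sensor_packet_reordered_t;

_Static_assert(sizeof(sensor_packet_reordered_t) == 8, "no padding expected");
_Static_assert(offsetof(sensor_packet_reordered_t, payload) == 4, "payload offset");
_Static_assert(offsetof(sensor_packet_reordered_t, id) == 6, "id offset");

/* Hypothetical example type: descending order, yet 3 bytes of tail padding.
 * The size is rounded up to a multiple of the strictest member alignment (4)
 * so that every element of an array of these structs stays aligned. */
typedef struct {
    uint32_t timestamp;   /* offset 0 */
    uint8_t  flags;       /* offset 4, followed by 3 bytes of tail padding */
} status_record_t;

_Static_assert(offsetof(status_record_t, flags) == 4, "flags offset");
_Static_assert(sizeof(status_record_t) == 8, "tail padding expected");
```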

Serialization: The Portable Approach

For protocol structs where layout must match exactly — especially across different architectures or compilers — the most robust approach is explicit serialization. Instead of casting a struct to a byte array, write each field byte by byte:

#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t sequence;
    uint16_t command;
    uint8_t  channel;
    uint8_t  status;
} protocol_cmd_t;

/* Serialize a 32-bit value into a byte buffer in big-endian order */
static inline void serialize_u32_be(uint8_t *dst, uint32_t val) {
    dst[0] = (uint8_t)(val >> 24);
    dst[1] = (uint8_t)(val >> 16);
    dst[2] = (uint8_t)(val >> 8);
    dst[3] = (uint8_t)(val);
}

/* Serialize a 16-bit value into a byte buffer in big-endian order */
static inline void serialize_u16_be(uint8_t *dst, uint16_t val) {
    dst[0] = (uint8_t)(val >> 8);
    dst[1] = (uint8_t)(val);
}

/* Pack a protocol_cmd_t into an 8-byte wire frame (big-endian) */
void protocol_cmd_serialize(const protocol_cmd_t *cmd,
                            uint8_t frame[8])
{
    serialize_u32_be(&frame[0], cmd->sequence);
    serialize_u16_be(&frame[4], cmd->command);
    frame[6] = cmd->channel;
    frame[7] = cmd->status;
}

This approach has several advantages:

  • No struct padding issues: The wire frame is always exactly 8 bytes.
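  • Endianness is explicit: Big-endian is used here (common for network protocols). For a little-endian format, write a variant such as serialize_u32_le that stores the least significant byte first: dst[0] = (uint8_t)(val), dst[1] = (uint8_t)(val >> 8), and so on.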
  • No unaligned access: Each field is written as individual bytes.
  • Portable: Works identically on every architecture and compiler.
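The little-endian counterparts mentioned in the list above mirror the big-endian helpers with the byte order reversed. A minimal sketch; the _le names are hypothetical and only follow the naming of serialize_u32_be.

```c
#include <stdint.h>

/* Store a 32-bit value least significant byte first (little-endian). */
static inline void serialize_u32_le(uint8_t *dst, uint32_t val)
{
    dst[0] = (uint8_t)(val);
    dst[1] = (uint8_t)(val >> 8);
    dst[2] = (uint8_t)(val >> 16);
    dst[3] = (uint8_t)(val >> 24);
}

/* Store a 16-bit value least significant byte first (little-endian). */
static inline void serialize_u16_le(uint8_t *dst, uint16_t val)
{
    dst[0] = (uint8_t)(val);
    dst[1] = (uint8_t)(val >> 8);
}

/* Matching readers: rebuild the value from bytes, independent of host order. */
static inline uint32_t deserialize_u32_le(const uint8_t *src)
{
    return ((uint32_t)src[0])       |
           ((uint32_t)src[1] << 8)  |
           ((uint32_t)src[2] << 16) |
           ((uint32_t)src[3] << 24);
}

static inline uint16_t deserialize_u16_le(const uint8_t *src)
{
    return (uint16_t)(src[0] | (src[1] << 8));
}
```

Because these helpers use shifts rather than memory reinterpretation, the same source produces the same bytes on a little-endian Cortex-M and on a big-endian host.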

The corresponding deserialization function reads bytes back:

static inline uint32_t deserialize_u32_be(const uint8_t *src) {
    return ((uint32_t)src[0] << 24) |
           ((uint32_t)src[1] << 16) |
           ((uint32_t)src[2] << 8)  |
           ((uint32_t)src[3]);
}

static inline uint16_t deserialize_u16_be(const uint8_t *src) {
    return (uint16_t)((src[0] << 8) | src[1]);
}

void protocol_cmd_deserialize(protocol_cmd_t *cmd,
                              const uint8_t frame[8])
{
    cmd->sequence = deserialize_u32_be(&frame[0]);
    cmd->command  = deserialize_u16_be(&frame[4]);
    cmd->channel  = frame[6];
    cmd->status   = frame[7];
}
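On a receive path the frame usually sits in a larger buffer whose valid length is only known at run time, such as a UART receive buffer. A minimal sketch of a length-checked wrapper, assuming the 8-byte big-endian layout above; PROTOCOL_CMD_WIRE_SIZE and protocol_cmd_parse are hypothetical names, and the struct and read helpers are repeated so the block compiles on its own.

```c
#include <stddef.h>
#include <stdint.h>

#define PROTOCOL_CMD_WIRE_SIZE 8u   /* hypothetical: fixed frame length */

typedef struct {
    uint32_t sequence;
    uint16_t command;
    uint8_t  channel;
    uint8_t  status;
} protocol_cmd_t;

static inline uint32_t deserialize_u32_be(const uint8_t *src)
{
    return ((uint32_t)src[0] << 24) | ((uint32_t)src[1] << 16) |
           ((uint32_t)src[2] << 8)  |  (uint32_t)src[3];
}

static inline uint16_t deserialize_u16_be(const uint8_t *src)
{
    return (uint16_t)((src[0] << 8) | src[1]);
}

/* Parse one command from buf. Returns 0 on success, -1 if buf is missing or
 * shorter than one frame. Nothing past buf[len - 1] is ever read, so a
 * truncated frame cannot cause an out-of-bounds access. */
static int protocol_cmd_parse(protocol_cmd_t *cmd, const uint8_t *buf, size_t len)
{
    if (cmd == NULL || buf == NULL || len < PROTOCOL_CMD_WIRE_SIZE)
        return -1;
    cmd->sequence = deserialize_u32_be(&buf[0]);
    cmd->command  = deserialize_u16_be(&buf[4]);
    cmd->channel  = buf[6];
    cmd->status   = buf[7];
    return 0;
}
```

For example, the bytes {0x00, 0x00, 0x01, 0x02, 0xA5, 0x5A, 0x03, 0x01} parse to sequence 0x102, command 0xA55A, channel 3 and status 1, while passing only the first 7 bytes returns -1 and leaves the caller to wait for more data.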

Endianness: The Invisible Byte Swap

ARM Cortex-M processors are little-endian in nearly all implementations: the least significant byte of a multi-byte value is stored at the lowest address. Network protocols like TCP/IP use big-endian order, also called network byte order (most significant byte first). If a little-endian system transmits the in-memory bytes of a uint32_t as-is, for example by casting its address to uint8_t * and sending four bytes, a receiver expecting network byte order reads the bytes reversed: 0x12345678 arrives as 0x78563412.
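The effect is easy to observe on any host: copy a uint32_t's object representation into a byte array with memcpy and inspect the first byte. A minimal sketch; the function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Copy the in-memory representation of a value into bytes: exactly what a
 * naive "send the raw bytes" approach would put on the wire. */
static void raw_bytes_u32(uint32_t value, uint8_t out[4])
{
    memcpy(out, &value, sizeof value);
}

/* Returns 1 on a little-endian host (least significant byte at lowest address). */
static int host_is_little_endian(void)
{
    uint8_t b[4];
    raw_bytes_u32(0x12345678u, b);
    return b[0] == 0x78;
}
```

On a Cortex-M or x86 host, raw_bytes_u32(0x12345678u, b) fills b with {0x78, 0x56, 0x34, 0x12}, which is why the raw-bytes approach only works when both ends happen to share the same byte order.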

The standard conversion functions handle this:

| Function | Meaning | Example |
|----------|---------|---------|
| htons() | Host to network (16-bit) | 0x1234 → bytes {0x12, 0x34} |
| htonl() | Host to network (32-bit) | 0x12345678 → bytes {0x12, 0x34, 0x56, 0x78} |
| ntohs() | Network to host (16-bit) | bytes {0x12, 0x34} → 0x1234 |
| ntohl() | Network to host (32-bit) | bytes {0x12, 0x34, 0x56, 0x78} → 0x12345678 |

On a big-endian system, these functions are no-ops. On little-endian (most ARM MCUs), they perform byte swaps.
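On a hosted POSIX system these functions are declared in <arpa/inet.h>; bare-metal IP stacks often provide their own equivalents instead. The sketch below checks that htonl produces the network-order bytes from the table regardless of host endianness; the wrapper names to_wire_u32 and from_wire_u32 are hypothetical.

```c
#include <arpa/inet.h>  /* htonl, htons, ntohl, ntohs (POSIX) */
#include <stdint.h>
#include <string.h>

/* Convert to network order, then capture the bytes as they would be sent. */
static void to_wire_u32(uint32_t host_value, uint8_t out[4])
{
    uint32_t net = htonl(host_value);
    memcpy(out, &net, sizeof net);
}

/* Inverse: bytes received from the wire back to a host-order value. */
static uint32_t from_wire_u32(const uint8_t in[4])
{
    uint32_t net;
    memcpy(&net, in, sizeof net);
    return ntohl(net);
}
```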

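For bare-metal systems without a POSIX layer, you can define equivalent macros yourself. This version relies on GCC/Clang predefined macros and builtins, so it is compiler-specific rather than fully portable:

#include <stdint.h>
/* Compile-time endianness detection (GCC/Clang predefined macros) */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define cpu_to_le16(x) ((uint16_t)(x))
#define cpu_to_be16(x) __builtin_bswap16((uint16_t)(x))
#define le16_to_cpu(x) ((uint16_t)(x))
#define be16_to_cpu(x) __builtin_bswap16((uint16_t)(x))
#define cpu_to_be32(x) __builtin_bswap32((uint32_t)(x))
#define be32_to_cpu(x) __builtin_bswap32((uint32_t)(x))
#elif defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define cpu_to_le16(x) __builtin_bswap16((uint16_t)(x))
#define cpu_to_be16(x) ((uint16_t)(x))
#define le16_to_cpu(x) __builtin_bswap16((uint16_t)(x))
#define be16_to_cpu(x) ((uint16_t)(x))
#define cpu_to_be32(x) ((uint32_t)(x))
#define be32_to_cpu(x) ((uint32_t)(x))
#else
#error "Unknown byte order: define the conversion macros for this compiler"
#endif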
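If the compiler is not GCC or Clang, __BYTE_ORDER__ and the __builtin_bswap functions may be unavailable. A portable alternative, assuming nothing beyond <stdint.h>, builds the swap from shifts and masks; GCC and Clang typically recognize this pattern and emit a single REV16/REV instruction on Cortex-M, although that is an optimization rather than a guarantee. The function names are hypothetical.

```c
#include <stdint.h>

/* Reverse the two bytes of a 16-bit value. */
static inline uint16_t bswap16_portable(uint16_t x)
{
    return (uint16_t)((x << 8) | (x >> 8));
}

/* Reverse the four bytes of a 32-bit value. */
static inline uint32_t bswap32_portable(uint32_t x)
{
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) << 8)  |
           ((x & 0x00FF0000u) >> 8)  |
           ((x & 0xFF000000u) >> 24);
}
```

These only perform the swap; deciding when to swap still requires knowing the host byte order. The shift-based serialize and deserialize helpers from the previous section avoid that question entirely, because they never depend on host byte order.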

Memory Layout Comparison

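Here is how the sensor packet fields are laid out under the three struct approaches, followed by the serialized protocol_cmd_t frame for comparison. Each cell is one byte, and ---- marks a continuation byte of the multi-byte field to its left: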

Natural struct (12 bytes, padded):
+---------+---------+---------+---------+
|   id    |   pad   | payload |  ----   | offset 0-3
+---------+---------+---------+---------+
|  flags  |   pad   |   pad   |   pad   | offset 4-7
+---------+---------+---------+---------+
|timestamp|  ----   |  ----   |  ----   | offset 8-11
+---------+---------+---------+---------+
Packed struct (8 bytes, unaligned):
+---------+---------+---------+---------+
|   id    | payload |  ----   |  flags  | offset 0-3
+---------+---------+---------+---------+
|timestamp|  ----   |  ----   |  ----   | offset 4-7
+---------+---------+---------+---------+
Reordered struct (8 bytes, aligned):
+---------+---------+---------+---------+
|timestamp|  ----   |  ----   |  ----   | offset 0-3
+---------+---------+---------+---------+
| payload |  ----   |   id    |  flags  | offset 4-7
+---------+---------+---------+---------+
Serialized protocol_cmd_t frame (8 bytes, big-endian wire format):
+---------+---------+---------+---------+
|sequence |  ----   |  ----   |  ----   | offset 0-3
+---------+---------+---------+---------+
| command |  ----   | channel | status  | offset 4-7
+---------+---------+---------+---------+

Practical Recommendations

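  1. Reorder fields first: For structs whose layout you control, arrange members in descending size order before reaching for __attribute__((packed)). This gives you a compact layout with zero alignment risk. If a protocol specification or datasheet fixes the field order, reordering is not an option.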

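  2. Match register maps exactly, and be careful with packed: When mapping a struct to hardware registers at a fixed base address, the layout is defined by the hardware, not the compiler, so it must match the datasheet byte for byte. Peripheral registers are usually naturally aligned 32-bit words, so a plain struct of volatile fixed-width members with explicit reserved fields (the style used in CMSIS device headers) typically matches without any padding. Adding __attribute__((packed)) lowers the struct's alignment to 1, and on strict-alignment cores such as Cortex-M0 the compiler may then split a 32-bit register access into byte accesses that the peripheral may not handle correctly. Reserve packed for register blocks that genuinely contain misaligned fields, and verify the offsets with static assertions.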

  3. Use explicit serialization for wire protocols: When sending data over UART, SPI, CAN, or Ethernet, serialize field by field. This eliminates all ambiguity about padding, alignment, and byte order.

  4. Use fixed-width types: Always use uint8_t, uint16_t, uint32_t from <stdint.h> for protocol and register structs. Never use int or long — their sizes vary across architectures.

  5. Validate with static assertions: Add static_assert(sizeof(your_struct) == expected_size) to catch layout changes at compile time when you change toolchains or target architectures.

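  6. Beware of pointer casts: Casting a byte buffer, or the address of a packed struct member, to a pointer to a wider type (e.g., casting uint8_t* to uint32_t*) is undefined behavior if the resulting pointer is misaligned. Reading a uint8_t array through a uint32_t* also violates C's strict-aliasing rules, even when the address happens to be aligned. Use memcpy() instead; for a small fixed size, compilers typically turn it into a single load where the target allows unaligned access: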

#include <string.h>   // memcpy

uint32_t value;
memcpy(&value, &frame[0], sizeof(value)); // Safe even if &frame[0] is unaligned
value = be32_to_cpu(value);               // 32-bit counterpart of be16_to_cpu: big-endian to host order
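For the register-map case in recommendation 2, the layout can be pinned down with volatile fixed-width members, explicit reserved words, and offset assertions. The peripheral below, its register names, the TXE bit and all offsets are hypothetical, not taken from any real datasheet; on a real target the struct pointer would be built from the base address in the device's reference manual, while in a unit test the same function can run against an ordinary RAM instance of the struct.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical peripheral: every register is a 32-bit word at a fixed offset. */
typedef struct {
    volatile uint32_t CR;           /* 0x00 control */
    volatile uint32_t SR;           /* 0x04 status */
    volatile uint32_t DR;           /* 0x08 data */
    uint32_t          RESERVED0[1]; /* 0x0C not used by the hardware */
    volatile uint32_t BRR;          /* 0x10 baud rate */
} uartx_regs_t;

/* Fail the build if the compiler's layout ever disagrees with the datasheet. */
_Static_assert(offsetof(uartx_regs_t, SR)  == 0x04, "SR offset");
_Static_assert(offsetof(uartx_regs_t, DR)  == 0x08, "DR offset");
_Static_assert(offsetof(uartx_regs_t, BRR) == 0x10, "BRR offset");
_Static_assert(sizeof(uartx_regs_t) == 0x14, "register block size");

#define UARTX_SR_TXE (1u << 7)   /* hypothetical "transmit buffer empty" bit */

/* Write one byte if the transmitter is ready; returns 1 if written, else 0. */
static int uartx_try_send(uartx_regs_t *regs, uint8_t byte)
{
    if ((regs->SR & UARTX_SR_TXE) == 0u)
        return 0;
    regs->DR = byte;   /* DR is an aligned volatile uint32_t: one 32-bit store */
    return 1;
}
```

No packed attribute is needed here: because every register is a naturally aligned word, the compiler's default layout already matches, and the assertions prove it on every build.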

Summary

Struct packing and serialization are fundamental skills for embedded engineers working with binary protocols. The compiler’s default padding behavior, while correct for general-purpose code, can silently break protocol compatibility. By understanding alignment, using __attribute__((packed)) judiciously, reordering fields to minimize padding, and employing explicit serialization for wire formats, you can write robust, portable embedded code that communicates reliably across any architecture.

embeddedSoft

Embedded Systems Articles by Jithin Tom & Hermes (AI Agent)
