Target Triples in headerkit¶
What is a target triple?¶
A target triple is a string that identifies the compilation target: the combination of processor architecture, vendor, operating system, and optionally the runtime environment. The format is:
Examples:
| Triple | Meaning |
|---|---|
x86_64-pc-linux-gnu |
64-bit x86, PC vendor, Linux, GNU libc |
aarch64-apple-darwin25.3.0 |
ARM64, Apple, macOS 25.3 |
x86_64-pc-windows-msvc |
64-bit x86, PC, Windows, MSVC ABI |
i686-pc-windows-msvc |
32-bit x86, PC, Windows, MSVC ABI |
aarch64-unknown-linux-gnu |
ARM64, unknown vendor, Linux, GNU libc |
armv7-unknown-linux-gnueabihf |
ARMv7, Linux, GNU hard-float ABI |
x86_64-unknown-linux-musl |
64-bit x86, Linux, musl libc |
The name "triple" is historical - it originally had exactly three parts. Modern triples can have 3 to 5 components. There is no formal standard or standards body that assigns these names. LLVM is the de facto authority.
Why headerkit uses target triples¶
headerkit parses C/C++ headers and generates Python bindings. The parsed IR (intermediate representation) is the result of C preprocessing, which is target-sensitive:
#ifdef _WIN64expands differently for 32-bit vs 64-bit Windows targetssizeof(void*)is 4 on 32-bit, 8 on 64-bitsizeof(long)is 4 on Windows, 8 on Linux 64-bit- System headers differ between OS versions (new functions, changed structs)
- ABI matters:
gnuvsmuslcan affect struct layouts
The cache key for parsed IR must capture all of these distinctions. A target triple encodes exactly this information in a single canonical string.
What the cache key represents¶
The cache key answers: "if I parse this header for this target, will I get the same IR?" Two cache entries should match if and only if their preprocessing output would be identical. The target triple is the right granularity for this because it's what the C preprocessor uses to make its decisions.
Detecting the target triple¶
The problem with host detection¶
The naive approach is to ask the system what platform it is:
import sys, platform
sys.platform # "darwin", "linux", "win32"
platform.machine() # "x86_64", "arm64", "AMD64"
This gives the host platform, not the target. They differ when:
- 32-bit Python runs on a 64-bit OS (common on Windows via cibuildwheel)
- Cross-compiling for a different architecture entirely
- Running in an emulated environment (QEMU, Rosetta 2)
Common approaches like platform.machine(), cc -dumpmachine, and
struct.calcsize("P") each have limitations: they report the host machine,
the compiler's default target, or the pointer width respectively -- none
directly give the process's build target. headerkit sidesteps all of these
by using HOST_GNU_TYPE, which is the actual triple from autoconf baked
into the Python build at compile time.
headerkit's detection algorithm¶
headerkit resolves the target triple via config precedence:
- Explicit
targetkwarg togenerate()(highest priority) HEADERKIT_TARGETenvironment variable[tool.headerkit] targetin pyproject.toml- Auto-detection via
detect_process_triple()
The auto-detection uses one signal per platform:
- POSIX (Linux, macOS, BSDs):
sysconfig.get_config_var('HOST_GNU_TYPE')-- the--hostvalue from autoconf, baked into the Python build at compile time. This is inherently process-aware: a 32-bit Python build has a 32-bitHOST_GNU_TYPE, so no pointer-width correction is needed. - Windows:
sysconfig.get_platform()-- returnswin-amd64,win32, orwin-arm64. Mapped to LLVM-style triples (e.g.,x86_64-pc-windows-msvc).
For cross-compilation, set --target, HEADERKIT_TARGET, or
[tool.headerkit] target explicitly rather than relying on auto-detection.
Why HOST_GNU_TYPE?¶
HOST_GNU_TYPE is the most direct signal for "what target was this Python
built for." Unlike sysconfig.get_platform() (which returns a lossy platform
tag like linux-x86_64 with no libc flavor) or cc -dumpmachine (which
reports the compiler's default target, not the process), HOST_GNU_TYPE is
the actual triple from autoconf and includes vendor, OS, and libc flavor
(e.g., x86_64-pc-linux-gnu vs x86_64-pc-linux-musl on Python 3.13+).
musl libc detection¶
On Linux, the libc flavor (glibc vs musl) matters for C preprocessing:
different system headers, potentially different struct layouts. HOST_GNU_TYPE
correctly reports linux-musl on Python 3.13+ (fixed in CPython issue #95855).
On pre-3.13 Python, HOST_GNU_TYPE may report linux-gnu even when the
interpreter is linked against musl (CPython issue #87278).
headerkit corrects this with a runtime libc sniff using
os.confstr('CS_GNU_LIBC_VERSION'). On glibc, this returns a version string
(e.g., 'glibc 2.35'). On musl, it raises ValueError or OSError. This
is process-aware: it checks what THIS interpreter links against, not what
libraries are installed on the system.
cibuildwheel integration¶
cibuildwheel invokes PEP 517 build
backends once per architecture per wheel. On Linux, it uses QEMU emulation,
so the running Python IS the target architecture and HOST_GNU_TYPE is
correct. On macOS, it downloads the matching Python for each arch. On
Windows, it uses native builds per arch. In all cases, headerkit's
auto-detection works without special hooks or configuration.
Normalization (user input only)¶
normalize_triple() canonicalizes user-provided triples (--target,
HEADERKIT_TARGET, config file). Auto-detected triples from
detect_process_triple() are already canonical and bypass normalization.
- Lowercases all components
- Inserts
unknownvendor for 3-component triples missing it:x86_64-linux-gnu->x86_64-unknown-linux-gnu
Architecture names are used as-is. If you specify --target arm64-apple-darwin
and auto-detect would produce aarch64-apple-darwin, those are different
cache keys. This is intentional: --target means "use this exact triple."
What flows where¶
The resolved triple is used in two places:
- Cache key: the full triple (including OS version) goes into the SHA-256 hash for the IR cache key. This ensures cache entries are correctly invalidated when the target changes.
-targetflag: the full triple is passed to libclang via-targetso that preprocessing reflects the correct target platform.
The slug (human-readable cache directory name) uses a shortened form
(arch-os with version stripped) for readability:
x86_64-pc-linux-gnu -> x86_64-linux in the directory name.
Cross-compilation workflow¶
With target triple support, cross-compilation is straightforward:
# Generate bindings for ARM64 Linux while running on x86_64 macOS
from headerkit import generate
output = generate(
"mylib.h",
target="aarch64-unknown-linux-gnu",
writer_name="cffi",
)
Or via CLI:
Or via environment variable (useful in CI):
The cache will store entries keyed by target triple, so the same machine can build for multiple targets and each gets its own cache entry.
Limitations and future work¶
System headers in cross-compilation¶
When cross-compiling, libclang needs access to the target platform's system
headers (sysroot). headerkit passes -target to libclang but does not
automatically locate or configure a sysroot. Users must provide include paths
via -I flags or include_dirs for target-specific headers.
Python cross-compilation ecosystem¶
PEP 720 documents the challenges of
cross-compiling Python packages. The Python packaging ecosystem lacks
standardized cross-compilation infrastructure. headerkit uses
HOST_GNU_TYPE (which is already baked into every autoconf-built Python)
for native detection and relies on explicit --target for
cross-compilation. If a future PEP standardizes a cross-compilation
signaling mechanism, headerkit can adopt it.
No formal triple standard¶
As noted in "What the Hell Is a Target Triple?", there is no formal standard for triple format. LLVM and GCC triples are similar but not identical. headerkit follows LLVM conventions since it uses libclang as its parsing backend.