# Architecture Overview

headerkit is organized around a three-layer pipeline: backends parse C/C++ headers, producing an IR (Intermediate Representation), which writers consume to generate output.

## The Pipeline

```mermaid
graph TD
    A["C/C++ Source Code"] --> B
    B["Backend<br>(ParserBackend protocol)"] --> C
    C["IR<br>(Header, Declaration, TypeExpr)"] --> D
    D["Writer<br>(WriterBackend protocol)"] --> E
    E["Output String<br>(CFFI cdef, ctypes, Cython .pxd, ...)"]

    B -.- B1["e.g., LibclangBackend"]
    D -.- D1["e.g., CffiWriter, CtypesWriter,<br>CythonWriter, LuaWriter, ..."]
```

Each layer is independent. Backends know nothing about writers. Writers know nothing about backends. The IR is the contract between them.
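The IR is the only thing passed between layers. As a result, the glue code needs only `parse()` on one side and `write()` on the other. The following is a minimal sketch of that contract using toy stand-ins. `ToyHeader`, `ToyBackend`, `ToyWriter` and `run_pipeline` are hypothetical names, not headerkit APIs, and the toy backend does no real parsing.

```python
from dataclasses import dataclass, field


# Hypothetical stand-in for headerkit's IR root; only the fields needed here.
@dataclass
class ToyHeader:
    path: str
    declarations: list = field(default_factory=list)


class ToyBackend:
    """Pretends to parse: records every line ending in ';' as a declaration."""

    def parse(self, code: str, filename: str) -> ToyHeader:
        decls = [line.strip() for line in code.splitlines() if line.strip().endswith(";")]
        return ToyHeader(path=filename, declarations=decls)


class ToyWriter:
    """Renders the IR as a numbered listing; it never sees the backend."""

    def write(self, header: ToyHeader) -> str:
        lines = [f"// {header.path}"]
        lines += [f"{i}: {d}" for i, d in enumerate(header.declarations, 1)]
        return "\n".join(lines)


def run_pipeline(backend, writer, code: str, filename: str) -> str:
    # The IR object is the only value handed from one layer to the next.
    header = backend.parse(code, filename)
    return writer.write(header)


print(run_pipeline(ToyBackend(), ToyWriter(), "int f(void);\nint x;\n", "toy.h"))
```

Swapping `ToyBackend` for another object with a compatible `parse()` requires no change to `ToyWriter`, and vice versa.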

## Layer 1: Backends (Parsing)

A backend implements the ParserBackend protocol and converts C/C++ source code into IR.

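```python
from typing import Protocol

from headerkit.ir import Header


# Protocol definition, shown for reference.
# In your own code, import it with `from headerkit import ParserBackend`.
class ParserBackend(Protocol):
    """Converts C/C++ source code into a headerkit IR Header."""

    def parse(
        self,
        code: str,  # C/C++ source text to parse
        filename: str,
        include_dirs: list[str] | None = None,
        extra_args: list[str] | None = None,
        *,  # the remaining options are keyword-only
        use_default_includes: bool = True,
        recursive_includes: bool = True,
        max_depth: int = 10,
        project_prefixes: tuple[str, ...] | None = None,
    ) -> Header: ...

    @property
    def name(self) -> str: ...  # backend name

    @property
    def supports_macros(self) -> bool: ...  # whether #define macros are extracted

    @property
    def supports_cpp(self) -> bool: ...  # whether C++ constructs are handled
```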
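`ParserBackend` is a `typing.Protocol`, so conformance is structural. A class only needs matching methods and properties; it does not have to inherit from anything. The sketch below makes several assumptions:

- `MiniHeader`, `MiniConstant` and `MacroOnlyBackend` are hypothetical simplified stand-ins, not headerkit classes.
- The protocol is trimmed and marked `@runtime_checkable` so that conformance can be checked with `isinstance`.

The toy backend recognizes only integer `#define` lines, which is far less than `LibclangBackend` handles.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Protocol, runtime_checkable


@dataclass
class MiniConstant:  # hypothetical stand-in for headerkit.ir.Constant
    name: str
    value: int | str | None
    is_macro: bool = True


@dataclass
class MiniHeader:  # hypothetical stand-in for headerkit.ir.Header
    path: str
    declarations: list = field(default_factory=list)


@runtime_checkable
class MiniParserBackend(Protocol):
    """Trimmed copy of the ParserBackend protocol (keyword options omitted)."""

    def parse(self, code: str, filename: str) -> MiniHeader: ...

    @property
    def name(self) -> str: ...


# Matches lines such as "#define MAX_LEN 256" or "# define NEG -3".
_DEFINE = re.compile(r"^\s*#\s*define\s+([A-Za-z_]\w*)\s+(-?\d+)\s*$")


class MacroOnlyBackend:
    """Extracts `#define NAME <integer>` lines; everything else is ignored."""

    def parse(self, code: str, filename: str) -> MiniHeader:
        header = MiniHeader(path=filename)
        for line in code.splitlines():
            m = _DEFINE.match(line)
            if m:
                header.declarations.append(MiniConstant(m.group(1), int(m.group(2))))
        return header

    @property
    def name(self) -> str:
        return "macro-only"

    @property
    def supports_macros(self) -> bool:
        return True

    @property
    def supports_cpp(self) -> bool:
        return False


backend = MacroOnlyBackend()
assert isinstance(backend, MiniParserBackend)  # structural check, no inheritance
print(backend.parse("#define MAX_LEN 256\nint f(void);\n", "demo.h").declarations)
```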

### Built-in Backend: LibclangBackend

The `LibclangBackend` uses LLVM's libclang to parse headers. It provides:

- Full C and C++ support (templates, namespaces, classes)
- Preprocessor handling (`#include`, `#define`, `#ifdef`)
- Source location tracking for error reporting
- Recursive include processing for umbrella headers

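```python
from headerkit import get_backend

code = "typedef struct { int x; int y; } point_t;\nint area(const point_t *p);\n"

backend = get_backend("libclang")  # requires a working libclang installation
header = backend.parse(code, "myheader.h")
print([decl.name for decl in header.declarations])
```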

### Backend Registry

Backends register themselves using register_backend():

```python
from headerkit.backends import register_backend

# MyBackendClass: your own class implementing the ParserBackend protocol
register_backend("mybackend", MyBackendClass, is_default=False)
```

Registry functions:

| Function | Description |
|---|---|
| `get_backend(name=None)` | Get a backend instance (the default backend if `name` is `None`) |
| `list_backends()` | List all registered backend names |
| `is_backend_available(name)` | Check whether a backend is usable (performs a real load test for libclang) |
| `register_backend(name, cls)` | Register a new backend |
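Conceptually, a registry of this kind is a name-to-class mapping with an optional default. The sketch below is a hypothetical reimplementation for illustration only, not headerkit's source. The `_REGISTRY` dict and `_default` variable are assumptions, and so is returning a fresh instance on each call. Here `is_backend_available` only checks registration, whereas the real function also performs a load test for libclang.

```python
from __future__ import annotations

# Illustrative reimplementation of the registry functions; not headerkit's code.
_REGISTRY: dict[str, type] = {}
_default: str | None = None


def register_backend(name: str, cls: type, is_default: bool = False) -> None:
    global _default
    _REGISTRY[name] = cls
    if is_default:
        _default = name


def get_backend(name: str | None = None):
    key = name if name is not None else _default
    if key is None or key not in _REGISTRY:
        raise ValueError(f"no backend registered under {key!r}")
    return _REGISTRY[key]()  # this sketch builds a fresh instance per call


def list_backends() -> list[str]:
    return sorted(_REGISTRY)


def is_backend_available(name: str) -> bool:
    # Simplification: registration only, no load test.
    return name in _REGISTRY


class DummyBackend:
    name = "dummy"


# Module-level call, so registration happens as soon as the module is imported.
register_backend("dummy", DummyBackend, is_default=True)
print(list_backends(), type(get_backend()).__name__)  # ['dummy'] DummyBackend
```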

See Writing Custom Backends for a complete guide.

## Layer 2: IR (Intermediate Representation)

The IR is a tree of Python dataclasses rooted at Header. It is designed to be parser-agnostic: any backend that can parse C/C++ can produce the same IR.

### Type Expressions

Type expressions (TypeExpr) represent C types as composable trees:

```mermaid
classDiagram
    class TypeExpr {
        <<protocol>>
    }
    class CType {
        name: str
        qualifiers: list[str]
    }
    class Pointer {
        pointee: TypeExpr
        qualifiers: list[str]
    }
    class Array {
        element_type: TypeExpr
        size: int | None
    }
    class FunctionPointer {
        return_type: TypeExpr
        parameters: list[Parameter]
        is_variadic: bool
    }

    TypeExpr <|-- CType
    TypeExpr <|-- Pointer
    TypeExpr <|-- Array
    TypeExpr <|-- FunctionPointer
    Pointer --> TypeExpr : pointee
    Array --> TypeExpr : element_type
    FunctionPointer --> TypeExpr : return_type
```

| Class | Represents | Example |
|---|---|---|
| `CType` | Base type with qualifiers | `int`, `const char`, `unsigned long` |
| `Pointer` | Pointer to another type | `int*`, `const char*`, `void**` |
| `Array` | Fixed-size or flexible array | `int[10]`, `char[]` |
| `FunctionPointer` | Pointer to a function | `void (*)(int, char)` |

Types compose naturally:

```python
from headerkit import CType, Pointer, Array

# const char*
const_char_ptr = Pointer(CType("char", ["const"]))

# int**
int_ptr_ptr = Pointer(Pointer(CType("int")))

# const char*[]
string_array = Array(Pointer(CType("char", ["const"])))
```
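Each node wraps the next, so a writer can turn a type tree back into C text by recursion. The sketch below builds the C declarator from the inside out. It makes these assumptions:

- It uses minimal stand-in dataclasses that copy the fields from the diagram above. They are not headerkit's classes.
- `to_c()` is a hypothetical helper, and headerkit's writers may format types differently.
- It covers `CType`, `Pointer` and `Array`, including pointer-to-array, which needs parentheses. `FunctionPointer` is left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Union


# Minimal stand-ins mirroring the diagram's fields (not headerkit's classes).
@dataclass
class CType:
    name: str
    qualifiers: list[str] = field(default_factory=list)


@dataclass
class Pointer:
    pointee: TypeExpr
    qualifiers: list[str] = field(default_factory=list)


@dataclass
class Array:
    element_type: TypeExpr
    size: int | None = None  # None means a flexible array: []


TypeExpr = Union[CType, Pointer, Array]


def to_c(t: TypeExpr, declarator: str = "") -> str:
    """Render a type tree as C, growing the declarator from the inside out."""
    if isinstance(t, CType):
        base = " ".join([*t.qualifiers, t.name])
        return f"{base} {declarator}" if declarator else base
    if isinstance(t, Pointer):
        quals = " ".join(t.qualifiers)  # e.g. "const" for char *const p
        sep = " " if quals and declarator else ""
        return to_c(t.pointee, f"*{quals}{sep}{declarator}")
    if isinstance(t, Array):
        # A pointer declarator must be parenthesized before [], e.g. int (*p)[10].
        inner = f"({declarator})" if declarator.startswith("*") else declarator
        size = "" if t.size is None else str(t.size)
        return to_c(t.element_type, f"{inner}[{size}]")
    raise TypeError(f"unsupported type node: {t!r}")


print(to_c(Pointer(CType("char", ["const"]))))                 # const char *
print(to_c(Array(Pointer(CType("char", ["const"]))), "argv"))  # const char *argv[]
print(to_c(Pointer(Array(CType("int"), 10)), "p"))             # int (*p)[10]
```

The last call shows why a tree is easier to work with than a string. "Pointer to array of 10 `int`" is unambiguous as a tree, but the C spelling needs the parentheses that the `Array` branch inserts.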

### Declarations

Declarations (Declaration) represent top-level C/C++ constructs:

```mermaid
classDiagram
    class Declaration {
        <<protocol>>
        name: str | None
        location: SourceLocation | None
    }
    class Struct {
        fields: list[Field]
        is_union: bool
        is_typedef: bool
    }
    class Enum {
        values: list[EnumValue]
        is_typedef: bool
    }
    class Function {
        return_type: TypeExpr
        parameters: list[Parameter]
        is_variadic: bool
    }
    class Typedef {
        underlying_type: TypeExpr
    }
    class Variable {
        type: TypeExpr
    }
    class Constant {
        value: int | str | None
        is_macro: bool
    }

    Declaration <|-- Struct
    Declaration <|-- Enum
    Declaration <|-- Function
    Declaration <|-- Typedef
    Declaration <|-- Variable
    Declaration <|-- Constant
```

| Class | Represents |
|---|---|
| `Struct` | Structs, unions, and C++ classes |
| `Enum` | Enumerations with named constants |
| `Function` | Function prototypes |
| `Typedef` | Type aliases |
| `Variable` | Global/extern variables |
| `Constant` | `#define` macros and `const` values |

Header is the top-level container returned by all backends:

```python
from headerkit.ir import Header

# Header fields:
#   path: str                        -- original file path
#   declarations: list[Declaration]  -- all extracted declarations
#   included_headers: set[str]       -- basenames of included headers
```
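Code that consumes a `Header` usually walks `declarations` and branches on each declaration's class. The sketch below groups declaration names by class name. It uses tiny stand-in dataclasses named after the IR classes, carrying only the fields it touches, so it runs without headerkit. `summarize()` is a hypothetical helper. With the real IR you would more often write `isinstance(decl, Function)` and similar checks.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Stand-ins named after the IR classes; only the fields used below.
@dataclass
class Declaration:
    name: str | None


@dataclass
class Struct(Declaration):
    is_union: bool = False


@dataclass
class Function(Declaration):
    is_variadic: bool = False


@dataclass
class Constant(Declaration):
    value: int | str | None = None


@dataclass
class Header:
    path: str
    declarations: list[Declaration] = field(default_factory=list)
    included_headers: set[str] = field(default_factory=set)


def summarize(header: Header) -> dict[str, list[str]]:
    """Map each declaration class name to the names found in the header."""
    groups: dict[str, list[str]] = {}
    for decl in header.declarations:
        kind = type(decl).__name__
        groups.setdefault(kind, []).append(decl.name or "<anonymous>")
    return groups


header = Header(
    path="demo.h",
    declarations=[Struct("point"), Function("distance"), Constant("MAX_LEN", 256), Struct(None, is_union=True)],
)
print(summarize(header))
# {'Struct': ['point', '<anonymous>'], 'Function': ['distance'], 'Constant': ['MAX_LEN']}
```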

## Layer 3: Writers (Output)

A writer implements the WriterBackend protocol and converts IR into a string output:

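```python
from typing import Protocol

from headerkit.ir import Header


# Protocol definition, shown for reference.
# In your own code, import it with `from headerkit.writers import WriterBackend`.
class WriterBackend(Protocol):
    def write(self, header: Header) -> str: ...  # render the whole IR as one string

    @property
    def name(self) -> str: ...  # writer name

    @property
    def format_description(self) -> str: ...  # human-readable description of the output
```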

Writer-specific options (e.g., exclude_patterns for CFFI, indent for JSON) are constructor parameters on the concrete class, not part of the write() method signature.
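A minimal custom writer that follows this convention might look like the sketch below. `NamesWriter` and its `prefix` option are hypothetical. The `Function` and `Header` classes are simplified stand-ins so the snippet runs on its own; with headerkit, `Header` would come from `headerkit.ir`.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Function:  # stand-in: only the name is used here
    name: str


@dataclass
class Header:  # stand-in for headerkit.ir.Header
    path: str
    declarations: list = field(default_factory=list)


class NamesWriter:
    """Lists function names, one per line.

    The `prefix` option lives on the constructor, so write() keeps the
    protocol's single-argument signature.
    """

    def __init__(self, prefix: str = "") -> None:
        self.prefix = prefix

    def write(self, header: Header) -> str:
        names = [d.name for d in header.declarations if isinstance(d, Function)]
        return "\n".join(f"{self.prefix}{n}" for n in names)

    @property
    def name(self) -> str:
        return "names"

    @property
    def format_description(self) -> str:
        return "Plain list of function names"


h = Header("demo.h", [Function("open_db"), Function("close_db")])
print(NamesWriter(prefix="lib.").write(h))
```

Because options are fixed at construction time, a single configured writer instance can be reused across many headers without repeating its settings.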

### Built-in Writers

| Writer | Registry name | Output | Constructor options |
|---|---|---|---|
| `CffiWriter` | `cffi` (default) | CFFI `cdef` strings | `exclude_patterns: list[str] \| None` |
| `CtypesWriter` | `ctypes` | Python ctypes binding modules | `lib_name: str` |
| `CythonWriter` | `cython` | Cython `.pxd` declarations | (none) |
| `DiffWriter` | `diff` | API compatibility diff reports (JSON or Markdown) | `baseline: Header \| None`, `format: str` |
| `JsonWriter` | `json` | JSON serialization of IR | `indent: int \| None` |
| `LuaWriter` | `lua` | LuaJIT FFI bindings | (none) |
| `PromptWriter` | `prompt` | Token-optimized output for LLM context | `verbosity: str` |

### Writer Registry

Writers use the same registry pattern as backends:

```python
from headerkit.writers import register_writer

# MyWriterClass: your own class implementing the WriterBackend protocol
register_writer("mywriter", MyWriterClass, description="My custom output format")
```

Registry functions:

| Function | Description |
|---|---|
| `get_writer(name=None, **kwargs)` | Get a writer instance; `kwargs` are forwarded to the constructor |
| `list_writers()` | List all registered writer names |
| `is_writer_available(name)` | Check whether a writer is registered |
| `register_writer(name, cls)` | Register a new writer |
| `get_writer_info()` | Get metadata for all writers |

See Writing Custom Writers for a complete guide.

## Design Principles

**Parser-agnostic IR.** The IR does not leak backend-specific details. A `Struct` from libclang looks exactly the same as a `Struct` from any other backend, so writers work identically regardless of which backend produced the IR.

**Composable types.** Type expressions are recursive dataclasses that mirror how C types actually compose. For example, `const char**` is `Pointer(Pointer(CType("char", ["const"])))`, so no string parsing is needed.

**Best-effort output.** Writers silently skip declarations they cannot represent rather than raising exceptions. This makes the pipeline robust against headers with exotic constructs.
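The skip-instead-of-raise behaviour can be sketched as a loop that renders what it understands and passes over the rest. The stand-in classes and the `render_all()` helper are hypothetical. Returning the skipped names alongside the output is an assumption added for illustration: the built-in writers skip silently, and this sketch only makes the skips visible.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Constant:  # stand-in: a macro with an integer value
    name: str
    value: int


@dataclass
class Exotic:  # stand-in for any construct this writer cannot express
    name: str


def render_all(declarations: list) -> tuple[str, list[str]]:
    """Render supported declarations; collect the names of skipped ones."""
    lines: list[str] = []
    skipped: list[str] = []
    for decl in declarations:
        if isinstance(decl, Constant):
            lines.append(f"{decl.name} = {decl.value}")
        else:
            skipped.append(decl.name)  # skip instead of raising
    return "\n".join(lines), skipped


output, skipped = render_all([Constant("A", 1), Exotic("weird_template"), Constant("B", 2)])
print(output)
print(skipped)  # ['weird_template']
```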

**Self-registering plugins.** Both backends and writers register themselves at import time. Adding a new backend or writer requires no changes to headerkit's core code. Implement the protocol, call `register_backend()` or `register_writer()`, and your plugin becomes available through `get_backend()` or `get_writer()`.