Architecture Overview

headerkit is organized around a three-layer pipeline: backends parse C/C++ headers, producing an IR (Intermediate Representation), which writers consume to generate output.

The Pipeline

graph TD
    A["C/C++ Source Code"] --> B
    B["Backend<br>(ParserBackend protocol)"] --> C
    C["IR<br>(Header, Declaration, TypeExpr)"] --> D
    D["Writer<br>(WriterBackend protocol)"] --> E
    E["Output String<br>(CFFI cdef, ctypes, Cython .pxd, ...)"]

    B -.- B1["e.g., LibclangBackend"]
    D -.- D1["e.g., CffiWriter, CtypesWriter,<br>CythonWriter, LuaWriter, ..."]

Each layer is independent. Backends know nothing about writers. Writers know nothing about backends. The IR is the contract between them.
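
The separation described above can be sketched with toy stand-ins. Everything in this example (ToyFunction, ToyHeader, ToyBackend, ToyWriter) is hypothetical and much smaller than headerkit's real classes; it only illustrates that the backend and the writer meet through the IR object and never reference each other.

```python
from dataclasses import dataclass, field


# Stand-in IR: far smaller than headerkit's real Header/Declaration classes.
@dataclass
class ToyFunction:
    name: str
    return_type: str


@dataclass
class ToyHeader:
    path: str
    declarations: list = field(default_factory=list)


class ToyBackend:
    """Hypothetical parser: recognizes lines of the form '<type> <name>();'."""

    def parse(self, code: str, filename: str) -> ToyHeader:
        header = ToyHeader(path=filename)
        for line in code.splitlines():
            line = line.strip()
            if line.endswith("();"):
                # Split "int answer" into return type and function name.
                return_type, name = line[:-3].rsplit(" ", 1)
                header.declarations.append(ToyFunction(name, return_type))
        return header


class ToyWriter:
    """Hypothetical writer: emits one summary line per function."""

    def write(self, header: ToyHeader) -> str:
        return "\n".join(
            f"{decl.name} -> {decl.return_type}" for decl in header.declarations
        )


# The backend and the writer only meet through the IR object.
header = ToyBackend().parse("int answer();\nvoid reset();", "demo.h")
output = ToyWriter().write(header)
print(output)
```

Swapping ToyWriter for another class with the same write() method would not require touching ToyBackend, which is the property the real pipeline relies on.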

Layer 1: Backends (Parsing)

A backend implements the ParserBackend protocol and converts C/C++ source code into IR.

from typing import Protocol

from headerkit.ir import Header

# Also importable as `from headerkit import ParserBackend`.
class ParserBackend(Protocol):
    def parse(
        self,
        code: str,
        filename: str,
        include_dirs: list[str] | None = None,
        extra_args: list[str] | None = None,
        *,
        use_default_includes: bool = True,
        recursive_includes: bool = True,
        max_depth: int = 10,
        project_prefixes: tuple[str, ...] | None = None,
    ) -> Header: ...

    @property
    def name(self) -> str: ...

    @property
    def supports_macros(self) -> bool: ...

    @property
    def supports_cpp(self) -> bool: ...
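
Because ParserBackend is a protocol, a backend only needs to provide matching members; it does not have to inherit from anything. The sketch below is self-contained and uses stand-ins: Header is a reduced copy of headerkit.ir.Header, the protocol is trimmed to the positional parameters, and NullBackend is a hypothetical class. The runtime_checkable decorator is added here for demonstration only; whether headerkit's own protocol is runtime-checkable is not stated in this document.

```python
from dataclasses import dataclass, field
from typing import Optional, Protocol, runtime_checkable


@dataclass
class Header:
    """Stand-in for headerkit.ir.Header, reduced to two fields."""

    path: str
    declarations: list = field(default_factory=list)


@runtime_checkable  # assumption: added so isinstance() works in this sketch
class ParserBackend(Protocol):
    """Trimmed copy of the protocol above (keyword-only options omitted)."""

    def parse(
        self,
        code: str,
        filename: str,
        include_dirs: Optional[list] = None,
        extra_args: Optional[list] = None,
    ) -> Header: ...

    @property
    def name(self) -> str: ...

    @property
    def supports_macros(self) -> bool: ...

    @property
    def supports_cpp(self) -> bool: ...


class NullBackend:
    """Hypothetical backend that extracts no declarations at all."""

    def parse(self, code, filename, include_dirs=None, extra_args=None) -> Header:
        return Header(path=filename)

    @property
    def name(self) -> str:
        return "null"

    @property
    def supports_macros(self) -> bool:
        return False

    @property
    def supports_cpp(self) -> bool:
        return False


backend = NullBackend()
header = backend.parse("int x;", "demo.h")
# Structural check: NullBackend never subclasses ParserBackend.
print(isinstance(backend, ParserBackend))
```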

Built-in Backend: LibclangBackend

The LibclangBackend uses LLVM's libclang to parse headers. It provides:

  • Full C and C++ support (templates, namespaces, classes)
  • Preprocessor handling (#include, #define, #ifdef)
  • Source location tracking for error reporting
  • Recursive include processing for umbrella headers

from headerkit import get_backend

# Header source to parse; any C/C++ declarations work here.
code = "int add(int a, int b);"

backend = get_backend("libclang")
header = backend.parse(code, "myheader.h")

Backend Registry

Backends register themselves using register_backend():

from headerkit.backends import register_backend

register_backend("mybackend", MyBackendClass, is_default=False)

Registry functions:

| Function | Description |
| --- | --- |
| get_backend(name=None) | Get a backend instance (default if name is None) |
| list_backends() | List all registered backend names |
| is_backend_available(name) | Check if a backend is usable (real load test for libclang) |
| register_backend(name, cls) | Register a new backend |

See Writing Custom Backends for a complete guide.
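
A registry of this kind can be sketched in a few lines. The code below is an illustration of the pattern, not headerkit's implementation: the function names match the table above, but the internal dictionaries, the error types, and the rule that the first registered backend becomes the default are all assumptions made for this example.

```python
# Illustrative registry; internals and error handling are assumptions.
_REGISTRY: dict = {}
_STATE = {"default": None}  # name of the default backend, if any


def register_backend(name: str, cls: type, is_default: bool = False) -> None:
    """Record a backend class under a name."""
    if name in _REGISTRY:
        raise ValueError(f"backend already registered: {name}")
    _REGISTRY[name] = cls
    # Assumption: the first registration is the default until another claims it.
    if is_default or _STATE["default"] is None:
        _STATE["default"] = name


def get_backend(name=None):
    """Instantiate the named backend, or the default when name is None."""
    chosen = name if name is not None else _STATE["default"]
    if chosen not in _REGISTRY:
        raise KeyError(f"unknown backend: {chosen!r}")
    return _REGISTRY[chosen]()


def list_backends() -> list:
    """Return all registered names in sorted order."""
    return sorted(_REGISTRY)


class EchoBackend:
    """Hypothetical backend used only to exercise the registry."""


class OtherBackend:
    """Second hypothetical backend, registered as the default."""


register_backend("echo", EchoBackend)
register_backend("other", OtherBackend, is_default=True)

print(list_backends())
print(type(get_backend()).__name__)
```

Storing classes rather than instances means each get_backend() call returns a fresh object, so callers do not share parser state.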

Layer 2: IR (Intermediate Representation)

The IR is a tree of Python dataclasses rooted at Header. It is designed to be parser-agnostic: any backend that can parse C/C++ can produce the same IR.

Type Expressions

Type expressions (TypeExpr) represent C types as composable trees:

classDiagram
    class TypeExpr {
        <<protocol>>
    }
    class CType {
        name: str
        qualifiers: list[str]
    }
    class Pointer {
        pointee: TypeExpr
        qualifiers: list[str]
    }
    class Array {
        element_type: TypeExpr
        size: int | None
    }
    class FunctionPointer {
        return_type: TypeExpr
        parameters: list[Parameter]
        is_variadic: bool
    }

    TypeExpr <|-- CType
    TypeExpr <|-- Pointer
    TypeExpr <|-- Array
    TypeExpr <|-- FunctionPointer
    Pointer --> TypeExpr : pointee
    Array --> TypeExpr : element_type
    FunctionPointer --> TypeExpr : return_type

| Class | Represents | Example |
| --- | --- | --- |
| CType | Base type with qualifiers | int, const char, unsigned long |
| Pointer | Pointer to another type | int*, const char*, void** |
| Array | Fixed or flexible array | int[10], char[] |
| FunctionPointer | Function pointer | void (*)(int, char*) |

Types compose naturally:

from headerkit import CType, Pointer, Array

# const char*
const_char_ptr = Pointer(CType("char", ["const"]))

# int**
int_ptr_ptr = Pointer(Pointer(CType("int")))

# const char*[]
string_array = Array(Pointer(CType("char", ["const"])))
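
Because the types form a tree, consumers can process them with a short recursive function. The sketch below is self-contained: CType, Pointer, and Array are stand-in dataclasses that copy the field names from the diagram above, and render() is a hypothetical helper, not part of headerkit. It ignores declarator precedence, so it would print a pointer to an array incorrectly, and it does not handle FunctionPointer.

```python
from dataclasses import dataclass, field
from typing import Optional


# Stand-ins mirroring the field names shown in the class diagram.
@dataclass
class CType:
    name: str
    qualifiers: list = field(default_factory=list)


@dataclass
class Pointer:
    pointee: object
    qualifiers: list = field(default_factory=list)


@dataclass
class Array:
    element_type: object
    size: Optional[int] = None


def render(t) -> str:
    """Hypothetical helper: turn a type tree into a C-like string."""
    if isinstance(t, CType):
        # Qualifiers come first: ["const"] + "char" -> "const char".
        return " ".join([*t.qualifiers, t.name])
    if isinstance(t, Pointer):
        suffix = "".join(f" {q}" for q in t.qualifiers)
        return f"{render(t.pointee)}*{suffix}"
    if isinstance(t, Array):
        size = "" if t.size is None else str(t.size)
        return f"{render(t.element_type)}[{size}]"
    raise TypeError(f"unsupported type node: {type(t).__name__}")


const_char_ptr = Pointer(CType("char", ["const"]))
int_ptr_ptr = Pointer(Pointer(CType("int")))
string_array = Array(Pointer(CType("char", ["const"])))

for t in (const_char_ptr, int_ptr_ptr, string_array):
    print(render(t))
```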

Declarations

Declarations (Declaration) represent top-level C/C++ constructs:

classDiagram
    class Declaration {
        <<protocol>>
        name: str | None
        location: SourceLocation | None
    }
    class Struct {
        fields: list[Field]
        is_union: bool
        is_typedef: bool
    }
    class Enum {
        values: list[EnumValue]
        is_typedef: bool
    }
    class Function {
        return_type: TypeExpr
        parameters: list[Parameter]
        is_variadic: bool
    }
    class Typedef {
        underlying_type: TypeExpr
    }
    class Variable {
        type: TypeExpr
    }
    class Constant {
        value: int | str | None
        is_macro: bool
    }

    Declaration <|-- Struct
    Declaration <|-- Enum
    Declaration <|-- Function
    Declaration <|-- Typedef
    Declaration <|-- Variable
    Declaration <|-- Constant

| Class | Represents |
| --- | --- |
| Struct | Structs, unions, and C++ classes |
| Enum | Enumerations with named constants |
| Function | Function prototypes |
| Typedef | Type aliases |
| Variable | Global/extern variables |
| Constant | #define macros and const values |

The Header Container

Header is the top-level container returned by all backends:

from headerkit.ir import Header

# Header fields:
#   path: str                        -- original file path
#   declarations: list[Declaration]  -- all extracted declarations
#   included_headers: set[str]       -- basenames of included headers
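
Consumers typically walk header.declarations and branch on the declaration class. The sketch below uses reduced stand-ins for Header, Struct, and Function (only the fields needed here) and a hypothetical summarize() helper. It also shows one way to handle the name: str | None case from the Declaration diagram.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


# Reduced stand-ins: real declarations carry more fields than a name.
@dataclass
class Struct:
    name: Optional[str]


@dataclass
class Function:
    name: Optional[str]


@dataclass
class Header:
    path: str
    declarations: list = field(default_factory=list)
    included_headers: set = field(default_factory=set)


def summarize(header: Header) -> dict:
    """Hypothetical helper: group declaration names by declaration class."""
    groups = defaultdict(list)
    for decl in header.declarations:
        # Anonymous declarations have name=None.
        groups[type(decl).__name__].append(decl.name or "<anonymous>")
    return dict(groups)


header = Header(
    path="shapes.h",
    declarations=[Struct("point"), Struct(None), Function("area")],
    included_headers={"math.h"},
)
summary = summarize(header)
print(summary)
```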

Layer 3: Writers (Output)

A writer implements the WriterBackend protocol and converts IR into a string output:

from typing import Protocol

from headerkit.ir import Header

# Also importable as `from headerkit.writers import WriterBackend`.
class WriterBackend(Protocol):
    def write(self, header: Header) -> str: ...

    @property
    def name(self) -> str: ...

    @property
    def format_description(self) -> str: ...

Writer-specific options (e.g., exclude_patterns for CFFI, indent for JSON) are constructor parameters on the concrete class, not part of the write() method signature.
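
The split between constructor options and write() can be sketched as follows. NameListWriter and its separator and sort options are hypothetical, and Header and Function are reduced stand-ins. The point is only that configuration lives on the instance, while write() keeps the single-argument signature required by the protocol.

```python
from dataclasses import dataclass, field
from typing import Optional


# Reduced stand-ins for the IR classes.
@dataclass
class Function:
    name: Optional[str]


@dataclass
class Header:
    path: str
    declarations: list = field(default_factory=list)


class NameListWriter:
    """Hypothetical writer that outputs declaration names only."""

    def __init__(self, separator: str = "\n", sort: bool = False) -> None:
        # Options are captured once, at construction time.
        self._separator = separator
        self._sort = sort

    def write(self, header: Header) -> str:
        names = [d.name for d in header.declarations if d.name is not None]
        if self._sort:
            names.sort()
        return self._separator.join(names)

    @property
    def name(self) -> str:
        return "namelist"

    @property
    def format_description(self) -> str:
        return "Declaration names only"


header = Header("demo.h", [Function("zeta"), Function("alpha")])

default_output = NameListWriter().write(header)
sorted_output = NameListWriter(separator=", ", sort=True).write(header)
print(sorted_output)
```

Keeping write() uniform lets calling code treat every writer the same way, regardless of how each one was configured.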

Built-in Writers

| Writer | Registry Name | Output | Constructor Options |
| --- | --- | --- | --- |
| CffiWriter | cffi (default) | CFFI cdef strings | exclude_patterns: list[str] \| None |
| CtypesWriter | ctypes | Python ctypes binding modules | lib_name: str |
| CythonWriter | cython | Cython .pxd declarations | -- |
| DiffWriter | diff | API compatibility diff reports (JSON or Markdown) | baseline: Header \| None, format: str |
| JsonWriter | json | JSON serialization of IR | indent: int \| None |
| LuaWriter | lua | LuaJIT FFI bindings | -- |
| PromptWriter | prompt | Token-optimized output for LLM context | verbosity: str |

Writer Registry

Writers use the same registry pattern as backends:

from headerkit.writers import register_writer

register_writer("mywriter", MyWriterClass, description="My custom output format")

Registry functions:

| Function | Description |
| --- | --- |
| get_writer(name=None, **kwargs) | Get a writer instance; kwargs forwarded to constructor |
| list_writers() | List all registered writer names |
| is_writer_available(name) | Check if a writer is registered |
| register_writer(name, cls) | Register a new writer |
| get_writer_info() | Get metadata for all writers |

See Writing Custom Writers for a complete guide.

Design Principles

Parser-agnostic IR. The IR does not leak backend-specific details. A Struct from libclang looks exactly the same as a Struct from any other backend. This means writers work identically regardless of which backend produced the IR.

Composable types. Type expressions are recursive dataclasses that mirror how C types actually compose. const char** is Pointer(Pointer(CType("char", ["const"]))) -- no string parsing needed.

Best-effort output. Writers silently skip declarations they cannot represent rather than raising exceptions. This makes the pipeline robust against headers with exotic constructs.
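
A minimal sketch of the best-effort approach, assuming a hypothetical PrototypeWriter that understands only functions and variables. The IR classes are reduced stand-ins that store types as plain strings for brevity (the real IR uses TypeExpr trees), and Template is an invented declaration kind standing in for an exotic construct.

```python
from dataclasses import dataclass, field


# Reduced stand-ins; types are plain strings here, unlike the real IR.
@dataclass
class Function:
    name: str
    return_type: str


@dataclass
class Variable:
    name: str
    type: str


@dataclass
class Template:
    """Invented declaration kind that the writer below cannot represent."""

    name: str


@dataclass
class Header:
    path: str
    declarations: list = field(default_factory=list)


class PrototypeWriter:
    """Hypothetical writer that emits what it can and skips the rest."""

    def write(self, header: Header) -> str:
        lines = []
        for decl in header.declarations:
            if isinstance(decl, Function):
                lines.append(f"{decl.return_type} {decl.name}(void);")
            elif isinstance(decl, Variable):
                lines.append(f"extern {decl.type} {decl.name};")
            # Any other declaration kind is skipped without raising.
        return "\n".join(lines)


header = Header(
    "demo.h",
    [Function("reset", "void"), Template("vec"), Variable("count", "int")],
)
output = PrototypeWriter().write(header)
print(output)
```

The trade-off is that skipped declarations leave no trace in the output, so a caller who needs completeness has to compare the result against header.declarations.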

Self-registering plugins. Both backends and writers register themselves at import time. Adding a new backend or writer requires zero changes to headerkit's core code. Just implement the protocol, call register_backend() or register_writer(), and your plugin is available through get_backend() or get_writer().