Writing Custom Backends¶
headerkit's backend system is pluggable. You can write your own parser backend to support alternative parsing strategies (tree-sitter, pycparser, hand-written parsers) and register it alongside the built-in LibclangBackend.
The ParserBackend Protocol¶
Every backend must implement the ParserBackend protocol:
from headerkit.ir import Header, ParserBackend
class ParserBackend(Protocol):
def parse(
self,
code: str,
filename: str,
include_dirs: list[str] | None = None,
extra_args: list[str] | None = None,
*,
use_default_includes: bool = True,
recursive_includes: bool = True,
max_depth: int = 10,
project_prefixes: tuple[str, ...] | None = None,
) -> Header: ...
@property
def name(self) -> str: ...
@property
def supports_macros(self) -> bool: ...
@property
def supports_cpp(self) -> bool: ...
Method and Property Details¶
parse(code, filename, ...) -- Parse C/C++ source code and return a Header containing all extracted declarations. The code parameter is the source text (not a file path). The filename is used for error messages and #line directives; it does not need to exist on disk.
name -- A human-readable name for the backend (e.g., "tree-sitter"). This is the string users pass to get_backend().
supports_macros -- Whether this backend can extract #define constants as Constant declarations.
supports_cpp -- Whether this backend can parse C++ code (classes, templates, namespaces).
Registering a Backend¶
Use register_backend() to add your backend to the global registry:
from headerkit.backends import register_backend
register_backend("mybackend", MyBackend, is_default=False)
Parameters:
name-- The lookup key forget_backend(name)backend_class-- The class implementingParserBackendis_default-- IfTrue, this becomes the default backend returned byget_backend()with no arguments
Registration timing
Registration must happen at import time (module level), not inside a function. The registry is populated lazily when get_backend() or list_backends() is first called.
Example: Tree-sitter Backend Skeleton¶
Here is a skeleton for a backend that uses tree-sitter to parse C headers:
"""Tree-sitter based parser backend for headerkit."""
from __future__ import annotations
from headerkit.backends import register_backend
from headerkit.ir import (
CType,
Enum,
EnumValue,
Field,
Function,
Header,
Parameter,
Pointer,
SourceLocation,
Struct,
Typedef,
)
class TreeSitterBackend:
"""Parser backend using tree-sitter-c."""
@property
def name(self) -> str:
return "tree-sitter"
@property
def supports_macros(self) -> bool:
return False # Would need separate macro handling
@property
def supports_cpp(self) -> bool:
return False # tree-sitter-c handles C only
def parse(
self,
code: str,
filename: str,
include_dirs: list[str] | None = None,
extra_args: list[str] | None = None,
*,
use_default_includes: bool = True,
recursive_includes: bool = True,
max_depth: int = 10,
project_prefixes: tuple[str, ...] | None = None,
) -> Header:
import tree_sitter_c as tsc
from tree_sitter import Language, Parser
parser = Parser(Language(tsc.language()))
tree = parser.parse(code.encode())
declarations = []
for node in tree.root_node.children:
decl = self._convert_node(node, filename)
if decl is not None:
declarations.append(decl)
return Header(path=filename, declarations=declarations)
def _convert_node(self, node, filename):
"""Convert a tree-sitter node to an IR declaration.
This is where the bulk of the work goes: mapping tree-sitter's
concrete syntax tree nodes to headerkit IR types.
"""
# Implementation would handle:
# - "struct_specifier" -> Struct
# - "enum_specifier" -> Enum
# - "function_definition" / "declaration" -> Function
# - "type_definition" -> Typedef
# Each node type needs its own conversion logic.
return None
# Self-register at import time
try:
import tree_sitter_c # noqa: F401
register_backend("tree-sitter", TreeSitterBackend)
except ImportError:
pass # tree-sitter-c not installed; backend not available
Using Your Backend¶
Once registered, your backend is available through the standard API:
from headerkit import get_backend, list_backends
# List all available backends
print(list_backends()) # ['libclang', 'tree-sitter']
# Use your backend explicitly
backend = get_backend("tree-sitter")
header = backend.parse(code, "example.h")
Producing Correct IR¶
When implementing a backend, pay attention to these IR conventions:
Typedefs vs. Tagged Types¶
When C code uses typedef struct { ... } Name;, the IR should produce a Struct with is_typedef=True. This tells writers to emit the typedef form rather than a bare struct declaration.
Anonymous Types¶
Set name=None for truly anonymous structs, enums, or unions. The built-in writers will skip anonymous types that cannot be referenced.
Source Locations¶
Populate SourceLocation on declarations when possible. This enables filtering by file (to exclude system headers) and better error messages:
Type Composition¶
Build types from the inside out. For const char **:
from headerkit import CType, Pointer
const_char = CType("char", ["const"])
const_char_ptr = Pointer(const_char)
const_char_ptr_ptr = Pointer(const_char_ptr)
Testing Your Backend¶
Test your backend by comparing its output against the built-in LibclangBackend for the same input:
from headerkit import get_backend
code = """
struct Point {
int x;
int y;
};
int add(int a, int b);
"""
libclang = get_backend("libclang")
custom = get_backend("my-backend")
expected = libclang.parse(code, "test.h")
actual = custom.parse(code, "test.h")
# Compare declaration counts and types
assert len(actual.declarations) == len(expected.declarations)
for exp, act in zip(expected.declarations, actual.declarations):
assert type(exp) == type(act)
assert exp.name == act.name