Instruments that help people leverage AI
Context engineering. Retrieval workflows. Reproducible benchmarking. Spec management for the humans and agents doing the work.
One sentence of tool-selection guidance eliminated a 13-point accuracy penalty from over-tooling.
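What that looks like in practice: a one-sentence steer inside the tool's description telling the model when not to reach for it. Below is a minimal sketch in the style of an MCP tool manifest; the tool name and the guidance sentence are hypothetical stand-ins, not the wording from the study.

```python
# Hypothetical MCP-style tool entry. The final sentence is the kind of
# tool-selection guidance the finding refers to; the exact wording used
# in the study is not reproduced here.
MODEL_SEARCH_TOOL = {
    "name": "model_search",
    "description": (
        "Full-text search over SysML v2 model elements. "
        # One sentence of tool-selection guidance:
        "Use this only when you do not already know the element's "
        "qualified name; otherwise read the element directly."
    ),
}
```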
Pre-rendered model views scored 0.893 vs 0.558 for agent-assembled context. d=1.01, N=10. 4x cheaper.
Exploratory study, single corpus, N=3-10 replications. Full methodology and threats to validity →
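For readers weighing the d=1.01 above: that is Cohen's d, the difference in group means divided by the pooled standard deviation. A minimal sketch of the conventional computation (the function is ours, not part of the benchmark harness):

```python
from statistics import mean, stdev

def cohens_d(a: list[float], b: list[float]) -> float:
    """Cohen's d: difference in means over the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / pooled_var ** 0.5
```

By the usual rule of thumb, d above 0.8 counts as a large effect, though with N=10 the interval around the estimate is wide.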
What we build
Structural retrieval, graph traversal, and completeness checking for SysML v2 models. Rust. 14 commands, 10 MCP tools.
Reproducible evaluation of tool-augmented LLMs on structured engineering tasks. Python. 132 tasks, 5 models.
Tracks work through an orient-plan-agree-execute-reflect-report lifecycle. Task DAGs, propagation, stakeholder dispositions. Go.
Four primitives for LLM-correct codebases. Derived obligations, prescriptive failure (see the sketch after this list), bundled enforcement, vacuity detection.
Tree-sitter grammar for SysML v2. 192 tests, 89% external file coverage. 6 language bindings.
Converts OMG KeBNF specs to ANTLR4 and tree-sitter. Parses all 640 KerML + SysML v2 rules. Rust.
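One of those primitives deserves a concrete illustration. Prescriptive failure, as we use the term, means an error that tells the agent what to do next rather than only what broke. A minimal Python sketch; the names here are invented for illustration, not taken from the library:

```python
class PrescriptiveError(ValueError):
    """A failure message that names the fix, not just the fault."""

def resolve_part(parts: dict[str, dict], name: str) -> dict:
    """Look up a model part, failing prescriptively on a miss."""
    if name not in parts:
        known = ", ".join(sorted(parts)) or "<none>"
        # A descriptive failure would stop at "part not found".
        # A prescriptive failure names the valid next moves.
        raise PrescriptiveError(
            f"No part named {name!r}. Known parts: {known}. "
            "Retry with one of those exact names, or define the part first."
        )
    return parts[name]
```

The payoff is that an agent reading the message can recover in one step instead of guessing.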
How this started
This work started as an academic exploration of how AI interacts with structured engineering artifacts. We built tools, ran benchmarks, wrote papers. Along the way, we found common ground with GitLab's Knowledge Graph team, who are solving related problems in context engineering and retrieval at production scale. We've been contributing findings on prescriptive failure patterns and tool-description effectiveness to their eval methodology.
Everything is MIT-licensed and on GitLab. If you work with engineering models and are curious about how AI performs on them, we'd like to talk.