Instruments that help people leverage AI

Context engineering. Retrieval workflows. Reproducible benchmarking. Spec management for the humans and agents doing the work.

Domain: MBSE (SysML v2 toolchain + benchmark); SDLC graphs (GitLab Knowledge Graph eval); code repair (feedback signal primitives).
Evaluation apparatus: define tasks (structured engineering queries); build tools (parsers, CLI, MCP servers); vary conditions (models x tools x representations); measure (per-field scoring, statistics); replicate (N trials, ablations, baselines).
Open source | open data | open licensed. Designed for independent replication. Designed to foster community exploration.
Discoveries (where is leverage?): representation > retrieval; examples essential in tools; precision > brevity.
Outputs: papers (preprint, submission); MRs (upstream); open tools (CLI, bench, data); community (open method).
Discoveries inform what to explore next.
Benchmarks: 2 (SysML v2 + GitLab KG)
Tasks evaluated: 194 (132 SysML + 62 SDLC)
Experimental conditions: 50+ (across 5 models)
Papers: 7 (5 preprints + 2 venues)
The consistent finding

How you present information to the model matters more than how you retrieve it.

Retrieval interventions (vector search, graph traversal, planning tools) produced null results. Representation interventions (pre-rendered views, tool selection guidance) produced large, replicable effects at a fraction of the cost.

Exploratory study, single corpus, N=3-10 replications. Full methodology and threats →
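
As a rough illustration of the kind of comparison described above, the Python sketch below crosses models, tool sets and representations into conditions, scores an answer field by field, and summarizes replicated trials. Every name in it (MODELS, TOOLSETS, REPRESENTATIONS, per_field_score, summarize) and the exact-match scoring rule are assumptions made for illustration; this is not the project's actual code, conditions, or scoring method.

```python
from itertools import product
from statistics import mean

# Hypothetical condition axes: placeholder names, not the project's real conditions.
MODELS = ["model_a", "model_b"]
TOOLSETS = ["cli", "mcp"]
REPRESENTATIONS = ["raw", "pre_rendered"]


def conditions():
    """Cross every model with every tool set and every representation."""
    return list(product(MODELS, TOOLSETS, REPRESENTATIONS))


def per_field_score(expected: dict, actual: dict) -> float:
    """Fraction of expected fields that the answer matches exactly (assumed rule)."""
    if not expected:
        return 0.0
    hits = sum(1 for key, value in expected.items() if actual.get(key) == value)
    return hits / len(expected)


def summarize(trial_scores: list) -> dict:
    """Mean and range over the replicated trials of a single condition."""
    return {
        "n": len(trial_scores),
        "mean": mean(trial_scores),
        "min": min(trial_scores),
        "max": max(trial_scores),
    }
```

Comparing the summaries of two conditions that differ only in representation, or only in retrieval tooling, is one simple way to see which intervention moves the score; a real analysis would add the statistics, ablations and baselines mentioned above.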

What we build

How this started

This work started as an academic exploration of how AI interacts with structured engineering artifacts. We built tools, ran benchmarks, and wrote papers. Along the way, we found alignment with GitLab's Knowledge Graph team, who are solving related problems in context engineering and retrieval at production scale. We've been contributing findings on prescriptive failure patterns and tool description effectiveness to their eval methodology.

Everything is MIT-licensed and on GitLab. If you work with engineering models and are curious about how AI performs on them, we'd like to talk.