Papers
All paper repos include source, data, and build tooling. Each repo will be made public when its paper is submitted.
Representation Over Retrieval
How information presentation affects AI performance on structured engineering tasks. The anchor paper for our benchmark findings.
Beyond the Mean
Per-task analysis reveals hidden structure in tool-augmented LLM evaluation. Why aggregate scores hide the interesting signal.
sysml-bench
Evaluating tool-augmented LLMs on SysML v2 model comprehension. Methodology, tasks, scoring, replication.
Grammar Conversion at Scale
From specification notation to parser generators for SysML v2. How kebnf bridges the gap between OMG specs and working parsers.
Context Asymmetry Is a Representation Problem
Disposition graphs for AI-augmented collaborative development. The formal model behind synthesist's stakeholder tracking.
Anti-Vacuity Enforcement by Construction
A test framework pattern for LLM agents. Currently under double-anonymous review.
AI-Assisted Systems Engineering with SysML v2
Applying the benchmark methodology to defense ground vehicle systems modeling.