Papers

All paper repos include source, data, and build tooling. Each repo will be made public when its paper is submitted.

arXiv cs.SE Draft

How information presentation affects AI performance on structured engineering tasks. The anchor paper for our benchmark findings.

arXiv cs.SE Draft

Per-task analysis reveals hidden structure in tool-augmented LLM evaluation. Why aggregate scores hide the interesting signal.

arXiv cs.SE Draft

Evaluating tool-augmented LLMs on SysML v2 model comprehension. Methodology, tasks, scoring, replication.

arXiv cs.SE Draft

From specification notation to parser generators for SysML v2. How kebnf bridges the gap between OMG specs and working parsers.

arXiv cs.SE Draft

Disposition graphs for AI-augmented collaborative development. The formal model behind synthesist's stakeholder tracking.

ICSE NIER 2026 Under review

A test framework pattern for LLM agents. Currently under double-anonymous review.

GVSETS 2026 Submitted

Applying the benchmark methodology to defense ground vehicle systems modeling.