Nomograph Labs
Composable tooling for AI to interact with engineering models. Benchmarks to measure AI performance. Notes on what the results mean.
What we're finding
We ran 132 benchmark tasks across 4 models and 40+ experimental conditions on a SysML v2 corpus, trying to understand what tool configurations help LLMs comprehend structured engineering models.
The early signal that keeps showing up: how you present information to the model seems to matter more than how you retrieve it. Retrieval interventions (vector search, graph traversal, planning tools) produced null results on our benchmark. Representation interventions (pre-rendered views, tool selection guidance, step-by-step retrieval in the right form) produced large effects. We're still working through what that means.
One sentence of tool selection guidance eliminated a 13-point accuracy penalty from over-tooling.
Pre-rendered model views scored 0.873 versus 0.490 for agent-assembled context, a 38-point gap.
Exploratory study, single corpus, 14 observations. Full methodology and results at nomograph.ai/results.
Projects
sysml
CLI tool for SysML v2 with MCP server built in. Structural retrieval, graph traversal, and completeness checking for AI on systems models.
sysml-bench
Benchmark harness for AI on SysML v2 tasks. Reproducible evaluation across models, tool configurations, and corpus scales.
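The blurb above describes reproducible evaluation across models and tool configurations. A minimal sketch of what such an evaluation loop can look like is below; every name in it (`Task`, `run_task`, the model and config labels) is hypothetical and illustrative, not sysml-bench's actual API:

```python
# Hypothetical sketch of a benchmark-harness loop: score each
# (model, tool configuration) pair on a fixed task set.
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Task:
    prompt: str
    expected: str

def run_task(model: str, config: str, task: Task) -> str:
    # Stand-in for a real model call. An actual harness would invoke
    # the model with the given tool configuration and return its answer;
    # here one config "succeeds" so the loop has something to score.
    return task.expected if config == "pre-rendered" else ""

def evaluate(models, configs, tasks):
    """Return mean accuracy per (model, config) cell."""
    results = {}
    for model, config in product(models, configs):
        correct = sum(run_task(model, config, t) == t.expected for t in tasks)
        results[(model, config)] = correct / len(tasks)
    return results

TASKS = [Task("Which part owns port p1?", "engine")]
scores = evaluate(["model-a"], ["pre-rendered", "agent-assembled"], TASKS)
```

The point of the structure is the cross-product: every model sees every tool configuration on the same tasks, so accuracy differences are attributable to the configuration rather than the task mix.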
tree-sitter-sysml
Tree-sitter grammar for SysML v2. The parsing foundation for all Nomograph tooling.
Interested?
There are more formal languages than one group can cover. If you work with engineering models and are curious about how AI performs on them, or if you just find this kind of measurement interesting, we'd like to talk. Everything is MIT-licensed and on GitLab.
gitlab.com/nomograph →