Nomograph Labs
Composable tooling for AI to interact with engineering models. Benchmarks to measure AI performance. Notes on what the results mean.
What we're finding
We ran 132 benchmark tasks across 4 models and 40+ experimental conditions on a SysML v2 corpus, trying to understand what tool configurations help LLMs comprehend structured engineering models.
The early signal that keeps showing up: how you present information to the model seems to matter more than how you retrieve it. Retrieval interventions (vector search, graph traversal, planning tools) produced null results on our benchmark. Representation interventions (pre-rendered views, tool selection guidance, step-by-step retrieval in the right form) produced large effects. We're still working through what that means.
One sentence of tool selection guidance eliminated a 13-point accuracy penalty from over-tooling.
Pre-rendered model views scored 0.873 versus 0.490 for agent-assembled context, a 38-point gap.
Exploratory study, single corpus, 14 observations. Full methodology and results at nomograph.ai/results.
Projects
sysml
CLI tool for SysML v2 with MCP server built in. Structural retrieval, graph traversal, and completeness checking for AI on systems models.
sysml-bench
Benchmark harness for AI on SysML v2 tasks. Reproducible evaluation across models, tool configurations, and corpus scales.
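The blurb above describes reproducible evaluation across models and tool configurations. A minimal sketch of what such an evaluation loop can look like is below; every name in it (`Task`, `run_task`, the model and config labels) is hypothetical and illustrative, not sysml-bench's actual API:

```python
# Hypothetical sketch of a benchmark-harness loop: score each
# (model, tool configuration) pair on a fixed task set.
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Task:
    prompt: str
    expected: str

def run_task(model: str, config: str, task: Task) -> str:
    # Stand-in for a real model call. An actual harness would invoke
    # the model with the given tool configuration and return its answer;
    # here one config "succeeds" so the loop has something to score.
    return task.expected if config == "pre-rendered" else ""

def evaluate(models, configs, tasks):
    """Return mean accuracy per (model, config) cell."""
    results = {}
    for model, config in product(models, configs):
        correct = sum(run_task(model, config, t) == t.expected for t in tasks)
        results[(model, config)] = correct / len(tasks)
    return results

TASKS = [Task("Which part owns port p1?", "engine")]
scores = evaluate(["model-a"], ["pre-rendered", "agent-assembled"], TASKS)
```

The point of the structure is the cross-product: every model sees every tool configuration on the same tasks, so accuracy differences are attributable to the configuration rather than the task mix.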
tree-sitter-sysml
Tree-sitter grammar for SysML v2. The parsing foundation for all Nomograph tooling.
Interested?
There are more formal languages than one group can cover. If you work with engineering models and are curious about how AI performs on them, or if you just find this kind of measurement interesting, we'd like to talk. Everything is MIT-licensed and on GitLab.
gitlab.com/nomograph →