Nomograph Labs
Composable tooling for AI to interact with engineering models. Benchmarks to measure AI performance. Exploring the results.
AADL
OSCAL
KiCad
graph · render
inspect · plan
We built a benchmark harness, ran 132 tasks across 4 models and 40+ experimental conditions on a SysML v2 corpus, and are methodically working through what the results mean for tool-augmented AI on engineering models.
The consistent finding so far: how you present information to the model matters more than how you retrieve it. Retrieval interventions (vector search, graph traversal, planning tools) produced null results. Representation interventions (pre-rendered views, tool selection guidance) produced large, replicable effects, at a fraction of the cost. A schema ablation identified the specific mechanism behind one apparent tool penalty. The methodology, the data, and all tooling are open source.
One sentence of tool selection guidance eliminated a 13-point accuracy penalty from over-tooling.
Pre-rendered model views scored 0.893 vs 0.558 for agent-assembled context (d=1.01, N=10), at 4× lower cost.
Exploratory study: single corpus, N=3–10 replications per condition. Full methodology, statistical context, and threats to validity at nomograph.ai/results.
sysml
CLI tool for SysML v2 with MCP server built in. Structural retrieval, graph traversal, and completeness checking for AI on systems models.
sysml-bench
Benchmark harness for AI on SysML v2 tasks. Reproducible evaluation across models, tool configurations, and corpus scales.
tree-sitter-sysml
Tree-sitter grammar for SysML v2. The parsing foundation for all Nomograph tooling.
There are more formal engineering languages than one group can cover. If you work with engineering models and are curious how AI performs on them, or if you simply find this kind of measurement interesting, we'd like to talk. Everything is MIT-licensed and on GitLab.
[email protected] · gitlab.com/nomograph →