Projects
We are exploring a methodology for making formal engineering languages legible to AI. Pick a language, build a parser, build composable CLI tooling on top of it, build a benchmark to measure what the tooling actually does for AI performance. Each iteration produces a tool others can use, a benchmark others can extend, and observations about where AI succeeds and fails on that language.
sysml
| Metric | Value |
|---|---|
| CLI commands | 14 |
| MCP tools | 10 |
| Tests | 123 |
| Token reduction | 94% avg |
| Language | Rust |
| License | MIT |
Rust CLI tool with an MCP server built in. Indexes .sysml repositories into a persistent knowledge graph and exposes it through 14 CLI commands and 10 MCP tools. Single binary, dual mode: run it as a CLI for scripting and benchmarking, or as an MCP server for editor and agent integration.
9-signal hybrid index: keyword scoring (8 signals, including exact match, prefix, containment, vocabulary expansion, and relationship adjacency) plus fastembed all-MiniLM-L6-v2 vector search (384-dim, HNSW), over 27 SysML v2 structural relationship types. Achieves 94% average token reduction versus raw file injection.
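As a rough illustration of how a hybrid index blends keyword and vector signals, here is a minimal Python sketch. The three keyword signals shown, their weights, and the linear blend are assumptions for illustration only, not sysml's actual scoring:

```python
from math import sqrt

def keyword_score(query: str, name: str) -> float:
    # Illustrative subset of keyword signals (the real index uses 8)
    q, n = query.lower(), name.lower()
    score = 0.0
    if q == n:
        score += 3.0   # exact match
    if n.startswith(q):
        score += 2.0   # prefix match
    if q in n:
        score += 1.0   # containment
    return score

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity between two embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_score(query: str, name: str,
                 query_vec: list[float], name_vec: list[float],
                 kw_weight: float = 0.6) -> float:
    # Linear blend of keyword and vector scores; the weight is an assumption
    return kw_weight * keyword_score(query, name) + \
           (1 - kw_weight) * cosine(query_vec, name_vec)
```

Ranking candidates by a blended score like this is what lets an exact keyword hit dominate while vector similarity still surfaces semantically related elements.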
Install

```
cargo install --path crates/sysml-cli
```

Key commands
| Command | Description |
|---|---|
| sysml search | Hybrid keyword + vector search across indexed models |
| sysml trace | Follow structural relationships (specialization, usage, allocation) |
| sysml render | Pre-render element views for LLM consumption |
| sysml check | Completeness checking against SysML v2 structural rules |
| sysml inspect | Detailed element metadata and relationship graph |
sysml-bench
| Metric | Value |
|---|---|
| Tasks | 132 |
| Models | 4 |
| Conditions | 40+ |
| Observations | 14 |
| Language | Python |
| License | MIT |
Evaluation harness measuring how CLI tool configurations affect LLM accuracy on structured systems engineering tasks. 132 tasks across 8 categories: discovery, reasoning, explanation, layer, boundary, vector-sensitive, structural trace, and corpus scaling.
Per-field structured scoring (Bool, Float, Str, ListStr F1 with threshold). Corpus: Eve Online Mining Frigate SysML v2 model, 19 files, 798 elements, 1,515 relationships. Scaling corpus: 95 files.
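A per-field rubric of this shape can be sketched as follows; the string normalization, float tolerance, and 0.8 F1 pass threshold are illustrative assumptions, not sysml-bench's actual scoring code:

```python
def score_bool(expected: bool, actual: bool) -> float:
    return 1.0 if expected == actual else 0.0

def score_float(expected: float, actual: float, tol: float = 1e-3) -> float:
    # Tolerance value is an assumption for illustration
    return 1.0 if abs(expected - actual) <= tol else 0.0

def score_str(expected: str, actual: str) -> float:
    # Case/whitespace normalization is an assumption for illustration
    return 1.0 if expected.strip().lower() == actual.strip().lower() else 0.0

def score_list_str(expected: list[str], actual: list[str],
                   threshold: float = 0.8) -> float:
    # Set-based F1 over normalized strings, gated by a pass threshold
    exp = {s.strip().lower() for s in expected}
    act = {s.strip().lower() for s in actual}
    if not exp and not act:
        return 1.0
    tp = len(exp & act)
    if tp == 0:
        return 0.0
    precision = tp / len(act)
    recall = tp / len(exp)
    f1 = 2 * precision * recall / (precision + recall)
    return f1 if f1 >= threshold else 0.0
```

Thresholded F1 on list fields rewards partial recall of an element list while still failing answers that recover only a small fraction of it.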
Install

```
git clone https://gitlab.com/nomograph/sysml-bench.git && cd sysml-bench && uv sync
```

Task categories
| Category | Example tasks | Question it tests |
|---|---|---|
| Discovery | Attribute lookup, element enumeration | Can the model find specific facts? |
| Reasoning | Multi-hop inference, constraint satisfaction | Can the model reason across relationships? |
| Explanation | Summarize structure, describe behavior | Can the model explain what a model element does? |
| Structural trace | Follow allocation, specialization chains | Can the model traverse the model graph? |
| Corpus scaling | Same tasks on 5× larger corpus | Does performance hold at scale? |
tree-sitter-sysml
| Metric | Value |
|---|---|
| Tests | 192 |
| External coverage | 89% |
| Bindings | 6 languages |
| Language | C (tree-sitter) |
| License | MIT |
Tree-sitter grammar for SysML v2. The parsing foundation for all Nomograph tooling. Built by curating a corpus of real-world SysML v2 models and iterating the grammar against it: run the parser, find failures, fix the grammar, repeat. 89% coverage on external files (models we did not author). 192 tests passing.
Provides incremental parsing with bindings for Rust, C, Node.js, Python, Go, and Swift. The sysml CLI and benchmark harness both depend on it. SysML v2 was adopted by OMG in June 2025; this grammar tracks the current specification.
Bindings
| Language | Package | Status |
|---|---|---|
| Rust | tree-sitter-sysml | Primary (used by sysml CLI) |
| C | Header + source | Generated by tree-sitter |
| Node.js | tree-sitter-sysml | Available |
| Python | tree-sitter-sysml | Available |
| Go | tree-sitter-sysml | Available |
| Swift | TreeSitterSysml | Available |
Candidate languages for future iterations:
| Domain | Language |
|---|---|
| Embedded | AADL |
| Safety | OSCAL, GSN |
| Electronics | KiCad, SystemVerilog |
| 3D/CAD | OpenSCAD |
| Supply chain | CycloneDX |
We are genuinely curious about the return to composable CLI tools for LLM interaction. There is something appealing about the Unix philosophy applied to AI tooling: small programs that do one thing well, piped together, with text as the universal interface. Our benchmark data suggests this intuition has substance. CLI tool-based search outperformed both MCP transport and RAG on discovery tasks, using 21% fewer tokens. The composable approach seems to produce less overhead and more predictable behavior.
The methodology is designed to extend to any formal language with a grammar. AADL for embedded systems, OSCAL for security compliance, KiCad for electronics, OpenSCAD for parametric 3D. The source code of the physical world, made legible to AI through the same tooling pattern: parse it, index it, expose it through composable CLI commands, measure what happens.