Transport fault repair
Use case
China Mobile Zhejiang and ZTT built a system that, on its own, finds and fixes faults in their transport network, the backbone links that carry traffic between sites. They wrote it up in the GSMA Foundry use-case library
The eval rebuilds that loop as eight graded tasks. When a run starts, a hidden fault knocks a router link down, and the agent is handed the alarm burst it set off plus a live command line into the routers. The headline score, called pass^3, rewards bringing the path back on three runs in a row
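A minimal sketch of how a pass^3 style score could be computed, assuming each task is run three times and counts only if every run restores the path. The function name `pass_hat_k` and the input shape (task id mapped to per-run pass flags) are illustrative assumptions, not the eval's actual code.

```python
from typing import Mapping, Sequence


def pass_hat_k(results: Mapping[str, Sequence[bool]], k: int = 3) -> float:
    """Fraction of tasks whose first k runs all passed.

    Assumed input shape: task id -> per-run pass/fail flags, in run order.
    """
    if not results:
        return 0.0
    solved = 0
    for task_id, runs in results.items():
        if len(runs) < k:
            raise ValueError(f"{task_id} has {len(runs)} runs, need {k}")
        # A single failed run among the first k sinks the task.
        if all(runs[:k]):
            solved += 1
    return solved / len(results)


# Hypothetical outcomes: t7 passes all three runs, t8 fails its second run.
runs = {"t7": [True, True, True], "t8": [True, False, True]}
score = pass_hat_k(runs)
```

Under this reading, a task that succeeds on two of three runs scores the same as one that never succeeds, which is what makes the metric a measure of consistency and not of best-case ability.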
Agent
Each of the six capabilities below is graded by one or two tasks in the suite:
- What just happened? Collapse the alarm burst into distinct events, then decide whether it’s worth a work order (t1 and t2)
- Where’s the fault? Pin it to one device or link (t3)
- What’s misconfigured? Name the wrong parameter and its value (t4)
- What’s the repair? Write the ordered plan, then check a draft of it for gaps (t5 and t6)
- Does the fix hold? Apply it on the live router (t7)
- Is the path back? Confirm reachability end to end (t8)
Environment
- To read the fault, FRR, the open-source router software, runs the OSPF routing protocol across the chain of routers and emits a burst of standard (X.733) alarms the agent triages; a hidden fault record seeds the same misconfiguration every trial
- To act on it, vtysh, the routers’ command line, lets the agent inspect and rewrite config on the live device while the rest of the chain stays up
- To prove the result, the grader re-reads the links between routers (their adjacency), their routes, and their running config, and runs an independent ping
Suite
| ID | Task | What it tests | Grader |
|---|---|---|---|
| t1 | Collapse the alarm burst into distinct events | alarm correlation | Jaccard over the dedup tuple, partial credit |
| t2 | Decide whether the burst needs a work order | triage against live state | verdict vs live state |
| t3 | Localise the fault to a device or link | fault localisation | entity match + live adjacency cross-check |
| t4 | Diagnose the misconfigured parameter and value | config diagnosis | parameter and current-value match |
| t5 | Write the ordered repair plan | repair planning | step coverage + ordering, partial credit |
| t6 | Judge whether a draft plan is complete | plan-completeness judgment | verdict plus the named omitted step |
| t7 | Apply the repair via CLI | live repair | live adjacency, routes, and config |
| t8 | Verify reachability after repair | reachability verification | verdict vs scorer re-probe |
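The two partial-credit graders in the table (t1 and t5) can be sketched as below. This is an illustration under stated assumptions, not the suite's grader: the dedup tuple fields (device, interface, cause), the step labels, and the 50/50 weighting between coverage and ordering are all hypothetical.

```python
from typing import Hashable, Iterable, Sequence


def jaccard(predicted: Iterable[Hashable], expected: Iterable[Hashable]) -> float:
    """Jaccard similarity |A & B| / |A | B| between two sets of dedup tuples."""
    a, b = set(predicted), set(expected)
    if not a and not b:
        return 1.0  # both empty: treated here as perfect agreement
    return len(a & b) / len(a | b)


def plan_score(plan: Sequence[str], reference: Sequence[str]) -> float:
    """Average of step coverage and pairwise ordering agreement.

    Assumes reference steps are unique and non-empty; the equal
    weighting of the two terms is an assumption of this sketch.
    """
    position = {}
    for index, step in enumerate(plan):
        position.setdefault(step, index)  # keep the first occurrence
    covered = [step for step in reference if step in position]
    coverage = len(covered) / len(reference)
    pairs = in_order = 0
    for i in range(len(covered)):
        for j in range(i + 1, len(covered)):
            pairs += 1
            if position[covered[i]] < position[covered[j]]:
                in_order += 1
    # With fewer than two covered steps there is nothing to misorder.
    ordering = in_order / pairs if pairs else (1.0 if covered else 0.0)
    return 0.5 * coverage + 0.5 * ordering


# Hypothetical alarm events keyed by (device, interface, cause).
expected_events = {("r3", "eth1", "linkDown"), ("r4", "eth0", "adjacencyLost")}
predicted_events = [
    ("r3", "eth1", "linkDown"),
    ("r3", "eth1", "linkDown"),  # duplicate alarm collapses in the set
    ("r5", "eth0", "linkDown"),  # spurious event
]
t1_credit = jaccard(predicted_events, expected_events)

reference_plan = ["enter interface", "fix parameter", "save config", "verify adjacency"]
draft_plan = ["enter interface", "verify adjacency", "fix parameter"]
t5_credit = plan_score(draft_plan, reference_plan)
```

Set-based scoring means repeated alarms cost nothing once collapsed, while a spurious or missing event lowers the score without zeroing it.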
Tools
The only tools are for the steps that touch the live network. Collapsing the alarm burst, planning the repair, and judging a draft are reasoning steps over what the agent is handed, so they need no tool. Localising, diagnosing, and applying the fix (t3, t4, t7) mean reading and rewriting router config, so the agent gets vtysh, the FRR command line, and a general shell for what vtysh doesn’t cover. Verifying (t8) means checking the path end to end, so the agent also gets a reachability probe and a traceroute
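As a rough illustration of what a vtysh tool call might wrap, the sketch below builds a `docker exec` command line that passes each command to vtysh with its `-c` flag. The container name `r3`, the interface `eth1`, and the choice of `ip ospf hello-interval` as the parameter being repaired are assumptions made for the example; the actual fault and tool wiring are not described here.

```python
import shlex
from typing import List, Sequence


def vtysh_argv(container: str, commands: Sequence[str]) -> List[str]:
    """Build a `docker exec` argv that runs vtysh commands in one router container."""
    if not commands:
        raise ValueError("need at least one vtysh command")
    argv = ["docker", "exec", container, "vtysh"]
    for command in commands:
        # vtysh accepts repeated -c flags and runs them in order.
        argv += ["-c", command]
    return argv


# Read-only inspection of OSPF neighbours on a hypothetical router "r3".
inspect = vtysh_argv("r3", ["show ip ospf neighbor"])

# A hypothetical repair: reset the hello interval on one interface.
repair = vtysh_argv(
    "r3",
    [
        "configure terminal",
        "interface eth1",
        "ip ospf hello-interval 10",
        "end",
        "write memory",
    ],
)

# Shell-quoted form, handy for logging what the agent ran.
print(shlex.join(inspect))
```

Building an argv list instead of a shell string keeps command text out of shell interpolation, and the list can be handed to `subprocess.run` as is.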
Notes
| Impractical task | Problem | Practical alternative |
|---|---|---|
| Break a live metro backbone to test the repair | Injecting an OSPF fault on a production network drops customer traffic | Run a six-container FRR/OSPF chain in Docker and seed the fault there |
| Wait for a real config drift to take an adjacency down | Real misconfigurations are rare, slow, and unrepeatable | Inject one fault from a hidden record at boot, the same fault every trial |
| Triage against a vendor alarm and CLI platform | Proprietary, vendor-locked, and different per operator | Emit a seeded X.733 alarm burst and expose the routers through FRR’s vtysh CLI |
| Inconsistency | Problem | Better |
|---|---|---|
| Let the agent in before OSPF has converged | A still-flapping adjacency makes the same fault look different each run | Converge OSPF and gate on the fault actually manifesting before the agent’s first turn |
| Score the repair from the agent’s report | A narrated fix can’t be trusted, and the path may still be down | Re-read live adjacency, routes, and running-config after the agent submits |
| Judge reachability from a single probe | One ping can pass or fail on incidental container timing | Re-probe with an independent scorer and judge the verdict against it |
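The gating and re-probing rows above share one idea: do not trust a single observation of a network that may still be settling. A minimal sketch, assuming the harness can supply a boolean `check` callable (for example "the fault is visible" or "the ping succeeded"); the function name, timeouts, and the three-in-a-row threshold are illustrative choices.

```python
import time
from typing import Callable


def wait_until(
    check: Callable[[], bool],
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
    stable_checks: int = 3,
) -> bool:
    """Return True once `check` has held for `stable_checks` consecutive polls."""
    deadline = time.monotonic() + timeout_s
    streak = 0
    while time.monotonic() < deadline:
        # Any failed poll resets the streak, so a flapping state never passes.
        streak = streak + 1 if check() else 0
        if streak >= stable_checks:
            return True
        time.sleep(interval_s)
    return False


# Simulated observations: the state flaps once before settling.
observations = iter([False, True, False, True, True, True])
settled = wait_until(lambda: next(observations), timeout_s=5.0, interval_s=0.0)

# A state that never appears times out instead of blocking forever.
never = wait_until(lambda: False, timeout_s=0.05, interval_s=0.01)
```

Requiring consecutive successes is one simple way to filter out a flapping adjacency or a ping that lands on incidental container timing.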