Transport fault repair

Use case

China Mobile Zhejiang and ZTT built a system that, on its own, finds and fixes faults in their transport network, the backbone links that carry traffic between sites. They wrote it up in the GSMA Foundry use-case library.

The eval rebuilds that loop as eight graded tasks. When a run starts, a hidden fault knocks a router link down, and the agent is handed the alarm burst it set off plus a live command line into the routers. The headline score, called pass^3, rewards bringing the path back on three runs in a row.
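One way to read pass^3, as a minimal sketch: a task counts only if all three of its trials pass, and the headline number is the fraction of tasks that do. The function name, the input layout, and the averaging across tasks are assumptions for illustration; the write-up only says the score rewards three successful runs in a row.

```python
from typing import Dict, List


def pass_hat_k(trials: Dict[str, List[bool]], k: int = 3) -> float:
    """Fraction of tasks whose first k trials all passed (assumed definition)."""
    if not trials:
        return 0.0
    solved = 0
    for outcomes in trials.values():
        # A task with fewer than k recorded trials cannot count as solved.
        if len(outcomes) >= k and all(outcomes[:k]):
            solved += 1
    return solved / len(trials)


# Hypothetical results: t7 fails once, so it earns nothing under pass^3.
example = {
    "t7": [True, False, True],
    "t8": [True, True, True],
}
print(pass_hat_k(example))  # prints 0.5
```

Under this reading a single failed trial zeroes the task, which is what makes the metric stricter than an average pass rate.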

Agent

Each of the six capabilities listed under Suite below is graded by one or two tasks in the suite.

Environment

Simulated components
  • FRR: metro router chain
  • data-network: reachability host
  • OSPF: adjacency · routes
  • fault record: seeded OSPF misconfig
  • alarm burst: X.733, seeded
  • ping · traceroute: reachability probes
Categories: HW sim · Software · Data · Observability
  • To read the fault, FRR, the open-source router software, runs the OSPF routing protocol across the chain of routers and emits a burst of standard (X.733) alarms the agent triages; a hidden fault record seeds the same misconfiguration every trial
  • To act on it, vtysh, the routers’ command line, lets the agent inspect and rewrite config on the live device while the rest of the chain stays up
  • To prove the result, the grader re-reads the links between routers (their adjacency), their routes, and their running config, and runs an independent ping
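The re-read in the last bullet can be sketched as a few non-interactive vtysh calls. This is an illustration, not the grader's code: `vtysh -c` and the three show commands are standard FRR, while the helper names, the injectable `run` parameter, and the "look for Full" heuristic are assumptions. In a containerised chain the call would likely be wrapped in a container exec, which is omitted here.

```python
import subprocess
from typing import Callable, Dict


def run_vtysh(command: str) -> str:
    """Run one FRR show command non-interactively and return its text output."""
    result = subprocess.run(
        ["vtysh", "-c", command],
        capture_output=True, text=True, check=True, timeout=10,
    )
    return result.stdout


def read_live_state(run: Callable[[str], str] = run_vtysh) -> Dict[str, str]:
    """Collect the three views the grader is described as re-reading."""
    return {
        "adjacency": run("show ip ospf neighbor"),
        "routes": run("show ip route ospf"),
        "config": run("show running-config"),
    }


def has_full_neighbor(neighbor_output: str) -> bool:
    """Assumed heuristic: an established OSPF adjacency reports state Full."""
    return any("Full" in line for line in neighbor_output.splitlines())
```

Passing `run` as a parameter keeps the sketch testable without a router: a stub that returns canned text can stand in for vtysh.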

Suite

  • What just happened? Collapse the alarm burst into distinct events, then decide whether it’s worth a work order (t1 and t2)
  • Where’s the fault? Pin it to one device or link (t3)
  • What’s misconfigured? Name the wrong parameter and its value (t4)
  • What’s the repair? Write the ordered plan, then check a draft of it for gaps (t5 and t6)
  • Does the fix hold? Apply it on the live router (t7)
  • Is the path back? Confirm reachability end to end (t8)

| ID | Task | What it tests | Grader |
|----|------|---------------|--------|
| t1 | Collapse the alarm burst into distinct events | alarm correlation | Jaccard over the dedup tuple, partial credit |
| t2 | Decide whether the burst needs a work order | triage against live state | verdict vs live state |
| t3 | Localise the fault to a device or link | fault localisation | entity match + live adjacency cross-check |
| t4 | Diagnose the misconfigured parameter and value | config diagnosis | parameter and current-value match |
| t5 | Write the ordered repair plan | repair planning | step coverage + ordering, partial credit |
| t6 | Judge whether a draft plan is complete | plan-completeness judgment | verdict plus the named omitted step |
| t7 | Apply the repair via CLI | live repair | live adjacency, routes, and config |
| t8 | Verify reachability after repair | reachability verification | verdict vs scorer re-probe |
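
The t1 grader's Jaccard score can be illustrated as follows. This is a sketch under assumptions: the suite does not spell out the dedup tuple, so the two-field `(managed object, probable cause)` shape and every device and alarm name here are hypothetical.

```python
from typing import Iterable, Set, Tuple

# Assumed dedup tuple: (managed object, probable cause).
Event = Tuple[str, str]


def jaccard(predicted: Iterable[Event], expected: Iterable[Event]) -> float:
    """Size of the intersection over size of the union of the two event sets."""
    p: Set[Event] = set(predicted)
    e: Set[Event] = set(expected)
    if not p and not e:
        return 1.0  # two empty answers agree completely
    return len(p & e) / len(p | e)


# Hypothetical ground truth: three distinct events behind the burst.
expected = {("r3:eth1", "linkDown"), ("r3", "ospfNbrDown"), ("r4", "ospfNbrDown")}

# Hypothetical answer: one duplicate, one miss (r4), one spurious event (r5).
predicted = [
    ("r3:eth1", "linkDown"),
    ("r3", "ospfNbrDown"),
    ("r3", "ospfNbrDown"),
    ("r5", "ospfNbrDown"),
]

score = jaccard(predicted, expected)
print(score)  # prints 0.5: 2 shared events out of 4 distinct ones
```

Because both sides are sets, a repeated event costs nothing, while a missed or spurious event lowers the score without zeroing it. That is the partial credit the table refers to.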

Tools

The only tools are for the steps that touch the live network. Collapsing the alarm burst, planning the repair, and judging a draft are reasoning steps over what the agent is handed, and need no tool. Localising, diagnosing, and applying the fix (t3, t4, t7) mean reading and rewriting router config, so the agent gets vtysh, the FRR command line, and a general shell for what vtysh doesn’t cover. Verifying (t8) means checking the path end to end, so the agent also gets a reachability probe and a traceroute.
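
The split described above can be written down as a small lookup, shown here only to make the mapping explicit. The constant and function names are hypothetical, t2 is left out because the text does not say which tools triage uses, and nothing here implies the sandbox enforces the mapping.

```python
from typing import Dict, FrozenSet

REASONING_ONLY: FrozenSet[str] = frozenset()
ROUTER_ACCESS: FrozenSet[str] = frozenset({"vtysh", "exec_shell"})
PATH_CHECK: FrozenSet[str] = frozenset({"ping_pair", "traceroute"})

# Which tools each task is expected to need, per the paragraph above.
# t2 is omitted: the text does not state its tools.
TOOLS_BY_TASK: Dict[str, FrozenSet[str]] = {
    "t1": REASONING_ONLY,
    "t3": ROUTER_ACCESS,
    "t4": ROUTER_ACCESS,
    "t5": REASONING_ONLY,
    "t6": REASONING_ONLY,
    "t7": ROUTER_ACCESS,
    "t8": PATH_CHECK,
}


def needs_tool(task_id: str, tool: str) -> bool:
    """True if the task is expected to use the tool; unknown tasks need none."""
    return tool in TOOLS_BY_TASK.get(task_id, frozenset())
```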

Transport O&M agent
4 tools · the agent's access in the sandbox
  • vtysh
  • exec_shell
  • ping_pair
  • traceroute
Four tools carry one incident from alarm to verified repair. The grader never reads these calls; it re-reads the live network state on its own. Two of the tools, vtysh and ping_pair, name that state: vtysh exposes the adjacency, routes, and config the grader re-reads, and ping_pair the reachability it re-probes.

Notes

| Impractical task | Problem | Practical alternative |
|------------------|---------|-----------------------|
| Break a live metro backbone to test the repair | Injecting an OSPF fault on a production network drops customer traffic | Run a six-container FRR/OSPF chain in Docker and seed the fault there |
| Wait for a real config drift to take an adjacency down | Real misconfigurations are rare, slow, and unrepeatable | Inject one fault from a hidden record at boot, the same fault every trial |
| Triage against a vendor alarm and CLI platform | Proprietary, vendor-locked, and different per operator | Emit a seeded X.733 alarm burst and expose the routers through FRR’s vtysh CLI |

| Inconsistency | Problem | Better |
|---------------|---------|--------|
| Let the agent in before OSPF has converged | A still-flapping adjacency makes the same fault look different each run | Converge OSPF and gate on the fault actually manifesting before the agent’s first turn |
| Score the repair from the agent’s report | A narrated fix can’t be trusted, and the path may still be down | Re-read live adjacency, routes, and running-config after the agent submits |
| Judge reachability from a single probe | One ping can pass or fail on incidental container timing | Re-probe with an independent scorer and judge the verdict against it |
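
The convergence gate in the first row of the second table can be sketched as two polling waits run back to back. This is an assumption about shape, not the harness's code: `wait_until`, `ready_for_agent`, the two predicate parameters, and the timeout values are all hypothetical, and how each predicate inspects the routers is left to the caller.

```python
import time
from typing import Callable


def wait_until(
    condition: Callable[[], bool],
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll condition until it holds or the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


def ready_for_agent(
    ospf_settled: Callable[[], bool],
    fault_visible: Callable[[], bool],
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Open the agent's first turn only once OSPF has settled and the
    seeded fault is actually observable."""
    return (
        wait_until(ospf_settled, timeout_s, interval_s, sleep)
        and wait_until(fault_visible, timeout_s, interval_s, sleep)
    )
```

If either wait times out, the trial is better discarded than scored, since the agent would otherwise face a different starting state than in other runs.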