V*
Reverse BFS labels every reachable state with its optimal distance, 0–14. O(1) lookup.
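A minimal sketch of the reverse-BFS labelling, assuming unit-cost moves. The `reverse_bfs` helper and the 8-position ring domain are hypothetical stand-ins for illustration, not the actual puzzle:

```python
from collections import deque

def reverse_bfs(goal, predecessors):
    """Label every state that can reach `goal` with its optimal distance."""
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        state = queue.popleft()
        for prev in predecessors(state):
            if prev not in dist:  # first visit is the shortest distance
                dist[prev] = dist[state] + 1
                queue.append(prev)
    return dist

# Hypothetical toy domain: positions on a ring of 8, moves are +1 / -1.
def ring_predecessors(s, n=8):
    return [(s - 1) % n, (s + 1) % n]

v_star = reverse_bfs(0, ring_predecessors)
# The resulting dict is the O(1) lookup table: v_star[state] is the distance.
```

Storing the labels in a hash table is what makes the lookup constant time; the table only has to be built once.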
DAVI as the heuristic, BWAS at inference, V* as ground truth.
DAVI
Small residual MLP trained on bootstrapped estimates of V*. Outputs a scalar cost-to-go.
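A rough sketch of what a bootstrapped target can look like, assuming the common one-step-lookahead form: 0 at the goal, otherwise the cheapest move cost plus the current estimate at the successor. A plain table stands in for the MLP here, and the ring domain and all function names are hypothetical:

```python
def bootstrapped_target(state, goal, successors, estimate):
    """One-step lookahead: 0 at the goal, else min over moves of cost + estimate."""
    if state == goal:
        return 0.0
    return min(cost + estimate(nxt) for nxt, cost in successors(state))

def run_sweeps(states, goal, successors, sweeps):
    """Repeatedly replace every estimate with its bootstrapped target."""
    table = {s: 0.0 for s in states}  # table stands in for the network
    for _ in range(sweeps):
        old = table
        table = {
            s: bootstrapped_target(s, goal, successors, old.__getitem__)
            for s in states
        }
    return table

# Hypothetical toy domain: ring of 8 positions, unit-cost moves +1 / -1.
def ring_successors(s, n=8):
    return [((s + 1) % n, 1.0), ((s - 1) % n, 1.0)]

estimates = run_sweeps(range(8), 0, ring_successors, sweeps=8)
```

On this toy domain the table settles on the exact distances after a few sweeps. A trained network only approximates them, which is why an exact V* table is useful as ground truth.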
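BWAS
Batched weighted A*: each iteration pops a batch of lowest-f nodes and evaluates their children in one heuristic pass. f = λ·g + h, with g the path cost so far and h the heuristic estimate.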
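A minimal sketch of the batched search loop, under stated assumptions: `bwas`, `heuristic_batch`, `weight` (playing the role of λ), and `batch_size` are illustrative names, and the exact-distance heuristic on a ring of 8 stands in for the learned one:

```python
import heapq
from itertools import count

def bwas(start, is_goal, successors, heuristic_batch, weight=1.0, batch_size=4):
    """Batched weighted A* with f = weight * g + h. Returns a path cost or None."""
    tie = count()  # tie-breaker so the heap never compares states
    h0 = heuristic_batch([start])[0]
    open_heap = [(weight * 0.0 + h0, next(tie), 0.0, start)]
    best_g = {start: 0.0}
    while open_heap:
        # Pop up to batch_size lowest-f nodes.
        batch = []
        while open_heap and len(batch) < batch_size:
            _, _, g, state = heapq.heappop(open_heap)
            if g > best_g.get(state, float("inf")):
                continue  # stale entry, a cheaper path was found later
            if is_goal(state):
                return g
            batch.append((g, state))
        # Expand the whole batch, keeping only children that improve on best_g.
        children = []
        for g, state in batch:
            for nxt, cost in successors(state):
                g2 = g + cost
                if g2 < best_g.get(nxt, float("inf")):
                    best_g[nxt] = g2
                    children.append((g2, nxt))
        if not children:
            continue
        # One heuristic call for all children of the batch.
        h_values = heuristic_batch([s for _, s in children])
        for (g2, nxt), h in zip(children, h_values):
            heapq.heappush(open_heap, (weight * g2 + h, next(tie), g2, nxt))
    return None

# Hypothetical toy domain: ring of 8 positions, goal at 0.
def ring_successors(s, n=8):
    return [((s + 1) % n, 1.0), ((s - 1) % n, 1.0)]

def ring_heuristic(states, n=8):
    return [float(min(s, n - s)) for s in states]

cost = bwas(4, lambda s: s == 0, ring_successors, ring_heuristic)
```

Batching trades some search precision for throughput: nodes popped later in a batch are expanded before the earlier ones have been, so the returned cost is not guaranteed optimal in general, but the heuristic is called once per batch instead of once per node.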