cubing

Methodology

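DAVI (deep approximate value iteration) trains a cost-to-go network V_theta against a slowly updated copy of itself, V_target. BWAS (batch weighted A* search) then uses the trained network as its heuristic at inference time.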

DAVI training

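Each training target comes from a one-step Bellman lookahead evaluated with a frozen copy of the network, V_target, which is refreshed from V_theta only every K iterations.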
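
Written as an equation, with A(s, a) (notation introduced here) denoting the state reached by applying move a to s, the target for a sampled state s is shown below. Pinning goal children to 0 rather than reading them from V_target is an assumption of this sketch, so that an imperfect network cannot assign a goal a nonzero cost. The squared-error objective is likewise an assumed reading of "update V_theta toward the targets".

```latex
\[
y(s) =
\begin{cases}
0, & s \text{ is a goal},\\
\min_{a}\bigl[\,1 + \tilde V\bigl(A(s,a)\bigr)\bigr], & \text{otherwise},
\end{cases}
\qquad
\tilde V(s') =
\begin{cases}
0, & s' \text{ is a goal},\\
V_{\text{target}}(s'), & \text{otherwise},
\end{cases}
\]
\[
\mathcal{L}(\theta) = \frac{1}{B}\sum_{i=1}^{B}\bigl(V_\theta(s_i) - y(s_i)\bigr)^2
\]
```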

  V_target <- copy of V_theta
  for each iteration t = 1, 2, ...:
    sample a batch of scrambled states s_1, ..., s_B
    for each s in the batch:
      if s is a goal:
        y(s) <- 0
      else:
        # unit move cost; goal children are pinned to 0 instead of
        # trusting whatever V_target outputs there
        y(s) <- 1 + min over c in expand_all(s) of (0 if c is a goal else V_target(c))
    update V_theta toward the targets {(s_i, y(s_i))}, e.g. one gradient step on squared error
    if t mod K == 0:
      V_target <- copy of V_theta
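
A minimal Python sketch of the DAVI loop, under loud assumptions: the cube is replaced by a hypothetical toy puzzle (sort a permutation of 0..3 using adjacent swaps, each costing 1), and a plain dict stands in for V_theta, so "update toward the targets" becomes an overwrite rather than a gradient step. The names is_goal, expand_all, scramble and davi are illustrative, not from the source. Because each adjacent swap changes a permutation's inversion count by exactly one, a state's true cost-to-go here is its inversion count, which the learned table should reach once enough target refreshes have propagated values outward from the goal.

```python
import random

# Hypothetical toy puzzle standing in for the cube: sort a permutation of
# 0..3 with adjacent swaps (unit cost). True cost-to-go = inversion count.
GOAL = (0, 1, 2, 3)


def is_goal(s):
    return s == GOAL


def expand_all(s):
    """All children of s: swap positions i and i + 1."""
    children = []
    for i in range(len(s) - 1):
        t = list(s)
        t[i], t[i + 1] = t[i + 1], t[i]
        children.append(tuple(t))
    return children


def scramble(max_moves, rng):
    """Random walk of 0..max_moves moves starting from the goal."""
    s = GOAL
    for _ in range(rng.randint(0, max_moves)):
        s = rng.choice(expand_all(s))
    return s


def davi(iterations=300, batch_size=32, K=10, max_moves=10, seed=0):
    rng = random.Random(seed)
    # Assumption: a dict replaces the network; unseen states read as 0.
    v_theta = {}
    v_target = dict(v_theta)
    for t in range(1, iterations + 1):
        batch = [scramble(max_moves, rng) for _ in range(batch_size)]
        targets = {}
        for s in batch:
            if is_goal(s):
                targets[s] = 0.0
            else:
                # One-step lookahead on the frozen table; goal children cost 0.
                targets[s] = 1.0 + min(
                    0.0 if is_goal(c) else v_target.get(c, 0.0)
                    for c in expand_all(s)
                )
        # A table can jump straight to its targets; a network would take
        # one optimizer step here instead.
        v_theta.update(targets)
        if t % K == 0:
            v_target = dict(v_theta)  # refresh the frozen copy
    return v_theta
```

Calling davi() returns the learned table. With a real network the overwrite becomes a partial step toward y, and the frozen V_target is what keeps those targets fixed between refreshes instead of chasing the network's own updates.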

BWAS inference

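Batch weighted A* search. Each iteration pops up to N lowest-f nodes, expands them together, and scores all of their children with a single batched heuristic call, so the network runs once per batch rather than once per node.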
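
The priority used in the pseudocode, written out, with g(n) the number of moves from start to n and h(n) the network's estimate of the remaining cost:

```latex
\[
f(n) = \lambda\, g(n) + h(n)
\]
```

With lambda = 1 this is the ordinary A* priority. Values 0 < lambda < 1 discount the path cost already paid, so the search leans harder on the heuristic and tends to reach a goal after fewer expansions, at the price of possibly longer solutions; lambda = 0 is greedy best-first search on h.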

  open <- priority queue keyed by f
  g <- map with default +infinity; g[start] <- 0
  parents <- empty map
  push(open, start, f = lambda * 0 + h(start))
  while open is not empty:
    batch <- pop up to N lowest-f nodes from open
    if some node n in batch is a goal:
      return reconstruct(parents, n)
    children <- expand_all(batch)        # (parent, action, child) triples
    h_children <- heuristic(children)    # one batched call for the whole batch
    for each (parent, action, child) in children:
      g_new <- g[parent] + 1
      if g_new < g[child]:
        g[child] <- g_new
        parents[child] <- (parent, action)
        push(open, child, f = lambda * g_new + h_children[child])
  return failure
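
A minimal Python sketch of the BWAS loop on a hypothetical toy puzzle (sort a permutation of 0..3 with adjacent swaps). Assumptions: the batched network call is replaced by inversions_heuristic, which scores a whole list of states in one call; lambda is spelled lam because lambda is a Python keyword; and stale heap entries, left behind when a node's g is later lowered, are skipped on pop, a lazy-deletion choice the pseudocode does not specify. All function names are illustrative.

```python
import heapq
import itertools
import math

# Hypothetical toy puzzle: sort a permutation of 0..3 with adjacent swaps.
GOAL = (0, 1, 2, 3)


def is_goal(s):
    return s == GOAL


def expand_all(s):
    """(action, child) pairs; action i swaps positions i and i + 1."""
    out = []
    for i in range(len(s) - 1):
        t = list(s)
        t[i], t[i + 1] = t[i + 1], t[i]
        out.append((i, tuple(t)))
    return out


def inversions_heuristic(states):
    """Stand-in for one batched network call: inversion count of each state."""
    return [
        float(sum(s[i] > s[j] for i in range(len(s)) for j in range(i + 1, len(s))))
        for s in states
    ]


def reconstruct(parents, goal):
    """Follow parent links back to the start and return the action list."""
    actions, node = [], goal
    while parents[node] is not None:
        node, action = parents[node]
        actions.append(action)
    return actions[::-1]


def bwas(start, heuristic, lam=0.6, batch_size=4):
    tie = itertools.count()  # tie-breaker so heapq never compares states
    g = {start: 0}           # a missing key means g = +infinity
    parents = {start: None}
    # Heap entries: (f, tie, g at push time, node); f(start) = lam * 0 + h(start).
    open_heap = [(heuristic([start])[0], next(tie), 0, start)]
    while open_heap:
        batch = []
        while open_heap and len(batch) < batch_size:
            _, _, g_push, node = heapq.heappop(open_heap)
            if g_push > g[node]:
                continue  # stale entry: node was re-pushed with a lower g
            batch.append(node)
        if not batch:
            continue
        for node in batch:
            if is_goal(node):
                return reconstruct(parents, node)
        triples = [(p, a, c) for p in batch for a, c in expand_all(p)]
        h_children = heuristic([c for _, _, c in triples])  # one batched call
        for (parent, action, child), h in zip(triples, h_children):
            g_new = g[parent] + 1
            if g_new < g.get(child, math.inf):
                g[child] = g_new
                parents[child] = (parent, action)
                heapq.heappush(open_heap, (lam * g_new + h, next(tie), g_new, child))
    return None  # failure: open list exhausted
```

bwas((3, 2, 1, 0), inversions_heuristic) returns a list of swap indices that sorts the permutation. With batch_size=1 and lam=1.0 the loop reduces to ordinary A*, which returns a shortest plan for a consistent heuristic such as this one; with larger batches or lam < 1 the plan is still valid but need not be shortest, since a goal can be popped in the same batch as a node on a cheaper path.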