Definition
Normally, we distinguish two kinds of search: offline search and online search.
For offline search, the agent computes a complete solution before acting, and then executes it.
For online search, the agent interleaves computation and action (search → execute → observe → search …).
An online search agent can build a map and find a goal state, provided the state space has no dead ends.
Online search is appropriate in the following situations:
1. Dynamic environments.
2. Time-constrained environments.
3. Unknown environments.
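The interleaving described above amounts to a simple agent loop: one search → execute → observe cycle per iteration. Here is a minimal sketch; the names `env.observe`, `env.execute`, and the agent-as-callable convention are assumptions for illustration, not from the source:

```python
# One search -> execute -> observe cycle per iteration: the agent commits
# to a single action at a time instead of a whole solution up front.
def run_online(env, agent):
    percept = env.observe()            # observe the current state
    while (action := agent(percept)) is not None:
        env.execute(action)            # execute one action in the world
        percept = env.observe()        # observe the outcome; search again
```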
Formulation
Actions(s): returns the set of legal actions in state s.
The step cost function c(s, a, s′): returns the cost of going from s to s′ via action a. This cannot be used until the agent has tried a in s and knows that s′ is the outcome.
Goal-test(s): returns true if s is a goal state.
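As a sketch, this formulation can be written as the following interface; the class and method names are my own (the constraint that the step cost is only usable after trying a in s is enforced by convention, not by code):

```python
# A sketch of the online-search problem formulation above.
class OnlineSearchProblem:
    def actions(self, s):
        """Return the set of legal actions in state s."""
        raise NotImplementedError

    def step_cost(self, s, a, s2):
        """Cost of going from s to s2 via a (known only after trying a in s)."""
        raise NotImplementedError

    def goal_test(self, s):
        """Return True if s is a goal state."""
        raise NotImplementedError

    def h(self, s):
        """An (admissible) heuristic estimate of the cost to a goal from s."""
        raise NotImplementedError
```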
Requirements
1. Observability: the current percept identifies the current state.
2. The agent might have access to some (admissible) heuristic function h(s).
3. No dead ends: the goal must be reachable from every state; no algorithm can avoid dead ends in all environments.
4. The agent cannot determine Result(s, a) except by being in s, executing a, and observing the result.
Solution---LRTA*: Learning Real-Time A*
1. Store a table result(s, a) recording the result of actions tried: when a is first tried in s, record the resulting state.
2. Store and update a table H(s) of cost estimates for visited states: when s is first met, initialise H(s) = h(s) for some admissible h(s); when s is left, update its cost estimate H(s) to min_a C(s, a).
3. C(s, a) is the estimated cost of reaching the goal from state s via action a:
   C(s, a) = c(s, a, s′) + H(s′)   if result(s, a) is known to be s′
   C(s, a) = h(s)                  otherwise
4. Take the best local move (real-time): in state s, select the action a minimising the estimated cost C(s, a).
LRTA* updates heuristic information to escape local minima.
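A minimal Python sketch of the four steps above, assuming the problem interface from the Formulation section; the percept-driven, agent-as-callable structure is an assumption for illustration:

```python
# A sketch of an LRTA* agent. Calling the agent with the current percept
# (state) returns the next action, or None once a goal is reached.
class LRTAStarAgent:
    def __init__(self, problem):
        self.problem = problem
        self.result = {}   # step 1: result[(s, a)] -> observed successor
        self.H = {}        # step 2: H[s] -> current cost-to-goal estimate
        self.s = None      # previous state
        self.a = None      # previous action

    def cost(self, s, a):
        """Step 3: C(s, a) = c(s, a, s') + H(s') if the outcome is known, else h(s)."""
        s2 = self.result.get((s, a))
        if s2 is None:
            return self.problem.h(s)
        return self.problem.step_cost(s, a, s2) + self.H[s2]

    def __call__(self, s2):
        p = self.problem
        if p.goal_test(s2):
            return None
        if s2 not in self.H:                      # step 2: initialise H on first visit
            self.H[s2] = p.h(s2)
        if self.s is not None:
            self.result[(self.s, self.a)] = s2    # step 1: record the observed outcome
            # step 2: on leaving self.s, update H to min_a C(self.s, a)
            self.H[self.s] = min(self.cost(self.s, b) for b in p.actions(self.s))
        # step 4: take the best local move
        self.a = min(p.actions(s2), key=lambda b: self.cost(s2, b))
        self.s = s2
        return self.a
```

Each call corresponds to one search → execute → observe cycle: the environment executes the returned action and feeds the resulting state back in, so the agent fits directly into the `run_online` loop sketched earlier. The H-update on leaving a state is what raises underestimates and lets the agent escape local minima.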