Assistance game framework

The robot and the human both act in one environment; only the human has access to the true utility function.

\(\mathcal{M} = \langle S, \{A_H, A_R\}, \{\Omega_H, \Omega_R\}, \{O_H, O_R\}, T, P_S, \gamma, \langle \Theta, r_\theta, P_\theta \rangle \rangle\)

\(S\): states; \(A_H, A_R\): human and robot actions; \(\Omega_H, \Omega_R\): observations; \(O_{H,R} : S \to \Delta(\Omega_{H,R})\) define the observation distributions; \(T : S \times A_H \times A_R \to \Delta(S)\) defines the transition probabilities; \(P_S \in \Delta(S)\) is the initial state distribution; \(\gamma\) is the discount factor. \(\Theta\) is the space of human preference parameters, \(r_\theta : S \times A_H \times A_R \to \mathbb{R}\) is the reward function parameterized by \(\theta \in \Theta\), and \(P_\theta \in \Delta(\Theta)\) is the prior distribution over \(\theta\).
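
As a concrete data structure, here is a minimal sketch of a finite, tabular assistance game in Python; the class name and array layout are illustrative assumptions, not from any particular library:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class AssistanceGame:
    """Tabular assistance game (illustrative sketch).

    Index conventions (all hypothetical):
      s in [0, n_states), a_h in [0, n_actions_h),
      a_r in [0, n_actions_r), theta in [0, n_thetas).
    """
    T: np.ndarray        # transitions, shape (S, A_H, A_R, S); rows sum to 1
    O_H: np.ndarray      # human observation model, shape (S, Omega_H)
    O_R: np.ndarray      # robot observation model, shape (S, Omega_R)
    P_S: np.ndarray      # initial state distribution, shape (S,)
    r: np.ndarray        # reward r_theta(s, a_h, a_r), shape (Theta, S, A_H, A_R)
    P_theta: np.ndarray  # prior over preference parameters, shape (Theta,)
    gamma: float         # discount factor in (0, 1)
```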

Policies can depend on history: \(\pi_R(a^R_t \mid \tau^R_{t-1})\), where \(\tau^R_{t-1}\) is the robot's history up to time \(t-1\), consisting of the robot's observations \(o^R\) and the actions \(a^H, a^R\).
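
As a type-level sketch (the names `RobotStep`, `History`, and `RobotPolicy` are illustrative, not from any library), a history-dependent robot policy is just a function from histories to action distributions:

```python
from typing import Callable, NamedTuple
import numpy as np

class RobotStep(NamedTuple):
    o_r: int  # robot observation at this step
    a_h: int  # human action the robot observed
    a_r: int  # robot action taken

History = tuple[RobotStep, ...]                # the history tau^R_{t-1}
RobotPolicy = Callable[[History], np.ndarray]  # history -> distribution over A_R
```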

In full generality, the robot can hold a policy-conditioned belief \(B : \Pi_R \to \Delta(\Pi_H)\), which determines how the human responds to the robot's choice of policy.
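
For instance (one illustrative special case, not the only choice of \(B\)), an optimal human who best-responds to the robot's announced policy yields a point-mass belief:

\[
B(\pi_R) = \delta_{\mathrm{BR}(\pi_R)}, \qquad \mathrm{BR}(\pi_R) \in \operatorname*{arg\,max}_{\pi_H \in \Pi_H} \; \mathbb{E}\!\left[\sum_t \gamma^t r_\theta(s_t, a^H_t, a^R_t) \,\middle|\, \pi_H, \pi_R\right],
\]

where the human policy may condition on \(\theta\), since the human observes it.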

Boltzmann rationality: a noisy-rational human model in which the human picks actions with probability proportional to the exponentiated value of taking them, \(\pi_H(a_H \mid s, \theta) \propto \exp\!\big(\beta\, Q_\theta(s, a_H)\big)\); it arises from the principle of maximum causal entropy (see *Modeling Interaction via the Principle of Maximum Causal Entropy*, Ziebart et al., 2010).
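
A minimal numerical sketch of a Boltzmann-rational action distribution, assuming tabular Q-values; `beta` is the rationality coefficient, with \(\beta \to \infty\) recovering a perfectly rational human and \(\beta = 0\) a uniformly random one:

```python
import numpy as np

def boltzmann_policy(q_values: np.ndarray, beta: float) -> np.ndarray:
    """Return P(a) proportional to exp(beta * Q(s, a)) for a single state.

    q_values: shape (n_actions,), the human's action values Q_theta(s, .).
    beta: rationality coefficient; higher = closer to argmax.
    """
    logits = beta * q_values
    logits -= logits.max()  # stabilize the exponentials
    probs = np.exp(logits)
    return probs / probs.sum()

# Example: three actions; the best action gets most (not all) probability mass.
print(boltzmann_policy(np.array([1.0, 2.0, 0.5]), beta=2.0))
```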

Assistance problem: an assistance game with a fixed human policy \(\pi_H\), obtained from a human model. Fixing \(\pi_H\) reduces the two-agent game to a single-agent POMDP for the robot.
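
Concretely, a sketch of the standard reduction (assuming for simplicity that \(\pi_H\) is Markov in \((s, \theta)\)): the robot faces a POMDP with hidden state \((s, \theta)\), with the human action marginalized out:

\[
T'\big((s',\theta) \mid (s,\theta), a_R\big) = \sum_{a_H \in A_H} \pi_H(a_H \mid s, \theta)\, T(s' \mid s, a_H, a_R), \qquad
r'\big((s,\theta), a_R\big) = \sum_{a_H \in A_H} \pi_H(a_H \mid s, \theta)\, r_\theta(s, a_H, a_R).
\]

The \(\theta\)-component never changes; the robot's belief over \(\theta\) is updated only through its observations, which include the human's actions.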