News
In this scenario, agents collaborate to estimate the value function of a target team policy. The proposed algorithm combines off-policy learning, eligibility traces, and linear function approximation.
Some results have been hidden because they may be inaccessible to you
Show inaccessible results