News

In this scenario, agents collaborate to estimate the value function of a target team policy. The proposed algorithm combines off-policy learning, eligibility traces, and linear function approximation.