offlax.cql

Functions

sample(trajectories, rng, batch_size)

Classes

CQLDiscrete(rng, actor, critic, state_dims, …)

Implementation of the Conservative Q-Learning (CQL) algorithm.

class offlax.cql.CQLDiscrete(rng, actor, critic, state_dims, action_dims, gamma, tau)

Bases: object

Implementation of the Conservative Q-Learning (CQL) algorithm.

Paper: https://arxiv.org/abs/2006.04779

Parameters
  • rng (jax.random.PRNGKey) –

  • actor (ActorDiscrete) –

  • critic (Critic) –

  • state_dims (List[int]) –

  • action_dims (int) –

  • gamma (float) –

  • tau (float) –
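The parameters above are listed without descriptions, so the following is only a minimal sketch of the discrete-action CQL objective as described in the linked paper, not offlax's actual implementation. The function name cql_discrete_loss, its arguments, and cql_weight are hypothetical. It also assumes that gamma plays its usual role as the discount factor in the TD target, and that the target uses a greedy max over next-state Q-values.

```python
import jax
import jax.numpy as jnp
from jax.scipy.special import logsumexp


def cql_discrete_loss(q_values, target_q_values, actions, rewards, dones,
                      gamma, cql_weight=1.0):
    """Hypothetical helper, not part of offlax.

    q_values:        [batch, num_actions] Q(s, .) from the online critic
    target_q_values: [batch, num_actions] Q(s', .) from a target critic
    actions:         [batch] integer actions stored in the dataset
    rewards, dones:  [batch] floats
    """
    # Q-value of the action that was actually taken in the dataset.
    q_taken = jnp.take_along_axis(q_values, actions[:, None], axis=1).squeeze(-1)

    # One-step TD target; gamma is assumed to be the discount factor.
    td_target = rewards + gamma * (1.0 - dones) * jnp.max(target_q_values, axis=1)
    td_loss = jnp.mean((q_taken - jax.lax.stop_gradient(td_target)) ** 2)

    # Conservative penalty: push down logsumexp_a Q(s, a) while pushing up
    # Q(s, a_data). It is non-negative because logsumexp >= max >= q_taken.
    penalty = jnp.mean(logsumexp(q_values, axis=1) - q_taken)

    return td_loss + cql_weight * penalty, penalty


# Tiny usage example with made-up numbers.
q = jnp.zeros((2, 3))
loss, penalty = cql_discrete_loss(
    q, q,
    actions=jnp.array([0, 2]),
    rewards=jnp.array([1.0, 1.0]),
    dones=jnp.array([1.0, 1.0]),
    gamma=0.99,
)
```

The penalty term is what separates CQL from plain Q-learning: it lowers Q-values for actions that are not supported by the dataset, which is the conservative behaviour the paper argues for.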

get_search_metric()

Returns the search metric used for hyperparameter tuning.

Returns

A tuple (objective, metric), where objective is either 'min' or 'max' and metric is the objective metric to optimize.

Return type

Tuple[str, str]
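As an illustration of how a tuning loop might consume the returned tuple, the sketch below selects the best trial according to an (objective, metric) pair. The helper pick_best_trial, the trial dictionaries, and the metric name "critic_loss" are all hypothetical; the source does not say which metric CQLDiscrete actually reports.

```python
from typing import Dict, List, Tuple


def pick_best_trial(search_metric: Tuple[str, str],
                    trials: List[Dict[str, float]]) -> Dict[str, float]:
    """Hypothetical helper: choose the best trial for an (objective, metric) pair."""
    objective, metric_name = search_metric
    if objective not in ("min", "max"):
        raise ValueError(f"objective must be 'min' or 'max', got {objective!r}")
    # Use the builtin min or max depending on the optimization direction.
    chooser = min if objective == "min" else max
    return chooser(trials, key=lambda trial: trial[metric_name])


# Made-up results; in practice the tuple would come from get_search_metric().
trials = [{"critic_loss": 0.5}, {"critic_loss": 0.2}]
best = pick_best_trial(("min", "critic_loss"), trials)
```

Returning the direction alongside the metric means the tuner does not need to hard-code whether a lower or higher value is better.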